Disjointness of subset types

In hybrid type checking, a subtyping relationship between subset types {x:T|e} determines when it’s safe to omit a cast. The structural extension of subtyping to, e.g., function types, gives us a straightforward way to achieve verification by optimization: if we can prove that a cast is from a subtype to a supertype, there’s no need to pay the runtime cost of checking anything.

When should we reject a hybrid typed program? Flanagan’s seminal paper, he offers a straightforward plan:


Flanagan enacts this plan by rejecting casts which are provably not from a subtype to a supertype: if the SMT solver finds that a cast might fail, then it rejects the program. In his model of an SMT solver, returning a checkmark (✓) when checking proves subtyping, a question mark (?) when checking times out, or an x-mark (×) when a counterexample of subtyping is found.



The compilation and checking judgment (a/k/a cast insertion) will insert a cast on successful or indeterminate checking, but has no rules for (and therefore rejects) programs where there is a counterexample.



Unfortunately, such a policy is too restrictive. For example, an SMT solver can easily prove that {x:Int|true} doesn’t imply {x:Int|even x}… but such a program shouldn’t be rejected! We should only reject programs that must fail.

When must a cast fail? When the two types are disjoint, i.e., when they don’t share any values. Since our language with casts has when the only normal form typeable at both types is an error. Cătălin Hriţcu got a good start on this in his thesis (see sections 2.4.6 and the end of 3.4.4), where he defines a notion of non-disjointness. He doesn’t define the notion for function types, but that’s actually very easy: so long as two function types are compatible (i.e., equal after erasure of refinements), they are non-disjoint. Why? Well, there’s at least one value that {x:T|e11}→{x:T|e12} and {x:T|e21}→{x:T|e22} have in common: λx. blame.

I can imagine slightly stronger version: if the domain type is a refinement of base type, then we could exclude functions which could never even be called, i.e., with disjoint domains. At higher orders, though, there’s no way to tell (maybe the function in question never calls its argument), so I think the more liberal interpretation is the right starting point.

NB I briefly discuss when to reject hybrid-typed programs in my dissertation (Section 6.1.4).

A refinement type by any other name

Frank Pfenning originated the idea of refinement types in his seminal PLDI 1991 paper with Tim Freeman. Freeman and Pfenning’s refinement types allow programmers to work with refined datatypes, that is, sub-datatypes induced by refining the set of available constructors. For example, here’s what that looks like for lists, with a single refinement type, α singleton:

datatype α list = nil | cons of α * α list
rectype α singleton = cons α nil

That is, a programmer defines a datatype α list, but can identify refined types like α singleton—lists with just one element. We can imagine a lattice of type refinements where α list is at the top, but below it is the refinement of lists of length 0 or 1—written α singleton ∨ α nil. This type is itself refined by its constituents, which are are refinements of the empty type. Here’s such a lattice, courtesy of remarkably nice 1991-era TeX drawing:
Refinements of α list
Another way of phrasing all of this is that refinement types identify subsets of types. Back in 1983, Bengt Nordström and Kent Petersson introduced—as far as I know—the idea of subset types in a paper called Types and Specifications at the IFIP Congress. Unfortunately, I couldn’t find a copy of the paper, so it’s not clear where the notation {x∈A|B(x)} set-builder-esque notation first came from, but it shows up in Bengt Nordström, Kent Petersson, and Jan M. Smith’s Programming in Martin-Löf’s Type Theory in 1990. Any earlier references would be appreciated. Update (2015-03-18): Colin Gordon pointed out that Robert Constable‘s Mathematics as programming from 1984 uses the subset type notation, as does the NUPRL tech report from 1983. The NUPRL TR came out in January ’83 while IFIP ’83 happened in September. Nate Foster, who works Bob Constable, suspects that Constable has priority. Alright: subset types go to Robert Constable in January 1983 with the Nearly Ultimate Pearl. Going once…

My question is: when did we start calling {x∈A|B(x)} and other similar subset types a “refinement type”? Any advice or pointers would be appreciated—I’ll update the post.

Susumu Hayashi in Logic of refinement types describes “ATTT”, which, according to the abstract, “has refi nement types which are intended to be subsets of ordinary types or speci cations of programs”, where he builds up these refinements out of some set theoretic operators on singletons. By rights, this paper is probably the first to use “refinement type” to mean “subset type”… though I have some trouble pinpointing where the paper lives up to that claim in the abstract.

Ewen Denney was using refinement types to mean types and specifications augmented with logical propositions. This terminology shows up in his 1998 PhD thesis and his 1996 IFIP paper, Refinement Types for Specification.

In 1998, Hongwei Xi and Frank Pfenning opened the door to flexible interpretations of “refinements” in Eliminating Array Bound Checking Through Dependent Types. In Section 2.4, they use ‘refinement’ in a rather different sense:

Besides the built-in type families int, bool, and array, any user-defined data type may be refined by explicit declarations. …

typeref ’a list of nat
with nil <| ’a list(0)
| :: <| {n:nat} ’a * ’a list(n) -> ’a list(n+1)

Later on, in Section 3.1, they have a similar use of the term:

In the standard basis we have refined the types of many common functions on integers such as addition, subtraction, multiplication, division, and the modulo operation. For instance,

+ <| {m:int} {n:int} int(m) * int(n) -> int(m+n)

is declared in the system. The code in Figure 3 is an implementation of binary search through an array. As before, we assume:

sub <| {n:nat} {i:nat | i < n} ’a array(n) * int(i) -> ’a

So indices allow users to refine types, though they aren’t quite refinement types. In 1999, Xi and Pfenning make a strong distinction in Dependent Types in Practical Programming; from Section 9:

…while refinement types incorporate intersection and can thus ascribe multiple types to terms in a uniform way, dependent types can express properties such as “these two argument lists have the same length” which are not recognizable by tree automata (the basis for type refinements).

Now, throughout the paper they do things like “refine the datatype with type index objects” and “refine the built-in types: (a) for every integer n, int(n) is a singleton type which contains only n, and (b) for every natural number n, 0 a array(n) is the type of arrays of size n”. So here there’s a distinction between “refinement types”—the Freeman and Pfenning discipline—and a “refined type”, which is a subset of a type indicated by some kind of predicate and curly braces.

Joshua Dunfield published a tech report in 2002, Combining Two Forms of Type Refinements, where makes an impeccably clear distinction:

… the datasort refinements (often called refinement types) of Freeman, Davies, and Pfenning, and the index refinements of Xi and Pfenning. Both systems refine the simple types of Hindley-Milner type systems.

In his 2004 paper with Frank, Tridirectional Typechecking, he maintains the distinction between refinements, but uses a term I quite like—“property types”, i.e., types that guarantee certain properties.

Yitzhak Mandelbaum, my current supervisor David Walker, and Bob Harper wrote An Effective Theory of Type Refinements in 2003, but they didn’t quite have subset types. Their discussion of related work makes it seem that they interpret refinement types as just about any device that allows programmers to use the existing types of a language more precisely:

Our initial inspiration for this project was derived from work on refinement types by Davies and Pfenning and Denney and the practical dependent types proposed by Xi and Pfenning. Each of these authors proposed to sophisticated type systems that are able to specify many program properties well beyond the range of conventional type systems such as those for Java or ML.

In the fairly related and woefully undercited 2004 paper, Dynamic Typing with Dependent Types, Xinming Ou, Gang Tan, Yitzhak Mandelbaum, and David Walker used the term “set type” to define {x∈A|B(x)}.

Cormac Flanagan‘s Hybrid Type Checking in 2006 is probably the final blow for any distinction between datasort refinements and index refinements: right there on page 3, giving the syntax for types, he writes “{x:B|t} refinement type“. He says on the same page, at the beginning of Section 2, “Our refinement types are inspired by prior work on decidable refinement type systems”, citing quite a bit of the literature: Mandelbaum, Walker, and Harper; Freeman and Pfenning; Davies and Pfenning ICFP 2000; Xi and Pfenning 1999; Xi LICS 2000; and Ou, Tan, Mandelbaum, and Walker. After Cormac, everyone just seems to call them refinement types: Ranjit Jhala‘s Liquid Types, Robby Findler and Phil Wadler in Well typed programs can’t be blame, my own work, Andy Gordon in Semantic Subtyping with an SMT Solver. This isn’t a bad thing, but perhaps we can be more careful with names. Now that we’re all in the habit of calling them refinements, I quite like “indexed refinements” as a distinction. Alternatively, “subset types” are a very clear term with solid grounding in the literature.

Finally: I didn’t cite it in this discussion, but Rowan Davies‘s thesis, Practical Refinement-Type Checking, was extremely helpful in looking through the literature.

Edited to add: thanks to Ben Greenman for some fixes to broken links.

PLVNET

I’m really happy to be part of the first PLVNET, a workshop on the intersection of PL, verification, and networking. I have two abstracts up for discussion.

The first abstract, Temporal NetKAT, is about adding reasoning about packet histories to a network policy language like NetKAT. The work on this is moving along quite nicely (thanks in large part to Ryan Beckett!), and I’m looking forward to the conversations it will spark.

The second abstract, Type systems for SDN controllers, is about using type systems to statically guarantee the absence of errors in controller programs. Fancy new switches have tons of features, which can be tricky to operate—can we make sure that a controller doesn’t make any mistakes when it talks to a switch? Some things are easy, like making sure that the match/action rules are sent to tables that can handle them; some things are harder, like making sure the controller doesn’t fill up a switch’s tables. I think this kind of work is a nice complement to the NetKAT “whole policy” approach, a sort of OpenFlow 1.3+ version of VeriCon with slightly different goals.

Should be fun!

Space-Efficient Manifest Contracts at POPL 15

I am delighted to announce that Space-Efficient Manifest Contracts will appear at POPL 2015 in Mumbai. Here’s the abstract:

The standard algorithm for higher-order contract checking can lead to unbounded space consumption and can destroy tail recursion, altering a program’s asymptotic space complexity. While space efficiency for gradual types—contracts mediating untyped and typed code—is well studied, sound space efficiency for manifest contracts—contracts that check stronger properties than simple types, e.g., “is a natural” instead of “is an integer”—remains an open problem.

We show how to achieve sound space efficiency for manifest contracts with strong predicate contracts. The essential trick is breaking the contract checking down into coercions: structured, blame-annotated lists of checks. By carefully preventing duplicate coercions from appearing, we can restore space efficiency while keeping the same observable behavior.

The conference version is a slightly cut down version of my submission, focusing on the main result: eidetic λH is a space-efficient manifest contract calculus with the same operational behavior as classic λH. More discussion and intermediate results—all in a unified framework for space efficiency—can be found in the technical report on the arXiv.

Contracts: first-order interlopers in a higher-order world

Reading Aseem Rastogi, Avik Chaudhuri, and Basil Hosmer‘s POPL 2012 paper The Ins and Outs of Gradual Type Inference, I ran across a quote that could well appear directly in my POPL 2015 paper, Space-Efficient Manifest Contracts:

The key insight is that … we must recursively deconstruct higher-order types down to their first-order parts, solve for those …, and then reconstruct the higher-order parts … . [Emphasis theirs]

Now, they’re deconstructing “flows” in their type inference and I’m deconstructing types themselves. They have to be careful about what’s known in the program and what isn’t, and I have to be careful about blame labels. But in both cases, a proper treatment of errors creates some asymmetries. And in both cases, the solution is to break everything down to the first-order checks, reconstructing a higher-order solution afterwards.

The “make it all first order” approach contrasts with subtyping approaches (like in Well Typed Programs Can’t Be Blamed and Threesomes, with and without blame). I think it’s worth pointing out that as we begin to consider blame, contract composition operators look less and less like meet operations and more like… something entirely different. Should contracts with blame inhabit some kind of skew lattice? Something else?

I highly recommend the Rastogi et al. paper, with one note: when they say kind, I think they mean “type shape” or “type skeleton”—not “kind” in the sense of classifying types and type constructors. Edited to add: also, how often does a type inference paper include a performance evaluation? Just delightful!

New and improved: Space-Efficient Manifest Contracts

I have a new and much improved draft of my work on Space-Efficient Manifest Contracts. Here’s the abstract:

The standard algorithm for higher-order contract checking can lead to unbounded space consumption and can destroy tail recursion, altering a program’s asymptotic space complexity. While space efficiency for gradual types—contracts mediating untyped and typed code—is well studied, sound space efficiency for manifest contracts—contracts that check stronger properties than simple types, e.g., “is a natural” instead of “is an integer”—remains an open problem.

We show how to achieve sound space efficiency for manifest contracts with strong predicate contracts. We define a framework for space efficiency, traversing the design space with three different space-efficient manifest calculi. Along the way, we examine the diverse correctness criteria for contract semantics; we conclude with a language whose contracts enjoy (galactically) bounded, sound space consumption—they are observationally equivalent to the standard, space-inefficient semantics.

Update: it was accepted to POPL’15!

Concurrent NetCore: From Policies to Pipelines

Cole Schlesinger, Dave Walker, and I submitted a paper to ICFP 2014. It’s called Concurrent NetCore: From Policies to Pipelines. Here’s the abstract:

In a Software-Defined Network (SDN), a central, computationally powerful controller manages a set of distributed, computationally simple switches. The controller computes a policy describing how each switch should route packets and populates packet-processing tables on each switch with rules to enact the routing policy. As network conditions change, the controller continues to add and remove rules from switches to adjust the policy as needed.

Recently, the SDN landscape has begun to change as several proposals for new, reconfigurable switching architectures, such as RMT and FlexPipe have emerged. These platforms provide switch programmers with many, flexible tables for storing packet-processing rules, and they offer programmers control over the packet fields that each table can analyze and act on. These reconfigurable switch architectures support a richer SDN model in which a switch configuration phase precedes the rule population phase. In the configuration phase, the controller sends the switch a graph describing the layout and capabilities of the packet processing tables it will require during the population phase. Armed with this foreknowledge, the switch can allocate its hardware (or software) resources more efficiently.

We present a new, typed language, called Concurrent NetCore, for specifying routing policies and graphs of packet-processing tables. Concurrent NetCore includes features for specifying sequential, conditional and concurrent control-flow between packet- processing tables. We develop a fine-grained operational model for the language and prove this model coincides with a higher level denotational model when programs are well typed. We also prove several additional properties of well typed programs, including strong normalization and determinism. To illustrate the utility of the language, we develop linguistic models of both the RMT and FlexPipe architectures and we give a multi-pass compilation algorithm that translates graphs and routing policies to the RMT model.

A Balance of Power: Expressive, Analyzable Controller Programming

I just finished reading A Balance of Power: Expressive, Analyzable Controller Programming. It’s an interesting proposal, but I’m writing just to express my satisfaction with the following sentence:

When we hit expressive limits, however, our goal is not to keep growing this language—down that path lies sendmail.cf and other sulphurous designs—but to call out to full-language code.

‘Sulphurous’ indeed. Come for the nonmonotonic interpretation of learning, stay for the colorful prose.

Bug in “Polymorphic Contracts”

The third chapter of my dissertation is effectively a longer version of an ESOP 2011 paper, Polymorphic Contracts. We define FH, a polymorphic calculus with manifest contracts. Atsushi Igarashi, with whom I did the original FH work that appeared in ESOP 2011, and his student Taro Sekiyama have been working on continuing some of the FH work. They discovered—after my defense!—a bug in FH’s metatheory.

Short version: FH used parallel reduction as a conversion relation. A key property of this relation is substitutivity. We phrased it as “if e1 ⇒ e1′ and e2 ⇒ e2′ then e1{e2/x} ⇒ e1′{e2’/x}”. Unfortunately, this doesn’t hold for FH, due to subtleties in FH’s reduction rules for casts. The cast reduction rules are implicitly performing equality checks on types, and these equality checks can be affected by substitutions to change which reduction rule applies. The (tentative) solution in my thesis is to use a simpler type (and term) conversion relation which we call common subexpression reduction (CSR). In CSR, we relate types and terms that are closed by closing substitutions σ1* σ2. That is, the CSR conversion is the smallest congruence which is substitutive for →*, i.e., where if e →* e’ then T{e/x} ≡ T{e’/x}.

Long version: I’ve excerpted Section 3.5 of my thesis which discusses the System FH type conversion bug.

PhD thesis: Manifest Contracts

I defended my PhD thesis, Manifest Contracts on November 7th, 2013, with the final document submitted on December 6th. Since the doctoral degree shows up on my Penn transcript, I feel comfortable telling the world: I got a PhD! My thesis committee, comprising Stephanie Weirich (the chair), Rajeev Alur, Greg Morrisett, and Steve Zdancewic. Here’s the abstract:

Eiffel popularized design by contract, a software design philosophy where programmers specify the requirements and guarantees of functions via executable pre- and post-conditions written in code. Findler and Felleisen brought contracts to higher-order programming, inspiring the PLT Racket implementation of contracts. Existing approaches for runtime checking lack reasoning principles and stop short of their full potential—most Racket contracts check only simple types. Moreover, the standard algorithm for higher-order contract checking can lead to unbounded space consumption and can destroy tail recursion. In this dissertation, I develop so-called manifest contract systems which integrate more coherently in the type system, and relate them to Findler-and-Felleisen-style latent contracts. I extend a manifest system with type abstraction and relational parametricity, and also show how to integrate dynamic types and contracts in a space efficient way, i.e., in a way that doesn’t destroy tail recursion. I put manifest contracts on a firm type-theoretic footing, showing that they support extensions necessary for real programming. Developing these principles is the first step in designing and implementing higher-order languages with contracts and refinement types.

I’ll be starting as a post-doc with Dave Walker on Monday.