A refinement type by any other name

Frank Pfenning originated the idea of refinement types in his seminal PLDI 1991 paper with Tim Freeman. Freeman and Pfenning’s refinement types allow programmers to work with refined datatypes, that is, sub-datatypes induced by refining the set of available constructors. For example, here’s what that looks like for lists, with a single refinement type, ? singleton:

datatype ? list = nil | cons of ? * ? list
rectype ? singleton = cons ? nil
Refinements of α list

That is, a programmer defines a datatype ? list, but can identify refined types like ? singleton—lists with just one element. We can imagine a lattice of type refinements where ? list is at the top, but below it is the refinement of lists of length 0 or 1—written ? singleton ? ? nil. This type is itself refined by its constituent refinements, which are refined by the empty type. Here’s such a lattice, courtesy of a remarkably nice 1991-era TeX drawing:

Another way of phrasing all of this is that refinement types identify subsets of types. Back in 1983, Bengt Nordström and Kent Petersson introduced—as far as I know—the idea of subset types in a paper called Types and Specifications at the IFIP Congress. Unfortunately, I couldn’t find a copy of the paper, so it’s not clear where the notation {x:A|B(x)} set-builder-esque notation first came from, but it shows up in Bengt Nordström, Kent Petersson, and Jan M. Smith’s Programming in Martin-Löf’s Type Theory in 1990. Any earlier references would be appreciated. Update (2015-03-18): Colin Gordon pointed out that Robert Constable‘s Mathematics as programming from 1984 uses the subset type notation, as does the NUPRL tech report from 1983. The NUPRL TR came out in January ’83 while IFIP ’83 happened in September. Nate Foster, who works with Bob Constable, suspects that Constable has priority. Alright: subset types go to Robert Constable in January 1983 with the Nearly Ultimate Pearl. Going once…

My question is: when did we start calling {x:A | B(x)} and other similar subset types a “refinement type”? Any advice or pointers would be appreciated—I’ll update the post.

Susumu Hayashi in Logic of refinement types describes “ATTT”, which, according to the abstract, “has refi nement types which are intended to be subsets of ordinary types or specifications of programs”, where he builds up these refinements out of some set theoretic operators on singletons. By rights, this paper is probably the first to use “refinement type” to mean “subset type”… though I have some trouble pinpointing where the paper lives up to that claim in the abstract.

Ewen Denney was using refinement types to mean types and specifications augmented with logical propositions. This terminology shows up in his 1998 PhD thesis and his 1996 IFIP paper, Refinement Types for Specification.

In 1998, Hongwei Xi and Frank Pfenning opened the door to flexible interpretations of “refinements” in Eliminating Array Bound Checking Through Dependent Types. In Section 2.4, they use ‘refinement’ in a rather different sense:

Besides the built-in type families int, bool, and array, any user-defined data type may be refined by explicit declarations. …

typeref ? list of nat
with nil <| ? list(0)
   | ::  <| {n:nat} ? * ? list(n) -> ? list(n+1)

Later on, in Section 3.1, they have a similar use of the term:

In the standard basis we have refined the types of many common functions on integers such as addition, subtraction, multiplication, division, and the modulo operation. For instance,

+ <| {m:int} {n:int} int(m) * int(n) -> int(m+n)

is declared in the system. The code in Figure 3 is an implementation of binary search through an array. As before, we assume:

sub <| {n:nat} {i:nat | i < n} ? array(n) * int(i) -> ?

So indices allow users to refine types, though they aren’t quite refinement types. In 1999, Xi and Pfenning make a strong distinction in Dependent Types in Practical Programming; from Section 9:

…while refinement types incorporate intersection and can thus ascribe multiple types to terms in a uniform way, dependent types can express properties such as “these two argument lists have the same length” which are not recognizable by tree automata (the basis for type refinements).

Now, throughout the paper they do things like “refine the datatype with type index objects” and “refine the built-in types: (a) for every integer n, int(n) is a singleton type which contains only n, and (b) for every natural number n, 0 a array(n) is the type of arrays of size n”. So here there’s a distinction between “refinement types”—the Freeman and Pfenning discipline—and a “refined type”, which is a subset of a type indicated by some kind of predicate and curly braces.

Jana Dunfield published a tech report in 2002, Combining Two Forms of Type Refinements, where she makes an impeccably clear distinction:

… the datasort refinements (often called refinement types) of Freeman, Davies, and Pfenning, and the index refinements of Xi and Pfenning. Both systems refine the simple types of Hindley-Milner type systems.

In her 2004 paper with Frank, Tridirectional Typechecking, she maintains the distinction between refinements, but uses a term I quite like—“property types”, i.e., types that guarantee certain properties.

Yitzhak Mandelbaum, my current supervisor David Walker, and Bob Harper wrote An Effective Theory of Type Refinements in 2003, but they didn’t quite have subset types. Their discussion of related work makes it seem that they interpret refinement types as just about any device that allows programmers to use the existing types of a language more precisely:

Our initial inspiration for this project was derived from work on refinement types by Davies and Pfenning and Denney and the practical dependent types proposed by Xi and Pfenning. Each of these authors proposed to sophisticated type systems that are able to specify many program properties well beyond the range of conventional type systems such as those for Java or ML.

In the fairly related and woefully undercited 2004 paper, Dynamic Typing with Dependent Types, Xinming Ou, Gang Tan, Yitzhak Mandelbaum, and David Walker used the term “set type” to define {x:A  | B(x)}.

Cormac Flanagan‘s Hybrid Type Checking in 2006 is probably the final blow for any distinction between datasort refinements and index refinements: right there on page 3, giving the syntax for types, he writes “{x:B|t} refinement type“. He says on the same page, at the beginning of Section 2, “Our refinement types are inspired by prior work on decidable refinement type systems”, citing quite a bit of the literature: Mandelbaum, Walker, and Harper; Freeman and Pfenning; Davies and Pfenning ICFP 2000; Xi and Pfenning 1999; Xi LICS 2000; and Ou, Tan, Mandelbaum, and Walker. After Cormac, everyone just seems to call them refinement types: Ranjit Jhala‘s Liquid Types, Robby Findler and Phil Wadler in Well typed programs can’t be blame, my own work, Andy Gordon in Semantic Subtyping with an SMT Solver. This isn’t a bad thing, but perhaps we can be more careful with names. Now that we’re all in the habit of calling them refinements, I quite like “indexed refinements” as a distinction. Alternatively, “subset types” are a very clear term with solid grounding in the literature.

Finally: I didn’t cite it in this discussion, but Rowan Davies‘s thesis, Practical Refinement-Type Checking, was extremely helpful in looking through the literature.

Edited to add: thanks to Ben Greenman for some fixes to broken links and to Lindsey Kuper and Ron Garcia for helping me clarify what refines what.

2020-04-27 update: Shriram Krishnamurthi suggests that Robert (Corky) Cartwright had a notion of “refinement type” in “User-Defined Data Types as an Aid to Verifying LISP Programs” from ICALP 1976 and with John McCarthy in First order programming logic in POPL 1979. I haven’t been able to get a PDF copy of the ICALP paper (please send me one if you can find it!). The POPL paper is clearly related:

The key idea underlying our formal systems is that recursive definitions of partial functions can be interpreted as equations extending a first order theory of the program domain.

Their model is typed, and the paper is about how Corky and John independently discovered ways of addressing recursion/fixed points. They translate programs to logic, treating checks in negative positions as ?‚ like Blume and McAllester’s “A sound (and complete) model of contracts”, but they don’t seem to think of themselves as actually refining types per se. This paper is an interesting early use of an SMT-like logic to prove properties of programs… though they do the proofs by hand!

Cartwright’s dissertation, A Practical Formal Semantic Definition and Verification System for Typed Lisp (which I’ve hosted here, since I could only find it on a very slow server elsewhere) makes it clear that the work is indeed very closely related. Here’s a long quote from the end of his introduction:

The auxiliary function ATOMLIST [a program predicate] serves as a clumsy mechanism for specifying the implicit data type atom-list [which he defined by hand]. If we included atom-list as a distinct, explicit data type in our programming language and expanded our first-order theory to include atom-lists as well as S-expressions, the informal proof using induction on atom-lists [given earlier] could be formalized directly in our first order system. However, since LISP programs typically involve a wide variety of abstract data types, simply adding a few extra data types such as atom-list to LISP will not eliminate the confusion caused by dealing with abstract data type representations rather than the abstract types themselves. In fact, the more complex that an abstract type is, the more confusing that proofs involving its representations are likely to be. Consequently, I decided that the best solution to this problem is to include a comprehensive data type definition facility in LISP and to formally define the semantics of a program P by creating a first-order theory for the particular data types defined in P. The resulting language TYPED LISP is described in the next chapter.

Twitter bots and OAuth

I’m working on a Twitter bot, and I ran up against something very annoying: apps need to be on an account with a mobile phone number. I have just one mobile phone, and it’s already tied to my real Twitter account. Rather than finding a way to get another mobile number, I had the bot authorize my app using Twitter’s OAuth API. Here’s how to do it.

Step one: collect your API keys. There’s a consumer key and a consumer secret, both available from your app’s page at apps.twitter.com. Never commit these to a repository or post them anywhere. I put them in a special file that I tell git to ignore, keys.py.

Step two: collect OAuth tokens. The key trick here is to collect tokens that are permanently good, using the out-of-band (OOB) PIN method. To actually send the requests, I used the requests-oauthlib library for Python. Here’s how to do it:

from requests_oauthlib import OAuth1Session
from keys import *

twitter = OAuth1Session(consumer_key, consumer_secret) # loaded from keys.py!
twitter.params['oauth_callback'] = 'oob' # make sure we're in PIN mode
r = twitter.post('https://api.twitter.com/oauth/request_token')

At this point, r.text will tell you, HTTP request parameter style, your oauth_token and oauth_secret. Add these to your keys.py file and copy the oauth_token to the clipboard.

Step three: authorize the app. Fire up a web browser and log in to the bot’s account. Go to https://api.twitter.com/oauth/authorize?oauth_token=[whatever your oauth_token was]. You’ll get a PIN number. Add it to your keys.py file.

Step four: finalize the authorization.

from requests_oauthlib import OAuth1Session
from keys import *
twitter = OAuth1Session(consumer_key, consumer_secret)
twitter.params['oauth_verifier'] = pin
r = twitter.get('https://api.twitter.com/oauth/access_token?oauth_token=' + oauth_token)

Now r.text will give you a new oauth_token and oauth_token_secret. Save these—they’re OAuth access tokens, which are how you can actually make API calls.

Step five: check that it worked. Log in to the bot account, go to settings, and check the apps subheading—your app should appear. You can tweet programmatically, now:

from requests_oauthlib import OAuth1Session
from keys import *

twitter = OAuth1Session(consumer_key, consumer_secret, oauth_access_token, oauth_token_secret)
twitter.post('https://api.twitter.com/1.1/statuses/update.json?status=Testing.')

I would be lying if I said I was 100% confident that everything from here on out is easy, but it seems like these access tokens are all you need after PIN based authorization.

PHPEnkoder 1.13

I’ve resolved some E_NOTICE-level messages that were showing up when people set WP_DEBUG to true. Thanks to Rootside for pointing out this problem on the WordPress forums. As always, please let me know on the forums or via email if you run into any problems.

Cultural criticism and ‘tech’

As an academic computer scientist, I frequently interact with the world of ‘tech’, as embodied by Silicon Valley, startups, etc. Many of my friends—from college, from graduate school—work there. My younger brother works there. One of the things that has kept me out of that world is my wariness of its politics, ethics, and aesthetics. I was delighted, then, when I was introduced to Model View Culture, a venue for cultural criticism of tech, sensu lato. They cover a wide range:

I’m writing because other academics—the audience of this blog—might be interested. Many students coming out of the elite CS programs (my academic home for more than a decade) are going to end up working in the world MVC writes about. They’ll go as interns and then as employees. What is it like there?

But I’m also writing because MVC has been subjected to tremendous blowback. I’m not going to link to it, but it’s not hard to find. Silence is complicity, so: Model View Culture is writing smart things about hard problems. If you’d like to support them, just reading is a fine place to start… but of course money is good, too.

PLVNET

I’m really happy to be part of the first PLVNET, a workshop on the intersection of PL, verification, and networking. I have two abstracts up for discussion.

The first abstract, Temporal NetKAT, is about adding reasoning about packet histories to a network policy language like NetKAT. The work on this is moving along quite nicely (thanks in large part to Ryan Beckett!), and I’m looking forward to the conversations it will spark.

The second abstract, Type systems for SDN controllers, is about using type systems to statically guarantee the absence of errors in controller programs. Fancy new switches have tons of features, which can be tricky to operate—can we make sure that a controller doesn’t make any mistakes when it talks to a switch? Some things are easy, like making sure that the match/action rules are sent to tables that can handle them; some things are harder, like making sure the controller doesn’t fill up a switch’s tables. I think this kind of work is a nice complement to the NetKAT “whole policy” approach, a sort of OpenFlow 1.3+ version of VeriCon with slightly different goals.

Should be fun!

Space-Efficient Manifest Contracts at POPL 15

I am delighted to announce that Space-Efficient Manifest Contracts will appear at POPL 2015 in Mumbai. Here’s the abstract:

The standard algorithm for higher-order contract checking can lead to unbounded space consumption and can destroy tail recursion, altering a program’s asymptotic space complexity. While space efficiency for gradual types—contracts mediating untyped and typed code—is well studied, sound space efficiency for manifest contracts—contracts that check stronger properties than simple types, e.g., “is a natural” instead of “is an integer”—remains an open problem.

We show how to achieve sound space efficiency for manifest contracts with strong predicate contracts. The essential trick is breaking the contract checking down into coercions: structured, blame-annotated lists of checks. By carefully preventing duplicate coercions from appearing, we can restore space efficiency while keeping the same observable behavior.

The conference version is a slightly cut down version of my submission, focusing on the main result: eidetic λH is a space-efficient manifest contract calculus with the same operational behavior as classic λH. More discussion and intermediate results—all in a unified framework for space efficiency—can be found in the technical report on the arXiv.

Contracts: first-order interlopers in a higher-order world

Reading Aseem Rastogi, Avik Chaudhuri, and Basil Hosmer‘s POPL 2012 paper The Ins and Outs of Gradual Type Inference, I ran across a quote that could well appear directly in my POPL 2015 paper, Space-Efficient Manifest Contracts:

The key insight is that … we must recursively deconstruct higher-order types down to their first-order parts, solve for those …, and then reconstruct the higher-order parts … . [Emphasis theirs]

Now, they’re deconstructing “flows” in their type inference and I’m deconstructing types themselves. They have to be careful about what’s known in the program and what isn’t, and I have to be careful about blame labels. But in both cases, a proper treatment of errors creates some asymmetries. And in both cases, the solution is to break everything down to the first-order checks, reconstructing a higher-order solution afterwards.

The “make it all first order” approach contrasts with subtyping approaches (like in Well Typed Programs Can’t Be Blamed and Threesomes, with and without blame). I think it’s worth pointing out that as we begin to consider blame, contract composition operators look less and less like meet operations and more like… something entirely different. Should contracts with blame inhabit some kind of skew lattice? Something else?

I highly recommend the Rastogi et al. paper, with one note: when they say kind, I think they mean “type shape” or “type skeleton”—not “kind” in the sense of classifying types and type constructors. Edited to add: also, how often does a type inference paper include a performance evaluation? Just delightful!

Scheduling the discussion order at PC meetings

I recently wrote a bit of code for scheduling the discussion order of PC meetings so as to minimize traffic in and out of the room due to conflicts of interest. Given some information that HotCRP happily generates, the code generates a schedule, which can be further turned into a handout and slides showing the current paper’s conflicts and the two upcoming papers.

If you’re interested in using or contributing, I’ve put the code up at mgree/conflict on github.

New and improved: Space-Efficient Manifest Contracts

I have a new and much improved draft of my work on Space-Efficient Manifest Contracts. Here’s the abstract:

The standard algorithm for higher-order contract checking can lead to unbounded space consumption and can destroy tail recursion, altering a program’s asymptotic space complexity. While space efficiency for gradual types—contracts mediating untyped and typed code—is well studied, sound space efficiency for manifest contracts—contracts that check stronger properties than simple types, e.g., “is a natural” instead of “is an integer”—remains an open problem.

We show how to achieve sound space efficiency for manifest contracts with strong predicate contracts. We define a framework for space efficiency, traversing the design space with three different space-efficient manifest calculi. Along the way, we examine the diverse correctness criteria for contract semantics; we conclude with a language whose contracts enjoy (galactically) bounded, sound space consumption—they are observationally equivalent to the standard, space-inefficient semantics.

Update: it was accepted to POPL’15!

Concurrent NetCore: From Policies to Pipelines

Cole Schlesinger, Dave Walker, and I submitted a paper to ICFP 2014. It’s called Concurrent NetCore: From Policies to Pipelines. Here’s the abstract:

In a Software-Defined Network (SDN), a central, computationally powerful controller manages a set of distributed, computationally simple switches. The controller computes a policy describing how each switch should route packets and populates packet-processing tables on each switch with rules to enact the routing policy. As network conditions change, the controller continues to add and remove rules from switches to adjust the policy as needed.

Recently, the SDN landscape has begun to change as several proposals for new, reconfigurable switching architectures, such as RMT and FlexPipe have emerged. These platforms provide switch programmers with many, flexible tables for storing packet-processing rules, and they offer programmers control over the packet fields that each table can analyze and act on. These reconfigurable switch architectures support a richer SDN model in which a switch configuration phase precedes the rule population phase. In the configuration phase, the controller sends the switch a graph describing the layout and capabilities of the packet processing tables it will require during the population phase. Armed with this foreknowledge, the switch can allocate its hardware (or software) resources more efficiently.

We present a new, typed language, called Concurrent NetCore, for specifying routing policies and graphs of packet-processing tables. Concurrent NetCore includes features for specifying sequential, conditional and concurrent control-flow between packet- processing tables. We develop a fine-grained operational model for the language and prove this model coincides with a higher level denotational model when programs are well typed. We also prove several additional properties of well typed programs, including strong normalization and determinism. To illustrate the utility of the language, we develop linguistic models of both the RMT and FlexPipe architectures and we give a multi-pass compilation algorithm that translates graphs and routing policies to the RMT model.