Bridging the gradual typing gap at OOPSLA 2021

I want to believe in a future where the lion will lie down with the lamb; we’ll beat our swords into plowshares; and developers will migrate dynamic prototypes to robust static systems with confidence. But these Aquarian visions are elusive. Having a map of the road to paradise in theory doesn’t mean we know how to get there in practice. Let me tell you about two papers at OOPSLA that shuffle us a few steps forward on this long pilgrim’s trail.

A vintage poster of "Hair", the American Tribal Love-Rock Musical, with a trippy inverted head. This poster advertises a performance at the Aquarius Theatre in Los Angeles.

Migrating programs

How do you actually get a program from Scheme into ML? Or from JavaScript into TypeScript? The theory of gradual typing goes far beyond these pedestrian questions. In principle, we know how to reconcile dynamism with much more complex systems, like information flow or refinement types or effect systems. But there’s very little tooling to support moving any particular Scheme program into ML. (If your program is a Racket program, then you’re in some luck.)

People have studied program migration before, under a variety of names. Papers go back at least to 2009, arguably even earlier. There are lots of different approaches, and most comprise some form of type inference and custom constraint solving—complex! Worse still, there’s been no consensus on how to evaluate these systems. Luna Phipps-Costin, Carolyn Jane Anderson, me, and Arjun Guha dug into program migration. Our paper, “Solver-based Gradual Type Migration”, tries to build a map of the known territory so far:

  1. There are competing desiderata: maximal type precision, compatibility with code at different types, and preserving the existing semantics of your program, i.e., safety.
  2. We evaluate a variety of past techniques on prior benchmarks, and we devise a novel set of “challenge” problems. Our evaluation framework is robust, and you could plug in other approaches to type migration and evaluate them easily.
  3. We introduce a new, very simple approach to type migration, which we call TypeWhich. TypeWhich uses an off-the-shelf SMT solver. You can choose how compatible/precise you want it to be, but it’ll always be safe.
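
To make the tradeoff in (1) concrete, here's a toy sketch in TypeScript (the function and names are mine, not TypeWhich's actual output): two defensible migrations of the same untyped identity function, one precise and one compatible.

```typescript
// Start from untyped JavaScript: function id(x) { return x; }

// Precise migration: the most informative type, but an untyped caller
// that passes a string is now a compile-time error.
const idPrecise = (x: number): number => x;

// Compatible migration: every old call still type-checks, but we
// learn almost nothing about the function.
const idCompat = (x: unknown): unknown => x;

console.log(idPrecise(42), idCompat("hello"));
```

Both migrations are safe in the sense above: neither changes what the program does on inputs it already handled. What differs is who can still call the function, and what the types tell you.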

I’m excited about each of these contributions, each for its own reason.

For (1), I’m excited to formally explain that what you’re actually trying to do with your code matters. “Gradual typing” sensu lato is pretty latus indeed. Are you migrating a closed system, module by module? Or are you coming up with type annotations for a library that might well be called by untyped clients? These are very different scenarios, and you probably want your type migration algorithm to do different things! Bringing in these competing concerns—precision, compatibility, and safety—gives researchers a way to contextualize their approaches to type migration. (All that said, to me, safety is paramount. I’m not at all interested in a type migration that takes a dynamic program that runs correctly on some input and produces a statically typed program that fails on the same input… or won’t even compile! That doesn’t sound very gradual to me.)

For (2), I’m excited to be building a platform for other researchers. To be clear, there’s a long way to go. Our challenge problems are tiny toys. There’s a lot more to do here.

For (3), I’m excited to have an opportunity to simplify things. The TypeWhich constraint generator is simple, classic PL; the constraints it generates for SMT are straightforward; the models that SMT generates are easy to understand. It’s a cool approach!

One tiny final note: Luna has done a tremendous amount of incredibly high-quality work on this project, both in code and concept. She’s just now starting her third year of undergraduate study. So: watch out! You ain’t ready.

Typed functional programming isn’t about functions

If there’s a single defining ‘killer’ feature of typed functional programming, it isn’t first-class functions at all: it’s algebraic datatypes. Algebraic datatypes help make illegal states unrepresentable and ASTs easy to work with. They’re a powerful tool, and their uptake in a variety of new-hotness languages (Kotlin, Rust, Swift) speaks to their broad appeal.
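
To see the "illegal states unrepresentable, ASTs easy" point in action, here's a small sketch (the AST and names are mine) using TypeScript's discriminated unions, which play the algebraic-datatype role in that language:

```typescript
// A tiny expression AST: every well-typed value of Expr is a
// well-formed tree, so ill-formed states simply can't be built.
type Expr =
  | { tag: "num"; n: number }
  | { tag: "add"; l: Expr; r: Expr }
  | { tag: "if"; c: Expr; t: Expr; e: Expr };

function evalExpr(e: Expr): number {
  switch (e.tag) {
    case "num": return e.n;
    case "add": return evalExpr(e.l) + evalExpr(e.r);
    case "if":  return evalExpr(e.c) !== 0 ? evalExpr(e.t) : evalExpr(e.e);
  }
}

const prog: Expr = {
  tag: "add",
  l: { tag: "num", n: 1 },
  r: { tag: "if", c: { tag: "num", n: 0 }, t: { tag: "num", n: 10 }, e: { tag: "num", n: 2 } },
};
console.log(evalExpr(prog)); // 3
```

The compiler checks that the `switch` covers every constructor, which is exactly the discipline that makes working over ASTs pleasant.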

Moving Scheme code to ML is an old goal, and it’s the bread and butter of the introductory sections of gradual typing papers. But are we any closer than we were fifteen years ago? (I’d say “yes”, and point at Typed Racket, or “nobody knows what’s happening anyway” and point at Idris’s Chez Scheme runtime.)

Stefan Malewski, me, and Éric Tanter tried to figure out how algebraic datatypes play with dynamic features. Our paper, “Gradually Structured Data”, uses AGT to ‘compute’ static and dynamic semantics for a language with possibly open algebraic datatypes and the unknown type in a few flavors (?, the unknown type; a new ground type for “datatype”, the same way int and bool and ?->? are ground; and a new type for “any open datatype”). The features gel in a nice way, letting us express some cool behaviors (see Section 2 for how one might evolve a simple JSON API) and sit in a novel space (see Section 5 for a thorough comparison to related features).
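
The paper's language is its own thing, but you can get a rough feel for "possibly open" data with a sketch like this (entirely my illustration, in TypeScript): the known constructors, plus an explicit escape hatch for constructors that arrive later.

```typescript
// Known constructors of a "possibly open" datatype...
type Known =
  | { tag: "circle"; r: number }
  | { tag: "square"; s: number };
// ...plus an escape hatch: a constructor we haven't seen yet.
type Shape = Known | { tag: "unknown"; payload: unknown };

function area(sh: Shape): number {
  switch (sh.tag) {
    case "circle": return Math.PI * sh.r * sh.r;
    case "square": return sh.s * sh.s;
    case "unknown": return NaN; // an unseen constructor could always show up
  }
}
```

Here openness is an ad hoc convention the programmer must maintain by hand; the point of the paper is to make that kind of thing principled, with the unknown type and runtime checks doing the work.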

I’m particularly pleased that we’ve found a new place in the design spectrum (per our feature chart in Section 5) that seems to support incremental program migration (per our examples in Section 2)—and it’s formally grounded (by using AGT in the middle, formal sections).

This paper came out of conversations with Éric after my screed about gradual typing’s two lineages at SNAPL (see also my followup blogpost, “What to Define When You’re Defining Gradual Type Systems”). There’s plenty more to do: what about separate compilation? What are the right representation choices? How should runtime checks really go, and how can programmers control the costs?

I fondly remember, with some panic, a question I was asked after giving the talk for "Contracts Made Manifest" at POPL 2010. That paper compares the latent approach to contracts in PLT Scheme, now Racket (well-structured runtime checks at module boundaries), to the manifest approach (runtime checks as a form of type coercion, occurring anywhere) in the then-emerging refinement types literature (Sage, Liquid Types, etc.). I had shown that the two aren’t equivalent in the presence of dependency, and I concluded by talking about how the two implementation approaches differed. So: somebody asked, "Which approach should you use?" To be honest, I had hardly even thought about it.

So, suppose you wanted to use algebraic datatypes and dynamic features today: which language should you use? I’ve thought about it, and the answer, sadly, is, “It depends”. OCaml’s polymorphic variants get you a long way; Haskell’s Dynamic could work great, but it’s badly in need of usable surface syntax. (I’ve tried to get Richard Eisenberg to help me with the fancy work to make that happen, but he’s justifiably worried that the Haskell community would run him out of town.) Scala, Haskell, and OCaml are your best bets if you want true algebraic datatypes. If you’re more relaxed about things, Typed Racket or TypeScript could work well for you. If what you’re looking for is a type system expressive enough to capture interesting dynamic idioms, then I think there’s a clear choice: CDuce. Ever since a fine anonymous reviewer at SNAPL 2019 showed me that CDuce can type flatten, I’ve been impressed. Check this out:

let flatten ( Any -> [ (Any\[Any*])* ] )  (* returns a list of non-lists! *)
  | [] -> []                              (* nil *)
  | (h,t) -> (flatten h)@(flatten t)      (* cons *)
  | x -> [x]                              (* anything else *)

Look at that type! In just a few lines of CDuce, we can show that flatten produces not just a list of elements, but a list of things that are not themselves lists. The price here is that CDuce’s types are set-theoretic, which means things are a touch different from what people are used to in OCaml or Haskell. But if you’re okay with that, CDuce is a serious contender!
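
For contrast, here's roughly the best you can honestly write in TypeScript (my sketch): the function itself is a one-liner, but without negation types the "non-list" part of CDuce's result type is simply lost.

```typescript
// TypeScript has no negation types, so the most honest result type is
// "array of unknown": we can't say the elements are non-arrays.
function flatten(x: unknown): unknown[] {
  return Array.isArray(x) ? x.flatMap(flatten) : [x];
}

console.log(flatten([1, [2, [3]], 4])); // → [1, 2, 3, 4]
```

Same behavior, strictly weaker guarantee: that gap is exactly what CDuce's set-theoretic types buy you.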

Coda: see you at OOPSLA?

I’m planning on going to OOPSLA 2021 in Chicago, given the twoopsla and the opportunity to present a paper from OOPSLA 2020, “Formulog: Datalog for SMT-based static analysis”, with Aaron Bembenek and Steve Chong. I’ve already blogged about it, but I’m excited to get to give an in-person version of the talk, too. You can still watch Aaron’s excellent recorded talk on YouTube and enjoy the cabin vibes. There won’t be cabin vibes at the in-person talk, but there will be terrible jokes. So: think about it. Will I see you at OOPSLA? I hope so!

I’m looking for PhD students!

I’m looking for PhD students in the Fall 2021 application cycle, to start in Fall 2022. Come work with me at Stevens CS in Hoboken, NJ!

I work in Gateway South (the left-hand side of this photo). You could, too! (Photo credit: Stevens Alumni.)

What will we work on?

I’m interested in applying formalism — all those pretty Greek letters in program semantics, type systems, and static analysis — directly to real systems — all that nitty gritty code that makes these beautiful, horrible machines do their thing. I’m working on a few projects that variously emphasize theoretical or practical aspects. My main goal these days is to provide better support for the POSIX shell and its ecosystem, but here’s a sampling from recent papers:

  • Smoosh (POPL 2020): I’m interested in improving and supporting the shell. Smoosh is a formal model of the POSIX shell that can be executed and passes the POSIX test suite. Continuing work on Smoosh means hacking in Lem, OCaml, and Coq (and maybe Rust or C or JS or Elm), and thinking about virtualization, symbolic execution, fuzzing, and how specifications and implementations interact. Or maybe it just means building cool tools for the POSIX world!
  • Formulog (OOPSLA 2020): Datalog, functional programming, and SMT combine to let you write down and run things that look a lot like your formal spec. Continuing work in this line means hacking in Rust (and maybe C++ or Java), and thinking about SMT and how we can be confident that the formalism we write is the code that we run—and that our code is efficient.
  • Gradual types (OOPSLA 2021) and type migration (OOPSLA 2021): People have been trying to combine the benefits of dynamic and static types for years. Work in this line will mean hacking in Rust (and maybe JS or TS or Haskell) and doing classic PL stuff like type soundness, type inference, and proofs of contextual equivalence (by logical relations or bisimulation, on paper or in Coq).
A slide from a Keynote deck. The title is "semantics engineering". The left-hand side illustrates systems challenges:

 - a "C" monster
 - complicated specs
 - a dog in front of a laptop (programming is hard!)

The right-hand side illustrates PL formalism: inference rules, helper functions, grammars, etc.
I’ve been calling this combination of executable systems and PL formalism “semantics engineering”, with inspiration from the PLT folks (though I don’t really use Redex).

You can check out a list of all my papers. Are any of these papers the sort of thing you’d like to write? Come join me for a PhD!

Who will you work with?

Stevens has about thirty research faculty, and we’re growing fast. We have a great group of people interested in PL, security, and systems: Eduardo Bonelli, Tegan Brennan, Dominic Duggan, Eric Koskinen, Philippe Meunier, David Naumann, Georgios Portokalidis, Susanne Wetzel, and Jun Xu. And there are of course many other fantastic researchers working on other topics to learn from in class and collaborate with on research. And beyond all that, I got a lot out of my internships (AT&T Shannon Labs; MSR Cambridge), and I encourage my students to find stimulating opportunities.

Where is Hoboken, again?

Hoboken, NJ is directly across the Hudson River from Manhattan a/k/a New York City. There’s 24-hour train service and frequent ferries to and from New York. Hoboken is a Vision Zero city, where it’s safe and comfortable to bike and walk. There are other cool cities nearby, like Jersey City.

How do you apply?

You can learn more about the CS PhD program at Stevens and apply online. If you have questions, please don’t hesitate to get in touch.

Heaven, Hell, or Hoboken!

After six years at Pomona College, I’ve moved to Stevens Institute of Technology as an assistant professor in the computer science department. I miss my lovely Pomona colleagues—they’re hiring!—but I’m excited to be on the East Coast and to be doing more research with a new set of lovely colleagues.

A photo of my office nameplate. The Stevens logo in red, with the following text:

Michael Greenberg
Assistant Professor
Department of Computer Science

447
447 (in Braille)

I’ve got a new webpage, but the old webpage should stay up.

We’ll be spinning up the Stevens PL/systems/security seminar soon, and I’m hopeful we can involve lots of interesting people, as speakers and attendees. If you’re in the New York area, come by and say hi!

Also… I’ll be looking to hire PhD students for the coming year! More info on that soon.

Pomona College is hiring!

Pomona College’s computer science department is hiring in Fall 2021 for a Fall 2022 start. I used to work at Pomona, and there is a lot to recommend it. Pomona College is a small liberal arts college (SLAC) in LA County, 35mi/45–240min outside DTLA. It’s a 2:2 teaching load.

Steps on campus, with a view of the mountains behind.

First and foremost, you’ll have excellent colleagues. They are friendly, collegial, supportive, and hardworking. There’s a sense of shared purpose and responsibility. Disagreements are resolved amicably, because everyone is on the same team. Nobody shirks. They’re great people!

Second, the students are bright. They’re motivated, broad-minded, and often interested in social justice. Pomona’s student body overall is quite diverse, along a variety of axes (income, ethnicity, national origin), and the CS department enjoys that diversity. Pomona is a very wealthy institution, and it’s putting its wealth to work helping many students who have very little money.

Third, Pomona offers a great deal of research freedom. I felt zero pressure to get grants when I was there, which allowed me to pursue whatever research interests felt worthwhile.

I’ve written in the past about what I loved (and didn’t) about Pomona College. I’ve left that document up, since it provides more detail on what I think is really good about working at a SLAC in general and Pomona in particular.

Joining Pomona’s CS department will let you join a community of lovely colleagues. You’ll have the opportunity to shape the culture and trajectory of a department, work closely with smart and interesting students, do the research you want… all while enjoying the mountains, high desert, city, and coast. It could be right for you — if you’re not sure, feel free to get in touch and we can chat about it.

A student asked why STLC programs always terminate… so I showed them! Pomona offers the opportunity to teach interesting things to interested students. That student and I later worked through the Homotopy Type Theory book together.