A Balance of Power: Expressive, Analyzable Controller Programming

I just finished reading A Balance of Power: Expressive, Analyzable Controller Programming. It’s an interesting proposal, but I’m writing just to express my satisfaction with the following sentence:

When we hit expressive limits, however, our goal is not to keep growing this language—down that path lies sendmail.cf and other sulphurous designs—but to call out to full-language code.

‘Sulphurous’ indeed. Come for the nonmonotonic interpretation of learning, stay for the colorful prose.

PHPEnkoder 1.12.1: “PHP is laughably bad” edition

I’ve released a bugfix for PHPEnkoder version 1.12. Get this: before version 5.4, PHP didn’t support indexing directly into the array returned by a function call. The problem was this line:

$ord = unpack("N",$c)[1];

I changed it to:

$bs = unpack("N",$c);  // unpack returns a 1-indexed array...
$ord = $bs[1];         // ...so the value lives at index 1

This is really ridiculous. Like, serious amateur-hour ridiculous. Like, if your final project in a compilers class had this kind of syntactic limitation, you would not get a good grade. PHP is a fractal of bad design. Ugh. Just: ugh.

And while I’m complaining about these sorts of things, where are the structured content management systems? PHPEnkoder’s email and mailto: detection is a giant, horrible kludge of regular expressions. Where’s the CMS where the filters get passed a structured, syntactic representation of the page to be rendered? This is a serious question, and if you know of one, please tell me.

PHPEnkoder 1.12: unicode support!

I’m really delighted to have resolved a longstanding problem with PHPEnkoder and Unicode: PHPEnkoder should no longer choke on multi-byte characters such as é, ü, and ß. (No really, look: email hidden; JavaScript is required!)
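
For the curious, the heart of the fix is getting a Unicode code point out of a multi-byte character. Here is a minimal sketch of the idea (my illustration, not the plugin’s actual code; the helper utf8_ord is a hypothetical name of mine):

function utf8_ord($char) {
  // Re-encode the (possibly multi-byte) UTF-8 character as UTF-32BE,
  // so that it occupies exactly four bytes...
  $bytes = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
  // ...then read those bytes as a big-endian unsigned 32-bit integer.
  // Remember: unpack's result array is 1-indexed!
  $unpacked = unpack('N', $bytes);
  return $unpacked[1];
}

utf8_ord('é');  // 233
utf8_ord('ß');  // 223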

As usual, updates are available from the WordPress plugin directory or from your dashboard.

Bug in “Polymorphic Contracts”

The third chapter of my dissertation is effectively a longer version of an ESOP 2011 paper, Polymorphic Contracts. We define FH, a polymorphic calculus with manifest contracts. Atsushi Igarashi, with whom I did the original FH work that appeared in ESOP 2011, and his student Taro Sekiyama have been continuing that work. They discovered—after my defense!—a bug in FH’s metatheory.

Short version: FH used parallel reduction as a conversion relation. A key property of this relation is substitutivity, which we phrased as “if e1 ⇒ e1′ and e2 ⇒ e2′ then e1{e2/x} ⇒ e1′{e2′/x}”. Unfortunately, this doesn’t hold for FH, due to subtleties in FH’s reduction rules for casts: the cast reduction rules implicitly perform equality checks on types, and substitution can affect these equality checks, changing which reduction rule applies. The (tentative) solution in my thesis is to use a simpler type (and term) conversion relation, which we call common subexpression reduction (CSR). In CSR, we relate types and terms that are closed by closing substitutions σ1 and σ2 with σ1 →* σ2. That is, the CSR conversion is the smallest congruence which is substitutive for →*, i.e., where e →* e′ implies T{e/x} ≡ T{e′/x}.
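
To get the flavor of the failure (an illustration of the mechanism, not FH’s precise counterexample): casts between syntactically identical types can step away by a reflexivity-style rule, and substitution can create identities that were not there before. In LaTeX notation:

% Reflexive casts fire only when the two types are syntactically equal:
\langle T \Rightarrow T \rangle\, v \longrightarrow v
% Before substituting for x, the types below differ, so the rule is stuck:
\langle \{y{:}\mathsf{Int} \mid x = 0\} \Rightarrow \{y{:}\mathsf{Int} \mid 0 = 0\} \rangle\, v
% After substituting 0 for x, the two types coincide, and the rule fires:
\langle \{y{:}\mathsf{Int} \mid 0 = 0\} \Rightarrow \{y{:}\mathsf{Int} \mid 0 = 0\} \rangle\, v \longrightarrow v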

Long version: I’ve excerpted Section 3.5 of my thesis, which discusses the System FH type conversion bug.

PhD thesis: Manifest Contracts

I defended my PhD thesis, Manifest Contracts, on November 7th, 2013, with the final document submitted on December 6th. Since the doctoral degree now shows up on my Penn transcript, I feel comfortable telling the world: I got a PhD! My thesis committee comprised Stephanie Weirich (the chair), Rajeev Alur, Greg Morrisett, and Steve Zdancewic. Here’s the abstract:

Eiffel popularized design by contract, a software design philosophy where programmers specify the requirements and guarantees of functions via executable pre- and post-conditions written in code. Findler and Felleisen brought contracts to higher-order programming, inspiring the PLT Racket implementation of contracts. Existing approaches for runtime checking lack reasoning principles and stop short of their full potential—most Racket contracts check only simple types. Moreover, the standard algorithm for higher-order contract checking can lead to unbounded space consumption and can destroy tail recursion. In this dissertation, I develop so-called manifest contract systems, which integrate contracts more coherently into the type system, and relate them to Findler-and-Felleisen-style latent contracts. I extend a manifest system with type abstraction and relational parametricity, and also show how to integrate dynamic types and contracts in a space-efficient way, i.e., in a way that doesn’t destroy tail recursion. I put manifest contracts on a firm type-theoretic footing, showing that they support extensions necessary for real programming. Developing these principles is the first step in designing and implementing higher-order languages with contracts and refinement types.

I’ll be starting as a post-doc with Dave Walker on Monday.

Axiomatizing manifest contracts

I was speaking with Phil Wadler at ICFP about full-spectrum programming languages, and it came up that Phil wasn’t familiar with the inconsistency in the Sage theorem-proving axioms from Knowles et al. ’06. He prodded me to talk about it here.

Here are the axioms from Knowles et al.’s ’06 Scheme Workshop paper:
[Figure: the Sage axioms]
The inconsistency is between Faithfulness and Hypothesis. Hypothesis demands that x:{y:Int|false} ⊨ false[y:=x], and false[y:=x] is just false; Faithfulness, however, requires that x:{y:Int|false} ⊭ false.
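
Spelled out, as I read the axioms (my paraphrase, not a verbatim transcription):

% Hypothesis: refinement bindings may be assumed.
\frac{x{:}\{y{:}T \mid p\} \in \Gamma}{\Gamma \models p[y{:=}x]}
% Faithfulness: no environment entails false.
\Gamma \not\models \mathsf{false}
% Instantiating Hypothesis at T = Int and p = false: since false[y:=x]
% is just false, we get x:{y:Int|false} |= false, contradicting
% Faithfulness at that very environment.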

When I talked it over with Kenn Knowles, he agreed—Faithfulness should instead say E ⊨ not false; the soundness of the logic comes from not being able to prove false from an empty environment. Kenn had an interesting analysis, which I’m including wholesale:

In practice, of course, the soundness of e.g. SMT is not in question. There are really two important and separable questions that axioms such as those of Ou et al ’04, our Scheme ’06, and our PLPV ’09 conflate somewhat. Neither is answered to my satisfaction.

  1. How shall we give semantics to contracts and executable refinement types?

    And the meta-question: Why do so many authors create new and rather different semantics? Perhaps your POPL ’10 and our TOPLAS ’10 may represent a convergence (knock on wood) towards techniques that are both generally effective and simple enough that future work will build upon them rather than just trying to make something simpler, as has happened a few times in the contract space.

    For this question, I feel an axiomatization yields little insight, while a lightweight denotation has a nice explanatory flavor. Thus my perspective is that critiquing axiomatizations does not substantially contribute to this question.

  2. How effectively can specifications / proof obligations written in Type Theory or subsets thereof be translated to other logics such as SMT?

    While an axiomatization seems like a way to characterize such translations, the question itself was not the focus of Ou et al ’04, our Scheme ’06, or our PLPV ’09. Without such results the direct and concrete approach of Liquid Types (e.g. PLDI ’08 and ESOP ’13) that simply illustrates a particular mapping may be clearer.

    For this question, I feel an axiomatization falls short unless it has proofs of interesting properties which illuminate why it is a minimal or otherwise “good” set of axioms for characterizing the space of translations/logics. Thus my perspective is that critiquing axiomatizations that already lack such properties/proofs does not substantially contribute to this question.

With regard to Kenn’s first point, Belo et al.’s ’11 syntactic semantics gives one answer for how to give semantics to contracts, though there’s plenty of room for less syntactic analyses.

Space-Efficient Manifest Contracts

I just submitted a paper to POPL 2014; it’s called Space-Efficient Manifest Contracts. Here’s the abstract:

Gradual types mediate the interaction between dynamic and simple types, offering an easy transition from scripts to programs; gradual types allow programmers to evolve prototype scripts into fully fledged, deployable programs. Similarly, contracts and refinement types mediate the interaction between simple types and more precise types, offering an easy transition from programs to robust, verified programs. A full-spectrum language with both gradual and refinement types offers low-level support for the development of programs throughout their lifecycle, from prototype script to verified program.

One attractive formulation of languages with gradual or refinement types uses casts to represent the runtime checks necessary for type changes (from dynamic to simple types, and from simple types to refinement types). Briefly, a cast ⟨T1⇒T2⟩ e takes a term e from type T1 to T2, possibly wrapping or tagging e in the process—or even failing, if e doesn’t meet the criteria of the type T2. Casts are attractive because they offer a unified view of changes in type, have straightforward operational semantics, and enjoy an interesting and fruitful relationship with subtyping.
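
For a concrete feel, here are the usual reductions in this style of calculus (standard manifest-contracts fare, simplified; these are not excerpts from the paper):

% Refinement casts check their predicate at runtime:
\langle \mathsf{Int} \Rightarrow \{x{:}\mathsf{Int} \mid x > 0\} \rangle\, 5 \longrightarrow^* 5
\langle \mathsf{Int} \Rightarrow \{x{:}\mathsf{Int} \mid x > 0\} \rangle\, (-1) \longrightarrow^* \mathsf{fail}
% Function casts wrap their subject, casting the input contravariantly
% and the output covariantly:
\langle T_1 {\to} T_2 \Rightarrow S_1 {\to} S_2 \rangle\, f \longrightarrow \lambda x{:}S_1.\ \langle T_2 \Rightarrow S_2 \rangle\, (f\ (\langle S_1 \Rightarrow T_1 \rangle\, x))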

One longstanding problem with casts is space efficiency: casts in their naive formulation can consume unbounded amounts of space at runtime both through excessive wrapping as well as through tail-recursion-breaking stack growth. Prior work offers space-efficient solutions exclusively in the domain of gradual types. In this paper, we define a new full-spectrum language that is (a) more expressive than prior languages, and (b) space efficient. We are the first to obtain space-efficient refinement types. Our approach to space efficiency is based on the coercion calculi of Herman et al. and Henglein’s work, though our explicitly enumerated canonical coercions and our straightforward merge operator are a novel approach to coercions with a simpler theory. We show that space efficiency avoids some checks, failing and diverging less often than naive calculi—but the two are otherwise operationally equivalent.
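
Schematically, the problem and the fix look like this (the standard picture from the space-efficiency literature, not an excerpt from the paper):

% Naively, pending casts accumulate, e.g., around tail calls each
% iteration leaves another wrapper on the stack:
\langle T_2 \Rightarrow T_3 \rangle\, (\langle T_1 \Rightarrow T_2 \rangle\, e)
% A space-efficient semantics eagerly merges adjacent checks into a
% single coercion of bounded size, restoring tail calls:
\langle T_2 \Rightarrow T_3 \rangle \circ \langle T_1 \Rightarrow T_2 \rangle \;=\; c
% ...where c is a single coercion taking T1 to T3.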

(Yes, I know that reviewing is double blind. But as the FAQ clearly states, double-blind review isn’t meant to hinder communication of results.)

NJPLS Fall 2012

Penn hosted NJPLS today. It’s been a while, but these local meetings are always fun. I presented on the status of my thesis work, Compiling Dynamic Information Flow Control. I’m building a compiler for λNaV (see our POPL submission).

One thing I omitted from the presentation—which came up more than once in conversations afterwards—is that my compiler uses LLVM as a backend. I parse a program, translate it to my own IR, optimize, and then send it off to LLVM. I’m not writing LLVM passes or adding any new kinds of AST nodes. Having my own IR probably isn’t the best architecture, but it’s much more agile than trying to modify LLVM itself. It also seems like a natural choice: my IR is tailored to reasoning about DIFC label lattices, while LLVM IR is particularly well suited to lower-level languages, like C.
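
For context, here is the generic flavor of dynamic IFC label tracking (a textbook two-point lattice, not λNaV’s actual rules):

% Security labels form a lattice; the classic example has two points:
L \sqsubseteq H
% Every value carries a label, and operations join the labels of their
% inputs, so results are at least as secret as what they depend on:
v_1^{\ell_1} + v_2^{\ell_2} \longrightarrow (v_1 + v_2)^{\ell_1 \sqcup \ell_2}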

LLVM 3.1, Haskell 7.4.1, and OS X

Haskell on OS X can be a little frustrating, what with the 32-bit/64-bit divide. I spent a little while trying to get the latest 32-bit Haskell Platform to work with LLVM 3.1, via the existing bindings. There were a few tricks, which I reproduce here for posterity.

First, here’s how I configured LLVM:

./configure --enable-optimized --enable-jit --with-ocaml-libdir=$GODI_PATH/lib/ocaml/std-lib/
make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64"
sudo make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64" install

Then, clone the Git HEAD of the bindings. The llvm-base package is in the base/ subdirectory, and you need to build it first. If the configure script fails because it can’t find LLVMModuleCreateWithName (even though it’s obviously there in the library), the problem is that LLVM didn’t build with 32-bit support. Go back and build LLVM with the UNIVERSAL and UNIVERSAL_ARCH flags. Beyond this, there is one tiny wrinkle: open up base/cbits/extra.cpp, go to line 390, and change error.Print to error.print. Now you should be able to run cabal install from the base/ directory; when that’s successful, go up one level and cabal install will give you the LLVM bindings.

I should warn you: something isn’t perfect here. The examples using the JIT didn’t work for me (I get a bus error when I try to call the Haskell-ized, JITed functions), but I was able to generate real code:

module Hello (main) where

import Data.Word

import LLVM.Core

llvmModule :: TFunction (IO Word32)
llvmModule =
  withStringNul "Hello, world!" $ \s -> do
    -- Declare the external puts and our main as named functions.
    puts <- newNamedFunction ExternalLinkage "puts" :: TFunction (Ptr Word8 -> IO Word32)
    main <- newNamedFunction ExternalLinkage "main" :: TFunction (IO Word32)
    defineFunction main $ do
      -- Get a pointer to the string constant's first character,
      -- pass it to puts, and return 0.
      tmp <- getElementPtr0 s (0::Word32, ())
      _ <- call puts tmp
      ret (0::Word32)
    return main

main :: IO ()
main = do
  m <- newNamedModule "hello"
  hello <- defineModule m llvmModule
  dumpValue hello                 -- print the generated IR
  writeBitcodeToFile "hello.bc" m

Then you can compile or interpret the bitcode, as usual:

$ ghc -o hello Hello.hs -main-is Hello.main
[1 of 1] Compiling Hello            ( Hello.hs, Hello.o )
Linking hello ...

$ ./hello
define i32 @main() {
_L1:
  %0 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @_str1, i32 0, i32 0))
  ret i32 0
}
$ lli hello.bc
Hello, world!

(Running ./hello dumps the IR and writes hello.bc; lli then interprets the bitcode.)

I must admit—it took me some time to grok the LLVM bindings. Typeclass fanciness is just dandy when you’re the one who wrote it, but Haskell’s error messages aren’t an easy way to figure out how something wants to be used. Then again, they call it the bleeding edge for a reason.

Exceptionally Available Dynamic IFC

I’m happy to report that Cătălin Hriţcu, Ben Karel, Benjamin Pierce, Greg Morrisett, and I submitted a paper to POPL 2013: the paper is called Exceptionally Available Dynamic IFC. Here’s the abstract:

Existing designs for fine-grained, dynamic information-flow control assume that it is acceptable to terminate the entire system when an incorrect flow is detected—i.e., they give up availability for the sake of confidentiality and integrity. This is an unrealistic limitation for systems such as long-running servers.

We identify public labels and delayed exceptions as crucial ingredients for making information-flow errors recoverable while retaining the fundamental soundness property of non-interference, and we propose two new error-handling mechanisms that make all errors recoverable. The first mechanism builds directly on these basic ingredients, using not-a-values (NaVs) and data flow to propagate errors. The second mechanism adapts the standard exception model to satisfy the extra constraints arising from information flow control, converting thrown exceptions to delayed ones at certain points. We prove both mechanisms sound. Finally, we describe a prototype implementation of a full-scale language with NaVs and report on our experience building high-availability software components in this setting.
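
To illustrate the NaV idea (a generic sketch, not the paper’s actual rules): instead of halting the program, a failed check produces a special not-a-value, which then propagates through the computation by data flow, like a delayed exception:

% A failing runtime check yields a delayed error value instead of halting:
\mathsf{fail} \longrightarrow \mathsf{NaV}
% NaVs propagate strictly through ordinary operations:
\mathsf{NaV} + v \longrightarrow \mathsf{NaV}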