Space-Efficient Manifest Contracts

I just submitted a paper to POPL 2014; it’s called Space-Efficient Manifest Contracts. Here’s the abstract:

Gradual types mediate the interaction between dynamic and simple types, offering an easy transition from scripts to programs; gradual types allow programmers to evolve prototype scripts into fully fledged, deployable programs. Similarly, contracts and refinement types mediate the interaction between simple types and more precise types, offering an easy transition from programs to robust, verified programs. A full-spectrum language with both gradual and refinement types offers low-level support for the development of programs throughout their lifecycle, from prototype script to verified program.

One attractive formulation of languages with gradual or refinement types uses casts to represent the runtime checks necessary for type changes (from dynamic to simple types, and from simple types to refinement types). Briefly, a cast ⟨T1⇒T2⟩ e takes a term e from type T1 to T2, possibly wrapping or tagging e in the process—or even failing, if e doesn’t meet the criteria of the type T2. Casts are attractive because they offer a unified view of changes in type, have straightforward operational semantics, and enjoy an interesting and fruitful relationship with subtyping.

One longstanding problem with casts is space efficiency: casts in their naive formulation can consume unbounded amounts of space at runtime both through excessive wrapping as well as through tail-recursion-breaking stack growth. Prior work offers space-efficient solutions exclusively in the domain of gradual types. In this paper, we define a new full-spectrum language that is (a) more expressive than prior languages, and (b) space efficient. We are the first to obtain space-efficient refinement types. Our approach to space efficiency is based on the coercion calculi of Herman et al. and Henglein’s work, though our explicitly enumerated canonical coercions and our straightforward merge operator are a novel approach to coercions with a simpler theory. We show that space efficiency avoids some checks, failing and diverging less often than naive calculi—but the two are otherwise operationally equivalent.

(Yes, I know that reviewing is double blind. But as the FAQ clearly states, double-blind review isn’t meant to hinder communication of results.)

NJPLS Fall 2012

Penn hosted NJPLS today. It’s been a while, but these local meetings are always fun. I presented on the status of my thesis work, Compiling Dynamic Information Flow Control. I’m building a compiler for λNaV (see our POPL submission).

One thing I omitted from the presentation—which came up more than once in conversations afterwards—is that my compiler uses LLVM as a backend. I parse a program, translate it to my own IR, optimize, and then send it off to LLVM. I’m not writing LLVM passes, or adding any kinds of AST nodes. Having my own IR probably isn’t the best architecture, but it’s much more agile than trying to modify LLVM itself. It also seems like a natural choice: my IR is tailored to reasoning about DIFC label lattices, while LLIR is particularly well suited for lower level languages, like C.

LLVM 3.1, Haskell 7.4.1, and OS X

Haskell on OS X can be a little frustrating, what with the 32-bit/64-bit divide. I spent a little bit trying to get the latest 32-bit Haskell platform to work with LLVM 3.1, via the existing bindings. There were a few tricks, which I reproduce here for posterity.

First, here’s how I configured LLVM:

./configure --enable-optimized --enable-jit --with-ocaml-libdir=$GODI_PATH/lib/ocaml/std-lib/
make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64"
sudo make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64" install

Then, get clone the Git HEAD of the bindings. The llvm-base package is in the base/ subdirectory, and you need to build it first. If the configure script fails because it can’t find LLVMModuleCreateWithName (even though it’s obviously there in the library), the problem is that LLVM didn’t build with 32-bit bindings. Go back and build LLVM with the UNIVERSAL and UNIVERSAL_ARCH flags. Beyond this, there is a tiny wrinkle. Open up base/cbits/extra.cpp, and go to line 390; change error.Print to error.print. Now you should be able to run cabal install from the base directory; when that’s successful, go up one level and cabal install will give you the LLVM bindings.

I should warn you: something isn’t perfect here. The examples using the JIT didn’t work for me (I get a bus error when I try to call the Haskell-ized, JITed functions), but I was able to generate real code:

module Hello (main) where

import Data.Word

import LLVM.Core

llvmModule :: TFunction (IO Word32)
llvmModule =
  withStringNul "Hello, world!" $ \s -> do
    puts <- newNamedFunction ExternalLinkage "puts" :: TFunction (Ptr Word8 -> IO Word32)
    main <- newNamedFunction ExternalLinkage "main" :: TFunction (IO Word32)
    defineFunction main $ do
      tmp <- getElementPtr0 s (0::Word32, ())
      _ <- call puts tmp
      ret (0::Word32)
    return main

main :: IO ()
main = do
  m <- newNamedModule "hello"
  hello <- defineModule m llvmModule
  dumpValue hello
  writeBitcodeToFile "hello.bc" m

Then you can compile or interpret the bitcode, as usual:

$ ghc -o hello Hello.hs -main-is Hello.main
[1 of 1] Compiling Hello            ( Hello.hs, Hello.o )
Linking hello ...

define i32 @main() {
  %0 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @_str1, i32 0, i32 0))
  ret i32 0

$ ./hello
$ lli hello.bc
Hello, world!

I must admit---it took me some time to grok the LLVM bindings. Typeclass fanciness is just dandy when you're the one who did it, but Haskell's error messages aren't an easy way to figure out how something wants to be used. Then again, they call it the bleeding edge for a reason.

Exceptionally Available Dynamic IFC

I’m happy to report that Cătălin Hriţcu, Ben Karel, Benjamin Pierce, Greg Morrisett, and I submitted a paper to POPL 2013: the paper is called Exceptionally Available Dynamic IFC. Here’s the abstract:

Existing designs for fine-grained, dynamic information-flow control assume that it is acceptable to terminate the entire system when an incorrect flow is detected—i.e, they give up availability for the sake of confidentiality and integrity. This is an unrealistic limitation for systems such as long-running servers.

We identify public labels and delayed exceptions as crucial ingredients for making information-flow errors recoverable while retaining the fundamental soundness property of non-interference, and we propose two new error-handling mechanisms that make all errors recoverable. The first mechanism builds directly on these basic ingredients, using not-a-values (NaVs) and data flow to propagate errors. The second mechanism adapts the standard exception model to satisfy the extra constraints arising from information flow control, converting thrown exceptions to delayed ones at certain points. We prove both mechanisms sound. Finally, we describe a prototype implementation of a full-scale language with NaVs and report on our experience building high-availability software components in this setting.


Following Robby Findler’s excellent presentation on PLT Redex at POPL (check out the paper and presentation!), I hacked up something similar in Haskell. Naturally, it’s all manual, and there’s none of the publication or visualization support, but the essence is there. Here’s the code for the untyped lambda calculus:

module Untyped where

import Control.Monad
import Data.Maybe
import Test.QuickCheck

data Expr =
    Var Int
  | Lambda Expr
  | App Expr Expr

showExpr :: Int -> Expr -> String
showExpr _b (Var n) = "var" ++ show n
showExpr b (Lambda e) = "lambda. " ++ showExpr (b+1) e
showExpr b (App e1 e2) = "(" ++ showExpr b e1 ++ " " ++ showExpr b e2 ++ ")"

instance Show Expr where
  show e = showExpr 0 e

size :: Expr -> Int
size (Var _) = 1
size (Lambda e) = 1 + size e
size (App e1 e2) = size e1 + size e2

wellLexed :: Expr -> Bool
wellLexed = wellLexedAux 0
  where wellLexedAux :: Int -> Expr -> Bool
        wellLexedAux b (Var n) = 0 <= n && n < b
        wellLexedAux b (Lambda e) = wellLexedAux (b+1) e
        wellLexedAux b (App e1 e2) = wellLexedAux b e1 && wellLexedAux b e2

arbitraryExpr :: Int -> Int -> Gen Expr
arbitraryExpr n 0 = oneof [return $ Lambda $ Var 0] -- base case
arbitraryExpr n max = do
  oneof $ 
    (if n < max 
     then [arbitraryExpr (n+1) (max-1) >>= return . Lambda]
     else []) ++ 
    (if n > 0
     then [choose (0,n-1) >>= return . Var]
     else []) ++
    [do { e1 <- arbitraryExpr n (max `div` 2);
          e2 <- arbitraryExpr n (max `div` 2);
          return $ App e1 e2 }]

instance Arbitrary Expr where
  arbitrary = sized $ \max -> arbitraryExpr 0 max

-- All the terms we generate should be well-lexed
prop_WellLexed e = collect (size e) $ wellLexed e

subst :: Expr -> Int -> Expr -> Expr
subst e n (Var n') 
  | n == n' = e
  | otherwise = (Var n')
subst e n (Lambda e') = Lambda $ subst e (n+1) e'
subst e n (App e1 e2) = App (subst e n e1) (subst e n e2)

shift :: Int -> Expr -> Expr
shift i (Var n) = if n < i then Var n else Var (n-1)
shift i (Lambda e) = Lambda $ shift (i+1) e
shift i (App e1 e2) = App (shift i e1) (shift i e2)

value :: Expr -> Bool
value (Lambda e) = True
value _ = False

-- we can run this with m = Maybe or m = List
step :: MonadPlus m => Expr -> m Expr
step (Var _) = mzero -- unbound variable
step (Lambda e) = mzero -- found a term
step (App e1@(Lambda e11) e2) 
  | value e2 = return $ shift 0 $ subst e2 0 e11
  | otherwise = do
    e2' <- step e2
    return $ App e1 e2'
step (App e1 e2) = do
  e1' <- step e1
  return $ App e1' e2

-- if we can step, we'd better preserve scope
prop_StepWellLexed e = isJust next ==> wellLexed (fromJust next)
  where next = step e
-- verify progress
prop_Progress e = (classify isValue "value") $ (classify (not isValue) "step") $ value e || isJust (step e)
  where isValue = value e

-- use the List monad to ensure determinacy
prop_Deterministic e = nextStates > 0 ==> nextStates == 1
  where nextStates = length $ step e

One of the trickiest things here was making sure I was generating well lexed lambda terms that were small enough to be tractable. It would have been even harder, I think, with a more explicit representation of variables or binding. Thoughts, variations? Or, perhaps, complaints that I should have done this with GADTS or in Coq?

Towards a core calculus for implicitly migration-capable applications

Yitzhak Mandelbaum and I have been thinking about language support for program migration. We submitted a short paper, Towards a core calculus for implicitly migration-capable applications, to PEPM’12 summarizing what we’ve done so far and the direction we’re headed. Here’s the abstract:

Mobile computational devices, like smartphones, tablets and laptops, have become a standard part of the computing landscape. Moreover, many users regularly interact with an assortment of devices, including mobile ones. Therefore, the ability to migrate UI-enabled applications is becoming increasingly important. We describe a design-pattern for applications to simplify support for user-session migration and provide an overview of a lambda calculus for which significant elements of the design pattern can be implemented automatically.

We’d appreciate any ideas, comments, or questions.