Computer science at a small liberal arts college

It’s my third year as a tenure-track assistant professor at Pomona College, a small liberal arts college (SLAC) in Claremont, CA. You may have read about why mathematicians might find the liberal arts setting appealing. Here’s why I think computer scientists might, too. I’ll start with what I love about working at Pomona and then I’ll talk about what I don’t.

Research freedom

First and foremost, I get to do the research I want to do. Neither the department nor the institution rely on overheads to operate, so there’s no grant pressure. Without graduate students, who require both financial support and through-lines of work for a dissertation, I can choose my focus freely.

The “without graduate students” part isn’t for everyone. If your research needs a lab of ten students to build enormous systems, then a SLAC may not be the right place. But if you want to build smaller systems, then a SLAC could be great; if you’re theoretically inclined, I think a SLAC is an ideal home.

Research expectations vary at liberal arts schools, and I’d count Pomona among the more robust, research-wise—the tenure evaluation split between research and teaching is somewhere between 50/50 and 40/60. If you’re sick of research and want a teaching-only position, then SLACs may not be the right fit.

I’ve been particularly enjoying the freedom to pursue side projects. I spent some of this last summer working with two undergraduates to write software for controlling the art department’s CNC router, marrying two of my interests: programming and woodworking. It was a good experience for the students, and it might even lead to a publication, though I never expected it to. While I might have had the time and resources to do such a thing at a larger institution… would I have bothered to lay out so much time on a project unlikely to produce anything for my tenure case?

Students

Pomona has an 8-to-1 student-to-faculty ratio, with a 2:2 teaching load; some SLACs might have more teaching, like 3:2 or even 3:3. I couldn’t handle a heavier load—I wouldn’t get enough research done to be happy with myself—but I enjoy teaching well rounded students in moderately sized classes (a required junior/senior PL course has fewer than 30 students each semester). The high end of students is astonishingly high—juniors and seniors working at an early graduate student level—and it’s rewarding to work with them on things that actually advance my research. My field (programming languages) requires a lot of background to do much of anything; when I came to Pomona, I didn’t expect that I’d be able to do any of my actual research with students. But I’ve been pleasantly surprised—I’ve been able to have one or two students at a time working on real PL research.

Community

Our department of seven tenure-track professors is convivial, genial, and collaborative. It’s not that we’re magically free of disagreement, but rather that we can disagree productively. I’ve spent significant time in CS departments since high school, and many of them are lovely places to be—but I’m charmed by my department’s mellow, friendly relations.

On a larger scale, Pomona College has roughly 200 professors. I’m pleased to have friends and colleagues from many different departments and disciplines across Pomona’s campus (and beyond—there are 400 more faculty at the other Claremont Colleges—the Consortium here is an idiosyncratic collection of institutions). As someone who majored in both computer science and Egyptology, colleagues with diverse interests enliven and enrich my days.

What’s more, Pomona has a culture of service; service is a contribution, a lending of myself that makes me feel like I’m part of the college. Starting from their second year, all professors are expected to sit on college-wide committees—the sort of thing that one might not do until a much higher rank elsewhere. In my committee work, I help steer Pomona’s course; in so doing, I invest in the school as an enterprise. Sure, that investment takes the form of extra meetings and work that doesn’t directly serve me or my research or my teaching or my department. I can imagine a SLAC would be a frustrating place to be for someone who wasn’t community-minded or who was unwilling to take part in bureaucracy.

Values

Many of my values align with institutional goals and practices. Pomona’s diversity plan is a strong, aspirational one—the school is very dedicated to equity, with a focus on students who are the first in their family to attend college, students from low-income families, and students from other underrepresented minorities. I’ve been particularly pleased to see the way the school has rallied around DACA-mented students. My department does a good job bringing in and retaining women, who constitute 30-40% of our majors and just about half of our faculty.

To be clear: while Pomona is already doing a lot of things right, there’s of course more to do. I don’t expect an institution to mirror all of my values—and I think it’s healthy to separate my identity, my work, and my employer. But the service culture prevailing at liberal arts colleges leaves the door open for my values to help guide the institution.

Of course, it’s not all roses

Let’s be honest: it’s still a job; the arrangement is that I exchange my time and energy for a nice salary. There are things I wish were different. I often wonder what it would be like to train a grad student or two; I suspect I would be much more productive and have a greater influence on my field. When I submitted an NSF CAREER proposal this year, I had to write everything myself: there’s no big grant office with templates for my data management plan; there’s nobody to look things over for boneheaded mistakes. A larger institution with a bigger department would offer more room for research collaboration. Like every other computer science department, we have more majors than we can handle—our classes aren’t as small as we’d like. The liberal arts context is enriching, but students tend to go for more breadth than depth; not enough of our students go on to graduate school.

On balance, though, I’m very happy. I’m less productive than I could be, but I think my best papers are my most recent ones. A colleague (who was in the office while on sabbatical, the poor dear) helped me with a small research problem just yesterday afternoon. I have to hunt for students who want depth, but they’re there—and hungry for it! And most importantly: I can work at Pomona without being consumed. The work is tiring, to be sure, but there’s room left for the rest of myself… and, given my own peculiar, personal limits and abilities and aspirations, I’m not so sure there would be elsewhere.

P.S. we’re hiring

haha you saw this coming

If you’re interested… we’re hiring! Feel free to email hidden; JavaScript is required if you have any questions.

A refinement type by any other name

Frank Pfenning originated the idea of refinement types in his seminal PLDI 1991 paper with Tim Freeman. Freeman and Pfenning’s refinement types allow programmers to work with refined datatypes, that is, sub-datatypes induced by refining the set of available constructors. For example, here’s what that looks like for lists, with a single refinement type, α singleton:

datatype α list = nil | cons of α * α list
rectype α singleton = cons α nil

That is, a programmer defines a datatype α list, but can identify refined types like α singleton—lists with just one element. We can imagine a lattice of type refinements where α list is at the top, but below it is the refinement of lists of length 0 or 1—written α singleton ∨ α nil. This type is itself refined by its constituents, which are are refinements of the empty type. Here’s such a lattice, courtesy of remarkably nice 1991-era TeX drawing:
Refinements of α list
Another way of phrasing all of this is that refinement types identify subsets of types. Back in 1983, Bengt Nordström and Kent Petersson introduced—as far as I know—the idea of subset types in a paper called Types and Specifications at the IFIP Congress. Unfortunately, I couldn’t find a copy of the paper, so it’s not clear where the notation {x∈A|B(x)} set-builder-esque notation first came from, but it shows up in Bengt Nordström, Kent Petersson, and Jan M. Smith’s Programming in Martin-Löf’s Type Theory in 1990. Any earlier references would be appreciated. Update (2015-03-18): Colin Gordon pointed out that Robert Constable‘s Mathematics as programming from 1984 uses the subset type notation, as does the NUPRL tech report from 1983. The NUPRL TR came out in January ’83 while IFIP ’83 happened in September. Nate Foster, who works Bob Constable, suspects that Constable has priority. Alright: subset types go to Robert Constable in January 1983 with the Nearly Ultimate Pearl. Going once…

My question is: when did we start calling {x∈A|B(x)} and other similar subset types a “refinement type”? Any advice or pointers would be appreciated—I’ll update the post.

Susumu Hayashi in Logic of refinement types describes “ATTT”, which, according to the abstract, “has refi nement types which are intended to be subsets of ordinary types or speci cations of programs”, where he builds up these refinements out of some set theoretic operators on singletons. By rights, this paper is probably the first to use “refinement type” to mean “subset type”… though I have some trouble pinpointing where the paper lives up to that claim in the abstract.

Ewen Denney was using refinement types to mean types and specifications augmented with logical propositions. This terminology shows up in his 1998 PhD thesis and his 1996 IFIP paper, Refinement Types for Specification.

In 1998, Hongwei Xi and Frank Pfenning opened the door to flexible interpretations of “refinements” in Eliminating Array Bound Checking Through Dependent Types. In Section 2.4, they use ‘refinement’ in a rather different sense:

Besides the built-in type families int, bool, and array, any user-defined data type may be refined by explicit declarations. …

typeref ’a list of nat
with nil <| ’a list(0)
| :: <| {n:nat} ’a * ’a list(n) -> ’a list(n+1)

Later on, in Section 3.1, they have a similar use of the term:

In the standard basis we have refined the types of many common functions on integers such as addition, subtraction, multiplication, division, and the modulo operation. For instance,

+ <| {m:int} {n:int} int(m) * int(n) -> int(m+n)

is declared in the system. The code in Figure 3 is an implementation of binary search through an array. As before, we assume:

sub <| {n:nat} {i:nat | i < n} ’a array(n) * int(i) -> ’a

So indices allow users to refine types, though they aren’t quite refinement types. In 1999, Xi and Pfenning make a strong distinction in Dependent Types in Practical Programming; from Section 9:

…while refinement types incorporate intersection and can thus ascribe multiple types to terms in a uniform way, dependent types can express properties such as “these two argument lists have the same length” which are not recognizable by tree automata (the basis for type refinements).

Now, throughout the paper they do things like “refine the datatype with type index objects” and “refine the built-in types: (a) for every integer n, int(n) is a singleton type which contains only n, and (b) for every natural number n, 0 a array(n) is the type of arrays of size n”. So here there’s a distinction between “refinement types”—the Freeman and Pfenning discipline—and a “refined type”, which is a subset of a type indicated by some kind of predicate and curly braces.

Joshua Dunfield published a tech report in 2002, Combining Two Forms of Type Refinements, where makes an impeccably clear distinction:

… the datasort refinements (often called refinement types) of Freeman, Davies, and Pfenning, and the index refinements of Xi and Pfenning. Both systems refine the simple types of Hindley-Milner type systems.

In his 2004 paper with Frank, Tridirectional Typechecking, he maintains the distinction between refinements, but uses a term I quite like—“property types”, i.e., types that guarantee certain properties.

Yitzhak Mandelbaum, my current supervisor David Walker, and Bob Harper wrote An Effective Theory of Type Refinements in 2003, but they didn’t quite have subset types. Their discussion of related work makes it seem that they interpret refinement types as just about any device that allows programmers to use the existing types of a language more precisely:

Our initial inspiration for this project was derived from work on refinement types by Davies and Pfenning and Denney and the practical dependent types proposed by Xi and Pfenning. Each of these authors proposed to sophisticated type systems that are able to specify many program properties well beyond the range of conventional type systems such as those for Java or ML.

In the fairly related and woefully undercited 2004 paper, Dynamic Typing with Dependent Types, Xinming Ou, Gang Tan, Yitzhak Mandelbaum, and David Walker used the term “set type” to define {x∈A|B(x)}.

Cormac Flanagan‘s Hybrid Type Checking in 2006 is probably the final blow for any distinction between datasort refinements and index refinements: right there on page 3, giving the syntax for types, he writes “{x:B|t} refinement type“. He says on the same page, at the beginning of Section 2, “Our refinement types are inspired by prior work on decidable refinement type systems”, citing quite a bit of the literature: Mandelbaum, Walker, and Harper; Freeman and Pfenning; Davies and Pfenning ICFP 2000; Xi and Pfenning 1999; Xi LICS 2000; and Ou, Tan, Mandelbaum, and Walker. After Cormac, everyone just seems to call them refinement types: Ranjit Jhala‘s Liquid Types, Robby Findler and Phil Wadler in Well typed programs can’t be blame, my own work, Andy Gordon in Semantic Subtyping with an SMT Solver. This isn’t a bad thing, but perhaps we can be more careful with names. Now that we’re all in the habit of calling them refinements, I quite like “indexed refinements” as a distinction. Alternatively, “subset types” are a very clear term with solid grounding in the literature.

Finally: I didn’t cite it in this discussion, but Rowan Davies‘s thesis, Practical Refinement-Type Checking, was extremely helpful in looking through the literature.

Edited to add: thanks to Ben Greenman for some fixes to broken links.

PHPEnkoder 1.13

I’ve resolved some E_NOTICE-level messages that were showing up when people set WP_DEBUG to true. Thanks to Rootside for pointing out this problem on the WordPress forums. As always, please let me know on the forums or email hidden; JavaScript is required if you run into any problems.

Cultural criticism and ‘tech’

As an academic computer scientist, I frequently interact with the world of ‘tech’, as embodied by Silicon Valley, startups, etc. Many of my friends—from college, from graduate school—work there. My younger brother works there. One of the things that has kept me out of that world is my wariness of its politics, ethics, and aesthetics. I was delighted, then, when I was introduced to Model View Culture, a venue for cultural criticism of tech, sensu lato. They cover a wide range:

I’m writing because other academics—the audience of this blog—might be interested. Many students coming out of the elite CS programs (my academic home for more than a decade) are going to end up working in the world MVC writes about. They’ll go as interns and then as employees. What is it like there?

But I’m also writing because MVC has been subjected to tremendous blowback. I’m not going to link to it, but it’s not hard to find. Silence is complicity, so: Model View Culture is writing smart things about hard problems. If you’d like to support them, just reading is a fine place to start… but of course money is good, too.

Contracts: first-order interlopers in a higher-order world

Reading Aseem Rastogi, Avik Chaudhuri, and Basil Hosmer‘s POPL 2012 paper The Ins and Outs of Gradual Type Inference, I ran across a quote that could well appear directly in my POPL 2015 paper, Space-Efficient Manifest Contracts:

The key insight is that … we must recursively deconstruct higher-order types down to their first-order parts, solve for those …, and then reconstruct the higher-order parts … . [Emphasis theirs]

Now, they’re deconstructing “flows” in their type inference and I’m deconstructing types themselves. They have to be careful about what’s known in the program and what isn’t, and I have to be careful about blame labels. But in both cases, a proper treatment of errors creates some asymmetries. And in both cases, the solution is to break everything down to the first-order checks, reconstructing a higher-order solution afterwards.

The “make it all first order” approach contrasts with subtyping approaches (like in Well Typed Programs Can’t Be Blamed and Threesomes, with and without blame). I think it’s worth pointing out that as we begin to consider blame, contract composition operators look less and less like meet operations and more like… something entirely different. Should contracts with blame inhabit some kind of skew lattice? Something else?

I highly recommend the Rastogi et al. paper, with one note: when they say kind, I think they mean “type shape” or “type skeleton”—not “kind” in the sense of classifying types and type constructors. Edited to add: also, how often does a type inference paper include a performance evaluation? Just delightful!

A Balance of Power: Expressive, Analyzable Controller Programming

I just finished reading A Balance of Power: Expressive, Analyzable Controller Programming. It’s an interesting proposal, but I’m writing just to express my satisfaction with the following sentence:

When we hit expressive limits, however, our goal is not to keep growing this language—down that path lies sendmail.cf and other sulphurous designs—but to call out to full-language code.

‘Sulphurous’ indeed. Come for the nonmonotonic interpretation of learning, stay for the colorful prose.

PHPEnkoder 1.12.1: “PHP is laughably bad” edition

I’ve released a bugfix for PHPEnkoder version 1.12. Get this: before version 5.5, PHP didn’t support arbitrary array index expressions. The problem was a line:

$ord = unpack("N",$c)[1];

Which I changed to:

$bs = unpack("N",$c);
$ord = $bs[1];

This is really ridiculous. Like, serious amateur hour ridiculous. Like, if your final project in a compiler’s class had syntactic limitation, you would not get a good grade. PHP is a fractal of bad design. Ugh. Just: ugh.

And while I’m complaining about these sorts of things, where are the structured content management systems? PHPEnkoder’s email and mailto: detection is a giant, horrible kludge of regular expressions. Where’s the CMS where the filters get passed a structured, syntactic representation of the page to be rendered? This is a serious question, and if you know of one, email hidden; JavaScript is required.

LLVM 3.1, Haskell 7.4.1, and OS X

Haskell on OS X can be a little frustrating, what with the 32-bit/64-bit divide. I spent a little bit trying to get the latest 32-bit Haskell platform to work with LLVM 3.1, via the existing bindings. There were a few tricks, which I reproduce here for posterity.

First, here’s how I configured LLVM:

./configure --enable-optimized --enable-jit --with-ocaml-libdir=$GODI_PATH/lib/ocaml/std-lib/
make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64"
sudo make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64" install

Then, get clone the Git HEAD of the bindings. The llvm-base package is in the base/ subdirectory, and you need to build it first. If the configure script fails because it can’t find LLVMModuleCreateWithName (even though it’s obviously there in the library), the problem is that LLVM didn’t build with 32-bit bindings. Go back and build LLVM with the UNIVERSAL and UNIVERSAL_ARCH flags. Beyond this, there is a tiny wrinkle. Open up base/cbits/extra.cpp, and go to line 390; change error.Print to error.print. Now you should be able to run cabal install from the base directory; when that’s successful, go up one level and cabal install will give you the LLVM bindings.

I should warn you: something isn’t perfect here. The examples using the JIT didn’t work for me (I get a bus error when I try to call the Haskell-ized, JITed functions), but I was able to generate real code:

module Hello (main) where

import Data.Word

import LLVM.Core

llvmModule :: TFunction (IO Word32)
llvmModule =
  withStringNul "Hello, world!" $ \s -> do
    puts <- newNamedFunction ExternalLinkage "puts" :: TFunction (Ptr Word8 -> IO Word32)
    main <- newNamedFunction ExternalLinkage "main" :: TFunction (IO Word32)
    defineFunction main $ do
      tmp <- getElementPtr0 s (0::Word32, ())
      _ <- call puts tmp
      ret (0::Word32)
    return main


main :: IO ()
main = do
  m <- newNamedModule "hello"
  hello <- defineModule m llvmModule
  dumpValue hello
  writeBitcodeToFile "hello.bc" m

Then you can compile or interpret the bitcode, as usual:

$ ghc -o hello Hello.hs -main-is Hello.main
[1 of 1] Compiling Hello            ( Hello.hs, Hello.o )
Linking hello ...

define i32 @main() {
_L1:
  %0 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @_str1, i32 0, i32 0))
  ret i32 0
}

$ ./hello
$ lli hello.bc
Hello, world!

I must admit---it took me some time to grok the LLVM bindings. Typeclass fanciness is just dandy when you're the one who did it, but Haskell's error messages aren't an easy way to figure out how something wants to be used. Then again, they call it the bleeding edge for a reason.