PHPEnkoder 1.12: unicode support!

I’m really delighted to have resolved a longstanding problem with PHPEnkoder and Unicode: PHPEnkoder should no longer choke on the various multi-byte characters, such as é and ü and ß. (No really, look: email hidden; JavaScript is required!)

As usual, updates are available from the WordPress plugin directory or from your dashboard.

LLVM 3.1, Haskell 7.4.1, and OS X

Haskell on OS X can be a little frustrating, what with the 32-bit/64-bit divide. I spent a little bit trying to get the latest 32-bit Haskell platform to work with LLVM 3.1, via the existing bindings. There were a few tricks, which I reproduce here for posterity.

First, here’s how I configured LLVM:

./configure --enable-optimized --enable-jit --with-ocaml-libdir=$GODI_PATH/lib/ocaml/std-lib/
make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64"
sudo make UNIVERSAL=yes UNIVERSAL_ARCH="i386 x86_64" install

Then, get clone the Git HEAD of the bindings. The llvm-base package is in the base/ subdirectory, and you need to build it first. If the configure script fails because it can’t find LLVMModuleCreateWithName (even though it’s obviously there in the library), the problem is that LLVM didn’t build with 32-bit bindings. Go back and build LLVM with the UNIVERSAL and UNIVERSAL_ARCH flags. Beyond this, there is a tiny wrinkle. Open up base/cbits/extra.cpp, and go to line 390; change error.Print to error.print. Now you should be able to run cabal install from the base directory; when that’s successful, go up one level and cabal install will give you the LLVM bindings.

I should warn you: something isn’t perfect here. The examples using the JIT didn’t work for me (I get a bus error when I try to call the Haskell-ized, JITed functions), but I was able to generate real code:

module Hello (main) where

import Data.Word

import LLVM.Core

llvmModule :: TFunction (IO Word32)
llvmModule =
  withStringNul "Hello, world!" $ \s -> do
    puts <- newNamedFunction ExternalLinkage "puts" :: TFunction (Ptr Word8 -> IO Word32)
    main <- newNamedFunction ExternalLinkage "main" :: TFunction (IO Word32)
    defineFunction main $ do
      tmp <- getElementPtr0 s (0::Word32, ())
      _ <- call puts tmp
      ret (0::Word32)
    return main

main :: IO ()
main = do
  m <- newNamedModule "hello"
  hello <- defineModule m llvmModule
  dumpValue hello
  writeBitcodeToFile "hello.bc" m

Then you can compile or interpret the bitcode, as usual:

$ ghc -o hello Hello.hs -main-is Hello.main
[1 of 1] Compiling Hello            ( Hello.hs, Hello.o )
Linking hello ...

define i32 @main() {
  %0 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @_str1, i32 0, i32 0))
  ret i32 0

$ ./hello
$ lli hello.bc
Hello, world!

I must admit---it took me some time to grok the LLVM bindings. Typeclass fanciness is just dandy when you're the one who did it, but Haskell's error messages aren't an easy way to figure out how something wants to be used. Then again, they call it the bleeding edge for a reason.


Following Robby Findler’s excellent presentation on PLT Redex at POPL (check out the paper and presentation!), I hacked up something similar in Haskell. Naturally, it’s all manual, and there’s none of the publication or visualization support, but the essence is there. Here’s the code for the untyped lambda calculus:

module Untyped where

import Control.Monad
import Data.Maybe
import Test.QuickCheck

data Expr =
    Var Int
  | Lambda Expr
  | App Expr Expr

showExpr :: Int -> Expr -> String
showExpr _b (Var n) = "var" ++ show n
showExpr b (Lambda e) = "lambda. " ++ showExpr (b+1) e
showExpr b (App e1 e2) = "(" ++ showExpr b e1 ++ " " ++ showExpr b e2 ++ ")"

instance Show Expr where
  show e = showExpr 0 e

size :: Expr -> Int
size (Var _) = 1
size (Lambda e) = 1 + size e
size (App e1 e2) = size e1 + size e2

wellLexed :: Expr -> Bool
wellLexed = wellLexedAux 0
  where wellLexedAux :: Int -> Expr -> Bool
        wellLexedAux b (Var n) = 0 <= n && n < b
        wellLexedAux b (Lambda e) = wellLexedAux (b+1) e
        wellLexedAux b (App e1 e2) = wellLexedAux b e1 && wellLexedAux b e2

arbitraryExpr :: Int -> Int -> Gen Expr
arbitraryExpr n 0 = oneof [return $ Lambda $ Var 0] -- base case
arbitraryExpr n max = do
  oneof $ 
    (if n < max 
     then [arbitraryExpr (n+1) (max-1) >>= return . Lambda]
     else []) ++ 
    (if n > 0
     then [choose (0,n-1) >>= return . Var]
     else []) ++
    [do { e1 <- arbitraryExpr n (max `div` 2);
          e2 <- arbitraryExpr n (max `div` 2);
          return $ App e1 e2 }]

instance Arbitrary Expr where
  arbitrary = sized $ \max -> arbitraryExpr 0 max

-- All the terms we generate should be well-lexed
prop_WellLexed e = collect (size e) $ wellLexed e

subst :: Expr -> Int -> Expr -> Expr
subst e n (Var n') 
  | n == n' = e
  | otherwise = (Var n')
subst e n (Lambda e') = Lambda $ subst e (n+1) e'
subst e n (App e1 e2) = App (subst e n e1) (subst e n e2)

shift :: Int -> Expr -> Expr
shift i (Var n) = if n < i then Var n else Var (n-1)
shift i (Lambda e) = Lambda $ shift (i+1) e
shift i (App e1 e2) = App (shift i e1) (shift i e2)

value :: Expr -> Bool
value (Lambda e) = True
value _ = False

-- we can run this with m = Maybe or m = List
step :: MonadPlus m => Expr -> m Expr
step (Var _) = mzero -- unbound variable
step (Lambda e) = mzero -- found a term
step (App e1@(Lambda e11) e2) 
  | value e2 = return $ shift 0 $ subst e2 0 e11
  | otherwise = do
    e2' <- step e2
    return $ App e1 e2'
step (App e1 e2) = do
  e1' <- step e1
  return $ App e1' e2

-- if we can step, we'd better preserve scope
prop_StepWellLexed e = isJust next ==> wellLexed (fromJust next)
  where next = step e
-- verify progress
prop_Progress e = (classify isValue "value") $ (classify (not isValue) "step") $ value e || isJust (step e)
  where isValue = value e

-- use the List monad to ensure determinacy
prop_Deterministic e = nextStates > 0 ==> nextStates == 1
  where nextStates = length $ step e

One of the trickiest things here was making sure I was generating well lexed lambda terms that were small enough to be tractable. It would have been even harder, I think, with a more explicit representation of variables or binding. Thoughts, variations? Or, perhaps, complaints that I should have done this with GADTS or in Coq?

Nested functions in GCC

GCC supports “nested functions” using the -fnested-functions flag. When I first saw this, I was excited: closures in C! In the famous words of Admiral Ackbar, “it’s a trap!”


typedef int (*fptr)(int);

fptr f(int arg) {
  int nested_function(int nested_arg) {
    return arg + nested_arg;

  return &nested_function;

void smash(int arg) {

int main(void) {
  fptr g = f(10);
  printf("%d\n", (*g)(5));
  // printf("%d\n", (*g)(5));
  fptr h = f(12);
  printf("%d\n", (*g)(5));
  printf("%d\n", (*h)(5));

  return 0;

Try compiling (gcc -fnested-functions). What does the second call to g produce—15 or 17? Try uncommenting line 21. What happens? Does commenting out line 20 affect this? What if line 19 is commented out, but lines 20 and 21 are uncommented?

I’m not sure this feature is worth it.

Locally installing LLVM with Ocaml bindings

We can’t install software into the /usr tree at my office, so I end up having local installs of lots of software. Some things, like GODI, play well with this. I had some trouble finding the right way to get LLVM‘s Ocaml bindings to work, so I figured I’d share the wealth. The following instructions will put an install into the directory $PREFIX/llvm-install.

Here are the steps; they’re followed by a plain English explanation.

svn co llvm
tar xzf llvm-gcc4.2-2.5-x86-linux-RHEL4.tar.gz
mkdir llvm-objects llvm-install
cd llvm-objects
../llvm/configure --with-llvmgccdir=$PREFIX/llvm-gcc4.2-2.5-x86-linux-RHEL4 --enable-optimized --enable-jit --prefix=$PREFIX/llvm-install --with-ocaml-libdir=$GODI_PATH/lib/ocaml/std-lib
make install

My PREFIX is my home directory, and GODI_PATH = ~/godi. First, we checkout the latest LLVM from SVN (step 2). Then we download and extract the latest release (2.5, as of writing) of LLVM-gcc (steps 3 and 4). (I couldn’t get the SVN version of LLVM-gcc to work with the SVN version of LLVM.) Notably, LLVM does not support in-place builds, so we create the llvm-objects directory to actually build LLVM; we’ll install it into llvm-install (step 5). We configure the software from the llvm-objects directory (steps 6 and 7). The long configure is necessary; the only optional item is --enable-jit. You may have to adjust your --with-ocaml-libdir to point to wherever your Ocaml libraries live. Then make and make install (steps 8 and 9). Voila!

To test it out, we can use the “Hello, World!” program written by Gordon Henrikson. I had to change it a little to bring it up to date with the latest APIs (in particular, the global context had to be added). You can download it as

open Printf
open Llvm

let main filename =
   let c = create_context () in

   let i8_t  = i8_type c in
   let i32_t = i32_type c in

   let m = create_module c filename in

   (* @greeting = global [14 x i8] c"Hello, world!\00" *)
   let greeting =
     define_global "greeting" (const_string c "Hello, world!\000") m in

   (* declare i32 @puts(i8* ) *)
   let puts =
     declare_function "puts"
       (function_type i32_t [|pointer_type i8_t|]) m in

   (* define i32 @main() { entry: *)
   let main = define_function "main" (function_type i32_t [| |]) m in
   let at_entry = builder_at_end c (entry_block main) in

   (* %tmp = getelementptr [14 x i8]* @greeting, i32 0, i32 0 *)
   let zero = const_int i32_t 0 in
   let str = build_gep greeting [| zero; zero |] "tmp" at_entry in

   (* call i32 @puts( i8* %tmp ) *)
   ignore (build_call puts [| str |] "" at_entry);

   (* ret void *)
   ignore (build_ret (const_null i32_t) at_entry);

   (* write the module to a file *)
   if not (Llvm_bitwriter.write_bitcode_file m filename) then exit 1;
   dispose_module m

let () = match Sys.argv with
  | [|_; filename|] -> main filename
  | _ -> main "a.out"

Now we can compile:

ocamlopt -cc g++ llvm.cmxa llvm_bitwriter.cmxa -o llvm_test
./llvm_test hello.bc # generates bitcode
$PREFIX/llvm-install/bin/llvm-dis hello.bc # disassembles bitcode into hello.ll
$PREFIX/llvm-install/bin/lli hello.bc # outputs "Hello, world!"

If interpretation via lli isn’t your bag, you can also compile to native code:

$PREFIX/llvm-install/bin/llc hello.bc # generates assembly, hello.s
gcc -o hello hello.s
./hello # outputs "Hello, world!"