CS 181-N Advanced Functional Programming is a project-based elective I’m teaching this semester. The first half of the course teaches the students basic Haskell programming, from purity to FAM and IO, by way of parsing. The second half of the course has the students build a small project using a pure FP (Haskell and Elm, this go round). The projects were kicking off just as we pivoted to video.
I’ve been streaming my own project as a way of showing students different software engineering practices: how to use CI, how to write good tests and generators, how to run benchmarks, how to file bug reports, and so on. It’s been fun.
My ‘project’ is the Regularity, a library for working with regular languages and automata. We get a lot of A-plot Haskell mixing with B-plot theory of computation.
After implementing a few kinds of NFA (and seeing that they’re much more efficient than naive “find all matches” regex matching), I turned to Brzozowski derivatives. Most undergraduates aren’t exposed to the idea of derivatives outside of a calculus class, so I thought it’d be fun to show. Here’s the implementation, from Regex.hs:
data Regex = Empty | Epsilon | Char Char | Seq !Regex !Regex | Alt !Regex !Regex | Star !Regex deriving (Eq, Ord) dMatches :: Regex -> Text -> Bool dMatches re t = nullable (T.foldl (flip deriv) re t) -- re `matches` c:s iff deriv c re `matches` s deriv :: Char -> Regex -> Regex deriv _c Empty = empty deriv _c Epsilon = empty deriv c (Char c') = if c == c' then epsilon else empty deriv c (Seq re1 re2) = alt (seq (deriv c re1) re2) (if nullable re1 then deriv c re2 else empty) deriv c (Alt re1 re2) = alt (deriv c re1) (deriv c re2) deriv c (Star re) = seq (deriv c re) (star re) -- `nullable re` returns true iff re accepts the empty string nullable :: Regex -> Bool nullable Empty = False nullable Epsilon = True nullable (Char _) = False nullable (Seq re1 re2) = nullable re1 && nullable re2 nullable (Alt re1 re2) = nullable re1 || nullable re2 nullable (Star _re) = True
It only took a few minutes to implement the derivatives, and performance was… fine. Thinking it would be nice to show the concept, I did smart constructors next:
empty :: Regex empty = Empty epsilon :: Regex epsilon = Epsilon char :: Char -> Regex char = Char seq :: Regex -> Regex -> Regex seq Epsilon re2 = re2 seq re1 Epsilon = re1 seq Empty _re2 = Empty seq _re1 Empty = Empty seq re1 re2 = Seq re1 re2 alt :: Regex -> Regex -> Regex alt Empty re2 = re2 alt re1 Empty = re1 alt Epsilon re2 = if nullable re2 then re2 else Alt Epsilon re2 alt re1 Epsilon = if nullable re1 then re1 else Alt re1 Epsilon alt re1 re2 = if re1 == re2 then re1 else Alt re1 re2 star :: Regex -> Regex star Empty = Epsilon star Epsilon = Epsilon star (Star re) = Star re star re = Star re
But then something surprised me: with smart constructors, the Brzozowski-based matching algorithm was substantially faster than NFAs, with or without epsilon transitions!
How much faster? Here are the times as reported by criterion.
|Matching (a|a)*||…on a50||…on a100|
|Naive, all-matches||71.4 µs||255 µs|
|NFA with epsilon transitions||31.6 µs||55.0 µs|
|Brzozowski derivative, dumb constructors||61.8 µs||262 µs|
|Brzozowski derivative, smart constructors||1.88µs||3.73µs|
I knew smart constructors were an optimization, but I was blown away! If you didn’t know about smart constructors, you might shrug off Brzozowski derivatives as a curiosity for math dweebs. But with just a tiny bit of optimization from smart constructors, Brzozowski derivatives offer the best performance and the simplest implementation, at least on these meager tests.
It would be interesting to dig into why the dumb constructors are so slow. It doesn’t seem to be laziness, so is it just allocation churn? (We’re not even using hash-consing, which can provide another performance boost.) Is it large, redundant regexes? Something else?
Regular-expression derivatives re-examined by Owens, Reppy, and Turon is closely related and worth a read! Relatedly, Neel Krishnaswami has some lovely theoretically discussion, too.
If you don’t do any normalization, then Brzozowski derivatives can grow exponentially Consider the regular expression a*.a**, and then take its derivative with respect to a
d_a(a*.a**) = d_a(a*).a** + d_a(a**)
= a*.a** + d_a(a*).a**
= a*.a** + a*.a**
This regular expression has doubled in size, and each derivative with respect to a will double it again.
However, Brzozowski proved in his original paper that if you quotient derivatives by “similarity” (basically ACU with respect to +, but only at the outermost level), then there are only a finite number of word derivatives. So a smart constructor implementation smart enough to ensure that (alt r r == r), (alt r1 r2 == alt r2 r1) and (alt (alt r1 r2) r3 == alt r1 (alt r2 r3)) suffices to ensure you only ever build a finite number of states!
I found the proof in his original paper very obscure, though, and didn’t understand it until I read Antimirov’s paper on partial derivatives. That yields an even simpler implementation technique than Brzozowski’s original — you can use it to implement a complete table-driven DFA-based matcher in ~50 lines of code. See
Yes, indeed: I figured this out for myself in the followup post!
Thanks for the link to the Antimirov stuff! One reason I like the Brzozowski implementation more than the Antimirov one is that you can build an implicit-state DFA matcher with just a left fold. Stephanie Weirich did this in a fancily typed way at her POPL 2017 keynote.