The Big Brzozowski

When I mentioned how important smart constructors are for Brzozowski derivatives on Twitter, when Prabhakar Ragde raised an interesting question:


The question is subtle: it’s not about how many possible derivatives there, but rather a question about size in the worst case. If I start parsing a string s with a regular expression e, how big might the nth derivative of e be?

I livecoded an empirical answer. TL;DR: the worst-case derivatives are exponentially large!

I hacked up a driver that would (a) enumerate all regular expressions of a certain size and (b) calculate their 1st through nth derivatives, while (c) logging the largest size seen at each step. (You can get the raw CSV from the repo.)

Data in hand, I did some simple analysis in R. First, I plotted the data on a logarithmic graph with linear fit: et voilà , it’s exponential.

a clear linear fit on a logarithmic graph

With that graph, I granted myself permission to fit an exponential curve:

nice fits on a conventional graph, though the smart constructor algorithm is flatter compared to the sharply spiking plain one; R^2 for the fit is in the high .90s for both

So: as you take Brzozowski derivatives, your regular expression will—in the worst case—grow exponentially. That sucks. Unsurprisingly smart constructors don’t give asymptotic savings, even though they save you a huge amount of overhead.

What do you mean, ‘big’?

In my empirical analysis, I used one measure of size: the number of nodes, counting empty regexes as being of size 0. But there are others! In particular, there’s a height measure on regular expressions, too.

In the worst case, the size of a regular expression is exponential in its height. Can I characterize how the height of a regular expression grows with the Brzozowski derivative?

I built a small model in Coq and proved that:

  • The height of the derivative of a regular expression E is no greater than twice the height of E itself (deriv_height). I doubt that this is a tight upper bound.
  • There is no constant k that bounds the size increase of the derivative (deriv_height_constant_general). The general thrust of the proof is as follows. Consider e = seq (alt epsilon e1) empty, where e1 is a regex with maximal derivative height, i.e., h (d c e1) = k + h e1 for some c. Note that alt epsilon e1 is nullable and also has maximal derivative height. We have that h (d c e) = 2 + h (d c e1') by definition; by assumption, h (d c e) <= k + h e = k + 1 + h e1'. By assumption again, h (d c e1') = k + h e1'. But then how can it be that 2 + k + h e1' <= k + 1 + h e1'?

With these two proofs combined, we can be confident that the exponential growth here is real, and not an artifact of small models. It would be interesting to determine what the multiplier actually is.

In these proofs, I set s empty = h empty = 0. It felt right at the time, but maybe it’s a little weird. So I redid the proofs setting them to 1. The dude abides.

Where next?

I have several remaining questions. Can we characterize which terms have large derivatives? Do they have equivalent forms with smaller derivatives? I suspect right-associating a seq will be better than left-associating it. Are there rewrites that ensure non-nullability (or minimize nullability) of the left-hand sides of seqs?

Smart constructors are smarter than you think

CS 181-N Advanced Functional Programming is a project-based elective I’m teaching this semester. The first half of the course teaches the students basic Haskell programming, from purity to FAM and IO, by way of parsing. The second half of the course has the students build a small project using a pure FP (Haskell and Elm, this go round). The projects were kicking off just as we pivoted to video.

I’ve been streaming my own project as a way of showing students different software engineering practices: how to use CI, how to write good tests and generators, how to run benchmarks, how to file bug reports, and so on. It’s been fun.

My ‘project’ is the Regularity, a library for working with regular languages and automata. We get a lot of A-plot Haskell mixing with B-plot theory of computation.

After implementing a few kinds of NFA (and seeing that they’re much more efficient than naive “find all matches” regex matching), I turned to Brzozowski derivatives. Most undergraduates aren’t exposed to the idea of derivatives outside of a calculus class, so I thought it’d be fun to show. Here’s the implementation, from Regex.hs:

data Regex =
    Empty
  | Epsilon
  | Char Char
  | Seq !Regex !Regex
  | Alt !Regex !Regex
  | Star !Regex
  deriving (Eq, Ord)

dMatches :: Regex -> Text -> Bool
dMatches re t = nullable (T.foldl (flip deriv) re t)

-- re `matches` c:s iff deriv c re `matches` s
deriv :: Char -> Regex -> Regex
deriv _c Empty         = empty
deriv _c Epsilon       = empty
deriv  c (Char c')     = if c == c' then epsilon else empty
deriv  c (Seq re1 re2) = 
  alt (seq (deriv c re1) re2) (if nullable re1 then deriv c re2 else empty)
deriv  c (Alt re1 re2) = alt (deriv c re1) (deriv c re2)
deriv  c (Star re)     = seq (deriv c re) (star re)

-- `nullable re` returns true iff re accepts the empty string
nullable :: Regex -> Bool
nullable Empty         = False
nullable Epsilon       = True
nullable (Char _)      = False
nullable (Seq re1 re2) = nullable re1 && nullable re2
nullable (Alt re1 re2) = nullable re1 || nullable re2
nullable (Star _re)    = True

It only took a few minutes to implement the derivatives, and performance was… fine. Thinking it would be nice to show the concept, I did smart constructors next:

empty :: Regex
empty = Empty

epsilon :: Regex
epsilon = Epsilon

char :: Char -> Regex
char = Char

seq :: Regex -> Regex -> Regex
seq Epsilon re2     = re2
seq re1     Epsilon = re1
seq Empty   _re2    = Empty
seq _re1    Empty   = Empty
seq re1     re2     = Seq re1 re2

alt :: Regex -> Regex -> Regex
alt Empty   re2     = re2
alt re1     Empty   = re1
alt Epsilon re2     = if nullable re2 then re2 else Alt Epsilon re2
alt re1     Epsilon = if nullable re1 then re1 else Alt re1 Epsilon
alt re1     re2     = if re1 == re2 then re1 else Alt re1 re2

star :: Regex -> Regex
star Empty     = Epsilon
star Epsilon   = Epsilon
star (Star re) = Star re
star re        = Star re

But then something surprised me: with smart constructors, the Brzozowski-based matching algorithm was substantially faster than NFAs, with or without epsilon transitions!

How much faster? Here are the times as reported by criterion.

Matching (a|a)* …on a50 …on a100
Naive, all-matches 71.4 µs 255   µs
NFA with epsilon transitions 31.6 µs 55.0 µs
NFA 4.88µs 9.29µs
Brzozowski derivative, dumb constructors 61.8 µs 262   µs
Brzozowski derivative, smart constructors 1.88µs 3.73µs

I knew smart constructors were an optimization, but I was blown away! If you didn’t know about smart constructors, you might shrug off Brzozowski derivatives as a curiosity for math dweebs. But with just a tiny bit of optimization from smart constructors, Brzozowski derivatives offer the best performance and the simplest implementation, at least on these meager tests.

It would be interesting to dig into why the dumb constructors are so slow. It doesn’t seem to be laziness, so is it just allocation churn? (We’re not even using hash-consing, which can provide another performance boost.) Is it large, redundant regexes? Something else?

Regular-expression derivatives re-examined by Owens, Reppy, and Turon is closely related and worth a read! Relatedly, Neel Krishnaswami has some lovely theoretically discussion, too.