How to cook a KAT for your pet theory

Kleene algebra with tests is a beautiful, powerful framework for reasoning about programs. You can easily encode conventional While programs into KAT, and KAT enjoys decidable equality. Reasoning with KAT feels like you’re cheating Alan Turing himself: here we are, deciding nontrivial properties of programs!

The gist of KAT is that you write programs using a regular-expression-like notation: + for parallel composition, ; for sequential composition, and * for iteration. So you might encode:

while x > 0:
  y += 1
  x -= 1

As (xGt0; incY; decX)*; ¬xGt0, where xGt0 is a ‘test’ and incY and decX are ‘actions’. KAT’s equivalence decision procedure can prove that this program is equivalent to any finite unrolling of itself… neat!
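To see the unrolling equivalence concretely, here's a small sketch (not KAT's actual decision procedure) that treats tests and actions as uninterpreted letters, tracing-style, and compares the languages two KAT terms denote up to a length bound. For uninterpreted symbols, the loop and its one-step unrolling denote the same regular language:

```python
# Illustrative sketch: KAT terms as regular expressions over uninterpreted
# letters. Terms: ('sym', c) | ('seq', a, b) | ('alt', a, b) | ('star', a).
def lang(term, n):
    """Set of strings of length <= n denoted by a term."""
    kind = term[0]
    if kind == 'sym':
        return {term[1]} if n >= 1 else set()
    if kind == 'alt':
        return lang(term[1], n) | lang(term[2], n)
    if kind == 'seq':
        return {u + v for u in lang(term[1], n)
                      for v in lang(term[2], n) if len(u + v) <= n}
    if kind == 'star':  # iterate until no new strings fit under the bound
        out, frontier = {''}, {''}
        while frontier:
            frontier = {u + v for u in frontier
                              for v in lang(term[1], n)
                              if v and len(u + v) <= n} - out
            out |= frontier
        return out

def seq(*ts):
    out = ts[0]
    for t in ts[1:]:
        out = ('seq', out, t)
    return out

# t = xGt0, i = incY, d = decX, n = the test ¬xGt0
t, i, d, nt = ('sym', 't'), ('sym', 'i'), ('sym', 'd'), ('sym', 'n')
loop = seq(('star', seq(t, i, d)), nt)       # (xGt0; incY; decX)*; ¬xGt0
unrolled = ('alt', nt, seq(t, i, d, loop))   # one unrolling of the loop
assert lang(loop, 12) == lang(unrolled, 12)
```

Bounded language comparison is only a sanity check, of course; KAT's real procedure decides full equivalence.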

NetKAT is the most impactful application of KAT: it’s an influential and successful academic project, and its ideas can already be found in numerous real, production systems. In light of NetKAT’s remarkable success… why don’t we apply KAT more often?

What’s hard about KAT?

On its own, KAT proves plenty of nice theorems, but none of them reason about particular program behaviors. In the code snippet above, xGt0, incY, and decX are uninterpreted—there’s no relationship between, say, xGt0 and decX. You might expect ¬xGt0;decX;¬xGt0 to be equivalent to ¬xGt0;decX, because decrementing a number less than or equal to 0 yields a number that is also less than or equal to 0. The names of our tests and actions are suggestive, but KAT treats them abstractly. If you want to reason about the semantics of your tests and actions, you need to build a custom, concrete KAT. NetKAT reasons about fields on packets, and doing so means building a particular, concrete KAT with particular actions. The original paper spends quite a bit of effort proving that this new, custom KAT admits sound, complete, and decidable equivalence checking.
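A concrete interpretation makes the redundancy visible. Here's a minimal sketch (illustrative encodings, not any real KAT library) that gives xGt0 and decX meanings over integer states and compares the two programs' input/output behavior; KAT alone, with the symbols uninterpreted, cannot prove this equivalence:

```python
# Illustrative semantics: a test passes the state through or fails (empty
# result); an action transforms the state.
def not_xGt0(x):           # the test ¬xGt0: succeed iff x <= 0
    return [x] if x <= 0 else []

def decX(x):               # the action decX: decrement x
    return [x - 1]

def run(program, x):       # sequential composition of a list of steps
    states = [x]
    for step in program:
        states = [x2 for x1 in states for x2 in step(x1)]
    return states

p1 = [not_xGt0, decX, not_xGt0]   # ¬xGt0; decX; ¬xGt0
p2 = [not_xGt0, decX]             # ¬xGt0; decX

# Under this interpretation the trailing test is redundant.
assert all(run(p1, x) == run(p2, x) for x in range(-5, 6))
```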

Worse still, KAT’s metatheory is very challenging. To create NetKAT, Nate Foster and company worked through closely related ideas for a few years before Nate joined Cornell and started working with Dexter Kozen, KAT’s progenitor. Only then did they realize that KAT would be a good fit, and they got to work on developing a concrete KAT—NetKAT. Unfortunately, “collaborate with Dexter” is an approach that doesn’t scale.

How to cook a KAT

In an upcoming PLDI 2022 paper, “Kleene Algebra Modulo Theories: A Framework for Concrete KATs”, Ryan Beckett, Eric Campbell, and I show how to generate a KAT over a given theory, i.e., a set of tests, actions, and their equational theory. We call the approach Kleene algebra modulo theories, or KMT. The paper covers quite a few examples:

  • booleans and bit vectors
  • monotonic natural numbers
  • unbounded sets and maps
  • NetKAT

What’s more, our approach allows for higher-order theories, like taking the product of two theories or using finite-time LTL to reason about another theory. (Our approach abstracts and generalizes Temporal NetKAT, which is just a concrete instance of our more general method.)

To build a KMT, you provide primitive tests and actions, along with weakest preconditions relating each pair of test and action. There’s an ordering requirement: a test must be no smaller than its preconditions. With these in hand, we’re able to automatically derive a KAT with good properties in a pay-as-you-go fashion:

  • If your theory is sound, the KAT is sound.
  • If your theory is complete, the KAT is complete.
  • If your theory’s satisfiability checking is decidable, we can derive a decision procedure for equivalence.
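To make the weakest-precondition interface concrete, here's a tiny Python sketch (illustrative names only—KMT itself is an OCaml library of functors) for the monotonic-naturals theory: pushing the primitive test x > c back through the action x := x + 1 yields the test x > c − 1, which satisfies the ordering requirement because c − 1 < c.

```python
# Hypothetical encoding of one wp entry for the monotonic-naturals theory.
def wp_inc(c):
    """Weakest precondition of the test (x > c) across the action x := x + 1:
    the test (x > c - 1), represented by its constant."""
    return c - 1

def inc(x):                # the action x := x + 1 over concrete states
    return x + 1

# Sanity check over concrete states: running inc and then testing x > c
# agrees with testing x > wp_inc(c) first and then running inc.
for c in range(0, 5):
    for x in range(-3, 8):
        assert (inc(x) > c) == (x > wp_inc(c))
```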

I’m particularly excited that our framework is prototype-ready: our code is implemented as an OCaml library, where you define theories as functors. Please try it out—mess around and write your own theories, following our examples. We hope that KMT will significantly lower the bar for entry, making it easier for more people to play around with KAT’s powerful equivalence checking.

What’s the catch?

There’s more than one way to cook a KAT. KMT generates KATs with tracing semantics, i.e., the exact trace of actions matters. In KAT+B! or NetKAT, later updates override earlier ones, e.g., x:=false; x:=true is equivalent to x:=true… but KMT will treat these terms differently, because they have different traces. KAT+B! deliberately avoids tracing; NetKAT only traces at predefined points, by means of their dup primitive, which marks the current state as historically salient. There’s no deep reason for KMT to use tracing, and we believe KMT can be generalized to support dup-like controls for tracing.
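The distinction is easy to see concretely. This sketch (an illustrative encoding, not KMT's actual semantics) runs two programs both as final-state transformers and as traces of actions: they agree on the final state but not on the trace.

```python
# Assignments as (variable, value) pairs; run returns both the final state
# and the trace of actions performed.
def run(program, state):
    trace = []
    for var, val in program:
        state = {**state, var: val}
        trace.append((var, val))
    return state, trace

p1 = [('x', False), ('x', True)]   # x := false; x := true
p2 = [('x', True)]                 # x := true

s1, t1 = run(p1, {'x': False})
s2, t2 = run(p2, {'x': False})
assert s1 == s2    # same final state: KAT+B!/NetKAT-style equivalence
assert t1 != t2    # different traces: KMT distinguishes these terms
```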

The ordering constraint on weakest preconditions is a strong one. Our natural numbers, sets, and maps must be monotonic: they may grow or shrink, but not both. Nor can variables be compared to each other: two natural-valued variables x and y can each be compared to constants, but not to one another.

KMT is also just a prototype. It’s fast for small programs, but it takes dedicated work to make a KAT’s decision procedure efficient enough on more serious examples.

Why are you talking about cooking KATs?

The greatest POPL paper of all time is Manna and Pnueli 1983, “How to cook a temporal proof system for your pet language”. Why? Just take a look at the first page:

The header of the paper offsets the author names to the right. A line drawing dominates the top: a dog wags its tail, tongue dripping eagerly in front of a kabob marked with "ADA" and "shared variable" and "CSP".
I rest my case.

Update

KMT won a distinguished paper award at PLDI!

A Balance of Power: Expressive, Analyzable Controller Programming

I just finished reading A Balance of Power: Expressive, Analyzable Controller Programming. It’s an interesting proposal, but I’m writing just to express my satisfaction with the following sentence:

When we hit expressive limits, however, our goal is not to keep growing this language—down that path lies sendmail.cf and other sulphurous designs—but to call out to full-language code.

‘Sulphurous’ indeed. Come for the nonmonotonic interpretation of learning, stay for the colorful prose.

Program synthesis talk

On June 8th I gave a talk for Orna Grumberg’s seminar in formal verification presenting Kupferman and Vardi’s 1997 paper Synthesis with Incomplete Information. I’ve posted the slides.

The paper is interesting. On the one hand, it’s fairly light in terms of original contribution — it’s an extension of Pnueli and Rosner’s 1989 paper On the Synthesis of a Reactive Module. On the other hand, the journal format makes the K&V paper much clearer than P&R. I know this chiefly because I was supposed to present the latter, but got bogged down and switched.

The paper contributes two things. First, they extend P&R’s 2-EXPTIME algorithm for LTL synthesis to a similarly complex algorithm for CTL*. Second, they add the concept of “incomplete information”. During synthesis with incomplete information, input signals (propositions in the specification ψ) are split into two sets: I, “input” — the known or “readable” signals, and E, “environment” — the unknown or “unreadable” signals. They aim to generate a program that is correct in terms of ψ, even though the program sees signals only in I and not in E.

For example, ψ = G(((m ∧ ¬b) ⇒ Xr) ∧ ((¬m ∨ b) ⇒ X¬r)), where m represents a message, r a response, and b whether or not a message is “bogus”. The system should respond only to nonbogus messages. Given m, b ∈ I and r ∈ O (“output”), the specification is realizable. But if b ∈ E, ψ is unrealizable.
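A brute-force finite check makes the unrealizability claim vivid. In this sketch (my own encoding, collapsing ψ to its one-step requirement r = m ∧ ¬b), a strategy that reads both m and b exists, but no strategy reading only m works: when m holds, the required response depends on the unreadable b.

```python
from itertools import product

inputs = list(product([0, 1], [0, 1]))             # all (m, b) valuations
spec = lambda m, b, r: r == (m and not b)          # required next-step response

# Strategies reading (m, b): functions {0,1}^2 -> {0,1}, encoded as 4-tuples
# indexed by 2*m + b. At least one satisfies the spec, so with b in I the
# specification is realizable.
full = [f for f in product([0, 1], repeat=4)
        if all(spec(m, b, f[2 * m + b]) for m, b in inputs)]
assert full

# Strategies reading only m: functions {0,1} -> {0,1}, encoded as 2-tuples.
# None satisfies the spec for every b, so with b in E it is unrealizable.
blind = [f for f in product([0, 1], repeat=2)
         if all(spec(m, b, f[m]) for m, b in inputs)]
assert not blind
```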

Someone asked a very astute question during the talk: when is a specification with E ≠ ∅ realizable? Only when all propositions in E form a tautology. So why is E useful/meaningful at all? I came up with two cases: iterative checks to determine what signals are necessary; and the synthesis of systems of modules with some shared signals.

No matter how interesting the paper is, my presentation was terrible. Too formal, bad examples. It didn’t help that I was speaking English to a Hebrew-speaking audience, but even Orna mentioned that my examples were bad. The problems with my examples include:

  1. Hand-drawn on the board. This meant that I spoke into the board half the time.
  2. Minimal. I drew the smallest trees possible, sticking to the binary domain and the self-product of the binary domain. This of course ties into problem number 1.
  3. Unmotivated. The tree examples all came before any mention of the need for them. The paper presents the tree preliminaries first, but perhaps a “woven” presentation would be better.
  4. Too formal. After about five minutes of explaining hide, someone said “Wait, so it’s just a projection?” I was stuck in the notation, not giving a clear picture.

We’ll see if I can improve, perhaps giving the talk again back at Brown. Then again, the audience at Brown is much less versed in verification in general — increasing the need for good, clear examples. I’ll certainly drop the emptiness check material in the slides (slides 32 to 39), as I ended up skipping it here in Israel.

My mediocre presentation skills aside, the paper is both very good and very frustrating. It presents difficult material clearly, making the technique completely understandable. But every step of the work — automata generation, processing, emptiness checks — is theoretical. So far as I can tell, there isn’t a single implementation for a single step of their algorithm — after nearly a decade. I couldn’t even find a tool to translate from CTL* to μ-calculus! Without the CTL construction in section 4.4, it would have been much harder to present a nontrivial example.

But new in the synthesis world is a 2006 paper by Kupferman, Vardi, and Nir Piterman, Safraless Compositional Synthesis. It proposes an EXPTIME algorithm for LTL synthesis (as opposed to P&R’s 2-EXPTIME algorithm); even better, the algorithm is incremental with regard to conjunction! Better still, they claim that it’s amenable to symbolic implementation, since it doesn’t use Safra’s determinization technique (from the 1988 paper On the Complexity of ω-Automata). I still need to give the 2006 paper a full reading; real implementability would be very exciting. After I finish it and a few others, I’ll write up and post a bibliographic review.