The basic idea is pretty clever. You specify a single grammar, showing at the nonterminal level how the XML syntax relates to the non-XML syntax. For example:
person : [Name name] __ "(" [Email email] ")" _ [Id id] =
[Name name] > [Email email] > >
To guarantee a unique unparsing, they perform ambiguity checks on the grammar. The perform can perform an additional check against an XML Schema (or DTD, or whatever) to ensure that all syntactic non-XML generates validating XML. They do this by extracting an “XML graph” — a flowchart for the tree structure — from the XSugar specification, which can then be checked against an XML Schema. This check boils down to “encoding content models and schema definitions as finite string automata over a common vocabulary and checking inclusion of their languages” (see XML Graphs in Program Analysis, which seems a little light on detail).
The work is good — it solves a problem, and simply so. (Much more simply than in the Kawanaka and Hosoya paper, biXid: a bidirectional transformation language for XML. They have slightly looser ambiguity requirements, but with a weird effect.) The error messages they give are okay, but they don’t seem so marvelous that they merit inclusion in the paper. Writing wise, the description of the XML graph could be clearer, and their error messages aren’t entirely clear to me. But it’s hard to criticize their result — 2372 lines of ad hoc Python to 119 lines of XSugar!
I was disappointed by only one thing. I wondered whether I might use XSugar for automatic conversion between SXML and XML; unfortunately, what I came up with was:
WS = \r|\n|\t|" " AT = "@" TOP = "*TOP*" LP = "(" RP = ")" node : LP [Element e] LP AT [attriblist as] RP [nodelist ns] RP = <[Element e] [attriblist as]>[nodelist ns]> ; attrib : LP [Attribute a] [Text v] RP =
But then I was stuck — I realized that one can’t abstract over attributes. It doesn’t seem like this sort of abstraction is impossible, but the feature seems to be missing. It would require another style of syntax to make it clear which context nonterminals were in — attribute or node. Or I may just be missing something — the journal paper is the only documentation, short of JavaDoc.
As an alternative to the Enkoder, Google Groups turns e-mail addresses in headers and in document text into links to CAPTCHAs, viz. the documentation. They’re not using the hardest distortions I’ve seen, but there seems to be some good contact between the letters.
The first is just an engineering issue. First I’d need to collect data on how long dekoding takes on different platforms, and to try to maximize my control while preventing the explosion of the enkoded data in size. (Modular multiplication? I knew MA158 Cryptography would be good for something!) The second direction needs some serious thought about puzzle design. It’s important to be acultural and not too intellectually challenging, but still hard for a machine to figure out. Secondly, it’s not clear how to prevent replay attacks if the Turing test is all client-side, so it has to be difficult to brute-force.
The response to the template language in Flapjax has been mixed, to say the least. The most common complaint is that they mix content with style. True — this can be done with our templates. But nothing stops a developer from putting CSS in the HTML directly. Nothing except good sense, that is.
In response to a recent post on our discussion group for Flapjax, I wrote about this briefly:
This can become a huge mess, so the templates simplify it. We wouldn't want to encourage doing all of your computation in the display layer! I'd compare it to Smarty or JSP -- while you can include complex computations in the tags, it's meant only to ease the interface between models in code and presentations on screen.
I think that soundly characterizes the template language. Shriram wants that in the tutorial, so that'll show up there eventually. (Some people take classes during the semester, but the Flapjax team has much more important things to do. Or so we're told.) But it would be a big mistake to see Flapjax as providing just that -- the functional reactivity is the interesting bit.
Tonight we're presenting the language to the undergraduates in the DUG; there's an exciting contest announcement coming up, as well. Who has time for midterms?
Since I got back from Israel, I’ve been working on a top secret project: a programming language for the Web. Well, for the Web 4.0 — we gave 3.0 a miss. The language is Flapjax. As you’ll note on the homepage linked above and on the Flapjax development blog, it’s multifaceted. I’ll mention the salient major features here.
The main feature is functional reactivity, found in flapjax.js. Functional reactive programming (FRP) has been around for a while in the Haskell community. The PhD lead on our project, Greg Cooper, wrote FrTime (read Father Time), which embeds FRP in Scheme.
To learn more about FRP, you might as well walk through our tutorial. It’s a callbackless world in which values vary over time and whole expressions are appropriately re-evaluated. For example, the text in a textbox can be computed with over time — no need to register callbacks for onfocus, onblur, onthis, or onthat.
In essence, FRP is a monad. But in practice, this means that FRP is a driver/callback manipulator and CPS-ed code. In FrTime, CPS-ing isn’t done directly, but instead all of the language’s primitives (+, cons, etc.) are lifted. In Flapjax, either the programmer does it manually or the compiler (my work) translates the code to CPS. There are arguments in either direction — the compiler’s aggressiveness can make it hard to use.
If AJAX without the callback mess wasn’t enough, we also wrote a generalized object store. It’s accessed via AJAX (really AJAJ, since we use JSON extensively), and was built to allow quick and easy sharing. We don’t have a demo as cool as Yaggle yet, but it’s certainly in the works.
So that’s it. Future blogging topics are: the templating syntax, compiler internals, client API internals, basic howtos. The whole project was immensely fun. Shriram got me on board by asking me what would happen if PL people actually got together and wrote something for real — that is, fully implemented an idea and sent it out at the world in a language the world can use. We both chuckled for a moment, thinking how funny it would be to actually apply PL. And then he pointed out that there’s nothing funny about that at all.
Impelled by boredom and procrastination, I’ve packaged up a propositional logic minimizer I wrote last Summer. It’s a cute little thing, written in Python. I used it for optimizing queries on a bioinformatics graph database, but it’s a generally handy thing.
The NeoSmart files has a brief commentary on the feasability of encoding schemes like PHPEnkoder.
First, he mentions that
What’s so special about that? Well, we can build a tower as high as we like. We can make it arbitrarily computationally intensive to decode the e-mail. Half a second? Easy, give it forty or fifty encodings. Five seconds? Sure. These “computational micropayments” can be worthwhile for a user to pay, but a spammer? Decode one, sure. Decode fifty? That’s nearly five minutes to get fifty e-mail addresses. How many of those people really need a bigger penis?
I don’t much like that future, though. Even if it’s a link a user can click to wait a minute for the e-mail address, that’s not ideal. NeoSmart is right, much of the problem can and should be solved server side. The only client side solution that will ever work will require human language: posting e-mail addresses as puns, jokes, tricks, songs. The only way to escape our symbol-processing machines is to abuse symbols: I’m Mike, and I hang out with hatted weasels! My plugin, PHPEnkoder, also spends a lot of time at the weasels’ place.
But I haven’t seemed to need either of those solutions, as I still haven’t received any spam at the addresses I’ve posted here — encoded, of course.
Get it while it’s hot!