The basic idea is pretty clever. You specify a single grammar, showing at the nonterminal level how the XML syntax relates to the non-XML syntax. For example:
person : [Name name] __ "(" [Email email] ")" _ [Id id] =
[Name name] > [Email email] > >
To guarantee a unique unparsing, they perform ambiguity checks on the grammar. The perform can perform an additional check against an XML Schema (or DTD, or whatever) to ensure that all syntactic non-XML generates validating XML. They do this by extracting an “XML graph” — a flowchart for the tree structure — from the XSugar specification, which can then be checked against an XML Schema. This check boils down to “encoding content models and schema definitions as finite string automata over a common vocabulary and checking inclusion of their languages” (see XML Graphs in Program Analysis, which seems a little light on detail).
The work is good — it solves a problem, and simply so. (Much more simply than in the Kawanaka and Hosoya paper, biXid: a bidirectional transformation language for XML. They have slightly looser ambiguity requirements, but with a weird effect.) The error messages they give are okay, but they don’t seem so marvelous that they merit inclusion in the paper. Writing wise, the description of the XML graph could be clearer, and their error messages aren’t entirely clear to me. But it’s hard to criticize their result — 2372 lines of ad hoc Python to 119 lines of XSugar!
I was disappointed by only one thing. I wondered whether I might use XSugar for automatic conversion between SXML and XML; unfortunately, what I came up with was:
WS = \r|\n|\t|" " AT = "@" TOP = "*TOP*" LP = "(" RP = ")" node : LP [Element e] LP AT [attriblist as] RP [nodelist ns] RP = <[Element e] [attriblist as]>[nodelist ns]> ; attrib : LP [Attribute a] [Text v] RP =
But then I was stuck — I realized that one can’t abstract over attributes. It doesn’t seem like this sort of abstraction is impossible, but the feature seems to be missing. It would require another style of syntax to make it clear which context nonterminals were in — attribute or node. Or I may just be missing something — the journal paper is the only documentation, short of JavaDoc.