JavaScript “Protection”

The NeoSmart files has a brief commentary on the feasability of encoding schemes like PHPEnkoder.

On one side, his argument is pretty strong. Any spammer could use Greasemonkey to drive harvesting — complete DOM, complete JavaScript. But there are two points I disagree with him about.

First, he mentions that

JavaScript was never meant to be used as a heavy cavalry, a knight in shining armor, or else a bit of code that can may be used to do anything – because it’s not.

On the one hand, this has historical precedence. For example, much of the damage done by COBOL was due to its abuse. It was a small scripting language that exploded. But it’s also a bit senslesss. COBOL was crippled on day one. Syntactically and semantically, I mean. JavaScript, with its anonymous functions and prototype object system, remains state-of-the-art today. Just because JavaScript was a glue-script language when it was born doesn’t mean it can’t be useful now as a general-purpose language. Or should ML only be used to write theorem provers? Should Smalltalk only be used to teach children? JavaScript is not Tcl, and it’s not Perl. JavaScript has some great features, a light syntax, and a huge user base. While it’s true that browser incompatabilities are a problem, toolkits seem to have dealt with this well.

But I’m not going to argue languages with someone who prefers VBScript to JavaScript. The important thing is that I think he missed an important capability of the Hivelogic algorithm, something that makes it much more powerful than it seems.

PHPEnkoder is just a port of a very creative piece of software, the Hivelogic Enkoder. The original Enkoder takes a string and encodes it by, say, swapping every other letter. It then tacks on some JavaScript to swap them back. This is already a pretty strong system. But then it goes on and encodes that JavaScript, building up a tower of encoded scripts. An evaluation loop calls eval, iteratively decoding until the bottom document.write is reached and the text is displayed.

What’s so special about that? Well, we can build a tower as high as we like. We can make it arbitrarily computationally intensive to decode the e-mail. Half a second? Easy, give it forty or fifty encodings. Five seconds? Sure. These “computational micropayments” can be worthwhile for a user to pay, but a spammer? Decode one, sure. Decode fifty? That’s nearly five minutes to get fifty e-mail addresses. How many of those people really need a bigger penis?

I don’t much like that future, though. Even if it’s a link a user can click to wait a minute for the e-mail address, that’s not ideal. NeoSmart is right, much of the problem can and should be solved server side. The only client side solution that will ever work will require human language: posting e-mail addresses as puns, jokes, tricks, songs. The only way to escape our symbol-processing machines is to abuse symbols: I’m Mike, and I hang out with hatted weasels! My plugin, PHPEnkoder, also spends a lot of time at the weasels’ place.

But I haven’t seemed to need either of those solutions, as I still haven’t received any spam at the addresses I’ve posted here — encoded, of course.

The Joy of Specs

My other class this term is Shmuel Katz‘s Formal Specifications of Complex Systems (link in Hebrew). Shmuel was my advisor here at the Technion, but I would have still taken the course if he wasn’t.

The course was very interesting, and I’ve come to a conclusion about the expressivity and usefulness of specification languages.

The course itself is a basic overview of specification techniques. We started with a simplified Hoare logic, using it to “specify” programs. The idea being that complete Hoare specification can be used to infer code. This is of course insanely intractable — one could write the code and the Hoare invariants in the time it would take to just write the invariants. This material was also covered in Orna’s Introduction to Software Verification.

Next we approached a more useful system, Zed, a transition-based algebraic specification language. Zed’s mathematicality was fun, but Zed is also pretty useless — it fully contains higher-order logics, so the full language is undecidable.

After Zed, we came to probably the most fun section of the course: statecharts. Statecharts are hierarchical state machines with variables and a choice of composition operators (with and without history, sequential or parallel). It was clear from the beginning that statecharts were invented for a real-world use (in-flight controls, I believe). First, they are decidable when variables have finite domains. Second, this decidability comes from a compeltely intuitive translation into a finite-state machine. Third, they are visually intuitive and appealing. The only downside was the tool we were using (Rhapsody, an IBM tool), which had odd restrictions and an even odder interpretation system — instrumented compilation into C++. I’ve had brief fantasies of writing a nice Scheme or Python environment for direct interpretation of statecharts, but I have more fantasies than time.

Following the foray into imperative specifications, we drifted towards reactive systems, first by looking at Lamport, a system of temporal requirements on “operations” (essentially transitions). From there we began a multi-week foray into temporal logic. (Needless to say, these four or so weeks were dutifully skipped.)

Only a few weeks ago we emerged from the temporal forest into the process specification language Lotos. It’s essentially π-calculus with output as synchronization: when two processes are synchronized on a given channel and one wants to output, the other must be either (a) willing to receive, or (b) outputting the exact same value.

After Lotos, we looked at Larch, a slimmed-down algebraic specification system. It’s often decidable, unlike Zed, and deals with axiomatic theories as opposed to transitions. (Zed has syntax for axiomatic reasoning, too, but it is not the default mode of expression.) Larch is nice, particularly since it offers a very intuitive, functional way of reasoning — the axioms tend to make great use of lazy evaluation.

So, out of all of the languages I was exposed to, I liked three: statecharts, Lotos, and Larch. What makes these three special? Each of them implies an execution model. These models are sequential-reactive, parallel-reactive, and functional.

Statecharts are “sequentially reactive” in that the focus is on the sequential steps taken in a reactive program. It is the lowest level of all of the languages, but is still quite expressive due to its hierarchical nature. It’s also the easiest to prove things about, since statecharts are completely decidable for variables in finite domains. The only barrier is the size of the generated model.

Lotos is “reactive in parallel”, as it focuses on process synchronization and messages sent. This is an interesting idiom, and the one I have the least experience in. I had a few problems with it, as the synchronization mechanics were frequently abused in unrealistic ways. For example, one of our sample models was a system of two elevators and a controller. Both of the elevators shared a floor counter, which worked in a very strange way (continuously counting up and down, trying to synchronize with elevators at each floor). This happened to be easily and naturally expressed in Lotos, but no working implementation could ever be reified from the specification.

Lastly, Larch is “functional” — we reasoned about various evaluation axioms over a set of tags. We essentially give a reduction semantics for these tags, indicating that some construct and others destruct, while still others observe structure. All of the Larch specifications I’ve seen so far could be translated easily into Haskell; some of them would even be fully operational.

What makes these three so good? They’re intuitive, they’re like programming languages. Statecharts resemble the ALGOL family, Lotos resembles Erlang and othe process languages, and Larch is like FP, of course. Writing specifications in these languages is like higher-level programming, a few steps beyond pseudocode. An FP programmer working in Larch will be able to write specifications in a way that discovers bugs and elucidates requirements quickly — more quickly than if writing in an abstract logic like Lamport or Zed. It’s not just playing with symbols and logic — what I felt with Zed — the end product is computationally intuitive.

I’m reminded of Nancy Leveson’s keynote at FSE-12. She talked about how engineers are just confused and irritated by formal logics. Instead, she gave them a “case chart”: each row represents a boolean state (e.g., “engines are on”, “wings are adjusting”, etc.); each column was a set of checkboxes. For a given case chart, engineers would check off the valid sets of states, each column representating a different sub-case of the given case. This, she pointed out, was just DNF. But if you call it DNF they’ll scream.

The checkbox chart was in the engineers’ domain, but logics weren’t. Statecharts, Lotos, and Larch are so good because they each rest solidly in the implementation domain: programming languages.