I’m currently working on a chapter of my master’s thesis, which basically fleshes out and elaborates on the paper I wrote for the SVG Open 2010 conference. The goal is pretty much the same:
- write an optimizing Statechart-to-ECMAScript compiler
- describe optimization goals: execution speed, compiled code size and memory usage
- describe the set of optimizations I intend to perform
- write a comprehensive suite of benchmarks
- run benchmarks against as wide a set of ECMAScript implementations as possible
- analyze the results
Unfortunately, SCXML-JS also had a number of shortcomings, so I’ve decided to start over and develop a new Statechart interpreter/compiler. This new interpreter would have the following improvements over SCXML-JS.
Separate out Core Interpreter Runtime
SCXML-JS generated a large chunk of boilerplate code that varied only in very small ways between optimizations. The other Statechart compiler I have worked on, SCC, worked the same way. One of the repercussions of this design choice is that it made it difficult to meaningfully compare size of the code payload, as the points of variation would be intermixed with the boilerplate.
The goal of the new project, then, is to move that boilerplate out into its own module, which will effectively function as a standalone interpreter for Statecharts. Certain methods (e.g. selecting transitions, updating the current configuration) would then be parameterized, so that optimized methods that use fast data structures that have been compiled ahead-of-time (e.g. a state-transition table) could then be injected at runtime when the interpreter class is instantiated. This would allow the methods we would like to optimize to be neatly separated out from the core interpreter runtime, thus allowing these methods to be directly compared for performance and payload size.
I think that moving the generated boilerplate code out into a class or set of classes also aids hackability, as this is more object-oriented approach, and overall is easier to read and comprehend.
Less code generation also means there is less motivation to use XSLT from top to bottom (SCXML-JS was about 90% implemented in XSLT). I have a lot of positive things to say about XSLT, but I think that overall, this will also be an advantage in terms of overall developer friendliness.
Focus on Semantics
There are many possible semantics that can be applied to Statecharts, and I decided with my advisor that it would be useful to orient the semantics of the new compiler to a set of possible semantic choices described in Big-Step Semantics, by Shahram Esmaeilsabzali, Nancy A. Day, Joanne M. Atlee, and Jianwei Niu at the University of Waterloo. I first considered using the Algorithm for SCXML Interpretation described in the SCXML specification, but after finding a bug in the algorithm, it seemed like a better approach would be to write my own step algorithm, and base it on a clear set of semantic choices outlined in Big-Step Semantics.
The semantics of the new project will be similar to SCXML semantics, but not identical. I will write a blog post in the future describing in detail the points of variation, and how they affect various edge cases.
Having made precise decisions regarding the semantics that would be used, I have endeavored to write the interpreter using a test-first methodology. Before writing the interpreter, I attempted to write a comprehensive test suite that would cover both the basic cases, as well as complex examples (e.g. n-nested parallel states with deep and shallow history, transition interrupts, and executable behaviour) and edge cases. I have also written a simple test harness to run these tests.
Focus on Threading
In SCXML-JS, I didn’t cleanly take into account whether it would be executing in a single- or multi-threaded environment, and so the approach used for event handling was not well-realized for either scenario. SCXML-JS used a queue for external events, the window.setTimeout method to poll the queue without blocking the browser thread, and a global lock on the statechart object to prevent events from being taken while another event was being processed. In the browser environment, which is single-threaded (Web Workers aside), this “busy-wait” and locking approach is simply unnecessary, as while the Statechart is processing an event, it has possession of the thread, and thus cannot be interrupted with another event before that first event has finished. There is no need to queue events, because each call to send an event to the statechart would be handled synchronously, and would return before the next event would be sent. Likewise, in a multi-threaded environment, there are often better approaches than busy-waiting, such as using a blocking queue, in which the thread sleeps when it finds the queue is empty, and gets awoken when another thread notifies it that an element as entered the queue.
In the new project, I would like to accommodate both single- and multi-threaded environments, and so the implementation will take this into account, and will provide multiple concrete implementations that leverage the best event-handling strategy available for the environment in which it is executing. Furthermore, all concrete implementations will conform to the chosen Statechart semantics for handling events.
This will also make performance and unit testing easier. The single-threaded implementation provides an easy programming model: an event is sent into the statechart, a big-step is then performed by the statechart, and the synchronous call returns to the thread of execution to the caller. It is then possible to check the configuration of the statechart to ensure that it conforms to an expected configuration. Likewise, it is easy to send many events into a statechart, and measure the amount of time the statechart takes to process all the events. This would be more complicated to implement (although certainly not impossible), in a multi-threaded implementation based on a blocking queue.
Where it will Live
I’ve posted the code to github, but I’m not going to post a link here until I work out some issues with the in-browser unit test harness, make sure it works across a range of browsers, and write some documentation.
I could commit it to Apache Commons Sandbox SVN as a new branch to SCXML-JS, but I’m going to hold off on that, for several reasons. First, I want to release alpha builds of this software, and Apache Commons has a policy that a project must graduate from Commons Sandbox to Commons Proper in order to perform releases. This requires a vote by the community regarding the project’s suitability for inclusion in Commons, and, unfortunately for my project, this is in part dependent on whether the project uses Maven as its build system. I think Maven is probably the best choice, if one is developing an application in Java, and all of its library dependencies can be found in Maven repositories. But for SCXML-JS at least, the Maven dependency proved to be extremely cumbersome. I spent at least a month after GSoC 2010 was done trying to fix up the SCXML-JS build system so that it would be acceptable to Commons, and this was not a great experience. The end result wasn’t great either. I don’t yet know what build system I will use for this project, but my main motivation is to not waste a lot of time on it, not let it be an obstacle, and to ship early.
Another factor is that I’d like to use git to version-control it. Even when I was working on SCXML-JS, I was using git-svn. git-svn is great, but it would occasionally break, and then it was tricky to recover. Ultimately, the requirement to use SVN is just another small distraction, when I would rather be spending my time coding.
I also haven’t yet decided how I would like to license it.
What currently works:
- Everything that would be considered a part of SCXML core.
- send, script, assign, and log tags.
- Fairly thorough test suite written (currently 84 unique tests).
Regarding what is to come, once I have the test harness working, have tested across browser environments, and written some documentation and examples, I will announce the project here, as well as on relevant mailing lists, and publish an alpha release.
After that, I will be working on implementing optimizations, testing performance, and analyzing the results. Optimistically, this should take about 2 weeks.
Then I will write my thesis.