I’m two weeks into my Google Summer of Code project, and decided it was time to write the first update describing the work I’ve done, and the work I will do.
I’m currently developing this tool under the auspices of the Apache Foundation during this year’s Google Summer of Code. For more information on it, you could read my GSoC project proposal here, or even check out the code here.
Week 1 Overview
As I said above, I’m now two weeks into the project. I had already done some work on this last semester, so I’ve been adding in support for additional modules described in the SCXML specification. In Week 1, I added basic support for the Script Module. I wrote some tests for this, and it seemed to work well, so I checked it in.
Difficulties with E4X
These language features allowed me to write my compiler in a very declarative style: I would execute transformations on the input SCXML document, then query the resulting structure and and pass it into templates which generated code in a top-down fashion. I leveraged E4X’s language features heavily throughout my project, and was very productive.
Unfortunately, during Week 1, I ran into some difficulties with E4X. There was some weirdness involving namespaces, and some involving scoping. This wasn’t entirely surprising, as the Rhino implementation of E4X has not always felt very robust to me. Right out of the box, there is a bug that prevents one from parsing XML files with XML declarations, and I have encountered other problems as well. In any case, I lost an afternoon to this problem, and decided that I needed to begin to remove SCXMLcgf/js’s E4X dependencies sooner rather than later.
I had known that it would eventually be necessary to move away from E4X for portability reasons, as it would be desirable to be able to run the SCXMLcgf/js in the browser environment, including non-Mozilla browsers. There are a number of reasons for this, including the possibility of using the compiler as a JIT compiler, and the possibility of providing a browser-based environment for Statechart development. Given the problems I had had with E4X in Week 1, I decided to move this task up in my schedule, and deal with it immediately.
So, for Week 2, I’ve been porting most of my code to XSLT.
Justification for Targeting XSLT
At the beginning of Week 2, I knew I needed to migrate away from E4X, but it wasn’t clear what the replacement technology should be. So, I spent a lot of time thinking about SCXMLcgf/js, its architecture, and the requirements that this imposes on the technology.
The architecture of SCXMLcgf/js can be broken into three main components:
- Front End: Takes in arguments, possibly passed in from the command-line, and passes these in as options to the IR Compiler and the Code Generator.
- IR Compiler: Analyzes the given SCXML document, and creates an Intermediate Representation (IR) that is easy to generate code from.
My goal for Week 2 was just to eliminate E4X dependencies in the Code Generator component. The idea behind this component is that its modules should only be used for templating. The primary goal of these template modules is that they should be easy to read, understand, and maintain. In my opinion, this means that templates should not contain procedural programming logic.
Moreover, I came up with other precise feature requirements for a templating system, based on my experience from the first implementation of SCXMLcgf/js:
- must be able to run under Rhino or the browser
- multiline text
- variable substitution
- iteration (loops)
- if/else blocks
- Mechanisms to facilitate Don’t Repeat Yourself (DRY)
- Something like function modularity, where you separate templates into named regions.
- Something like inheritance, where a template can import other templates, and override functionality in the parent template.
A quick survey of XSLT, however, indicated to me that it did support all of the above functionality. So, this left me to consider XSLT, the other programming language which enjoys good cross-browser support.
I was pretty enthusiastic about this, as I had never used XSLT before, but had wanted to learn it for some time. Nevertheless, I had several serious concerns about targeting XSLT:
- How good is the cross-browser support for XSLT?
- I’m a complete XSLT novice. How much overhead will be required before I can begin to be productive using it?
- Is XSLT going to be ridiculously verbose (do I have to wrap all non-XML text in a <text/> node)?
- Is there good free tooling for XSLT?
- Another low-priority concern was that I wanted to keep down dependencies on different languages; it would be nice to focus on only one. I’m not sure about XSLT’s expressive power. Would it be possible to port the IR-Compiler component to XSLT?
To address each of these concerns in turn:
- There are some nice js libs that abstract out the browser differences: Sarissa, Google’s AJAXSLT.
- I did an initial review of XSLT. I found parts of it to be confusing (like how and when the context node changes; the difference between apply-templates with and without the select attribute; etc.), but decided the risk was low enough that I could dive in and begin experimenting with it. As it turned out, it didn’t take long before I was able to be productive with it.
- Text node children of an <xsl:template/> are echoed out. This is well-formed XML, but I’m not sure if it’s strictly legal XSLT. Anyhow, it works well, and looks good.
- This was pretty bad. The best graphical debugger I found was: KXSLdbg for KDE 3. I also tried the XSLT debugger for Eclipse Web Tools, and found it to be really lacking. In the end, though, I mostly just used <xsl:message/> nodes as printfs in development, which was really slow and awkward. This part of XSLT development could definitely use some improvement.
I’ll talk more about 5. in a second.
XSLT Port of Code Generator and IR-Compiler Components
I started to work on the XSLT port of the Code Generator component last Saturday, and had it completed by Tuesday or Wednesday. This actually turned out not to be very difficult, as I had already written my E4X templates in a very XSLT-like style: top-down, primarily using recursion and iteration. There was some procedural logic in there which need to be broken out, so there was some refactoring to do, but this wasn’t too difficult.
When hooking everything up, though, I found another problem with E4X, which was that putting the Xalan XSLT library on the classpath caused E4X’s XML serialization to stop working correctly. Specifically, namespaced attributes would no longer be serialized correctly. This was something I used often when creating the IR, so it became evident that it would be necessary to port the IR Compiler component in this development cycle as well.
Again, I had to weigh my technology choices. This component involved some analysis, and transformation of the given SCXML document to include this extra information. For example, for every transition, the Least Common Ancestor state is computed, as well as the states exited and the states entered for that transition.
I was doubtful that XSLT would be able to do this work, or that I would have sufficient skill in order to program it, so I initially began porting this component to just use DOM for transformation, and XPath for querying. However, this quickly proved not to not be a productive approach, and I decided to try to use XSLT instead. I don’t have too much to say about this, except to observe that, even though development was often painful due to the lack of a good graphical debugger, it was ultimately successful, and the resulting code doesn’t look too bad. In most cases, I think it’s quite readable and elegant, and I think it will not be difficult to maintain.
Updating the Front End
The last thing I needed to do, then, was update the Front End to match these changes. At this point, I was in the interesting situation of having all of my business logic implemented in XSLT. I really enjoyed the idea of having a very thin front-end, so something like:
xsltproc xslt/normalizeInitialStates.xsl $1 | \
xsltproc xslt/generateUniqueStateIds.xsl - | \
xsltproc xslt/splitTransitionTargets.xsl - | \
xsltproc xslt/changeTransitionsPointingToCompoundStatesToPointToInitialStates.xsl - | \
xsltproc xslt/computeLCA.xsl - | \
xsltproc xslt/transformIf.xsl - | \
xsltproc xslt/appendStateInformation.xsl - | \
xsltproc xslt/appendBasicStateInformation.xsl - | \
xsltproc xslt/appendTransitionInformation.xsl - | \
xsltproc xslt/StatePatternStatechartGenerator.xsl | \
xmlindent > out.js
In the interest of saving time, however, I decided to continue to use Rhino for the front end, and use SAX Java API’s for processing the XSLT transformations. I’m not terribly happy with these API’s, and I think Rhino may be making the system perceptibly slower, so I’ll probably move to the thin front end at some point. But right now this approach works, passes all unit tests, and so I’m fairly happy with it.
I’m not planning to check this work into the Apache SVN repository until I finish porting the other backends, clean things up, and re-figure out the project structure. I’ve been using git and git-svn for version control, though, which has been useful and interesting (this may be the subject of another blog post). After that, I’ll be back onto the regular schedule of implementing modules described in the SCXML specification.