You might well ask - can't we generate the IL code directly from AST tree? Irony parser produces AST tree, so can we just go one step forward and generate IL code by iterating the tree and invoking some "GenerateIL" method for each node? It is possible, and might seem like a viable solution. This is the approach that DLR team had chosen. With DLR, you build a parser for your language that translates the source code into AST tree consisting of DLR-provided AST nodes. Then DLR directly compiles the AST tree into IL instructions. Sounds simple and straightforward, and that's what I originally planned to do as well. This approach is a shortcut of what is described in compiler textbooks and papers (including the Dragon book):
- The parser produces an AST tree.
- The AST tree is then translated into intermediate code representation - not MSIL, don't be confused; it is a custom internal representation of linear sequences of some virtual commands enriched with extra custom meta-data information
- The intermediate code is analyzed, optimized, restructed for execution in VM or physical processor.
- The machine instructions (or IL in our case) are generated from the intermediate representation.
Basicallly, when moving to the executable code, we have to "linearize" the high-level tree-like structures. It is better to do it with some intermediate linear pseudo-code that can preserve important meta-data information from the AST tree. We will be able to do some analysis and transformations with this intermediate representation. Note that most of the important code analysis algorithms work with linear code representations, not with AST trees. They cannot be applied to IL/assembly level code, because it is too primitive and does not contain enough meta-data that is used and produced by the algorithms. All these "lattices" and propagation algorithms of code analysis are defined over linear sequences of pseudo-instructions.
Why we need code analysis at all, you may ask. For several reasons:
- First of all, certain code analysis is unavoidable in any case. Even if we try to generate the IL code directly from AST tree, before we start producing IL, we need to perform several passes over the AST tree doing things like variable allocations, bindings, aggregating meta information - and this is in fact code analysis. Most of these operations are actually easier to do over linear intermediate code than over AST tree.
- Secondly, if we ever hope to do any optimization of the generated code, then we need it, as well as code representation expected by most analysis/optimization algorithms.
- Finally, it would be nice to have some static code analysis to detect things like use of uninitialized local variables, computed but never used values, etc. This is done by well-known code analysis algorithms which again expect a linear intermediate representation.
What do you think?
To reinforce what you have said, you can see this lack of analysis and optimization when using DLR in the performance numbers between IronPython and Python.
ReplyDeletehttp://ironpython.codeplex.com/Wiki/View.aspx?title=IP26RC1VsCPy26Perf
Thanks..
ReplyDeleteI'm really saddened by DLR outcome myself. What seems even more disturbing to me is the fact that IronPython on DLR is not much faster that IP 1.0, if not slower, contrary to what was originally promised when they anounced DLR. The DLR folks now say "it wasn't about performance", it was about interoperability, but come on, we remember Jim H talking about future performance boost once they have this dynamic code generation. What came out - no boost, and something so big and so complex that it's absolutely impossible to understand (forget use)for mere mortals like myself. After spending hours browsing the sources I just gave up - maybe I'm just stupid..