I finally upgraded the download (Alpha-release) version and brought it in sync with latest sources. I removed some dead and not-used-yet code, did some code cleanup and a bit of refactoring. Aside this, there are a few changes and improvements that I would like to comment on.
The first important change is renaming SymbolTerminal to KeyTerm. I think that the name SymbolTerminal was misleading and a bit confusing, and it became more so as I faced a need for a terminal for scanning elements known as "symbols" in many programming languages; for example, in Ruby it is an identifier preceeded by a colon, like :name. The former SymbolTerminal, on the other hand, was created to recognize the language-defined keywords and special characters (operator, assignment, braces, etc). So the name change makes sense, and "key" prefix in KeyTerm comes from the "keyword". This name change caused obvious name changes to corresponding dictionaries, lists and fields. It also made sense to rename the Symbol() method in the Grammar class - it is used to explicitly convert the first string in string-only BNF expressions and allow proper method overload to kick in. For example:
BinOp.Rule = Symbol("+") | "-" | "*" | "/";
I changed the name of the method to ToTerm(string), but I kept the old version Symbol() and decorated it with Obsolete attribute. The Symbol() method is heavily used in grammars, and leaving an old method for a while would allow to compile the old grammars without errors. The compiler will only give a warning that the method was deprecated and you should use ToTerm instead. Now the rule for BinOp should look like this:
BinOp.Rule = ToTerm("+") | "-" | "*" | "/";
Please make a change in your existing grammars.
Continuing with this theme, I'm thinking about better ways to handle keywords internally. It looks like this Grammar.CurrentGrammar thread-static property is more trouble than it's worth. The main problem is that keyword terminals can be created in static operator overloads (BnfTerm.Op_Pipe, Op_Plus etc). At the same time the terminal for a particular keyword should be instantiated once in a grammar, and with proper case sensitivity arrangement. Therefore, the grammar (an instance) should have a list/dictionary of keyword terminals which should be accessible from static operator overload methods. Currently it is done through this Grammar.CurrentGrammar property, but I'm wondering if there's a better way. Will see.
Another change is a new Grammar.RunSample virtual method. The purpose of this method is to allow custom implementations of running a sample code in the Grammar Explorer. The base implementation in Grammar class executes the code using Irony's AST interpreter - this works well for the expression evaluator grammar, as an example. The SearchGrammar (Google-to-SQL search syntax converter) overrides this method and provides it's own implementation - it converts the input string into SQL server syntax using its own custom method. Using the RunSample override allowed to get rid of a separate sample project I had before for testing the search grammar - now you can do it directly in the Grammar Explorer.
One more thing: the language flag SupportsInterpreter is renamed to CanRunSample. If this flag is set, the Grammar Explorer enables the Run button and calls the Grammar.RunSample method on click.
That's it for now - more stuff is coming. We are on track for release by the end of the year.
Enjoy it!
Oct 14, 2009
Sep 23, 2009
About code generation
As I've written in my previous post, I decided not to pursue IL code generation in Irony for initial release. The reason for this is that to do it properly I need to build more stuff than I thought originally.
You might well ask - can't we generate the IL code directly from AST tree? Irony parser produces AST tree, so can we just go one step forward and generate IL code by iterating the tree and invoking some "GenerateIL" method for each node? It is possible, and might seem like a viable solution. This is the approach that DLR team had chosen. With DLR, you build a parser for your language that translates the source code into AST tree consisting of DLR-provided AST nodes. Then DLR directly compiles the AST tree into IL instructions. Sounds simple and straightforward, and that's what I originally planned to do as well. This approach is a shortcut of what is described in compiler textbooks and papers (including the Dragon book):
Basicallly, when moving to the executable code, we have to "linearize" the high-level tree-like structures. It is better to do it with some intermediate linear pseudo-code that can preserve important meta-data information from the AST tree. We will be able to do some analysis and transformations with this intermediate representation. Note that most of the important code analysis algorithms work with linear code representations, not with AST trees. They cannot be applied to IL/assembly level code, because it is too primitive and does not contain enough meta-data that is used and produced by the algorithms. All these "lattices" and propagation algorithms of code analysis are defined over linear sequences of pseudo-instructions.
Why we need code analysis at all, you may ask. For several reasons:
What do you think?
You might well ask - can't we generate the IL code directly from AST tree? Irony parser produces AST tree, so can we just go one step forward and generate IL code by iterating the tree and invoking some "GenerateIL" method for each node? It is possible, and might seem like a viable solution. This is the approach that DLR team had chosen. With DLR, you build a parser for your language that translates the source code into AST tree consisting of DLR-provided AST nodes. Then DLR directly compiles the AST tree into IL instructions. Sounds simple and straightforward, and that's what I originally planned to do as well. This approach is a shortcut of what is described in compiler textbooks and papers (including the Dragon book):
- The parser produces an AST tree.
- The AST tree is then translated into intermediate code representation - not MSIL, don't be confused; it is a custom internal representation of linear sequences of some virtual commands enriched with extra custom meta-data information
- The intermediate code is analyzed, optimized, restructed for execution in VM or physical processor.
- The machine instructions (or IL in our case) are generated from the intermediate representation.
Basicallly, when moving to the executable code, we have to "linearize" the high-level tree-like structures. It is better to do it with some intermediate linear pseudo-code that can preserve important meta-data information from the AST tree. We will be able to do some analysis and transformations with this intermediate representation. Note that most of the important code analysis algorithms work with linear code representations, not with AST trees. They cannot be applied to IL/assembly level code, because it is too primitive and does not contain enough meta-data that is used and produced by the algorithms. All these "lattices" and propagation algorithms of code analysis are defined over linear sequences of pseudo-instructions.
Why we need code analysis at all, you may ask. For several reasons:
- First of all, certain code analysis is unavoidable in any case. Even if we try to generate the IL code directly from AST tree, before we start producing IL, we need to perform several passes over the AST tree doing things like variable allocations, bindings, aggregating meta information - and this is in fact code analysis. Most of these operations are actually easier to do over linear intermediate code than over AST tree.
- Secondly, if we ever hope to do any optimization of the generated code, then we need it, as well as code representation expected by most analysis/optimization algorithms.
- Finally, it would be nice to have some static code analysis to detect things like use of uninitialized local variables, computed but never used values, etc. This is done by well-known code analysis algorithms which again expect a linear intermediate representation.
What do you think?
Subscribe to:
Posts (Atom)