<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7076760031171426918</id><updated>2011-07-30T17:13:38.255-07:00</updated><category term='Irony'/><title type='text'>Irony</title><subtitle type='html'>My blog about Irony project:  http://www.codeplex.com/irony</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>7</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7076760031171426918.post-6580596988345272667</id><published>2010-01-24T17:28:00.000-08:00</published><updated>2010-01-27T11:22:00.093-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Irony'/><title type='text'>January 2010 Update</title><content type='html'>Hi everybody! &lt;br /&gt;We are in 2010 now. Back in September, I talked about plans to make an official release of Irony before the end of 2009. Obviously, that did not happen. Still too many issues to fix and features to add before Irony can be declared "complete" for the first release. My apologies to everybody. It looks like it always takes longer than it takes. I was a bit busy with other things lately (crunch at day-time job), so I could not spend enough time on Irony. I apologize for the failed promise, and will try to make it happen this year, hopefully in a few months. &lt;br /&gt;Now back to the current business. I'm making a sizable update to Irony today and would like to go over what is there. We'll start with small things and then move to bigger and more important changes. &lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Minor Fixes&lt;/strong&gt;&lt;br /&gt;I have fixed several bugs and inconsistencies reported previously. &lt;br /&gt;&lt;ol&gt;&lt;li&gt;FreeTextLiteral is fixed - now correctly grabs the text when it ends at EOF. &lt;/li&gt;&lt;li&gt;SourceStream.TabWidth property - now writable, so it can be set before parsing. That should fix an issue reported previously that this setting was useless in current implementation - there was no way to set it properly.&lt;/li&gt;&lt;li&gt;Fixed Scanner.vsScanToken method, adjusted to latest changes in token scanning mechanism. Did not test it with VS integration - don't have a working test package, but hopefully the reported problems are fixed. Let me know if you find any problems. &lt;/li&gt;&lt;li&gt;Changed the scanner error message to show the expected list of terminals, similar to what parser reports in case of error. This message is shown by the Scanner when it fails to produce a token in the current position.&lt;/li&gt;&lt;li&gt;Added validation of ErrorRule property of non-terminals, more precisely - productions that contain SyntaxError term. These rules/productions are instructions to Parser telling it how to recover in case of error to continue parsing and reporting other possible errors. I have noticed that these productions might be set incorrectly (it happened in one of the grammars I have seen recently), so I added an automatic validation at grammar construction time. The rule is that SyntaxError term inside the rule must always be followed by some terminal, so SyntaxError cannot be the last term in production. The meaning of error production is to be an instruction to the parser: "If you encounter error here, skip everything until this terminal X", where X is the terminal after the SyntaxError term. The added validation code checks there is in fact such a terminal X. &lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;strong&gt;Enhancements&lt;/strong&gt;&lt;br /&gt;&lt;em&gt;Handling transient and list non-terminals&lt;/em&gt;&lt;br /&gt;Refactored the parser code that handles the transient and list nodes. Now there are some explicit restrictions on non-terminals that are declared transient: first, list nodes cannot be declared transient; secondly, the rule for transient non-terminal must include 0 or 1 non-punctuation nodes, no more than one. &lt;br /&gt;A transient term is an "abstract" non-terminal that is introduced for better grammar structuring, but has no concrete implementation in the language. For example, a "statement" non-terminal is usually defined as an OR of concrete statement types: &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Stmt.Rule = AssingnmentStmt | ForStmt |WhileStmt...; &lt;br /&gt;&lt;br /&gt;&lt;div&gt;so there is no such thing as "statement", it is an aggregating term for different kind of statements. By declaring it transient we tell parser to replace the Stmt node (when it is about to produce it) with its direct child node - a specific implementation. &lt;br /&gt;&lt;/div&gt;In light of this explanation, the added restrictions make sense - we make sure we have a single child for transient nodes. In case of 0 children, I decided to allow this for now - these nodes will be eliminated if they are members of the list, so list constructor would skip them. Banning optional 0-child nodes seem to make it harder to go around in some common cases.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;MakePlusRule, MakeStarRule enhancement&lt;/em&gt;&lt;br /&gt;An important improvement is an addition to MakePlusRule and MakeStartRule method. Now they have an exra optional boolean parameter allowTrailingDelimiter. Setting this parameter to True allows you to create lists that have an optional extra delimiter after the list. This is quite common situation. For example, in c# the enum declaration allows a comma after the last element: &lt;br /&gt;&amp;nbsp; public enum MyEnum {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; First, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Second,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Third,&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;- notice a trailing comma after the Third - this is valid. With this extra parameter Irony will take care of the lists like this for you.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;New property NodeCaptionTemplate in NonTerminal and GetParseNodeCaption method&lt;/em&gt;&lt;br /&gt;The purpose of this property is to allow customization of the node caption in the parse tree in Grammar Explorer. For example, if you parse a GwBasic program (in the old version of Irony), the parse tree on the right would show a root PROGRAM node with a bunch of LINE nodes as its children. Each line's caption is just LINE, so you have to expand the node to see its contents and actually understand which particular line it is. In the new version of GwBasic grammar, the LINE non-terminal is defined as follows (I skip some details):&lt;br /&gt;&lt;div&gt;&amp;nbsp; LINE.Rule = lineNumber + LINE_CONTENT_OPT + COMMENT_OPT + NewLine;&lt;br /&gt;&lt;/div&gt;&amp;nbsp; LINE.NodeCaptionTemplate = "Line #{0}";&lt;br /&gt;NodeCaptionTemplate specifies that node caption should be a word "Line" followed by the contents of the child node at 0 index - which is line number. So line nodes in the tree now show up as "Line 10", "Line 20", etc. For other examples, see miniPython grammar - I assigned this property to non-terminals for function definition and function call, so now the show up as "def myFun(...)" and "call myFun(...)", showing the function name instead of simply node type.&lt;br /&gt;Additionally, the process of constructing the caption now goes through the virtual Grammar method GetParseNodeCaption, which you can override in your grammar and provide custom captions for specific nodes in your grammar. &lt;br /&gt;&lt;br /&gt;&lt;em&gt;Dispatching the implementation method call in AstNode base class&lt;/em&gt;&lt;br /&gt;One important improvement to AstNode base class. Previously, the AstNode.Evaluate method was calling virtual EvaluateNode method, which is supposed to be overridden in subclassed nodes and provide a specific implementation of node evaluation. It is still the case, but with a twist. There is an additonal protected field EvaluateRef in AstNode class - it is a pointer to the implementation of evaluation method, so Evaluate dispatches the call to specific implementation through this reference. This field is initially set to pointer to EvaluateNode method, so everything works as before. But now you can provide several implementations of "EvaluateNode" in specific node, each for some specific situation, and choose which to use by setting the pointer EvaluateRef during the initialization of the node. This provides a way for language implementor to combine several specific implementations in one AstNode subclass without creating several more specific subclasses. &lt;br /&gt;&lt;br /&gt;&lt;em&gt;UnExprNode class is refactored into several more specific classes&lt;/em&gt;&lt;br /&gt;The previous implementation of UnExpNode was handling all kinds of unary expressions including inc/dec operators (++/--). It was quite messy, so I decided to break it into several more specific classes. See the expression evaluator grammar for a usage example of these new node classes. &lt;br /&gt;&lt;br /&gt;&lt;strong&gt;New Features&lt;/strong&gt;&lt;br /&gt;&lt;em&gt;New terminal: ImpliedSymbolTerminal&lt;/em&gt;&lt;br /&gt;Some grammars and notation conventions have a concept of an implied (invisible) operator in expressions when two operands are sitting together without any delimiter between them. For example, in mathematical formulas the multiplication operator is usually skipped, so 'X Y' stands for 'X * Y'. The other example is SearchGrammar in samples. The input syntax allows two consequitive terms in the input stream, which implies "AND" operation, so 'T1 T2' stands for 'T1 AND T2'. &lt;br /&gt;The problem with implementing such grammars is use of operator precedence. Because there is not physical terminal for the implicit operator, you cannot assign a precedence or associativity value to it, so it becomes impossible to use operator precedence facility for automatic conflict resolution. You might think that we could create an empty non-terminal with the rule &lt;br /&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ImpliedAnd.Rule = Empty;&lt;br /&gt;&lt;/div&gt;and use it in the rule and assign a precedence value to it - but this does not work. The node for this empty non-terminal would not exist when parser is looking at the preview token - non-terminals are instantiated on top of the stack, so the parser cannot use the precedence to decide on conflict resolution. This was the case with the previous version of SearchGrammar - we could not use precedence there, so we had to fallback to an old ugly way of defining each type of binary expressions - AndExpression, OrExpression, etc, and express precedence rules through careful definition of each expression in terms of each other. &lt;br /&gt;Now we have a solution for cases like this: ImpiedSymbolTerminal. It is created by the scanner from the 'air' - it does not consume any characters in the input stream. How does the scanner know that in some particular place it must inject this impied symbol? Through scanner-parser link. The scanner looks at the current state of the parser and asks which terminals are expected. Let's say the state indicates that the terminals expected are explicit operators (like '*', '+' etc) or an implicit operator. First the scanner tries explicit operators, and if it cannot create any of them, it creates an implicit symbol and passes it to the parser. &lt;br /&gt;It should be clear from this explanation that using ImpliedSymbolTerminal requires that Parser-Scanner link is enabled for the grammar, so there is a check of this flag in the ImpliedSymbolTerminal init method. Look at SearchGrammar to see the use of this facility - the grammar had been simplified quite a bit I think, and is now more concise and clear. &lt;br /&gt;&lt;br /&gt;&lt;em&gt;BnfTerm.AstNodeConfig property&lt;/em&gt;&lt;br /&gt;This property allows you to pass some extra setup information to the AstNode initialization method. &lt;br /&gt;One usage example of this facility is StringLiteral class. AstNodeConfig property holds the instance of StringTemplateSettings (subclass of AstNodeConfig base class) in cases when embedded expressions are allowed in strings. See next section for more information about string templates and embedded expressions. &lt;br /&gt;I am planning to use this facility extensively in the near future, when implementing general-purpose AST nodes for interpreter. The idea is to provide a way for language writer to customize the behavior of general AST nodes for particular language through settings in AstNodeConfig properties of non-terminals (in addition to AstNodeType) assigned in grammar constructor. &lt;br /&gt;&lt;br /&gt;&lt;em&gt;Multiple Grammar Roots and strings templates with embedded expressions&lt;/em&gt;&lt;br /&gt;Have a look at new version of ExpressionEvaluator - it now support string operands with embedded expressions, Ruby-style syntax:&lt;br /&gt;&lt;div&gt;&amp;nbsp; name = "Joe"&lt;br /&gt;&lt;/div&gt;&amp;nbsp; hello = "Hello, #{name}"&lt;br /&gt;I added this facility to ExpressionEvaluator, just to use it as a test case in the absense of Ruby sample, will probably remove it in the future. You can use full-blown expressions inside curly braces, not only single variables. &lt;br /&gt;What is important here is not the fancy Ruby strings, but what is involved in supporting this functionality. What is required in this case is that the parser can parse and evaluate a stand-alone expression, a snippet, which is not necessarily an entire program module. (In Ruby case, it is easier - a simple expression is a valid Ruby program, so we can treat embedded expressions as little programs, but in general this is not the case). The other example of applications where such facility is needed is a code colorizer, which can convert a code snippet in some language into an HTML fragment that we can embed into a web page. We want the colorizer to be able to accept small snippets - classes, single methods, or even standalone statements as an input. But Irony parser (as it existed before) was able to accept only full-module fragments, identified by the Root non-terminal. &lt;br /&gt;Now you can have multiple roots. In addition to Grammar.Root property, there is a new Grammar.SnippetRoots set. These are additional roots for the grammar. You can add non-terminals representing "snippets" to this set in grammar constructor. The parser construction algorithm creates additional initial states in the grammar automaton for each of the snippet roots. To create a parser for a particular snippet, use constructor overload with snippet non-terminal as a parameter:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; var parser = new Parser(myLanguage, mySnippetRoot) &lt;br /&gt;Look at StringTemplateNode.cs for an example of use of this facility. I plan to use this feature in the near future when I get to code colorizers for Wiki - if I ever get there...&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Symbol Table Implementation&lt;/em&gt;&lt;br /&gt;This is quite important new feature. All strings in the langauge/grammar and in particular program are now registered in the global symbol table and represented as instances of the Symbol class. This is a standard mechanism found in most implementations of compilers and interpreters. I would say that the need for this extra layer of symbol representation is pretty obvious in unmanaged implementations - to manage effectively the names at compile- and run- time. For implementations like Irony, built on top of .NET, the benefits are not so obvious - .NET runtime manages the strings quite well behind the scene. Irony so far was doing quite well without it, and for some time I even thought it might not be needed at all. But at the end, I decided it is good to have it, it can bring some performance benefits at runtime, and it can also open possibilities for some elaborate optimizations in the future. &lt;br /&gt;One of the tricky things was to accomodate the need to work with both case sensitive and insensitive languages. I think I figured it out. Symbols are always treated as case-sensitive in internal tables. Each symbol has internal reference to all-lower symbol, which can be used for fast symbol comparison in case-insensitive languages. &lt;br /&gt;Now all key terms (keywords in grammar) and identifiers do not use plain strings to hold the textual content. Instead, they have a reference to a Symbol that represents this textual string. The implementation is not complete yet, it is more like a draft, but all public API and changes to high-visibility classes (like terminals) are implemented and put in place. I will work on this further, mainly completing internal workings of the parser and interpreter, without making breaking changes to public API. &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;br /&gt;For the next few weeks I would like to turn to interpreter functionality and finally expand it to more realistic feature set. I see some of you guys starting playing with runtime and implementing interpreters. Nothing wrong with that, but because of incompleteness of this functionality in Irony, every time you have to reinvent the wheel. I think it's time to invent the wheel once and forever, and let you guys decorate it with specific artifacts of your languages. &lt;br /&gt;Another note on Wiki terminal and Wiki grammars in samples. Currently this implementation is quite messy and incomplete. I'm planning to get back to this functionality and rework it. So don't use it for anything but as samples - quite ugly samples in fact. It will be done better, I promise. &lt;br /&gt;Until next time - have fun with Irony!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7076760031171426918-6580596988345272667?l=irony-roman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/6580596988345272667/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://irony-roman.blogspot.com/2010/01/january-2010-update.html#comment-form' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/6580596988345272667'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/6580596988345272667'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/2010/01/january-2010-update.html' title='January 2010 Update'/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7076760031171426918.post-7592394379096497962</id><published>2009-10-14T11:20:00.000-07:00</published><updated>2009-10-14T13:08:08.975-07:00</updated><title type='text'>Updated Alpha release version, Oct 2009</title><content type='html'>I finally upgraded the download (Alpha-release) version and brought it&amp;nbsp;in sync with latest sources. I removed some dead and not-used-yet code, did some code cleanup and a bit of refactoring. Aside this, there&amp;nbsp;are a few changes and improvements that I would like to comment on. &lt;br /&gt;The first&amp;nbsp;important change is renaming &lt;em&gt;SymbolTerminal&lt;/em&gt; to &lt;em&gt;KeyTerm&lt;/em&gt;. I think that the name &lt;em&gt;SymbolTerminal&lt;/em&gt; was&amp;nbsp;misleading and a bit confusing, and it became more so as I faced&amp;nbsp;a need for&amp;nbsp;a terminal for&amp;nbsp;scanning&amp;nbsp;elements known as&amp;nbsp;"symbols"&amp;nbsp;in&amp;nbsp;many programming languages; for example, in Ruby it is an identifier preceeded by a colon, like &lt;em&gt;:name&lt;/em&gt;.&amp;nbsp;The former &lt;em&gt;SymbolTerminal&lt;/em&gt;, on the other hand, was created to recognize the language-defined keywords and special characters (operator, assignment, braces, etc). So the name change&amp;nbsp;makes sense, and "key" prefix in &lt;em&gt;KeyTerm&lt;/em&gt; comes from the "keyword". This name change caused obvious name changes to&amp;nbsp;corresponding dictionaries,&amp;nbsp;lists and fields. It also made sense to rename the&amp;nbsp; &lt;em&gt;Symbol()&lt;/em&gt;&amp;nbsp;method in the &lt;em&gt;Grammar&lt;/em&gt; class - it&amp;nbsp;is used to explicitly convert the first string in string-only BNF expressions and allow proper method overload to kick in. For example:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace;"&gt;BinOp.Rule = Symbol("+") | "-" | "*" | "/";&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I changed the name of the method to &lt;em&gt;ToTerm(string),&lt;/em&gt; but&amp;nbsp;I kept the old version &lt;em&gt;Symbol()&lt;/em&gt; and&amp;nbsp;decorated it with &lt;em&gt;Obsolete&lt;/em&gt; attribute. The &lt;em&gt;Symbol()&lt;/em&gt; method is heavily used in&amp;nbsp;grammars, and leaving&amp;nbsp;an old method for&amp;nbsp;a while would allow to compile the old grammars without errors. The compiler will only give a warning that the method was deprecated and you should use &lt;em&gt;ToTerm&lt;/em&gt; instead. Now the rule for &lt;em&gt;BinOp&lt;/em&gt; should look like this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;, Courier, monospace;"&gt;BinOp.Rule = ToTerm("+") | "-" | "*" | "/";&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Please make a change in your existing grammars. &lt;br /&gt;Continuing with this theme, I'm thinking&amp;nbsp;about better ways to handle&amp;nbsp;keywords&amp;nbsp;internally. It looks like this &lt;em&gt;Grammar.CurrentGrammar&lt;/em&gt; thread-static property is more trouble than it's worth. The main problem is that keyword terminals&amp;nbsp;can be created in static operator overloads (&lt;em&gt;BnfTerm.Op_Pipe, Op_Plus&lt;/em&gt; etc).&amp;nbsp;At the same time the&amp;nbsp;terminal for a particular keyword&amp;nbsp;should be instantiated once in a grammar, and with proper case sensitivity arrangement.&amp;nbsp;Therefore, the grammar (an instance)&amp;nbsp;should have a list/dictionary of keyword terminals which should be accessible&amp;nbsp;from static operator overload methods.&amp;nbsp;Currently it is done through this &lt;em&gt;Grammar.CurrentGrammar&lt;/em&gt; property, but I'm wondering if there's a better way. Will see.&lt;br /&gt;Another change is&amp;nbsp;a new&amp;nbsp;&lt;em&gt;Grammar.RunSample&lt;/em&gt; virtual method. The purpose of this method is to allow custom implementations of running a sample code in the Grammar Explorer. The base implementation in &lt;em&gt;Grammar&lt;/em&gt; class executes the code using Irony's AST interpreter - this works&amp;nbsp;well&amp;nbsp;for the expression evaluator grammar, as an example. The &lt;em&gt;SearchGrammar&lt;/em&gt; (Google-to-SQL search syntax converter) overrides this method and provides it's own implementation - it converts the input string into SQL server syntax using its own custom method.&amp;nbsp;Using the&amp;nbsp;&lt;em&gt;RunSample&lt;/em&gt; override&amp;nbsp;allowed to get rid of a separate sample project I had before for testing the search grammar - now you can do it directly in the Grammar Explorer. &lt;br /&gt;One more thing: the language flag &lt;em&gt;SupportsInterpreter&lt;/em&gt; is renamed to &lt;em&gt;CanRunSample&lt;/em&gt;. If this flag is set, the Grammar Explorer enables the &lt;em&gt;Run&lt;/em&gt; button and calls the &lt;em&gt;Grammar.RunSample&lt;/em&gt; method on click.&lt;br /&gt;That's it for&amp;nbsp;now -&amp;nbsp;more stuff is coming. We are&amp;nbsp;on track for release by the end of the year.&lt;br /&gt;Enjoy it!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7076760031171426918-7592394379096497962?l=irony-roman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/7592394379096497962/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://irony-roman.blogspot.com/2009/10/updated-alpha-release-version-oct-2009.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/7592394379096497962'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/7592394379096497962'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/2009/10/updated-alpha-release-version-oct-2009.html' title='Updated Alpha release version, Oct 2009'/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7076760031171426918.post-6461342071746572660</id><published>2009-09-23T00:42:00.000-07:00</published><updated>2009-09-25T08:15:20.113-07:00</updated><title type='text'>About code generation</title><content type='html'>As I've written in my previous post, I decided not to pursue IL code generation in Irony for initial release. The reason for this is that to do it properly I need to build more stuff than I thought originally.&lt;br /&gt;&lt;br /&gt;You might well ask - can't we generate the IL code directly from AST tree? Irony parser produces AST tree, so can we just go one step forward and generate IL code by iterating the tree and invoking some "GenerateIL" method for each node? It is possible, and might seem like a viable solution. This is the approach that DLR team had chosen. With DLR, you&amp;nbsp;build a parser for your language that translates the source code into AST tree consisting of DLR-provided AST nodes. Then DLR directly compiles the AST tree into IL instructions. Sounds simple and straightforward, and that's what I originally planned to do as well. This approach is a&amp;nbsp;shortcut of&amp;nbsp;what is described&amp;nbsp;in compiler&amp;nbsp;textbooks and papers (including the Dragon book):&amp;nbsp; &lt;br /&gt;&lt;ol&gt;&lt;li&gt;The parser produces an AST tree.&lt;/li&gt;&lt;li&gt;The AST tree is then translated into intermediate code representation - not MSIL, don't be confused; it is a custom internal representation of linear sequences of some&amp;nbsp;virtual commands enriched with extra custom meta-data information&lt;/li&gt;&lt;li&gt;The intermediate code is analyzed, optimized, restructed for execution in VM or physical processor. &lt;/li&gt;&lt;li&gt;The machine instructions (or IL in our case) are generated from the intermediate representation. &lt;/li&gt;&lt;/ol&gt;One might think that steps 2 and 3 are unnecessary or needed only for really advanced compilers.&amp;nbsp;But this middle phase is there for a reason. There is a huge structural mismatch/gap between AST (a tree), and executable code - a plain linear sequence of primitive commands. It is quite difficult to cross this gap directly in one step. The structure of the AST reflects the logical structure of a program using high-level abstractions of the programming languages. IL code on the other hand reflects how the program is actually executed - linearly, as a stream of instructions.&lt;br /&gt;Basicallly, when moving to the executable code, we have to "linearize" the high-level tree-like structures. It is better to do it with some intermediate linear pseudo-code that can preserve important meta-data information from the AST tree. We will be able to do some analysis and transformations with this intermediate representation. Note that most of the important code analysis algorithms work with linear code representations, not with AST trees. They cannot be applied to IL/assembly level code, because it is too primitive and does not contain enough meta-data that is used and produced by the algorithms. All these "lattices" and propagation algorithms of code analysis are defined over linear sequences of pseudo-instructions.&lt;br /&gt;Why we need code analysis at all, you may ask. For several reasons:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;First of all, certain code analysis is unavoidable in any case. Even if we try to generate the IL code directly from AST tree, before we start producing IL, we need to perform several passes over the AST tree doing things like variable allocations, bindings, aggregating meta information - and this is in fact code analysis. Most of these operations are actually easier to do over linear intermediate code than over AST tree. &lt;/li&gt;&lt;li&gt;Secondly, if we ever hope to do any optimization of the generated code, then we need it, as well as code representation expected by most analysis/optimization algorithms. &lt;/li&gt;&lt;li&gt;Finally, it would be nice to have some static code analysis to detect things like use of uninitialized local variables,&amp;nbsp;computed but never used&amp;nbsp;values, etc.&amp;nbsp;This is done by well-known code analysis algorithms which again expect a linear intermediate representation. &lt;/li&gt;&lt;/ul&gt;In conclusion, there is a lot of valuable results and algorithms in code analysis arena, and Irony should definitely take advantage of all this stuff. However, after doing some research and prototyping I see that this should be a project of its own, and probably put off to the next release. &lt;br /&gt;&lt;em&gt;What do you think?&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7076760031171426918-6461342071746572660?l=irony-roman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/6461342071746572660/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://irony-roman.blogspot.com/2009/09/about-code-generation.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/6461342071746572660'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/6461342071746572660'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/2009/09/about-code-generation.html' title='About code generation'/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7076760031171426918.post-7955185271769175408</id><published>2009-09-20T08:59:00.000-07:00</published><updated>2009-09-23T10:21:06.847-07:00</updated><title type='text'>Time to talk about the future</title><content type='html'>&lt;p&gt;Had been quite busy for last few weeks, both at work and at home. The Irony project progress had been slow lately, but I'm trying to catch up. I want to share some thoughts on the future plans for Irony, list some near and longer term goals, and invite some feedback on these plans. &lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;em&gt;Final 1.0 Irony release by the end of the year.&lt;/em&gt; I think it's time to start wrapping it up. My initial plan was to cover it all in the very first release: compiler front-end, some code analysis and IL code generation. Now I see that it's better to focus on front-end only - scanner+parser+AST generation, kind of Lex/Yacc equivalent, and to leave the rest to later versions. No IL code generation in initial release, sorry folks if you were waiting for it. More on code generation in a separate post. What I hope to implement instead is a simple interpreting engine that directly "executes" the AST tree. The AST node set would be minimal, just to support a simple expression language. You can see some sketch of this engine in the source code; for now the major part of it is a dynamic operator evaluator. &lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;em&gt;Short-term plan - to post the updated version on downloads page.&lt;/em&gt; Latest Irony sources are quite different (and much better) than the code in the old release, and this apparently becomes more and more confusing. As I noticed, most people download the Release version, thus getting quite wrong impression of the current state of the project. Back in April I posted a big update to the sources with completely refactored code base, with a lot of improvements and new functionality. However, I did not update download version, for the reason that some features supported in old download version (like interpreter) were not supported yet in refactored code. Now the current source version is finally up to the feature set of the old release. It contains AST interpreter, although quite limited - only for expression evaluator. But the only thing lacking compared to old download is a sample Scheme interpreter. I don't think this sample is important enough to hold back the upgrade of download, so it's time to create a new, intermediate release - let's call it Alpha2. Expect it in the next 2-3 weeks. I will remove unused and research-stage classes, and brush-up and clean-up the code in general. The next pre-release drop, after Alpha2, should be already BETA, with complete feature set for final release of Irony 1.0. &lt;/li&gt;&lt;/ol&gt;&lt;strong&gt;Feature set planned for 1.0 release&lt;/strong&gt;&lt;br /&gt;Here's a list of features I'm planning to implement before the 1.0 release.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Parser/Scanner&lt;/em&gt;&lt;br /&gt;The first goal here is to complete the Irony's terminal set covering most of the common and maybe unusual lexical constructs for popular languages like Python, Ruby, c#, Basic, JavaScript and others. I'm looking closely at different languages trying to identify all the special cases of syntax not supported yet by standard Irony terminals. If you can recommend me such constructs - please let me know. I might add sample grammars for more languages and maybe revive Python and Ruby samples. The purpose of this is not to have complete grammars for these languages but rather to play with some fancy lexical/syntactic constructs from these languages and to check that Irony terminals can support these cases. Some of these terminals, token types and more general facilities:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Date literal in Visual Basic (example: #01/05/2005") . A simple but important enough case. I'm thinking about creating a generic delimited Literal terminal (delimited by start/end symbol like # for VB dates) that can support this VB date and maybe some other literals in other languages - any suggestions of such literals? &lt;/li&gt;&lt;li&gt;Support for exponent symbol (D vs E) that identifies float number type. For example, in Scheme and Fortran "1.5D01" identifies "double" float value, while opposed to "1.5E01" is a single-precision number. &lt;/li&gt;&lt;li&gt;Implicit operator/precedence hint - something that Yacc has, and it might be important for some cases.&lt;/li&gt;&lt;li&gt;Create facility for processing documentation strings (Python) or XML comments (c#). Maybe implement Doc grammar hint? - might be useful for python doc strings which are essentially regular string literals but become doc pieces when they appear in certain places. &lt;/li&gt;&lt;li&gt;CompilerDirective terminal for c#, with ability to analyze boolean expressions over defined symbols in "#if" directive. This would require ability to define sub-grammars inside main grammar to parse and evaluate these expressions.&lt;/li&gt;&lt;li&gt;Templated string literal with embedded expressions, like Ruby's "Name: #{customer.Name}" - this also requires sub-grammar facility. &lt;/li&gt;&lt;li&gt;Other exotic terminals: Ruby HereDoc; &lt;/li&gt;&lt;li&gt;Thinking also about implementing Wiki grammar/parser - might be an interesting case. &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;em&gt;Other features&lt;/em&gt;&lt;strong&gt; &lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Localization support - put all error messages into localizable resources. &lt;/li&gt;&lt;li&gt;Symbol table implementation &lt;/li&gt;&lt;li&gt;Interpreter for direct interpretation of AST tree; no code analysis or IL code generation. Implementing basic runtime and object model for interpreter; basic generic infrastructure, not necessarily implementations for specific languages. Basic AST node set, maybe only for expression evaluator.&lt;/li&gt;&lt;li&gt;Template processor to support processsing of files like Ruby's .rhtml templates. &lt;/li&gt;&lt;li&gt;Moving to VS 2010 when it's out, support for DynamicObject facility from .NET 4.0. &lt;/li&gt;&lt;li&gt;Finish VS Integration support implementation. &lt;/li&gt;&lt;li&gt;Write Xml documentation for core classes - this is a huge effort; also will try to create some introductory quick-start guide &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Asking for feedback&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Please let me know what you think about all this. If you have an idea or suggestion about features that you think should be included into release - please post a comment or shoot me an email. Especially looking for suggestions about exotic lexical constructs found in existing languages that are not yet supported by Irony terminals; if you know such construct - please let me know. &lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7076760031171426918-7955185271769175408?l=irony-roman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/7955185271769175408/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://irony-roman.blogspot.com/2009/09/time-to-talk-about-future.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/7955185271769175408'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/7955185271769175408'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/2009/09/time-to-talk-about-future.html' title='Time to talk about the future'/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7076760031171426918.post-4487362050620070901</id><published>2009-05-04T21:59:00.000-07:00</published><updated>2009-05-05T08:56:08.915-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-family:Arial;font-size:130%;"&gt;&lt;strong&gt;About Keywords and Reserved words&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;In my previous post I forgot to describe what happened to keywords and reserved words in new version, and how their treatment had changed. These terms are mentioned in the caption of one of the sections but I forgot to actually describe the new functionality. So here it is.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;In the previous versions the &lt;em&gt;Grammar&lt;/em&gt; class had a &lt;em&gt;Keywords&lt;/em&gt; field which was a string list. The programmer had to explicitly add the language keywords to the list, so that they could be highlighted in the code editor, for ex. It was a bit superflous as each of the keywords was already used in the grammar (and therefore Irony could discover them), so what's the point of listing them again? In addition, th&lt;/span&gt;&lt;span style="font-family:Arial;"&gt;ere was some confusion about what this list should contain - keywords or reserved words only. Now the issue had been clarified. Irony now maintains a clear distinction between the two terms. &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;A &lt;strong&gt;keyword&lt;/strong&gt; is any word (symbol starting with a letter) used in language/grammar definition. A &lt;strong&gt;reserved word&lt;/strong&gt; is a keyword that is reserved by the language as special symbol and cannot be used as identifier or anything else. New version of Irony does not hold separate lists for these special words: keywords and reserved words are symbol terminals that are marked by special flags: &lt;em&gt;TermOptions.IsKeyword&lt;/em&gt; and &lt;em&gt;Term.Options.IsReservedWord&lt;/em&gt;. Any token produced by scanner has its Terminal object attached to it, so tokens containing special words would have &lt;em&gt;SymbolTerminal&lt;/em&gt; in &lt;em&gt;Terminal&lt;/em&gt; field with appropriate flag set. &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;Now, how these flags are set. Irony sets the &lt;em&gt;IsKeyword&lt;/em&gt; flag automatically for any word (symbol starting with a letter) that it finds in grammar rules. The grammar author does not need to add keywords explicitly to any lists - Irony finds them automatically, and highlighter would color them appropriately. On the other hand, the reserved words in the language should be marked explicitly using &lt;em&gt;MarkReservedWords(...)&lt;/em&gt; method. This call is important for languages with reserved words, as it tells scanner and parser to always recongize the word as keyword, and never as indentifier or something else. Without this call the grammar may become ambiguous and parser can make wrong decisions. &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;In conclusion - a little less code to write for you, fellow language creators. &lt;/span&gt;&lt;span style="font-family:Arial;"&gt;Enjoy it!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7076760031171426918-4487362050620070901?l=irony-roman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/4487362050620070901/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://irony-roman.blogspot.com/2009/05/about-keywords-and-reserved-words-in-my.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/4487362050620070901'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/4487362050620070901'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/2009/05/about-keywords-and-reserved-words-in-my.html' title=''/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7076760031171426918.post-5593633312133234642</id><published>2009-04-27T22:48:00.000-07:00</published><updated>2009-05-02T14:43:10.734-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Irony'/><title type='text'>Irony, April 2009 update - what's new</title><content type='html'>&lt;span style="font-family:arial;"&gt;Lot's of things have changed: namespaces and classes renamed, some parts are rewritten from scratch; there are breaking changes in public API. Breaking public API is a bad thing, I know, but I think it was unavoidable result of code refactoring aimed at fixing existing problems in Irony functionality. I hope you will agree after reading this blog that fixes and improvements in new version were worth it, and Irony has become a much better tool as a result.&lt;br /&gt;Here is a quick list of changes: &lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;New namespaces, folders, classes &lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;New LALR construction implementation&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;NLALR parsing algorithm&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Concrete and abstract syntax trees&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Parser/Scanner/Token-Filters pipeline changes&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Scanner conflicts resolution, Parser-Scanner link&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Grammar Symbols, Keywords, &lt;/strong&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Reserved Words and Case Sensitivity&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Scripting namespace, AST classes, interpreter&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Diagnostics, Parser Tracing, Grammar Explorer changes&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;New namespaces, folders, classes &lt;/strong&gt;&lt;br /&gt;The code had been reorganized. The Irony.Compiler namespace had been renamed to Irony.CompilerServices - this allowed using Compiler for class name, so former LanguageCompiler is now simply Compiler. All code and entities involved in construction of parsing automaton - computing states and transitions - is moved into a new namespace Irony.CompilerServices.Construction, with dedicated classes for building GrammarData, ScannerData, ParserData.&lt;br /&gt;The parsing automaton data is also separated from transient data used during automaton construction. The ParserState class has a special field BuilderData that holds the temporary construction data. This field may be cleared when construction is completed to release the memory.&lt;br /&gt;Former Irony.Runtime namespace was merged into new namespace Irony.Scripting, which now has AST sub-namespace with all AST node classes. All AST nodes defined in Irony so far were implementations of executable nodes for scripting/dynamic languages, so it makes sense to move all these stuff to the new Scripting namespace. Irony.Diagnostics is a new namespace containing debugging and tracing classes and utilities.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;New LALR Construction Implementation&lt;/strong&gt;&lt;br /&gt;I have to admit that the implementation of LALR algorithm in previous version was not quite correct. Initially I followed the LALR description in Dragon book, and there is one subtle moment in lookahead propagation piece that is not clearly emphasized there. It was only after reading through LALR description in "Parsing Techniques" (2nd edition, 2007 by Grune, Jacobs) that I understood my misenterpretation. If you are into parsing and compilers - get this excellent book!&lt;br /&gt;As for the fault in original LALR implementation in Irony, it is explained on page 308. The last paragraph on this page starts with the sentence "Now it's tempting to say that ... ... ...but that is not true". It is this tempting but wrong conjecture that was assumed true in old implementation. It turns out that the wrong Irony's implementation is a legitimate algorithm known as NQLALR method (Not-Quite LALR). It works well, except it potentially might have more conflicts in parsing automaton in complex grammars compared to pure LALR.&lt;br /&gt;The new LALR construction follows the outline in "Parsing techniques", except two details.&lt;br /&gt;Firstly, the algorithm known as DeRemer and Pennello algorithm, based on notions of transitions, lookbacks and "includes relation". In its second part involving "includes relation propagation" it suggests using efficient transitive closure computation based on SCC (Strongly-Connected Components algorithm). This optimization part is not implemented, and still awaits its hero coder. That explains why processing complex grammars like c# takes so long (about 500 ms) - should be much faster when optimization is in place.&lt;br /&gt;Secondly, the original algorithm computes lookaheads, types of expected tokens, strings in fact. Irony's computation is a bit more complex - it collects not lookaheads themselves but lookahead sources - LR items that are actually produce the lookaheads. The reason for this is that Irony now implements NLALR algorithm (non-canonical variation of LALR) which is built as an extension of LALR. NLALR uses LALR automaton as a base. Skipping some technical details, let's just say that NLALR uses the lookahead sources (LR items) to construct new non-canonical states that are used as base points for exploration of the further input and for building new non-terminal lookaheads. With these two exceptions the Irony LALR implementation follows quite closely the implementation described in Grune,Jacobs.&lt;br /&gt;Note: the term "transition" used in the book maps to Irony's "ShiftTransition" class.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;NLALR parsing algorithm&lt;/strong&gt;&lt;br /&gt;One big addition in Irony is NLALR parsing method (Non-canonical LALR), with optional automatic grammar transformation. The Irony.Samples project includes several demo grammars in NonCanonTestGrammars folder that clearly illustrate the concept. There is also a text file _about_noncanon_grammars.txt in the same folder with comments and explanations of how it works and what is so good about non-canonical parsing. The ultimate goal of NLALR implementation is to build a parsing algorithm that is free of usual limitations of LR methods with limited number of lookahead symbols. It is now working for simple grammars like the ones provided as demo, but is not yet ready to handle complex grammars like c# sample.&lt;br /&gt;The current NLALR implemenation needs more work: automaton construction is quite slow, and number of extra non-canonical states generated seems excessive, etc.As for the sample c# grammar in particular, it is not exactly the grammar from c# spec, I've changed it quite a lot previously to fit into LALR method. I'm not sure what is the trouble for NLALR with this grammar - ambiguities in original grammar or the tweaks I added. More experimentation and investigation is needed.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Concrete and abstract syntax trees&lt;/strong&gt;&lt;br /&gt;In the previous version Irony parser produced the abstract syntax tree (AST) directly. The AST nodes were created based on types specified in NodeType property of non-terminals. The base idea was that language implementors were free to build the AST node hierarchy, but in fact Irony was forcing them to use the Irony-defined AstNode as a base class.&lt;br /&gt;This is fixed now. The AST tree creation process follows a standard two-step pattern described in compiler literature. The first step is creating the so-called concrete syntax tree that directly maps the elements in the source stream into the tree. It is called concrete because it reflects the specific syntactic elements of concrete programming language. The alternative name is "parse tree". Irony parser produces the ParseTree instance, which holds the root node of the tree and all nodes underneath as child nodes of the root and so on. The nodes are instances of class ParseTreeNode - a container for all information about the parsed syntax element: grammar term.&lt;br /&gt;The AST nodes now can be instances of any type. There are two ways to instantiate and initialize nodes.&lt;br /&gt;The first method is to implement IAstNodeInit interface in all your custom node classes. Your custom nodes also must have default (parameterless) constructor. You then specify node types for each non-terminal in extra parameter of non-terminal's constructor. Irony then would create all nodes and initialize them while it builds the parse tree.&lt;br /&gt;The second method is to use NodeCreator methods that can be attached to non-terminals (through constructor parameter) and create AST nodes in these methods. Note: this functionality might be broken still, as all AST and interpreter infrastructure had not been brought back to life yet. I will fix it ASAP.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Parser/Scanner/Token filters pipeline changes &lt;/strong&gt;&lt;br /&gt;The parser and scanner classes and the way they communicate (pipeline) had been refactored. The main purpose is to simplify the public API for consumers like code colorizers.&lt;br /&gt;The token filters pipeline had been moved inside Scanner, so now the scanner returns the token stream passed through all filters specified in grammar. Previously, the token stream produced by core scanner was in fact incomplete if there were filters specified.This presented a complication for consumers that had to either use raw, incomplete stream, or build and manage the entier pipeline from scanner and filters. &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;The other change is that there are two parser classes now. The former Parser class implementing LALR algorithm was renamed to CoreParser - it is "LALR parser" without scanner. A new Parser class combines both core parser and scanner, so it is a full parser transforming text into parse tree. Again, this is for simplifying the API for external consumers that need a combined scanner+parser object and don't care much about details.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Scanner conflicts resolution, scanner-parser link&lt;/strong&gt;&lt;br /&gt;Scanner often runs into situation when input stream is ambigous - the input word can be associated with more than one terminal from the list of terminals.&lt;br /&gt;One case of ambiguity that often comes up is ambiguity between identifier and keyword. Often keywords in the grammar can also be used as identifiers, unless a keyword is explicitly marked as "reserved" word in grammar. In previous version the Irony scanner/parser used some resolution mechanism that allowed the input stream to be correctly parsed, even when scanner labeled input token "incorrectly". In new version the resolution algorithm was improved.&lt;br /&gt;As before, the scanner initially recognizes keyword/identifier tokens as identifiers, but also sets an alternative terminal for token in AsSymbol field, when it finds that there is exactly matching keyword in grammar, and it is not Reserved word (more about Reserved words later). The parser first checks this AsSymbol field and if it is not null it tries to find an action for this symbol. If an action is defined in the current state, it means that "the keyword wins", and parser does some extra back-patching - this is new in Irony. Parser changes the "main" Terminal assigned to the token from Identifier to this AsSymbol terminal. The token in fact changes its identity, and might be correctly interpreted later (by code colorizer for example). If on the other hand, the parser does not find an action for AsSymbol terminal, it continues with assumption that is in fact Identifier and retrieves the action for the Identifier terminal.&lt;br /&gt;The other case of scanning ambiguity is a bit more complicated. Sometimes grammar defines several non-terminals (non symbols) that can be associated with the same words/tokens. For example, GwBasic sample grammar defines 3 number terminals: regular number (int or float) used in arithmetic expressions, lineNumber - for line numbers, and fileNumber - integers identifying file# in input/output statements. Scanner by itself cannot possibly identify what is "123" in the input stream - any of the three terminals would match it.&lt;br /&gt;In new version in situation like this the scanner checks with parser - which terminals are actually expected in current parser state. Each ParserState instance has a field ExpectedTerms which contains grammar terms (terminals and non-terminals) which are expected in input in this state. This set is precalculated at parser construction time. So when the new Irony scanner finds that more than one terminal can be used for matching the input (based on current input char), it checks the list of candidates against the ExpectedTerms set in the current parser state. That should leave only one terminal (unless the grammar is really ambiguous) which is then used for producing the token.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Grammar Symbols, Keywords, Reserved Words and Case Sensitivity&lt;/strong&gt;&lt;br /&gt;The Symbol terms dictionary and surrounding functionality had been refactored. The biggest trouble with old architecture was that Symbols dictionary was a static singleton field in SymbolTerminals class. This was obviously bad because this dictionary would be shared between several grammars in one application, with all negative consequences. But having this dictionary in static field was necessary as it was accessed by operator overrides (for + and operations) which are static methods.&lt;br /&gt;Even a bigger problem was the fact that the dictionary was always initially case sensitive, it treated differently-cased words as different symbols, and this caused all kinds of problems for case-insensitive grammars. Grammar-processing code tried to reconcile different cAsE instances of the same symbol, but it didn't always work properly: for example the operator precedence in SQL grammar for logical operators (AND, OR) didn't work because of this mess.&lt;br /&gt;The new solution in Irony is the following.&lt;br /&gt;First, the SymbolTerms dictionary is an instance field in Grammar class. Static operator overloads access this dictionary through an extra copy in so called thread-static field of Grammar class. Thread-static fields are identified by [ThreadStatic] attribute, and are just like normal static fields except there is a separate instance for each accessing thread, so there is no danger of concurrent access to this dictionary from different grammars constructed on different threads.&lt;br /&gt;Another change is that the symbol dictionary is created as case-insensitive for grammar-insensitive grammars. It is instantiated in base grammar constructor, and case sensitivity flag is now provided as a parameter to the constructor. The CaseSensitive Grammar's field is now read-only. This is one breaking change for existing case insensitive grammars, but fortunately it's an easy one to fix in existing grammars.&lt;br /&gt;These changes simplified and straightened out the symbol terms support in Irony. Operator precedence now works correctly in sample SQL grammar.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Scripting namespace, AST classes, interpreter&lt;/strong&gt;&lt;br /&gt;This version introduces new namespace Irony.Scripting that contains all runtime and interpreter-related classes, including AST nodes that are in fact nodes suitable for interpreting the code. The interpreter infrastructure is not back to life yet, most of the code is commented out. Will be fixing it and probably rewriting some code - most likely "code analysis" methods will be gone, and functionality merged into evaluation methods.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Diagnostics, Parser Tracing, Grammar Explorer changes&lt;/strong&gt;&lt;br /&gt;New Diagnostics namespace contains several classes and utilities for parser debugging and information printout. Grammar Explorer now shows parser trace in the bottom area, and trace entries are active links - double-clicking in grid cell brings to you place in source code or parser state printout. &lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;&lt;strong&gt;Enjoy it!&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7076760031171426918-5593633312133234642?l=irony-roman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/5593633312133234642/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://irony-roman.blogspot.com/2009/04/irony-april-2009-update-whats-new.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/5593633312133234642'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/5593633312133234642'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/2009/04/irony-april-2009-update-whats-new.html' title='Irony, April 2009 update - what&apos;s new'/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7076760031171426918.post-8035603352015316841</id><published>2009-04-26T01:00:00.000-07:00</published><updated>2009-05-04T21:56:20.354-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Irony'/><title type='text'>Welcome to my blog about Irony project!</title><content type='html'>&lt;span style="font-family:arial;"&gt;Hi and welcome to my blog about &lt;/span&gt;&lt;a href="http://www.codeplex.com/irony"&gt;&lt;span style="font-family:arial;"&gt;Irony - .NET Language Implementation Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;"&gt; - an open-source project hosted on CodePlex!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;I've decided to start this blog to communicate what I'm doing in Irony and why, to explain and discuss design decisions and all things that come up as the project goes ahead. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Certainly, the blog is not a replacement for a real documentation and user or programmer guides - this will come eventually. The main trouble with documentation is that it does not make much sense to document things that continue to change - and at least until now - April 2009 - the Irony code and public API-s were quite unstable and changing all the time. Now it looks like things get settled and pretty soon I will start thinking about documenting it. This future documentation will eventually explain how to use it and how it works to some extent. This blog on the other hand will be providing some background explanations why things were done in a certain way. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;So, let's get started - my first post will be about latest code drop (April 2009), and big improvements that came with it. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7076760031171426918-8035603352015316841?l=irony-roman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://irony-roman.blogspot.com/feeds/8035603352015316841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://irony-roman.blogspot.com/2008/05/welcome.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/8035603352015316841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7076760031171426918/posts/default/8035603352015316841'/><link rel='alternate' type='text/html' href='http://irony-roman.blogspot.com/2008/05/welcome.html' title='Welcome to my blog about Irony project!'/><author><name>Roman</name><uri>http://www.blogger.com/profile/02542298538518270943</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
