Sep 20, 2009

Time to talk about the future

I have been quite busy for the last few weeks, both at work and at home. Progress on the Irony project has been slow lately, but I'm trying to catch up. I want to share some thoughts on the future plans for Irony, list some near- and longer-term goals, and invite feedback on these plans.

  1. Final 1.0 Irony release by the end of the year. I think it's time to start wrapping it up. My initial plan was to cover everything in the very first release: compiler front-end, some code analysis, and IL code generation. Now I see that it's better to focus on the front-end only (scanner + parser + AST generation, a kind of Lex/Yacc equivalent) and to leave the rest to later versions. There will be no IL code generation in the initial release; sorry, folks, if you were waiting for it. More on code generation in a separate post. What I hope to implement instead is a simple interpreting engine that directly "executes" the AST. The AST node set would be minimal, just enough to support a simple expression language. You can see a sketch of this engine in the source code; for now the major part of it is a dynamic operator evaluator.

  2. Short-term plan - to post the updated version on the downloads page. The latest Irony sources are quite different from (and much better than) the code in the old release, and this is apparently becoming more and more confusing. As I noticed, most people download the Release version and so get quite a wrong impression of the current state of the project. Back in April I posted a big update to the sources with a completely refactored code base, a lot of improvements, and new functionality. However, I did not update the download version, because some features supported in the old download version (like the interpreter) were not yet supported in the refactored code. Now the current source version is finally up to the feature set of the old release. It contains an AST interpreter, although a quite limited one - only for the expression evaluator. The only thing lacking compared to the old download is the sample Scheme interpreter. I don't think this sample is important enough to hold back the upgrade of the download, so it's time to create a new, intermediate release - let's call it Alpha2. Expect it in the next 2-3 weeks. I will remove unused and research-stage classes, and brush up and clean up the code in general. The next pre-release drop after Alpha2 should already be a BETA, with the complete feature set for the final release of Irony 1.0.
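
To make the idea of an engine that directly "executes" the AST concrete, here is a minimal sketch in Python. It is not Irony's code or API: the node classes `Num` and `BinOp`, the `OPERATORS` table, and the `evaluate` function are all hypothetical names chosen for illustration. The operator is looked up by its symbol at run time, which is one simple reading of a "dynamic operator evaluator".

```python
import operator
from dataclasses import dataclass

# Hypothetical minimal AST node set for an expression language.
@dataclass
class Num:
    value: float

@dataclass
class BinOp:
    op: str
    left: object
    right: object

# Operator implementations are looked up by symbol at evaluation time.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate(node):
    """Walk the tree and compute its value; no code generation involved."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, BinOp):
        fn = OPERATORS[node.op]  # raises KeyError for an unknown operator
        return fn(evaluate(node.left), evaluate(node.right))
    raise TypeError(f"unsupported node type: {type(node).__name__}")

# 1 + 2 * 3, with precedence already resolved by the parser
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
result = evaluate(tree)
```

A real engine would also need variables, assignment, and type-dependent operator dispatch; the sketch only covers arithmetic on numbers.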

Feature set planned for 1.0 release
Here's a list of features I'm planning to implement before the 1.0 release.

Parser/Scanner
The first goal here is to complete Irony's terminal set so that it covers most of the common, and maybe some unusual, lexical constructs of popular languages like Python, Ruby, C#, Basic, JavaScript and others. I'm looking closely at different languages, trying to identify all the special cases of syntax not yet supported by standard Irony terminals. If you can recommend such constructs, please let me know. I might add sample grammars for more languages and maybe revive the Python and Ruby samples. The purpose of this is not to have complete grammars for these languages but rather to play with some fancy lexical/syntactic constructs from them and to check that Irony terminals can support these cases. Some of these terminals, token types and more general facilities:
  • Date literal in Visual Basic (example: #01/05/2005#). A simple but important enough case. I'm thinking about creating a generic delimited Literal terminal (delimited by a start/end symbol, like # for VB dates) that can support this VB date and maybe some literals in other languages - any suggestions of such literals?
  • Support for the exponent symbol (D vs E) that identifies the float number type. For example, in Scheme and Fortran "1.5D01" denotes a double-precision value, as opposed to "1.5E01", which is a single-precision number.
  • Implicit operator/precedence hint - something that Yacc has, and it might be important for some cases.
  • Create a facility for processing documentation strings (Python) or XML comments (C#). Maybe implement a Doc grammar hint? - might be useful for Python doc strings, which are essentially regular string literals but become doc pieces when they appear in certain places.
  • CompilerDirective terminal for C#, with the ability to analyze boolean expressions over defined symbols in the "#if" directive. This would require the ability to define sub-grammars inside the main grammar to parse and evaluate these expressions.
  • Templated string literal with embedded expressions, like Ruby's "Name: #{customer.Name}" - this also requires sub-grammar facility.
  • Other exotic terminals, such as Ruby's HereDoc.
  • Thinking also about implementing a Wiki grammar/parser - might be an interesting case.
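
As an illustration of the generic delimited literal idea from the list above, here is a small Python sketch. It is not an Irony terminal: the class name `DelimitedLiteral`, its `try_match` method, and the `vb_date` instance are hypothetical. The sketch assumes the literal cannot contain its own end delimiter and that VB date literals use the month/day/year order.

```python
from datetime import date, datetime
from typing import Callable, Optional, Tuple

class DelimitedLiteral:
    """Hypothetical terminal: text between a start and an end delimiter."""

    def __init__(self, name: str, start: str, end: str,
                 convert: Callable[[str], object] = str):
        self.name = name
        self.start = start
        self.end = end
        self.convert = convert  # turns the raw body into a value

    def try_match(self, text: str, pos: int) -> Optional[Tuple[object, int]]:
        """Return (value, position after the literal), or None if no match."""
        if not text.startswith(self.start, pos):
            return None
        body_start = pos + len(self.start)
        end_pos = text.find(self.end, body_start)
        if end_pos < 0:
            return None  # unterminated literal
        body = text[body_start:end_pos]
        return self.convert(body), end_pos + len(self.end)

def parse_vb_date(body: str) -> date:
    # Assumption: month/day/year, as in #01/05/2005# (January 5, 2005).
    return datetime.strptime(body, "%m/%d/%Y").date()

vb_date = DelimitedLiteral("DateLiteral", "#", "#", parse_vb_date)
match = vb_date.try_match("x = #01/05/2005# + 1", 4)
```

The same class could be instantiated with other delimiter pairs for other languages; only the delimiters and the conversion function change.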

Other features

  • Localization support - put all error messages into localizable resources.
  • Symbol table implementation
  • Interpreter for direct interpretation of the AST; no code analysis or IL code generation. Implementing a basic runtime and object model for the interpreter; basic generic infrastructure, not necessarily implementations for specific languages. Basic AST node set, maybe only for the expression evaluator.
  • Template processor to support processing of files like Ruby's .rhtml templates.
  • Moving to VS 2010 when it's out, with support for the DynamicObject facility from .NET 4.0.
  • Finish VS Integration support implementation.
  • Write XML documentation for core classes - this is a huge effort; I will also try to create an introductory quick-start guide.

Asking for feedback

Please let me know what you think about all this. If you have an idea or suggestion about features that you think should be included in the release, please post a comment or shoot me an email. I'm especially looking for suggestions about exotic lexical constructs found in existing languages that are not yet supported by Irony terminals; if you know of such a construct, please let me know.


9 comments:

  1. One thing I'd like to see is to have Irony leverage Microsoft's Common Compiler Infrastructure (http://cciast.codeplex.com/; there are actually 4 CodePlex projects for this, that's just the main one).

    I haven't delved too deeply into either code base, but I believe what you'd mostly have to do is, when building the AST, use their AST classes instead of your own.

    Microsoft's team is largely working on the AST -> executable portion, so if we can interface with that, the long-term goal could be in sight.

  2. Thanks for your feedback!
    I've looked at CCI before and talked to Herman V. at Lang.NET. One thing that scared me is the size of the whole thing - it seems like a lot of stuff.
    What is not clear to me is the advantage of using CCI instead of just using the IL code generator in the core .NET framework - CCI appears to be a superset of this functionality. It looks like CCI covers some very advanced and complex cases of producing .NET assemblies and can be used in tools like IL rewriters for injecting things like aspect-oriented artifacts. I'm not sure most people need the level of granular control it provides; I think the basic facilities in .NET will mostly be enough. Another thing that discouraged me is the fact (admitted by Herman) that there is no code analysis in CCI, so the benefits of integrating with it are questionable to me. Finally, the AST appears to be geared heavily towards compiled, statically typed languages like C#, while most folks are interested in dynamic languages.
    That's my impression... maybe I'm missing something

  3. My biggest request right now would be to polish up the error messages reported or at least offer some routines or documentation that end users can use to clean them up. Currently, when a user of my language service has a syntax error in a pretty generic spot, they are shown an extremely long error message that gives them every possible terminal they could enter.

    Aside from that, I'd say focus on documenting what's in there for both end users and people who want to poke around the guts of what you have :)

    As for the VS Integration, in my current project I've had to bypass much of what is there (the line scanner hooks and the token editor stuff) mainly because most of my coloring is coming from a catalog database that was built from the entire parse tree.

    For simpler cases, I still think a separate integration project is in order. Perhaps a DSL-based project where the designer can build their grammar via a diagram and specify TokenEditorInfo through that (which will propagate to other classes).

    -Salec

  4. Thanks for your suggestions. I will definitely look into improving error reporting by Scanner. Just to make sure you didn't miss it: there is already a facility for collapsing the expected symbol lists. If you specify the DisplayName property on a NonTerminal, the parser will collapse all symbols/terms under it into this single name. For example:

    BinOp.Rule = Symbol("+") | "-" | "*" | "/";
    BinOp.DisplayName = "operator";

    Now if the parser finds an unexpected symbol in a place where it expects a binary operator, it will not report the entire list of operator symbols but will say "Expected: operator" instead.
    Did you try to play with this?
    Roman
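
The collapsing behavior described in this comment can be sketched as follows. This is a Python illustration, not Irony's implementation: the function `expected_message` and the `display_names` mapping are hypothetical, and the sketch assumes a group is collapsed only when every one of its terminals is expected.

```python
def expected_message(expected_terms, display_names):
    """Build an 'Expected: ...' message, collapsing named groups.

    display_names maps a display name to the set of terminals it covers
    (a hypothetical stand-in for DisplayName set on a non-terminal).
    """
    remaining = set(expected_terms)
    labels = []
    for name, members in display_names.items():
        if members and members <= remaining:
            remaining -= members   # hide the individual symbols
            labels.append(name)    # show the group name instead
    labels.extend(sorted(remaining))  # symbols not covered by any group
    return "Expected: " + ", ".join(labels)

groups = {"operator": {"+", "-", "*", "/"}}
short_message = expected_message({"+", "-", "*", "/"}, groups)
mixed_message = expected_message({"+", "-", "*", "/", ")"}, groups)
```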

  5. How will your lexer handle languages like F#? In F# whitespace matters. Also, how good is your parser at handling less than perfect code? I need a parser which can give me as much information as possible about an incomplete class - something similar to what IntelliSense does.

  6. I did miss it! Thanks, this will be a huge help in simplifying these messages.

    Another feature that would be extremely useful when using the library as part of an editor would be better support for incremental parsing. This is kinda tricky, but it would be nice to not have to rebuild the entire parse tree if a small portion of the source changes. This would then allow me to do incremental updates to my catalog as opposed to rebuilding the entire thing.

    -Salec

  7. About F# - good suggestion, I will look at it; it might be an interesting case and may uncover something. Generally Irony supports whitespace-sensitive languages; all you need to do is clear the WhitespaceChars property in the grammar.
    Less-than-perfect code: it is handled by parser recovery. You can specify rules with the Error terminal inside, and the parser will try to recover using these points of recovery and continue parsing after that. Try it with the C# sample: create several syntax errors in a source file and parse it.
    Incremental parsing - that's a tough one. May I suggest an alternative approach? You keep the "last good" parse tree and wait until on-the-fly parsing succeeds while the user types something. Then you run a comparison procedure that compares the last-good and current trees, detects differences in some smart way (maybe using the current editing position to point to the location of the changes), and then posts the delta to the catalog. It seems better to catch the differences when the parse tree is complete, rather than guess at the parse level what we're currently changing.
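
A rough sketch of the comparison step suggested above, in Python. The `Node` class and the `diff` function are hypothetical and deliberately naive: trees are compared position by position, and when a node's kind, text, or number of children differs, the whole subtree is reported as changed. It does not use the editing position and is not Irony code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

def diff(old: Node, new: Node, path: Tuple[int, ...] = ()):
    """Return a list of (path, old_subtree, new_subtree) for changed subtrees.

    path is the sequence of child indexes leading from the root to the change.
    """
    if (old.kind != new.kind or old.text != new.text
            or len(old.children) != len(new.children)):
        return [(path, old, new)]
    changes = []
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        changes.extend(diff(o, n, path + (i,)))
    return changes

last_good = Node("class", "A", [Node("method", "Foo"), Node("method", "Bar")])
current = Node("class", "A", [Node("method", "Foo"), Node("method", "Baz")])
delta = diff(last_good, current)  # only the second method is reported
```

The resulting delta is what would be posted to the catalog; a production version would need a smarter match for inserted or removed children.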

  8. About less than perfect code - if I understand it right, you are saying that I need a C# grammar extended by error recovery rules for every case I want to recover from - wow. What I really need is a small subset of the grammar - only types, type members (names, signatures) and decorations (attributes) on them. I do not care about statements or expressions - but I do want to be able to extract the information I need no matter how many semicolons are missing. I understand that you can only go so far with this - a missing } can throw the parser off completely. I have done it in my designers using Devin Cook's parsing system (http://www.devincook.com/goldparser/), but his project seems somewhat stale and I am looking for an alternative. Are you it?

  9. What do you mean by "wow"? Does it sound like too much? In fact, in the C# grammar there's just a single error rule:

    statement.ErrorRule = SyntaxError + ";";

    It tells the parser to skip forward to the next semicolon when it encounters an error, and to restart parsing at the next statement. Not much at all.
    Roman
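
The skip-to-semicolon recovery described in this comment can be illustrated with a toy parser in Python. This is not Irony's recovery mechanism: the functions `tokenize` and `parse_statements` are hypothetical, and the "language" is assumed to consist only of statements of the form `name = number ;`.

```python
import re

def tokenize(src):
    # Identifiers, integers, or any single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)

def parse_statements(tokens):
    """Parse 'name = number ;' statements, recovering at the next ';'.

    Returns (statements, errors), where errors holds the token index at
    which each failed statement started.
    """
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        stmt = tokens[pos:pos + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            statements.append((stmt[0], int(stmt[2])))
            pos += 4
            continue
        # Syntax error: record it, skip forward to the next semicolon,
        # and restart parsing with the statement that follows it.
        errors.append(pos)
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1  # step past the ';' itself
    return statements, errors

statements, errors = parse_statements(tokenize("a = 1; b = = 2; c = 3;"))
```

The broken middle statement is reported once and the statements around it are still parsed. As comment 8 points out, this only goes so far: a missing semicolon makes the parser skip into the following statement.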
