Saturday, April 21, 2012

Movimentum - ANTLR grammar for setup

I use ANTLR as my grammar tool, and I use .NET for development. At my age, I stick to the things I know.

Here is an ANTLR grammar for Movimentum. First, we need some ANTLR and .NET noise:

    grammar Movimentum;
   
    options {
        language=CSharp2;
    }
   
    @parser::header {
        using System.Collections.Generic;
    }
   
    @parser::namespace { Movimentum.Parser }
   
    @lexer::namespace  { Movimentum.Lexer }


Then, I like to start grammars top down:

    script
      : config
        objectdefinition*
        ( time
          constraint*
        )*
        EOF
      ;

    config
      : CONFIG '('
            NUMBER  // frames per time unit
        ',' unit    // angular unit
        ')'
      ;

At this point, we also need the first lexer definitions:

    CONFIG : '.config';

    NUMBER  
      : ('0'..'9')+
        ( '.'
          ('0'..'9')*
        )?
        ( ('E'|'e')
          ('-')?
          ('0'..'9')+
        )?
      ;
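
The repeated digit range can make NUMBER hard to scan. One possible, equivalent way to write it is sketched below; the fragment name DIGIT is an assumption for illustration and is not part of the grammar in this post. The sketch would replace the NUMBER rule above, not sit next to it.

```antlr
// Sketch only: an equivalent formulation of NUMBER using a helper fragment.
// DIGIT is a hypothetical name, not part of the original grammar.
fragment DIGIT
  : '0'..'9'
  ;

NUMBER
  : DIGIT+
    ( '.' DIGIT* )?                // optional fraction; "3." is allowed
    ( ('E'|'e') '-'? DIGIT+ )?     // optional exponent; '-' is the only sign
  ;

// Matched as one NUMBER:      25   3.   3.14   1e-3   2.5E10
// Not matched as one NUMBER:  .5    (no leading digit)
//                             1e+3  (no '+' allowed in the exponent)
```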

And before we forget it, we add rules for whitespace and comments:

    WHITESPACE
      : ( '\t' | ' ' | '\r' | '\n' )+ { $channel = HIDDEN; }
      ;
   
    COMMENT
      : '/' '/' .* ( '\r' | '\n' )    { $channel = HIDDEN; }
      ;
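
The COMMENT rule above relies on ANTLR 3 treating `.*` as non-greedy, and it needs a line break after every comment. A common alternative, shown here as a sketch and not as the rule actually used in Movimentum, stops in front of the line break instead of consuming it, so a comment on the last line of a file without a trailing newline should still be recognized:

```antlr
// Sketch of an alternative COMMENT rule (it would replace the one above).
// The line break itself is not consumed; it is left to WHITESPACE.
COMMENT
  : '//' ~( '\r' | '\n' )*    { $channel = HIDDEN; }
  ;
```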


The next important things are the object definitions:

    objectdefinition
      : IDENT
        ':'
        source
        anchordefinition+
        ';'
      ;

We now need an IDENT in the lexer. For the moment, we restrict ourselves to ASCII letters in its definition:

    IDENT   
      : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
      ;

For the source, we currently define only a "filename" (I suspect I will want plain text later, and maybe even some standard shapes like arrows and lines):

    source
      : FILENAME
      ;

and

    FILENAME
      : '\''
        (~('\''))*
        '\''
      ;
   
Finally, we need anchordefinitions. We allow only constant vectors in them:

    anchordefinition
      : IDENT
        '='
        ( constvector
        | IDENT '+' constvector
        | IDENT '-' constvector
        )
      ;
   
where constvector is defined as

    constvector
      : '['
        ('-')? constscalar
        ','
        ('-')? constscalar
        ']'
      ;

    constscalar
      : NUMBER
      | constvector X
      | constvector Y
      ;


This requires our first operator definitions in the lexer:

    X : '.x';
    Y : '.y';

This should suffice to define objects and their anchors for the moment. Time to set up a .NET project, write the setup part of our slider-crank mechanism into a file, and check whether we can read it (after we comment out the time and constraint references in script)!
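
To see how the rules fit together, here is a hypothetical object definition mapped onto the objectdefinition rule. The object name, the anchor names, the file name and the numbers are all made up for illustration; the real mechanism file is not shown in this post.

```antlr
// Hypothetical input that objectdefinition should accept:
//
//   Crank : 'crank.gif'
//       P = [0, 0]
//       Q = P + [100, 0];
//
// How that input maps onto the rule (the rule itself is unchanged):
objectdefinition
  : IDENT              // Crank
    ':'
    source             // 'crank.gif', a FILENAME token
    anchordefinition+  // P = [0, 0]   and   Q = P + [100, 0]
    ';'
  ;
```

The second anchor uses the `IDENT '+' constvector` alternative of anchordefinition, so Q is given relative to P.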

UPDATE: Setting up the project and writing the first test case revealed two errors in the grammar above. First, a semicolon is missing at the end of the config rule. It must be:

    config
      : CONFIG '('
            NUMBER  // frames per time unit
        ',' unit    // angular unit
        ')'
        ';'
      ;

The second is an LL problem: the constscalar rule cannot decide between its second and third alternatives, because both start with constvector. Therefore, the common prefix has to be factored out:

    constscalar
      : NUMBER
      | constvector
        ( X
        | Y
        )
      ;
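
A few hypothetical inputs show why the factoring matters: the nested vector has to be read completely before the parser reaches the token that tells `.x` from `.y`. The sample vectors below are assumptions for illustration, not taken from the Movimentum test cases.

```antlr
// Hypothetical constvector inputs that exercise the factored rule:
//
//   [10, -2.5]        both components are plain NUMBERs
//   [[3, 4].x, -2]    first component: nested vector [3, 4] followed by X
//   [[3, 4].y, 0]     same prefix, but followed by Y
//
constscalar
  : NUMBER
  | constvector     // common prefix, parsed only once
    ( X             // '.x'
    | Y             // '.y'
    )
  ;
```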

In a test case, I can now parse the object definitions for the slider-crank mechanism flawlessly!
