An Overview of Parser Generator Tools for Java

Com S 440/540
Spring 2001
Dept. of Computer Science
Iowa State University
February 20, 2001

Curtis Clifton

 


Some Java-based Parser Generators

Several parser generator tools are available for Java. Table 1 lists LALR(1) (i.e., bottom-up) tools, and Table 2 lists pred-LL(k) (i.e., predicated LL(k), top-down) tools.

TABLE 1. LALR(1) Parser Generators

Tool                                            URL
---------------------------------------------   --------------------------------------------
CUP, Java-based Constructor of Useful Parsers   www.cs.princeton.edu/~appel/modern/java/CUP
                                                (see also the companion scanner generator JLex,
                                                www.cs.princeton.edu/~appel/modern/java/JLex)
jay, YACC for Java                              www.inf.uos.de/bernd/jay
SableCC, The Sable Compiler Compiler            www.sablecc.org

TABLE 2. pred-LL(k) Parser Generators

Tool                                            URL
---------------------------------------------   --------------------------------------------
ANTLR, Another Tool for Language Recognition    www.antlr.org
JavaCC, Java Compiler Compiler                  www.metamata.com/javacc

Choosing a Parser Generator
Reasons for Choosing a pred-LL(k) Parser Generator

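  • Many feel that pred-LL(k) grammars are easier to read than LR(1) grammars, especially after semantic actions are added.
  • Grammars are written in EBNF, so repetition and optional parts are expressed directly with ( ... )*, ( ... )+ and ( ... )? instead of extra recursive productions.
  • Rules can take arguments passed down from the invoking rule and can return results to it.
  • Semantic actions can be embedded anywhere within a production without weakening the parser generator. This is a technical point: an LR tool implements an action in the middle of a production by inserting an extra empty production, which can introduce new conflicts.
  • They are more powerful when we consider language translation rather than mere language recognition.
  • The generated code, a set of recursive-descent methods, is easier to read.
  • Extending the concepts learned in class to a different tool helps you learn them more deeply.
  • Your TA is more familiar with these tools.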
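The point about passing arguments down and returning results can be made concrete with a small hand-written recursive-descent sketch in Java, the style of code a top-down tool produces. The rule <number> := digit ( digit )* and the names NumberParser, number, moreDigits and digit are hypothetical and not taken from ANTLR or JavaCC. The partial value travels down as the argument acc and the finished value is returned back up (compare the left-recursive number rule in the YACC translation, which builds the same value bottom-up).

```java
// Hypothetical recursive-descent parser for  <number> := digit ( digit )*
// showing a value passed DOWN as an argument and a result RETURNED up.
class NumberParser {
    private final String input;
    private int pos = 0;

    NumberParser(String input) {
        this.input = input;
    }

    // <number> := digit ( digit )*
    int number() {
        int first = digit();        // at least one digit is required
        return moreDigits(first);   // hand the partial value down
    }

    // ( digit )* : 'acc' is the value of the digits seen so far
    private int moreDigits(int acc) {
        if (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            return moreDigits(10 * acc + digit());
        }
        return acc;                 // no more digits: pass the result back up
    }

    // Consume one decimal digit and return its value.
    private int digit() {
        if (pos >= input.length() || !Character.isDigit(input.charAt(pos))) {
            throw new IllegalArgumentException("digit expected at position " + pos);
        }
        return input.charAt(pos++) - '0';
    }

    public static void main(String[] args) {
        System.out.println(new NumberParser("440").number());   // 440
    }
}
```

A real pred-LL(k) tool would normally turn ( digit )* into a loop; the recursive call is used here only to make the downward argument visible.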

Reasons for Choosing an LALR(1) Parser Generator

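  • More closely matches what will be covered in class.
  • You won't have to modify the given Trend grammar. (pred-LL(k) tools cannot accept left-recursive productions, so an LR-style grammar usually has to be rewritten first.)
  • A greater variety of references is available.
  • Your instructor is more familiar with these tools.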

Choosing Among the Options

When comparing the tools, look at their:

  • documentation
  • example code
  • support (e.g., newsgroups)
  • portability
A Few Examples
A Simple Grammar in EBNF

This grammar is taken from the CUP user manual.

<list> := ( <part> )+

<part> := <expr> ';'

<expr> := <expr> '+' <expr>
| <expr> '-' <expr>
| <expr> '*' <expr>
| <expr> '/' <expr>
| <expr> '%' <expr>
| '(' <expr> ')'
| '-' <expr>
| number
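As written, the <expr> productions are ambiguous: an input such as 1 - 2 - 3 or 2 + 3 * 4 has two parse trees, and the trees evaluate differently. Each translation below has to select the conventional reading (left-associative operators, with *, / and % binding more tightly than + and -), which is the first alternative on each line:

```latex
\begin{aligned}
1 - 2 - 3 &:\quad (1 - 2) - 3 = -4 \quad\text{vs.}\quad 1 - (2 - 3) = 2 \\
2 + 3 * 4 &:\quad 2 + (3 \times 4) = 14 \quad\text{vs.}\quad (2 + 3) \times 4 = 20
\end{aligned}
```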

Translation into YACC

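The grammar above is ambiguous, so this YACC translation contains shift/reduce conflicts. YACC resolves them with the %left declarations (each %left line is left-associative, and operators declared on later lines bind more tightly) and, for unary minus, with the %prec UMINUS annotation, which gives that production the precedence of UMINUS.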

%{
#include <stdio.h>
#include <ctype.h>

int regs[26];

int yylex(void);
void yyerror(const char *s);
%}

%start list

%token DIGIT LETTER

%left '+' '-'
%left '*' '/' '%'
%left UMINUS /* supplies precedence of unary minus */

%% /* beginning of rules section */

list : part /* one or more parts, as in ( <part> )+ */
| list part
;

part : expr ';'
{ printf( "=%d\n", $1 ); }
;

expr : expr '+' expr
{ $$ = $1 + $3; }
| expr '-' expr
{ $$ = $1 - $3; }
| expr '*' expr
{ $$ = $1 * $3; }
| expr '/' expr
{ $$ = $1 / $3; }
| expr '%' expr
{ $$ = $1 % $3; }
| number
| '(' expr ')'
{ $$ = $2; }
| '-' expr %prec UMINUS
{ $$ = - $2; }
;

number : DIGIT
{ $$=$1; }
| number DIGIT
{ $$ = 10*$1 + $2;}
;

%% /* start of programs */

int yylex(void) {
    int c;

    /* skip blanks, tabs, and newlines */
    while( (c=getchar()) == ' ' || c == '\t' || c == '\n' ) {
    }

    if( c == EOF ) {
        return( 0 ); /* 0 tells yyparse() that the input has ended */
    }

    if( islower( c ) ) {
        yylval = c - 'a';
        return( LETTER );
    }

    if( isdigit( c ) ) {
        yylval = c - '0';
        return( DIGIT );
    }

    return( c ); /* single-character tokens such as '+' and ';' */
}

int main(void) {
    return( yyparse() );
}

void yyerror(const char *s) {
    fprintf( stderr, "%s\n", s );
}
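The shift/reduce choices that the %left declarations settle can be illustrated outside YACC. The following Java sketch is an operator-precedence evaluator, not YACC's table-driven LALR(1) algorithm, but it applies the same rule at each conflict: if the operator already on the stack binds more tightly than the incoming one, or equally tightly and is left-associative, reduce; otherwise shift. The class name PrecedenceEval and the internal '~' marker for unary minus are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Operator-precedence shift/reduce evaluator (a sketch, not YACC's LALR(1)
// tables). Precedence levels mirror the %left lines of the YACC grammar.
class PrecedenceEval {
    private static final char UMINUS = '~';   // hypothetical marker for unary minus

    private static int prec(char op) {
        switch (op) {
            case '+': case '-': return 1;             // %left '+' '-'
            case '*': case '/': case '%': return 2;   // %left '*' '/' '%'
            case UMINUS: return 3;                    // %left UMINUS
            default: return 0;                        // '(' is never reduced here
        }
    }

    static int eval(String s) {
        Deque<Integer> vals = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        boolean expectOperand = true;   // at start, after an operator, or after '('
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) continue;
            if (Character.isDigit(c)) {                 // shift a number
                int n = 0;
                while (i < s.length() && Character.isDigit(s.charAt(i))) {
                    n = 10 * n + (s.charAt(i++) - '0');
                }
                i--;                                    // loop header advances i again
                vals.push(n);
                expectOperand = false;
            } else if (c == '(') {
                ops.push(c);
                expectOperand = true;
            } else if (c == ')') {
                while (ops.peek() != '(') reduce(vals, ops);
                ops.pop();                              // discard the '('
                expectOperand = false;
            } else if (c == '-' && expectOperand) {
                ops.push(UMINUS);                       // prefix operator: just shift
            } else {
                // Binary operator: reduce while the stacked operator binds at least
                // as tightly (>= because every operator is left-associative), then shift.
                while (!ops.isEmpty() && prec(ops.peek()) >= prec(c)) reduce(vals, ops);
                ops.push(c);
                expectOperand = true;
            }
        }
        while (!ops.isEmpty()) reduce(vals, ops);       // reduce whatever is left
        return vals.pop();
    }

    // Pop one operator and its operand(s) and push the result: a "reduce".
    private static void reduce(Deque<Integer> vals, Deque<Character> ops) {
        char op = ops.pop();
        if (op == UMINUS) {
            vals.push(-vals.pop());
            return;
        }
        int right = vals.pop();
        int left = vals.pop();
        switch (op) {
            case '+': vals.push(left + right); break;
            case '-': vals.push(left - right); break;
            case '*': vals.push(left * right); break;
            case '/': vals.push(left / right); break;
            case '%': vals.push(left % right); break;
            default: throw new IllegalStateException("unexpected operator " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("1-2-3"));   // -4
    }
}
```

For example, eval("1-2-3") reduces 1-2 before shifting the second '-', giving -4, while eval("2+3*4") shifts '*' because it outranks '+', giving 14. The sketch does no error checking on malformed input.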
Translation into CUP

This is taken from the java_cup/simple_calc package of the CUP distribution. Recall that CUP is an LALR(1) parser generator.

// JavaCup specification for a simple expression evaluator (w/ actions)

package java_cup.simple_calc;

import java_cup.runtime.*;

/* Terminals (tokens returned by the scanner). */
terminal SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;
terminal UMINUS, LPAREN, RPAREN;
terminal Integer NUMBER;

/* Non terminals */
non terminal Object expr_list, expr_part;
non terminal Integer expr;

/* Precedences */
precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE, MOD;
precedence left UMINUS, LPAREN;

/* The grammar */
expr_list ::= expr_list expr_part
|
expr_part
;

expr_part ::= expr:e
{: System.out.println("= " + e); :}
SEMI
;

expr ::= expr:e1 PLUS expr:e2
{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
|
expr:e1 MINUS expr:e2
{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
|
expr:e1 TIMES expr:e2
{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
|
expr:e1 DIVIDE expr:e2
{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
|
expr:e1 MOD expr:e2
{: RESULT = new Integer(e1.intValue() % e2.intValue()); :}
|
NUMBER:n
{: RESULT = n; :}
|
MINUS expr:e
{: RESULT = new Integer(0 - e.intValue()); :}
%prec UMINUS
|
LPAREN expr:e RPAREN
{: RESULT = e; :}
;
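The CUP specification covers only the parser; its terminals must come from a separately written scanner, playing the role that yylex plays in the YACC version. Below is a stand-alone Java sketch of such a scanner. It rests on an assumption made for illustration: a real CUP scanner returns java_cup.runtime.Symbol objects whose codes come from the generated sym class, whereas here a hypothetical Kind enum and Token class stand in so the code runs without the CUP runtime. The scanner never produces UMINUS, since that terminal exists only to give unary MINUS its precedence via %prec.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone scanner for the terminals of the CUP spec above.
// A real CUP scanner would return java_cup.runtime.Symbol objects; the Kind
// enum and Token class here are stand-ins so the sketch runs on its own.
class CalcScanner {
    // Same names as the CUP terminals, minus UMINUS (used only for %prec).
    enum Kind { SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD, LPAREN, RPAREN, NUMBER, EOF }

    static final class Token {
        final Kind kind;
        final Integer value;  // non-null only for NUMBER ("terminal Integer NUMBER")

        Token(Kind kind, Integer value) {
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return value == null ? kind.toString() : kind + "(" + value + ")";
        }
    }

    static List<Token> scan(String in) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) {   // skip blanks, tabs, newlines
                i++;
                continue;
            }
            if (Character.isDigit(c)) {        // NUMBER carries its Integer value
                int start = i;
                while (i < in.length() && Character.isDigit(in.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, Integer.valueOf(in.substring(start, i))));
                continue;
            }
            Kind kind;
            switch (c) {
                case ';': kind = Kind.SEMI; break;
                case '+': kind = Kind.PLUS; break;
                case '-': kind = Kind.MINUS; break;   // unary or binary: the parser decides
                case '*': kind = Kind.TIMES; break;
                case '/': kind = Kind.DIVIDE; break;
                case '%': kind = Kind.MOD; break;
                case '(': kind = Kind.LPAREN; break;
                case ')': kind = Kind.RPAREN; break;
                default: throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
            tokens.add(new Token(kind, null));
            i++;
        }
        tokens.add(new Token(Kind.EOF, null));   // end-of-input marker
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("12 + 3;"));   // [NUMBER(12), PLUS, NUMBER(3), SEMI, EOF]
    }
}
```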
Translation into ANTLR

Recall that ANTLR is a pred-LL(k) parser generator.

//------------------------------------------------------------
// PARSER
//------------------------------------------------------------

class MyParser extends Parser;
options {
k = 2;
}

list
:
( part )+
;

part
{ int res; }
:
res=expr SEMI { System.out.println( "=" + res ); }
;

expr returns [int res]
{
int right;
res = 0;
}
:
res=timesExpr
(
PLUS right=timesExpr { res += right; }
|
MINUS right=timesExpr { res -= right; }
)*
;

timesExpr returns [int res]
{
int right;
res = 0;
}
:
res=unaryExpr
(
STAR right=unaryExpr { res *= right; }
|
SLASH right=unaryExpr { res /= right; }
|
PERCENT right=unaryExpr { res %= right; }
)*
;

unaryExpr returns [int res]
{
res = 0;
}
:
MINUS res=unaryExpr { res = -res; }
|
res=primaryExpr
;

primaryExpr returns [int res]
{
res = 0;
}
:
n:NUMBER { res = Integer.parseInt( n.getText() ); }
|
LPAREN res=expr RPAREN
;

//------------------------------------------------------------
// LEXER
//------------------------------------------------------------

{ /* Lexer specific imports */
}

class MyLexer extends Lexer;
options {
k = 2;
charVocabulary = '\3'..'\377'; // 8-bit characters only, not full Unicode
}

// custom members for the lexer class
{
}

WS :
(
' '
|
'\t'
|
(
'\n'
|
'\r'
)
{ newline(); }
)+
{ $setType(Token.SKIP); }
;

NUMBER :
('0'..'9')+
;

PLUS : '+' ;
MINUS : '-' ;
STAR : '*' ;
SLASH : '/' ;
PERCENT : '%' ;
SEMI : ';' ;
LPAREN : '(' ;
RPAREN : ')' ;
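Because pred-LL(k) tools generate recursive-descent code, the ANTLR grammar above corresponds closely to a hand-written Java parser with one method per rule. The sketch below is written by hand and only approximates the shape of ANTLR's output; the generated code uses ANTLR's own runtime classes, which are omitted here. The class name CalcParser is hypothetical, and the method primaryExpr, which handles numbers and parenthesized expressions, is named here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written recursive-descent evaluator, one method per grammar rule.
class CalcParser {
    private final String in;
    private int pos = 0;

    CalcParser(String in) { this.in = in; }

    // list : ( part )+ ;   returns the value of each "expr ;"
    List<Integer> list() {
        List<Integer> results = new ArrayList<>();
        do {
            results.add(part());
        } while (peek() != '\0');
        return results;
    }

    // part : expr SEMI ;
    private int part() {
        int res = expr();
        match(';');
        return res;
    }

    // expr : timesExpr ( ( PLUS | MINUS ) timesExpr )* ;
    private int expr() {
        int res = timesExpr();
        while (peek() == '+' || peek() == '-') {
            char op = next();
            int right = timesExpr();
            res = (op == '+') ? res + right : res - right;
        }
        return res;
    }

    // timesExpr : unaryExpr ( ( STAR | SLASH | PERCENT ) unaryExpr )* ;
    private int timesExpr() {
        int res = unaryExpr();
        while (peek() == '*' || peek() == '/' || peek() == '%') {
            char op = next();
            int right = unaryExpr();
            if (op == '*') res *= right;
            else if (op == '/') res /= right;
            else res %= right;
        }
        return res;
    }

    // unaryExpr : MINUS unaryExpr | primaryExpr ;
    private int unaryExpr() {
        if (peek() == '-') { next(); return -unaryExpr(); }
        return primaryExpr();
    }

    // primaryExpr : NUMBER | LPAREN expr RPAREN ;
    private int primaryExpr() {
        if (peek() == '(') {
            next();
            int res = expr();
            match(')');
            return res;
        }
        if (!Character.isDigit(peek())) {
            throw new IllegalArgumentException("number or '(' expected at position " + pos);
        }
        int res = 0;
        while (pos < in.length() && Character.isDigit(in.charAt(pos))) {
            res = 10 * res + (in.charAt(pos++) - '0');
        }
        return res;
    }

    // Next non-blank character, not consumed; '\0' marks end of input.
    private char peek() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    // Consume and return the next non-blank character.
    private char next() {
        char c = peek();
        pos++;
        return c;
    }

    private void match(char expected) {
        if (peek() != expected) {
            throw new IllegalArgumentException("'" + expected + "' expected at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) { System.out.println(new CalcParser("1-2-3; (2+3)*4;").list()); }
}
```

For example, new CalcParser("1-2-3; (2+3)*4;").list() returns [-4, 20]. The while loops in expr and timesExpr are what give left associativity without left recursion, which is why the grammar uses ( ... )* rather than the left-recursive productions of the YACC and CUP versions.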
References

Here are some references in addition to the web pages for the tools listed in Tables 1 and 2.

  1. Terence J. Parr and Russell W. Quong. LL and LR Translators Need k>1 Lookahead. ACM SIGPLAN Notices 31(2), February 1996.
  2. Terence John Parr. Obtaining Practical Variants of LL(k) and LR(k) for k>1 by Splitting the Atomic k-Tuple. PhD thesis, Purdue University, West Lafayette, Indiana, August 1993.
  3. S. C. Johnson. Yacc: Yet Another Compiler-Compiler. Computing Science Technical Report 32, Bell Laboratories, Murray Hill, NJ, 1978.