just the lexer, ma’am
Thursday, October 29th, 2009
So in my compilers class I’m supposed to write a recursive descent parser for the WAE grammar. To remind you, this grammar looks like this:
WAEStart ::= WAE SEMI
WAE ::= num | {+ WAE WAE} | {- WAE WAE} | {with {id WAE} WAE} | id
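To make the grammar concrete, here is a minimal sketch of what a hand-written recursive descent parser for it might look like in plain Ruby. It is not the code from any of the versions mentioned in this post. The names (WAESketch, Parser, parse_wae and so on), the regex tokenizer, the multi-letter identifiers, and the array-based tree shape are all assumptions made for illustration.

```ruby
# Hypothetical sketch: a recursive descent parser for the WAE grammar.
# Every name here is made up for illustration.
module WAESketch
  # Assumption: integers, lowercase words, and the five punctuation tokens.
  TOKEN = /\d+|[a-z]+|[{}+\-;]/

  # Split the input into token strings. Characters that match no
  # alternative (including whitespace) are silently skipped.
  def self.tokenize(src)
    src.scan(TOKEN)
  end

  class Parser
    def initialize(tokens)
      @tokens = tokens
      @pos = 0
    end

    # WAEStart ::= WAE SEMI
    def parse_start
      tree = parse_wae
      expect(';')
      raise ArgumentError, 'trailing input after ;' unless @pos == @tokens.length
      tree
    end

    # WAE ::= num | {+ WAE WAE} | {- WAE WAE} | {with {id WAE} WAE} | id
    def parse_wae
      tok = advance
      case tok
      when /\A\d+\z/ then [:num, tok.to_i]
      when 'with' then raise ArgumentError, 'with is reserved'
      when /\A[a-z]+\z/ then [:id, tok]
      when '{' then parse_braced
      else raise ArgumentError, "unexpected token #{tok.inspect}"
      end
    end

    private

    # Parses the part of a braced form that follows the opening brace.
    def parse_braced
      op = advance
      node =
        case op
        when '+', '-'
          lhs = parse_wae
          rhs = parse_wae
          [op == '+' ? :add : :sub, lhs, rhs]
        when 'with'
          expect('{')
          name = advance
          unless /\A[a-z]+\z/.match?(name.to_s) && name != 'with'
            raise ArgumentError, "expected identifier, got #{name.inspect}"
          end
          bound = parse_wae
          expect('}')
          body = parse_wae
          [:with, name, bound, body]
        else
          raise ArgumentError, "unexpected operator #{op.inspect}"
        end
      expect('}')
      node
    end

    # Return the current token (nil at end of input) and move past it.
    def advance
      tok = @tokens[@pos]
      @pos += 1
      tok
    end

    def expect(text)
      tok = advance
      raise ArgumentError, "expected #{text.inspect}, got #{tok.inspect}" unless tok == text
    end
  end

  def self.parse(src)
    Parser.new(tokenize(src)).parse_start
  end
end
```

Each nonterminal gets one method, and each alternative of WAE is chosen by looking at a single token, which is what makes the grammar a comfortable fit for recursive descent. Under these assumptions, WAESketch.parse("{+ 1 2};") would return a nested array with :add at the root.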
I’ve created this in Python, using the PLY parser. That code has scoping issues, and is available on Google Code. I’ve also built this using Treetop, with some modifications for unary minus (a new rule, WAE ::= -WAE), and that’s available from Google Code, Bitbucket, and GitHub.
But I also wanted to try to build both a recursive descent parser and possibly a packrat parser by hand, and maybe one using a parser combinator library, so I’m working on another version of this grammar using rparsec. I chose rparsec because I needed a lexer with simple, easy-to-understand, useful output. There may well be a dozen rubygems that solve this problem, but rparsec is the one in “Practical Ruby Programming”, so that’s the one I used.
I also wanted to test out the pygments tool, which is a syntax highlighter that converts code for display in various formats. So, using pygments, my lexer looks like this:
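require 'rubygems'
require 'rparsec'

module WAEParser
  include RParsec
  extend RParsec::Parsers

  # An identifier is a single lowercase letter.
  Id = regexp(/[a-z]/)
  Num = number.token(:number)
  Reserved = Keywords.case_sensitive(%w{with exit})
  Ops = Operators.new(%w{+ - \{ \} ;})

  # longer keeps whichever alternative matches more input, so "with"
  # lexes as a keyword rather than as the identifier "w".
  Lexer = longer(Id.token(:id), Reserved.lexer) | Num | Ops.lexer

  # Lex tokens repeatedly, then require end of input.
  Lexeme = Lexer.lexeme << eof
end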
The method Lexeme.parse(string) produces the array of tokens.
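For readers without rparsec installed, the longest-match idea behind the Lexer line can be sketched with Ruby’s standard StringScanner. This is a stand-in, not rparsec: the module name WAELexSketch, the rule table, the integer-only number pattern, and the [kind, text] pairs are all assumptions, and rparsec’s own token objects are not plain arrays.

```ruby
require 'strscan'

# Hypothetical stand-in for the rparsec lexer, standard library only.
module WAELexSketch
  RULES = [
    [:id,       /[a-z]/],       # single lowercase letter, as in the post
    [:keyword,  /with|exit/],
    [:number,   /\d+/],         # assumption: integers only
    [:operator, /[+\-{};]/]
  ].freeze

  # Returns an array of [kind, text] pairs.
  def self.lex(src)
    scanner = StringScanner.new(src)
    tokens = []
    until scanner.eos?
      next if scanner.skip(/\s+/)

      # Try every rule at the current position without consuming input,
      # then keep the longest match.
      candidates = RULES.map { |kind, pattern| [kind, scanner.check(pattern)] }
                        .reject { |_kind, text| text.nil? }
      if candidates.empty?
        raise ArgumentError, "cannot lex at offset #{scanner.pos}"
      end

      kind, text = candidates.max_by { |_kind, matched| matched.length }
      scanner.pos += text.bytesize
      tokens << [kind, text]
    end
    tokens
  end
end
```

Trying all the rules and keeping the longest match is what lets "with" win over the one-letter identifier "w". With first-match-wins ordering, the identifier rule listed first would claim the "w" and leave "ith" behind.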