It is a bit odd how primitive parsing support is in Emacs. It is one of those mysteries, like how window configuration and manipulation support can be so weak. Peculiar.
CEDET includes a parser generator, called wisent. That is long overdue… though even it is a bit odd, apparently preferring a yacc-ish input syntax. I don’t know about you, but when I have a lisp system just sitting there, I reflexively reach for sexp-based formats. Well, ok, it is a port of bison. But still.
I did a little parser hacking in gdb recently. In gdb, if you complete an expression involving a field lookup, it will currently print every matching symbol in your program — when all you really wanted was the completion of a field name. This is what I set out to fix.
My first idea was: hey, the parser knows what tokens are valid. I can just ask it! But, I don’t think there’s a way to do that with bison parsers. At least, no documented way — boo. And anyway, as it turns out, this is not what you want.
For instance, consider the simple case of “p pointer->field”. This is syntactically valid as-is, so the parser would indicate that the desired completions are whatever can come next: say, an operator. But if the cursor is just after the “d”, you want to continue completing on the field name. So, you have to differentiate this case based on whitespace.
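To make the whitespace point concrete, here is a small Python sketch. It is not gdb code; the token pattern and both function names are made up for illustration. It shows that a lexer which discards whitespace produces the same tokens whether or not the cursor is still touching the field name, so the token stream alone cannot settle the question.

```python
import re

# Hypothetical toy token pattern: the "->" and "." operators, and names.
TOKEN_RE = re.compile(r"->|\.|[A-Za-z_]\w*")


def tokens(text):
    """Tokenize text the way a typical lexer does, dropping whitespace."""
    return TOKEN_RE.findall(text)


def cursor_touches_last_token(text):
    """True if the cursor (assumed to sit at the end of text) directly
    follows a non-whitespace character."""
    return bool(text) and not text[-1].isspace()


still_typing = "pointer->field"    # cursor right after the "d"
moved_on = "pointer->field "       # cursor after a space

# Both inputs lex to exactly the same tokens...
same_tokens = tokens(still_typing) == tokens(moved_on)

# ...and only the trailing whitespace tells them apart.
differ_by_whitespace = (
    cursor_touches_last_token(still_typing)
    and not cursor_touches_last_token(moved_on)
)
```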
I ended up hacking the lexer as well as the parser. The lexer can now return a special COMPLETE token, which it does depending on the previous tokens and the presence or absence of whitespace. I also added some new productions like:
    expression: expression '.' name COMPLETE
    expression: expression '.' COMPLETE
From here it is pretty simple to solve the rest of the problem.
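As a rough illustration of the overall shape, here is a toy version in Python. This is a sketch under stated assumptions, not the actual gdb lexer or grammar: the function names, the token pattern, and the field list are all hypothetical, and the "parser" only looks at the tail of the token list instead of parsing a full expression. The only name borrowed from the real thing is the COMPLETE token.

```python
import re

TOKEN_RE = re.compile(r"->|\.|[A-Za-z_]\w*")   # toy token pattern (assumption)
STRUCTOPS = (".", "->")
COMPLETE = "COMPLETE"                          # marker token, as in the post


def lex_for_completion(text):
    """Tokenize text, with the cursor assumed to be at the end, and append
    COMPLETE when a field name is being completed."""
    toks = TOKEN_RE.findall(text)
    trailing_space = text[-1:].isspace()
    if toks and toks[-1] in STRUCTOPS:
        # "expr." or "expr->": a field name has to come next.
        toks.append(COMPLETE)
    elif len(toks) >= 2 and toks[-2] in STRUCTOPS and not trailing_space:
        # "expr.nam" with the cursor still touching the name.
        toks.append(COMPLETE)
    return toks


def field_prefix(toks):
    """Stand-in for the two new productions. Returns the partial field name
    typed so far, or None if the input does not end in either production."""
    if toks[-1:] != [COMPLETE]:
        return None
    # expression '.' name COMPLETE
    if len(toks) >= 3 and toks[-3] in STRUCTOPS and toks[-2] not in STRUCTOPS:
        return toks[-2]
    # expression '.' COMPLETE
    if len(toks) >= 2 and toks[-2] in STRUCTOPS:
        return ""
    return None


def complete(text, fields):
    """Return the fields matching the partial name, or [] when the cursor
    is not in a field-completion position."""
    prefix = field_prefix(lex_for_completion(text))
    if prefix is None:
        return []
    return [f for f in fields if f.startswith(prefix)]


# Hypothetical fields of whatever struct "pointer" points to.
FIELDS = ["first", "field", "flags", "next"]
candidates = complete("pointer->fi", FIELDS)
```

In this sketch, "pointer->" offers every field, "pointer->fi" offers only the fields starting with "fi", and "pointer->field " (with a trailing space) offers no field completions at all. In a real debugger the field list would come from the type of the expression to the left of the operator, which is the part the parser action is in a position to know.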
I don’t remember reading about this anywhere, but I’m sure it has been done before. I thought it was a pretty fun hack – I love problems that start with the user experience and end up someplace much deeper.