I’ve been digging through GCC’s C front end for the last couple of weeks, trying to understand how it works and how feasible some parts of my project are.
The parser itself is nice. It is a hand-written recursive descent parser, which means that it is “just plain code” and thus simple to debug. It is written in a typical object-oriented style, where every function in the parser takes a “c_parser” object pointer as its first parameter.
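To make the style concrete, here is a minimal sketch of a hand-written recursive descent parser in which every function takes the parser state as its first parameter. This is not GCC code: the “toy_parser” type, the tiny expression grammar, and all of the function names are made up for illustration, and the real c_parser works on tokens from the lexer rather than on raw characters.

```c
#include <ctype.h>

/* Hypothetical parser state, standing in for GCC's c_parser. */
typedef struct toy_parser {
    const char *cursor;  /* next unread character */
    int error;           /* set to 1 on the first syntax error */
} toy_parser;

long toy_parse_expr(toy_parser *parser);

void toy_skip_space(toy_parser *parser)
{
    while (isspace((unsigned char)*parser->cursor))
        parser->cursor++;
}

/* primary := NUMBER | '(' expr ')' */
long toy_parse_primary(toy_parser *parser)
{
    long value = 0;

    toy_skip_space(parser);
    if (*parser->cursor == '(') {
        parser->cursor++;
        value = toy_parse_expr(parser);
        toy_skip_space(parser);
        if (*parser->cursor == ')')
            parser->cursor++;
        else
            parser->error = 1;
        return value;
    }
    if (!isdigit((unsigned char)*parser->cursor)) {
        parser->error = 1;
        return 0;
    }
    while (isdigit((unsigned char)*parser->cursor)) {
        value = value * 10 + (*parser->cursor - '0');
        parser->cursor++;
    }
    return value;
}

/* term := primary ('*' primary)* */
long toy_parse_term(toy_parser *parser)
{
    long value = toy_parse_primary(parser);

    for (;;) {
        toy_skip_space(parser);
        if (*parser->cursor != '*')
            return value;
        parser->cursor++;
        value *= toy_parse_primary(parser);
    }
}

/* expr := term ('+' term)* */
long toy_parse_expr(toy_parser *parser)
{
    long value = toy_parse_term(parser);

    for (;;) {
        toy_skip_space(parser);
        if (*parser->cursor != '+')
            return value;
        parser->cursor++;
        value += toy_parse_term(parser);
    }
}

/* Parse a whole string; *ok is 1 only if all input was consumed cleanly. */
long toy_parse_string(const char *text, int *ok)
{
    toy_parser parser = { text, 0 };
    long value = toy_parse_expr(&parser);

    toy_skip_space(&parser);
    *ok = !parser.error && *parser.cursor == '\0';
    return value;
}
```

Each grammar rule is an ordinary function, so a breakpoint on, say, toy_parse_term shows the call stack for exactly the construct being parsed. And since all of the state lives in the struct, two parsers can run side by side without interfering with each other.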
Unfortunately the rest of the C front end is not so nice. There are many (around 200) global variables, and even though the parser itself is quite clean, it implicitly relies on these via functions elsewhere in the front end. This means that making the front end reentrant is going to be a fair amount of work. While much of this will be mechanical, there are some ugly things, like places where lang hooks (a sort of virtual function in the compiler) are designed in a way that implicitly requires a global variable.
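As a rough illustration of the hook problem (these are not GCC’s actual lang hooks; every name below is hypothetical), compare a hook whose signature leaves no room for a context argument with one that takes its state explicitly:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical front-end state; imagine a symbol table here. */
struct toy_scope {
    const char *known_name;
};

/* The global that the first style of hook is forced to consult. */
struct toy_scope *toy_current_scope = NULL;

/* A table of function pointers, playing the role of "virtual functions". */
struct toy_hooks {
    /* No context parameter: an implementation can only reach its state
       through a global variable. */
    int (*lookup_global)(const char *name);
    /* Explicit context: the caller decides which state is used. */
    int (*lookup_context)(struct toy_scope *scope, const char *name);
};

/* Returns 1 if NAME is known in the scope the global points at. */
int toy_lookup_via_global(const char *name)
{
    return toy_current_scope != NULL
           && strcmp(toy_current_scope->known_name, name) == 0;
}

/* Returns 1 if NAME is known in SCOPE; touches no global state. */
int toy_lookup_via_context(struct toy_scope *scope, const char *name)
{
    return scope != NULL && strcmp(scope->known_name, name) == 0;
}
```

With the first signature, two parsers running at once would have to take turns pointing toy_current_scope at their own state. With the second, each passes its own scope and the global can go away. The catch is that changing a hook’s signature means touching every implementation and every caller, which is why this part is uglier than the purely mechanical cleanups.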
I haven’t looked deeply into the C++ front end yet, but a quick look with nm shows that there are about 100 globals there as well.
Making the parser reentrant is sort of a side plot for me: it will help with part of my project, but it isn’t on the critical path. Still, I’d like to see the C and C++ front ends eventually be written in “gcjx style”: thread-aware reusable libraries, written in an OO style, separated from the back ends.
So why work on this at all right now? I didn’t know the C front end at all, and I thought this would be a way to learn about it without getting bored just reading code. This plan is working pretty well. I’ve also spent a bit of time simply stepping through various parts of the compiler with gdb — another useful technique for learning a foreign code base.
Meanwhile I’ve been working on the design of the more crucial parts of the plan — namely, re-using parsed headers and providing incremental recompilation. My current hope is to have something ready for critique before the GCC Summit.
One Comment
Nice. I think you’ve picked a good entry point for this work.
From what I understand, the C FE is based on the C++ FE design by Mark. So, I would expect them to be pretty similar.
All this would probably be easier if there were some commonality, so you didn’t find yourself cleaning up the same global bits in the C FE, and then doing the same-or-just-slightly-different thing in the C++ FE.
However, that’s not really on your critical path either….