There are a few aspects to compile server scalability that are important to address.
First, and most obviously, memory use. Because we want to be able to send real programs through the compile server, and because we want it to remain live for relatively long periods of time, it is important that memory use be “acceptably bounded”. Naturally, the server process will grow with each additional compilation unit. At least in the straightforward implementation, there’s no way around that (but see below). However, it is important that the server not leak memory, and that recompilations generally not increase memory use. Also, ideally, all that work on decl sharing will keep memory use in check.
For the most part, this did not take any effort to achieve. GCC has a built-in garbage collector, and most nontrivial data structures are allocated using the GC. This is not a silver bullet, of course, but it has yielded good results with little effort in practice.
In the case of recompilation, we employ a simple heuristic — we store all parsed hunks keyed off the name of the requested object file (note: not the input file; it is common for a project to compile a given source file multiple times, but it is rare to see the same object file name more than once). When recompiling an object, we assume that there will be a lot of reuse against the object’s previous version, so we store those hunks temporarily, but then discard the old ones at the end of compilation. This way, we reuse, but we can also free hunks which are no longer in use.
Results from a few tests are very encouraging here. I compiled gdb with the compile server, then deleted the object files and re-compiled. Memory use (as reported by
-fmem-report) stayed flat at around 51M — meaning that recompilation doesn’t grow the image, and the collection approach is working as desired.
I also built gdb using the compiler in “normal” mode, and looked at the
-fmem-report totals. If you sum them up, which I naively expect gives a rough idea of how much memory
--combine would use, you get 1.2G. Or, in other words, decl sharing appears to make a huge difference (I’m not completely confident in this particular number).
If memory use does become a problem for very large compiles, we could look at scaling another way: writing out hunks and reading them back in. Maybe we could use machinery from the LTO project to do this. This would only be useful if it is cheaper to read decls via LTO than it is to parse the source; if this is not cheaper then we could instead try to flush out (and force re-parsing of) objects which are rarely re-used. One special case of this is getting rid of non-inlineable function bodies — when we have incremental code-generation, we’ll never compile a function like that more than once anyway.
Another scalability question is how to exploit multiple processors, either multi-core machines, or compile farms. In an earlier post, I discussed making the compile server multi-threaded. However, that interacts poorly with our code generation approach (
fork and do the work in the child), so I am probably not going to pursue it. Instead, for the multi-core case, it looks straightforward to simply run multiple servers — in other words, you would just invoke “
gcc --server -j5“. Something similar can be done for compile farms.
An ideal result for this project would be for small changes to result in compilation times beneath your perceptual threshold. I doubt that is likely to happen, but the point is, the absolute turnaround time is important. (This is not really a question of scalability, but I felt like talking about it anyway.)
In the current code, though, we always run the preprocessor for any change. So, even once incremental code generation is implemented, the turnaround time will be bound by the time it takes to preprocess the source. This might turn out to be a problem.
In an earlier design (and in some other designs I have heard of), this is handled by making a model of compilation that includes preprocessing. That seems too complicated to me, though, and instead I think that it should be possible to also make an incremental preprocessor (say, one that uses
inotify to decide what work must be re-done), and then use it without excessive cooperation from the parser.