When you follow good rules, code generation rules.

I recently read a post by Justin Etheredge on why code generation should be a last resort for programmers. This had me saddened as I’ve always been an advocate of writing your own code generators. In fact, it’s the basis of X2O and many other popular frameworks.

I’ve had a 99.9% positive experience generating code. If you nurture your own generated frameworks properly, it’s the equivalent of hiring a team of developers to do the dirty work for you, with the following added benefits:

  • Code generators don’t get lazy
  • They don’t get bored
  • They don’t forget what you told them
  • They don’t ask for money
  • They are really fast

Every tiny, repeatable process that I can teach a human being to do, I can write a piece of code to do 1,000,000 times better. Once you’ve “taught” a generator how to properly write something given a set of inputs, you can expect the same fast, predictable results over and over again. If a piece of generated code is buggy, you’ll notice it immediately, because the bug is replicated in exactly the same way under the same circumstances each time. Even buggy code gets predictable. And once you’ve perfected a piece of generated code, it’s there for you forever, with no re-teaching necessary.

Still, there will be naysayers. Here are three rules that keep me on the code generation bandwagon:

No one said you have to generate bad code.
There’s no rule that says your code generator has to create reams of unwieldy code. If your generated classes have duplicate functions or common methods, then refactor your code generator. Write the duplicate functions into a stand-alone class that lives outside the generator. Write an interface and let each generated class implement it.

The stigma of poorly-written generated code is mainly due to the fact that it’s just easier to generate crappy code. Coding by hand usually follows two steps: (1) Make it work, (2) Refactor it so it’s easier to maintain. When a generator can spit out usable but inelegant code that works, it’s hard to motivate yourself to refactor your code generator because, typically, people never have to maintain that code.

Justin Etheredge says “DRY is important though, and code generation can be the ultimate DRY violator if you aren’t using it for the right kind of code.”

I disagree. There is nothing that says your code generator can’t be DRY. If it’s repeating the same code 30 times in 30 different classes (say, to open a .NET SqlConnection, run a SqlCommand, read through the result set in a SqlDataReader, and close all connections at the end), then refactor it. Figure out where the common bits are and push them into a common class. Then modify your generator to call the common class.
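As a rough sketch of that refactoring, here is the idea in Python rather than .NET, with sqlite3 standing in for SqlConnection. The names fetch_all, CLASS_TEMPLATE and generate_repository are hypothetical and not taken from any real generator. The connection handling lives in one hand-written helper, and the template emits only a thin call to it:

```python
import sqlite3


# Hand-written once and kept outside the generator (hypothetical helper).
def fetch_all(db_path, sql, params=()):
    """Open a connection, run the query, return every row, always close."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


# The template holds no connection logic; it only delegates to fetch_all.
CLASS_TEMPLATE = '''class {class_name}Repository:
    """Generated from table '{table}'. Do not edit by hand."""

    def __init__(self, db_path):
        self.db_path = db_path

    def list_all(self):
        return fetch_all(self.db_path, "SELECT * FROM {table}")
'''


def generate_repository(table):
    """Return the source of a repository class for one table name."""
    class_name = "".join(part.capitalize() for part in table.split("_"))
    return CLASS_TEMPLATE.format(class_name=class_name, table=table)
```

Run that for 30 tables and you get 30 small classes, while a fix to the connection logic still happens in exactly one place.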

Want to stay DRY and have a tidy code base? Just resist the temptation to take shortcuts in your outputted code.

Generate only what can be regenerated.
I’m a big believer in this. If you cannot re-generate your code based on new inputs (new tables in a database, new parameters in a metadata file, etc.), don’t generate it in the first place. Code generators that are “one-time only” rarely withstand the test of time. Inevitably, something will change in your process or requirements that will force changes to what was once generated.

If you’re doing “one-time only” generations, you will eventually have to dig into code you didn’t write directly. And, if you’ve failed to write it neatly in the first place, then, yes – perhaps writing it with your own fingers directly was the right way to go.
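One way to picture “regenerable” output, as a minimal sketch under my own assumptions: the schema below is a plain dictionary, and render_model and generate_all are made-up names. Every output file is a pure function of the inputs, so adding a table just means running the generator again:

```python
HEADER = "# GENERATED FILE: changes will be overwritten on regeneration"


def render_model(table, columns):
    """Render one model class; the same inputs always give the same text."""
    class_name = "".join(part.capitalize() for part in table.split("_"))
    lines = [
        HEADER,
        f"class {class_name}:",
        f"    fields = {sorted(columns)!r}",
    ]
    return "\n".join(lines) + "\n"


def generate_all(schema):
    """Return {filename: source} for every table in the schema mapping."""
    return {
        f"{table}_model.py": render_model(table, columns)
        for table, columns in sorted(schema.items())
    }


# First run, then a second run after a new table appears in the database.
before = generate_all({"users": ["id", "email"]})
after = generate_all({"users": ["id", "email"], "orders": ["id", "user_id"]})
```

Nothing in the output is edited by hand, so the second run can safely overwrite the first. The file for the untouched table comes out byte-for-byte identical.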

Writing a code generator forces you to think.
More than anything else, writing your own code generator really makes you think about your development process. That’s a healthy practice for programmers. It requires you to figure out what parts of your development process are trivial enough (or repeatable enough) to be good candidates for code generation.

Justin says, rightfully, that generated code is inflexible. If you want to tweak it, you really can’t. If you want to augment it, you’re relegated to partial classes or abusing mis-intentioned patterns just to get around the rigidity of generated code.

But, this is where you really have to consider the benefits of code generation. If your outputted code requires too many inputs to generate or requires too many hacks to use, you probably shouldn’t be generating that bit in the first place.

For example, X2O generates a content management system by examining a database. There’s a one-to-one correspondence between add/edit/listing pages and tables. There’s a one-to-one correspondence between database field types and input forms. There’s a one-to-one correspondence between required foreign key constraints and required dropdown boxes in an add or edit screen. There’s validation based on meta types. There’s pagination and eventually there may be search forms. You can even customize the navigation and request records be sorted by a particular attribute.
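To make those one-to-one conventions concrete, here is an illustrative sketch. It is not X2O’s actual code: the Column type, the WIDGET_BY_TYPE table and the widget names are all assumptions of mine. The point is only that each rule is a small, mechanical lookup:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    db_type: str                       # e.g. "varchar", "text", "bit"
    nullable: bool = True
    references: Optional[str] = None   # target table if this is a foreign key


# Hypothetical convention: one input widget per database field type.
WIDGET_BY_TYPE = {
    "varchar": "text_input",
    "text": "textarea",
    "bit": "checkbox",
    "datetime": "date_picker",
    "int": "number_input",
}


def form_field(col):
    """Map one column to the form field that an add/edit page would show."""
    if col.references is not None:
        widget = "dropdown"            # foreign key -> dropdown of related rows
    else:
        widget = WIDGET_BY_TYPE.get(col.db_type, "text_input")
    return {"name": col.name, "widget": widget, "required": not col.nullable}


author = form_field(Column("author_id", "int", nullable=False, references="authors"))
# author == {"name": "author_id", "widget": "dropdown", "required": True}
```

Because a required foreign key always becomes a required dropdown, there is nothing for a human to decide, and that is what makes this kind of page a good generation candidate.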

All these rules and conventions work across any database-driven CMS we’ve built. We’ve gone pretty far on the generation end because there aren’t too many “one-off” scenarios for us. When there are, we augment and tweak as necessary. But weighing the benefit of how much the generator did for us against the drawback of custom “one-off” tweaking is like considering rebuilding your own house because you don’t like the light fixtures. So long as you’ve thought through what really should be generated, there’s just too much good stuff that can come out of it to let the little nuances bother you.

3 responses to “When you follow good rules, code generation rules.”

  1. I’m also a “believer” in code-generation. For instance, I cannot imagine myself writing “create table” scripts anymore, it is much easier to just draw a (UML or ER) class diagram.

    But I also think we have to be careful not to push code-generation beyond what is reasonable. Approaches for Executable UML allow you to define the behaviour of your class methods using a pseudo-code (that is then translated into Java code). Is that useful? Why is that better than directly using a programming language? This has to be further investigated.

  2. @Jordi-

    I completely agree…there is a limit to code-generation. The “worst-case” scenario is just that. If what you’re generating is so input-driven that you’re just writing in one format to output another, there’s little benefit and huge maintainability issues.

  3. All I have to do is imagine the pace at We Are Mammoth sans X2O. Probably 20-30% longer hours and certainly 20-30% more blown deadlines.

    Code-goodness aside, the biggest improvements we’ve seen have nothing to do with code. It’s culture. By isolating all of the repeatable and common features of data-driven applications and leaving them to gen, we spend that much more time building a smarter front-end, or writing, or god forbid, simply going home!

    Oh, and it cuts down on heartburn from hit’n’run code cowboys.

    Let it be inflexible! That’s the friggin’ point.
