Recently, an acquaintance was complaining about a project that involved reading a mountain of “incidentally complex” code – code that wasn’t doing anything particularly interesting algorithmically, but was nevertheless so hairy that it was hard to understand the implications of adding even the small feature they wanted. After reading code nonstop for a week, they were tearing their hair out in frustration.

I can sympathize. As a researcher, I’m often faced with piles of undocumented, untested, poorly-engineered, “research-quality” code that I need to somehow come to understand well enough to modify. What I’ve found is that I can’t just sit there and read that kind of code for very long; I don’t learn anything, and I end up miserable. I need to dig in and start trying to refactor.

The idea is to pick some concrete task and attempt it. Depending on the code, options (roughly in increasing order of ambition) might include:

  • changing so-called “magic numbers” to named constants;
  • removing dead code;
  • splitting uncomfortably long functions up into shorter ones; or
  • factoring repetitive things out into their own functions or interfaces.
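To make the first and last items on that list concrete, here is a small before/after sketch in Python. The function, its sentinel value, and the pound-to-kilogram conversion are all hypothetical, invented for illustration rather than taken from any real project. The point is that the refactored version names its magic numbers and pulls the repeated ideas into small helpers, while computing exactly the same result.

```python
# Before: a hypothetical "research-quality" function with unexplained constants.
def score_before(readings):
    vals = []
    for r in readings:
        if r != -1:
            vals.append(r * 0.45359237)
    return sum(vals) / len(vals) if vals else 0.0


# After: the same logic, with the magic numbers named and small helpers extracted.
MISSING_READING = -1       # hypothetical sentinel meaning "no data"
KG_PER_POUND = 0.45359237  # exact definition of the pound in kilograms


def is_present(reading):
    """True unless the reading is the missing-data sentinel."""
    return reading != MISSING_READING


def pounds_to_kg(pounds):
    return pounds * KG_PER_POUND


def score_after(readings):
    """Mean weight in kilograms, ignoring missing readings (0.0 if none remain)."""
    kgs = [pounds_to_kg(r) for r in readings if is_present(r)]
    return sum(kgs) / len(kgs) if kgs else 0.0


# Sanity check: the refactor should not change behavior on sample inputs.
for sample in [[], [-1], [10, -1, 20], [150.0, 160.5]]:
    assert score_before(sample) == score_after(sample)
```

Even if a refactor like this is eventually thrown away, working out what `-1` and `0.45359237` meant is exactly the kind of knowledge the exercise is supposed to surface.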

When I’m at this stage, I may find that I don’t know enough about the code to refactor it well. But the beauty of trying to refactor is that I benefit from the attempt even when it doesn’t produce an improvement. I can make work-in-progress commits in a branch, and at the end of the day, if I think my changes are an improvement, I can clean up those commits and keep them. If not, that’s okay; I still learned something about how the code is organized or what it does.

Whatever I learn while traipsing through the code, I can document in some way, if only by adding the occasional explanatory comment. And if there are no tests (research code often has none), I can try to write them as I go. Adding tests not only deepens my understanding of the code but also makes me more confident in the changes I make during the next round of refactoring.
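When you don’t yet know what the code is *supposed* to do, one useful kind of test simply records what it *currently* does, quirks included; this style is often called a characterization test. Below is a minimal sketch using Python’s standard `unittest` module, assuming a hypothetical legacy function `normalize` that scales values so the largest becomes 1.0. The test cases are illustrative, not from any real codebase.

```python
import unittest


def normalize(xs):
    # Hypothetical legacy function: scales values so the largest is 1.0.
    # Quirk: an empty or all-zero input is returned unchanged (as a new list).
    m = max(xs, default=0)
    if m == 0:
        return list(xs)
    return [x / m for x in xs]


class NormalizeCharacterizationTest(unittest.TestCase):
    """Pin down what normalize() does today, surprising behavior included."""

    def test_scales_to_max(self):
        self.assertEqual(normalize([1, 2, 4]), [0.25, 0.5, 1.0])

    def test_all_zero_input_is_unchanged(self):
        self.assertEqual(normalize([0, 0]), [0, 0])

    def test_empty_input(self):
        self.assertEqual(normalize([]), [])

    def test_negative_max_flips_signs(self):
        # Probably not intended, but it is the current behavior:
        # dividing by a negative maximum flips the signs of the values.
        self.assertEqual(normalize([-2, -4]), [1.0, 2.0])
```

Running this with a test runner such as `python -m unittest` gives a safety net: if a later refactor changes any of these outputs, the failing test forces a deliberate decision about whether the old behavior was a bug or a feature.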

All this might seem like a strange approach to learning one’s way around a project. After all, refactoring is supposed to improve the design of existing code – how can anyone be expected to improve code before they even know what it does? For me, the trick is to realize that at this stage, improving the code isn’t the point. Maybe I’ll end up making some improvements as a side effect, but even if I don’t, refactoring helps me. It increases my feeling of ownership over the code, which will help me stay motivated to continue working on it. Most importantly, the act of editing the code myself (if only just to rename things or move them around), rather than passively reading it, will help solidify things about the code in my memory and give me a more visceral sense of what it’s doing. That can only be good for my understanding of the code, even if I end up deciding not to keep the changes I made.

Thanks to Maggie Zhou, Julia Evans, Scott Feeney, Darius Bacon, and Dan Luu for discussing the ideas in this post with me. (None of them are the aforementioned acquaintance!)