Writing Clean Code – Data Column | Institute for Advanced Analytics

What is clean code and why should we care about it? John F. Woods once said, “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.”¹

I think a better incentive is to code as if YOU will be the one maintaining your code. There’s nothing worse² than opening some code you wrote a year ago, only to start muttering multiple WTFs under your breath as you scroll through, attempting to decipher what seems like an alien language at this point. Writing clean code is about writing code that is both maintainable and readable, not just efficient.

¹ https://groups.google.com/g/comp.lang.c++/c/rYCO5yn4lXw/m/oITtSkZOtoUJ
² There are actually lots of things that are worse, but we’ll pretend there aren’t for the sake of this blog post.

Note: While the examples listed in this post are written in Python, the same concepts can be applied to other programming languages as well.

Create Better Variables

Variable names should be descriptive. A lot of times, when you’re looking at a tutorial for learning a new language, you’ll probably see something like the following as an example of a nested loop:
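A typical tutorial-style nested loop might look like this sketch (an illustrative stand-in that prints a small multiplication table):

```python
# Single-letter names: nothing tells the reader what i and j stand for.
for i in range(1, 4):
    for j in range(1, 4):
        print(i * j, end=" ")
    print()
```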

While this does do what we want it to do, we can definitely update it to be a bit more descriptive:
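A more descriptive nested loop might read as follows, assuming the loop walks the rows and columns of a small multiplication table (an invented example, with `row` and `column` as illustrative names):

```python
# The names now say what each loop variable represents.
for row in range(1, 4):
    for column in range(1, 4):
        print(row * column, end=" ")
    print()
```

The logic is unchanged; only the names differ, which is the whole point.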

This is a rather simplistic example and may not be necessary, but even creating descriptive variable names in simple programs can help build good habits.

Constants, variables whose values you aren’t going to change, should be named in uppercase. Constants are great for replacing “magic numbers”: numbers in your code that don’t have any explained meaning. In the example below, we see a combination of poor variable naming and a magic number.
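As a hypothetical illustration, suppose `h` holds an hour on a 24-hour clock and 12 is the number of hours in half a day. Both meanings are assumptions made for this sketch, and the code deliberately ignores midnight:

```python
# What is h? Why 12? The reader has to guess.
h = 15
if h > 12:
    h = h - 12
print(h)
```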

A better way to do this would be to update “12” to be a constant and make “h” more descriptive.
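One way this could look, assuming the magic number 12 is the number of hours in half a day and `h` was an hour on a 24-hour clock (both invented for illustration, and midnight is deliberately ignored):

```python
# Named constant: the meaning of 12 is now stated once, in one place.
HOURS_PER_HALF_DAY = 12

hour_24 = 15
if hour_24 > HOURS_PER_HALF_DAY:
    hour_12 = hour_24 - HOURS_PER_HALF_DAY
else:
    hour_12 = hour_24
print(hour_12)
```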

Python also has a feature called global variables. Generally, with rare exceptions, you want to avoid using global variables. Instead, pass variables to functions as arguments (more about this in the Embrace Less Is More section). Global variables are generally avoided because it can be difficult to track where they’re being updated and modified, leading to possible hidden side effects in functions, thereby decreasing maintainability and readability. Basically, global variables don’t fix issues in your code, they just obfuscate them.
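To make the hidden side effect concrete, here is a minimal sketch comparing the two styles. The counter and function names are hypothetical:

```python
request_count = 0  # module-level state that any function could change


def record_request_global():
    # Hidden side effect: nothing at the call site shows that
    # request_count is being modified.
    global request_count
    request_count += 1


def record_request(current_count):
    # Explicit version: the input and the result are both visible.
    return current_count + 1


record_request_global()
explicit_count = record_request(0)
print(request_count, explicit_count)
```

Both calls count one request, but only the second makes it obvious from the call itself what went in and what came out.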

Be DRY

I look at programming as a way to be as lazy as possible. It’s why we have functions and loops to run the same code repeatedly so that you Don’t Repeat Yourself. Why bother writing a chunk of code two or three times when I can write a function to do it and then call the function two or three times? Imagine if you have some code where you’re converting Fahrenheit to Celsius in multiple places. That might look something like this:
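A sketch of the repeated version, with made-up city names and temperatures:

```python
boston_fahrenheit = 68
boston_celsius = (boston_fahrenheit - 32) * 5 / 9

# The same formula, typed out a second time.
raleigh_fahrenheit = 86
raleigh_celsius = (raleigh_fahrenheit - 32) * 5 / 9

print(boston_celsius, raleigh_celsius)
```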

Sure, that doesn’t seem too bad, but what if you were doing it in a bunch of different places? That can get really repetitive, and it’s something we’re definitely trying to avoid. Instead, try something like this:
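A minimal sketch of the function-based version (the function name and the sample temperatures are illustrative):

```python
def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# The formula lives in one place; each conversion is a single call.
boston_celsius = fahrenheit_to_celsius(68)
raleigh_celsius = fahrenheit_to_celsius(86)
print(boston_celsius, raleigh_celsius)
```

If the formula ever needs a fix, there is now exactly one line to change.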

Now we can use the same function whenever we want to perform a conversion from Fahrenheit to Celsius.

Embrace Less Is More

A lot of times in programming, less is more. Fewer comments doesn’t mean no comments; it means using them sparingly. Ideally, descriptive variable and function names will cut down on the total number of comments that you’ll have in your code. Unless your code is using lesser-known functions, comments should generally be reserved for WHY your code is doing what it’s doing, not WHAT it is doing. Maybe you’re updating some code that hasn’t been touched in a while, or maybe you’re adding code that looks out of place but is actually covering a weird edge case. That is the perfect time to add a comment on why you made the changes that you did.
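A small sketch of the difference between a WHAT comment and a WHY comment. The function and the edge case are hypothetical:

```python
def average(values):
    # WHAT comment (redundant, the code already says this):
    #   return 0.0 if values is empty
    #
    # WHY comment (worth keeping): in this made-up scenario, callers treat
    # "no readings yet" as an average of 0.0 instead of handling a
    # ZeroDivisionError themselves.
    if not values:
        return 0.0
    return sum(values) / len(values)


print(average([1, 2, 3]))
print(average([]))
```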

The “less is more” mentality also applies to functions, specifically in two different ways that I’ll be discussing here.

First, “less is more” means reducing the number of arguments that a function takes. Ideally, a function shouldn’t have more than three arguments; that rule can’t always be followed, in which case we want to keep the number as low as possible. The example below shows a function that takes in a bunch of arguments and then adds a new user to a database.
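A sketch of such a function, assuming a hypothetical `InMemoryUserStore` class as a stand-in for a real database (the field names and sample values are invented too):

```python
class InMemoryUserStore:
    """Hypothetical stand-in for a real database."""

    def __init__(self):
        self.rows = []

    # Seven separate arguments: easy to pass in the wrong order.
    def add_user(self, first_name, last_name, email,
                 street, city, state, zip_code):
        self.rows.append({
            "first_name": first_name,
            "last_name": last_name,
            "email": email,
            "street": street,
            "city": city,
            "state": state,
            "zip_code": zip_code,
        })


store = InMemoryUserStore()
store.add_user("Ada", "Lovelace", "ada@example.com",
               "1 Main St", "Raleigh", "NC", "27601")
print(len(store.rows))
```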

An easy way to fix this is encapsulation. Instead of passing those arguments individually, we can create a User class and an Address class, reducing the required arguments to two while retaining the same information.
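One possible sketch of the encapsulated version, using dataclasses. The field names and the `InMemoryUserStore` stand-in for a database are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class User:
    first_name: str
    last_name: str
    email: str


class InMemoryUserStore:
    """Hypothetical stand-in for a real database."""

    def __init__(self):
        self.rows = []

    # Two arguments carry the same information as seven loose values.
    def add_user(self, user, address):
        self.rows.append((user, address))


store = InMemoryUserStore()
user = User("Ada", "Lovelace", "ada@example.com")
address = Address("1 Main St", "Raleigh", "NC", "27601")
store.add_user(user, address)
print(store.rows[0])
```

Grouping related fields also means a new field, such as a country, changes the class definition but not the function signature.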

Functions should also be small and only have a single responsibility. This means that a function should only be doing one thing, not multiple.
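A sketch of a function that breaks this rule, using the standard library `csv` module and an in-memory string in place of a real data source. The sample data, column names, and function name are hypothetical:

```python
import csv
import io

# Made-up sample data standing in for a real file or database query.
SAMPLE_CSV = "city,temp_f\nRaleigh,86\nBoston,68\n"


def get_weather_data(csv_text):
    # Job 1: pull the data.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Job 2: convert one of the columns from Fahrenheit to Celsius.
    for row in rows:
        row["temp_c"] = (float(row["temp_f"]) - 32) * 5 / 9
    return rows


weather_rows = get_weather_data(SAMPLE_CSV)
print(weather_rows[0]["temp_c"])
```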

As you can see, our function is actually performing two different jobs: pulling the data and converting one of the columns. Let’s fix that by splitting the function into two different functions.
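A minimal sketch of the split, again using the standard library `csv` module with made-up sample data. The function and column names are hypothetical:

```python
import csv
import io

# Made-up sample data standing in for a real file or database query.
SAMPLE_CSV = "city,temp_f\nRaleigh,86\nBoston,68\n"


def load_rows(csv_text):
    """Pull the data: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def add_celsius_column(rows):
    """Convert one column: return new rows with a temp_c field added."""
    return [
        {**row, "temp_c": (float(row["temp_f"]) - 32) * 5 / 9}
        for row in rows
    ]


raw_rows = load_rows(SAMPLE_CSV)
converted_rows = add_celsius_column(raw_rows)
print(converted_rows[0]["temp_c"])
```

Each function can now be tested, reused, or replaced on its own; for instance, the loader could be swapped for a database query without touching the conversion.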

By ensuring that our functions only have a single responsibility, we’ll naturally decrease the overall size of our functions, increasing readability.

Conclusion

I’d be a liar if I said I followed these rules 100% of the time, but they are still very useful in creating guidelines to follow as you continue your coding journey. There are also a lot of other good tips that I didn’t cover in this post, so I recommend taking the time to research the idea of “clean code” beyond this blog. A personal favorite of mine and a great resource for this is Clean Code by Robert C. Martin.

A great way to practice and learn is by doing code reviews. While they aren’t mandatory here at the Institute for Advanced Analytics, they are something that my practicum team decided to implement internally. By using Git for version control, our team creates a pull request each time we want to merge new code into our main branch. Since we always have someone else look at the code before it gets merged in, we can reduce the amount of “dirty” code that’s allowed into our main code repository.

Whether you take the time to read some of the great literature on clean code or implement code reviews, following these principles should make your code more readable and maintainable, whether it’s maintained by you or by a violent psychopath who knows where you live.

Columnist: Michael Long