Programming: Naming

Where are you talking from?

A tiger labelled “danger zebra”
This is what happens when you replace abstractions with descriptions. But you must make sure that “Snow tiger” is well defined in your project’s dictionary.

Naming is one of the most important aspects of programming. It is also one of the most overlooked (e.g. “let’s make it work with an ‘x’ variable, we’ll choose a better name later”). Maybe this is because there is a distorted conception of programming.

What is programming?

A common definition of programming is speaking to the machine. This is misleading for a couple of reasons.

Unless you’re writing machine code alone, the recipients of your code are:

  • a compiler that will translate your instructions into lower-level code. The aim of compilation here is to provide you with some abstraction, through a higher-level language that is more convenient for expressing what your application should do.
    Compilers, and the machine itself, don’t need to understand the meaning of what you are telling them. Even the most obscure and ill-designed code will execute smoothly, provided you comply with the expected syntax.
  • your colleagues, who need to understand the meaning of your code in order to maintain it. Unlike your compiler, they all differ in experience, skills and culture. To understand each other, you’ll have to agree on additional abstractions over the programming language, related to design (good technical practices) and to the business concepts handled by your app.

In any case, a third recipient of your code is yourself, when you look at it some months later. Even if you don’t care about your colleagues, you should at least care about your future self.

Using the image of oral communication is also wrong, for many reasons.

Speaking to another human is a real-time dialog, where you don’t have time to choose the most proper words and, to compensate for that, people can give you immediate feedback about what you just said (if it was not clear enough, or even incorrect) so that you have the opportunity to clarify or correct it.

However, programming is just the opposite: you make your code work before getting feedback, and even if you get such feedback quickly (assuming that your development process includes peer reviews), this may be a lose/lose situation:

  • if the feedback is bad, the fact that the code “works” (while not being fully understandable), coupled with delivery time expectations, might lead you to postpone improving its maintainability;
  • even if the feedback is ok, that does not mean that someone (even you) will understand that code a few months later, when the current context has been forgotten.

So programming is not speaking. It is writing, while not knowing:

  • what part of your code will be read. So you cannot bet on “nobody will try to understand that”.
  • who will read it. It could be a junior programmer, or somebody who lacks some required technical or business background.
  • when it will be read. It could be anytime from the day it was written to decades later.
  • why it will be read. It could be for bug fixing, refactoring, new feature development, testing or documentation. Each of those can require different info.

So we have to keep this in mind when writing code. We care less about names and comments when we write them than when we read them. We have to fight against that natural tendency to focus more on the logic (“it works”) than the words to express it (maintainability).

What to name?

As we just said, programs are not aimed directly at computers but at humans first. Their high-level instructions are abstractions whose parameters can be labelled. Those parameters can be:

  • identifiers: a program variable or constant name is nothing more than a label on a memory location; function names are just the same (but with the different purpose of executing that memory location, with arguments pushed on the stack); classes are groups of functions, and attributes are class variables.
  • artifacts: files and directories are labels on disk addresses, just as database identifiers (database names, tables & columns, constraints, triggers, etc.) are labels on database file offsets.
  • documentation: names are also used in documents.

How to name?

As we said above, naming is not a real-time interaction like speaking, and so you can take time to devise the best name for an entity. We all know that time is always limited and that choosing a perfect name is hard, but you should at least make a best effort.

In 2013, a discussion thread on Quora suggested that naming things was the hardest task for half of programmers.

As the late Phil Karlton (who architected products at Netscape) once said:

There are only two hard things in Computer Science: cache invalidation and naming things.

The more a component is likely to be used, the more cautious you should be when naming it. Once a public API is published, it may be so heavily used that you will have a hard time changing it later. Avoiding the hassle of migrating a heavily used API is worth the time spent choosing good names.

Some (bad) developers seem to act like a compiler, as they only care about syntax. All they want is to avoid collision with existing identifiers:

x1 = getUnitPrice()
x2 = getCount()
// You’ll have to find the origin of x1 and x2 to
// understand what "total" is
total = x1 * x2

Some others pretend to do better by using first letters or abbreviations, but this is not really any better:

up = getUnitPrice()
c = getCount()
// Is "up" the opposite of "down" ?
// Is "c" the speed of light?
total = up * c

So how to provide added value? By stating something that cannot be guessed: its semantic meaning.

The primary semantic information about a variable is not how it works (which would break encapsulation) but what it represents as a concept. This will help the reader work out why it is there, and how it can help.

For instance, if you’re using Redis to implement a cache, name it cache, not redis or redisCache, as the interesting information is that it is a cache API, not that it uses Redis (and may use another solution later, while remaining a cache).
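As a minimal sketch of that idea (in TypeScript; PriceCache, InMemoryPriceCache and fetchUnitPrice are hypothetical names invented for this illustration, and a Map stands in for Redis), the calling code only ever sees a cache:

```typescript
// Hypothetical cache abstraction: callers depend on the concept, not the technology.
interface PriceCache {
  get(productId: string): number | undefined
  set(productId: string, price: number): void
}

// In-memory stand-in for this sketch. A Redis-backed class could replace it
// without touching any code that uses the `cache` name below.
class InMemoryPriceCache implements PriceCache {
  private readonly prices = new Map<string, number>()

  get(productId: string): number | undefined {
    return this.prices.get(productId)
  }

  set(productId: string, price: number): void {
    this.prices.set(productId, price)
  }
}

// Hypothetical slow lookup that the cache is meant to spare us.
function fetchUnitPrice(productId: string): number {
  return productId.length * 10
}

// Returns the cached price when there is one, otherwise fetches and caches it.
function getUnitPrice(productId: string, cache: PriceCache): number {
  const cachedPrice = cache.get(productId)
  if (cachedPrice !== undefined) {
    return cachedPrice
  }
  const fetchedPrice = fetchUnitPrice(productId)
  cache.set(productId, fetchedPrice)
  return fetchedPrice
}

const cache: PriceCache = new InMemoryPriceCache()
const unitPrice = getUnitPrice("book", cache)
```

Swapping the in-memory class for a Redis-backed one would not require renaming anything on the caller’s side.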

Indeed, a common mistake is to re-state what you already know, that is, the type of the variable, whether it is:

  • totally: var string: String
  • partially: function func(param1, param2)
  • paraphrased: keymap = dict(). We already know that a dict is a key-value mapping.
  • added through prefixes or suffixes: customer_obj = Customer(), var v_name: string, const c_max = 20, function f_sayHello()
  • technical or business: customer: Customer. That doesn’t help any more than the others.

By doing this, you miss the opportunity to convey some semantics, and you add very little that helps understand the rationale for those variables or functions. Naming it x would have helped just as much, that is, not at all.

There is also a slightly less critical variant that re-states the type… after the semantic information:

eventsArrayMap: Map<User, Event[]>

which requires going to the definition to understand what’s in it. Whereas:

eventsByUser: Map<User, Event[]>

provides all the info in the name: the key is a user, the value is a list of events.

It remains a re-stating of types, though. You should rather challenge yourself to answer the question: “what is eventsByUser?”. “It’s a history of events per user,” you may answer. You could add this useful description as a comment, but why not just rename it:

userHistories: Map<User, Event[]>

Here you added the semantic info that those events are meant to be a history (and you could go further by encapsulating the Event[] array in a clearer History concept, but this goes beyond the topic of naming).
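To give an idea of what that encapsulation might look like, here is a hedged sketch. ActivityEvent and ActivityHistory are hypothetical names (chosen instead of Event and History only to avoid clashing with browser built-ins), as are the record and latest methods:

```typescript
// Hypothetical types for this sketch.
type User = { id: string }
type ActivityEvent = { label: string; timestamp: number }

// The history concept owns the array, so callers never handle a raw list of events.
class ActivityHistory {
  private readonly events: ActivityEvent[] = []

  record(activityEvent: ActivityEvent): void {
    this.events.push(activityEvent)
  }

  // Returns the most recently recorded event, or undefined if the history is empty.
  latest(): ActivityEvent | undefined {
    return this.events[this.events.length - 1]
  }
}

const userHistories = new Map<User, ActivityHistory>()

const alice: User = { id: "alice" }
const aliceHistory = new ActivityHistory()
aliceHistory.record({ label: "login", timestamp: 1 })
aliceHistory.record({ label: "logout", timestamp: 2 })
userHistories.set(alice, aliceHistory)
```

The type now says the same thing as the name: userHistories maps each user to a history.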

Sometimes the search for a name leads to “easy” names that actually don’t say much about what the named entities are. These could be “do something” names like Handler, Processor, Manager, neutral package names like common, shared or util, or “build something” names like Builder or Creator.

Such names are symptoms of a failure to find the specifics of a component. There could be several causes for this:

  • you’ve been lazy, and didn’t bother to take the necessary time to devise a useful name.
  • what you are trying to name does too many different things, which cannot be grasped in a single and simple concept. For instance, if some component loads and parses files, then uploads data somewhere, then analyzes the data, it is likely that you’ll give it a vague name such as Manager or Builder, because anything more specific would be too specific and fail to grasp some aspect of it. This is the “smell” of a poor application of the Single Responsibility Principle (SRP), and means that you should probably split that component into separate, more detailed and more identifiable parts.
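For instance, the load/upload/analyze component described above could be split along these lines. This is only a sketch under assumptions: the Measurement type, the “sensor=value” file format and the three class names are invented for the illustration, and the upload is simulated in memory:

```typescript
// Hypothetical data handled by the component.
type Measurement = { sensor: string; value: number }

// Responsibility 1: parsing file contents.
class MeasurementFileParser {
  // Parses lines such as "kitchen=20" (an invented format for this sketch).
  parse(fileContents: string): Measurement[] {
    return fileContents
      .split("\n")
      .filter(line => line.trim().length > 0)
      .map(line => {
        const separatorIndex = line.indexOf("=")
        return {
          sensor: line.slice(0, separatorIndex).trim(),
          value: Number(line.slice(separatorIndex + 1))
        }
      })
  }
}

// Responsibility 2: uploading data (simulated here by keeping it in memory).
class MeasurementUploader {
  readonly uploaded: Measurement[] = []

  upload(measurements: Measurement[]): void {
    this.uploaded.push(...measurements)
  }
}

// Responsibility 3: analyzing data.
class MeasurementAnalyzer {
  average(measurements: Measurement[]): number {
    const sum = measurements.reduce((total, measurement) => total + measurement.value, 0)
    return sum / measurements.length
  }
}

const measurements = new MeasurementFileParser().parse("kitchen=20\nbedroom=18\n")
const uploader = new MeasurementUploader()
uploader.upload(measurements)
const averageValue = new MeasurementAnalyzer().average(measurements)
```

Once each part does one thing, a specific name comes almost for free.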

Note that it is fine to use those general terms as qualifiers (ModelBuilder, RequestHandler, EntityManager, etc.) though, as long as it forces you to state the responsibility of the component.

Compilers enforce only a few restrictions on identifier syntax. Unless you’re programming with languages from the fifties or sixties like COBOL or Fortran, which used to limit identifier length, usually only one remains: identifiers cannot start with a digit (and cannot contain delimiter characters such as ;, :, ., -, !, ?, spaces, parentheses or brackets, but who does that?).

Aside from this, only some “traditional” conventions apply, notably about casing:

  • camelCase (including PascalCase) is named after the switches between lower and upper case, and is used in Java, JavaScript/TypeScript and Swift.
  • hyphen-case (or “dash” case, or “kebab” case) is used in markup languages such as XML or HTML (or in languages whose identifiers are used in such languages, such as CSS) because they are often case-insensitive (so placeHolder would be confused with placeholder if using camelCase) and because those languages do not usually support arithmetic operators such as - (which has raised new constraints about spaces around hyphens when performing subtractions in CSS’s calc()).
  • snake_case is named after the underscore lying on the “ground”, and is used in languages like Python or Rust. The underscore mimics a space between words where spaces are not allowed.

None is really better than the others, as they all require the same number of key strokes (if we include [Shift]). The lack of separator characters in camelCase has both benefits (shorter identifiers that can be selected at once) and drawbacks (words are more difficult to distinguish and select).
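To illustrate that these conventions carry the same words and differ only in how they join them, here is a small sketch that converts an identifier from one convention to another. splitWords, toCamelCase, toKebabCase and toSnakeCase are hypothetical helpers, not part of any library, and acronyms (as in XMLParser) are not handled:

```typescript
// Splits an identifier written in camelCase, PascalCase, kebab-case
// or snake_case into lowercase words.
function splitWords(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // mark lower-to-upper switches
    .split(/[\s_-]+/)
    .filter(word => word.length > 0)
    .map(word => word.toLowerCase())
}

function toCamelCase(identifier: string): string {
  return splitWords(identifier)
    .map((word, index) => index === 0 ? word : word.charAt(0).toUpperCase() + word.slice(1))
    .join("")
}

function toKebabCase(identifier: string): string {
  return splitWords(identifier).join("-")
}

function toSnakeCase(identifier: string): string {
  return splitWords(identifier).join("_")
}
```

With these helpers, toKebabCase("unitPrice") gives unit-price and toCamelCase("unit_price") gives unitPrice: the words stay the same, only the separator changes.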

Duplicate identifiers in the same scope are forbidden by compilers, so you have no choice but to avoid them.

Compilers do allow duplicate names, though, as long as they are not in the same scope. The first case of this is scope nesting, where identifiers in a child scope can hide identical names from a parent scope:

unitPrice = getUnitPrice()
quantity = getCount()
function computeTotal (unitPrice, quantity) {
  return unitPrice * quantity // Which ones?
}
total = computeTotal(unitPrice, quantity)

In the code above, the function parameters shadow variables with the same name in a parent scope. This should be avoided, as it may confuse the reader as to which scope you are referring to. Shadowing can also occur in OOP, where parameters or local variables have the same name as an enclosing class member, or where a subclass or anonymous class defines a member that has the same name as a member of its superclass or enclosing class.
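One possible way to remove the ambiguity, sketched here in TypeScript, is to give the parameters names that differ from the outer variables. The itemPrice and itemCount names, and the stub implementations of getUnitPrice and getCount, are assumptions made for the example:

```typescript
// Hypothetical data sources for this sketch.
function getUnitPrice(): number {
  return 12
}

function getCount(): number {
  return 3
}

const unitPrice = getUnitPrice()
const quantity = getCount()

// The parameter names differ from the outer variables, so nothing is shadowed:
// inside the function, each name can only refer to one thing.
function computeTotal(itemPrice: number, itemCount: number): number {
  return itemPrice * itemCount
}

const total = computeTotal(unitPrice, quantity)
```

Another option would be to move the outer variables into their own function, so that the two scopes are parallel rather than nested.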

Modern IDEs now do quite a good job of warning about shadowing between nested scopes, but not across parallel scopes (local variables of different functions, attributes of different classes, classes of different packages, files of different directories, columns of different tables, etc.) because they are inherently isolated from one another.

Isolation is a good thing, as it allows you to freely choose both simple and meaningful names in a reusable component without caring about its usage context, but it should not be a pretext for poorly distinguishing concepts.

class Block {
  contents: string
  note: string
}
class Note {
  title: string
  blocks: Block[]
}

Indeed, there is something worse than repeating yourself: it is repeating yourself with a different meaning. In the code above, note is both a text attribute of Block and a class that contains blocks. If your code uses the same name for two different things, any reader will have a hard time understanding what you’re talking about, as well as finding one of the concepts (among the multiple similar names that will show up, only a subset will be what is looked for).
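In the Block/Note code above, one possible fix is to keep the word note for the Note concept only, and pick another word for the text attached to a block. The annotation name below is just a suggestion, assuming that attribute holds a free-text remark:

```typescript
// "note" now only ever refers to the Note concept;
// the remark attached to a block is an "annotation" (hypothetical name).
type Block = {
  contents: string
  annotation: string
}

type Note = {
  title: string
  blocks: Block[]
}

const shoppingNote: Note = {
  title: "Shopping",
  blocks: [
    { contents: "Milk, eggs", annotation: "Check the expiry dates" }
  ]
}
```

Searching the code for “note” will now only return occurrences of one concept.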

The converse is also true: same concepts should have similar names or, put negatively, you shouldn’t use different names to denote the same concept.

To help you with that:

  • Define some naming rules that will help to avoid formal variations. For instance prefixes, suffixes, or word ordering: as English is the almost-ubiquitous language used to name identifiers, a common practice is to follow the qualifier-first rule of the English language. For instance a quick sort will be named QuickSort and not SortQuick. You could also decide on a naming scheme for a family of types, like interfaces/protocols suffixed with “able” to denote a capability (as with the famous Serializable and Cloneable interfaces that could apply to any Java object). Whatever you decide, always make sure to comply with it, since consistency will help readability.
  • Maintain a dictionary of business terms (applicative concepts) to be shared among the project team. Such a dictionary need not be limited to definitions: it can also provide translations between different worlds such as marketing and development.

An important quality of a name is its ability to be quickly understood by any project member. Two important factors will make a name more shareable this way:

  • Once again, avoid multiple names for one single thing, thus reducing ambiguity. When it is not possible (because marketing names are different from technical names, for instance), a well-maintained dictionary of concepts should help, especially for newcomers.
  • Use canonical names whenever you can. Typically a component that follows a Design Pattern should be named after that pattern (modelAdapter is better than modelNormalizer, authenticationProxy is better than authenticationInterceptor, documentVisitor is better than documentExplorer, etc.) so that people aware of such patterns immediately understand the role of a component.
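As an illustration, here is a sketch of a component named after the Adapter pattern. MessageLogger, LegacyPrinter and PrinterLoggerAdapter are all hypothetical names invented for the example:

```typescript
// Interface expected by the application (hypothetical).
interface MessageLogger {
  log(message: string): void
}

// Existing component with an incompatible interface (hypothetical).
class LegacyPrinter {
  readonly printedLines: string[] = []

  printLine(text: string): void {
    this.printedLines.push(text)
  }
}

// Named after the Adapter pattern: it wraps the printer
// to expose the interface the application expects.
class PrinterLoggerAdapter implements MessageLogger {
  private readonly printer: LegacyPrinter

  constructor(legacyPrinter: LegacyPrinter) {
    this.printer = legacyPrinter
  }

  log(message: string): void {
    this.printer.printLine(`[log] ${message}`)
  }
}

const legacyPrinter = new LegacyPrinter()
const loggerAdapter: MessageLogger = new PrinterLoggerAdapter(legacyPrinter)
loggerAdapter.log("started")
```

A reader who knows the pattern can guess from the name alone that this class translates one interface into another, without opening it.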

Today, code repositories are still stored on disk as a set of text files (this may sound obvious, but it is more a legacy constraint than the most efficient design). IDEs (other than Visual Age which, like Smalltalk itself, was way ahead of its time in this regard) still need to build their own indexes by parsing those files.

If you follow the SRP you should have only one class per file. This will simplify:

  • reading a class: each time you open a file, you’ll see simpler code (code with only one concern);
  • writing in a class: when two developers update different classes, you can be sure there is no file merge to do;
  • loading a class: this prevents a dependent from being forced to load another class along with the one it needs. For instance, A.b: B requires loading B, but the file where B is defined also defines a superclass or interface of A.

Also, the file name should equal the class name: this will avoid the need to understand two names, and will ease refactoring tasks such as renaming (modern IDEs are able to understand that the file name should also be renamed when you rename the class, and vice versa).

Note that this is even more important for interface definitions, which should never depend on implementations (so classes and interfaces should never be in the same file).

Sometimes even the best intentions won’t result in proper naming. A name may follow all good practices but still be misleading because it does not reflect the truth: you provided proper semantic information, but that information is false.

This could happen because the code has evolved since initial naming, and so the initial semantics don’t apply anymore. The refactoring failed to change the name accordingly.

For instance, you may name a variable defaultName, which may lead the reader to think that there are other names, whereas only that one exists (and so it should have been named name).

An application of DRY (Don’t Repeat Yourself) to naming is that you should not repeat the scope of the identifier in its name:

class House {
  houseDoor: Door // redundant
}
var house = new House()
var houseDoor = house.houseDoor

should rather be:

class House {
  door: Door
}
var house = new House()
var houseDoor = house.door

Another common question is how to denote n-ary cardinality. A common practice is to simply pluralize a name, like:

class Selection {
  targets: Target[]
}

While being elegant, this usage can also introduce ambiguities, because:

  • it only differs by one character from the singular version, which is easy to forget or misunderstand when communicating with colleagues;
  • in English, the third person singular of a verb in the simple present tense can have exactly the same form as the plural noun. As a result, mySelection.targets could be interpreted as 3 different things: a collection of targets (for (i of targets)), an action of targeting (obj.targets(x)) or a boolean getter (if x.targets(y)). The same goes for a number of nouns/verbs like models/models(), hits/hits(), notes/notes(), documents/documents(), etc.

For these reasons, it may be safer to denote a collection in another, more explicit way. For instance:

targetCollection: Collection<Target>
targetSequence: Target[] // Ordered
targetSet: Set<Target> // Unique
// Targets dictionary
targetByKey: {[key: string]: Target}
// The versions of a target
targetVersions: Version[]
// The versions of targets
targetsVersions: Map<Target, Version[]>

Another reason for avoiding plurals is that they don’t always exist: some English nouns have no plural, like information, info, knowledge, work… and using them would add another ambiguity about cardinality. Conversely, some nouns have no singular form, like news, for instance.

So plurals, while being elegant and concise, induce risks about both ambiguity and the consistency of naming rules (i.e. plural only when possible). I cannot recommend avoiding them, since I fail to avoid them myself, but at least be aware of those risks.

Beyond names

Sometimes names won’t be enough to convey all the required information about some abstraction, and the declared names should be supplemented with comments.

However, note that this should be a last resort (i.e. you should always first try to find a name that conveys the required info without a comment) and, like names themselves, comments should add value over names, not re-state them. If they don’t, just remove them.

If comments re-state the name, rewrite or remove the comment.

Make sure, also, to use a comment format that can be interpreted as documentation by IDEs, such as Javadoc, JSDoc, Docstring, etc., especially when it helps fill in missing parts of a language (typing through comments in the untyped JavaScript world, for instance).
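As an example of a comment that adds value over names, here is a JSDoc-style sketch. The function and its rules (prices in cents, negative quantities rejected) are assumptions for the illustration; the point is that the comment states units and constraints that the names alone do not carry:

```typescript
/**
 * @param unitPrice Price of one item, in cents, taxes included.
 * @param quantity Number of items ordered; must not be negative.
 * @returns The amount to charge for the order line, in cents.
 * @throws RangeError if quantity is negative.
 */
function computeTotal(unitPrice: number, quantity: number): number {
  if (quantity < 0) {
    throw new RangeError("quantity must not be negative")
  }
  return unitPrice * quantity
}

const total = computeTotal(250, 4)
```

An IDE that understands this format can display the units and constraints wherever computeTotal is called.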

Conclusion

Bad naming is one of the most common pitfalls in software development. It has a very significant impact on code readability, and therefore on maintainability. Beyond being a programming concern, naming is closely related to design, as it states the responsibility of components. It can even help to identify bad design such as SRP failures.

For these reasons, you should never miss an opportunity to improve a name: each time you’re having a hard time understanding code (including code you have written yourself), don’t hesitate to refactor through renaming.

Software engineer for three decades, I would like to share my memory. https://javarome.com