Programming: Naming
Good naming smells like good design
Naming is one of the most important aspects of programming. It is also one of the most overlooked and neglected (a.k.a. “I’ll choose a better name later”).
Maybe this is because there is a distorted conception of programming.
What is programming?
A common definition is speaking to the machine. This is misleading for a couple of reasons.
Your audience is not the machine
Unless you’re writing machine code alone, the recipients of your code are:
- a compiler that will translate your instructions into lower-level code. The aim of that compilation is to let you use the abstractions (control flow statements, data structures and other APIs) of a higher-level language, which are more convenient for expressing what your application should do. Compilers, and the machine itself, don’t need to understand the high-level meaning of what you are telling them. Even the most obscure and ill-designed code will execute smoothly, provided you comply with the expected syntax.
- your colleagues, who, on the other hand, need to understand the meaning of your code to maintain it. Unlike your compiler, they all differ in terms of experience, skills and culture. To understand each other, you’ll have to agree on additional abstractions on top of the programming language, related to design (good technical practices) and to the business concepts of your app.
- yourself, when you look at it some months later. Even if you don’t care about your colleagues, you should at least care about the future you.
You’re not speaking
“Speaking”, the image of oral communication, is also wrong, for several reasons.
First, talking to another human is a real-time dialog. In such a situation, you don’t have time to choose the most proper words and, to compensate for that, people can give you immediate feedback about what you just said (if it was not clear enough, or even incorrect) so that you have the opportunity to clarify or correct it.
Programming, however, is just the opposite: you make the code work before getting feedback (much later, from QA or end users). Even if you get feedback quicker (typically from a peer review), this may be a lose/lose case:
- if the feedback is bad, the fact that the code “works” (while not being fully understandable), coupled with the pressure to meet the expected delivery date, might lead you to postpone the improvement of maintainability;
- even if the feedback is ok, that does not mean that someone (even you) will understand that code a few months later, when you have forgotten its development context.
Writing for the unknown
So programming is not speaking. It is writing, while not knowing:
- what part of your code will be read: anything can be investigated, refactored, and you cannot bet on “nobody will try to understand that”.
- who will read it: it could be a junior programmer, somebody without required technical or business background.
- when it will be read: it could be anytime from the day it was written to decades later.
- why it will be read: it could be to fix a bug, refactor, build some new feature, add testing or documentation. Each of those may require different info.
So we have to keep this in mind when writing code. We care less about names and comments when we write them than when we read them. You may be proud of having written some code quickly, then blame that same code for its poor readability months later. We have to fight our natural tendency to focus more on the logic (i.e. “it works”) than on the words that express it (maintainability).
What to name?
As we just said, programs are not primarily aimed at computers but at humans. They are written using languages featuring high-level instructions and APIs, which are actually abstractions with parameters (a for loop has an initializer, a running condition and an increment; a class has a name, fields and methods, etc.). The language allows you to give labels to those arguments, which can be:
- types if you use a typed language (or typing annotations) to check assignment compatibility. You can then create and name your own types.
- identifiers: a program variable or constant name is nothing more than a label on a memory place, function names are just the same (but with the different purpose of executing that memory place, with arguments pushed on the stack), classes are groups of functions, and attributes are class variables.
- artifacts: files and directories are labels on disk addresses, just as database identifiers (names, tables & columns, constraints, triggers, etc.) are labels on database file offsets.
- documentation: names are also used in documents. Here you might face the challenge of keeping names consistent with the code and/or translating implementation names into documentation names, through some dictionary.
How to name?
Take time
As we said above, naming is not a real-time interaction like speaking, and so you can take time to devise the best name for an entity. We all know that time is limited, that choosing a perfect name is hard, but this should be an honest “best effort” at first.
As the late Phil Karlton (who architected products at Netscape) once said:
There are only two hard things in Computer Science: cache invalidation and naming things.
So, because your time is precious and not infinite, you should define priorities and choose your battles, so that good naming has an impact where it matters the most: the more a component is likely to be used, the more careful you should be when naming it.
Following this advice, special care should be put into naming public APIs/SDKs. As soon as it is published, an API may be heavily used by people who 1) need to understand it and 2) will have a hard time changing it later. They won’t easily accept API changes just for the sake of naming. Avoiding the hassle of migrating a heavily-used API is worth the time spent choosing good names.
Add value
Developers in a hurry are reluctant to spend time devising good names. They sometimes act like a compiler, caring about syntax only: all they want is to avoid collisions with existing identifiers:
x = up()
y = c()
// You need to look for the assignments of x and y
// to understand what "t" is about.
t = x * y
While this can be understandable for code golfing and competitions where speed is important, this is not acceptable for code that will be read by others later, including yourself.
Some others pretend to do better by using first letters or abbreviations, but this is actually not much better:
up = getUnitPrice()
c = getCount()
// Is "up" the opposite of "down" ?
// Is "c" the speed of light?
total = up * c
There are several reasons to avoid abbreviations:
- people inspecting your code (yourself included) will search the text for long names (“price”, “customer”, “people”, etc.), not abbreviations like “p”, “c” or “pp”. However, there is no harm in shortening common technical terms like “value” to “val” or “count” to “cnt”, since nobody will search for these;
- modern program sizes depend less on source code size (even interpreted languages like JavaScript are usually minified today);
- modern IDEs’ autocompletion allows you to insert more explicit names easily, without typing them.
Such naming doesn’t carry more value than avoiding collision and compiling without error. Naming is an opportunity to add more. How? By stating something that cannot be guessed: semantics, what is meant.
State the what, not the how
The primary semantic information about a variable is not how it works (which would break encapsulation) but what it represents, as a concept. This will help to devise why it is there, and where it can help.
For instance, if you’re using Redis to implement a cache, name it cache, not redis or redisCache, as the interesting information here is that the goal is to cache, not the name of the product used to do so (which may change later, while remaining a cache).
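To make this concrete, here is a minimal TypeScript sketch. The PriceCache interface and the InMemoryPriceCache class are hypothetical names made up for this illustration (they do not come from any Redis client), and the in-memory map merely stands in for a real backend:

```typescript
// Hypothetical interface: it states what the component is for (caching),
// not which product implements it.
interface PriceCache {
  get(productId: string): number | undefined;
  set(productId: string, unitPrice: number): void;
}

// In-memory stand-in used for illustration only; a Redis-backed class could
// implement the same interface without any caller having to be renamed.
class InMemoryPriceCache implements PriceCache {
  private readonly entries = new Map<string, number>();

  get(productId: string): number | undefined {
    return this.entries.get(productId);
  }

  set(productId: string, unitPrice: number): void {
    this.entries.set(productId, unitPrice);
  }
}

// The variable is named after its role, so this line survives a change of backend.
const cache: PriceCache = new InMemoryPriceCache();
cache.set("sku-42", 9.99);
```

Swapping the backend would only change the right-hand side of the last declaration; every other use of cache keeps reading correctly.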
Indeed, a common mistake is to re-state what you already know, like the type of the variable, in various fashions:
- totally: var string: String.
- partially: function func(param1, param2).
- paraphrased: keymap = dict(). We already know that a dict is a key-value mapping.
- added through prefixes or suffixes: customer_obj = Customer(), var v_name: string, const c_max = 20, function f_sayHello(). This is typical of the era when IDEs were basic text editors without type detection, completion or syntax highlighting (which now displays params and other identifiers differently).
- technical or business: customer: Customer. That doesn’t help more.
By doing this, you miss the opportunity to give some semantics, and provide very little value for understanding the rationale behind those variables or functions. Naming it x would have helped as much; that is: not at all.
There is also this less critical variant that re-states the type… as a suffix:
eventsArrayMap: Map<User, Event[]>
which requires going to the definition to understand what’s in it. That’s an anti-pattern, since the very goal of good naming is to grasp such info in a blink, without navigating to that definition.
So we should avoid re-stating the technical type, and focus the name on the relevant info: the key is a user, the value is a list of events:
eventsByUser: Map<User, Event[]>
But that’s still some kind of type re-statement, isn’t it? You should rather ask yourself the question, as a reader: “what is eventsByUser?”. “It’s a history of events per user,” you may answer as the author. That would give you the most proper information about it:
/**
* User histories
*/
eventsByUser: Map<User, Event[]>
Hey, isn’t there a more concise way of writing this? One that would not require to inspect that declaration? Sure there is:
userHistories: Map<User, Event[]>
Here you added the semantic info that those events are meant to be a history. You could even go further by encapsulating the Event[] array in a clearer History concept (but this goes beyond the topic of this article).
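For the curious, that encapsulation could look like the sketch below. This is only one possible shape, not a recommended design: UserHistory, UserEvent and their members are hypothetical names, chosen in part to avoid clashing with the built-in History and Event types of browser environments.

```typescript
// Hypothetical types for this illustration.
interface User { id: string }
interface UserEvent { label: string; timestamp: number }

// Encapsulates the raw array so that callers deal with a "history", not a list.
class UserHistory {
  private readonly events: UserEvent[] = [];

  record(event: UserEvent): void {
    this.events.push(event);
  }

  // Most recently recorded event, or undefined when the history is empty.
  latest(): UserEvent | undefined {
    return this.events[this.events.length - 1];
  }

  get length(): number {
    return this.events.length;
  }
}

const userHistories = new Map<User, UserHistory>();
const alice: User = { id: "alice" };
const aliceHistory = new UserHistory();
aliceHistory.record({ label: "login", timestamp: 1 });
aliceHistory.record({ label: "purchase", timestamp: 2 });
userHistories.set(alice, aliceHistory);
```

The type of userHistories no longer mentions arrays at all, so the name and the type now tell the same story.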
Be specific
Sometimes name devising leads to “easy” names that actually don’t say much about what the named entities are. These could be “does something” names like Handler, Processor, Manager, neutral package names like common, shared or util, or “build something” names like Builder or Creator. Thanks for letting us know that something is happening there, but we still don’t know what.
Such names are symptoms of a failure to find the specifics of a component. There could be several causes for this:
- you’ve been lazy, and didn’t bother to take the necessary time to devise a useful name.
- what you are trying to name does too many different things and so is difficult to name simply. For instance, if some component loads and parses files, then uploads data somewhere, then analyzes the data, it is likely that you’ll give it a vague name such as Manager or Builder, because anything more specific would be too specific and fail to grasp some important aspect of this complex component. This is a “smell” of bad design: not only should this component be the sum of smaller components, each with a clear and easily nameable responsibility, but the aggregation of these may not be the most consistent one if you cannot devise a name for it. Maybe this should be refactored into a different set of components.
Note that it is fine to use those general terms as qualifiers (ModelBuilder, RequestHandler, EntityManager... suffixes) though, as long as it forces you to state the responsibility of the component.
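As an illustration, here is a rough TypeScript sketch of what splitting a vague “manager” into specifically named components might look like. The Measurement type, the "sensor;value" line format and both class names are assumptions made up for this example:

```typescript
// Hypothetical record type used only for this illustration.
interface Measurement { sensor: string; value: number }

// Instead of a single vague "DataManager", each component is named after
// the one thing it does.
class MeasurementParser {
  // Parses lines such as "kitchen;21.5" into measurements.
  parse(lines: string[]): Measurement[] {
    return lines.map((line) => {
      const parts = line.split(";");
      return { sensor: parts[0] ?? "", value: Number(parts[1]) };
    });
  }
}

class MeasurementAnalyzer {
  // Arithmetic mean of the values; returns 0 for an empty list.
  average(measurements: Measurement[]): number {
    if (measurements.length === 0) return 0;
    const sum = measurements.reduce((total, m) => total + m.value, 0);
    return sum / measurements.length;
  }
}

const measurements = new MeasurementParser().parse(["kitchen;20", "attic;30"]);
const averageValue = new MeasurementAnalyzer().average(measurements);
```

Each class name now answers the question “what does it do?”, and the general word (Parser, Analyzer) only appears as a suffix qualified by the concept it applies to.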
Format
Only a few restrictions are enforced by compilers regarding identifier syntax. Unless you’re programming with languages from the fifties or sixties like COBOL or Fortran, which used to restrict identifier size because of limited memory resources, lexers only keep enforcing one today: identifiers cannot start with a digit (and cannot contain delimiter characters such as ;, :, ., -, !, ?, spaces, parentheses or brackets, but who does that?). I guess this will also go away with AI being able to distinguish, as we do, what is an identifier and what is not, no matter which characters compose it.
Aside from this, only some “traditional” conventions apply, notably about casing:
- camelCase (including PascalCase), named after the humps formed by switching between lower and upper case, is used in Java, JavaScript/TypeScript and Swift.
- hyphen-case (or “dash” or “kebab” case) is used in tag languages such as XML or HTML (or languages whose identifiers are used in such languages, such as CSS), because some of them are case-insensitive (HTML would confuse placeHolder with placeholder if using camelCase) and because those languages do not usually support arithmetic operators such as - (this has raised new constraints about spaces around hyphens when performing subtractions in CSS’s calc()).
- snake_case, named after the underscore lying on the “ground”, is used in languages like Python or Rust.
Hyphens and underscores mimic a space between words where spaces are not allowed.
None is really better than the others, as they all imply the same number of keystrokes (if we include [⇧Shift]). The lack of separator characters in camelCase has both benefits (shorter identifiers that can be selected at once) and drawbacks (case sensitivity, words that are more difficult to distinguish and select).
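Since these conventions only differ in how words are separated, converting between them is mostly mechanical. The helpers below are a small sketch written for this article (the function names are my own, not a library API), and they assume simple identifiers made of ASCII letters and digits:

```typescript
// Splits an identifier into lowercase words, whatever its casing convention.
function splitWords(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // lower-to-upper boundary becomes a space
    .split(/[\s_-]+/) // spaces, underscores and hyphens all separate words
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

function toSnakeCase(identifier: string): string {
  return splitWords(identifier).join("_");
}

function toKebabCase(identifier: string): string {
  return splitWords(identifier).join("-");
}

function toCamelCase(identifier: string): string {
  return splitWords(identifier)
    .map((word, index) =>
      index === 0 ? word : word.charAt(0).toUpperCase() + word.slice(1))
    .join("");
}
```

For example, toSnakeCase("unitPrice") gives "unit_price" and toCamelCase("unit-price") gives "unitPrice". Runs of capitals are not split, so an identifier such as "HTMLParser" is treated as a single word: acronyms are precisely where camelCase word boundaries become ambiguous.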
Avoid shadowing
Duplicate identifiers within the same scope are forbidden by compilers, so you have no choice but to avoid them. They can be allowed, though, in different scopes. The first case of this is nesting, when an identifier in a child scope hides the same name from a parent scope:
unitPrice = getUnitPrice()
quantity = getCount()
function computeTotal(unitPrice, quantity) {
  return unitPrice * quantity // Which ones?
}
total = computeTotal(unitPrice, quantity)
In the code above, local variables (parameters here) are shadowing variables with the same name in a parent scope. This should be avoided, as it may confuse the reader as to which identifier you are referring to. Shadowing can also occur in OOP, where parameters or local variables have the same name as an enclosing class member (field or method), or where a subclass or anonymous class defines a member that has the same name as a member of its superclass or enclosing class, or even as a variable of a parent code block:
unitPrice = getUnitPrice()
quantity = getCount()
priceComputer = new class extends Computer { // Anonymous class
  constructor(unitPrice, quantity, discount) {
    super() // Required before using "this" in a subclass
    this.baseUnitPrice = unitPrice
    this.quantity = quantity
    this.discount = discount
  }
  // The raw price is stored under another name so that the getter does not call itself
  get unitPrice() {
    return this.baseUnitPrice * (1 - this.discount)
  }
  compute() {
    // Beware as there will be no error if you forget the "this." prefix here
    return this.unitPrice * this.quantity // Which ones?
  }
}(unitPrice, quantity, 0.1)
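One possible way out, sketched below in TypeScript, is to give the outer variables names that state their own context, so that no identifier is declared twice along the scope chain. The cart prefix and the stub implementations of getUnitPrice() and getCount() are assumptions made for the sake of a self-contained example:

```typescript
// Hypothetical stand-ins for the getUnitPrice() and getCount() calls used above.
function getUnitPrice(): number { return 12.5; }
function getCount(): number { return 4; }

// Outer variables say which price and quantity they hold.
const cartUnitPrice = getUnitPrice();
const cartQuantity = getCount();

// The parameters describe the function's own inputs; no outer variable
// shares their names, so each identifier has a single possible meaning here.
function computeTotal(unitPrice: number, quantity: number): number {
  return unitPrice * quantity;
}

const cartTotal = computeTotal(cartUnitPrice, cartQuantity);
```

Forgetting a parameter or mistyping a name inside computeTotal now produces a compile error instead of silently picking up an outer variable.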
Avoid duplicates
Modern IDEs now do quite a good job of warning about shadowing between nested scopes, but not across parallel scopes (local variables of different functions, attributes of different classes, classes of different packages, files of different directories, columns of different tables, etc.) because they are inherently isolated from each other.
Isolation is a good thing, as it allows you to freely choose both simple and meaningful names in a reusable component without caring about its usage context, but it should not be a pretext for poorly distinguishing concepts. For instance, imagine your tech lead asking you to “update the note” in the code below:
class Block {
  contents: string
  note: string // Note is a text
}
class Note { // Note is a title with blocks
  title: string
  blocks: Block[]
}
A short discussion will help to remove any ambiguity there, but it is even better to avoid it in the first place. Remove any ambiguity as soon as you can, by using even slightly different names. This will also help when searching for identifiers:
class Block {
  contents: string
  comment: string
}
class Note {
  title: string
  blocks: Block[]
}
Lesson learned: even worse than repeating yourself is to repeat yourself with a different meaning.
Be consistent
The converse is also true: the same concepts should have similar names or, to put it negatively, you should not use different names when referencing the same concept.
To help you with that:
- Define some naming rules that will help avoid formal variations. This can be about agreeing on prefixes, suffixes or word ordering: this could be the qualifier-first rule of the English language (QuickSort) or, on the contrary, some hierarchical naming scheme where the more specific term comes last (SortQuick). You could also decide on a naming scheme for a family of types, like interfaces/protocols suffixed with “able” to denote a capability (such as the famous Serializable and Cloneable interfaces that can apply to any Java object). Whatever you decide, always make sure to agree on it and comply with it, since consistency helps readability.
- Maintain a dictionary of business terms (applicative concepts) to be shared among the project team. Such a dictionary need not be limited to providing definitions for newcomers; it can also provide translations between different languages (if required) and, if you can’t avoid it, between different “worlds” such as marketing and development (what you call a “web component” may be a “widget” for someone from another department, etc.).
Be shareable
Indeed, an important quality of a name is its ability to be quickly understood by any project member or stakeholder. Two important factors will make a name more shareable this way:
- once again, avoid different names for the same thing, thus reducing ambiguity. When that is not possible (because marketing names are different from technical names, for instance), a well-maintained dictionary of concepts should help, especially for newcomers.
- use canonical names whenever you can. Typically, a component that follows a design pattern should be named after that pattern (modelAdapter is better than modelNormalizer, authenticationProxy is better than authenticationInterceptor, documentVisitor is better than documentExplorer, etc.) so that people aware of such patterns immediately understand the role of the component.
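As a hedged illustration of a canonical name, here is a minimal TypeScript sketch of the Adapter pattern. The Model interface, the LegacyRecord class and its readField() method are hypothetical, invented for this example only:

```typescript
// Interface expected by the application (the "target" of the pattern).
interface Model {
  getName(): string;
}

// Hypothetical existing class with an incompatible interface (the "adaptee").
class LegacyRecord {
  private readonly fields: Map<string, string>;

  constructor(fields: Map<string, string>) {
    this.fields = fields;
  }

  readField(key: string): string {
    return this.fields.get(key) ?? "";
  }
}

// The Adapter suffix tells readers who know the pattern what to expect:
// this class wraps a LegacyRecord and exposes it as a Model.
class LegacyRecordAdapter implements Model {
  private readonly record: LegacyRecord;

  constructor(record: LegacyRecord) {
    this.record = record;
  }

  getName(): string {
    return this.record.readField("name");
  }
}

const modelAdapter: Model = new LegacyRecordAdapter(
  new LegacyRecord(new Map([["name", "Ada"]]))
);
```

A name like modelNormalizer would leave the reader guessing whether data is cleaned, converted or validated; the pattern name settles that question before any code is read.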
Files and directories
Today, code repositories are still stored on disk as a set of text files. This may sound obvious, but it is more a legacy constraint than the most efficient design: IDEs (other than VisualAge which was, like Smalltalk itself, way ahead of its time in this regard) still need to build their own indexes by parsing those files.
If you follow the SRP (Single Responsibility Principle), you should have only one class per file. This will simplify:
- reading a class: each time you open a file, you’ll see simpler code (with only one concern);
- writing in a class: when two developers update different classes, you can be sure there is no file merge to do;
- loading a class: loading one class will not force the loading of unrelated classes that happen to live in the same file. For instance, A.b: B tries to load B, but the file where B is defined also defines a superclass or interface of A.
Also, the file name should equal the class name: this will avoid the need to understand two names, and will ease refactoring tasks such as renaming (modern IDEs will be able to understand that the file name should be also renamed when you rename the class, and vice versa).
Note that this is even more important for interface definitions, which should never depend on implementations (so classes and interfaces should never be in the same file).
Be exact
Sometimes even the best intentions won’t result in proper naming. A name may follow all good practices but still be misleading as not reflecting the truth: you provided proper semantic information, but that information is false.
This could happen because the code has evolved since initial naming, and so the initial semantics don’t apply anymore. The refactoring failed to change the name accordingly.
For instance, you may name a variable defaultName, which may lead the reader to think that there are other names, whereas only that one exists (and so it should have been named name).
Don’t re-state context
An application of DRY in naming is that you should not repeat the scope of the identifier:
class House {
  houseDoor: Door // redundant
}
var house = new House()
var houseDoor = house.houseDoor
should rather be:
class House {
  door: Door
}
var house = new House()
var houseDoor = house.door
Collections
Another common question is how to denote n-ary cardinality. A common practice is to simply pluralize a name, like:
class Selection {
  targets: Target[]
}
While being elegant, this usage can also introduce ambiguities, because:
- it only differs by one character from the singular version, which is easy to forget or misunderstand when communicating with colleagues;
- in English, the plural noun can have exactly the same form as the verb in the third person singular of the simple present tense. As a result, mySelection.targets could be interpreted as 3 different things: a collection of targets (for (i of targets)), an action of targeting (obj.targets(x)) or a boolean getter (if x.targets(y)). The same goes for a number of nouns/verbs like models/models(), hits/hits(), notes/notes(), documents/documents(), etc.
For these reasons, it may be safer to denote a collection in another, more explicit way. For instance:
targetCollection: Collection<Target>
targetSequence: Target[] // Ordered
targetSet: Set<Target> // Unique
// Targets dictionary
targetByKey: {[key: string]: Target}
// The versions of a target
targetVersions: Version[]
// The versions of targets
targetsVersions: Map<Target, Version[]>
Another reason for avoiding plurals is that they don’t always exist: some English nouns have no plural, like information, info, knowledge, work… and using them would add another ambiguity about cardinality. Conversely, some nouns have no singular form, like news for instance.
So plurals, while being elegant and concise, induce risks about both ambiguity and naming rules consistency (i.e. plural only when possible). I cannot recommend avoiding them, since I fail to avoid them myself, but at least be aware of those risks.
Beyond names
Sometimes a name won’t be enough to convey all the required information about some abstraction, and the declared names should be supplemented with comments.
However, note that this should be a last resort (i.e. you should always first try to find an uncommented name that conveys the required info) and, like names themselves, comments should add value over names, not re-state them. If they don’t, just remove them.
Make sure, also, to use a comment format that can be interpreted as documentation by IDEs, such as Javadoc, JSDoc, docstrings, etc., especially when they help fill in missing parts of a language (typing through comments in the untyped JavaScript world, for instance).
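As a small sketch of what “adding value over names” can mean, the doc comment below only states what the signature cannot: units, expected ranges and rounding. The function itself is a made-up example, not taken from any real codebase:

```typescript
/**
 * Computes the price to charge for an order line.
 *
 * @param unitPrice - Price of one item, taxes excluded.
 * @param quantity - Number of items ordered; expected to be a non-negative integer.
 * @param discountRate - Fraction between 0 and 1 (0.1 means 10% off), not a percentage.
 * @returns The discounted total, rounded to two decimals.
 */
function computeDiscountedTotal(
  unitPrice: number,
  quantity: number,
  discountRate: number
): number {
  const total = unitPrice * quantity * (1 - discountRate);
  return Math.round(total * 100) / 100;
}
```

A comment such as “computes the discounted total” would merely repeat the function name; the one above answers the questions a caller is likely to ask before passing 10 instead of 0.1.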
Conclusion
Bad naming is one of the most common pitfalls in software development. It has a very significant impact on code readability, and so on maintainability. Beyond being a programming concern, naming is closely related to design, as it states the responsibility of components. It can even help to identify bad design such as SRP failures.
For these reasons, you should never miss an opportunity to improve a name: each time you’re having a hard time understanding code (including code you have written yourself), don’t hesitate to refactor through renaming.
To misname things is to contribute to the world’s miseries. — Albert Camus