Project: Testing

The real way to sign a contract

Jérôme Beau
18 min read · Sep 20, 2024
“Mock” everything… but your test target of course.

I once had a job interview where, when my turn to ask questions came, I asked about the test coverage of the company’s product. They replied:

We don’t have tests.

Seeing the stupefaction on my face, they quickly added:

…because we don’t have bugs!

It could have been moderately funny if it had been a joke, but it wasn’t. Surprisingly, they seemed to be both confident and proud of it.

Those “professionals” with a “testing is doubting” tattoo are anything but professional. Even excellent developers will not be the only developers maintaining a codebase.

Why testing?

Of course they had bugs; any software has. They just didn’t know it, because they didn’t have tests. Unsurprisingly, when I got the job and added tests, the unsuspected bugs showed up.

Spotting bugs

Obviously this is the first goal of tests, or at least the first that comes to your mind. But there’s a lot more beyond that:

  • fixing them is just part of the story. Tests allow you to quickly reproduce and isolate the bug. Once fixed, the test becomes a Non-Regression Test (NRT) that will make sure the bug never occurs again (in production);
  • non-fixed bugs may exist for some time (because they’re minor, rarely occurring, or require some heavy refactoring), but at least once spotted we know they are there, and where. We are now able to plan the fix and anticipate responses to bug reports in the interim: it is just a matter of time for the team to work on the tickets that will fix them;
  • other tasks can now be reconsidered and refined in the light of the spotted bugs. Aside from planning, you learned from them what you should avoid in the future (in terms of implementation, design, flexibility, security... but also testing).

Improve your design

Hopefully you don’t write tests only when bugs are spotted; and this may be where most of the value of tests stems from. Because writing them:

  • puts you in the user’s shoes: this is an opportunity to check that your API is understandable, well documented, usable and practicable;
  • reveals design flaws and weaknesses that you would not have spotted if you’d integrated the component in the codebase right away. For instance, you’ll realize how many dependencies it requires, and find out that not all of them should be required. As your test grows in complexity, you may also realize that the component being tested is doing too many things. Overall, most of the time, testing will result in a simplification of your component.

All in all, if you can write tests easily, your design is probably good.

Performance

It’s unfortunately easy to write code that dramatically degrades your performance… until you add performance tests. If you run only unit or end-to-end (e2e) tests that execute each scenario a single time, you probably won’t detect that your response time doubled from 200 to 400 ms. As a result, you’re putting yourself at risk of discovering the problem in production, when thousands of users will do that test for you.

So don’t forget performance tests, and make sure they perform scenarios with your maximum concurrent users. If your design requires it, plan a warmup/ramp-up phase.
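
For instance, here is a load-test sketch using k6 (one load-testing tool among others); the URL, load profile and threshold are placeholder assumptions to adapt to your own app:

import http from "k6/http"
import { check, sleep } from "k6"

export const options = {
  stages: [
    { duration: "30s", target: 100 }, // ramp-up to the expected maximum of concurrent users
    { duration: "2m", target: 100 },  // sustain that maximum load
    { duration: "30s", target: 0 },   // ramp-down
  ],
  // Fail the run if the 95th percentile of response times exceeds 400 ms
  thresholds: { http_req_duration: ["p(95)<400"] },
}

export default function () {
  const res = http.get("https://my-app.example.com/api/items") // placeholder URL
  check(res, { "status is 200": (r) => r.status === 200 })
  sleep(1) // think time between user actions
}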

Bad testing can even impact testers, by Randall Munroe (XKCD)

What to test?

The contract

What you want to test is whether the system behaves as expected: that is, whether some given inputs (explicit or implicit/contextual) yield the expected outputs/behavior. This has nothing to do with how those outputs are computed and produced. You always want to test a black box, because implementation can change for several reasons (bug fixing, technology choices) without affecting the result.

What you aim to test is not the “how” but the “what”: an I/O contract represented by an API (whether local or remote).

The case for coverage

At first the question of how much of your code should be tested sounds silly: why wouldn’t you aim to test 100% of it? There are actually real-world reasons not to:

  • productivity: writing tests takes time, so you have to weigh it against the consequences of not testing some code: Is it likely to fail? What would be the commercial impact? How quickly would a problem be fixed in that area?
  • agility: by definition, tests depend on code. This means that refactoring your code will likely imply refactoring your tests. Because those tests are required to pass in order to produce builds, is this “friction to change” worth adding at the moment?

Don’t take the above reasons as pretexts not to write tests, but rather as constraints on the journey toward maximum coverage. Coverage should be dynamic, a goal that depends on the maturity of your codebase:

  1. If you’re starting a new codebase, it’s expected to have few tests, if any;
  2. As your codebase grows, some parts will become mature/stable, and you should add tests for them so that this stability does not degrade. Other, more recent parts will still be changing frequently and should not be tested too early. Of course, adding new code will decrease the overall coverage ratio;
  3. If you’ve released a first version and are planning others, you could target a minimum of 60% coverage; staying below 80% can be OK to keep a decent ratio of flexibility over quality. This will allow you to embrace expected changes without impairing productivity;
  4. If your app is not expected to change much in the future, you should aim for 90% coverage minimum.

So test coverage is always a tradeoff between quality on one side and productivity and flexibility on the other: the more tests you write, the higher quality you get, but also the less productive and agile you are, because of the time spent writing and refactoring tests.
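
If you use a runner like Jest (discussed in the Tools section below), such a dynamic coverage goal can be enforced in its configuration. A minimal sketch, assuming the 60% target mentioned above:

// jest.config.js — the build fails if coverage drops below the current goal
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    global: { statements: 60, branches: 60, functions: 60, lines: 60 },
  },
}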

On the other hand, quality through test coverage is an investment to allow you to:

  • make quick and serene deployments;
  • refactor more confidently: if your (possibly refactored) tests pass, you can be confident your refactoring didn’t break things;
  • spot new bugs: the more you cover, the more you find unsuspected bugs.
TDD can bring you 100% coverage by implementing only what’s needed

Scope and axis

The debate about whether to use unit tests, integration tests or “end-to-end” tests is often thought of as a “scope” issue: is it better to test small components or a whole scenario using them?

Of course, you don’t have to choose one over another, but you have to start somewhere. The question may be reified as “what should we prioritize?”, since all of these strategies have benefits and drawbacks:

  • Testing a large (e2e) scope does not imply that the smaller components it depends on are properly tested. Actually, it is usually the opposite. Think of test-driving a car: a driving session might give you (at your own risk!) an idea of the quality of the car’s components, but is it enough to thoroughly test the tires, the trunk or the headlights? No, because your driving test used only a subset of the possible test cases for each of those components: you didn’t drive in both sunny and rainy weather at the same time, nor by daylight and at night. You didn’t even test all the combinations of them. So you get it: the larger the test scope, the larger the combinatorial space to cover all possible paths in it. It is also quite difficult to reproduce some situations through an e2e scenario (while it is easy to mock them in unit tests), not to mention the execution time of a combinatorial set of e2e scenarios.
  • unit tests’ benefits are not only about testing: they usually also lead to design improvements of the tested components, as stated before.

There are different levels to test:

  • unit (object/SRP level)
  • integration (multiple components assembled)
  • end-to-end (E2E) from a user to actual data.
Mike Cohn’s “test pyramid” reminds you that you should write tests of different granularity, but more of the finest granularity. But you shouldn’t start building the top without the foundations.

Resorting to E2E/UI tests only is also too often a pretext to avoid unit testing (which requires writing more code) and thus to avoid refactoring components to be testable. If your code is an ugly mess of spaghetti, then yes, the only way to test it is end-to-end. That should be a smell to you.

In any case, be careful not to cross your system’s boundaries. Don’t waste time and effort testing writes to a database, for instance. The database is not your system, and rest assured that the database provider has already tested its product for you.

Normal behavior

Sometimes quick & dirty fixes bring support for edge cases that were not expected/tested before, but lead to regressions on “normal” cases. So make sure to check normal behaviors in your tests, alongside error and edge cases.

Range of values

Variables, even typed, can hold many values. This explains why even “100% test coverage” might not cover 100% of possibilities: you can go through all the code paths, but not with all possible values as parameters.

This usually makes sense, though. Most of the time you won’t test all string values, for instance, or all possible numbers, because you know that values “A” and “B” will have no different impact.

It’s worth testing value boundaries, however. This may be about crossing a type’s range (min/max, negative or not) or handling special values (the “nullish” or “falsy” ones) to make sure they are properly handled (typically through proper error handling).
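
A minimal sketch with NodeJS’s built-in test runner (parseQuantity is a hypothetical function accepting integer strings from 0 to 100,000):

import { test } from "node:test"
import assert from "node:assert/strict"
import { parseQuantity } from "./parseQuantity.js" // hypothetical tested function

test("parseQuantity handles boundary and special values", () => {
  assert.equal(parseQuantity("0"), 0)            // lower bound
  assert.equal(parseQuantity("100000"), 100000)  // upper bound
  assert.throws(() => parseQuantity("-1"))       // out of range
  assert.throws(() => parseQuantity(null))       // nullish value
  assert.throws(() => parseQuantity(""))         // falsy value
})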

Too many cases

Once in a while I’ve seen this “combinatorial explosion” problem handled as a “best effort” using random values: you don’t want to test all values from 0 to 100K (if only because of the execution time), so you test only 1K of them. And because testing only and always 0..999 would not cover all the possibilities, you make sure that the tested 1K-value sample changes every time you run the test. The same approach shows up with larger, more complex combinations of possible values.

This has a number of drawbacks:

  • You don’t know what you’re testing: sometimes it will be some values, sometimes others. As a result, the same test may pass once, then fail on the next run (because it uses different test values). You should rather characterize the passing and non-passing values, and test them both, but once.
  • In the long run, such tests won’t be trusted: if you repeatedly see a test passing and failing alternately, you will end up ignoring its failures, assuming it’s “bad luck” in a badly designed test rather than a real problem to fix.
  • It will slow down the CI process, because even a reduced subset of a huge combinatorial space remains a significant number of attempts/values.

It’s better to define a fixed set of combinations: at least you know what you’re testing. And you actually don’t need to test all combinations: just test normal, edge and error cases.
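
This is typically done with a fixed, table-driven test. A sketch, assuming a hypothetical discount() function:

import { test } from "node:test"
import assert from "node:assert/strict"
import { discount } from "./discount.js" // hypothetical tested function

// A fixed, deterministic set of cases: normal and edge values
const cases = [
  { input: 50, expected: 5 },           // normal case
  { input: 0, expected: 0 },            // lower edge
  { input: 100_000, expected: 10_000 }, // upper edge
]

for (const { input, expected } of cases) {
  test(`discount(${input}) should be ${expected}`, () => {
    assert.equal(discount(input), expected)
  })
}

test("discount rejects negative amounts", () => {
  assert.throws(() => discount(-1)) // error case
})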

User Interface

Most often, “UI” tests are not testing the User Interface: they are actually testing the app’s use cases, i.e. they are not checking that a button’s margin or color is correct, but that a click on this button is enabled and triggers some page change as expected. As such, “UI” tests are usually more end-to-end tests.

However, they suffer a number of drawbacks:

  1. They are usually brittle, depending on location/resolution and response time, thus leading to false negatives and false positives.
  2. During project development, the UI usually changes too fast for UI tests to cope, leading to a high maintenance cost/time. You can mitigate this by using semantic selectors (a unique #id or .class instead of a structural path) rather than text, which will vary during development but also at run time because of internationalization (see the sketch after this list).
  3. The web & testing frameworks you use may not be well suited to each other (because one expects the other to behave in some way), leading to tricky test code and brittle tests (Selenium with newer React-based inputs such as a text dropdown with search, for instance).
  4. They take longer to write and execute than other types of tests.
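
As an illustration of point 2, here is a sketch using Playwright (one possible framework; the ids and URLs are hypothetical):

import { test, expect } from "@playwright/test"

test("checkout button leads to the payment page", async ({ page }) => {
  await page.goto("https://my-app.example.com/cart")
  // Brittle: depends on wording, which varies with development and i18n
  // await page.getByText("Proceed to checkout").click()
  // Sturdier: a unique, semantic selector
  await page.locator("#checkout").click()
  await expect(page).toHaveURL(/\/payment/)
})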

Pure UI tests, however — checking visual features — could be useful to prevent visual regressions, but I’ve never seen them effectively adopted.

Not private stuff

As said earlier, you don’t want to test implementation, and private methods are part of it. If you nevertheless feel that you should, it probably means that your component should be split. For instance if you have this class with a private method:

class ToBeTested {

  publicMethod() {
    // ...
    this.#privateMethod() // Call the private method
    // ...
  }

  #privateMethod() {
    // ...
  }
}

and want to test #privateMethod(), it rather means that you should either:

  • test it indirectly through the testing of publicMethod() (but this will test more than you expect);
  • refactor it to be testable. That does not mean changing the method’s visibility to public, but rather delegating to a testable component, like below:
class ToBeTested {

  // The component instance remains private
  private subComponent = new SubComponent()

  publicMethod() {
    this.subComponent.subPrivate()
  }
}

class SubComponent {
  subPrivate() { // public
    // ...
  }
}
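
The extracted logic can now be tested directly, through a public contract of its own. A minimal sketch, assuming the SubComponent above is exported from its module:

import { test } from "node:test"
import assert from "node:assert/strict"

test("SubComponent works in isolation", () => {
  const toBeTested = new SubComponent()
  // The formerly private behavior is now a public contract
  assert.doesNotThrow(() => toBeTested.subPrivate())
})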

How to test?

You need to make sure that you test one feature at a time, and you want to be notified as soon as any problem occurs.
— Tim Mackinnon, Steve Freeman, Philip Craig: Endo-Testing: Unit Testing with Mock Objects

Expectations independence

Sometimes what you want to check is readily available from the tested object. For instance, take this class:

class FullName {

  constructor(...names) {
    this.name = this.buildFullName(...names)
  }

  buildFullName(...names) {
    return names.join(" ")
  }
}

And your test goes like this:

const toBeTested = new FullName("John", "Doe") 
assert.equals(toBeTested.name, "John Doe")

Right. But then you might be tempted to avoid redundancy and improve consistency by writing instead:

const names = ["John", "Doe"]
const toBeTested = new FullName(...names)
assert.equals(toBeTested.name, toBeTested.buildFullName(...names))

This would be a great mistake, because:

  • you are testing more than one thing: the constructor and a public call to a method. You are assuming the implementation of the constructor here, which breaks encapsulation (maybe it will evolve to do more/fewer/different things), whereas a test should only check the public contract;
  • you are not testing the component, but asking it to test itself: it may be internally consistent, but that consistent behavior may not be what is expected from the outside. It’s like asking a potential liar if he’s telling the truth.

Don’t twist your code for tests

We talked about the influence of testing on design, and we saw that there can be a fine line between “improving my design thanks to tests” and “twisting my design for better testing”. Indeed, one of the worst things to do is changing your code because of test constraints 💀. For instance:

  • adding if (testMode) doSomething() else doSomethingElse() in regular runtime code. Note that this is different from instantiating different objects depending on the environment using a provider/factory pattern, which makes sense to build an app foundation that won’t change (see the sketch after this list);
  • changing members’ visibility (say, from private to public) to ease test checks;
  • adding test-specific data in the UI to help your end-to-end test framework find it (yes, I saw what you did there 👀).
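
As an illustration of the provider/factory pattern mentioned in the first point, here is a sketch with hypothetical Mailer implementations:

// The environment decides which implementation is instantiated,
// but the runtime code contains no test-specific branch.
function mailerFactory(env) {
  return env === "test"
    ? new InMemoryMailer() // fake object, never used in production
    : new SmtpMailer()     // real implementation
}

const mailer = mailerFactory(process.env.NODE_ENV)
// Business code depends only on the mailer contract:
mailer.send("user@example.com", "Welcome!")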

“Mocking”

Most of the time the software components you test require dependencies to work. This is a problem because:

  • it may create side effects: not only should you test a single thing, but you should make sure that you are not testing something else: if a dependency is not mocked, your test might fail or (luckily) succeed because of a bug in that dependency, and you will hardly know it;
  • it’s not practical: if you provide a real instance of dependency A, you will have to provide the dependency B that such a real instance requires, then maybe the dependency C required by B, etc. In the end you might have to instantiate the whole app to unit-test a component, which is nonsense for a unit test.

For these reasons, you’ll want to test in isolation, that is, provide dependencies which are not the actual dependencies.

You don’t want to instantiate a whole system to unit-test a single object. You only need to provide “doubles” of its direct dependencies.

You may think “mock” at this time, but you should choose your words wisely.

“The word “mock” is sometimes used in an informal way to refer to the whole family of objects that are used in tests.”
— Robert C. Martin: “The Little Mocker”

That whole family of “test doubles” goes like this:

The test doubles family
  • Dummy objects are required for the test to run but actually not used in the test scenario. Note that they may be a hint of poor cohesion, that is, that your tested component doesn’t always use its full state and may be split into several components according to the SRP.
  • Fake objects are fully working implementations of the whole contract, but in a way that eases testing and that will not be used in production. This can be an in-memory database or server, for instance.
  • Stubs are responsible for providing data/behavior that is specific to the test scenario.
  • Spies are stubs that record information (usually in memory) when they are called, so that the test can check them afterward to see if everything operated as expected.
  • Mocks are different from all the others since they are responsible for checking an expected behavior. Because of this, they are very coupled with the tested object.

As you can see, there is a line of specialization here: a mock can be viewed as a kind of spy, a spy as a kind of stub, and a stub as a kind of dummy.
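
To make the distinction concrete, here is a minimal sketch of a stub and a hand-written spy for a hypothetical clock dependency (Greeter is a hypothetical tested component):

import assert from "node:assert/strict"

// Stub: provides data specific to the test scenario
const clockStub = { now: () => new Date("2024-09-20T00:00:00Z") }

// Spy: a stub that also records its calls for later verification
function makeClockSpy() {
  const spy = {
    calls: 0,
    now: () => {
      spy.calls++
      return new Date("2024-09-20T00:00:00Z")
    },
  }
  return spy
}

const clock = makeClockSpy()
const toBeTested = new Greeter(clock) // hypothetical tested component
toBeTested.greet()
assert.equal(clock.calls, 1) // the test, not the double, performs the check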

Last but not least: of course, don’t “mock” what you’re testing, nor test just an interface through a fake implementation (this sounds silly, but I’ve seen it!)

We need to talk about mocks

Mocking, in the broad sense, is a good testing practice. But it’s even better if you don’t need doubles at all.

One reason is that the need to replace a dependency for testing can be a smell of bad design, because it means that the business code you test depends on a component that it shouldn’t depend on. For instance, you might ask an Account to save itself, but want to mock the storage it uses to do so. But the Account should not be coupled with a storage in the first place.

What are the alternatives, then?

  • Refactoring to review your design with less or no coupling. In our example, a storage service should be able to save a dependency-free Account (as sketched below this list).
  • Service virtualization: Instead of mocking/faking to replace a remote party, you could use imposters from mountebank.
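
A sketch of the first alternative, with illustrative names: the Account holds data and business rules only, while a separate storage service owns the persistence dependency:

// Account is dependency-free and testable without any mock
class Account {
  constructor(id, balance) {
    this.id = id
    this.balance = balance
  }
}

// Only the storage service is coupled to persistence
class AccountStorage {
  constructor(db) {
    this.db = db // hypothetical database client
  }

  async save(account) {
    await this.db.insert("accounts", account)
  }
}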

Context independence

Developers are lazy, and sometimes it seems convenient for a test to rely on some context built by a previous test. For instance, some mocks could hold a state and be shared by multiple tests.

Don’t do that, because it would mean that:

  • your test description is misleading (it’s not test B but actually test B after test A);
  • you won’t be able to execute your test individually;
  • failure of a dependent test will make your test fail (i.e. if test A fails, it will make test B fail).

Mocks should be instantiated for each test. beforeEach() callbacks are for such purposes.
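
A minimal sketch with NodeJS’s built-in test runner, rebuilding a double before each test:

import { test, beforeEach } from "node:test"
import assert from "node:assert/strict"

let repositoryStub // hypothetical double, rebuilt for every test

beforeEach(() => {
  repositoryStub = { saved: [], save: (item) => repositoryStub.saved.push(item) }
})

test("A: saving stores the item", () => {
  repositoryStub.save("item-1")
  assert.equal(repositoryStub.saved.length, 1)
})

test("B: starts from a clean state, whatever test A did", () => {
  assert.equal(repositoryStub.saved.length, 0)
})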

Cleanup

Every test has a context. This can (ideally) be a set of local variables, or some data or “fixtures” shared by a test suite. In any case, you should:

  • Define test/test suite names (instead of the same “undefined” for all unnamed test suites) so they show up in your test results;
  • Make sure this context is freed/cleaned up when your test ends/tears down, either by explicit release or by implicit block/closure termination.

Avoid redundancy

It’s quite easy to test the same thing in different tests. This may be justified if the context (i.e. the set of input data) changes, but not otherwise: test run time should be minimized because, the longer the test suite, the less it will be run.

Don’t test others’ code

A common mistake when testing is to test outside your system: there is no point in checking the content behind an external hyperlink, or in reading a record to check that an insertion in a database has worked. The database editor has already tested that for you. What you might want to test is that the query has been issued, no more.

Location

Speaking of modularity, it is always better to keep test files and tested objects colocated, in the same module. This way, the module will be autonomous and reusable.

Evergreen

I can remember a heated debate in my company about the green state of our test suite, even when we had “known” failing tests that we planned to fix:

  • The con was that keeping the suite green felt like “forgetting” the work required to fix the failing tests.
  • The pro was that we could far more easily spot a new, unexpected failure.

With time and experience, I can only vouch for the evergreen approach. I’ve seen too many times “known” failures being amalgamated and undifferentiated from new ones, not to mention new failures occurring in those “known” failing test suites, where they were even more difficult to spot.

Evergreen is the way to go, by skipping the faulty tests: a skipped test can be seen and reported, whereas inverting assertions prevents this. The worst way to do it is commenting out the test, which prevents it from staying in sync with refactoring.

However, the evergreen choice only applies if it’s one of only two options. Modern test runners allow a better handling of those “waiting for a (better) fix” tests, like NodeJS does with its “TODO” tests, which are reported as failing but not counted as failures.
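
With NodeJS’s built-in test runner, this looks like the following sketch:

import { test } from "node:test"

// Skipped: visible in reports and kept in sync with refactoring, but not run
test("imports legacy accounts", { skip: "waiting for a fix" }, () => {
  // ...
})

// TODO: run and reported as failing, but not counted as a failure
test("handles leap seconds", { todo: true }, () => {
  // ...
})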

Tools

Last but not least: how will these automated tests be run? For years, a number of “test runners”/frameworks (JUnit by Kent Beck et al., Jasmine, SinonJS, Mocha…) and associated assertion libraries (Chai) have been around, allowing you to write unit tests using several flavors of asserts (including Behavior-Driven Development — BDD — tests, a specialization of TDD) but also to build mocks.

Honestly, there have been too many of these, and it’s good that the JavaScript/TypeScript community is converging on a single one (Jest), so we can finally capitalize on learning it.

It’s not perfect, though, and you may have experienced, as I have, hours of unnerving configuration work to make it work with both TypeScript and your ESM/CJS libraries. This went so far that, because I couldn’t accept wasting any more time on such framework configuration pain, I ended up writing my own TS testing tool. It’s far from being on par with what Jest can do, but at least it works out of the box.

Should you not be using TypeScript and stick to JavaScript, I would recommend ditching those fancy test frameworks and just using the “standard” test runner shipped with NodeJS.

When to test?

Demystifying TDD

Some Test-Driven Development proponents might answer “always” to this “when” question, since they consider code as derived from tests. The plan is as follows:

  1. Write tests
  2. Write/fix code until all tests pass.

As a result, many people misinterpret it as a new “waterfall” approach, which has been demonstrated to be inefficient (notably because you can’t guess all design flaws before implementation). Project cycles/iterations fix this.

Thankfully, this is a distorted view of TDD. The rationale is as follows:

  • coding by intention (i.e. starting by writing calls to your API as if you were a user) is a way to make sure you’ll be implementing something that meets the real need, and is understandable and practicable. This may save you some usability refactoring (see the sketch after this list);
  • you always code to satisfy some contract, at least in your mind, if not written in some spec or in a test. You never code out of nothing, and since you always have this “Definition of Done” (DoD) in mind, it might be a good idea to start by writing the test that will check it.
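
For instance, a test written by intention, before any implementation exists (Cart is a hypothetical API to be implemented):

import { test } from "node:test"
import assert from "node:assert/strict"

// This is the contract, the "Definition of Done", written first
test("a cart totals its items", () => {
  const cart = new Cart()
  cart.add({ price: 10 })
  cart.add({ price: 5 })
  assert.equal(cart.total(), 15)
})
// Next: implement Cart until this passes, then refactor with confidence.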

But that’s not a golden rule. A lot of, if not most, developers write code first, then tests, and it has even been shown that this doesn’t change productivity or quality much… provided the cycles are short: if you don’t write too much code at once, it doesn’t matter whether you write the tests before or after it (Test-Last Development, or TLD).

“The reason we write the tests first is that it encourages us to keep the cycles really short.”
— Robert C. Martin: “TDD doesn’t work”

Now, as discussed, you know that your code, and even your design, will evolve during the development process, and sometimes that will require your tests to be refactored as well. But over time, the foundations you built, as validated by tests, will become less and less likely to change.

Productivity

It’s a natural tendency, at least at the beginning of your career, to be reluctant to write tests: it looks tedious, not “really productive”, some way of wasting your time. However, this does not account for the time you’ll spend, and the pressure you’ll experience, fixing bugs in a hurry.

It also doesn’t account for your refactoring sessions, when you’ll wonder whether you broke something. Testing manually won’t bring you as much peace as a significant coverage of automated tests.

So yes, testing is an investment. It brings you confidence in making all the changes you want, as long as your tests pass. It brings you agility.

Pre-production

The goal of a pre-production environment is to mimic the production one as closely as possible, so that passing tests on it should warrant a minimum of bad surprises in production.

Should your app use a persistent database, your tests will probably need one to perform large-scale scenarios. I’ve seen a number of different ways to handle this:

  • Setting up the server to use an in-memory db ⚡️ (fake object) or service virtualization instead of a “real” database. This has the benefits of being very fast (which eases running the tests) and of warranting a good design (since you are able to easily replace your database implementation), but it avoids testing the real deal and the issues which may arise when using a persistent database, including timeouts, concurrent accesses with deadlocks, etc. So it doesn’t sufficiently meet the goal of pre-production.
  • Using the production database 😱 with records (or even tables) dedicated to tests. Aside from the risk of impacting production data (tests fail sometimes, right?), this may also impair production performance/responsiveness, as well as its usage statistics. You should avoid this.
  • Using a dedicated test database 👍 is probably the best option, both to be realistic (i.e. near the production environment) and to avoid impacting the production database.

Deployment

Now that you’ve deployed your latest changes to production, what can you do? Not a lot, aside from a few manual tests using a test account to see if something is obviously not working. This is like starting a machine and, without looking at its internals, just watching for suspicious smoke emanating from it. Hence the term “smoke tests”.

Conclusion

Good developers don’t produce bugs nor ship regressions.

Not because they are good, but because they have written automated tests.

Written by Jérôme Beau

Sharing learnings from three decades of software development. https://javarome.com
