Unit testing still has a reputation for being a waste of time among some software developers and their managers. In some cases it can indeed be a waste of time, but in perhaps 90% of cases it is worth spending the extra time it takes to create unit tests.
Here we try to show why unit testing is, in general, good practice when running calculations on an HPC facility.
The difficult part of unit testing is finding the right unit to test. That is even more true for scientific software that runs on HPC facilities.
What is a good unit?
There is no single correct answer, as the definition depends on what your code is doing. For this reason, we have created a set of guidelines that match most cases.
It is important to distinguish between testing and critical path benchmarking. The purpose of unit testing is to ensure that the different units still do what you expect of them after a code change, change of platform, or other changes.
A critical path benchmark, by contrast, only tells you whether the critical parts of your code still produce the expected benchmark numbers after, for instance, a code change.
Good units to test in scientific code include, for example:
- Hard-to-understand code. This could be calculations that depend on loops within loops. A test of such a unit checks that the code still produces the correct results after a change.
- Hard-to-get-right code. This is code that uses a complex data structure that is difficult to use correctly.
- Complex call graph. A complex call graph can be the result of many different corner cases or patterns; testing checks that each corner case produces the correct results.
- Communications. A lot of scientific code depends on some form of communication between different execution units, which may be on the same machine or on other machines in a high-speed network.
- Calculation. Most scientific code has a calculation core where all critical calculations are done.
- Legacy. Even if we would rather not depend on legacy code that is no longer maintained, it still happens. Unit testing the usage of the legacy code can give early warning of errors.
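As an illustration of the first and fifth kinds of unit, here is a minimal sketch of a test for a small nested-loop calculation. The function `matvec` and the test name are hypothetical and not taken from any particular code base. The sketch uses plain `assert` rather than a specific test framework; an xUnit-style framework would wrap the same check in its own macros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unit under test: dense matrix-vector product y = A * x,
// with A stored row-major as a flat vector of size rows * cols.
std::vector<double> matvec(const std::vector<double>& a,
                           const std::vector<double>& x,
                           std::size_t rows, std::size_t cols) {
    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            y[i] += a[i * cols + j] * x[j];
        }
    }
    return y;
}

// Test with small integer-valued inputs, so the expected result can be
// worked out by hand and compared exactly.
void test_matvec_small_case() {
    const std::vector<double> a = {1.0, 2.0, 3.0,
                                   4.0, 5.0, 6.0};  // 2 x 3 matrix
    const std::vector<double> x = {1.0, 0.0, -1.0};
    const std::vector<double> y = matvec(a, x, 2, 3);
    // Row 0: 1*1 + 2*0 + 3*(-1) = -2; row 1: 4*1 + 5*0 + 6*(-1) = -2
    assert(y.size() == 2);
    assert(y[0] == -2.0);
    assert(y[1] == -2.0);
}
```

Keeping the inputs small and integer-valued means the expected values are exact, so a failure points at the loop logic rather than at rounding.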
Why is unit testing for scientific code important?
Looking at the pilot test code helps a lot in spotting differences in test results when, for example, optimizing for the LUMI platform. This could be as simple as seeing that the floating-point results from the CPU differ at the 15th decimal place. Such a difference can be used as feedback for the scientists, who can verify that the results are acceptable or decide that the code needs to change to ensure correct results.
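One common way to make a test robust against such last-digit differences is to compare with a tolerance instead of exact equality. The following is a minimal sketch; the helper `nearly_equal` and its default tolerances are assumptions chosen for illustration, and the acceptable tolerance for a real code is something the scientists have to decide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b agree within a relative tolerance,
// with an absolute floor so that values near zero can still compare equal.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-12, double abs_tol = 1e-15) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_tol * scale, abs_tol);
}

void test_rounding_differences_are_tolerated() {
    // 0.1 + 0.2 is typically not bit-identical to 0.3 in binary
    // floating point, but it is well within the tolerance.
    assert(nearly_equal(0.1 + 0.2, 0.3));
    // A discrepancy far above the tolerance must still fail.
    assert(!nearly_equal(1.0, 1.0 + 1e-9));
}
```

A test written this way passes when results differ only in the last digits and fails when the difference is large enough to need a scientist's attention.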
The unit tests also help a great deal when porting the code to a more modern framework like OpenMP or rewriting parts of the code as GPU kernels. They help the developer ensure that the ported code is correct.
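A porting test of this kind can be sketched as a comparison between the original implementation and the ported one on the same input. In the sketch below, all function names are hypothetical, and `sum_ported` is only a stand-in: it sums in chunks to mimic how a parallel reduction reorders additions, and it contains no actual OpenMP or GPU code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference implementation: plain serial sum.
double sum_reference(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Stand-in for a ported version: sums in chunks and then combines the
// partial sums, which reorders the additions much as a parallel
// reduction would. Assumes chunks > 0.
double sum_ported(const std::vector<double>& v, std::size_t chunks) {
    std::vector<double> partial(chunks, 0.0);
    const std::size_t n = v.size();
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * n / chunks;
        const std::size_t end = (c + 1) * n / chunks;
        for (std::size_t i = begin; i < end; ++i) partial[c] += v[i];
    }
    double s = 0.0;
    for (double p : partial) s += p;
    return s;
}

void test_ported_sum_matches_reference() {
    std::vector<double> v(1000);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = 1.0 / (i + 1.0);
    const double ref = sum_reference(v);
    const double got = sum_ported(v, 8);
    // Reordered additions may differ in the last bits, so compare
    // with a relative tolerance instead of ==.
    assert(std::fabs(got - ref) <= 1e-12 * std::fabs(ref));
}
```

Keeping the old implementation around as a reference, at least inside the test suite, makes it possible to rerun this comparison after every further optimization step.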
For software that is being made ready to run on large HPC installations, like LUMI, unit tests are worth the initial cost of creating them. As with many other software projects, scientific code also depends on others’ work and on different software packages.
To make it easy to start creating unit tests for existing code, we have created a small GitHub code repository that shows how to use two different unit test frameworks. The GitHub project shows how this is done in Fortran, C and C++. The code uses CMake to generate the build files but can easily be adapted to other build systems. It works on all platforms (Linux, Windows, macOS). Both unit test frameworks are xUnit-style frameworks and work in basically the same way.
Please see the GitHub link and contact the HPC section at DeiC if you have any questions or suggestions for changes to the code.