Image by Zach Dischner; license: CC
This is the first post in a series about microbenchmarking on the JVM with the Java Microbenchmark Harness (JMH).
part 2: Microbenchmarks and their environment
part 3: Common Flaws of Handwritten Benchmarks
part 4: Hello JMH
part 5: Digging Deeper
In this post I'll introduce benchmarking conceptually and describe one specific flavor: microbenchmarking.
What is Benchmarking?
When software engineers are concerned with the performance of a system, they can resort to a rich variety of practices, for example:
- Performance Testing to determine the performance of an already built system. MSDN provides a very thorough guide on the subject.
- Profiling to analyze and investigate bottlenecks when a system is running.
- Benchmarking to compare the relative performance of systems.
- Analysis to determine the algorithmic complexity (think Big-O notation).
In this series of blog posts we take a deeper look at benchmarking. As software engineers, we are interested in answers to questions like:
- Is it better to use radix sort or quick sort for sorting data in our homegrown database?
- Which of these two machines will run the latest copy of GTA faster (and how much faster)?
- What are the performance characteristics of different Map implementations for varying degrees of contention?
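To make the last question concrete, here is a deliberately naive sketch of how one might compare two `Map` implementations. The class and method names are my own, and the approach is simplistic on purpose: it is single-threaded, ignores contention entirely, and times a single unmeasured-warmup run.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapLookupComparison {

    // Fills the map with n entries, then times n lookups.
    // Returns the elapsed wall-clock time in nanoseconds.
    static long timeLookups(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(i);
        }
        long elapsed = System.nanoTime() - start;
        // Consuming the sum keeps the JIT from discarding the lookups as dead code.
        if (sum == Long.MIN_VALUE) {
            System.out.println(sum);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.printf("HashMap: %d µs, TreeMap: %d µs%n",
                timeLookups(new HashMap<>(), n) / 1_000,
                timeLookups(new TreeMap<>(), n) / 1_000);
    }
}
```

Intuitively we'd expect `HashMap` (amortized constant-time lookup) to beat `TreeMap` (logarithmic lookup), but as the upcoming posts will show, naive measurements like this one can easily mislead.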
These examples already expose some characteristics of software benchmarks:
- Benchmark candidate: What piece of software do we benchmark? This may be an entire application or just a single component.
- Comparison against a baseline: How do we know whether performance is "good"? The baseline may be determined by customer requirements, or you might just be looking for the best relative performance among a set of benchmark candidates in a specific scenario.
- Metrics: Which metrics do we use to determine performance? Are we interested in highest throughput or lowest latency?
- Benchmarking scenario: Do we consider single-threaded or multi-threaded performance? How does a data structure behave when accessed concurrently by multiple writers?
- Benchmarking duration: Are we interested in the performance of an individual operation such as formatting a date, or the performance of a complex use case involving multiple remote systems? The latter kind of benchmark has to run for a significantly longer time to produce trustworthy results.
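The duration point can be illustrated with a quick sketch (names and numbers are my own): timing a single invocation of a sub-microsecond operation mostly measures timer granularity and noise, whereas averaging over many invocations produces a steadier number.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DurationDemo {

    // Runs the operation `iterations` times and returns the
    // average elapsed time per invocation in nanoseconds.
    static long averageNanos(Runnable op, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime now = LocalDateTime.now();

        // One measurement of one call: dominated by timer resolution and noise.
        long single = System.nanoTime();
        fmt.format(now);
        single = System.nanoTime() - single;

        // An average over many calls smooths out the noise.
        long avg = averageNanos(() -> fmt.format(now), 100_000);

        System.out.printf("single call: %d ns, average over 100,000 calls: %d ns%n",
                single, avg);
    }
}
```

Even this averaged number is not trustworthy on the JVM, for reasons the later posts in this series will explore.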
This leads to two commonly known types of benchmarks:
- Macrobenchmarks are used to test entire system configurations. They are sometimes standardized, one example being SPECjbb2013, but nothing prevents you from running macrobenchmarks of a specific application on different system configurations, e.g. application servers or JVMs. Macrobenchmarks might be useful for system-level performance regression testing before upgrades of system software or for selecting a suitable application server.
- Microbenchmarks are used to compare different implementations in an isolated context, for example a single component. They are often written by library developers to determine performance characteristics or by application developers to choose a specific implementation in performance-critical contexts. Microbenchmarks are useful for comparisons of algorithms (e.g. serialization or XML/JSON parsers) or data structures.
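As a sketch of such an algorithm comparison (class and method names are hypothetical, and the measurement is again naive, lacking warmup and repetition), consider timing `Arrays.sort` on a primitive array against the same data sorted as boxed `Integer` objects, which exercises a different sorting algorithm:

```java
import java.util.Arrays;
import java.util.Random;

public class SortMicrobenchmark {

    // Times one sort of a defensive copy; returns elapsed nanoseconds.
    static long timePrimitiveSort(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        long start = System.nanoTime();
        Arrays.sort(copy); // dual-pivot quicksort for primitives
        return System.nanoTime() - start;
    }

    static long timeBoxedSort(int[] data) {
        Integer[] copy = Arrays.stream(data).boxed().toArray(Integer[]::new);
        long start = System.nanoTime();
        Arrays.sort(copy); // TimSort (a merge-sort variant) for objects
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(200_000).toArray();
        System.out.printf("primitive sort: %d µs, boxed sort: %d µs%n",
                timePrimitiveSort(data) / 1_000,
                timeBoxedSort(data) / 1_000);
    }
}
```

A single timed run like this is exactly the kind of handwritten benchmark whose flaws this series will dissect; JMH exists to do this measurement properly.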
You should now have a rough understanding of microbenchmarking and where this series of blog posts is headed. In the upcoming posts, we'll look at the environment in which microbenchmarks run, discuss common flaws of handwritten benchmarks, and find out why writing a correct one is not a piece of cake. Stay tuned.
Questions or comments?
Just ping me on Twitter