Daniel Mitterdorfer

Microbenchmarking in Java with JMH: Digging Deeper

This is the fifth and last post in a series about microbenchmarking on the JVM with the Java Microbenchmark Harness (JMH).

part 1: Microbenchmarking in Java with JMH: An Introduction

part 2: Microbenchmarks and their environment

part 3: Common Flaws of Handwritten Benchmarks

part 4: Hello JMH

In the previous post, I introduced JMH with a Hello World benchmark. Now, let's dig a bit deeper to find out more about the capabilities of JMH.

A Date Format Benchmark

In this blog post, we'll implement a microbenchmark that compares the multithreaded performance of different date formatting approaches in Java. This microbenchmark lets us exercise more features of JMH than a simple Hello World example. There are three contenders:

  1. JDK SimpleDateFormat wrapped in a synchronized block: As SimpleDateFormat is not thread-safe, we have to guard access to it using a synchronized block.
  2. Thread-confined JDK SimpleDateFormat: One alternative to a global lock is to use one instance per thread. We'd expect this approach to scale much better than the first one, as there is no contention.
  3. FastDateFormat from Apache Commons Lang: This class is a drop-in replacement for SimpleDateFormat (see also its Javadoc).

To measure how these three implementations behave when formatting a date in a multithreaded environment, they will be tested with one, two, four and eight benchmark threads. The key metric that should be reported is the time that is needed per invocation of the format method.

Phew, that's quite a bit to chew on. So let's tackle the challenge step by step.

Choosing the Metric

Let's start with the metric that we want to determine. JMH defines the output metric in the enum Mode. As the Javadoc of Mode is already quite detailed, I won't duplicate the information here. After looking at the options, we choose Mode.AverageTime as the benchmark mode. We can specify the benchmark mode on the benchmark class using @BenchmarkMode(Mode.AverageTime). Additionally, we want the output time unit to be µs.

import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class DateFormatMicroBenchmark {
  // more code to come...
}

When the microbenchmark is run, results will be reported as µs/op, i.e. how many µs one invocation of the benchmark method took. Let's move on.

Defining Microbenchmark Candidates

Next, we need to define the three microbenchmark candidates. We need to keep the three implementations around during a benchmark run. That's what @State is for in JMH. We also define the scope here; in our case either Scope.Benchmark, i.e. one instance for the whole benchmark, or Scope.Thread, i.e. one instance per benchmark thread. The benchmark class now looks as follows:

import org.apache.commons.lang3.time.FastDateFormat;
import org.openjdk.jmh.annotations.*;

import java.text.DateFormat;
import java.text.Format;
import java.util.Date;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class DateFormatMicroBenchmark {
    // This is the date that will be formatted in the benchmark methods
    @State(Scope.Benchmark)
    public static class DateToFormat {
        final Date date = new Date();
    }

    // These are the three benchmark candidates

    @State(Scope.Thread)
    public static class JdkDateFormatHolder {
        final Format format = DateFormat.getDateInstance(DateFormat.MEDIUM);

        public String format(Date d) {
            return format.format(d);
        }
    }

    @State(Scope.Benchmark)
    public static class SyncJdkDateFormatHolder {
        final Format format = DateFormat.getDateInstance(DateFormat.MEDIUM);

        public synchronized String format(Date d) {
            return format.format(d);
        }
    }
    
    @State(Scope.Benchmark)
    public static class CommonsDateFormatHolder {
        final Format format = FastDateFormat.getDateInstance(FastDateFormat.MEDIUM);

        public String format(Date d) {
            return format.format(d);
        }
    }
}

We defined holder classes for each Format implementation. That's needed because we need a place to put the @State annotation. Later on, we can have JMH inject instances of these classes into the benchmark methods. Additionally, JMH will ensure that instances have the proper scope. Note that SyncJdkDateFormatHolder achieves thread-safety by declaring #format() as synchronized. Now we're almost there; only the actual benchmark code is missing.

Multithreaded Benchmarking

The actual benchmark code is dead-simple. Here is one example:

@Benchmark
public String measureJdkFormat_1(JdkDateFormatHolder df, DateToFormat date) {
    return df.format(date.date);
}

Two things are noteworthy: First, JMH figures out that we need an instance of JdkDateFormatHolder and DateToFormat and injects properly scoped instances of both. Second, the method needs to return the result in order to avoid dead-code elimination.
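If a benchmark method produces more than one result, returning a single value is not enough. In that case JMH can also inject a Blackhole (org.openjdk.jmh.infra.Blackhole in recent JMH versions) into which all results are sunk. A minimal sketch that would live in the same benchmark class; the method name is made up for illustration:

@Benchmark
public void measureJdkFormatWithBlackhole(JdkDateFormatHolder df, DateToFormat date, Blackhole bh) {
    // Everything passed to consume() counts as "used" for JMH,
    // so the JIT compiler cannot eliminate the call to format().
    bh.consume(df.format(date.date));
}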

As we did not specify anything, the method will run single-threaded. So let's add the last missing piece:

@Benchmark
@Threads(2)
public String measureJdkFormat_2(JdkDateFormatHolder df, DateToFormat date) {
    return df.format(date.date);
}

With @Threads we can specify the number of benchmark threads. The actual benchmark code contains methods for each microbenchmark candidate for one, two, four and eight threads. It's not particularly interesting to copy the whole benchmark code here, so just have a look at GitHub.
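For reference, the remaining methods all follow the same pattern. A sketch of how the four and eight thread variants for the synchronized candidate might look (the method names may differ slightly from the actual code on GitHub):

@Benchmark
@Threads(4)
public String measureSyncJdkFormat_4(SyncJdkDateFormatHolder df, DateToFormat date) {
    return df.format(date.date);
}

@Benchmark
@Threads(8)
public String measureSyncJdkFormat_8(SyncJdkDateFormatHolder df, DateToFormat date) {
    return df.format(date.date);
}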

Running the Benchmark

This benchmark is included in benchmarking-experiments on GitHub. Just follow the installation instructions, and then issue java -jar build/libs/benchmarking-experiments-0.1.0-all.jar "name.mitterdorfer.benchmark.jmh.DateFormat.*".

Results

I ran the benchmark on my machine, an Intel Core i7-2635QM with 4 physical cores and Hyper-Threading enabled. The results can be found below:

Results of the DateFormatMicroBenchmark

Unsurprisingly, the synchronized version of SimpleDateFormat does not scale very well, whereas the thread-confined version and FastDateFormat are much better.

There's (Much) More

DateFormatMicroBenchmark is a more realistic microbenchmark than anything we have seen before in this article series. As this example shows, JMH has a lot to offer: support for different state scopes, multithreaded benchmarking and customization of the reported metrics.

Apart from these features, JMH provides a lot more, such as support for asymmetric microbenchmarks (think readers and writers), control over the behavior of the benchmark (How many VM forks are created? Which output formats should be used for reporting? How many warm-up iterations should be run?) and so on. It also supports arcane features such as the possibility to control certain aspects of compiler behavior with the @CompilerControl annotation, a Control class that provides information about state transitions in microbenchmarks, support for profilers and much more. Just have a look at the examples yourself, or look for usages of JMH in the wild, such as JCTools microbenchmarks from Nitsan Wakart, the benchmark suite of the Reactor project or Chris Vest's XorShift microbenchmark.
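To give a taste of just one of these features, the sketch below shows how @CompilerControl could be used to keep the JIT compiler from inlining a helper method, for example so that it remains visible as a separate frame when profiling a benchmark. The class and the helper method are invented purely for illustration:

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class CompilerControlExample {
    int seed = 42;

    // Ask the JIT compiler (via JMH) not to inline this helper so that it
    // shows up as a separate frame in profiles of the benchmark.
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    int expensiveHelper(int x) {
        return Integer.reverse(x) * 31;
    }

    @Benchmark
    public int measureHelper() {
        return expensiveHelper(seed);
    }
}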

Alternatives

There are also some alternatives to JMH, but for me none of them is currently as compelling as JMH:

Final Thoughts

Although writing correct microbenchmarks on the JVM is really hard, JMH helps to avoid many issues. It is written by experts on the OpenJDK team and solves issues you might not even have known you had in a handwritten benchmark, e.g. false sharing. JMH makes it much easier to write correct microbenchmarks without requiring intimate knowledge of the JVM at the level of a HotSpot engineer. JMH's benefits are so compelling that you should never consider rolling your own handwritten microbenchmarks.

Questions or comments?

Just ping me on Twitter


Cheap Read-Write Lock Explained

I have recently stumbled across a nice idiom called the cheap read-write lock. It is intended for very frequent concurrent reads of a field where synchronization would lead to too much contention. I think that the idiom is a bit odd and begs for an explanation.

Usage Scenario

For the sake of demonstration consider the following situation: Our system has a globally available calendar that holds the current day within the year. Every part of the system may read the current day. Once a day, a background thread will update it. Note that this scenario might not justify using the idiom in practice, but the example is sufficient for illustrating it.

@NotThreadSafe
public final class SystemCalendar {
  // may be a value between 0 and 364
  private int currentDayOfYear;
  
  public int getCurrentDayOfYear() {
    return currentDayOfYear;
  }
  
  public void advanceToNextDay() {
    // let's ignore leap years for this example...
    this.currentDayOfYear = (this.currentDayOfYear + 1) % 365;
  }
}

That's all well and good, but this code is not thread safe. To demonstrate why, follow me on a short detour. The picture below illustrates what could happen to currentDayOfYear on a hypothetical and very simple multicore system. We assume that the updating thread always runs on CPU1 and reader threads run on CPU2. On initial access, the value of currentDayOfYear is read from main memory and cached on each CPU separately. The yellow circle indicates the value of currentDayOfYear that is currently visible to each part of the system.

Memory Hierarchy with missing happens-before order

As you can see, the value has been updated on CPU1, but it has never been reconciled with main memory and thus never reached CPU2. How can this happen? Java 5 brought us JSR 133, better known as the Java Memory Model, which defines a happens‑before order between memory accesses. I won't go into the details here, but we failed to establish a happens‑before order between writes and reads of currentDayOfYear, which allowed the runtime to cache its value in each processor separately. Let's fix it:

@ThreadSafe
public final class SystemCalendar {
  private int currentDayOfYear;
  
  public synchronized int getCurrentDayOfYear() {
    return currentDayOfYear;
  }
  
  public synchronized void advanceToNextDay() {
    this.currentDayOfYear = (this.currentDayOfYear + 1) % 365;
  }
}

By marking both methods as synchronized, we have established a happens‑before order. Writes will now be flushed to main memory, and the value will be fetched from main memory before reads (This is not entirely correct and the reality is more complicated, but this mental model is sufficient for understanding the idiom. If you are interested in the details, have a look at the MESI protocol).

Memory Hierarchy with established happens-before order

Problem solved? You bet! But let's suppose that there are thousands of concurrent readers of currentDayOfYear. As a result, #getCurrentDayOfYear() will suffer from heavy contention. Each reader has to obtain the monitor on SystemCalendar.this just to ensure it gets the most recent value of currentDayOfYear.

Removing the synchronization on #getCurrentDayOfYear() in this situation is fatal, as the class would no longer be thread safe. JLS §17.4.4 states: "An unlock action on monitor m synchronizes‑with all subsequent lock actions on m [...]" (you can substitute synchronizes‑with with happens‑before for our purposes). The "subsequent lock action" we want is exactly the acquisition of the monitor lock by #getCurrentDayOfYear() after #advanceToNextDay() has been called. Only then have we established a happens‑before order and are thus able to see the new value. Too bad: it seems we are stuck with the lock.

It turns out we can relax the constraint by declaring currentDayOfYear as volatile instead of marking #getCurrentDayOfYear() as synchronized. That's essentially the cheap read-write lock:

@ThreadSafe
public final class SystemCalendar {
  private volatile int currentDayOfYear;
  
  public int getCurrentDayOfYear() {
    return currentDayOfYear;
  }
  
  public synchronized void advanceToNextDay() {
    this.currentDayOfYear = (this.currentDayOfYear + 1) % 365;
  }
}

Why does this work? Both synchronized and volatile establish a happens‑before order, but only synchronized guarantees that the guarded block is executed atomically. The setter performs multiple operations that have to be atomic, so synchronizing the method is the correct choice. However, the getter performs only one action, namely a read of the field, which is atomic (as guaranteed by the JLS). This allows us to use volatile instead of synchronized to establish a happens‑before order.

When to use the cheap read-write lock?

This idiom may be applied when reads far outweigh writes and synchronization would lead to too much contention. You should not just assume that there is a bottleneck but actually measure it before and after applying this idiom.
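Such a measurement could, for example, be set up with JMH (covered elsewhere on this blog). The sketch below is purely illustrative: all class and method names are made up, and it only compares the cost of contended reads of the two variants:

import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class SystemCalendarReadBenchmark {
  // Reader with a synchronized getter
  public static final class SynchronizedCalendar {
    private int currentDayOfYear;

    public synchronized int getCurrentDayOfYear() {
      return currentDayOfYear;
    }
  }

  // Reader with a volatile field (cheap read-write lock)
  public static final class VolatileCalendar {
    private volatile int currentDayOfYear;

    public int getCurrentDayOfYear() {
      return currentDayOfYear;
    }
  }

  final SynchronizedCalendar synchronizedCalendar = new SynchronizedCalendar();
  final VolatileCalendar volatileCalendar = new VolatileCalendar();

  @Benchmark
  @Threads(8)
  public int readSynchronized() {
    return synchronizedCalendar.getCurrentDayOfYear();
  }

  @Benchmark
  @Threads(8)
  public int readVolatile() {
    return volatileCalendar.getCurrentDayOfYear();
  }
}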

When to avoid the cheap read-write lock?

In short: almost always. In the original article, Brian Goetz warns firmly that you should not use this idiom lightly. First, you should empirically demonstrate that there is a performance problem and that the cheap read-write lock eliminates it. The code should be properly encapsulated and very well documented. You must not use the cheap read-write lock if you deviate from the standard use case above. For example, the idiom is obviously insufficient if your getter consists of multiple statements, as the sketch below shows.
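To make that last point concrete, here is a sketch (the class is made up for illustration) of a getter that spans more than a single read. Even though both fields are volatile, a reader can observe the new year combined with the old day of year, because the two reads are not atomic as a pair; this variant would need full synchronization (or an immutable holder object) instead:

public final class BrokenSystemCalendar {
  private volatile int currentDayOfYear;
  private volatile int currentYear;

  // NOT safe: a writer may run between the two volatile reads,
  // so the returned combination may never have existed.
  public String getCurrentDate() {
    return currentYear + "-" + currentDayOfYear;
  }

  public synchronized void advanceToNextDay() {
    currentDayOfYear = (currentDayOfYear + 1) % 365;
    if (currentDayOfYear == 0) {
      currentYear++;
    }
  }
}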

Alternatives

In most similar situations that occur in practice, the safer alternative is ReadWriteLock, which the JDK has provided since Java 5:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

@ThreadSafe
public final class SystemCalendar {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Lock readLock = lock.readLock();
  private final Lock writeLock = lock.writeLock();
  
  private int currentDayOfYear;
  
  public int getCurrentDayOfYear() {
    readLock.lock();
    try {
      return currentDayOfYear;
    } finally {
      readLock.unlock();
    }
  }
  
  public void advanceToNextDay() {
    writeLock.lock();
    // The try-finally block is not strictly necessary in this case,
    // but it is the common idiom when using locks. You almost always want to use it.
    try {
      this.currentDayOfYear = (this.currentDayOfYear + 1) % 365;
    } finally {
      writeLock.unlock();
    }
  }
}

Since Java 8 you can also use StampedLock.
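Purely as a sketch (not taken from the original article), the same class could look like this with StampedLock, using an optimistic read that avoids taking a lock in the common, uncontended case:

import java.util.concurrent.locks.StampedLock;

@ThreadSafe
public final class SystemCalendar {
  private final StampedLock lock = new StampedLock();

  private int currentDayOfYear;

  public int getCurrentDayOfYear() {
    // Optimistic read: no lock is acquired unless a writer interferes.
    long stamp = lock.tryOptimisticRead();
    int day = currentDayOfYear;
    if (!lock.validate(stamp)) {
      // A write happened in between; fall back to a full read lock.
      stamp = lock.readLock();
      try {
        day = currentDayOfYear;
      } finally {
        lock.unlockRead(stamp);
      }
    }
    return day;
  }

  public void advanceToNextDay() {
    long stamp = lock.writeLock();
    try {
      this.currentDayOfYear = (this.currentDayOfYear + 1) % 365;
    } finally {
      lock.unlockWrite(stamp);
    }
  }
}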

Conclusion

With careful thought and a very good understanding of the runtime behavior of our system, we can reduce the cost introduced by synchronized access on highly contended, read-heavy code paths. Which technique is sufficient to solve a performance problem is best determined by profiling your system intensively in different usage scenarios. While the cheap read-write lock is certainly not something you'll pull out every day, it is a nice technique for your concurrency toolbox.

If you have any questions or comments just ping me on Twitter.


Microbenchmarking in Java with JMH: Hello JMH

This is the fourth post in a series about microbenchmarking on the JVM with the Java Microbenchmark Harness (JMH).

part 1: Microbenchmarking in Java with JMH: An Introduction

part 2: Microbenchmarks and their environment

part 3: Common Flaws of Handwritten Benchmarks

part 5: Digging Deeper

In the previous post I showed different problems that we might miss when writing microbenchmarks from scratch. In this blog post I'll introduce JMH and show how it helps us to avoid these problems.

Java Microbenchmarking Harness: Hello World Walkthrough

By now you might think there is no way to write a correct microbenchmark on the JVM without being an engineer working on HotSpot. Fortunately, some people on the OpenJDK team, most prominently Aleksey Shipilёv, have written the Java Microbenchmark Harness, or JMH for short. JMH takes all sorts of countermeasures to eliminate or reduce the problems I have described earlier and helps you to concentrate on writing the microbenchmark instead of satisfying the JVM. To get a grasp of JMH's approach to microbenchmarking, let's write a Hello World benchmark, which you can also find in the accompanying project on GitHub:

package name.mitterdorfer.benchmark.jmh;

import org.openjdk.jmh.annotations.Benchmark;

public class HelloJMHMicroBenchmark {
    @Benchmark
    public void benchmarkRuntimeOverhead() {
        //intentionally left blank
    }
}

A JMH microbenchmark is a plain Java class. Each microbenchmark is implemented as a method that is annotated with @Benchmark (in earlier versions of JMH the annotation was called @GenerateMicroBenchmark). But how do we run it? Before we can run the microbenchmark, we have some work to do. To see why, let's have a look at the basic workflow with JMH:

Runtime diagram of the forked JVM runs of JMH

This workflow might strike you as a bit odd at first. Why is JMH generating code? Why do we have to create a shaded JAR? Wouldn't it be easier to run a microbenchmark just like a JUnit test? Let's go through this process step by step.

We have already completed the first step by annotating a method with @Benchmark. The second step is carried out when the microbenchmark class is compiled. JMH implements multiple annotation processors that generate the final microbenchmark class. This generated class contains setup and measurement code as well as code that is required to minimize unwanted JIT compiler optimizations in the microbenchmark. The generated class for name.mitterdorfer.benchmark.jmh.HelloJMHMicroBenchmark is name.mitterdorfer.benchmark.jmh.generated.HelloJMHMicroBenchmark_benchmarkRuntimeOverhead and can be found in the corresponding .java file below build/classes/main if you're curious. As you can see, JMH generates one class per method that is annotated with @Benchmark, but that is transparent to JMH users.

JMH contains a Runner class somewhat similar to JUnit's, so it is also possible to run microbenchmarks embedded in your own code using the JMH Java API. However, let's use the JAR-based workflow for now and create a shaded JAR which we'll run. JMH allows multiple microbenchmark classes in the same JAR and can run all of them in the same microbenchmarking run.
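For completeness, here is a minimal sketch of how such an embedded run could look with the Runner API of recent JMH versions; the runner class name and the option values are chosen purely for illustration:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class HelloJMHRunner {
    public static void main(String[] args) throws RunnerException {
        // Run all benchmarks whose fully qualified name matches the regex
        Options options = new OptionsBuilder()
                .include(".*HelloJMHMicroBenchmark.*")
                .warmupIterations(20)
                .measurementIterations(20)
                .forks(10)
                .build();
        new Runner(options).run();
    }
}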

To run the microbenchmark we will now tackle step three and create a shaded JAR. I'll use Gradle for that as I prefer it over Maven. If you want or have to use Maven, just look at the JMH example POM or at the sample POM of my benchmarking project. Just type gradle shadow to create the shaded JAR, i.e. a single JAR that contains your microbenchmark and all its dependencies. When you type java -jar build/libs/benchmarking-experiments-0.1.0-all.jar, JMH runs the microbenchmarks that are contained in the JAR and prints something similar to this:

# Run progress: 0,00% complete, ETA 00:06:40
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: name.mitterdorfer.benchmark.jmh.HelloJMHMicroBenchmark.benchmarkRuntimeOverhead
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.7.0_10.jdk/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF8
# Fork: 1 of 10
# Warmup Iteration   1: 1442257053,080 ops/s
# Warmup Iteration   2: 1474088913,188 ops/s
[...]
# Warmup Iteration  19: 435080374,496 ops/s
# Warmup Iteration  20: 436917769,398 ops/s
Iteration   1: 1462176825,349 ops/s
Iteration   2: 1431427218,067 ops/s
[...]

# Run complete. Total time: 00:08:06

Benchmark                                                   Mode   Samples        Score  Score error    Units
n.m.b.j.HelloJMHMicroBenchmark.benchmarkRuntimeOverhead    thrpt       200 1450534078,416 29308551,722    ops/s

You can see that JMH creates multiple JVM forks. For each fork, it runs n warmup iterations (shown in blue in the picture below), which are not measured and are only needed to reach steady state, before m measurement iterations are run (shown in red in the picture below). In this example, n is 20 and m is 20, but you can change this with command line parameters.

Runtime diagram of the forked JVM runs of JMH

At the end, JMH summarizes the result of all microbenchmarking runs. The two most important measures are the "score" (which is the mean in throughput mode), which allows you to estimate the performance of the benchmarked code, and the "score error", which allows you to estimate the noisiness of the measurements taken by the microbenchmark. As this post is not intended to give an introduction to statistics, I suggest "Explained: Key Mathematic Principles for Performance Testers" written by Microsoft's patterns & practices group. If you are more the intuitive type, you'll like Kalid Azad's articles very much, especially "How To Analyze Data Using the Average".

That's basically the whole microbenchmarking process with JMH. Congratulations, you have mastered the first step of writing microbenchmarks with JMH! In the next post we'll get to know more JMH concepts.

Questions or comments?

Just ping me on Twitter


Microbenchmarking in Java with JMH: Common Flaws of Handwritten Benchmarks

Fail Road

Image by Sarah; license: CC

This is the third post in a series about microbenchmarking on the JVM with the Java Microbenchmark Harness (JMH).

part 1: Microbenchmarking in Java with JMH: An Introduction

part 2: Microbenchmarks and their environment

part 4: Hello JMH

part 5: Digging Deeper

In the previous post I showed typical issues that have to be considered when executing microbenchmarks, such as the behavior of the JIT compiler or background system load. In this blog post we'll discover different problems that you'll encounter when writing microbenchmarks on the JVM.

Flaws and Dangers of Java Microbenchmarks

We write microbenchmarks because we want to know about the performance characteristics of a piece of code. In an ideal world, we would like to argue:

For given microbenchmark candidates X and Y: If X performs better than Y in a given microbenchmark, X will perform better than Y in any "similar" situation.

But Sheldon Cooper knows better: That's just superstitious hokum.

Sheldon Cooper says 'Superstitious Hokum'

What could possibly go wrong with this innocent argument? For one thing, the microbenchmark could be flawed, which renders the premise invalid. Here are some examples:

These examples should demonstrate that there is a vast number of things that can go wrong. However, in the unlikely event that we mere mortals get a microbenchmark right and the premise is valid, the conclusion could still be wrong. Let's consider some examples (non-exhaustive):

What's next?

In this article, we have seen that we can fail in a lot of ways when trying to measure the performance of a Java component with a microbenchmark. However, not all hope is lost. In the next part I'll introduce the Java Microbenchmark Harness. Although it does not prevent all issues, it goes to great lengths to eliminate a lot of them upfront.

Questions or comments?

Just ping me on Twitter

Many thanks to @mmitterdorfer and @steve0392 for reading draft versions of this article.


Microbenchmarking in Java with JMH: Microbenchmarks and their environment

CPU Pins

Image by Eduardo Diez Viñuela; license: CC

This is the second post in a series about microbenchmarking on the JVM with the Java Microbenchmark Harness (JMH).

part 1: Microbenchmarking in Java with JMH: An Introduction

part 3: Common Flaws of Handwritten Benchmarks

part 4: Hello JMH

part 5: Digging Deeper

In the previous post I introduced microbenchmarking. In this blog post we'll discover different problems and precautions that should be taken when writing microbenchmarks on the JVM.

Is it really that hard?

How hard can it be to write a microbenchmark? Just invoke the code that should be benchmarked, measure the elapsed time and we are done. Hold your horses, it's not that easy. The class below - also available in the accompanying project for this series on GitHub - is a microbenchmark that attempts to determine the performance of Collection#add():

package name.mitterdorfer.benchmark.plain;

import java.util.*;
import java.util.concurrent.ConcurrentSkipListSet;

public class FlawedSetMicroBenchmark {
    private static final int MAX_ELEMENTS = 10_000_000;

    public static void main(String[] args) {
        List<? extends Set<Integer>> testees =
                Arrays.asList(
                        new HashSet<Integer>(),
                        new TreeSet<Integer>(),
                        new ConcurrentSkipListSet<Integer>());
        for (Set<Integer> testee : testees) {
            doBenchmark(testee);
        }
    }

    private static void doBenchmark(Set<Integer> testee) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX_ELEMENTS; i++) {
            testee.add(i);
        }
        long end = System.currentTimeMillis();
        System.out.printf("%s took %d ms%n", testee.getClass(), (end - start));
    }
}

This microbenchmark looks simple but it does not measure what we think it does. Just consider two apparent issues:

  1. The program will start in interpreted mode. After some time, the runtime detects that #doBenchmark() is a so-called hot method; the Just-In-Time (JIT) compiler will kick in and compile it. Due to a technique called on-stack replacement (OSR), the runtime is able to switch from interpreted to compiled mode in the middle of executing a method. Thus, the microbenchmark will measure the performance of a mixture of both modes. Dr. Cliff Click wrote an in-depth article about on-stack replacement in which he notes that OSRed code may not be optimal from a performance perspective.
  2. On the first invocation of #doBenchmark(Set), the microbenchmark continuously inserts elements into a HashSet. Therefore, the internal array will get resized, which produces garbage. This will in turn trigger the garbage collector and distort measurement results.

To summarize, we have identified two common sources of problems in Java microbenchmarks: the JIT-compiler and the garbage collector. This microbenchmark contains a few more surprises that we'll cover in a later article. Now, let's take a step back and think first about the ideal environment for microbenchmarks.

Environmental Considerations

Before running any benchmark, we have to consider these goals:

  1. We want to stay as close as possible to the target system configuration, provided it is known in advance.
  2. We do not want to introduce any unintended bottlenecks due to the execution environment.

Therefore, we should think about various system components and their configuration. Of particular importance for microbenchmarks are:

We can conclude that we need to select hardware and system software carefully and configure them properly. Depending on the requirements, the same microbenchmark should even be run on different systems (hardware, OS, JVM) to get a grasp of its performance characteristics. These considerations affect all benchmarks, regardless of the underlying technology. Microbenchmarks running on virtual machines face additional challenges.

The JVM's Dynamic Nature

At a very high level, the JVM consists of three main components that work together: the runtime including the interpreter, the garbage collector and the JIT compiler. Due to these components, we neither know in advance which machine code will be executed nor how exactly it will behave at runtime. Contrast this behavior with a regular C program:

Comparison of optimizations that happen at compile time and optimizations in JITed code

Oracle's HotSpot JVM applies a vast number of optimizations to Java code: an outdated page on the OpenJDK Wiki lists close to 70 optimization techniques, and I suspect HotSpot applies a lot more today. This makes it impossible to reason about which machine code is finally executed based on the source code or the bytecode. For the curious, HotSpot provides a way to look at the assembly code generated by the JIT compiler. The easy part: add a disassembler library to the Java library path and provide the proper JVM flags: -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel (see also the OpenJDK documentation on PrintAssembly). The hard part: understanding the output. For x86 architectures, the instruction set references (mnemonics A-M, mnemonics N-Z) are a start. To sum up, the dynamic behavior of the JVM makes it even harder to write correct microbenchmarks.

What's next?

In this article, we have seen that a lot of factors influence the runtime behavior of microbenchmarks. In the next article, we'll take a deep dive and look at specific flaws that can happen in handwritten microbenchmarks.

Questions or comments?

Just ping me on Twitter

Many thanks to @mmitterdorfer and @steve0392 for reading draft versions of this article.