Measuring CPU bug complexity to improve verification quality

Determining the ability of verification methodologies to find the last bugs.

I am often asked “When is the CPU verified enough?” or, in other words, “How can I measure the effectiveness of my testbench and be confident in the quality of the verification?” There is no easy answer. There are several common indicators used in the industry, such as coverage and the bug curve. While they are absolutely necessary, they are not sufficient to achieve the best possible quality, because such indicators do not really reveal the ability of a verification methodology to find the last bugs. With experience, I have learned that measuring processor bug complexity is an excellent indicator to use throughout the development of a project.

What defines CPU bug complexity and how do you measure it?

Experience has taught me that we can define the complexity of a bug by counting the number of independent events or conditions that are needed to hit the bug.

What is considered an event?

Let’s take a simple example. A typical bug found in caches is a missing hazard check. Data corruption can occur when:

  1. A cache line at address @A is valid and dirty in the cache.
  2. A load at address @B causes an eviction of line @A.
  3. Another load to address @A starts.
  4. The external write bus is slower than the read bus, so the load of @A completes before the eviction of @A completes.

The external memory returns stale data because the most recent data, carried by the still-pending eviction, has not reached it yet, resulting in data corruption.

In this example, 4 events – or conditions – are needed to encounter the bug. These 4 events give the bug a score of 4, i.e. a complexity of 4.
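
To make the scoring concrete, the four conditions can be written down as a simple checker over a simulation trace. The sketch below is purely illustrative: the event names and the (cycle, event, address) log format are assumptions, not a real monitor interface, but it shows that the complexity score is simply the number of independent conditions that must all line up.

    def eviction_hazard_score(log):
        """Count how many of the four independent conditions appear in the
        trace (a list of (cycle, event, address) tuples from hypothetical
        testbench monitors) and report whether the full scenario occurred."""
        def first(event, addr):
            cycles = [c for c, e, a in log if e == event and a == addr]
            return min(cycles) if cycles else None

        cond1 = first("line_dirty", "A") is not None        # @A valid and dirty in the cache
        cond2 = first("eviction_start", "A") is not None    # a load to @B evicts line @A
        cond3 = first("load_start", "A") is not None        # another load to @A starts
        load_done = first("load_done", "A")
        evict_done = first("eviction_done", "A")
        cond4 = (load_done is not None and evict_done is not None
                 and load_done < evict_done)                # the read overtakes the eviction

        score = sum([cond1, cond2, cond3, cond4])
        return score == 4, score                            # (scenario hit?, conditions seen)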

Categorizing CPU bugs

To measure the complexity of a bug, we can define a classification that is shared by the entire CPU verification team. In a previous blog post, we discussed 4 types of bugs and explained how we use these categories to improve the quality of our testbench and verification. Let’s take it a step further and combine this method with bug complexity.

An easy bug may require between 1 and 3 events to trigger: the first simple tests fail on it. A corner case will need 4 or more events.

Going back to our example above, we have a bug with a score of 4. If one of the four conditions is not present, then the bug is not hit.

A constrained-random testbench will need several features to be able to hit the example bug above. The address sequence must be smart enough to reuse addresses from previous requests, and the delays on the external buses must be atypical enough to produce fast reads and slow writes.
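
As a rough illustration of what such features could look like, here is a minimal sketch of a stimulus generator. The reuse probability, address range, and delay values are made-up numbers chosen for the example, not a recipe.

    import random

    class StimulusGenerator:
        """Toy constrained-random stimulus sketch: reuse previously issued
        addresses often enough to create evictions and reloads of the same
        line, and randomize bus delays so that reads can overtake writes."""

        def __init__(self, reuse_probability=0.4):
            self.reuse_probability = reuse_probability
            self.history = []                            # previously issued addresses

        def next_address(self):
            # Either reuse an old address (to recreate the @A/@B collision) or pick a new one.
            if self.history and random.random() < self.reuse_probability:
                return random.choice(self.history)
            addr = random.randrange(0, 1 << 32, 64)      # 64-byte-aligned line address
            self.history.append(addr)
            return addr

        def bus_delays(self):
            # Atypical profile: fast reads, slow writes, so a reload of @A can
            # complete before the eviction of @A reaches external memory.
            return random.randint(1, 5), random.randint(20, 100)

The key design choice is to bias the generator toward reusing recent addresses, since collisions on the same cache line are what create the eviction and reload of @A in the first place.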

A hidden bug will need even more events. Perhaps a more subtle bug has the same conditions as our example, but only occurs when an ECC error is detected in the cache, at exactly the same time an interrupt arrives, and only when the core finishes an FPU operation that raises a divide-by-zero error. With typical random testbenches, the probability of all these conditions occurring together is extremely low, making it a “hidden” bug.
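
A back-of-the-envelope calculation shows why. Assuming, purely for illustration, that each ingredient occurs independently about once every 1,000 cycles by pure chance:

    per_event_rate = 1e-3                  # assumed probability per cycle, per condition
    n_events = 7                           # 4 from the cache example + ECC + interrupt + FPU
    cycles_to_hit = 1 / per_event_rate ** n_events
    print(f"~{cycles_to_hit:.0e} cycles")  # ~1e+21 cycles: unreachable by pure chance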

Making these hidden bugs more accessible in the testbench improves the quality of verification: it consists in turning hidden cases into corner cases.
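
Continuing the made-up numbers above, boosting the rare ingredients with dedicated testbench knobs (error injection, interrupt rate, biased FPU operands) is what turns the hidden case into a reachable corner case:

    boosted_rate = 1e-1                    # assumed per-cycle rate after boosting each ingredient
    cycles_to_hit = 1 / boosted_rate ** 7
    print(f"~{cycles_to_hit:.0e} cycles")  # ~1e+07 cycles: within reach of a simulation run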

This scale has no upper limit. Experience has shown me that a testbench that can find bugs with a score of 8 or 9 is a solid simulation testbench and is essential for delivering quality RTL. From what I have seen so far, the most advanced simulation testbenches can find bugs with a complexity level of up to 10. Fortunately, formal verification can find bugs of even higher complexity much more easily, paving the way for even better design and giving clues about what to improve in the simulation.

Using Bug Complexity to Improve the Quality of a Verification Testbench

This classification and this methodology are only useful if they are applied from the beginning of verification and throughout the development of the project, for 2 reasons:

  1. Bugs should be fixed as they are discovered. Leaving a level 2 or 3 bug unfixed means that many failures occur when running the more important tests. Statistically, a similar bug (from the same squadron) that requires more events might go unnoticed.
  2. Bug complexity is used to improve and to measure the quality of a testbench. Since the complexity level is the number of events required to trigger the bug, the higher the complexity score, the more stressing the testbench is. Tracking and analyzing the events that triggered a bug is very useful for understanding how to tune the random constraints or where to add a new functional coverage point, as in the sketch after this list.
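
As a sketch of what this tracking could look like in practice (the bug identifiers and event names below are invented for illustration), the complexity score can be derived directly from the recorded trigger events:

    from collections import Counter

    # Hypothetical bug records: each bug is tagged with the independent events
    # that had to occur together for it to be hit.
    bugs = [
        {"id": "CACHE-123", "events": ["dirty_line", "eviction", "reload", "fast_read"]},
        {"id": "FPU-042",   "events": ["dirty_line", "eviction", "reload", "fast_read",
                                       "ecc_error", "interrupt", "fpu_div_by_zero"]},
    ]

    complexity = {bug["id"]: len(bug["events"]) for bug in bugs}
    print(complexity)                    # {'CACHE-123': 4, 'FPU-042': 7}

    # The distribution of scores over time shows how stressing the testbench is.
    print(Counter(complexity.values()))  # Counter({4: 1, 7: 1})

    # The recorded events also point at which conditions deserve a dedicated
    # functional coverage point or a tuned random constraint.
    print(Counter(e for bug in bugs for e in bug["events"]).most_common(3))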

Finally, by combining this approach with our methodology of chasing bugs that fly in squadrons, we ensure a high level of verification quality and can be confident that we meet our verification sign-off criteria.

Philippe Luc


Philippe Luc is Director of Verification at Codasip.
