Background

My favorite success story with formal verification comes from my time at a CPU design company in Silicon Valley. This was a high-performance pipelined, superscalar CPU for PCs and laptops (as opposed to an embedded CPU), so the design was fairly complex.
I was given two chip blocks to verify (the CPU was divided into blocks the size and complexity of an ALU). Both were verified using the methodology described below.
I created a test environment for both blocks that I like to call
"formal testcase" with "directed random" testing.
It consisted of:
- A golden, cycle-accurate model of each block, which I wrote in behavioral verilog
- A 'piping/decoding' unit which allowed me to write my tests directly in verilog yet as a series of assembly instructions, and then allowed the results to be depiped for easy comparison.
- Directed Random: A set of random 'templates' that would randomly generate a 'type' of test that followed a template, i.e., a random addition, a random number of cycles, then a random branch cancel, and so forth
- Formal Testcases: A set of formal 'templates' that would formally test a set of instructions, i.e., all addition instructions followed by all dependent subtraction operations.
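As a rough sketch of the directed-random idea, here is what one such template might look like in Python. The instruction encoding, register count, and template shape are all invented for illustration; the actual testbench was written in verilog:

```python
import random

def template_add_then_dependent_sub(rng):
    """Hypothetical directed-random template: a random addition, a random
    number of idle cycles, then a subtraction that depends on the result."""
    dest = rng.randrange(8)                      # ADD's destination register
    ops = [("ADD", dest, rng.randrange(8), rng.randrange(8))]
    ops += [("NOP",)] * rng.randrange(4)         # random-length idle gap
    # Dependent subtraction: its first source is the register ADD just wrote
    ops.append(("SUB", rng.randrange(8), dest, rng.randrange(8)))
    return ops

# Each call generates a different test, but every test follows the template
print(template_add_then_dependent_sub(random.Random(1)))
```

The point of the template is that the dependency structure is fixed while everything else varies, so repeated runs probe the same hazard from many random angles.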
The formal tool I used at this site was ESP, created by Innologic (since acquired by Synopsys), run in purely formal mode (not using its random hybrid capabilities). It was not used merely to test a simple set of design assertions; it did a complete formal check of the golden model against the block for a given 'testcase.'
The reasons for choosing a split approach were:
- We were evaluating the formal tools and I didn't want to put all my eggs in one basket just yet
- It had a faster turnaround (about 1 hour) to do a "full" run on block updates
Capabilities

Every time a block change was checked in, I could run my entire suite of directed-random and formal tests in about an hour. I was usually able to root-cause and file bugs faster than they could get fixed.
Technical Overview

Those interested only in the overall effects of the testing can skip ahead to "Results."
Tool Capabilities

The ESP / Innologic simulator was unusual even for a formal tool. It did not just formally verify assertions for the block, or verify only portions of combinatorial logic. ESP was able to 'symbolize' inputs to your design and then compare outputs in a formal way, even with time delays (which could also be symbolic) and with non-combinatorial loops. Obviously, the more complex the loops and symbols became, the longer the simulation would take, until it would exponentially 'explode' and become unwieldy or impossible to run. Avoiding these situations accounted for a great deal of the work. As a simple example of how the ESP tool works, consider a simple 16b adder without carry-in/out:
         _____
    A --|     |
        |Adder|--> C
    B --|_____|

The first step in testing this block formally is to symbolize the inputs:

    $esp_var(a, b);
Golden Model

To continue, we need a golden model to compare against. The golden model must also be written in behavioral verilog, as required by the formal tools. Golden models written in C and accessed via a PLI are useless here, as they cannot run symbolically. Fortunately we can use behavioral verilog rather than being limited to synthesizable verilog, which lets us model the spec fairly closely without having to drop to such a low level that we might introduce errors in the model. Furthermore, since we are using behavioral instead of synthesizable verilog, any errors we do introduce are very unlikely to match errors in the block under test.

Since the golden model must correctly encapsulate the expected behavior, the design specification needs to be complete. In writing the golden model according to the spec, we were able to find bugs and ambiguities in the spec that could be cleared up before simulation had even begun.

To continue our hypothetical case above, our simple golden model is:

    (a + b) & 16'hffff

To test this, we simply formally look for a set of symbols where these conditions are not met:
    initial begin
        $esp_var(a, b);
        #10;
        exp_c = (a + b) & 16'hffff;
        if (c !== exp_c)
            $esp_error("Found adder failure: ");
    end

The ESP simulator would symbolically simulate both the block under test and the golden model. If any possible symbol values fulfilled the '!==' conditional and triggered the $esp_error call, we would see the error along with a possible set of inputs to reproduce it.
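For intuition, the formal check can be imitated in Python by brute force over a smaller input space. This sketch stands in for what ESP did symbolically over the full 16b space; the ripple-carry 'block under test' is invented for illustration, not the real design:

```python
def golden_add(a, b):
    return (a + b) & 0xF              # 4-bit golden model

def block_add(a, b):
    # Hypothetical implementation under test: a ripple-carry adder
    # built from bit operations, as a stand-in for the real RTL.
    result, carry = 0, 0
    for i in range(4):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result

def equiv_check():
    """Exhaustive stand-in for symbolizing a and b: try every input pair."""
    for a in range(16):
        for b in range(16):
            if block_add(a, b) != golden_add(a, b):
                return (a, b)         # counterexample inputs
    return None

print(equiv_check())  # None means no mismatch exists
```

Returning the counterexample pair mirrors how ESP reported a possible set of inputs along with the $esp_error; the difference is that symbolic simulation covers the whole space without enumerating it.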
Symbol Space Management

As mentioned above, one of the issues was that as we added more symbols, the simulation time would grow exponentially, so it was important to manage our 'symbol space.' As an oversimplification, we can view the size of the 'symbol space' as roughly tracking the number of bits that are symbolized, though it depends greatly on whether those bits are, for example, control signals or data signals. In the simple adder example above, let's imagine that the A and B inputs are actually 32b inputs to the block, but we are only testing them as 16b in the adder. In this case we can immediately cut down the symbol space with:
    a = a & 16'hffff;
    b = b & 16'hffff;

which suddenly makes the top bits of the symbols irrelevant.

In reality the cases are not this simple, and we need to find other ways to manage the symbol space. Control signals cause the most problems, since they generally cause the largest expansion of formal paths. Apart from masking off bits we don't care about, we can also try to find inputs that are not generally interrelated and test them separately. To accomplish this with the blocks, I wrote a number of formal 'testcases,' each of which would symbolize a different set of inputs and randomize the others. Running each of the formal tests a small number of times was then enough to find bugs in the design. Obviously this is riskier than doing a formal simulation with all inputs completely symbolized, but since such a full run was impossible, we resorted to subdividing the problem and using the directed-random tests as a backup.
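The subdividing strategy can be sketched as follows. Here "symbolize" is imitated by exhaustive enumeration of one input while the other is randomized per trial; the function names and the 8-bit split are assumptions for illustration, not the actual testcases:

```python
import random

def golden(a, b):
    return (a + b) & 0xFFFF

def check_split(block, sym_a_bits=8, trials=4, seed=0):
    """Hypothetical split testcase: exhaust ('symbolize') only sym_a_bits
    of input a and randomize b each trial. The space searched per trial is
    2**sym_a_bits instead of 2**32 for fully symbolic 16b a and b."""
    rng = random.Random(seed)
    for _ in range(trials):
        b = rng.randrange(1 << 16)           # randomized input
        for a in range(1 << sym_a_bits):     # enumerated ('symbolized') input
            if block(a, b) != golden(a, b):
                return (a, b)                # counterexample
    return None

print(check_split(lambda a, b: (a + b) & 0xFFFF))
```

As the text notes, this trades completeness for tractability: a bug that only appears for particular values of the randomized input can slip through a single run, which is why the directed-random suite served as the backup.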
Depiping

The blocks under test were pipelined, meaning that multiple instructions could be in flight through a block at once, with outputs returning a number of cycles after their inputs. The golden model was not pipelined, so the block's outputs needed to be 'depiped' by the testbench before the final values could be compared. This was done with a custom testbench generator. Depiping mattered most for the directed-random tests, which would run many input values through the blocks. We also, however, needed to worry about interactions between separate stages of the pipeline; i.e., a single instruction might execute properly yet have improper effects on a previous or following instruction. Examining the pipeline dependencies made it clear that we needed to worry about at most two instructions at a time, so some of the formal testcases would symbolize one set of inputs, take a clock, symbolize a second set of inputs, and then clock through the pipeline to check the results of both instructions.
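A toy model of the depiping idea, assuming a hypothetical 2-stage block that simply increments its input (the real testbench generator and blocks were far more involved):

```python
class PipelinedIncr:
    """Hypothetical 2-stage pipelined block: each output appears two
    clocks after its input is presented."""
    def __init__(self):
        self.stage1 = None
        self.stage2 = None

    def clock(self, inp):
        out = self.stage2
        self.stage2 = self.stage1                   # stage 1 -> stage 2
        self.stage1 = None if inp is None else (inp + 1) & 0xFFFF
        return out

def golden_incr(x):                                 # non-pipelined golden model
    return (x + 1) & 0xFFFF

def depiped_run(block, inputs):
    """Feed inputs back-to-back, then clock to drain the pipeline, so the
    collected outputs line up with the inputs for comparison."""
    outs = []
    for x in inputs + [None, None]:                 # two extra clocks to flush
        out = block.clock(x)
        if out is not None:
            outs.append(out)
    return outs

inputs = [5, 9]                                     # two in-flight instructions
results = depiped_run(PipelinedIncr(), inputs)
assert results == [golden_incr(x) for x in inputs]
```

The two back-to-back inputs correspond to the two-instruction window described above: both are in the pipe at once, and depiping realigns their results so each can be checked against the unpipelined golden model.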
Results

In the end, this process turned out to be even more successful than I had hoped. Usually about half of the verification of a CPU happens after the first designs are fabricated, in 'post-silicon,' where the turnaround time for a design fix is very long but tests run orders of magnitude faster.
The above methodology of directed-random combined with formal testcases found a mass of bugs in the pre-silicon phase. In fact, exactly zero bugs have been found in post-silicon. The blocks are clean.
In all my years of working with CPU verification, I had never heard of a
case where all the bugs in a block were found in pre-silicon. If you
know of a situation like this, please contact
me and let me know what your methodology was!
The good news is that the chip went into production and could be found
in a number of high-profile laptop designs. To date, exactly zero
bugs have been found in the blocks that I verified with this methodology.
Interested in having a success story like this at your company? Consider hiring me to see how it can be done.