David Jeffrey Ljung Madison
Verification Engineer Success Stories
Formal Block Verification
Background
My favorite success story with formal verification was when I was
working for a CPU design company in Silicon Valley. This was for
a high performance pipelined, superscalar CPU used in a PC/laptop
as opposed to an embedded CPU, so the design was fairly complex.
I was given two chip blocks to verify (the CPU was divided into
blocks the size and complexity of an ALU unit). Both of them were
verified using the described methodology.
Environment
I created a test environment for both blocks that I like to call
"formal testcase" with "directed random" testing.
It consisted of:
- A golden, cycle-accurate model of each block, written in
behavioral verilog
- A 'piping/decoding' unit which allowed me to write my
tests in verilog as a series of assembly instructions,
and then allowed the results to be depiped for easy comparison.
- Directed Random: A set of random 'templates' that would randomly generate a 'type'
of test that followed a template, i.e., a random addition,
a random number of cycles, then a random branch cancel..
- Formal Testcases: A set of formal 'templates' that would formally test a
set of instructions, i.e., all addition instructions followed
by all dependent subtraction operations.
The formal tool I used at this site was ESP created by Innologic
(since acquired by Synopsys) but in purely formal mode (not using the
random hybrid capabilities). This was not just used for testing a
simple set of design assertions, this did a complete formal check of
the golden model versus the block for a given 'testcase.'
The reason for choosing a split approach was because:
- We were evaluating the formal tools and I didn't want to
put all my eggs in one basket just yet
- It had a faster turnaround (about 1 hour) to do a "full"
run on block updates
I believe that a pure formal solution (no directed cases) could also
have worked - but at that time Innologic did not have coverage
analysis (it does now). Because of this the directed cases gave us not only
a sense of coverage but an extra level of security against any
mistakes made in the formal approach.
Capabilities
Every time a block change was checked in, I could run my entire suite
of directed random and formal tests in about an hour. I was usually
able to root-cause and file bugs faster than they could get fixed.
Technical Discussion
Those interested in the overall effects of the testing can skip
to "Results"
Tool Capabilities
The ESP / Innologic simulator was unusual even for a formal tool. It did
not just formally verify assertions for the block, or verify only portions
of combinatorial logic. ESP is able to 'symbolize' inputs to your design
and then compare outputs in a formal way, even with time delays (which
can also be symbolic) and with non-combinatorial loops. Obviously the
more complex your loops and our symbols became, the longer the simulation
would take, until it would exponentially 'explode' and become unwieldy or
impossible to run. Avoiding these situations involved a great deal of the
work needed.
As a simple example of how the ESP tool works, consider a simple 16b
adder without carry-in/out:
_____
A --| |
|Adder|--> C
B --|_____|
The first step in testing this block formally, we need to symbolize the inputs:
initial $esp_var(a,b);
Golden Model
To continue, we need a golden model to compare with. The golden model must
also be written in behavioral verilog, as required by the formal tools.
Golden models written in C and accessed via a PLI are useless, as they
cannot run symbolically. Fortunately we can use behaviorial verilog
as opposed to being limited to synthesizeable verilog, which gives us
the benefit of fairly closely modelling the spec without having to drop
to such a low-level that we may introduce errors in the model. Furthermore,
since we are using behaviorial instead of synthesizeable verilog, any
errors we introduce are very unlikely to match errors in the block under test.
Since a golden model needs to be written that correctly encapsulates
the expected behavior, a design specification needs to be complete.
In writing the golden model according to the spec we were able to
find bugs and ambiguities in the spec that could be cleared up
before simulation had even begun.
To continue our hypothetical case above, our simple golden model is: (a+b) & 16'hffff
To test this, we simply formally look for a set of symbols where these
conditions are not met:
initial begin
$esp_var(a,b);
#10;
exp_c = (a+b) & 16'hffff;
if (c !== exp_c) $esp_error("Found adder failure: ");
end
The ESP simulator would symbolically simulate both the block under test
and the golden model. If there were any possible symbols that fulfilled
the '!==' conditional which triggered the $esp_error call, then we would
see the error as well as a possible set of inputs to create it.
Symbol Space Management
As mentioned above, one of the issues was that as we added more symbols,
the simulation time would grow exponentially, so it was important to
manage our 'symbol space'. As an oversimplification, we can view 'symbol
space' size as fairly related to the number of bits that are symbolized,
though it greatly depends upon whether they are, for example, control
signals or data signals.
In the simple adder example above, let's imagine that the A and B
inputs are actually 32b inputs to the block, but we are only testing
them as 16b in the adder. In this case we can immediately cut down
on the symbol space by:
a = a & 16'hffff;
b = b & 16'hffff;
Which suddenly makes the top bits of the symbols irrelevant.
In reality, the cases are not this simple, and we need to find other
ways to manage the symbol space. Control signals cause the most problems,
since they generally cause the largest expansion of formal paths. Apart
from masking off bits that we don't care about, we can also try to find
inputs that are not generally interrelated and test them separately.
To accomplish this with the blocks, I had a number of formal 'test cases,'
each one would symbolize a different set of inputs and then randomize
the other inputs. Then running each of the formal tests a small number
of times would be enough to find bugs in the design. Obviously this
is riskier than doing a formal simulation with all inputs completely
symbolized, but since such a full case was impossible, we resorted to
subdividing the problem and then using the random-directed tests as
a backup.
Depiping
The blocks under test were pipelined, meaning that multiple instructions
could run through the block and would return outputs after a number of
cycles. The golden model written was not pipelined, so the testbench
needed to be 'depiped' in order to compare the final values. This was
done with a custom testbench generator. The depiping was more important
for the directed-random tests which would run multiple input values
through the blocks. We also, however, needed to worry about interactions
between separate stages of the pipeline, i.e., a single instruction
may execute properly but may have improper effects on a previous or
following instruction. Examining the pipeline dependencies made it
clear that we only needed to worry about no more than two instructions
at a time, so some of the formal testcases would symbolize inputs,
take a clock, symbolize a second set of inputs, and then clock through
the pipeline to check the results of both instructions.
Results
In the end, this process turned out to be even more successful than
I hoped. Usually about half of the verification of a CPU happens
after the first CPU designs are fabricated, in 'post-silicon' - where
your turnaround time for a design is very high, but your test run time
is orders of magnitude faster.
The above methodology of directed-random combined with formal testcases
found a mass of bugs in the pre-silicon phase. In fact, exactly
zero bugs have been found in post-silicon, the blocks are clean.
In all my years of working with CPU verification, I had never heard of a
case where all the bugs in a block were found in pre-silicon. If you
know of a situation like this, please contact
me and let me know what your methodology was!
Post-Mortem
The good news is that the chip went into production and could be found
in a number of high-profile laptop designs. To date, exactly zero
bugs have been found in the blocks that I verified with this methodology.
Interested in having a success story like this at your company?
Consider hiring me to see
how it can be done.