Tech Design Forum Techniques The budget case for formal verification Mon, 23 Jul 2018 10:10:47 +0000 Axiomise launched its formal verification training program at the same time as the recent Verification Futures 2018 conference in the UK, where I set out a new vision for formal. Afterwards, I was contacted by a number of seasoned simulation experts interested in formal verification. Quite a few asked me not only to explain how formal differs from simulation-based verification but also why one should make the investment in learning and deploying it.

I quickly realized that a number of the things I consider plainly obvious advantages of formal are not that obvious to the majority of verification engineers who use flows based around simulation, directed test, and constrained random. I also discovered that the willingness of many of these engineers to invest is framed by significant recent investments in emulation.

Interestingly enough, Tech Design Forum has an article specifically making ‘The budget case for emulation’. So, let me follow a similar tack. In this article, I will argue why formal verification should play an important part in your overall verification plan and, importantly, why your investment in formal verification tools, training and methodology development will not increase your costs, but shrink them.

We need to understand: what are the different verification-related costs in a typical SoC design verification program?

In my view, cost is not just the expenditure incurred in adopting a new verification technology in its early days by buying tools and training. It should also be measured by how long the technology takes to use in practice, what kind of bugs it helps to catch, how it affects debug, and finally how it contributes to signing off verification with confidence.

Now, if you factor in the cost of missed bugs as well, the discussion becomes very interesting. With all that in mind, these are my ten main cost points, along with the reasons why you should adopt formal verification to control them successfully.

1. Tools and support

How much do the tools cost?

How easy is it to use them?

What is the support cost if one ever needs it?

What is the cost of maintaining the tools, their different patches and so on?

What is required to ensure consistency in their use across the organization?

While it is true that formal tools cost more than their simulation counterparts, they also cost a lot less than emulators. The initial investment in formal tools quickly gets repaid when trained engineers deploying good methodologies start finding bugs earlier in the design cycle, and are able to prove the absence of bugs exhaustively and conclusively. This is not possible with either simulation or emulation.

The cost of the multiple simulator licenses needed to achieve better test throughput can be offset against a single formal tool license capable of finding the same issues, and more, thanks to the way formal technology works. Whereas constrained-random simulation requires a significant investment in running the same test harness with different seeds in parallel, formal can explore the state space exhaustively without requiring any stimulus or seed. It can perform breadth-first search systematically, often finding corner-case bugs much more quickly.
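To make the contrast concrete, here is a minimal sketch of the exhaustive breadth-first exploration a formal engine performs, needing no stimulus or seeds. The design, its planted bug, and all names are invented for this illustration, not taken from any real tool:

```python
from collections import deque

# Toy design: a 2-bit saturating counter with a planted corner-case bug.
# State is (count, error_flag); the bug fires only when the counter is
# incremented while already saturated -- the kind of corner case random
# seeds may take many runs to hit.
def step(state, inp):
    count, err = state
    if inp == 1:
        if count == 3:
            return (count, True)       # planted bug: flag set at saturation
        return (count + 1, err)
    return (max(count - 1, 0), err)

def bfs_find_bug(init):
    """Exhaustive breadth-first search: no stimulus, no seeds."""
    seen, frontier = {init}, deque([(init, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if state[1]:                    # property "error_flag never set" fails
            return depth
        for inp in (0, 1):
            nxt = step(state, inp)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None                         # exhaustively proven: no bug reachable

print(bfs_find_bug((0, False)))  # -> 4: the shortest possible failing trace
```

Because breadth-first search visits states in order of depth, the trace it returns is the shortest one possible, which also previews the debug advantage discussed later.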

2. Training

How much does it cost for each employee?

How long does it take?

When will it start to pay off?

Most good training programs ensure that the student is ready to work on an appropriate project, and scope out what can be done with the trained resource over staggered periods of six weeks, one quarter, and six months.

If you are not seeing any return on your investment after six months, the training probably has not been deployed properly and you must look at the reasons why. If you have derived enough quality results from formal in six months, you should have already recovered your initial investment, so from this point onwards the ROI will simply continue to grow.

But let’s be honest. A good training methodology has been lacking. This is one reason why formal’s adoption has been a mixed bag. However, if you can get good training (such as what we offer at Axiomise) that is totally methodology oriented, modular, customizable and tool-agnostic, you will get results. Also, as a benchmark, a high-quality training program for formal should typically cost less than you would pay for a UVM training course.

3. Methodology

Has it been developed thoughtfully?

How is it promoted and adopted across the organization?

Can every vendor’s tools be applied within it?

An investment needs to be made in effort, time and money to ensure that any methodology is adopted properly and consistently within an organization. The methodology should consider the overall scope of verification and should look to solve the real problems in verification, not just focus on exploiting automated features within your tools.

Indeed, your methodology should be completely vendor-agnostic and your training providers should provide instruction that works seamlessly across all tools.

4. Tool deployment

What does each license cost?

What is the tool’s runtime?

What is the cost of the required compute cluster?

From my experience, a well-trained engineering team using its tools properly not only finds more bugs but also makes effective use of compute-cluster runtimes. This in turn means it requires fewer licenses.

This is certainly valid for formal verification, where good methodologies require only a small set of tool licenses and provide a quicker turnaround on proof results, yielding optimal use of the compute cluster.

I recently described in one of my blogs how I regularly undertake formal verification tasks and obtain a high rate of proof convergence on a tablet PC with a mere 8GB of RAM and a single CPU core, using just one license. The example in that article was a serial buffer verification in which exhaustive verification of buffers with 10^40,000 states can result in finding bugs in less than 10 minutes!

5. Debug

What is the most efficient strategy?

Formal verification is renowned for producing debug traces that are much shorter than those that result from a simulation or emulation test. If one were to measure the cost of debugging an SoC design with formal, simulation and emulation in terms of effort, time and the cost of the engineering resource (the job grade of an engineer, for example), you would likely get interesting results showing that formal outperforms both simulation and emulation in debug.

I recall that on one project I was asked to investigate a bug in an SoC controller where one of the performance counters was not behaving properly. The error was flagged by a highly competent verification engineer who had spent about three weeks debugging the emulation trace to find the dodgy counter. Using some powerful techniques, I reproduced the bug in this 64-bit counter in formal with one minute of runtime. Even better, the runtime remained fixed as the counter size increased: even for a 512-bit counter, the time it took to find any kind of counter bug stayed at a minute. A 64-bit counter has nearly 18.4 quintillion states, and if it takes a minute to find a bug, I’ll take that.

6. Bug fix and retest

How easy is it to fix a bug?

How easy is it to test that a fix works and that it has not broken anything else in the design?

Fixing one bug and thereby causing another: it’s a long-standing problem. Formal verification is ideal for this kind of work. Its exhaustive nature allows the verification team to establish conclusively that a bug fix is safe.

7. Signoff

What time, effort and engineering resources are required to be sure that signoff is complete?

Understanding what it costs to sign off verification is a topic in itself, but we can get a high-level idea by looking at the factors above and how they contribute to a coverage-based signoff infrastructure.

Formal verification signoff has a cost too – unlike stimulus generation, which comes free with formal. However, if the right coverage techniques are deployed, formal verification signoff becomes a lot easier. Certain tools address signoff in a way that saves engineers an enormous amount of time. As I write this article, I am keenly aware that new solutions emerging in this space will surely make signoff with formal easier still.

The important thing to note is that when one is able to prove a property exhaustively with formal; establish at the same time that free formal stimulus was not blocked by user-defined constraints; and know that subsequent mutations to the design can be caught with one or more checkers – this is the best possible outcome one can imagine for any verification technique.

8. Late and missed bugs

What would it cost if a bug were found two weeks before tapeout?

What would it cost to patch a broken chip if a bug were to escape into the final product?

Or putting these risks in another context, ‘What if we could have found the same bug in the first hour of design bring-up?’

We still do not really know how much it has cost to fix Meltdown and Spectre. Using formal verification early in the design cycle, and maintaining systematic deployment throughout the IP development cycle, can minimize the occurrence of late bugs and even eliminate missed bugs altogether.

9. Reuse

What options are there to reuse the verification infrastructure?

You want to minimize the costs related to your engineering resources and shrink your time-to-market. Architecting end-to-end formal verification testbenches does require an investment of time and, for big designs, a dedicated deployment of engineering resources. But the good news is that if this effort is properly planned for, based on good training and a good methodology, it is very likely that the testbench can be reused.

Where formal verification wins over simulation is that formal does not require any investment in stimulus generation: you get that for free. And it can be deployed across the I/O boundary of the IP as well as on internal finite state machines, detecting and proving exhaustively that designs are free from deadlocks and livelocks.
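As a toy illustration of the deadlock check: exhaustively enumerating every reachable state and flagging any with no successor is the essence of what a formal tool proves. The FSM below and its planted dead state are invented for this sketch:

```python
# Toy FSM given as successor sets; a deadlock is a reachable state with
# no outgoing transition. All state names here are invented.
transitions = {
    "IDLE":  {"REQ"},
    "REQ":   {"GRANT", "IDLE"},
    "GRANT": {"DONE"},
    "DONE":  set(),              # planted deadlock: no way out of DONE
}

def find_deadlocks(init):
    """Enumerate every reachable state and return those with no successors."""
    seen, stack = set(), [init]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(transitions[s])
    return sorted(s for s in seen if not transitions[s])

print(find_deadlocks("IDLE"))  # -> ['DONE']
```

An empty result here would constitute a proof of deadlock freedom for the modeled FSM, which is exactly the kind of exhaustive claim simulation cannot make.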

10. Engineering resources

How do you maximize productivity across your team?

If a junior engineer finds most of the bugs in a project, surely that is more cost-effective than having a principal engineer do so.

No offense to the principal engineers out there - I was one myself. But I remember from my experiences at several major companies that the best resources for formal verification are often the youngest hires – provided they have received adequate training.

I can think of several star performers who went from initial training to production-grade work in weeks and have continued to use formal verification ever since. The reason formal verification is easier to learn than constrained random is that all it takes to become productive is some background in digital design, familiarity with some high-level design languages, and a problem-solving mindset focused on requirements as the basis for verification. You do not need to be good at understanding polymorphism, object-oriented programming or software engineering.

So, time to invest in formal?

I urge you to consider your verification choices carefully. History is an important asset, so analyze your previous project data. It should reveal what worked for you without formal and what did not. If you used formal and it did not work, do you know why? Was part of the problem down to a lack of good training and/or methodology?

Do you need formal verification? You know my answer, what’s yours?

Perhaps, you may agree with some of my reasons to use formal, perhaps you don’t. In any case, get in touch and share your thoughts.

You can follow Doc Formal on Twitter at @ashishdarbari

Formal fundamentals: what’s hiding behind your constraints Tue, 17 Jul 2018 18:47:07 +0000 Countless articles, blogs and books discuss formal verification fundamentals such as constraints. When starting out with formal verification, these topics are explained, discussed and mastered fairly quickly. However, as one delves into the details of verification and starts dealing with failures, debug and convergence challenges, it is easy to forget some of the fundamentals. That’s what happened in a recent encounter with a customer.

We were talking with the customer about using formal verification more broadly in their organization. It was the kind of intellectually stimulating meeting where we were going deep, whiteboarding, and challenging each other, with the common goal of improving the verification process and methodologies. In the middle of the discussion, one of the users said:

“The problem was too hard for formal… I was not getting enough proofs, so I added some constraints, got all proofs and moved on…”

I had to stop the customer right there and ask what the intention behind adding the constraints was. The customer was looking to get better convergence (i.e. more proofs) by adding assumptions to the setup to turn many of the inconclusive or bounded properties into full proofs.

Seeing a long list of properties with a green check mark next to them, showing they were proven, made the customer feel good and they moved on, assuming that formal verification had just confirmed that there were no bugs. What the customer actually did was very dangerous, and we later had a good conversation about why.

Let’s take a step back. The reason we need constraints or assumptions in the first place is to limit the behavior/functionality that the formal tool analyzes to the design’s “legal” states/subset. We don’t want to be alerted to failing waveforms or counter-examples if they are caused by input combinations that cannot happen in reality. So we add constraints to tell the tool to avoid analyzing certain behaviors.

One consequence of adding constraints can be to limit the exploration space for formal tools and thus improve the tool’s performance and/or the design’s convergence. This benefit comes at the cost of ignoring/avoiding some potentially legal (and even possibly buggy) state spaces. We call this approach over-constraining the design, and it is one of the most dangerous aspects of formal verification because it can deliver what look like full proofs, despite the fact that the design has not been fully analyzed.

Let’s say we have a design space that is the area of a square: Length x Width. If Length and Width are large (let’s say 100) and we want to look inside every single one of its 10,000 sub-squares to ensure it is free of bugs, that’s a large space to cover. However, if a constraint is added to say we should only analyze sub-squares placed where Length < 10 and Width < 10, then the state space to explore is much smaller (just 100 sub-squares) and can be analyzed more quickly. Importantly, though, the analysis has not addressed potential bugs in sub-squares where Length or Width >= 10.
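The sub-square example can be sketched directly. This is a toy enumeration, not a formal tool, and the bug location is invented for the illustration:

```python
# Full space: 100 x 100 sub-squares. One buggy sub-square is planted
# outside the constrained region (its location is invented for the sketch).
BUG = (42, 7)   # (length, width) of the buggy sub-square; length >= 10

def explore(constraint):
    """Check every sub-square the constraint allows; report coverage."""
    checked, found = 0, False
    for length in range(100):
        for width in range(100):
            if not constraint(length, width):
                continue
            checked += 1
            if (length, width) == BUG:
                found = True
    return checked, found

print(explore(lambda l, w: True))               # (10000, True): bug caught
print(explore(lambda l, w: l < 10 and w < 10))  # (100, False): bug missed
```

The constrained run finishes a hundred times faster and reports no failure, which is precisely what makes accidental over-constraint so dangerous: the green result says nothing about the skipped region.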

This kind of over-constraint is a common part of formal verification. Even when the intent is correct, it is still possible to over-constrain the design analysis accidentally. Luckily, VC Formal has very strong over-constraint analysis features that can be used to catch such situations. (There’s a good webinar on this here.)

Some advanced users do over-constrain a design on purpose, to get more proofs or deeper bounds, but they do it in a systematic way, commonly referred to as case-splitting, in which a problem is broken down into multiple sub-problems which are each exhaustively verified. Advanced users also do an over-constraint analysis (for which there are many techniques) to ensure that they are not creating the conditions for bad surprises.
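Case-splitting on the same toy square space can be sketched as follows (the cases and bug location are invented for illustration). The key property is that the cases partition the full space, so nothing is silently skipped:

```python
# Case-splitting over the toy 100 x 100 space: instead of one lossy
# over-constraint, the space is partitioned into cases whose union covers
# everything, and each case is verified exhaustively on its own.
BUG = (42, 7)   # invented bug location, outside the "small" region

cases = [
    lambda l, w: l < 10 and w < 10,   # the original constraint...
    lambda l, w: l < 10 and w >= 10,  # ...plus the regions it left out
    lambda l, w: l >= 10,
]

found_in = []
for i, case in enumerate(cases):
    for l in range(100):
        for w in range(100):
            if case(l, w) and (l, w) == BUG:
                found_in.append(i)

print(found_in)  # -> [2]: the bug a single over-constraint would miss is caught
```

The over-constraint analysis the text mentions amounts to checking that the union of the cases really does cover the whole legal space; if a region falls through the gaps, the "full proofs" are again misleading.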

While we would all love to get better convergence for our designs, we have to stay vigilant and ensure that we don’t get there by over-constraining our analysis.

Further information

For a related blog, check out Goldilocks and the three constraints.


Sean Safarpour is the Director of Application Engineering at Synopsys, where his team of specialists support the development and deployment of products such as VC Formal, Hector and Assertion IPs. He works closely with customers and R&D to solve their current verification challenges as well as to define and realize the next generation of formal applications. Prior to Synopsys, Safarpour was Director of R&D at Atrenta focused on new technology, and VP of Engineering and CTO at Vennsa Technologies, a start-up focused on automated root-cause analysis using formal techniques. Sean received his PhD from the University of Toronto where he completed his thesis entitled “Formal Methods in Automated Design Debugging”.

Company info

Synopsys Corporate Headquarters
690 East Middlefield Road
Mountain View, CA 94043
(650) 584-5000
(800) 541-7737

Sign up for more

If this was useful to you, why not make sure you're getting our regular digests of Tech Design Forum's technical content? Register and receive our newsletter free.

Picking the right-sized crypto processor for your SoC Wed, 11 Jul 2018 10:20:16 +0000 Applications such as encrypted storage, authenticated communication, secure boot, and software update all need confidentiality, integrity, authenticity, and/or non-repudiation, which are made possible by cryptography algorithms.

Security protocols prescribe which cryptography algorithms should be used, how they should be applied, and in what order. Within these constraints, SoC designers have a lot of implementation choices, which may depend upon the kind of device they are trying to secure, the importance of doing so, and its resource constraints. The robustness of the security implementation will also vary, depending on the kind of attacks expected. Implementing security is therefore a trade-off between factors such as the level of risk, level of security, cost of security implementation, energy consumption, and performance.

Cryptography implementation options

Cryptography algorithms and protocols can be implemented in hardware, software, or a mixture of both. Figure 1 shows three commercial designs built three different ways.


Figure 1 Three different ways to implement cryptographic functions (Source: Synopsys)

The left of Figure 1 shows a software-only model built with the Synopsys DesignWare Cryptography Software Library. This library features configurable symmetric and asymmetric cryptography algorithms for encryption and certificate-processing functions. The library is certified under the National Institute of Standards and Technology’s Cryptographic Algorithm Validation Program, making it useful for systems that need to comply with FIPS 140-2.

The middle of Figure 1 shows an example of partial acceleration by means of specialized instructions and auxiliary registers. The DesignWare ARC SEM Security Processor is extended with the DesignWare ARC CryptoPack option, which adds custom instructions and registers to the ARC processors to speed up software algorithms including AES, 3DES, ECC, SHA-256, and RSA.

The right side of Figure 1 shows hardware solutions. The DesignWare Cryptography IP solutions include configurable symmetric and hash cryptographic engines, public key accelerators (PKAs) and true random number generators (TRNGs).

Software-only algorithm implementation

A software-only implementation involves trade-offs between performance, code size, and memory. Using lookup tables instead of computing values on-the-fly, for example, can speed up the implementation of certain algorithms – at the cost of extra memory. Loop unrolling can reduce branching overhead and provide more scheduling options to the compiler, at the cost of larger code.
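The lookup-table trade-off can be illustrated with a toy example. Population count stands in for a real cryptographic primitive here; it is not part of any Synopsys library:

```python
# Population count stands in for a real cryptographic primitive here.
# The table version trades 256 entries of memory for a single lookup per
# byte; the on-the-fly version recomputes the answer bit by bit each call.
TABLE = [bin(b).count("1") for b in range(256)]   # precomputed once

def popcount_table(data: bytes) -> int:
    return sum(TABLE[b] for b in data)            # fast, larger footprint

def popcount_otf(data: bytes) -> int:
    total = 0
    for b in data:
        while b:                                  # slower, no table needed
            total += b & 1
            b >>= 1
    return total

print(popcount_table(b"crypto"), popcount_otf(b"crypto"))  # same answer twice
```

Both versions compute the same function; only the memory/time balance differs, which is exactly the kind of choice a real implementation of AES S-boxes or SHA message schedules faces.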

Designers need to consider the vulnerability of their software implementations to attacks. For example, implementing RSA requires the modular exponentiation of very large integer numbers. The square-and-multiply algorithm does this efficiently, but the time it takes to execute depends linearly on the number of ‘1’ bits in the key. By repeating timing measurements for different data values and applying statistical analysis to the results, an attacker could obtain the secret key.
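A minimal sketch of the leak follows, with toy-sized numbers (a real RSA exponent would be far larger). Counting the multiplies performed by left-to-right square-and-multiply shows that the work tracks the exponent's Hamming weight:

```python
# Left-to-right square-and-multiply: one squaring per exponent bit, plus
# one extra multiply per '1' bit. The multiply count (and so the runtime)
# leaks the exponent's Hamming weight. Exponents here are toy-sized.
def square_and_multiply(base, exp, mod):
    result, multiplies = 1, 0
    for bit in bin(exp)[2:]:
        result = (result * result) % mod      # square for every bit
        if bit == "1":
            result = (result * base) % mod    # extra work only on '1' bits
            multiplies += 1
    return result, multiplies

print(square_and_multiply(7, 0b10000001, 1009))  # two '1' bits -> 2 multiplies
print(square_and_multiply(7, 0b11111111, 1009))  # eight '1' bits -> 8 multiplies
```

Two exponents of identical bit length thus take measurably different time, which is what a constant-time ("Montgomery ladder" style) implementation is designed to prevent.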

As an example of the impact of different software implementation choices, Figure 2 shows the performance versus code size of several SHA hashing implementations.

The labels of the data points indicate the software implementation choice. A number indicates the loop unroll factor, and ‘pre’ or ‘otf’ indicates whether the message scheduler is pre-computed at the beginning of the routine (pre), or the computation is interleaved with the round transformations (on-the-fly). The shape and color of the points indicate the compiler configuration for generating code optimized either for performance or size. Performance is shown as the number of cycles per message byte, so a lower number indicates higher performance. The figure shows that loop unrolling has the largest effect on performance, but compiler optimization for speed and on-the-fly computation improve performance as well, at the cost of increased size.


Figure 2 Software implementations compiled for size or speed (Source: Synopsys)

Partially accelerated implementation

In this approach, parts of the cryptographic algorithm are implemented in dedicated hardware, which improves overall performance and reduces energy consumption at the cost of increased hardware area and reduced flexibility. Some of the area absorbed by the cryptographic acceleration functions can be regained by reducing the instruction memory. Data memory could also be reduced by computing values on-the-fly in hardware instead of using lookup tables.

Designers can offload software computations to dedicated hardware in various ways. One is to implement a bus-based peripheral device to do the computations, but this only makes sense if enough cryptographic computations are offloaded at once to make up for bus latencies to and from the peripheral.

Other approaches include using a coprocessor interface or extending the processor with dedicated registers and specialized instructions.

The CryptoPack implementation of AES shows one approach to finding the right granularity and mix of hardware and software elements to meet a performance, area, and energy budget. The AES algorithm repeats the same computations (called a ‘round’) several times. CryptoPack introduces an instruction for the ARC 32bit processor that computes a quarter of a round, instead of a full round. This makes the best use of the ARC core’s 32bit datapath to get the required round keys from memory to the accelerator, pipelined with the computations. Implementing a full round in hardware as an instruction extension would need more hardware but would not improve performance much because extra instructions would still be required to load and transfer the full 128bit round key.

Full hardware algorithm implementation

For cases in which a software or partially accelerated software implementation does not provide enough performance or energy efficiency, a hardware engine can implement the full algorithm. Software is then limited to a device driver or hardware abstraction layer for interacting with the hardware engine.

For bulk cryptographic operations (such as hashing, encryption, and decryption), hardware engines often have DMA features, so the engine does not have to rely on a processor and device driver software for data transfers.

Another implementation issue involves dealing with the keys needed for the algorithms. In a software implementation, the keys are accessed by the software and this can be a security risk. In a hardware implementation, the keys may be accessed through a dedicated interface to non-volatile memory. A cryptography engine could also be configured with extra key storage memory, so it can switch between multiple parallel clients quickly.

SoC designers also need to decide how to implement cryptography algorithms, for example exploring whether to implement constant timing strategies for extra security, or to choose look-up tables versus on-the-fly computations. Each choice has a different implication for the robustness of the security implementation.

Security robustness: countermeasures

Different implementations of a cryptography algorithm have different levels of security robustness. For example, what happens when an implementation is fed incorrect parameters, such as weak keys? Does the implementation check such corner cases? Will it withstand side-channel attacks made by measuring the timing, power, or electromagnetic emissions of the implementation when running an algorithm?

Countermeasures to these attacks come at a performance, power, and area cost. Knowing whether or not to incur this cost demands an analysis of the kinds of threats the device may face, and the importance of securing it against such attacks. Responses can range from no hardening (the device is only operated in a secure environment by trusted operators), to some hardening (robustness against network-based timing attacks), to extensive hardening (protection against differential power analysis and/or hardware fault-injection attacks).

Timing and power analysis attacks can be countered by preventing information leakage by using implementations whose timing is not dependent on the data being processed, and/or by masking the information by deliberately adding noise.

Measurable parameters

If SoC designers want to assess whether they have chosen the right-sized cryptographic processing solution, they need to choose some measurable parameters with which to compare options. These can include:

  • Cryptographic performance, usually defined as throughput and measured as bits or bytes per cycle or second, or operations per second.
  • Latency, the time from starting to process a message to completing it.
  • CPU utilization during cryptographic operations. This will typically be 100% for a software solution, but less for a hardware solution, especially if a crypto engine is in use and the main CPU can be doing other things, or switch into a low-power mode.
  • Size of solution. Hardware can be measured by logic and memory gate count or resulting die area. Software can be measured by memory footprint, divided between code size and data size.
  • Power, usually measured in Watts, or energy measured in Joules. Power usually scales linearly with frequency, so it can be normalized for clock frequency to give a measure of power efficiency in μW/MHz.
  • Security robustness, which is more difficult to quantify. Some test labs assess how long it takes to hack a solution, given certain resources. Lab testing is often part of formal certification like Common Criteria or the EMV specification. Such certifications can be used as a proxy for security robustness.
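As a quick illustration of turning raw measurements into the metrics listed above (the figures below are invented, not measured data):

```python
# Invented figures, not measured data: turning raw numbers into the
# comparison metrics listed above.
cycles, message_bytes = 41_600, 1_024
cycles_per_byte = cycles / message_bytes     # throughput; lower is better

power_uw, freq_mhz = 12_500.0, 50.0
efficiency = power_uw / freq_mhz             # uW/MHz, normalized for clock

print(f"{cycles_per_byte} cycles/byte, {efficiency} uW/MHz")
```

Normalizing power by clock frequency in this way is what allows solutions measured at different operating points to be compared on a single power-efficiency axis.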

The EEMBC IoT SecureMark

The Embedded Microprocessor Benchmark Consortium (EEMBC) develops benchmarks such as CoreMark, the industry standard for embedded microprocessor performance. It has now introduced a set of Internet of Things benchmarks: IoTMark and SecureMark.

SecureMark provides a framework for benchmarking the efficiency of cryptographic processing solutions, which supports different profiles for different application domains.

SecureMark-TLS models the cryptographic operations required for the Transport Layer Security (TLS) protocol that is used for secure internet communication. TLS supports various cryptographic approaches to authenticating the communicating parties and ensuring the privacy and integrity of messages exchanged between them. The IoT security working group of EEMBC chose a set of cryptography algorithms and parameters appropriate for IoT edge devices for its SecureMark-TLS benchmark.

The resulting system-level benchmark measures the energy used and the performance of the cryptographic operations necessary to perform TLS operations for IoT devices.

The SecureMark-TLS test framework connects the device under test (DUT), such as an ARC processor-based IoT development kit, to a testbed that has two hardware boards and two software components. An I/O manager board sends commands over a UART to the DUT and receives results back to check whether the operations have been performed correctly. An energy monitor board provides power to the DUT and measures its energy consumption. A timestamp I/O pin, toggled by the DUT, marks the beginning and end of the operations being measured.

The SecureMark-TLS benchmarking software consists of a host PC application and embedded DUT software. The host application drives the execution of the benchmark by using the test-bed hardware boards to send commands to the DUT to perform cryptographic operations. It receives the results of the operations for checking, and it obtains power and timing measurements from the energy monitor.

The EEMBC test-harness software that runs on the DUT toggles the timestamp I/O to indicate when energy measurements should be started and stopped, and uses an API to tell the DUT to do various benchmark cryptographic operations.

Example benchmark results

Synopsys has experimented with the beta release of SecureMark-TLS, as shown in Figure 3. It used the mbed TLS reference implementation from EEMBC, and the DWC Cryptography Software implementation, accelerated using the CryptoPack processor extensions for the ARC EM processor in the DesignWare ARC EM Starter Kit.


Figure 3 Performance comparison between mbed-TLS and DesignWare cryptography software with CryptoPack extensions (Source: Synopsys)

The graph shows the results measured in time units, normalized to the smallest number (the 23 byte SHA256 hash operation on the ARC EM processor using CryptoPack acceleration). Lower numbers are better.

The benchmark shows that symmetric crypto and hashing run much faster than asymmetric shared-secret, sign, and verify computations. It also shows that CryptoPack acceleration speeds up the symmetric and hash (AES and SHA) processes between 4x and 7x, depending on the message size. The speedup for asymmetric ECDH and ECDSA operations is smaller, at up to 40%. CryptoPack was designed to add only 10% to 20% hardware area to a small ARC EM processor, and the large multipliers required for further speedup of the asymmetric operations would require more area.


SoC designers have to make nuanced tradeoffs between the cost, performance, energy consumption and security of their designs, especially for IoT devices. Most of the cost of security comes from implementing the necessary cryptographic algorithms. This can be done in hardware, software, or a mix of the two, and making effective comparisons among these approaches demands a consistent set of measurements.

The EEMBC has developed SecureMark as a tool for comparing the efficiency of cryptographic implementations. In turn, SecureMark-TLS focuses on implementations of the TLS protocol in IoT devices. Using this benchmark to compare a pure software approach with a hardware-accelerated software solution that uses CryptoPack extensions for the ARC EM processor showed a 4x to 7x performance improvement, at a cost of a 10% to 20% area increase when implementing the AES and SHA algorithms.

Further information


Ruud Derwig

Ruud Derwig has more than 20 years of experience with software and system architectures for embedded systems. Key areas of expertise include (real-time, multi-core) operating systems, media processing, component based architectures, and security. He holds a master's degree in computing science and a professional doctorate in engineering. Derwig started his career at Philips Corporate Research, worked as a software technology competence manager at NXP Semiconductors, and is currently a software and systems architect at Synopsys.

Company info

Synopsys Corporate Headquarters
690 East Middlefield Road
Mountain View, CA 94043
(650) 584-5000
(800) 541-7737

Sign up for more

If this was useful to you, why not make sure you're getting our regular digests of Tech Design Forum's technical content? Register and receive our newsletter free.

Slash test time by combining hierarchical DFT and channel sharing Mon, 09 Jul 2018 06:36:07 +0000 Hierarchical DFT, in which DFT insertion and sign-off pattern generation is done at the block level, is now standard procedure for most large IC designs. In a hierarchical methodology, DFT work can start early, eliminating spikes in the workload towards tapeout, and reducing the compute resource requirements. The combination of hierarchical DFT and embedded deterministic test (EDT) channel sharing maximizes channel allocation to further improve ATPG turn-around time and test cost.

Why use hierarchical DFT

Leading semiconductor companies are seeing a dramatic increase in design size, while the number of top-level pins for test access remains relatively static. In addition to being very large, these designs also target advanced process nodes, and often need to meet quality standards that require the use of several fault models. The combination of large design size, advanced node, test pin limitations, and quality requirements poses a huge challenge for the ATPG tool and the DFT engineers responsible for the task. This is where a divide-and-conquer approach with hierarchical DFT saves the day. Because many design tasks, like synthesis and physical layout, are already implemented hierarchically, performing hierarchical DFT is consistent with that approach. Figure 1 illustrates the concept of hierarchical DFT.


Figure 1. The Tessent Hierarchical DFT flow merges and retargets patterns from the core level to the top level (Mentor)

Numerous published case studies show the benefits of hierarchical DFT. These include:

  • Up to 10x performance gains in ATPG, diagnosis, and pattern verification;
  • Up to 2x pattern count reduction;
  • Getting DFT out of the critical path; and
  • Enabling core re-use.

Patterns are generated at the core level with much faster runtimes and fewer compute resources (memory) than would be needed for full-chip ATPG. With the use of an IEEE 1687 IJTAG infrastructure, hierarchical DFT is highly automated, flexible, and scalable.
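The core idea of retargeting can be sketched very simply: a pattern written against core-level scan pins is renamed onto the chip-level pins that reach that core. The sketch below is only conceptual (a real flow routes patterns through wrappers and the IJTAG network); the pin names and mapping are invented:

```python
# Conceptual sketch of pattern retargeting (not the actual Tessent flow):
# a core-level scan pattern is remapped onto chip-level pins. All names
# here are invented for illustration.

def retarget(core_pattern: dict, pin_map: dict) -> dict:
    """Rename core-level scan pins to the chip-level pins they reach."""
    return {pin_map[pin]: bits for pin, bits in core_pattern.items()}

core_a = {"scan_in0": "1011", "scan_in1": "0001"}
chip_map = {"scan_in0": "TOP_SI3", "scan_in1": "TOP_SI7"}
print(retarget(core_a, chip_map))  # {'TOP_SI3': '1011', 'TOP_SI7': '0001'}
```

Because the pattern content is unchanged, the heavy ATPG work stays at the core level and the top-level step is cheap bookkeeping.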

Manage pin limitations and boost compression with EDT channel sharing

In large designs, the number of chip-level pins available for scan test data is limited. There are several techniques to manage this. These include input channel broadcasting, where a set of scan channel input pins are shared among multiple identical cores. Modern multicore architectures contain many heterogeneous IP cores, each with a different EDT controller. For this situation, channel sharing is a good option. Figure 2 illustrates the concept.


Figure 2. Each non-identical EDT block only needs a few control channels. Most channels can be shared (Mentor).

EDT channel sharing allows scan input channels to be shared across multiple, heterogeneous cores. The compression architecture separates control and data channels. Now, the control channels can be individually accessible and uniquely allocated, and the data channels can be shared among a group of cores. Channel sharing boosts compression ratios by about another 2X and lends added flexibility to DFT planning in SoC design flows.

However, all cores sharing data channels must be present when patterns are generated because the ATPG tool needs to be able to predict the expected outputs based on the input stimulus. In a non-hierarchical flow, all cores are present during pattern generation as a rule. But in a hierarchical DFT approach, patterns are generated at the wrapped core level then retargeted and merged at the chip level. In this case, sharing of channels must be done inside a hierarchical boundary defined for pattern generation. This involves grouping cores together in design regions that are also wrapped. Creating this new level of hierarchy allows for channel sharing across cores inside the region, and patterns generated at that region level are retargeted to the chip level.
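The pin saving is easy to see with back-of-the-envelope arithmetic. In the sketch below, the per-core channel counts are hypothetical; the only assumption taken from the article is that control channels stay per-core while data channels are shared across a region:

```python
# Hypothetical pin-count arithmetic for EDT channel sharing: control
# channels are allocated per core, data channels are shared by all cores
# in a region. The channel counts below are invented.

def pins_needed(cores: int, ctrl_per_core: int, data_per_core: int,
                shared: bool) -> int:
    if shared:
        # each core keeps its own control channels; one data bus is shared
        return cores * ctrl_per_core + data_per_core
    return cores * (ctrl_per_core + data_per_core)

print(pins_needed(8, 2, 10, shared=False))  # 96 pins without sharing
print(pins_needed(8, 2, 10, shared=True))   # 26 pins with sharing
```

With these illustrative numbers, sharing cuts the chip-level pin demand by more than 3x for an eight-core region.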

Case studies

The benefits of a hierarchical DFT flow were documented in a case study at DATE 2016[1]. On a 4.3M flop design with 18 cores, the ATPG turn-around time was found to be 11X faster with a hierarchical flow than with a traditional flow. The load on the DFT engineer was found to be more uniform, saving weeks of work at the most critical time of the project.

In a more recent study published by Spreadtrum at DTIS 2018[2], the combination of hierarchical DFT and channel sharing was studied on a new generation, high-end mobile chipset. The project team compared the results to the data from a functionally similar, previous generation mobile chip.

The previous chip had about 3 million scan flops and was designed for 28nm. It used channel sharing and a flat DFT methodology in which all cores were tested together through 100 chip-level pins. The large memory footprint needed to load the design limited the number of machines available for ATPG, which took several weeks.

The design used for the Spreadtrum study used 14nm technology and contained nearly 7 million scan flops, but was still limited to 100 chip-level test pins. The design team was concerned that ATPG would be impossible to complete within the design schedule.

Channel sharing with hierarchical DFT requires that patterns be generated for the entire group of cores that share data channels. This design had five such “sub-chip” regions that each contained multiple EDT blocks that used channel sharing. Patterns generated at those sub-chip levels were retargeted to the chip level in the same automated fashion as any block-level patterns.

The new ATPG methodology resulted in an average 3X reduction in memory footprint, even before accounting for the 2.3X increase in design size. That is, the 7 million scan flop design had a 3X lower ATPG memory footprint than the 3 million scan flop design. The reduced memory footprint freed up compute resources for use in other critical tapeout tasks. ATPG runtime, again not scaled to a per-flop basis, was reduced by over 10X. The overall reduction in test time was about 1.7X, and test coverage for both stuck-at and transition delay increased.
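Normalizing the study's raw numbers to a per-flop basis makes the gain clearer: a 3x absolute memory reduction on a design 2.3x larger works out to roughly a 7x per-flop improvement.

```python
# Per-flop normalization of the Spreadtrum results: a 3x absolute memory
# reduction on a design that grew from 3M to 7M scan flops.

size_growth = 7.0 / 3.0          # 7M vs 3M scan flops, about 2.3x
raw_memory_reduction = 3.0       # reported average reduction
per_flop_gain = raw_memory_reduction * size_growth
print(f"~{per_flop_gain:.1f}x memory reduction per scan flop")  # ~7.0x
```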

The savings to the overall schedule were greater than the runtime improvements alone. In a flat approach, ATPG must wait for the full chip netlist to be completed, putting DFT work in the critical path to tapeout. In a hierarchical flow, ATPG occurs earlier in the schedule as each core is completed. By the time the top-level design is complete and ready for tapeout, the test patterns have already been generated and verified.


The rapid pace of design scaling calls for change to traditional DFT flows. The basic hierarchical methodology involves inserting DFT and generating sign-off test patterns at the core level, and retargeting the patterns to the top level. Another technique for managing pin-limited design is EDT channel sharing. A recent case study by Spreadtrum demonstrates that the two techniques are compatible and result in combined benefits to memory footprint, ATPG runtime, test channel allocation, and overall tapeout schedule. The value of hierarchical DFT is well established, but it does not preclude using other techniques like channel sharing to further improve DFT efficiency.


[1] D. Trock, R. Fisette, “Recursive hierarchical DFT methodology with multi-level clock control and scan pattern retargeting,” Design, Automation & Test in Europe (DATE) 2016.

[2] B. Lu, “The test cost reduction benefits of combining a hierarchical DFT methodology with EDT channel sharing — A case study,” International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS) 2018.

About the author

Geir Eide is the product marketing director for Tessent ATPG and Compression at Mentor, A Siemens Business. As a 20-year veteran of the test and DFT industry, Geir has presented seminars on DFT and test throughout the world. Geir earned his MS in Electrical and Computer Engineering from the University of California at Santa Barbara.

Bringing Ethernet time-sensitive networking to automotive applications Wed, 20 Jun 2018 17:01:57 +0000 Advanced driver assistance systems (ADAS) such as lane keeping/departure warning, emergency braking, collision avoidance and, eventually, fully autonomous driving features, all demand predictable latency and guaranteed bandwidth in the network to ensure that safety-critical data can flow from sensors to processors and back quickly enough to be effective.

To meet this demand, Ethernet has become increasingly popular for automotive networking, and so the IEEE’s Working Group on Time Sensitive Networking (TSN) has released a set of TSN standards, and continues to define new specifications, to support real-time networking in vehicles. Automotive SoC developers need to understand the capabilities of Ethernet with TSN, as well as what it takes to implement the standard in an automotive context.

The evolution of time-sensitive networking

The evolution of TSN started with the introduction of Audio Video Bridging (AVB), which was meant to enable the use of Ethernet in audio-video systems. As automotive designers began looking to use Ethernet in ADAS applications, they found that their much stricter latency requirements could not be met with AVB alone. This led the IEEE TSN working groups to expand the AVB specifications to meet the requirements of using Ethernet for control applications with predictable latency and guaranteed bandwidth. The initial set of TSN standards is shown in Table 1.


Table 1 The TSN working group has introduced multiple standards (Source: Synopsys)

Time-aware shaper

Automotive networks are designed to have predictable, guaranteed latency. A network with these characteristics is known as an engineered network. The time-aware shaper is used in an engineered network and enables scheduling to ensure that a critical traffic queue is not blocked. This is done with a ‘time gate’ that enables time-critical data to stream unimpeded while blocking non-time critical data, as shown in Figure 1. The IEEE 802.1Qbv scheduler’s logic determines the time intervals at which the gates must open and close. The time-aware shaper is implemented in the Ethernet MAC.
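As a rough illustration of the scheduling idea (not the 802.1Qbv state machine itself), a gate control list can be modeled as a repeating schedule of intervals, each opening a set of queues. The interval lengths and queue numbers below are invented:

```python
# Toy model of an IEEE 802.1Qbv-style gate control list (illustrative
# only): each entry opens a set of queues for an interval; a frame may be
# sent only while its queue's gate is open. Values are invented.

# (interval_in_us, set_of_open_queues); queue 7 carries time-critical data
gate_control_list = [(50, {7}), (200, {0, 1, 2})]

def gates_open_at(t_us: int) -> set:
    """Return the queues whose gates are open at time t (cycle repeats)."""
    cycle = sum(interval for interval, _ in gate_control_list)
    t = t_us % cycle
    for interval, queues in gate_control_list:
        if t < interval:
            return queues
        t -= interval
    return set()

assert 7 in gates_open_at(10)       # protected window: only queue 7 sends
assert 7 not in gates_open_at(100)  # best-effort window
```

The protected window at the start of each cycle is what guarantees the critical queue is never blocked by best-effort traffic.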


Figure 1 Time-aware shaper allows scheduling (Source: Synopsys)


Pre-emption can also be used to reduce the latency of time-critical data streams. On an Ethernet network, pre-emption enables a time-critical data frame to interrupt the transmission of a data frame whose transmission is not time critical. Once the time-critical data frame reaches its destination, transmission of the non-time critical data frame resumes. Any fragmented data frame must be reassembled before its transmission can continue. See Figure 2.


Figure 2 Preemption reduces latency of time-critical data streams (Source: Synopsys)

Let’s use an emergency braking system as an example. Two Ethernet MACs transmit the time-critical data frame (green) and non-time critical data frame (orange). The preemptable MAC lets the green frame travel ahead of the orange frame to get to its destination in time. The brake is then applied, regardless of the other data frames on the network. The time-critical data frame preempts the non-time critical data frame, which reduces the latency for the time-critical data and makes it more predictable.
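The latency benefit is easy to quantify with simple transmission-time arithmetic. The sketch below assumes a 1 Gbit/s link, a 1500-byte interfering frame, and a 64-byte minimum fragment; treat the exact numbers as illustrative:

```python
# Rough blocking-latency arithmetic for frame preemption. Assumes a
# 1 Gbit/s link, a 1500-byte best-effort frame, and a 64-byte minimum
# fragment; the numbers are illustrative.

LINK_BPS = 1_000_000_000

def tx_time_us(frame_bytes: int) -> float:
    """Wire time for a frame of the given size, in microseconds."""
    return frame_bytes * 8 / LINK_BPS * 1e6

# Without preemption, an express frame can wait for the entire interfering
# frame; with preemption, it waits at most for the in-flight fragment.
wait_no_preempt = tx_time_us(1500)  # ~12 us
wait_preempt = tx_time_us(64)       # ~0.5 us
print(f"worst-case blocking: {wait_no_preempt:.1f} us -> {wait_preempt:.2f} us")
```

The worst-case blocking drops by more than an order of magnitude, which is what makes end-to-end latency predictable enough for braking-class traffic.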

Cyclic queuing and forwarding

Cyclic queuing and forwarding supports known latencies regardless of the network topology. Its main role is to make network latencies more consistent across bridges. See Figure 3. According to the IEEE P802.1Qch standard, the cyclic queuing and forwarding amendment, “specifies a transmission selection algorithm that allows deterministic delays through a bridged network to be easily calculated regardless of network topology. This is an improvement of the existing techniques that provides much simpler determination of network delays, reduces delivery jitter, and simplifies provision of deterministic services across a bridged LAN.”


Figure 3 Cyclic queuing and forwarding supports known latencies regardless of the network topology (Source: Synopsys)

Per-stream filtering and policing

Per-stream filtering and policing enables a bridge or endpoint component to detect whether components in the network are conforming to the agreed rules. For example, a node gets allocated a certain amount of bandwidth and when this bandwidth is exceeded due to a component failure, or malicious act, action can be taken to protect the network. This standard includes procedures to perform frame counting, filtering, policing, etc. Policing and filtering functions are useful for detecting and enabling the elimination of disruptive transmissions, thus improving the robustness of the network.
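The policing idea can be sketched as a per-stream byte budget per interval. This is only in the spirit of per-stream policing; a real 802.1Qci policer uses standardized parameters and flow meters, and the budget below is invented:

```python
# Minimal per-stream policing sketch (in the spirit of 802.1Qci, not the
# standardized flow meter): each stream gets a byte budget per interval;
# frames beyond the budget are dropped. The budget value is invented.

class StreamPolicer:
    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.used = 0

    def new_interval(self) -> None:
        """Reset the consumed budget at each policing interval."""
        self.used = 0

    def admit(self, frame_bytes: int) -> bool:
        """Accept the frame only if the stream stays within its budget."""
        if self.used + frame_bytes > self.budget:
            return False  # policed: protects the rest of the network
        self.used += frame_bytes
        return True

p = StreamPolicer(budget_bytes=3000)
assert p.admit(1500) and p.admit(1500)
assert not p.admit(64)  # a babbling (or malicious) node is cut off
```

The key property is isolation: a misbehaving stream exhausts only its own budget, so other streams keep their guaranteed bandwidth.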

Frame replication and elimination

Frame replication and elimination supports seamless data redundancy. It detects and mitigates issues caused by cyclical redundancy check (CRC) errors, broken wires, and loose connections. The time-critical data frame is expanded to include a sequence number and is replicated, with each copy following a separate path in the network. Where the separate paths rejoin at a bridge or merge point in the network, duplicate frames are eliminated from the stream, so applications receive a single copy of each frame even when copies arrive out of order. See Figure 4.


Figure 4 Frame replication and elimination detects and mitigates issues caused by CRC errors, broken wires, and loose connections (Source: Synopsys)

For example, when an adaptive cruise control system sends a signal to the control system to maintain a certain speed and distance from the car ahead, separate paths are created across the network to enable this signal and signals from other applications to travel seamlessly. Once the signals merge together, duplicate frames are eliminated to allow uninterrupted signal transmission. The IEEE defines three implementations of frame replication and elimination where the talker sends the signal and the listener receives the signal:

  • Talker replicates, listener removes duplicates
  • Bridge replicates, listener removes duplicates
  • Bridge replicates, bridge removes duplicates
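The elimination step at a merge point can be sketched with sequence numbers: the first copy of each frame is forwarded and later copies are dropped. This is a simplification of 802.1CB sequence recovery (which also ages out state); payload names are invented:

```python
# Sketch of 802.1CB-style duplicate elimination at a merge point: frames
# carry a sequence number; the first copy is kept, later copies dropped.
# A real recovery function also ages out its history; this toy does not.

def eliminate_duplicates(frames):
    """frames: iterable of (seq_no, payload) arriving from both paths."""
    seen, out = set(), []
    for seq, payload in frames:
        if seq not in seen:
            seen.add(seq)
            out.append((seq, payload))
    return out

# Copies from path A and path B interleave and arrive out of order:
arrivals = [(1, "brake"), (2, "speed"), (1, "brake"), (3, "dist"), (2, "speed")]
print(eliminate_duplicates(arrivals))  # [(1, 'brake'), (2, 'speed'), (3, 'dist')]
```

Because elimination is stateless from the application's point of view, a broken wire on one path simply means the surviving copy arrives alone, with no retransmission delay.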

Enhanced generic precise timing protocol

The enhanced generic precise timing protocol supports clock redundancy by synchronizing clocks across the network in two ways: with a single grand master, or with multiple grand masters. The system has a master that synchronizes the clock and a grand master that references the root timing of the network.

In a single grand master model, the clock time information is transmitted to the listener on one segment of the network, and then communicated to the other segment on the same network. Only the grand master knows the accurate clock time.

In a multiple grand master model, the clock time is transmitted throughout the network in different directions so that, in case of an interruption, an accurate clock time is still known throughout the network.


Figure 5a Single grand master transmitting two copies using separate paths (Source: Synopsys)


Figure 5b Multiple grand masters transmitting two copies using separate paths (Source: Synopsys)

Implementation issues

The TSN specifications, and other standards such as IEEE P802.1Qcc and P802.1Qcr, have evolved to help automotive designers be more certain that the vehicles they design will be safe. Designers producing SoCs for automotive applications, especially for ADAS, can take advantage of the standards to ensure that safety-critical data flows through their designs as necessary to ensure correct operation.

Automotive SoC designers also need to respect other, stringent standards, such as ISO 26262 covering functional safety, AEC-Q100 covering reliability, and advanced quality management strategies. For example, achieving ISO 26262 certification demands defining and documenting all processes, development efforts, standards, and safety plans for the Automotive Safety Integrity Level (ASIL, A through D) that the designer has chosen to implement. Similarly, the SoC and IP must be tested to meet very low defect densities, measured in defective parts per million.

Synopsys already offers automotive-certified IP such as Synopsys’ DesignWare Ethernet Quality-of-Silicon IP. This ASIL B Ready, ISO 26262-certified IP with an automotive safety package supports Ethernet speeds up to 2.5Gbit/s, real-time networking, the original IEEE AVB specification, and now TSN.


John Swanson is a senior product marketing manager at Synopsys.


Formal fault analysis for ISO 26262: Find faults before they find you Mon, 18 Jun 2018 10:00:11 +0000 Automobiles are becoming computers on wheels - more accurately, multiple complex mobile computing systems. The safety of highly electronic vehicles depends increasingly on the quality and correct functionality of electronic designs.

The automotive safety standard ISO 26262 requires that objective hardware metrics are used to calculate the probability of random hardware failures, and mandates specific remedial steps in the case of a failure to meet safety criteria. Hardware architectures must be rigorously tested to ensure they meet the functional safety requirements dictated by the standard, which states that this analysis should include Failure Mode and Effects Analysis (FMEA). Fault analysis is used to measure and verify the assumptions of FMEA.

Fault injection is an essential component of fault analysis. However, traditional applications of random fault injection in gate-level simulation consume too much time and can force late architectural changes. Thus, it is important to begin this process earlier in the design cycle by moving to higher levels of abstraction, such as the register transfer and transaction levels.

Yet at higher levels of abstraction we still run into similar problems with random fault injection: It is inefficient and time-consuming. This is true even when using hardware emulation and design prototypes.

To overcome these limitations, the focus must be on faults that are not insulated by safety mechanisms and that will subsequently cause safety failures.

The answer lies in formal verification fault analysis using a combination of formal fault pruning, fault injection, and formal sequential equivalency checking.

By leveraging formal technology, we can determine the number of safe faults by identifying the unreachable design elements, those outside the cone of influence (COI), or those that do not affect the outputs (or are gated by a safety mechanism). After fault pruning, the optimized fault list can be used for fault injection. Formal verification conclusively determines if faults are safe or not, making the failure rates from formal analysis more comprehensive than fault simulation. These strategies are discussed in detail below.

Culling the verification space

We want to improve the efficiency of any fault injection mechanism. To do so, static analysis is performed ahead of time to determine the critical set of design elements where faults should be injected. At the same time, we can remove design elements that are not critical to the safety mechanism. We call this ‘fault pruning’. The primary objective is to skip elements that will not affect the safety requirement and focus fault injection on elements that will.

As part of an ISO 26262 safety analysis and based on the safety requirement, an engineer defines the safety goals and the safety critical elements of a design. A formal tool has the ability to trace back from the safety goals through the design elements to determine what is in the cone of influence (COI). See Figure 1.


Figure 1: Formal tools trace back the cone of influence to perform their analysis (Mentor).

The steps in the fault-pruning process can be summarized as:

  1. Identify a set of safety critical elements in the design that will have an immediate impact on the safety goals. These include output ports, state machines, counters, configuration/status registers, FIFOs, and other user-specified elements.
  2. Create a list of constraints for the block. Examples can include debugging mode, test mode, and operational modes not used in a specific instance. However, for this list to be valid, its constraints must have an independent safety mechanism (an example is an error flag generated whenever one of these modes becomes active). Typically, these modes are global and their safety mechanisms are addressed at higher levels.
  3. Compute the COI of these safety critical elements and create a fault injection list based on elements in these fan-in cones. Design elements that are not in the COI are deemed safe and need not be considered.
  4. Apply formal techniques to verify that faults injected in these elements will be observed and will impact the safety goals and/or safety critical elements.
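The pruning steps above can be sketched with a toy netlist graph: walk back from the safety-critical outputs through the fan-in relation to obtain the COI, then mark faults outside it as safe. The node names and connectivity below are invented for illustration; a real formal tool does this on the compiled netlist:

```python
# Hypothetical fault-pruning sketch: model the netlist as a fan-in graph,
# compute the cone of influence (COI) of the safety-critical outputs by
# reverse traversal, and mark faults outside the COI as safe. All node
# names here are invented.

def cone_of_influence(fanin: dict, roots: set) -> set:
    """All nodes that can reach a root through the fan-in relation."""
    coi, stack = set(roots), list(roots)
    while stack:
        for src in fanin.get(stack.pop(), ()):
            if src not in coi:
                coi.add(src)
                stack.append(src)
    return coi

fanin = {"safety_out": ["ecc", "fifo"], "ecc": ["cfg"], "fifo": ["cfg"],
         "debug_out": ["trace"]}
coi = cone_of_influence(fanin, {"safety_out"})
fault_list = {"ecc", "fifo", "cfg", "trace"}
safe = fault_list - coi
print(sorted(safe))  # ['trace'] -- outside the COI, so never injected
```

Only the faults left in the pruned list need to be handed to the (much more expensive) fault injection step.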

The next step is to look at what design elements in the COI of a safety requirement overlap with the COI of the safety detection mechanism. For example, suppose the data integrity of a storage and transfer unit is considered safety critical. As shown in Figure 2, along with each data packet, there are error-correcting code (ECC) bits for detecting any error. As long as the ECC bits can flag any transient fault, the data output is protected because errors can be detected and dealt with downstream. However, in a dual-point fault situation, error(s) may also be affecting the ECC detection logic. As a result, the safety mechanism is prevented from detecting bad data in the packet.

In this case, the first fault in the safety mechanism is considered a masking condition and thus a latent fault. The second fault in the dual-point fault scenario is an actual error that the safety mechanism would have detected if not for the latent fault, thus leading to a dual-point failure.


Figure 2: Anything outside the COI of the detection logic is considered potentially unsafe (Mentor)

Moreover, not all design elements are in the COI of the safety mechanism. Design elements that are in the COI of the data output but not in the COI of the ECC logic will not be protected by the safety mechanism and so must be considered potentially unsafe. Faults in these design elements are treated as undetectable or potentially unsafe. As a result, we have:

  1. Design elements (in the COI of the ECC) where faults are potentially detectable by the safety mechanism (safe, residual faults, or dual-point faults); and
  2. Design elements (out of the COI of the ECC) where faults are undetectable by the safety mechanism and potentially unsafe (residual faults).

With the ability to measure the two COIs, the opportunity to improve the design (hardening) becomes apparent. A large ‘undetectable’ section of the two cones means that a higher percentage of elements can lead to residual faults. As a result, the ability to meet high ASIL ratings is significantly reduced. The goal of the design team at this point is to determine how to create more overlap between the safety critical function and the safety mechanism.

Injecting faults

Once the fault list has been pruned down to a high-quality, meaningful subset, it can be passed on to various fault injection mechanisms, such as fault simulation, hardware-accelerated fault simulation, formal fault verification, and hardware prototyping.

Even though most simulation regression environments deliver high functional coverage, they are poor at propagating faults to the output ports. Also, the functional checks are not robust enough to detect all kinds of faults. We create the regression environment to verify the functional specification - i.e., the positive behaviors of the design. It is not designed to test the negative behaviors. We are interested in formal-based fault injections primarily because of two attributes:

  1. Fault propagation: For example, suppose a random fault is injected into the FIFO registers, corrupting their contents. As the simulation environment is not reactive, the fault can be overwritten by subsequent data being pushed into the FIFO. Formal verification is different. It injects faults intelligently: it is good at introducing faults that it knows how to propagate to affect the safety goals or outputs of the design.
  2. Fault detection: For example, even when the corrupted FIFO content has been read, the fault may not be detected. The FIFO content may be discarded because the functional simulation is not in the mode to process the data. Most functional simulations test the scenarios that have been defined in the specification, while fault detection requires testing all the scenarios that may go wrong (negative behavior) consistently. To check all potential negative scenarios, we rely on formal sequential equivalency checking (FSEC).

FSEC compares the outputs of two designs or two representations of the same design. The implementation of the two designs can be different as long as the outputs are always the same. For example, FSEC can be used to compare a VHDL design that has been ported to Verilog (or vice versa) to see if the two RTL designs are functionally equivalent. Conceptually, an FSEC tool is a formal tool with two designs instantiated, constraining the inputs to be the same, and with assertions specifying the outputs should be equivalent for all possible internal states.

While there are many uses for FSEC, fault injection is a sweet spot. Formal tools have the ability to inject both stuck-at and transient faults into a design, clock the fault through the design’s state space, and see if the fault is propagated, masked, or detected by a safety mechanism. Once the fault is injected, the formal tool tests all possible input combinations to propagate the fault through the design to the safety mechanism — this is where FSEC becomes necessary.

As depicted in Figure 3, a golden (no-fault) model and a fault-injected model are used to perform on-the-fly fault injection and result analysis. Fault injection with formal FSEC is useful early in a design cycle when no complete functional regression is available. It can be used to perform FMEA and to determine whether the safety mechanism is sufficient. Designers can experiment and trade off different safety approaches quickly at the RTL. Such approaches include error detection, error correction, redundancy mechanism, register hardening, and more.


Figure 3: Fault injection with sequential logic equivalency checking (Mentor)

By instantiating a design with a copy of itself, all legal input values are automatically specified for FSEC, just as a golden reference model in simulation predicts all expected outputs for any input stimulus. The only possible inputs are those values that can legally propagate through the reference design to the corresponding output. By comparing a fault-injected design with a copy of itself without faults, the formal tool checks if there is any possible way for the fault to either escape to the outputs or go undetected by the safety mechanism.

The steps of the fault injection with FSEC can be summarized as:

  1. Specify two copies of the original design for FSEC. The input ports will be constrained together and the output comparison will be checked.
  2. Run FSEC to identify any design elements — such as memory black boxes, unconnected wires, and un-resettable registers — that will cause the outputs to be different. Constraints are added to synchronize them.
  3. Use the fault list to automatically specify possible injection points in the faulty design. Normally, we will focus on single-point faults, i.e., one fault will be injected to check for equivalency.
  4. Based on the fault list and the comparison points, FSEC will automatically identify and remove any redundant blocks.
  5. Run FSEC concurrently on multiple servers. The multi-core approach has significant performance advantages, as multiple outputs can be checked on multiple cores concurrently.
  6. Use compile technology that handles all the faults in a single run, saving the time and space usually needed for compiling the faults one by one.
  7. Run the FSEC verification step for thousands of faults in a single run.
  8. If an injected fault causes a failure at the comparison point, a waveform of the counter-example that captures the fault injection and propagation sequence can be generated for debugging.
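The miter idea behind these steps can be shown on a deliberately tiny example: compare a golden copy of a design against a fault-injected copy and look for any input that exposes the fault. A real FSEC tool does this symbolically over sequential state; the toy below just enumerates a 3-input combinational fragment, and every function name is our own:

```python
# Toy stand-in for FSEC fault injection: a golden copy and a
# fault-injected copy of a tiny combinational design are compared over
# all inputs (a miter). Real tools prove this symbolically and
# sequentially; this exhaustive sketch only shows the concept.

from itertools import product

def golden(a, b, c):
    return (a & b) | c

def faulty(a, b, c, stuck_node_value):
    # single stuck-at fault injected on the internal AND node
    return stuck_node_value | c

def fsec_equivalent(stuck_at) -> bool:
    """True iff no input pattern propagates the fault to the output."""
    return all(golden(a, b, c) == faulty(a, b, c, stuck_at)
               for a, b, c in product((0, 1), repeat=3))

print(fsec_equivalent(0))  # False: a=b=1, c=0 exposes the stuck-at-0 fault
```

When `fsec_equivalent` returns False, the distinguishing input pattern plays the role of the counter-example waveform described in step 8.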


We have assembled an environment for formal fault pruning and fault injection with FSEC, summarized in Figure 4. The fault-pruning step builds a netlist representation of the design. Based on the safety goals (that may be just the output ports) and the safety mechanisms, a formal method is then used to compute and examine the COIs. Safe and unreachable faults are reported. Fault injection with FSEC is done using the Mentor Questa Formal Verification tool. Before running fault injection, users can control the locations and the types of faults to be injected using the fault modeling and constraints. In addition, users can partition the design or configure the tool to perform the injection runs on a multi-core server farm environment.


Figure 4: Fault pruning and injection environment (Mentor)

Formal fault pruning - Case 1 results

The design is a floating-point unit with approximately 530K gates. The goal is to identify the safe faults in the design. These faults are outside the COI or cannot be propagated to functional outputs regardless of input stimuli. The original fault list contains approximately 32K faults. After compilation and setup, 868 of them were identified as safe faults. Interestingly, due to the mode of operation, 690 of them cannot be propagated and will not affect the outputs of the design.

Formal fault pruning - Case 2 results

The design is a memory management unit with approximately 1.3M gates. The goal is to identify faults that can propagate to internal status registers. These registers are checked by safety mechanisms at a higher level, so these detectable faults can be considered safe. In the design, there are 71 status registers and 1524 faults in the targeted fault list. After compilation and setup, 720 of them were identified as safe faults within an hour.

Table 1. Formal fault pruning results for a floating-point unit and a memory management unit (Mentor).

Formal fault injection – Case 1a/1b results

The design is a clock controller block with a safety mechanism implemented using triple modular redundancy (TMR). TMR is an expensive, but robust, on-the-fly safety mechanism. For a quick preliminary test, faults were allowed to be injected into all the registers in the design (Case 1a). As all the registers are in the COI of the safety mechanisms, the result is unsurprising: formal fault injection verifies that every injected fault will be caught by the safety mechanisms. Next, faults were allowed to be injected into all the nodes (registers, gates, and wires) in the design, which yields significantly more faults (Case 1b). Again, formal fault injection verifies that all the injected single-point faults will be guarded by the safety mechanisms.
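The masking property TMR relies on can be demonstrated in a few lines. This toy Python model (an illustration of the general technique, not part of the article's flow) shows a bitwise 2-of-3 majority voter absorbing any single-replica fault:

```python
def tmr_vote(a, b, c):
    """Bitwise 2-of-3 majority voter: each output bit equals the
    value held by at least two of the three replicated registers."""
    return (a & b) | (b & c) | (a & c)

golden = 0b1011
# Inject a single-bit fault into one replica: the vote masks it.
for bit in range(4):
    faulty = golden ^ (1 << bit)
    assert tmr_vote(faulty, golden, golden) == golden
```

This is also why the Case 1a/1b results hold: any single-point fault corrupts at most one replica, so the voter output never changes.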

Formal fault injection - Case 2a/2b results

The design is a bridge controller that incorporates the clock controller block from Case 1. In this case, formal fault injection was able to inject and propagate some faults to the output ports of the design. Two types of faulty scenarios were observed:

  • Single point faults that were not protected by any safety mechanism
  • Residual faults that were protected by safety mechanisms; however, the safety mechanisms did not detect the error conditions correctly

Table 2. Formal fault injection results for a clock controller and a bridge controller (Mentor).

Figure 5 shows a scenario from Case 2 in which the injected fault was missed by the safety mechanism and violated the safety goal. A random fault was injected shortly after reset removal. The fault caused multiple errors in the design, and they reached the outputs of the design after five clock cycles. Unfortunately, the safety mechanism was not able to detect these errors. As a result, the fault caused a violation of the safety goal.

Figure 5: Waveform showing how faults propagate outside the safety mechanism (Mentor).

About the authors

Ping Yeung is a Verification Technologist at Mentor, a Siemens Business; Doug Smith is a Verification Consultant at Mentor; and Abdelouahab Ayari is a Euro App Engineer - Digital Design & Verification Solutions at Mentor.


Power analysis isn’t just for battery-operated products Mon, 18 Jun 2018 09:59:30 +0000 Market demands for increasing functionality in mobile platforms and the explosion in IoT devices have put immense pressure on chip design groups to lower power consumption.

Devices that run on batteries or other standalone energy sources are not alone in requiring power analysis solutions. Engineers developing semiconductors for systems that run off external power –– i.e., those that are plugged in –– also need to understand power consumption accurately, especially consumption under the chip’s realistic operating conditions. For example, these designers need to ensure that their power distribution networks and chip packaging meet the peak-power demands of the circuitry.

If they underestimate the peak power consumption in the device, they will likely under-design the power distribution network, resulting in excessive IR-drop in the power grid. In the worst case, excessive IR-drop reduces noise margins to the point where a device can fail under high-loading conditions. Other negative effects of excessive IR-drop include reduced device reliability and diminished performance (e.g., slower switching speeds).

Conversely, overestimating peak power can lead to over-design of the power grids, resulting in much higher cost due to increases in the chip area and number of metal layers to supply the power and ground planes.

As with the power grid, design considerations for the chip packaging must be taken into account. Under peak power loads, a significant voltage drop can arise from the inductance at the package pins. This voltage drop (di/dt-drop) results in similar negative effects on noise margins and device performance.
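The two loss mechanisms can be made concrete with back-of-the-envelope arithmetic: resistive drop is V = I·R, while the package's inductive drop is V = L·di/dt. The component values below are invented for illustration, not taken from any real design:

```python
def ir_drop(i_peak_a, r_grid_ohm):
    """Resistive drop across the power grid: V = I * R."""
    return i_peak_a * r_grid_ohm

def didt_drop(l_pkg_h, di_a, dt_s):
    """Inductive drop at the package pins: V = L * di/dt."""
    return l_pkg_h * di_a / dt_s

# Illustrative numbers: a 10 A peak through a 5 mohm grid, and a
# 5 A current swing in 1 ns through 100 pH of package inductance.
v_ir = ir_drop(10.0, 0.005)              # 0.05 V of resistive drop
v_didt = didt_drop(100e-12, 5.0, 1e-9)   # 0.5 V of inductive drop
```

Note how, at these (hypothetical) values, the di/dt term dominates, which is why fast current transients stress the package even when the grid resistance looks adequate.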

High-performance systems containing advanced microprocessors, graphics and networking chips are particularly vulnerable to these effects.

Peak power after P&R

Until now, the solutions that allow these effects to be analyzed and understood have only become available late in the design cycle, usually after place and route. Fixing power distribution problems this late in the design is both costly and time consuming. Care must be taken to ensure the proper conditions are analyzed to understand worst-case scenarios.

Moreover, the speed and capacity of current solutions have greatly limited the size of design and the length of simulation scenarios available for analysis. Typically, IR-drop analysis can only be done for a few vectors (cycles) due to speed and capacity limitations. Without analyzing realistic scenarios under real hardware and software loads, peak-power conditions are easily missed, leading to under-design of power distribution and packaging solutions.

Designers require a methodology to find peak-power situations under realistic loads and apply these peak conditions to their IR drop and di/dt-drop analysis. Otherwise, they face a severe risk of failing to meet their design’s speed, reliability and operability goals.
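One simple way to locate such peak conditions in a long activity trace is a sliding-window scan over per-cycle power samples. The fragment below is a simplified sketch of the idea (the sample values are invented; this is not any vendor's tool):

```python
def peak_power_window(trace_w, window):
    """Return (peak average power, start index) over all windows of
    `window` samples, using a running sum so long traces stay O(n)."""
    s = sum(trace_w[:window])
    best, best_i = s, 0
    for i in range(1, len(trace_w) - window + 1):
        s += trace_w[i + window - 1] - trace_w[i - 1]
        if s > best:
            best, best_i = s, i
    return best / window, best_i

# Hypothetical per-cycle power samples (watts) from a long run.
trace = [1.0, 1.2, 0.9, 3.5, 3.8, 1.1, 1.0]
peak, start = peak_power_window(trace, 2)
# two-cycle windows: peak average is (3.5 + 3.8) / 2 = 3.65 at index 3
```

The window found this way is exactly the kind of short, realistic scenario that can then be handed to detailed IR-drop and di/dt-drop analysis.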

Fortunately, the landscape is starting to change. One promising methodology provides accurate dynamic peak-power analysis and characterization under realistic operating scenarios. Engineers can now confidently assess peak-power issues in their designs and easily correct them to address power distribution and packaging needs. Figure 1 gives some flavor of the way ahead – the challenges we face mean it is sure to attract more than its fair share of followers.

Figure 1. A new methodology provides accurate dynamic peak-power analysis and characterization under realistic operating scenarios so engineers can better understand and fix peak-power issues

How emulation’s SoC and SoS advantages begin with transaction-based co-modeling Mon, 11 Jun 2018 09:46:48 +0000 System design companies are increasingly turning to emulators as the only verification platform with the capacity and performance to validate that their system-on-chip (SoC) and system-of-systems (SoS) designs function as intended. On an emulator, this complex SoC and SoS verification happens with the help of an advanced capability called transaction-based co-modeling.

It wasn’t always this way. It wasn’t until the predominant emulation use-model began evolving from in-circuit emulation (ICE) to virtual emulation that it became necessary to use co-modeling technology to perform HW/SW co-verification and debug while maintaining high emulation performance.

Figure 1. A typical solution block-level diagram (Mentor)

For today’s emulators to be viable, they must have wide solutions support, fast throughput, and advanced co-modeling capabilities to meet new and emerging application demands for complex designs.

Thus, demand for co-model channel versatility and performance has increased as new virtual applications and test platforms emerged for new emulation targets.

Understanding co-modeling technology, its impact on verification and validation, and how best to make trade-offs should be a critical aim for anyone selecting and deploying emulation co-modeling resources. Let’s look at how emulation co-modeling is architected to meet the needs of advanced verification and validation across a wide range of vertical solutions. Those solutions include networking, storage, multimedia, mobile, automotive, wireless, and cellular.

Co-modeling is an untimed, transaction-level interface to the emulator expressed as function calls defined by the Accellera Standard Co-Emulation Modeling Interface (SCE-MI). Co-modeling supports dynamic design configuration, waveform and debug/analysis data streaming and conveying IO data from virtual devices and transactors to and from the design/testbench during emulation. Co-modeling also enhances a wide range of significant test capabilities in the areas of functionality, performance, debug, power-performance analysis, design-for-test (DFT), safety and security of designs, coverage closure, software co-verification and system interoperability.

The co-model channels for Mentor’s Veloce emulator are high-speed, low latency, high-bandwidth channels that follow the same producer-consumer models employed by any IO subsystem in a computer architecture – but with one major difference. Unlike most IO designs, co-model channels are not highly tuned at the expense of other traffic types. For example, a streaming interface and a message-based interface have very different requirements for throughput and latency tradeoffs. In fact, they can work against each other depending on the protocol. Therefore, an architecture that is flexible and tunable to each vertical market requirement should be deployed to meet the wide and demanding array of IO requirements.
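The streaming-versus-message tension can be sketched with a toy cost model (the numbers and the fixed-overhead assumption are mine, purely illustrative): batching messages amortizes per-transfer overhead, which raises throughput for streaming traffic but delays individual messages by up to a full batch-fill time.

```python
def channel_time(n_msgs, per_msg_bytes, batch, overhead_bytes, bw_bps):
    """Model co-model channel transfer time: each batched transfer
    pays a fixed overhead, so larger batches raise throughput at the
    cost of per-message latency."""
    transfers = -(-n_msgs // batch)  # ceiling division
    total_bytes = n_msgs * per_msg_bytes + transfers * overhead_bytes
    return total_bytes * 8 / bw_bps

# Hypothetical numbers: 1M messages of 64 B over a 10 Gb/s channel,
# with 1 KB of overhead per transfer.
t_unbatched = channel_time(1_000_000, 64, 1, 1024, 10e9)
t_batched = channel_time(1_000_000, 64, 256, 1024, 10e9)
# batching cuts total transfer time sharply, which suits streaming
# interfaces; a latency-sensitive message interface would batch less
```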

More on co-modeling

For more information about transaction-based co-modeling and other attributes of a high-performance emulation platform, download The Veloce Strato Platform: Unique Core Components Create High-Value Advantages.

Layout-database file control: the missing link Thu, 31 May 2018 04:40:05 +0000 GDSII to MEBES to OASIS...

As exchanges of layout descriptions between teams involved in IC development and production increase in rate, value and size, so do the needs for control and reliability. Regular integrity controls (e.g., cyclic redundancy checks, cryptographic hash functions and error correcting codes) can be used but their limits quickly become apparent.

On many projects, there are two teams simultaneously manipulating the same layout, but each uses a different representation based on one of the proprietary, standard or semi-standard formats (e.g., GDSII, MEBES, OASIS and OASIS.MASK). Comparing and identifying differences between the two resulting databases is a complex and time-consuming process. It also requires that one of the parties possesses both files, yet for security reasons, concerns will frequently arise over transmitting “unnecessary” data to a partner.

A standard traceability process usually starts with the creation of a unique signature that accompanies the transfer of a product, and thus guarantees its authenticity to the recipient. A good traceability system must be reliable, secure and independent of any specific implementation technology. Rather, it must ensure interoperability between heterogeneous environments. And, it must not risk compromising confidential or proprietary information.

State of the art

There is not much on the market today to address these needs. Legacy formats, such as GDSII or MEBES, offer no integrity control. Newer formats, such as OASIS, include options like a cyclic redundancy check (CRC) checksum, but this still has several drawbacks:

  • The use of the CRC algorithm turns reading and writing into a linear operation –– bytes of the file have to be read in order –– and this disables optimizations based on parallelism for large files.
  • The CRC algorithm is an error-detection code; it provides no security guarantee. Indeed, it is easy to create two files with different content but the same CRC checksum.
  • Since the checksum is not mandatory, many writers omit it. And some readers do not bother to check it.

Given all this, it is tempting to use other tools that are considered secure and efficient, such as those that generate cryptographic checksums using the Secure Hash Algorithm (SHA) or the deprecated Message Digest 5 (MD5). Like the CRC checksum, they are byte-dependent, which means that a single bit change in the file causes a dramatic change in the checksum. Normally, this would be considered a useful feature. For layout databases, however, it is a drawback: what needs to be secured and traced is not the bytes in the file, but the actual design or geometric description.
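The byte-dependence problem is easy to see: hash two files that encode the same square but start the vertex list at a different point. This is a deliberately contrived Python example (the text format is invented), not an extract from any real flow:

```python
import hashlib

# Two textual descriptions of the same unit square, differing only in
# which vertex the list starts from - a format-level change, not a
# geometric one.
file_a = b"POLY 0,0 1,0 1,1 0,1"
file_b = b"POLY 1,0 1,1 0,1 0,0"

sig_a = hashlib.sha256(file_a).hexdigest()
sig_b = hashlib.sha256(file_b).hexdigest()
# identical geometry, yet the byte-level signatures differ
```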

The only way to check that two files implement the same geometric design is to run an exclusive OR (XOR) operation on them. This is a complex and lengthy operation that requires expensive tools. Unlike a checksum, it requires the availability of both the original file and the file it is being compared against. Users cannot easily establish whether two designs are the same, even though that is all they want to know.

Specific constraints for layout file control

In most cases, what needs checking and validating is the contents of a file, not the container itself. This matters in the EDA process flow because of the various ways that exist to describe the same thing. In addition to generic considerations for the file signature, microelectronics has specific needs and constraints.

On the design side, most layout files use the GDSII format, the de facto standard. Some companies are switching to OASIS, but this transition is taking place gradually because not all EDA tools yet support the OASIS format. So for the time being, the back-end flows need to manage both GDSII and OASIS files with some translation at defined steps.

Users are reluctant to switch due to a lack of quick and easy ways to verify that a layout description in OASIS is the same as in GDSII. This makes a format-independent signature necessary. Signing a layout described in GDSII and signing the same layout described in OASIS must yield the same result, as it must for other formats such as OASIS.MASK or MEBES.

Most chip layout formats offer a great deal of leeway in terms of how the structure is described. In GDSII, for example, the order of cells described in a file is totally arbitrary.

Even if lower-level polygon descriptions are identical and the hierarchies are identical, it is impossible to compare two files by using a file-level signature. Reading a GDSII file within an EDA tool and rewriting it without modifications breaks the signature, given that the signature also contains the creation date and other meta-data. This makes any checksum unusable. The problem is worse with OASIS because the format offers a number of different ways to save the same data. These include strict mode, by reference or by name. These non-geometric intricacies should not impact the signature.

The main geometric issue is that EDA tools and file formats permit different ways of describing the same thing. For example, a ‘wire’ can be drawn as a ‘path’ (that is, as a succession of abutting segments with a given width) or as a complex polygon described as an assembly of basic trapezoids.

Figure 1. Different EDA tools and file formats allow different ways to describe the same geometry (XYALIS)

Additionally, no constraint applies to the description of a polygon. It is a list of edges starting from one arbitrary point and turning either clockwise or counter clockwise, making the classical signature of a simple polygon description irrelevant. If the user does not care about polygon overlaps within the same layer, a meaningful signature must only consider the final envelope of the whole polygon with clear specifications on the vertex order.
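A sketch of the kind of canonicalization this implies (my own simplified version, which ignores overlaps and holes): normalize the winding direction and the starting vertex before hashing, so that two descriptions of the same envelope hash identically.

```python
import hashlib

def polygon_signature(vertices):
    """Hash a simple polygon independently of its start vertex and
    winding: fix orientation via the signed area, rotate the list to
    start at the lexicographically smallest vertex, then hash."""
    v = list(vertices)
    area2 = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(v, v[1:] + v[:1]))
    if area2 < 0:              # force counter-clockwise order
        v.reverse()
    k = v.index(min(v))        # canonical starting vertex
    canon = v[k:] + v[:k]
    return hashlib.sha256(repr(canon).encode()).hexdigest()

square_ccw = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_cw_shifted = [(1, 1), (1, 0), (0, 0), (0, 1)]
# same envelope, different start vertex and winding -> same signature
```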

When validating file integrity, a global checksum or signature is enough. But simple ‘go/no go’ information is useless in most cases on a huge layout file; it is important to know the number of differences as well as their positions. For example, when comparing two text files, using “diff” gives more information than a simple checksum because users can see how many lines differ and which ones.

In the same way, a layout file can be split into windows to compare two files. Comparisons take place window by window. This makes it possible to run more detailed XOR operations on reported windows, if needed.

The same mechanism can be used for the signature. Instead of reporting a single checksum for the entire layout, the signature can be a list of checksums –– one for each window. The signature becomes a file containing the window information and the checksum for each window.
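In miniature, such a per-window, per-layer signature file and its comparison might look like the following (the record format is hypothetical; a real implementation would operate on canonical geometric envelopes, per the constraints discussed above):

```python
import hashlib

def windowed_signature(shapes, window):
    """Signature file: one checksum per (layer, window) bucket.
    `shapes` is an iterable of (layer, x, y, canonical-description)
    records; x and y place the shape on the layout grid."""
    buckets = {}
    for layer, x, y, desc in shapes:
        key = (layer, x // window, y // window)
        buckets.setdefault(key, []).append(desc)
    return {k: hashlib.sha256(repr(sorted(v)).encode()).hexdigest()
            for k, v in buckets.items()}

def diff_signatures(sig_a, sig_b):
    """Report the (layer, window) buckets whose checksums differ."""
    keys = set(sig_a) | set(sig_b)
    return sorted(k for k in keys if sig_a.get(k) != sig_b.get(k))

a = [("M1", 5, 5, "rect1"), ("M2", 95, 5, "rect2")]
b = [("M1", 5, 5, "rect1"), ("M2", 95, 5, "rect2-moved")]
changed = diff_signatures(windowed_signature(a, 50),
                          windowed_signature(b, 50))
# only the M2 window that actually changed is reported
```

A detailed XOR would then only need to be run on the reported windows.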

Chip layout description files generally contain multiple layers. When a rework is required, it is generally achieved using a metal fix where only a few interconnection layers are changed. This reduces mask cost and accelerates delivery time by using wafers that have already been processed up to the interconnection layers. Mask data preparation teams need to be able to quickly check that all front-end layers are strictly identical while only expected metal layers have been changed. The signature of a layout database must be split by layers.

After reviewing the specific needs of chip layout files, we can see that a list of constraints that will underpin an efficient signature strategy is required. A signature file must validate content, not the container, and must also be:

  • Independent of the file format and description strategy;
  • Based on the final geometric envelope of polygons; and
  • Split in windows and layers.

The solution: A new signature

Having identified the need for a list of constraints, how should we go about building it and, from that foundation, a new signature?

The standard must lead to a unique and universal way to sign any layout database. It will not be a single checksum, but a file containing checksums for each window and each layer, and contain additional information such as window size and column/row number.

Figure 2. A new signature standard will offer a new way to sign any layout database (Xyalis).

As noted above, two identical layouts must give the same signature, despite the fact that they are described in different formats. To achieve this, it is mandatory to define a unique way to describe the geometries –– or more accurately, the geometrical envelope.

To guarantee its uniqueness, this description must meet multiple constraints. It must define an order for the vertices. It must define precisely which points are part of the description. Then, using standard algorithms such as SHA256 or SHA3, the uniqueness of the signature of the description can be guaranteed.

The signature file will be small compared to the original database and available for use as a reference at any time. Comparison between two layout databases will not be made by a full XOR, but by comparison of the signatures of the two databases. It will also be possible to compare a database with a previously computed signature file.

With such a signature based on the geometrical description, any physical change of the layout will be quickly detected, whether that change is intentional or accidental.

For example, as soon as a rework –– i.e. a metal fix –– is requested on a design, the new database is compared to the original to check that only the expected layers have been changed, and usually only in a few localized areas.

Until now, this has required recovering the original database archive. Using the signature file as a reference simplifies the process because it is small enough to be accessible. Yet at the same time, it contains enough information to guarantee the similarity between some layers and highlight the windows where differences have been found on other layers.

Thus armed, the design team can then still elect to run a detailed XOR but need only focus on those windows that contain differences.

Since the signature only depends on the physical layout, it is easy to check that a simple read/write with a tool or a format conversion such as GDSII to OASIS has not led to any geometric modification. And, because the signature will never change unless a geometrical transformation has been introduced in the layout, this technique is well suited to validating new tools and complex flows, such as hierarchy flattening or layer splitting.


The proposed signature is tailor-made for layout database file integrity control. It is secure, reliable and, most important, focuses on what really matters –– the geometric description. It enables the quick and easy comparison of huge databases, yet is based on a small signature file that does not need to contain sensitive information and can therefore safely be sent to any partner.

The adoption of such a signature will have a positive impact on the semiconductor industry if enabled as an open standard. A fair-use policy on the patent that covers the signature scheme can be enforced. As an open standard, the signature becomes even more useful. It can be embedded directly into OASIS or OASIS.MASK databases as a special property, for example.

About the authors

Dr. Philippe Morey-Chaisemartin is Chief Technology Officer at XYALIS. After managing different design projects at STMicroelectronics, he set up its mask data preparation team for advanced 300mm foundry projects. He received a Ph.D. and a Master of Science degree in Microelectronics and Computer Science from the Université Joseph Fourier in Grenoble, France, and teaches at the Institut National Polytechnique de Grenoble.

Frederic Brault is a Senior Engineer for EDA software at XYALIS. He previously worked on compiler optimization at INRIA and high-performance, massively multicore chips at Kalray. He holds a Masters degree in Advanced Computing from Imperial College London and a Bachelor of Science degree from Supelec in France. He also teaches at the Institut National Polytechnique de Grenoble.

The impact of AI on autonomous vehicles Thu, 24 May 2018 22:02:41 +0000 The automotive industry is facing a decade of rapid change as vehicles become more connected, new propulsion systems such as electric motors reach the mainstream, and the level of vehicle autonomy rises.

Many car makers have already responded by announcing pilot projects in autonomous driving. These vehicles will need new cameras, radar and possibly LIDAR modules, plus processors and sensor-fusion engine control units – as well as new algorithms, testing, validation, and certification strategies – to enable true autonomy. According to market analysts IHS Markit, this will drive a 6% compound annual growth rate in the value of the electronic systems in cars from 2016 through to 2023, when it will reach $1,650.

Part of this growth will be driven by a steady increase in the level of autonomy that vehicles achieve over the next 10 to 15 years.

Figure 1, below, shows the number and value of sensors that leading car makers believe will be necessary to achieve three different levels of autonomy. Part of the reason for this cost increase is due to the need for redundant sensors at higher levels of autonomy.

Figure 1 Sensor numbers increase as vehicles become more autonomous (Source: IHS Markit)

The rise of artificial intelligence and deep learning in automotive design

One of the key enablers of vehicle autonomy will be the application of deep-learning algorithms implemented on multi-layer convolutional neural networks (CNNs). These algorithms show great promise in the kind of object recognition, segmentation, and classification tasks necessary for vehicle autonomy.

AI will enable the detection and recognition of multiple objects in a scene, improving a vehicle’s situational awareness. It can also perform semantic analysis of the area surrounding a vehicle. AI and machine learning will also be used in human/machine interfaces, enabling speech recognition, gesture recognition, eye tracking, and virtual assistance.

Before vehicles reach full autonomy, driver assistance systems will be able to monitor drivers for drowsiness, by recognizing irregular facial or eye movements, and erratic driver behavior. Deep-learning algorithms will analyze real-time data about both the car, such as how it is being steered, and the driver, such as heart rate, to detect behavior that deviates from expected patterns. Autonomous cars using sensor systems trained with deep learning will be able to detect other cars on the road, possible obstructions and people on the street. They will differentiate between various types of vehicles, and different vehicle behaviors, such as a parked car from one that is just pulling out into traffic.

Most importantly, autonomous vehicles driven by deep-learning algorithms will have to demonstrate deterministic latency, so that the delay between the time at which a sensor input is captured and the time the car’s actuators respond to it is known. Deep learning will have to be implemented within a vehicle’s embedded systems to achieve the low latency necessary for real-time safety.

IHS Markit projects that AI systems will be used to enable advanced driver assistance systems (ADAS) and infotainment applications in approximately 100 million new vehicles worldwide by 2023, up from fewer than 20 million in 2015.

Applying deep learning to object detection

Neural-network techniques are not new, but the availability of large amounts of computing power has made their use much more practical. The use of neural networks is now broken down into two phases: a training phase during which the network is tuned to recognize objects; and an inference phase during which the trained network is exposed to new images and makes a judgement on whether or not it recognizes anything within them. Training is computationally intensive, and is often carried out on server farms. Inference is done in the field, for example in a vehicle’s ADAS system, and so the processors running the algorithm have to achieve high performance at low power.

Figure 2 Training and deployment of a CNN (Source: Synopsys)

The impact of deep learning on design and automotive bills of materials

Some of the key metrics in implementing AI in the automotive sector are TeraMAC/s, latency and power consumption. A TeraMAC/s is a measure of computing performance, equivalent to one thousand billion multiply-accumulate operations per second. Latency is the time from a sensor input to an actuator output, and needs to be on the order of milliseconds in automotive systems, to match vehicle speeds. Power consumption must be low, since automotive embedded systems have limited power budgets: automotive designers may soon demand 50-100 TeraMAC/s computing performance for less than 10W. These issues will be addressed by moving to next-generation processes, and by using optimized neural-network processors.
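These budget figures can be sanity-checked with simple arithmetic; the stream counts and frame rates below are illustrative assumptions, not figures from the article:

```python
# Hypothetical budget check: a 50 TeraMAC/s processor serving four
# camera streams at 30 frame/s leaves this many MACs per frame.
teramacs = 50e12
streams, fps = 4, 30
macs_per_frame = teramacs / (streams * fps)

# Energy efficiency needed to fit the stated power envelope:
watts = 10
macs_per_joule = teramacs / watts   # 5e12 MAC/J, i.e. 5 TMAC/s per watt
```

Sustaining roughly 5 TMAC/s per watt is what pushes designers toward next-generation processes and dedicated neural-network engines rather than general-purpose cores.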

Other key aspects of implementing AI and deep learning in automotive applications will include providing fast wireless broadband connections to access cloud-based services, and guaranteeing functional safety through the implementation of ISO 26262 strategies.

Putting deep learning to work

As Figure 2 shows, the output of neural-network training is usually a set of 32bit floating-point weights. For embedded systems, the precision with which these weights are expressed should be reduced to the lowest number of bits that will maintain recognition accuracy, to simplify the inference hardware and so reduce power consumption. This is also important as the resolution, frame rates and number of cameras used on vehicles increases – which will increase the amount of computation involved in object recognition.
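A minimal sketch of that precision reduction, assuming simple symmetric linear quantization to int8 (one common choice; production toolchains offer more elaborate, accuracy-aware schemes):

```python
def quantize_int8(weights):
    """Symmetric linear quantization of float weights to int8:
    scale so the largest magnitude maps to 127, then round."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for accuracy checks."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03, 1.27]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# reconstruction error is bounded by half a quantization step (scale/2)
```

Whether 8 bits (or fewer) maintains recognition accuracy has to be validated against the trained network, which is why the bit width is chosen empirically per graph.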

Some of the challenges in applying deep learning are how to acquire a training data set, how to train the system on that data set, and efficient implementation of the resultant inference algorithms.

A neural network’s output during inference is a probability that what it is seeing is what you trained it on. The next step is to work out how to make decisions about that information. The simplest case for recognizing a pedestrian is now understood, and the next challenge is to work out how that pedestrian is moving. A lot of companies are working on being able to make these higher-level neural network decisions.

To support these companies, Synopsys has developed embedded-vision processor semiconductor IP, the EV6x family, that combines standard scalar processors and 32bit vector DSPs into a vision CPU, and is tightly integrated with a CNN engine. The vector processor is highly configurable, can support many vision algorithms, and achieves up to 64 MAC/cycle per vision CPU. Each EV6x can be configured with up to four vision CPUs. The programmable CNN engine is scalable from 880 to 3,520 MACs and can deliver up to 4.5 TMAC/s performance when implemented in a 16nm process.

Figure 3 The EV6x embedded vision processor architecture (Source: Synopsys)

The CNN engine supports standard CNN graphs such as AlexNet, GoogLeNet, ResNet, SqueezeNet, and TinyYolo. The EV6x embedded vision processor IP is supported by a standards-based toolset that includes OpenCV libraries, the OpenVX framework, an OpenCL C compiler, and a C/C++ compiler. There is also a CNN mapping tool to efficiently map a trained graph to the available CNN hardware. The tool offers support for partitioning the graph in different ways, for example to pipeline sequential images through a single CNN engine, or to spread a single graph across multiple CNN engines to reduce latency.

AI and functional safety

AI and neural networks hold great promise in enabling advanced driver assistance and eventually vehicle autonomy. In practice, however, the automotive industry is increasingly demanding that vehicle systems function correctly to avoid hazardous situations, and can be shown to be capable of detecting and managing faults. These requirements are governed by the ISO 26262 functional safety standard, and the Automotive Safety Integrity Levels (ASIL) it defines.

Figure 4 Defining the various levels of Automotive Safety Integrity Level (Source: Synopsys)

Deep-learning systems will need to meet the requirements of ISO 26262. It may be that for infotainment systems, it will be straightforward to achieve ASIL level B. But as we move towards the more safety-critical levels of ASIL C and D, designers will need to add redundancy to their systems. They’ll also need to develop policies, processes and documentation strategies that meet the rigorous requirements of ISO 26262.

Synopsys offers a broad portfolio of ASIL B and ASIL D ready IP, and has extensive experience in meeting the requirements of ISO 26262. It can advise on developing a safety culture, building a verification plan, and undertaking failure-mode effect and diagnostic analysis assessments. Synopsys also has a collaboration with SGS-TUV, which can speed up SoC safety assessment and certifications.


AI and deep learning are powerful techniques for enabling ADAS and greater autonomy in vehicles. Although these techniques may be freely applicable to non-safety-critical aspects of a vehicle, there are challenges involved to meet the functional safety requirements for the safety-critical levels of ISO 26262. Selecting proven IP, developing a safety culture, creating rigorous processes and policies, and employing safety managers will all help meet the safety requirements. Extensive simulation, validation, and testing, with a structured verification plan, will also be necessary to meet the standard’s requirements.

Further information

Get more detail on this topic by downloading the white paper.


Gordon Cooper is embedded vision product marketing manager at Synopsys

Company info

Synopsys Corporate Headquarters
690 East Middlefield Road
Mountain View, CA 94043
(650) 584-5000
(800) 541-7737


