Tech Design Forum: Technical information for electronics design

Why the time has come for cloud-based emulation (20 Jul 2018)

Mentor, a Siemens business, recently launched cloud-based access to its Veloce emulation platform via Amazon Web Services. The move aims at extending emulation’s use across a much broader range of clients, all tackling increasingly complex designs.

Gate-counts for SoCs implemented at the latest process geometries can reach into the billions. Software simulation has consequently become insufficient for system-level verification. At the same time, hardware emulation has in many cases become a mandatory component within the toolbox used by hardware and software verification teams.

For many companies however, one persistent problem has been that they have found the combined cost of licensing, housing and operating/servicing an emulator prohibitive.

By providing access on the cloud, Mentor is targeting companies of all sizes and budgets, offering 24/7 access to the Veloce platform from anywhere in the world. Companies can now use emulation only when they need it without incurring the expenses related to housing an emulator on site.

Using Veloce securely outside your own lab

The concept of placing EDA tools and services in the cloud is hardly a new one, but another persistent concern has, of course, been security. So what has changed in the decade since cloud EDA offerings were first announced, only to founder largely on fears of hackers stealing design data and other corporate crown jewels?

One factor has simply been the increasing reach of the cloud. Most companies, and even governments, have become more familiar and comfortable with accessing critical business and technology software over the cloud as a result.

Nevertheless, security was a vital factor in Mentor’s decision to work through AWS. It has firewalls built into Amazon VPC, and the web application firewall capabilities in AWS WAF let customers create private networks and control access to their own instances and applications. Meanwhile, AWS offers encryption in transit with TLS across all of its services.

“Offering access to Veloce on the cloud was really driven by customer demand,” said Jean-Marie Brunet, Senior Director of Marketing for Mentor’s Emulation Division. “They care about capacity, uptime, latency and use-models. Accessing tools on the cloud has become commonplace in most other aspects of enterprise and so security is not a hindrance. That said, we encourage potential customers to talk to AWS to see for themselves.”

Mentor piloted cloud-based emulation before its formal launch. One of the partners on its pilot was Softnautics. The company provides IP-based VLSI and embedded software solutions for artificial intelligence, the Internet of Things, connectivity, storage and security markets.

“Our commitment to customers is to provide a wide range of IP-based solutions to their precise requirements, on time and within budget,” said Amit Vashi, COO of Softnautics. “We are excited to have our VLSI IP-based solution integrated on the Veloce platform, as well as to be part of the effort to validate the use model of emulation software like (Veloce) on the AWS cloud.”


Embedded FPGAs start to take hold in SoCs (16 Jul 2018)

The embedded field-programmable gate array (eFPGA) is beginning to find a market, with communications leading the way but machine learning likely to drive broader adoption.

The eFPGA has had a checkered history since the concept first appeared in the 1990s, as concerns grew over the cost of mask sets for processes approaching the 180nm node. But several attempts to kickstart the technology ran into issues of cost and density. The need to maintain programmability, particularly in the interconnect, meant many SoC designers opted for other ways to incorporate flexibility in their products.

The market seems to be changing as users look beyond die cost to the need to support fast-moving standards and novel algorithms that, in some cases, mutate from month to month. Moore’s Law might have slowed down considerably for silicon, but a similar exponential has emerged in machine learning, with the number of papers on deep learning and associated algorithms appearing on the academic preprint site arXiv more than doubling every two years, according to a presentation by Jeff Dean of Google at the ScaledML conference earlier in the year.

Reaction to slowdown

The slowdown in Moore’s Law and the rise in interest in eFPGAs are connected. With clock speeds having stalled at around 3GHz for close to 15 years, and processor density now limited by power and cost/performance tradeoffs, attention in new markets such as machine learning is shifting to custom accelerators. Designers have the option to hardwire them, build microcode-programmable custom designs, introduce eFPGA cores that let the accelerators mutate as algorithms change, or do all three.

Steve Mensor, vice president of marketing at Achronix, says the company has changed its branding to present itself as a “data acceleration” company in recognition of the shift in attitudes. Rather than focus on the cost savings of not having to deal with mask changes if a design needs to change, Mensor says acceleration looks to be a more convincing pitch for a variety of markets that include machine learning. The current major market for Achronix is in 5G communications, where a focus on low latency makes software processing impractical but also where standards remain fluid.

Flex Logix has also seen adoption in wireless basestations. “Historically, it’s been one of the few high-volume applications for [discrete] FPGAs. But they have problems: they can’t get data in and out of the FPGA fast enough. They also have the issue of the protocols becoming more complicated and being implemented in phases. [With eFPGA] they can tweak things if they make mistakes and still get to market quickly,” says Flex Logix CEO Geoff Tate.

Beyond communications

A key set of early adopters for Flex Logix is in the military and aerospace sector. The company has a deal with Sandia Labs to put its eFPGA cores into radiation-hardened logic. The other key market for Flex Logix at the moment is networking and communications, particularly switching and interface cards for data-center servers.

As well as 5G, Mensor says cryptocurrency mining has appeared as a key market for Achronix, at least in the short term. Longer term, Mensor sees the data-center storage sector as being important as more processing gets pushed out into the peripherals, particularly as new architectures develop around the new generation of persistent, “storage class” memories such as phase-change RAM, resistive RAM, and the Intel/Micron 3D XPoint.

Mensor says the revenue is starting to flow for eFPGA design wins: “A year ago we announced we were going to a hundred million dollars in revenue. Eighty per cent of that is discrete; the other 20 per cent of that is embedded-FPGA technology. We have multiple customers and a number of them have gone full cycle, through device bringup, full stress testing and shipped to end customers.

“The interesting thing about this is that, of the design wins we’ve got [and taped out], every single one has already repeated,” Mensor claims.

Tate says customers often take the adoption of eFPGA step by step. “Architecturally, when they first use us, they take a baby step: something simple with high confidence it will work. Then they do some more and then some more,” he says, pointing to aerospace as a key sector for Flex Logix where this has been a common pattern.

Learning opportunity

The longer-term opportunity is in machine learning, another sector where fluid algorithms, a lack of standards, and need for processing speed have come together in a way that may drive customers to adopt eFPGA, possibly alongside more coarse-grain reprogrammable architectures. Mensor says the engagements with machine-learning customers tend to be strategic: “All the conversations involve CEOs. The use is more integral and architectural with applications that are very heavy in machine-learning functionality.”

Both Achronix and Flex Logix have developed arithmetic blocks tuned for the kind of processing used in today’s deep-learning pipelines, primarily focusing on 8bit multiply-adds. They expect more variants to appear as algorithm needs become clearer. “We expect that customers who engage will tell us ‘here’s what we really want to do’,” Tate says.
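Neither company has published the internals of these blocks, but the arithmetic they accelerate is simple to sketch. The following is a purely illustrative Python model of an 8bit multiply-add: products of two int8 values need 16 bits, and the running sum is held in a wider accumulator because a long dot product overflows 8 (and even 16) bits quickly.

```python
import numpy as np

def int8_mac(weights, activations):
    """Illustrative 8-bit multiply-accumulate (MAC) of the kind an
    eFPGA arithmetic block might hard-wire for deep-learning inference.
    Values are widened to 32 bits before multiplying so the running
    sum cannot overflow across a long dot product."""
    w = np.asarray(weights, dtype=np.int8)
    a = np.asarray(activations, dtype=np.int8)
    acc = np.int32(0)
    for wi, ai in zip(w, a):
        acc += np.int32(wi) * np.int32(ai)  # widen, then multiply-add
    return int(acc)

# A short dot product with values at the int8 extremes:
# 127*127 + (-128)*127 + 50*(-2) = 16129 - 16256 - 100
print(int8_mac([127, -128, 50], [127, 127, -2]))  # -227
```

Hardware blocks typically fix the accumulator width and the number of parallel MAC lanes; the point of making such a block reconfigurable is that those choices can change as quantization schemes evolve.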

At the implementation level, Achronix and Flex Logix have taken different paths. Achronix favors tuning its fabric for a small number of processes, such as TSMC’s 16nm FF+. “There are two modes of integration. One has registers around the outside and one, with lower latency, where you need to share timing information [during signoff],” says Mensor.

Flex Logix has favored an approach that uses the foundry’s standard cells, with a small number of customized cells, to allow easier porting to a variety of processes. These include the 180nm rad-hard process from Sandia and the upcoming 7nm foundry processes that Tate expects networking and communications customers to adopt. “We end up twice the size of the Achronix in terms of the cell size but our cofounder came up with a more efficient interconnect and we end up using fewer metal layers and still achieve utilization of 90 per cent or more,” Tate claims.

With RF, power and MRAM, FD-SOI finds its role (12 Jul 2018)

GlobalFoundries has claimed more than 50 design wins for its 22FDX process, which is based on fully depleted silicon-on-insulator (FD-SOI) wafers. The wins are expected to amount to at least $2bn in revenue when the designs go into production. After a long gestation, FD-SOI looks to be carving out several niches alongside the mainstream finFET process technologies.

Ahead of the Design Automation Conference (DAC) in San Francisco in late June, Arm said the company has developed a compiler for Samsung’s embedded magnetoresistive memory (eMRAM) module, which is available on the foundry’s 28nm FD-SOI process (28FDS) as part of a portfolio of IP for the technology.

Kelvin Low, vice president of marketing for Arm’s physical design group, said the company opted to build a compiler rather than offer fixed-size instances because of the variety of applications for the eMRAM technology. “We are focusing mostly on flash replacement at the moment,” he said, but the compiler can also target the replacement of electrically erasable memory blocks, as well as data buffers and scratchpads where long retention time is not the chief requirement. Longer term, Low said, last-level cache replacement will become more important; it is an area of active research among the FD-SOI foundries.

The compiler supports options to improve temperature robustness with the aim of supporting products, such as those in automotive, that will need retention at 150°C and error correction.

Body bias power cuts

Low noted that a key advantage for FD-SOI is its support for power tuning through real-time changes to the voltage applied to the transistor body. “FD-SOI without body bias doesn’t really make sense. With body bias, FD-SOI provides a lot more flexibility for threshold voltage and extends the range of operation,” he said.

Design for body bias increases the number of timing-analysis corners during signoff, with multiple combinations of process variability, voltage and temperature, Low said. “The number of PVT points we have to provide has more than doubled. Some designs have two hundred or so PVT points. But the design infrastructure is available: Cadence and Synopsys have flows that support body biasing on FD-SOI.”
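The combinatorics behind that corner growth are straightforward: each independent axis multiplies the number of signoff corners, so adding even a small set of body-bias settings has an outsized effect. A sketch, with purely hypothetical corner lists (real foundry signoff decks differ):

```python
from itertools import product

# Hypothetical corner lists for illustration only.
process = ["ss", "tt", "ff"]      # slow, typical, fast silicon
voltage = [0.72, 0.80, 0.88]      # supply corners (V)
temperature = [-40, 25, 125]      # junction temperature (C)

baseline = list(product(process, voltage, temperature))
print(len(baseline))              # 27 corners without body bias

# Adding just three body-bias settings triples the corner count.
body_bias = [-0.3, 0.0, 0.3]      # back-gate bias (V), illustrative
with_bias = list(product(process, voltage, temperature, body_bias))
print(len(with_bias))             # 81 corners
```

With finer-grained bias steps, or separate forward and reverse bias ranges for different blocks, counts in the region of the two hundred PVT points Low mentions are easy to reach.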

In a panel at DAC organized by Synopsys in collaboration with GlobalFoundries, Kripa Venkatachalam, director of product management for the foundry, said the “ability to do back-gate bias drives extremely powerful optimizations. It also drives novel circuit topologies thanks to the ability to alter Vt in silicon”.

In a test design based on an Arm Cortex-A53, Venkatachalam said that, compared with the older 28nm high-k metal-gate bulk-silicon process, the migration resulted in a 43 per cent power saving, with 11 per cent coming from the migration itself and the rest from body-bias tuning.

Beyond power optimization

In May, VLSI Research CEO Dan Hutcheson warned attendees at the SOI Silicon Valley Symposium that body biasing on FD-SOI risks being oversold, with project managers wary of the increased complexity of design.

Venkatachalam insisted the design process with body biasing is not difficult though the idea of “back-gate bias gives the impression of being complex”. She added: “We are seeing customers start from scratch with FD-SOI and take it to tapeout in less than six months.”

Jacob Avidan, vice president of R&D in Synopsys’ design group, explained how the company has implemented direct support for FD-SOI timing analysis in its PrimeTime tool and in its UPF flow. He insisted that, thanks to the application of interpolation techniques, the use of body biasing in combination with PrimeTime does not add extra corners. “In signoff with PrimeTime, you have to add a bias voltage but that’s the only change to the signoff environment. That lets you look at the tradeoffs between performance and power using different levels of bias scaling. We’ve done a lot of SPICE simulations and we are within 3 per cent in PrimeTime, which is inside the criteria for foundry signoff. You can interpolate any point along the line, which saves you having to generate a library with all these [different bias] points.”
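Synopsys has not published how PrimeTime models bias scaling, but the underlying idea can be sketched: characterize cell delay at a few bias points and interpolate between them, rather than generating a full library per bias value. The sketch below uses simple linear interpolation; the function name, library points, and numbers are all illustrative.

```python
def delay_at_bias(vb, lib_points):
    """Interpolate a cell delay at an arbitrary back-gate bias from a
    sparse set of characterized (bias_volts, delay_ns) library points.
    Illustrative only -- PrimeTime's actual models and accuracy
    criteria are not public."""
    pts = sorted(lib_points)
    for (v0, d0), (v1, d1) in zip(pts, pts[1:]):
        if v0 <= vb <= v1:
            t = (vb - v0) / (v1 - v0)
            return d0 + t * (d1 - d0)
    raise ValueError("bias outside characterized range")

# Characterized at only two bias points; query anywhere in between.
lib = [(0.0, 1.00), (0.3, 0.82)]   # forward bias speeds the cell up
print(delay_at_bias(0.15, lib))    # ~0.91 ns
```

The payoff is exactly the one Avidan describes: any bias along the line can be signed off without a separately characterized library for each bias point, provided the interpolated values stay within the foundry’s accuracy criteria against SPICE.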

Potentially more important, according to Venkatachalam, is the ability of FD-SOI to integrate RF and power circuitry based on LD-MOS transistors. “You can bring the PMIC into your chip as well, bringing components together into a single solution to drive cost down,” she said, adding that the foundry is working with design house and IP provider VeriSilicon on a reference design for a single-chip narrowband-IoT (NB-IoT) node.

Venkatachalam said a design can incorporate low-loss antenna switches and use FET stacking on FD-SOI to cut down losses in the core circuitry. “You also have lower leakage with SOI,” she added.

Wayne Dai, president and CEO of VeriSilicon, said: “The sweet spot is integrating RF. When you have an IoT node that can only cost a few dollars, you can’t afford to have the PA [power amplifier] outside.”

FinFET and FD-SOI choices

Dai pointed to the availability of eMRAM on FD-SOI as another advantage, providing a memory that is less susceptible to soft errors than SRAM. “We don’t need it to replace finFET [processes]. It’s a different market.”

Venkatachalam argued: “FinFET is the choice if we want to drive high density. FD-SOI has a role to play if you are targeting very low power and highly integrated designs that need RF and PMICs onchip.”

Dai added: “If you have a design with a very large digital requirement and it’s on most of the time at high performance, then finFET makes sense. But if you need high performance only 20 per cent of the time, then I think FD-SOI is a better choice, even for cellphones. FinFET is better if you want 4G or better. But we might consider FD-SOI for the low end.” He pointed to ADAS Level 2 subsystems as being contenders for FD-SOI. Level 4 on the other hand would probably require the performance of finFETs.

Venkatachalam added that GlobalFoundries expects its 22FDX to be a long-lived node and will add further process extensions. She also indicated there is a continuing role for the 28nm version. “All the customers who are moving from 55nm and 40nm will find a sweet spot in 28nm. It’s the best single-contact process,” she said, referring to the requirement for double-patterning in smaller geometries.

“22nm is the last node for double-patterning and 12nm will be the last for triple patterning,” Dai claimed. He pointed to 12nm as being a strong contender for millimeter-wave designs as those designs could be difficult to implement on finFET processes.

Low said FD-SOI “makes a lot of sense” for highly integrated LIDAR transceivers for automotive systems. The result is that FD-SOI is beginning to find its role in the market away from the mainstream process evolution.

Leti and Soitec partner for wafer development (11 Jul 2018)

Research institute Leti and Soitec have decided to team up to work on a new generation of engineered substrates, such as specialized silicon-on-insulator (SOI) wafers.

The deal includes the launch of a prototyping hub and pilot line intended to provide access for partners to early exploration and analysis. The Substrate Innovation Center is to be located on Leti’s campus and will be staffed by Leti and Soitec engineers. The main targets for the engineered substrates are 4G/5G connectivity, artificial intelligence, sensors and display, automotive, photonics, and edge computing.

“Leti and Soitec’s collaboration on SOI and differentiated materials, which extends back to Soitec’s launch in 1992, has produced innovative technologies that are vital to a wide range of consumer and industrial products and components,” said Emmanuel Sabonnadière, Leti CEO. “This new common hub at Leti’s campus marks the next step in this ongoing partnership. By jointly working with foundries, fabless, and system companies, we provide our partners with a strong edge for their future products.”

Cloud makes hardware acceleration more accessible (5 Jul 2018)

At this year’s Design Automation Conference (DAC), Cadence Design Systems and Mentor, a Siemens business, publicly announced they had put hardware emulators in the cloud to make it easier for customers to access accelerated verification. The moves may help promote the use of other forms of hardware acceleration dedicated to EDA tasks.

During a session at the conference to describe to users how its cloud service operates, Jean-Marie Brunet, marketing director of Mentor’s emulation business, said the company has done extensive planning for the service: “We worked with Amazon for over a year on this.”

Among the concerns was how long it would take to send design data to the emulator. “If it takes a couple of days to send to the box and it takes three hours to run, that’s not a good value proposition,” Brunet said, adding that the ability to compile the RTL for emulation close to the emulator itself is important.

First experiments

Rajesh Shah, CEO of IP designer Softnautics, said the company was keen to experiment with cloud-based emulation and seized on the opportunity to test Mentor’s offering. “We would like to have design in the cloud and enable customers to build systems or subsystems in the cloud.”

Shah said experiments that involved stimuli from a C testbench demonstrated that the data transfers could take place quickly, with about 3Gbyte of results and other data transferred back in about five minutes.
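Taking those figures at face value (they are approximate, from the article), the sustained transfer rate works out at around 80Mbit/s:

```python
# Back-of-the-envelope throughput for roughly 3 Gbyte of results
# returned in about five minutes, as Shah describes.
# Decimal units (1 Gbyte = 1000 Mbyte) for simplicity.
gbytes = 3
seconds = 5 * 60
mbit_per_s = gbytes * 8 * 1000 / seconds
print(round(mbit_per_s))   # 80 Mbit/s sustained
```

That is comfortably within ordinary enterprise WAN bandwidth, which is the point: the result transfer, rather than the upload of the compiled design, is unlikely to dominate the turnaround time Brunet warns about.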

Brunet said: “It took us a while to put this together but it’s in place. It’s the same flow you are running today. Except you have no idea where the box is.”

In practice, there will be some effect from geographical location. Response times, Brunet said, “will be related to the amount of hardware in a geographical region. Go across a region you may have some degradation in latency”.

Although one of the applications of cloud-based emulation is to absorb peak demand towards the end of a project, the capacity available online will limit how much peak demand the service can absorb. Brunet said, at least in the early days of the service, Mentor would look to establish a more consistent baseline usage with customers and “enable peak usage once a baseline has been established”.

Security assurances

The rollout of services like emulation in the cloud are beginning to demonstrate that EDA users are becoming more comfortable with the idea of sending design data to third-party server farms. Mentor worked with Amazon Web Services (AWS) to try to demonstrate that data would be protected.

“We had a lot of questions from the field: is this secure? When you see Department of Defense and national security certificates: you can say ‘that is OK’,” Brunet claimed. “It’s very secure.”

David Pellerin, head of worldwide business development for high-performance computing at AWS, said: “You’ve got to have security. We work with third party auditors to demonstrate that. Large enterprises now understand that they can operate in a more secure manner than with legacy infrastructure.”

Although Pellerin acknowledged the concerns over security, he said: “We’ve gone past that now.” What is happening now is that EDA users are beginning to see the end of the road for much of their internal infrastructure.

Pellerin added: “The pattern we have seen in EDA is similar to other computer-aided engineering areas. You have a dedicated data center with various servers of different vintages. It’s not really flexible. You can’t have different resources available during short bursts. The difference when we move to cloud is you can create that same environment but it’s now flexible and scalable. I can scale up and I can scale down. We have been seeing tremendous productivity in areas such as drug discovery and proteomics. And now in EDA.”

In an interview with TDF at DAC, Metrics Technologies president and CEO Doug Letcher said he perceives the same shift in attitude among users to putting more design data into the cloud. “What we’re seeing is that, often, engineers in the field have this opinion: ‘this is awesome but management won’t let me do it’. But the management people now see it as being strategic to IT.

“Companies have mandates to not build data centers on their own,” Letcher added. “Engineers in the field haven’t caught up with the change in attitude in management. One vice president at a relatively large semiconductor company said ‘we’ve moved our financial, customer support and legal data into the cloud. Am I really that worried about the RTL?’”

Acceleration options

Having started with a software-based simulator that runs on cloud servers, Metrics is now beginning to look at offering hardware-based acceleration. Shortly before the conference, the company announced its intention to merge with Montana Systems, which is developing a simulation accelerator for SystemVerilog workloads. Letcher said he sees a major advantage to putting acceleration into the cloud as a service instead of selling the necessary hardware to customers. They will be able to access the accelerator by selecting a different option and then running the testbench as normal.

“Emulation takes some time to port. Our target is maybe five to twenty times faster than software-based simulation but it’s zero effort,” Letcher said. “And it takes away the idea that I have to buy a box upfront. Over the course of next year we will be putting that product together.”

At the other end of the convenience scale for logic verification is the deployment of field-programmable gate arrays (FPGAs) into the cloud through services such as F1 from AWS. In his keynote at DAC, UC Berkeley Professor David Patterson said he saw the availability of these cloud-based FPGAs as being part of a rapid prototyping flow for a new wave of designs. Though they have to be specifically compiled for an FPGA platform, as with existing virtual-prototyping boxes, the ability to rent the hardware for short periods of time could be instrumental in moving to more agile design techniques.

“It takes months for chips to come back from the fab. So how do you do iteration? You can use simulation at the C++ level but those are still pretty slow,” Patterson said. “The next step is FPGAs. For some people they don’t want the hassle of buying FPGAs and setting up a lab. But you don’t have to do that: FPGAs are in the cloud. You can rent cloud service. It’s a remarkable opportunity.”

Fusion improves timing, say Synopsys users (3 Jul 2018)

At its SNUG conference in March, Synopsys publicly unveiled its Fusion portfolio and data model, a way of combining many different tools to deal with nanometer design problems. Early-access customers talked about their experiences with the suite in a panel session at the Design Automation Conference (DAC) in San Francisco last week (June 25, 2018).

Synopsys introduced the Fusion technology to enable tools such as IC Compiler II (ICC II), Design Compiler Graphical (DCG), PrimeTime, StarRC, IC Validator, DFTMAX, SpyGlass, and Formality equivalence checking to share information about a design. The Fusion technology comes in a variety of forms: Design Fusion; ECO Fusion; Signoff Fusion; and Test Fusion, each aimed at different parts of the flow. ECO Fusion, for example, enables the rapid insertion of changes at the physical level by using optimizations at different layers. Implementation tools can perform logic restructuring to improve area and timing that can span blocks and test structures. According to Synopsys, the use of a common data model enables the same interpretation of rules and design intent throughout the flow, and is based on a shared massively parallel and machine-learning ready infrastructure.

Power and area targets

Kazuhiro Takahashi, senior principal engineer at Renesas, said the company has been working closely with Synopsys since 2000, becoming an early adopter of DCG and then ICC II. “Renesas was one of the first to use ICC II,” he said.

Takahashi said the traditional flow of synthesis to place and route and finally signoff is running out of steam – optimizations performed in isolation by each tool do not maximize performance or area. “We need tighter integration of synthesis and layout and we think we can do this with the Design Fusion flow. We expect better QoR [quality of results] especially at the chip level,” he said.

On one project, a 400MHz automotive microcontroller with stringent area and leakage power targets, Takahashi said work with the new type of flow achieved an area reduction of 8 per cent and a leakage reduction of 15 per cent compared to a traditional approach.

Sorin Dobre, senior director of technology at Qualcomm, said his company is working closely with Samsung to bring products based on a 7nm process to market and to migrate quickly to denser derivatives from next year. “We are moving fast to have high-volume production in 4nm and 5nm,” he said. “We are using Synopsys tools and flow in 7nm and on the previous technologies. Why Fusion? Our target is for improvement in the quality of design, a reduction in power and a reduction in the time to tapeout. We see Fusion as an enabling platform moving forward.”

Cutting slack

Dobre said evaluations of the most recent tools on block-level designs saw large reductions in negative slack and negative hold slack. “We are able to converge the design significantly faster using timing analysis that is physically aware. We are working with Synopsys to enable this not just for the most advanced technologies but other nodes, such as 11nm and 22nm.”

Qualcomm has also made use of the integration between ICC II and Ansys RedHawk. “To speed up physical design convergence we are looking to enable technologies for integration inside Fusion, to take into consideration the effects of IR drop on timing. When you go to lower technology nodes and the lower voltages they enable, designs become very sensitive to unexpected voltage drops.”

Saran Kumar Seethapathi, principal IC design engineer at Broadcom, described how one part of his company has moved from using RedHawk as a standalone tool to integrating it into the Fusion-based flow. The group is an Arm centre of excellence. “We deliver hard macros for the various system groups. We do these Arm cores in a semicustom fashion on timescales of three to eight months. We are doing these very high speed cores in a very tight schedule.”

Seethapathi said the team was keen to obtain identical IR-drop analysis results from the RedHawk tool in the new flow. “And they were,” he said. In use, the integrated use of the tools led to a 10 to 15 per cent improvement in static and dynamic IR drop.

High-performance design

Philip Steinke, senior manager of CAD and physical design at AMD also had high performance in mind in his company’s use of the Synopsys flow: “We believe high-performance computing is the heart of the next generation of computing. It’s what’s driving growth in our industry.”

AMD used DCG and ICC II on its Ryzen 7 project, a design with four CPUs and ten graphics cores and a total of 4.5bn transistors. “It had lots of physical reuse and a lot of congestion. With these kinds of challenges we are constantly looking for tool and methodology improvements to stay ahead.”

In recent years, AMD has applied POCV, multibit register banking and global-route layer binning to optimize the physical design. “Even with that, at the latest process nodes the challenges get harder. We need the tools to anticipate more,” Steinke said. “We started out looking at Fusion for design resynthesis. And we also brought in the Signoff Fusion.”

The synthesis transforms that become available in place and route through the Fusion flow help optimize elements like test logic that will be incorporated after conventional synthesis. “The logic restructuring algos can see all that,” Steinke said. “We tested this on some cutting-edge GPU blocks and saw an average 2.5 per cent area reduction and an average 18 per cent improvement in TNS.

“The third feature we are taking advantage of is PrimeTime delay calculation in Fusion on the final pass of routing, rather than having to feed that back in afterwards. It catches those final outlier paths that would normally be problematic. We saved about a week per tile.”

Tool suppliers back version 1.0 of portable-stimulus standard (2 Jul 2018)

As the Design Automation Conference (DAC) proceeded in San Francisco last week (25-28 June, 2018), Accellera published the first release of the Portable Test and Stimulus Standard (PSS), with support announced by Breker Verification Systems, Cadence Design Systems and Mentor, a Siemens business.

Accellera chair Lu Dai said he expects the standard to “have a profound impact on the industry as a whole” by providing a way of building test scenarios that can be applied across multiple situations and platforms. Dai told TDF the working group is one of the most active at Accellera and is already working on updates based on feedback from the user community. “The team is not slowing down at all,” he noted.

The standard is available to download for free at Accellera’s website. It defines a specification to create a single representation of stimulus and test scenarios, usable by a variety of users across many levels of integration under different configurations. The execution platforms can span simulation, emulation, FPGA prototyping, and post-silicon, among others. With this standard, users can specify a set of behaviors once and expect to observe consistent behavior across different environments. PSS supports two different types of input for creating test scenarios. One is a domain-specific language created specifically for the purpose; the other is a set of class declarations for use with C++.

Tool support

Mentor, which started work on graph-based portable-stimulus concepts in 2004, said it will fully support the new Accellera Portable Test and Stimulus Standard 1.0 in the upcoming release of its Questa inFact tool.

“When Mentor donated the Questa inFact tool’s language and initiated the original Accellera Portable Test and Stimulus Working Group in 2014, we did so to drive portable stimulus toward mainstream use and help more design teams realize the step-function gains in verification productivity afforded by our Questa inFact tool,” said Mark Olen, product marketing group manager for Mentor’s IC verification solutions division.

Cadence said its Perspec System Verifier supports the version 1.0 standard, providing what the company calls an “abstract model-based approach” for defining SoC use-cases from the PSS model. The tool uses Unified Modeling Language (UML) activity diagrams to visualize the generated tests.

Breker has also claimed full compliance with the version 1.0 standard for its Trek5 portfolio of tools.

EDA companies such as Mentor have seen opportunities to use the test scenarios, which are built using declarative-programming techniques based on actions, schedules and constraints, to support greater degrees of automation. Mentor said it has applied classification machine learning to the graph-based technology in Questa inFact to enable better targeting of scenarios to achieve coverage goals at the IP-block level and to increase the usefulness of bare-metal testing at the IC level. The tool learns from each subsequent scenario during simulation or emulation.

Mentor has also applied data mining technology to extend the application of portable stimulus beyond verification. The Questa inFact tool can collect and correlate transaction-level activity to characterize IC design performance parameters including fabric routing efficiency and bandwidth, system-level latency, cache coherency, arbitration efficiency, out-of-order execution, and even opcode performance. It can also analyze regression test environments to help eliminate redundant simulation and emulation cycles.

EDA needs to work on the back end, says Qualcomm Wed, 27 Jun 2018 18:41:32 +0000 It’s the back end that needs work as system-level considerations begin to dominate design, argued Qualcomm’s vice president of engineering in a short Visionary Talk on Wednesday at the Design Automation Conference (DAC).

PR ‘Chidi’ Chidambaram said the trend was being driven by a combination of a slowdown in Moore’s Law and a focus on markets that need greater durability. “The upcoming markets are in automobiles and IoT, where products will be with people for ten years or more. We have to start thinking about designing for durability. The DPPM failure rate has to drop below 1ppm. We have DTCO [design-technology co-optimization] but more is needed.”

Communications will remain a huge market for semiconductors, Chidambaram said, but the focus is now moving away from the core SoC to the numerous interface ICs now needed to deal with multiple bands and antennas in 5G handsets. That, in turn, leads to a growing need for systems-in-package technologies that are cheaper and more reliable.

“Chip and substrate interactions cause a lot of stress. The modeling of this is not as mature as the modeling in other parts [of EDA],” Chidambaram noted, adding that the metal stack inside individual ICs also needs attention. “Modeling of the behavior of the back-end of line is not as mature as that for the front-end.”

Better metal models

He said modeling of effects such as quantum confinement in nanometer-scale fins now achieves high accuracy. “I can predict behavior to within 2 or 3 percent accuracy. But as I get into the metal the error increases a lot. The juice generated by the fin[FET] doesn’t get to the user. It’s all sucked up by the back end.”

Designers need to add significant margins to account for via parasitics, he said. “The error you get is 5 to 10 per cent on the worst-case paths. And these are the paths that matter. All this manifests in an unpredictability that we have to design for. You always end up with a wide tail that you can’t predict. So we end up over-margining the part. Getting better predictability will help us scale.”

Although novel devices such as gate-all-around and negative-capacitance transistors are likely to help bring down power, Chidambaram said, “they are not killer solutions”.

“The opportunity is for system-level innovation to drive the scaling forward. A lot of the key technologies exist today. We have packaging technologies that can integrate these different chips together. But just putting them together I don’t see a lot of benefit. Can I split the functions up differently? Maybe we can use innovation in packaging to fit all these things together.”

Remember the design gap? It’s back Wed, 27 Jun 2018 18:05:21 +0000 Fifteen years ago, the chipmaking industry became very concerned about the design gap. At the 40th Design Automation Conference (DAC) in Anaheim, Gary Smith, then an analyst with Gartner Dataquest, pointed out a problem that had emerged in the rollout of the 90nm process. Scaling according to Moore’s Law was providing the transistors but chipmakers were finding it increasingly difficult to put them to good use.

“When 90nm processes became available, there were no designs at 50 million gates [or above] though 90nm gives designers 100 million gates to play with,” Smith said in 2003. “EDA requires a major technology shift every 10 to 12 years to keep up with developments in the silicon.”

As the design gap opened up in the early 2000s, Synopsys and others bet primarily on IP reuse, rather than system-level and behavioral compilation tools, as the main weapon to close it. The decision was largely vindicated. IP reuse, up to the level of entire design platforms, has helped implement multi-billion-transistor SoCs. But a new gap has opened up, according to DARPA, and a new shift in EDA is needed.

Andreas Olofsson, program manager in DARPA’s microsystems technology office, says Gordon Moore’s seminal article in the April 19, 1965 edition of Electronics pointed to that kind of pressure on design in a section on the third page headed “Day of reckoning”.

Olofsson’s argument is that the reckoning has come in the form of design cost, which has become the go/no-go factor in approving a project, rather than manufacturing issues.

Professor Andrew Kahng of the University of California at San Diego said at this year’s DAC in San Francisco: “Wafer cost almost always ends up a nickel or dime per square millimeter. But the design cost of that square millimeter is out of control.”

In his keynote at DAC on Tuesday, IBM vice president for AI Dario Gil pointed to the problem being one of intense difficulty that has become critical because of pressure to get projects completed more quickly. “The design cycle may last for years,” he said, which is a problem in fast-moving, hardware-enabled areas such as machine learning. “Given that there is a renaissance going on in the world of AI, increasing automation in design is incredibly important.”

Step and repeat

Up to the end of 2016, Olofsson was CEO of Adapteva, which achieved fame when the company crowdsourced the funding of its parallel processor. Olofsson uses the Adapteva experience as a demonstration of one way in which it’s possible to cut design costs – making the most of replicated blocks.

This time around, more extensive high-level design automation may well be the answer, in line with Moore’s comments from the third page of his 1965 article: “Perhaps newly devised design automation procedures could translate from logic diagram to technological realization without any special engineering.”

Last year, DARPA put together several programs under the banner of “page three”, in reference to the relevant section of Moore’s article. “The objective is to create a no-human-in-the-loop 24-hour turnaround layout generator for system on chips, system in packages and printed circuit boards,” Olofsson says.

A different gap

The problem facing such a project is that the nature of today’s design gap differs from that of the early 2000s. Kahng claims the problem lies in the unpredictability of design. Small changes in tool settings can lead to big differences in die area or performance. He points to the 14nm finFET implementation of the Pulpino SoC, a research device based on the open-source RISC-V architecture. A frequency change of just 10MHz on a 1GHz target can lead to an area increase of 6 per cent.

Although it’s not a term that carries happy memories for those in EDA, Olofsson has resurrected “silicon compiler” to describe what he believes is needed to achieve a dramatic increase in the automation of design. However, DARPA has far from ruled out a further increase in IP reuse – this time based on the open-source movement, albeit a “non-viral” variety, in contrast to the General Public License (GPL) that prevails in software. That is the subject of the Posh Open Source Hardware (POSH) program, whose name seems to nod to the recursive naming of the GPL-protected GNU tools (GNU’s Not Unix). Olofsson points to RISC-V, OpenCores, and the Open Compute Project as early examples of what might be achieved using open-source hardware IP.

“In my view, you can only design so fast with productivity gains. The best gains can really be made by not designing at all. With any components that have already been used and verified we should be able to just drop them in at close to zero cost,” Olofsson says.

DARPA projects

The overall aim of the DARPA programs is to make it possible to get design costs for a large SoC down to the $2m level. Although this figure might itself be dwarfed by mask costs on leading-edge nodes, Olofsson points to the use of multiproject wafers (MPW) as a way to constrain the production costs for runs of around 10,000 units, which are the kinds of volumes that the Department of Defense typically needs.

The core program for heavily automated design is Intelligent Design of Electronic Assets (IDEA). DARPA issued its initial call for contributions for both IDEA and POSH last autumn and awarded its first contract for a “page three design” project to Northrop Grumman earlier this month (June 11, 2018).

IDEA splits into two parts. The first technical area (TA1) covers automated, unified physical design from un-annotated schematics and RTL code. DARPA hopes this will include support for automated retiming and gate-level power-saving techniques as well as test-logic insertion. The second (TA2) seeks an answer for “intent-driven system synthesis”, using a large ready-made parts database to select candidate blocks to support a high-level design.

Learned behavior

DARPA expects the systems developed under these programs to make use of techniques like machine learning and data mining. Gil described experiments in automated design as part of the SynTunSys project at IBM as one approach the industry could examine. The software would run many synthesis jobs in parallel with different parameters to try to find sweet spots automatically. He claimed that, applied to one design, the technique improved total negative slack by 36 per cent and cut power by 7 per cent. “This was after the experts had done the best they could,” Gil claimed.

Kahng sees machine learning as one of the crucial technologies for an automated flow, applying techniques similar to those in SynTunSys. He proposed a “multi-armed bandit” approach, in which arrays of computers try different approaches in a random-walk manner, each attempt trying to get closer to the target. The key problem is killing simulation runs or implementation steps that get stuck. To address this, a strategy modeled on blackjack seems to offer a viable approach, with the refinement of waiting for three negative signals before completely killing a job that looks unpromising.
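The bandit idea can be sketched in a few lines of Python. This is an illustrative epsilon-greedy toy, not Kahng’s actual system: the three “arms” stand in for hypothetical tool-setting bundles, and the fixed quality-of-result scores are invented.

```python
import random

def epsilon_greedy(arms, trials=500, epsilon=0.1, seed=42):
    """Allocate trial runs among candidate tool-setting bundles ('arms'),
    mostly exploiting the best observed quality-of-result (QoR), while
    occasionally exploring an arm at random."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    for _ in range(trials):
        untried = [j for j, c in enumerate(counts) if c == 0]
        if untried:
            i = untried[0]                    # try every arm at least once
        elif rng.random() < epsilon:
            i = rng.randrange(len(arms))      # explore a random configuration
        else:                                 # exploit: best mean QoR so far
            i = max(range(len(arms)), key=lambda j: totals[j] / counts[j])
        reward = arms[i]()                    # one (mock) implementation run
        counts[i] += 1
        totals[i] += reward
    return counts

# Three hypothetical setting bundles with made-up QoR scores; the loop
# quickly concentrates its runs on the best one.
counts = epsilon_greedy([lambda: 0.2, lambda: 0.5, lambda: 0.9])
```

In a real flow the reward would come from a synthesis or place-and-route run, and the “three negative signals” refinement would kill a stuck run early rather than letting it complete.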

The use of machine learning may also help create efficient models that can predict aspects such as timing across a large number of corners, so that implementation tools can then move to answers that correlate with reality much more quickly. “We want to predict timing at corners we don’t actually analyze. Our hope is to run static timing analysis on a bounded number of corners, 14 say, and from that be able to predict all the others,” Kahng said.
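As a toy illustration of corner prediction (not the actual models Kahng has in mind), one can fit a simple least-squares model of path delay against supply voltage from a handful of analyzed corners and use it to predict corners that were never run. The delay figures below are invented and deliberately linear.

```python
def fit_delay_model(voltages, delays):
    """Ordinary simple linear regression: delay ~= intercept + slope * V,
    fitted from a handful of analyzed corners."""
    n = len(voltages)
    mv = sum(voltages) / n
    md = sum(delays) / n
    slope = (sum((v - mv) * (d - md) for v, d in zip(voltages, delays))
             / sum((v - mv) ** 2 for v in voltages))
    return lambda v: md + slope * (v - mv)

# Made-up STA results: path delay (ns) at four analyzed voltage corners.
analyzed_v = [0.7, 0.8, 0.9, 1.0]
analyzed_d = [2.1, 1.8, 1.5, 1.2]
predict = fit_delay_model(analyzed_v, analyzed_d)
# predict(0.75) estimates the delay at a corner that was never analyzed.
```

A production version would regress over many variables (voltage, temperature, process) and many paths at once, but the principle is the same: analyze a bounded set of corners, predict the rest.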

Kahng said the sharing of data would be critical in making automated design successful, which may prove to be a stumbling block. In his keynote on Wednesday, Professor David Patterson of UC Berkeley pointed to open-source hardware, such as his group’s RISC-V project, as helping to boost the idea of agile design in which teams iterate very quickly.

Although researchers are taking a long-term view of building a much more automated flow, Olofsson expects the interim phase of IDEA to be complete by the end of the year with an initial integration of technologies that puts the program on its way to creating an automated silicon compiler that can achieve 50 per cent of its PPA targets. “The ultimate aim is to reach 100 per cent PPA. Maybe not better than every team in the world, but one that will beat a lot of teams in implementations,” he said at DAC.

EDA learns to love AI Tue, 26 Jun 2018 10:25:05 +0000 Although the rise of AI chipmaking startups has attracted the most attention, machine learning continued its slow penetration into electronics design at this year’s Design Automation Conference (DAC). A number of suppliers are also combining AI with the other theme of this year’s event: the expansion into cloud computing.

At a lunchtime session organized by Synopsys focused on the company’s Fusion implementation tools, Michael Jackson, senior vice president in the design group, said the overall strategy for the near term is to use machine learning across a number of tools to speed up analysis.

As part of a rollout of several products designed to run in the cloud, Cadence Design Systems said it has applied machine learning to library characterization. The use of machine learning for this task is an example of how the technique is likely to evolve, at least in its early phases: to speed up the process by identifying the most likely hotspots that need detailed analysis, by learning from prior runs.

Simulation reduction

The simulation of libraries and standard cells needs to take into account many more design corners than used to be the case, because of the growing influence of temperature and process variation. It also relies increasingly on statistical analysis techniques. This consumes a large (and growing) number of machine cycles.

Cadence refers to its use of machine learning in library characterization as "smart interpolation": using learned heuristics to identify the most important design corners that need characterization.

Seena Shankar, senior principal product manager in the custom IC and PCB group at Cadence, said the tool “leverages clustering techniques to identify and predict critical corners based on a handful of cells from the library. The selection of critical voltage corners for characterization across a range of voltages is done by keeping temperature and process constant.”

Typically, users provide the voltage ranges and pick the cells upon which they want to base the analysis, Shankar explained. “Liberate Trio generates simulation data across the range of voltages provided and decides the critical points for characterization. The level of supervision needed is minimal.”

Ron Moore, vice president of business planning in Arm’s physical design group, said the team “saw a notable improvement in turnaround time using the same number of CPUs”.

A little over a year ago, Solido Design Automation, which was acquired by Mentor, a Siemens business, in late 2017, launched a library characterization tool as part of a long-term program of development based on machine learning. One tool, used for high-sigma Monte Carlo analysis, has absorbed 40 person-years of effort since the initial version was developed at Solido a decade ago.

AI targets in EDA

In a session at DAC intended to show how EDA tools can make use of AI, Jeff Dyck, director of engineering at Mentor, laid out the characteristics that make certain applications suitable for AI.

“You look for things that are CPU-bound,” Dyck said, where brute-force simulation of all parameters would tie up a large number of licenses and machines but where heuristics would lead to much faster results. Being on the critical path increases the pressure to reduce the simulation times. “That looks like a great machine-learning problem,” he added.

A key difference between EDA and other sectors that want to apply machine learning is the nature of the data. “We don’t try to take a bunch of historic data and learn from that. We collect data on the fly so that it runs on an adaptive machine-learning loop.” The generator of the training data is often a simulator: “Usually a SPICE simulator that we use in our case,” Dyck said.

However, analog design is far from being the only area in EDA that is amenable to machine learning. Synopsys has been looking closely at using machine learning to build heuristics that help speed up the operation of formal verification.

Non-obvious answers

In FPGA design, Singapore-based Plunify has used the combination of cloud computing and machine learning to tighten up timing. The InTime tool learns which implementation settings work for different types of logic from its results with previous RTL compilations.

Plunify co-founder and COO Kirvy Teo said: “In terms of what can be done, we have seen designs pass timing from as bad as 2ns worst negative slack. The results have surprised even ourselves. Usually what happens is that, when we get a design, the user has tried all the aggressive settings that they can get their hands on. Usually it doesn’t get the best results. But we have found groups of settings that work better. If you set everything to be aggressive the software churns without getting good results.”

The learned attributes can produce results that seem counter-intuitive, but which work. For example, FPGA designers would typically expect to use DSP blocks wherever possible. The tool learned that settings which limit the number of hard-wired DSP blocks used in a design can yield better results, possibly because wiring congestion falls when routes to and from distant DSP blocks no longer need to be found.

Dyck said that the use of machine learning in tools tends to lead to initial resistance from designers, although the ability to deliver answers faster is breaking down that problem.

“Supporting these tools is very different. A key difficulty is proving that an answer is correct. Verifiability and trust are the trickiest things. How do you prove they are right without running the actual simulations?” Dyck said. “You need trust. If people don’t trust them, they won’t use them.”
