THURSDAY September 26, 11:30am - 1:00pm | Grand Victoria B
EVENT TYPE: REGULAR SESSION
SESSION 4: ESL & Virtual Prototyping
Swaminathan Ramachandran - CircuitSutra Technologies Pvt. Ltd.
Virtual Platforms have become ubiquitous. The question is no longer whether to use them, how to build them, or even how to use them; today the focus is on the next big performance gains. We will see three different approaches: novel technology, approaches to optimising models, and big leaps forward in abstraction. As application and system complexity continue to increase, these techniques are all required for us to stay ahead and to allow Virtual Platforms to remain the cornerstone of system development.
|4.1||Open Source Virtual Platforms for SW Prototyping on FPGA based HW|
|Meeting the needs of both simulation speed and hardware representativeness is always challenging. A novel combination of virtual platform technologies is introduced that takes advantage of existing infrastructure (including GreenSocs QBox and existing FPGA availability), allowing high-performance yet accurate simulation.|
|Speaker:||Praveen Wadikar - NVIDIA Corp.
|Authors:||Chen Qian - Nvidia
Mark Burton - GreenSocs Ltd
Praveen Wadikar - NVIDIA Corp.
|4.2||Framework for creating performance model of AI algorithms for early architecture exploration.|
|We are now living in an age of AI-enabled smart homes, autonomous vehicles, and chatbots. Researchers are coming up with complex AI algorithms that are useful in areas such as speech and image recognition, even surpassing human accuracy. Increased complexity, growing competition and a surge in demand for AI-enabled devices constantly put pressure on vendors to improve chip performance and deliver the product as early as possible. System designers tend to use simulation techniques to quickly analyze and improve their model before implementing it in silicon. Until now, however, these simulation techniques have not been leveraged for the AI domain. This paper proposes a novel framework that leverages Virtual Prototyping for early design exploration of power and performance, targeting AI-centric designs.|
|Speaker:||Amit Dudeja - Synopsys India Pvt. Ltd.
|Authors:||Amit Dudeja - Synopsys India Pvt. Ltd.
Amit Tara - Synopsys India Pvt. Ltd.
Amit Garg - Synopsys India Pvt. Ltd.
Tushar Jain - Synopsys India Pvt. Ltd.
|4.3||Profiling Virtual Prototypes: Simulation Performance Analysis & Optimization|
|Short Abstract
Virtual Prototyping is a proven methodology for early hardware architectural exploration, performance analysis and software development. To meet these objectives, Virtual Prototypes are expected to satisfy requirements around accuracy, early availability and simulation speed. Modeling a Virtual Prototype to accurately represent the actual system being simulated increases confidence in the reliability of the results obtained with it. Early availability of Virtual Prototypes to architects and software developers during a project increases their utility and their impact on project execution. Simulation speed is an equally important aspect: high simulation speed allows architects and performance analysts to explore a large design space in a timely manner, and helps software developers bring up a full software stack on the Virtual Prototype in a relatively short time span. While there has been much research on modeling techniques that improve the simulation performance of Virtual Prototypes, there has been little focus on good profiling tools to measure and analyze that performance. Most off-the-shelf profiling tools, such as gprof, oprofile and perf, are primarily designed to profile pre-compiled software code generated by a C/C++ compiler, or JVM-like code which is JIT compiled but is not updated during the course of the application run. Moreover, the reports generated by these tools are organized for analysis of software function calls, whereas for Virtual Prototypes a much more desirable format is one based on the design hierarchy of the SoC being modeled.
Off-the-shelf profilers report the time spent in each function at the software level. However, these profilers have no information about the design hierarchy being simulated in a Virtual Prototype, so they cannot provide any visibility into how the different component models in a system consume simulation time. For example, if a common function is used to implement a component which is instantiated multiple times in a Virtual Prototype, off-the-shelf profilers cannot distinguish how much time this common function takes in the context of each instantiation of the same component model. For lack of such a profiler, Virtual Prototype model developers generally rely either on analyzing the end-to-end simulation time obtained by running performance benchmark suites that represent typical workloads for the Virtual Prototype, or on a coarse-grained analysis of how simulation time is split between the simulation kernel and certain component models. This paper presents a Virtual Prototype profiler developed to overcome these limitations and provide a detailed design-hierarchy-level breakdown of simulation time. Moreover, the profiler is built into the Virtual Prototype and is designed to have negligible overhead. The paper presents the detailed design of the profiler, the results obtained by applying it to a number of full-system Virtual Prototypes, and its impact in uncovering a number of otherwise hard-to-find performance issues in simulation. The paper also shows results of full-system simulations with and without the built-in profiler enabled, to show that the profiler has negligible overhead as desired.
Related Work
Much of the prior work listed in the references has focused on general techniques for developing high-performance software and on avoiding modeling aspects known to degrade simulation performance.
General software techniques include coding styles, inlining, careful selection of data structures and containers, avoiding unnecessary logging and I/O, making as few copies as possible, etc. Typical modeling techniques recommended for simulation models include avoiding context switches, using higher abstraction levels and using bursty communication. Typical modeling techniques for instruction set simulators (ISS) include just-in-time (JIT) compilation instead of interpretive execution, and direct memory access. Much of the prior work has also focused on applying standard off-the-shelf profilers such as gprof, perf and oprofile to Virtual Prototypes; these provide only function-call-level details of the time spent during simulation and help optimize some of the most time-consuming methods only up to a point. For Virtual Prototypes, however, these software function-level reports are of limited applicability since they do not provide any insight into how the various component models actually consume simulation time. The method employed in  &  reports how simulation time is split between the simulation kernel and certain important component models, with the focus on optimizing the kernel. This paper extends the concept to provide a mechanism for obtaining a detailed design-hierarchy-level breakdown of simulation time. Besides design-hierarchy-level simulation time information, our profiler can also report the time spent executing from the translation cache versus the pre-fetch, decode, translation and other overheads involved in typical ISS models that employ JIT compilation.
Application
The Virtual Prototype profiler presented in this paper can be applied to any Virtual Prototype development methodology and yields useful information about how simulation time is spent among the various components at the design hierarchy level. This gives valuable insight into the performance of different component models and helps optimize the Virtual Prototype for maximum simulation speed.
Moreover, since the profiler is built into the Virtual Prototype and can be enabled or disabled at run time, it is useful for measuring the impact of integrating any new component model or third-party model on the overall simulation performance of the Virtual Prototype, as well as how different software and run-time configurations affect that performance.
Summary/Results
The Virtual Prototype profiler presented in this paper was applied to a number of different Virtual Prototypes developed for both architectural exploration and software development. The profiler was enabled at run time to measure the impact of different types of software applications and different run-time configuration settings on the simulation performance of the different component models. This helped uncover and fix a number of performance issues in simulation models, and helped identify the best settings of the various Virtual Prototype run-time configuration options to achieve the maximum possible simulation speed for different types of software workloads. The paper provides details of some of the examples where the profiler helped improve simulation performance significantly. For one of the software workloads, the profiler revealed that one of the UART model instances was consuming most of the simulation time, which was both undesirable and unexpected. Once this was uncovered, it was simple to fix the issue by removing the unnecessary synchronization requirement enforced by the model, which improved simulation performance by around 200%. In another scenario, one of the CPU model instances was found to take significantly more simulation time than the other CPU model instances in an SMP system.
It turned out to be a run-time configuration issue of the Virtual Prototype which caused that CPU model instance to get stuck in an infinite loop of exceptions, significantly degrading the simulation performance of the overall Virtual Prototype. Fixing it improved simulation performance by around 800%. Last but not least, the paper shows simulation results obtained across a variety of software workloads and benchmarks to demonstrate that the built-in profiler has negligible overhead. The measured overhead of the built-in profiler across a range of 20 benchmarks was less than 1.5% on average and approximately 2.8% in the worst case.
References
1. Practical Techniques for Improving SystemC Simulation Performance, Joseph Chapman, NASCUG 2007
2. SyncView: Visualize and Profile SystemC Simulations, Denis Becker et al., Design Automation for Understanding Hardware Designs, 2016
3. Profiling High Level Abstraction Simulators of Multiprocessor Systems, Liana Duenha et al.
4. Dramatically Increase the Performance of SystemC Simulations, Dr. Greg Tumbush et al.
5. VP Performance Optimization – How to analyze and optimize the speed of SystemC TLM models, Rocco Jonack, DVCon Europe, 2014|
|Speaker:||Sandeep Jain - NXP Semiconductors
|Author:||Sandeep Jain - NXP Semiconductors
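The per-instance attribution described in abstract 4.3 can be illustrated with a small sketch. This is not the profiler from the paper, whose design is not given in this listing. The class `HierarchyProfiler`, the `Uart` model, the dotted instance paths such as `soc.uart0`, and the injectable `clock` parameter are all hypothetical names chosen for illustration, and Python is used only for brevity. The sketch assumes that only leaf instances are timed (no nested scopes), so a parent total is simply the sum of its children.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class HierarchyProfiler:
    """Hypothetical helper: accumulates time per design-hierarchy path."""

    def __init__(self, clock=time.perf_counter):
        self.enabled = True                  # can be toggled at run time
        self._clock = clock                  # injectable for deterministic tests
        self._leaf_time = defaultdict(float)  # instance path -> seconds

    @contextmanager
    def scope(self, path):
        """Charge the time spent inside the 'with' body to one instance path."""
        if not self.enabled:
            yield                            # disabled: no clock reads at all
            return
        start = self._clock()
        try:
            yield
        finally:
            self._leaf_time[path] += self._clock() - start

    def report(self):
        """Return {path: seconds}; each parent includes its children's time.

        Assumes only leaf instances were timed (scopes are never nested).
        """
        totals = defaultdict(float)
        for path, seconds in self._leaf_time.items():
            parts = path.split(".")
            for depth in range(1, len(parts) + 1):
                totals[".".join(parts[:depth])] += seconds
        return dict(totals)


class Uart:
    """Toy component model; every instance shares the same transmit() code."""

    def __init__(self, path, profiler):
        self.path = path
        self.profiler = profiler

    def transmit(self, byte):
        # A function-level profiler would report one entry for transmit();
        # keying on the instance path tells uart0 and uart1 apart.
        with self.profiler.scope(self.path):
            return byte & 0xFF


if __name__ == "__main__":
    profiler = HierarchyProfiler()
    uart0 = Uart("soc.uart0", profiler)
    uart1 = Uart("soc.uart1", profiler)
    for _ in range(1000):
        uart0.transmit(0x41)
    uart1.transmit(0x42)
    for path, seconds in sorted(profiler.report().items()):
        print(f"{path}: {seconds:.6f} s")
```

Keying the accumulated time on the instance path rather than on the function name is what separates two instantiations of the same model, and the `enabled` flag mirrors the run-time enable/disable capability mentioned in the abstract.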