WP 2: Fast and scalable numerical methods for transient circuit simulation
When it comes to the design and verification (D&V) of a large SoC, circuit size matters: the design may include a large number of "analog" transistors, substantial digital content, and, in post-layout simulations, many parasitics. We are talking about 500K to 50M active devices and up to 10M linear devices, and about the need to simulate as many operating modes of the chip as possible (start-up, power-on reset, standby modes, etc.), which corresponds to milliseconds of physical time. The SoC is simulated in mixed mode, e.g. with VHDL/Verilog descriptions for the digital blocks combined with transistor-level and/or behavioural models. D&V of such circuits would require far more than simulating the chip in operation for milliseconds, which is already not feasible with current simulation engines. Verification also requires running several process, voltage, and temperature (PVT) corners to gain insight into the variability of the design.
To this end, we want to develop new algorithms to speed up each part of an analog simulator: circuit representation and modelling, the time-integration schemes, the nonlinear solver, and the direct and iterative linear solvers. These algorithms have to be generic (not tailored to specific circuit applications) and must affect neither the robustness nor the accuracy of analog simulation. They also need to be scalable, i.e. improve performance in both single- and multi-threaded modes. The context here is an analog simulator with a hierarchical solver.
The capacity of existing simulators was primarily determined by the size of the circuit being simulated: both the memory usage of a circuit simulation and the simulation time itself were approximately proportional to circuit size. For today's VLSI designs this is impractical, so the full-chip design is partitioned hierarchically into blocks (subcircuits), in order to save both memory (by exploiting the reuse of identical blocks) and simulation time (by substituting advanced behavioural models for some subcircuits). However, hierarchical partitioning is not an error-free process.
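To illustrate how a hierarchical partition saves memory through reuse, the sketch below (hypothetical class and field names, not the actual simulator's data structures) stores each repeated block's topology once and keeps only per-instance unknowns:

```python
class SubcircuitDef:
    """Topology and device list stored once, regardless of instance count."""
    def __init__(self, name, n_internal_nodes, devices):
        self.name = name
        self.n_internal_nodes = n_internal_nodes
        self.devices = devices  # device descriptions shared by all instances

class SubcircuitInst:
    """Per-instance data only: port mapping into the parent and internal state."""
    def __init__(self, definition, port_nodes):
        self.definition = definition
        self.port_nodes = port_nodes  # this instance's connections in the parent
        self.state = [0.0] * definition.n_internal_nodes  # per-instance unknowns

# A bit-cell definition reused across a toy 4-instance "array":
bitcell = SubcircuitDef("bitcell", n_internal_nodes=2, devices=["M1", "M2"])
array = [SubcircuitInst(bitcell, port_nodes=(i, i + 1)) for i in range(4)]

# All instances share one topology object; only the state vector is duplicated.
assert all(inst.definition is bitcell for inst in array)
```

With millions of repeated cells, duplicating only the state vectors rather than the full topology is what keeps memory roughly proportional to the number of distinct blocks plus instances, instead of the flattened device count.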
The main goal is to develop fast and scalable numerical methods that accelerate transient circuit simulation by an order of magnitude or more (>10x speed-up).
In more detail, we will:
- Apply latency and multirate techniques across the levels of the hierarchy, both during time integration and during Newton's iterations. When a subcircuit (or a whole hierarchy branch) is detected as latent, its content (internal nodes) does not need to be recomputed; with multirate techniques, the internal nodes only have to be extrapolated.
- Reduce the total amount of computation (and hence simulation time) by compressing some levels of the hierarchy, i.e. by using a higher level of description for some subcircuits. This will be achieved through nonlinear Model Order Reduction (MOR), building nonlinear dynamic macromodels that capture the dominant behaviours, under the constraint of maintaining acceptable accuracy. In this context, the Discrete Empirical Interpolation Method (DEIM) developed by Sorensen ("A State Space Error Estimate for POD-DEIM Nonlinear Model Reduction") will be considered, as it has led to promising results in other areas.
- Make efficient use of the hardware architectures that are, or will become, mainstream in the next 3 to 5 years, e.g. multi-core CPUs, manycore processors (Intel Xeon Phi), and GPGPUs (NVIDIA Kepler). Good scalability of the numerical schemes is therefore a strong requirement for any new method.
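The latency/multirate idea in the first bullet can be sketched as follows. This is a minimal illustration, not the simulator's actual scheme: the activity estimate, tolerance, and the `solve_newton` placeholder are all assumptions made for the example.

```python
import numpy as np

def advance_subcircuit(x_prev, x_prev2, dt_prev, dt, solve_newton,
                       activity_tol=1e-6):
    """Advance one subcircuit's internal nodes by one time step.

    If the internal nodes barely moved over the previous step, the
    subcircuit is treated as latent: the Newton solve is skipped and
    the nodes are linearly extrapolated instead.
    Returns (new_state, was_latent).
    """
    rate = np.max(np.abs(x_prev - x_prev2)) / dt_prev  # crude activity measure
    if rate < activity_tol:
        # Latent branch: extrapolate internal nodes, no Newton iterations.
        return x_prev + (x_prev - x_prev2) * (dt / dt_prev), True
    # Active branch: full nonlinear solve (placeholder callback here).
    return solve_newton(x_prev, dt), False

# Quiescent subcircuit: the cheap extrapolation path is taken.
x, was_latent = advance_subcircuit(
    np.array([1.0, 1.0]), np.array([1.0, 1.0]),
    dt_prev=1e-9, dt=1e-9,
    solve_newton=lambda x0, dt: x0)  # stand-in for the real solver
```

The saving comes from the latent branch doing no device-model evaluations and no linear solve; in a deep hierarchy, a latent branch prunes all of its descendants at once.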
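The greedy interpolation-point selection at the heart of DEIM (second bullet) can be sketched in a few lines of NumPy; the snapshot data and basis size below are purely illustrative:

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM interpolation-point selection.

    U: n x m matrix whose columns are a POD basis of nonlinear-term
    snapshots. Returns m row indices at which the nonlinearity is
    sampled; all other entries are interpolated through the basis.
    """
    n, m = U.shape
    # First point: largest entry of the first basis vector.
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for l in range(1, m):
        # Interpolate column l at the points chosen so far ...
        c = np.linalg.solve(U[idx, :l], U[idx, l])
        # ... and pick the row where the interpolation error is largest.
        r = U[:, l] - U[:, :l] @ c
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)

# Toy usage: POD basis from snapshots of a scalar nonlinearity.
rng = np.random.default_rng(0)
snapshots = np.tanh(rng.standard_normal((50, 20)))
Upod, _, _ = np.linalg.svd(snapshots, full_matrices=False)
P = deim_indices(Upod[:, :5])  # 5 sampling rows out of 50
```

The payoff for circuit simulation is that the nonlinear device equations only need to be evaluated at the selected rows, while any vector in the span of the basis is reproduced exactly from those samples.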
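As a toy illustration of why the hierarchical formulation maps well onto parallel hardware (third bullet): device-model evaluation in independent subcircuits has no data dependencies, so it can be farmed out to a pool. The diode-like model below is a stand-in for real device equations, not part of the actual simulator.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def eval_models(voltages_per_block):
    """Evaluate independent subcircuit device models in parallel.

    Each block's evaluation depends only on its own node voltages,
    so the work maps cleanly onto thread/process pools (and, with
    vectorised stamps, onto manycore processors or GPUs).
    """
    def block_currents(v):
        # Toy exponential (diode-like) model: i = Is * (exp(v/Vt) - 1).
        return 1e-12 * (np.exp(v / 0.0259) - 1.0)

    with ThreadPoolExecutor() as pool:
        return list(pool.map(block_currents, voltages_per_block))

# One voltage vector per independent block.
currents = eval_models([np.array([0.0, 0.3]), np.array([0.6])])
```

This embarrassingly parallel structure at the block level is what the scalability requirement aims to preserve through every algorithmic change in this work package.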