# Subspace Trajectory Piecewise-Linear Model Order Reduction for Nonlinear Circuits

Xiaoda Pan<sup>1</sup>, Hengliang Zhu<sup>1,\*</sup>, Fan Yang<sup>1</sup> and Xuan Zeng<sup>1</sup>

<sup>1</sup> State Key Lab. of ASIC & System, Microelectronics Department, Fudan University, Shanghai, P.R. China.

Received 7 May 2012; Accepted (in revised version) 5 November 2012

Communicated by Wei Cai

Available online 6 February 2013

**Abstract.** Despite the efficiency of trajectory piecewise-linear (TPWL) model order reduction (MOR) for nonlinear circuits, it needs large amount of expansion points for large-scale nonlinear circuits. This will inevitably increase the model size as well as the simulation time of the resulting reduced macromodels. In this paper, subspace TPWL-MOR approach is developed for the model order reduction of nonlinear circuits. By breaking the high-dimensional state space into several subspaces with much lower dimensions, the subspace TPWL-MOR has very promising advantages of reducing the number of expansion points as well as increasing the effective region of the reduced-order model in the state space. As a result, the model size and the accuracy of the TWPL model can be greatly improved. The numerical results have shown dramatic reduction in the model size as well as the improvement in accuracy by using the subspace TPWL-MOR compared with the conventional TPWL-MOR approach.

**AMS subject classifications**: 94C05, 93A15, 68U07, 68U20, 41A21

Key words: Trajectory piecewise linear model order reduction, nonlinear circuits.

## 1 Introduction

With the continuously increasing complexity of integrated circuits, fast simulation and verification become an important part in the design flow. Model Order Reduction (MOR) is one of the most promising approaches for fast simulation of complex systems. The MOR techniques for linear time invariant systems have been well developed during the past years [1–4]. On the other hand, the problem of nonlinear system model order reduction is generally much more difficult and challenging [5–7].

http://www.global-sci.com/

<sup>\*</sup>Corresponding author. *Email addresses:* 082052014@fudan.edu.cn (X. Pan), hlzhu@fudan.edu.cn (H. Zhu), yangfan@fudan.edu.cn (F. Yang), xzeng@fudan.edu.cn (X. Zeng)

Trajectory piecewise linear model order reduction (TPWL-MOR) method has been considered as the standard model order reduction approach for nonlinear circuits. In TPWL-MOR method [6], the nonlinear system is linearized around multiple expansion points on its simulated state trajectories driven by some "training inputs". The nonlinear system is first approximated by the piecewise linear model around these expansion points, and then reduced to a smaller macromodel by model order reduction approaches. The TPWL-MOR has been widely applied in many areas including analog circuits [6,8,9], MEMS [6,8,10], bio-chemical system [7], thermal analysis [11], computational fluid dynamics (CFD) [12], subsurface flow simulation [13], floorplanning [14,15], modeling oscillators [16] and analog circuits mismatch analysis [17], etc.

There have been many efforts devoted to improving the accuracy and efficiency of the TPWL-MOR approach. The trajectory piecewise linear model was extended to piecewise polynomial model in [18, 19], which improves the accuracy by using higher order bases. A wavelet-collocation based TPWL-MOR method was proposed in [20] to enhance timedomain simulation accuracy. A novel transistor level macromodel was proposed in [21] to enhance the coverage of the trajectories in high-dimensional state space for the nonlinear circuits. For finding the proper weighting function for the piecewise linear model, a time-dependent weighting function was proposed in [22] which was shown to be computationally cheaper, and a weighting scheme was to proposed in [23] to enhance the stability of the reduced order model. For model order reduction of the piecewise linear model, the moment-matching approach, TBR and POD were applied in [6], [8] and [24], respectively. A localized reduction approach was developed in [25], and a nonlinear projection method was proposed in [9, 26], which can reduce the nonlinear system to much lower orders. A novel approach proposed in [27] is based on the quadratic-linear representation of nonlinear systems, which can avoid the accuracy loss and reduce the computational cost of trajectory piecewise linear approximation for a very wide range of nonlinear circuits.

One of the key steps in TPWL-MOR approaches is the interpolation of nonlinear systems in high dimensional state space. Although TPWL-MOR method avoids the exponential explosion in computational cost [6], it still needs a large amount of expansion points to cover the state trajectory of large-scale nonlinear systems. Despite many great efforts to improve the efficiency and accuracy of the TPWL model, there are very few works focusing on the direction of reducing the number of expansion points to improve the efficiency of the high-dimensional interpolation in TPWL-MOR.

The problem of high-dimensional interpolation has been studied for many decades in computational science. A novel approach that has been proposed for solving the multi-dimensional partial differential equation (PDE) is based on the ANOVA decomposition and expresses the multi-dimensional function as a combination of the functions of some subgroups of its dimensions [28]. Inspired from this work, the subspace TPWL-MOR approach is proposed in this paper to improve the efficiency and accuracy for the high-dimensional interpolation in TPWL-MOR methods. The major advantage of the subspace TPWL-MOR is that the number of expansion points for the interpolation of high-dimensional nonlinear systems can be greatly reduced, which leads to much smaller model size as well as less simulation time of the reduced TPWL model. Meanwhile, the effective region of the subspace TPWL model can be generally larger than that of the conventional TPWL model, which also improves the accuracy of the resulting reduced-order model.

The idea of subspace TPWL-MOR can be implemented from the system level and the circuit level as proposed in this paper based on the sub-circuit partition. The system-level subspace TPWL approach is based on the system equation decomposition. The circuit-level subspace TPWL approach is based on the port-preserving model order reduction, which further reduces the model size for the module-based design of nonlinear circuits.

The rest of the paper is organized as follows. The conventional TPWL-MOR method is reviewed in Section 2. The idea of the proposed subspace TPWL approximation is presented in Section 3. The system-level subspace TPWL-MOR is developed in Section 4, while the circuit-level subspace TPWL-MOR is developed in Section 5. Some remarks are addressed in Section 6. The numerical results will be demonstrated in Section 7, which show great improvement in both efficiency and accuracy of the proposed methods compared with the conventional TPWL-MOR method. Finally, the conclusions are drawn in Section 8.

## 2 Trajectory piecewise-linear model order reduction

Generally, a nonlinear circuit can be formulated as the following nonlinear ordinary differential equation (ODE),

$$\frac{d(g(x(t)))}{dt} + f(x(t)) = Bu(t),$$
(2.1)

where  $x(t) \in \mathbb{R}^n$  is a state variable vector representing the node voltages and branch currents of the circuit at time  $t, B \in \mathbb{R}^{n \times p}$  is the input matrix and  $u(t) \in \mathbb{R}^p$  is the input signals. Function  $g: \mathbb{R}^n \to \mathbb{R}^n$  and function  $f: \mathbb{R}^n \to \mathbb{R}^n$  are nonlinear vector-valued functions representing the charges of the nonlinear capacitance elements and the currents of nonlinear resistance elements, respectively. The idea of TPWL is to represent the nonlinear system (2.1) by its piecewise-linear approximation, and reduce the order of the resulting piecewise-linear system by the MOR techniques that have been developed for the linear time-invariant systems.

Let  $\{\hat{x}_i; i=1,\dots,N\}$  denote some expansion points of x in the state space. The nonlinear system in (2.1) can be approximated by its linearized models around these expansion points,

$$\frac{d}{dt}(g(\hat{x}_i) + C_i \cdot (x - \hat{x}_i)) + (f(\hat{x}_i) + G_i \cdot (x - \hat{x}_i)) = Bu(t),$$
(2.2)

where  $C_i \in \mathbb{R}^{n \times n}$  and  $G_i \in \mathbb{R}^{n \times n}$  are the Jacobian matrices of nonlinear function g(x) and f(x) evaluated at the state  $\hat{x}_i$ . Given some state-dependent weighting functions

 $\{w_i(x); i=1,\dots,N\}$  for the expansion points  $\{\hat{x}_i; i=1,\dots,N\}$ , the original nonlinear system in (2.1) can be approximated by the "piecewise linear model" as

$$\sum_{i=1}^{N} w_i(x) \left( C_i \frac{d(x(t))}{dt} + G_i x(t) + I_i \right) = Bu(t),$$
(2.3)

where  $I_i = f(\hat{x}_i) - G_i \hat{x}_i$ . The weighting function can be defined as  $w_i(x) = e^{-\beta ||x - \hat{x}_i^k||/m}$ , where  $\beta$  takes the constant value and m is the minimal distance between x and the expansion points  $\{\hat{x}_i^k\}$  [6].

For model order reduction, a projection matrix  $V \in \mathbb{R}^{n \times q}$  is calculated by combining the Krylov subspaces of the linearized systems in (2.2) at expansion points [6], as summarized in Algorithm 2.1. With the projection matrix V, the reduced-order model of the nonlinear circuit (2.1) can be obtained as

$$\sum_{i=1}^{N} \tilde{w}_{i}(\tilde{x}) \left( \tilde{C}_{i} \frac{d(\tilde{x}(t))}{dt} + \tilde{G}_{i} \tilde{x}(t) + \tilde{I}_{i} \right) = \tilde{B}u(t),$$
(2.4)

where  $\tilde{w}_i(\tilde{x}) = w_i(V\tilde{x})$  for  $i = 1, \dots, \alpha$ ,  $\tilde{C}_i = V^T C_i V$ ,  $\tilde{G}_i = V^T G_i V$ ,  $\tilde{I}_i = V^T I_i$  and  $\tilde{B} = V^T B$ . The method proposed in [8] follows the same idea of trajectory piecewise linearization, but employs the TBR algorithm to generate the projection basis.

Algorithm 2.1: Algorithm for the projection basis V

- 1: Let  $V_{agg} = [], i = 0.$
- 2: for i=1 to N do
- 3: Construct q-th order orthogonal Krylov bases V<sub>1</sub> and V<sub>2</sub> from the q-th order Krylov subspaces, span{V<sub>1</sub>} = K<sub>q</sub>(G<sub>i</sub><sup>-1</sup>C<sub>i</sub>,G<sub>i</sub><sup>-1</sup>B<sub>i</sub>) and span{V<sub>2</sub>} = K<sub>q</sub>(G<sub>i</sub><sup>-1</sup>C<sub>i</sub>,G<sub>i</sub><sup>-1</sup>I<sub>i</sub>).
  4: V<sub>agg</sub> = [V<sub>agg</sub>, V<sub>1</sub>, V<sub>2</sub>, x<sub>i</sub>]
- 5: **end for**

```
6: V'_{agg} = svd(V_{agg})
```

7: V = Columns of  $V_{agg}^{\prime}$  corresponding to singular value greater than  $\epsilon$ 

One of the key steps in TPWL model order reduction is the piecewise linear approximation in (2.3), which is actually a high-dimensional interpolation problem since  $x \in \mathbb{R}^n$ . High-dimensional interpolation based on full tensor product suffers from the "curse of the dimensionality" [28], i.e. the computational cost increases exponentially w.r.t. the dimensionality. Take the nonlinear system in (2.1) for example, the interpolation of nonlinear function f(x) and g(x) based on the full tensor product needs  $m^n$  expansion points, where *m* is the number of expansion points for each dimension and *n* is the dimension of the state variable *x*.

To avoid this exponential increasing in the computational cost, the expansion points  $\{\hat{x}_i; i=1,\dots,N\}$  in TPWL are selected from the state trajectory instead of uniformly distributed in the state space [6]. Note that the linearized models in (2.2) are only accurate

near the expansion points. In order to cover as much reachable state space as possible while keeping the number of expansion points small, TPWL approach imposes some representative "training" inputs to simulate the nonlinear system and selects the expansion points  $\{\hat{x}_i; i=1,\dots,N\}$  from its state trajectories [6]. The training inputs of TPWL are generally defined by some current or voltage sources that are connected to the input ports of the circuits. As a result, the number of expansion points can be dramatically reduced compared with that of directly discretizing the whole *n*-dimensional state space. However, for large scale circuits, it still needs a large number of expansion points to cover the state trajectories. In some cases, hundreds or even thousands of expansion points are needed, which will inevitably increase the computational cost and complexity of the piecewise linear model in (2.3).

## 3 Subspace TPWL approximation: the idea

One strategy for solving the high-dimensional problems is to attack the "curse of dimensionality" from the lower dimension. For example, the high-dimensional partial differential equation problems can be solved by the low-dimensional bases in ANOVA decomposition [28]. This strategy can be applied here to reduce the complexity in trajectory linear model interpolation.



Figure 1: Example of a two-node nonlinear circuit.

Take a very simple two-node nonlinear circuit in Fig. 1(a) as an example. The state variables of this nonlinear circuit are  $x = \begin{bmatrix} v_1 & v_2 \end{bmatrix}^T$ , and the vector-valued functions f and g in the system equation (2.1) can be formulated as

$$f(x) = \begin{bmatrix} \frac{v_1 - v_2}{r} + i_d(v_1) \\ \frac{v_2 - v_1}{r} + id(v_2) \end{bmatrix}, \quad g(x) = \begin{bmatrix} C_1 v_1 \\ C_2 v_2 \end{bmatrix},$$
(3.1)

where  $i_d(v) = e^{20v} - 1$  is the current model of the two diodes. *r* is the resistance,  $C_1$  and  $C_2$  are capacitances, as shown in Fig. 1(a). The RHS of system equation (2.1) is  $Bu(t) = \begin{bmatrix} I(t) & 0 \end{bmatrix}^T$ .



Figure 2: The idea of subspace TPWL.

Given a damping sinusoidal current I(t) as the training inputs, the state trajectory of this two-node circuit is a spiral in two-dimensional state space as shown in Fig. 1(b). The TPWL approach selects the expansion points from the state trajectory. At each expansion point, the nonlinear system is approximated by its linearized model in (2.2). The effective regions of the linearization models at the expansion points are represented by the circles in Fig. 2(a). In this case, about 14 expansion points are needed to cover the whole state trajectory.

On the other hand, since the current model of each diode only depends on its voltage, the system equation can be reformulated as

$$\begin{bmatrix} \frac{1}{r} & -\frac{1}{r} \\ -\frac{1}{r} & \frac{1}{r} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + \begin{bmatrix} C_1 \\ C_2 \end{bmatrix} \frac{d}{dt} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + \underbrace{\begin{bmatrix} i_d(v_1) \\ 0 \end{bmatrix}}_{f^1(v_1)} + \underbrace{\begin{bmatrix} 0 \\ i_d(v_2) \end{bmatrix}}_{f^2(v_2)} = \begin{bmatrix} I(t) \\ 0 \end{bmatrix}.$$
(3.2)

To approximate the above system equation, it is more reasonable to use two onedimensional piecewise linear models to approximate  $f^1(v_1)$  and  $f^2(v_2)$ , other than using a two-dimensional piecewise linear approximation for f(x) in the original system equation (2.1). We can first project the trajectory into dimension  $v_1$  and dimension  $v_2$ , and select a few expansion points from the projected trajectories. The nonlinear function  $f^1(v_1)$  (or  $f^2(v_2)$ ) in (3.2) can then be approximated by the piecewise linear model at the expansion points in dimension  $v_1$  (or  $v_2$ ) that is similar to (2.3). As illustrated in Fig. 2(b), 6 expansion points are selected in the interval  $[v_1^l, v_1^r]$  and 3 expansion points in  $[v_2^l, v_2^r]$ . As a result, the number of expansion points can be reduced from 14 to 9 in this case. On the other hand, by decomposing the dependences on the state variables in nonlinear functions, we can see that the effective region of the subspace TPWL model is actually larger than that of the conventional TPWL model, as illustrated in Fig. 2(b), which means higher accuracy of the proposed subspace TPWL approach. This idea of subspace TPWL approximation can be applied to more general nonlinear circuits, based on the fact that 1) the system equation (2.1) of nonlinear circuits is generally formulated by the "stamping" procedure of Modified Nodal Analysis [29]; and 2) the nonlinear model of each nonlinear device is generally local that depends only on the voltages or currents of that device. A system-level and a circuit-level subspace TPWL-MOR will be explained in the next two sections. The system level subspace TPWL-MOR is based on the system partition of Eq. (2.1). For the module-based designs where the nonlinear circuits are designed by some pre-defined modules and the inputs of the same module are very similar, a circuit-level subspace TPWL-MOR is developed to further reduce the model size based on the fact that the multiple instantiations of the same module can share the same reduced-order model.

### 4 System-level subspace TPWL-MOR

As shown in the example of Fig. 2, in order to reduce the number of expansion points, the major idea of system-level subspace TPWL-MOR is to break the nonlinear system in (2.1) into several small subsystems, each of which only depends on a small subset of the state variable x.



Figure 3: Example of nonlinear circuit structure.

It is natural to partition the nonlinear circuits based on its hierarchical/block structure, of which a typical example is shown in Fig. 3. This nonlinear circuit consists of two sub-circuits, and the "connection" is a collection of all the common circuit nodes of the two sub-circuits. In the following, we will use the example of Fig. 3 to develop systemlevel subspace TPWL-MOR. A more detailed discussion about the circuit partition for subspace TPWL-MOR will be given in Subsection 4.4.

### 4.1 Subsystem formulation

In circuit analysis, the system equation (2.1) is generally formulated by the so-called "stamping" procedure of Modified Nodal Analysis [29], where each circuit component is modeled as some linear or nonlinear functions of its node voltages or branch currents in x. These linear or nonlinear functions represent the contribution of the circuit component to the net currents or charges at the corresponding circuit nodes, which is "stamped" into

f and g in Eq. (2.1). As a result, each element in f and g is actually a summation of the current or charge models of the linear/nonlinear circuit components that connected to the corresponding circuit node, as shown in the example of (3.1).

Therefore, based on the circuit structure, we can find a subsystem formulation for the system equation (2.1). Take the circuit structure in Fig. 3 for example, the state variables  $x \in \mathbb{R}^n$  can be grouped into  $x_{b1}$ ,  $x_{b2}$  and  $x_c$  as  $x = \begin{bmatrix} x_c & x_{b1} & x_{b2} \end{bmatrix}^T$ , where  $x_{b1}$ ,  $x_{b2}$  and  $x_c$  are the node voltages or branch currents within Sub-circuit 1, Sub-circuit 2 and the "connection", respectively. By properly arranging the sequence of the equations in (2.1), the function vector f (or g) can be divided into two parts.

$$f = \begin{bmatrix} f_c^{b1} + f_c^{b2} \\ f_{b1} \\ f_{b2} \end{bmatrix} = \underbrace{\begin{bmatrix} f_c^{b1} \\ f_{b1} \\ 0 \end{bmatrix}}_{f^1} + \underbrace{\begin{bmatrix} f_c^{b2} \\ 0 \\ f_{b2} \end{bmatrix}}_{f^2},$$
(4.1)

where  $f_{b1}$  and  $f_{b2}$  represent the currents of the linear/nonlinear resistances within Subcircuit 1 and Sub-circuit 2, respectively.  $f_c^{b1}$  (or  $f_c^{b2}$ ) represents the currents of the resistances between the "connection" and Sub-circuit 1 (or Sub-circuit 2). Note that the circuit model of each circuit element representing its current or charge is only related to its node voltages or branch currents, e.g. the current of a resistance depends on the voltages at its two nodes. We can find that the functions  $f^1$  ( $f^2$ ) in (4.1) only depends on the state variables in  $x_{b1}$  and  $x_c$  ( $x_{b2}$  and  $x_c$ ). The same decomposition can be applied to the function vector g and yields  $g = g^1 + g^2$ .

Therefore, based on the sub-circuit partition in Fig. 3, the LHS of (2.1) can be divided into two parts.

$$\underbrace{\left[\frac{d(g^{1})}{dt} + f^{1}\right]}_{S^{1}(x^{1})} + \underbrace{\left[\frac{d(g^{2})}{dt} + f^{2}\right]}_{S^{2}(x^{2})} = Bu(t),$$
(4.2)

where  $S^1$  and  $S^2$  represent the subsystems of Sub-circuit 1 and Sub-circuit 2, respectively.  $x^k = \begin{bmatrix} x_c & x_{bk} \end{bmatrix}^T$  for k = 1,2 is the vector of state variables in Sub-circuit k and the "connection", and can be expressed as

$$x^k = P_k \cdot x, \tag{4.3}$$

where the "projection"  $P_k \in \mathbb{R}^{n_k \times n}$  (*n* and  $n_k$  are the dimension of *x* and  $x^k$ , respectively) is a submatrix of identity matrix  $I \in \mathbb{R}^{n \times n}$  that selects the  $n_k$  elements from *x*.

In general, if a nonlinear circuit has *M* sub-circuits, the system equation can be divided into *M* parts as

$$\sum_{k=1}^{M} \underbrace{\left[\frac{d(g^k)}{dt} + f^k\right]}_{S^k(x^k)} = Bu(t), \tag{4.4}$$

where  $S^k$  only depends on the state variables in the  $k^{th}$  sub-circuit and its related "connections".

### 4.2 Subspace TPWL approximation

The subspace TPWL approximation is developed based on the subsystems  $S^k$ ,  $k=1,\dots,M$  in (4.4). Take the system formulation in (4.2) for example, similar to the linearization model in (2.3), the piecewise linear model of the subsystem  $S^k$  can be approximated as

$$S^{k}(x^{k}) \approx \sum_{i=1}^{N_{k}} w_{i}^{k}(P_{k} \cdot x) \left[ C_{i}^{k} \frac{dx}{dt} + G_{i}^{k} x + I_{i}^{k} \right], \qquad (4.5)$$

based on the expansion points  $\{\hat{x}_i^k | i=1, \dots, N_k\}$  in the subspace of  $x^k$ , and the corresponding weighting function  $w_i^k(P_k \cdot x)$ .

The expansion points  $\hat{x}_i^k$  in (4.5) are selected from the subspace trajectory as shown in Fig. 2(b). Let the trajectory in the full state space be represented by a set of trajectory points  $\{\hat{x}_i | i=1, \dots, \alpha\}$  obtained from the circuit simulation given the training inputs (same as the conventional TPWL approach). These trajectory points can be projected into the subspace of  $x^k$  by applying the "projection"  $P_k$ , that is  $P_k \cdot \hat{x}_i$  for  $\hat{x}_i$ . The expansion points for the  $k^{th}$  subsystem approximation (4.5) are then selected from the projected trajectory points as described in Algorithm 4.1.

Algorithm 4.1: Expansion Points  $\mathcal{U}_k = \{\hat{x}_i^k | i = 1, \dots, N_k\}$  for the  $k^{th}$  subsystem approximation

2: for i=2 to  $\alpha$  do

- 4: Add the trajectory point  $P_k \hat{x}_i$  to  $\mathcal{U}_k$ .
- 5: end if
- 6: end for

Formally, the error function  $\xi(P_k \hat{x}_i; \mathcal{U}_k)$  in Algorithm 4.1 can be defined by the error of TPWL approximation (4.5) at the projected trajectory point  $P_k \hat{x}_i$  substituting the expansion points in  $\mathcal{U}_k$ , which may need some efforts for calculation. To reduce the computational cost, an alternation definition of  $\xi(P_k \hat{x}_i; \mathcal{U}_k)$  is the distance function as in [6]. If the minimal distance between the projected trajectory point and the selected expansion points in  $\mathcal{U}_k$  is larger than a predefined grid size  $\delta$ , the projected trajectory point  $P_k \hat{x}_i$  is added to set  $\mathcal{U}_k$ . The grid size  $\delta$  can be determined by the nonlinearity of the subsystems. Take the nonlinear function  $e^{4x} - 1$  for example, a grid size  $\delta = 0.2$  can actually produce enough accuracy for the piecewise linear approximation in  $x \in [0,1]$ . As a result, different numbers of expansion points will be determined for different subsystems according to their nonlinearity, which is more efficient and flexible for TPWL approximation.

<sup>1:</sup> Let  $U_k = \{P_k \hat{x}_1\}.$ 

<sup>3:</sup> **if** the error function  $\xi(P_k \hat{x}_i; U_k)$  is larger than the predefined threshold  $\delta$  **then** 

The weighting function  $w_i^k(P_k \cdot x)$  can be defined according to that of the conventional TPWL approaches.

$$w_i^k(P_k \cdot x) = e^{-\beta ||P_k x - \hat{x}_i^k||/m}, \qquad (4.6)$$

where  $\beta$  takes the constant value and *m* is the minimal distance between  $P_k x$  and the expansion points  $\{\hat{x}_i^k\}$  as defined [6]. A typical value of  $\beta$  is 25 as explained in [6].

The nonlinear system in (2.1) can then be approximated by substituting (4.5) into (4.2). In general, for *M* sub-circuits, the nonlinear system can be approximated by

$$\underbrace{\left[\sum_{k=1}^{M} \mathcal{C}_{k}(x)\right]}_{\mathcal{C}(x)} \cdot \frac{dx}{dt} + \underbrace{\left[\sum_{k=1}^{M} \mathcal{G}_{k}(x)\right]}_{\mathcal{G}(x)} \cdot x + \underbrace{\left[\sum_{k=1}^{M} \mathcal{I}_{k}(x)\right]}_{\mathcal{I}(x)} = Bu(t).$$
(4.7)

where  $C_k(x)$ ,  $G_k(x)$  and  $I_k(x)$  are the piecewise linear models for the subsystems,

$$C_k(x) = \sum_{i=1}^{N_k} w_i^k (P_k \cdot x) C_i^k,$$
  

$$G_k(x) = \sum_{i=1}^{N_k} w_i^k (P_k \cdot x) G_i^k,$$
  

$$I_k(x) = \sum_{i=1}^{N_k} w_i^k (P_k \cdot x) I_i^k.$$

#### 4.3 Model order reduction

The projection matrix *V* calculated in the conventional TPWL-MOR is actually an "average" of the Krylov bases at the expansion points, as summarized in Algorithm 2.1. The Krylov bases  $[V_1, V_2]$  produced in Step 3 and 4 guarantee the moment matching of the reduced system [3] at the corresponding expansion point, and the truncated SVD in Step 6 and 7 can be interpreted as finding an "average" of these Krylov bases.

The similar idea can be applied in subspace TPWL-MOR. Since the nonlinear system can be approximated by the linearized system at the projected trajectory points as in (4.7), it is reasonable to use those trajectory points that have been used in the subspace TPWL approximation to calculate the "samples" of Krylov bases. In general, the total number of expansion points for subspace TPWL approximation is from several tens to a few hundreds, which is sufficient to produce enough "samples" to calculate the "average" projection matrix *V* for the reduced system. The reduced order model of (4.7) can then be expressed as

$$\underbrace{\left[\sum_{k=1}^{M} \tilde{\mathcal{C}}_{k}(\tilde{x})\right]}_{\tilde{\mathcal{C}}(\tilde{x})} \cdot \frac{dx}{dt} + \underbrace{\left[\sum_{k=1}^{M} \tilde{\mathcal{G}}_{k}(\tilde{x})\right]}_{\tilde{\mathcal{G}}(\tilde{x})} \cdot x + \underbrace{\left[\sum_{k=1}^{M} \tilde{\mathcal{I}}_{k}(\tilde{x})\right]}_{\tilde{\mathcal{I}}(\tilde{x})} = \tilde{\mathcal{B}}u(t), \tag{4.8}$$

648



Figure 4: Matrix patterns of Jacobian matrices.

where

$$\begin{split} \tilde{\mathcal{C}}_k(\tilde{x}) &= \sum_{i=1}^{N_k} w_i^k (P_k V \tilde{x}) V^T C_i^k V, \\ \tilde{\mathcal{G}}_k(\tilde{x}) &= \sum_{i=1}^{N_k} w_i^k (P_k V \tilde{x}) V^T G_i^k V, \\ \tilde{I}_k(\tilde{x}) &= \sum_{i=1}^{N_k} w_i^k (P_k V \tilde{x}) V^T I_i^k, \end{split}$$

and  $\tilde{B} = V^T B$ . The weighting function for the reduced system is defined accordingly,

$$w_i^k(P_k V \tilde{x}) = e^{-\beta ||P_k V \tilde{x} - \hat{x}_i^k||/m}.$$
(4.9)

For the calculation of reduced-order system, the coefficient matrices  $\tilde{C}_k(\tilde{x})$ ,  $\tilde{\mathcal{G}}_k(\tilde{x})$  and  $\tilde{\mathcal{I}}_k(\tilde{x})$  can be evaluated very efficiently since the Jacobian matrices  $C_i^k$  and  $G_i^k$  are all sparse. Take the circuit partition in Fig. 3 for example, the matrix pattern of Jacobian matrices as shown in Fig. 4 can help reducing the cost of the calculation of reduced order model.

#### 4.4 Circuit partition for subspace TPWL-MOR

In general, the circuit structure can be used to define the circuit partition for subspace TPWL-MOR. Take a XOR circuit for example, its four sub-circuits are shown in Fig. 5. (Node  $d_1$  in Sub-Circuit 1 is connected to node  $d_1$  in Sub-Circuit 3, node  $d_3$  in Sub-Circuit 2 is connected to node  $d_3$  in Sub-Circuit 3, etc..) Each sub-circuit has its own functionality, that is, Sub-Circuit 1 calculates the inversion of A, Sub-Circuit 2 calculates the  $\overline{B}$  and B, Sub-Circuit 3 evaluates  $\overline{A \otimes B}$ , Sub-Circuit 4 inverts the output of Sub-Circuit 3 and gets  $A \otimes B$ . Based on this circuit partition, the subspace TPWL-MOR can reduce the number of expansion points and greatly improve the accuracy as will be shown in Section 7.

Actually, the circuit partition for subspace TPWL-MOR can be carried out automatically. One observation from Fig. 2 is that, if two state variables  $x_i$  and  $x_j$  are uncorrelated, assigning the two corresponding circuit nodes to two different sub-circuit will help reducing the number of expansion points in TPWL-MOR. One the other hand, if the two circuit nodes are fully correlated, the state trajectory in Fig. 2 will become a straight line. Subspace TPWL-MOR that separates the two correlated circuit nodes will not reduce the number of expansion points. Based on this observation, we propose a "two-step" procedure for finding the proper circuit partition for subspace TPWL-MOR.

**1)** Given the training inputs of the circuit, the correlation of two circuit nodes can be evaluated by their simulated waveforms  $x_i(t)$ ,  $x_i(t)$ .

$$c_{ij} = \frac{\left|\int x_i(t)x_j(t)dt - \int x_i(t)dt \cdot \int x_j(t)dt\right|}{\sqrt{\left[\int x_i^2(t)dt - \left(\int x_i(t)dt\right)^2\right] \left[\int x_j^2(t)dt - \left(\int x_j(t)dt\right)^2\right]}},$$
(4.10)

which actually defines the similarity for each pair of the nodes.

The circuit nodes can first be categorized by the spectral clustering approach [31] according to their correlation function  $c_{ij}$ . Take the XOR circuit for example, the circuit nodes will be categorized into three groups, that is  $D_1 = \{d_0, d_1\}$ ,  $D_2 = \{d_2, d_3, d_4\}$  and  $D_3 = \{d_5, d_6\}$ .

**2)** Based on the results of Step 1), we can assign a tag to each circuit components representing its "dimensionality". For the example of Fig. 5, the tag of MOSFET  $M_0$  is  $\{D_1\}$  and the tag of MOSFET  $M_6$  is  $\{D_1, D_2, D_3\}$ , etc.. The circuit partition can then be defined by grouping the circuit components with the same tags into the same sub-circuit.



Figure 5: Circuit partition in XOR circuit example.

It is very interesting that the resulting circuit partition of XOR circuit based on this "two-step" procedure is actually exactly the same as Fig. 5.

## 5 Circuit-level subspace TPWL-MOR

An alternative approach to develop the subspace TPWL-MOR is to build the reducedorder macromodel for each sub-circuit, assuming that the input of each sub-circuit can be predetermined by the "training inputs". The advantage is that the hierarchical/block structure of the nonlinear circuits can be preserved, and furthermore, the sub-circuits with the same structure can share one macromodel which will further reduce the complexity of the TPWL model.

### 5.1 Port-preserving MOR for sub-circuits

- - -

One of the major problems of conventional MOR approaches is that the circuit structure like the external ports of the sub-circuits is not preserved in the reduced system. Once the macromodels of the sub-circuits are obtained, it is difficult to combine them together to generate the reduced-order model of the top-level circuit. A port-preserving MOR approach for linear circuits is developed in [30]. The similar idea is applied here to derive the reduced-order macromodels of the nonlinear sub-circuits.

As shown in Fig. 3, each sub-circuit is an individual circuit, of which the external ports include some ports of the top-level circuit and some circuit nodes in the "connection". The system equation of each sub-circuit is similar to (2.1), i.e.

$$\frac{d(g^k(x^k(t)))}{dt} + f^k(x^k(t)) = B^k u^k(t),$$
(5.1)

where  $x^k(t) \in \mathbb{R}^{n_k}$  is the vector of the state variables of the sub-circuit, and  $g^k$ ,  $f^k: \mathbb{R}^{n_k} \to \mathbb{R}^{n_k}$  are the nonlinear vector-valued functions of the sub-circuit. When two sub-circuits are designed from the same module, they will have the same system equation (5.1).

The training inputs of the sub-circuit are represented by some current or voltage sources that connected to the external ports of the sub-circuit, and are formulated as  $B^k u^k(t)$  in the system equation (5.1). Let  $\{\hat{x}_j^k; j=1, \cdots, N'_k\}$  denote some expansion points at the simulated state trajectory. The TPWL approximation of the sub-circuit can be formulated as

$$\sum_{i=1}^{N'_k} w_j^k(x^k) (C_j^k \frac{dx^k}{dt} + G_j^k x^k + I_j^k) = B^k u^k(t),$$
(5.2)

where  $C_j^k$  (or  $G_j^k$ ) are the Jacobian matrices of  $g^k$  (or  $f^k$ ) evaluated at the expansion point  $\hat{x}_j^k$ ,  $I_j^k = f^k(\hat{x}_j^k) - G_j^k \hat{x}_j^k$ , and  $w_j^k(x^k)$  is the weighting function.

A projection matrix  $V^k \in \mathbb{R}^{n_k \times q_k}$  can be calculated for system (5.2) based on the conventional TPWL-MOR as described in Algorithm 2.1. (Here  $q_k$  represents the reduced order.) Suppose  $p_k$  is the number of the ports of sub-circuit that need to be preserved after the reduction. For simplicity, we assume that the first  $p_k$  rows of the circuit equation (5.1) represent the Kirchhoff Current Law (KCL) equations formulated at the  $p_k$  ports of the sub-circuit. The projection matrix  $V^k$  can then be split into two parts accordingly.

$$V^k = \left[ \begin{array}{c} V_1^k \\ V_2^k \end{array} \right],$$

where  $V_1^k \in \mathbb{R}^{p_k \times q_k}$  and  $V_2^k \in \mathbb{R}^{(n_k - p_k) \times q_k}$ .

In order to preserve the ports in the reduced-order model of sub-circuit, a new projection matrix  $W^k$  is defined as

$$W^k = \left[ \begin{array}{cc} I_{p_k \times p_k} & 0 \\ 0 & V_2^k \end{array} \right],$$

where  $I_{p_k \times p_k}$  represents a  $p_k \times p_k$  identity matrix. The state vector  $x^k$  can also be divided into two parts as  $x^k = \begin{bmatrix} x_1^k & x_2^k \end{bmatrix}^T$ , where  $x_1^k$  represents the nodal voltages at the subcircuit ports, and  $x_2^k$  represents the nodal voltages of internal nodes. The new state vector  $\tilde{x}^k$  of the reduced-order model of the sub-circuit, which can be viewed as projecting  $x^k$ onto the subspace  $W^k$ , is expressed as

$$\tilde{x}^{k} = W^{k^{T}} x^{k} = \begin{bmatrix} I_{p_{k} \times p_{k}} & 0\\ 0 & V_{2}^{k^{T}} \end{bmatrix} \begin{bmatrix} x_{1}^{k}\\ x_{2}^{k} \end{bmatrix} = \begin{bmatrix} \tilde{x}_{1}^{k}\\ \tilde{x}_{2}^{k} \end{bmatrix},$$
(5.3)

where  $\tilde{x}_2^k = V_2^{k^T} x_2^k$ . From Eq. (5.3), we can see that by using the projection matrix *W*, the ports of the sub-circuit is preserved in the reduced-order model, while the internal nodes of the sub-circuit is reduced.

The reduced-order model of the sub-circuit can then be formulated as

$$\sum_{j=1}^{N'_{k}} \tilde{w}_{j}^{k}(\tilde{x}^{k}) (\tilde{C}_{j}^{k} \frac{d(\tilde{x}^{k})}{dt} + \tilde{G}_{j}^{k} \tilde{x}^{k} + \tilde{I}_{j}^{k}) = \tilde{B}^{k} u^{k}(t),$$
(5.4)

where  $\tilde{w}_{j}^{k}(\tilde{x}^{k}) = w_{j}^{k}(W^{k}\cdot\tilde{x}^{k})$  is the weighting function of the reduced system,  $\tilde{C}_{j}^{k} = W^{k^{T}}C_{j}^{k}W^{k}$ ,  $\tilde{G}_{j}^{k} = W^{k^{T}}G_{j}^{k}W^{k}$ ,  $\tilde{I}_{j}^{k} = W^{k^{T}}I_{j}^{k}$  and  $\tilde{B}_{k} = W^{k^{T}}B_{k}$ .

Since the ports of the sub-circuits are preserved in the reduced-order models, the reduced-order model for the top-level circuit can then be derived by directly combining the reduced-order models of the sub-circuits, as will be described in the next subsection.

### 5.2 Reduced-order model for the top-level circuit

The macromodels of sub-circuits can be viewed as physically connected in the reducedorder model of the top-level circuits. Based on the "stamping" procedure of Modified Nodal Analysis, the reduced-order model of the top-level circuit can be generated by "stamping" the reduced-order macromodels of the sub-circuits into its system equation.

Let *q* denote the number of nodes in the reduced-order model of the top-level circuit, and  $\tilde{x}$  is the state vector. Note that the state variables of the macromodel of the sub-circuit in (5.4) are only a subset of the top-level circuit. The first step of "stamping" procedure is to augment the reduced-order macromodels in (5.4) to *q* dimensional systems. The

652

corresponding q dimensional augmented system can be expressed as

$$\sum_{j=1}^{N'_k} \tilde{v}_j^k(\tilde{x}) (\tilde{\mathcal{C}}_j^k \frac{d\tilde{x}}{dt} + \tilde{\mathcal{G}}_j^k \tilde{x} + \tilde{\mathcal{I}}_j^k) = \tilde{\mathcal{B}}^k u(t).$$
(5.5)

The augmented matrices  $\tilde{C}_j^k$ ,  $\tilde{\mathcal{G}}_j^k$ , and the augmented vector  $\tilde{\mathcal{I}}_j^K$  are calculated from  $\tilde{C}_j^k$ ,  $\tilde{G}_j^k$  and  $\tilde{I}_j^k$  as

$$\tilde{\mathcal{C}}_{j}^{k}(\mathbf{ind}_{k},\mathbf{ind}_{k}) = \tilde{\mathcal{C}}_{j}^{k}, \qquad (5.6a)$$

$$\tilde{\mathcal{G}}_{j}^{k}(\mathbf{ind}_{k},\mathbf{ind}_{k}) = \tilde{C}_{j}^{k}, \qquad (5.6b)$$

$$\tilde{\mathcal{I}}_{j}^{k}(\mathbf{ind}_{k}) = \tilde{I}_{j}^{k}, \qquad (5.6c)$$

where  $\operatorname{ind}_k$  represents the indices of the nodes of the sub-circuit in the vector of state variables  $\tilde{x}$ . The augmented matrix  $\tilde{\mathcal{B}}^k$  is calculated similarly. The new weighting function  $\tilde{v}_j^k(\tilde{x}) = w_j^k(P_k \cdot W^k \tilde{x})$  is calculated from the weighting function  $\tilde{w}_j^k(\tilde{x}^k)$  in (5.4) based on the "projection"  $P_k$ .

The reduced-order model for the top-level circuit is finally obtained by combining all the augmented systems for the sub-circuits together.

$$\underbrace{\left[\sum_{k=1}^{M}\sum_{j=1}^{N'_{k}} \tilde{v}_{j}^{k}(\tilde{x})\tilde{\mathcal{C}}_{j}^{k}\right]}_{\tilde{\mathcal{C}}'(\tilde{x})} \cdot \frac{d\tilde{x}}{dt} + \underbrace{\left[\sum_{k=1}^{M}\sum_{j=1}^{N'_{k}} \tilde{v}_{j}^{k}(\tilde{x})\tilde{\mathcal{G}}_{j}^{k}\right]}_{\tilde{\mathcal{G}}'(\tilde{x})} \cdot \tilde{x} + \underbrace{\left[\sum_{k=1}^{M}\sum_{j=1}^{N'_{k}} \tilde{v}_{j}^{k}\tilde{\mathcal{I}}_{j}^{k}\right]}_{\tilde{\mathcal{I}}'(\tilde{x})} = \underbrace{\left[\sum_{k=1}^{M}\tilde{\mathcal{B}}^{k}\right]}_{\tilde{\mathcal{B}}'} \cdot u(t).$$
(5.7)

Instead of applying the model order reduction to the top-level circuit, the reducedorder model of the original nonlinear circuit is constructed by combining the macromodels of sub-circuits. For module-based designs, the model size can be further reduced taking advantage of the multiple instantiations of the same module in the circuit. Furthermore, from Eq. (5.6) we can find that, the augmented matrices  $\tilde{C}_j^k$  and  $\tilde{G}_j^k$  in the circuit-level subspace TPWL-MOR are not dense but significantly sparse. Therefore, the reduced-order model of the circuit tends to be more sparse than the conventional TPWL model, and the simulation of the reduced-order model will be more efficient.

### 6 Remarks

One common assumption for TPWL-MOR approaches is that the simulated state trajectory has to cover most of the reachable state space. This depends on the selection of the training inputs. Generally, they can be defined according to the current or voltage sources that are connected to the input nodes of the circuits, or some pre-knowledge of the circuits. Given the training inputs, the simulation of state trajectory may be computationally costly. A fast approximate approach was proposed in [6] that is able to reduce the simulation time for the state trajectory. As will be demonstrated by the example of asynchronous counter in Section 7, the circuit partition in circuit-level subspace TPWL-MOR can also help reducing the computational cost for state trajectory simulation. Nevertheless, the state trajectory is calculated only once in TPWL-MOR and the resulting reduced model can then be used in the circuit design for analyzing the circuit performance.

An additional premise for subspace TPWL-MOR is that the nonlinearity of the system can be efficiently separated, which actually requires the nonlinearity of each sub-circuit to be dependent locally on the state variables. Nevertheless, this is generally true for nonlinear circuits and many other nonlinear systems like bio-chemical systems, etc.. It is also worth noting that, in circuit-level subspace TPWL-MOR, the number of ports of each module is required to be smaller than the number of inner nodes ( $p_k \ll n_k$ ). This is because the ports will be preserved in the reduced model and too many ports will comprise the efficiency of model order reduction. For system-level TPWL-MOR, this requirement can be relaxed to that  $\frac{M \times n_{kmax}}{n}$  is bounded, where *M* is the number of subcircuits, *n* is the number of state variables in the top-level circuit and  $n_{kmax} = \max\{n_k\}$  is the largest number of state variables in sub-circuits (cf. Appendix).

The two subspace TPWL-MOR approaches can actually be incorporated to solve the model order reduction problem of large scale nonlinear circuits. In this case, the circuit-level subspace TPWL-MOR is applied as the top level MOR approach, while the system-level subspace TPWL-MOR combined with port-preserving technique is applied to the MOR of each sub-circuit module. Since the external ports of sub-circuits are preserved in the circuit-level TPWL-MOR, one important feature of the proposed subspace TPWL-MOR approaches is that it can also preserve the hierarchical/block structure of the non-linear circuits, which is quite important for nonlinear circuit design.

## 7 Numerical results

Numerical results will be illustrated in this section to demonstrate the efficiency and accuracy of the proposed subspace TPWL-MOR approaches. All the experiments are carried out on a UNIX workstation with 3.0-GHz CPU and 4-GB RAM.

### 7.1 Convergence of subspace TPWL approximation

The convergence behavior of subspace TPWL approximation is first illustrated using the simple example of two-node circuit in Fig. 1(a). In this test case, the training input I(t) for the two TPWL approximations is a damping sinusoidal function  $20\sin(t)\exp(-0.05t)$ , which is also used as the testing input to estimate the errors of two TPWL approximations.

The standard deviation errors of the voltage waveform at node 2 in the conventional and the subspace TPWL approximations are compared in Fig. 6. With the number of expansion points increases, the subspace TPWL approximation converges much faster



Figure 6: The standard deviation errors for the two-nodes circuit w.r.t. points number.

than the conventional TPWL approximation. The convergence rate is about  $\mathcal{O}(N^{-2.1})$  for the subspace TPWL approximation compared with  $\mathcal{O}(N^{-1.8})$  for the conventional TPWL approximation in this test case.

### 7.2 Efficiency and accuracy of subspace TPWL-MOR

The efficiency and accuracy of the subspace TPWL-MOR approach are further validated through three test cases that come from real designs, including a XOR circuit, a nonlinear transmission line and an asynchronous counter circuit. The system-level TPWL-MOR approach is applied in the first two test cases, while the last one is an example of module-based design and the circuit-level TPWL-MOR approach is applied.

#### 7.2.1 XOR

The first test case is a XOR gate as shown in Fig. 5. In this experiment, a sinusoidal signal and a pulse signal are applied to XOR gate's two inputs as the training inputs. The expansion points are extracted from the state trajectories in the full state space for the conventional TPWL-MOR and the sub-state space for the proposed system-level sub-space TPWL-MOR. To compare the accuracy of the two methods, a sinusoidal signal and a pulse signal with different frequency and pulse width are used as testing inputs to the nonlinear circuit.

The results in Fig. 7(a) have shown that, even with all 499 trajectory points from the training trajectory, the conventional TPWL approach (before model order reduction) fails to match the result of SPICE simulation. The functionality of the conventional TPWL model is totally incorrect. On the other hand, the accuracy can be greatly improved by the subspace TPWL-MOR approach. The nonlinear circuit can be reduced from 35 nodes (including internal nodes of the MOSFETs) to 18 nodes by subspace TPWL-MOR, and the simulation result fits SPICE simulation very well as shown in Fig. 7(b). In this case, the



Figure 7: XOR: time domain responses of the TPWL models compared with SPICE simulation.

number of expansion points that are used in subspace TPWL model is 90+23+26+38=177 for the four sub-circuits.

#### 7.2.2 Nonlinear transmission Line

The second test case is a nonlinear transmission line as shown in Fig. 8, which is composed of several repeated units of resistor, capacitor and diode. All the values of resistors and capacitors are set to 1, and the constitutive equation of diode is  $i_d(v) = exp(40v) - 1$ . In this test case, the number of units is equal to 400, and the whole circuit is divided into 5 sub-circuits each containing 80 units.



Figure 8: Schematic of the nonlinear transmission line.

A square wave current with duty ratio 0.25 is used as the training input for the subspace and the conventional TPWL-MOR approaches, while a square wave current with duty ratio 0.75 is used as the testing input. We have compared the number of expansion points that is needed to achieve the general accuracy for both TPWL models. The conventional TPWL-MOR method uses totally 731 expansion points, while the systemlevel subspace TPWL-MOR method uses 33+27+23+13+13=109 for the five sub-circuits, respectively. Totally, the subspace TPWL model uses only about 1/7 expansion points compared with the conventional TPWL model.

| test cases                  | conventional TPWL model | subspace TPWL model |
|-----------------------------|-------------------------|---------------------|
| Nonlinear Transmission Line | 594KB                   | 127KB               |
| Asynchronous Counter        | 21.5MB                  | 319KB               |

Table 1: Memory cost (model size) of reduced TPWL models.

For model order reduction, the circuit is reduced from 402 nodes to 10 nodes in both the subspace and the conventional TPWL-MOR. From Table 1, the model size of subspace TPWL model is 127KB compared with 594KB of the conventional TPWL model. The time domain responses of two TPWL models are compared with SPICE simulation in Fig. 9(a). More careful comparison of the accuracy of two TPWL models are shown by their logarithm errors in Fig. 9(b). The results in Fig. 9 show that even with 1/7 expansion points, the subspace TPWL-MOR can be more accurate than the conventional TPWL-MOR approach. This is mainly because of the larger effective region of the subspace TPWL model. The simulation time in this test case is 8.0 seconds for the subspace TPWL model compared with 23.3 seconds for the conventional TPWL model.



Figure 9: Nonlinear transmission line: time domain responses of TPWL models compared with SPICE simulation.

### 7.2.3 Asynchronous counter

The last test case is a module-based design of the asynchronous counter as shown in Fig. 10(a). The circuit is composed of four D-Flip-Flops (DFF), of which the inputs are clock signals. In the circuit-level subspace TPWL-MOR approach, each DFF is considered as a nonlinear sub-circuit as shown in Fig. 10(b) and shares the same macromodel. A square wave clock signal is used as the training input of the full circuit in Fig. 10(a) for the conventional TPWL-MOR approach, and the sub-circuit of DFF in Fig. 10(b) for the subspace TPWL-MOR approach.

In our experiment, the conventional TPWL-MOR uses totally 296 expansion points while the circuit-level subspace TPWL-MOR can use only 31 expansion points for model-



Figure 10: Asynchronous counter test case.

ing the DFF sub-circuit. Furthermore, unlike the coefficient matrices of the reduced-order model in conventional TPWL-MOR which are all full matrices, the coefficient matrices in reduced circuit-level subspace TWPL model are pretty sparse. The matrix pattern of the coefficient matrices (*C* and *G*) in this test case is shown in Fig. 11. As a result, the size of the reduced-order model from the subspace TPWL-MOR can be orders of magnitude smaller than that of the conventional TPWL-MOR as listed in Table 1.



Figure 11: Circuit matrices in circuit-level subspace TPWL-MOR are sparse.

For the model order reduction, the nonlinear circuit is reduced from 84 nodes to 68 nodes by the conventional TPWL-MOR approach, and can be reduced to 64 nodes by subspace TPWL-MOR approach. The results of the circuit response of the testing input are shown in Fig. 12(a), from which we can see that the subspace TPWL model is still slightly more accurate than the conventional TPWL model. For the simulation time of the subspace TPWL model, there would be some overheads due to the extra calculation that is needed to "stamp" the reduced macromodels into the system coefficient matrices in Fig. 11. The simulation time for the circuit-level subspace TPWL model is 59.9 seconds compared with 85.2 seconds of the conventional TPWL model.

Furthermore, in TPWL-MOR approaches, the training input must be selected very careful in order to cover as much reachable state space as possible. For each DFF in Fig. 10(a), the circuit needs to experience both "falling" and "rising" during the training, that is two periods of clock signal input for a DFF. In general, for a chain of *M* DFFs,



Figure 12: Asynchronous counter: time domain responses of TPWL models compared with SPICE simulation.

the training input of 2<sup>*M*</sup> clock periods is needed to cover the state space, which increases exponentially w.r.t. the circuit size. In this test case, the conventional TPWL-MOR approaches needs a training input with at least 16 clock periods, otherwise the result of the resulting model will have totally incorrect functionality. On the other hand, the circuitlevel subspace TPWL-MOR can reduce the training time to only two clock periods, as shown in Fig. 12(b). Therefore, the modeling time of the nonlinear circuits can also be reduced by the circuit-level subspace TPWL-MOR approach. This becomes especially important when the nonlinear circuit consists of a series of sub-circuits and the last one is extremely difficult to be driven from the training inputs.

## 8 Conclusion

The system-level and the circuit-level subspace TPWL-MOR approaches are proposed in this paper for the model order reduction of nonlinear circuits. Both of these approaches are based on the idea of separating the nonlinear dependence on state variables in the system formulation, which can greatly reduce the number of expansion points as well as improve the accuracy. The system-level subspace TPWL-MOR approach can be applied to general nonlinear circuits. For module-based designs, the circuit-level subspace TPWL-MOR approach can be applied to further reduce the model size taking advantage of the multiple instantiations of the same module in the circuit.

The advantages of the proposed subspace TPWL-MOR approaches have been demonstrated in the paper. The model size is greatly reduced compared with the conventional TPWL-MOR method. In the module-based design of asynchronous counter, the model size of the subspace TPWL model is about two orders of magnitude smaller than that of the conventional TPWL model as shown in Table 1. This is also the main reason that the simulation of subspace TPWL model can be much faster than the conventional TPWL model. In addition to the reduction in model size, the accuracy of subspace TPWL model is also improved due to larger effective region in the state space. Last but not least, the proposed subspace TPWL-MOR can preserve the hierarchical/block structure of nonlinear circuits, which is very helpful in nonlinear circuit design.

## Acknowledgments

This research is supported in part by the National Basic Research Program of China under the grant 2011CB309701, the NSFC research projects 61106032, 61006030, 60976034, 61076033 and 61125401, the National Major Science and Technology special project 2011ZX01034-005-001-03 of China during the 12-th five-year plan period, the Doctoral Program Foundation of the Ministry of Education of China 200802460068, and the Program for Outstanding Academic Leader Shanghai.

## Appendix: A simple analysis of system-level TPWL-MOR

A simple analysis is presented here to show how the circuit partition affects the efficiency of the system-level subspace TPWL-MOR. Let M denotes the number of sub-circuits, nand  $n_k$  denote the number of state variables of the top-level circuit and the  $k^{th}$  sub-circuit, respectively. Suppose the number of expansion points that is needed for the conventional TPWL approach can be represented as  $N_T = (\eta m)^n$ , where  $m^n$  is the number of expansion points of the full tensor product method,  $\eta$  is the average coverage rate of the trajectory at each dimension, and  $\eta m$  actually represents the average number of expansion points of the conventional TPWL-MOR for each state variable.

Similarly, the number of expansion points for the  $k^{th}$  sub-circuit in the system-level subspace TPWL-MOR can be represented as  $(\eta_k m)^{n_k}$ . Therefore, the efficiency of the system-level subspace TPWL-MOR can be represented by the improving rate *r* 

$$r = \frac{(\eta m)^{n}}{\sum_{k=1}^{M} (\eta_{k} m)^{n_{k}}}.$$
 (A.1)

Suppose  $\frac{M \times n_{kmax}}{n}$  is less than some constant, let's say *B*. The lower bound of this improving rate can be formulated as

$$r \ge \frac{(\eta m)^{n}}{M(\eta_{kmax}m)^{n_{kmax}}} \ge \frac{1}{B} \cdot \frac{n_{kmax}(\eta m)^{n}}{n(\eta_{kmax}m)^{n_{kmax}}}$$
$$= \underbrace{\frac{1}{B} \cdot \frac{n_{kmax}}{n}(\eta m)^{n-n_{kmax}}}_{R_{1}(n_{kmax})} \times \underbrace{\left(\frac{\eta}{\eta_{kmax}}\right)^{n_{kmax}}}_{R_{2}(\eta_{kmax},n_{kmax})},$$
(A.2)

where  $\eta_{kmax} = \max{\{\eta_k\}}$  and  $n_{kmax} = \max{\{n_k\}}$ .

When  $\eta m \ll n$ , which is generally true,  $R1(n_{kmax})$  is a decreasing function in [1,n] and takes the maximum value at  $n_{kmax} = 1$ . As a consequence, smaller sub-circuit size is preferred to get a better improving rate. On the other hand, the average coverage rate of the trajectory in the sub-state space ( $\eta_k$ ) is generally larger than that in the full state space ( $\eta$ ). Actually, this is the reason that the subspace TPWL-MOR can be more accurate than the conventional TPWL-MOR. But as to the improving rate, the second term  $R_2(\eta_{kmax}, n_{kmax})$  presents a loss of the improving rate in circuit partition.

One extreme example is when two state variables  $x_1$  and  $x_2$  are fully correlated. In this two-dimensional state space, the trajectory is a straight line and  $\eta = 1/\sqrt{m}$ . But when projecting the trajectory to two directions of  $x_1$  and  $x_2$ , we could see that the coverage rate will increase to  $\eta_k = 1$ . These correlated state variables are grouped in the circuit partition algorithm proposed in Section 4.4, which makes sure that the value of  $\frac{\eta}{\eta_{kmax}}$  is not too small and the improving rate *r* is greater than 1.

So we can conclude from (A.2) that, when *B* is not a very large number, the systemlevel subspace TPWL-MOR can generally be more efficient ( $r \gg 1$ ) and accurate ( $\eta_k > \eta$ ) than the conventional TPWL-MOR.

### References

- L. Pillage and R. Rohrer, Asymptotic waveform evaluation for timing analysis, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 9, no. 4, pp. 352–366, April 1990.
- [2] P. Feldman and R. Freund, Efficient linear circuit analysis by Padé approximation via the Lanczos process, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 5, pp. 639–649, May 1995.
- [3] A. Odabasioglu, M. Celik, and L. Pileggi, PRIMA: Passive reduced-order interconnect macromodeling algorithm, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, no. 8, pp. 645–654, Aug. 1998.
- [4] R. Freund, SPRIM: Structure-preserving reduced-order interconnect macromodeling, in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, Nov. 2004, pp. 80–87.
- [5] J. R. Phillips, Projection frameworks for model reduction of weakly nonlinear systems, in Proceedings of IEEE/ACM Design Automation Conference, 2000, pp. 184–189.
- [6] M. Rewienski and J. White, A trajectory piecewise-linear approach to model order reduction and fast simulation of nonlinear circuits and micromachined devices, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 155–170, Feb. 2003.
- [7] C. Gu, Model order reduction of nonlinear dynamical systems, Dissertation, University of California, Berkeley, 2011.
- [8] D. Vasilyev, M. Rewienski, and J. White, A TBR-based trajectory piecewise-linear algorithm for generating accurate low-order models for nonlinear analog circuits and MEMS, in Proceedings of IEEE/ACM Design Automation Conference, 2003, pp. 490–495.
- [9] C. Gu and J. Roychowdhury, ManiMOR: Model reduction via projection onto nonlinear manifolds, with applications to analog circuits and biochemical systems, in Proceedings of the IEEE International Conference on Computer-Aided Design, pp. 85–92, Nov. 2008.

- [10] D. Vasilyev, M. Rewienski, and J. White, Perturbation analysis of TBR model reduction in application to trajectory-piecewise linear algorithm for MEMS structures, in NSTI Nanotechnology Conference, vol. 2, 2004, pp. 434–437.
- [11] Y. Wang, H. Song, K. Pant, H. Peabody, J. Ku, and C. D. Butler, A projection-based model order reduction simulation tool for spacecraft thermal analysis, in Thermal and Fluids Analysis Workshop, Aug. 2011.
- [12] D. Gratton and K. Willcox, Reduced-order, trajectory piecewise-linear models for nonlinear computational fluid dynamics, in 34th AIAA Fluid Dynamic Conference, June 2004.
- [13] M. A. Cardoso and L. J. Durlofsky, Linearized reduced-order models for subsurface flow simulation, Journal of Computational Physics, vol. 229, pp. 681–700, February 2010.
- [14] C. Long, L. Simonson, W. Liao, and L. He, Microarchitecture configurations and floorplanning co-optimization, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, pp. 830–841, July 2007.
- [15] C. Long, L. Simonson, W. Liao, and L. He, Floorplanning optimization with trajectory piecewise-linear model for pipelined interconnects, in Proceedings of IEEE/ACM Design Automation Conference, July 2004, pp. 640–645.
- [16] X. Lai and J. Roychowdhury, TP-PPV: Piecewise nonlinear, time-shifted oscillator macromodel extraction for fast, accurate PLL simulation, in IEEE/ACM International Conference on Computer-Aided Design, November 2006, pp. 269–274.
- [17] H. Yu, X. Liu, H. Wang, and S. X.-D. Tan, A fast analog mismatch analysis by an incremental and stochastic trajectory piecewise linear macromodel, in 2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC), January 2010, pp. 211–216.
- [18] N. Dong and J. Roychowdhury, General-purpose nonlinear model-order reduction using piecewise-polynomial representations, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 2, pp. 249–264, February 2008.
- [19] N. Dong and J. Roychowdhury, Piecewise polynomial nonlinear model reduction, in Proceedings of IEEE/ACM Design Automation Conference, Jun. 2003, pp. 484–489.
- [20] K. Zong, F. Yang, and X. Zeng, A wavelet-collocation-based trajectory piecewise-linear algorithm for time-domain model-order reduction of nonlinear circuits, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, pp. 2981–2990, November 2010.
- [21] X. Pan, F. Yang, X. Zeng, and Y. Su, An efficient transistor-level piecewise-linear macromodeling approach for model order reduction of nonlinear circuits, in Design, Automation and Test in Europe Conference, March 2010, pp. 1673–1676.
- [22] J. P. Amorocho and H. Fabbender, Time dependent weight functions for the trajectory piecewise-linear approach, in European Conference on Mathematics for Industry, vol. 2, July 2010, pp. 434–437.
- [23] B. N. Bond and L. Daniel, Stabilizing schemes for piecewise-linear reduced order models via projection and weighting functions, in Proceedings of the IEEE International Conference on Computer-Aided Design, pp. 860–867, 2007.
- [24] T. Bechtold, M. Striebel, K. Mohaghegh, and E. Maten, Nonlinear model order reduction in nanoelectronics: Combination of POD and TPWL, PAMM, vol. 8, pp. 10057–10060, 2008.
- [25] S. K. Tiwary and R. A. Rutenbar, Faster, parametric trajectory-based macromodel via localized linear reductions, in IEEE/ACM International Conference on Computer-Aided Design, November 2006, pp. 876–883.
- [26] C. Gu and J. Roychowdhury, Manifold construction and parameterization for nonlinear manifold-based model reduction, in Proceedings of the IEEE Asia and South-Pacific Design Automation Conference, pp. 205–210, 2010.

X. Pan et al. / Commun. Comput. Phys., 14 (2013), pp. 639-663

- [27] C. Gu, QLMOR: A projection-based nonlinear model order reduction approach using quadratic-linear representation of nonlinear systems, IEEE Transaction on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 9, pp. 1307–1320, Sep. 2011.
- [28] J. Foo and G. E. Karniadakis, Multi-element probabilistic collocation method in high dimensions, Journal of Computation Physics, vol. 229, pp. 1536–1557, 2010.
- [29] C.-W. Ho, A. E. Ruehli, and P. A. Brennan, The modified nodal approach to network analysis, IEEE Transactions on Circuits and Systems, vol. 22, pp. 504–509, June 1975.
- [30] F. Yang, X. Zeng, Y. Su, and D. Zhou, RLCSYN: RLC equivalent circuit synthesis for structure-preserved reduced-order model of interconnect, in IEEE International Symposium on Circuits and Systems, May 2007, pp. 2710–2713.
- [31] Y. Su, F. Yang, X. Zeng, AMOR: An efficient aggregating based model order reduction method for many-terminal interconnect circuits, ACM/IEEE Design Automation Conference, pp. 295-300, Jun. 2012.