Hierarchical Optimization of Large-Scale Analog/Mixed-Signal Circuits Based-on Pareto-Optimal Fronts

Jun Zou

The dissertation was submitted to the Technische Universität München on 06.02.2009 and accepted by the Faculty of Electrical Engineering and Information Technology on 07.06.2009.
路漫漫其修远兮，
吾将上下而求索。
屈原

My way ahead is a long, long one, oh,
I will seek my Beauty high and low.
Qu Yuan

This thesis is the result of my work as a research assistant at the Institute for Electronic Design Automation, Technische Universität München.

I would like to take this opportunity to express my sincere gratitude to the many individuals who have given me so much support during my three-year PhD study.

With utmost respect and gratitude, I wish to thank my advisor Dr. Helmut Gräb for his patience, valuable guidance and encouragement throughout the entire research. He never got tired of discussing my ideas and patiently proofread my publications.

I would also like to thank Professor Ulf Schlichtmann for giving me the chance to work at this institute. He fostered a creative atmosphere and a stimulating work environment at the institute that were essential for the successful completion of my research work.

A grateful word of thanks also to the committee member Professor Schmitt-Landsiedel for her interest in my work.

I am very grateful to my “analog” partner Daniel Müller. His collaboration and support contributed significantly to the successful completion of my research. Thanks also to Tobias Massier, who was generous with his time in helping me.

Thanks to Dr. Michael Pronath, Dr. Volker Glöckel, Dr. Bernd Obermeier and Dr. Frank Schenkel for their technical support on WiCkeD.

Thanks to Infineon Technologies AG and Qimonda AG for their financial support.

Finally, thanks to my parents Qinjuan Cao and Heqing Zou for their love and continuous support. In deepest appreciation, I dedicate my work and this dissertation to my wife Ying Zhang.

Munich, Jan. 2009
Jun Zou
Contents

1 Introduction
  1.1 Motivation
    1.1.1 Indispensable Analog Integrated Circuits
    1.1.2 Challenges in Design & Optimization of Analog Circuits
    1.1.3 Analog Bottleneck
  1.2 State of the Art
    1.2.1 Analog/Mixed Signal Design Flow
    1.2.2 Design Process on Analog Circuits
    1.2.3 Automatic Sizing Method on Analog Circuits
      1.2.3.1 Knowledge-Based Sizing Approaches
      1.2.3.2 Optimization-Based Sizing Approaches
    1.2.4 Optimization Methodology for Large-Scale Analog Circuits
      1.2.4.1 Flat Optimization Methodology
      1.2.4.2 Hierarchical Optimization Methodology
  1.3 Objectives of the Work

2 Automatic Design Methods on Analog Circuits
  2.1 Automatic Circuit Sizing
    2.1.1 Circuit Parameters
    2.1.2 Circuit Performances and Evaluation
    2.1.3 Circuit Specifications and Yield Estimation
    2.1.4 Automatic Sizing Process
      2.1.4.1 Sizing Rules
      2.1.4.2 Automatic Sizing Flow

      4.3.2.1 Phase Noise
      4.3.2.2 Jitter
      4.3.2.3 Extracting Jitter from Phase Noise
    4.3.3 Stability of PLLs
    4.3.4 Design Trade-offs
  4.4 Example: Hierarchical Optimization of a CPPLL
    4.4.1 CPPLL Hierarchical Modeling
      4.4.1.1 CPPLL on System Level
      4.4.1.2 CPPLL on Circuit Level
    4.4.2 Modeling CPPLL in Verilog-A
    4.4.3 Pareto-Optimal Fronts of Building Blocks
    4.4.4 Hierarchical Optimization
      4.4.4.1 File System in WiCkeD
      4.4.4.2 Hierarchical Optimization Results
  4.5 Pareto-Optimal Front Computation (POFC) of a Whole CPPLL
    4.5.1 POFC of the CP Block
    4.5.2 POFC of the VCO Block
    4.5.3 POFC of the CPPLL System
  4.6 Summary

5 Hierarchical Optimization of Switched-Capacitor Sigma-Delta Modulators
  5.1 $\Sigma\Delta$ Oversampling A/D Converters
  5.2 Second-Order Switched-Capacitor Sigma-Delta Modulators
    5.2.1 Building Blocks of a Second-Order SC $\Sigma\Delta$ Modulator
      5.2.1.1 Switched-Capacitor Integrators
      5.2.1.2 Comparator
      5.2.1.3 1-bit D/A Converter
  5.3 Analysis on $\Sigma\Delta$ Modulator in z-Domain
    5.3.1 Effects of Non-idealities
      5.3.1.1 Clock Jitter
      5.3.1.2 Noise Sources

Chapter 1

Introduction

Integrated circuits (ICs) are nowadays found almost everywhere in our lives. We are in constant interaction with various kinds of IC products, from the cell phone in our pocket or the digital TV at home to GPS satellites orbiting in outer space. In the last few decades, the IC industry has grown with astounding speed. According to a report from the Global Semiconductor Alliance (GSA), semiconductor industry revenue totaled $267.5 billion in 2007, and the rapid growth is expected to continue. The IC industry is extremely dynamic, with rapid development in technology: today it is moving towards sub-45nm geometries and beyond, until physical limits are reached. The pace of technological change is well characterized by Moore’s law, which states “that the number of transistors per chip will double every 18 months” [Moo]. This dynamic development has also brought growing challenges, e.g. increased circuit complexity, high design cost and short time to market (TTM).

With advances in deep sub-micron technology, it has become feasible to integrate hundreds of millions of transistors operating concurrently on a single monolithic substrate. Furthermore, various functionalities tend to be monolithically integrated on one chip, which is usually called a system on a chip (SoC). A particular circuit is categorized as either digital or analog, depending on its intended application. Examples of digital circuits are digital signal processing (DSP) units, memory circuits and micro-controllers with embedded software, while examples of analog circuits are low-noise amplifiers, low-pass, band-pass and high-pass filters, and phase-locked loops. Today, most SoCs contain both digital and analog circuits, integrated together in a mixed-signal chip [KHC+01].

The design of digital circuits and the design of analog circuits are two different arts. Digital circuits are comparatively insensitive to process variations and operating conditions. They consequently offer more robust behavior than their analog counterparts, although often at the cost of more power, more area, lower speed or other drawbacks. Digital circuit design abstracts from the physical details of the actual circuit implementation. A digital design is a top-down process: it starts from the definition of the circuit's logic function by means of a behavioral description in a hardware description language (HDL), continues with automatic synthesis to gate level, and ends with the physical layout. Many mature computer-aided design (CAD) tools are provided by electronic design automation (EDA) vendors [Cada,Syn,Men] for digital circuit design. Whereas digital circuits work with discrete-time and discrete-value signals, analog circuits deal with continuous-time and continuous-value signals. This makes it more difficult to abstract the structural characteristics of analog circuits from their physical realization, and hence increases analog design complexity. Moreover, the performances of analog circuits are more sensitive to variations during fabrication and operation than those of digital circuits. Apart from the well-established SPICE-like simulators (e.g. Eldo [Eld], Saber [Sab] and Spectre [Spe]), very few CAD tools are available for analog design. Up to now, automatic analog synthesis and layout tools are still absent from the market. In consequence, analog design is still a full-custom, multi-iteration, knowledge-intensive task that requires a large portfolio of skills [AN96].

According to data from leading semiconductor manufacturers, the analog circuits in an SoC are estimated to account for just 2% of the total transistors, yet they account for 20% of the area, 40% of the design effort and 50% of the re-spins. Hence, any critical analog circuit tends to be a bottleneck in design, implementation, verification, and migration to manufacturing for the overall SoC design, as seen in Fig. 1.1 [Cad02].

![Figure 1.1: Digital versus analog design in SoC [Cad02]](image)

IC capacity has grown by 58% per year, while design productivity has increased by only 21% annually, which results in an ever-widening design productivity gap [Ass]. An efficient way to close the gap is to use more advanced CAD tools not only for digital design but also for analog design. Recently, some automatic sizing tools for analog circuits have been introduced in industry, such as WiCkeD [Wic] or Neolinear [Neo]. However, these tools can handle only small analog circuits, e.g. operational amplifiers (OP AMPs). Meanwhile, analog circuits are growing in scale, and digital circuits are often mixed into the analog environment. This kind of circuit is called a large-scale analog/mixed-signal circuit. An efficient and fast design flow is usually the key requirement for commercial CAD tools. This thesis addresses well-established as well as upcoming design methods for analog circuits. An efficient design flow is proposed in order to realize a hierarchical optimization process for large-scale analog/mixed-signal circuits.
1.1 Motivation

1.1.1 Indispensable Analog Integrated Circuits

Although analog functionality is gradually being replaced by digital computation, e.g. DSP in place of analog filtering, there are still some typical functions that will always remain analog [GR00]. Take the transmitter and the receiver in a wireless communication system as an example.

Wireless communication is principally based on the propagation of analog signals in the real world. Digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are the bridges between the real world and the digital domain, as shown in Fig. 1.2 [Raz97]. In the transmitter path, the digital signal from the baseband processor is first converted to an analog signal by the DAC block. Subsequently, the analog signal is up-converted to a predefined high carrier frequency\(^*\), which is generated by a phase-locked loop (PLL). The analog signal is then amplified by a power amplifier (PA) so that it can drive an external antenna without too much distortion. In the receiver path, a low-noise amplifier (LNA) amplifies the weak received analog signal while adding as little noise as possible. Another PLL provides a local-oscillator signal, which is mixed with the amplified signal to convert it down to the baseband frequency. This down-converted analog signal is passed to the baseband processor through the subsequent ADC block. Moreover, both analog and digital circuits require stable biases (supply voltages/currents) for their operation, which are provided by analog circuits, e.g. bias generators and charge pumps.

\(^*\) For example, in the UMTS wireless system the uplink frequency band is 1920-1980MHz and the downlink frequency band is 2110-2170MHz, while in the GSM system the uplink frequency band is 890-915MHz and the downlink frequency band is 935-960MHz [Mis04].

![Figure 1.2: Analog circuits in wireless communication system](image-url)
1.1.2 Challenges in Design & Optimization of Analog Circuits

"Analog design is a process of choosing the correct subset of parameters to optimize, a choice that’s highly dependent on the sophisticated knowledge and long years of experience of the analog circuit designer." [Wil]. Compared to the digital counterpart, analog design has always been a more involved process. In principle, the distinct challenges in analog domain stem from the following unique aspects of analog circuits.

The progress of IC technology manifests itself mainly in shrinking device sizes and reduced supply voltages. On the one hand, analog circuits benefit from the reduction of device sizes just as digital circuits do: the circuits become smaller, faster and more power-efficient. On the other hand, analog circuits suffer from scaled-down devices, reduced supply voltages, electronic noise and other factors. The smaller the devices are, the larger their mismatch is. As the supply voltage goes down, analog designers face more difficulties due to reduced voltage headroom. For instance, a standard cascode current mirror has to operate within a certain voltage range to provide the desired properties. Under a low supply voltage, the precision of the current mirror may be degraded due to insufficient voltage headroom. Moreover, parasitic effects (e.g. gate/wire capacitance, crosstalk, etc.) become more significant as device dimensions shrink. Analog designers have to take these effects into account in advance during the schematic design phase, although some of them can be quantified only after physical layout. In the worst case, unanticipated parasitic effects can result in undesired behavior, e.g. latch-up or large leakage currents.

As the transistor length has decreased from 10\(\mu\)m in the 1970s to 45nm today, the impact of process variation on analog performance has become more significant. Today, analog designers have to evaluate circuit performances at all process corners instead of at the nominal corner only. The variations involve not only global/local process parameters but also operating conditions (supply voltage, temperature). Process corner analysis and Monte-Carlo analysis are used to verify the circuit performance. Hence, many more simulations are needed for analog circuit verification than for digital circuit verification.
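
The two verification styles named above can be sketched as follows. This is a minimal illustration only: `toy_gain_db` is a hypothetical stand-in for a circuit simulation, and the parameter names, corner values and spreads are assumptions made for the example, not data from this work. Corner analysis evaluates every combination of extreme values, while Monte-Carlo analysis samples the parameters randomly and estimates the fraction of samples that meets the specification.

```python
import itertools
import random

def toy_gain_db(vdd, temp_c, vth_shift):
    """Hypothetical stand-in for a circuit simulation (not a real device model)."""
    return 60.0 + 5.0 * (vdd - 1.2) - 0.02 * (temp_c - 27.0) - 40.0 * vth_shift

# Assumed corner values for supply voltage, temperature and threshold shift.
CORNERS = {
    "vdd": [1.08, 1.2, 1.32],
    "temp_c": [-40.0, 27.0, 125.0],
    "vth_shift": [-0.05, 0.0, 0.05],
}

def corner_analysis(spec_min_db):
    """Evaluate all corner combinations; return worst gain, pass flag, count."""
    names = list(CORNERS)
    results = [
        toy_gain_db(**dict(zip(names, combo)))
        for combo in itertools.product(*(CORNERS[n] for n in names))
    ]
    worst = min(results)
    return worst, worst >= spec_min_db, len(results)

def monte_carlo_yield(spec_min_db, n_samples=2000, seed=0):
    """Estimate the fraction of random samples that meets the gain spec."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_samples):
        gain = toy_gain_db(
            vdd=rng.gauss(1.2, 0.04),
            temp_c=rng.uniform(-40.0, 125.0),
            vth_shift=rng.gauss(0.0, 0.02),
        )
        passed += gain >= spec_min_db
    return passed / n_samples
```

Even in this toy setting, three values for each of three parameters already require 27 corner evaluations, and the Monte-Carlo estimate requires thousands, which illustrates why analog verification is simulation-intensive.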

For analog and digital circuits alike, there are always conflicting relationships among performances. Power, speed and area are the typical performances of a digital circuit. Besides these three, analog performances take many more forms: e.g. DC gain, gain bandwidth, phase margin and supply/substrate noise rejection in the frequency domain, and slew rate, locking time, propagation delay and jitter in the time domain. Analog designers therefore face more complex and elusive trade-off optimization problems. Moreover, the total design freedom in an analog circuit is much larger than that in a digital circuit, although the design parameters of an analog circuit are often interdependent. When analog circuits are designed manually, experienced designers usually size the circuit with the help of a rule-of-thumb table (“thumbs table”). Fig. 1.3(b) [TMG02] shows an example of such a “thumbs table” for the two-stage CMOS OP AMP shown in Fig. 1.3(a). Four OP AMP performances, i.e. slew rate (SR), voltage gain (DC gain), phase margin (PM) and gain-bandwidth product (GBW), are listed from left to right in the table. The dominant design parameters, i.e. differential-pair bias current (I), compensation capacitance (C\(_c\)) and input differential-pair transistor width (W), are listed from top to bottom in the table. For instance, when DC gain and PM are below their respective specifications, there is only one way to increase both values simultaneously, namely by decreasing the value of I. At the same time, the values of SR and GBW have to be observed so that they keep meeting their specifications too. Clearly, the optimization task becomes too difficult for designers to comprehend as the number of design parameters and circuit performances taken into account increases.

![Two-stage OP AMP](image)

**Figure 1.3:** (a) Two-stage OP AMP (b) its “thumbs table” [TMG02]

1.1.3 Analog Bottleneck

"While analog and digital system performance increase exponentially over time, microprocessor performance increased more than a thousandfold compared with an increasing of only 10 times for ADCs" [BM04]. Fig. 1.4 shows the ever-widening gap between the relative performance of microprocessors and that of ADCs over the last decades. The performance of SoCs is increasingly limited by their analog circuits rather than by their digital part.

![Relative performance of analog and digital circuits over time](image)

**Figure 1.4:** Relative performance of analog and digital circuits over time [BM04]

The analog bottleneck is caused not only by the difficulties and challenges of analog circuits themselves, but also by the lack of CAD tools for analog circuits. Design automation is much more developed for digital circuits than for analog circuits, which shows in various aspects, e.g. optimization algorithms for circuit design and layout, standard function models, etc. Comprehensive standard-cell libraries are available to designers, and these standard digital cells can easily be incorporated into each design, whereas most analog circuits are essentially full-custom designs every time. Automatic digital synthesis tools are used throughout the whole digital design flow, whereas only a few specific analog design tools can be applied. Up to now, a general analog synthesis tool does not exist "due to the tremendous variability in analog circuits, devices and processes" [Wil]. Furthermore, analog/mixed-signal circuits pose a new challenge to traditional analog CAD tools. SPICE-like numerical simulators are still applied to simulate large analog/mixed-signal circuits, but they take too much computation time. As circuits become larger and larger, a single simulation can last hours or days, which designers cannot tolerate. Recently, some advanced circuit simulators, e.g. NanoSim [Syn], have been developed for analog, digital and mixed-signal circuit simulation. Such simulators provide much faster simulations than traditional analog simulators with an acceptable loss of simulation accuracy.

1.2 State of the Art

Though fully automatic synthesis of analog circuits is not yet available, research on analog synthesis has developed in many directions over the past decades. In this section, a top-down flow for analog design is first briefly described. Then, a hierarchical design methodology for large and complex analog designs is introduced. After that, the various kinds of automatic sizing methods for analog design are classified. Finally, the two main optimization strategies for large-scale analog/mixed-signal circuits, i.e. the flat and the hierarchical optimization methodology, are discussed. Moreover, performance space exploration methods, which compute the performance capability of circuits, are also summarized here.

1.2.1 Analog/Mixed Signal Design Flow

A top-down analog/mixed-signal design process is described in [GR00], as shown in Fig. 1.5(a). It consists of seven main design steps, which are listed as follows.

1. **Conceptual Design**, where the product concept is developed with regard to market requirements. Overall information on specifications and functionalities is gathered.

2. **System Design**, where the product concept is turned into an actual design plan. The system architecture is designed here, and each functionality is assigned to a software or a hardware implementation.

3. **Architectural Design**, where the whole system is partitioned into analog and digital subblocks. System functionality can first be verified at this stage by using behavioral models, which can be described in C, MATLAB or HDLs.

4. **Cell Design**, where the analog circuits are implemented in detail according to the specific requirements. The tasks include selection of a proper circuit topology, device sizing and circuit verification. More details are discussed in Sec. 1.2.2.

**Figure 1.5:** (a) Hierarchical design steps of analog/mixed-signal integrated circuit design: system design, architectural design, cell design, cell layout and system layout, each with its own design and verification step, followed by fabrication and test. (b) Detailed design process for an analog circuit: circuit specification, circuit topology selection/generation, circuit sizing, schematic evaluation and RCX evaluation, each evaluation followed by a "Spec. satisfied?" check, and finally release to system integration.

5. **Cell Layout**, where each device symbol in the circuit schematic is translated into its geometric shape in the circuit layout. The layout is a multi-layer arrangement of metal, oxide and semiconductor materials.

6. **System Layout**, where all subblocks (analog and digital circuits alike) are placed and routed together. Chip area, IR drop on supply nets, isolation of sensitive circuits from noise sources and other issues have to be considered here.

7. **Fabrication and Testing**, where the IC chips are eventually produced on a silicon substrate through photolithography processes. After fabrication, testing is performed to verify product functionality. The products are then sorted by quality for sale.

The above seven steps can be classified into two categories: steps 1-4 are referred to as the **frontend** design process, while steps 5-7 are referred to as the **backend** design process. Going from step 1 to step 7 is an ideal **forward progress**. However, a purely forward progress rarely occurs in analog design. In fact, extensive simulation and validation steps are required to detect potential problems. If the design fails to meet the target requirements, **backtracking or redesign** is needed to revise the failed design steps. This thesis focuses mainly on the sizing process within cell design.

1.2.2 Design Process on Analog Circuits

As analog circuits become larger and more complex, hierarchical design methodology has become prevalent and has been introduced in many of the emerging experimental analog CAD systems [HRC89, DGS96, dPDL01, CSVM03]. For the design of a large-scale analog/mixed-signal circuit such as a phase-locked loop or a data converter, the whole circuit is typically decomposed into smaller building blocks, and the hierarchical decomposition continues until a level is reached that allows a physical implementation, i.e. the circuit (transistor) level. The design of each analog block can be described by the following steps, which are shown in Fig. 1.5(b):

- **Circuit Specification**: The specifications of each building block are derived from the initial system specifications. Examples of circuit specifications are the minimum DC gain, the minimum slew rate and the minimum bandwidth of an OP AMP.

- **Topology Selection/Generation**: Based on the specification requirements, designers choose a suitable circuit topology from a set of known alternative topologies. As the requirements become more demanding, new circuit topologies may need to be created.

- **Circuit Sizing**: Actual values are assigned to the design parameters of the circuit elements, such as transistor dimensions, resistances, capacitances, inductances, and bias voltages and currents. The goal of circuit sizing is to find a set of design parameters for which the circuit performances fulfill the predefined specification values.

- **Schematic/RCX Evaluation**: Performances of the sized circuit are evaluated by numerical simulation. A schematic simulation is a pre-layout evaluation, while an RCX simulation is a post-layout evaluation. Compared to a schematic netlist, an RCX netlist includes additional parasitic data, e.g. wire resistance and capacitance, coupling capacitance between wires, etc. Hence the RCX evaluation validates circuit performances more accurately than the schematic evaluation, but at the cost of longer simulation time.

The path from circuit specification to circuit sizing is an iterative process. The device parameters have to be tuned repeatedly until the specifications are satisfied. If the selected circuit topology is not able to meet the given specifications, the circuit topology needs to be reselected or regenerated. The layout of the analog circuit has to be implemented carefully so that the circuit including layout parasitic effects still fulfills its specification. All simulation and optimization results presented throughout this work are obtained by schematic evaluation.
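
The iterative loop of Fig. 1.5(b) can be sketched in a few lines. This is a minimal sketch under stated assumptions: `design_block`, `meets` and the three callables `size`, `eval_schematic` and `eval_rcx` are hypothetical interfaces invented for illustration, standing in for a sizing step and a simulator; the toy numbers in the usage part are likewise made up.

```python
def meets(spec, perf):
    """True if every performance reaches its minimum specification value."""
    return all(perf[name] >= bound for name, bound in spec.items())

def design_block(spec, topologies, size, eval_schematic, eval_rcx, max_iter=20):
    """Walk the loop of Fig. 1.5(b): select topology, size, evaluate, repeat.

    `size`, `eval_schematic` and `eval_rcx` are caller-supplied callables
    (hypothetical interfaces, not part of any real tool).
    """
    for topology in topologies:                  # topology selection
        params = None
        for _ in range(max_iter):                # sizing iterations
            params = size(topology, spec, params)
            if not meets(spec, eval_schematic(topology, params)):
                continue                         # pre-layout spec missed: resize
            if meets(spec, eval_rcx(topology, params)):
                return topology, params          # released to system integration
    return None                                  # no topology met the spec

# Toy usage: gain grows with width; layout parasitics cost 2 dB (assumed).
spec = {"gain_db": 60.0}
size = lambda topo, spec, p: {"w": 1.0} if p is None else {"w": p["w"] + 1.0}
schem = lambda topo, p: {"gain_db": 50.0 + 3.0 * p["w"]}
rcx = lambda topo, p: {"gain_db": 48.0 + 3.0 * p["w"]}
result = design_block(spec, ["two_stage"], size, schem, rcx)
```

In the toy run, the schematic check rejects the first three sizes, and the fourth passes both the schematic and the RCX check, so the block is released with that sizing.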

1.2.3 Automatic Sizing Method on Analog Circuits

Once the device values are fixed, the circuit performances are uniquely determined. Hence, optimizing the circuit performances amounts to sizing the circuit. In other words, a performance optimization process can be regarded as an automatic sizing process for a predefined circuit specification. Since the design parameters usually outnumber the performances, the problem is underconstrained with many degrees of design freedom, and the inverse mapping from circuit performances to design parameters is usually not unique and is also unknown. Basically, there are two ways to solve this problem. One is the knowledge-based sizing approach, which exploits analog design knowledge and heuristics. The other is the optimization-based sizing approach, which interprets the sizing process as a mathematical optimization problem.

1.2.3.1 Knowledge-Based Sizing Approaches

In manual analog design, designers do not need to find the exact values of the device parameters immediately. Instead, they look for circuit topology modifications, for a set of pivotal device parameters that mostly determine the circuit performances, and for the right directions in which to change them. Designers then modify the design parameters and simulate the circuit several times until the circuit provides the desired properties. The “thumbs table” gives designers an initial, but imprecise, idea of how to adjust the device parameters to approach the specification. Compared to the qualitative “thumbs table” analysis, a more quantitative analysis uses "design equations", in which circuit performances are formulated as functions of device parameters. For example, a well-known design equation for the open-loop DC gain of the two-stage CMOS OP AMP shown in Fig. 1.3(a) can be expressed as

$$A = \frac{g_{mM1}g_{mM6}}{(g_{dsM1} + g_{dsM4})(g_{dsM6} + g_{dsM7})}, \tag{1.1}$$

where $g_m$ is the transconductance and $g_{ds}$ is the output conductance of the respective MOSFET. The expression gives clear insight into which small-signal device parameters predominantly determine the DC gain in this OP AMP structure and how designers can tune the devices to meet a given specification. In the automatic sizing design flow, these design equations are reformulated in the reverse direction so that the design parameters can be calculated for a set of given performance requirements. The reformulated equations are called design plans. The knowledge-based optimization approach is illustrated in Fig. 1.6. For a circuit topology under design, specific heuristic design knowledge (including design equations and design strategies) is acquired and programmed explicitly in a computer-executable form. By executing design plans during analog synthesis, the design parameters can be sized automatically for a given set of input specifications.
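
To make the relation between a design equation and a design plan concrete, the sketch below evaluates Eq. (1.1) and then inverts it for one small-signal parameter. It is an illustration under simplifying assumptions: the function names are hypothetical, the numerical values are made up, and the output conductances are treated as fixed, whereas in a real circuit they change with the bias point.

```python
import math

def dc_gain(gm1, gm6, gds1, gds4, gds6, gds7):
    """Open-loop DC gain of the two-stage OP AMP, Eq. (1.1)."""
    return (gm1 * gm6) / ((gds1 + gds4) * (gds6 + gds7))

def required_gm6(target_gain, gm1, gds1, gds4, gds6, gds7):
    """Design-plan style inversion of Eq. (1.1): solve for gm6 given a gain target."""
    return target_gain * (gds1 + gds4) * (gds6 + gds7) / gm1

def to_db(gain):
    """Convert a voltage gain to decibels."""
    return 20.0 * math.log10(gain)

# Illustrative small-signal values in siemens (assumptions, not from this work).
values = dict(gm1=1e-4, gds1=1e-6, gds4=1e-6, gds6=5e-6, gds7=5e-6)
gm6 = required_gm6(target_gain=10_000.0, **values)   # target corresponds to 80 dB
check = dc_gain(gm6=gm6, **values)                   # forward evaluation of Eq. (1.1)
```

The forward function answers "what gain does this sizing give?", while the inverted function answers "what must this parameter be to reach the gain?", which is the direction a design plan works in.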

Previously, device models were very simple and included only a few device parameters. At this low complexity, design equations could be created by experienced analog designers. As MOSFET dimensions have scaled down to sub-100nm, the more comprehensive BSIM (Berkeley Short-channel IGFET Model)\(^\dagger\) is used to reflect the transistor's behavior accurately. Consequently, it is difficult to extract the equations between circuit performances and device parameters manually. Recently, symbolic analysis methods [GWS94, WFG+95, FRV96] have enabled the automatic extraction of design equations for some analog circuits. "A symbolic simulator is a computer tool that takes as input an ordinary (spice-type) netlist and returns as output (simplified) analytic expressions for the requested circuit network functions in terms of the symbolic representations of the frequency variable and (some of) the circuit elements" [GWS94].

In the development of automatic synthesis for analog circuits, the knowledge-based sizing approach was the first generation, and some tools came onto the market in the mid to late 1980s, e.g. IDAC [DND87], OASYS [HRC89], BLADES [TP89], ISAID [TM95, MT95]. However, the knowledge-based approaches suffer from several disadvantages. First, it is very difficult to formalize the circuit behavior accurately. Even symbolic analysis can handle only limited kinds of performances on particular circuits. The application of this synthesis method is therefore basically restricted to circuits whose design plans are available. Second, the design plans have to be updated whenever the process technology moves from one generation to the next, and it is doubtful whether design equations from the old technology remain valid for the new one. Updating the design equations requires considerable manual effort and time. Third, the optimization results depend strongly on the quality of the design equations, and their accuracy is normally lower than that of the simulation-based approach. Fourth, procedural knowledge is also required to generate design plans, to handle failures and to backtrack, and much of this knowledge has to be acquired manually. The overall overhead is much higher than the cost of designing the circuit directly [Hja03]. In summary, the coverage of the knowledge-based optimization approach was found "to be too small for the real-life industrial practice and therefore these approaches failed in the commercial marketplace" [GR00]. Moreover, the knowledge-based sizing approach is not an optimization process in the strict sense.

† "BSIM model is a physics-based, accurate, scalable, robust and predictive MOSFET SPICE model for circuit simulation and CMOS technology development." [BSI]
1.2.3.2 Optimization-Based Sizing Approaches

In order to make sizing tools more flexible and extensible to various kinds of analog circuits, the optimization-based sizing strategy was developed. In this kind of approach, the design result is determined by a numerical optimization algorithm instead of design plans. Numerical algorithms implicitly resolve the degrees of freedom of the analog design and optimize the circuit performances under the given specification constraints. An optimization-based sizing approach consists of two main engines, an optimization engine and a performance evaluation engine, as illustrated in Fig. 1.7. According to the method used for performance evaluation, two subcategories can be distinguished: the equation-based approach and the simulation-based approach. According to the numerical algorithm used for optimization, two further subcategories can be distinguished: deterministic and stochastic.

**Equation-Based Approach** means that the circuit performance is evaluated by a set of analytic design equations. The equations can be derived manually, e.g. OPASYN [KSpG90] and STAIC [HEL92], or by using symbolic analyzers, e.g. AMGIE [dPDL+01, dPGS02]. In general, the big advantage of analytic equations is their short evaluation time. It has been shown that the design of op amps in [HBL98] and of PLLs in [CPH+03] "can be formulated as a posynomial convex optimization problem that can be solved by using geometric programming techniques, producing a close-by first-cut design in an extremely efficient way" [AH06]. The optimization time can be reduced to minutes or seconds. However, these analytic equations still have to be derived with considerable manual effort, and the accuracy of the performance prediction depends strongly on their quality. Moreover, some circuit characteristics (e.g. transient responses) are difficult to represent accurately by analytic equations, and current symbolic analysis methods cannot yet handle most large-signal properties of circuits.
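
For orientation, the standard form of a geometric program is sketched below. The symbols $\mathbf{x}$, $p_i$, $q_j$, $c_{ik}$ and $a_{ikl}$ are generic textbook notation introduced here for illustration and are not taken from [HBL98], [CPH+03] or [AH06].

$$
\begin{aligned}
\min_{\mathbf{x} > 0} \quad & p_0(\mathbf{x}) \\
\text{s.t.} \quad & p_i(\mathbf{x}) \leq 1, \quad i = 1, \ldots, n_p, \\
& q_j(\mathbf{x}) = 1, \quad j = 1, \ldots, n_q, \\
\text{with} \quad & p_i(\mathbf{x}) = \sum_{k} c_{ik} \prod_{l} x_l^{a_{ikl}}, \quad c_{ik} > 0, \quad a_{ikl} \in \mathbb{R}.
\end{aligned}
$$

Each $p_i$ is a posynomial, i.e. a sum of monomials with positive coefficients, and each $q_j$ is a single monomial. With the substitution $x_l = e^{y_l}$ and a logarithm applied to the objective and the constraints, the problem becomes convex in $\mathbf{y}$, which is the reason why such formulations can be solved very efficiently once the design equations are available in posynomial form.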

**Simulation-Based Approach** means that the circuit performance is evaluated directly by a spice-like simulator. With growing computing power and advanced numerical algorithms in recent years, the idea of the simulation-based approach [DR69], which dates back about 40 years, has become practical and popular in analog synthesis, e.g. DELIGHT.SPICE [WRSVT88], FRIDGE [MPVARVH94], ANACONDA [PKR+00], MAELSTROM [KPRC99]. These methods perform some form of full numerical simulation to evaluate the circuit's performance in the optimization loop. Compared to the equation-based approach, a big advantage of the simulation-based approach is that the preparatory effort is very low and that performance evaluation itself poses no problem. The designer only has to set up proper testbenches, which define the real working environment of the circuit, and the postprocessing for performance extraction. As long as the circuit performance can be extracted from the simulation, the optimization can usually be set up in a short time. Performance prediction with spice-like simulators is the most accurate, since precise device models, e.g. BSIM3 or BSIM4, are applied. The main drawback of the simulation-based approach is the long evaluation time, as performance values are extracted directly from circuit-level simulation and a simulation is executed in every iteration of the optimization loop.

**Deterministic/Stochastic Optimization** differ in the mathematical algorithm applied in the optimization process. The optimization engine determines the quality of the optimization results and the execution time of the optimization process. Deterministic numerical techniques are mostly based on gradient information and can find a solution in a short time. However, owing to the nonlinear behavior of analog circuits, these methods may get stuck in a local optimum. To avoid a local optimum, stochastic approaches randomly sample the objective function with a certain probability and can provide a global optimum at the price of a large number of performance evaluations.
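
The difference between the two classes can be illustrated with a small sketch. The one-dimensional cost function below is purely hypothetical (it is not a circuit performance), and plain gradient descent and uniform random sampling merely stand in for the deterministic and stochastic algorithms used by the cited tools.

```python
import random

def cost(x):
    # Hypothetical cost with a local minimum near x = 1.13
    # and the global minimum near x = -1.30.
    return x**4 - 3.0 * x**2 + x

def grad(x):
    # Analytic derivative of cost(x).
    return 4.0 * x**3 - 6.0 * x + 1.0

def gradient_descent(x0, rate=0.01, steps=2000):
    # Deterministic: follows the local slope from the starting point x0.
    x = x0
    for _ in range(steps):
        x -= rate * grad(x)
    return x

def random_search(lo, hi, samples, seed=0):
    # Stochastic: evaluates many random points and keeps the best one.
    rng = random.Random(seed)
    return min((rng.uniform(lo, hi) for _ in range(samples)), key=cost)

x_local = gradient_descent(2.0)           # ends in the local minimum
x_random = random_search(-2.0, 2.0, 200)  # 200 cost evaluations
print("gradient descent:", x_local, cost(x_local))
print("random search:   ", x_random, cost(x_random))
```

Started at x = 2, the gradient method settles in the nearer, local minimum, whereas the random search lands in the basin of the global minimum but needs far more cost evaluations and returns only an approximate position.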

According to the above-mentioned methods of performance evaluation and optimization, the optimization-based sizing approaches from the literature can be categorized as follows:

<table>
<thead>
<tr>
<th><strong>Table 1.1:</strong> Classification of automatic sizing tools</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Equation-based</strong></td>
</tr>
<tr>
<td>Deterministic optimization</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>Stochastic optimization</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td></td>
</tr>
</tbody>
</table>

### 1.2.4 Optimization Methodology for Large-Scale Analog Circuits

#### 1.2.4.1 Flat Optimization Methodology

In the flat methodology, the whole design is attacked at once and all design parameters are treated at the same time. For a large-scale analog/mixed-signal circuit, e.g. a PLL or an A/D converter, sizing all transistors at once results in a design problem that is too complex to solve. Furthermore, the runtime of simulation-based sizing methods becomes too long. A fast performance estimator (equation-based evaluation) combined with a special algorithm that can handle many design variables can make flat optimization feasible within acceptable time, such as posynomial functions with geometric programming for A/D converters in [Her02, LWT+05] and PLLs in [CPH+03]. However, flat optimization then inherits the aforementioned disadvantages: large manual effort for building the equations and for the preparation process, lower accuracy of the optimization result and a limited application range.

1.2.4.2 Hierarchical Optimization Methodology

The idea of hierarchy is widely adopted in analog circuit design today. For example, in [VCD+96] a complex video driver system is divided into several smaller analog function blocks, e.g. A/D converter, PLL, digital interface, which are easier to design individually. Starting with the initial system specifications, an optimization process at the top level determines the target specifications of the design blocks at the next lower level. In the same way, the hierarchical optimization proceeds until all devices at the lowest level of the hierarchy are sized. If a building block is not feasible or its specification cannot be fulfilled at the current hierarchical level, the optimization at the next higher level has to be repeated to obtain new circuit specifications or a new architecture. The transfer from the initial system requirements to the block specifications is also known as constraint transformation. The key to a successful hierarchical design process is strict compliance with the top-down constraint-driven (TDCD) rules [CCC+97].

In order to avoid design iterations, bottom-up characterization techniques are introduced into the hierarchical sizing approach in [HS96, KCI+00, BGVI04, BNSV06]. Recently, one bottom-up characterization approach, performance space exploration (PSE), has become an active research topic. PSE is considered a key to a true hierarchical design process for two reasons. First, PSE enables an automatic selection of the circuit topology: since PSE methods compute the performance ranges of each circuit topology, the topologies can be compared quantitatively and the best one can be selected for the given requirements. Second, PSE provides the achievable performance space of the lower level and prevents sizing at the higher level from producing requirements that cannot be met by the lower-level realization. Many PSE approaches have been developed and applied to a broad range of design problems. Some are customized to certain circuit types, for example [HMBL99] for LC oscillators and [Her02, BGH04] for A/D converters, while others are more general [SG03, SGA03, SGA04]. According to the realization technique, three subcategories can be distinguished: intermediate performance modeling, e.g. support vector machines in [BJS03, BGV+04], stochastic optimization techniques in [SG03, EMG05, SCP05], and deterministic optimization techniques in [SG03, SGA03, SGA04]. In the stochastic and deterministic optimization techniques, the performance values are evaluated entirely by circuit simulation, while in intermediate performance modeling they come from both simulation and estimation. Based on PSE, a successful hierarchical top-down optimization process is realizable for various large-scale analog/mixed-signal circuits, for example [BNSV05, ESG+06] for A/D converters and [TVRM04, ZMGS06] for PLLs.
1.3 Objectives of the Work

The goal of this thesis is to construct an effective and efficient optimization methodology for large-scale analog/mixed-signal circuits. The methodology is intended to tackle today's analog bottleneck. There are two popular strategies, flat and hierarchical optimization. Which of the two is better certainly depends on the targeted application. This thesis follows the hierarchical strategy for the following reasons:

- Re-use of building blocks (also in system classes) may be easier.
- The clear distinction between system requirements and building-block requirements enables a deeper insight into the complex trade-offs in an interactive design process.
- Different building-block implementations can easily be investigated.

The top-down propagation of performance specifications from the whole system to each building block is the key task in hierarchical design. Defining the specification of each building block is the main challenge in manual design: a specification that is too stringent can overburden the design of the building block, while one that is too loose can push the performance of the whole circuit outside the original specification. An optimization-based automatic sizing method is applied at system level to find a good combination of the performances of the building blocks. Additionally, performance space exploration is used to guarantee that the optimized values of the design parameters at the higher level can be achieved by the lower-level circuit realization.

In order to achieve more flexibility, accuracy and generality with low manual effort, simulation-based performance evaluation is adopted both in the automatic sizing process and in the performance space exploration process. According to [RSA99], a circuit usually behaves well as long as it works in the correct region of operation. Hence, deterministic methods are applied in both processes in order to keep the execution time within reasonable limits. In summary, a "simulation-in-a-loop"-based hierarchical optimization methodology is proposed in this thesis. By applying the proposed methodology to experimental circuits, e.g. PLLs and modulators, the following detailed objectives are achieved; the corresponding results have been published in the papers cited below.

- A first-time-successful top-down sizing process is realizable without iterative redesign steps [ZMG+05, ZMG06, GZM07, ZMG07a].
- Hierarchical optimization of a large-scale analog/mixed-signal circuit is accomplished in reasonable time. To meet different performance specifications in various applications, the circuit can be resized quickly [ZMG06, GZM07, ZMG07b].
- Detailed insight into the capability of the building blocks as well as of the whole circuit system is obtained by computing the respective Pareto-optimal fronts [ZMG07b].
- Based on the nominal Pareto-optimal front, the circuit performance can be maximized/minimized considering the capability of its building blocks. Based on the worst-case-aware Pareto-optimal front, the actual optimized performance value for a given yield of the circuit after fabrication is obtained, where the impact of the inevitable fluctuations of the statistical parameters and the variation of the operational parameters is considered [ZMG07a].

This thesis is organized as follows. Chapter 2 introduces two automatic design methods for analog design, automatic sizing and performance space exploration. Chapter 3 proposes a comprehensive hierarchical optimization methodology for large-scale analog/mixed-signal circuits. Practical applications of the proposed methodology are presented for charge-pump phase-locked loops in Chapter 4 and for switched-capacitor sigma-delta modulators in Chapter 5. Chapter 6 summarizes the main topics discussed in this thesis. Appendix A lists the sizing rules for CMOS design. Appendix B shows the system-level modeling of the $\Sigma\Delta$ modulator in Simulink. Appendix C presents the relationship between phase noise and jitter and explains how to extract jitter performance from phase noise analysis. Appendix D lists the system-level modeling of the charge-pump phase-locked loop in Verilog-A.
Chapter 2

Automatic Design Methods on Analog Circuits

This chapter introduces two automatic design methods, automatic sizing and performance space exploration, which are important components of the hierarchical optimization methodology proposed later. The two design processes are formalized mathematically, and the corresponding terminology and fundamental concepts are defined.

2.1 Automatic Circuit Sizing

Analog circuit sizing usually refers to determining the sizes of the circuit elements. Automatic sizing methods assign the device sizes automatically according to a predefined circuit specification, such that the performance of the sized circuit reaches the target values. A standard current-mode-logic (CML) block, shown in Fig. 2.1, is used to explain the basic concepts of analog design.

2.1.1 Circuit Parameters

For a fixed topology and process technology, the circuit properties are determined by the circuit parameters [SS88], which comprise three types:

- **Design parameters**, vector \( \mathbf{d} \)\*, are the only circuit parameters whose values can be chosen explicitly by the designer. Typical design parameters of CMOS circuits are channel widths/lengths of transistors (W/L) such as \( W_{0/1/2/3}/L_{0/1/2/3} \) for \( M_{0-3} \) in Fig. 2.1, as well as the values of capacitors (C) and of resistors (R).

- **Statistical parameters**, vector \( \mathbf{s} \), represent the inevitable fluctuations in the manufacturing process. Typical statistical parameters are the oxide thickness \( t_{ox} \) and the threshold voltage \( V_{th} \) of transistors. These parameters are beyond the control of the designer and are generally not shown in the circuit schematic.

\* In this thesis, regular lower case letters denote scalars, bold lower case letters denote vectors, and bold capital letters denote matrices.

Figure 2.1: Schematic of a current-mode-logic (CML) cell

- **Operational parameters**, vector $\boldsymbol{\theta}$, take into account the variability of the operating conditions, such as ambient temperature ($T$), supply voltage ($V_{DD}$) and bias current ($I_b$). The ranges of the operational parameters are given as part of the specifications and cannot be controlled by the designer. For example, the ambient temperature varies from -25°C to 115°C and the supply voltage varies from 1.0V to 1.2V.

The circuit parameters can be expressed as [Sch04]

$$
\text{circuit parameters} = \begin{cases}
    \mathbf{d} \in \mathbb{R}^{n_d} & \text{design parameters} \\
    \mathbf{s} \in \mathbb{R}^{n_s} & \text{statistical parameters} \\
    \boldsymbol{\theta} \in \mathbb{R}^{n_\theta} & \text{operational parameters}
\end{cases}
\quad (2.1)
$$

2.1.2 Circuit Performances and Evaluation

Circuit performances, vector $f$, characterize the behavior of a circuit. The performances $f$ of an analog circuit depend not only on its circuit realization (i.e. circuit topology, device model and design parameters) but also on its operating environment (e.g. stimuli, output loads). The flow of a simulation-based performance evaluation is shown briefly in Fig. 2.2. The starting point is the testbench setup for the DUT (design under test) block. The testbench should represent the real operating environment, so that the DUT's properties are characterized under practical working conditions. For example, let a CML cell act as the DUT. Another CML cell is inserted between the external stimuli and the DUT, so that the DUT receives a more realistic input signal (slew rate, input capacitance etc.), and a further CML cell acts as the real load of the DUT. The netlist of this testbench is then the input of the numerical (spice-like) simulator. During the numerical solution process, node voltages and branch currents are calculated based on Kirchhoff's laws with the help of iterative numerical integration methods. Their values form a raw simulation database. Finally, the circuit behavior is determined by means of circuit simulation, and the performance values are extracted through postprocessing.
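
The postprocessing step can be illustrated with a minimal sketch that extracts a delay from sampled waveforms. The function names, the 50 % threshold and the sample values are assumptions made for this illustration; they do not describe the postprocessing of any particular simulator.

```python
def crossing_time(times, values, level, rising=True):
    """First time at which the sampled waveform crosses `level`.

    Linear interpolation is used between the two neighboring samples.
    """
    for k in range(1, len(times)):
        v0, v1 = values[k - 1], values[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("waveform never crosses the requested level")

def delay(times, v_in, v_out, level):
    """Delay between the rising edges of v_in and v_out at `level`."""
    return crossing_time(times, v_out, level) - crossing_time(times, v_in, level)

# Hypothetical raw simulation data: time in ps, voltages in V.
t = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
v_in = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
v_out = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
print("delay [ps]:", delay(t, v_in, v_out, level=0.5))
```

Other performances such as slew rate or gain would be extracted from the raw node voltages and branch currents in the same spirit, each with its own postprocessing function.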

Analog design normally consists of two strategies: Nominal Design and Robust Design.

**Nominal Design** focuses only on adjusting the design parameters $\mathbf{d}$, while the statistical parameters $\mathbf{s}$ and the operational parameters $\boldsymbol{\theta}$ are fixed (normally at their mean values). For a given circuit realization (i.e. topology and technology) and the corresponding testbench, the performance evaluation $\mathbf{m}_{\text{nom}}$ maps the circuit design parameters $\mathbf{d}$ to the circuit performances $\mathbf{f}$:

$$\mathbf{f} = \mathbf{m}_{\text{nom}}(\mathbf{d}), \quad \mathbf{f} \in \mathbb{R}^{n_f}. \quad (2.2)$$

**Robust Design** aims to make circuits more robust against the inevitable variations of process and environment, whereas nominal design optimizes the various performances simultaneously under one particular process corner and environmental condition. The performance evaluation $\mathbf{m}_{\text{rob}}$ maps the circuit design parameters $\mathbf{d}$, the statistical parameters $\mathbf{s}$ and the operational parameters $\boldsymbol{\theta}$ to the circuit performances $\mathbf{f}$:

$$\mathbf{f} = \mathbf{m}_{\text{rob}}(\mathbf{d}, \mathbf{s}, \boldsymbol{\theta}), \quad \mathbf{f} \in \mathbb{R}^{n_f}. \quad (2.3)$$

Since the impact of process variation and of the operating environment is much more significant on analog circuits than on digital circuits, analog designers have to run many more simulations to gain comprehensive insight into the circuit performance. *Process corner analysis* is a popular way to verify a circuit at various technical corners, which cover not only the variation of the device process, e.g. five process corners for N/PMOS (TT, FF, SS, FS, SF†) and high or low passive resistance and capacitance, but also the variation of temperature and bias voltage/current. Fig. 2.3 shows how the delay of the CML cell in Fig. 2.1 varies over 72 selected technical corners (which are only a subset of all technical corners). The waveforms in the lower plot are the input and output signals. The middle plot shows the delay between input and output signals at the 72 corners. The upper plot shows the histogram of the delays.

† TT: Typical NMOS & Typical PMOS; FF: Fast NMOS & Fast PMOS; SS: Slow NMOS & Slow PMOS; FS: Fast NMOS & Slow PMOS; SF: Slow NMOS & Fast PMOS. Additionally, the variation range is also defined for each process, e.g. $2.0\sigma$ TT or $3.0\sigma$ FF.

![Figure 2.3: Delay variation of CML vs. process corners](image)

In simulation-based performance evaluation, Equations 2.2 and 2.3 actually refer to the same mapping; only the values of $s$ and $\theta$ differ between nominal design and robust design. Throughout this thesis, the mapping from circuit parameters to circuit performances is abbreviated as $f = m(*)$, where $*$ stands for $d$ in nominal design and for $d, s, \theta$ in robust design.
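
The relation between the two mappings can be sketched in code. The function `delay_model` below is a toy stand-in for a circuit simulation; its formula, the coefficient values and the choice of three values per parameter are assumptions for illustration only. Only the temperature and supply ranges are taken from the example in Sec. 2.1.1.

```python
from itertools import product

def delay_model(d, s, theta):
    """Toy stand-in for m(d, s, theta); NOT a real CML delay equation."""
    r_load, c_load = d      # design parameters: resistance [ohm], capacitance [F]
    r_scale = s             # statistical parameter: relative resistance factor
    temp_c, vdd = theta     # operational parameters: temperature [degC], supply [V]
    r_eff = r_load * r_scale * (1.0 + 0.002 * (temp_c - 27.0))
    return 0.69 * r_eff * c_load * (1.1 / vdd)

NOMINAL_S = 1.0
NOMINAL_THETA = (27.0, 1.1)

def m_nom(d):
    # Nominal design: s and theta are frozen, only d is varied.
    return delay_model(d, NOMINAL_S, NOMINAL_THETA)

def corner_sweep(d, s_values, temps, vdds):
    # Corner analysis: the same mapping evaluated for every combination.
    return {(s, t, v): delay_model(d, s, (t, v))
            for s, t, v in product(s_values, temps, vdds)}

d = (2000.0, 200e-15)
corners = corner_sweep(d, (0.9, 1.0, 1.1), (-25.0, 27.0, 115.0), (1.0, 1.1, 1.2))
print("nominal delay [s]:", m_nom(d))
print("corner count:", len(corners))
print("min/max delay [s]:", min(corners.values()), max(corners.values()))
```

The nominal evaluation and the corner sweep call the same underlying function, which mirrors the statement that Equations 2.2 and 2.3 describe one mapping with different values of the statistical and operational parameters.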

### 2.1.3 Circuit Specifications and Yield Estimation

Every design has targets or requirements. The requirements on the circuit performances $f$ are called circuit specifications. For example, lower specifications $f_l$ and/or upper specifications $f_u$ exist for the performances $f$, i.e.

$$f \geq f_l \quad \text{and/or} \quad f \leq f_u. \quad (2.4)$$

Circuit performances often suffer from inevitable process fluctuations and variations of the operating conditions. Although circuits are sized to meet their requirements in nominal design, some performances of the fabricated circuits unfortunately lie outside the specification. Consider the delay of the CML cell in Fig. 2.3: it is designed to be 400ps in the nominal case, but the value varies from corner to corner and exceeds the upper or the lower limit at some corners.

Monte Carlo (MC) analysis is the most popular way to estimate the yield of ICs before final silicon tape-out. Device models include global process variation (from wafer to wafer) and local process variation (from die to die in a wafer). For each varying parameter, a statistical model describes the distribution of its values. Depending on the number of statistical parameters in the circuit, a sufficient number of simulations (thousands or even more) are run to collect the statistics. A random generator draws the actual values of the statistical parameters from their models, as shown in Fig. 2.4. The performances of all simulations are measured and checked against their specifications. Finally, the yield can be estimated by

\[
Y = \frac{\text{number of passes}}{\text{number of simulations}} = \frac{N_{\text{pass}}}{N_{\text{sum}}} \cdot 100\%. \quad (2.5)
\]
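
A minimal sketch of the yield estimate in Eq. (2.5) is given below. The function `simulate_delay` is a hypothetical stand-in for one circuit simulation with randomly drawn statistical parameters; the Gaussian distribution, the 20 ps standard deviation and the specification limits are assumptions, and only the 400 ps nominal delay is taken from the CML example.

```python
import random

def simulate_delay(rng):
    """Hypothetical stand-in for one simulation run: delay in ps."""
    return rng.gauss(400.0, 20.0)

def estimate_yield(n_sum, f_l, f_u, seed=1):
    """Monte Carlo yield in percent: N_pass / N_sum * 100."""
    rng = random.Random(seed)
    n_pass = 0
    for _ in range(n_sum):
        f = simulate_delay(rng)
        if f_l <= f <= f_u:      # specification check, Eq. (2.4)
            n_pass += 1
    return 100.0 * n_pass / n_sum

print("estimated yield [%]:", estimate_yield(5000, f_l=360.0, f_u=440.0))
```

In a real flow every call to `simulate_delay` is a full circuit simulation, which is why the number of samples dominates the cost of the analysis.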

Since yield estimation by means of MC analysis requires a huge number of simulations, researchers have tried to develop more efficient ways to estimate the yield. Worst-case analysis [Gra93] is a much faster method with lower simulation cost; it is discussed in Sec. 3.5.

The overall yield \(Y_{\text{sum}}\) for all performances \(f\) is defined as the cut set (intersection) of all individual parametric yields \(Y_{f_i}\) and is therefore bounded by the smallest of them:

\[
Y_{\text{sum}} \leq \min(Y_{f_1}, Y_{f_2}, \ldots, Y_{f_i}, \ldots), \quad i = 1, \ldots, n_f. \quad (2.6)
\]

For example, although the partial yield \(Y_{f_1}\) is 99.9\%, the overall yield \(Y_{\text{sum}}\) can be at most 68\%, because the smallest partial yield, that of performance \(f_2\), is 68\%. The table also shows that the partial yield of every subblock has to be taken care of in the design of the whole system. It is useless to maximize one partial yield at the expense of another.

<table>
<thead>
<tr>
<th>Performance</th>
<th>$f_1$</th>
<th>$f_2$</th>
<th>$f_3$</th>
<th>$f_4$</th>
<th>$f_5$</th>
<th>Overall</th>
</tr>
</thead>
<tbody>
<tr>
<td>Yield</td>
<td>99.9%</td>
<td>68%</td>
<td>90%</td>
<td>96%</td>
<td>76%</td>
<td>68%</td>
</tr>
</tbody>
</table>
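
The bound in Eq. (2.6) can be checked numerically for the five partial yields of the table. The lower bound computed below is the standard Bonferroni inequality; it is not part of the original text and is added only to show how far the overall yield may fall below the smallest partial yield.

```python
def yield_bounds(partial_yields):
    """Return (lower, upper) bounds on the overall yield.

    upper: Eq. (2.6), the smallest partial yield.
    lower: Bonferroni inequality, 1 - sum of the partial yield losses.
    """
    ys = list(partial_yields)
    upper = min(ys)
    lower = max(0.0, 1.0 - sum(1.0 - y for y in ys))
    return lower, upper

# Partial yields of the five performances in the table.
lower, upper = yield_bounds([0.999, 0.68, 0.90, 0.96, 0.76])
print("overall yield lies between", lower, "and", upper)
```

Where the overall yield falls between the two bounds depends on how strongly the individual specification violations overlap, which the partial yields alone do not reveal.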

2.1.4 Automatic Sizing Process

Since only the design parameters \(d\) can be determined by the designer, the sizing process in the remainder of this thesis refers only to the sizing of the design parameters. An automatic sizing process can be interpreted as an optimization task that minimizes the difference between the circuit performances \( f \), i.e. \( m(d) \), and the predefined specifications \( f_{\text{spec}} \) by tuning the design parameters \( d \), expressed as

\[
\min_d \| m(d) - f_{\text{spec}} \| \quad \Rightarrow \quad d_{\text{opt}}, \quad f_{\text{spec}} = m(d_{\text{opt}}), \quad (2.7)
\]

where \( d_{\text{opt}} \) denotes the optimized design parameters. The mapping from circuit specifications to design parameters is not unique mathematically, since the design parameters usually outnumber the performances. If only performance specifications are considered, the mathematical optimizer cannot guarantee a physically meaningful circuit realization; for example, a transistor may operate outside saturation. Such a malfunction does not influence the circuit performance in the nominal case, but it increases the sensitivity of the performance to process variation, environmental variation and noise. To keep the automatic sizing results in technically meaningful regions, sizing rules for CMOS technology are proposed in [GZEA01].
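
The optimization task of Eq. (2.7) can be sketched on a toy problem. The performance model `m` below is an assumption for illustration, not a circuit equation from this thesis, and a plain derivative-free compass search is used as a simple stand-in for the gradient-based optimizers applied in practice. The toy has as many design parameters as performances, so unlike the general case its solution is unique.

```python
import math

def m(d):
    """Toy performance model (assumption): d = (w, i) -> (gm-like, power-like)."""
    w, i = d
    return (2.0 * math.sqrt(w * i), i)

def cost(d, f_spec):
    # Euclidean norm || m(d) - f_spec || of Eq. (2.7).
    return math.sqrt(sum((fk - sk) ** 2 for fk, sk in zip(m(d), f_spec)))

def compass_search(d0, f_spec, step=1.0, tol=1e-6, lower=1e-3):
    """Try +/- step along each coordinate; halve the step when nothing improves."""
    d = list(d0)
    best = cost(d, f_spec)
    while step > tol:
        improved = False
        for k in range(len(d)):
            for sign in (1.0, -1.0):
                trial = list(d)
                trial[k] = max(lower, trial[k] + sign * step)  # keep parameters positive
                c = cost(trial, f_spec)
                if c < best:
                    d, best, improved = trial, c, True
        if not improved:
            step *= 0.5
    return d, best

d_opt, residual = compass_search((1.0, 2.0), f_spec=(4.0, 1.0))
print("d_opt:", d_opt, "residual:", residual)
```

In a simulation-based flow every call to `m` is a circuit simulation, so the number of cost evaluations matters far more than in this sketch.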

### 2.1.4.1 Sizing Rules

As every analog circuit is built from elementary transistor-pair blocks, e.g. current mirror, level shifter or differential stage, the performance of the whole circuit depends crucially on the operation of these elementary blocks. To fulfill the desired analog function, most of these elementary structures have to follow particular sizing rules (structural constraints), e.g. transistor matching or saturation conditions. Hence, a successful and more reliable automatic sizing process can be reformulated as

\[
\min_d \| m(d) - f_{\text{spec}} \| \quad \text{s.t.} \quad c(d) \geq 0 \quad \Rightarrow \quad d_{\text{opt}}, \quad f_{\text{spec}} = m(d_{\text{opt}}). \quad (2.8)
\]

Without loss of generality, \( c(d) \geq 0 \) expresses that all sizing rules are fulfilled; the sizing rules act as additional constraints for the mathematical optimization.

For a given circuit topology, an automatic setup of the sizing rules is presented in [GZEA01, MSG03]. It consists of two main steps. First, circuit substructures are identified bottom-up in a hierarchical fashion, as described in Tab. A.1 in Appendix A. Second, based on the recognized structures, the corresponding sizing rules are assigned to the individual transistors. The CML circuit in Fig. 2.1, for example, consists of three structure elements: a resistive load pair (R1 & R2), an NMOS differential input pair (M3 & M4) and an NMOS current source (M2). M2 and M1 together form an NMOS current mirror. According to the design manual [mun], all sizing rules for the three structure elements are listed in Tab. 2.2. As can be seen from this table, the sizing rules can be classified into three categories [Ste05], as shown in Tab. 2.3.

- **Geometric & Electrical**

  Geometric sizing rules relate directly to the geometric dimensions of devices, e.g. width and length of transistors. Electrical sizing rules check whether devices work in the expected region. At the current stage of development, electrical rules involve only the DC simulation of the circuit, which calculates its static state. The DC voltages and currents are the initial state for nonlinear devices working in the AC (frequency) domain and in the large-signal (time) domain. Electrical sizing rules based on DC simulation are not sufficient for circuits that work in transient operation.

**Table 2.2:** Sizing rules for the current-mode-logic (CML) cell in Fig. 2.1

<table>
<thead>
<tr>
<th>Structure element</th>
<th>No.</th>
<th>Constraint</th>
<th>Safety margin</th>
<th>Reason</th>
</tr>
</thead>
<tbody>
<tr>
<td>Resistance load pair</td>
<td>1</td>
<td>$R_{W1,2} \geq m$</td>
<td>$m = 1 \mu m$</td>
<td>limit relative variance of the resistance factor</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>$R_{l1,2} \geq m$</td>
<td>$m = 1 \mu m$</td>
<td></td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>$R_{W1} = R_{W2}$</td>
<td></td>
<td>systematic load match</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>$R_{l1} = R_{l2}$</td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>$V_{DD} - R_{l1,2} \cdot I_b \leq m$</td>
<td>$m = 0.4V$</td>
<td>sufficient drain-source voltage headroom for M2 current source</td>
</tr>
<tr>
<td>NMOS differential pair</td>
<td>6</td>
<td>$V_{GS3,4} - V_{th3,4} \geq m$</td>
<td>$m = 10mV$</td>
<td>inversion</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>$V_{DS3,4} - (V_{GS3,4} - V_{th3,4}) \geq m$</td>
<td>$m = 10mV$</td>
<td>saturation</td>
</tr>
<tr>
<td></td>
<td>8</td>
<td>$L_{3,4} \cdot W_{3,4} \geq m$</td>
<td>$m = 1 \mu m^2$</td>
<td>limit $V_{th}$ mismatch</td>
</tr>
<tr>
<td></td>
<td>9</td>
<td>$L_{3,4} \geq m$</td>
<td>$m = 0.5 \mu m$</td>
<td>limit relative variance of the transconductance factor</td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>$W_{3,4} \geq m$</td>
<td>$m = 0.5 \mu m$</td>
<td></td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>$V_{GS3,4} - V_{th3,4} \leq m$</td>
<td>$m = 1.0V$</td>
<td>reduce the influence of transconductance mismatch on the input offset</td>
</tr>
<tr>
<td></td>
<td>12</td>
<td>$-m \leq V_{DS3} - V_{DS4} \leq m$</td>
<td>$m = 200mV$</td>
<td>reduce the influence of the channel length modulation factor on the current transmission coefficient</td>
</tr>
<tr>
<td></td>
<td>13</td>
<td>$L_3 = L_4$</td>
<td></td>
<td>avoid transconductance mismatch</td>
</tr>
<tr>
<td></td>
<td>14</td>
<td>$W_3 = W_4$</td>
<td></td>
<td>and input offset voltage mismatch</td>
</tr>
<tr>
<td>NMOS current mirror</td>
<td>15</td>
<td>$V_{GS1,2} - V_{th1,2} \geq m$</td>
<td>$m = 100mV$</td>
<td>inversion</td>
</tr>
<tr>
<td></td>
<td>16</td>
<td>$V_{DS1,2} - (V_{GS1,2} - V_{th1,2}) \geq m$</td>
<td>$m = 100mV$</td>
<td>saturation</td>
</tr>
<tr>
<td></td>
<td>17</td>
<td>$L_{1,2} \cdot W_{1,2} \geq m$</td>
<td>$m = 1 \mu m^2$</td>
<td>limit $V_{th}$ mismatch</td>
</tr>
<tr>
<td></td>
<td>18</td>
<td>$L_{1,2} \geq m$</td>
<td>$m = 0.5 \mu m$</td>
<td>limit relative variance of the transconductance factor</td>
</tr>
<tr>
<td></td>
<td>19</td>
<td>$W_{1,2} \geq m$</td>
<td>$m = 0.5 \mu m$</td>
<td></td>
</tr>
<tr>
<td></td>
<td>20</td>
<td>$-m \leq V_{DS1} - V_{DS2} \leq m$</td>
<td>$m = 200mV$</td>
<td>reduce the influence of the channel length modulation factor on the current transmission coefficient</td>
</tr>
<tr>
<td></td>
<td>21</td>
<td>$L_1 = L_2$</td>
<td></td>
<td>limit systematic mismatches</td>
</tr>
</tbody>
</table>

- **Function & Robustness**
  Functional sizing rules guarantee that the elementary structures perform the desired analog functions, e.g. M1 & M2 working in saturation for current mirror operation. Robustness sizing rules define design margins in order to decrease the sensitivity of the analog performance to variations of the process and of the operating conditions, e.g. the minimal length/width/area. The margin values in Tab. 2.2 depend closely on the process technology.

- **Equality & Inequality**
  Equality sizing rules state that design parameters have the same value or differ only by a constant factor. In general, equality relationships exist only for geometric quantities, e.g. $L_1 = L_2$ in the NMOS current mirror. Inequality sizing rules state upper or lower bounds of electrical or geometric circuit quantities, e.g. $V_{DS1} \geq V_{GS1} - V_{th1}$ for M1 in saturation.

| Table 2.3: Classification of the sizing rules on CML in Tab. 2.2 |
|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| Rules No.        | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| Geometric        | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * |
| Electrical       | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * |
| Function         | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * |
| Robustness       | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * |
| Equality         | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * |
| Inequality       | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * | * |

2.1.4.2 **Automatic Sizing Flow**

The design flow of a simulation-based automatic sizing process is outlined in Fig. 2.5. The starting point is the circuit topology as a schematic, together with the corresponding testbenches. Using a graphical interface, e.g. the Virtuoso schematic editor of Cadence [Cada], the circuit netlist is generated and forwarded to a SPICE-like simulator. Circuit simulation determines the circuit behavior, including the DC operating points, i.e. node voltages and branch currents, the AC (small-signal) performances, e.g. DC gain and phase margin, and the transient (large-signal) performances, e.g. delay and slew rate. The geometric sizing rules are configured according to the circuit netlist, and the electrical sizing rules are evaluated by means of DC simulations.

The performances to be optimized and their corresponding specifications are the inputs of the cost-function (objective-function) generator. With the help of a mathematical optimization algorithm, the circuit optimizer finds, after several optimization loops, a set of design parameters for which the circuit performances fulfill their specifications. The cost function and the optimization method strongly affect the run time of the sizing process and the quality of the final results. Although the cost functions arising in analog circuits are nonlinear and may have local minima, experimental results show that deterministic methods, e.g. the sequential quadratic programming (SQP) algorithm, perform well. A moderate number of starting points yields good solutions while the optimization time remains reasonable. The detailed realization of cost functions and optimization algorithms is beyond the scope of this thesis.

The design parameters are the variables of the optimization. Their upper and/or lower bounds
must be predefined. For example, minimal and maximal limits on the transistor dimensions
are set to avoid physically unrealistic implementations.

In the simulation-based automatic sizing method, the circuit performances must be re-evaluated
in each optimization loop, so a large number of simulations is required. To accelerate
the sizing process, the simulation tasks can be distributed over a cluster of workstations and
executed in parallel. A master machine collects the simulation results and
controls the optimization process.
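
The master/worker arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `simulate` is a hypothetical stand-in for launching a SPICE-like simulator, its toy formulas are invented for the sketch, and a local thread pool takes the place of the workstation cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(d):
    """Hypothetical stand-in for one circuit simulation.

    A real flow would launch a SPICE-like simulator here; this toy
    mapping from (width, length) to two performances is an assumption
    that only makes the sketch runnable.
    """
    w, l = d
    return {"gain": w / l, "area": w * l}


def evaluate_batch(design_points, max_workers=4):
    """Master side: dispatch all simulations and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate, design_points))


points = [(10.0, 1.0), (20.0, 2.0), (30.0, 1.5)]
results = evaluate_batch(points)
```

The optimizer would call `evaluate_batch` once per optimization loop with all design points it needs, e.g. the perturbed points used for a finite-difference gradient.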

The automatic sizing method provides an automatic mapping from circuit specifications
to design parameters. Currently, the simulation-based automatic sizing method is applied
successfully to analog circuits such as operational amplifiers, which contain fewer than
20-30 devices and whose performance evaluations (simulations) are fast.

2.2 Performance Space Exploration

As we know, circuit simulation provides an automatic mapping process from design parameters
to circuit performances. Compared to the mapping from one design point to one performance
point, performance space exploration (PSE) intends to find the whole feasible performance
space for a given circuit topology and process technology.

2.2.1 Feasible Parameter Space

Geometric limits exist for each device, i.e. lower and/or upper bounds on its length and
width. For example, in a 110nm CMOS process technology, the smallest physical channel
length ($L_{\text{min}}$) of a transistor is 110nm. In analog design, the minimal transistor lengths
are generally set to at least $1.2 \cdot L_{\text{min}}$, while the upper bounds on length and width prevent
excessively large devices. For $n_{d}$ design parameters, the initial design parameter
space is an $n_{d}$-dimensional space. Each point in this space
can be interpreted mathematically as a vector of design parameters $d \in \mathbb{R}^{n_{d}}$. In practice, the
sizing rules in Sec. 2.1.4.1 separate the whole design space into two nonoverlapping subspaces:
one subspace where sizing rules are violated, i.e. at least one entry of $c(d)$ is negative, and one
subspace where all sizing rules are satisfied ($c(d) \geq 0$). The subspace where all
sizing rules are fulfilled is called the feasible parameter space $\mathcal{D} \subset \mathbb{R}^{n_{d}}$, i.e.

$$\mathcal{D} = \{d \mid c(d) \geq 0\}, \quad c(d) \in \mathbb{R}^{q}, \quad d \in \mathbb{R}^{n_{d}}. \quad (2.9)$$

Fig. 2.6 illustrates the feasible parameter space $\mathcal{D}$ for $\mathbb{R}^{2}$.

Figure 2.5: Design flow of the automatic sizing process

Figure 2.6: Feasible parameter space $\mathcal{D}$

To decrease the optimization complexity, fewer design variables are desirable. The equality sizing rules give explicit algebraic relationships between correlated design parameters, so the dimension of the design space can be reduced algebraically. It is worth mentioning that only the reduced design parameters are considered throughout this thesis. With the reduced design parameters, all inequality sizing rules can be reformulated as \( \mathbf{c}(\mathbf{d}) \geq 0 \), a single nonlinear vector inequality that combines all the sizing rules:

\[
\mathbf{c}(\mathbf{d}) \geq 0 \iff \bigwedge_{i \in \{1, \ldots, q\}} c_i(\mathbf{d}) \geq 0, \tag{2.10}
\]

where \( q \) is the total number of sizing rules and the index \( i \) denotes the \( i \)-th entry of the vector.
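
As a minimal sketch of Eqs. (2.9) and (2.10), the following Python fragment applies the equality rule $L_1 = L_2$ to a reduced parameter vector and then tests the conjunction of all inequality rules. The rule set, the width and area margins, and all function names are assumptions chosen for illustration; only the 110nm minimum length and the factor 1.2 are taken from the text, and the electrical rules, which would need a DC simulation, are left out.

```python
L_MIN = 0.11           # um, smallest channel length of the 110 nm process in the text
L_LOW = 1.2 * L_MIN    # lower bound on analog transistor lengths from the text
W_LOW = 0.5            # um, hypothetical minimum width
A_LOW = 1.0            # um^2, hypothetical minimum gate area


def expand(d_reduced):
    """Apply the equality rule L1 = L2: reduced (W1, W2, L) -> full parameter set."""
    w1, w2, l = d_reduced
    return {"W1": w1, "L1": l, "W2": w2, "L2": l}


def c(d_reduced):
    """Vector of geometric sizing rules; rule i is satisfied when entry i is >= 0."""
    p = expand(d_reduced)
    return [
        p["L1"] - L_LOW,
        p["W1"] - W_LOW,
        p["W2"] - W_LOW,
        p["W1"] * p["L1"] - A_LOW,
        p["W2"] * p["L2"] - A_LOW,
    ]


def is_feasible(d_reduced):
    """c(d) >= 0 holds exactly when every single rule c_i(d) >= 0 holds."""
    return all(ci >= 0 for ci in c(d_reduced))
```

With these assumed margins, `is_feasible((10.0, 20.0, 0.5))` is true, while a length of 0.12 um violates the length rule and makes the point infeasible.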

### 2.2.2 Feasible Performance Space

Based on the feasible parameter space \( \mathcal{D} \), the feasible performance space \( \mathcal{F} \subset \mathbb{R}^n \) is obtained from the mapping \( \mathbf{m}(\cdot) \), expressed as

\[
\mathcal{F} = \{ \mathbf{f} \mid \mathbf{f} = \mathbf{m}(\mathbf{d}) \wedge \mathbf{d} \in \mathcal{D} \}, \quad \mathbf{d} \in \mathcal{D} \implies \mathbf{c}(\mathbf{d}) \geq 0. \tag{2.11}
\]

Each feasible parameter vector \( \mathbf{d} \) maps to one performance vector. In principle, the whole feasible performance space can be obtained by pointwise simulation, but far too many simulations would be needed to cover \( \mathcal{F} \). It is more effective to search only for the boundary \( \partial \mathcal{F} \) of \( \mathcal{F} \) instead of the entire \( \mathcal{F} \). Note that, in general, \( \partial \mathcal{F} \) is not the image of the boundary \( \partial \mathcal{D} \) of \( \mathcal{D} \).

![Figure 2.7: Feasible performance space \( \mathcal{F} \)](image)

The knowledge of the feasible performance space is extremely useful for designers [Ste05]:

- For a given technology and circuit topology, the feasible performance space shows the ultimate capabilities of the circuit without violating any sizing rules. Different circuit topologies can easily be compared with each other, and the advantages and disadvantages of each topology can be evaluated accurately by comparing their feasible performance spaces.

- The feasible performance space offers complete insight into the circuit performance, whereas a traditional optimization method generates only one optimized result. Hence, analog designers gain a better overview of the circuit capability and can deliberately select a compromise among the conflicting performances.

- In a hierarchical design process for a large-scale analog/mixed-signal circuit, the feasible performance space can serve as an additional design constraint for the higher-level design. Considering the performance capability of the lower-level circuit helps to avoid iteration steps, e.g. redefining the specifications of a subblock.

2.2.3 Performance Space Exploration

The goal of performance space exploration (PSE) is to find a description of \( \mathcal{F} \) in the performance space. In Equation 2.11, the feasible performance space \( \mathcal{F} \) is coupled to the feasible parameter space \( \mathcal{D} \). Since the performances \( f_1, \ldots, f_n \) to be optimized usually compete with each other, circuit sizing becomes a multi-objective optimization problem [LTZ79] over the design parameters \( \mathbf{d} \) subject to the sizing rules \( \mathbf{c}(\mathbf{d}) \geq 0 \), mathematically expressed as

\[
\max_{\mathbf{d}} \mathbf{f} \triangleq \mathbf{m}(\mathbf{d}) \triangleq \begin{bmatrix} m_1(\mathbf{d}) \\ \vdots \\ m_n(\mathbf{d}) \end{bmatrix} \quad \text{s.t.} \quad \mathbf{c}(\mathbf{d}) \geq 0 \implies \mathbf{d}^*, \mathbf{f}^* = \mathbf{m}(\mathbf{d}^*). \tag{2.12}
\]

In practice, it is rarely possible to find a set of design parameters that improves all performances simultaneously. Usually a trade-off arises among the performances, where one performance is improved at the cost of others. This leads to the concept of Pareto optimality [HM79, LTZ79]. A performance vector \( \mathbf{f_a} \) is considered better than a performance vector \( \mathbf{f_b} \) if \( \mathbf{f_a} \) dominates \( \mathbf{f_b} \), i.e.

\[
\mathbf{f_a} \succ \mathbf{f_b} \iff \forall i \in \{1, \ldots, n\} (f_{a,i} \geq f_{b,i}) \land \exists i \in \{1, \ldots, n\} (f_{a,i} > f_{b,i}). \tag{2.13}
\]

A Pareto-optimal point is a performance vector \( \mathbf{f}^* \) that is not dominated by any other feasible performance vector \( \mathbf{f} \). All Pareto-optimal points together form the Pareto-optimal front of the performances. The points on the Pareto-optimal front are also called efficient points. A 2D Pareto-optimal front for the two performances \( f_1 \) and \( f_2 \) is shown in Fig. 2.8. The shaded area shows the feasible performance space \( \mathcal{F} \).
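
The dominance relation of Eq. (2.13) and the resulting front can be illustrated on a finite set of sampled performance vectors. The following Python sketch assumes maximization of every performance, as in the text; the function names and the sample data are hypothetical, and filtering a finite sample only approximates the true front of \( \mathcal{F} \).

```python
def dominates(fa, fb):
    """Eq. (2.13): fa is at least as good in every entry and strictly better in one."""
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))


def pareto_front(points):
    """Keep the points that no other point in the sample dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


samples = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(samples)
```

Here `(2, 2)`, `(1, 1)` and `(3, 1)` are all dominated by `(3, 3)`, so only the first three samples remain on the front.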

Performance space exploration is computationally expensive. As a compromise between accuracy and time cost, a deterministic PSE method, the Normal-Boundary Intersection (NBI) method, is adopted in this thesis. The NBI method is suited to high-accuracy exploration of low-dimensional performance spaces.

2.2.3.1 Normal-Boundary Intersection

The Normal-Boundary Intersection method in [SGA03] is a two-step process. In the first step, the individual maxima (IM) \( f_i^* \), at which the individual performance \( f_i \) attains its global maximum, are determined by maximizing the following objective function \( o_{\text{IM},i} \):

\[
\max_{\mathbf{d}} \left[ o_{\text{IM},i} \triangleq m_i(\mathbf{d}) \right] \quad \text{s.t.} \quad \mathbf{c}(\mathbf{d}) \geq 0, \quad i = 1, \ldots, n_f \implies \mathbf{d}^*, f_i^* = m_i(\mathbf{d}^*). \tag{2.14}
\]

The IMs form the columns of the matrix \( \mathbf{F} \):

\[
\mathbf{F} = \begin{bmatrix} f_1^* & \cdots & f_{n_f}^* \end{bmatrix}. \tag{2.15}
\]

\( \dagger \) Optimization is formulated as maximization. Minimization can be included by maximizing the negative values.

It is easy to see that these global maxima are the boundary points of the Pareto-optimal front, and that the efficient points are located between the IMs. Consider a polyhedron $H$ whose corner points are the IMs. The key idea of NBI is to search for efficient points by starting from points on $H$ and moving along lines $n$ that are perpendicular to $H$ and point away from the origin. If the respective portion of $\partial F$ is convex, the search results in Pareto-optimal points. Moreover, when the starting points are evenly distributed over $H$, the efficient points found are also well distributed over the Pareto-optimal front. In fact, the search direction need not be exactly perpendicular to $H$. The quasi-normal vector $n$ is calculated by

$$n = f_1^* + \cdots + f_{n_f}^*.$$ (2.16)

The efficient points are the best performance set of the circuit under satisfying the sizing rules:

$$\max_{d, \lambda} \lambda \quad s.t. \quad F \cdot w + \lambda \cdot n = f(d) \land c(d) \geq 0,$$ (2.17)

where $w$ is the set of weights $w = [w_1 w_2 \cdots w_J]^T$ and

$$\sum_{j=1}^{J} w_j = 1 \quad \text{and} \quad w_j > 0 \quad \text{for} \quad j \in \{1, \ldots, J\}.$$ (2.18)

Fig. 2.8 illustrates the method for $n_f = 2$ and $J = 3$. In the 2D case, the points $F \cdot w$ lie on the line connecting the individual maxima.
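
The geometry of Eqs. (2.16) and (2.17) can be illustrated on a toy problem. The following Python sketch is not the implementation of [SGA03]: it assumes that the feasible performance space is a quarter disc of radius 1, so that feasibility can be tested directly in the performance space, and it replaces the constrained optimization over $d$ and $\lambda$ by a bisection on $\lambda$ along the ray $F \cdot w + \lambda \cdot n$. All names are hypothetical.

```python
import math


def in_feasible_space(f):
    """Toy membership test (assumption): quarter disc of radius 1 in the (f1, f2) plane."""
    return f[0] >= 0 and f[1] >= 0 and f[0] ** 2 + f[1] ** 2 <= 1.0


# Individual maxima of the toy problem, stored as the columns of F (Eq. 2.15).
F_COLS = [(1.0, 0.0), (0.0, 1.0)]
# Quasi-normal vector as the sum of the individual maxima (Eq. 2.16).
N = tuple(sum(col[i] for col in F_COLS) for i in range(2))


def nbi_point(w, lam_hi=2.0, tol=1e-9):
    """Largest lambda for which F*w + lambda*n is still feasible, found by bisection."""
    start = tuple(sum(wj * col[i] for wj, col in zip(w, F_COLS)) for i in range(2))
    lo, hi = 0.0, lam_hi  # start point is feasible, lam_hi is outside the disc
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        cand = (start[0] + mid * N[0], start[1] + mid * N[1])
        if in_feasible_space(cand):
            lo = mid
        else:
            hi = mid
    return (start[0] + lo * N[0], start[1] + lo * N[1])


weights = [(0.25, 0.75), (0.5, 0.5), (0.75, 0.25)]
front = [nbi_point(w) for w in weights]
```

Because the toy space is convex, feasibility along each ray changes only once, which is what makes the bisection valid; every returned point lies on the unit circle, i.e. on the Pareto-optimal front of the toy problem. In a real flow, each evaluation would require circuit simulations and the sizing rules $c(d) \geq 0$.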

## 2.3 Summary

In this chapter, two automatic design methods for analog design have been described, i.e. the top-down automatic sizing process and the bottom-up performance space exploration.
For the automatic sizing process, sizing rules are used to prevent the automatically sized circuits from yielding meaningless results. The design flow of the simulation-based automatic sizing method has been discussed. Since the performances are evaluated by circuit-level simulation, this method is currently feasible only for small analog circuits.

For performance space exploration, [SGA03, SG03, Her03, SGA04, MSGS05] focus on how to perform PSE efficiently and accurately by means of statistical or deterministic, simulation-based or equation-based methods. A normal-boundary intersection method is applied in this thesis to generate Pareto-optimal fronts.

The next chapter describes how both analog design methods are embedded into the proposed hierarchical optimization process.

Chapter 3

Proposed Hierarchical Optimization Methodology

For a large-scale analog/mixed-signal circuit, a comprehensive hierarchical optimization methodology is proposed in this thesis. Both analog design methods discussed in Chapter 2, i.e. automatic sizing process and performance space exploration, are well integrated in this methodology. The details of the design flow are described in this chapter.

First of all, let us discuss why a hierarchical optimization process is needed and why a flat simulation-based optimization method is not feasible for a large-scale analog/mixed-signal circuit. There are two main reasons:

- **Time cost:** Circuit-level simulation of a large-scale analog/mixed-signal circuit with a numerical simulator is extremely time-consuming. Higher simulation accuracy is achieved by setting a smaller step size in the numerical simulator. Although the accuracy requirements (i.e. the step sizes in the simulation setup) may differ from block to block, as long as all blocks are simulated together, the step size must be small enough to evaluate the whole circuit accurately. Hence, the common step size is bounded by the smallest of the individual step sizes. This is why a single transient simulation (large-signal analysis) of a phase-locked loop can last hours or days. In the example in Fig 3.1, the states (voltages/currents) of the blocks CP and PFD need to be updated only every 20ps and 30ps, respectively. However, the common step size of the simulation must not exceed 8ps because the block VCO has to be updated every 8ps. Consequently, many computations on the blocks PFD and CP are redundant, which increases the computing cost and the simulation time.

- **Optimization complexity:** The size of the design space and the optimization complexity increase exponentially with the number of design parameters. If all blocks are optimized simultaneously, the optimization complexity becomes extremely high due to the large number of design parameters.
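
The step-size argument in the time-cost item can be made concrete with the numbers from the PLL example. The short Python calculation below assumes that each block is evaluated once per common time step, which is a simplification of what a numerical simulator actually does; the variable names are illustrative.

```python
# Required update intervals of the blocks in ps, taken from the example in the text.
step_ps = {"PFD": 30.0, "CP": 20.0, "VCO": 8.0}

# In a flat simulation, all blocks share the smallest required step size.
common_step_ps = min(step_ps.values())

# Factor by which each block is evaluated more often than it would need on its own.
overhead = {block: step / common_step_ps for block, step in step_ps.items()}
```

Under this assumption, the PFD is evaluated 3.75 times and the CP 2.5 times as often as necessary, while only the VCO runs at its own required rate.
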

![Diagram of CP, VCO, and PFD with τ common ≤ min (τ₁, τ₂, τ₃)]

Figure 3.1: Step sizes in numerical simulation

3.1 Hierarchical Top-Down Circuit Sizing

The traditional bottom-up sizing method is not efficient for the design of a complex analog/mixed-signal circuit. One of the most common problems is the failure to meet the initial system requirements, which is usually caused by loosely defined specifications for the building blocks. A hierarchical top-down sizing method is adopted to address this issue. The basic idea is that the whole system is partitioned into subsystems, which are further decomposed into smaller function blocks. Based on the initial requirements on the whole system, the specifications of each subsystem are first derived by the high-level design. Then, each function block is built to fulfill the subsystem requirements. Finally, all parts of the system are integrated and verified with respect to the original requirements.

The definition of each hierarchical level depends closely on the system to be designed and its complexity. The number of hierarchical levels could be three or even more. In this thesis, only two design levels are considered. Fig 3.2 shows a two-level hierarchical decomposition of an analog/mixed-signal system, i.e., system level and circuit level. At the top of the hierarchy, the entire system performance can be expressed as

\[
f = [f_1, f_2, f_3, \ldots]^T. \tag{3.1}
\]

**System-Level Parameters**  The entire system is partitioned into building blocks \( \alpha = A, B, \ldots \). The performances of building block \( \alpha \) that affect the system performances are denoted by \( p_\alpha \). The complete vector of system-level parameters can be expressed as

\[
p = [p^T_A, p^T_B, \ldots]^T. \tag{3.2}
\]

**System-Level Simulation**  The system-level simulation provides a mapping of the system-level parameters onto the system performances: \( p \mapsto f \). The simulation on system level runs on behavioral models, which describe the circuit function in dedicated modeling languages instead of detailed device models. Normally, simulation based on behavioral models is very fast, so the simulation-based automatic sizing method is feasible for system-level design.

**Automatic Sizing on System Level**  System performances are optimized subject to system-level constraints by sizing the system-level parameters \( p \):

\[
\max_{p} f \quad \text{s.t.} \quad c(p) \geq 0 \implies p_{\text{opt}}, \quad f_{\text{opt}} = m(p_{\text{opt}}), \tag{3.3}
\]

with

\[
c(p) = \{c_s(p) \geq 0 \land c_{p\alpha}(p_{\alpha}) \geq 0, \alpha = A, B, \ldots \}. \tag{3.4}
\]

The system-level constraint \( c(p) \) covers two aspects: 1) the design constraints of the system itself, \( c_s(p) \geq 0 \), e.g. a system stability criterion; 2) the performance capabilities of the functional blocks, \( c_{p\alpha}(p_{\alpha}) \geq 0 \). A pure top-down refinement of specifications may easily produce overambitious block specifications if the performance capabilities of the underlying analog circuit implementations are not taken into account. Consequently, low-level physical effects have to be propagated bottom-up, and system-level sizing requires a description of the performance capabilities of each functional block.

**Circuit-Level Parameters**  At the circuit level, the design parameters of building block \( \alpha \) are quantities such as transistor dimensions, resistances, capacitances, inductances and bias voltages/currents, expressed as

\[
x_{\alpha} = [W_1, L_1, W_2, L_2, \ldots, C, R, I_{\text{bias}}, \ldots]^T. \tag{3.5}
\]

**Circuit-Level Simulation**  Circuit simulation provides a mapping of the circuit-level parameters onto the circuit-level performances, i.e. the system-level parameters: \( x_{\alpha} \mapsto p_{\alpha}, \alpha = A, B, \ldots, Z \). Normally, the simulation of each individual building block is fast, so the simulation-based automatic sizing method is feasible for the circuit-level design of each building block.

**Automatic Sizing on Circuit Level**  The optimized values of the system-level parameters, i.e. \( p_{\text{opt}} \), are propagated as performance specifications to the circuit level. These specifications are to be met by the final circuit-level result \( x_{\text{opt}} \). The automatic sizing process for building block \( \alpha \) can be formulated as

\[
\min_{x_{\alpha}} \|m(x_{\alpha}) - p_{\text{opt}}\| \quad \text{s.t.} \quad c_{x_{\alpha}}(x_{\alpha}) \geq 0, \quad \alpha = A, B, \ldots \implies x_{\text{opt}}, \quad p_{\text{opt}} = m(x_{\text{opt}}). \tag{3.6}
\]

Here, \( c_{x_{\alpha}}(x_{\alpha}) \geq 0 \) denotes the circuit-level sizing constraints. The automatic sizing processes for all building blocks can be executed in parallel.
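
The circuit-level problem of Eq. (3.6) can be sketched for a single building block as follows. This Python fragment is only an illustration: the block model `m_block`, the sizing rules `c_block`, the parameter grid and the target values are hypothetical, and an exhaustive grid search with an unnormalized Euclidean norm stands in for the actual optimizer and cost function.

```python
import math
from itertools import product


def m_block(x):
    """Hypothetical block model: x = (W, L) -> p = (W/L, W*L)."""
    w, l = x
    return (w / l, w * l)


def c_block(x):
    """Hypothetical circuit-level sizing rules, satisfied when all entries are >= 0."""
    w, l = x
    return [w - 1.0, l - 0.2]


def size_block(m, c, p_opt, grid):
    """Return the feasible grid point that minimizes ||m(x) - p_opt||, and its error."""
    best_x, best_err = None, float("inf")
    for x in product(*grid):
        if any(ci < 0 for ci in c(x)):
            continue  # a sizing rule is violated, skip this point
        err = math.dist(m(x), p_opt)
        if err < best_err:
            best_x, best_err = x, err
    return best_x, best_err


grid = [(2.0, 4.0, 8.0), (0.1, 0.5, 1.0)]  # candidate values for W and L
x_opt, err = size_block(m_block, c_block, (8.0, 2.0), grid)
```

Since the blocks are independent once \( p_{\text{opt}} \) is fixed, one such call per building block could run in parallel.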

In the hierarchical sizing process, we start with system specifications, follow the top-down propagation of the specifications, and end up with a fully sized circuit implementation on circuit level. Since the performance capabilities of the underlying analog physical implementations are already taken into account, a first-time-successful top-down sizing process can be realized consequently.

Figure 3.2: Hierarchical sizing of a large-scale analog/mixed-signal circuit

3.2 Pareto-Optimal Front in Hierarchical Optimization

Performance space exploration (PSE) transforms technological constraints bottom-up into feasible performance spaces. Therefore, the feasible region of the system-level parameters $p_\alpha$, i.e. $c_{p\alpha}(p_\alpha) \geq 0$, can be determined by applying a PSE method to block $\alpha$. Since PSE is computationally expensive, is it really necessary to know the entire feasible region of the circuit performances for the optimization at system level? Most current PSE methods find the Pareto-optimal front instead of the entire feasible region. In practice, the Pareto-optimal front is sufficient for the system-level optimization.

In analog circuits, a circuit performance normally depends monotonically on a design parameter. For example, in Fig. 1.3(a), the DC gain increases with the width of the input NMOS transistors, and the slew rate increases with increasing bias current or decreasing compensation capacitance. The performances of the PLLs in Chapter 4 and of the A/D converters in Chapter 5 depend monotonically on their design parameters. To simplify the analysis, consider a two-dimensional design space $x = (x_1, x_2)$ and a maximization problem with a two-dimensional objective space $f = (f_1(x_1, x_2), f_2(x_1, x_2))$. Suppose that a dominated point $(x_1^\diamond, x_2^\diamond)$ in the design parameter space mapped to an efficient point on the Pareto-optimal front in the objective space. This assumption contradicts the definition of the Pareto-optimal front. For the point $(x_1^\diamond, x_2^\diamond)$, there exists a point $(x_1^\ast, x_2^\ast)$ whose $x_1^\ast$ dominates $x_1^\diamond$, which results in

$$x_1^\ast > x_1^\diamond \quad \Rightarrow \quad f_1(x_1^\ast, x_2^\ast) > f_1(x_1^\diamond, x_2^\diamond) \quad \land \quad f_2(x_1^\ast, x_2^\ast) > f_2(x_1^\diamond, x_2^\diamond)$$ (3.7)

Alternatively, there exists a point $(x_1^\diamond, x_2^\ast)$ whose $x_2^\ast$ dominates $x_2^\diamond$, which results in

$$x_2^\ast > x_2^\diamond \quad \Rightarrow \quad f_1(x_1^\ast, x_2^\ast) > f_1(x_1^\diamond, x_2^\diamond) \quad \land \quad f_2(x_1^\ast, x_2^\ast) > f_2(x_1^\diamond, x_2^\diamond)$$ (3.8)

Therefore, for a dominated design point $x^\diamond$ and the corresponding performance $f^\diamond$, there always exists a dominating point $x^*$ with dominating performances $f^*$. In the hierarchical design, the circuit-level performances are used as design variables at the system level. Hence, an optimal system-level performance should be generated by design points lying on the Pareto-optimal front of the circuit level. This agrees with the conclusion in [EMG05]: a lower-level solution moving towards its Pareto-optimal front should result in a movement of its corresponding higher-level solution towards the higher-level Pareto-optimal front, as visualized in Fig. 3.3. Accordingly, the system-level parameters $p_\alpha$ can be restricted to the Pareto-optimal front of block $\alpha$ during the system-level optimization.
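
One possible way to restrict a system-level parameter vector to a sampled Pareto-optimal front is to parametrize the front by a scalar and let the system-level optimizer vary only that scalar. The following Python sketch is an assumption for illustration, not the embedding used in this thesis: it interpolates linearly between neighboring front points of a two-performance block, so points between the samples only approximate the true front. The function name and the sample front are hypothetical.

```python
def front_point(front, t):
    """Map t in [0, 1] onto the piecewise-linear curve through the sorted front points.

    Requires at least two front points.
    """
    pts = sorted(front)
    t = min(max(t, 0.0), 1.0)          # clamp t to the valid range
    pos = t * (len(pts) - 1)           # position measured in segments
    k = min(int(pos), len(pts) - 2)    # index of the segment's left end point
    frac = pos - k
    a, b = pts[k], pts[k + 1]
    return tuple(ai + frac * (bi - ai) for ai, bi in zip(a, b))


block_front = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0)]
p_mid = front_point(block_front, 0.25)
```

With this parametrization, the two system-level parameters of the block are replaced by the single optimization variable `t`, so every candidate the optimizer proposes lies on the approximated front.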

![Figure 3.3: Pareto-optimal fronts in hierarchical optimization](image)

### 3.3 Behavioral Modeling

Circuit-level simulation with a numerical simulator and the BSIM device model is accurate but computationally very expensive, which makes it impractical for the optimization process. As the number of design parameters and the complexity of analog/mixed-signal circuits continue to increase, circuit-level simulation is and will remain a critical issue for designers. As an alternative to circuit-level simulation, the more efficient behavior-level simulation is becoming popular and is gradually being accepted by analog designers. Behavior-level simulation uses behavioral models, "that reflect the terminal characteristics of functions realized by circuits rather than the circuit level (transistor-level) details of the circuit" [CPH94].

Behavioral models can be applied not only in the bottom-up verification process but also in the top-down design process. In the early design phase of a large-scale analog/mixed-signal circuit, it is more efficient and economical to use behavior-level simulations to check design concepts and to explore several system architectures than to undertake the detailed final circuit realization. In the final design phase, behavioral models can considerably reduce the verification time. This thesis focuses on the application of behavioral models in the top-down design process.

Should the behavioral model represent the electronic functions of the circuit exactly or approximately? A compromise has to be made between simulation cost and model accuracy. On the one hand, a behavioral model that describes the circuit functionality with relatively few components makes simulation much faster. On the other hand, a behavioral model that includes more secondary circuit effects represents the circuit behavior more accurately but makes simulation slower. To achieve a good balance between computational efficiency and accuracy, suitable CAD tools are required that work well with the behavioral models. The two most popular modeling methods in analog/mixed-signal design are briefly described in the following.

3.3.1 Modeling in Hardware Description Languages

Hardware description languages (HDLs) are programming languages designed to describe the behavior of physical devices and processes. Behavioral models written in the various HDLs each have their corresponding simulators. Two mixed-signal hardware description languages, VHDL-AMS [VHD] and Verilog-AMS [Vera], have been developed for high-level verification of large-scale analog/mixed-signal systems. The commercial tools used are the Advance-MS simulator from Mentor Graphics [Men] for VHDL-AMS and the AMS Simulator from Cadence Design Systems [Cada] for Verilog-AMS. As indicated by their names, both the mixed-signal simulators and the HDLs can be applied not only to digital circuits but to analog circuits as well. The simulators support pure circuit-level simulation, pure behavior-level simulation, and mixed-level simulation (some blocks are circuit realizations, while others are behavioral models). Digital and analog design share the same design environment and flow, which provides a straightforward way to use behavioral models and improves design efficiency. VHDL-AMS and Verilog-AMS describe complex continuous analog systems in the form of differential algebraic equations (DAEs), which enables numerical simulators to evaluate the behavioral models like the usual device models. Additionally, they inherit the event-driven capability of digital simulator engines and allow analog event-driven models for analog simulation [CB99].

A VHDL-AMS model consists of one entity and one or more architectures, as shown in Fig 3.4(a). The entity specifies the interface of the model to the outside circuits. It includes the description of the model ports (the input/output pins that connect to other models) and the definition of its generic parameters. Fig 3.4(b) shows a model of an ideal OP AMP with slew-rate limiting. There are two inputs (plus_in, minus_in) and one output (vout). VHDL-AMS introduces quantities to represent the unknown variables in the DAEs. The architecture contains the detailed implementation of the model, which can have three different styles: 1) The structural style describes the netlist connections of the elementary models. 2) The behavioral style describes the transformation from inputs to outputs by means of concurrent and/or simultaneous statements. 3) A mixed style combines structural and behavioral elements. Simultaneous statements describe continuous behavior by differential algebraic equations. Concurrent statements describe event-driven behavior, e.g. concurrent signal assignments and process statements as in digital modeling. More details on VHDL-AMS modeling can be found in [CB99, APT02].

* "It is important to recognize that the AMS languages are primarily for verification. Unlike the digital languages, the AMS languages will not be used for synthesis in the foreseeable future because the only synthesis that is available for analog circuit is very narrowly focused" [KHC+01].

![Diagram: (a) an entity (generic definition, port definition) with Architectures 1 to 3, where an architecture contains simultaneous statements and concurrent statements; (b) VHDL-AMS source listing]

Figure 3.4: (a) Basic structure of a VHDL-AMS model (b) A VHDL-AMS model of an ideal OP AMP with slew-rate limiting [APT02]

3.3.2 Modeling in Simulink

Simulink is a companion of MATLAB, developed by the MathWorks [Mat]. A user-friendly graphical interface is available for building models, which makes it easier to maintain an overview of the system and its subblocks. Simulink includes a comprehensive library of standard blocks, e.g. sinks, sources, linear and nonlinear components. Designers can create and customize their own blocks by assembling these standard blocks or by coding in MATLAB m-files. Moreover, Simulink has built-in ordinary differential equation (ODE) solvers, which are configured automatically at run time and support multi-rate models, i.e. different parts of a model are sampled or updated at different rates.

A behavioral model of an integrator in Simulink that includes its non-idealities is shown in Fig. 3.5 [MBF+03]. In the model, the limited slew rate and the limited bandwidth are modeled in the “GBW & SR” MATLAB function. The limited gain is modeled by adding only a fraction $\alpha$ of the previous integrator output to each new input sample. The limited output range is modeled simply by a saturation block inside the feedback loop of the integrator. As this example shows, the behavioral modules for the sub-building blocks are already available in Simulink’s library; the task of the designer is to construct more complex models from these fundamental cells.
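
Two of the non-idealities described above, the limited gain and the limited output range, can be written as a discrete-time behavioral model in a few lines. The Python sketch below is an assumption-laden illustration rather than the Simulink model of [MBF+03]: the function name and default values are hypothetical, and the slew-rate and bandwidth effects of the “GBW & SR” function are deliberately left out.

```python
def leaky_integrator(samples, alpha=0.999, v_max=1.0):
    """Discrete-time integrator with finite gain (leakage) and output saturation.

    alpha: fraction of the previous output that is kept (alpha = 1 is ideal).
    v_max: symmetric output limit, applied inside the feedback loop.
    """
    out, y = [], 0.0
    for u in samples:
        y = alpha * y + u              # only a fraction of the old output is added
        y = max(-v_max, min(v_max, y)) # saturation acts on the stored state
        out.append(y)
    return out


ideal_clipped = leaky_integrator([0.5, 0.5, 0.5], alpha=1.0, v_max=1.0)
leaky = leaky_integrator([1.0, 0.0, 0.0], alpha=0.5, v_max=2.0)
```

The first call shows the output clipping at the limit after two samples, the second shows how a stored value decays when the gain is finite.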

Because of its powerful capabilities and convenient user interface, Simulink is widely adopted for modeling tasks. However, Simulink is difficult to integrate into the currently popular analog/mixed-signal circuit design environments, e.g. Advance-MS [Men] and Virtuoso AMS [Cada]. Behavioral modeling in HDLs, in contrast, takes place in the same design environment as circuit design, so it is more convenient for designers to build models, design circuits and simulate them in a single design environment. Given the respective advantages and disadvantages of HDLs and Simulink, the choice should be made with respect to the task and the designers.

### 3.4 Proposed Hierarchical Optimization Flow

Based on the previously introduced hierarchical sizing flow, automatic sizing method, performance space exploration and behavioral modeling, a comprehensive hierarchical optimization methodology for large-scale analog/mixed-signal circuits is proposed here and shown in Fig. 3.6. It consists of four main steps:

1. Starting from the circuit-level realization of each building block, proper testbenches and the corresponding performance extraction processes are built first. These testbenches and performance extractions are used not only in the performance space exploration process (bottom-up phase) but also in the automatic sizing process on circuit level (top-down phase).

2. By applying the PSE method to each block, the individual Pareto-optimal fronts are obtained. At system level, the whole circuit is modeled in HDLs or Simulink. In addition to the description of the circuit function, the Pareto-optimal fronts are embedded in these behavioral models so that the system-level parameters are restricted to their respective Pareto-optimal fronts during the optimization process at system level.

3. Since the system performance can be evaluated quickly by simulating the behavioral models, a simulation-based automatic sizing process can generate optimized system-level parameters at acceptable time cost. The obtained values of the system-level parameters are directly propagated as specifications for the circuit-level performances.

4. Afterwards, each building block can be designed individually and in parallel. The simulation-based automatic sizing process of each block can be accomplished at acceptable time cost.

In summary, the proposed hierarchical optimization methodology is characterized by a bottom-up extraction process of circuit capability and a top-down hierarchical automatic sizing process.
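
Step 2 relies on the Pareto-optimal front of each block. As a minimal illustration of what such a front contains, the sketch below filters a set of sampled performance pairs down to the non-dominated ones. It is not the PSE method used in this thesis; the sample values, the function names and the assumption that both performances are to be maximized are purely illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b in both performances and better in one.

    Assumption for this sketch: both performances are to be maximized.
    """
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first performance."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Hypothetical block-level samples: (performance 1, performance 2)
samples = [(1.0, 5.0), (2.0, 4.0), (2.0, 3.0), (3.0, 1.0), (1.5, 4.5), (0.5, 0.5)]
front = pareto_front(samples)
print(front)
```

A system-level optimizer would then only be allowed to pick parameter combinations that lie on `front`, which is the restriction described in step 2.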

### 3.5 Hierarchical Optimization based on Worst-Case-Aware Pareto-Optimal Front

The Pareto-optimal fronts extracted by most current PSE methods [SGA03, SGA04, EMG05, MSGS05] are nominal Pareto-optimal fronts, which consider only design parameters. As device sizes shrink, the effect of variations in statistical parameters and operation parameters becomes more and more pronounced in analog design. Nominal Pareto-optimal fronts therefore cannot represent the actual capability of the circuit after fabrication. In general, a poor yield results if a nominal design point lies in the tail of the performance distribution. Therefore, a yield-aware Pareto-optimal front (whose points guarantee a fixed yield value) would be very useful for improving production yield.

In [TTR06], the nominal Pareto-optimal fronts are generated first and then combined with a Monte Carlo approximation to compute yield-aware Pareto-optimal fronts. To avoid the high computational cost of Monte Carlo analysis, this thesis applies worst-case analysis to extract a worst-case-aware Pareto-optimal front (whose design points guarantee a target yield value even under the worst-case operating condition).

**Worst-Case Analysis** The following discussion of worst-case analysis considers a single performance. Worst-case analysis takes into account fluctuations in the fabrication process and changes in the operating conditions. Statistical parameters and operation parameters are therefore varied in the worst-case analysis, whereas design parameters remain constant. In consequence,

\[
    \mathbf{d}^* = \text{const} \quad \text{and} \quad \mathbf{c}(\mathbf{d}^*) = \text{const} \quad (3.9)
\]

holds throughout the worst-case analysis.

Worst-case analysis (WCA) [AGC94] finds the lowest performance value that is obtained for a given design parameter set \( \mathbf{d}^* \), a given tolerance region \( T_\theta \) of operation parameters and a given tolerance region \( T_s \) of statistical parameters:

\[
    \min_{\mathbf{s}, \theta} f(\mathbf{d}^*, \mathbf{s}, \theta) \quad \text{s.t.} \quad \mathbf{s} \in T_s, \theta \in T_\theta, \quad (3.10)
\]

where

\[
    T_\theta = \{ \theta | \theta_L \leq \theta \leq \theta_U \} \quad (3.11)
\]

\[
    T_s = \{ \mathbf{s} | \|\mathbf{s}\|_C^2 = (\mathbf{s} - \mathbf{s}_0)^T \mathbf{C}^{-1} (\mathbf{s} - \mathbf{s}_0) \leq \beta_w^2 \} \quad (3.12)
\]

\( \theta_L, \theta_U \) define the lower and upper bounds of the operation parameters. The circuit robustness \( \beta_w \) describes a weighted distance from the nominal point \( \mathbf{s}_0 \) in the space of the statistical parameters \( \mathbf{s} \). \( \|\mathbf{s}\|_C^2 \leq \beta_w^2 \) represents a tolerance ellipsoid according to the statistical parameter distribution, where \( \mathbf{C} \) is the covariance matrix of the statistical parameters. There always exists a matrix \( \mathbf{G} \) such that the transformed statistical parameter \( \mathbf{G}(\mathbf{s} - \mathbf{s}_0) \) follows the distribution \( N(0, 1) \) [Pap91], as depicted in Fig. 3.7 and given by

\[
    \|\mathbf{s} - \mathbf{s}_0\|_G = \|\mathbf{G}(\mathbf{s} - \mathbf{s}_0)\| = \sqrt{(\mathbf{s} - \mathbf{s}_0)^T \mathbf{G}^T \mathbf{G}(\mathbf{s} - \mathbf{s}_0)} \quad (3.13)
\]

\[
    \mathbf{C}^{-1} = \mathbf{G}^T \mathbf{G} \quad (3.14)
\]

Figure 3.7: Transformation from $s \sim N(s_0, C)$ into $s \sim N(0, 1)$
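
One way to obtain a matrix $\mathbf{G}$ with $\mathbf{C}^{-1} = \mathbf{G}^T\mathbf{G}$ (Equation 3.14) is to invert the Cholesky factor of $\mathbf{C}$. The following sketch checks this for a two-parameter case in plain Python. The covariance values and all helper names are hypothetical, and the Cholesky route is only one possible choice of $\mathbf{G}$, not necessarily the one used in [Pap91].

```python
import math


def cholesky_2x2(c):
    """Lower-triangular L with L L^T = C for a symmetric positive-definite 2x2 C."""
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    l22 = math.sqrt(c[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def inverse_lower_2x2(l):
    """Inverse of a lower-triangular 2x2 matrix."""
    return [[1.0 / l[0][0], 0.0],
            [-l[1][0] / (l[0][0] * l[1][1]), 1.0 / l[1][1]]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    """Transpose of a 2x2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]


# Hypothetical covariance matrix of two correlated statistical parameters
C = [[4.0, 1.2], [1.2, 1.0]]
G = inverse_lower_2x2(cholesky_2x2(C))

# G C G^T is the covariance of G (s - s0); it should be the identity matrix
whitened_cov = matmul(matmul(G, C), transpose(G))
print(whitened_cov)
```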

Without loss of generality, the statistical parameter $s$ is assumed to be distributed as

$$s \sim N(0, 1), \hspace{1cm} (3.15)$$

so that its statistical distribution is depicted as a unit circle instead of an ellipsoid throughout this thesis.

Let $s_w$ denote the unique worst-case parameter set and $f_w$ the corresponding performance value at $s_w$:

$$f_w = m(s_w). \hspace{1cm} (3.16)$$

Then $f_w$ is the smallest performance value attained over all $s$ in the volume $\|s\|^2 \leq \beta_w^2$. The definition of worst-case analysis can additionally be interpreted as follows, as visualized in Fig. 3.8:

- "finding the greatest lower bound of the performance that violates the specified circuit robustness $\beta_w$ for all operation conditions,
- choosing a lower specification that results in at least the given robustness $\beta_w$ under worst-case conditions,
- picking the smallest performance value in the given sphere $\|s\|^2 = \beta_w^2$" [Sch04].
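
For intuition, the third interpretation has a closed-form solution when the performance is approximated by a linear model $f(s) \approx f_0 + g^T s$ around the nominal point: the minimum over the sphere $\|s\| \leq \beta_w$ lies at $s_w = -\beta_w\, g/\|g\|$. The sketch below evaluates this special case. The linear model, the numbers and the function name are assumptions made for illustration only; the general problem of Equation 3.10 is nonlinear and also covers the operation parameters.

```python
import math


def worst_case_linear(f0, grad, beta_w):
    """Worst-case point and value of f(s) = f0 + grad . s over ||s|| <= beta_w.

    Assumes a linearized performance and s ~ N(0, 1); operation parameters
    are ignored in this sketch.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad), f0
    s_w = [-beta_w * g / norm for g in grad]
    f_w = f0 + sum(g * s for g, s in zip(grad, s_w))
    return s_w, f_w


# Hypothetical nominal value and sensitivities
s_w, f_w = worst_case_linear(f0=60.0, grad=[3.0, 4.0], beta_w=3.0)
print(s_w, f_w)
```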

The robustness measure $\beta_w$ is also called the worst-case distance. It is the positive distance between the nominal parameter set $s_0$ (origin in Fig. 3.8) and the worst-case parameter set $s_w$ if the specification is fulfilled for all operation conditions at the nominal parameter set. Otherwise, it is the negative distance between the nominal parameter set and the worst-case parameter set $s_w$. The worst-case distance is related to the robustness of a specification with respect to disturbances in fabrication and operation. According to [Gra93], the production yield can be derived from the worst-case distance. The larger the distance between the worst-case point and the nominal point, the fewer fabricated circuits lie beyond the worst-case point and violate the specification. The yield therefore increases with the worst-case distance; the relationship is listed in Tab. 3.1.

<table>
<thead>
<tr>
<th>$\beta_w$</th>
<th>-4</th>
<th>-3</th>
<th>-2</th>
<th>-1</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
</tr>
</thead>
<tbody>
<tr>
<td>Yield [%]</td>
<td>0.01</td>
<td>0.13</td>
<td>2.28</td>
<td>15.87</td>
<td>50.00</td>
<td>84.13</td>
<td>97.73</td>
<td>99.87</td>
<td>99.99</td>
</tr>
</tbody>
</table>

Table 3.1: Yield estimation by worst-case distance

According to a target yield value, the worst-case distance can be selected from Tab. 3.1. For instance, $\beta_w = 3$ corresponds to a yield of 99.87% that can be guaranteed for each performance. Then, the worst-case parameter set $s_w$ is calculated through Equation 3.10, and the corresponding worst-case performance value $f_w$ is evaluated. By applying the worst-case analysis (WCA) to each efficient point of the nominal Pareto-optimal front, the corresponding efficient points of the worst-case-aware Pareto-optimal front are obtained. Fig. 3.9 illustrates the extraction process for a worst-case-aware Pareto-optimal front. Two individual WCAs, one for $f_1$ and one for $f_2$, are performed at each efficient nominal point. The respective worst-case values of $f_1$ and $f_2$ together form an efficient point of the worst-case-aware Pareto-optimal front, although both values do not occur at the same time in practice. When the worst-case-aware Pareto-optimal front is embedded into the behavioral models instead of the nominal one in the proposed hierarchical optimization methodology, the optimization results represent the actual circuit performances with a target yield after fabrication.
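
The entries of Tab. 3.1 are consistent with evaluating the standard normal cumulative distribution function at $\beta_w$. The short sketch below makes that assumption explicit and compares the result with the tabulated values; the function name is hypothetical, and the agreement is only to within about 0.01 percentage points because of rounding in the table.

```python
import math


def yield_from_worst_case_distance(beta_w: float) -> float:
    """Yield estimate in percent: standard normal CDF evaluated at beta_w."""
    return 100.0 * 0.5 * (1.0 + math.erf(beta_w / math.sqrt(2.0)))


# Values as listed in Tab. 3.1
table_3_1 = {-4: 0.01, -3: 0.13, -2: 2.28, -1: 15.87, 0: 50.00,
             1: 84.13, 2: 97.73, 3: 99.87, 4: 99.99}

for beta_w, tabulated in table_3_1.items():
    estimate = yield_from_worst_case_distance(beta_w)
    print(beta_w, tabulated, round(estimate, 3))
```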

## 4 Hierarchical Optimization of Charge-Pump Phase-Locked Loops

Phase-locked loops (PLLs) provide well-timed on-chip clock signals for various applications in areas such as communications, wireless systems, dynamic random access memory and disk drive electronics. While the concept of phase locking has remained almost unchanged since its invention in the 1930s, the design and implementation of PLLs have continually evolved with the progress of ICs. The requirements on PLLs have become more and more stringent, e.g. accurate clock timing, low power consumption, small area and robust noise rejection. Due to the circuit complexity of PLLs, their optimization is a big challenge even for experienced analog designers. Currently, there are two strategies for designing high-performance PLLs. One is to devise new circuit topologies heuristically [Man96, Man03, WTHN04, CA05, KLK05]. The other is to tune the design parameters of a given circuit topology [CPH03, TVRM04, ZMG05, TTR06, ZMGS06]. In this thesis, the second strategy is adopted. Using the hierarchical optimization methodology proposed in Chapter 3, an efficient design process for PLLs can be realized at acceptable time cost.

In this chapter, Sec. 4.1 introduces the fundamentals of PLLs. Sec. 4.2 reviews analysis methods for PLL systems, and Sec. 4.3 lists some performances of PLLs and summarizes the major design trade-offs. Sec. 4.4 shows the details of the proposed hierarchical optimization method applied to a charge-pump phase-locked loop (CPPLL). Sec. 4.5 gives a comprehensive performance space exploration (PSE) of a whole CPPLL system. Finally, Sec. 4.6 concludes.

### 4.1 CPPLL Fundamentals

### 4.1.1 PLL Introduction

The primary function of a phase-locked loop is to generate an output clock whose phase is locked to the phase of the input reference clock. In contrast to conventional feedback circuits, which operate on voltage/current amplitudes or their rates of change, PLL feedback systems operate on signal phases. The operation of PLLs is briefly described as follows. More details can be found in [Raz96, Raz01, Bes03].

The basic building blocks of a typical PLL are a phase detector (PD), a low-pass loop filter (LPF), a voltage-controlled oscillator (VCO) and a frequency divider (D), as shown in Fig. 4.1. In the forward path, the PD produces an error output signal $\Delta \phi$ based on the difference between the phase of the feedback clock $\phi_{fb}$ and the phase of the reference signal $\phi_{ref}$. A small frequency difference accumulates over time and results in an increasing phase error. The error signal $\Delta \phi$ is filtered by the LPF and converted to a control voltage/current ($V/I$) that drives the subsequent VCO. According to the amplitude of the voltage/current, the VCO generates an output clock signal with the corresponding frequency. In the feedback path, the output frequency of the oscillation signal is scaled down by a factor $N$ through the frequency divider. The scaled-down signal is compared with the reference input signal at the PD. Through the control loop, the output frequency of the VCO is adjusted until both phases $\phi_{ref}$ and $\phi_{out}$ align with each other. PLLs pass through two operating states during the lock process. While $f_{out}$ approaches the desired output frequency asymptotically, the PLL is in the nonlinear acquisition process. Once the frequency and the phase of the output clock are synchronized with the reference clock, i.e. the output frequency is a multiple of the reference frequency, $f_{out} = N f_{ref}$, the output phase is locked to the input phase. This is called the linear lock-in state.

**Figure 4.1:** A block diagram for a typical PLL

### 4.1.2 CPPLL Building Blocks

Charge-pump phase-locked loops (CPPLLs) are used in most PLL systems, since they provide a theoretically zero static phase error (offset), an extended tracking range and frequency-aided acquisition. A typical 3rd-order CPPLL is one of the simplest and most effective circuit topologies. It includes a phase frequency detector, a charge pump, a 2nd-order passive RC filter, a voltage-controlled oscillator and a frequency divider; its block diagram is shown in Fig. 4.2. Each building block is briefly described below.

#### 4.1.2.1 Phase Frequency Detector (PFD)

A PD can only detect the phase difference between two input signals and produce an error signal that is proportional to the phase difference. However, the PD is insensitive to the frequency difference of the input signals. When the frequency of the feedback signal is far from the frequency of the reference signal, the PLL may lock at a wrong frequency, e.g. \( f_{out} = 0.5 \times N f_{ref} \). The problem is due to the inadequate acquisition range of the PD. A phase frequency detector (PFD) is used to tackle the problem. A PFD detects not only the phase difference but also the frequency difference, so that the PLL can lock under any condition as long as the PLL system itself is stable. Therefore, the PFD is preferred over other phase detectors such as multiplier PDs or JK flip-flops.

A typical PFD has three logical states, as shown in Fig. 4.3(a). Initially, the PFD is in the “ground” state, i.e. \( state = 0 \), where \( up = dn = 0 \). If a rising transition of \( ref \) arrives earlier than one of \( fb \), the PFD enters the “charging” state, i.e. \( state = 1 \), where \( up = 1 \) and \( dn = 0 \). The PFD remains in this state until a rising transition occurs on \( fb \) earlier than on \(ref\), which returns the PFD to “ground”. If the next rising edge of \(fb\) is still ahead of \(ref\), the PFD jumps from “ground” to “discharging”, i.e. \(state = -1\), where \(up = 0\) and \(dn = 1\). Fig. 4.3(b) shows the switching of the signals \(up\) and \(dn\) according to the rising transitions of both input signals \(ref\) and \(fb\). Note that \(up\) and \(dn\) cannot be active at the same time.
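
The three-state behavior described above can be written as a small state machine. The following sketch is an idealized model in which only the order of the rising edges matters; the function names and the edge sequence are hypothetical, and reset delays of a real PFD are ignored.

```python
def pfd_step(state: int, edge: str) -> int:
    """Next PFD state after a rising edge on 'ref' or 'fb'.

    States: 1 = charging, 0 = ground, -1 = discharging.
    """
    if edge == "ref":
        return min(state + 1, 1)
    if edge == "fb":
        return max(state - 1, -1)
    raise ValueError("edge must be 'ref' or 'fb'")


def pfd_outputs(state: int):
    """Return (up, dn) for a given state; both are never 1 at the same time."""
    return (1 if state == 1 else 0, 1 if state == -1 else 0)


def run_pfd(edges):
    """Apply a sequence of rising edges, starting from the ground state."""
    state = 0
    trace = []
    for edge in edges:
        state = pfd_step(state, edge)
        trace.append((edge, state) + pfd_outputs(state))
    return trace


# Hypothetical edge order: ref leads first, then fb leads twice, then ref leads
for row in run_pfd(["ref", "fb", "fb", "ref", "ref", "ref"]):
    print(row)
```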

#### 4.1.2.2 Charge Pump and Loop Filter (CP & LF)

As shown in Fig. 4.2, a charge pump (CP) consists of two current sources: one sourcing current and one sinking current. In the ideal case, the amplitudes of both currents are identical. The CP charges and discharges the subsequent loop filter according to the control signals from the PFD. When signal \(up\) (\(dn\)) is active (=1), the current flows into (out of) the loop filter. In consequence, the LF’s output voltage is pulled up (down).

The loop filter in a PLL is usually a passive low-pass filter consisting of a resistor \(R\) in series with a capacitor \(C_1\). Through the combined operation of the CP and the LF, the discrete digital pulse signal from the PFD is first converted to a current pulse and then to a continuous voltage signal, which determines the output frequency of the subsequent VCO block. Since the voltage signal directly modulates the VCO frequency, any dithering of the voltage introduces excessive jitter on the VCO output signal. The voltage across a capacitor cannot change instantaneously, so a step in the charge pump current appears entirely across the resistor. Because the source current and the sink current in the CP are frequently switched on and off, a voltage ripple of value \(I_{CP} \cdot R\) occurs on the output node of the loop filter at the rising edge of each PFD pulse, while another ripple of the same value but in the opposite direction occurs at the falling edge of the PFD pulse. In order to suppress the undesired voltage ripples, another capacitor \(C_2\) is placed in parallel with the \(R\) and \(C_1\) network, as shown in Fig. 4.2.

In the PLL phase-controlled feedback system, a 1st-order passive RC filter introduces one pole and one zero. The VCO is a phase integrator, which contributes one intrinsic pole. Hence the whole PLL feedback system has two poles and one zero. Such a 2nd-order PLL system is unconditionally stable. Since the additional capacitor \(C_2\) introduces another pole, the order of the PLL system increases from two to three. The degradation of system stability due to the additional pole has to be taken into account, which is discussed further in Section 4.2.

#### 4.1.2.3 Voltage-Controlled Oscillator (VCO)

The phase of a PLL is adjusted by tuning the frequency of the PLL’s output signal, i.e. the output frequency of the voltage-controlled oscillator (VCO). VCOs can be realized in various technologies, e.g. as CMOS ring oscillators or LC oscillators. An LC-VCO has better phase noise (jitter) performance for a given power drain [HL99] and can achieve very high frequencies. CMOS ring oscillators, however, are suitable for monolithic integration and consume less power than LC-VCOs.

A five-stage single-ended CMOS ring oscillator is shown in Fig. 4.4. In this kind of VCO, the input voltage \(V_{\text{ref}}\) controls the current through the delay elements, and thus determines the delay time of each stage and ultimately the output oscillation frequency. An ideal VCO generates a periodic signal whose frequency is a linear and limited function of the controlling voltage, as shown in Fig. 4.5. The output frequency $f_{\text{out}}$ can be expressed as

$$f_{\text{out}} = (V_{\text{ctrl}} - V_{\text{min}}) \cdot K_{\text{VCO}} + f_{\text{min}},$$  \hspace{1cm} (4.1)

where $K_{\text{VCO}}$ is defined as the linear gain of VCO, i.e.

$$K_{\text{VCO}} = \frac{f_{\text{max}} - f_{\text{min}}}{V_{\text{max}} - V_{\text{min}}}. \hspace{1cm} (4.2)$$

Each VCO has its own output frequency range, where $f_{\text{min}}$ is the lower limit and $f_{\text{max}}$ is the upper limit. $V_{\text{min}}$ and $V_{\text{max}}$ are the corresponding minimum and maximum input voltages, and $V_{\text{ctrl}}$ is the input control voltage of the VCO, i.e. the output voltage of the loop filter.
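
Equations 4.1 and 4.2 translate directly into a small helper. In the sketch below the control voltage is clamped to $[V_{\text{min}}, V_{\text{max}}]$ to reflect the limited characteristic; this clamping, the function names and the numerical values are assumptions made for illustration.

```python
def vco_gain(f_min, f_max, v_min, v_max):
    """Linear VCO gain K_VCO according to Eq. 4.2, in Hz/V."""
    return (f_max - f_min) / (v_max - v_min)


def vco_frequency(v_ctrl, f_min, f_max, v_min, v_max):
    """Output frequency according to Eq. 4.1.

    Assumption: outside [v_min, v_max] the frequency saturates at its limits.
    """
    v = min(max(v_ctrl, v_min), v_max)
    return (v - v_min) * vco_gain(f_min, f_max, v_min, v_max) + f_min


# Hypothetical VCO: 400 MHz to 1.2 GHz for control voltages from 0.5 V to 1.5 V
params = dict(f_min=400e6, f_max=1200e6, v_min=0.5, v_max=1.5)
for v_ctrl in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(v_ctrl, vco_frequency(v_ctrl, **params))
```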

As a crystal cannot fulfill the increasing frequency requirements of high-speed circuits, PLLs now take the role of providing a reference clock signal for other circuits on a chip. By appropriate configuration of the divide ratio, a PLL can generate an output signal that has a much higher frequency than the input signal and inherits the stability of the reference. The divider value can be an integer or a fraction, depending on the reference frequency and the desired output frequency.

Fig. 4.6 shows the circuit diagram of a 1/4 divider. Two D flip-flops (DFFs) are connected in series. The output of the first DFF is the input of the second DFF, and the inverted output of the second DFF is fed back to the first DFF. Both DFFs are driven by the same input clock signal and are reset asynchronously.

**Figure 4.6:** Circuit diagram of a 1/4 divider
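
The divide-by-4 behavior of this structure can be checked with a few lines of code. The sketch below models the two flip-flops behaviorally, assuming ideal edge-triggered DFFs that both start at 0 after the asynchronous reset; the function name is hypothetical.

```python
def divide_by_4(num_clock_edges: int):
    """Output of the second DFF after each rising edge of the input clock."""
    q1, q2 = 0, 0  # state after the asynchronous reset
    out = []
    for _ in range(num_clock_edges):
        # Both flip-flops sample on the same edge:
        # DFF1 takes the inverted output of DFF2, DFF2 takes the output of DFF1.
        q1, q2 = 1 - q2, q1
        out.append(q2)
    return out


sequence = divide_by_4(8)
print(sequence)
```

The output pattern repeats every four input clock edges, so the output frequency is one quarter of the input clock frequency.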

### 4.2 Analysis Methods on PLL System

As in the analysis of conventional feedback control systems, *s-domain* analysis (linear approximation) [M.G80] is usually used to gain intuition about PLLs operating in the lock-in state. The *impulse invariance* method in [HS88] and *state space* analysis in [HBMM04] are used to estimate the properties of PLLs in the nonlinear acquisition process, including the discrete sampling nature of the PFD operation. The equations in this section are taken from these references.

#### 4.2.1 s-domain Analysis

When a CPPLL works in lock-in state, there is a small phase difference between the reference input signal and the scaled-down feedback signal. Meanwhile, the reference frequency is much higher than the loop bandwidth of the whole CPPLL. Hence, the sampling operation of PFD can be approximated as a continuous-time action. A linear model in *s*-domain is applied to analyze the CPPLL system. The linear model in Fig. 4.7 shows the transfer function of each building block. The model provides the overall transfer function for the phase $\phi_{\text{out}}(s)/\phi_{\text{ref}}(s)$.

A subtracter represents the comparison operation of the PFD block. The gain of the PFD along with the CP can be expressed as $I_{CP}/2\pi$. The transfer function of the 2nd-order LF, $F_{LF}(s)$, can be derived using linear analysis and is equal to

$$F_{LF}(s) = \frac{s + \frac{1}{RC_1}}{C_2 s\left(s + \frac{C_1+C_2}{RC_1C_2}\right)} = \frac{1}{(C_1+C_2)\,s} \cdot \frac{\frac{s}{\omega_z} + 1}{\frac{s}{\omega_{p3}} + 1}, \hspace{1cm} (4.3)$$

where the loop filter has one zero at \( \omega_z = \frac{1}{RC_1} \) and one pole at \( \omega_{p3} = \frac{C_1 + C_2}{RC_1C_2} \). The VCO is usually assumed as an ideal integrator with the linear gain \( K_{VCO} \). The open-loop transfer function \( LG(s) \) of the 3\(^{rd}\)-order CPPLL is therefore equal to

\[
LG(s) = \frac{I_{CP}}{2\pi} \cdot \frac{s + \frac{1}{RC_1}}{C_2 s\left(s + \frac{C_1 + C_2}{RC_1C_2}\right)} \cdot \frac{K_{VCO}}{s} \cdot \frac{1}{N} = \frac{K_{VCO} I_{CP}}{2\pi NC_2} \frac{s + \frac{1}{RC_1}}{s^2\left(s + \frac{C_1 + C_2}{RC_1C_2}\right)} = K_s \frac{s + \omega_z}{s^3 + s^{2} \omega_{p3}}, \tag{4.4}
\]

where \( K_s = \frac{K_{VCO} I_{CP}}{2\pi NC_2} \). The PLL open-loop transfer function has one zero and three poles in which two poles are at the origin. The corresponding closed-loop transfer function \( H(s) \) of the 3\(^{rd}\)-order CPPLL can be expressed as

\[
H(s) = \frac{\phi_{out}(s)}{\phi_{ref}(s)} = \frac{N \cdot LG(s)}{1 + LG(s)} = \frac{K_{VCO} I_{CP}}{2\pi C_2} \frac{s + \frac{1}{RC_1}}{s^{3} + \frac{C_1 + C_2}{RC_1C_2} s^{2} + \frac{K_{VCO} I_{CP}}{2\pi NC_2} s + \frac{K_{VCO} I_{CP}}{2\pi N R C_1C_2}} \tag{4.5}
\]

\[= K_s N \frac{s + \omega_z}{s^{3} + s^{2} \omega_{p3} + s K_s + K_s \omega_z}. \]
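
The open-loop expression of Equation 4.4 can be evaluated numerically to see how the third pole affects stability. The sketch below computes $\omega_z$, $\omega_{p3}$ and $K_s$, finds the unity-gain frequency by bisection and reports the phase margin. All component values are hypothetical, $K_{VCO}$ is assumed to be given in rad/(s·V), and the helper names are not taken from the references.

```python
import math


def loop_parameters(k_vco, i_cp, r, c1, c2, n):
    """Zero, third pole and gain factor K_s as defined with Eq. 4.4."""
    omega_z = 1.0 / (r * c1)
    omega_p3 = (c1 + c2) / (r * c1 * c2)
    k_s = k_vco * i_cp / (2.0 * math.pi * n * c2)
    return omega_z, omega_p3, k_s


def open_loop_gain(omega, omega_z, omega_p3, k_s):
    """LG(j*omega) = K_s (s + omega_z) / (s^2 (s + omega_p3))."""
    s = 1j * omega
    return k_s * (s + omega_z) / (s ** 2 * (s + omega_p3))


def unity_gain_frequency(omega_z, omega_p3, k_s):
    """Geometric bisection; |LG(j*omega)| decreases monotonically with omega."""
    lo = hi = omega_z
    while abs(open_loop_gain(lo, omega_z, omega_p3, k_s)) < 1.0:
        lo /= 10.0
    while abs(open_loop_gain(hi, omega_z, omega_p3, k_s)) > 1.0:
        hi *= 10.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if abs(open_loop_gain(mid, omega_z, omega_p3, k_s)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def phase_margin_deg(omega_u, omega_z, omega_p3):
    """180 deg + arg LG(j*omega_u); the two poles at the origin give -180 deg."""
    return math.degrees(math.atan(omega_u / omega_z) - math.atan(omega_u / omega_p3))


# Hypothetical component values, chosen only for illustration
omega_z, omega_p3, k_s = loop_parameters(
    k_vco=2.0 * math.pi * 500e6,  # rad/(s*V), assumed unit
    i_cp=5e-6, r=10e3, c1=100e-12, c2=10e-12, n=8)
omega_u = unity_gain_frequency(omega_z, omega_p3, k_s)
print(omega_u, phase_margin_deg(omega_u, omega_z, omega_p3))
```

Since $\omega_{p3}/\omega_z = 1 + C_1/C_2$, a smaller $C_2$ moves the third pole further away from the zero and leaves more phase margin, at the cost of less ripple suppression.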

\( s \)-domain analysis is a fundamental method for studying the properties of a PLL system in the lock-in state, such as its step response to noise interference. However, \( s \)-domain analysis is inherently a continuous-time method. Because the PFD samples and compares the input signals at discrete time instants, \( s \)-domain analysis is not sufficient to predict the properties of the CPPLL system. Moreover, Equation 4.5 cannot represent the CPPLL’s nonlinear acquisition behavior.

#### 4.2.2 Impulse Invariance Analysis

To take into account the PFD’s discrete-time sampling nature, \( z \)-domain analysis has been applied to a 2nd-order CPPLL in [HS88] and to a 3rd-order CPPLL in [JYM+07]. The impulse invariant transformation is one common method to map a transfer function from the \( s \)-domain to the \( z \)-domain. For the blocks CP, LF and VCO, the \( z \)-domain descriptions can be obtained directly from their corresponding \( s \)-domain descriptions, where \( s = j\omega \) is substituted by \( z = e^{j\omega T} \) and \( T \) is the period of the reference signal. According to the timing diagram shown in Fig. 4.3(b), the output signal of the PFD, i.e. the phase error $\Delta \phi$, is a discrete-time event sequence whose pulse width is approximately equal to the time difference between the rising edges of $ref$ and $fb$. Therefore, $\Delta \phi$ can be modeled as a weighted impulse that drives the subsequent charge pump, whose $z$-domain description is $\frac{\alpha T_{\pi}}{s}$, as shown in Fig. 4.8. The value of $\alpha$ is given with Equation 4.8 below, i.e. $\alpha = e^{-\omega_{p3} T}$.

**Figure 4.8:** Discrete model of a CPPLL

The detailed transformation steps are as follows. When the input of the continuous-time system $LG(s)$ in Equation 4.4 is a discrete-time event sequence, the corresponding impulse response $LG(t)$ can be calculated by the inverse Laplace transform:

$$LG(t) = \frac{K_s}{\omega_{p3}^2} [\omega_{p3} \omega_{z} t - (\omega_{p3} - \omega_{z})e^{-\omega_{p3} t} + (\omega_{p3} - \omega_{z})].$$  \hspace{1cm} (4.6)

Sampling at the time instants $t = nT$ gives the sampled impulse response $LG(nT)$:

$$LG(nT) = \frac{K_s}{\omega_{p3}^2} [\omega_{p3} \omega_{z} (nT) - (\omega_{p3} - \omega_{z})e^{-\omega_{p3} nT} + (\omega_{p3} - \omega_{z})].$$  \hspace{1cm} (4.7)

By applying the $z$ transformation to $LG(nT)$, the discrete $z$-domain open-loop transfer function of the 3rd-order CPPLL can be derived as

$$LG(z) = \frac{K_{VCO} I_{CP} R C_1}{N \omega_{ref}(C_1 + C_2)} \cdot \frac{z^2 \left[\frac{C_1}{C_1 + C_2} (1 - e^{-\omega_{p3} T}) + \omega_{z} T\right] - z \left[\frac{C_1}{C_1 + C_2} (1 - e^{-\omega_{p3} T}) + \omega_{z} T e^{-\omega_{p3} T}\right]}{z^3 - z^2 (2 + e^{-\omega_{p3} T}) + z(1 + 2e^{-\omega_{p3} T}) - e^{-\omega_{p3} T}}$$

$$= K_z \frac{z^2 \left[\frac{K_c}{K_c + 1} (1 - \alpha) + \omega_{z} T\right] - z \left[\frac{K_c}{K_c + 1} (1 - \alpha) + \omega_{z} T \alpha\right]}{z^3 - z^2 (2 + \alpha) + z(1 + 2\alpha) - \alpha},$$  \hspace{1cm} (4.8)

where $\omega_{ref} = 2\pi/T$, $K_z = \frac{K_{VCO} I_{CP} R C_1}{N \omega_{ref}(C_1 + C_2)}$ and $K_c = \frac{C_1}{C_2}$. The closed-loop transfer function $H(z)$ is equal to

$$H(z) = \frac{\phi_{out}(z)}{\phi_{ref}(z)} = \frac{N \cdot LG(z)}{1 + LG(z)}$$

$$= N K_z \frac{z^2 \left[\frac{K_c}{K_c + 1} (1 - \alpha) + \omega_{z} T\right] - z \left[\frac{K_c}{K_c + 1} (1 - \alpha) + \omega_{z} T \alpha\right]}{z^3 + z^2 \left[K_z \left(\frac{K_c}{K_c + 1}(1 - \alpha) + \omega_{z} T\right) - \alpha - 2\right] + z\left[2\alpha + 1 - K_z \left(\frac{K_c}{K_c + 1}(1 - \alpha) + \omega_{z} T \alpha\right)\right] - \alpha}.$$  \hspace{1cm} (4.9)
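
The step from the sampled impulse response to the $z$-domain transfer function can be cross-checked numerically: the impulse response of the difference equation implied by Equation 4.8 should equal $T \cdot LG(nT)$ from Equation 4.7, where the factor $T$ corresponds to $K_z = T K_s/\omega_{p3}$. The sketch below performs this check for hypothetical loop parameters; it uses $\frac{K_c}{K_c+1} = \frac{\omega_{p3}-\omega_z}{\omega_{p3}}$, which follows from the definitions of $\omega_z$ and $\omega_{p3}$, and the function names are illustrative.

```python
import math


def sampled_impulse_response(n, k_s, omega_z, omega_p3, t_ref):
    """LG(nT) according to Eq. 4.7."""
    t = n * t_ref
    return (k_s / omega_p3 ** 2) * (
        omega_p3 * omega_z * t
        - (omega_p3 - omega_z) * math.exp(-omega_p3 * t)
        + (omega_p3 - omega_z))


def z_domain_impulse_response(num_samples, k_s, omega_z, omega_p3, t_ref):
    """Impulse response of LG(z) in Eq. 4.8, computed by its difference equation."""
    alpha = math.exp(-omega_p3 * t_ref)
    k_z = t_ref * k_s / omega_p3            # equals K_z as defined with Eq. 4.8
    a = (omega_p3 - omega_z) / omega_p3     # equals K_c / (K_c + 1)
    b2 = a * (1.0 - alpha) + omega_z * t_ref
    b1 = a * (1.0 - alpha) + omega_z * t_ref * alpha
    h = []
    for n in range(num_samples):
        # Numerator K_z (b2 z^2 - b1 z) over a cubic denominator:
        # the input impulse contributes at n = 1 and n = 2 only.
        y = k_z * b2 if n == 1 else (-k_z * b1 if n == 2 else 0.0)
        if n >= 1:
            y += (2.0 + alpha) * h[n - 1]
        if n >= 2:
            y -= (1.0 + 2.0 * alpha) * h[n - 2]
        if n >= 3:
            y += alpha * h[n - 3]
        h.append(y)
    return h


# Hypothetical loop parameters (rad/s) and a 10 MHz reference
k_s, omega_z, omega_p3, t_ref = 3.125e13, 1.0e6, 1.1e7, 1.0e-7
h = z_domain_impulse_response(20, k_s, omega_z, omega_p3, t_ref)
expected = [t_ref * sampled_impulse_response(n, k_s, omega_z, omega_p3, t_ref)
            for n in range(20)]
print(max(abs(x - y) for x, y in zip(h, expected)))
```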

#### 4.2.3 State Space Analysis

In contrast to the $s$-domain and $z$-domain analyses in the last two sections, the state space analysis in [HBMM04] represents the PLL in the time domain by using difference equations for the signal phases and state-space variables for the node voltages. The difference equation describes the relationship between the input phase $\phi_{\text{ref}}(t)$ and the output phase $\phi_{\text{out}}(t)$, which is given by

$$\Delta \phi = \phi_{\text{ref}} - \phi_{\text{out}},$$  \hspace{1cm} (4.10)

where

$$\phi_{\text{ref}}(t) = \phi_{\text{ref}}(0) + \omega_{\text{ref}}t$$  \hspace{1cm} (4.11)

$$\phi_{\text{out}}(t) = \phi_{\text{out}}(0) + \omega_{\text{fr}}t + K_{\text{VCO}} \int_0^t v_{\text{ctrl}}(\tau) d\tau.$$  \hspace{1cm} (4.12)

$\phi_{\text{ref}}(0)$ is the initial input phase and $\omega_{\text{ref}}$ is the input reference frequency. $\phi_{\text{out}}(0)$ is the initial output phase and $\omega_{\text{fr}}$ is the free-running oscillation frequency of the VCO. $K_{\text{VCO}}$ is the gain of the VCO, as expressed in Equation 4.2.

If the feedback signal lags behind the reference signal, the PFD generates a positive phase error, i.e. $\Delta \phi > 0$, and the capacitor in the loop filter is charged. If the feedback signal leads the reference signal, the PFD generates a negative phase error, i.e. $\Delta \phi < 0$, and the capacitor is discharged. The charging/discharging current can be represented by

$$i_p = \begin{cases} 
\pm I_{CP}, & \text{if } 0 \leq t \leq t_p \\
0, & \text{if } t_p \leq t \leq T_-
\end{cases}.$$  \hspace{1cm} (4.13)

where $t_p$ means the duration of the charging or the discharging period, which is calculated by

$$t_p = \frac{|\Delta \phi|}{\omega} = \frac{|\Delta \phi| T_-}{2\pi}.$$  \hspace{1cm} (4.14)

The PFD’s output is updated once per time period $T_-$. The value of $T_-$ varies during the PLL’s locking process and is in principle equal to the time interval between two consecutive rising edges of either the reference or the feedback signal. Fig. 4.9(a) shows the definitions of the variables $t_p$ and $T_-$. Through the charging or discharging process, the output voltage of the loop filter goes up or down accordingly. The voltages in the RC network can be accurately described by the following differential equations:

$$\frac{dV_c}{dt} = \frac{V_{\text{ctrl}} - V_c}{RC_1}$$  \hspace{1cm} (4.15)

$$\frac{dV_{\text{ctrl}}}{dt} = \frac{V_c - V_{\text{ctrl}}}{RC_2} + \frac{i_p}{C_2}.$$  \hspace{1cm} (4.16)

where $V_{\text{ctrl}}$ and $V_c$ are the two state-space variables. $V_{\text{ctrl}}$ is the control voltage of the VCO, i.e. the output voltage of the loop filter. $V_c$ is the voltage across the capacitor $C_1$, as shown in Fig. 4.9(b). Using Equations 4.11 to 4.16, the behavior of CPPLLs can be described accurately for both the nonlinear acquisition process and the linear lock-in process. More details can be found in [HBMM04]. Through approximation and linearization of the above equations, the closed-loop transfer function of the 3rd-order CPPLL in lock-in state can be derived as

\[
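H(z) = N K_z \frac{z^2\left[\frac{K_c}{K_c + 1}(1 - \alpha) + \omega_z T\right] - z\left[\frac{K_c}{K_c + 1}(1 - \alpha) + \omega_z T \alpha\right]}{z^3 + z^2\left[K_z\left(\frac{K_c}{K_c + 1}(1 - \alpha) + \omega_z T\right) - \alpha - 2\right] + z\left[2\alpha + 1 - K_z\left(\frac{K_c}{K_c + 1}(1 - \alpha) + \omega_z T \alpha\right)\right] - \alpha}.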
\] (4.17)

It is worth noting that the closed-loop transfer function \(H(z)\) in Equation 4.17 calculated by the state space analysis is exactly identical to the closed-loop transfer function \(H(z)\) in Equation 4.9 calculated by the impulse invariant transformation for the 3rd-order CPPLL.
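
To illustrate how Equations 4.13, 4.15 and 4.16 describe the loop filter in the time domain, the sketch below integrates them with a simple forward-Euler scheme for a single charging pulse. The integration scheme, the component values and the function name are assumptions for illustration; [HBMM04] should be consulted for the actual state space formulation.

```python
def simulate_loop_filter(i_cp, t_p, t_total, r, c1, c2, dt):
    """Forward-Euler integration of Eqs. 4.15 and 4.16 for one charging pulse.

    The current i_p equals +i_cp during the first t_p seconds and 0 afterwards
    (Eq. 4.13). Returns the final values (v_c, v_ctrl), starting from 0 V.
    """
    pulse_steps = int(round(t_p / dt))
    total_steps = int(round(t_total / dt))
    v_c, v_ctrl = 0.0, 0.0
    for k in range(total_steps):
        i_p = i_cp if k < pulse_steps else 0.0
        dv_c = (v_ctrl - v_c) / (r * c1)               # Eq. 4.15
        dv_ctrl = (v_c - v_ctrl) / (r * c2) + i_p / c2  # Eq. 4.16
        v_c += dv_c * dt
        v_ctrl += dv_ctrl * dt
    return v_c, v_ctrl


# Hypothetical values: 5 uA pulse of 10 ns into R = 10 kOhm, C1 = 100 pF, C2 = 10 pF
i_cp, t_p, r, c1, c2, dt = 5e-6, 10e-9, 10e3, 100e-12, 10e-12, 1e-10
v_c, v_ctrl = simulate_loop_filter(i_cp, t_p, 2e-6, r, c1, c2, dt)
print(v_c, v_ctrl)
```

After the pulse, the injected charge $I_{CP} \cdot t_p$ redistributes between $C_1$ and $C_2$, so both voltages settle to $I_{CP} t_p / (C_1 + C_2)$.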

### 4.3 Performances of PLLs

The design of a PLL has to be tailored to its application. For example, PLLs for clock generation should be insensitive to noise sources and provide a clean clock signal, since a clock signal with large jitter may result in bit errors. PLLs in frequency synthesizers should lock to the desired frequency quickly, since a slow locking process might cause the PLL to fail to lock. In this thesis, two typical important performances of PLLs, i.e. locking time and phase noise (jitter), are the main objectives to be optimized, while the stability of the PLL is considered both in the lock-in state and in the nonlinear acquisition process.

#### 4.3.1 Locking Time

Suppose a PLL starts to oscillate at an unlocked frequency that is very different from the desired frequency and then approaches the desired frequency until it stays within a predefined deviation, e.g. \(\delta = 2.5\%\) in this thesis. The time required to reach this band is defined as the locking time \(T_s\), as shown in Fig. 4.10.
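
Given a simulated frequency trace, this definition can be evaluated with a short routine. The sketch below returns the first time after which the frequency stays within $\pm\delta$ of the target until the end of the trace. The trace values and the function name are hypothetical, and time is assumed to be measured from the start of the trace.

```python
def locking_time(times, freqs, f_target, delta=0.025):
    """First time after which |f - f_target| <= delta * f_target holds until the end.

    Returns None if the trace is not within the band at its end.
    """
    tolerance = delta * f_target
    lock_start = None
    for t, f in zip(times, freqs):
        if abs(f - f_target) <= tolerance:
            if lock_start is None:
                lock_start = t
        else:
            lock_start = None  # left the band again, so restart the search
    return lock_start


# Hypothetical normalized trace: the frequency overshoots before settling
times = [0, 1, 2, 3, 4, 5]
freqs = [0.5, 0.9, 1.05, 0.98, 1.01, 1.0]
print(locking_time(times, freqs, f_target=1.0))
```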

To save power in a microprocessor, some circuits are often switched to power-down mode when they are idle. PLLs are usually in charge of synchronizing the external and the internal clock signals. The locking time of the PLL determines how often the circuit can be shut down, and how long the other parts of the system have to wait after the PLL is turned on again.

Figure 4.10: Locking time definition

#### 4.3.2 Phase Noise & Jitter

Phase noise and jitter are different ways to characterize the same physical phenomenon. Viewed in the frequency domain, an ideal clock signal appears as a peak at one frequency, while a real clock signal with noise shows a “skirt” around the peak, as shown in Fig. 4.11, which denotes frequency fluctuations. This is referred to as phase noise. Viewed in the time domain, phase noise appears as “fuzz” on the rising and falling edges, as shown in Fig. 4.13, which implies that the signal period is not constant. The period variation is referred to as jitter. Some types of phase noise and jitter and their correlation are presented here. More theoretical analysis and mathematical modeling can be found in [Raz96, DMR00, HL98, HLL99, LH00, Meh02].

#### 4.3.2.1 Phase Noise

An ideal sine-wave signal can be described by

$$x(t) = A \cos(2\pi f_0 t + \phi),$$  \hspace{1cm} (4.18)

where \(A\) is the nominal amplitude of the signal, \(f_0\) is the nominal frequency of the oscillation, and \(\phi\) is an arbitrary fixed phase reference. The spectrum of the ideal sine-wave signal without any fluctuation is concentrated at the frequency \(f = f_0\), as shown in Fig. 4.11*. However, such an ideal signal does not exist in the real world. A practical output signal is normally given by

$$x(t) = A(t) \cos(2\pi f_0 t + \phi(t)), \hspace{1cm} (4.19)$$

* Only one single side, i.e. \(0 \leq f < \infty\), is presented here.

where $A(t)$ and $\phi(t)$ represent the amplitude and the phase fluctuations of the signal, respectively. Phase noise describes the phase fluctuations of the signal due to random frequency fluctuations; its spectrum exhibits a noise “skirt” around the nominal frequency (also called carrier frequency). Fig. 4.11 shows the spectrum of the ideal signal and the spectrum of the practical signal with phase fluctuations. The single-sideband phase noise at an offset frequency $\Delta f$ from the carrier is quantified by

$$L(\Delta f) = 10\log_{10}\left[\frac{P_{\text{sideband}}(f_0 + \Delta f, \text{1Hz})}{P_{\text{carrier}}}\right], \tag{4.20}$$

where $P_{\text{sideband}}(f_0 + \Delta f, \text{1Hz})$ is the single-sideband power in a 1Hz bandwidth at the offset frequency and $P_{\text{carrier}}$ is the power of the carrier. In Fig. 4.11, $P_{\text{sideband}}(f_0 + \Delta f, \text{1Hz})$ can be represented by the area of the rectangle with 1Hz bandwidth at offset $\Delta f$, and $P_{\text{carrier}}$ can be represented by the total area under the power spectrum curve. Phase noise can be approximated by the difference in height of the power spectrum between $f_0$ and $f_0 + \Delta f$.
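As a small numerical illustration of Equation 4.20, the following Python sketch converts a sideband power and a carrier power into a phase noise level in dBc/Hz. The function name and the power values are assumptions chosen for illustration; they are not measurements from this work.

```python
import math


def phase_noise_dbc_hz(p_sideband_1hz, p_carrier):
    """Single-sideband phase noise in dBc/Hz as defined in Eq. 4.20.

    p_sideband_1hz: power in a 1 Hz band at the offset of interest (W)
    p_carrier:      carrier power (W)
    """
    if p_sideband_1hz <= 0 or p_carrier <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(p_sideband_1hz / p_carrier)


# Hypothetical values: 1 mW carrier, 1 pW in a 1 Hz band at the offset.
level = phase_noise_dbc_hz(1e-12, 1e-3)
print(f"L(delta_f) = {level:.1f} dBc/Hz")
```

A power ratio of $10^{-9}$ corresponds to about -90 dBc/Hz.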

When the power spectrum in Fig. 4.11 is specified in dBc/Hz at a given offset, the phase noise spectrum can be depicted as in Fig. 4.12. A single-sideband phase noise spectrum falls at different rates, depending on the dominant noise source. The spectrum can be divided into three regions: a $1/\Delta f^3$ region, a $1/\Delta f^2$ region and a flat region ($1/\Delta f^0$) [HL98, Abi97]. At very small offset frequencies, the flicker noise of the devices generally dominates and the spectrum falls at $1/\Delta f^3$. In the “white frequency” variation region, white or uncorrelated fluctuations dominate, e.g. the thermal noise of the devices, and the spectrum falls at $1/\Delta f^2$. External noise sources dominate in the third region.

4.3.2.2 Jitter

For applications in clock generation or recovery, designers prefer to characterize the signal properties in the time domain rather than by the phase spectrum. As shown in Fig. 4.13(a), suppose \( \{t_n\} \) is an ideal square-wave clock with nominal period \( T \). The phase fluctuation results in unstable zero-crossing or transition times in \( \{t_n^*\} \). Jitter is defined as the time deviation between the transition events (rising/falling edges) in \( \{t_n^*\} \) and the corresponding transition events in \( \{t_n\} \). The discrete jitter values are shown in Fig. 4.13(b).

**Figure 4.13:** Jitter definition

**Jitter Types** Based on the noise sources involved, jitter can be classified into two forms: *deterministic jitter* and *random jitter*. Deterministic jitter is caused by effects that act in a predictable manner, such as channel bandwidth limitation, crosstalk, duty-cycle distortion and supply noise. Random jitter is caused by stochastic processes that act in an unpredictable manner, such as thermal and flicker noise. Deterministic jitter is usually bounded and has a maximum value, while random jitter is usually assumed to follow a Gaussian (normal) distribution. The distribution of the total jitter is usually the convolution of the deterministic and the random jitter distributions.

Note: dBc denotes decibels relative to the carrier.

Based on the jitter behavior in PLLs, jitter can also be classified into two other forms: *synchronous jitter* and *accumulating jitter* [Kun05]. Synchronous jitter is exhibited mainly in driven blocks such as the PFD, the CP and the divider. In these blocks, the frequency of the output signal is exactly the same as the input frequency, and the phase of the output signal fluctuates directly with any fluctuation of the input phase. Accumulating jitter is exhibited in autonomous blocks such as the VCO. In these blocks, a fluctuation at the output is not the direct result of one fluctuation event at the input, but rather the accumulated result of all previous input fluctuations.

### Table 4.1: Jitter metrics based on [Kun05]

<table>
<thead>
<tr>
<th>Jitter Metrics</th>
<th>Time Diagram</th>
<th>Mathematical Calculation</th>
<th>Remark</th>
</tr>
</thead>
<tbody>
<tr>
<td>edge-to-edge jitter</td>
<td>ideal clock, driven clock</td>
<td>$J_{ee} = \sqrt{\text{var}(\delta t_n)}$</td>
<td>a scalar jitter metric; no information on correlation</td>
</tr>
<tr>
<td>long-term jitter</td>
<td>$T_n$, $T_{n+k}$</td>
<td>$J_k = \sqrt{\text{var}(T_{n+k} - T_n)}$</td>
<td>in units of time; information on the correlation over $k$ adjacent/distant transitions</td>
</tr>
<tr>
<td>cycle-to-cycle jitter</td>
<td>$T_n$, $T_{n+1}$</td>
<td>$J_{cc} = \sqrt{\text{var}(T_{n+1} - T_n)}$</td>
<td>a scalar jitter metric; only information on the correlation between adjacent transitions</td>
</tr>
</tbody>
</table>

**Jitter Metrics** Jitter can be measured and evaluated in various ways. Here, the three metrics defined in [Kun05] are adopted, as illustrated in Tab. 4.1. In PLLs, edge-to-edge jitter is only defined for the driven blocks, i.e. the PFD, the CP and the divider, while the remaining jitter metrics are suitable for both driven and autonomous blocks.
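The three metrics in Tab. 4.1 can be estimated from recorded transition times. The sketch below is a minimal illustration that follows the formulas as written in the table; the function names, the use of the population standard deviation and the sample data are assumptions made here, not part of [Kun05] or of the models in this thesis.

```python
import statistics


def edge_to_edge_jitter(ideal_edges, actual_edges):
    """J_ee: standard deviation of the edge deviations delta t_n."""
    deviations = [a - i for a, i in zip(actual_edges, ideal_edges)]
    return statistics.pstdev(deviations)


def periods(edges):
    """Periods T_n between consecutive transition times."""
    return [t1 - t0 for t0, t1 in zip(edges, edges[1:])]


def long_term_jitter(edges, k):
    """J_k: standard deviation of T_{n+k} - T_n, as written in Tab. 4.1."""
    p = periods(edges)
    if k < 1 or len(p) - k < 1:
        raise ValueError("not enough periods for this k")
    return statistics.pstdev([p[n + k] - p[n] for n in range(len(p) - k)])


def cycle_to_cycle_jitter(edges):
    """J_cc: special case of J_k with k = 1 (adjacent periods)."""
    return long_term_jitter(edges, 1)


# Hypothetical edge times in normalized units (nominal period 1.0).
ideal = [0.0, 1.0, 2.0, 3.0, 4.0]
actual = [0.0, 1.1, 2.0, 3.1, 4.0]
print(edge_to_edge_jitter(ideal, actual))
print(cycle_to_cycle_jitter(actual))
print(long_term_jitter(actual, 2))
```

In this artificial sequence the periods alternate between 1.1 and 0.9, so the cycle-to-cycle jitter is large while the metric for $k = 2$ is practically zero, which illustrates the correlation information mentioned in the table.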

### 4.3.2.3 Extracting Jitter from Phase Noise

As circuit speed increases, the requirements on jitter-measuring equipment become more critical. It is easier to characterize a signal by measuring its phase noise in the frequency domain than by measuring its jitter in the time domain\(^1\). As mentioned before, phase noise and jitter characterize the same phenomenon. Phase noise can be interpreted as a measurement of jitter at a specified offset frequency away from the carrier. In other words, the jitter value can be derived from a phase noise measurement.

\(^1\) "For example, most jitter measuring oscilloscopes are only capable of measuring jitters down to 1psRMS. Most modern real-time oscilloscopes only have bandwidths up to 7GHz. Phase-noise equipment, on the other hand, can measure noise levels of the best low-noise oscillators available (much less than 1ps in time domain) and offer bandwidths of up to 40GHz". [jit]

The translation between phase noise and timing jitter has been explored in [HLL99, Dra01]. As shown in Appendix C.1, the relationship between the timing jitter $\sigma_{\Delta T}$ and the noise power spectral density (PSD) $S_{\phi}(f)$ is

$$\sigma_{\Delta T}^2 = \frac{8}{\omega_0^2} \int_0^{\infty} S_{\phi}(f) \sin^2(\pi f \Delta T)\, df. \tag{4.21}$$

In Appendix C.2, the jitter value of a PFD/CP can be extracted from the phase noise simulation on the PFD/CP blocks:

$$J_{\text{PFD/CP}} = \frac{T}{K_{\text{det}}} \sqrt{\frac{\text{var}(\delta n)}{2}},$$  (4.22)

where $K_{\text{det}}$ is the gain of the PFD/CP, normally equal to $I_{\text{CP}}/(2\pi)$, and $\text{var}(\delta n)$ is equal to

$$\text{var}(\delta n) = \int_0^{\infty} S_n(f) df,$$  (4.23)

where $S_n(f)$ is the power spectral density of the $\delta t_n$ sequence. The jitter value of a VCO can be extracted from the phase noise simulation on the VCO block:

$$J_{\text{VCO}} = \frac{\Delta f}{f_0^{1.5}}\, 10^{\frac{L(\Delta f)}{20}}. \tag{4.24}$$

Note that the equation is only valid when the offset frequency $\Delta f$ lies in the $1/\Delta f^2$ region of the phase noise spectrum.
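The following Python sketch illustrates how a VCO period jitter could be estimated from a single phase noise value. It assumes the commonly used relation $J = \frac{\Delta f}{f_0^{1.5}} 10^{L(\Delta f)/20}$ with $L(\Delta f)$ in dBc/Hz taken in the $1/\Delta f^2$ region; the function name and the numerical values are hypothetical.

```python
import math


def vco_period_jitter(phase_noise_dbc_hz, offset_hz, f0_hz):
    """Period jitter (s) from one phase noise point in the 1/f^2 region.

    Assumed relation: J = offset / f0**1.5 * 10**(L/20), L in dBc/Hz.
    """
    return offset_hz / f0_hz ** 1.5 * 10.0 ** (phase_noise_dbc_hz / 20.0)


# Hypothetical example: -100 dBc/Hz at 1 MHz offset, 150 MHz oscillation.
jitter = vco_period_jitter(-100.0, 1e6, 150e6)
print(f"J_VCO = {jitter * 1e12:.2f} ps")
```

With these assumed numbers the estimate is roughly 5.4 ps.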

4.3.3 Stability of PLLs

In feedback control systems, stability is always a primary concern for designers. When a PLL operates in the lock-in state, it can be treated as a continuous-time system. Based on $s$-domain analysis, the phase margin (PM) is normally taken as the stability criterion. The phase margin and the unity-gain bandwidth ($\omega_{\text{UGB}}$) can be calculated according to Equation 4.4. The corresponding Bode diagram of the open-loop transfer function is shown in Fig. 4.14. The phase margin of the loop is expressed as

$$\text{PM} = \tan^{-1} \left( \frac{\omega_{\text{UGB}}}{\omega_z} \right) - \tan^{-1} \left( \frac{\omega_{\text{UGB}}}{\omega_{p3}} \right),$$  (4.25)

where $\omega_{\text{UGB}}$ is the open-loop unity-gain bandwidth, the zero frequency is $\omega_z = \frac{1}{RC_1}$ and the third pole frequency is $\omega_{p3} = \frac{1}{R(C_1||C_2)}$. The phase margin at $\omega_{\text{UGB}}$ benefits from the zero $\omega_z$, but is degraded by the pole $\omega_{p3}$. Generally, the phase margin should be at least 45° [GM92] to guarantee system stability. A proper choice of the capacitor ratio $C_1/C_2$ can provide a sufficient phase margin. Moreover, the phase margin is immune to variations of the absolute capacitance values. $C_2$ is usually chosen to be about $\frac{1}{10}C_1$.
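Equation 4.25 can be evaluated numerically as sketched below. The sketch interprets $C_1||C_2$ as the series combination $C_1C_2/(C_1+C_2)$, which is an assumption consistent with the coefficient of $s^2$ in the loop transfer functions; the component values are hypothetical and chosen only to show the effect of the ratio $C_1/C_2 = 10$.

```python
import math


def phase_margin_deg(w_ugb, r, c1, c2):
    """Phase margin in degrees according to Eq. 4.25."""
    w_z = 1.0 / (r * c1)                # zero frequency
    c_series = c1 * c2 / (c1 + c2)      # assumed meaning of C1 || C2
    w_p3 = 1.0 / (r * c_series)         # third pole frequency
    return math.degrees(math.atan(w_ugb / w_z) - math.atan(w_ugb / w_p3))


# Hypothetical loop filter: R = 10 kOhm, C1 = 10 pF, C2 = C1 / 10.
R, C1, C2 = 10e3, 10e-12, 1e-12
w_z = 1.0 / (R * C1)
w_p3 = (C1 + C2) / (R * C1 * C2)
w_ugb = math.sqrt(w_z * w_p3)           # geometric mean of zero and pole
print(f"PM = {phase_margin_deg(w_ugb, R, C1, C2):.2f} deg")
```

Placing $\omega_{\text{UGB}}$ at the geometric mean of $\omega_z$ and $\omega_{p3}$ gives about 56.4° for this ratio. Scaling both capacitors by the same factor and re-centering $\omega_{\text{UGB}}$ leaves the result unchanged, in line with the remark that only the capacitor ratio matters.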

When a PLL operates in the nonlinear acquisition process, it can become unstable even though its $s$-domain phase margin is 52° (> 45°), as shown in Fig. 4.15(a). This instability can be explained by the fact that the phase margin obtained from $z$-domain analysis is only 40.2°. Based on the $s$-domain transfer function $H(s)$ in Equation 4.5 and the $z$-domain transfer function $H(z)$ in Equation 4.9, the corresponding Bode diagrams of the closed-loop transfer functions for the 3rd-order CPPLL are depicted in Fig. 4.15(b).

When the PFD update frequency (input frequency) is comparable to the unity-gain bandwidth $\omega_{UGB}$ of the PLL, the excess phase shift introduced by the feedback delay has to be considered. In addition, the discrete sampling nature of the PFD can no longer be ignored. Since $s$-domain analysis is then no longer capable of accurately predicting the PLL stability, $z$-domain analysis and state-space analysis are introduced to tackle this instability problem. The instability scenario can also be explained by the positions of zeros and poles in the root locus plot. Although all zeros and poles of the PLL closed-loop transfer function lie in the left half of the $s$-domain root locus, the PLL cannot be stable as long as one pole lies outside the unit circle in the $z$-domain root locus. More details can be found in [HBMM04, JYM+07]. Therefore, a stability constraint for the nonlinear acquisition process is defined here: the ratio of the reference signal frequency to the unity-gain bandwidth (RUR), expressed as

$$\text{RUR} = \frac{\omega_{\text{ref}}}{\omega_{\text{UGB}}}, \quad (4.26)$$

where $\omega_*$ is the angular frequency corresponding to $f_*$, i.e. $\omega_* = 2\pi f_*$. For simplicity, both are called frequency throughout this thesis. Based on the results of [JYM+07], the $s$-domain transfer function matches the corresponding $z$-domain transfer function very well when the RUR is larger than 20.

In summary, the stability of the CPPLL is dependent on the phase margin, the unity-gain-bandwidth and the reference frequency.
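As a brief illustration of the RUR constraint in Equation 4.26, the sketch below computes the ratio and the largest unity-gain bandwidth that still satisfies a given minimum RUR. The function names are assumptions made for this example; the 25 MHz reference is the value used for the experimental CPPLL in this chapter, and the threshold of 20 follows [JYM+07] as quoted above.

```python
def rur(f_ref_hz, f_ugb_hz):
    """Ratio of reference frequency to unity-gain bandwidth (Eq. 4.26)."""
    return f_ref_hz / f_ugb_hz


def max_ugb_for_rur(f_ref_hz, rur_min=20.0):
    """Largest unity-gain bandwidth that still fulfils RUR >= rur_min."""
    return f_ref_hz / rur_min


f_ref = 25e6                        # reference frequency in Hz
f_ugb_limit = max_ugb_for_rur(f_ref)
print(f"f_UGB must not exceed {f_ugb_limit / 1e6:.2f} MHz")
print(rur(f_ref, 1.0e6) >= 20.0)    # a 1 MHz loop satisfies the constraint
```

Because the ratio is the same for angular and ordinary frequencies, the check can be carried out in either unit as long as both quantities use the same one.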

(a) CPPLL instability although phase margin $> 45^\circ$ in $s$-domain analysis

(b) Comparison of $s$-domain and $z$-domain analysis in terms of Bode diagrams

**Figure 4.15:** CPPLL instability observed in $z$-domain analysis

4.3.4 Design Trade-offs

The loop bandwidth is a key design factor for PLL performance, since it determines the locking time, the jitter and the stability of a PLL. Essentially, the loop bandwidth \( \omega_{BW} \), i.e. the -3dB bandwidth of the PLL closed-loop transfer function, is the same as the unity-gain bandwidth (0dB bandwidth) of the PLL open-loop transfer function, \( \omega_{UGB} \). Therefore, the loop bandwidth \( \omega_{BW} \) can be computed by Equation 4.4.

Although the nonlinear acquisition process cannot be accurately predicted through \( s \)-domain analysis, the locking time is inversely proportional to the loop bandwidth of the PLL according to [Bes03, HBMM04]. With a larger loop bandwidth, the PLL locks onto the expected output frequency more quickly, and vice versa.

Noise in each building block of a PLL contributes to the total output noise, which is characterized in terms of phase noise in the phase domain or jitter in the time domain. Assuming that the noise sources are uncorrelated and that each is placed at the corresponding input node in Fig. 4.16 based on \( s \)-domain analysis, the overall noise power of the PLL can be computed by

\[
N_{total}^2 = N_{ref@out}^2 + N_{PFD/CP@out}^2 + N_{LF@out}^2 + N_{VCO@out}^2 + N_{D@out}^2, \tag{4.27}
\]

where \( N_{total}^2 \) is the total output noise power and \( N_{***@out}^2 \) represents the output noise power due to the noise source \( N_{***} \). The noise transfer function (NTF) of \( N_{ref} \) and \( N_{D} \) is the same as the closed-loop transfer function of the PLL in Equation 4.5. Hence, both NTFs also have a low-pass characteristic. The transfer function is

\[
\frac{N_{out}}{N_{ref\&D}} = \frac{K_{VCO}I_{CP}}{2\pi C_2} \cdot \frac{s + \frac{1}{RC_1}}{s^3 + \left( \frac{C_1+C_2}{RC_1C_2} \right)s^2 + \frac{K_{VCO}I_{CP}}{2\pi N C_2}s + \frac{K_{VCO}I_{CP}}{2\pi N R C_1 C_2}}. \tag{4.28}
\]

![Figure 4.16: Linear model of a CPPLL with noise sources](image)

If the noise of the PFD/CP (\( N_{PFD/CP} \)) is referred to the input of the PFD, its transfer function to the output is the same as that of \( N_{ref} \) and \( N_{D} \). The actual NTF for the PFD/CP noise is therefore obtained by dividing Equation 4.28 by the gain of the PFD/CP, i.e.

\[
\frac{N_{\text{out}}}{N_{\text{PFD/CP}}} = \frac{K_{\text{VCO}}}{C_2} \cdot \frac{s + \frac{1}{RC_1}}{s^3 + \left( \frac{C_1 + C_2}{RC_1 C_2} \right) s^2 + \frac{K_{\text{VCO}} I_{\text{CP}}}{2\pi NC_2} s + \frac{K_{\text{VCO}} I_{\text{CP}}}{2\pi NRC_1 C_2}}. \tag{4.29}
\]

Therefore, Equation 4.29 shows a low-pass characteristic as well. Similarly, the NTFs for LF and VCO are calculated respectively and expressed as

\[
\frac{N_{\text{out}}}{N_{\text{LF}}} = \frac{K_{\text{VCO}} \left[ s^2 + \left( \frac{C_1 + C_2}{RC_1 C_2} \right) s \right]}{s^3 + \left( \frac{C_1 + C_2}{RC_1 C_2} \right) s^2 + \frac{K_{\text{VCO}} I_{\text{CP}}}{2\pi NC_2} s + \frac{K_{\text{VCO}} I_{\text{CP}}}{2\pi NRC_1 C_2}}, \tag{4.30}
\]

\[
\frac{N_{\text{out}}}{N_{\text{VCO}}} = \frac{s^3 + \left( \frac{C_1 + C_2}{RC_1 C_2} \right) s^2}{s^3 + \left( \frac{C_1 + C_2}{RC_1 C_2} \right) s^2 + \frac{K_{\text{VCO}} I_{\text{CP}}}{2\pi NC_2} s + \frac{K_{\text{VCO}} I_{\text{CP}}}{2\pi NRC_1 C_2}}. \tag{4.31}
\]

The NTF for the LF in Equation 4.30 shows a band-pass characteristic, while the NTF for the VCO in Equation 4.31 has a high-pass characteristic. As seen from the above NTFs, the loop acts as a different filter type for each noise source in the PLL, as shown in Fig. 4.17. In order to reduce the output noise caused by the noise of the reference signal, the divider and the PFD/CP, the loop bandwidth of the PLL should be as small as possible. When the noise of the VCO dominates, the loop bandwidth should be larger. Consequently, a compromise in the choice of the loop bandwidth has to be made to optimize the overall PLL noise performance.
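The low-pass and high-pass behavior can be checked numerically with the linear model. The sketch below assumes a forward gain $G(s) = \frac{K_{VCO}I_{CP}}{2\pi C_2}\frac{s + 1/(RC_1)}{s^2\,(s + (C_1+C_2)/(RC_1C_2))}$ with feedback factor $1/N$, so that the reference noise sees $G/(1+G/N)$ and the VCO noise sees $1/(1+G/N)$. All numerical values are hypothetical and serve only to show the trend.

```python
import math


def loop_gain(s, k_vco, i_cp, r, c1, c2):
    """Assumed forward gain G(s) of the 3rd-order CPPLL linear model."""
    w_z = 1.0 / (r * c1)
    w_p3 = (c1 + c2) / (r * c1 * c2)
    gain = k_vco * i_cp / (2.0 * math.pi * c2)
    return gain * (s + w_z) / (s ** 2 * (s + w_p3))


def ntf_ref(s, n, **loop):
    """NTF from reference (or divider) noise to the output: low-pass."""
    g = loop_gain(s, **loop)
    return g / (1.0 + g / n)


def ntf_vco(s, n, **loop):
    """NTF from VCO noise to the output: high-pass."""
    return 1.0 / (1.0 + loop_gain(s, **loop) / n)


# Hypothetical loop values, for illustration only.
LOOP = dict(k_vco=600e6, i_cp=100e-6, r=10e3, c1=10e-12, c2=1e-12)
N = 20

for w in (1e3, 1e6, 1e9, 1e12):
    s = 1j * w
    print(f"w = {w:.0e} rad/s  |ref| = {abs(ntf_ref(s, N, **LOOP)):.3e}  "
          f"|vco| = {abs(ntf_vco(s, N, **LOOP)):.3e}")
```

At low frequencies the reference NTF approaches $N$ and the VCO NTF approaches zero; at high frequencies the situation is reversed, which is the origin of the bandwidth compromise described above.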

The respective NTF in z-domain for each noise source can be calculated based on the \( H(z) \) in Equation 4.9. Although z-domain analysis can provide more accurate description on PLL properties, NTFs in s-domain are usually used to predict the noise transfer behavior of PLLs, since noise performance in PLL’s lock-in state is mostly considered.

Moreover, the loop bandwidth also affects the stability of PLLs, as discussed in Sec. 4.3.3. Since a PLL is actually a discrete-time circuit due to the sampling nature of the PFD, an excessively large loop bandwidth may make the PLL unstable during the nonlinear acquisition process. In summary, the impacts of the loop bandwidth on the locking time, the noise performance and the system stability of PLLs are listed in Tab. 4.2.

### Table 4.2: Trade-offs in CPPLL

<table>
<thead>
<tr>
<th>Loop Bandwidth</th>
<th>Locking Time</th>
<th>Suppression of Noise from Input, Divider &amp; PFD/CP</th>
<th>Suppression of Noise from VCO</th>
<th>Loop Stability</th>
</tr>
</thead>
<tbody>
<tr>
<td>increasing</td>
<td>+</td>
<td>-</td>
<td>+</td>
<td>-</td>
</tr>
</tbody>
</table>

Note: + improving, - deteriorating
4.4 Example: Hierarchical Optimization of a CPPLL

The trade-offs in PLL design listed in Tab. 4.2 show how difficult it is to optimize PLLs by hand. Moreover, evaluating PLL performance by circuit-level simulation is very time-consuming. For example, a transient simulation of the locking time requires about 1.5 hours for this experimental CPPLL circuit. If an optimization process needs 100 simulations, it takes 150 hours, i.e. 6.25 days, for the locking time alone. To optimize the jitter performance, another long transient simulation has to be conducted in order to obtain sufficient samples of the output period. The jitter value can then be estimated from the standard deviation of these period samples. Hence, the jitter simulation takes even longer. Consequently, “simulation-in-a-loop”-based optimization cannot be applied directly to the CPPLL at circuit level. Here, the hierarchical optimization methodology proposed in Chapter 3 is applied to the CPPLL, so that the whole design process can be accomplished in a reasonable time.

The 3rd-order CPPLL architecture is already shown in Fig. 4.2. Since the PFD and the D building blocks are pure digital circuits, they will not be optimized in this thesis. The values of the resistance and the capacitances in the LF will be calculated, while their circuit-level realization will not be discussed here. The focus is on two important analog building blocks: i.e. the CP and the VCO.

The topology of the CP is an outside-biased current-steering configuration, as shown in Fig. 4.18. The reference current $I_{bias}$ is mirrored by the current mirror (P1&P2) into the charging path (P2&P4) or by the current mirror (N0&N2) into the discharging path (N2&N4). When the signal $up$ ($dn$) is active, the source current flows into (out of) the loop filter, so that the output voltage of the loop filter rises (drops), which in turn forces a higher (lower) oscillation frequency. Note that the signals \( up \) and \( dn \) cannot be active at the same time, which means that the LF cannot be charged and discharged simultaneously. The signals \( \overline{up} \) and \( \overline{dn} \) are the complements of the signals \( up \) and \( dn \), respectively. The dummy switches (N5&P6) are intended to avoid any voltage peak at the output node.

![Figure 4.17: Loop transfer function from each noise source to PLL’s output](image_url)

The VCO is a five-stage single-ended ring oscillator, as shown in Fig. 4.19. The input voltage \( V_{ctrl} \) controls the current through the delay elements and thus determines the delay time of each stage, which in turn determines the output oscillation frequency. The upper and lower transistors, e.g. P1&N1, act as voltage-controlled current sources (VCCS), and the middle transistors, e.g. P6&N6, act as the delay elements. The current flowing through the delay elements is generated by the current mirror structures, e.g. N0&N1 and P0&P2.

4.4.1 CPPLL Hierarchical Modeling

In this experiment, the CPPLL is realized in a 180nm technology with a supply voltage VDD of 3V, using a reference signal \( f_{in} \) of 25MHz to generate the output frequency ranging from \( f_{min} (=150MHz) \) to \( f_{max} (=500 \text{ MHz}) \). In the hierarchical modeling of the CPPLL, system performances are modeled in dependence of system-level parameters from each building block, while these system-level parameters can be determined by simulation of the blocks on circuit level. Fig. 4.20 gives an overview of the hierarchical performance modeling of the CPPLL.
4.4.1.1 CPPLL on System Level

**System-level performances**  Six important performances of the CPPLL system are considered: locking time, jitter, power consumption, unity-gain bandwidth, phase margin and output frequency.

The locking time is defined as the time taken by the CPPLL to synchronize with or to lock onto a new frequency. The performance optimization shall be considered at the worst case. Therefore the locking time is defined as the time for the output frequency directly jumping from $f_{\text{min}}$ (150MHz) to $f_{\text{max}}$ (500MHz).

The total output noise of the CPPLL comes from the CP and the VCO in this experiment, while the input signal and other building blocks are considered as noise-free. The CPPLL exhibits various noise transfer characteristics with respect to the different noise sources, as discussed in Sec. 4.3.4. Hence, the design of the loop bandwidth has to be a trade-off for jitter of the VCO and jitter of the CP when minimizing the total output jitter. The jitter performance shall also be optimized at its worst case, i.e. when the CPPLL works at its minimal output frequency $f_{\text{min}}$.

With the predetermined digital building blocks PFD and divider, the power consumption is defined as the sum of the power consumption of the CP and the VCO (only the analog blocks are considered):

\[
\text{Power} = VDD \cdot 4 \cdot I_{CP} + VDD \cdot I_{VCO}. \tag{4.32}
\]

---

§ At the minimal output frequency, the current consumption is minimal (VCO current consumption is minimal and other blocks keep constant) so that the jitter performance is worst, which can also be seen from the Pareto-optimal front of the VCO in Sec. 4.4.3

† The current consumption of the LF does not need to be calculated, since the charging and discharging currents of the LF are supplied by or drawn into the CP.

We consider the worst-case power consumption that occurs when the CPPLL works at its maximal output frequency \( f_{\text{max}} \).

For the optimization process, locking time, jitter and power consumption are taken as the optimization targets at the system level:

\[
\mathbf{f} = [T_s, J_{\text{sum}}, \text{Power}]^T. \tag{4.33}
\]

The output frequency range of the CPPLL is determined by the VCO, therefore this performance is directly forwarded as a circuit-level performance of the VCO block. Additionally, two constraints concerning stability are considered for system-level optimization:

- The stability criterion RUR for nonlinear acquisition process. We specify \( \text{RUR} \geq 20 \).
- The stability criterion PM for linear lock-in state. We specify \( \text{PM} \geq 45^\circ \)

**System-level parameters** For each building block of the CPPLL, a set of system parameters \( \mathbf{p} \) that capture the influence of the block on the system performances and trade-offs is defined:

- CP: outside-biased reference current \( I_{\text{CP}} \), jitter \( J_{\text{CP}} \)
- VCO: gain \( K_{\text{VCO}} \), current consumption \( I_{\text{VCO}} \), jitter \( J_{\text{VCO}} \)
- LF: loop filter elements \( R, C_1, C_2 \)
- D: divider value \( N \)‖

### 4.4.1.2 CPPLL on Circuit Level

**Circuit-level performances** On circuit level, the system-level parameters become performances of the building blocks that can be simulated in dependence of circuit-level design parameters. For the CP for instance, the absolute values of the simulated charging and discharging currents \( I_{\text{up}} \) and \( I_{\text{down}} \) are equal to the value of the outside-biased current \( I_{\text{CP}} \) on behavioral level, and the system-level jitter \( J_{\text{CP}} \) becomes a circuit-level performance of the CP. For the VCO, gain, current consumption, jitter and output frequency range are the circuit-level performances corresponding to the system-level parameters.

**Circuit-level parameters** The transistor dimensions, i.e. widths and lengths, are the circuit-level parameters. In the CP in Fig. 4.18, there are two current mirror structures, i.e. \( P1\&P2 \) and \( N0\&N1\&N2 \). The transistor dimensions within a current mirror structure should be identical. In the VCO in Fig. 4.19, there are also two current mirror structures, i.e. \( P0\&P1\&P2\&P3\&P4\&P5 \) and \( N0\&N1\&N2\&N3\&N4\&N5 \). The dimensions of the other transistors in both circuits are set to be the same as those of the digital circuits. Since the transistors in the VCO block operate in large-signal mode, the sizing rules in Fig. 4.20 comprise only the geometric rules in Tab. A.2. For the current mirror transistors in the CP block, the sizing rules include not only geometric rules but also electrical sizing rules.

‖ The divider value is determined by the input and the expected output frequency.
4.4.2 Modeling CPPLL in Verilog-A

Recently, various behavioral models [AETL96, SG99, HKH00, OHSA04, Kun05] have been developed to accelerate the system-level simulation of PLLs. In this thesis, the system-level models are based on [Kun05] and are realized in Verilog-A [Verb]. To achieve efficient modeling and fast extraction, two sets of models are used for locking time and jitter, respectively. The first set of models, Lists D.1 to D.4 in Appendix D, focuses on representing the nonlinear acquisition process more accurately, while the second set, Lists D.5 to D.8, focuses on modeling the jitter more efficiently in order to accelerate the jitter simulation.

**Models for locking time extraction** In order to model the nonlinear acquisition process more accurately, some properties of the building blocks are included, as presented in the following. For example, the rise/fall time and the delay time of the PFD block are first extracted from circuit-level simulation and then parameterized as $td_{ul}$ and $td_{dl}$ in the behavioral model.

```verilog
module PFD (ref, feedback, u, ub, db, d);
    input ref, feedback;
    output u, ub, db, d;
    electrical ref, feedback, u, ub, db, d;
    .....  
    parameter real tt=120p from (0:1000000); //rise time and fall time
    // the following delay times are extracted from circuit level simulation
    // for "u" and "ub" signal:
    // ------------ rise delay -------------- fall delay
```
Since the charging/discharging current sources in Fig. 4.18 are realized by NMOS/PMOS transistors, these transistors leave the saturation region if the output voltage becomes too high or too low. In that case, the generated output current no longer equals the outside-biased reference current. The two parameters $V_{\text{max}}$ and $V_{\text{min}}$ in the CP model are used to represent the limits of the output voltage. The parameter $M_i$ is used to represent the mismatch between the charging and the discharging currents.

In the VCO block, according to [LMC01], the linear VCO model predicts more than 90% of the real VCO characteristics, especially in the operating range. In this experimental PLL’s VCO, when the input voltage is in the input range, the VCO output frequency increases with the linear gain $K_{VCO}$. When the input voltage exceeds the maximum input level, the VCO output frequency can still increase, but the gain is set to be $0.25K_{VCO}$ instead of $K_{VCO}$.
```verilog
parameter real Vmax=2.6 from(Vmin:10e5);
parameter real Kvco=600e6;

analog begin
......
// compute the freq from the input voltage
if (V(V_tune)<=Vmin)
  freq = Fmin;
else if (V(V_tune)<=Vmax)
  freq = (V(V_tune)-Vmin)*Kvco+Fmin;
else
  freq = (V(V_tune)-Vmax)*(0.25*Kvco)+(Vmax-Vmin)*Kvco+Fmin;
......
end
endmodule
```

**Models for jitter extraction** In a transient simulation, the step size is a crucial factor for both run time and accuracy. If the jitter of the CP and the jitter of the VCO cause two individual events that occur at almost, but not exactly, the same time, the circuit simulator has to use many more time points (i.e. a smaller step size) to resolve the two distinct events, and the simulation runs much more slowly. For this reason, it is desirable to reduce the number of jitter events by combining jitter sources. The jitter of the CP block, i.e. the edge-to-edge jitter $J_{CP}$, can be embedded into the model of the reference input oscillator (OSC) block. The parameter syncJitter in the following model represents the jitter of the CP.

```verilog
module OSC(out);
  output out; electrical out;
  parameter real freq=25e6 from (0:10e9);
  parameter real accJitter=0 from [0:0.1/freq);
  //period jitter from reference OSC
  parameter real syncJitter=0 from [0:0.1*ratio/freq);
  //edge-to-edge jitter from PFD/CP
  .......
end
endmodule
```

It is well known that the higher the frequency at which a circuit runs, the smaller the simulation step size has to be. The output frequency of the VCO is $N$ times (divider value) the internal signal frequency of the CPPLL. When the divider and the VCO are merged, the simulation runs much faster because the high VCO output frequency is never actually generated. If the divider is merged into the VCO module, the period jitter at the output should be $\sqrt{N}$ times larger than the period jitter $J_{VCO}$ [Kun05]. Since there are two transitions in one period, i.e. one rising edge and one falling edge, the output jitter of the VCO_Div block is therefore equal to

$$J_{VCO\_Div} = \sqrt{2N}\, J_{VCO}. \tag{4.34}$$

The merged behavioral model for VCO with divider is named as VCO_Div. The Verilog-A model is given in the following.

```verilog
module VCO_Div(V_tune,VCO_out);
   input V_tune; output VCO_out;
   electrical V_tune, VCO_out;
   ......
   parameter real N=20; //if N!=1, divider moved into the VCO block
   parameter real Jvco=10e-12; //Jitter of VCO
   ......
   analog begin
      @(initial_step)begin
         seed=160;
         Vout=VSS;
         delta=Jvco*sqrt(2*N);
         // calculating the corresponding jitter on the divider output
      end
   ......
endmodule
```
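The scaling applied in the `initial_step` block can be reproduced with a short calculation. The Python sketch below uses the default parameter values visible in the listing (`N=20`, `Jvco=10e-12`); the function name is an assumption made for this example.

```python
import math


def merged_vco_div_jitter(j_vco, n):
    """Jitter of the merged VCO/divider block, delta = Jvco * sqrt(2 * N)."""
    return math.sqrt(2.0 * n) * j_vco


delta = merged_vco_div_jitter(j_vco=10e-12, n=20)
print(f"delta = {delta * 1e12:.1f} ps")
```

With the default values of the listing, the merged block uses a jitter of about 63.2 ps per transition.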

With these efficient setups in the behavioral models, the system-level simulations for the locking time and the jitter can be finished within minutes. This reasonable simulation time makes “simulation-in-a-loop”-based optimization methods feasible for PLL circuits.

4.4.3 Pareto-Optimal Fronts of Building Blocks

The deterministic PSE method mentioned in Chapter 3 is applied to the circuit level of the CP and the VCO respectively. The extracted Pareto fronts presented here are computed through real circuit-level simulation, which is a non-trivial task.

**PSE on CP** For the CP, small jitter and low power consumption are desired. Fig. 4.21 shows the Pareto-optimal front for the outside-biased current and the output jitter of the CP, which represents the lower bound on the combination of both performance values. The feasible region lies to the right of and above the front. The trade-off front indicates that a larger current results in a smaller jitter.

**PSE on VCO** Fig. 4.22 shows the feasible ranges bounded by the Pareto fronts for the competing objectives jitter, gain and current of the VCO in 3-D and in the different 2-D projections. We can evaluate e.g. the increasing trend of \( J_{VCO} \) with increasing \( K_{VCO} \) for constant \( I_{VCO} \) from Fig. 4.22(b), or how \( K_{VCO} \) increases at the cost of a larger \( J_{VCO} \) for constant \( I_{VCO} \) from Fig. 4.22(c), or the increasing trend of \( I_{VCO} \) with decreasing \( J_{VCO} \) for constant \( K_{VCO} \) from Fig. 4.22(d).

The feasible regions for system-level parameters, i.e. \( I_{CP}, J_{CP}, I_{VCO}, K_{VCO} \) and \( J_{VCO} \), can be extracted from the Pareto-optimal fronts in Fig. 4.21&4.22.
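To illustrate what a Pareto-optimal front of two minimized objectives looks like, the sketch below filters the non-dominated points out of a set of candidate designs. This simple dominance filter is not the deterministic PSE method of Chapter 3; it only demonstrates the notion of the front. The sample points are made-up pairs in normalized units (e.g. current and jitter), not simulation results.

```python
def pareto_front(points):
    """Return the non-dominated points when both coordinates are minimized."""
    front = []
    for p in points:
        # p is dominated if another point is no worse in both objectives
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical (current, jitter) pairs in normalized units.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 2)]
print(pareto_front(candidates))
```

Here the points (3, 4) and (5, 2) are dominated, so the front consists of (1, 5), (2, 3) and (4, 1); every remaining candidate lies to the right of and above this front.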

4.4.4 Hierarchical Optimization

By restricting the system-level parameters to their own Pareto-optimal fronts, an optimal performance of the whole circuit can be obtained [EMG05]. In this experiment, the independent system-level parameter set is selected as \( p = [I_{CP}, I_{VCO}, K_{VCO}, R, C_1, C_2] \). The parameter \( J_{CP} \) depends on the parameter \( I_{CP} \), and \( J_{VCO} \) depends on the parameters \( I_{VCO} \) and \( K_{VCO} \), as expressed in Equation 4.35. There are two ways to transfer the jitter values into the behavioral models. One way is to embed the fitting functions directly into the models given in List D.5 and List D.7. The other way is to first calculate the jitter values with a separate script and then to pass these values to the jitter parameters of the models by means of circuit netlist mapping:

\[
\begin{aligned}
J_{CP} &= \text{fitting\_function\_1}(I_{CP}), \\
J_{VCO} &= \text{fitting\_function\_2}(I_{VCO}, K_{VCO}).
\end{aligned} \tag{4.35}
\]
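One simple way to realize a one-dimensional fitting function such as $J_{CP}(I_{CP})$ is piecewise-linear interpolation between the sampled points of the Pareto-optimal front. The sketch below is only an illustration of this idea; the actual fitting functions used in this work are not specified here, and the sample points are placeholders in normalized units.

```python
from bisect import bisect_right


def make_interpolator(xs, ys):
    """Build a piecewise-linear function through the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching sample points")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("xs must be strictly increasing")

    def f(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the sampled front")
        # index of the right end of the segment that contains x
        i = min(bisect_right(xs, x), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return f


# Placeholder front samples (normalized current, normalized jitter).
fitting_function_1 = make_interpolator([1.0, 2.0, 4.0], [10.0, 6.0, 2.0])
print(fitting_function_1(3.0))
```

Rejecting values outside the sampled range keeps the optimizer from extrapolating beyond the feasible region described by the front.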

### 4.4.4.1 File system in WiCkeD

In the thesis, the optimization process is realized in WiCkeD [Wic]. The file system for the CPPLL optimization is briefly listed in Fig. 4.23, and the setup for other optimization tasks in other parts of the thesis is similar to this. The file system can be classified into four levels. At the first level, “CPPLL.def” is the main file for optimization setup, including the following functions:

- The parameters are defined, i.e. design parameters, statistical parameters and operating parameters, together with their upper and lower bounds.
- The circuit performances and the necessary optimization constraints are defined, and their corresponding simulations are set up, e.g. "LockingTime.rs" for the locking time performance and "Jitter.rs" for the jitter performance.

Figure 4.22: Pareto-optimal front of the VCO
- The simulation tasks are distributed to many machines simultaneously. The performances and constraints can be extracted in parallel.

At the second level, “***.rs” is the setup file for the performances and constraints, which controls the simulation and extraction files at the third level. For example, “Jitter.rs” controls two files: “pll_jit.scs” is the Spectre-format netlist of the transient simulation for the jitter performance, and “pll_jit.ocn” is in charge of extracting the jitter from the raw simulation database. The behavior-level modules, List D.5 to List D.8, are included at the fourth level.

In the optimization of the CPPLL, one transient simulation with Spectre is used to obtain the locking time $T_s$, while another transient simulation is used for the jitter $J_{SUM}$. Based on the s-domain analysis of the PLL, the properties of the linear lock-in state, e.g. unity-gain bandwidth and phase margin, are calculated. A third simulation with Octave [Oct] is used for the system-level constraints: the ratio RUR, the PM and the feasible ranges of the system-level parameters.

### 4.4.4.2 Hierarchical Optimization Results

Tab. 4.3 shows the results of the two optimization steps (top-down automatic sizing and bottom-up verification) in the hierarchy. Starting from the system specifications in column 3 and the initial design in column 4, system-level optimization leads to the optimized system performances in the upper half of column 5 and the system-level parameters in the lower half of column 5. The latter, propagated as specifications to the circuit-level optimization, result in the final $P_{opt}$, which are listed in the lower half of the last column and depicted as dots in Figs. 4.21 and 4.22, respectively. A system simulation with the behavioral models calibrated according to the circuit-level design parameters yields the system performances in the upper half of column 6. We can see that, by considering the technological feasibility of the building blocks, a first-time-successful hierarchical optimization has been achieved for a comprehensive set of design aspects. The optimizations on system level and on circuit level each require about 1.5 hours, which results in a total of about 3 hours.

4.5 Pareto-Optimal Front Computation (POFC) of a whole CPPLL

In contrast to the single optimized design point of the previous experiment, a comprehensive optimization result, i.e. the Pareto-optimal front of the system performance, is computed in this experiment. The PSE method is applied not only to the building blocks but also to the whole CPPLL system. The trade-offs among the performances of the building blocks, e.g. gain, jitter and power of the VCO, and among the performances at system level, e.g. bandwidth, locking time and jitter, are represented as Pareto-optimal fronts. These fronts offer designers a fast way to gain insight into the capability of the whole system for a given technology.

The topologies of the building blocks in this experiment are exactly the same as those of the previous experiment. The technology is 130 nm and the nominal supply voltage VDD is 1.8 V. The reference input signal is \( f_{in} = 25 \text{ MHz} \). The CPPLL is supposed to generate the output frequency \( f_{out} = 500 \text{ MHz} \).

4.5.1 POFC of the CP Block

For the CP block, the feasible range of the out-biased current is set from 5 \( \mu \text{A} \) to 65 \( \mu \text{A} \). The Pareto-optimal front in Fig. 4.24 represents the trade-off between the two performances: the output jitter of the CP decreases with increasing current. The front is the lower bound for both performances; the feasible region lies to the right of and above the front.
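The notion of a front that bounds the feasible region from below and from the left can be illustrated with a small dominance filter. This is a generic sketch and not the PSE algorithm used in this work; the sample points are hypothetical and the function name `pareto_front` is chosen only for illustration.

```python
def pareto_front(points):
    """Return the non-dominated (current, jitter) points.

    Both objectives are minimized. A point is dominated if another point is
    no worse in both objectives and differs from it.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical (I_CP in uA, J_CP in ps) sizing results, not thesis data.
samples = [(5, 19.0), (10, 16.5), (10, 18.0), (30, 13.0), (40, 14.0), (65, 8.5)]
front = pareto_front(samples)
```

Every discarded sample lies to the right of or above some point of the front, which matches the description of the feasible region given above.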

4.5.2 POFC of the VCO Block

For POFC on the VCO block, two constraints are set:

1. the maximal output frequency is larger than the expected output frequency, so that the VCO is capable of providing the 500MHz output frequency;
2. the nonlinearity limit of the gain is set to 2.5 to guarantee an acceptable linearity of the VCO, i.e.

\[
\frac{f_{mid} - f_{min}}{f_{max} - f_{mid}} \leq 2.5, \quad (4.36)
\]

where \( f_{mid} \) is the output frequency corresponding to the input voltage \( V_{mid} = (V_{max} - V_{min})/2 \).
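The two constraints can be checked for one sizing candidate as sketched below. The function name `vco_constraints_ok` and the frequency values in the usage lines are hypothetical; a monotonically increasing tuning curve is assumed, so that \( f_{max} > f_{mid} \).

```python
def vco_constraints_ok(f_min, f_mid, f_max, f_target=500e6, limit=2.5):
    """Check the two POFC constraints for one VCO sizing candidate.

    f_min, f_mid, f_max: output frequencies (Hz) at V_min, V_mid, V_max.
    f_target: required output frequency of the CPPLL (500 MHz here).
    limit: nonlinearity limit of Equation 4.36.
    """
    if f_max <= f_mid:
        # Degenerate tuning curve; Equation 4.36 cannot be evaluated.
        return False
    reaches_target = f_max > f_target
    nonlinearity = (f_mid - f_min) / (f_max - f_mid)
    return reaches_target and nonlinearity <= limit


# Hypothetical candidates, for illustration only.
ok_candidate = vco_constraints_ok(100e6, 400e6, 600e6)       # ratio 1.5
too_nonlinear = vco_constraints_ok(100e6, 500e6, 620e6)      # ratio about 3.3
```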
## Table 4.3: Hierarchical Optimization Results of the CPPLL

<table>
<thead>
<tr>
<th>System</th>
<th>Performances</th>
<th>Specifications</th>
<th>HPBlocks</th>
</tr>
</thead>
<tbody>
<tr>
<td>Locking Time (µs)</td>
<td>≤ 2.5</td>
<td>2.85</td>
<td>2.13</td>
</tr>
<tr>
<td>Jitter (ps)</td>
<td>≤ 2.5</td>
<td>20.8</td>
<td>2.25</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>≤ 2.5</td>
<td>3.02</td>
<td>2.42</td>
</tr>
<tr>
<td>CP PLL Ratio RUR</td>
<td>≥ 20</td>
<td>30.6@ fmin</td>
<td>28.5@ fmin</td>
</tr>
<tr>
<td>CP PLL Phase Margin PM(°)</td>
<td>≥ 45</td>
<td>43.6@ fmin</td>
<td>43.1@ fmin</td>
</tr>
<tr>
<td>CP PLL CP Jitter (ps)</td>
<td>55.2</td>
<td>13.34</td>
<td>14.4</td>
</tr>
<tr>
<td>CP PLL VCO Gain (MHz/V)</td>
<td>270.3</td>
<td>347.1</td>
<td>348.4</td>
</tr>
<tr>
<td>CP PLL VCO Current (µA)</td>
<td>805.9</td>
<td>619.2</td>
<td>619</td>
</tr>
<tr>
<td>CP PLL VCO Jitter (ps)</td>
<td>13.7</td>
<td>1.88</td>
<td>1.876</td>
</tr>
<tr>
<td>%up=46.3; %down=-46.5 CP PLL Resistor (KΩ)</td>
<td>20</td>
<td>16.98</td>
<td>16.98</td>
</tr>
<tr>
<td>LF Capacitor C (pF)</td>
<td>50</td>
<td>47.3</td>
<td>46.5</td>
</tr>
<tr>
<td>HF Capacitor C (pF)</td>
<td>50</td>
<td>47.3</td>
<td>47.3</td>
</tr>
<tr>
<td>Initial Design</td>
<td></td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>Specifications</td>
<td></td>
<td>3</td>
<td>2</td>
</tr>
</tbody>
</table>

The table above summarizes the hierarchical optimization results of the charge-pump phase-locked loop (CPPLL), including the performance metrics and design parameters at the different levels of abstraction.
Large gain, small jitter and low current consumption are preferred in the VCO block. The 3D Pareto-optimal front for these three competing objectives is shown in Fig. 4.25(a). The contours of the jitter on the Pareto front surface are shown in Fig. 4.25(b), which indicates that, for a fixed gain, the jitter is reduced by increasing the current consumption.

Figure 4.25: (a) 3D Pareto-optimal front of the VCO (b) Contours of jitter in 2D surface (gain vs. current)

4.5.3 POFC of the CPPLL System

For POFC on CPPLL system level, three constraints are set:
1. the phase margin is larger than 45° for PLL stability in the lock-in state;
2. RUR is larger than 20 so that the PLL can work properly in the nonlinear acquisition process;
3. the system design parameters are restricted on the Pareto-optimal fronts of the building blocks, which are derived from POFC on circuit level.

In the CPPLL, \( \mathbf{p} = [I_{CP}, K_{VCO}, I_{VCO}, R, C_1] \) is selected as the independent system-level parameter set. \( N = 20 \) is determined by the output frequency (\( N = f_{out}/f_{in} \)). \( C_2 \) is set to \( C_1/10 \) for the sake of good matching and a safe phase margin.
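The relation between these system-level parameters and the phase-margin constraint can be illustrated with a short numerical sketch. It assumes the common linear CPPLL model with a second-order passive loop filter, \( L(s) = \frac{I_{CP}}{2\pi} Z(s) \frac{K_{VCO}}{N s} \) with \( Z(s) = \frac{1 + sRC_1}{s (C_1 + C_2)(1 + sRC_1C_2/(C_1 + C_2))} \), which is an assumption here and not necessarily the exact model of Sec. 2.3; \( K_{VCO} \) is inserted with its tabulated numerical value. The function name `phase_margin_deg` is hypothetical. With the parameter sets of rows a, c and g of Tab. 4.4, the sketch yields phase margins within about one degree of the tabulated 45°, 45° and 56°.

```python
import math


def phase_margin_deg(i_cp, k_vco, r, c1, c2, n):
    """Phase margin (degrees) of L(s) = I_CP/(2*pi) * Z(s) * K_VCO / (N*s)."""
    tau_z = r * c1                    # time constant of the loop-filter zero
    tau_p = r * c1 * c2 / (c1 + c2)   # time constant of the high-frequency pole
    gain = i_cp * k_vco / (2 * math.pi * n * (c1 + c2))

    def magnitude(w):
        return gain / w**2 * math.hypot(1, w * tau_z) / math.hypot(1, w * tau_p)

    # |L(jw)| falls monotonically with w, so the unity-gain frequency can be
    # bracketed and bisected on a logarithmic frequency axis.
    lo, hi = 1e3, 1e12
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if magnitude(mid) > 1:
            lo = mid
        else:
            hi = mid
    w_c = math.sqrt(lo * hi)
    return math.degrees(math.atan(w_c * tau_z) - math.atan(w_c * tau_p))


n_div = round(500e6 / 25e6)  # N = f_out / f_in

# (I_CP [A], K_VCO [tabulated value], R [ohm], C_1 [F]) from Tab. 4.4
rows = {
    "a": (65e-6, 2.2e9, 5.9e3, 27.2e-12),
    "c": (5e-6, 573.6e6, 24.5e3, 80e-12),
    "g": (65e-6, 573.6e6, 11.3e3, 80e-12),
}
pm = {
    name: phase_margin_deg(i_cp, k_vco, r, c1, c1 / 10, n_div)  # C_2 = C_1/10
    for name, (i_cp, k_vco, r, c1) in rows.items()
}
```

With \( C_2 = C_1/10 \) the zero and pole time constants differ by a factor of 11, so in this model the phase margin cannot exceed about 56°, which is the value reached in row g.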

Here we consider two different application cases. In the first case, the dominant noise comes from the reference input, which is the case in clock recovery applications, where the input is random data. In the second case, the dominant noise originates from the PLL blocks, which is the case in frequency synthesizer applications, where the input is a clean signal source.

In the first case, the loop bandwidth should be chosen small in order to suppress the reference noise. Assume that the input jitter is 1 ns, which corresponds to 4% distortion of the input signal (25 MHz). Since the overall output jitter is dominated by the input noise, a clear trade-off exists between the locking time and the output jitter, as depicted in Fig. 4.26. The minimal jitter, point c, is obtained at the minimal loop bandwidth, which is produced by the minimal values of \( I_{CP} \) and \( K_{VCO} \). The feasible regions of \( I_{CP} \) and \( K_{VCO} \) can be found in Figs. 4.24 and 4.25. The minimal locking time, point a, is obtained at the maximum loop bandwidth, which is limited in theory by the RUR constraint. In this experiment, the maximum loop bandwidth happens to be generated by the maximal values of \( I_{CP} \) and \( K_{VCO} \). The best performance, the constraints and the corresponding design parameters are listed in the upper half of Tab. 4.4.

In the second case, the reference input is clean, so the loop bandwidth should be chosen large to suppress the VCO noise. Meanwhile, a large loop bandwidth provides a quick acquisition process. At first glance, there seems to be no contradiction between reducing the locking time and reducing the output jitter. According to Sec. 2.3, the loop bandwidth $\omega_{BW}$ is increased by increasing $I_{CP}$. When $I_{CP}$ reaches its maximum but $\omega_{BW}$ is still not large enough, it can be further enlarged by increasing $K_{VCO}$. However, a larger $K_{VCO}$ introduces more VCO noise, which results in a larger output jitter. Fig. 4.27 shows the trade-off between the jitter and the locking time, where the input noise is 0 ps and $I_{CP}$ is set to its maximum. The loop bandwidths at the points e, f and g are related by $\omega_e > \omega_f > \omega_g$. This indicates that the output jitter is increased much more by the VCO noise than it is decreased by the enlarged loop bandwidth. Therefore, minimizing the jitter is equivalent to minimizing the VCO noise. The best performance, the constraints and the corresponding design parameters are listed in the lower half of Tab. 4.4.

The ultimate sizing goal is to optimize locking time, jitter and power consumption simultaneously. The performances and the corresponding system-level parameter sets for the two different cases introduced in Sec. 4.3 are collected in rows b and f of Tab. 4.4. The obtained values of the system-level parameters can be successfully realized at the circuit level. Moreover, the two compromise optimization results, b and f, are denoted by pentagrams in Fig. 4.26 and Fig. 4.27, respectively.

One point is worth noting here. Based on the discretely computed efficient points, an entire Pareto-optimal front is estimated by means of a smooth fitting curve or surface in this work. Although the residual of the fitting function at certain efficient points ** is sometimes not small, the highest-level optimization results based on the fitted Pareto-optimal fronts can indeed be realized by the final lowest-level circuit in the two optimization experiments. That confirms again that a circuit usually behaves well as long as it works in the correct region of operation.

** These efficient points might be generated by a non-robust algorithm of the performance space exploration.

4.6 Summary

This chapter has discussed the basic concept of the phase-locked loop. The building blocks of a charge-pump phase-locked loop and their operation are briefly explained. Three analysis methods, namely s-domain analysis, the impulse invariance method and the state space method, are used to characterize the system properties of the CPPLL in the linear lock-in state and in the nonlinear acquisition process.

The PLL’s performances, i.e. locking time and phase noise/jitter, are discussed in detail. Additionally, the PLL’s stability criteria for the linear lock-in state and for the nonlinear acquisition process are discussed individually. The design trade-offs among the loop bandwidth, locking time and jitter are analyzed in theory and visualized by the experimental results.

In this work, to tackle the complex trade-offs in PLL design, we present an optimization method that finds a proper loop bandwidth in order to optimize the performance in terms of locking time and jitter while considering the capability of the building blocks. The experimental results are consistent with the theoretical analysis. The Pareto-front computation on system level provides a comprehensive insight into the circuit’s capability. In clock recovery or frequency synthesizer applications, starting from various requirements on the PLL’s loop bandwidth down to the final circuit realization, the automatic sizing process can be accomplished in hours. Moreover, it is a first-time-successful top-down sizing process without iteration. Without the proposed hierarchical approach, a complete CPPLL design could need many iteration steps due to complexity problems.
### Table 4.4: Hierarchical optimization results for two different cases

<table>
<thead>
<tr>
<th></th>
<th>Jitter $J_{sum}$ (ps)</th>
<th>Locking Time $T_s$ (µs)</th>
<th>Power (µW)</th>
<th>RUR</th>
<th>PM</th>
<th>VCO $f_{VCO}$</th>
<th>CP $f_{CP}$</th>
<th>LF $f_{LF}$</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="9"><strong>Case 1: Input Jitter=1ns</strong></td>
</tr>
<tr>
<td>a</td>
<td>386</td>
<td>0.64</td>
<td>642</td>
<td>20</td>
<td>45°</td>
<td>$K_{VCO}=2.2e9$ Hz/V<br>$I_{VCO}=161.5$ µA<br>$J_{VCO}=8.41$ ps</td>
<td>$I_{CP}=65$ µA<br>$J_{CP}=8.21$ ps</td>
<td>$C_1=27.2$ pF<br>$C_2=2.72$ pF<br>$R=5.9$ kΩ</td>
</tr>
<tr>
<td>b</td>
<td>124.1</td>
<td>2.64</td>
<td>209</td>
<td>75</td>
<td>46.4°</td>
<td>$K_{VCO}=1.537e9$ Hz/V<br>$I_{VCO}=82.41$ µA<br>$J_{VCO}=5.6$ ps</td>
<td>$I_{CP}=11.3$ µA<br>$J_{CP}=15.74$ ps</td>
<td>$C_1=48$ pF<br>$C_2=2.72$ pF<br>$R=13.4$ kΩ</td>
</tr>
<tr>
<td>c</td>
<td>66.4</td>
<td>19.02</td>
<td>279</td>
<td>244</td>
<td>45°</td>
<td>$K_{VCO}=573.6e6$ Hz/V<br>$I_{VCO}=140.2$ µA<br>$J_{VCO}=1.048$ ps</td>
<td>$I_{CP}=5$ µA<br>$J_{CP}=18.87$ ps</td>
<td>$C_1=80$ pF<br>$C_2=8$ pF<br>$R=24.5$ kΩ</td>
</tr>
<tr>
<td colspan="9"><strong>Case 2: Input Jitter=0</strong></td>
</tr>
<tr>
<td>e</td>
<td>9.67</td>
<td>0.64</td>
<td>652</td>
<td>20</td>
<td>45°</td>
<td>$K_{VCO}=2.2e9$ Hz/V<br>$I_{VCO}=167$ µA<br>$J_{VCO}=7.93$ ps</td>
<td>$I_{CP}=65$ µA<br>$J_{CP}=8.21$ ps</td>
<td>$C_1=27.2$ pF<br>$C_2=2.72$ pF<br>$R=5.9$ kΩ</td>
</tr>
<tr>
<td>f</td>
<td>2.58</td>
<td>0.72</td>
<td>662</td>
<td>23.4</td>
<td>46.6°</td>
<td>$K_{VCO}=1.414e9$ Hz/V<br>$I_{VCO}=173$ µA<br>$J_{VCO}=2.28$ ps</td>
<td>$I_{CP}=65$ µA<br>$J_{CP}=8.21$ ps</td>
<td>$C_1=23$ pF<br>$C_2=2.3$ pF<br>$R=7.98$ kΩ</td>
</tr>
<tr>
<td>g</td>
<td>1.08</td>
<td>2.56</td>
<td>669</td>
<td>50.2</td>
<td>56°</td>
<td>$K_{VCO}=573.6e6$ Hz/V<br>$I_{VCO}=175$ µA<br>$J_{VCO}=1.02$ ps</td>
<td>$I_{CP}=65$ µA<br>$J_{CP}=8.21$ ps</td>
<td>$C_1=80$ pF<br>$C_2=8$ pF<br>$R=11.3$ kΩ</td>
</tr>
</tbody>
</table>
Chapter 5

Hierarchical Optimization of Switched-Capacitor Sigma-Delta Modulators

A data converter is an indispensable circuit in today's chips. It converts an analog signal $x(t)$ in the continuous-time domain to a digital signal $x(n)$ in the discrete-time domain, or performs the reverse conversion. In recent decades, oversampling data converters have become more popular for high-resolution, medium-to-low-speed applications such as high-quality digital audio. "Oversampling is the process of sampling a signal with a sampling frequency significantly higher than twice the bandwidth or highest frequency of the signal being sampled" [wik]. The major advantages of the oversampling method compared to other converter architectures are that high-selectivity analog filters are not needed and that the conversion properties are much less sensitive to circuit imperfections and to a noisy environment.

The switched-capacitor (SC) sigma-delta ($\Sigma\Delta$) modulator is an effective topology for high-resolution analog-to-digital (A/D) conversion. SC $\Sigma\Delta$ modulators use not only the oversampling method but the noise shaping method as well. Hence, SC $\Sigma\Delta$ modulators inherit the advantages of oversampling, such as a high tolerance to circuit non-idealities and reduced accuracy requirements on the sample-and-hold circuit. Despite these properties of SC $\Sigma\Delta$ modulators, "it is still governed by the limitations of its analog building blocks. In particular, it is sensitive to circuit non-idealities at the input stage where no noise shaping has yet taken place" [TMG02]. To capture the effects of the non-idealities of the analog blocks on the circuit performance, a circuit-level simulation is generally the direct way. However, the circuit-level simulation of SC $\Sigma\Delta$ modulators is extremely time-consuming and memory-intensive, so that it is infeasible to apply the "simulation-in-a-loop" based optimization approaches stated in [WRSVT88, AEGP00, HBL01, PKR+00, dPDL+01] directly to such circuits. A hierarchical process is needed for the optimization of SC $\Sigma\Delta$ modulators.

In this chapter, Sec. 5.1 introduces the fundamentals of $\Sigma\Delta$ oversampling modulators. Sec. 5.2 reviews the topology of a second-order SC $\Sigma\Delta$ modulator and lists the major non-idealities of the building blocks. Sec. 5.4 shows the proposed hierarchical optimization method applied to the second-order SC $\Sigma\Delta$ modulator. The SNR performance is maximized for the nominal design case and for the worst design case, respectively. Finally, Sec. 5.5 concludes.

5.1 ΣΔ Oversampling A/D Converters

In this section, the basic topologies of conventional converters and of ΣΔ oversampling A/D converters are briefly described. More details can be found in [FM98, Par]. A conventional Nyquist A/D converter consists mainly of three blocks, as shown in Fig. 5.1(a.1):

- **Anti-aliasing filter**: It is a low-pass filter that prevents input signal components above the Nyquist frequency from introducing new components into the baseband that do not belong to the original signal.
- **Sample-hold circuit**: Using a sampling signal with frequency $f_{s1}$, the sample-hold circuit samples a continuously changing input signal once every $1/f_{s1}$ time interval and provides a constant signal for the subsequent quantization process. If the sampled signal has a bandwidth $f_b$ in a Nyquist converter, the sampling frequency has to be twice $f_b$ in order to avoid any aliasing, i.e. the Nyquist frequency is $f_{s1} = 2 \cdot f_b$.
- **Multi-level quantizer**: The sampled analog signals are compared with the predefined reference levels, and the comparison results are then digitized as the output signals.

In an $N$-bit Nyquist converter, the number of quantization levels is $2^N$ and the number of level intervals is $2^N - 1$. The interval $q$ between two successive levels is referred to as the least significant bit (LSB) of the converter:

$$q = \frac{1}{2^N - 1}. \quad (5.1)$$

This conversion has a quantization error of up to $\pm q/2$ with equal probability, and the quantization noise power is equally spread over the entire signal bandwidth. Assuming that the above three building blocks are ideal, a perfect classical $N$-bit Nyquist A/D converter has a quantization noise spectral density of $q/\sqrt{12 \cdot f_{s1}}$, uniformly distributed within the signal bandwidth. Only one side of the noise spectrum is presented here, as shown in Fig. 5.1(a.2). The *signal-to-noise ratio* (SNR)* of an ideal $N$-bit Nyquist converter is [Bak04]

$$\text{SNR}_{\text{ideal}} = (6.02N + 1.76) \text{ dB}. \quad (5.2)$$

However, the A/D conversion is not ideal in practice. For example, the undesired signal above the Nyquist frequency cannot be attenuated sufficiently below the noise floor by the anti-aliasing filter, or the output of the sample-and-hold circuit varies during the quantization process. Consequently, the quantization noise will be larger than its theoretical minimum value, and the effective resolution will be less than $N$ bits. The actual resolution, i.e. the *Effective Number of Bits* (ENOB), is defined by

$$\text{ENOB} = \frac{\text{SNR} - 1.76\text{dB}}{6.02\text{dB}}. \quad (5.3)$$
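Equations 5.1 to 5.3 can be evaluated directly, as in the following small sketch. The function names are chosen only for illustration, and the full-scale range is normalized to 1 as in Equation 5.1.

```python
def lsb(n_bits):
    """Quantization step q of Equation 5.1 (full scale normalized to 1)."""
    return 1.0 / (2**n_bits - 1)


def snr_ideal_db(n_bits):
    """Ideal SNR of an N-bit Nyquist converter, Equation 5.2."""
    return 6.02 * n_bits + 1.76


def enob(snr_db):
    """Effective number of bits for a given SNR in dB, Equation 5.3."""
    return (snr_db - 1.76) / 6.02


# An ideal 12-bit converter maps back to 12 effective bits, while a
# (hypothetical) measured SNR of 70 dB corresponds to fewer bits.
ideal_bits = enob(snr_ideal_db(12))
measured_bits = enob(70.0)
```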

In order to achieve a high resolution with a Nyquist converter, a high performance is required of each building block.

\* SNR is defined as the ratio between the signal power and the noise power: $\text{SNR} = P_{\text{signal}}/P_{\text{noise}}$. SNR includes all noise sources, both thermal and quantization.

Figure 5.1: (a.1) Block diagram of Nyquist A/D converter; (a.2) noise spectrum of Nyquist converter; (b.1) block diagram of oversampling A/D converter; (b.2) noise spectrum of oversampling converter; (c.1) block diagram of sigma delta A/D converter; (c.2) noise spectrum of sigma delta converter. Quantization noise level \( = q/\sqrt{12 \cdot f_{s1}} \).

In oversampling converters, a much higher sampling rate $f_{s2}$ ($= R \cdot 2f_b$) is applied in the sample-hold circuit. The factor $R$ is generally referred to as the **oversampling ratio** (OSR). At the price of a high sampling rate at the input signal, the oversampling method brings two benefits. One is that the requirements on the anti-aliasing filter in an oversampling converter are much lower than in a Nyquist converter; a simple passive first-order filter mostly suffices. The other benefit is that a high-resolution A/D conversion is accomplished with a low-bit A/D converter. Assuming a full-precision quantizer, the total noise power of the oversampling converter is the same as that of the Nyquist converter. However, the quantization noise is distributed over a wider bandwidth up to $f_{s2}/2$, i.e. $R \cdot f_b$ (this phenomenon is normally called noise averaging), as shown in Fig. 5.1(b.2). If a digital low-pass filter (LPF) is added at the output, most of the quantization noise can be removed while the wanted in-band signal is not affected. Through the noise averaging, the power of the quantization noise in the bandwidth of interest is decreased by the factor $R$. The ideal SNR using the oversampling method is calculated by

$$\text{SNR}_{\text{ideal}} = (6.02N + 1.76 + 10 \log R) \text{ dB}. \quad (5.4)$$

If the OSR is 2, then SNR$_{\text{ideal}}$ is increased by 3 dB, i.e. the ENOB is improved by 0.5 bits. In oversampling converters, a decimator is needed in addition to the three blocks mentioned above, as shown in Fig. 5.1(b.1).

- **Decimator**: It filters out all signal components outside the signal band, which include a large part of the quantization error power. The filtered signal is downsampled to the Nyquist rate without degrading the SNR performance. The collective operation of low-pass filtering and downsampling is known as decimation, which can be realized by a purely digital circuit.

If only the oversampling method is used to gain $N$ additional bits of resolution, the sampling rate has to be increased by a factor of $2^{2N}$, since each doubling of the OSR yields only 0.5 bits. The \(\Sigma \Delta\) converter does not need such a high oversampling ratio, because the modulator (in Fig. 5.1(c.1)) not only reduces the quantization noise power within the signal band but also pushes quantization noise power from the signal band to outside of the signal band (this phenomenon is normally called noise shaping), as shown in Fig. 5.1(c.2). Consequently, the ideal SNR with oversampling plus first-order noise shaping is

$$\text{SNR}_{\text{ideal}} = (6.02N + 1.76 - 5.17 + 30 \log R) \text{ dB}. \quad (5.5)$$

The details of the noise shaping process in the modulator will be discussed in Sec. 5.3.

- **Modulator**: In this block the signal is oversampled and quantized. As in the oversampling converter, it spreads the whole quantization noise power (a fixed value) over the frequency range with a bandwidth of $R \cdot f_b$ in order to reduce the quantization noise power within the signal band. Furthermore, the modulator attenuates the quantization noise within the signal band and amplifies it outside the signal band by means of the noise shaping process. In consequence, most of the noise power lies outside the signal band. This out-of-band quantization noise can be filtered out by the decimator later.
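The two ideal SNR expressions given above, for plain oversampling and for oversampling with first-order noise shaping, can be compared numerically. The sketch below only evaluates those formulas; the function names are chosen for illustration.

```python
import math


def snr_oversampling_db(n_bits, osr):
    """Ideal SNR with plain oversampling (noise averaging only)."""
    return 6.02 * n_bits + 1.76 + 10 * math.log10(osr)


def snr_first_order_db(n_bits, osr):
    """Ideal SNR with oversampling plus first-order noise shaping."""
    return 6.02 * n_bits + 1.76 - 5.17 + 30 * math.log10(osr)


def extra_bits(snr_gain_db):
    """Resolution gain in bits for a given SNR gain in dB (6.02 dB per bit)."""
    return snr_gain_db / 6.02


# SNR gain obtained by doubling the OSR of a 1-bit converter from 64 to 128.
gain_averaging = snr_oversampling_db(1, 128) - snr_oversampling_db(1, 64)
gain_shaping = snr_first_order_db(1, 128) - snr_first_order_db(1, 64)
```

Each doubling of the OSR therefore buys about 0.5 bits with noise averaging alone and about 1.5 bits with first-order noise shaping, which is why the modulator can work with a much lower oversampling ratio.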

Through the noise averaging and noise shaping methods, the requirements on some building blocks of a \(\Sigma \Delta\) A/D converter are quite low. The anti-aliasing filter can be realized by a simple RC low-pass filter. The decimator is a purely digital block and can be designed with the help of mature CAD tools. In contrast, the modulator is in charge of implementing the noise averaging and the noise shaping simultaneously. The intrinsic errors of the modulator, e.g. inherent quantization errors and imperfections (non-idealities) of the circuit performances, will degrade the converter performance [FAB99]. Therefore, the modulator block is taken as the circuit to be optimized in this work.

A large number of $\Sigma\Delta$ modulator architectures have been developed since the 1980s. Tab. 5.1 [FAB99] summarizes the respective advantages and disadvantages of the most commonly used $\Sigma\Delta$ modulator architectures. Based on system stability, oversampling ratio, circuit complexity and circuit sensitivity, second-order $\Sigma\Delta$ modulators are the most frequently selected architectures for high-resolution applications. Their effectiveness has already been illustrated in a variety of applications, such as digital speech processing systems and voice-band telecommunication in [KHE+86, BW88].

<table>
<thead>
<tr>
<th>Architecture</th>
<th>Advantages</th>
<th>Disadvantages</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single-loop, 1-bit low-order [LNGB88]</td>
<td>* guaranteed stability<br>* simple circuitry<br>* maximum useful input range</td>
<td>○ need of high OSR<br>○ presence of noise patterns</td>
</tr>
<tr>
<td>Single-loop, 1-bit high-order [AP87]</td>
<td>* large SNR by using low OSR<br>* smaller noise pattern</td>
<td>○ potential instability<br>○ useful input range smaller than full-scale range<br>○ need of low gain integrators</td>
</tr>
<tr>
<td>High-order cascade [HIUK86]</td>
<td>* guaranteed stability<br>* large SNR by using low OSR<br>* maximum useful input range</td>
<td>○ sensitivity to circuit imperfections<br>○ larger complexity of the digital part</td>
</tr>
<tr>
<td>Multi-bit [CSC97]</td>
<td>* better stability<br>* large SNR by using low OSR<br>* smaller noise patterns</td>
<td>○ more complex digital and analog circuitry<br>○ sensitivity to multi-bit DAC nonlinearity</td>
</tr>
</tbody>
</table>

Table 5.1: Summary of $\Sigma\Delta$ modulator architectures based on [FAB99]

5.2 Second-Order Switched-Capacitor Sigma-Delta Modulators

The $\Sigma\Delta$ modulator can be implemented as a discrete-time system (switched-capacitor, switched OP AMP) [BW88, GSS00] or as a continuous-time one (active RC or transconductor-C) [SZ96, CS00]. Compared to the continuous-time realization, the implementation using switched-capacitor circuits is compatible with standard CMOS processes and is insensitive to clock jitter. Moreover, the frequency response of the noise shaping filter can be accurately set by the relative capacitor ratios. For this reason, a second-order SC ΣΔ modulator is used in this thesis.

5.2.1 Building Blocks of a second-order SC ΣΔ Modulator

A topology of the second-order SC ΣΔ modulator [BW88] is shown in Fig. 5.2, consisting of two switched-capacitor integrators, a comparator and a 1-bit D/A converter.

Figure 5.2: A second-order SC ΣΔ modulator [BW88]

5.2.1.1 Switched-Capacitor Integrators

The two switched-capacitor integrators in Fig. 5.2 are identical. Each consists of an operational amplifier (OP AMP), a sampling capacitor $C_s$ and an integrating capacitor $C_f$. Compared to a standard RC integrator, the switched-capacitor (SC) integrator has two major advantages:

- Much reduced area to realize the same functionality: as seen in Fig. 5.3, a resistor is replaced by the sampling/integrating capacitors and the periodically operated switches. For the same time constant ($\tau = R_{eq}C_f$), a megaohm resistance can be replaced by a picofarad MOS capacitance with a proper sampling rate.
- More stable time constant: the time constant of the RC integrator is closely related to the absolute values of the resistance and the capacitance, while the time constant of the SC integrator is determined by the ratio of the capacitances. The absolute values of resistance and capacitance can vary by $\pm 20 \sim 30\%$ in silicon. In contrast, capacitances can be matched very well with each other, so their ratio is much more constant than their absolute values. Hence, the time constant is insensitive to process and temperature variations.
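Both advantages can be made concrete with the usual first-order relation for a switched capacitor, $R_{eq} \approx 1/(f_s C_s)$. This relation is assumed here and is not stated explicitly in this section; the component values in the sketch are hypothetical.

```python
def equivalent_resistance(f_s, c_s):
    """Equivalent resistance (ohm) of a capacitor c_s (F) switched at f_s (Hz)."""
    return 1.0 / (f_s * c_s)


def time_constant(f_s, c_s, c_f):
    """Time constant tau = R_eq * C_f = C_f / (f_s * C_s) of the SC integrator."""
    return equivalent_resistance(f_s, c_s) * c_f


# A 1 pF capacitor switched at 1 MHz behaves like a 1 Mohm resistor.
r_eq = equivalent_resistance(1e6, 1e-12)

# With C_s / C_f = 0.5 the time constant is 2 / f_s. Scaling both capacitors
# by the same factor (e.g. +20 % process variation) leaves it unchanged.
tau_nominal = time_constant(1e6, 1e-12, 2e-12)
tau_shifted = time_constant(1e6, 1.2e-12, 2.4e-12)
```

Because only the ratio $C_f/C_s$ and the clock frequency enter the time constant, a common shift of the absolute capacitance values cancels out.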

In this experiment, the ratio $C_s/C_f$ is set according to Equation 5.6 in order to realize a gain of 0.5, i.e.

\[ b = \frac{C_s}{C_f} = 0.5. \quad (5.6) \]

An SC integrator has two operation phases, which are controlled by two non-overlapping clocks. The switches are closed when the controlling clocks are high and open when they are low. The corresponding timing diagram for all switches in the modulator is shown in Fig. 5.3. During \(\phi_1\), switches \(s_1\) and \(s_3\) are closed while \(s_2\) and \(s_4\) are open, and the capacitor \(C_s\) of the integrator is charged by the input \(V_{in}\). During \(\phi_2\), switches \(s_2\) and \(s_4\) are closed while \(s_1\) and \(s_3\) are open, and the charge stored on \(C_s\) is transferred to \(C_f\). Switches \(s_1\) and \(s_2\) are closed a short delay \(\Delta t\) after switches \(s_3\) and \(s_4\), respectively, in order to avoid charge injection on \(C_s\).

Figure 5.3: Single-ended SC Integrator

The SC integrator is a key building block in \(\Sigma\Delta\) modulators. Its non-idealities, e.g. switch thermal noise and the finite DC gain and slew rate of the OP AMP, have a strong impact on the SNR performance of \(\Sigma\Delta\) modulators. They are discussed in detail in Sec. 5.3.1.3.

### 5.2.1.2 Comparator

A comparator is used to quantize the analog signal in the loop. The generated digital output is directly taken as the output signal and is fed back to the D/A converter at the same time. As shown in Fig. 5.2, the comparator follows the integrator block, so its non-idealities are shaped by the loop in the same way as the quantization noise. Therefore, the impact of the comparator’s non-idealities can be ignored. Generally, "a simple regenerative latch without pre-amplification or offset cancellation fulfills the comparator requirements" [BW88].

### 5.2.1.3 1-bit D/A Converter

The D/A converter transforms the digital output signal back into an analog form, which acts as another input of the modulator in addition to the reference input signal. Any D/A conversion error will result in distortion at the modulator’s output. Since the conversion error is not attenuated, a nonlinear D/A conversion will considerably degrade the performance of the modulator. In order to minimize the impact of the D/A converter's non-idealities on the ΣΔ modulator, a 1-bit D/A converter is selected here. The advantage of a 1-bit D/A converter is that it is inherently linear and keeps the D/A conversion free of differential nonlinearity, because there are only two output values and only one step size.

### 5.3 Analysis on ΣΔ modulator in z-domain

In principle, quantization is an intrinsically non-linear process, so a ΣΔ modulator is an inherently non-linear system. The behavior of a ΣΔ modulator can nevertheless be analyzed with a linear model if the quantization error is assumed to be additive white noise, which requires the following four conditions to be fulfilled [Mal02]:

I. "All quantization levels are exercised with equal probability.

II. The quantization steps are uniform.

III. The quantization error is not correlated with the input signal.

IV. A large number of quantization levels are used." †

A general linear model for a modulator is shown in Fig. 5.4. It is a two-input \((x(n) \& e(n))\) and one-output \((y(n))\) system, whose z-domain expression is represented by

\[
Y(z) = STF(z)X(z) + NTF(z)E(z) = \frac{H(z)}{1 + H(z)}X(z) + \frac{1}{1 + H(z)}E(z),
\]

where \(X(z)\) and \(E(z)\) represent the input signal and the quantization noise, respectively, and \(STF(z)\) and \(NTF(z)\) are the z-domain transfer functions of the input signal and the quantization noise, respectively.

Figure 5.4: Linear model of the modulator with an injected quantization noise

For the 2\(^{nd}\)-order SC ΣΔ modulator topology in Fig. 5.2, the corresponding linear model is shown in Fig. 5.5. The z-domain transfer function of the ideal integrators, including the necessary closed-loop gain, is given by \(H_{1,2}(z)\), i.e.

\[
H_1(z) = H_2(z) = 0.5 \frac{z^{-1}}{1 - z^{-1}}.
\]

† Although only 1-bit quantization is used in the second-order SC ΣΔ modulator, the quantization error is still assumed to be white noise, since the quantization noise is first shaped by the loop.

Thus, the ideal signal transfer function is given by

\[ STF(z) = \frac{H_1H_2}{1 + H_1H_2 + H_2} = \frac{0.25}{z^2 - 1.5z + 0.75}. \tag{5.9} \]

The noise transfer function is

\[ NTF(z) = \frac{1}{1 + H_1H_2 + H_2} = \frac{z^2 - 2z + 1}{z^2 - 1.5z + 0.75}. \tag{5.10} \]

As can be seen, the \( STF(z) \) in Equation 5.9 is a low-pass function, while \( NTF(z) \) in Equation 5.10 is a high-pass function. Therefore, the 2\(^{nd}\)-order SC \( \Sigma \Delta \) modulator can suppress the quantization noise in the signal band and will not attenuate the input signal.
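The closed forms in Equations 5.9 and 5.10 can be cross-checked numerically. The following is a minimal sketch, assuming only the integrator transfer function \(H_1(z) = H_2(z) = 0.5/(z-1)\) given above; the variable names and the helper `evaluate` are illustrative and not part of the thesis models.

```python
import numpy as np

# Integrator H(z) = 0.5 z^-1 / (1 - z^-1) = 0.5 / (z - 1),
# stored as numerator/denominator polynomials in z (highest power first).
h_num = np.array([0.5])
h_den = np.array([1.0, -1.0])

# Bring 1 + H1*H2 + H2 onto the common denominator (z - 1)^2.
common_den = np.polymul(h_den, h_den)      # (z - 1)^2 = z^2 - 2z + 1
h1h2_num = np.polymul(h_num, h_num)        # 0.25
h2_num = np.polymul(h_num, h_den)          # 0.5 (z - 1)
loop_poly = np.polyadd(np.polyadd(common_den, h1h2_num), h2_num)

# STF = H1*H2 / (1 + H1*H2 + H2), NTF = 1 / (1 + H1*H2 + H2)
stf_num, stf_den = h1h2_num, loop_poly
ntf_num, ntf_den = common_den, loop_poly


def evaluate(num, den, z):
    """Evaluate a rational function num(z)/den(z) at the point z."""
    return np.polyval(num, z) / np.polyval(den, z)


print("denominator coefficients:", loop_poly)
print("STF at DC (z = 1):", evaluate(stf_num, stf_den, 1.0))
print("NTF at DC (z = 1):", evaluate(ntf_num, ntf_den, 1.0))
print("|NTF| at f_s/2 (z = -1):", abs(evaluate(ntf_num, ntf_den, -1.0)))
```

Evaluating at \(z = 1\) and \(z = -1\) illustrates the low-pass character of the STF and the high-pass character of the NTF: the signal passes with unity gain at DC, while the quantization noise is nulled there and amplified near \(f_s/2\).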

![Figure 5.5: Ideal linear Model of the 2\(^{nd}\)-order SC \( \Sigma \Delta \) modulator](image)

![Figure 5.6: PSD plot of the ideal 2\(^{nd}\)-order SC \( \Sigma \Delta \) modulator @ OSR=256](image)

Fig. 5.6 shows the power spectral density (PSD) of an ideal 2\(^{nd}\)-order SC \( \Sigma \Delta \) modulator when the OSR is set to 256. As seen in this figure, the noise PSD is attenuated in the signal band and amplified outside the signal band. With an oversampling ratio of 256, the maximal SNR is 101.5dB and the corresponding ENOB is 16.56bit. The higher the OSR, the lower the PSD of the quantization noise in the signal band. Consequently, the SNR increases with increasing OSR, as shown in Fig. 5.7.
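The ENOB value quoted for the ideal modulator can be reproduced from the SNR. The sketch below assumes the standard relation SNR = 6.02 ENOB + 1.76 dB for a full-scale sinusoidal input; the thesis does not state which relation was used, and the function name `enob_from_snr` is illustrative.

```python
def enob_from_snr(snr_db: float) -> float:
    """Effective number of bits from the SNR in dB.

    Assumes the usual full-scale sine relation SNR = 6.02 * ENOB + 1.76 dB.
    """
    return (snr_db - 1.76) / 6.02


ideal_snr_db = 101.5  # maximal SNR of the ideal modulator at OSR = 256
enob = enob_from_snr(ideal_snr_db)
print(f"ENOB = {enob:.2f} bit")
```

Under this assumption the result agrees with the quoted 16.56bit to within about 0.01 bit.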

### 5.3.1 Effects of Non-idealities

Only quantization noise was taken into account in the ideal linear model of the $\Sigma\Delta$ modulator in the last section. Although the $\Sigma\Delta$ modulator is well known for its robustness to the non-idealities of its building blocks compared to other data conversion architectures [CT92], it is still necessary to take into account the non-idealities of the electrical implementation and the conversion errors they introduce. In fact, the impact of such imperfections increases when the modulator specifications are demanding, because they become the dominant error sources [BKMA88,DLM92]. A detailed analysis of the non-idealities of the building blocks is given in [BW88,FAB99]. Here, the effects of the non-idealities on the SNR performance are presented and the corresponding models are built. The main non-idealities of SC $\Sigma\Delta$ modulators [BFM+99,MBF+03] can be classified into three categories:

- clock jitter
- integrator noise
  - switch thermal noise
  - OP AMP noise
- integrator non-idealities
### 5.3.1.1 Clock Jitter

In SC $\Sigma\Delta$ modulators, the behavior of the SC circuits depends closely on the charge transfer process during each clock phase. As seen in Fig. 5.3, four clock signals trigger the switches in order to sample the analog input signal. Clock jitter on these signals produces a non-uniform sampling time sequence, which results in nonlinear distortion of the sampled signal. As a consequence, the noise floor or the total harmonic distortion of the modulator increases.

The distortion of the sampled signal depends on the statistical properties of the clock jitter and on the modulator input signal. If the input signal is a sinusoid $x(t) = A \sin(2\pi f_{in}t)$ with amplitude $A$ and frequency $f_{in}$, the error introduced by sampling with an instantaneous clock jitter $\delta$ is

$$x(t + \delta) - x(t) \approx \delta \frac{d}{dt}x(t) = 2\pi f_{in} A \cos(2\pi f_{in}t)\delta.$$  (5.11)

Fig. 5.8 illustrates the relationship between distortion and clock jitter. Since clock jitter is assumed to be white noise in this thesis, oversampling helps to reduce the in-band distortion introduced by the clock jitter. The total power of the distortion is $(2\pi f_{in} A \Delta t)^2 / 2$, where $\Delta t$ is the standard deviation of $\delta$, and it is uniformly distributed from 0 to $f_s / 2$ [BW88].

Fig. 5.9 shows the PSD of the 2nd-order SC $\Sigma\Delta$ modulator with clock jitter when the OSR is set to 256. The SNR is reduced from 101.5dB (considering only the inherent quantization noise) to 99.95dB (additionally considering clock jitter with standard deviation $\Delta t = 4 \times 10^{-9}$ s). This effect can be simulated in SIMULINK with the model in Fig. B.1 in Appendix B, which implements Equation 5.11.
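The first-order jitter model of Equation 5.11 can be checked with a short Monte Carlo experiment. This is only a sketch and not the SIMULINK model of Fig. B.1: the input frequency `f_in`, the amplitude and the number of samples are assumed values chosen for illustration, while `f_s` and the jitter standard deviation follow the values used in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

amplitude = 1.0        # assumed input amplitude A
f_in = 10e3            # assumed input frequency in Hz
f_s = 11.29e6          # sampling frequency in Hz
jitter_std = 4e-9      # standard deviation of the clock jitter in s
n_samples = 200_000    # assumed number of samples for the estimate


def x(t):
    """Sinusoidal input signal x(t) = A sin(2 pi f_in t)."""
    return amplitude * np.sin(2 * np.pi * f_in * t)


t_ideal = np.arange(n_samples) / f_s
delta = rng.normal(0.0, jitter_std, n_samples)   # white Gaussian jitter

# Sampling error caused by the jittered sampling instants.
error = x(t_ideal + delta) - x(t_ideal)

measured_power = np.mean(error ** 2)
predicted_power = (2 * np.pi * f_in * amplitude * jitter_std) ** 2 / 2

print(f"measured  error power: {measured_power:.3e}")
print(f"predicted error power: {predicted_power:.3e}")
```

For a jitter that is small compared with the input period, the measured error power should agree closely with $(2\pi f_{in} A \Delta t)^2/2$.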

### 5.3.1.2 Noise Sources

In a real circuit implementation of $\Sigma\Delta$ modulators, the signal is corrupted not only by the intrinsic quantization noise but also by various electrical noise sources in the building blocks. A typical switched-capacitor integrator is shown in Fig. 5.3. The most important noise sources affecting the operation of the SC integrator are the thermal noise associated with the sampling switches and the intrinsic noise of the operational amplifier [MBF+03].

**Switch Thermal Noise**

Switches in SC integrators are implemented with CMOS transistors. When a switch is on, the transistor operates in the triode region and has a finite on-resistance, which introduces thermal noise. Thermal noise is electronic noise generated by the thermal agitation of the charge carriers (usually electrons) and is present even at equilibrium. It has a white, wide-band spectrum that is limited only by the time constant of the switched capacitors or by the bandwidth of the OP AMP [MBF+03].

![Equivalent SC circuits](image)

**Figure 5.10:** Equivalent SC circuits in (a) $\phi_1$ (b) $\phi_2$

Referring to the SC integrator shown in Fig. 5.3, the sampling capacitor $C_s$ is in series with switches that periodically open and close and have a finite on-resistance. The equivalent circuit is shown in Fig. 5.10, where $r_1$, $r_2$, $r_3$ and $r_4$ are the on-resistances of switches $s_1$, $s_2$, $s_3$ and $s_4$, respectively, and $v_{n1}$, $v_{n2}$, $v_{n3}$ and $v_{n4}$ are white Gaussian noise sources that model the thermal noise of $r_1$, $r_2$, $r_3$ and $r_4$.

**Figure 5.9:** Effect of clock jitter
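The total thermal noise power sampled onto \( C_s \) through a switch with on-resistance \( R_{on} \) is

\[e_T^2 = \int_0^\infty \frac{4kT R_{on}}{1 + (2\pi f R_{on}C_s)^2} df = \frac{kT}{C_s}, \tag{5.12}\]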

where \( k \) is Boltzmann's constant, \( T \) is the absolute temperature, and \( 4kT R_{on} \) is the noise PSD associated with the switch on-resistance. The corresponding model of the switch thermal noise in SIMULINK is shown in Fig. B.2.
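The result of Equation 5.12 does not depend on the on-resistance, which can be verified by integrating the low-pass filtered noise PSD numerically. The sketch below uses assumed example values (`temperature = 300` K, `c_s = 1` pF and two arbitrary on-resistances); none of them are taken from the thesis.

```python
import numpy as np

K_BOLTZMANN = 1.380649e-23   # Boltzmann's constant in J/K
temperature = 300.0          # assumed absolute temperature in K
c_s = 1e-12                  # assumed sampling capacitor in F


def integrated_switch_noise(r_on: float) -> float:
    """Numerically integrate 4kT*R_on / (1 + (2*pi*f*R_on*C_s)^2) over f."""
    f_corner = 1.0 / (2 * np.pi * r_on * c_s)
    # Log-spaced grid from far below to far above the corner frequency.
    f = np.concatenate(([0.0], np.logspace(-4, 4, 200_001) * f_corner))
    psd = 4 * K_BOLTZMANN * temperature * r_on / (
        1 + (2 * np.pi * f * r_on * c_s) ** 2
    )
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (psd[1:] + psd[:-1]) * np.diff(f)))


kt_over_c = K_BOLTZMANN * temperature / c_s
for r_on in (100.0, 10e3):
    print(f"R_on = {r_on:8.0f} Ohm: {integrated_switch_noise(r_on):.4e} V^2")
print(f"kT/C_s             : {kt_over_c:.4e} V^2")
```

A larger on-resistance raises the noise PSD but lowers the noise bandwidth by the same factor, so both integrals approach $kT/C_s$.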

**Operational Amplifier Noise**

The intrinsic noise of an operational amplifier includes thermal noise, flicker (1/f) noise, shot noise etc. Since these noises are purely random, their instantaneous values cannot be predicted. Random noise is therefore usually quantified by its mean-square value. When a circuit contains multiple uncorrelated noise sources, they can be represented together by one total root-mean-square (rms) noise source, whose value is the square root of the sum of the mean-square values of the individual sources. In the behavioral model of the OP AMP, an input-referred noise source with rms noise voltage \( V_n \) is used to represent the intrinsic noise of the OP AMP:

\[y(t) = b \cdot [x(t) + n_{opamp}(t)], \tag{5.13}\]

where

\[n_{opamp}(t) = V_n \cdot RN(t). \tag{5.14}\]

Here, \( b \) is the integrator gain, and \( RN(t) \) is a Gaussian random number with zero mean and unit standard deviation. In this experiment, only thermal noise is considered, while the other noise contributions are neglected. The corresponding model of the thermal noise is shown in Fig. B.3.
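Equations 5.13 and 5.14, together with the rms combination of several noise sources, can be sketched as follows. The individual rms values, the sample count and all function names are hypothetical and serve only to illustrate the calculation; they are not values from the thesis.

```python
import numpy as np


def total_rms(rms_values) -> float:
    """Combine uncorrelated noise sources: square root of the sum of mean squares."""
    return float(np.sqrt(np.sum(np.square(rms_values))))


def opamp_noise(v_n: float, n_samples: int, rng) -> np.ndarray:
    """n_opamp = V_n * RN, with RN Gaussian, zero mean, unit standard deviation."""
    return v_n * rng.standard_normal(n_samples)


def noisy_gain_stage(x: np.ndarray, b: float, noise: np.ndarray) -> np.ndarray:
    """Equation 5.13: y = b * (x + n_opamp), with input-referred noise."""
    return b * (x + noise)


rng = np.random.default_rng(1)
# Hypothetical rms contributions of two uncorrelated sources in volts.
v_n = total_rms([30e-6, 40e-6])

noise = opamp_noise(v_n, 100_000, rng)
y = noisy_gain_stage(np.zeros_like(noise), b=0.5, noise=noise)

print(f"total rms noise V_n  : {v_n:.2e} V")
print(f"rms of generated n   : {noise.std():.2e} V")
print(f"rms at output (b=0.5): {y.std():.2e} V")
```

With a zero input, the output rms is simply the integrator gain times the input-referred rms noise.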

**Figure 5.11**: Effect of noise

Fig. 5.11 shows the PSD of the 2nd-order SC ΣΔ modulator with switch thermal noise and with operational amplifier noise, respectively, when the OSR is set to 256.

### 5.3.1.3 Operational Amplifier Non-idealities

In Sec. 5.3, the transfer functions of the input signal and the quantization noise (\(STF(z)\) and \(NTF(z)\), respectively) are based on an ideal integration process. In practice, the behavior of a real integrator deviates from this ideal behavior due to several non-idealities of the analog circuit implementation. The non-ideal behavior of the integrator is a consequence of the OP AMP's non-idealities: finite DC gain and bandwidth, finite slew rate and limited output voltage range all result in an incomplete transfer of charge in SC integrators. These non-idealities are discussed separately in the following.

**DC Gain**  To simplify the analysis, the effect of a finite OP AMP DC gain is formulated here for a first-order ΣΔ modulator; the following equations can be extended to higher-order modulators. The transfer function of an ideal integrator with unity coefficient is

\[
H_{\text{ideal}}(z) = \frac{z^{-1}}{1 - z^{-1}}. \tag{5.15}
\]

For a first-order ΣΔ modulator, the ideal z-domain output in terms of the input signal and the quantization noise is expressed as

\[
\begin{aligned}
Y(z) &= STF_{\text{ideal}}(z)X(z) + NTF_{\text{ideal}}(z)E(z) \\
&= \frac{H_{\text{ideal}}(z)}{1 + H_{\text{ideal}}(z)}X(z) + \frac{1}{1 + H_{\text{ideal}}(z)}E(z) \\
&= z^{-1}X(z) + (1 - z^{-1})E(z),
\end{aligned} \tag{5.16}
\]

where

\[
|STF_{\text{ideal}}(z)| = 1 \quad \text{for} \quad z \to 1 \quad \text{i.e.} \quad f \to 0. \quad \tag{5.17}
\]

As can be seen from Equation 5.16, the output of the modulator is the delayed input signal plus the shaped quantization noise. In practice, a real integrator built with an OP AMP with a finite DC gain \(A_0\) has the transfer function

\[
H_{\text{real}}(z) = \frac{z^{-1}}{1 - (1 - \alpha)z^{-1}} \quad \text{with} \quad \alpha = \frac{1}{A_0}. \tag{5.18}
\]

Therefore, the real z-domain output in terms of the input signal and the quantization noise can be expressed as

\[
\begin{aligned}
Y(z) &= STF_{\text{real}}(z)X(z) + NTF_{\text{real}}(z)E(z) \\
&= \frac{H_{\text{real}}(z)}{1 + H_{\text{real}}(z)}X(z) + \frac{1}{1 + H_{\text{real}}(z)}E(z) \\
&= \frac{z^{-1}}{1 + \alpha z^{-1}}X(z) + \frac{1 - (1 - \alpha)z^{-1}}{1 + \alpha z^{-1}}E(z),
\end{aligned} \tag{5.19}
\]

where

\[
\begin{aligned}
|STF_{\text{real}}(z)| &= \left| \frac{1}{1+\alpha} \right| \approx 1 - \alpha, \\
|NTF_{\text{real}}(z)| &= \left| \frac{1 - z^{-1} + \alpha z^{-1}}{1+\alpha} \right| \approx \left| (1 - \alpha)(1 - z^{-1}) + \alpha z^{-1} \right| \rightarrow \alpha
\end{aligned}
\quad \text{for } \alpha \ll 1 \text{ and } z \rightarrow 1. \tag{5.20}
\]

An OP AMP with a finite DC gain affects the integration process in two ways. First, the comparison between \( |STF_{\text{ideal}}(z)| \) and \( |STF_{\text{real}}(z)| \) shows that only a fraction \( (1 - \alpha) \) of the previous integrator output is added to each new input sample. This phenomenon is usually referred to as "leaky integration". Second, the comparison between \( |NTF_{\text{ideal}}(z)| \) and \( |NTF_{\text{real}}(z)| \) shows that the quantization noise in the signal band is attenuated less strongly: near DC the noise gain tends to \( \alpha \) instead of 0. Hence, the SNR performance is degraded by the incomplete integration process.
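The DC limits of the leaky first-order modulator can be evaluated directly from the integrator transfer function with finite gain. The sketch below assumes an example DC gain of 60 dB (`a0 = 1000`), which is not a value from the thesis; the function names are illustrative.

```python
def h_real(z: complex, a0: float) -> complex:
    """Leaky integrator H(z) = z^-1 / (1 - (1 - alpha) z^-1), alpha = 1/A0."""
    alpha = 1.0 / a0
    z_inv = 1.0 / z
    return z_inv / (1.0 - (1.0 - alpha) * z_inv)


def stf_real(z: complex, a0: float) -> complex:
    """Signal transfer function of the first-order loop, H / (1 + H)."""
    h = h_real(z, a0)
    return h / (1.0 + h)


def ntf_real(z: complex, a0: float) -> complex:
    """Noise transfer function of the first-order loop, 1 / (1 + H)."""
    return 1.0 / (1.0 + h_real(z, a0))


a0 = 1000.0            # assumed DC gain (60 dB)
alpha = 1.0 / a0

print("STF at DC:", stf_real(1.0, a0), " vs 1/(1+alpha) =", 1 / (1 + alpha))
print("NTF at DC:", ntf_real(1.0, a0), " vs alpha/(1+alpha) =", alpha / (1 + alpha))
```

At DC the signal gain drops slightly below one, while the noise gain rises from zero to roughly \( \alpha \), which is the loss of in-band noise suppression described above.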

**Bandwidth and Slew Rate**    Besides the finite DC gain of the OP AMP, a finite bandwidth or a finite slew rate can also cause an inaccurate charge transfer within each clock cycle, which leads to a non-ideal transient response in SC circuits. The effects of finite bandwidth and finite slew rate are related to each other and can be interpreted as a nonlinear gain [MPVARVH94].

Referring to the SC integrator shown in Fig. 5.12 with a sampling period \( T_s \), the evolution of the output node during the \( n \)-th integration period (when \( \Phi_2 \) is on) is given by

\[
V_{\text{out}}(t) = V_{\text{out}}(nT_s - T_s/2) + (1 - \alpha)V_s(1 - e^{-\frac{t}{\tau}}), \quad 0 < t < T_s/2, \tag{5.21}
\]

where \( V_s = V_{\text{in}}(nT_s - T_s/2) \), \( (1 - \alpha)V_s \) represents the leaky integration due to the finite DC gain, \( \tau = 1/(2\pi \cdot GBW) \) is the time constant of the integrator, and \( GBW \) is the unity-gain bandwidth of the OP AMP when loaded by \( C_f \). According to Equation 5.21, the maximal slope of the integration curve occurs at the beginning of each integration process, i.e. at \( t = 0 \), resulting in

\[
\left.\frac{d}{dt}V_{\text{out}}(t)\right|_{\text{max}} = (1 - \alpha)\frac{V_s}{\tau}. \tag{5.22}
\]

In other words, the output of the SC integrator changes fastest at the start of the integration process. The subsequent integration process can be divided into two cases according to the slew rate capability of the OP AMP [MBF+03]:

1. If the slew rate of the OP AMP is larger than the value in Equation 5.22, the slew rate does not limit the integration process. Therefore, the evolution of \( V_{\text{out}} \) is described entirely by Equation 5.21.
2. If the slew rate of the OP AMP is smaller than the value in Equation 5.22, the OP AMP output cannot follow the exponential response of Equation 5.21, and $V_{\text{out}}$ is characterized by a piecewise function. The first part of the temporal evolution of $V_{\text{out}}$ ($t \leq t_0$) is linear with the slope $SR$, while the second part ($t > t_0$) is an exponential settling with the time constant $\tau$.

In summary, in the slew-rate-limited case the integrator output $V_{\text{out}}$ can be characterized as

$$V_{\text{out}}(t) = \begin{cases} V_{\text{out}}(nT_s - T_s/2) + SR \cdot t & \text{if } t \leq t_0 \\ V_{\text{out}}(t_0) + \left[(1 - \alpha)V_s - SR \cdot t_0\right] \left(1 - e^{-\frac{t - t_0}{\tau}}\right) & \text{if } t > t_0 \end{cases}$$

where $t_0$ is set by

$$t_0 = \frac{(1 - \alpha)V_s}{SR} - \tau$$

so that the derivative of $V_{\text{out}}(t)$ is continuous at $t = t_0$.
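The two settling cases can be combined into one small behavioral function. This is a sketch under stated assumptions and not the SIMULINK model of Fig. B.4: the function name and its arguments are illustrative, the handling of negative steps via the sign of the target is an added convenience, and the example values for `GBW` and `SR` are merely chosen in the range reported later in this chapter.

```python
import math


def integrator_step(v_prev, v_s, alpha, sr, tau, t):
    """Integrator output at time t after the start of the integration phase.

    v_prev: output at the start of the phase, v_s: sampled input,
    alpha = 1/A0: leakage, sr: slew rate in V/s, tau: time constant in s.
    """
    target = (1.0 - alpha) * v_s            # total step to be integrated
    sign = math.copysign(1.0, target)
    magnitude = abs(target)

    # Case 1: initial slope (1 - alpha) V_s / tau does not exceed the slew rate.
    if magnitude / tau <= sr:
        return v_prev + target * (1.0 - math.exp(-t / tau))

    # Case 2: slewing until t0, exponential settling afterwards.
    t0 = magnitude / sr - tau
    if t <= t0:
        return v_prev + sign * sr * t
    v_t0 = v_prev + sign * sr * t0
    remaining = magnitude - sr * t0         # equals sr * tau
    return v_t0 + sign * remaining * (1.0 - math.exp(-(t - t0) / tau))


tau = 1.0 / (2 * math.pi * 35.05e6)   # example GBW of 35.05 MHz
sr = 32.97e6                          # example slew rate of 32.97 V/us
half_period = 1.0 / (2 * 11.29e6)     # integration phase length T_s / 2

for v_s in (0.01, 1.0):
    v_end = integrator_step(0.0, v_s, 1e-4, sr, tau, half_period)
    print(f"V_s = {v_s:5.2f} V: settled to {v_end:.6f} V of {(1 - 1e-4) * v_s:.6f} V")
```

For small steps the response is purely exponential, whereas large steps spend most of the phase slewing and leave a larger settling error at the end of the half period.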

**Saturation (Output Voltage Range)**

Another non-ideality of the OP AMP is output saturation, i.e. the limited output voltage range. Clipping occurs when the OP AMP is required to produce a voltage outside its output voltage range. If this happens during the integration process, the output signal fails to follow the ideal output voltage waveform and becomes distorted. Therefore, the output range of the OP AMP also has to be taken into account.

The corresponding model of a real integrator is shown in Fig. B.4. Fig. 5.13 shows the PSD of the 2nd-order SC $\Sigma\Delta$ modulator with a limited DC gain, a limited slew rate, a limited bandwidth and a limited output voltage range of the OP AMP, respectively, when the OSR is set to 256.

In this section, we have analyzed how the non-idealities of the electrical implementation give rise to error mechanisms that degrade the performance of the modulator. The resulting SNR degradation is shown in Fig. 5.9, 5.11 and 5.13.

### 5.4 Example: Hierarchical Optimization of a 2nd-order SC $\Sigma\Delta$ Modulator

The SNR performance determines the eventual resolution of A/D and D/A converters. For an SNR measurement, one simulation requires 10,000-100,000 clock samples, depending on the oversampling ratio and the desired accuracy of the SNR estimate. Standard SPICE-like analog simulations are usually used to generate the raw sample data for the subsequent fast Fourier transform (FFT) or discrete Fourier transform (DFT) postprocessing. If a DFT/FFT is used to calculate the SNR, a $\Sigma\Delta$ modulator with an OSR of 64 and 1024 in-band FFT bins requires 131,072 clock cycles plus the initial cycle [NST97]. To achieve an accuracy on the order of 90dB, circuit simulators typically need 100-1000 time steps per clock cycle. Thus, over a million time steps are needed just to acquire a single data point on the SNR-versus-input-amplitude plot. Such a simulation can last a day even on today's fastest computers. Therefore, in order to optimize the SNR performance, a hierarchical optimization process is needed for SC \(\Sigma\Delta\) modulators.
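The simulation effort quoted above follows from simple arithmetic. The sketch below assumes that the in-band region of a real-valued FFT covers a fraction 1/(2 OSR) of all bins, so that the number of required clock cycles is 2 x OSR x (in-band bins); the function name is illustrative.

```python
def required_clock_cycles(osr: int, inband_bins: int) -> int:
    """FFT length needed so that the signal band spans `inband_bins` bins.

    The signal band covers f_s / (2 * OSR), i.e. a fraction 1 / (2 * OSR)
    of the sampled spectrum.
    """
    return 2 * osr * inband_bins


cycles = required_clock_cycles(osr=64, inband_bins=1024)
steps_min = cycles * 100      # 100 time steps per clock cycle
steps_max = cycles * 1000     # 1000 time steps per clock cycle

print(f"clock cycles : {cycles:,}")
print(f"time steps   : {steps_min:,} to {steps_max:,}")
```

Even at the lower bound of 100 steps per cycle, one SNR data point requires more than ten million simulator time steps, which motivates the behavioral model used in the following.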

### 5.4.1 SC \(\Sigma\Delta\) Modulator Hierarchical Modeling

The system-level model for SC \(\Sigma\Delta\) modulators presented in [MBF+03] enables a quick and accurate SNR estimation. With this efficient model, a transient simulation needs only several minutes, which makes a "simulation-in-a-loop" based optimization approach feasible for SC \(\Sigma\Delta\) modulators.

A system-level model of the 2\(^{nd}\)-order SC \(\Sigma\Delta\) modulator in Fig. 5.2 is realized in SIMULINK, as shown in Fig. 5.14. Since SC \(\Sigma\Delta\) modulators are most sensitive to circuit non-idealities at the input stage, where no noise shaping has yet taken place [BW88], only the non-idealities of the first integrator are considered here, while the second integrator, the comparator and the 1-bit DAC are assumed to be ideal. The detailed realization of each sub-model is described in Appendix B. In this model, the following non-idealities of the building blocks are considered:

1. clock jitter at the input sampler: \(\delta\);
2. switch thermal noise in the SC structure: \(kT/C_s\);
3. operational amplifier noise: \(RN(t)\);
4. operational amplifier finite DC gain: \(A\);
5. operational amplifier gain bandwidth: \(GBW\);
6. operational amplifier slew rate: \(SR\);
7. operational amplifier output voltage range: \(V_{out}\).

![Figure 5.13: Effects of non-idealities in OP AMP](image)

Item 1 (clock jitter) and item 3 (operational amplifier noise) are random noises, which are not considered in this work. Since $k$ is Boltzmann's constant and $T$ is the absolute temperature, item 2, the switch thermal noise $kT/C_s$, is fixed for a predetermined capacitor. The SNR performance of an SC $\Sigma\Delta$ modulator with a given topology can therefore be optimized by properly choosing the design parameters of the OP AMP, i.e. items 4-7: $A$, $GBW$, $SR$ and $V_{out}$. Other system information is listed in Tab. 5.2.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Signal bandwidth $BW$</td>
<td>$22.05$kHz</td>
</tr>
<tr>
<td>Sampling frequency $f_s$</td>
<td>$11.29$MHz</td>
</tr>
<tr>
<td>Oversampling ratio $R$</td>
<td>$256$</td>
</tr>
<tr>
<td>Number of samples considered</td>
<td>$65536$</td>
</tr>
<tr>
<td>Integrator coefficients $b$</td>
<td>$b_1 = b_2 = 0.5$</td>
</tr>
</tbody>
</table>
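The entries of Tab. 5.2 are linked by the definition of the oversampling ratio, $f_s = 2 \cdot R \cdot BW$. The following sketch checks this relation, taking the signal bandwidth as 22.05 kHz, which is the value consistent with the listed sampling frequency and oversampling ratio; the variable names are illustrative.

```python
signal_bw_hz = 22.05e3     # signal bandwidth BW, taken as 22.05 kHz
osr = 256                  # oversampling ratio R

# Sampling frequency implied by the oversampling ratio.
f_s_hz = 2 * osr * signal_bw_hz

# Duration of one clock phase (half a sampling period).
half_period_s = 1.0 / (2.0 * f_s_hz)

print(f"f_s   = {f_s_hz / 1e6:.2f} MHz")
print(f"T_s/2 = {half_period_s * 1e9:.1f} ns")
```

The implied sampling frequency matches the tabulated 11.29MHz, and each integration phase lasts roughly 44 ns, which is the time available for the settling behavior described in Sec. 5.3.1.3.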

In the hierarchical modeling of the SC $\Sigma\Delta$ modulator, the SNR is the sole system performance to be optimized, i.e.

$$f = \text{SNR}. \quad (5.25)$$

The SNR performance depends on the system-level parameters $p$ of the OP AMP, i.e.

$$p = [A, GBW, SR, V_{out}]. \quad (5.26)$$

These system-level parameters are determined by the circuit-level design parameters $x$, i.e.

$$x = [W_1, L_1, W_2, L_2, \ldots]. \quad (5.27)$$

where $W_i$ and $L_i$ are the width and length of the $i$-th transistor shown in Fig. 5.16. The hierarchical modeling of the second-order SC $\Sigma\Delta$ modulator is presented in Fig. 5.15.
Figure 5.15: Hierarchical performance modeling of the SC $\Sigma\Delta$ modulator (levels shown: modulator system with system performance and system-level parameters; building block with circuit-level performances and circuit-level parameters)

### 5.4.2 PSE on OP AMP

A folded-cascode OP AMP is used for the second-order SC $\Sigma\Delta$ modulator. It is realized in a 180nm technology with a supply voltage of 3V. The schematic is shown in Fig. 5.16.

![Schematic of a folded-cascode OP AMP](image)

Figure 5.16: Schematic of a folded-cascode OP AMP

For PSE on the OP AMP, sizing rules [GZEA01], i.e. geometric and electrical constraints, are considered to guarantee that all transistors work properly in their intended analog function. Tab. 5.3 summarizes the 152 sizing rules in total for the NMOS and PMOS transistors in the folded-cascode OP AMP. The detailed sizing rules for each basic analog structure can be found in Tab. A.2. In addition to the sizing rules, two performance constraints are taken into account: a) a phase margin ($PM$) larger than 45° for the stability of the OP AMP; b) an output voltage range \( V_{\text{out}} \) larger than 2.5V, so that the 2\(^{nd}\)-order SC \( \Sigma \Delta \) modulator works properly and the SNR performance is not degraded by the limited output voltage range.

### Table 5.3: Sizing rules for the folded-cascode OP AMP

<table>
<thead>
<tr>
<th>N/PMOS</th>
<th>Analog function</th>
<th>Number of sizing rules</th>
</tr>
</thead>
<tbody>
<tr>
<td>N1 &amp; N2</td>
<td>N-type differential pair</td>
<td>2*7</td>
</tr>
<tr>
<td>N3 &amp; N4</td>
<td>N-type simple current mirror</td>
<td>2*7</td>
</tr>
<tr>
<td>N5 &amp; N6</td>
<td>N-type simple current mirror</td>
<td>2*7</td>
</tr>
<tr>
<td>N7 &amp; N8</td>
<td>N-type level shifter</td>
<td>2*9</td>
</tr>
<tr>
<td>P1 &amp; P2</td>
<td>P-type differential pair</td>
<td>2*7</td>
</tr>
<tr>
<td>P3 &amp; P4</td>
<td>P-type level shifter</td>
<td>2*9</td>
</tr>
<tr>
<td>P5 &amp; P6</td>
<td>P-type simple current mirror</td>
<td>2*7</td>
</tr>
<tr>
<td>P0 &amp; P6</td>
<td>P-type simple current mirror</td>
<td>2*7</td>
</tr>
<tr>
<td>P7 &amp; P8</td>
<td>P-type simple current mirror</td>
<td>2*7</td>
</tr>
<tr>
<td>P9 &amp; P10</td>
<td>P-type level shifter</td>
<td>2*9</td>
</tr>
</tbody>
</table>
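The total of 152 sizing rules can be recomputed from the rows of Tab. 5.3. The sketch below simply transcribes the table; the list name `sizing_rules` and the reading of each entry as (transistors per pair) x (rules per transistor) are assumptions made for illustration.

```python
# (transistor pair, transistors in the pair, sizing rules per transistor)
sizing_rules = [
    ("N1 & N2", 2, 7),
    ("N3 & N4", 2, 7),
    ("N5 & N6", 2, 7),
    ("N7 & N8", 2, 9),
    ("P1 & P2", 2, 7),
    ("P3 & P4", 2, 9),
    ("P5 & P6", 2, 7),
    ("P0 & P6", 2, 7),
    ("P7 & P8", 2, 7),
    ("P9 & P10", 2, 9),
]

total_rules = sum(count * rules for _, count, rules in sizing_rules)
print("total number of sizing rules:", total_rules)
```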

### Table 5.4: Maxima and minima of \( A, GBW \) and \( SR \)

<table>
<thead>
<tr>
<th></th>
<th>maximum</th>
<th>minimum</th>
</tr>
</thead>
<tbody>
<tr>
<td>( A ) (dB)</td>
<td>105.23</td>
<td>76.54</td>
</tr>
<tr>
<td>( GBW ) (MHz)</td>
<td>47.05</td>
<td>1.831</td>
</tr>
<tr>
<td>( SR ) (V/( \mu )s)</td>
<td>33.01</td>
<td>1.44</td>
</tr>
</tbody>
</table>

### 5.4.3 Nominal Optimization of SC \( \Sigma \Delta \) Modulator

The ideal SNR value of the modulator is 101.5dB. Practically, SNR will be degraded by the finite \( A, GBW \) and \( SR \) of the OP AMP. To compute the maximal SNR performance, a hierarchical optimization process based on the capability of the OP AMP can be formulated as

\[
\max_{A,GBW,SR} \text{SNR}(A,GBW,SR) \quad \text{s.t.} \quad c(A,GBW,SR) = PF, \tag{5.28}
\]
Figure 5.17: Pareto-optimal front of OP AMP (a) 3D dimension; Projection on 2D surface (b) \( A \) vs. \( GBW \); (c) \( A \) vs. \( SR \); (d) \( SR \) vs. \( GBW \)
where \( c(*) \) describes the feasible ranges of the system-level design parameters \( A, GBW \) and \( SR \), as shown in Fig. 5.17(a). However, the trade-off between \( GBW \) and \( SR \) is not pronounced, which results in a narrow 3D Pareto-optimal front, as shown in Fig. 5.17(a). It is therefore difficult to fit the \( PF \), and an inaccurate fit could lead to a false design result. Hence, we first try to reduce the number of system-level design parameters. Six system-level sweep analyses of the SNR are executed, two at each of the design points \( x, y, z \). For example, in a sweep analysis at point \( x \) (i.e. maximal \( GBW \)), one of the other two parameters, \( A \) or \( SR \), is varied from its minimum to its maximum, while the remaining parameters are kept constant at their values at the design point.

At point \( x \), the red lines in Fig. 5.18 represent the variation of the SNR with \( A \) and \( SR \), respectively. As can be seen, the SNR increases with \( SR \), while it stays almost constant over \( A \). At point \( y \), the blue lines represent the variation of the SNR with \( SR \) and \( GBW \), respectively. Notably, the SNR remains at almost 0 over the whole \( GBW \) sweep, because \( SR \) is minimal at this design point. At point \( z \), the green lines represent the variation of the SNR with \( SR \) and \( A \), respectively. Based on these results, \( A \) has no noticeable impact on the SNR in this experiment, while the SNR is very sensitive to the value of \( SR \). Additionally, a larger \( GBW \) can increase the SNR when \( SR \) is maximal.

![Figure 5.18: SNR vs Parameter Sweep](image)

Based on the analysis above, the SNR is dominated by \( SR \), and \( SR \) increases roughly in proportion to \( GBW \). Therefore, the maximal SNR is achieved by a design point that lies on the nominal Pareto-optimal front \( PF3 \), shown enlarged in Fig. 5.19. The optimization of the SNR in Equation 5.28 can be simplified to

\[
\max_{GBW, SR} \text{SNR}(GBW, SR) \quad s.t. \quad c(GBW, SR) = PF3, \quad (5.29)
\]
where the system-level design parameters are reduced to $GBW$ and $SR$, and $PF_3$ is the constraint for the system-level optimization. The maximal SNR of 98.89dB is obtained at the design point $q_1$, which is listed in Tab. 5.6 and marked by a diamond in Fig. 5.19.

Figure 5.19: Pareto-optimal front of GBW vs. SR

### 5.4.4 Worst-Case Analysis of SC $\Sigma\Delta$ Modulator

What SNR performance do the nominally optimized device sizes actually deliver after fabrication? According to a Monte Carlo analysis over the statistical parameters of the OP AMP, about 52% of the circuits achieve the $q_1$ performance values, i.e. $GBW=35.05$MHz and $SR=32.97$V/$\mu$s. Additionally, two operating parameters are considered here: the supply voltage, which varies from 2.9V to 3.1V, and the temperature, which varies from 0$^\circ$C to 100$^\circ$C. The worst-case operating conditions for $SR$ and $GBW$ are listed in Tab. 5.5. When the variations of both the statistical and the operating parameters are taken into account, 0% of the circuits achieve the $q_1$ performance values. Therefore, the SNR of 98.89dB obtained at design point $q_1$ does not reflect the actual capability of the circuit after fabrication.

Table 5.5: Worst-case operation conditions for $GBW$ and $SR$

<table>
<thead>
<tr>
<th></th>
<th>Temperature</th>
<th>Supply voltage</th>
</tr>
</thead>
<tbody>
<tr>
<td>$GBW$</td>
<td>100$^\circ$C</td>
<td>2.9V</td>
</tr>
<tr>
<td>$SR$</td>
<td>0$^\circ$C</td>
<td>2.9V</td>
</tr>
</tbody>
</table>

To obtain the real maximum of the SNR after fabrication, a yield-aware optimization is needed. For different target yields, different worst-case-aware Pareto-optimal fronts can be extracted. Fig. 5.19 shows a front $PF3_{WC}$ for a target yield of 99.87%, which corresponds to a worst-case distance $\beta_w$ of 3 according to Tab. 3.1. The optimization of the SNR with a yield of 99.87% can be formulated as

$$\max_{GBW, SR} \text{SNR}(GBW, SR) \quad s.t. \quad c(GBW, SR) = PF3_{WC}.$$  (5.30)

The achieved maximal SNR is 89.4dB; the corresponding design point $q_2$ is listed in Tab. 5.6 and marked by a diamond in Fig. 5.19. The maximal value of the SNR is clearly degraded by the variations of the statistical and operating parameters. Finally, the robust design result we are looking for is the set of device sizes that generates $q_2$, not $q_1$.

<table>
<thead>
<tr>
<th></th>
<th>SNR (dB)</th>
<th>Yield</th>
<th>GBW (MHz)</th>
<th>SR (V/µs)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Ideal</td>
<td>101.5</td>
<td></td>
<td>infinite</td>
<td>infinite</td>
</tr>
<tr>
<td>Nominal $q_1$</td>
<td>98.8</td>
<td>0%</td>
<td>35.05</td>
<td>32.97</td>
</tr>
<tr>
<td>Robust $q_2$</td>
<td>89.5</td>
<td>99.87%</td>
<td>28.32</td>
<td>26.3</td>
</tr>
</tbody>
</table>
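The pairing of a worst-case distance of 3 with a yield of 99.87% can be reproduced with the standard normal distribution. The sketch below assumes that the yield for a single specification equals the standard normal cumulative distribution function evaluated at $\beta_w$, which is the usual interpretation of a worst-case distance; Tab. 3.1 itself is not reproduced here, and the function name is illustrative.

```python
import math


def yield_from_worst_case_distance(beta_w: float) -> float:
    """Standard normal CDF evaluated at the worst-case distance beta_w."""
    return 0.5 * (1.0 + math.erf(beta_w / math.sqrt(2.0)))


for beta_w in (0.0, 1.0, 2.0, 3.0):
    print(f"beta_w = {beta_w:.0f}: yield = {100 * yield_from_worst_case_distance(beta_w):.2f} %")
```

Under this assumption, $\beta_w = 3$ gives a yield of about 99.87%, in agreement with the target yield used for the robust design point $q_2$.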

### 5.5 Summary

This chapter described the basic advantages of ΣΔ modulators compared to Nyquist-rate converters. Noise averaging and noise shaping are implemented simultaneously in ΣΔ modulators. The efficiency of the 2nd-order SC ΣΔ modulator was discussed in comparison with other modulator topologies.

Based on the linear $z$-domain model of the 2nd-order SC ΣΔ modulator topology, the theoretical maximum of the SNR can be evaluated. Due to the non-idealities of the analog building blocks, the actual maximum of the SNR is lower than the theoretical value. The main non-idealities of the 2nd-order SC ΣΔ modulator were listed and their effects on the SNR performance were presented individually. Corresponding SIMULINK models that include these non-idealities were built.

Since the simulation of an SC ΣΔ modulator on circuit level is very time-consuming, the hierarchical optimization methodology proposed in Chapter 3 was applied to the modulator. The SNR performance was maximized based on the real capability of a folded-cascode OP AMP circuit. In the optimization process, a performance space exploration method was first applied to the OP AMP to extract a nominal Pareto-optimal front. A worst-case analysis was then conducted on each efficient point of the nominal Pareto-optimal front in order to obtain a worst-case-aware Pareto-optimal front. Based on that, the practical maximum of the SNR for a target yield was computed. It gives designers a real insight into the capability of the circuit after fabrication. The presented method is not limited to the 2nd-order SC ΣΔ modulator but is applicable to other modulators as well.
Chapter 6

Conclusion

The Semiconductor Industry Association (SIA) announced that worldwide semiconductor sales reached $267.5 billion in 2007, a new industry record. Analog circuits are one of the fastest-growing segments of the market. In the next five years, nearly 70 percent of all ICs will contain analog signal components, which are widely used in communication, computer, consumer, automotive and industrial applications. Steadily shrinking process technologies make analog circuits smaller, faster and more power-efficient, but introduce more challenges and difficulties as well. The increasing time-to-market pressure drives the progress of design methodologies, CAD tools and design flows. In contrast to the significantly increasing density and complexity of analog circuits, analog design methodologies have made little progress over the past decades. Compared to the well-developed and widely available digital CAD tools, few robust commercial CAD tools are available for analog design apart from SPICE-like numerical simulators. Consequently, the design of analog and mixed-signal ICs remains a long and error-prone process, which tends to be a bottleneck in the design process of SoC systems. Therefore, an efficient design methodology for large-scale analog/mixed-signal circuits is valuable.

Chapter 1 presented the challenges of analog design. Designers have to be much more involved in the design process of an analog circuit than in that of a digital circuit, and consequently the design results depend heavily on the knowledge and experience of individual designers. The current analog design flow starts with the manual selection of a circuit topology. Designers then assign the transistor sizes and the values of resistors and capacitors, and simulate the circuit. The tuning of the design parameters and the circuit verification have to be repeated until the desired circuit performances are achieved. Some commercial automatic sizing tools have been mentioned, which are capable of accelerating some analog designs. As analog circuits become larger and more complex and increasingly include digital parts, two different optimization strategies for large-scale analog/mixed-signal circuits have been introduced, i.e. the flat and the hierarchical optimization methodology. Additionally, methods for performance space exploration have been summarized, which provide useful information about the lower-level realization for the design on a higher level of the hierarchy.

Chapter 2 explained the two automatic processes for analog design, i.e. automatic sizing and performance space exploration. The fundamental concepts of analog design, e.g. design parameter, circuit performance/specification/yield and circuit simulation, have been introduced. Nominal design aims to optimize the often conflicting circuit performances at the same time, while robust design aims to maximize the circuit yield. The automatic sizing process is a mapping from circuit specifications to design parameters, and is usually referred to as a nominal design process. The additional sizing rules for CMOS transistors reduce the degrees of freedom in analog design, so that the automatically sized results are guaranteed to stay in technically meaningful regions. Performance space exploration (PSE) is a mapping from the feasible space of the design parameters to the feasible space of the circuit performances. A deterministic and simulation-based PSE method, i.e. Normal-Boundary Intersection, has been introduced to generate a Pareto-optimal front (a part of the boundary of the entire feasible performance region), which represents the performance capabilities of the circuit.

Chapter 3 proposed a hierarchical optimization methodology for large-scale analog/mixed-signal circuits. The methodology is "simulation-in-a-loop"-based and consists of four main steps. (1) Performance space exploration is applied to each building block on the circuit level individually, which yields their respective Pareto-optimal fronts. (2) Efficient behavioral models are built in HDLs or SIMULINK. The models include not only the description of the circuit functionalities, but also the description of the Pareto-optimal fronts. (3) Based on these behavioral models, an automatic sizing process is conducted on the system level. During the optimization process, the system-level parameters are restricted to these Pareto-optimal fronts. (4) The specifications for each building block are propagated from the optimized results on the system level. The automatic sizing process on each building block is then conducted individually and in parallel. The whole hierarchical optimization process can be characterized by a bottom-up extraction of the circuit capability and a top-down hierarchical sizing process. Additionally, through worst-case analysis on the efficient points of the nominal Pareto-optimal front, a worst-case-aware Pareto-optimal front can be extracted. Based on that, the obtained optimization results represent the actual circuit capability with a target yield after fabrication.
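
The role of the Pareto-optimal front in steps (1) and (3) can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: all performances are to be minimized, the block performances are given as a finite list of simulated points, and the names `dominates`, `pareto_front` and `best_on_front` are hypothetical. It does not reproduce the Normal-Boundary Intersection method used in this thesis.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the efficient (non-dominated) performance points of a block."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_on_front(front: List[Point], system_cost: Callable[[Point], float]) -> Point:
    """System-level choice restricted to the block's Pareto-optimal front."""
    return min(front, key=system_cost)


if __name__ == "__main__":
    block_points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
    front = pareto_front(block_points)
    print(front)
    print(best_on_front(front, lambda p: p[0] + p[1]))
```

The point selected on the front then serves as the specification that is propagated down to the building block in step (4).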

In this thesis, the proposed hierarchical optimization methodology has been applied to two typical large-scale analog/mixed-signal circuits: a charge-pump phase-locked loop (CPPLL) and a switched-capacitor sigma-delta (SC ΣΔ) modulator. In Chapter 4, the fundamentals of a CPPLL have been discussed, including the PLL's building blocks, its performances and the analysis methods for the PLL system. The complex trade-offs in PLLs show the difficulties and challenges of the optimization task in manual design. The time-consuming simulation of the PLL's performances is the main obstacle to the "simulation-in-a-loop"-based optimization method. To tackle this problem, efficient behavioral models in Verilog-A have been developed. Based on performance space exploration on the circuit level, a first-time-successful top-down sizing process without iteration has been realized. The obtained Pareto-optimal fronts of the building blocks and of the system represent the capability of the circuit and visualize the conflicting relationships among the performances, which gives designers a detailed insight into the circuit. In Chapter 5, a second-order SC ΣΔ modulator has been taken as example circuit because of its efficiency. Based on the linear $z$-domain model, the theoretical maximum of the SNR performance can be evaluated. However, due to the non-idealities of the analog building blocks, the actual SNR performance cannot reach the theoretical value. By applying performance space exploration to the building blocks, the SNR performance has been maximized while considering the capabilities of the OP AMP. Moreover, worst-case analysis has been applied to the nominal efficient points in order to extract a worst-case-aware Pareto-optimal front. Based on it, the actual maximum of the SNR with a target yield has been extracted. The final optimized result shows the eventual capability of the circuit after fabrication. The "simulation-in-a-loop"-based optimization method has been realized based on efficient behavioral modeling in SIMULINK.

With the proposed hierarchical approach, the complete CPPLL and the SC ΣΔ modulator have been sized algorithmically despite their design complexity. Moreover, the whole design process can be accomplished in a reasonable time, which is clearly shorter than the design period of the traditional analog design method. The presented approach is not limited to these two kinds of circuits but is applicable to other large-scale analog/mixed-signal circuits as well.
Appendix A

Analog Sizing Rules

For analog circuits in CMOS technology, a number of fundamental analog building blocks have been identified based on [GZEA01]. There are five levels of hierarchy, as depicted in Tab. A.1 [Ste05].

- At the lowest hierarchy level 0, the atomic building block is a single transistor. A transistor can act as a voltage-controlled current source (VCCS) in its saturation region or as a voltage-controlled resistor in its linear region.

- At hierarchy level 1, seven transistor pairs are defined that provide basic analog functionalities. For example, a simple current mirror is used to copy a current from one path to another path. A level shifter is used to shift a voltage to a higher or a lower voltage level.

- At hierarchy level 2, four "pairs of transistor pairs" are defined that provide more complex or more accurate analog functions. These pairs consist of structures from level 0 and level 1. For example, a 4-transistor current mirror can be modeled as a combination of a voltage reference and a current mirror load.

- At hierarchy level 3, a cascode current mirror bank is modeled as a level shifter bank and a current mirror bank.

- At hierarchy level 4, a differential stage is modeled as a combination of a differential pair and a generic current source. The current source can be implemented as a simple current mirror or a cascode mirror structure.

As can be seen, the block at a certain hierarchy level is composed of blocks at lower levels. Moreover, the structure library is not complete and a variety of other building blocks can be added.

A set of sizing rules is given for each building block. These sizing constraints guarantee the dedicated analog function and strengthen its robustness, e.g. they reduce the mismatch effect or the channel length modulation. The constraints refer not only to transistor geometry parameters (width, length and area) but also to electrical transistor quantities (e.g. transistor drain/source voltages). Since each block on level \( i \) consists of building blocks at the lower levels of the hierarchy, the sizing rules for each identified block at level \( i \) include all sizing rules of these fundamental blocks and the additional sizing rules for the block itself.

### Table A.1: Library of analog basic building blocks [Ste05] *

<table>
<thead>
<tr>
<th>Function</th>
<th>Schematic (NMOS)</th>
<th>Hierarchy Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>Voltage-controlled resistor</td>
<td>(res)</td>
<td>0</td>
</tr>
<tr>
<td>Voltage-controlled current source</td>
<td>(cs)</td>
<td>0</td>
</tr>
<tr>
<td>Simple current mirror</td>
<td>(cm)</td>
<td>1</td>
</tr>
<tr>
<td>Level shifter</td>
<td>(ls)</td>
<td>1</td>
</tr>
<tr>
<td>Voltage reference 1</td>
<td>(vr1)</td>
<td>1</td>
</tr>
<tr>
<td>Current mirror load</td>
<td>(cml)</td>
<td>1</td>
</tr>
<tr>
<td>Differential pair</td>
<td>(dp)</td>
<td>1</td>
</tr>
<tr>
<td>Voltage reference 2</td>
<td>(vr2)</td>
<td>1</td>
</tr>
<tr>
<td>Flip-flop</td>
<td>(ff)</td>
<td>1</td>
</tr>
<tr>
<td>Level shifter bank</td>
<td>(LSB)</td>
<td>2</td>
</tr>
<tr>
<td>Current mirror bank</td>
<td>(CMB)</td>
<td>2</td>
</tr>
<tr>
<td>Cascode current mirror</td>
<td>(CCM)</td>
<td>2</td>
</tr>
<tr>
<td>4-Transistor current mirror</td>
<td>(4TCM)</td>
<td>2</td>
</tr>
<tr>
<td>Cascode current mirror bank</td>
<td>(CCMB)</td>
<td>3</td>
</tr>
<tr>
<td>Differential stage</td>
<td>(DST), with CM ∈ { cm, CCM, 4TCM, CCMB }</td>
<td>4</td>
</tr>
</tbody>
</table>

* In this table, the fundamental analog building blocks are presented schematically through NMOS transistors; the PMOS counterparts are defined analogously.

Tab. A.2 [mun] lists the detailed sizing rules for the recognized basic NMOS structures: simple current mirror, differential pair and level shifter. The safety margin $m$ is set according to the CMOS realization technology.

### Table A.2: Sizing rules for NMOS basic structures [mun]

<table>
<thead>
<tr>
<th>Structure</th>
<th>Constraint</th>
<th>Safety margin</th>
<th>Reason</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>NMOS Current Mirror</strong></td>
<td>$V_{GS} - V_{th} \geq m$</td>
<td>$m = 100mV$</td>
<td>inversion</td>
</tr>
<tr>
<td></td>
<td>$V_{DS} - (V_{GS} - V_{th}) \geq m$</td>
<td>$m = 100mV$</td>
<td>saturation</td>
</tr>
<tr>
<td></td>
<td>$L \cdot W \geq m$</td>
<td>$m = 1\mu m^2$</td>
<td>limit $V_{th}$ mismatch</td>
</tr>
<tr>
<td></td>
<td>$L \geq m$</td>
<td>$m = 0.5\mu m$</td>
<td>limit relative variance of the transconductance factor</td>
</tr>
<tr>
<td></td>
<td>$W \geq m$</td>
<td>$m = 0.5\mu m$</td>
<td>limit relative variance of the transconductance factor</td>
</tr>
<tr>
<td></td>
<td>$-m \leq V_{DS1} - V_{DS2} \leq m$</td>
<td>$m = 200mV$</td>
<td>reduce the influence of the channel length modulation factor on the current transmission coefficient</td>
</tr>
<tr>
<td></td>
<td>equal length</td>
<td></td>
<td>limit systematic mismatches</td>
</tr>
<tr>
<td><strong>NMOS Differential Pair</strong></td>
<td>$V_{GS} - V_{th} \geq m$</td>
<td>$m = 10mV$</td>
<td>inversion</td>
</tr>
<tr>
<td></td>
<td>$V_{DS} - (V_{GS} - V_{th}) \geq m$</td>
<td>$m = 10mV$</td>
<td>saturation</td>
</tr>
<tr>
<td></td>
<td>$L \cdot W \geq m$</td>
<td>$m = 1\mu m^2$</td>
<td>limit $V_{th}$ mismatch</td>
</tr>
<tr>
<td></td>
<td>$L \geq m$</td>
<td>$m = 0.5\mu m$</td>
<td>limit relative variance of the transconductance factor</td>
</tr>
<tr>
<td></td>
<td>$W \geq m$</td>
<td>$m = 0.5\mu m$</td>
<td>reduce the influence of transconductance mismatch on the input offset</td>
</tr>
<tr>
<td></td>
<td>$-m \leq V_{DS1} - V_{DS2} \leq m$</td>
<td>$m = 200mV$</td>
<td>reduce the influence of the channel length modulation factor on the current transmission coefficient</td>
</tr>
<tr>
<td></td>
<td>equal length</td>
<td></td>
<td>avoid transconductance mismatch and input offset voltage mismatch</td>
</tr>
<tr>
<td><strong>NMOS Level Shifter</strong></td>
<td>$V_{GS} - V_{th} \geq m$</td>
<td>$m = 10mV$</td>
<td>inversion</td>
</tr>
<tr>
<td></td>
<td>$V_{DS} - (V_{GS} - V_{th}) \geq m$</td>
<td>$m = 10mV$</td>
<td>saturation</td>
</tr>
<tr>
<td></td>
<td>$L \cdot W \geq m$</td>
<td>$m = 1\mu m^2$</td>
<td>limit $V_{th}$ mismatch</td>
</tr>
<tr>
<td></td>
<td>$L \geq m$</td>
<td>$m = 0.5\mu m$</td>
<td>limit relative variance of the transconductance factor</td>
</tr>
<tr>
<td></td>
<td>$W \geq m$</td>
<td>$m = 0.5\mu m$</td>
<td>limit relative variance of the transconductance factor</td>
</tr>
<tr>
<td></td>
<td>$-m \leq i_{DS2}/i_{DS1} - W_2/W_1 \leq m$</td>
<td>$m = 0.2$</td>
<td>avoid a difference between the effective voltages $V_{GS}$</td>
</tr>
<tr>
<td></td>
<td>equal length</td>
<td></td>
<td>avoid transconductance mismatch</td>
</tr>
</tbody>
</table>
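
To illustrate how such rules act as constraints during automatic sizing, the following sketch checks the sizing rules of the NMOS current mirror for one candidate operating point. It is a simplified illustration: the data structure `Transistor`, the function `current_mirror_violations` and the default margins (taken from the current-mirror rows of Tab. A.2) are assumptions made for this example, not the implementation of a sizing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transistor:
    w_um: float  # channel width in micrometers
    l_um: float  # channel length in micrometers
    vgs: float   # gate-source voltage in V
    vds: float   # drain-source voltage in V
    vth: float   # threshold voltage in V


def current_mirror_violations(m1: Transistor, m2: Transistor,
                              v_margin: float = 0.1, area_min: float = 1.0,
                              len_min: float = 0.5, wid_min: float = 0.5,
                              dvds_max: float = 0.2) -> List[str]:
    """Return the names of all violated sizing rules (empty list: all rules hold)."""
    violations = []
    for name, t in (("M1", m1), ("M2", m2)):
        v_ov = t.vgs - t.vth  # effective gate voltage
        if v_ov < v_margin:
            violations.append(f"{name}: inversion")
        if t.vds - v_ov < v_margin:
            violations.append(f"{name}: saturation")
        if t.w_um * t.l_um < area_min:
            violations.append(f"{name}: area")
        if t.l_um < len_min:
            violations.append(f"{name}: length")
        if t.w_um < wid_min:
            violations.append(f"{name}: width")
    if abs(m1.vds - m2.vds) > dvds_max:
        violations.append("VDS matching")
    if m1.l_um != m2.l_um:
        violations.append("equal length")
    return violations


if __name__ == "__main__":
    ref = Transistor(w_um=10.0, l_um=1.0, vgs=0.9, vds=0.9, vth=0.6)
    out = Transistor(w_um=20.0, l_um=1.0, vgs=0.9, vds=0.35, vth=0.6)
    print(current_mirror_violations(ref, out))
```

In an optimizer, each entry of the returned list corresponds to one inequality constraint $c(\cdot)$ that restricts the feasible parameter space.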
Appendix B

Modeling $\Sigma\Delta$ Modulator Non-Idealities in SIMULINK

In this appendix, the behavioral model of the second-order $\Sigma\Delta$ modulator is realized in SIMULINK, based on the models in [BFM+99, MBF+03].

**Sampling Jitter** According to Equation 5.11, a random sampling jitter model can be built as shown in Fig. B.1. The input signal $x(t)$ and its derivative $dx/dt$ are continuous-time signals. They are sampled with a sampling period $T_s$ by a zero-order hold. The sampling uncertainty, i.e. the clock jitter $\delta$, is implemented by a Gaussian random process $n(t)$ with a standard deviation $\Delta\tau$.
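
The structure of this model can be mimicked outside SIMULINK. The sketch below assumes the usual first-order approximation, in which each sample is the ideal sample plus the jitter times the signal derivative; the function name `jittered_samples` and the sinusoidal test signal are assumptions made for this example only.

```python
import math
import random
from typing import Callable, List


def jittered_samples(x: Callable[[float], float], dxdt: Callable[[float], float],
                     ts: float, n: int, jitter_std: float, seed: int = 0) -> List[float]:
    """Sample x(t) every ts seconds with Gaussian clock jitter (first-order model)."""
    rng = random.Random(seed)
    samples = []
    for k in range(n):
        t = k * ts
        delta = rng.gauss(0.0, jitter_std)      # sampling-instant error
        samples.append(x(t) + delta * dxdt(t))  # x(t + delta) ~ x(t) + delta * dx/dt
    return samples


if __name__ == "__main__":
    f_in = 1e3  # assumed input frequency in Hz
    sine = lambda t: math.sin(2 * math.pi * f_in * t)
    slope = lambda t: 2 * math.pi * f_in * math.cos(2 * math.pi * f_in * t)
    y = jittered_samples(sine, slope, ts=1e-6, n=20000, jitter_std=1e-9)
    err = [yk - sine(k * 1e-6) for k, yk in enumerate(y)]
    print(math.sqrt(sum(e * e for e in err) / len(err)))
```

For a sinusoidal input of amplitude $A$ and frequency $f_{in}$, the rms error introduced by this model is approximately $2\pi f_{in} A \Delta\tau / \sqrt{2}$, so the jitter error grows with the input frequency.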

![Figure B.1: Modeling a random sampling jitter [MBF+03]](image)

**Switches Thermal Noise** The switch thermal noise voltage $e_T$ can be evaluated by Equation 5.12. The thermal noise is superimposed on the input voltage $x(t)$, expressed as

$$y(t) = [x(t) + e_T(t)]\,b = \left[x(t) + \sqrt{\frac{kT}{C_s}}\,n(t)\right] b, \quad (B.1)$$

where $n(t)$ denotes a Gaussian random process with unity standard deviation, and $b = C_s/C_t$ is the coefficient of the integrator. Equation B.1 is implemented by the model shown in Fig. B.2.
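
A numerical reading of Equation B.1 is sketched below. The helper names `switch_noise_rms` and `noisy_sample`, the temperature of 300 K and the capacitor value of 1 pF are assumptions chosen for illustration.

```python
import math
import random

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def switch_noise_rms(temperature_k: float, c_s: float) -> float:
    """rms value of the kT/C noise voltage sampled on the capacitor c_s (in farads)."""
    return math.sqrt(BOLTZMANN * temperature_k / c_s)


def noisy_sample(x: float, b: float, temperature_k: float, c_s: float,
                 rng: random.Random) -> float:
    """One sample of Equation B.1: y = (x + sqrt(kT/Cs) * n) * b."""
    n = rng.gauss(0.0, 1.0)  # Gaussian process with unity standard deviation
    return (x + switch_noise_rms(temperature_k, c_s) * n) * b


if __name__ == "__main__":
    print(switch_noise_rms(300.0, 1e-12))  # roughly 64 microvolts for 1 pF
    print(noisy_sample(0.5, 0.5, 300.0, 1e-12, random.Random(0)))
```

Because the noise voltage scales with $1/\sqrt{C_s}$, halving the rms noise requires a four times larger sampling capacitor, which in turn tightens the bandwidth and slew-rate requirements on the OP AMP.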

**Operational Amplifier Noise** The model of the OP AMP noise is shown in Fig. B.3. The intrinsic noise of an operational amplifier includes thermal noise, flicker (1/f) noise, shot noise etc. These noise sources contribute together to the total OP AMP noise power $V_n^2$, whose value can be evaluated through simulation of the circuit in Fig. 5.3 during phase $\phi_2$. "The resulting output referred noise PSD has to be integrated over the whole frequency spectrum, eventually taking into account the degradation of the thermal noise PSD introduced by the auto-zero or correlated double sampling techniques" [ET96], and then divided by $b^2$ in order to calculate the corresponding input-referred rms noise voltage $V_n$.

**Real Integrator** Fig. B.4 shows the model of a real integrator including all the non-idealities listed in Sec. 5.3.1.3. Only a fraction $(1 - \alpha)$ of the previous output of the integrator is added (fed back) to each new input sample, which is modeled by the parameter 'alpha' in Fig. B.4, where

$$\mathrm{alpha} = 1 - \alpha = \frac{A_0 - 1}{A_0}. \quad (B.2)$$

The finite bandwidth and slew rate of the OP AMP are implemented by a MATLAB function block which is placed in front of the integrator. According to Equation 5.23, the corresponding MATLAB code is shown below. The limited output range of the OP AMP can be realized simply by a saturation block that limits the final output range.

```matlab
function out = slew(in, alpha, sr, GBW, Ts)
% Models the operational amplifier finite bandwidth and slew rate
% for a discrete time integrator
%
% out = slew(in, alpha, sr, GBW, Ts)
% in: Input signal amplitude
% alpha: Effect of finite gain (ideal amplifier alpha=1)
% sr: Slew rate in V/s
% GBW: Gain-bandwidth product of the integrator loop gain in Hz
% Ts: Sample time in s
% out: Output signal amplitude

tau = 1/(2*pi*GBW); % Time constant of the integrator
Tmax = Ts/2;
slope = alpha*abs(in)/tau;

if slope > sr % Op-amp in slewing
    tsl = abs(in)*alpha/sr - tau; % Slewing time
    if tsl >= Tmax
        error = abs(in) - sr*Tmax;
    else
        texp = Tmax - tsl;
        error = abs(in)*(1-alpha) + (alpha*abs(in) - sr*tsl) * exp(-texp/tau);
    end
else % Op-amp in linear region
    texp = Tmax;
    error = abs(in)*(1-alpha) + alpha*abs(in) * exp(-texp/tau);
end

out = in - sign(in)*error;
```

Figure B.4: Modeling real integrator [MBF+03]
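
To check the two operating regimes of this function outside MATLAB, a direct Python transcription can be used. The sketch below mirrors the listing above line by line; only the function name `slew_py` and the numerical test values are assumptions of this example.

```python
import math


def slew_py(x: float, alpha: float, sr: float, gbw: float, ts: float) -> float:
    """Python transcription of the MATLAB function slew()."""
    tau = 1.0 / (2.0 * math.pi * gbw)  # time constant of the integrator
    t_max = ts / 2.0                   # available settling time (half a clock period)
    mag = abs(x)
    slope = alpha * mag / tau          # initial slope of the exponential settling
    if slope > sr:                     # op-amp in slewing
        t_sl = mag * alpha / sr - tau  # slewing time
        if t_sl >= t_max:              # slewing during the whole phase
            err = mag - sr * t_max
        else:                          # slewing followed by exponential settling
            t_exp = t_max - t_sl
            err = mag * (1 - alpha) + (alpha * mag - sr * t_sl) * math.exp(-t_exp / tau)
    else:                              # op-amp in linear region
        err = mag * (1 - alpha) + alpha * mag * math.exp(-t_max / tau)
    sign = (x > 0) - (x < 0)
    return x - sign * err


if __name__ == "__main__":
    # fast amplifier: the output settles almost completely
    print(slew_py(0.1, 1.0, sr=1e9, gbw=100e6, ts=100e-9))
    # very slow amplifier: the output is limited to sr * Ts/2
    print(slew_py(1.0, 1.0, sr=1e6, gbw=100e6, ts=100e-9))
```

With a sufficiently large slew rate and gain-bandwidth product, the output approaches the input; with a very small slew rate, the output step is bounded by the product of slew rate and half the sampling period.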
Appendix C

Phase Noise & Jitter

C.1 Relationship between Phase Noise and Jitter

Analog engineers prefer using phase noise, while digital designers prefer using jitter. The relationship between the two parameters is briefly described here. More details can be found in [Raz96, DMR00, HL98, HLL99, LH00, Meh02].

Timing jitter is defined as the standard deviation \( \sigma_{\Delta T} \) of the time uncertainty [HLL99]. Its variance is

\[ \sigma_{\Delta T}^2 = \frac{1}{\omega_0^2} \cdot E\{[\phi(t + \Delta T) - \phi(t)]^2\} = \frac{1}{\omega_0^2} \cdot E[\phi^2(t + \Delta T) + \phi^2(t) - 2\phi(t + \Delta T) \cdot \phi(t)], \quad (C.1) \]

where \( E[\cdot] \) represents the expected value and \( [\phi(t + \Delta T) - \phi(t)]/\omega_0 \) is the time uncertainty. For a stationary process, \( E[\phi^2(t)] = E[\phi^2(t + \Delta T)] = R_{\phi}(0) \), and \( E[\phi(t) \cdot \phi(t + \Delta T)] \) is equal to the autocorrelation of \( \phi(t) \), i.e. \( R_{\phi}(\Delta T) \). Therefore, the jitter in Equation C.1 can be written as

\[ \sigma_{\Delta T}^2 = \frac{2}{\omega_0^2} [R_{\phi}(0) - R_{\phi}(\Delta T)]. \quad (C.2) \]

The relationship between the autocorrelation and the power spectrum is given by the Wiener-Khinchin theorem [Gar90], i.e.

\[ R_{\phi}(\tau) = \int_0^\infty S_{\phi}(f) e^{j2\pi f \tau} df, \quad (C.3) \]

where \( S_{\phi}(f) \) represents the power spectrum of \( \phi(t) \). Substituting Equation C.3 into Equation C.2 results in

\[ \sigma_{\Delta T}^2 = \frac{8}{\omega_0^2} \int_0^\infty S_{\phi}(f) \sin^2(\pi f \Delta T) df. \quad (C.4) \]

Equation C.4 describes the relationship between the timing jitter and the phase-noise power spectral density (PSD). Therefore, the timing jitter can be calculated from the phase noise by using Equation C.4. As \( \Delta T \) goes to infinity, the correlation \( R_{\phi}(\Delta T) \) vanishes and the timing jitter follows from Equation C.2 and Equation C.3:

\[ \sigma_{\Delta T \rightarrow \infty}^2 = \frac{2}{\omega_0^2} R_{\phi}(0), \quad (C.5) \]

\[ \sigma_{\Delta T \rightarrow \infty}^2 = \frac{2}{\omega_0^2} \int_0^\infty S_{\phi}(f) df. \quad (C.6) \]

As can be seen from the above analysis, timing jitter carries less information than the phase-noise spectrum. The inverse process (from jitter to phase noise) is therefore normally not feasible, unless extra information on the shape of the phase-noise spectrum is available.
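
Equation C.4 can be evaluated numerically for any given phase-noise spectrum. The following sketch applies the trapezoidal rule to the integral exactly as written in Equation C.4; the function name `jitter_variance`, the finite upper integration limit `f_max` and the flat test spectrum are assumptions made for this illustration.

```python
import math
from typing import Callable


def jitter_variance(s_phi: Callable[[float], float], f0: float, delta_t: float,
                    f_max: float, steps: int = 20000) -> float:
    """Evaluate Equation C.4 on [0, f_max] with the trapezoidal rule."""
    omega0 = 2.0 * math.pi * f0
    h = f_max / steps
    total = 0.0
    for i in range(steps + 1):
        f = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * s_phi(f) * math.sin(math.pi * f * delta_t) ** 2
    return 8.0 / omega0 ** 2 * total * h


if __name__ == "__main__":
    flat = lambda f: 1e-12  # assumed band-limited white phase noise in rad^2/Hz
    var = jitter_variance(flat, f0=1e9, delta_t=0.3e-6, f_max=1e6)
    print(math.sqrt(var))   # rms timing jitter in seconds
```

The upper limit `f_max` plays the same role as the finite frequency range of a noise simulation: contributions above it are neglected.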

C.2 Extracting Jitter from Phase Noise Analysis on PFD/CP and VCO Blocks

The simulator SpectreRF [Cadb] is used in this thesis to compute the phase noise of the CPPLL. The frequency range of the periodic steady-state (PSS) analysis and the PNoise analysis is chosen such that the noise outside this range can be neglected*. In the PSS and PNoise analyses, the simulator linearizes the circuit at each time step in a given period and accumulates the contributions of every noise source over all time points to compute the total phase noise.

The jitter of the PFD/CP can be derived in the following steps. First of all, the edge-to-edge jitter is defined by

\[ \operatorname{var}(\delta t_n) = E[(\delta t_n - \bar{\delta t}_n)^2], \quad (C.7) \]

where \( \bar{\delta t}_n \) is the mean value of \( \delta t_n \). This value is equal to 0 for white noise. Therefore, \( \operatorname{var}(\delta t_n) \) can be reformulated as

\[ \operatorname{var}(\delta t_n) = E[(\delta t_n)^2] = R_{\delta t_n}(0). \quad (C.8) \]

Applying the Wiener-Khinchin theorem in Equation C.3 yields

\[ \operatorname{var}(\delta t_n) = \int_0^\infty S_n(f) df, \quad (C.9) \]

where \( S_n(f) \) is the power spectral density of the \( \delta t_n \) sequence.

To reduce the simulation time of the jitter extraction, the noise source of the PFD/CP is referred to its input, which yields the input-referred jitter of the PFD/CP. The input-referred jitter is calculated by dividing by the effective gain of the PFD/CP:

\[ J_{ee,\text{PFD/CP}} = \frac{T}{K_{\text{det}}} \sqrt{\frac{\operatorname{var}(\delta n)}{2}}, \quad (C.10) \]

where \( K_{\text{det}} \) is the gain of the PFD/CP in units of amperes/cycle and \( T \) is in units of seconds/cycle. The ratio 2 comes from modeling two transition edges in a cycle. †

---

* The noise should be at least -40dB down and dropping at the highest frequency simulated.
† In a cycle with two transitions, the total jitter is calculated by \( J_{\text{sum}} = \sqrt{J_1^2 + J_2^2} = \sqrt{2} J \), where \( J_1 = J_2 = J \).

---

The jitter of the VCO is almost completely due to oscillator phase noise. According to Equation 76 in [Kun05], i.e.

\[ c = L(\Delta f) \frac{\Delta f^2}{f_0^2}, \quad (C.11) \]

and Equation 73 in [Kun05], i.e.

\[ J = \sqrt{cT} = \sqrt{\frac{c}{f_0}}, \quad (C.12) \]

the jitter of the VCO can be extracted by

\[ J_{\text{VCO}} = \frac{\Delta f}{f_0^{1.5}} 10^{\frac{L(\Delta f)}{20}}. \quad (C.13) \]

\( L(\Delta f) \) is the phase noise at the offset frequency \( \Delta f \). It enters Equation C.11 as a linear power ratio and Equation C.13 in units of dBc/Hz. Note that Equation C.13 is only valid when the offset frequency \( \Delta f \) lies in the \( 1/f^2 \) region of the phase noise.
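
Equation C.13 is easily evaluated for a simulated phase-noise value. The sketch below is a direct transcription; the function name `vco_period_jitter` and the example values (a phase noise of -100 dBc/Hz at 1 MHz offset from a 1 GHz oscillation frequency) are assumptions chosen for illustration.

```python
import math


def vco_period_jitter(phase_noise_dbc_hz: float, offset_hz: float, f0_hz: float) -> float:
    """Period jitter in seconds from Equation C.13 (valid in the 1/f^2 region)."""
    return offset_hz / f0_hz ** 1.5 * 10.0 ** (phase_noise_dbc_hz / 20.0)


if __name__ == "__main__":
    j = vco_period_jitter(-100.0, offset_hz=1e6, f0_hz=1e9)
    print(j)  # about 0.32 ps for these assumed values

    # Cross-check through Equations C.11 and C.12 with the linear phase noise
    c = 10.0 ** (-100.0 / 10.0) * (1e6 / 1e9) ** 2
    print(math.sqrt(c / 1e9))
```

Both routes give the same value, since taking the square root of the linear phase noise turns the exponent $L/10$ into $L/20$.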
Appendix D

CPPLL's Verilog-A Models

In this appendix, the Verilog-A models for the building blocks of the CPPLL are listed, which are based on the modules in [Kun05].

Listings D.1-D.4 are used for the extraction of the locking time performance.

**Listing D.1:** Behavioral model of PFD in Verilog-A

```verilog
// PFD

`include "constants.vams"
`include "disciplines.vams"

module PFD (ref, feedback, u, ub, db, d);
    input ref, feedback; // ref and feedback are the reference and feedback clocks
    output u, ub, db, d;
    electrical ref, feedback, u, ub, db, d;

    parameter real v_high=3;
    parameter real v_low=0;
    parameter real t_tol=10p;
    parameter real t_t=120p from (0:1000000); // rise time and fall time
    // for "u" and "ub" signal: delay time extracted from circuit level simulation
    // --- rise delay --- fall delay
    parameter real t_d_u1=37p; parameter real t_d_u2=190p;
    // for "d" and "db" signal:
    parameter real t_d_d1=45p; parameter real t_d_d2=136p;

    integer state; // state=-1 for down, state=1 for up.
    real td_u, td_ub, td_d, td_db; // delay time

    analog begin
        // rising edge of the reference clock: move the state towards "up"
        @(cross(V(ref)-v_high/2.0, 1, t_tol)) begin
            if (state < 1)
                state = state + 1;
            // $strobe("current time %g ns and state=%d", $realtime*10e8, state);
            if (V(u) > v_high/2) td_u = t_d_u1; else td_u = t_d_u2;
            if (V(ub) > v_high/2) td_ub = t_d_u1; else td_ub = t_d_u2;
            if (V(d) > v_high/2) td_d = t_d_d1; else td_d = t_d_d2;
            if (V(db) > v_high/2) td_db = t_d_d1; else td_db = t_d_d2;
        end

        // rising edge of the feedback clock: move the state towards "down"
        @(cross(V(feedback)-v_high/2.0, 1, t_tol)) begin
            if (state > -1)
                state = state - 1;
            if (V(u) > v_high/2) td_u = t_d_u1; else td_u = t_d_u2;
            if (V(ub) > v_high/2) td_ub = t_d_u1; else td_ub = t_d_u2;
            if (V(d) > v_high/2) td_d = t_d_d1; else td_d = t_d_d2;
            if (V(db) > v_high/2) td_db = t_d_d1; else td_db = t_d_d2;
        end
```

**Listing D.2:** Behavioral model of CP in Verilog-A

```verilog
// CP
`include "constants.vams"
`include "disciplines.vams"

module CP(Iout, Down, N_Down, N_Up, Up, Ibias);
  input Up, N_Up, Down, N_Down;
  output Iout;
  inout Ibias; // bias terminal, not used in this behavioral model
  electrical Up, N_Up, Down, N_Down, Iout, Ibias;
  parameter real v_high = 3;
  parameter real v_low = 0;
  parameter real v_th = (v_high - v_low)/2; // threshold voltage
  parameter real TransTime=10p from (0:1000000);
  parameter real Delay=1p from (0:1000000);
  parameter real Ip=25.0e-6; // charge pump's output current
  parameter real v_max=2.85; // maximum voltage at output node so that
                             // PMOS current source in saturation
  parameter real v_min=0.35; // minimum voltage at output node so that
                             // NMOS current source in saturation
  parameter real Mis=0.00; // mismatch of up and down current
  integer state; // CP state: -1 = charge, 1 = discharge, 0 = no output current

  analog begin
    @(initial_step) begin
      state = 0;
    end
    @(cross(V(Up)-v_th, 1)) begin // current charge
      state = -1;
    end
    @(cross(V(Down)-v_th, 1)) begin // current discharge
      state = 1;
    end
    @(cross(V(Up)-v_th, -1)) begin // no output current
      state = 0;
    end
    @(cross(V(Down)-v_th, -1)) begin
      state = 0;
    end
    // restrict the output voltage range from v_min to v_max
    @(cross(V(Iout)-v_max, 1)) begin
      state = 0;
    end
    @(cross(V(Iout)-v_min, -1)) begin
      state = 0;
    end
    I(Iout) <+ transition(Ip*state*(1+state*Mis), Delay, TransTime);
  end
endmodule
```

**Listing D.3:** Behavioral model of VCO in Verilog-A

```verilog
// VCO
`include "disciplines.vams"

module vco (V_tune, VCO_out);
  input V_tune;
  output VCO_out;
  electrical V_tune, VCO_out;
  parameter real VSS = 0, VDD = 3;
  parameter real Vmin = 0.6;
  parameter real Vmax = 2.6 from (Vmin:10e5);
  parameter real Fmin = 50e6 from (0:10e9);
  parameter real Kvco = 600e6;
  parameter real tt = 0.0001/Fmin from (0:10e3);
  parameter real ttol = 1e-8/Fmin from (0:1/Fmin);

  real freq, phase, Vout;

  analog begin
    @(initial_step) begin
      Vout = VSS;
    end
    // compute the freq from the input voltage
    if (V(V_tune) <= Vmin)
      freq = Fmin;
    else if (V(V_tune) <= Vmax)
      freq = (V(V_tune) - Vmin) * Kvco + Fmin;
    else
      freq = (V(V_tune) - Vmax) * (0.25 * Kvco) + (Vmax - Vmin) * Kvco + Fmin;
    // ideal function
    // freq = (V(V_tune) - Vmin) * Kvco + Fmin;
    // phase is the integral of the frequency modulo 1
    phase = idtmod(freq, 0.0, 1.0, -0.5);
    // toggle the output twice per period
    @(cross(phase - 0.25, 1, ttol)) begin
      Vout = VDD;
    end
    @(cross(phase + 0.25, 1, ttol)) begin
      Vout = VSS;
    end
    V(VCO_out) <+ transition(Vout, 0, tt);
  end
endmodule
```
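
The piecewise tuning characteristic of this VCO model can be checked with a few lines of Python. The sketch below transcribes the three branches of the frequency computation in Listing D.3 with the same default parameter values; the function name `vco_frequency` is an assumption of this example.

```python
def vco_frequency(v_tune: float, v_min: float = 0.6, v_max: float = 2.6,
                  f_min: float = 50e6, k_vco: float = 600e6) -> float:
    """Output frequency in Hz of the behavioral VCO for a tuning voltage in V."""
    if v_tune <= v_min:
        return f_min  # below the tuning range: minimum frequency
    if v_tune <= v_max:
        return (v_tune - v_min) * k_vco + f_min  # linear tuning range
    # above the tuning range: the gain drops to a quarter of Kvco
    return (v_tune - v_max) * (0.25 * k_vco) + (v_max - v_min) * k_vco + f_min


if __name__ == "__main__":
    for v in (0.0, 0.6, 1.6, 2.6, 3.0):
        print(v, vco_frequency(v))
```

The characteristic is continuous at both breakpoints, so the loop sees no frequency step when the control voltage leaves the linear tuning range.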

**Listing D.4:** Behavioral model of divider in Verilog-A

```verilog
// divider
`include "disciplines.vams"

module divider (clock_out, clock_in);
  input clock_in;
  output clock_out;
  electrical clock_in, clock_out;
  parameter real Nmin = 6;  // minimum divider value
  parameter real Nmax = 20; // maximum divider value
  parameter real v_high = 3;
  parameter real v_low = 0;
  parameter real v_th = 1.5;
  parameter real tt = 10p;  // rise and fall time
  parameter real td = 0;
  parameter real JumpTime = 8u; // time at which the output frequency jumps from minimum to maximum

  integer count, n, M;

  analog begin
    @(cross((V(clock_in)-v_th), 1)) begin
      if ($abstime >= JumpTime) M = Nmax;
      else M = Nmin;

      count = count + 1;
      if (count >= M) count = 0;
      n = (2*count >= M);
    end

    V(clock_out) <+ transition(n ? v_high : v_low, td, tt);
  end
endmodule
```

Listings D.5-D.8 are used for the extraction of the jitter performance.

**Listing D.5:** Behavioral model of oscillator in Verilog-A

```verilog
// Fixed-frequency oscillator (OSC) with accumulating and synchronous jitter

`include "disciplines.vams"

module OSC(out);
  output out;
  electrical out;

  parameter real freq=25e6 from (0:10e9);
  parameter real ratio=1 from (0:10e9);
  parameter real VSS=0, VDD=3;
  parameter real tt=0.01*ratio/freq from (0:10e9);
  parameter real acc_Jitter=0 from [0:0.1/freq];
  // period jitter for reference osc and divider
  parameter real sync_Jitter=0 from [0:0.1*ratio/freq];
  // edge-to-edge jitter, for divider and PFD/CP

  integer n, acc_Seed, sync_Seed;
  real next, dT, dt, acc_SD, sync_SD, Jcp;

  analog begin
    @(initial_step) begin
      acc_Seed=286;
      sync_Seed=-459;
      acc_SD=acc_Jitter*sqrt(ratio/2);
      sync_SD=sync_Jitter;
      next=0.5/freq+$abstime;
    end

    // calculation of the jitter value from the fitting function
    Jcp=fitting_function_1(Jcp);
    sync_SD=Jcp;

    @(timer(next+dt)) begin
      n=!n;
      dT=acc_SD*$dist_normal(acc_Seed,0,1);
      dt=sync_SD*$dist_normal(sync_Seed,0,1);
      next=next+0.5/freq+dT;
    end

    V(out) <+ transition(n?VDD:VSS,0,tt);
  end
endmodule
```

**Listing D.6:** Behavioral model of PFD/CP in Verilog-A

```
// Phase-Frequency Detector with Charge Pump
// pfd_cpl: a simple three state phase-frequency detector
// Version 1a, 12 July 03
// Ken Kundert
// Downloaded from The Designer’s Guide (www.designers-guide.org).
// Post any questions to www.designers-guide.org/Forum

`include "disciplines.vams"
`include "constants.vams"

// This model exhibits no jitter
// The jitter of PFD/CP is integrated into OSC block

module pfd_cpl (Iout, ref, vco);
output Iout; electrical Iout;
input ref; voltage ref;
input vco; voltage vco;
parameter real iout=20u; // maximum output current
parameter real VDD=1.8; // output voltage in high state
parameter real VSS=0; // output voltage in low state
parameter real Vth=(VDD+VSS)/2; // threshold voltage at input
parameter integer dir=1 from [-1:1] exclude 0; // dir=1 for positive edge trigger
// dir=-1 for negative edge trigger
parameter real tt=1n from (0:inf); // transition time of output signal
parameter real ttol=1p from (0:inf);
integer state;

analog begin
  @(cross(V(ref)-Vth, dir, ttol)) begin
    if (state > -1) state = state - 1;
  end
  @(cross(V(vco)-Vth, dir, ttol)) begin
    if (state < 1) state = state + 1;
  end
  I(Iout) <+ transition(iout*state, 0, tt);
end
endmodule
```

Listing D.7: Behavioral model of VCO with jitter in Verilog A

```
// VCO and divider together exhibit jitter

module VCO_Div(V_tune, VCO_out);
input V_tune;
output VCO_out;
electrical V_tune, VCO_out;
parameter real VSS=0, VDD=3;
parameter real Vmin=0.6;
parameter real Vmax=2.6 from (Vmin:10e5);
parameter real Kvco=600e6;
parameter real Fmin=50e6 from (0:10e9);
parameter real tt=0.0001/Fmin from (0:10e3);
parameter real ttol=1e-8/Fmin from (0:1/Fmin);
parameter real N=20; // if N != 1, the divider is moved into the VCO block
parameter real Ivco=400e-6; // current consumption
parameter real Jvco=10e-12; // jitter of VCO
real freq, phase, dT, delta, Vout;
integer seed, fvco;
```

Listing D.8: Behavioral model of PLL outputs’ period measurement in Verilog A

```
// VerilogA for PLL_JitterMod, PLLoutput, veriloga
`include "constants.vams"
`include "disciplines.vams"

module PLLoutput (VCO_out);
  input VCO_out;
  electrical VCO_out;
  parameter real tstart = 20e-6; // start to write an output file
  parameter real VDD = 3;
  parameter real VTH = VDD/2;
  integer fp;
  real prev;

  analog begin
    @(initial_step) begin
      fp=$fopen("PLLPeriods.tcl", "w");
    end
    @(cross ((V(VCO_out)-VTH),1)) begin
      if ($abstime>=tstart) $fstrobe(fp,"%3.10e",$abstime-prev);
      prev=$abstime;
    end
    @(final_step) begin
      $fclose(fp);
    end
  end
endmodule
```
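
The period file written by Listing D.8 can be post-processed into jitter figures. The following is a minimal sketch in Python, not part of the original flow: the function names `read_periods` and `jitter_metrics` are hypothetical, the file is assumed to contain one period value per line (as `$fstrobe` writes it), and the two metrics use common definitions (standard deviation of the periods, and standard deviation of the difference between adjacent periods), which may differ in detail from the metrics used in the thesis.

```python
import statistics


def read_periods(path):
    """Read one period value (in seconds) per non-empty line of the file."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def jitter_metrics(periods):
    """Return (mean period, period jitter, cycle-to-cycle jitter) in seconds.

    Assumed definitions (population standard deviations):
      period jitter         = std of the individual periods
      cycle-to-cycle jitter = std of the differences of adjacent periods
    """
    if len(periods) < 3:
        raise ValueError("need at least three periods")
    mean_period = statistics.fmean(periods)
    period_jitter = statistics.pstdev(periods)
    # Differences between each period and the one that follows it.
    adjacent_diffs = [b - a for a, b in zip(periods, periods[1:])]
    cycle_to_cycle = statistics.pstdev(adjacent_diffs)
    return mean_period, period_jitter, cycle_to_cycle
```

A typical call would be `jitter_metrics(read_periods("PLLPeriods.tcl"))`. The population standard deviation is a design choice here; for long simulation runs the difference from the sample standard deviation is negligible.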
Bibliography


[Mat] *MATLAB®, The Mathworks (www.mathworks.com).*


[Wic] *MunEDA (www.muneda.com).*


# Nomenclature

## General Conventions

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\beta_w$</td>
<td>Worst-case distance</td>
<td>37</td>
</tr>
<tr>
<td>$c(\cdot)$</td>
<td>Sizing constraints</td>
<td>19</td>
</tr>
<tr>
<td>$\mathcal{D}$</td>
<td>Feasible parameter space</td>
<td>23</td>
</tr>
<tr>
<td>$\mathcal{F}$</td>
<td>Feasible performance space</td>
<td>23</td>
</tr>
<tr>
<td>$f$</td>
<td>Circuit performances</td>
<td>16</td>
</tr>
<tr>
<td>$f_l$</td>
<td>Lower specifications</td>
<td>17</td>
</tr>
<tr>
<td>$f_u$</td>
<td>Upper specifications</td>
<td>17</td>
</tr>
<tr>
<td>$L$</td>
<td>Transistor length</td>
<td>19</td>
</tr>
<tr>
<td>$m(\cdot)$</td>
<td>Mapping process from circuit parameters to circuit performances</td>
<td>17</td>
</tr>
<tr>
<td>$p$</td>
<td>System-level parameters</td>
<td>27</td>
</tr>
<tr>
<td>$s$</td>
<td>Statistical parameters</td>
<td>16</td>
</tr>
<tr>
<td>$\theta$</td>
<td>Operational parameters</td>
<td>16</td>
</tr>
<tr>
<td>$w$</td>
<td>Worst-case parameter</td>
<td>37</td>
</tr>
<tr>
<td>$W$</td>
<td>Transistor width</td>
<td>19</td>
</tr>
<tr>
<td>$w_i$</td>
<td>Weight factor</td>
<td>26</td>
</tr>
<tr>
<td>$x$</td>
<td>Circuit-level parameters</td>
<td>29</td>
</tr>
<tr>
<td>$Y$</td>
<td>Yield</td>
<td>17</td>
</tr>
</tbody>
</table>

## Switched-Capacitor Sigma-Delta Modulator

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>$A$</td>
<td>DC Gain of Op Amp</td>
<td>89</td>
</tr>
<tr>
<td>$\alpha$</td>
<td>Actual fraction of the previous output of the integrator transferred to next stage</td>
<td>89</td>
</tr>
<tr>
<td>$b$</td>
<td>Integrator gain</td>
<td>88</td>
</tr>
<tr>
<td>$C_f$</td>
<td>Integrating capacitor</td>
<td>82</td>
</tr>
<tr>
<td>$C_s$</td>
<td>Sampling capacitor</td>
<td>82</td>
</tr>
<tr>
<td>$e_T$</td>
<td>$kT/C$ noise</td>
<td>88</td>
</tr>
<tr>
<td>$f_s$</td>
<td>Sampling frequency</td>
<td>79</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>$GBW$</td>
<td>Unity-gain bandwidth of Op Amp</td>
<td>90</td>
</tr>
<tr>
<td>$K$</td>
<td>Boltzmann’s constant</td>
<td>88</td>
</tr>
<tr>
<td>$N$</td>
<td>Conversion resolution</td>
<td>79</td>
</tr>
<tr>
<td>$R$</td>
<td>Oversampling ratio</td>
<td>79</td>
</tr>
<tr>
<td>$RN(t)$</td>
<td>Gaussian random number with zero mean and unity standard deviation</td>
<td>88</td>
</tr>
<tr>
<td>$R_{on}$</td>
<td>Finite resistance of switch</td>
<td>88</td>
</tr>
<tr>
<td>$SR$</td>
<td>Slew rate of Op Amp</td>
<td>90</td>
</tr>
<tr>
<td>$T$</td>
<td>Absolute temperature</td>
<td>88</td>
</tr>
<tr>
<td>$T_s$</td>
<td>Sampling period</td>
<td>109</td>
</tr>
<tr>
<td>$V_n$</td>
<td>Total rms noise voltage of the operational amplifier</td>
<td>88</td>
</tr>
<tr>
<td>$E(z)$</td>
<td>Z-transform of the quantization noise</td>
<td>83</td>
</tr>
<tr>
<td>$NTF(z)$</td>
<td>Z-transfer function of the quantization noise</td>
<td>83</td>
</tr>
<tr>
<td>$STF(z)$</td>
<td>Z-transfer function of the input signal</td>
<td>83</td>
</tr>
<tr>
<td>$X(z)$</td>
<td>Z-transform of the input signal</td>
<td>83</td>
</tr>
<tr>
<td>$Y(z)$</td>
<td>Z-transform of the output signal</td>
<td>83</td>
</tr>
</tbody>
</table>

## Charge-Pump Phase-Locked Loop

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>$C_1$</td>
<td>Capacitor 1 in loop filter</td>
<td>42</td>
</tr>
<tr>
<td>$C_2$</td>
<td>Capacitor 2 in loop filter</td>
<td>42</td>
</tr>
<tr>
<td>$f_0$</td>
<td>Carrier frequency in spectrum</td>
<td>50</td>
</tr>
<tr>
<td>$\Delta f$</td>
<td>Offset frequency from carrier frequency</td>
<td>50</td>
</tr>
<tr>
<td>$f_{\text{max}}$</td>
<td>Maximal output frequency</td>
<td>43</td>
</tr>
<tr>
<td>$f_{\text{min}}$</td>
<td>Minimal output frequency</td>
<td>43</td>
</tr>
<tr>
<td>$f_{\text{out}}$</td>
<td>Output frequency</td>
<td>40</td>
</tr>
<tr>
<td>$f_{\text{ref}}$</td>
<td>Reference input frequency</td>
<td>40</td>
</tr>
<tr>
<td>$F_{LF}(s)$</td>
<td>Transfer function of loop filter in $s$-domain</td>
<td>45</td>
</tr>
<tr>
<td>$H(s)$</td>
<td>Closed loop transfer function in $s$-domain</td>
<td>45</td>
</tr>
<tr>
<td>$H(z)$</td>
<td>Closed loop transfer function in $z$-domain</td>
<td>46</td>
</tr>
<tr>
<td>$I_{CP}$</td>
<td>Out-biased current of charge pump</td>
<td>42</td>
</tr>
<tr>
<td>$J_{cc}$</td>
<td>Cycle-to-cycle jitter</td>
<td>52</td>
</tr>
<tr>
<td>$J_{ee}$</td>
<td>Edge-to-edge jitter</td>
<td>52</td>
</tr>
<tr>
<td>$J_k$</td>
<td>Long-term jitter</td>
<td>52</td>
</tr>
<tr>
<td>$K_{VCO}$</td>
<td>Gain of VCO in unit of Hz/V</td>
<td>43</td>
</tr>
<tr>
<td>$K_c$</td>
<td>Ratio of $C_1$ to $C_2$</td>
<td>46</td>
</tr>
<tr>
<td>$L(\Delta f)$</td>
<td>Phase noise at the offset frequency $\Delta f$ in unit of dBc/Hz</td>
<td></td>
</tr>
<tr>
<td>$LG(s)$</td>
<td>Open loop transfer function in $s$-domain</td>
<td></td>
</tr>
<tr>
<td>$LG(z)$</td>
<td>Open loop transfer function in $z$-domain</td>
<td></td>
</tr>
<tr>
<td>$N$</td>
<td>Divider value</td>
<td></td>
</tr>
<tr>
<td>$\phi_{PM}$</td>
<td>Phase margin</td>
<td></td>
</tr>
<tr>
<td>$\phi_{fb}$</td>
<td>Phase of feedback signal</td>
<td></td>
</tr>
<tr>
<td>$\phi_{ref}$</td>
<td>Phase of input reference signal</td>
<td></td>
</tr>
<tr>
<td>$R$</td>
<td>Resistor in Loop filter</td>
<td></td>
</tr>
<tr>
<td>$S_{\phi}(f)$</td>
<td>Noise power spectral density</td>
<td></td>
</tr>
<tr>
<td>$\sigma_{\Delta \phi}$</td>
<td>Phase jitter</td>
<td></td>
</tr>
<tr>
<td>$\sigma_{\Delta T}$</td>
<td>Timing jitter</td>
<td></td>
</tr>
<tr>
<td>$T_s$</td>
<td>Locking time</td>
<td></td>
</tr>
<tr>
<td>$v_{\text{max}}$</td>
<td>Maximal input voltage</td>
<td></td>
</tr>
<tr>
<td>$v_{\text{min}}$</td>
<td>Minimal input voltage</td>
<td></td>
</tr>
<tr>
<td>$v_c$</td>
<td>Voltage across the loop filter capacitor $C_1$</td>
<td></td>
</tr>
<tr>
<td>$v_{\text{ctrl}}$</td>
<td>VCO control voltage</td>
<td></td>
</tr>
<tr>
<td>$\omega_1$</td>
<td>Loop filter time constant 1</td>
<td></td>
</tr>
<tr>
<td>$\omega_2$</td>
<td>Loop filter time constant 2</td>
<td></td>
</tr>
<tr>
<td>$\omega_{p3}$</td>
<td>PLL's third pole</td>
<td></td>
</tr>
<tr>
<td>$\omega_{\text{UGB}}$</td>
<td>PLL's Unity-gain bandwidth</td>
<td></td>
</tr>
<tr>
<td>$\omega_z$</td>
<td>PLL's zero</td>
<td></td>
</tr>
<tr>
<td>$\omega_i$</td>
<td>Reference input frequency</td>
<td></td>
</tr>
</tbody>
</table>
List of Figures

1.1 Digital versus analog design in SoC [Cad02]
1.2 Analog circuits in wireless communication system
1.3 (a) Two-stage OP AMP (b) its “thumbs models” [TMG02]
1.4 Relative performance of analog and digital circuits over time [BM04]
1.5 (a) Hierarchical design steps of analog/mixed-signal integrated circuit design (b) detailed design processes on an analog circuit
1.6 Knowledge-based optimization approaches
1.7 "Simulation-in-a-loop"-based optimization approaches

2.1 Schematic of a current-mode-logic (CML) cell
2.2 Simulation-based performance evaluation
2.3 Delay variation of CML vs. process corners
2.4 Yield estimation by means of MC analysis
2.5 Design flow of the automatic sizing process
2.6 Feasible parameter space $\mathcal{D}$
2.7 Feasible performance space $\mathcal{F}$
2.8 Normal boundary intersection [SGA03]

3.1 Step sizes in numerical simulation
3.2 Hierarchical sizing of a large-scale analog/mixed-signal circuit
3.3 Pareto-optimal fronts in hierarchical optimization
3.4 (a) Basic structure of VHDL-AMS model (b) A VHDL-AMS model of an ideal OP AMP with slew-rate limiting [APT02]
3.5 A Simulink model for a real integrator [MBF+03]
3.6 Proposed hierarchical optimization design flow
3.7 Transformation from $s \sim N(s_0, C)$ into $s \sim N(0, 1)$
3.8 Visualization of the worst-case analysis definition for a given robustness $\beta_w$
3.9 Worst-case analysis on nominal Pareto-optimal front

4.1 A block diagram for a typical PLL
4.2 Building blocks of a 3rd-order CPPLL
4.3 (a) State machine of a PFD (b) Clock diagram example
4.4 A five-stage ring oscillator
4.5 Frequency vs. input control voltage for an ideal VCO
4.6 Circuit diagram of a 1/4 divider
4.7 Linear model of a CPPLL
4.8 Discrete model of a CPPLL
4.9 (a) Definition of $t_p$ and $T_-$ (b) Two state-space variables in RC network [HBMM04]
4.10 Locking time definition
4.11 Oscillator power spectrum [Rob03]
4.12 A typical spectrum for single-sideband phase noise [Rob03]
4.13 Jitter definition
4.14 Bode diagram for open loop of a third-order CPPLL
4.15 CPPLL instability observed in z-domain analysis
4.16 Linear model of a CPPLL with noise sources
4.17 Loop transfer function from each noise source to PLL’s output
4.18 Schematic of CP Block
4.19 Schematic of VCO Block
4.20 Hierarchical performance modeling of the CPPLL
4.21 Pareto-optimal front of the CP
4.22 Pareto-optimal front of the VCO
4.23 File system in WiCkeD
4.24 Pareto-optimal front of the CP
4.25 (a) 3D Pareto-optimal front of the VCO (b) Contours of jitter in 2D surface (gain vs. current)
4.26 Pareto-optimal front of jitter and locking time at input jitter=1ns
4.27 Pareto-optimal front of jitter and locking time at input jitter=0 and setting $I_{CP} = I_{max}$

5.1 Block diagram of A/D Converters and Noise spectrum
5.2 A second-order SC $\Sigma\Delta$ modulator [BW88]
5.3 Single-ended SC Integrator
5.4 Linear model of the modulator with an injected quantization noise
5.5 Ideal linear Model of the 2nd-order SC $\Sigma\Delta$ modulator
5.6 PSD plot of the ideal 2nd-order SC $\Sigma\Delta$ modulator at OSR=256
5.7 SNR vs. OSR in the ideal 2nd-order SC $\Sigma\Delta$ modulator
5.8 Clock jitter on sampling input signal
5.9 Effect of clock jitter
5.10 Equivalent SC circuits in (a) $\phi_1$ (b) $\phi_2$
5.11 Effect of noise
5.12 Integrator output response
5.13 Effects of non-idealities in OP AMP
5.14 A second-order SC $\Sigma\Delta$ modulator model in Simulink
5.15 Hierarchical performance modeling of the SC $\Sigma\Delta$ modulator
5.16 Schematic of a folded-cascode OP AMP
5.17 angle=90
5.18 SNR vs. Parameter Sweep
5.19 Pareto-optimal front of GBW vs. SR

B.1 Modeling a random sampling jitter [MBF+03]
B.2 Modeling switches thermal noise $KT/C$ block [MBF+03]
B.3 Modeling operational amplifier noise [MBF+03]
B.4 Modeling real integrator [MBF+03]
List of Tables

1.1 Classification of automatic sizing tools
2.1 Overall yield versus partial yield
2.2 Sizing rules for current-mode-logic cell in Fig. 2.1
2.3 Classification of the sizing rules on CML in Tab. 2.2
3.1 Yield estimation by worst-case distance
4.1 Jitter metrics based on [Kun05]
4.2 Trade-offs in CPPLL
4.3 Hierarchical optimization results of the CPPLL
4.4 Hierarchical optimization results for two different cases
5.1 Summary of ΣΔ modulator architectures based on [FAB99]
5.2 Parameters of the 2nd-order SC ΣΔ modulator
5.3 Sizing rules for the folded-cascode OP AMP
5.4 Maxima and minima of A, GBW and SR
5.5 Worst-case operation conditions for GBW and SR
5.6 Nominal and robust optimization results of the Modulator
A.1 Library of analog basic building blocks [Ste05]
A.2 Sizing rules for NMOS basic structures [mun]
Abstract in German
