Study and Implementation of an Integrated CMOS High Resolution Time-to-Digital Converter for High Energy Physics Applications



Lukas Perktold

Institute of Electronics, Graz University of Technology

A thesis submitted for the degree of Doctor of Engineering Sciences (Dr. techn.), July 2013


Chairperson: Ao.Univ.-Prof. Dr. Werner RENHART

1. Examiner: Univ.-Prof. Dr. Wolfgang PRIBYL

2. Examiner: Prof. dr. ir. Edoardo CHARBON

Day of the defense: 13 August 2013

Signature from head of committee:


## STATUTORY DECLARATION

I declare that I have authored this thesis independently, that I have not used other than the declared sources / resources and that I have explicitly marked all material which has been quoted either literally or by content from the used sources.

Graz, (date)

(signature)


#### Abstract

In high energy physics (HEP), time measurements represent an efficient way to perform particle identification and tracking. Such applications often require hundreds or thousands of measurement channels. For future upgrades of the LHC experiments at CERN, time resolutions below 10 ps-rms have become necessary. Within the complete measurement chain, the time-to-digital converter constitutes a crucial building block.

In the scope of this thesis, a novel multichannel time-to-digital converter (TDC) featuring fine time resolution has been developed. Based on a two-stage interpolation mechanism, employing a delay-locked loop (DLL) in the first stage and a resistive voltage-division circuit in the second stage, least significant bit (LSB) sizes of 5 ps are achieved. The interpolator structure is implemented only once per application-specific integrated circuit (ASIC) and is shared among all channels. A novel approach to efficiently calibrate out device mismatch is proposed. Time offsets due to process-voltage-temperature (PVT) variations are compensated by the feedback mechanism of the DLL. In addition, the interpolator has been designed to also operate at lower time resolutions, allowing less demanding applications to benefit from reduced power consumption.
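As an illustration of how the two interpolation stages combine, the following Python sketch computes the LSB of a generic DLL-plus-resistive-interpolator scheme together with the ideal quantization-noise floor, LSB/√12. The reference period, number of DLL taps, and number of resistive sub-steps used in the example are hypothetical values, chosen only so that the result equals the 5 ps LSB quoted above; they are not the demonstrator's actual parameters. Because a DLL locks its total delay to the reference period, the tap delay (and hence the LSB) follows the reference rather than the PVT conditions.

```python
import math


def two_stage_lsb(t_ref_ps: float, n_dll_taps: int, n_interp: int) -> float:
    """LSB of a generic two-stage interpolator (illustrative model only).

    Stage 1: a DLL divides the reference period into n_dll_taps equal delays.
    Stage 2: a resistive interpolator divides each tap delay into n_interp steps.
    """
    tap_delay_ps = t_ref_ps / n_dll_taps   # DLL feedback keeps this at T_ref / N
    return tap_delay_ps / n_interp         # resistive sub-division of one tap


def quantization_rms(lsb_ps: float) -> float:
    """RMS quantization error of an ideal uniform quantizer: LSB / sqrt(12)."""
    return lsb_ps / math.sqrt(12)


# Hypothetical parameters: 1.28 ns reference period, 32 DLL taps, 8 sub-steps.
lsb = two_stage_lsb(1280.0, 32, 8)
print(f"LSB = {lsb:.2f} ps, quantization floor = {quantization_rms(lsb):.2f} ps-rms")
```

With a 5 ps LSB, quantization alone would contribute roughly 1.44 ps-rms; a measured resolution additionally contains other error sources such as device mismatch, supply noise, and reference jitter.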

A demonstrator ASIC with a total of 8 channels has been implemented in a 130 nm technology. After calibration, time resolutions better than 3 ps-rms have been demonstrated. Differential non-linearity (DNL) and integral non-linearity (INL) have been measured to be within ±0.9 LSB and ±1.3 LSB, respectively. Depending on the mode of operation, the full prototype consumes between 34 mW/channel and 42 mW/channel, with less power consumed at lower resolution settings. The demonstrator exhibits a supply-voltage sensitivity of -0.19 ps/mV and a temperature dependence of 0.44 ps/°C. Within the measurement precision of the test setup, inter-channel crosstalk between two neighboring channels has been found to be below ±1 LSB.
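DNL and INL figures such as those above are commonly obtained with a code-density test: events uniformly distributed in time are recorded, so each output code's hit count is proportional to its bin width. The sketch below assumes that method and uses the cumulative-sum convention for INL; the function name and the example histogram are hypothetical, and the procedure actually used in this work may differ.

```python
from itertools import accumulate


def dnl_inl_from_code_density(counts: list[int]) -> tuple[list[float], list[float]]:
    """Estimate DNL and INL (in LSB) from a code-density histogram.

    Assumes the input events were uniformly distributed over the measured
    range, so an ideal 1-LSB bin would receive the average number of hits.
    """
    if not counts or sum(counts) == 0:
        raise ValueError("histogram must contain hits")
    mean = sum(counts) / len(counts)          # expected hits per ideal bin
    dnl = [c / mean - 1.0 for c in counts]    # relative bin-width error per code
    inl = list(accumulate(dnl))               # accumulated deviation of code edges
    return dnl, inl


# Hypothetical 4-code histogram: code 1 is 10 % narrow, code 2 is 10 % wide.
dnl, inl = dnl_inl_from_code_density([100, 90, 110, 100])
print("DNL:", [round(x, 3) for x in dnl])
print("INL:", [round(x, 3) for x in inl])
```

The worst-case absolute values of these arrays correspond to the ±DNL and ±INL bounds typically reported for a TDC.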


#### Summary

In high-energy physics, time measurements represent an efficient way to identify particles and record their trajectories. Such applications often require the integration of hundreds or thousands of measurement channels. Future upgrades of the LHC experiments at CERN require time resolutions below 10 ps-rms. Within the measurement chain, the time-to-digital converter is a fundamentally critical building block.

In the course of this dissertation, a novel multichannel time-to-digital converter enabling high time resolutions has been developed. Based on a two-stage interpolation mechanism, using a delay-locked loop in the first stage and a resistive voltage divider in the second stage, a finest time resolution of 5 ps is achieved. The interpolation circuit needs to be implemented only once per microchip and is shared by the channels integrated on the chip. A novel calibration mechanism is proposed to compensate for device variations. Timing shifts due to process, voltage, and temperature variations are compensated by the control loop of the delay-locked loop. At the same time, the interpolation circuit was designed for operation at lower time resolutions, enabling less demanding applications to run at reduced power consumption.

For demonstration purposes, a microchip with a total of 8 channels was implemented in a 130 nm technology. After calibration of the circuit, time resolutions better than 3 ps-rms were achieved. The differential and integral non-linearity errors of the circuit were determined to be below ±0.9 LSB and ±1.3 LSB, respectively. Depending on the mode of operation, the complete chip consumes between 34 mW/channel and 42 mW/channel; lower time resolutions result in lower power consumption. The circuit exhibits timing shifts of -0.19 ps/mV under supply-voltage variations and a temperature dependence of 0.44 ps/°C. Within the measurement precision of the test setup, the mutual timing interference between two neighboring channels was determined to be below ±1 LSB.

## Acknowledgements

I would like to thank Jorgen Christiansen for his continuous support and countless technical discussions during the design and evaluation of this work. With his extensive expertise in TDC design, he contributed significantly to the origin and quality of this work. Many thanks also go to my supervisors, Wolfgang Pribyl and Kostas Kloukinas, for their continuous technical support and, even more importantly, their guidance throughout this work. I would also like to thank Alexander Kluge and the design team of the NA62 Gigatracker ASIC for helping me refine my technical skills and for their encouragement during difficult periods of this work.

My gratitude also goes to Matthew Noy, a virtuous engineer who shaped this work over endless inspiring teas and coffees. I got to know him not only as a gifted mind but also as a great friend.

Thank you!


# Contents

- List of Figures
- List of Tables
- 1 Introduction
  - 1.1 Structure of Work
  - 1.2 Summary of Contributions
  - 1.3 List of Publications
- 2 Time Measurements in High-Energy-Physics and Related Fields
  - 2.1 Field of Applications
    - 2.1.1 Time Measurement Chain
  - 2.2 System Level Implementation
    - 2.2.1 In-Pixel Architecture
    - 2.2.2 End-Of-Column Architecture
    - 2.2.3 Multi-ASIC Approach
    - 2.2.4 Architecture Comparison
  - 2.3 Development Trends
    - 2.3.1 Novel Sensors and Experiments
- 3 The Essence of Time-to-Digital Converter Design
  - 3.1 Time-to-Digital Converter Principles and their Architectures
    - 3.1.1 Time Measurement Principle
    - 3.1.2 Quantization Noise
    - 3.1.3 Counter Principle
    - 3.1.4 Delay-Line Principle
    - 3.1.5 Time Amplification Principle
    - 3.1.6 Fixed Time Delay Principle
    - 3.1.7 Interpolation Principle
    - 3.1.8 Time-to-Amplitude Principle
    - 3.1.9 Repetitive Measurement Principle
    - 3.1.10 Event-Capture vs Clock-Capture
    - 3.1.11 Multistage Architectures
    - 3.1.12 Event Sampling
    - 3.1.13 Delay Control
  - 3.2 Challenges in Fine-Time Resolution TDC Design
    - 3.2.1 Error Sources
    - 3.2.2 Quantization Noise
    - 3.2.3 Device Mismatch
    - 3.2.4 PVT Variations
    - 3.2.5 Power Supply Noise
    - 3.2.6 Inter-Channel Crosstalk
    - 3.2.7 RC-Delays
    - 3.2.8 Time Reference Jitter
- 4 High-Resolution, Multi-Channel Time-to-Digital Converter: A Proposal
  - 4.1 Needs in the High-Energy Physics Community
    - 4.1.1 Time-to-Digital Converter Requirements
    - 4.1.2 Fine-Time Resolution Aspects
      - 4.1.2.1 Quantization Noise
      - 4.1.2.2 Device Mismatch
      - 4.1.2.3 PVT Variations
      - 4.1.2.4 Power Supply Noise
      - 4.1.2.5 Inter-Channel Crosstalk
      - 4.1.2.6 RC-Delay
      - 4.1.4.1 LSB Auto-Adjustment
      - 4.1.4.2 Event Signal Characteristics
  - 4.2 Proposed Architecture
    - 4.2.1 Sub-Gate Delay Resolutions - Concept Choice
    - 4.2.2 Fine-Time Interpolator Structure
  - 4.3 Architectural Features Summary
    - 4.3.1 Key Design Aspects
      - 4.3.1.1 Adjustment Feature
      - 4.3.1.2 Fast Delay Cell
      - 4.3.1.3 Fast Time-Capture Registers
- 5 Demonstrator ASIC
  - 5.1 Demonstrator Architecture
  - 5.2 Central Fine-Time Interpolation
    - 5.2.1 Fast Delay Buffer
    - 5.2.2 Resistive Time Interpolation
    - 5.2.3 Delay Locked Loop
      - 5.2.3.1 Loop Dynamics
  - 5.3 Channel Matrix
    - 5.3.1 Fine-Time Code Buffers - Distribution Buffers
    - 5.3.2 Event Capture Based Channel
      - 5.3.2.1 Event Capture Register
    - 5.3.3 Clock Capture Based Channel
      - 5.3.3.1 Clock Capture Register
    - 5.3.4 Event Distribution
  - 5.4 Expected Performance
    - 5.4.1 Timing Precision
    - 5.4.2 Power Consumption
- 6 Experimental Results
  - 6.1 Measurement Setup
  - 6.2 TDC Characterization
    - 6.2.1 Functional Test
    - 6.2.2 DLL Locking Range
    - 6.2.3 TDC Transfer Function
      - 6.2.3.1 Calibration
      - 6.2.3.2 Frequency Range
      - 6.2.3.3 Expected RMS-Time Resolution
    - 6.2.4 RMS-Time Resolution
    - 6.2.5 Power Consumption
    - 6.2.6 Inter-Channel Crosstalk
    - 6.2.7 Voltage-Temperature Variations
- 7 Conclusion
  - 7.1 Short Architecture Description
  - 7.2 Demonstrator Performance
  - 7.3 Related Work
  - 7.4 Scientific Contributions
  - 7.5 Future Developments
- Acronyms
- Terminology
- List of Symbols

# List of Figures

| 2.1 | Sectional view of the CMS and ALICE detector. [Online]. Available: http://cms.web.cern.ch/ and http://aliceinfo.cern.ch/ | 8  |
|-----|--------------------------------------------------------------------------------------|----|
| 2.2 | Fluorescence-lifetime-imaging-microscopy (FLIM) principle                            | 9  |
| 2.3 | Block diagram of a typical time measurement chain in HEP and related                 |    |
|     | fields.                                                                              | 10 |
| 2.4 | Timing uncertainties introduced by the different blocks of the time mea-             |    |
|     | surement chain.                                                                      | 11 |
| 2.5 | Popular implementation approaches to implement the complete time                     |    |
|     | measurement chain.                                                                   | 12 |
| 2.6 | Exemplary <i>in-pixel</i> architectures: (a) Timepix [4] (b) Delft-SPIS [5]          | 13 |
| 2.7 | Exemplary <i>end-of-column</i> architectures: (a) TDCpix [6] (b) EPFL-SPIS           |    |
|     | [7]                                                                                  | 14 |
| 2.8 | Exemplary <i>multi-ASIC</i> architectures: (a) and (b) muon drift tube (CMS)         |    |
|     | [8, 9] (c) time-of-flight (ALICE) [10]                                               | 16 |
| 2.9 | Time-measurement trends observed in the HEP community                                | 19 |
| 3.1 | Classification of TDC architectures according to their underlying oper-              |    |
|     | ating principle                                                                      | 28 |
| 3.2 | Basic principle of time measurements                                                 | 29 |
| 3.3 | Illustration of the quantization noise                                               | 30 |
| 3.4 | Counter principle.                                                                   | 31 |
| 3.5 | Illustration of the delay-line principle                                             | 32 |
| 3.6 | Delay-line principle variants                                                        | 32 |
| 3.7 | Time-amplification principle.                                                        | 33 |
| 3.8 | Cross-coupled time-amplifier (TA)                                                    | 34 |
| 3.9  | Fixed time-delay principle and a capacitive scaling example              | 35 |
| 3.10 | Interpolation Principle.                                                 | 36 |
| 3.11 | Resistive voltage division example                                       | 36 |
| 3.12 | Time-to-amplitude principle.                                             | 37 |
| 3.13 | Illustration of the Wave-Union TDC approach.                             | 38 |
| 3.14 | Time-to-Digital converter based on successive-approximation              | 38 |
| 3.15 | Different time capturing principles                                      | 39 |
| 3.16 | Different time capturing principles based on a delay-line architecture.  | 40 |
| 3.17 | Illustration of a multistage TDC architecture                            | 41 |
| 3.18 | Illustration of a sampling TDC architecture                              | 42 |
| 3.19 | Control principles                                                       | 42 |
| 3.20 | A DLL block diagram.                                                     | 43 |
| 3.21 | Illustration of the INL and DNL error respectively                       | 45 |
| 3.22 | Illustration of local and global delay variations respectively           | 47 |
| 3.23 | Power supply noise sources                                               | 47 |
| 3.24 | Illustration of timing error introduced by power supply noise            | 48 |
| 3.25 | LSB variation for synchronous and asynchronous power supply noise        | 48 |
| 3.26 | Crosstalk induced timing variations.                                     | 49 |
| 3.27 | Time shifts introduced on a victim line due to capacitive coupling       | 50 |
| 3.28 | Delay offset shifts due to the RC-delay of a wire                        | 51 |
| 4.1  | Transition times and signal slopes using different driving configuration |    |
|      | to drive a large capacitive load                                         | 60 |
| 4.2  | Distribution of time critical signals                                    | 62 |
| 4.3  | Illustration of a) a local implementation approach b) a global implemen- |    |
|      | tation approach.                                                         | 63 |
| 4.4  | Proposed multi-channel TDC architecture. [67]                            | 67 |
| 4.5  | A multi-stage Fine-Time Interpolator. [68]                               | 70 |
| 5.1  | Microphotograph of the demonstrator ASIC. [69]                           | 76 |
| 5.2  | Block level diagram of the fine-time generator implementation            | 77 |
| 5.3  | High speed delay buffer implementing an additional zero in the signal    |    |
|      | path. [70]                                                               | 78 |
| 5.4  | Bias generator of the delay buffer cell. [69]                            | 79 |
| 5.5  | Half circuit equivalent of the delay buffer cell                         | 81 |
| 5.6  | Selected simulated waveforms of the delay line                                                       | 82  |
| 5.7  | Simulated buffer delay for different control voltages and operating con-                             |     |
|      | ditions. [69]                                                                                        | 83  |
| 5.8  | Resistive voltage division principle                                                                 | 84  |
| 5.9  | Schematic diagrams and dimensions of the resistive interpolation circuit.                            | 86  |
| 5.10 | Selected simulated waveforms of the interpolated fine-time code.                                     | 87  |
| 5.11 | Simulated LSB size for all 128 bins of the interpolator for different DLL                            |     |
|      | buffer delay settings. [69]                                                                          | 88  |
| 5.12 | Output states of a bang-bang phase detector (PD) implementation                                      | 89  |
| 5.13 | Simulated DLL control voltage for a 1562.5 MHz reference signal                                      | 91  |
| 5.14 | Schematic diagram of the distribution buffers to distribute the fine-time                            |     |
|      | code. [69]                                                                                           | 94  |
| 5.15 | Illustration of the <i>event-capture</i> implementation                                              | 95  |
| 5.16 | Schematic diagram of the time-capture registers (TCRs) for the <i>event</i>                          |     |
|      | capture based channel implementation                                                                 | 97  |
| 5.17 | Simulated waveform diagrams of the proposed <i>event capture</i> register.                           | 98  |
| 5.18 | Illustration of the <i>clock-capture</i> implementation. The time-of-arrival of                      |     |
|      | an event is captured by sampling the state of the fine-time interpolator.                            | 100 |
| 5.19 | Schematic diagram of the TCRs for the <i>clock capture</i> based channel                             |     |
|      | implementation. [69]                                                                                 | 101 |
| 5.20 | Simulated waveform diagrams of the proposed <i>clock capture</i> register.                           | 102 |
| 5.21 | Block diagram of the event signal distribution network.                                              | 104 |
| 5.22 | Schematic diagrams and dimensions of the event distribution buffers                                  | 105 |
| 6.1  | Block level diagram of the test setup used to characterize the TDC                                   | 119 |
| 6.2  | Photograph of the carrier printed circuit board (PCB)                                                |     |
| 6.3  | Measured output of the PD signal in locked condition.                                                | 114 |
| 6.4  | Measured jitter characteristics of: (a) the input signal of the DLL (b) the output signal of the DLL | 115 |
| 6.5  | <i>Event capture</i> scheme: Data patterns for different phases of the event signal                  | 116 |
| 6.6  | <i>Clock capture</i> scheme: Data patterns for different phases of the event signal                  | 117 |
| 6.7  | Functional diagram of a code-density-test.                                                                     | 118 |
| 6.8  | Histogram of the event phase with respect to the reference signal's phase                                      |     |
|      | for a 1.28 GHz reference signal.                                                                               | 119 |
| 6.9  | Measured LSB size of channel 5 for all 128 bins after global calibration.                                      | 119 |
| 6.10 | Reconstructed input output transfer-function of channel 5                                                      | 120 |
| 6.11 | Reconstruction of the data word of channel pair 1 & 2 with a virtual                                           |     |
|      | shift of 5 bins, to correct for timing offset shifts.                                                          | 121 |
| 6.12 | Channel 1 and 2: Measured DNL of all 128 bins after global calibration                                         |     |
|      | for 5 ps LSB sizes.                                                                                            | 121 |
| 6.13 | Channel 1 and 2: Measured INL of all 128 bins after global calibration                                         |     |
|      | for 5 ps LSB sizes.                                                                                            | 122 |
| 6.14 | Channel 3 - 6: Measured DNL of all 128 bins after global calibration for                                       |     |
|      | 5 ps LSB sizes. [76]                                                                                           | 123 |
| 6.15 | Channel 3 - 6: Measured INL of all 128 bins after global calibration for                                       |     |
|      | 5 ps LSB sizes. [76]                                                                                           | 124 |
| 6.16 | Channel 7 and 8: Measured DNL of all 128 bins after global calibration                                         |     |
|      | for 10 ps LSB sizes.                                                                                           | 125 |
| 6.17 | Channel 7 and 8: Measured INL of all 128 bins after global calibration                                         |     |
|      | for 10 ps LSB sizes.                                                                                           | 126 |
| 6.18 | Calibration vector derived from the cumulative INL error of channel 3 to 6.                                    | 127 |
| 6.19 | Measured LSB size of channel 5 for different calibration settings                                              | 128 |
| 6.20 | Measured LSB size of channel 5 for different LSB size settings                                                 | 129 |
| 6.21 | Functional diagram to measure the rms-time resolution of the TDC                                               | 131 |
| 6.22 | Measured time difference of a 4 inch wire delay applied between channel                                        |     |
|      | 5 and 6                                                                                                        | 132 |
| 6.23 | Measured wire delay difference of channel pair 5 & 6 for different wire                                        |     |
|      | delays. [76]                                                                                                   | 132 |
| 6.24 | Measured power consumption of the demonstrator chip for different reference clock frequencies when supplied with 1.3 V. | 135 |
| 6.25 | Functional diagram to measure inter-channel crosstalk between two adjacent channels                           | 136 |
| 6.26 | Measured inter-channel crosstalk of channel pair 5 and 6.                                                      | 137 |
| 6.27 | Functional diagram to measure voltage and temperature sensitivity of  |     |
|      | the TDC                                                               | 137 |
| 6.28 | Measured delay variations of channel pair 5 & 6 due to voltage and    |     |
|      | temperature shifts. [76]                                              | 138 |
| 7.1  | Architectural view of a prospective full TDC ASIC. [81]               | 146 |

# List of Tables

| 2.1  | Summary of representative implementations of <i>in-pixel</i> , <i>end-of-column</i>          |     |
|------|----------------------------------------------------------------------------------------------|-----|
|      | and <i>multi-ASIC</i> architectures                                                          | 17  |
| 2.2  | Tendency of the number of TDC channels, the LSB size and the power                           |     |
|      | efficiency of different implementation approaches                                            | 18  |
| 2.3  | Timing performance of state of the art sensor designs                                        | 21  |
| 2.4  | Requirements of future experiments set on the TDC                                            | 22  |
| 3.1  | Common error sources in TDC design                                                           | 44  |
| 4.1  | Summary of <i>fine-time interpolator</i> concepts                                            | 68  |
| 5.1  | Device dimensions of the circuit shown in figure 5.3 and 5.4 respectively.                   | 80  |
| 5.2  | Charge-pump (CP) current for different CP current settings                                   | 90  |
| 5.3  | Nominal DLL parameters with the DLL operated at 1562.5 MHz                                   | 91  |
| 5.4  | Different channel configurations implemented by the demonstrator                             | 92  |
| 5.5  | Summary of the estimated input capacitances for the different time cap-                      |     |
|      | turing register (TCR) types                                                                  | 93  |
| 5.6  | Device dimensions of the <i>event capture</i> TCR                                            | 99  |
| 5.7  | Current consumption of the proposed <i>event capture</i> register                            | 99  |
| 5.8  | Parasitic input capacitance of the proposed <i>event capture</i> register                    | 99  |
| 5.9  | Device dimensions of a conventional flip-flop used as a <i>clock capture</i>                 |     |
|      | register                                                                                     | 102 |
| 5.10 | Current consumption of a conventional flip-flop used as a <i>clock capture</i> register                  | 103 |
| 5.11 | Parasitic input capacitance of the conventional flip-flop used as a <i>clock capture</i> register        | 103 |
| 5.12 | Estimated parasitic capacitance of the TCRs to which the event signal needs to be connected             |     |
| 5.13 | Estimated timing performance of the different channel configurations.                                    | 106 |
| 5.14 | Power consumption estimates of the respective blocks of the demonstrator.                                | 107 |
| 5.15 | Estimated parasitic load and power contribution of each channel configuration                            |     |
| 6.1  | Expected rms-time resolution of the demonstrator                                                         |     |
| 6.2  | Measured rms-time resolution of channel pair 5 & 6 for different wire delay settings                     |     |
| 6.3  | Measured rms-time resolution of different channels and LSB size settings.                                | 133 |
| 6.4  | Measured power consumption of respective blocks of the demonstrator supplied with 1.3 V                  |     |
| 7.1  | Performance summary of the demonstrator                                                                  |     |
| 7.2  | Performance comparison of different channel configurations                                               |     |
| 7.3  | Comparison with related TDC designs reported in literature                                               | 144 |

# 1

# Introduction

Measurements with very fine time resolution are receiving ever increasing attention in the high energy physics (HEP) community. Many physical quantities, such as the crossing time or the energy of a particle, can be derived from time measurements. Only recently have novel sensor designs demonstrated time resolutions in the sub-10 ps-rms domain. The quality of such a measurement depends to a great extent on the timing precision of the time-to-digital converter (TDC). To allow novel experiments to profit from the improved detector capabilities, a new TDC reaching far into the sub-10 ps-rms domain is required.
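The link between the LSB size of a TDC and a resolution quoted in ps-rms can be made concrete with the textbook result for an ideal uniform quantizer: if the arrival time is uncorrelated with the bin grid and no other noise is present, the rms quantization error is LSB/√12. The sketch below assumes such an ideal TDC with equal bins; the function names are illustrative only, and the Monte Carlo part merely checks the closed-form result.

```python
import math
import random


def quantization_rms(lsb_ps: float) -> float:
    """RMS quantization error of an ideal uniform TDC: LSB / sqrt(12)."""
    return lsb_ps / math.sqrt(12.0)


def simulated_quantization_rms(lsb_ps: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check: quantize random arrival times to bin centres
    and return the rms of the resulting timing error."""
    rng = random.Random(seed)
    sq_sum = 0.0
    for _ in range(n):
        t = rng.uniform(0.0, 1000.0)  # arrival time in ps, uncorrelated with the bin grid
        t_meas = (math.floor(t / lsb_ps) + 0.5) * lsb_ps  # estimate = centre of the hit bin
        sq_sum += (t - t_meas) ** 2
    return math.sqrt(sq_sum / n)


if __name__ == "__main__":
    for lsb in (5.0, 10.0, 100.0):
        print(f"LSB {lsb:5.1f} ps: analytic {quantization_rms(lsb):6.2f} ps-rms, "
              f"simulated {simulated_quantization_rms(lsb):6.2f} ps-rms")
```

For a 5 ps LSB the quantization floor is about 1.44 ps-rms. Conversely, quantization alone would use up a 10 ps-rms budget at an LSB of roughly 34.6 ps (10 ps × √12), so the LSB has to be considerably smaller once the other noise sources of the measurement chain are included.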

Time measurements in this resolution domain are challenging and rely on very high performance electronics. There is demand for a new multichannel, fine-time resolution TDC designed to fulfill the requirements of upcoming HEP detectors such as ATLAS AFP, CMS HPS or LHCb TORCH, as well as many other HEP R&D programs. For such a development, resources from different experiments need to be combined, which requires a high degree of design flexibility. HEP systems often comprise hundreds or thousands of channels distributed over a large area, which places special requirements on the design and architecture of the TDC. This work develops a TDC implementation targeted to fulfill the stringent requirements of novel HEP experiments.


## **1.1 Structure of Work**

This work discusses the particularities of TDC design in the HEP community and develops a TDC architecture capable of sub-10 ps-rms timing precision in highly integrated designs.

In the first part of the work (Chapter 2), after a short introduction to the field of application, a complete picture of the time measurement chain in the HEP community is presented and different implementation approaches are discussed. The chapter closes with a study of the development trends observed in the field and outlines the major requirements set by upcoming developments.

Chapter 3 presents an extensive literature study, in which different concepts for implementing time-to-digital converters are described. Later in the same chapter, the challenges and difficulties that must be faced in the development of a fine-time resolution TDC are discussed and formally formulated.

Chapter 4 describes the requirements of a novel multichannel TDC with fine time resolution. The difficulties and challenges identified in the previous chapter, as well as important system-level characteristics found in the HEP community, are analyzed. Based on this analysis, a TDC architecture targeted to fulfill the requirements of the next generation of physics experiments is presented. The chapter closes with a short summary of the important features of the proposed architecture.

To demonstrate the feasibility and performance of the proposed architecture, a demonstrator application specific integrated circuit (ASIC) has been designed and fabricated. Chapter 5 develops its transistor-level implementation. The central fine-time interpolator has been implemented together with a total of 8 channels. Based on simulation results, the expected performance of the demonstrator is analyzed.

Measurement results of the demonstrator are presented in Chapter 6. The testing procedures are detailed, and the obtained results are reported and compared with simulation. Most importantly, the linearity, the single-shot time resolution and the power consumption of the TDC have been extracted. Inter-channel crosstalk as well as voltage and temperature variations of the design have also been investigated.

Chapter 7 concludes the work presented in this thesis. After a short summary of the implemented architecture, the results achieved by the demonstrator are summarized and discussed. Next, a comparison with similar work is presented and the main contributions to the scientific community are listed. The chapter closes with a short outlook on a prospective full TDC development.

## **1.2 Summary of Contributions**

In the course of the implementation of a fine-time resolution TDC, the following contributions have been made:

- Silicon implementation of a high time-resolution ASIC demonstrating time resolutions of better than 5 ps-rms. The prototype implements a total of 8 channels with different configurations.
- Study and implementation of a fast delay cell. The cell is based on a Maneatis delay cell employing a higher current density load together with an additional zero in its signal path.
- Study and implementation of a resistive division interpolation concept. The interpolator has been designed and evaluated to be operated with different reference signal frequencies. An on-chip regulation circuit is implemented to auto-adjust the least-significant bit (LSB) size of the interpolator in presence of process-voltage-temperature (PVT) variations.
- Study and implementation of so-called *clock capture* and *event capture* based TDC architectures. The performance of both concepts has been evaluated empirically and compared.
- Study and implementation of two different flip-flop types to investigate and compare their device mismatch and power consumption.
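How the LSB size of the interpolator scales with the reference frequency can be estimated with simple arithmetic. The sketch below assumes that the interpolator divides exactly one reference-clock period into 128 equal bins; this is an assumption, chosen because it is consistent with the 128-bin measurements and the nominal 1562.5 MHz DLL setting listed in the front matter (640 ps / 128 = 5 ps). The function name `lsb_ps` is hypothetical.

```python
def lsb_ps(f_ref_mhz: float, n_bins: int = 128) -> float:
    """LSB size in ps, ASSUMING one reference period is split into n_bins equal bins."""
    period_ps = 1e6 / f_ref_mhz  # period in ps for a frequency given in MHz
    return period_ps / n_bins


if __name__ == "__main__":
    for f_mhz in (1562.5, 1280.0):
        print(f"{f_mhz:7.1f} MHz: period {1e6 / f_mhz:7.2f} ps, LSB {lsb_ps(f_mhz):5.2f} ps")
```

Under this assumption, running the interpolator from a 1.28 GHz reference, the frequency used for the histogram of figure 6.8, would give bins of roughly 6.1 ps; a lower reference frequency trades bin size for a slower clock.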

In the course of the implementation of the TDCpix readout ASIC for the Gigatracker station of the NA62 experiment, currently under development at CERN, the following contributions have been made:

- Study and evaluation of an existing 100 ps LSB size TDC to identify the shortcomings of the TDCpix prototype.
- Study and implementation of a revised delay-locked loop (DLL) to be used in a DLL based TDC with 100 ps LSB size.
- Support of the integration of a DLL based TDC in the final version of the TDCpix ASIC.

## **1.3 List of Publications**

This section lists the publications made in the course of the presented work, newest first, and details my personal contribution to each of them.

[1] To be published in the proceedings of the IEEE International Instrumentation and Measurement Technology Conference, 2013 (I2MTC).

**Title:** A high time-resolution (< 3 ps-rms) Time-to-Digital Converter for Highly Integrated Designs

Authors: L. Perktold and J. Christiansen

**Contribution:** This publication reports the proposed architecture, the designed circuits and the measurement results of the demonstrator, representing a selection of the work presented in Chapters 4, 5 and 6. The work reported in this paper has been designed and evaluated entirely by myself in the scope of my PhD thesis. The co-author contributed valuable discussions to the work.

[2] Proceedings of Ph.D. Research in Microelectronics and Electronics, 2012 (PRIME).

**Title:** A Flexible 5 ps Bin-Width Timing Core for Next Generation High-Energy-Physics Time-to-Digital Converter Applications

Authors: <u>L. Perktold</u> and J. Christiansen

**Contribution**: The fine-time interpolator structure, its transistor-level implementation and simulation results are presented. This represents part of the work presented mainly in Chapter 5. The work reported in this paper has been designed and evaluated by myself in the scope of this thesis. The co-author contributed valuable discussions to the work.

[3] Journal of Instrumentation, Volume 7, January 2012. Proceedings of Topical Workshop on Electronics for Particle Physics, 2011 (TWEPP).

**Title**: A 9-Channel, 100 ps LSB Time-to-Digital Converter for the NA62 Gigatracker Readout ASIC (TDCpix)

Authors: <u>L. Perktold</u>, G. Aglieri Rinella, E. Martin, M. Noy, A. Kluge, K. Kloukinas, J. Kaplon, P. Jarron, M. Morel and M. Fiorini

**Contributions**: The implementation of a DLL based TDC for the Gigatracker station readout chip, named TDCpix, of the NA62 experiment is presented. My responsibilities included the design and integration of a DLL for the fine-time resolution block. This design is a good example of a monolithic end-of-column architecture as detailed in Chapter 2. The co-authors contributed technical discussions to the work and gave valuable input on the schematic design and layout of the DLL.

#### **References Chapter 1**

- [1] <u>L. Perktold</u> and J. Christiansen, "A high time-resolution (< 3 ps-rms) time-to-digital converter for highly integrated designs," in *Instrumentation and Measurement Technology Conference (I2MTC), 2013 IEEE International*, 2013.
- [2] <u>L. Perktold</u> and J. Christiansen, "A flexible 5 ps bin-width timing core for next generation high-energy-physics time-to-digital converter applications," in *Ph.D. Research in Microelectronics and Electronics (PRIME), 2012 8th Conference on*, 2012, pp. 1–4.
- [3] <u>L. Perktold</u>, G. A. Rinella, E. Martin, M. Noy, A. Kluge, K. Kloukinas, J. Kaplon, P. Jarron, M. Morel, and M. Fiorini, "A 9-channel, 100 ps lsb time-to-digital converter for the NA62 gigatracker readout ASIC (TDCpix)," *Journal of Instrumentation*, vol. 7, no. 01, p. C01065, 2012. [Online]. Available: http://stacks.iop.org/1748-0221/7/i=01/a=C01065


# 2

# Time Measurements in High-Energy-Physics and Related Fields

In high energy physics (HEP) experiments and closely related fields, e.g. positron emission tomography (PET) or fluorescence lifetime imaging microscopy (FLIM), to mention just a few, it is crucial to measure characteristics of the physical particles crossing a detector. Quantities like the position, crossing time or energy of a particle need to be measured, sometimes with very high precision. Whereas the position of the crossing point can be resolved by segmenting the sensor into smaller areas, crossing time and energy are most often derived directly from time measurements. This makes time measurements crucially important in the HEP community as well as in many other fields.
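The paragraph does not specify how an energy follows from a time measurement. One widely used technique, named here only as an illustration and not as the method of any particular detector, is time-over-threshold (ToT): the TDC timestamps both the leading and the trailing threshold crossing of the pulse, and their difference grows with the pulse amplitude and hence with the deposited charge. The sketch assumes an idealized CR-RC pulse v(t) = A·(t/τ)·e^(1−t/τ); the pulse model, the threshold value and all function names are hypothetical.

```python
import math
from typing import Optional


def cr_rc_pulse(t_ns: float, amplitude: float, tau_ns: float = 10.0) -> float:
    """Idealised CR-RC shaper output (assumed model); peaks at `amplitude` when t = tau."""
    if t_ns <= 0.0:
        return 0.0
    x = t_ns / tau_ns
    return amplitude * x * math.exp(1.0 - x)


def time_over_threshold(amplitude: float, threshold: float,
                        step_ns: float = 0.01, t_max_ns: float = 200.0) -> Optional[float]:
    """Trailing minus leading threshold-crossing time in ns, as two TDC
    timestamps would give it; None if no complete crossing pair is found."""
    t_lead = None
    for i in range(1, int(t_max_ns / step_ns) + 1):
        t = i * step_ns
        above = cr_rc_pulse(t, amplitude) >= threshold
        if t_lead is None and above:
            t_lead = t                  # leading-edge timestamp
        elif t_lead is not None and not above:
            return t - t_lead           # trailing-edge timestamp minus leading edge
    return None


if __name__ == "__main__":
    for a in (0.1, 0.5, 1.0, 2.0, 4.0):
        tot = time_over_threshold(a, threshold=0.2)
        print(f"amplitude {a:3.1f}: ToT = {tot if tot is None else round(tot, 2)} ns")
```

The ToT grows monotonically but non-linearly with amplitude, so in practice a calibration curve is needed to map it to charge or energy.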

In this chapter, the focus is on time measurements in the HEP community and related fields. First, a short overview of the fields of application and the basic concepts of time measurement is given. Later on, different implementation approaches are presented, discussed and compared. At the end, the trends observed in the community are examined and the need for a fine-time resolution time-to-digital converter (TDC) development is identified. The subsequent chapter then describes the difficulties encountered in fine time-resolution measurements and discusses potential solutions.

#### 2.1 Field of applications

Applications like HEP experiments, PET or FLIM use the interaction of a particle, such as a photon, to deduce its characteristics or timing information. Such applications often require several hundreds or thousands of channels to cover the physical area of interest. Depending on the system-level needs, the requirements set on the TDC can vary significantly with respect to power consumption, area, dynamic range or time resolution. As examples, the requirements set on the TDC by HEP experiments and by FLIM are briefly outlined in the following.



(a) CMS detector: The CMS muon chamber spans the whole experiment and implements approximately 200,000 channels. Given the relatively coarse intrinsic time resolution of the gas-based drift-time detector, time resolutions in the ns range are sufficient.



(b) ALICE detector: The time-of-flight detector of the ALICE experiment covers an area of $160 \text{ m}^2$ and has a diameter of 8 m. In total, it provides more than 150,000 channels with time resolutions as low as 100 ps.

**Figure 2.1:** Sectional view of the CMS and ALICE detector.

HEP experiments are mostly large-scale systems whose electronics are spread over a large area. Such experiments often need to correlate measurements across multiple channels. For this purpose, all electronics are synchronized to one common time reference, which is distributed to all TDC channels so that all measurements relate to the same time base. Time resolutions can vary widely across different detector architectures. The dynamic range of the system needs to cover at least one clock cycle of the system's time base, which is usually in the range of several nanoseconds (e.g. 25 ns). Gas-based tracking detectors, like the CMS muon detector depicted in figure 2.1a, require moderate resolutions in the range of a few ns, whereas time-of-flight detectors, like that of the ALICE experiment depicted in figure 2.1b, require time resolutions well below 100 ps.
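What covering one clock cycle means for the width of the output code can be worked out directly: the number of bins per cycle is the dynamic range divided by the LSB size, and the code needs the base-2 logarithm of that count, rounded up. The sketch below works in integer picoseconds to avoid floating-point rounding; the 25 ns cycle is the example from the text, the 100 ps and 5 ps LSB sizes are those mentioned in Chapter 1, and the function name is hypothetical.

```python
def bits_for_range(range_ps: int, lsb_ps: int) -> int:
    """Minimum code width (bits) needed to number every LSB-wide bin in range_ps."""
    n_bins = -(-range_ps // lsb_ps)     # ceiling division on integers
    return (n_bins - 1).bit_length()    # equals ceil(log2(n_bins)) for n_bins >= 1


if __name__ == "__main__":
    for lsb in (100, 5):
        n_bins = -(-25_000 // lsb)
        print(f"25 ns at {lsb} ps LSB: {n_bins} bins -> {bits_for_range(25_000, lsb)} bits")
```

A 25 ns cycle thus needs 250 bins (8 bits) at 100 ps LSB but 5000 bins (13 bits) at 5 ps LSB. Intervals longer than one cycle are typically covered by additionally counting cycles of the common time reference.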



**Figure 2.2:** FLIM principle: after excitation, the fluorescence dye emits light whose intensity decays exponentially.

In FLIM applications, as schematically shown in figure 2.2, the exponential decay of a fluorescence dye is recorded by repetitive excitation through a pulsed light source (e.g. a laser). In this case, large pixel arrays with fine granularity are implemented on a small physical area, and the TDC is often integrated directly into the pixel itself. This makes both the power consumption and the physical area of the TDC crucial at the system level. The dynamic range requirement is set by the pulse interval of the light source and can extend to several ns. Time resolutions on the order of 50 ps or better are required to reconstruct the exponential decay of the fluorescence dye with fine time granularity. To a limited extent, such applications can improve the time resolution at the system level through repetitive measurements.
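The benefit of repetitive measurements can be illustrated with a minimal sketch: each excitation yields one photon whose arrival time is exponentially distributed, and the decay constant is estimated from many quantized TDC codes. The 2 ns lifetime, 50 ps LSB and helper names below are hypothetical, and the model ignores the finite pulse interval (no window truncation), background counts and pile-up.

```python
import math
import random

def estimate_lifetime(codes, lsb_ps):
    """Estimate an exponential decay constant from TDC codes.

    For an exponential with time constant tau, the mean arrival time equals tau.
    Truncating each time to floor(t / lsb) biases the mean by roughly -lsb/2,
    which is compensated by adding half an LSB (valid for tau >> lsb).
    """
    mean_code = sum(codes) / len(codes)
    return (mean_code + 0.5) * lsb_ps

def simulate_codes(tau_ps, lsb_ps, n_events, seed=1):
    """Hypothetical repetitive-excitation experiment: one photon per laser pulse,
    arrival time drawn from an exponential decay and quantized by an ideal TDC."""
    rng = random.Random(seed)
    return [math.floor(rng.expovariate(1.0 / tau_ps) / lsb_ps) for _ in range(n_events)]

codes = simulate_codes(tau_ps=2000.0, lsb_ps=50.0, n_events=20_000)
print(f"estimated lifetime: {estimate_lifetime(codes, 50.0):.0f} ps")
```

The statistical error of the estimate shrinks roughly with the square root of the number of events, which is the system-level gain from repetition mentioned above; it does not remove the need for a TDC LSB that is small compared to the decay constant.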

#### 2.1.1 Time Measurement Chain

A typical single-channel time measurement chain, as usually employed in HEP detector designs and related fields, is shown in figure 2.3. First, a sensor uses the physical interaction between the particle and the sensor material to generate an electrical signal. The physical quantity to be measured is the charge induced in the sensor by the crossing particle. This charge is collected and pre-amplified to generate an analog signal. In some cases, the sensor itself produces a signal strong enough to avoid the need for a pre-amplifier circuit. The shape of the analog signal reveals important physical characteristics of the particle. Some of the information contained in the analog waveform can be extracted by discriminating the signal in amplitude. This generates a pulse signal, still continuous in time, that can then be digitized using a TDC. Discrimination in amplitude represents a reduction of information that greatly simplifies the measurement chain. By time-stamping both the rising and the falling edge of the pulse, both the time-of-arrival of the particle and, through the pulse width, its energy can be determined. As the signal induced in the sensor is not ideal, a signal pre-processing stage might be added to suppress glitches and/or to enforce a minimum pulse width and gap. Finally, at the end of the processing chain, a readout circuit is required that provides a suitable way to access and retrieve the generated data.

# 2. TIME MEASUREMENTS IN HIGH-ENERGY-PHYSICS AND RELATED FIELDS

Figure 2.3: Block diagram of a typical time measurement chain in HEP and related fields.
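The two edge timestamps of a discriminator pulse can be turned into a time-of-arrival and a time-over-threshold (the pulse width used as energy estimate). The sketch below assumes both edges are measured by the same free-running TDC whose `n_bits`-wide code may wrap around between the edges; `edge_times` is a hypothetical helper, and the nonlinear mapping from pulse width to energy, which requires calibration, is not shown.

```python
def edge_times(rise_code: int, fall_code: int, lsb_ps: float, n_bits: int):
    """Return (time_of_arrival_ps, time_over_threshold_ps) for one pulse.

    The pulse is assumed to be shorter than one full wrap period of the
    n_bits-wide code, so the modular difference between edges is unambiguous.
    """
    wrap = 1 << n_bits
    toa_ps = rise_code * lsb_ps
    tot_codes = (fall_code - rise_code) % wrap  # handles a wrap between the edges
    return toa_ps, tot_codes * lsb_ps

print(edge_times(1000, 1200, 100.0, 16))   # -> (100000.0, 20000.0)
print(edge_times(65500, 100, 100.0, 16))   # counter wrapped: (6550000.0, 13600.0)
```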

Figure 2.4: Timing uncertainties introduced by the different blocks of the time measurement chain.

At the system level, timing uncertainties are introduced at each stage of the measurement chain, as visualized in figure 2.4. These are the timing uncertainty of the sensor itself, denoted as $\sigma_{sensor}$, the noise introduced by the charge amplifier and the discriminator, denoted as $\sigma_{pre-amp}$ and $\sigma_{discriminator}$, respectively, and the timing uncertainty introduced by the TDC circuit itself, denoted as $\sigma_{TDC}$. The timing uncertainty introduced by the distribution of the reference signal through the timing, trigger and control (TTC) system of the detector, denoted as $\sigma_{TTC}$, also needs to be taken into account. If all the different error sources are assumed to be uncorrelated, the system's timing performance can be written as

$$\sigma_{sys} = \sqrt{\sigma_{sensor}^2 + \sigma_{pre-amp}^2 + \sigma_{discriminator}^2 + \sigma_{TDC}^2 + \sigma_{TTC}^2}$$
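Numerically, the equation above is a root-sum-square of the individual terms, so the largest contribution dominates. The following sketch evaluates it for a purely hypothetical budget (values in ps chosen only for illustration); `system_sigma` is an illustrative helper name.

```python
import math

def system_sigma(*sigmas_ps: float) -> float:
    """Root-sum-square of uncorrelated timing uncertainties."""
    return math.sqrt(sum(s * s for s in sigmas_ps))

# Hypothetical budget in ps: sensor, pre-amp, discriminator, TDC, TTC.
budget = {"sensor": 10.0, "pre-amp": 3.0, "discriminator": 3.0, "TDC": 3.0, "TTC": 2.0}
print(f"sigma_sys = {system_sigma(*budget.values()):.2f} ps")
```

For these numbers the result is about 11.4 ps, only some 14 % above the 10 ps sensor term, even though four further contributions are included.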

#### 2.2 System Level Implementation

Depending on the physical construction of the sensor, different approaches to implementing the complete measurement chain exist. They can be grouped into *single-ASIC* and *multi-ASIC* designs, where a *single-ASIC* approach can be further divided into *in-pixel* and *end-of-column* architectures. Figure 2.5 shows the three resulting approaches. A *single-ASIC* approach is frequently used where a high integration level is essential, as is commonly the case for pixel detectors. As the name suggests, the different blocks are then integrated into a single ASIC; only the sensor itself is often implemented as a separate module. If a high integration factor is not required, a more modular approach using multiple dedicated ASICs can be pursued.

#### 2.2.1 In-Pixel Architecture

*In-pixel* architectures are commonly used to implement a very high number of pixels on a single chip. The electronics are either integrated in very close proximity to the sensor in the sensor material itself, or implemented as a separate module that is bump-bonded to the sensor's back side. Usually, all measurement functions, including the TDC, are integrated directly into the pixel area. Only the readout block, which groups all the channels into one single interface, is implemented on a global scale. The sensor is usually designed to provide very good spatial resolution, with pixel sizes as small as a few tens or hundreds of $\mu m^2$. With such an approach, the area available to integrate the various electronic components is determined by the pixel size itself. Due to the limited space in each pixel, performance trade-offs need to be made.

Figure 2.5: Popular approaches to implementing the complete time measurement chain.

As examples, the Timepix readout chip developed at CERN [4] and a single-photon image sensor developed at Delft University (here referred to as Delft-SPIS) [5] are shown in figure 2.6a and figure 2.6b, respectively. Both implementations are well-suited examples of an *in-pixel* architecture. In the case of the Timepix chip, the sensor is bump-bonded to the readout chip, whereas the Delft-SPIS chip employs a technology that allows the sensor to be fully integrated into the pixel. As indicated by the dimensions, only little area is occupied by the circuitry placed outside of the pixel matrix. The Timepix chip integrates a total of 65 536 pixels with a pixel size of $55\,\mu\mathrm{m} \times 55\,\mu\mathrm{m}$, whereas the Delft-SPIS integrates a total of 20 480 pixels of size $50\,\mu\mathrm{m} \times 50\,\mu\mathrm{m}$. The two solutions differ in pixel functionality as well as in their time resolution requirements. In the Timepix implementation, a binary counter performs the function of the TDC; in that case, the reported minimum least significant bit (LSB) size is 25 ns. The Delft-SPIS uses a more sophisticated approach, employing a gated ring oscillator in the pixel to perform the fine interpolation. This approach achieves LSB sizes as small as 55 ps. A global phase-locked loop (PLL) is employed to overcome the intrinsic dynamic range limitation of the pixel. The power consumption of the full Timepix chip is reported to be equivalent to $13.6\,\mu\text{W/pixel}$, of which half is devoted to the analog portion of the chip. In the case of the Delft-SPIS chip, the equivalent power consumption is reported to be $26.9\,\mu\text{W/pixel}$.



(a) Timepix floorplan: the top square contains the 256 × 256 pixel matrix; the bottom square contains the Timepix IO periphery and DACs.



(b) Photomicrograph of the sensor chip with pixel and microlens array details in the insets. The circuit has an area of $11.0 \times 12.3 \text{ mm}^2$. The pixel pitch is $50 \,\mu\text{m}$.

Figure 2.6: Exemplary *in-pixel* architectures: (a) Timepix [4] (b) Delft-SPIS [5]

#### 2.2.2 End-Of-Column Architecture

To overcome the performance limitations of *in-pixel* architectures, *end-of-column* architectures move some parts of the measurement chain to the end of the pixel matrix. Analog functions up to the discriminator are normally retained in the pixel area, while the TDC, the encoding and the common readout interface can be implemented on a global scale. The space gained by moving these functions to the end-of-column region can be used to improve performance. Such an approach also allows a single TDC channel to be shared among several pixels to increase the integration ratio. However, in *end-of-column* architectures the timing-critical signals need to travel from the pixel all the way down to the end-of-column area. With such an architecture, the integration of many channels is still feasible, but the integration level is usually lower than in designs based on *in-pixel* architectures.

(a) TDCpix top-level block arrangement. The ASIC implements 1800 pixels and occupies a total area of approximately $12 \times 20 \text{ mm}^2$.

(b) Photomicrograph of the time-correlated single-photon counting (TCSPC) image sensor with a pixel detail in the inset. The integrated circuit, fabricated in a 0.35 $\mu$m CMOS technology, has a surface of 8 × 5 mm<sup>2</sup>.

Figure 2.7: Exemplary end-of-column architectures: (a) TDCpix [6] (b) EPFL-SPIS [7]

An end-of-column approach is adopted, for example, by the TDCpix [6], a pixel readout chip currently under development at CERN, and by a single-photon image sensor chip developed at EPFL [7], here referred to as the EPFL-SPIS. Images of both implementations are shown in figure 2.7a and figure 2.7b, respectively. In both cases, the components placed outside the pixel matrix occupy a substantial portion of the full chip. Whereas the EPFL-SPIS is intended to be used as a single device, the TDCpix arranges all connections on one side to allow several chips to be abutted to cover a larger area.

The TDCpix ASIC implements a $40 \times 45$ pixel matrix with $200 \times 200 \ \mu m^2$ sized pixels. As for the Timepix chip, the sensor of the TDCpix is implemented as a separate module. One TDC with a total of 36 channels is implemented per double column, with 40 TDCs per chip. One TDC channel is shared among 5 pixels residing in the same column, and the rising and falling edges of the signals are measured by two distinct channels. To avoid signal quality degradation, on-chip differential signaling is used to bring the timing-critical signals to the end-of-column region. A delay-locked loop (DLL) in conjunction with a binary counter is used to achieve LSB sizes of 97.7 ps. The power consumption of the TDC is 1.8 mW/channel.
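The principle of combining a DLL with a binary counter can be sketched as follows: a DLL locked to one clock period provides N equally delayed taps, a hit latches all tap states (a thermometer code), and the counter supplies the number of elapsed clock periods. The 3.125 ns clock and 32 taps below are hypothetical and were chosen only because 3125 ps / 32 gives about 97.7 ps; they are not claimed to be the TDCpix configuration, and the helper names are illustrative.

```python
from typing import Sequence

def thermometer_to_index(taps: Sequence[int]) -> int:
    """Count the leading ones in the latched DLL tap states (thermometer code).

    Taps already passed by the propagating clock edge read 1, the others 0.
    A bubble (a 1 after the first 0) is flagged instead of silently corrected.
    """
    index = 0
    for bit in taps:
        if not bit:
            break
        index += 1
    if any(taps[index:]):
        raise ValueError("bubble in thermometer code")
    return index

def hit_time_ps(coarse: int, taps: Sequence[int], clk_period_ps: float) -> float:
    """Coarse counter value plus DLL fine index, assuming the DLL spans exactly
    one clock period with len(taps) equal delay elements."""
    n_taps = len(taps)
    lsb_ps = clk_period_ps / n_taps
    return (coarse * n_taps + thermometer_to_index(taps)) * lsb_ps

# Hypothetical: 3.125 ns clock period and 32 taps -> LSB = 97.65625 ps.
taps = [1] * 5 + [0] * 27
print(hit_time_ps(coarse=3, taps=taps, clk_period_ps=3125.0))  # 9863.28125
```

Because the DLL is locked to the clock, the LSB follows the reference period rather than the absolute gate delay, which is how the feedback compensates process, voltage and temperature variations.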

The EPFL-SPIS chip integrates a $128 \times 128$ pixel array with $25\,\mu$m square pixels. In this case, the sensor is integrated in the pixel itself. A total of 32 TDCs are implemented, each providing one channel and serving four columns of the array. All TDCs share a global DLL whose time codes are distributed to the respective channels. The finest interpolation step is performed locally and is active only on the arrival of an event. The TDC itself employs a digitally calibrated delay line generating 97 ps LSBs. The power consumption of the full chip is reported to be equivalent to $5.9\,\mathrm{mW/channel}$.
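One common way to obtain calibration data for a delay line is a statistical code-density test: hits uniformly distributed over the measurement range fall into each bin with a probability proportional to its width, so the histogram yields the individual bin widths. Whether [7] uses exactly this procedure is not claimed here; the delay values, seed and helper names below are hypothetical.

```python
import bisect
import random

def code_density_widths(hit_codes, n_bins, range_ps):
    """Estimate bin widths (ps) of a delay-line TDC from a code-density test."""
    counts = [0] * n_bins
    for code in hit_codes:
        counts[code] += 1
    # With hits uniform over range_ps, P(bin i) = width_i / range_ps.
    return [c * range_ps / len(hit_codes) for c in counts]

def bin_centres(widths):
    """Calibrated time (ps) assigned to each code: centre of its estimated bin."""
    centres, edge = [], 0.0
    for w in widths:
        centres.append(edge + w / 2)
        edge += w
    return centres

# Hypothetical delay line with mismatched element delays (ps), not data from [7].
true_widths = [80.0, 120.0, 95.0, 105.0, 100.0]
upper_edges = [sum(true_widths[: i + 1]) for i in range(len(true_widths))]
range_ps = upper_edges[-1]

rng = random.Random(7)
hits = [
    min(bisect.bisect_right(upper_edges, rng.uniform(0.0, range_ps)), len(true_widths) - 1)
    for _ in range(100_000)
]
widths = code_density_widths(hits, len(true_widths), range_ps)
mean_lsb = range_ps / len(widths)
dnl_lsb = [w / mean_lsb - 1.0 for w in widths]  # differential non-linearity per code
print([round(w, 1) for w in widths], [round(d, 2) for d in dnl_lsb])
print([round(c, 1) for c in bin_centres(widths)])
```

Replacing each raw code by its calibrated bin centre removes most of the nonlinearity caused by element mismatch; the residual error is limited by the number of calibration hits.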

#### 2.2.3 Multi-ASIC Approach

In a *multi-ASIC* approach, the measurement functions are handled as separate blocks, each implemented individually. A relatively small number of channels is grouped together and processed by a dedicated ASIC. Such an approach is especially attractive for detector designs whose sensors are physically far apart or for which a custom ASIC design is not feasible. However, as the timing-critical signals in such an approach need to be sent off-chip and travel over relatively large distances, additional power must be invested to avoid signal integrity issues.

Such a *multi-ASIC* approach has been adopted by many different applications and experiments. To illustrate it, the muon drift tube detector of the CMS experiment [8, 9] and the time-of-flight detector of the ALICE experiment [10] are discussed. The architectures of the modules implementing the TDC function are depicted in figures 2.8a and 2.8b and in figure 2.8c, respectively.

In both cases, the sensor and the discriminator are implemented as separate modules placed close to each other. The timing-critical signals are brought out over copper wires to connect to the TDC modules. Whereas in the CMS muon drift tube detector the modules sit relatively close to the actual sensor, in the ALICE time-of-flight detector the timing-critical signals need to be transmitted over 5 m to 8 m long wires. In both cases, a large number of channels are implemented: 192 000 channels for the CMS muon drift tubes and 157 248 channels for the ALICE time-of-flight detector. To perform the time measurement, both detector designs use a TDC developed at CERN, referred to as the HPTDC [11].

(a) Read-Out Board diagram.

(b) Picture of a 128-channel Read-Out Board assembled on a Minicrate.

(c) TDC Readout Module (TRM) conceptual design.

**Figure 2.8:** Exemplary *multi-ASIC* architectures: (a) and (b) muon drift tube (CMS) [8, 9] (c) time-of-flight (ALICE) [10]

The HPTDC uses a counter in conjunction with a DLL to generate small LSB sizes. The LSB size can be adjusted by changing the internal operating frequency of the DLL. The CMS muon drift tube TDC module uses the HPTDC in its low-resolution mode. In this mode, a total of 32 channels are available and the DLL runs at 40 MHz, offering LSB sizes as small as 0.78125 ns. The average power consumption in this mode is 21 mW per channel. The ALICE time-of-flight TDC modules use the very-high-resolution mode. In this mode, the HPTDC offers a reduced number of 8 channels and LSB sizes as small as 24.4 ps at an average power consumption of 132 mW per channel. These small LSB sizes are generated by running the DLL at a higher speed and employing an RC delay line. To achieve uniform LSB sizes, the delay of the RC delay line needs to be carefully calibrated.

#### 2.2.4 Architecture Comparison

The approaches just discussed have a direct influence on TDC characteristics such as the LSB size<sup>1</sup>, the number of channels per ASIC and the power consumed per channel. The properties of the examples just discussed are summarized in table 2.1. No sharp separation of the major properties between the different approaches can be identified; nonetheless, a tendency, as schematically presented in table 2.2, can be observed.

| Chip              | LSB       | Channels<br>(per ASIC) | Power<br>(per channel)  | Method            |
|-------------------|-----------|------------------------|-------------------------|-------------------|
| Timepix           | 10 ns     | 65536                  | $13.6\,\mu\mathrm{W}^a$ | counter           |
| Delft-SPIS        | 55 ps     | 20480                  | $26.9\,\mu\mathrm{W}^a$ | gated-oscillator  |
| TDCpix            | 97.7 ps   | 720                    | $1.8\,\mathrm{mW}^b$    | delay-locked-loop |
| EPFL-SPIS         | 97 ps     | 32                     | $5.7\,\mathrm{mW}^a$    | delay-line        |
| HPTDC (low res.)  | 781 ps    | 32                     | $21\,\mathrm{mW}^a$     | delay-locked-loop |
| HPTDC (high res.) | 24.4 ps   | 8                      | $132\,\mathrm{mW}^a$    | RC-delay          |

<sup>a</sup>full ASIC

<sup>b</sup>TDC only

**Table 2.1:** Summary of representative implementations of *in-pixel*, *end-of-column* and *multi-ASIC* architectures. The table lists the LSB size, the number of channels per ASIC, the power consumption per channel as well as the interpolation method employed for the fine interpolation.

<sup>1</sup>The LSB size is used as an indicator of the actual rms time resolution of a TDC. In a real application, the final rms time resolution of the TDC depends on many other contributions, especially in the very fine time resolution regime below 10 ps-rms.


|               | number of TDC channels | LSB size | power<br>efficiency |
|---------------|------------------------|----------|---------------------|
| In-Pixel      | high                   | large    | high                |
| End-of-Column | medium                 | medium   | medium              |
| Multiple-ASIC | low                    | small    | low                 |

**Table 2.2:** Tendency of the number of TDC channels, the LSB size and the power efficiency of different implementation approaches.

Clearly, the number of TDC channels implemented per ASIC is highest for *in-pixel* architectures and drops off rather strongly for the other two implementation approaches. In *end-of-column* architectures, several pixels often share one TDC channel to save resources; this allows a large pixel array to be served while reducing the number of channels needed per ASIC. Due to the limited number of input/output (I/O) pads, the number of TDC channels implemented on a single ASIC is generally smallest for *multi-ASIC* approaches.

Small LSB sizes can in principle be achieved with all three approaches. However, a tendency towards smaller LSB sizes for more global approaches can be identified. Ultimately, a trade-off between dynamic range and LSB size has to be made. Due to the space restrictions in *in-pixel* architectures, part of the interpolation is sometimes moved to a global scale to overcome some of these restrictions. In *end-of-column* architectures the space constraints are much more relaxed, and in *multi-ASIC* approaches they are practically non-existent. This ultimately favors smaller LSB sizes in approaches implemented on a more global scale.

A direct comparison of the power consumption per channel reveals another important consequence of the properties of the different architectures. Due to the large number of channels in *in-pixel* architectures, the power consumption per TDC channel needs to be very low to keep the total power consumption within reasonable limits. This requires TDC architectures that consume very little power per channel, at the expense of time resolution. As the occupancy per pixel is often very low (in the kHz range), TDC designs that become 'active' only while processing an event are unavoidable in such architectures. In *end-of-column* architectures, several pixels share one channel, increasing the channel's occupancy and consequently also the power budget available per channel. This allows for TDC designs that perform the time interpolation in a more continuous way and consume less dynamic power. In *multi-ASIC* designs, additional power needs to be invested for off-chip signaling, making them most often the most power-hungry solution. For completeness, TDCs covering a larger dynamic range with the same LSB size generally also require more power per channel. However, very efficient digital logic can often be employed to extend the dynamic range almost arbitrarily.
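Multiplying the per-channel figures of table 2.1 back by the channel count (only for entries derived from full-ASIC power, footnote a) gives the implied total chip power. The sketch below does this arithmetic; `TABLE_2_1` and `asic_power_w` are illustrative names. The totals all fall between roughly 0.18 W and 1.06 W, i.e. the per-channel power scales roughly inversely with the number of channels, in line with the argument above.

```python
# (power per channel in W, channels per ASIC) as listed in Table 2.1.
# TDCpix is omitted because its entry covers the TDC only (footnote b).
TABLE_2_1 = {
    "Timepix": (13.6e-6, 65536),
    "Delft-SPIS": (26.9e-6, 20480),
    "EPFL-SPIS": (5.7e-3, 32),
    "HPTDC (low res.)": (21e-3, 32),
    "HPTDC (high res.)": (132e-3, 8),
}

def asic_power_w(power_per_channel_w: float, channels: int) -> float:
    """Total ASIC power implied by a per-channel figure derived from full-ASIC power."""
    return power_per_channel_w * channels

for chip, (p_w, n) in TABLE_2_1.items():
    print(f"{chip:18s} {n:6d} ch  {asic_power_w(p_w, n):6.3f} W")
```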

#### 2.3 Development Trends

Time measurements have a long tradition in the HEP community. Starting from discrete realizations using counters, capacitors and ADCs in the early days, the field has moved to highly specialized ASICs that perform the complex function of time measurement today.



Figure 2.9: Time-measurement trends observed in the HEP community.

The major trends observed today are schematically shown in figure 2.9, where the number of TDC channels available per ASIC is plotted on the x-axis and the LSB size, as an indicator of the rms time resolution, on the y-axis. Two major trends can be identified. On the one hand, the spatial resolution of applications and experiments is improving, requiring ever higher integration levels. On the other hand, improved sensor time resolutions require ever finer time measurements. Especially for fine-time-resolution measurements, bringing the analog quantities out over long cables to connect to the TDC can no longer be afforded. Ultimately, this requires more electronics to sit in close proximity to the sensors, where space is usually tight. TDCs offering both finer time resolutions and higher channel integration are therefore required.

*Single-ASIC* approaches often aim at very high pixel integration ratios. Depending on the internal architecture, either several tens of TDC measurement channels, each shared among several pixels, or thousands of measurement channels, with one TDC channel integrated per pixel, are provided. In the most common applications, system-level timing requirements vary widely, from several ns, as reported in [4, 12, 13] and [14], down to 50 ps, as reported in [7, 15] and [16]. In some cases, LSB sizes as small as 9 ps with a total of 128 channels integrated on a single ASIC have been achieved [17]. Most often, such approaches make use of highly specific TDC designs, precisely targeted to the application's requirements.

If the required integration ratio of an application or experiment is less pronounced, a more modular *multi-ASIC* concept can be pursued to realize the complete measurement chain. Due to the limited I/O capabilities, usually only a few channels are provided per ASIC. A very successful realization of such a flexible TDC ASIC is the HPTDC [11], covering LSB sizes from several ns down to approximately 25 ps and offering between 8 and 32 channels depending on the time resolution mode. Dedicated TDC designs such as the OTIS TDC for the LHCb experiment [18] offer less flexibility but overcome some of the drawbacks of a more flexible design (e.g. channel occupancy). To increase the effective resolution and integration level, implementations based on FPGAs, as reported in [19, 20] and [21], have become increasingly popular. The most advanced designs, as reported in [22, 23, 24] and [25], achieve equivalent LSB sizes from 27 ps down to approximately 12 ps, with effective resolutions down to 3 ps-rms and a total of 10 to 48 channels integrated on a single FPGA. However, the small channel count at fine time resolutions represents one of the major drawbacks of FPGA implementations for designs targeting the sub-5 ps-rms resolution domain. In general, *multi-ASIC* approaches require higher power per channel but allow for faster prototyping, more design flexibility and easier re-use.

Discrete implementations are mostly obsolete, as they suffer from a low integration level and high power consumption. Advances in microelectronics have enabled discrete designs with high time resolution at lower power consumption [26], but these are still not competitive with fully integrated solutions.

#### 2.3.1 Novel Sensors and Experiments

In the last couple of years, novel sensor designs entering the sub-10 ps-rms resolution regime have been reported; they are summarized in table 2.3. Several proposals have been made, or are currently under discussion, to improve the physics performance of detectors and experiments. A summary of the requirements of future developments at CERN and other institutes is given in table 2.4. In order not to substantially deteriorate the sensor's timing performance, harsh timing requirements are often set on the detector's electronic measurement chain. Usually, the required TDC timing uncertainty is roughly two to three times smaller than the timing uncertainty introduced by the sensor itself. In particular, a set of new detector designs such as ATLAS-AFP [27, 28], CMS-HPS [27] or a novel forward particle detector for the TOTEM experiment set new, challenging timing requirements on the respective parts of the measurement chain. Ultimately, this requires the TDC timing precision to be well below 10 ps-rms. Today, due to the lack of fine-time-resolution measurement solutions with a large channel count, the capabilities of state-of-the-art sensor designs, as mentioned previously, cannot be fully exploited.
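The factor of two to three follows from the quadrature sum of section 2.1.1: keeping only the sensor and TDC terms, a TDC uncertainty of σ_sensor/k degrades the combined resolution by the factor sqrt(1 + 1/k²). The short sketch below evaluates this; `degradation` is an illustrative name, and all other contributions are deliberately neglected.

```python
import math

def degradation(ratio: float) -> float:
    """sigma_sys / sigma_sensor when sigma_TDC = sigma_sensor / ratio and all
    other contributions of the quadrature sum are neglected."""
    return math.sqrt(1.0 + 1.0 / ratio**2)

for k in (1, 2, 3):
    print(f"sigma_TDC = sigma_sensor / {k}: +{100 * (degradation(k) - 1):.1f} %")
```

A factor of two costs about 12 % and a factor of three about 5 % of the sensor's resolution. Applied to an 8 ps sensor, as listed in table 2.3, a factor of three corresponds to a TDC uncertainty of roughly 2.7 ps-rms.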

Less demanding experiments, such as LHCb-TORCH [29], the CBM [30] and PANDA [31] experiments at the FAIR facility in Germany, or the *Belle II* [32] experiment at KEK in Japan, would also profit significantly from a new TDC development in terms of higher integration ratio and lower power consumption. This creates a high demand for a TDC that combines time resolutions well within the sub-10 ps-rms regime with a large channel count.

| Technique    | Time Resolution $(\sigma)$             |
|--------------|----------------------------------------|
| MGRPC [33]   | $8\mathrm{ps}$                         |
| MCP-PMT [34] | $6.2\mathrm{ps}$                       |
| APD [35]     | $10\mathrm{ps}\ (\mathrm{calculated})$ |

Table 2.3: Timing performance of state of the art sensor designs.


| Experiment           | Required TDC Resolution                                  | Total Number of Channels |
|----------------------|----------------------------------------------------------|--------------------------|
| ATLAS - AFP [27, 28] | $< 10  \mathrm{ps}\text{-rms}$                           | 128 (Stage I)            |
| CMS - HPS [27]       | $< 10  \mathrm{ps}\text{-rms}$                           | -                        |
| LHCb - TORCH [29]    | $<40\mathrm{ps}\text{-rms}^{\mathrm{x}}$ (single photon) | 1024                     |
| TOTEM [no ref.]      | $<\!10\mathrm{ps}$                                       | few hundreds             |
| CBM [30]             | $\sim 50\mathrm{ps}$                                     | -                        |
| PANDA [31]           | $< 30  \mathrm{ps^x}$                                    | -                        |
| Belle II [32]        | $< 40  \mathrm{ps^x}$                                    | 8200                     |

**Table 2.4:** Requirements of future experiments set on the TDC. Entries marked with x are assumed to have the timing resolution equally distributed between the sensor/front-end, TDC and timing distribution network.
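For the entries marked x, assuming that "equally distributed" means equal shares in quadrature, each of the three contributors receives σ_total/√3 rather than σ_total/3. The sketch below shows this split and, more generally, the largest TDC uncertainty that still meets a total budget once the other terms are fixed; the function names and the example numbers are hypothetical.

```python
import math

def equal_share(total_sigma_ps: float, n_contributors: int = 3) -> float:
    """Per-block budget if the total rms resolution is split equally, in
    quadrature, among n uncorrelated contributors: total = sqrt(n) * share."""
    return total_sigma_ps / math.sqrt(n_contributors)

def remaining_budget(total_sigma_ps: float, *other_sigmas_ps: float) -> float:
    """Largest sigma_TDC that still meets the total, given the other terms."""
    rest = total_sigma_ps**2 - sum(s * s for s in other_sigmas_ps)
    if rest <= 0:
        raise ValueError("other contributions already exceed the total budget")
    return math.sqrt(rest)

print(f"{equal_share(70.0):.1f} ps per block for a hypothetical 70 ps total")
print(f"{remaining_budget(10.0, 8.0, 4.0):.2f} ps left for the TDC")
```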

#### **References Chapter 2**

- [4] X. Llopart, R. Ballabriga, M. Campbell, L. Tlustos, and W. Wong, "Timepix, a 65k programmable pixel readout chip for arrival time, energy and/or photon counting measurements," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 581, no. 1–2, pp. 485–494, 2007, proceedings of the 11th International Vienna Conference on Instrumentation. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900207017020
- [5] C. Veerappan, J. Richardson, R. Walker, D.-U. Li, M. Fishburn, Y. Maruyama, D. Stoppa, F. Borghetti, M. Gersbach, R. Henderson, and E. Charbon, "A 160 x 128 single-photon image sensor with on-pixel 55 ps 10b time-to-digital converter," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, 2011 IEEE International, feb. 2011, pp. 312 –314.
- [6] L. Perktold, G. A. Rinella, E. Martin, M. Noy, A. Kluge, K. Kloukinas, J. Kaplon, P. Jarron, M. Morel, and M. Fiorini, "A 9-channel, 100 ps lsb time-to-digital converter for the NA62 gigatracker readout ASIC (TDCpix)," *Journal of Instrumentation*, vol. 7, no. 01, p. C01065, 2012. [Online]. Available: http://stacks.iop.org/1748-0221/7/i=01/a=C01065

- [7] C. Niclass, C. Favi, T. Kluter, M. Gersbach, and E. Charbon, "A 128 x 128 single-photon image sensor with column-level 10-bit time-to-digital converter array," *Solid-State Circuits, IEEE Journal of*, vol. 43, no. 12, pp. 2977 –2989, dec. 2008.
- [8] C. Fernandez, J. Alberdi, J. Marin, J. Oller, and C. Willmott, "Design and performance testing of the read-out boards for the CMS - DT chambers," pp. 363–367, 2002.
- [9] C. Bedoya, J. Marin, J. Oller, and C. Willmott, "Electronics for the cms muon drift tube chambers: the read-out minicrate," in *Nuclear Science Symposium Conference Record*, 2004 IEEE, vol. 2, oct. 2004, pp. 1309 – 1313 Vol. 2.
- [10] A. Akindinov et al., "Design aspects and prototype test of a very precise tdc system implemented for the multigap rpc of the alice-tof," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 533, no. 1–2, pp. 178 – 182, 2004, proceedings of the Seventh International Workshop on Resistive Plate Chambers and Related Detectors. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900204014287
- [11] M. Mota, J. Christiansen, S. Debieux, V. Ryjov, P. Moreira, and A. Marchioro, "A flexible multi-channel high-resolution time-to-digital converter ASIC," in *Nuclear Science Symposium Conference Record, 2000 IEEE*, vol. 2, 2000, pp. 9/155 –9/159 vol.2.
- [12] I. Perić, L. Blanquart, G. Comes, P. Denes, K. Einsweiler, P. Fischer, E. Mandelli, and G. Meddeler, "The FEI3 readout chip for the ATLAS pixel detector," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 565, no. 1, pp. 178–187, 2006, proceedings of the International Workshop on Semiconductor Pixel Detectors for Particles and Imaging. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900206007649
- [13] M. Garcia-Sciveres et al., "The FE-I4 pixel readout integrated circuit," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 636, no. 1, Supplement, pp. S155 – S159, 2011, 7th International "Hiroshima" Symposium on the Development and Application of Semiconductor Tracking Detectors. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900210009551

- [14] H. Kästli, M. Barbero, W. Erdmann, C. Hörmann, R. Horisberger, D. Kotlinski, and B. Meier, "Design and performance of the CMS pixel detector readout chip," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 565, no. 1, pp. 188 – 194, 2006, proceedings of the International Workshop on Semiconductor Pixel Detectors for Particles and Imaging. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900206007674
- [15] D. Schwartz, E. Charbon, and K. Shepard, "A single-photon avalanche diode array for fluorescence lifetime imaging microscopy," *Solid-State Circuits, IEEE Journal* of, vol. 43, no. 11, pp. 2546 –2557, nov. 2008.
- [16] J. Richardson, R. Walker, L. Grant, D. Stoppa, F. Borghetti, E. Charbon, M. Gersbach, and R. Henderson, "A 32 x 32 50 ps resolution 10 bit time to digital converter array in 130 nm cmos for time correlated imaging," in *Custom Integrated Circuits Conference*, 2009. CICC '09. IEEE, sept. 2009, pp. 77–80.
- [17] S. Mandai and E. Charbon, "A 128-channel, 8.9-ps LSB, column-parallel two-stage TDC based on time difference amplification for time-resolved imaging," *Nuclear Science, IEEE Transactions on*, vol. 59, no. 5, pp. 2463–2470, 2012.
- [18] H. Deppe, M. Feuerstack-Raible, U. Stange, U. Trunk, and U. Uwer, "OTIS: A radiation hard TDC for LHCb," pp. 87–90, 2002.
- [19] H. Menninga, C. Favi, M. Fishburn, and E. Charbon, "A multi-channel, 10 ps resolution, FPGA-based TDC with 300 MS/s throughput for open-source PET applications," in *Nuclear Science Symposium and Medical Imaging Conference* (*NSS/MIC*), 2011 IEEE, oct. 2011, pp. 1515 –1522.
- [20] C. Favi and E. Charbon, "A 17ps time-to-digital converter implemented in 65nm FPGA technology," in *Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays*, ser. FPGA '09. New York, NY, USA: ACM, 2009, pp. 113–120. [Online]. Available: http://doi.acm.org/10.1145/1508128.1508145
- [21] J. Wu, Y. Shi, and D. Zhu, "A low-power wave union TDC implemented in FPGA," Journal of Instrumentation, vol. 7, no. 01, p. C01021, 2012. [Online]. Available: http://stacks.iop.org/1748-0221/7/i=01/a=C01021
- [22] C. Ugur, E. Bayer, N. Kurz, and M. Traxler, "A 16 channel high resolution (< 11 ps RMS) time-to-digital converter in a field programmable gate array," *Journal of Instrumentation*, vol. 7, no. 02, p. C02004, 2012. [Online]. Available: http://stacks.iop.org/1748-0221/7/i=02/a=C02004
- [23] E. Bayer and M. Traxler, "A high-resolution (< 10 ps rms) 32-channel time-to-digital converter (TDC) implemented in a field programmable gate array (FPGA)," in *Real Time Conference (RT), 2010 17th IEEE-NPSS*, 2010, pp. 1–5.
- [24] —, "A high-resolution (< 10 ps RMS) 48-channel time-to-digital converter (TDC) implemented in a field programmable gate array (FPGA)," Nuclear Science, IEEE Transactions on, vol. 58, no. 4, pp. 1547–1552, 2011.
- [25] E. Bayer, P. Zipf, and M. Traxler, "A multichannel high-resolution (< 5 ps RMS between two channels) time-to-digital converter (TDC) implemented in a field programmable gate array (FPGA)," in Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2011 IEEE, 2011, pp. 876–879.
- [26] P. Keranen, K. Maatta, and J. Kostamovaara, "Wide-range time-to-digital converter with 1-ps single-shot precision," *Instrumentation and Measurement, IEEE Transactions on*, vol. 60, no. 9, pp. 3162–3172, 2011.
- [27] M. G. Albrow *et al.*, "The FP420 R&D project: Higgs and New Physics with forward protons at the LHC," *Journal of Instrumentation*, vol. 4, p. 1, Oct. 2009.
- [28] L. Adamczyk, "AFP: A proposal to install proton detectors at 220m around ATLAS to complement the ATLAS high luminosity physics program," 2011. [Online]. Available: http://atlas-project-lumi-fphys.web.cern.ch/atlas-project-lumi-fphys/default.html

- [29] N. Harnew, "TORCH: A large-area detector for precision time-of-flight measurements at LHCb," *Physics Procedia*, vol. 37, no. 0, pp. 626 – 633, 2012, proceedings of the 2nd International Conference on Technology and Instrumentation in Particle Physics (TIPP 2011). [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1875389212017427
- [30] D. Bartos et al., "Time resolution of radiation hard resistive plate chambers for the CBM experiment at FAIR," in Nuclear Science Symposium Conference Record, 2008. NSS '08. IEEE, 2008, pp. 2658–2660.
- [31] C. Schwarz et al., "Particle identification for the detector," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 639, no. 1, pp. 169 – 172, 2011, proceedings of the Seventh International Workshop on Ring Imaging Cherenkov Detectors. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900210024010
- [32] K. Nishimura, "The time-of-propagation counter for BelleII," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 639, no. 1, pp. 177 – 180, 2011, proceedings of the Seventh International Workshop on Ring Imaging Cherenkov Detectors. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900210021984
- [33] C. Williams, "Alice time of flight detectors." Presented as the 2011 CERN Detector Seminar, Geneva, Switzerland, 2011, p. 54. [Online]. Available: http://indico.cern.ch/conferenceDisplay.py?confId=149006
- [34] K. Inami, N. Kishimoto, Y. Enari, M. Nagamine, and T. Ohshima, "A 5 ps TOF-counter with an MCP-PMT," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 560, no. 2, pp. 303 – 308, 2006. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900206000611
- [35] S. White, M. Chiu, M. Diwan, G. Atoian, and V. Issakov, "Design of a 10 picosecond Time of Flight Detector using Avalanche Photodiodes," ArXiv e-prints, Jan. 2009.

# The Essence of Time-to-Digital Converter Design

Time-to-digital converters (TDCs) are frequently compared to analog-to-digital converters (ADCs). Although many similarities exist, one major difference remains: time, unlike voltage, cannot be stored or sampled in the traditional sense. This makes well-known ADC concepts such as successive approximation (SAR), pipeline or sigma-delta designs less intuitive to implement in the time domain. In contrast to ADCs, TDCs have only recently become popular and are nowadays used in a wide range of fields.

In the previous chapter, we stressed the need for a multichannel, fine-time resolution TDC in the high-energy physics (HEP) domain. This chapter describes common TDC operating principles and refers to popular TDC architectures reported in the literature. Later in this chapter, the difficulties of fine-time resolution TDC design are discussed and formally described. The succeeding chapter then develops a suitable architecture to address both the fine-time resolution difficulties and the special requirements set by the HEP community.

### 3.1 Time-to-Digital Converter Principles and their Architectures

TDCs are employed in a wide range of applications such as bio-medical imaging [36, 37], laser ranging [38], test and instrumentation systems [39] or high-energy physics experiments [40]. Recently, they have also become very popular as a replacement for the phase detector of a phase-locked loop (PLL), enabling the construction of all-digital PLLs [41, 42]. Given such a wide range of uses, TDC designs differ substantially with respect to important system parameters such as time resolution, dynamic range, dead-time, power and area consumption. A classification of the TDC architectures most frequently reported in the literature is presented in figure 3.1. We distinguish between the basic TDC operating principle, marked as ellipses in the diagram, and the specific realization employing that operating principle, here referred to as the architecture. Before describing the different operating principles and their architectures, two important aspects of time measurements are addressed in the following.



**Figure 3.1:** Classification of TDC architectures according to their underlying operating principle.

#### 3.1.1 Time Measurement Principle

Time, like voltage, has to be measured with respect to a given reference. For time measurements, this reference can either be inherently provided by the measured signal itself or be supplied to the TDC by a dedicated time reference signal. The two approaches, depicted in figure 3.2, are referred to as the *start-stop* and the *time-tagging* principle, respectively.



Figure 3.2: Basic principle of time measurements.

With the *start-stop* principle, the time of an *event* is measured in relative terms, and its time reference is inherently provided by the event signal itself. Such a measurement acts like a local stopwatch: it measures the relative position of two local events but does not reveal the 'absolute' time of an event. The *start-stop* principle allows the time measurement to be active only upon the arrival of an event and switched off otherwise. It is therefore often used in small local systems and low-power applications.

With *time-tagging*, all measurements are referred to one common *reference*, often provided by an external clock signal. This common reference can be distributed across the system, allowing all TDC channels to be synchronized to one common time base. Such a concept makes it possible to measure the times of several events independently and to perform absolute time measurements, since all measurements relate to the same reference. As the phase relation of the event signal to the reference signal may be unknown, the TDC most often runs continuously. This principle is often employed in large systems that require measurements to be correlated across different channels. To measure the relative time between two events, two distinct measurements need to be performed and their difference taken.
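As a minimal sketch of the time-tagging principle, the following snippet assumes an ideal quantizer with a fixed LSB and a common reference at t = 0. The helper names `time_tag` and `relative_time` are hypothetical. Each channel converts its event independently, and the relative time is obtained afterwards as the difference of the two tags.

```python
import math


def time_tag(t_event: float, lsb: float) -> int:
    """Bin index of an event time measured from the common reference (t = 0)."""
    return math.floor(t_event / lsb)


def relative_time(t_a: float, t_b: float, lsb: float) -> float:
    """Relative time between two events from two independent time-tag measurements."""
    return (time_tag(t_b, lsb) - time_tag(t_a, lsb)) * lsb


# Hypothetical example in ps: 5 ps LSB, events at 103 ps and 231 ps on two channels.
print(time_tag(103.0, 5.0), time_tag(231.0, 5.0))  # 20 46
print(relative_time(103.0, 231.0, 5.0))            # 130.0
```

Each tag carries its own quantization error, so the difference is affected by the errors of both measurements. Here the true separation is 128 ps, while the reconstructed value is 130 ps.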

#### 3.1.2 Quantization Noise

The minimum time step that can be resolved by a TDC system is referred to as the bin-width, bin-size, resolution<sup>1</sup> or least-significant bit (LSB) of the system. The different time slots covering the TDC's time window are referred to as its bins. To perform the time-to-digital conversion, each event is assigned to one specific bin. In this assignment process, as shown in figure 3.3, a timing error of up to one bin width is introduced. This error is commonly referred to as quantization noise. For events uniformly distributed across the time window, the standard deviation of this error is

$$\sigma_q = \frac{LSB}{\sqrt{12}}.$$

This represents the best achievable timing precision of a single-edge measurement for a given LSB size and defines the fundamental precision limit of the TDC. Hence, to achieve fine-precision measurements, the LSB size of the system must be kept small. A mathematical treatment of the precision of time measurements can be found in [43].

<sup>1</sup>The term resolution is a rather unfortunate choice in this context, as it is also often used to refer to the single-shot precision/rms resolution of the system.



Figure 3.3: Illustration of the quantization noise.
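The $\sigma_q = LSB/\sqrt{12}$ result can be checked numerically. The sketch below is a simple Monte Carlo experiment. It assumes ideal, equally sized bins, events uniformly distributed over the window, and reconstruction at the bin centre. `quantization_rms` is a hypothetical helper.

```python
import math
import random


def quantization_rms(lsb: float, n_events: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the rms quantization error of an ideal TDC."""
    rng = random.Random(seed)
    sq_sum = 0.0
    for _ in range(n_events):
        t = rng.uniform(0.0, 100 * lsb)              # event time, uniform over 100 bins
        t_hat = (math.floor(t / lsb) + 0.5) * lsb    # assign bin, reconstruct at bin centre
        sq_sum += (t - t_hat) ** 2
    return math.sqrt(sq_sum / n_events)


lsb = 5.0  # ps
print(quantization_rms(lsb))   # close to the analytical value below
print(lsb / math.sqrt(12))     # about 1.443 ps
```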

#### 3.1.3 Counter Principle

The simplest way to perform time measurements is to use a counter, as shown in figure 3.4. On the arrival of an event, the current counter state is stored in the so-called time-capture registers (TCRs). By evaluating the latched code, the time of arrival of the event can be determined.

In this case, the precision of the TDC is limited by the period of the reference clock to several hundred ps. For example, a 1 GHz reference clock results in a 1 ns LSB size, which corresponds to approximately 300 ps rms time resolution (1 ns/√12 ≈ 289 ps). The dynamic range is essentially limited only by the number of bits of the counter and can be further extended off-chip. Such a concept introduces no intrinsic dead-time; the dead-time is limited only by the time it takes to buffer the latched code into the readout registers. Due to the binary-coded nature of a counter, both power and area consumption are small.

Figure 3.4: Counter principle.
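The counter principle and the 1 GHz example above can be reproduced in a few lines. The sketch assumes an ideal counter that is reset at t = 0. The function names are hypothetical.

```python
import math


def counter_tdc(t_event: float, f_ref: float) -> int:
    """Counter state latched into the TCRs at the event time (counter assumed reset at t = 0)."""
    return math.floor(t_event * f_ref)


def counter_rms_resolution(f_ref: float) -> float:
    """Quantization-limited rms resolution: one clock period divided by sqrt(12)."""
    return (1.0 / f_ref) / math.sqrt(12)


f_ref = 1e9                                   # 1 GHz reference clock, i.e. 1 ns LSB
print(counter_tdc(12.34e-9, f_ref))           # 12 completed clock cycles
print(counter_rms_resolution(f_ref) * 1e12)   # about 288.7 ps
```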

**Metastability** Upon the asynchronous arrival of an event, the state of the counter is transferred into the TCRs. When the counter state is latched, an ambiguity may occur if the event falls into the transition region of the counter, where several bits of a binary counter may be toggling at the same time. To resolve this ambiguity, either the event needs to be synchronized to the counter's clock domain or a Gray-counter scheme needs to be employed, which prevents a false code from being latched.
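The advantage of a Gray counter can be illustrated with the standard reflected binary Gray code. The function names here are hypothetical. Consecutive counter values differ in exactly one bit. Assuming each register bit settles to one of its two valid values, a code latched during a transition therefore resolves to either the old or the new count.

```python
def binary_to_gray(n: int) -> int:
    """Reflected binary (Gray) code: adjacent values differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse conversion, applied after the latched Gray code has been read out."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# A binary counter going from 7 (0111) to 8 (1000) toggles four bits at once;
# the Gray-coded counter toggles only one.
print(bin(7 ^ 8).count("1"))                                  # 4
print(bin(binary_to_gray(7) ^ binary_to_gray(8)).count("1"))  # 1
```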

#### 3.1.4 Delay-Line Principle

To overcome the time resolution limit of the counter principle, several delay buffers/amplifiers can be connected in series to form a delay line, as depicted in figure 3.5a. The delay line generates multiple delayed copies of the same signal. On the arrival of an event, the state of the delay line is stored in the TCRs. By looking for the one-to-zero transition of the latched code sequence, as illustrated in figure 3.5b, the time of arrival with respect to the reference signal can be identified. In this case, the LSB size is limited by the propagation delay of the buffer/amplifier circuit. In a 90 nm technology, inverter gate delays as small as 15 ps to 20 ps can be achieved [41, 42]. Usually, two inverter cells are needed to obtain non-inverted outputs. Smaller LSB sizes can be achieved with a pseudo-differential approach employing two inverter chains, as presented in [44], or with a fully differential delay cell, as reported in [42]. However, such approaches remain limited by the gate delay of the technology. To overcome this gate-delay limitation, a so-called vernier delay line can be used, which exploits the propagation delay difference of two delay chains, as reported in [45]. For both the delay-line and the vernier delay-line principle, the dynamic range is limited by the number of elements in the delay line, which makes it difficult to cover a large dynamic range. One remedy is a recycling mechanism, in which the output is routed back to the input of the delay line, as reported in [46] and [47]. Alternatively, an array of delay lines, as reported in [48], or a two-dimensional vernier delay line, as reported in [49], can be used.



Figure 3.5: Illustration of the delay-line principle.
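The decoding step of figure 3.5b can be sketched as follows. The model is idealized and mismatch-free. The reference edge enters the line at t = 0, and tap i switches after (i + 1) cell delays. The helper names and the 20 ps cell delay are assumptions chosen to match the gate delays quoted above.

```python
def latch_delay_line(t_event: float, tau: float, n_taps: int) -> list[int]:
    """Thermometer code latched by the event: tap i reads 1 if the reference edge
    has already reached it, i.e. if (i + 1) * tau <= t_event (no mismatch)."""
    return [1 if (i + 1) * tau <= t_event else 0 for i in range(n_taps)]


def decode_thermometer(code: list[int]) -> int:
    """Position of the one-to-zero transition = number of cells passed by the edge."""
    for i in range(1, len(code)):
        if code[i - 1] == 1 and code[i] == 0:
            return i
    # No transition: either no tap (all zeros) or every tap (all ones) was passed.
    return len(code) if code and code[0] == 1 else 0


tau = 20.0  # assumed cell delay in ps
code = latch_delay_line(71.0, tau, 8)
print(code)                            # [1, 1, 1, 0, 0, 0, 0, 0]
print(decode_thermometer(code) * tau)  # 60.0 ps, i.e. the event lies in bin 3
```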

As examples, the traditional vernier delay line depicted in figure 3.6a and the array-of-delay-lines concept shown in figure 3.6b are briefly discussed.



(a) Vernier delay-line architecture.

Figure 3.6: Delay-line principle variants.

In a vernier delay-line architecture, the propagation delay difference between two delay chains is used to achieve LSB sizes smaller than the propagation delay of a buffer. In each stage, both the reference and the event signal are delayed. To cover large dynamic ranges, very long delay lines are necessary. Because the event signal itself is delayed, the event first needs to propagate through the complete delay line before a new conversion can be started, which increases the dead-time of the architecture. Area and power consumption grow linearly with the number of stages, which in turn increases as the LSB size is reduced.
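An idealized model shows where the vernier LSB comes from. The model assumes identical, mismatch-free cells in each line and that the first-arriving signal travels through the slower line. The function name and the delay values are hypothetical. The lag between the two signals shrinks by $\tau_{slow} - \tau_{fast}$ per stage, and this difference therefore acts as the LSB.

```python
def vernier_stage_count(dt: float, tau_slow: float, tau_fast: float, n_stages: int) -> int:
    """Idealized vernier delay line.

    The leading signal passes cells of delay tau_slow, the lagging signal (dt later)
    cells of delay tau_fast < tau_slow. After k stages the remaining lag is
    dt - k * (tau_slow - tau_fast); the first stage at which the lagging signal has
    caught up is the conversion result. Returns -1 if dt exceeds the line's range.
    """
    lsb = tau_slow - tau_fast
    for k in range(1, n_stages + 1):
        if dt - k * lsb <= 0:  # arbiter of stage k sees the lagging signal catch up
            return k
    return -1


print(vernier_stage_count(12.0, 25.0, 20.0, 64))  # 3, i.e. 10 ps < dt <= 15 ps
```

With 25 ps and 20 ps cells the LSB is 5 ps. Covering 1 ns, however, would already require 200 stages, which illustrates why long lines, and correspondingly long dead-times, are needed for large dynamic ranges.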

An improvement over conventional vernier delay lines is achieved by providing multiple propagation delay paths using an array of delay lines. Similar to a vernier delay line, this allows sub-gate-delay LSB sizes, but it does not require the event signal itself to be delayed. Hence, no additional dead-time is introduced, and the length of the main delay line is reduced.

In delay-line principles, the dynamic range depends on the length of the delay line, although, depending on the exact realization, it can be extended without adding further elements. Architectures that do not delay the event signal can achieve low dead-times, limited only by the latching process of the event, whereas architectures that delay the event signal add dead-time to the system. The power and area consumed by a delay-line concept scale linearly with the number of delay elements; depending on the exact realization, the power efficiency can range from low to very high.

#### 3.1.5 Time Amplification Principle

In time-amplification based architectures, time differences are amplified by a time amplifier (TA), as shown in figure 3.7. This relaxes the time resolution requirements of the succeeding stage, so that sub-gate-delay time resolutions can be achieved with conventional concepts, e.g. a delay-line approach, in that stage.



Figure 3.7: Time-amplification principle.

The TA thereby represents a rather critical building block. High linearity and a large dynamic range are, among other characteristics, the most important parameters of a TA. Besides the traditional dual-slope architecture presented in [50], a cross-coupled delay line, as reported in [51] or [52], represents one of the most popular choices to implement a TA. A slightly different configuration, employing the propagation delay difference of two DLLs, is reported in [39]. A different approach to building a TA has recently been proposed in [41]; it uses the metastable transition region of a flip-flop (FF) to amplify short time intervals. Notably, as pointed out in [41], a time amplifier also reduces the noise limitations of the succeeding stage by the gain of the TA. Based on the TA concept, more complex architectures such as a pipeline structure, as reported in [53], can also be realized.
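The benefit of time amplification can be made concrete with a short sketch. It assumes an ideal, perfectly linear TA with gain G followed by an ideal second-stage TDC. The function name and the numbers are hypothetical. Referred back to the input, the second stage's LSB, and likewise its noise, is divided by G.

```python
import math


def ta_tdc(dt: float, gain: float, lsb_stage2: float) -> float:
    """Idealized two-stage conversion: a linear TA with the given gain followed by a
    TDC with bin size lsb_stage2. Returns dt referred back to the input
    (bin-centre reconstruction)."""
    amplified = gain * dt
    code = math.floor(amplified / lsb_stage2)
    return (code + 0.5) * lsb_stage2 / gain


# Hypothetical values in ps: a 20 ps second stage behind a TA with gain 8
# behaves like a 2.5 ps LSB referred to the input.
print(20.0 / 8.0)              # 2.5
print(ta_tdc(13.3, 8.0, 20.0)) # 13.75, error below 20 / (2 * 8) = 1.25 ps
```

A real TA has a limited linear input range and gain variations, which this model ignores. These effects set the practical limits discussed in the text.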

As an example, figure 3.8 depicts a time amplifier based on cross-coupled delay lines. The reference signal and the event signal are sent into two distinct delay lines with variable delay, arranged in opposite directions. The delays of all buffers are initially set to fast. When a signal propagates through a delay cell, the delay of the opposite cell in the other delay line is switched from fast to slow. After the signals cross, only slow buffers remain to be propagated, which causes the input delay difference between the two signals to be amplified.



Figure 3.8: Cross-coupled time-amplifier (TA).

The limited dynamic range, as well as the dead-time introduced by the amplification and the succeeding conversion stage, are among the most critical properties of a TA-based architecture. Power and area consumption can be reduced to some extent, as the TA principle relaxes the requirements on the second stage.

#### 3.1.6 Fixed Time Delay Principle

Architectures based on a fixed-time-delay concept use propagation delay differences between different paths to perform time digitization, which allows sub-gate-delay resolutions. In contrast to delay-line based architectures, fixed-time-delay architectures generate the time differences in parallel. The basic principle is shown in figure 3.9a. Time differences can be generated in a great variety of ways. Among the most popular approaches are capacitive scaling, as reported in [46], and differently sized buffers, as reported in [54]. Time differences can also be generated by the RC delay of wires, as reported in [55], or by threshold-voltage shifts, as also reported in [54]. Fixed time delays can even be generated stochastically, using threshold-voltage mismatch between identical cells, as proposed in [56].

As an example, a capacitive scaling architecture is shown in figure 3.9b. Differently sized capacitors are attached to equally sized drivers, causing different signal slopes at their outputs. Evaluated at the mid-transition point, these slopes translate into small time delays between the different signal paths.



(b) Capacitive scaling architecture.

Figure 3.9: Fixed time-delay principle and a capacitive scaling example.

With a fixed-time-delay concept, the number of elements required to cover a given dynamic range scales linearly with that range, which makes the concept demanding in power and area. However, low-power unit elements can help to reduce power and area consumption. A multidimensional structure that delays the event signal as well as the reference signal reduces the number of elements required to cover a given dynamic range at a given resolution. If no additional delay is introduced on the event signal, dead-times limited only by the event buffering process can be realized.
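A capacitive-scaling array can be sketched with a first-order delay model that is an assumption, not taken from this work. Each identical driver is assumed to charge its load with a constant current I, so its output reaches VDD/2 after $C_i \cdot (V_{DD}/2)/I$. Capacitances growing in steps of ΔC then yield equally spaced path delays. All names and component values are hypothetical.

```python
def path_delays(n_paths: int, c0: float, dc: float, vdd: float, i_drive: float) -> list[float]:
    """Mid-transition delay of each path in a capacitive-scaling array.

    First-order assumption: each identical driver charges its load C_i = c0 + i * dc
    with a constant current i_drive, so the output reaches VDD/2 after C_i * (VDD/2) / I.
    Units: C in fF, VDD in V, I in uA -> delay in ns.
    """
    return [(c0 + i * dc) * (vdd / 2) / i_drive for i in range(n_paths)]


def fixed_delay_code(dt: float, delays: list[float]) -> int:
    """Parallel conversion: count the paths whose delayed edge has already passed
    the mid-transition point when the sampling signal arrives dt later."""
    return sum(1 for d in delays if d <= dt)


delays = path_delays(8, c0=10.0, dc=0.5, vdd=1.2, i_drive=100.0)
print([round(d * 1000, 1) for d in delays])  # delays in ps: 60.0, 63.0, 66.0, ...
print(fixed_delay_code(0.0655, delays))      # 2
```

The common offset $c_0 \cdot (V_{DD}/2)/I$ (60 ps here) appears in every path and would have to be calibrated out. With 3 ps steps, covering 1 ns would already need about 333 parallel paths, which reflects the linear scaling of the element count with dynamic range.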

#### 3.1.7 Interpolation Principle

In the interpolation principle, two input signals delayed in time are used to generate intermediate versions of these two signals. The basic principle is shown in figure 3.10. In the simplest realization, as proposed in [57], a two-input buffer can be used to generate the sum of the two input signals. A multipath buffer, as proposed in [58], can be used to improve the time resolution. Such an interpolation approach can also be integrated directly into a flip-flop, as proposed in [59], or realized with a resistor-based voltage divider, as shown exemplarily in figure 3.11, or a diode-based voltage divider structure, as proposed in [60].



Figure 3.10: Interpolation Principle.



Figure 3.11: Resistive voltage division example.

An architecture based on the interpolation principle can generate sub-gate-delay resolutions but is rather limited in dynamic range, as an additional interpolating element is required for each interpolation step. This also makes the principle rather power- and area-intensive. A passive realization can overcome these power and area limitations to some extent.
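The resistive voltage division of figure 3.11 can be sketched under idealized assumptions. Each divider tap is assumed to produce the weighted sum w·A + (1 − w)·B of the two input edges. The edges are modeled as linear ramps whose rise time is at least twice their spacing T. The threshold sits at half swing. All names and values are hypothetical. Under these assumptions the tap crossings are spaced uniformly by T/M.

```python
def ramp(t: float, t_start: float, t_rise: float) -> float:
    """Normalized edge: 0 before t_start, linear rise over t_rise, 1 afterwards."""
    return min(1.0, max(0.0, (t - t_start) / t_rise))


def interpolated_crossing(w: float, t_a: float, t_b: float, t_rise: float,
                          threshold: float = 0.5, iters: int = 60) -> float:
    """Threshold-crossing time of the divider tap w * A + (1 - w) * B (bisection)."""
    lo, hi = min(t_a, t_b), max(t_a, t_b) + t_rise
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        v = w * ramp(mid, t_a, t_rise) + (1.0 - w) * ramp(mid, t_b, t_rise)
        if v < threshold:
            lo = mid   # tap still below threshold: crossing lies later
        else:
            hi = mid   # tap already above threshold: crossing lies earlier
    return hi


# Hypothetical values in ps: edges 10 ps apart, 40 ps rise time, 4 interpolation steps.
crossings = [interpolated_crossing(1 - k / 4, 0.0, 10.0, 40.0) for k in range(5)]
print([round(c, 3) for c in crossings])  # [20.0, 22.5, 25.0, 27.5, 30.0]
```

In this model, edges that are too fast relative to T (t_rise < 2T) cause one input to saturate, or not yet to have started, when an intermediate tap crosses the threshold. The tap spacing then becomes nonuniform.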

#### 3.1.8 Time-to-Amplitude Principle

In time-to-amplitude TDC architectures, the time quantity is first converted into an analog voltage before a traditional analog-to-digital converter (ADC) performs the actual conversion step. The basic principle is shown in figure 3.12. To some extent, such an approach shifts the conversion problem to the implementation of an ADC, and it is seldom used nowadays. However, this concept was quite popular in the past, as employed in [61] and [62].

Figure 3.12: Time-to-amplitude principle.

Time-to-amplitude based architectures can achieve fine time resolutions, mainly limited by the precision of the ADC. For applications requiring both fine time resolution and a large dynamic range, the requirements on the ADC can become significant; hence, fine-time resolution TDC designs of this type achieve only small to medium dynamic ranges. The conversion time required by the ADC introduces additional dead-time. The overall power and area consumption of the TDC is dominated by the ADC implementation and may vary substantially.
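A minimal sketch makes the range/resolution trade-off visible. It assumes a constant charging current and an ideal n-bit ADC. The function names and component values are hypothetical. The capacitor voltage grows as V = I·Δt/C, so one ADC LSB corresponds to a time LSB of C·V_LSB/I.

```python
def tac_convert(dt: float, i_charge: float, c: float, v_full_scale: float, n_bits: int) -> int:
    """Idealized time-to-amplitude conversion followed by an n-bit ADC.

    A constant current charges the capacitor for the duration dt, so V = I * dt / C;
    the ADC then quantizes V with full-scale range v_full_scale.
    """
    v = i_charge * dt / c
    v_lsb = v_full_scale / 2 ** n_bits
    code = int(v // v_lsb)
    return min(max(code, 0), 2 ** n_bits - 1)  # clip to the ADC output range


def tac_time_lsb(i_charge: float, c: float, v_full_scale: float, n_bits: int) -> float:
    """Equivalent time LSB: time needed to charge the capacitor by one ADC LSB."""
    return c * (v_full_scale / 2 ** n_bits) / i_charge


# Hypothetical values: 100 uA, 1 pF, 1 V full scale, 10-bit ADC.
print(tac_time_lsb(100e-6, 1e-12, 1.0, 10))          # about 9.77e-12 s
print(tac_convert(50e-12, 100e-6, 1e-12, 1.0, 10))   # 5
```

With these values the full range is 2^10 × 9.77 ps = 10 ns. Halving the time LSB at the same range would require one additional ADC bit, which is how the ADC requirements grow with fine-time, large-range specifications.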

#### 3.1.9 Repetitive Measurement Principle

In repetitive TDC architectures, either the input signal exhibits a repetitive characteristic or the TDC architecture itself regenerates multiple copies of the same initial time difference, delayed in time. To a first approximation, and assuming uncorrelated measurement errors, this improves the measurement precision according to

$$\sigma_{rep} = \frac{\sigma_{TDC}}{\sqrt{n}} \tag{3.1}$$

where n represents the number of measurements. If the delays are precisely timed with respect to a given reference signal, repetitive measurements can also be used to generate sub-gate-delay LSB sizes. Well-known examples of the repetitive principle are the wave-union architecture, as reported in [63], and first-order noise-shaping architectures, as reported in [64]. Successive-approximation-register (SAR) architectures, as reported in [38], and linear scrambling architectures, as reported in [65], represent less commonly used approaches. As representatives, the wave-union architecture and the SAR architecture are discussed in more detail.

In a so-called wave-union architecture, a new signal consisting of several pulses delayed in time, referred to as a wave union, is generated on the arrival of an event. The delay between the respective pulses is fixed and assumed to be known. By capturing the generated waveform, repetitive measurements of the initial signal can be performed, potentially improving the TDC's precision. The working principle of such an architecture is illustrated in figure 3.13. Wave-union architectures are especially popular in systems in which large bin-size variations occur (e.g. FPGAs), since the approach reduces the negative effect of individual large bins. As no additional bins need to be added to the system to increase its resolution, larger dynamic ranges can be implemented with less effort. This also benefits power and area consumption, as only little additional effort is needed to generate the wave-union signal. As the processing of the captured wave union is time-intensive, additional dead-time may be added to the system.

Figure 3.13: Illustration of the Wave-Union TDC approach.

Figure 3.14: Time-to-digital converter based on successive approximation.
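The wave-union idea can be sketched in an idealized form. The model assumes equal bins and pulse offsets of exactly k·LSB/n that are known to the processing. Each pulse is quantized, its offset is removed, and the results are averaged. The function names are hypothetical. In this ideal case the averaging is mathematically equivalent to an n-times finer quantizer. Real bins are irregular, so in practice the improvement is generally smaller.

```python
import math
import random


def wave_union_estimate(t_event: float, lsb: float, n_pulses: int) -> float:
    """Idealized wave-union measurement.

    The event launches n_pulses edges at known offsets k * lsb / n_pulses. Each edge
    is quantized by the same TDC (bin-centre reconstruction), its known offset is
    removed, and the n estimates are averaged.
    """
    total = 0.0
    for k in range(n_pulses):
        offset = k * lsb / n_pulses
        t_hat = (math.floor((t_event + offset) / lsb) + 0.5) * lsb - offset
        total += t_hat
    return total / n_pulses


def rms_error(n_pulses: int, lsb: float = 1.0, trials: int = 20_000, seed: int = 3) -> float:
    """Rms error of the wave-union estimate for events uniformly spread over 100 bins."""
    rng = random.Random(seed)
    sq_sum = 0.0
    for _ in range(trials):
        t = rng.uniform(0.0, 100.0 * lsb)
        sq_sum += (wave_union_estimate(t, lsb, n_pulses) - t) ** 2
    return math.sqrt(sq_sum / trials)


print(rms_error(1))  # close to 1 / sqrt(12), about 0.289 LSB
print(rms_error(4))  # close to 0.289 / 4, about 0.072 LSB
```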

In SAR architectures, as shown in figure 3.14, time differences are resolved by repeatedly comparing the input delay difference with an internally generated, delayed version of the same signal. By varying the delay of one propagation path, the relative delay between the two signals can be adjusted until both signals experience the same delay. Digital logic controls the delay setting based on the result of the previous comparison. By recording the adjustment steps, the initial time difference of the input signals can be evaluated. The delay is varied in binary steps, so that the result is reached after $\log_2(N)$ comparisons, where N represents the number of possible delay settings. Such a method resolves sub-gate-delay time differences with reasonable dynamic ranges. The binary search takes several cycles to complete, which introduces additional dead-time. Due to its binary behavior, the SAR interpolator can be integrated efficiently in terms of power and area consumption.
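The binary search described above can be sketched as follows. The model assumes an ideal arbiter and an ideal adjustable delay with 2^n settings. The function name and the values are hypothetical. Each comparison resolves one bit, MSB first, so N = 2^n settings are resolved in log2(N) comparisons.

```python
def sar_tdc(dt: float, step: float, n_bits: int) -> tuple[int, int]:
    """Idealized successive-approximation TDC.

    An adjustable delay with 2**n_bits settings of size `step` is compared with the
    input time difference dt by an ideal arbiter; each comparison resolves one bit,
    MSB first. Returns (code, number_of_comparisons).
    """
    code = 0
    comparisons = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)     # tentatively set the next bit
        comparisons += 1
        if dt >= trial * step:        # input difference is at least the trial delay
            code = trial              # keep the bit
    return code, comparisons


print(sar_tdc(37.2, 1.0, 8))  # (37, 8): 256 settings resolved in log2(256) = 8 steps
```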

#### 3.1.10 Event-Capture vs Clock-Capture

Architectures that refer the measurement to a periodic signal such as a reference clock can be categorized either as an *event-capture* TDC structure or as a *clock-capture* TDC structure. Both principles are schematically shown in figure 3.15a and figure 3.15b, respectively.



Figure 3.15: Different time capturing principles.

In an *event-capture* structure, the state of the event signal is captured synchronously to the reference signal in every reference clock cycle. Within such a structure, either all bins of the TDC have to be analyzed within one reference clock cycle or pipelining has to be used to detect the presence of an event. Alternatively, all data can be read out synchronously at the reference signal's frequency, which generates a large amount of data. Today's TDC architectures often use high reference clock frequencies, which in turn require fast electronics for the detection or readout process. In a continuously running TDC architecture, an *event-capture* structure can potentially capture multiple transitions with zero dead-time.

In a *clock-capture* TDC structure, on the other hand, the event captures the state of the reference clock signal by the structure's operating principle, generating a valid digital code only on the arrival of an event. As the event is an asynchronous signal, no further event can be processed until the conversion of the current event has finished and has been latched into the readout registers. This intrinsically leads to dead-times of a few readout clock cycles, which sets a maximum event rate on the input signal. As such a TDC structure generates data only in the presence of an event, no continuous analysis of the data is necessary.

Both variants are schematically shown in figure 3.16 for a traditional delay-line architecture. In an *event-capture* structure, the reference signal is connected to the CLK input of the time-capture registers, whereas the event itself is connected to the registers' D input. As a result, the state of the delay line is sampled periodically, potentially capturing multiple transitions of the event signal. As the complete set of registers is processed every clock cycle, this architecture can potentially work without dead-time. By identifying the position of a falling or rising transition within the delay line, the time of arrival of an event can be determined precisely.






Figure 3.16: Different time capturing principles based on a delay-line architecture.

In the case of a *clock-capture* structure, the connections of the event signal and the reference signal are exchanged, so that the state of the reference signal propagates through the delay line. Only on the arrival of an event is the current state of the delay line stored in the TCRs. By identifying the falling edge in the captured code, the time of arrival of the event can be determined precisely.

#### 3.1.11 Multistage Architectures

To reduce the effect of device mismatch, fine-time resolution measurements most often require trading off dynamic range against LSB size. Consequently, fine-time resolution architectures are often intrinsically limited to a rather small number of bits. To cover larger dynamic ranges without losing resolution, concepts that extend the architecture's dynamic range by other means are required. For this purpose, as depicted in figure 3.17, a less aggressive principle at a higher hierarchy level is added to the fine-resolution stage. An architecture based on this principle is often referred to as a multistage architecture. In a multistage architecture, multiple fine-time interpolation stages are required to cover the full time window. The dynamic range of the finest structure needs to be large enough to fully cover the time window of the structure at the next higher hierarchy level. To further extend the dynamic range of the delay line, a counter counting completed clock cycles can be added.



Figure 3.17: Illustration of a multistage TDC architecture.
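How a coarse counter and a fine stage combine into one time stamp can be sketched as follows. The example assumes a hypothetical 1 GHz reference (1000 ps period) subdivided into 200 fine bins of 5 ps. The function names are hypothetical. Mismatch and counter metastability are ignored.

```python
def multistage_time(coarse_count: int, fine_code: int, t_ref: float, n_fine: int) -> float:
    """Combine a coarse counter (completed reference periods) with a fine code that
    subdivides one reference period into n_fine bins."""
    if not 0 <= fine_code < n_fine:
        raise ValueError("fine code must lie within one reference period")
    return coarse_count * t_ref + fine_code * (t_ref / n_fine)


def multistage_encode(t_event: float, t_ref: float, n_fine: int) -> tuple[int, int]:
    """Ideal encoder for the same structure (no mismatch, no metastability)."""
    coarse = int(t_event // t_ref)
    fine = int((t_event - coarse * t_ref) // (t_ref / n_fine))
    return coarse, fine


print(multistage_encode(12345.0, 1000.0, 200))  # (12, 69)
print(multistage_time(12, 69, 1000.0, 200))     # 12345.0
```

The fine stage only has to span one reference period, while an m-bit counter extends the total range to 2^m reference periods.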

**Counter Metastability** In TDC architectures that capture the state of the interpolator on the arrival of an event, previously referred to as *clock-capture* architectures, special attention is required to guarantee the correct latching of the counter state. Due to the asynchronous nature of the event signal, the counter might be in its switching region when an event arrives, causing an invalid code to be latched into the TCRs. A double-counter approach, or a Gray-counter approach using an additional bit clocked on the falling edge, can be used to resolve the ambiguity by evaluating the fine-time part of the captured code.

In TDC architectures that capture the event synchronously to the reference clock, previously referred to as *event-capture* architectures, the risk of latching an invalid state can be avoided.

#### 3.1.12 Event Sampling

A special realization of an *event-capture* structure is obtained when the input signals of the sampling registers are exchanged, as depicted in figure 3.18. In this case, the output signals of the delay line are connected not to the D input but to the CLK input of the TCRs. As a result, the state of the event signal is stored at different time instances, i.e. the event signal is effectively sampled over time. To avoid metastability and to account for the propagation delay of the TCRs, such an architecture additionally requires an extra row of registers, clocked on the rising and falling edges of the reference clock respectively, to perform the clock-domain transfer. Such a structure is often found in wave-union TDC architectures, as it allows multiple transitions of an event to be captured.



Figure 3.18: Illustration of a sampling TDC architecture employing a delay line to generate the required sampling signals.

#### 3.1.13 Delay Control

To cope with delay variations due to process-voltage-temperature (PVT) variations, either a direct or an indirect control principle, as shown in figure 3.19a and figure 3.19b respectively, can be incorporated into the design.



Figure 3.19: Control principles.

In indirect control mechanisms, the variations of a delay element are not physically adjusted to match the desired value but are instead compensated in the digital domain: the number of physical delay elements that are propagated is adjusted to compensate for delay variations. The actual propagation delay of a cell is extracted using statistical tests. As no physical control capabilities need to be incorporated into the circuit, minimum propagation delays can be achieved. However, dummy elements need to be implemented so that the required dynamic range can always be covered.
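One common statistical test is the code density test, chosen here for illustration since the text does not specify the test. Hits uniformly distributed in time land in each bin with a probability proportional to its width, so the histogram yields the individual cell delays. The sketch below uses hypothetical mismatched cell delays and hypothetical function names.

```python
import bisect
import random


def simulate_hits(cell_delays: list[float], n_hits: int, seed: int = 7) -> tuple[list[int], float]:
    """Hypothetical test bench: hits uniformly distributed over the line's total delay
    are converted by a delay line whose cells have the given (mismatched) delays."""
    edges = [0.0]
    for d in cell_delays:
        edges.append(edges[-1] + d)
    t_range = edges[-1]
    rng = random.Random(seed)
    n_bins = len(cell_delays)
    hits = []
    for _ in range(n_hits):
        t = rng.random() * t_range
        # bin index = number of cell boundaries already passed, clipped to the last bin
        hits.append(min(bisect.bisect_right(edges, t) - 1, n_bins - 1))
    return hits, t_range


def code_density_widths(hits: list[int], n_bins: int, t_range: float) -> list[float]:
    """Code density test: with uniformly distributed hits, width_i ≈ hits_i / total * t_range."""
    counts = [0] * n_bins
    for code in hits:
        counts[code] += 1
    return [c / len(hits) * t_range for c in counts]


delays = [4.0, 6.0, 5.0, 5.0, 7.0, 3.0]  # hypothetical cell delays in ps (nominal 5 ps)
hits, t_range = simulate_hits(delays, 200_000)
print([round(w, 2) for w in code_density_widths(hits, len(delays), t_range)])  # close to delays
```

The cumulative sum of the estimated widths gives a code-to-time mapping. Such a mapping is the kind of information an indirect control scheme can use to decide how many physical cells correspond to a given time interval.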

In contrast, a direct control principle implements a feedback mechanism that physically adjusts the propagation delay of a circuit. This allows delay variations to be compensated without adding extra elements to the design. As additional circuitry needs to be added to the delay elements, the cells are artificially slowed down. A common technique to implement such an analog control mechanism is a delay-locked loop (DLL), as depicted in figure 3.20. A DLL compares the input of the delay line with its output and adjusts the delay of the delay elements so that the total delay of the line precisely matches one reference clock period. Such a control mechanism allows the LSB size to be set by the reference signal's frequency: for N delay elements, $LSB = T_{ref}/N$.



Figure 3.20: A DLL block diagram.

#### 3.2 Challenges in Fine-Time Resolution TDC Design

In an ideal TDC, the time resolution is limited only by its LSB size. In a real implementation, however, other contributions such as device mismatch or power supply noise need to be taken into account. Especially in the sub 10 ps-rms resolution regime, second-order effects become increasingly dominant. To achieve fine-time resolutions, these effects need to be well understood and addressed both in the choice of architecture and during the design phase.

Imperfections present in a TDC can be represented by a set of five different parameters: the standard deviation of the real quantization error of each specific bin, $\sigma_{qDNL}$; the rms time resolution degradation due to integral non-linearity effects, $\sigma_{wINL}$; the rms jitter due to thermal and power supply noise, $\sigma_{noise}$; and the global time shifts due to PVT variations or global RC-delays and the inter-channel crosstalk, represented by $\Delta t_{TS}$ and $\Delta t_{CT}$ respectively.

The quantization error $\sigma_{qDNL}$, the integral non-linearity effects $\sigma_{wINL}$ and the jitter in the system $\sigma_{noise}$ can be combined into a single parameter, $\sigma_{TDC}$, representing the final single-shot rms resolution of the TDC. Due to measurement constraints, a minimum amount of reference clock jitter, denoted as $\sigma_{ref}$, also needs to be included in this term. For uncorrelated error sources, $\sigma_{TDC}$ can be written as the root-sum-square of the individual contributions:

$$\sigma_{TDC} = \sqrt{\sigma_{qDNL}^2 + \sigma_{wINL}^2 + \sigma_{noise}^2 + \sigma_{ref}^2}$$
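The root-sum-square combination in the expression for $\sigma_{TDC}$ above can also be used the other way round, to see how much of a resolution target is left for one contribution once the others are fixed. The following is a minimal sketch; the helper names `sigma_tdc` and `remaining_budget` are hypothetical and any numbers passed to them are illustrative only.

```python
import math


def sigma_tdc(sigma_qdnl, sigma_winl, sigma_noise, sigma_ref):
    """Single-shot rms resolution for uncorrelated error sources (root-sum-square)."""
    return math.sqrt(sigma_qdnl**2 + sigma_winl**2 + sigma_noise**2 + sigma_ref**2)


def remaining_budget(target, *allocated):
    """rms budget left for one more uncorrelated contribution, given a target sigma_TDC."""
    rest = target**2 - sum(s**2 for s in allocated)
    if rest < 0:
        raise ValueError("allocated contributions already exceed the target")
    return math.sqrt(rest)
```

Because the terms add in quadrature, the largest contribution dominates: `sigma_tdc(3.0, 4.0, 0.0, 0.0)` gives 5.0, and a 5 ps-rms target with a 3 ps-rms contribution already allocated leaves `remaining_budget(5.0, 3.0) == 4.0` ps-rms for everything else.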

However, timing offset shifts as well as timing degradation due to inter-channel crosstalk are usually not included in the final rms resolution of the TDC. They are reported separately, as they depend on the operating conditions as well as on the characteristics of the event signals themselves.

#### 3.2.1 Error Sources

Often, not a single error source but several root causes affecting the same performance parameter need to be considered. Table 3.1 lists the different error sources and the corresponding parameter(s) they might affect. For example, non-linearity is introduced by device mismatch, (synchronous) noise on the supply rails as well as (per-bin) RC-delays. To reduce $\sigma_{wINL}$, all three error sources might need to be addressed.

| Error source                   | Affected performance parameter(s)                  |
|--------------------------------|----------------------------------------------------|
| Quantization noise             | $\sigma_q$                                         |
| Device mismatch                | $\sigma_{wINL}, \sigma_{qDNL}$                     |
| RC-delays                      | $\sigma_{wINL}, \sigma_{qDNL}, \Delta t_{TS}$      |
| Power supply and thermal noise | $\sigma_{wINL}, \sigma_{qDNL}, \sigma_{noise}$     |
| Inter-channel crosstalk        | $\Delta t_{CT}$                                    |
| PVT variations                 | $\Delta t_{TS}$                                    |
| Reference clock jitter         | $\sigma_{ref}$                                     |

Table 3.1: Common error sources in TDC design.

#### 3.2.2 Quantization Noise

The quantization error can be kept reasonably small by designing a system with small LSB sizes. For a system with a 5 ps LSB size, the rms time resolution due to quantization noise is at best $5\,\mathrm{ps}/\sqrt{12}$, as given by the equation below.

$$\sigma_q = \frac{LSB}{\sqrt{12}} = \frac{5\,\mathrm{ps}}{\sqrt{12}} = 1.44\,\mathrm{ps}\text{-rms}$$

However, at some point other error sources become dominant, making a further reduction of the LSB size unnecessary.
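The $LSB/\sqrt{12}$ limit can be checked numerically with a short Monte Carlo simulation of an ideal quantizer. This is a sketch under stated assumptions: hits are uniform over an arbitrary 1 ns window, each hit is reconstructed at the center of its bin, and the function name `quantization_rms` is hypothetical.

```python
import math
import random


def quantization_rms(lsb_ps, t_range_ps=1000.0, n_hits=100_000, seed=0):
    """Monte Carlo estimate of the rms quantization error of an ideal TDC.

    Hits are uniform in [0, t_range); each is reconstructed at the center of its bin.
    """
    rng = random.Random(seed)
    sq_sum = 0.0
    for _ in range(n_hits):
        t = rng.uniform(0.0, t_range_ps)
        code = math.floor(t / lsb_ps)   # ideal quantizer output
        t_hat = (code + 0.5) * lsb_ps   # reconstructed time (bin center)
        sq_sum += (t - t_hat) ** 2
    return math.sqrt(sq_sum / n_hits)
```

For a 5 ps LSB the estimate lands close to 1.44 ps-rms; doubling the LSB doubles the error, which is why the LSB size is the first knob to turn before other error sources take over.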

#### 3.2.3 Device Mismatch

At the level of sub 10 ps-rms time resolution, the size of one LSB is substantially smaller than the propagation delay of a circuit. Device mismatches cause deviations in propagation delay that can lead to large time variations. To illustrate this effect, let us assume a buffer with a propagation delay of 100 ps. Assuming a standard deviation of 3 %, propagation delay variations of up to 9 ps need to be expected for a $3\sigma$ design. For small LSB sizes, this might lead to missing codes or substantially larger LSB sizes. These variations in LSB size manifest themselves as integral non-linearity (INL) and differential non-linearity (DNL) errors, as illustrated in figure 3.21. DNL errors affect the extended quantization noise term $\sigma_{qDNL}$, whereas INL errors lead to constant time shifts, represented by $\sigma_{wINL}$. If measured in advance, INL errors can be corrected offline, whereas DNL errors cannot.



Figure 3.21: Illustration of the INL and DNL error respectively.

For hits uniformly distributed in time, larger bins collect more hits. Because of their larger LSB size, hits falling into these bins also suffer a larger quantization error, so the overall quantization error increases. If mismatch is present in the system, the standard deviation of the quantization error therefore needs to be calculated from the real LSB sizes, taking into account the relative probability $p_i = LSB_i/T_{ref}$ of each bin to receive a hit:

$$\sigma_{qDNL} = \sqrt{\sum_{i=0}^{N-1} \left(\frac{LSB_i}{\sqrt{12}}\right)^2 \cdot p_i} \tag{3.2}$$

In a similar manner, the standard deviation of the INL can be represented by the root-mean-square sum of the INL error of each specific bin, also weighted by its relative probability $p_i$:

$$\sigma_{wINL} = \sqrt{\sum_{i=0}^{N-1} \left( INL_i - \overline{INL} \right)^2 \cdot p_i}, \quad \text{with} \quad INL_i = \frac{\Delta LSB_i}{2} + INL_{i-1}, \tag{3.3}$$

where  $\overline{INL}$  represents the mean INL error,  $\Delta LSB_i$  represents the deviation from the mean LSB size and  $INL_i$  the accumulated time error with  $INL_{-1} = 0$  ps.
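Equations 3.2 and 3.3 can be evaluated directly from a list of measured bin widths $LSB_i$, for example obtained from a code density test. The sketch below makes three assumptions that should be checked against the actual measurement: the $N$ bins together span $T_{ref}$ (so $p_i = LSB_i/\sum LSB$), $\overline{INL}$ is the plain arithmetic mean of the $INL_i$, and the INL recursion is implemented exactly as written in equation 3.3. The function names are hypothetical.

```python
import math


def sigma_qdnl(lsb_ps):
    """Eq. (3.2): quantization error with real bin widths, weighted by p_i = LSB_i / T_ref."""
    t_ref = sum(lsb_ps)  # the N bins together span one reference period
    return math.sqrt(sum((w / math.sqrt(12)) ** 2 * (w / t_ref) for w in lsb_ps))


def sigma_winl(lsb_ps):
    """Eq. (3.3) as written: INL_i = dLSB_i / 2 + INL_{i-1}, with INL_{-1} = 0 ps."""
    t_ref = sum(lsb_ps)
    mean_lsb = t_ref / len(lsb_ps)
    inl, acc = [], 0.0
    for w in lsb_ps:
        acc += (w - mean_lsb) / 2  # dLSB_i: deviation from the mean LSB size
        inl.append(acc)
    mean_inl = sum(inl) / len(inl)  # assumption: plain arithmetic mean of INL_i
    return math.sqrt(sum((x - mean_inl) ** 2 * (w / t_ref) for x, w in zip(inl, lsb_ps)))
```

For equal 5 ps bins, `sigma_qdnl` reduces to $5\,\mathrm{ps}/\sqrt{12}$ and `sigma_winl` is zero. Alternating 4 ps and 6 ps bins keep the same mean LSB but raise `sigma_qdnl` to about 1.53 ps, because the wider bins both collect more hits and quantize them more coarsely.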

#### 3.2.4 PVT Variations

Time offsets are proportional to the total propagation delay of timing-critical signals and can vary substantially with process-voltage-temperature (PVT) variations. Such variations can easily introduce time offsets of 50 % or more. For fine-time resolution measurements, compensation techniques are therefore generally required to avoid large timing errors. For example, an external reference clock with a fixed frequency can be used to compensate for timing offset errors due to PVT variations.

Assuming a buffer with a propagation delay of 100 ps, time variations of 50 ps need to be expected. Depending on where they occur, these variations can manifest themselves either as non-linearity errors that change the LSB size, affecting $\sigma_{wINL}$ and $\sigma_{qDNL}$, or as global timing offsets, represented by $\Delta t_{TS}$. Both effects are shown schematically in figure 3.22.

Mathematically, local time variations need to be accounted for in the same way as device mismatches. Global variations, in contrast, can most often be approximated by a linear function, in which case stating a coefficient such as 2 ps/mV or 1 ps/°C is sufficient. Of the global variations, only voltage and temperature change during operation. Process variations and constant time offsets between the event signal and the reference signal can be measured and corrected prior to operation.
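A linear global model of this kind splits naturally into a static part, measured once before operation, and a drift part driven by the voltage and temperature deviations seen during operation. The sketch below assumes the example coefficients quoted above (2 ps/mV and 1 ps/°C) and a sign convention in which the modeled shift is subtracted from the raw timestamp; both the coefficients and the function names are hypothetical.

```python
def global_shift_ps(delta_v_mv, delta_temp_c, k_v_ps_per_mv=2.0, k_t_ps_per_c=1.0):
    """Linear model of a global timing offset (Delta t_TS) versus supply and temperature.

    The default coefficients are the illustrative values from the text; real
    values are design specific. Deltas are taken relative to the conditions at
    which the static offset was calibrated.
    """
    return k_v_ps_per_mv * delta_v_mv + k_t_ps_per_c * delta_temp_c


def correct_timestamp_ps(t_measured_ps, static_offset_ps, delta_v_mv, delta_temp_c):
    """Remove the offset measured prior to operation plus the modeled V/T drift."""
    return t_measured_ps - static_offset_ps - global_shift_ps(delta_v_mv, delta_temp_c)
```

With these coefficients, a 10 mV supply change combined with a 5 °C temperature rise already shifts all timestamps by 25 ps, several times the targeted LSB, which is why such drifts must be either controlled or corrected.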



Figure 3.22: Illustration of local and global delay variations respectively.

#### 3.2.5 Power Supply Noise

The sources of power supply noise can be broken down into two effects, as depicted in figure 3.23: noise introduced by the external power source, and noise caused by supply voltage drops due to the switching activity of the circuit itself. The noise introduced by the external imperfections can be modeled as a noise source in series with the power supply.



Figure 3.23: Power supply noise sources.

Ultimately, power supply noise modulates the switching time of the circuit, introducing timing errors; it can be modeled as a shift in threshold voltage, denoted as $\Delta V_{th}$. As depicted in figure 3.24, for a given change in threshold voltage, steeper signal edges experience smaller timing variations.
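For a linear signal edge, the timing error caused by a threshold shift is simply $\Delta t = \Delta V_{th} / (dV/dt)$. The sketch below assumes an ideal linear ramp; the function name and the example swing, threshold shift and transition times are hypothetical.

```python
def supply_noise_timing_error_ps(delta_vth_mv, swing_mv, transition_time_ps):
    """Timing error caused by a switching-threshold shift on a linear edge.

    Assumes a linear ramp of amplitude swing_mv over transition_time_ps, so the
    slew rate is swing / transition time and dt = dV_th / slew rate.
    """
    slew_mv_per_ps = swing_mv / transition_time_ps
    return delta_vth_mv / slew_mv_per_ps
```

For a 10 mV threshold shift on a 1.2 V swing, a 50 ps edge gives about 0.42 ps of timing error, while a 200 ps edge gives four times as much; the error scales linearly with the transition time.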

Figure 3.24: Illustration of timing error introduced by power supply noise modeled as threshold voltage shifts.

Depending on the nature of the timing errors introduced on the signal, they manifest themselves in different ways. Power supply noise synchronous with the system clock, as shown in figure 3.25a, changes the LSB of the different bins by the same amount every clock cycle. This error ultimately shows up as a non-linearity error, increasing $\sigma_{qDNL}$ and $\sigma_{wINL}$. In contrast, random asynchronous noise randomly changes the size of the bins and manifests itself as jitter in the system, expressed as $\sigma_{noise}$, as shown in figure 3.25b.



Figure 3.25: LSB variation for synchronous and asynchronous power supply noise.

Thermal noise also deteriorates the time resolution of the TDC introducing timing jitter in the final measurement. If uncorrelated to other sources it can be taken into account by adding it in a root-mean-square sense to the final time resolution.

Mathematically, jitter due to power supply and thermal noise can be expressed on a per-bin basis by

$$\sigma_{noise_i} = \lim_{M \to \infty} \sqrt{\frac{1}{M} \sum_{n=0}^{M-1} (t_n^{ideal} - t_n^{real})^2}$$

where $t_n$ represents the $n^{th}$ minus-to-plus crossing of a signal and $M$ the number of cycles. The standard deviation of the jitter across all $N$ bins can be expressed as

$$\sigma_{noise} = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} \sigma_{noise_i}^2}.$$
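The two jitter expressions above translate directly into code: the per-bin value is the rms deviation of $M$ recorded crossings from their ideal positions (a finite-$M$ estimate of the limit), and $\sigma_{noise}$ is the rms over all bins. The sketch below also includes a hypothetical helper, `simulate_bin`, that generates crossings with assumed Gaussian jitter at a 25 ns period to exercise the estimator; none of these names come from the original design.

```python
import math
import random


def bin_jitter(t_ideal, t_real):
    """Per-bin rms jitter sigma_noise_i over M recorded crossings (finite-M estimate)."""
    m = len(t_ideal)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_ideal, t_real)) / m)


def total_jitter(per_bin_sigmas):
    """sigma_noise: rms of the per-bin jitter over all N bins."""
    return math.sqrt(sum(s * s for s in per_bin_sigmas) / len(per_bin_sigmas))


def simulate_bin(sigma_ps, period_ps=25_000.0, m=20_000, seed=0):
    """Hypothetical test helper: ideal crossings plus assumed Gaussian jitter."""
    rng = random.Random(seed)
    ideal = [n * period_ps for n in range(m)]
    real = [t + rng.gauss(0.0, sigma_ps) for t in ideal]
    return bin_jitter(ideal, real)
```

Because the bins combine in a root-mean-square sense, one noisy bin raises $\sigma_{noise}$ far less than the same jitter spread over all bins: `total_jitter([3.0, 4.0])` is about 3.54 ps, not 7 ps.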

#### 3.2.6 Inter-Channel Crosstalk

If multiple measurements are performed on a multi-channel TDC, two distinct channels can easily influence one another. Two different error sources causing inter-channel crosstalk can be identified, as illustrated in figure 3.26: capacitive coupling between two adjacent lines and power supply voltage drops.



Figure 3.26: Crosstalk induced timing variations. (b) Power supply drop induced crosstalk.

As an example, the graph of figure 3.27 illustrates the delay variations of a signal running in parallel to an aggressor line over a distance of $300\,\mu\text{m}$ with a spacing of $0.4\,\mu\text{m}$. The capacitance between the wires is also shown. For the test setup, signal transition times of 50 ps are employed. Longer signal transition times introduce larger time shifts. For a constant time difference between victim and aggressor line, a constant time offset is introduced, as represented by $\Delta t_{CT}$. For signals randomly spaced in time, inter-channel crosstalk adds up as jitter in the system and depends on the exact distribution of events.



Figure 3.27: Time shifts introduced on a victim line due to capacitive coupling of two adjacent lines. (b) Observed time shifts on victim line.

#### 3.2.7 RC-Delays

Due to the degradation of signal transition times, as depicted in figure 3.28, the RC-delay behavior of a wire can introduce time delays. Such delays can cause global or local time shifts and need to be accounted for in $\sigma_{qDNL}$, $\sigma_{wINL}$ and $\Delta t_{TS}$ accordingly. For example, a signal distributed over a 300 $\mu$m line with intermediate capacitive loads can, due to its RC-delay behavior, experience a total delay difference of several ps between the start and the end point of the line.
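A first-order way to estimate such start-to-end delay differences is the Elmore delay of a segmented RC line. The sketch below assumes a line split into identical RC sections, each node carrying the same capacitance (wire plus tap load), driven through a resistive driver; the function name and all numerical values in the example are hypothetical and not taken from a specific technology.

```python
def elmore_tap_delays_ps(n_seg, r_seg_ohm, c_node_f, r_drv_ohm):
    """Elmore delay from the driver to each tap of a uniformly segmented RC line.

    The line is modeled as n_seg identical RC sections; c_node_f is the total
    capacitance hanging on each section node (wire + tap load). For nodes j, k
    the shared path resistance is r_drv + r_seg * (min(j, k) + 1).
    """
    delays = []
    for k in range(n_seg):
        tau_s = sum(c_node_f * (r_drv_ohm + r_seg_ohm * (min(j, k) + 1)) for j in range(n_seg))
        delays.append(tau_s * 1e12)  # seconds -> ps
    return delays
```

For example, ten sections of 12 Ω with 11 fF per node behind a 200 Ω driver give a first-to-last tap difference of $r_{seg} C_{node} \sum_j j \approx 5.9$ ps. The driver resistance shifts all taps equally and therefore cancels in the difference, so it contributes to $\Delta t_{TS}$ but not to the tap-to-tap skew.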

#### 3.2.8 Time Reference Jitter

For fine-time measurements, it is crucial to provide a good time reference to the TDC. Any jitter on the time reference signal shows up in the measurement, deteriorating the TDC's performance. If a closed-loop approach is employed to control the LSB size of the system, the reference jitter might be influenced by the transfer function of the loop. In principle, any external uncertainties coming from sources other than the TDC itself should be excluded from the characterization of the TDC. However, in a real set-up it is basically impossible to disentangle reference jitter from other jitter sources present in the system. A minimum amount of jitter, represented by $\sigma_{ref}$, is therefore always included in the timing precision specification of the TDC.

Figure 3.28: Delay offset shifts due to the RC-delay of a wire.

## **References Chapter 3**

- [36] D. Schwartz, E. Charbon, and K. L. Shepard, "A single-photon avalanche diode array for fluorescence lifetime imaging microscopy," *Solid-State Circuits, IEEE Journal of*, vol. 43, no. 11, pp. 2546–2557, 2008.
- [37] A. Yousif and J. Haslett, "A fine resolution TDC architecture for next generation PET imaging," *Nuclear Science*, *IEEE Transactions on*, vol. 54, no. 5, pp. 1574– 1582, 2007.
- [38] A. Mantyniemi, T. Rahkonen, and J. Kostamovaara, "A CMOS time-to-digital converter (TDC) based on a cyclic time domain successive approximation interpolation method," *Solid-State Circuits, IEEE Journal of*, vol. 44, no. 11, pp. 3067– 3078, 2009.
- [39] R. Rashidzadeh, R. Muscedere, M. Ahmadi, and W. Miller, "A delay generation technique for narrow time interval measurement," *Instrumentation and Measurement, IEEE Transactions on*, vol. 58, no. 7, pp. 2245–2252, 2009.
- [40] G. A. Rinella, M. Fiorini, P. Jarron, J. Kaplon, A. Kluge, E. Martin, M. Morel, M. Noy, L. Perktold, and K. Poltorak, "The TDCpix readout

### 3. THE ESSENCE OF TIME-TO-DIGITAL CONVERTER DESIGN

ASIC: A 75 ps resolution timing front-end for the Gigatracker of the NA62 experiment," *Physics Procedia*, vol. 37, no. 0, pp. 1608 – 1617, 2012, proceedings of the 2nd International Conference on Technology and Instrumentation in Particle Physics (TIPP 2011). [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1875389212018779

- [41] M. Lee and A. Abidi, "A 9 b, 1.25 ps resolution coarse-fine time-to-digital converter in 90 nm CMOS that amplifies a time residue," *Solid-State Circuits, IEEE Journal* of, vol. 43, no. 4, pp. 769–777, 2008.
- [42] M. Zanuso, P. Madoglio, S. Levantino, C. Samori, and A. Lacaita, "Time-to-digital converter for frequency synthesis based on a digital bang-bang DLL," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 57, no. 3, pp. 548–555, 2010.
- [43] H. P. Inc., "Time interval averaging," *Application note 162-1*. [Online]. Available: http://hpmemory.org/an/pdf/an\_162-1.pdf
- [44] R. Staszewski, S. Vemulapalli, P. Vallur, J. Wallberg, and P. Balsara, "1.3 v 20 ps time-to-digital converter for frequency synthesis in 90-nm CMOS," *Circuits and Systems II: Express Briefs, IEEE Transactions on*, vol. 53, no. 3, pp. 220–224, 2006.
- [45] P. Dudek, S. Szczepanski, and J. Hatfield, "A high-resolution CMOS time-todigital converter utilizing a vernier delay line," *Solid-State Circuits, IEEE Journal* of, vol. 35, no. 2, pp. 240–247, 2000.
- [46] J.-P. Jansson, A. Mantyniemi, and J. Kostamovaara, "A CMOS time-to-digital converter with better than 10 ps single-shot precision," *Solid-State Circuits, IEEE Journal of*, vol. 41, no. 6, pp. 1286–1296, 2006.
- [47] J. Yu, F. F. Dai, and R. Jaeger, "A 12-bit vernier ring time-to-digital converter in 0.13 μm CMOS technology," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 4, pp. 830–842, 2010.
- [48] K. Shimizu, M. Kaneta, H. Lin, H. Kobayashi, N. Takai, and M. Hotta, "Timeto-digital-converter with small circuitry," in *Design Automation Conference*, 2009. ASP-DAC 2009. Asia and South Pacific, 2009, pp. 109–110.

- [49] A. Liscidini, L. Vercesi, and R. Castello, "Time to digital converter based on a 2dimensions vernier architecture," in *Custom Integrated Circuits Conference*, 2009. *CICC '09. IEEE*, 2009, pp. 45–48.
- [50] P. Chen, C.-C. Chen, and Y.-S. Shen, "A low-cost low-power CMOS time-todigital converter based on pulse stretching," *Nuclear Science, IEEE Transactions* on, vol. 53, no. 4, pp. 2215–2220, 2006.
- [51] T. Nakura, S. Mandai, M. Ikeda, and K. Asada, "Time difference amplifier using closed-loop gain control," in VLSI Circuits, 2009 Symposium on, 2009, pp. 208– 209.
- [52] S. Mandai, T. Iizuka, T. Nakura, M. Ikeda, and K. Asada, "Time-to-digital converter based on time difference amplifier with non-linearity calibration," in ESS-CIRC, 2010 Proceedings of the, 2010, pp. 266–269.
- [53] Y.-H. Seo, J.-S. Kim, H.-J. Park, and J.-Y. Sim, "A 0.63 ps resolution, 11 b pipeline TDC in 0.13 μm CMOS," in VLSI Circuits (VLSIC), 2011 Symposium on, 2011, pp. 152–153.
- [54] K. Minami, M. Mizuno, H. Yamaguchi, T. Nakano, Y. Matsushima, Y. Sumi, T. Sato, H. Yamashida, and M. Yamashina, "A 1 GHz portable digital delay-locked loop with infinite phase capture ranges," in *Solid-State Circuits Conference*, 2000. Digest of Technical Papers. ISSCC. 2000 IEEE International, 2000, pp. 350–351.
- [55] M. Mota and J. Christiansen, "A high-resolution time interpolator based on a delay locked loop and an rc delay line," *Solid-State Circuits, IEEE Journal of*, vol. 34, no. 10, pp. 1360–1366, 1999.
- [56] V. Kratyuk, P. Hanumolu, K. Ok, U.-K. Moon, and K. Mayaram, "A digital PLL with a stochastic time-to-digital converter," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 56, no. 8, pp. 1612–1621, 2009.
- [57] T. Knotts, D. Chu, and J. Sommer, "A 500 MHz time digitizer IC with 15.625 ps resolution," in Solid-State Circuits Conference, 1994. Digest of Technical Papers. 41st ISSCC., 1994 IEEE International, 1994, pp. 58–59.

- [58] M. Straayer and M. Perrott, "An efficient high-resolution 11-bit noise-shaping multipath gated ring oscillator tdc," in VLSI Circuits, 2008 IEEE Symposium on, 2008, pp. 82–83.
- [59] M.-W. Chen, D. Su, and S. Mehta, "A calibration-free 800 MHz fractional-N digital PLL with embedded TDC," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 12, pp. 2819–2827, 2010.
- [60] S. Henzler, S. Koeppe, D. Lorenz, W. Kamp, R. Kuenemund, and D. Schmitt-Landsiedel, "Variation tolerant high resolution and low latency time-to-digital converter," in *Solid State Circuits Conference*, 2007. ESSCIRC 2007. 33rd European, 2007, pp. 194–197.
- [61] E. Raisanen-Ruotsalainen, T. Rahkonen, and J. Kostamovaara, "A high resolution time-to-digital converter based on time-to-voltage interpolation," in *Solid-State Circuits Conference*, 1997. ESSCIRC '97. Proceedings of the 23rd European, 1997, pp. 332–335.
- [62] R. Bassini, C. Boiano, S. Brambilla, and M. Malatesta, "An eight-channel time-to-digital converter on a VME board," *Nuclear Instruments and Methods* in *Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 340, no. 3, pp. 580 – 583, 1994. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0168900294901414
- [63] J. Wu, Y. Shi, and D. Zhu, "A low-power wave union TDC implemented in FPGA," Journal of Instrumentation, vol. 7, no. 01, p. C01021, 2012. [Online]. Available: http://stacks.iop.org/1748-0221/7/i=01/a=C01021
- [64] M. Straayer and M. Perrott, "A multi-path gated ring oscillator TDC with firstorder noise shaping," *Solid-State Circuits, IEEE Journal of*, vol. 44, no. 4, pp. 1089–1098, 2009.
- [65] M. Zanuso, S. Levantino, A. Puggelli, C. Samori, and A. Lacaita, "Time-to-digital converter with 3-ps resolution and digital linearization algorithm," in *ESSCIRC*, 2010 Proceedings of the, 2010, pp. 262–265.

# High-Resolution, Multi-Channel Time-to-Digital Converter: A Proposal

On a system level, all components, from the sensor down to the time-to-digital converter (TDC), influence the timing performance of an application. For very high resolution measurements in the 10 ps-rms domain, the TDC represents a fundamental and critical building block. In contrast to other applications such as all-digital phase-locked loops (ADPLLs), on-chip characterization or laser ranging, high-energy physics (HEP) experiments rely on high-precision, single-shot time measurements with multiple channels integrated on the same chip. Consequently, the design of a TDC for HEP needs to address both the difficulties of fine-time resolution design and the specific needs of the HEP community.

The previous chapter presented time-to-digital converter architectures and discussed the challenges faced in the fine-time resolution domain. In this chapter, I concentrate on the development of a suitable architecture serving the demanding needs of the HEP community. First, I identify the special requirements set by the HEP community and address important features required on a system level. Afterwards, I discuss the proposed architecture with a special focus on high-resolution measurements for highly integrated designs. Finally, key circuit requirements for a successful implementation are discussed. The following chapter then develops a transistor-level implementation of the proposed architecture.

## 4.1 Needs in the High-Energy Physics Community

Achieving fine-time resolution measurements on a large scale is a non-trivial task that requires a great development effort. From one experiment to another, specifications such as time resolution, number of channels, dynamic range or power consumption can vary substantially. To share the development effort across different experiments, a flexible solution adapted to the needs of a larger range of experiments is highly appreciated by the community.

### 4.1.1 Time-to-Digital Converter Requirements

For fine-time resolution detector designs, the list below briefly summarizes the major requirements of a TDC suitable for a large set of new applications in the HEP community.

- high number of channels per ASIC: in the range of 32 to 128
- high single-shot time resolution: $< 5\,\mathrm{ps}$-rms up to a few hundreds of ps-rms (adjustable)
- measurements related to one common time reference: preferably 40 MHz (LHC clock)
- large dynamic range: to cover multiple LHC clock cycles
- power consumption in the range of a few watts per ASIC: significantly lower for lower time resolution modes
- flexible digital logic:
  - triggered and non-triggered readout with buffering
  - trailing, leading and time-over-threshold (TOT) measurements

**Multi-Channel** In HEP, it is crucial to build systems on a large scale employing hundreds or thousands of measurement channels. Ultimately, the number of channels per ASIC is dominated by packaging and PCB manufacturing costs, and smaller package types are usually more accessible for smaller experiments. Experiments generally consider a total of 32 to 128 channels per ASIC acceptable.

**Time-resolution** Time resolutions ranging from well below 10 ps-rms for very demanding experiments up to several tens of ps have to be covered by state-of-the-art detector designs. Especially if the design is to be used by a large set of different customers, the flexibility to adjust the timing performance precisely to the system's needs, and in turn profit from lower power consumption, smaller data words or a larger dynamic range, is highly appreciated by the experiments.

**Common-Time Reference** In high-energy physics, measurements often need to be correlated across a large physical area. This usually requires all measurements to be related to one common time reference, typically provided by a reference signal distributed across the system. In HEP experiments, this reference is often derived from the 40 MHz LHC clock. A system that only measures the relative position of two signals is therefore often found to be unsuitable.

**Dynamic-Range** Whereas some experiments need to distinguish particles in a rather narrow time window of a few ns, others need to identify interesting events later in time, requiring several hundreds or thousands of ns of dynamic range. This requires the TDC to provide dynamic ranges on the order of $\mu$s.

**Power Consumption** As any material placed in the path of a particle crossing the detector negatively affects the measurements, only a minimum amount of cabling and cooling should go into the detector. Lower-power TDC designs allow thinner cables and less cooling material to be installed. To limit the cooling effort, a maximum power of a few watts per ASIC is envisaged. This is a rather vague requirement that changes with the time resolution and the number of channels integrated per ASIC. To allow experiments with less stringent timing requirements to profit from lower power consumption, a TDC design offering lower power consumption at lower time resolutions is required.

**Digital Logic** Last but not least, flexible digital logic to retrieve the measurement data is essential for a generic TDC design. Different measurement modes (trailing, leading and TOT) as well as different readout capabilities (triggered, non-triggered) are required by the users.<sup>1</sup>

### 4.1.2 Fine-Time Resolution Aspects

To fulfill the needs of state-of-the-art sensor designs, time resolutions below 5 ps-rms need to be achieved. The main challenges of fine-time resolution design have been discussed in section 3.2. Whereas timing errors resulting from *quantization noise*, *device mismatch*, *PVT variations* and *RC-delays* can be evaluated and addressed rather accurately at design time, *power supply noise* and *inter-channel crosstalk* are in general much more difficult to quantify. In any case, fine precision time measurements require a clean time reference to be supplied to the system.

#### 4.1.2.1 Quantization Noise

The smaller the LSB size, the smaller the quantization noise. However, especially for small LSB sizes, other error sources become more and more dominant. A good rule of thumb is therefore to choose an LSB size on the order of the targeted rms time resolution, as proposed by equation 4.1. This keeps the contribution of the quantization error within reasonable limits ($1/\sqrt{12}$ times smaller). In our case, achieving sub 5 ps-rms time resolutions requires LSB sizes smaller than the gate delay of modern technologies.

$$\sigma_{TDC} \approx \text{LSB} \tag{4.1}$$

#### 4.1.2.2 Device Mismatch

In a most simple approach, the propagation delay of a cell can be modeled as

$$\tau = \frac{V_{osc} \cdot C_{eff}}{I_D} \tag{4.2}$$

where  $V_{osc}$  represents the voltage difference of the transition,  $C_{eff}$  represents the effective load capacitance of the charging node and  $I_D$  represents the charging current.

<sup>1</sup> The digital portion of the TDC ASIC is not within the scope of this thesis and is to be addressed in future work.

Assuming, to a first approximation, that the voltage $V_{osc}$ and the load capacitance $C_{eff}$ do not change, the expected timing variations can be written as

$$\sigma(\Delta \tau) = \tau \cdot \frac{\sigma(\Delta I_D)}{I_D} = \frac{V_{osc} \cdot C_{eff}}{I_D} \cdot \frac{\sigma(\Delta I_D)}{I_D}, \tag{4.3}$$

where $\frac{\sigma(\Delta I_D)}{I_D}$ represents the relative current mismatch between two identical cells.

That is, to keep the timing variations small, timing-critical circuits need to employ fast transition times and provide good current matching. In general, good current matching requires large overdrive voltages and wide devices, whereas fast transition times require small, i.e. minimum-length, devices. These relations are commonly expressed by the current mismatch, see equation 4.4, and by the transit frequency, see equation 4.5, of a cell, as discussed e.g. in [66].

$$\sigma\left(\frac{\Delta I_D}{I_D}\right) = \sqrt{\left[\frac{1}{\beta} \cdot \sigma(\Delta\beta)\right]^2 + \left[\frac{2}{V_{gs} - V_{th}} \cdot \sigma(\Delta V_{th})\right]^2} \tag{4.4}$$

$$f_T = \frac{g_m}{C_{gg}} = \frac{\mu \cdot (V_{gs} - V_{th})}{L} \tag{4.5}$$
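Equations 4.2 to 4.4 can be chained to get a quick feel for how overdrive voltage and device mismatch translate into delay spread. The sketch below simply evaluates the formulas; the function names and the example parameters (1 % $\beta$ mismatch, 5 mV $V_{th}$ mismatch, 0.2 V overdrive, 1.2 V swing, 20 fF, 200 µA) are hypothetical values chosen for illustration, not data from this design.

```python
import math


def relative_current_mismatch(sigma_beta_rel, sigma_vth_v, v_ov_v):
    """Eq. (4.4): sigma(dI_D/I_D) from beta and threshold-voltage mismatch."""
    return math.sqrt(sigma_beta_rel ** 2 + (2.0 / v_ov_v * sigma_vth_v) ** 2)


def delay_sigma_s(v_osc_v, c_eff_f, i_d_a, rel_mismatch):
    """Eq. (4.3): sigma(d tau) = tau * sigma(dI_D)/I_D, with tau = V_osc * C_eff / I_D (eq. 4.2)."""
    tau_s = v_osc_v * c_eff_f / i_d_a
    return tau_s * rel_mismatch
```

With these assumed values the threshold term dominates (5 % versus 1 %), giving about 5.1 % current mismatch; a 120 ps cell with 5 % mismatch then spreads by 6 ps. Doubling the overdrive halves the threshold term, which is the lever described in the text.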

If power consumption plays an important role, devices with small length <u>and</u> width, and hence larger mismatch, need to be employed, which in turn demands calibration mechanisms. In any case, to reduce the effect of mismatch, device mismatches need to be carefully taken into account early in the design process and considered already at the architectural level.

#### 4.1.2.3 PVT Variations

In fine-time resolution TDC designs, it is essential to implement calibration techniques that compensate for timing variations introduced by PVT variations. For a discussion of the delay control principles, the reader is referred to section 3.1.13. Calibration mechanisms that do not physically adjust the delay of a cell require dummy elements in order to also cover extreme cases. In a multi-channel TDC design, this would lead to a considerable implementation overhead. Conversely, an approach in which the delay is adjusted by a feedback system (e.g. a DLL) to match the desired LSB, here referred to as a direct control scheme, does not require any additional circuitry on a per-channel basis. The overhead of implementing a control loop such as a DLL is small and can be shared across all channels. However, the control structures usually slow down the delay cells artificially, leading to increased current consumption. Nevertheless, due to the reduced channel complexity, a direct control mechanism usually represents the preferred choice in multi-channel environments.

#### 4.1.2.4 Power Supply Noise

For fine-time resolution measurements, a low-noise power source as well as carefully designed off- and on-chip power distribution networks are crucial. Additionally, technology features such as triple-well or substrate isolation are highly advantageous for efficient power domain separation. As power supply noise is difficult to evaluate at design time, a 'silent' and 'robust' design is preferable. To reduce the amount of timing error introduced in the design, fast signal slopes are required. As illustrated in figure 4.1, configurations minimizing the total propagation delay of a cell are preferred for driving large capacitive loads, even though they introduce more signal transitions. However, due to the increased capacitive loads, more power needs to be invested. If a noise-sensitive circuit is to be integrated with noisy circuitry on the same chip, short propagation delays together with fast signal edges are preferred for good power supply noise robustness.

Figure 4.1: Transition times and signal slopes using different driving configurations to drive a large capacitive load: (a) single inverter example; (b) exponentially scaled inverter chain example.
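A common textbook way to dimension an exponentially scaled inverter chain is the logical-effort model, in which each of $N$ stages carries the same effort $f = F^{1/N}$ for a total fan-out $F = C_L/C_{in}$, and the delay per stage is $f + p$. The sketch below uses that model with an assumed parasitic delay $p = 1$; it is a standard sizing heuristic, not the sizing method used in this thesis, and the function names are hypothetical.

```python
def chain_delay(n_stages, fanout_total, parasitic=1.0):
    """Normalized delay of an N-stage tapered inverter chain (logical-effort model).

    Each stage has equal effort f = F**(1/N); the delay per stage is f + p in units of tau.
    """
    f = fanout_total ** (1.0 / n_stages)
    return n_stages * (f + parasitic)


def best_stage_count(fanout_total, parasitic=1.0, max_stages=20):
    """Number of stages minimizing the normalized chain delay."""
    return min(range(1, max_stages + 1), key=lambda n: chain_delay(n, fanout_total, parasitic))
```

For a load 256 times the input capacitance, a single inverter has a normalized delay of 257, whereas the optimum of four stages, each with a fan-out of 4, needs only 20. Because every stage then drives a load only a few times its own input capacitance, each edge along the chain stays sharp, which is the property exploited in figure 4.1b.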

#### 4.1.2.5 Inter-Channel Crosstalk

Inter-channel crosstalk can be introduced by power supply drops or capacitive coupling. Crosstalk due to capacitive coupling cannot be avoided at the architectural level and needs to be addressed during the layout phase. Increased distances as well as shielding are proper approaches to decrease or avoid coupling effects. To reduce noise due to power supply drops, it is crucial to dimension the power supply rails sufficiently and to use decoupling capacitors, keeping the influence of supply drops caused by current peaks small. However, the more dynamic current a system consumes during a measurement, the more noise is introduced onto the power supply rails, potentially affecting nearby channels. As the amount of power supply noise introduced by an architecture is difficult to predict at design time, architectures intrinsically exhibiting low dynamic power consumption can greatly contribute to reducing inter-channel crosstalk.

#### 4.1.2.6 RC-Delay

To avoid delay offsets due to the RC-delay of a wire, time-critical signals need to be distributed in a balanced manner. This can be achieved by building an H-tree-like structure, as depicted in figure 4.2. Such a tree-like structure adds capacitive loading to the signal but keeps the propagation delay paths of all signals equal. For large fan-out structures, this can lead to large capacitive loads that might require repeater structures to sustain sharp signal edges. Such repeaters need to be dimensioned so that they do not introduce timing offsets due to their own device mismatches. In general, the RC-delay of the routing structures of time-critical signals needs to be carefully evaluated during design time and addressed during the layout phase.
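The following sketch estimates the delay skew caused by unequal wire lengths, using the common distributed-RC approximation $t_{50\%} \approx 0.38\,R_w C_w$ for an unbuffered line. The per-unit-length resistance and capacitance are hypothetical values, not data of the 130 nm process used here, and driver and load effects are ignored.

```python
def wire_delay(length_um: float, r_per_um: float = 0.1, c_per_um: float = 0.2e-15) -> float:
    """50 % delay of a distributed RC line, approximately 0.38 * R_total * C_total.

    r_per_um (ohm/um) and c_per_um (F/um) are hypothetical defaults.
    """
    r_total = r_per_um * length_um
    c_total = c_per_um * length_um
    return 0.38 * r_total * c_total


def skew(lengths_um) -> float:
    """Largest delay difference between the listed routing paths."""
    delays = [wire_delay(length) for length in lengths_um]
    return max(delays) - min(delays)


if __name__ == "__main__":
    unbalanced = [250, 500, 750, 1000]  # paths of unequal length (um)
    h_tree = [1000, 1000, 1000, 1000]   # balanced paths: equal root-to-leaf length
    print(f"unbalanced skew: {skew(unbalanced) * 1e12:.2f} ps")
    print(f"H-tree skew:     {skew(h_tree) * 1e12:.2f} ps")
```

With these assumed wire parameters, a length difference of 500 µm already produces a skew of more than 5 ps, i.e. more than one target LSB, whereas equal path lengths cancel the skew by construction.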

### 4.1.3 Architectural Considerations

The choice of TDC architecture is a critical design decision that influences a large set of system-level characteristics. Besides the LSB size, system-level aspects such as calibration effort, noise susceptibility, inter-channel crosstalk, synchronization, dynamic range, power consumption or dead-time can be greatly influenced by the choice of architecture.

## 4. HIGH-RESOLUTION, MULTI-CHANNEL TIME-TO-DIGITAL CONVERTER: A PROPOSAL

Figure 4.2: Distribution of time critical signals.

#### 4.1.3.1 Fine-Time Generator Sharing

In a multi-channel system, the smallest LSB can be generated either locally or globally. The two approaches are shown in figure 4.3a and figure 4.3b, respectively. In a fully *local interpolation* approach, the time generator, the calibration circuit and the time-capture registers (TCRs) need to be implemented separately for each channel. In a *global interpolation* approach, however, the time generator and the calibration circuit can be implemented centrally and shared across all channels; only the TCRs need to be implemented on a per-channel basis. Generally speaking, a *global interpolation* approach requires that the event signal is not delayed to generate the smallest LSB size. If the event signal needs to be delayed, a *local interpolation* approach has to be pursued. The choice between local and global generation influences a large number of system-level aspects, as discussed in the following.

**Device Mismatch** Especially in a multi-channel system, calibration can represent a substantial effort. In a *local interpolation* approach, calibration to remove device mismatches needs to be applied on a per-channel basis, greatly increasing the total number of registers required to store the calibration values. Conversely, if a *global implementation* approach is pursued, calibration can be applied centrally, greatly reducing the number of registers on a chip. With a *global implementation* approach, however, the TCRs need to be designed to provide good intrinsic device matching. For the implementation of a multi-channel TDC, a *global interpolation* approach represents the first choice due to its reduced channel complexity and its reduced testing time.

Figure 4.3: Illustration of (a) a local implementation approach and (b) a global implementation approach.
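To illustrate the difference in calibration storage, the sketch below counts calibration bits under purely hypothetical assumptions: 64 channels, 32 adjustable timing signals, 4 trim bits per signal and, for a segment-based variant such as the one proposed later in this chapter, 8 segments.

```python
def calibration_bits(n_units: int, n_signals: int, bits_per_signal: int) -> int:
    """Bits needed if every unit stores one trim word per timing signal."""
    return n_units * n_signals * bits_per_signal


if __name__ == "__main__":
    CHANNELS, SEGMENTS = 64, 8   # hypothetical
    SIGNALS, TRIM_BITS = 32, 4   # hypothetical
    local = calibration_bits(CHANNELS, SIGNALS, TRIM_BITS)        # one set per channel
    per_segment = calibration_bits(SEGMENTS, SIGNALS, TRIM_BITS)  # one set per segment
    central = calibration_bits(1, SIGNALS, TRIM_BITS)             # single shared generator
    print(f"local: {local}, per segment: {per_segment}, central: {central} bits")
```

The storage scales with the number of units that carry their own calibration, which is why moving calibration away from the individual channels pays off quickly as the channel count grows.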

**Power Supply Noise Sensitivity** In a *local interpolation* approach, the event signal usually needs to be delayed additionally by the TDC architecture to generate the finest LSB size. Due to the event signal's increased propagation delay, this potentially degrades the TDC's timing performance. For a general-purpose fine-time resolution TDC with a large amount of digital logic on the same die, power supply noise is very difficult to address during design time, and the final timing performance can often be evaluated only very late in the design process. To reduce the risk of suffering from power supply noise in a late development phase, a less noise-sensitive architecture following a *global interpolation* approach represents the preferred solution.


**Power Consumption** In a *local interpolation* scheme, the finest LSB usually only needs to be generated on the arrival of an event. This allows the architecture to be active only while processing an event, potentially reducing its overall power consumption. In a *global approach*, by contrast, the timing generator needs to be active all the time, increasing its power consumption. However, the power invested in the timing generator is shared among all channels. For a large number of channels per ASIC, the per-channel share of the central interpolator's power becomes nearly negligible, although power still needs to be invested to distribute the timing signals to the respective channels. Although a local scheme may save power at low hit rates, the continuous power consumption of a *global interpolation* scheme is usually not a major limitation. Owing to its reduced sensitivity to power supply noise and inter-channel crosstalk, a *global implementation* approach represents the first choice for the implementation of a multi-channel TDC.
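The amortization argument can be sketched with a simple per-channel power model, assumed here for illustration: the central interpolator power is shared by all channels, while the distribution power is counted per channel. Both power values are hypothetical.

```python
def power_per_channel(p_interpolator: float, p_dist_per_channel: float, n_channels: int) -> float:
    """Per-channel power of a global scheme: shared interpolator plus own distribution share."""
    return p_interpolator / n_channels + p_dist_per_channel


def interpolator_share(p_interpolator: float, p_dist_per_channel: float, n_channels: int) -> float:
    """Fraction of the per-channel power that is due to the central interpolator."""
    shared = p_interpolator / n_channels
    return shared / (shared + p_dist_per_channel)


if __name__ == "__main__":
    P_INT, P_DIST = 50e-3, 1e-3  # hypothetical: 50 mW interpolator, 1 mW distribution per channel
    for n in (1, 8, 64, 1024):
        print(f"{n:5d} channels: {power_per_channel(P_INT, P_DIST, n) * 1e3:6.2f} mW/channel, "
              f"interpolator share {interpolator_share(P_INT, P_DIST, n):.1%}")
```

Under these assumptions, the interpolator dominates for a handful of channels, while for about a thousand channels its share drops below 5 % and the distribution power becomes the relevant term.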

**Inter-Channel Crosstalk** In a *local interpolation* approach, the finest LSB size is usually generated only on the arrival of an event to reduce power consumption. In this case, however, additional power needs to be invested while processing an event, potentially influencing nearby channels. In contrast, a *global interpolation* approach operates continuously and requires only a minimum amount of circuitry to switch on the arrival of an event, potentially reducing the risk of inter-channel crosstalk. If the power consumption benefit is abandoned, e.g. by employing fully differential structures, a *local interpolation* approach is not expected to suffer from increased inter-channel crosstalk due to power supply coupling. In this case, however, no benefit over a global approach is achieved, as additional circuitry and power need to be invested on a per-channel basis.

### 4.1.4 Multi-Stage

As described in section 3.1.11, a multistage approach relaxes the matching requirements on the finest interpolating stage. A multistage concept is not only highly advantageous with respect to device mismatches but also has a positive effect on the dead-time of an architecture.

**Dead-Time** In general, dead-time is not a critical requirement for most state-of-the-art detector designs and is usually limited by the time it takes to buffer the latched data to free the channel. Most often, dead-times of a few ns are considered sufficient. However, as detector designs move to ever faster timing signals, there is an increasing need to measure events in close succession, in the ns regime. Architectures such as SAR or those based on a TA principle add dead-time to the system, which is proportional to the dynamic range of the finest structure. A multistage approach extends the dynamic range of the architecture at a higher hierarchy level, which reduces the dynamic range required of the finest level and effectively leads to shorter dead-times.
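A minimal sketch of the dead-time argument, assuming, as stated above, that the conversion time of the finest stage is proportional to the range it has to cover. The proportionality factor `k` (for example a time-amplification gain) and the 640 ps full range are hypothetical.

```python
def finest_stage_dead_time(full_range: float, coarse_steps: int, k: float) -> float:
    """Dead-time of the finest stage when a coarser stage splits full_range into coarse_steps.

    coarse_steps = 1 models a single-stage architecture.
    """
    finest_range = full_range / coarse_steps
    return k * finest_range


if __name__ == "__main__":
    FULL_RANGE, K = 640e-12, 20.0  # hypothetical
    single = finest_stage_dead_time(FULL_RANGE, 1, K)
    multi = finest_stage_dead_time(FULL_RANGE, 32, K)
    print(f"single stage: {single * 1e9:.2f} ns, two stages: {multi * 1e9:.2f} ns")
```

In this model, the dead-time of the finest stage shrinks by exactly the number of coarse steps placed above it.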

#### 4.1.4.1 LSB Auto-Adjustment

In high energy physics, a common reference clock is often distributed across the system to synchronize it to one common time reference. In TDC applications, this reference can be used to compensate timing shifts due to PVT variations and to set the LSB size of the system to a fixed fraction of the reference signal's period. For such a feature to work, however, the LSB size must be adjusted automatically down to the finest level of resolution. Such a feature can also be exploited to reduce the power consumption of an architecture.

**Power vs. Resolution** The power consumption of a continuously operated TDC architecture mainly depends on the reference clock frequency. To a first approximation, the consumed power is given by

$$P = V_{DD}^2 \cdot f_{ref} \cdot C_{eff}. \tag{4.6}$$

where $C_{eff}$ denotes the effective switched capacitance. Lower reference frequencies result in fewer switching events and thus in lower power consumption. However, since the full cycle of the reference clock still has to be covered, larger LSB sizes are generated, trading time precision for lower power consumption.

The power consumption of the TDC is mostly determined by the frequency of its reference signal. If the LSB size is defined solely by the input reference signal, larger LSB sizes, and with them lower-power designs, are obtained simply by decreasing the reference frequency.

**Dynamic Range** Changing the input frequency does not change the number of bits resolved by a given TDC architecture. Thus, a lower input frequency also leads to an increased dynamic range.
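Equation 4.6 and the fixed number of resolved bits can be combined into a small operating-point calculator. It is a sketch under assumed values: $V_{DD} = 1.2$ V, a hypothetical $C_{eff} = 20$ pF, 7 fine-time bits (the interpolation factor of 128 used later in this chapter) and a hypothetical 12-bit coarse counter.

```python
VDD = 1.2         # V, assumed supply voltage
C_EFF = 20e-12    # F, hypothetical effective switched capacitance
FINE_BITS = 7     # interpolation factor 2**7 = 128
COARSE_BITS = 12  # hypothetical coarse counter width


def operating_point(f_ref: float):
    """Return (LSB, power, dynamic range) for a reference frequency f_ref in Hz."""
    t_ref = 1.0 / f_ref
    lsb = t_ref / 2 ** FINE_BITS               # LSB is a fixed fraction of T_ref
    power = VDD ** 2 * f_ref * C_EFF           # equation (4.6)
    dynamic_range = 2 ** COARSE_BITS * t_ref   # coarse counter counts whole periods
    return lsb, power, dynamic_range


if __name__ == "__main__":
    for f in (1.5625e9, 781.25e6, 390.625e6):
        lsb, p, dr = operating_point(f)
        print(f"f_ref = {f / 1e6:8.2f} MHz: LSB = {lsb * 1e12:5.1f} ps, "
              f"P = {p * 1e3:5.1f} mW, range = {dr * 1e6:5.2f} us")
```

Halving the reference frequency halves the power, while both the LSB size and the dynamic range double, since the number of resolved bits stays fixed.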

#### 4.1.4.2 Event Signal Characteristics

In high energy physics, recorded events cannot be reproduced deterministically and have to be captured precisely with a single measurement. Repetitive measurements can therefore not be employed to improve the TDC's time resolution. This renders some TDC architectures, developed for specific applications such as test and measurement systems or all-digital PLLs, unsuitable.

## 4.2 Proposed Architecture

A schematic sketch of the developed architecture, which fulfills the requirements set by the HEP community, is shown in figure 4.4. The presented architecture has been reported in [67]. Considering the given constraints of high time resolution and moderate power consumption, a global time generator scheme is the preferred choice for implementing a high-resolution, multi-channel TDC, owing to its reduced calibration effort and its increased robustness against power supply noise and inter-channel crosstalk. The architecture is based on a central *fine-time interpolator* responsible for generating the so-called *fine-time code* of the system. A *coarse counter* counts completed clock cycles to extend the dynamic range of the interpolator. Both the *fine-time interpolator* and the *coarse counter* need to be implemented only once per ASIC. The *distribution buffers* together with the TCRs are referred to as the *channel-matrix*.

In the proposed architecture, all measurements are referred to the reference clock, allowing multiple TDCs to be synchronized to one common reference. The fine-time code and the coarse counter state are distributed to the respective channels. To sustain sharp signal edges and to counteract the RC-delay of long wires, multiple channels are grouped into segments, each served by so-called *distribution buffers*. These buffers need to be dimensioned strong enough to drive the total capacitive load of one line and its attached circuits. At the level of the *distribution buffers*, bin adjustment features are implemented to compensate for device mismatches introduced by the *fine-time interpolator* and by the *distribution buffers* themselves. No calibration is required on a per-channel basis. The architecture provides small signal propagation delays for both the *fine-time code* signals and the event signals. It runs continuously, covering the complete reference signal cycle, so that the event signal can arrive at any time and is not restricted to a given time window. Due to its continuous operating principle, the proposed architecture allows for 'silent' operation with only a minimum of dynamic power being consumed.

**Figure 4.4:** Proposed multi-channel TDC architecture. The architecture addresses the difficulties experienced in the sub 5 ps-rms resolution domain and meets the needs set by the HEP community.

### 4.2.1 Sub-Gate Delay Resolutions - Concept Choice

A wide variety of concepts to implement a TDC has been presented in section 3.1. Dynamic range limitations are often cited as an argument against one architecture or another. However, a multi-stage approach as presented in section 3.1.11 can overcome these limitations, greatly increasing the range of possible candidates. Table 4.1 serves as guidance for the following discussion.

Table 4.1: Summary of *fine-time interpolator* concepts. Rows with the architecture name in bold represent possible candidates.

| Architecture           | Sub-gate LSB | Interpolator sharing | LSB auto-adjust  | Single-shot      | Power             |
|------------------------|--------------|----------------------|------------------|------------------|-------------------|
| Counter                | no           | yes                  | yes              | yes              | +                 |
| Delay-Line             | no           | yes                  | yes              | yes              | ~                 |
| Vernier-Delay          | yes          | no                   | yes              | yes              | n.a. <sup>a</sup> |
| **Array-Delay**        | yes          | yes                  | yes              | yes              | -                 |
| Cross-Coupled TA       | yes          | no                   | yes <sup>b</sup> | yes              | n.a. <sup>a</sup> |
| Capacitive Scaling     | yes          | yes                  | no               | yes              | ~                 |
| RC-Delay               | yes          | yes                  | no               | yes              | +                 |
| **Resistive Int.**     | yes          | yes                  | yes              | yes              | +                 |
| **Multipath Int.**     | yes          | yes                  | yes              | yes              | -                 |
| Time to Volt.          | yes          | no                   | yes <sup>b</sup> | yes              | n.a. <sup>a</sup> |
| SAR                    | yes          | no                   | yes              | yes              | +                 |
| Stochastic             | yes          | yes                  | no               | yes              | -                 |
| WaveUnion              | yes          | no                   | yes <sup>b</sup> | yes <sup>c</sup> | n.a. <sup>a</sup> |

<sup>a</sup>not applicable

<sup>b</sup>requires a calibration mechanism based on measurement results

<sup>c</sup>requires complex digital logic to analyze the captured data, adding dead-time

In principle, all architectures presented in section 3.1, except the counter and the delay-line principles, can be employed to achieve sub-gate delay LSB sizes. However, not all TDC architectures can be implemented following a global implementation approach. For example, in a vernier delay-line approach, both the reference and the event signal are delayed to resolve the smallest LSB size. In such a case, at least some of the LSB generation steps have to be accomplished locally on a per-channel basis. For the proposed architecture, only concepts that can be implemented centrally represent a suitable choice. Further, to allow power consumption to be traded off against LSB size, architectures need to be able to adjust their LSB size down to the finest step. This renders concepts such as the RC-delay line or concepts based on stochastic approaches unsuitable candidates for the proposed architecture. Last but not least, only architectures providing good time resolution with a single measurement are considered. Architectures relying on repetitive characteristics of the event signal or requiring multiple measurements to improve resolution are improper choices.

This leaves three possible candidates, as indicated by the highlighted rows of table 4.1. In architectures employing an *array of delay-lines*, an interleaved structure has to be built to resolve smaller LSB sizes. Each delay step requires one additional element plus a dummy structure at the end of the delay-line. This makes such a concept rather power hungry and difficult to lay out symmetrically. In *interpolation* architectures, the finest LSB size is generated hierarchically and derived from its preceding stage, which simplifies a clean layout. In an active interpolation approach, all delay steps are generated by active devices, considerably increasing the TDC's power consumption. Due to its lower-power operation, a passive realization is preferred. For the presented TDC architecture, this makes *interpolation* based on resistive division the first choice among the presented architectures.

### 4.2.2 Fine-Time Interpolator Structure

To achieve sub-gate delay resolutions together with a large dynamic range, a multistage approach, as presented in [68], is pursued. In total, $N \cdot M$ timing signals are generated, where $N$ represents the interpolation ratio of the first stage and $M$ the interpolation ratio of the second stage. In the first stage, a delay-locked loop (DLL) is employed to generate $N$ uniformly distributed signals. The delay of the delay-line buffers is adjusted by the feedback loop so that the sum of all element delays within the loop precisely matches one reference clock period $T_{ref}$. In locked condition, the delay of each buffer equals $T_{ref}/N$ and is controlled solely by the input frequency of the DLL. The second stage is based on a sub-gate delay concept. It takes the signals generated in the first stage as inputs to generate finer delayed signals, allowing LSBs as small as $T_{ref}/(N \cdot M)$. The set of all these signals is referred to as the *fine-time code* of the system.
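The two-stage generation of the fine-time code can be sketched with ideal, mismatch-free timing. The assumption here is that the second stage places its $M$ signals at equal fractions between two adjacent DLL taps, which is how an ideal passive interpolator would behave; the function name is hypothetical.

```python
def fine_time_code(t_ref: float, n: int, m: int) -> list:
    """Ideal edge times (s) of the N*M fine-time code signals within one period.

    Stage 1 (DLL): N taps, locked so that tap k fires at k * T_ref / N.
    Stage 2: M equally spaced edges between tap k and tap k+1.
    """
    taps = [k * t_ref / n for k in range(n + 1)]  # tap N coincides with tap 0 of the next period
    edges = []
    for k in range(n):
        for j in range(m):
            edges.append(taps[k] + j * (taps[k + 1] - taps[k]) / m)
    return edges


if __name__ == "__main__":
    T_REF, N, M = 640e-12, 32, 4  # values discussed in this section
    code = fine_time_code(T_REF, N, M)
    steps = [b - a for a, b in zip(code, code[1:])]
    print(len(code), min(steps) * 1e12, max(steps) * 1e12)  # 128 signals, ~5 ps spacing
```

In this ideal picture every edge is spaced by $T_{ref}/(N \cdot M)$; real mismatches in either stage would show up as non-uniform steps, which is what the adjustment features discussed later have to correct.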

To reduce the amount of propagation delay within the loop, it is advantageous to run the DLL at a very high frequency: the higher the DLL clock, the fewer delay stages are required to achieve the same LSB size. In other words, operating the DLL at a higher clock frequency reduces the dynamic range the interpolator has to cover, leaving the remaining range to be resolved by the coarse counter. At this early stage of development, no suitable PLL circuit is available to generate high clock frequencies internally. For practical purposes, to avoid potential difficulties with high-frequency signals off-chip, the DLL clock is preferably kept below 2 GHz. To achieve the envisaged 5 ps LSBs, a possible choice of interpolation factors is N = 32 and M = 4. This combination requires a DLL clock frequency of 1.5625 GHz to generate 5 ps LSB sizes. Other choices are less favorable. To ease digital coding later in the system, interpolation factors that are powers of two are preferred. To minimize device mismatch and decrease the noise sensitivity of the DLL, it is advantageous to choose the highest possible DLL frequency. Considering that the achievable gate delay is limited to approximately 20 ps, choosing N = 32 and M = 4 represents a reasonable choice. It allows the DLL to run at high frequencies while keeping interpolation factors that are integer powers of two.

**Figure 4.5:** A multi-stage *Fine-Time Interpolator*. In the $1^{st}$ stage, a DLL is used to generate delays in the order of the gate delay of the technology. In the $2^{nd}$ stage, passive interpolation is employed to achieve sub-gate delay LSB sizes.
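The selection rules listed above can be written down as a small search. This is a sketch of the reasoning, not the design flow used in this work; the tie-break that prefers the smallest second-stage factor $M$ is an assumption motivated by keeping the fine second stage simple.

```python
LSB = 5e-12                # target LSB (s)
F_MAX = 2e9                # keep the DLL clock below 2 GHz
MIN_BUFFER_DELAY = 20e-12  # approximate achievable gate delay (s)


def candidates(max_exp: int = 10):
    """All (f_ref, N, M) with power-of-two N and M that meet the constraints above."""
    result = []
    for a in range(max_exp + 1):
        for b in range(max_exp + 1):
            n, m = 2 ** a, 2 ** b
            t_ref = n * m * LSB     # T_ref = N * M * LSB
            buffer_delay = m * LSB  # equals T_ref / N
            if 1.0 / t_ref <= F_MAX and buffer_delay >= MIN_BUFFER_DELAY * (1 - 1e-9):
                result.append((1.0 / t_ref, n, m))
    return result


def best_choice():
    """Highest DLL frequency first; among ties, the smallest 2nd-stage factor M."""
    return max(candidates(), key=lambda c: (c[0], -c[2]))


if __name__ == "__main__":
    f_ref, n, m = best_choice()
    print(f"N = {n}, M = {m}, f_ref = {f_ref / 1e9:.4f} GHz")
```

The 2 GHz limit forces $N \cdot M \geq 128$, the 20 ps gate delay forces $M \geq 4$, and the highest remaining frequency with the smallest $M$ is the combination N = 32, M = 4 at 1.5625 GHz discussed above.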

## 4.3 Architectural Features Summary

The proposed architecture has been developed to closely match the requirements set by the HEP community. The focus was on achieving very good time resolution, also with a view to a future stand-alone TDC ASIC with a large portion of digital logic integrated on the die. The proposed architecture allows multiple channels to be integrated on a single chip and allows device mismatches to be calibrated out with reasonable effort on a per-segment level.

**Number of Channels** Due to the architecture's modular channel layout, the number of channels can easily be adjusted to match the final requirements. Several channels are grouped into one segment. The number of channels within one segment depends on the strength of the distribution buffers and on the total parasitic and lumped capacitance connected to the line. Adding more segments does not degrade the LSB size, since any variation introduced by another segment can be compensated for each segment individually. However, the more segments are implemented, the more propagation delay is added to the *fine-time code* signals, potentially introducing jitter due to power supply noise for segments further away.

**Power Consumption and LSB size** In the proposed architecture, the fine-time generation is performed globally and thus only needs to be provided once for all the channels integrated in an ASIC. No power for generating timing signals needs to be invested on a per-channel basis, and the passive division concept further reduces the power consumption. The system's LSB size is derived solely from the reference signal's period and can be adjusted by changing the reference frequency. An interpolation factor of 128, or equivalently 7 bits, is resolved by the *fine-time interpolator*. Operating the interpolator with a 1.5625 GHz reference clock results in 5 ps LSB sizes. The power consumption of the timing generator and the distribution network scales nearly linearly with the input clock frequency, since a lower clock frequency lowers the rate at which internal nodes are charged and discharged. Additionally, the power-efficient resistive voltage division employed in the *fine-time interpolator* reduces its power contribution per channel.

**PVT control** The proposed architecture uses a DLL in the first stage, keeping the generated delays stable over PVT variations. As the delay generation of the second stage depends on the delays generated in the first stage, the delays of the second stage are intrinsically held stable over PVT variations as well. This mechanism makes the LSB dependent only on the external reference timing signal, which can be of very high quality. Residual timing offset errors are expected to come from the different I/Os and from the global distribution of the timing signals, including the timing offsets resulting from the *distribution buffers*. Nevertheless, the total delays of the timing reference signals and of the hit signal can be closely matched. This causes similar absolute timing offsets, reducing global timing variations.
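The PVT-compensation property can be illustrated with an idealized, discrete-time behavioral model of the DLL, assumed here for illustration only: after each reference period, the loop corrects the buffer delay in proportion to the remaining phase error. Whatever delay a process or temperature corner produces initially, the loop settles to $T_{ref}/N$, and an ideal second stage inherits $T_{ref}/(N \cdot M)$. The loop gain, the corner delays and the number of periods are hypothetical.

```python
def dll_lock(t_ref: float, n: int, d_init: float, gain: float = 0.2, periods: int = 200) -> float:
    """Buffer delay after `periods` updates of an ideal first-order DLL model.

    error > 0 means the N-stage line is faster than one reference period.
    Each update removes the fraction `gain` of the remaining error.
    """
    d = d_init
    for _ in range(periods):
        error = t_ref - n * d
        d += gain * error / n
    return d


if __name__ == "__main__":
    T_REF, N, M = 640e-12, 32, 4
    for corner, d0 in (("fast", 12e-12), ("typical", 18e-12), ("slow", 28e-12)):
        d = dll_lock(T_REF, N, d0)
        print(f"{corner:8s}: buffer {d * 1e12:.3f} ps, LSB {d / M * 1e12:.3f} ps")
```

A real DLL compares phases with a detector and charge pump and must avoid false locking; this model only captures the property that the locked delay depends on $T_{ref}$ and not on the starting corner.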

**Power-Supply Noise** If not properly designed, jitter due to power supply noise can easily be introduced onto timing-critical signals. The jitter introduced is proportional to the total propagation delay of timing-critical signals and increases for slower signal slopes. In the proposed architecture, timing-critical signals exhibit short propagation delays to reduce power supply noise sensitivity. The longest delay is applied by the DLL to the input reference signal, which is delayed by at most $1/f_{ref}$. To keep signal edges fast throughout the *channel matrix*, intermediate repeater structures are employed to distribute the *fine-time interpolator* signals. Although the timing generator and the corresponding *distribution buffers* consume a substantial amount of power, the noise they introduce onto the power rails is largely negligible: the switching of the signals is evenly distributed in time, leading to a nearly constant power consumption. However, to avoid large voltage peaks on the supply, the readout logic needs to be well separated from timing-sensitive circuitry.

**Inter-Channel Crosstalk** The proposed architecture generates all timing signals continuously. Due to the intrinsically low circuit activity on the arrival of an event, jitter resulting from inter-channel crosstalk is not expected to cause considerable time-resolution degradation. Only the TCRs and the event-signal buffers draw current on the arrival of an event.

**Device-Mismatch** In the proposed architecture, several channels are grouped into segments, and device mismatches are adjusted on a per-segment basis. Device mismatches are calibrated by adjusting the respective propagation delays of the fine-time signals, a feature foreseen to be integrated in the *distribution buffers*. This avoids the need to integrate calibration mechanisms for each single channel. In turn, this requires well-matched TCRs.

**Global Time Reference** All time measurements are referred to one common reference signal. This allows absolute time measurements as well as time-over-threshold measurements of a single signal by performing two distinct measurements. As all measurements are referred to the external reference signal, the same reference can be used to keep several TDCs synchronized.

**Dynamic Range** Technically, the dynamic range of the fine-time interpolator is limited to one reference period, $1/f_{ref}$. However, an on-chip coarse counter extends the dynamic range, which is then limited only by the number of bits of the counter. Off-chip, the dynamic range can be extended further, in principle indefinitely. Within a limited range, a lower reference signal frequency can also be used to increase the dynamic range of the architecture.
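A sketch of how a coarse-counter value and a 7-bit fine-time code could be combined into one timestamp, and how a time-over-threshold follows from two such timestamps. The concatenated code format, the helper names and the 4-bit counter width in the example are hypothetical; the modulo operation tolerates a single wrap-around of the coarse counter between the two edges.

```python
FINE_BITS = 7  # 128 fine-time bins per reference period


def to_code(coarse: int, fine: int) -> int:
    """Concatenate coarse count and fine-time bin into one integer code."""
    assert 0 <= fine < (1 << FINE_BITS)
    return (coarse << FINE_BITS) | fine


def code_difference_seconds(lead: int, trail: int, coarse_bits: int, t_ref: float) -> float:
    """Time between two codes, e.g. a time-over-threshold (trail - lead)."""
    total_bits = coarse_bits + FINE_BITS
    diff = (trail - lead) % (1 << total_bits)  # handles one counter wrap-around
    return diff * t_ref / (1 << FINE_BITS)


if __name__ == "__main__":
    T_REF, COARSE_BITS = 640e-12, 4  # 5 ps LSB; hypothetical 4-bit coarse counter
    lead, trail = to_code(15, 120), to_code(1, 8)  # counter wrapped from 15 to 1
    print(f"ToT = {code_difference_seconds(lead, trail, COARSE_BITS, T_REF) * 1e12:.0f} ps")
```

Because both edges are referred to the same reference clock, the difference of the two codes directly gives the pulse width, and a wider coarse counter simply enlarges the modulus.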

### 4.3.1 Key Design Aspects

For a successful transistor-level implementation of the proposed architecture, three critical blocks have been identified; they are discussed in the following.

#### 4.3.1.1 Adjustment Feature

To compensate for delay mismatches of the fine-time generation block, an adjustment feature is foreseen at the level of the *distribution buffers*. This feature needs to compensate delay variations in the ps domain. If large device mismatches are to be expected, compensating the resulting INL errors requires an adjustment feature offering both a wide range and small adjustment steps. To keep the complexity of the adjustment feature within reasonable limits, device mismatches of the interpolator need to be carefully evaluated and reduced to a reasonable level during design time.

#### 4.3.1.2 Fast Delay Cell

For the generation of the fine-time code signals, the reference clock needs to be delayed by at most $T_{ref} - LSB$. To keep the propagation delay of these signals small, the interpolator is required to run at high signal frequencies. Following the discussion of section 4.2.2, this requires propagation delays as small as 20 ps in the first stage. These delays need to be guaranteed for different operating conditions, covering process, voltage and temperature variations. For comparison, the propagation delay of a symmetrical minimum-length inverter with W/L = 10 operated at a 1.2 V supply voltage in a commercial 130 nm technology is in the order of 25 ps. This figure includes neither a control structure to adjust the delay of the cell, which usually increases the propagation delay, nor inter-cell signal routing. In the end, very fast delay buffers in the DLL are required for the proposed architecture to achieve 5 ps LSB sizes. Such buffers have been designed to achieve nominal gate delays as small as 16 ps.

#### 4.3.1.3 Fast Time-Capture Registers

In the proposed architecture, no per-channel calibration is required; instead, corrections are applied to a group of channels (i.e. a segment). For this concept to work, the intrinsic matching of the TCRs has to be good enough not to deteriorate the TDC's timing performance. This requires larger-width devices in the TCRs to improve device matching. However, larger-width devices lead to increased power consumption. For a power-efficient design, the TCR matching needs to be tailored to the targeted timing performance of the TDC. In the end, a trade-off between power consumption and device mismatch has to be made.
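The matching and power trade-off can be quantified with Pelgrom's area law, $\sigma(\Delta V_T) = A_{VT}/\sqrt{WL}$. The sketch below additionally assumes, as a first-order simplification, that a threshold mismatch translates into a timing offset through the slew rate of the sampled edge, $\sigma_t \approx \sigma(\Delta V_T)/SR$. The Pelgrom coefficient, the device length and the slew rate are hypothetical values, not parameters of the process used in this work.

```python
import math

A_VT = 3.5e-9  # V*m (3.5 mV*um), hypothetical Pelgrom coefficient


def sigma_vt(width: float, length: float) -> float:
    """Standard deviation of the threshold mismatch of a device pair (Pelgrom)."""
    return A_VT / math.sqrt(width * length)


def width_for_sigma(target_sigma: float, length: float) -> float:
    """Device width needed to reach target_sigma at a given length."""
    return (A_VT / target_sigma) ** 2 / length


def timing_sigma(width: float, length: float, slew_rate: float) -> float:
    """First-order timing offset spread for an edge with slew_rate (V/s)."""
    return sigma_vt(width, length) / slew_rate


if __name__ == "__main__":
    L = 0.13e-6  # hypothetical device length (m)
    SR = 40e9    # hypothetical edge slew rate: 1.2 V in 30 ps (V/s)
    for w in (1e-6, 4e-6, 16e-6):
        print(f"W = {w * 1e6:4.0f} um: sigma_VT = {sigma_vt(w, L) * 1e3:5.2f} mV, "
              f"sigma_t = {timing_sigma(w, L, SR) * 1e12:5.3f} ps")
```

Each halving of the mismatch requires four times the gate area, and the gate capacitance, and with it the switching power of the TCRs, grows accordingly.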

## **References Chapter 4**

- [66] W. Sansen, Analog design essentials, ser. Analog circuits and signal processing series. Springer, 2006, no. Bd. 1.
- [67] L. Perktold and J. Christiansen, "A high time-resolution (< 3 ps-rms) time-to-digital converter for highly integrated designs," in *Instrumentation and Measurement Technology Conference (I2MTC), 2013 IEEE International*, 2013.
- [68] L. Perktold and J. Christiansen, "A flexible 5 ps bin-width timing core for next generation high-energy-physics time-to-digital converter applications," in *Ph.D. Research in Microelectronics and Electronics (PRIME), 2012 8th Conference on*, 2012, pp. 1–4.

## 5 Demonstrator ASIC

A demonstrator application specific integrated circuit (ASIC) of the proposed architecture has been designed and fabricated in a commercial 130 nm technology. To thoroughly evaluate the potential of the proposed architecture, the time interpolator, the distribution buffers and a total of 8 channels have been implemented. The purpose of the demonstrator is to demonstrate the architecture's timing performance and to gain experience for the future development of a full time-to-digital converter (TDC) ASIC for next-generation high energy physics (HEP) experiments.

In the previous chapter, I developed a suitable architecture closely matched to the requirements set by the high-energy physics community. This chapter focuses on the transistor-level implementation of the proposed architecture. After a brief description of the top-level implementation, circuit diagrams, design trade-offs and simulation results of the respective blocks of the architecture are presented and discussed. A discussion of the expected performance of the demonstrator circuit closes this chapter. In the following chapter, measurement results of the fabricated demonstrator ASIC are presented and compared to the expected performance.

## 5.1 Demonstrator Architecture

A photograph of the demonstrator ASIC is shown in figure 5.1. The central fine-time code generator, the distribution buffers with their calibration feature and a total of 8 channels have been implemented. A simple serial readout scheme is employed to retrieve the captured data of the TDC. A serial shift register, loaded at power-up, is used to configure the TDC and to program its calibration feature. The demonstrator die measures $2431\,\mu m \times 1517\,\mu m$, whereas the active area, excluding the input/output (I/O) pad ring, measures $1520\,\mu m \times 850\,\mu m$. In total, 75 wire-bond pads for signal I/O and power are placed around the perimeter of the chip. Parts of the work presented here have been published in [69] and [70].



Figure 5.1: Microphotograph of the demonstrator ASIC wire-bonded to the PCB.

## 5.2 Central Fine-Time Interpolation

The central fine-time interpolator constitutes the core of the architecture and is designed to generate least-significant bit (LSB) sizes ranging from 20 ps down to 5 ps, corresponding to reference clock frequencies from approximately 390 MHz up to 1.56 GHz. The LSB size is given by the following relation:

$$LSB = \frac{T_{ref}}{128} \tag{5.1}$$

where  $T_{ref}$  represents the reference signal's period.

A block level diagram of the interpolator is shown in figure 5.2. The propagation delay of the delay buffers is adjusted by the feedback operation of the loop and set by $V_{ctrl}$. Due to its relatively long delay, the delay-line represents one of the most critical signal paths of the design: when operated at 1.5625 GHz, a delay of up to 640 ps is added to the fine-time code signals. To reduce the jitter introduced by power supply noise, the delay-locked loop (DLL) is built fully differentially. However, to reduce power consumption, the output signals of the DLL are distributed in a single-ended manner. This is a suitable choice, as the propagation delay introduced by distributing the signals to the channels can be kept relatively small. To allow the input signals to settle to their final values and to provide equal loading for the cells, dummy elements have been placed before and after the DLL. The low-level control voltage of the cell is generated by a dedicated bias generator. As the loop feedback voltage ($V_{ctrl}$) is referenced to the upper supply voltage, an inversion is required within the control loop to provide an overall negative feedback. Outputs 0 through 31 are distributed to the channels. For timing performance reasons, the stronger negative edge is used for distribution across the *channel matrix*.

Figure 5.2: Block level diagram of the fine-time generator implementation.

### 5.2.1 Fast Delay Buffer

A schematic diagram of the delay buffer is shown in figure 5.3. The buffer is based on a Maneatis delay cell as reported in [71]. The inputs in+ and in- of the delay buffer are connected to the outputs out+ and out- of its preceding cell. The propagation delay of the cell can be adjusted by means of $V_{ctrl}$. Figure 5.4 shows the bias circuit that generates the lower-level control voltage of the cell. A current-gain operational transconductance amplifier (OTA) output stage is employed to generate a single-ended version of the signal, which connects to the second stage.



Figure 5.3: High speed delay buffer implementing an additional zero in the signal path to reduce the effective capacitance seen at the output. A single-ended output is used to make the connection to the  $2^{nd}$  stage of the interpolator.

To reduce the complexity of the fine-resolution $2^{nd}$ stage, short cell propagation delays in the $1^{st}$ stage of the interpolator are essential. The total delay of a cell can be very roughly estimated by

$$\tau \approx \frac{V_{osc} \cdot C_{eff}}{2 \cdot I_D} \tag{5.2}$$

where $V_{osc}$ represents the oscillation voltage, $C_{eff}$ the total capacitive load at the output and $I_D$ the current defined by VBN. This relation is only meant to give a rough estimate of the cell's propagation delay. In reality, $C_{eff}$ depends on the output amplitude and is not decoupled to ground, the charging and discharging currents differ, and the signal crossing point might not lie exactly at $V_{DD}/2$.

Figure 5.4: Bias generator of the delay buffer cell.

In a first approximation, the propagation delay of the cell is proportional to

$$\tau \propto \frac{1}{V_{ctrl}}. \tag{5.3}$$

For a qualitative description of the cell's operation, the resistances connected to the gates of T4 & T5 are neglected. Transistors T2 & T3 work as switches, steering the current of the cell fully through either the left-hand or the right-hand branch of the circuit. The current available to discharge the capacitance $C_{eff}$ at the output is given by the tail current source $I_D$. VBN defines the total current flowing through the cell; even in steady state there is a constant current flow. The portion of the current flowing through the top diode-connected loads T4 & T7 defines the oscillation amplitude $V_{osc}$ of the cell. The larger this amplitude, the longer it takes to reach the cell's switching point. Table 5.1 lists the transistor dimensions of the cell.

At the highest DLL clock frequency, the propagation delay of the delay buffers must not exceed 20 ps. To a great extent, the propagation delay of the cell is defined by the current drive capability of the input devices T2 & T3. The shortest propagation delay is achieved when the lengths of these devices are set to their minimum, i.e. 120 nm. To guarantee that the tail current source T1 is not pushed into the linear region, the width of the input switches must be sufficiently large. In a first approximation, however, wider devices do not lead to shorter propagation delays.

| Device    | Width                   | Length                   |
|-----------|-------------------------|--------------------------|
| T1 & T13  | $18\,\mu\mathrm{m}$     | $0.6\,\mu\mathrm{m}$     |
| T2 & T3   | $4\,\mu\mathrm{m}$      | $0.12\,\mu\mathrm{m}$    |
| T5 & T6   | $4.5\,\mu\mathrm{m}$    | $0.12\,\mu\mathrm{m}$    |
| T4 & T7   | $3\,\mu\mathrm{m}$      | $0.12\,\mu\mathrm{m}$    |
| T8 & T9   | $3\,\mu\mathrm{m}$      | $0.12\,\mu\mathrm{m}$    |
| T10 & T11 | $1.36\,\mu\mathrm{m}$   | $0.12\,\mu\mathrm{m}$    |
| T12       | $7.5\,\mu\mathrm{m}$    | $0.12\,\mu\mathrm{m}$    |
| R         | $11\,\mathrm{k}\Omega$  | -                        |

Table 5.1: Device dimensions of the circuits shown in figures 5.3 and 5.4.

This is because wider devices increase the capacitive load seen by the driving cell, eliminating any speed improvement. In reality, however, wider devices, i.e. larger cell currents, reduce the relative contribution of the routing capacitances of the cell, effectively allowing shorter propagation delays to be achieved. An input device width of $4\,\mu m$ has been found to be a reasonable trade-off between propagation delay and current consumption. The top diode-connected loads T4 & T7 need to be designed so that the low level of the output signal is low enough to fully turn off the non-conducting branch of the subsequent cell. This requires rather small devices. Despite the lower carrier mobility of PMOS devices, widths in the range of $4\,\mu m$ represent a good choice. In a traditional Maneatis delay cell, the current through the top loads is shared equally between the current-source load and the diode-connected load. As the gate-source capacitances of the PMOS devices T4 & T7 are directly connected to the output and slow down the cell, smaller diode-connected loads are preferable. By making the current-source load PMOS devices 50% larger, a speedup of 11% can be achieved.

To further speed up the cell, a zero is added to the signal path using resistive peaking, as described in [72] for the implementation of 2-stage ring oscillators.

Figure 5.5: Half circuit equivalent of the delay buffer cell.

The equivalent half circuit of the cell is shown in figure 5.5. Neglecting channel length modulation effects, an output impedance of

$$Z_{out} = \frac{1}{gm_2} \cdot \frac{1 + sRC}{1 + s\frac{C}{gm_2}} \tag{5.4}$$

can be derived from its small-signal model. For $\frac{1}{RC} \ll s \ll \frac{gm_2}{C}$, the output impedance reduces to $\frac{sRC}{gm_2}$, making the load look like an inductor. This can be thought of as hiding the capacitance C during the switching cycle. To properly place the zero, the resistance R needs to be large enough to make the load inductive for the switching signal, but small enough to allow the diode to define the output signal's amplitude even for fast switching signals. In a first approximation, the location of the zero is matched to the fundamental frequency $f_{BW}$ of the output signal, approximated by $0.35/t_{rise}$ [73]. For a 20 ps delay buffer, a rise time of approximately 40 ps is required, equating to an $f_{BW}$ of 8.75 GHz.

From the post-layout extracted view, the parasitic capacitance $C = C_{wire} + C_{gg2}$ is estimated to be $2\,\text{fF} + 2.7\,\text{fF} = 4.7\,\text{fF}$. Setting $R = 1/(2\pi f_{BW} C)$, this equates to a required resistance of $3.9\,\text{k}\Omega$. As the estimate of the fundamental frequency is only approximate, a suitable resistance value of $11\,\text{k}\Omega$ was determined by simulation. With resistive peaking, the propagation delay of the cell is reduced by an additional 9%. The DE/SE converter is designed to provide a current drive strength similar to that of the main cell and uses short-length devices to reduce its propagation delay.
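The sizing argument above can be reproduced in a few lines of Python. The sketch computes $f_{BW}$ from the 40 ps rise time and solves $R = 1/(2\pi f_{BW} C)$ for the extracted 4.7 fF. It then evaluates eq. (5.4) at $f_{BW}$ with the simulated 11 kΩ to check that the load has a positive, i.e. inductive, phase there. The thesis does not state the value of the transconductance, so `gm2` is a hypothetical placeholder, and the function names are illustrative.

```python
import cmath
import math


def zero_resistance(t_rise_s: float, c_farad: float) -> tuple[float, float]:
    """Place the zero 1/(2*pi*R*C) at f_BW ~ 0.35 / t_rise; return (f_BW, R)."""
    f_bw = 0.35 / t_rise_s
    r = 1.0 / (2.0 * math.pi * f_bw * c_farad)
    return f_bw, r


def z_out(f_hz: float, gm2: float, r: float, c: float) -> complex:
    """Eq. (5.4): Z_out = (1/gm2) * (1 + sRC) / (1 + sC/gm2), with s = j*2*pi*f."""
    s = 2j * math.pi * f_hz
    return (1.0 / gm2) * (1 + s * r * c) / (1 + s * c / gm2)


c = 2e-15 + 2.7e-15                  # C = C_wire + C_gg2 from the extracted view
f_bw, r_est = zero_resistance(40e-12, c)
print(f"f_BW = {f_bw / 1e9:.2f} GHz, R = {r_est / 1e3:.2f} kOhm")

gm2 = 1e-3                           # HYPOTHETICAL transconductance, not from the thesis
z = z_out(f_bw, gm2, 11e3, c)        # R = 11 kOhm as chosen in simulation
print(f"|Z_out| = {abs(z):.0f} Ohm, phase = {math.degrees(cmath.phase(z)):.1f} deg")
```

For this placeholder `gm2`, $f_{BW}$ lies between $1/(2\pi RC)$ and $gm_2/(2\pi C)$, which is the region where eq. (5.4) behaves inductively.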

### 5. DEMONSTRATOR ASIC



Figure 5.6: Simulated waveforms of the  $15^{th}$  element of the delay line.

The exact shape of the waveform propagating down the delay line for a 1.56 GHz signal after post-layout extraction is depicted in figure 5.6. Both the fully differential signals and the output of the DE/SE converter are shown. Depending on the transition direction, the delay introduced by the DE/SE conversion is found to be in the range of 60 to 75 ps. At a delay setting of 20 ps and a supply voltage of 1.2 V, the cell oscillates with an amplitude $V_{osc}$ of 790 mV and consumes a total current of 858 $\mu$A, including a contribution of 135 $\mu$A from the DE/SE converter. The capacitance seen at the differential output is estimated to be $C_{eff} = C_{wire} + C_{gd5/6} + C_{gd4/7} + C_{gg2/3} + C_{gg8/9} = 5.3\,\mathrm{fF} + 1.3\,\mathrm{fF} + 1\,\mathrm{fF} + 4.5\,\mathrm{fF} + 2.5\,\mathrm{fF} = 14.6\,\mathrm{fF}$. The delay curve for different operating conditions is shown in figure 5.7. Under nominal conditions, a propagation delay as low as 16 ps can be achieved. Within the indicated operating range, the delay is expected to lie between 12 ps and 23 ps.
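Plugging the values above into eq. (5.2) shows how coarse that estimate is. The sketch below makes one simplification that the text does not state: it treats all of the 858 µA, except the 135 µA DE/SE share, as the tail current $I_D$. Part of that current may in fact be consumed elsewhere, for example in the bias generator. That would lower $I_D$ and lengthen the estimate.

```python
def cell_delay_estimate(v_osc: float, c_eff: float, i_d: float) -> float:
    """Very rough cell delay of eq. (5.2): tau ~ V_osc * C_eff / (2 * I_D)."""
    return v_osc * c_eff / (2.0 * i_d)


# C_eff = C_wire + C_gd5/6 + C_gd4/7 + C_gg2/3 + C_gg8/9 (values from the text, in fF)
c_eff = sum([5.3, 1.3, 1.0, 4.5, 2.5]) * 1e-15
v_osc = 0.790                 # simulated oscillation amplitude at the 20 ps setting
i_d = 858e-6 - 135e-6         # ASSUMPTION: all current except the DE/SE share is tail current

tau = cell_delay_estimate(v_osc, c_eff, i_d)
print(f"C_eff = {c_eff * 1e15:.1f} fF, tau ~ {tau * 1e12:.1f} ps")
```

The result, roughly 8 ps, is about a factor of two below the simulated 16 ps. This is consistent with the caveats listed for eq. (5.2).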



Figure 5.7: Simulated buffer delay for different control voltages and operating conditions.

**Device Mismatch** In a first approximation, propagation delay variations are proportional to the relative change in charging current. For good current matching, as detailed in section 4.1.2.2, large overdrive voltages are required for the tail current source. However, large VBN voltages reduce the voltage swing available within the cell, which limits the tail current source to smaller $V_{gs} - V_{th}$ voltages. Nonetheless, $\sigma(\Delta\beta)$ and $\sigma(\Delta V_{th})$ are proportional to $1/\sqrt{WL}$, making large devices less sensitive to mismatch. Large dimensions of the tail current source T1 do not considerably slow down the cell, as its drain potential is held approximately constant during the switching cycle. With the given device sizes, the intrinsic current matching of a single cell is simulated to be 0.93%. Additional mismatch is added at the output of the cell by the NMOS switches and the top PMOS devices, which are designed for maximum speed. The 1-sigma standard deviation of the LSB size at the differential output of the cell across all 128 bins has been simulated to be 0.64 ps-rms, or equivalently 3.2%. At the output of the DE/SE converter, the 1-sigma variation increases to 2.0 ps-rms for the rising edge and 3.9 ps-rms for the falling edge.
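The two levers named above, overdrive voltage and device area, both appear in the standard square-law mismatch model combined with Pelgrom's area scaling. The sketch below is a generic textbook model, not the simulation setup of this work. The matching coefficients `A_VT` and `A_BETA` are hypothetical placeholders, not values of the technology used.

```python
import math


def rel_current_mismatch(a_vt: float, a_beta: float, w_um: float, l_um: float,
                         v_ov: float) -> float:
    """1-sigma relative drain-current mismatch of two identical saturated MOSFETs.

    Square-law model: (sigma_I/I)^2 = (sigma_beta/beta)^2 + (2*sigma_Vth/V_ov)^2,
    with Pelgrom area scaling sigma = A / sqrt(W*L).
    a_vt in V*um, a_beta as a fraction*um, dimensions in um, v_ov in V.
    """
    root_area = math.sqrt(w_um * l_um)
    sigma_vt = a_vt / root_area
    sigma_beta = a_beta / root_area
    return math.hypot(sigma_beta, 2.0 * sigma_vt / v_ov)


# HYPOTHETICAL matching coefficients, not taken from the thesis or a PDK
A_VT, A_BETA = 3.5e-3, 0.01

# Tail device T1 from table 5.1 (18 um x 0.6 um) at a few illustrative overdrives
for v_ov in (0.1, 0.2, 0.3):
    s = rel_current_mismatch(A_VT, A_BETA, 18.0, 0.6, v_ov)
    print(f"V_ov = {v_ov:.1f} V -> sigma(dI/I) = {100 * s:.2f} %")
```

Quadrupling the gate area halves the mismatch, and a larger overdrive suppresses the threshold-voltage term. This matches the trade-off against the limited voltage swing described above.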




Figure 5.8: Resistive voltage division principle.

### 5.2.2 Resistive Time Interpolation

To overcome the propagation delay limitation of the technology, a passive interpolation concept is employed. This work extends the passive interpolation concept presented in [74] by using a single-ended implementation incorporated into the feedback of a DLL, which automatically adjusts the LSB size in the presence of process-voltage-temperature (PVT) variations. This principle also allows the LSB size of the interpolator to be set to the desired value solely by varying the reference clock period.

The basic operating principle is depicted in figure 5.8. Buffers are employed to provide sufficient driving strength to the interpolator structure. The interpolation works as follows: at node A, the signal propagating through the DLL has reached its highest amplitude $V_{DD}$. This causes a current to flow across the resistive voltage divider towards its lowest potential, in this specific example from node A across the ladder to node D. Only when node B reaches the top supply voltage does it start contributing to the current flowing in the resistive ladder to generate the necessary voltage drop. For the principle to work, adjacent edges are required to overlap. That is, the output of the subsequent cell needs to start rising before its predecessor has reached the top supply voltage. The number of elements involved in the switching process depends on the slope of the signal. The buffers need to be strong enough to prevent large voltage drops due to their non-zero output resistance. At the same time, they must not be so strong that the edges no longer overlap.
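The role of overlapping edges can be illustrated with an idealized model. Two adjacent DLL outputs A and B, 20 ps apart, drive the ends of an unloaded divider with four equal sections, and the time at which each tap crosses $V_{DD}/2$ is located by bisection. This is a simplified sketch. It assumes ideal linear edges and neglects driver on-resistance, node capacitance and the contribution of further DLL nodes. The function names are illustrative.

```python
VDD = 1.2  # supply voltage in V (from the text)


def ramp(t: float, t0: float, t_slope: float) -> float:
    """Ideal linear edge that starts at t0 and reaches VDD after t_slope."""
    x = (t - t0) / t_slope
    return VDD * min(max(x, 0.0), 1.0)


def tap_voltage(t: float, k: int, taps: int, t_a: float, t_b: float, t_slope: float) -> float:
    """Voltage at tap k of an ideal, unloaded divider between nodes A and B."""
    a = k / taps
    return (1.0 - a) * ramp(t, t_a, t_slope) + a * ramp(t, t_b, t_slope)


def crossing_time(k: int, taps: int, t_a: float, t_b: float, t_slope: float) -> float:
    """Bisection for the first time tap k reaches VDD/2 (the tap voltage is monotonic)."""
    lo, hi = t_a, t_b + t_slope
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if tap_voltage(mid, k, taps, t_a, t_b, t_slope) < VDD / 2:
            lo = mid
        else:
            hi = mid
    return hi


def tap_steps(t_slope: float, dll_step: float = 20.0, taps: int = 4) -> list[float]:
    """Spacing (ps) between the VDD/2 crossings of consecutive taps."""
    times = [crossing_time(k, taps, 0.0, dll_step, t_slope) for k in range(taps + 1)]
    return [b - a for a, b in zip(times, times[1:])]


print("overlapping edges (120 ps):", [round(s, 2) for s in tap_steps(120.0)])
print("non-overlapping edges (10 ps):", [round(s, 2) for s in tap_steps(10.0)])
```

With 120 ps edges, the slope chosen in this design, the taps are spaced by exactly one quarter of the DLL step (5 ps). With 10 ps edges that do not overlap, the spacing becomes strongly non-uniform. In this ideal model, uniform spacing holds whenever the edge duration is at least twice the DLL step.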

The schematic components of the interpolator structure are depicted in figure 5.9.

Despite its simplicity, the dimensions of the circuit have a great influence on the performance of the design. To increase the driving strength of the signals generated by the $2^{nd}$ interpolator stage, buffers are connected to the outputs of the resistive ladder, and these buffers load each node capacitively. This introduces an additional delay that modulates the LSB sizes of the interpolator. In a first approximation, to avoid strong RC delay effects, the Elmore delay of the resistive ladder needs to be small compared to the envisaged LSB size. Mathematically, this can be expressed by

$$\tau_{elmore} = \frac{RC \cdot N^2}{2} \cdot \left(\frac{N+1}{N}\right) \ll LSB \tag{5.5}$$

where R represents the resistance of a single segment, C the capacitive load attached to each node, and N the number of RC elements. The P+ doped poly resistor has been found to be the best candidate for implementing the resistive ladder, owing to its good matching performance, low parasitic capacitance and low absolute tolerance. Its dimensions will be discussed shortly. The buffer attached to each output node is shown in figure 5.9c. Its dimensions have been optimized to reduce the capacitive load attached to the nodes of the resistive ladder. On average, an equivalent parasitic capacitance of 13.4 fF is connected to each node. This capacitance is composed of the circuit and routing capacitances as well as the parasitic capacitance of the resistor network itself. With a segment resistance of only $20\,\Omega$, the Elmore delay already equates to 5.36 ps. This leads to a trade-off between the linearity of the interpolator and its power consumption. To reduce the power consumed by the drivers of the resistive ladder, the resistances connecting adjacent nodes need to be kept reasonably large. This keeps the current required to generate a certain voltage drop small. Resistances in the range of $35\,\Omega$ have been found to represent a good trade-off. To compensate for the introduced RC delays, a non-linearly scaled resistive ladder has been implemented with the resistance values shown in figure 5.9b. The values have been optimized for 5 ps delays. The resistors have been implemented with a physical length of $0.8\,\mu\text{m}$ and physical widths ranging from $8\,\mu\text{m}$ to $14\,\mu\text{m}$.
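Eq. (5.5) is the closed form of the Elmore sum for a uniform ladder with equal node capacitances, driven from one end. The sketch below evaluates both the explicit sum and the closed form, using the 20 Ω and 13.4 fF values above, to show the quadratic growth with N. It treats the ladder as uniform and driven at a single point. This simplifies the actual network, which has multiple drivers and non-linearly scaled resistors.

```python
def elmore_ladder(r_seg: float, c_node: float, n: int) -> float:
    """Elmore delay to the far end of an n-section RC ladder driven at one end.

    The capacitance at node i is charged through the i upstream segments that it
    shares with the path to the output, so tau = sum_i (i * R) * C.
    """
    return sum(i * r_seg * c_node for i in range(1, n + 1))


def elmore_closed_form(r_seg: float, c_node: float, n: int) -> float:
    """Closed form of eq. (5.5): R*C*N^2/2 * (N+1)/N."""
    return r_seg * c_node * n**2 / 2 * (n + 1) / n


R_SEG, C_NODE = 20.0, 13.4e-15   # 20 ohm segment, 13.4 fF average node load (from the text)
for n in (1, 2, 4, 8):
    print(n, f"{elmore_ladder(R_SEG, C_NODE, n) * 1e12:.2f} ps")
```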

The strength of the buffers connected to the resistive ladder defines the slopes of the signals propagating down the ladder. For the principle to work, overlapping edges are required. In a first approximation, the number of elements involved in the process can be estimated from the desired signal slope as $\frac{t_{slope}}{LSB}$. Fast signal slopes require rather strong, power-intensive drivers. Slow signal slopes, on the other hand, are more sensitive to power supply noise and require many elements at the beginning and the end of the DLL to reach uniformity. A good trade-off has been found for signal slopes of approximately 120 ps. This involves a total of 6 delay elements, or equivalently 24 LSB codes.

**Figure 5.9:** Schematic diagrams and dimensions of the resistive interpolation circuit: (a) the driver to drive the resistive ladder, (b) the non-linear resistive division network and (c) the output buffer connecting to the intermediate nodes of the resistive ladder.

The buffers that drive the resistive ladder are depicted in figure 5.9a. For the dimensioning of these buffers, the on-resistance of the drivers is required to be small compared to the sum of the resistances involved in the switching process:

$$R_{ON} \ll \frac{t_{slope}}{LSB} \cdot R. \tag{5.6}$$

The on-resistances of the NMOS and PMOS devices are approximated to be $113\,\Omega$ and $181\,\Omega$, respectively. As an example, a subset of the fine-time code signals is shown in figure 5.10. The LSB size of all 128 bins for different reference signal frequencies is shown in figure 5.11. The resistive interpolation circuit has been optimized to generate perfectly uniform bins for the 5 ps case.
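Condition (5.6) can be checked with the numbers quoted here. The sketch assumes uniform 35 Ω segments, so that the $t_{slope}/LSB = 24$ sections involved contribute roughly 840 Ω. The actual ladder of figure 5.9b is non-linearly scaled, so this is only an order-of-magnitude check.

```python
def ron_ratio(r_on: float, t_slope: float, lsb: float, r_seg: float) -> float:
    """R_on relative to the ladder resistance involved in one transition,
    (t_slope / LSB) * R; eq. (5.6) asks for a ratio well below 1."""
    return r_on / ((t_slope / lsb) * r_seg)


T_SLOPE, LSB = 120e-12, 5e-12   # chosen edge slope and target LSB (from the text)
R_SEG = 35.0                    # ASSUMPTION: uniform 35 ohm segments (real ladder is non-linearly scaled)

ratios = {dev: ron_ratio(r_on, T_SLOPE, LSB, R_SEG)
          for dev, r_on in (("NMOS", 113.0), ("PMOS", 181.0))}
for dev, ratio in ratios.items():
    print(f"{dev}: R_on / ((t_slope/LSB) * R) = {ratio:.2f}")
```

The on-resistances come out at roughly 13 % (NMOS) and 22 % (PMOS) of the ladder resistance involved.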

The power consumed by the input drivers of the resistive interpolation structure is evaluated to be $348\,\mu\text{W}$ per cell. The output drivers buffering the fine-time code signals consume an additional $100\,\mu\text{W}$ per cell. Excluding the dummy structures, a total of 32 input drivers and 128 output drivers have to be implemented.

Figure 5.10: Simulated waveforms of the interpolated fine-time code derived from the outputs of the $15^{th}$ and $16^{th}$ DLL elements. The waveforms shown are for approximately 5 ps LSB sizes.

**Device Mismatch** The passive interpolator structure effectively filters device mismatches originating from the DLL. Multiple elements (K) are involved in the switching process, causing the delay variations of adjacent cells to be averaged in a similar manner as reported in [59]. In a first approximation, the expected 1-sigma standard deviation is scaled by $1/\sqrt{K}$.
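The $1/\sqrt{K}$ rule assumes that the K contributing edge errors are independent and enter with equal weight. The Monte Carlo sketch below checks only this idealized statement. In the real ladder the weights are unequal and neighbouring errors may be correlated, so the actual reduction can differ.

```python
import random
import statistics


def averaged_sigma(k: int, sigma: float = 1.0, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo: model an interpolated edge as the equal-weight mean of k
    independent edge errors drawn from N(0, sigma); return the resulting std."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(k))
             for _ in range(trials)]
    return statistics.pstdev(means)


for k in (1, 2, 4, 6):
    print(f"K = {k}: simulated {averaged_sigma(k):.3f}, 1/sqrt(K) = {k ** -0.5:.3f}")
```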

The 1-sigma standard deviation across all 128 bins at the level of the resistive interpolation circuit is 0.34 ps-rms and 0.48 ps-rms for the rising and falling edge, respectively. To keep the capacitive loading reasonably small, relatively small buffers have to be employed to buffer the generated signals. This makes the buffers susceptible to device mismatch. To profit from the stronger NMOS devices, the negative edge is used to distribute the fine-time code across the channel matrix. Due to the relatively small buffers, the 1-sigma variation of this better, falling edge increases to 1.24 ps-rms at the output of the fine-time interpolator.

**Figure 5.11:** Simulated LSB size for all 128 bins of the interpolator for different DLL buffer delay settings.

### 5.2.3 Delay Locked Loop

The delays of the delay elements are controlled by the feedback of the DLL and are adjusted to precisely match $1/32^{nd}$ of the reference clock period. For a block diagram of the DLL, the reader is referred to figure 5.2. The $0^{th}$ and $32^{nd}$ outputs of the delay line are buffered (not shown in figure 5.2) to connect to the phase detector (PD) of the DLL. To provide uniform loading to the elements, dummy structures are implemented for all cells. For good timing performance, the negative edge is used for the long connections from the beginning and the end of the delay line to the inputs of the PD.

A bang-bang PD is used to detect the phase relationship between the input and the output of the delay line. To provide a more constant delay path from the reference signal to the output of the PD, the $0^{th}$ edge is connected to the clock input of the PD. With each clock cycle, the state of the PD output is re-evaluated. Depending on the state of the PD, the charge pump (CP) sinks/sources a constant current from/to the loop filter (LF) capacitor. During one clock cycle, the state of the PD is left unchanged, constantly charging or discharging the LF capacitor. The different cases are depicted in figure 5.12. If the rising edge of the input of the delay line arrives before the rising edge of the output of the delay line, a logic '0' is latched by the PD. This instructs the CP to decrease the voltage stored on the LF capacitance and thus speed up the delay line. The opposite is true for the other case. The PD has a phase uncertainty of 180°, meaning that phase differences greater than 180° cannot be resolved. This might lead to a false indication, causing the loop to become unstable. Such a false state can be avoided by always approaching the locked condition from the fastest state (i.e. $V_{ctrl} = 0$ V). In this state, the propagation delay of the delay line is guaranteed to be smaller than the reference signal's period. During startup, the input of the CP is disconnected from the PD's output and directly controlled by a digital control block. Initially, the CP is instructed to slow down the delay line, i.e. to charge the LF capacitor. Only once the PD constantly indicates '1' is control handed over to the loop. From then on, the DLL itself is capable of reaching the locked condition. The control of the loop during startup is implemented off-chip.
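The startup strategy and the resulting limit cycle can be illustrated with a behavioural model. The sketch below starts from the fastest state and lets the bang-bang loop charge the loop filter until the PD toggles. It then records how many reference cycles pass between PD state changes. Several parts of the model are assumptions made for illustration: the linearized delay model, the fastest-state delay `d_fast` and the symmetric CP current. The other default values follow table 5.3, and all names are hypothetical.

```python
def simulate_bang_bang_dll(cycles: int = 5000, t_ref: float = 640e-12, n: int = 32,
                           g_db: float = 19e-12, i_cp: float = 2.5e-6,
                           c_lf: float = 88.8e-12, hyst: float = 1.3e-12,
                           d_fast: float = 620e-12) -> list[tuple[float, int]]:
    """Behavioural model of the bang-bang DLL loop.

    ASSUMPTIONS: the total line delay is linearized as d_fast + n * g_db * v_lf,
    where v_lf is the loop-filter voltage (0 V = fastest state) and d_fast is a
    hypothetical fastest-state delay below t_ref. PD state 1 means "line too fast":
    the CP charges the LF capacitor; state 0 discharges it. The PD hysteresis is
    modelled as a window of +/- hyst/2 around t_ref.
    """
    v_lf = 0.0
    state = 1                        # starting from the fastest state, the PD reads '1'
    history = []
    for _ in range(cycles):
        delay = d_fast + n * g_db * v_lf
        if state == 1 and delay > t_ref + hyst / 2:
            state = 0
        elif state == 0 and delay < t_ref - hyst / 2:
            state = 1
        # constant CP current for one full reference period
        v_lf += (i_cp if state == 1 else -i_cp) * t_ref / c_lf
        history.append((delay, state))
    return history


def dwell_times(history: list[tuple[float, int]]) -> list[int]:
    """Number of reference cycles between successive PD state changes."""
    flips = [i for i in range(1, len(history)) if history[i][1] != history[i - 1][1]]
    return [b - a for a, b in zip(flips, flips[1:])]


hist = simulate_bang_bang_dll()
print("final delay error:", round((hist[-1][0] - 640e-12) * 1e12, 3), "ps")
print("cycles between PD toggles:", dwell_times(hist)[:6])
```

Under these assumptions, the loop settles into a limit cycle in which the PD toggles every 119 to 120 cycles. This is close to the estimate of approximately 118 cycles derived in section 5.2.3.1.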



Figure 5.12: Output states of a bang-bang PD implementation.

The implementations of the PD and the CP are derived from slightly modified versions available within the group; the implementation of the PD, for example, is reported in [75]. The CP current is programmable and can be varied from approximately $1\,\mu\text{A}$ to $12\,\mu\text{A}$. A selected set of CP current values is summarized in table 5.2. By default, the CP current setting is 0010. Due to channel length modulation in the current mirrors of the CP, the up and down currents are different. The currents have been evaluated for a voltage of approximately $400\,\text{mV}$ stored on the LF capacitor. The size of the LF capacitance is based on the dynamics of the loop, which are discussed next.

| iSel<0:3> | down (early) | up (late) | Units            |
|-----------|--------------|-----------|------------------|
| 0000      | 0.77         | 1.05      | $\mu A$          |
| 0001      | 1.50         | 1.92      | $\mu A$          |
| 0010      | 2.23         | 2.75      | $\mu \mathbf{A}$ |
| 0011      | 2.96         | 3.55      | $\mu A$          |
| 0100      | 3.68         | 4.34      | $\mu A$          |
| 1000      | 6.56         | 7.36      | $\mu A$          |
| 1111      | 11.51        | 12.37     | $\mu A$          |

Table 5.2: CP current for different CP current settings. iSel<0:3> controls the value of the CP current. By default the value should be set to 0010.

#### 5.2.3.1 Loop Dynamics

Any jitter on the time reference signal of the TDC will manifest itself in the measurement. A clean reference signal therefore needs to be supplied to the TDC so as not to degrade the precision of the measurement. The loop should only react to slow changes of the reference signal, which prevents the DLL from following high-frequency jitter components of the reference signal. This requires the DLL to have a small loop bandwidth.

A bang-bang PD is a non-linear component that cannot be linearized around its operating point. To analyze the dynamics of the loop, the total change in delay within one reference clock period, referred to as $\Delta t_{DL}$, is therefore taken as an alternative measure. The parameters necessary to calculate $\Delta t_{DL}$ for a reference frequency of 1562.5 MHz are listed in table 5.3. $\Delta t_{DL}$ can be calculated as

$$\Delta t_{DL} = \frac{I_{CP} \cdot T_{ref}}{C_{LF}} \cdot |G_{DB}| \cdot N \qquad \left[\frac{second}{period}\right] \tag{5.7}$$

| Parameter                 | Nom. | Units   |
|---------------------------|------|---------|
| $DE_{Gain}$ ($G_{DB}$)    | -19  | ps/V    |
| $I_{CP-early}$            | 2.23 | $\mu A$ |
| $I_{CP-late}$             | 2.75 | $\mu A$ |
| $C_{LF}$                  | 88.8 | pF      |
| $T_{ref}$                 | 640  | ps      |
| $N$                       | 32   | -       |
| PD hysteresis             | 1.3  | ps      |

Table 5.3: Nominal DLL parameters with the DLL operated at 1562.5 MHz.



Figure 5.13: Simulated DLL control voltage  $(V_{ctrl})$  for a 1562.5 MHz reference signal.

where $I_{CP}$ represents the CP current, $T_{ref}$ the reference signal's period, $C_{LF}$ the LF capacitance, $G_{DB}$ the gain $\frac{\Delta t}{\Delta V_{ctrl}}$ of a single delay-buffer element and N the number of delay elements within the loop. To reduce the change within one reference clock period, small CP currents $I_{CP}$ as well as large LF capacitances $C_{LF}$ need to be employed. Operating the DLL with a 1.5625 GHz reference clock (LSB = 5 ps), the small-signal gain $G_{DB}$ of a single delay buffer is evaluated to be approximately -19 ps/V. The exact bandwidth of the DLL is not a critical parameter. However, it should be well below the reference frequency so that the loop does not track jitter caused by switching. At the nominal CP current of approximately 2.5 $\mu$A and an LF capacitance of 88.8 pF, the change in transition time at the output of the DLL equates to

$$\Delta t_{DL} \approx 0.011 \, \frac{ps}{period}.\tag{5.8}$$

Given a PD hysteresis of 1.3 ps, it takes approximately 118 cycles for the PD to switch its state. In reality, the up and down currents of the CP differ. In simulation, the up and down phases have been found to last approximately 98 and 133/134 cycles, respectively, at a DLL reference frequency of 1.5625 GHz. The corresponding closed-loop simulation of the control voltage ($V_{ctrl}$) is depicted in figure 5.13.
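Eq. (5.7) and the hysteresis argument can be evaluated directly for the nominal, early and late CP currents of table 5.3. The Python sketch below is a plain first-order calculation. The function and constant names are illustrative.

```python
def delta_t_dl(i_cp: float, t_ref: float, c_lf: float, g_db: float, n: int) -> float:
    """Eq. (5.7): change of the total line delay per reference period, in s/period."""
    return i_cp * t_ref / c_lf * abs(g_db) * n


T_REF, C_LF, G_DB, N = 640e-12, 88.8e-12, -19e-12, 32   # table 5.3, G_DB in s/V
HYST = 1.3e-12                                          # PD hysteresis

for label, i_cp in (("nominal", 2.5e-6), ("early (down)", 2.23e-6), ("late (up)", 2.75e-6)):
    step = delta_t_dl(i_cp, T_REF, C_LF, G_DB, N)
    print(f"{label:13s}: {step * 1e12:.4f} ps/period, {HYST / step:5.1f} cycles per PD toggle")
```

For the down (early) current, the estimate of about 133 cycles agrees with the simulated 133/134 cycles. For the up (late) current, it gives about 108 cycles against the simulated 98, so eq. (5.7) should be read as a first-order estimate.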

## 5.3 Channel Matrix

The channel matrix implements a total of 8 channels. So-called distribution buffers are employed to distribute the fine-time code signals generated by the fine-time interpolator. An adjustment feature to calibrate out device mismatches is implemented at the level of the distribution buffers. Different channel configurations have been designed to evaluate the effects of the input buffer architecture and of device mismatch in the time-capture registers (TCRs), as well as the performance of different time capturing schemes. Table 5.4 lists the channel configurations that have been implemented. To ease the characterization of the channels, two copies of each channel configuration have been implemented. To avoid power supply drops and unwanted parasitic coupling of timing-critical signals, the channel matrix has been carefully laid out.

 Table 5.4: Different channel configurations implemented by the demonstrator.

| Channel Number | Input Buffer | Time Capture Register    | Capturing Scheme |
|----------------|--------------|--------------------------|------------------|
| 1 & 2          | GBT RX       | direct drive FF (custom) | event            |
| 3 & 4          | E-Link       | standard FF (custom)     | clock            |
| 5 & 6          | GBT RX       | standard FF (custom)     | clock            |
| 7 & 8          | GBT RX       | standard FF (tech. lib)  | clock            |

### 5.3.1 Fine-Time Code Buffers - Distribution Buffers

Fast signal edges are essential for good timing and matching performance of the registers. Strong buffers have been implemented to distribute the fine-time code to the channels. For the distribution, the stronger negative edge of the buffers is preferred; the required inversion of the fine-time code is already accomplished by the fine-time interpolator. The distribution buffer needs to be dimensioned according to its capacitive load. To keep the parasitic load of the wiring within reasonable limits, the channel height needs to be kept small and has been limited to 50 $\mu$m. Only for channels 1 & 2, due to their increased complexity, has the height been increased to approximately $60\,\mu m$. For a total of 8 channels, the length of the bus distributing the fine-time code equates to $420\,\mu\text{m}$. To avoid large RC delays along the line, the fine-time code is distributed on a low-resistance metal layer. According to the design manual, a minimum-width wire on metal layer four (MQ) has a resistance of $0.105\,\Omega/\mu m$ and a capacitance of approximately $0.2\,\mathrm{fF}/\mu m$ per unit length. This equates to 84 fF for the inter-channel routing of the whole segment. From the post-layout extracted view, the total parasitic routing capacitance of each of the 128 fine-time code signals, including the contribution resulting from the TCRs, is estimated to be within the range of 75 fF to 79 fF. A summary of the parasitic capacitances resulting from the TCRs for the different channel implementations is given in table 5.5.

**Table 5.5:** Summary of the estimated input capacitances for the different time capturing register (TCR) types.

| Channel Number | Time Capture Register    | Input | Routing $C^a$    | Device C         |
|----------------|--------------------------|-------|------------------|------------------|
| 1 & 2          | direct drive FF (custom) | CLK   | $3.3\mathrm{fF}$ | $7.7\mathrm{fF}$ |
| 3 - 6          | standard FF (custom)     | D     | $1.3\mathrm{fF}$ | $4.5\mathrm{fF}$ |
| 7 & 8          | standard FF (tech. lib)  | D     | $1.9\mathrm{fF}$ | $1.5\mathrm{fF}$ |

 $^{a}\mathrm{TCR}$  contribution only

**Figure 5.14:** Schematic diagram of the distribution buffers to distribute the fine-time code. Capacitive loading is used to adjust for device mismatches.

The distribution buffers therefore have to drive a total capacitance (routing and device) of approximately $C_{total} \approx 130\,\mathrm{fF}$. The RC delay of the line has been evaluated to be less than 2.5 ps. Such delays only degrade the signal slope and affect all bins of a channel in the same way. The RC delay due to the inter-channel routing is therefore found to be negligible.
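The bus figures quoted above can be reproduced from the channel heights and the per-unit-length wire parameters. The sketch below treats the bus as a uniformly distributed RC line driven from one end. It includes only the wire itself, not the TCR device load, so it gives a wire-only estimate rather than the extracted value.

```python
CHANNEL_HEIGHTS_UM = [60.0, 60.0] + [50.0] * 6   # channels 1 & 2 are ~60 um, the others 50 um
R_PER_UM = 0.105                                 # ohm/um, minimum-width MQ wire
C_PER_UM = 0.2e-15                               # F/um, approximate

length_um = sum(CHANNEL_HEIGHTS_UM)
r_wire = R_PER_UM * length_um
c_wire = C_PER_UM * length_um
# Distributed RC line driven at one end: the Elmore delay to the far end is R*C/2
tau_wire = 0.5 * r_wire * c_wire

print(f"length {length_um:.0f} um, R {r_wire:.1f} ohm, C {c_wire * 1e15:.0f} fF, "
      f"tau {tau_wire * 1e12:.2f} ps")
```

The wire-only Elmore delay of about 1.85 ps is compatible with the bound of 2.5 ps quoted above.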

The schematic diagram of the distribution buffer including its adjustment feature is shown in figure 5.14. For good matching performance of the registers, signal slopes in the range of 50 ps have been found to be necessary. With the given dimensions, the 10 % to 90 % fall and rise times of the fine-time code signals are approximately 45 ps and 70 ps, respectively. The current consumption of the cell depends on the precise setting of the adjustment network. It ranges from 647 $\mu$A with all bits set to '0' up to 763 $\mu$A with all bits set to '1'. Operated at 1.5625 GHz and at the mid-calibration setting ('1000'), the cell consumes approximately 707 $\mu$A.

**Device Mismatch** The $1^{st}$ stage of the distribution buffers is designed with rather strong devices. The fine-time interpolator provides enough driving strength to sufficiently drive the input capacitance of the distribution buffers. Any mismatch introduced at this level can be compensated by the adjustment feature of the distribution buffers. From the input to the output of the distribution buffers, the 1-sigma variation of the LSB size across all bins is expected to increase from 1.24 ps-rms to 1.76 ps-rms.

**Figure 5.15:** Illustration of the *event-capture* implementation. The time-of-arrival of an event is captured by sampling the event over time.

**Adjustment Feature** Device mismatches introduced by the fine-time interpolator as well as by the distribution buffers themselves are foreseen to be compensated by the adjustment feature integrated into the distribution buffers. The required range has been chosen based on Monte Carlo simulation results. For a 5-sigma design, deviations of a single edge from its nominal value are expected to fall within $\pm 6$ ps. Capacitive loading is employed to delay the propagation of a single edge. To avoid implementing large capacitances at the output of the buffer, the adjustment feature is placed after the first buffering stage (see figure 5.14). A 5-bit digital-to-time converter has been implemented. In total, up to 64 fF of capacitance can be added in 2 fF steps, allowing signals to be delayed by up to 32 ps in 1 ps steps. This allows compensation not only of differential non-linearity (DNL) errors but also of integral non-linearity (INL) errors of up to 6.4 LSB. Due to the additional capacitive loading, the current consumption at the mid-calibration setting ('1000') is approximately 25 % higher than that of a cell without calibration.
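The adjustment feature can be viewed as a per-edge lookup of a digital-to-time converter code. The sketch below is a hypothetical calibration routine, not the procedure used in this work. It assumes the edge errors have already been measured and uses the 1 ps (2 fF) step from the text. It picks the mid-scale code 16 as a common baseline so that edges can be shifted both earlier and later relative to it. How this baseline maps to the '1000' mid-calibration setting mentioned above is not specified here.

```python
N_BITS = 5
MAX_CODE = 2 ** N_BITS - 1     # 5-bit code: 0 ... 31
STEP_FF = 2.0                  # added load per code step (from the text)
STEP_PS = 1.0                  # resulting delay per code step (from the text)


def calibration_codes(edge_errors_ps: list[float], baseline: int = 16) -> list[int]:
    """Choose a DTC code per fine-time edge (hypothetical calibration routine).

    edge_errors_ps: deviation of each edge from its ideal position (positive = late).
    Loading can only delay an edge, so every edge starts from a common baseline
    code; late edges get fewer steps, early edges more. Codes are clamped
    to the available range.
    """
    codes = []
    for err in edge_errors_ps:
        code = baseline - round(err / STEP_PS)
        codes.append(min(max(code, 0), MAX_CODE))
    return codes


codes = calibration_codes([3.0, -2.0, 0.4, 20.0])
print(codes, [c * STEP_FF for c in codes])   # codes and added capacitance in fF
```

The last edge in the example is 20 ps late. It exceeds the correction range around the baseline and saturates at code 0.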

### 5.3.2 Event Capture Based Channel

In an event-capture based architecture, as discussed in section 3.1.10, the outputs of the fine-time interpolator are connected to the CLK inputs and the event signals to the D inputs of the TCRs. This causes the state of the event to be sampled over time and stored in the TCRs. Generally, the captured data is overwritten with each reference clock cycle, so it has to be analyzed within one reference clock cycle. To avoid processing the whole array in each clock cycle, the content of the TCRs is transferred only on the detection of an event, which lowers power consumption.

A block diagram of the implemented circuit is depicted in figure 5.15. To improve signal integrity, the event signal is distributed in a fully differential manner. On the arrival of an event, the state of the fine-time code is transferred into the TCRs. The detection of an event is accomplished with so-called transition detectors. Because the TCRs are constantly clocked with clock phases uniformly distributed across one reference clock cycle, metastability can occur during the latching process. For this reason, the array is divided into two segments, and the transfer to the storage registers is shifted in time to allow the TCRs to settle. Here, an array size of 128 is assumed. An event arriving in the first half of the array causes the transition detector of segment A to trigger the transfer only on the arrival of bin 80, a signal of segment B. This stores the values of all TCRs of segment A in the storage registers. Depending on the position of the event signal relative to bin 16, the content of segment B is transferred half a clock cycle earlier or later than the content of segment A. This causes segment B to latch either all '0's or all '1's. The time of an event is extracted from the '0'-to-'1' transition of the latched code.
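The decoding step mentioned above, locating the '0'-to-'1' transition in the latched code, can be sketched as follows. This is illustrative only: the function names, the bin ordering (index 0 = earliest clock phase) and the optional `min_run` filter against isolated bubbles are assumptions, not the readout logic implemented in the ASIC or in the FPGA firmware.

```python
from typing import Optional, Sequence


def first_rising_transition(bits: Sequence[int], min_run: int = 1) -> Optional[int]:
    """Index of the first '1' that follows a '0' and starts a run of at least
    min_run '1's; None if the word contains no such transition."""
    for i in range(1, len(bits) - min_run + 1):
        if bits[i - 1] == 0 and all(b == 1 for b in bits[i:i + min_run]):
            return i
    return None


def event_time_ps(bits: Sequence[int], lsb_ps: float, min_run: int = 1) -> Optional[float]:
    """Fine time of the event relative to bin 0, in ps (hypothetical helper)."""
    idx = first_rising_transition(bits, min_run)
    return None if idx is None else idx * lsb_ps


word = [0] * 40 + [1] * 88        # hypothetical 128-bit captured word
print(event_time_ps(word, 5.0))   # 200.0
```

A `min_run` larger than one makes the search skip a single stray '1' caused, for instance, by a rare missing code, at the cost of ignoring transitions too close to the end of the word.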

The transfer from the TCRs to the readout registers is only performed on the arrival of an event. The process can be divided into two steps: a) capturing the event and b) latching the captured data into the readout registers. Both steps need to be precisely timed to guarantee correct operation. In a first approximation, half of the clock period is allotted to each step. When generating 5 ps LSB sizes, the complete time window (128 bins × 5 ps) is only 640 ps. For redundancy, and to allow time offsets to be virtually corrected later, the whole array is latched twice. To keep the conceptual diagram simple, only half of the array has been drawn in figure 5.15. Careful timing analyses of the post-layout extracted view can be used to precisely determine the given timing offsets, which can circumvent the need to capture the full array twice. Additional multiplexers, not shown in figure 5.15, have been inserted to allow the transfer registers to be reused as readout registers.

In the presented capturing scheme, the D inputs of the TCRs are connected to the event signal, so the state of the first latch stays constant most of the time. The state of the register switches only when the event signal changes. Consequently, the register consumes very little power most of the time. However, the CLK input of the register has to toggle at the reference signal's frequency, which makes the capacitive load on the CLK input critical.

Figure 5.16: Schematic diagram of the TCRs for the *event capture* based channel implementation. The $1^{st}$ latch is optimized for timing.

# 5.3.2.1 Event Capture Register

A special TCR has been developed to reduce the capacitive load attached to the CLK input. The schematic of the register is presented in figure 5.16. To benefit from the stronger falling edge of the fine-time code signals, the register is clocked on the negative edge. As in a traditional flip-flop (FF), the first latch follows input D as long as CLK is 'high' (switches T4 and T9 closed). In this state, the second latch is disconnected from the first latch by switch T14, and switch T19 closes the feedback of the second stage to keep the output Q constant. On the falling edge of CLK, the state of the first latch is transferred to the second latch. In total, only 4 devices are involved in capturing the state of input D, minimizing the parasitic capacitance attached to the CLK input. Figure 5.17 shows the waveforms of the register for the more interesting case of a one-to-zero transition at the output.

**Figure 5.17:** Simulated waveform diagrams of the proposed *event capture* register. The waveforms shown describe the behavior of the FF in the vicinity of the decision making point. A one-to-zero transition at the output is shown.

The dimensions of the flip-flop are given in table 5.6. The first latch and its drivers have been optimized for matching performance, whereas the second latch has been optimized for low power consumption. To reduce the capacitive loading on the CLK input and to avoid regenerating a negated version of the CLK signal, single NMOS or PMOS switches are employed. As a consequence, the low state is not fully transferred from the first to the second latch. To circumvent this problem, the inverter formed by devices T15 & T17 has been designed with a higher switching threshold.

| Device            | Width                  | Length                 |
|-------------------|------------------------|------------------------|
| T0 & T5           | $2\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |
| T1 & T6           | $0.7\,\mu\mathrm{m}$   | $0.12\,\mu\mathrm{m}$  |
| T2 & T7           | $4.5\,\mu\mathrm{m}$   | $0.12\,\mu\mathrm{m}$  |
| T3 & T8           | $1.5\,\mu\mathrm{m}$   | $0.12\,\mu\mathrm{m}$  |
| T4, T9, T10 & T11 | $3\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |
| T12 & T13         | $1\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |
| T14               | $1\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |
| T15 & T16         | $2\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |
| T17               | $0.5\,\mu\mathrm{m}$   | $0.12\,\mu\mathrm{m}$  |
| T18 & T19         | $0.7\,\mu\mathrm{m}$   | $0.12\,\mu\mathrm{m}$  |

Table 5.6: Device dimensions of the event capture TCR.

 Table 5.7: Current consumption of the proposed event capture register.

| Current Consumption               | D/CLK = 0           | D/CLK = 1           |
|-----------------------------------|---------------------|---------------------|
| D @ 100 MHz                       | $6.94\mu\mathrm{A}$ | $14.8\mu\mathrm{A}$ |
| $\mathrm{CLK} @~1.56\mathrm{GHz}$ | $0.02\mu\mathrm{A}$ | $0.02\mu\mathrm{A}$ |

Table 5.8: Parasitic input capacitance of the proposed event capture register.

| Input Capacitance | routing          | device           |
|-------------------|------------------|------------------|
| D                 | $1.4\mathrm{fF}$ | $2.7\mathrm{fF}$ |
| CLK               | $3.3\mathrm{fF}$ | $7.7\mathrm{fF}$ |

Table 5.7 and table 5.8 list the current consumption of the register as well as its routing and device capacitances. As long as the D input stays constant, the latch itself consumes virtually no power. Nevertheless, power is needed to charge and discharge the parasitic capacitances attached to the CLK input.
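The cost of driving the CLK node can be estimated with the usual dynamic-power relation $P = C V_{DD}^2 f$. This is a back-of-the-envelope sketch under stated assumptions: a full rail-to-rail swing once per reference period, the 1.2 V supply used for the estimates of section 5.4.2, and only the register's own CLK capacitance from table 5.8; buffer short-circuit losses and block-level routing are excluded, so it is not a reproduction of table 5.15.

```python
def switching_current_a(c_farad: float, vdd_v: float, f_hz: float) -> float:
    """Average supply current for one full charge/discharge of C per cycle."""
    return c_farad * vdd_v * f_hz


def switching_power_w(c_farad: float, vdd_v: float, f_hz: float) -> float:
    """Dynamic power C * Vdd^2 * f under the same assumption."""
    return c_farad * vdd_v ** 2 * f_hz


C_CLK_F = (3.3 + 7.7) * 1e-15  # routing + device capacitance on CLK (table 5.8)
VDD_V = 1.2                    # assumed supply voltage
F_REF_HZ = 1.5625e9            # fine-time code frequency in the 5 ps setting

i_a = switching_current_a(C_CLK_F, VDD_V, F_REF_HZ)
p_w = switching_power_w(C_CLK_F, VDD_V, F_REF_HZ)
print(f"per register: {i_a * 1e6:.1f} uA, {p_w * 1e6:.1f} uW")
print(f"per 128-register channel: {128 * p_w * 1e3:.2f} mW")
```

Under these assumptions each register's CLK node draws roughly 20 µA from the driving buffer, which is why the 4-device input stage of the custom register matters.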

**Device Mismatch** To avoid the need for calibration on a per-register basis, the TCRs have been designed to provide good intrinsic matching performance. As there is

#### 5. DEMONSTRATOR ASIC



**Figure 5.18:** Illustration of the *clock-capture* implementation. The time-of-arrival of an event is captured by sampling the state of the fine-time interpolator.

a trade-off to be made between power consumption (i.e. device size) and matching performance, the device dimensions have been chosen to provide reasonably good matching for 5 ps LSB sizes. The timing performance of the register is determined only by the first latch, whereas the requirements on the second latch can be greatly relaxed. From Monte Carlo simulations, the expected 1-sigma variation of the time capture point has been estimated at 1.3 ps-rms. This is just about good enough for a 5 ps LSB TDC. The 3-sigma variation, however, is expected to be about 3.9 ps. As two edges define the LSB size, absolute variations of more than 5 ps are to be expected. In rare cases this can lead to missing codes, which need to be handled by the digital logic but, being rare, will not greatly influence the final rms resolution of the TDC.
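The step from a per-edge spread to a per-bin spread can be made explicit. This is a minimal sketch assuming that the capture instants of the registers carry independent Gaussian errors, so that a bin width (the difference of two edges) has a standard deviation of $\sqrt{2}$ times the per-edge value; a bin of zero or negative width is counted as a missing code. The function names are hypothetical.

```python
import math


def bin_width_sigma(sigma_edge_ps: float) -> float:
    """Std. dev. of one bin width formed by two independent capture edges."""
    return math.sqrt(2.0) * sigma_edge_ps


def missing_code_probability(lsb_ps: float, sigma_edge_ps: float) -> float:
    """P(width <= 0) for a Gaussian bin width with mean lsb_ps."""
    z = lsb_ps / bin_width_sigma(sigma_edge_ps)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


sigma_w = bin_width_sigma(1.3)  # event capture register, 1.3 ps-rms per edge
print(f"bin-width sigma: {sigma_w:.2f} ps, 3-sigma: {3 * sigma_w:.1f} ps")
p = missing_code_probability(5.0, 1.3)
print(f"P(missing code) per bin: {p:.4f}, expected per 127 bins: {127 * p:.2f}")
```

With the 1.3 ps-rms figure this gives a bin-width sigma of about 1.84 ps and a 3-sigma spread of about 5.5 ps, in line with the statement that variations beyond 5 ps are to be expected. Adjacent bins share an edge and are therefore correlated, but the per-bin probability is unaffected by that.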

#### 5.3.3 Clock Capture Based Channel

In a clock capturing based architecture, as discussed in section 3.1.10, the outputs of the fine-time interpolator are connected to the D inputs of the TCRs, whereas the event signal is connected to their CLK inputs. The state of the interpolator is thus sampled and stored in the TCRs on the arrival of an event. For the event to be captured, the first latch of each TCR needs to be kept transparent so that it follows the state of the fine-time code signal. As a result, the first latch switches at the frequency of the reference input and consumes considerable power. A block diagram of the implemented circuit is depicted in figure 5.18.



Figure 5.19: Schematic diagram of the TCRs for the *clock capture* based channel implementation. For the better matching channels, the  $1^{st}$  latch is optimized for timing.

# 5.3.3.1 Clock Capture Register

Two different versions of the clock capturing register have been implemented to investigate the effects of device mismatch and power consumption. Both implementations are based on the same schematic, shown in figure 5.19. One register has been taken from the standard cell library together with its standard cell layout, whereas the other flip-flop has been designed and laid out to improve timing performance and reduce parasitic capacitances. With either register, two distinct channel pairs have been implemented.

The waveforms of the better matching channel are shown in figure 5.20. The dimensions of the standard cell and of the good matching register are listed in table 5.9. For the good matching register, the first latch has been optimized for better matching performance. Since good matching registers require fast input signal edges, the strength of the internal CLK buffer has also been increased.



**Figure 5.20:** Simulated waveform diagrams of the proposed *clock capture* register. The waveforms shown describe the behavior of the FF in the vicinity of the decision making point. A one-to-zero transition at the output is shown.

**Table 5.9:** Device dimensions of a conventional flip-flop used as a *clock capture* register. Device dimensions of the standard cell and of the better matching register are shown.

| Device    | Width (Standard Cell)  | Width (Good Matching)  | Length                 |
|-----------|------------------------|------------------------|------------------------|
| T1 - T7   | $1\,\mu\mathrm{m}$     | $3\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |
| T8 - T14  | $0.5\,\mu\mathrm{m}$   | $1.5\,\mu\mathrm{m}$   | $0.12\,\mu\mathrm{m}$  |
| T15 - T19 | $1\,\mu\mathrm{m}$     | $1\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |
| T20 - T24 | $0.5\,\mu\mathrm{m}$   | $1\,\mu\mathrm{m}$     | $0.12\,\mu\mathrm{m}$  |

| Current Consumption     | Standard Cell, D/CLK = 0 | Standard Cell, D/CLK = 1 | Good Matching, D/CLK = 0 | Good Matching, D/CLK = 1 |
|-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
| D @ 1.56 GHz            | $31.4\mu\mathrm{A}$      | $0.01\mu\mathrm{A}$      | $72.2\mu\mathrm{A}$      | $0.08\mu\mathrm{A}$      |
| CLK @ $100\mathrm{MHz}$ | $1.99\mu\mathrm{A}$      | $1.85\mu\mathrm{A}$      | $3.31\mu\mathrm{A}$      | $3.10\mu\mathrm{A}$      |

**Table 5.10:** Current consumption of a conventional flip-flop used as a *clock capture* register. The current consumption of a standard cell and better matching register is shown.

**Table 5.11:** Parasitic input capacitance of the conventional flip-flop used as a *clock capture* register. The parasitic capacitances of a standard cell and better matching register are shown.

| Input Capacitance | Standard Cell, routing | Standard Cell, device | Good Matching, routing | Good Matching, device |
|-------------------|------------------------|-----------------------|------------------------|-----------------------|
| D                 | $1.9\mathrm{fF}$       | $1.5\mathrm{fF}$      | $1.3\mathrm{fF}$       | $4.5\mathrm{fF}$      |
| CLK               | $2.3\mathrm{fF}$       | $1.5\mathrm{fF}$      | $2.3\mathrm{fF}$       | $4.5\mathrm{fF}$      |

**Mismatch** Two different realizations of the same register with different dimensions have been implemented. This allows the simulations to be compared with the measured results and the timing performance of a standard cell register to be compared against that of a fully custom cell. The timing performance of the register is determined only by the first latch, whereas the requirements on the second latch can be greatly relaxed. From Monte Carlo simulations, the expected 1-sigma variation of the time capture point has been estimated at 2.4 ps-rms and 1.3 ps-rms for the standard cell register and the good matching register, respectively. This corresponds to a 3-sigma variation of about 7.2 ps and 3.9 ps, respectively. As two edges define the LSB size, variations greater than 10 ps are to be expected for the standard cell register. This renders the standard cell option unsuitable for the very fine time resolution setting of 5 ps LSB. For the better matching register, the 1-sigma variation is expected to be just about good enough for a 5 ps LSB TDC; its performance is similar to that of the *event capture* register discussed earlier. In any case, missing codes can occur, which need to be handled by the digital logic.
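The consequence of the two mismatch figures can be compared with a small Monte Carlo experiment. This is a hypothetical sketch, not the simulation flow used for the design: each of the 128 registers is given an independent Gaussian offset of its capture instant (2.4 ps-rms or 1.3 ps-rms), and every bin whose width comes out zero or negative is counted as a missing code; array size, number of trials and seed are arbitrary choices.

```python
import random
from statistics import mean


def simulate_missing_codes(lsb_ps: float, sigma_edge_ps: float, n_bins: int = 128,
                           n_arrays: int = 500, seed: int = 1) -> float:
    """Average number of non-positive bin widths per array (Monte Carlo)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_arrays):
        # Capture instant of register i: nominal i*lsb plus an independent Gaussian offset.
        edges = [i * lsb_ps + rng.gauss(0.0, sigma_edge_ps) for i in range(n_bins)]
        widths = [b - a for a, b in zip(edges, edges[1:])]
        counts.append(sum(1 for w in widths if w <= 0.0))
    return mean(counts)


for name, sigma in (("standard cell", 2.4), ("good matching", 1.3)):
    for lsb in (5.0, 10.0):
        print(f"{name:13s} LSB {lsb:4.1f} ps: "
              f"{simulate_missing_codes(lsb, sigma):.2f} missing codes per array")
```

Under these assumptions the standard cell register loses several bins per array at 5 ps LSB but only a fraction of a bin at 10 ps, which is consistent with relaxing the LSB of the weaker matching channels to 10 ps.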


**Table 5.12:** Estimated parasitic capacitance of the TCRs to which the event signal needs to be connected. The estimated Elmore delay as well as the simulated rise and fall times of the event signal for the given distribution network are also shown.

| Channel Number | C per Register   | Elmore Delay $^{a}$ | Simulated Transition Times                                  |
|----------------|------------------|---------------------|-------------------------------------------------------------|
| CH 1-2         | $4.1\mathrm{fF}$ | $31.2\mathrm{ps}$   | $t_{rise} = 49 \mathrm{ps} \ / \ t_{fall} = 44 \mathrm{ps}$ |
| CH 3-6         | $6.8\mathrm{fF}$ | $46.7\mathrm{ps}$   | $t_{rise} = 60 \mathrm{ps} \ / \ t_{fall} = 55 \mathrm{ps}$ |
| CH 7-8         | $3.8\mathrm{fF}$ | $29.5\mathrm{ps}$   | $t_{rise} = 98 \mathrm{ps} \ / \ t_{fall} = 78 \mathrm{ps}$ |

<sup>a</sup>Expected Elmore delay without intermediate buffering added to the distribution network.



**Figure 5.21:** Block diagram of the event signal distribution network: (a) for channels 1 - 6, (b) for channels 7 - 8.

#### 5.3.4 Event Distribution

The event signal needs to be connected to a high number of registers, in our case a total of 128 registers. A straight wire connecting all the TCRs one after another would cause timing variations due to the RC delay it introduces. An H-tree-like distribution network is therefore implemented to avoid introducing any artificial delays into the measurement. Assuming a cell width of $10 \,\mu$m, i.e. a wire length of 5.12 mm for an H-tree distribution structure, the Elmore delay of the complete structure is estimated to be between 29.5 ps and 46.7 ps for the different wire loads listed in table 5.12. This would cause the edge of the event signal to degrade considerably. However, sharp signal edges are essential for good matching performance. To sustain sharp signal edges, intermediate buffer structures are introduced into the distribution network. Whereas one intermediate level has been found sufficient for the weaker matching channel pair, two intermediate buffer structures have been inserted for the better matching channels, i.e. channels 1-6. The two different distribution networks are shown in figure 5.21a and figure 5.21b, respectively. The post-layout extracted rise and fall times are also listed in table 5.12. The schematic diagrams of the respective buffers are shown in figure 5.22. To prevent the second-level buffering stage from introducing mismatch, non-cascaded inverter structures with large device sizes are employed. With an estimated 1-sigma variation of 0.2 ps-rms for the second-level buffer cell, device mismatches have been found to be negligible. The inversion introduced in the case of the weaker matching channel is resolved at the differential input level of the chip.

**Figure 5.22:** Schematic diagrams and dimensions of the event distribution buffers: (a) the 0th level buffer, (b) the 1st level buffer, (c) the 2nd level buffer.
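The problem that the H-tree avoids can be illustrated with the Elmore delay of a daisy-chained wire modelled as a uniform RC ladder. This is a minimal sketch: the per-segment resistance and capacitance below are hypothetical placeholders, not values extracted from the design, and the ladder model is not meant to reproduce the Elmore delays of table 5.12.

```python
from typing import List, Sequence


def elmore_to_each_tap(r_ohm: Sequence[float], c_farad: Sequence[float]) -> List[float]:
    """Elmore delay from the driver to every tap of an RC ladder.

    Segment i is a series resistance r_ohm[i] followed by a tap with
    capacitance c_farad[i]. Each resistance contributes r times the total
    capacitance downstream of it, and that term is shared by every tap beyond it.
    """
    if len(r_ohm) != len(c_farad):
        raise ValueError("need one R and one C per segment")
    delays: List[float] = []
    tau = 0.0
    downstream = sum(c_farad)
    for r, c in zip(r_ohm, c_farad):
        tau += r * downstream
        delays.append(tau)
        downstream -= c
    return delays


# Hypothetical daisy chain: 128 taps, 2 ohm of wire per cell pitch,
# 2 fF of wire plus 4.1 fF of register input per tap (placeholder values).
taps = elmore_to_each_tap([2.0] * 128, [(2.0 + 4.1) * 1e-15] * 128)
skew_ps = (taps[-1] - taps[0]) * 1e12
print(f"first tap {taps[0] * 1e12:.1f} ps, last tap {taps[-1] * 1e12:.1f} ps, skew {skew_ps:.1f} ps")
```

In a daisy chain the delay grows roughly quadratically along the wire, so the first and last registers see very different arrival times. In an H-tree, every register is reached through an identical sequence of segments and branch points, so by symmetry all leaves see the same Elmore delay; what remains is mismatch and the edge degradation that the intermediate buffers address.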

# 5.4 Expected Performance

The proposed TDC has been optimized to achieve sub 5 ps-rms time resolutions in a multi-channel environment. The better performing channels 1 - 6 are expected to achieve good linearity when generating 5 ps LSB sizes, whereas the weaker performing channel pair 7 & 8 is operated with 10 ps LSB sizes.

# 5.4.1 Timing Precision

The timing generator has been designed to generate LSB sizes ranging from 5 ps to 20 ps. To prevent time offsets due to signal routing, time-critical signals have been carefully laid out. At such an early stage of development, the exact timing performance is difficult to estimate. However, first expectations can be formulated based on Monte Carlo simulations. Any timing mismatch introduced by the fine-time interpolator and the distribution buffers can be compensated by the adjustment feature. In addition, the intrinsic mismatch-filtering property of the resistive division circuit helps to reduce the non-linearity errors introduced by the fine-time interpolator. The remaining linearity errors introduced by the interpolator are calibrated out on a segment-by-segment basis. This allows device mismatches of the fine-time interpolator as well as of the distribution buffers themselves to be virtually compensated. The required range of the adjustment feature has been extracted from Monte Carlo simulations. Only timing mismatches caused by the TCRs are therefore expected to influence the linearity of the TDC. As timing mismatches do not accumulate at the level of the TCRs, as they would in a DLL configuration, the INL can, to a first approximation, be taken to equal the DNL.<sup>1</sup> To achieve good time resolution, a clean and jitter-free reference clock needs to be applied. As a first approximation of the jitter introduced by the reference signal, a value of 1 ps-rms is assumed. Period jitter resulting from thermal noise has been evaluated by transient noise analysis to be less than 1 ps-rms. Jitter resulting from power supply noise is far more difficult to estimate and is to be measured on the real design. A summary of the expected performance of the different channel configurations is given in table 5.13.

| Channel              | $\sigma_{DNL}$ | $\sigma_{INL}$ | Ref./Thermal | Quant. noise | Exp. res.<sup>a</sup> |
|----------------------|----------------|----------------|--------------|--------------|-----------------------|
| CH 1 & 2             | 1.8 ps-rms     | 1.8 ps-rms     | 1 ps-rms     | 1.44 ps-rms  | 2.9 ps-rms            |
| CH 3 - 6             | 1.8 ps-rms     | 1.8 ps-rms     | 1 ps-rms     | 1.44 ps-rms  | 2.9 ps-rms            |
| CH 7 & 8<sup>b</sup> | 3.4 ps-rms     | 3.4 ps-rms     | 1 ps-rms     | 2.88 ps-rms  | 4.8 ps-rms            |

Table 5.13: Estimated timing performance of the different channel configurations.

<sup>a</sup>Expected resolution is calculated as $\sigma_{TDC} = \sqrt{(\sigma_{DNL}/\sqrt{12})^2 + \sigma_{INL}^2 + \sigma_q^2 + \sigma_{clk}^2 + \sigma_{noise}^2}$.

<sup>b</sup>Due to the poor matching performance of the registers, the LSB size is relaxed to 10 ps.

<sup>1</sup>For simplicity, the contribution of the DNL error is added in an rms sense to the final rms time resolution. In general, the DNL contribution would need to be calculated as explained in section 3.2.3.
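The formula in footnote a of table 5.13 can be evaluated directly. This sketch assumes, as stated in the text, that $\sigma_{INL}$ equals $\sigma_{DNL}$, takes $\sigma_q$ as LSB/$\sqrt{12}$, and uses 1 ps-rms each for the reference-clock jitter $\sigma_{clk}$ and the thermal-noise jitter $\sigma_{noise}$ (the "Ref./Thermal" column); the function name is hypothetical.

```python
import math


def expected_resolution_ps(sigma_dnl: float, sigma_inl: float, lsb: float,
                           sigma_clk: float = 1.0, sigma_noise: float = 1.0) -> float:
    """sigma_TDC = sqrt((sigma_DNL/sqrt(12))^2 + sigma_INL^2 + sigma_q^2
    + sigma_clk^2 + sigma_noise^2), all in ps-rms, with sigma_q = LSB/sqrt(12)."""
    sigma_q = lsb / math.sqrt(12.0)
    return math.sqrt((sigma_dnl / math.sqrt(12.0)) ** 2 + sigma_inl ** 2
                     + sigma_q ** 2 + sigma_clk ** 2 + sigma_noise ** 2)


for name, sigma, lsb in (("CH 1 - 6", 1.8, 5.0), ("CH 7 & 8", 3.4, 10.0)):
    print(f"{name}: {expected_resolution_ps(sigma, sigma, lsb):.2f} ps-rms")
```

With these inputs the formula yields about 4.78 ps-rms for CH 7 & 8, matching the tabulated 4.8 ps-rms, and about 2.76 ps-rms for CH 1 - 6, slightly below the tabulated 2.9 ps-rms.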

| Block                   | Acquisition on    | Acquisition off  |
|-------------------------|-------------------|------------------|
| Interpolator            | $71\mathrm{mW}$   |                  |
| Delay Line              | $40\mathrm{mW}$   |                  |
| Resistive Interpolation | $28\mathrm{mW}$   |                  |
| Loop Components         | $3\mathrm{mW}$    |                  |
| Channel Matrix          | $163\mathrm{mW}$  | $109\mathrm{mW}$ |
| Distribution Buffers    | $109\mathrm{mW}$  |                  |
| CH 1-2                  |                   | -                |
| CH 3-6                  | $11.1\mathrm{mW}$ | $0\mathrm{mW}$   |
| CH 7-8                  | $4.8\mathrm{mW}$  | $0\mathrm{mW}$   |

Table 5.14: Power consumption estimates of the respective blocks of the demonstrator.

# 5.4.2 Power Consumption

Based on post-layout extracted simulations, the average power consumption of the architecture with acquisition enabled on all channels has been estimated at 234 mW, equating to approximately 29 mW per channel. A large fraction of this power is spent on distributing the fine-time code signals to the channels. The power consumption of the registers themselves can be extrapolated from the power estimated for a single cell. Table 5.14 summarizes the estimated contributions of the different blocks. The values have been evaluated for 5 ps LSB sizes and a 1.2 V power supply voltage.

The power consumption of the distribution buffers is shared across all the channels. To estimate the power consumption of each channel configuration, the power consumed by the distribution buffers is assigned to each channel in proportion to its capacitive load. Table 5.15 lists, for a single bin, the capacitive load of each configuration together with its estimated power consumption, which includes the power consumed by the registers plus the proportional fraction of the power consumed by the distribution buffers. The highest power is consumed by channels 3 - 6, which employ the better matching *clock capture* TCRs. Approximately 25 % less power is consumed by channel pair 1 & 2, which employs the custom designed *event capture* registers. The lowest power is consumed by channel pair 7 & 8, which employs the weaker matching TCRs.

| Channel  | Routing (block level) | Register         | Load share | Power, acquisition on<sup>a</sup> | Power, acquisition off<sup>a</sup> |
|----------|-----------------------|------------------|------------|-----------------------------------|------------------------------------|
| CH 1 & 2 | $7.6\mathrm{fF}$      | $11\mathrm{fF}$  | 16.4%      | $17.9\mathrm{mW/ch}$              |                                    |
| CH 3 - 6 | $7.6\mathrm{fF}$      | $5.8\mathrm{fF}$ | 11.9%      | $24.1\mathrm{mW/ch}$              | $13\mathrm{mW/ch}$                 |
| CH 7 & 8 | $7.6\mathrm{fF}$      | $3.4\mathrm{fF}$ | 9.7%       | $15.3\mathrm{mW/ch}$              | $10.6\mathrm{mW/ch}$               |

**Table 5.15:** Estimated parasitic load and power contribution of each channel configuration, with the distribution-buffer power scaled in proportion to the parasitic load.

<sup>a</sup>Includes the power consumption of the TCRs and distribution buffers.
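The proportional allocation behind table 5.15 can be reproduced with a few lines. This is a sketch under explicit assumptions: the 11.1 mW and 4.8 mW register entries of table 5.14 are interpreted as per-channel values (4 × 11.1 mW + 2 × 4.8 mW + 109 mW = 163 mW, the channel-matrix total), the power of the *event capture* registers of CH 1 & 2 is taken as zero, and the names are hypothetical.

```python
ROUTING_FF = 7.6   # block-level routing load per bin (table 5.15)
P_DIST_MW = 109.0  # distribution-buffer power (table 5.14)
# name: (register load per bin in fF, number of channels, register power per channel in mW)
CHANNELS = {
    "CH 1 & 2": (11.0, 2, 0.0),   # assumption: event capture register power neglected
    "CH 3 - 6": (5.8, 4, 11.1),
    "CH 7 & 8": (3.4, 2, 4.8),
}


def allocate(channels, routing_ff, p_dist_mw):
    """Split the buffer power in proportion to each channel's per-bin load."""
    total_ff = sum((routing_ff + c) * n for c, n, _ in channels.values())
    out = {}
    for name, (c, n, p_reg) in channels.items():
        share = (routing_ff + c) / total_ff   # fraction of buffer power per channel
        p_off = share * p_dist_mw             # buffers only (acquisition off)
        out[name] = (share, p_off, p_off + p_reg)
    return out


for name, (share, p_off, p_on) in allocate(CHANNELS, ROUTING_FF, P_DIST_MW).items():
    print(f"{name}: {100 * share:.1f} %, off {p_off:.1f} mW/ch, on {p_on:.1f} mW/ch")
```

The sketch reproduces the entries of table 5.15 to within about 0.1 mW, the residual coming from the rounded load percentages used in the table.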

# **References Chapter 5**

- [69] <u>L. Perktold</u> and J. Christiansen, "A high time-resolution (< 3 ps-rms) time-to-digital converter for highly integrated designs," in *Instrumentation and Measurement Technology Conference (I2MTC), 2013 IEEE International,* 2013.
- [70] <u>L. Perktold</u> and J. Christiansen, "A flexible 5 ps bin-width timing core for next generation high-energy-physics time-to-digital converter applications," in *Ph.D. Research in Microelectronics and Electronics (PRIME), 2012 8th Conference on*, 2012, pp. 1–4.
- [71] J. Maneatis, "Low-jitter and process independent DLL and PLL based on self biased techniques," in *Solid-State Circuits Conference, 1996. Digest of Technical Papers. 42nd ISSCC., 1996 IEEE International*, feb 1996, pp. 130–131, 430.
- [72] S. Anand and B. Razavi, "A CMOS clock recovery circuit for 2.5-Gb/s NRZ data," *Solid-State Circuits, IEEE Journal of*, vol. 36, no. 3, pp. 432–439, mar 2001.
- [73] E. Bogatin, *Signal and Power Integrity: Simplified*, ser. Prentice Hall Modern Semiconductor Design Series. Prentice Hall PTR, 2010. [Online]. Available: http://books.google.ch/books?id=zMpSGQAACAAJ
- [74] S. Henzler, S. Koeppe, D. Lorenz, W. Kamp, R. Kuenemund, and D. Schmitt-Landsiedel, "A local passive time interpolation concept for variation-tolerant high-resolution time-to-digital conversion," *Solid-State Circuits, IEEE Journal of*, vol. 43, no. 7, pp. 1666–1676, july 2008.
- [75] J. Christiansen, "An integrated high resolution CMOS timing generator based on an array of delay locked loops," *Solid-State Circuits, IEEE Journal of*, vol. 31, no. 7, pp. 952–957, 1996.

# 6

# **Experimental Results**

A prototype has been designed and fabricated in a commercial 130 nm technology. To perform measurements on the demonstrator, a chip-carrier board, a field-programmable gate array (FPGA) based readout board and a software suite to analyze the captured data have been designed. The demonstrator application specific integrated circuit (ASIC) is intended to help understand design trade-offs, to compare the measurement results with simulation, and to disseminate the obtained results to the experiments.

The previous chapter described the transistor level implementation of the constructed demonstrator. In this chapter I concentrate on its measured performance. First, the test setup constructed to perform the different measurements is explained. Then, the conducted tests are described and the obtained results are presented, discussed and compared to simulation. Most notably, the time-to-digital converter (TDC) transfer function characteristics, the measured timing performance and the TDC's power consumption are presented. The chapter closes with a short summary of the achieved performance. In the next chapter, conclusions of the conducted work are drawn.

# 6.1 Measurement Setup

A block diagram of the modular test setup is shown in figure 6.1. A standard PC is used to control the instruments and to set up the communication with the FPGA board. The FPGA board is used to control and read out the TDC as well as to program its adjustment feature. A low-jitter clock, serving as the TDC's reference signal, is provided

# 6. EXPERIMENTAL RESULTS

by an SRS clock generator, model CG635. The event signals are generated either by one of the listed pattern generators or directly by the FPGA itself. The test bench is adapted to the test being conducted.



Figure 6.1: Block level diagram of the test setup used to characterize the TDC.

To provide the physical interface between the microchip and the outside world, a carrier printed circuit board (PCB) has been developed and constructed. A photograph of the PCB with the microchip wire-bonded to it is shown in figure 6.2. In total, five low-dropout (LDO) regulators are employed to generate clean power supplies for the ASIC and to supply the on-board transceiver circuits. A VHDCI connector is used to connect to the FPGA development board. The different channel connections as well as the reference signal connections are indicated.

# 6.2 TDC Characterization

Figure 6.2: Photograph of the carrier PCB.

To fully characterize the TDC, a series of seven tests has been conducted. Whereas the first two tests investigate the basic operation of the TDC, the remaining tests are used to characterize its performance. A list of the conducted tests is given below. Unless stated otherwise, the ASIC is supplied with a 1.3 V power supply and operated at a temperature of approximately 25 °C. The TDC's reference signal frequency is given in the corresponding section; in its highest resolution setting, the TDC reference signal is set to 1.5625 GHz. A selection of the results presented in this chapter has been published in [76].

- Functional Test
- DLL Locking Range
- TDC Transfer Function
- RMS-Time Resolution
- Power Consumption
- Inter-Channel Crosstalk
- Voltage-Temperature Variations

# 6.2.1 Functional Test

Figure 6.3 shows the output of the phase detector (PD) with the reference signal's frequency set to approximately 1.48 GHz. As long as the PD output is constantly switching, the delay-locked loop (DLL) is considered to be in locked condition. Due to the low bandwidth of the DLL, the switching cycle is dominated by transients in the control loop. The jitter of the clock propagating through the DLL has been measured at the input and at the output of the DLL. Figure 6.4 shows the time-interval error (TIE), the period jitter and the period-to-period jitter of the input and output signals. An absolute increase in TIE of 0.28 ps-rms, or equivalently 0.84 ps-rms added in an rms sense, was observed. In reality, smaller absolute jitter values are to be expected, as the measurement also includes the jitter introduced by the output buffers and by the oscilloscope itself. Owing to this marginal increase in jitter, timing variations introduced by the switching behavior of the DLL are considered negligible.
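The two jitter figures quoted above are linked by quadrature addition: if the DLL adds jitter uncorrelated with the input, $\sigma_{out}^2 = \sigma_{in}^2 + \sigma_{DLL}^2$. The sketch below assumes exactly this model; the function names are hypothetical, and the input and output TIE values it produces (about 1.12 ps-rms and 1.40 ps-rms) are back-calculated from the two quoted numbers, not read from figure 6.4.

```python
import math


def added_jitter_rms(sigma_in_ps: float, sigma_out_ps: float) -> float:
    """Jitter added by the DLL, assuming it is uncorrelated with the input jitter."""
    if sigma_out_ps < sigma_in_ps:
        raise ValueError("output jitter cannot be below input jitter in this model")
    return math.sqrt(sigma_out_ps ** 2 - sigma_in_ps ** 2)


def implied_levels(delta_ps: float, added_ps: float) -> tuple[float, float]:
    """Back-calculate (input, output) TIE from the absolute increase delta_ps
    and the quadrature contribution added_ps:
    out - in = delta and out^2 - in^2 = added^2  =>  out + in = added^2 / delta."""
    total = added_ps ** 2 / delta_ps
    sigma_in = (total - delta_ps) / 2
    return sigma_in, sigma_in + delta_ps


s_in, s_out = implied_levels(0.28, 0.84)
print(f"in ~ {s_in:.2f} ps-rms, out ~ {s_out:.2f} ps-rms, "
      f"added ~ {added_jitter_rms(s_in, s_out):.2f} ps-rms")
```

The quadrature view explains why a small absolute increase corresponds to a noticeably larger intrinsic DLL contribution.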



Figure 6.3: Measured output of the PD signal in locked condition.

Figure 6.4: Measured jitter characteristics of: (a) the input signal of the DLL (b) the output signal of the DLL.

To test the basic functionality of the TDC and the readout system, the phase of the event signal has been synchronized to the reference signal and swept in 100 ps steps across the reference signal's period. Figure 6.5 and figure 6.6 show the captured data streams of the two different capturing concepts, as presented in section 5.3.2 and section 5.3.3, respectively. As representative examples, data from channel 1 and channel 7 are shown, with the DLL reference signal's frequency set to 1.28 GHz.<sup>1</sup> For this test, the adjustment feature was disabled, i.e. set to 00000. In both cases, the time of arrival of an event is identified by extracting the 0-to-1 transition. Each of the 128 bits represents a time step of 6.1 ps. The absolute delay setting of the instrument is not very well controlled and can vary by several tens of ps.

For the *event capture* scheme, the full array is latched twice to virtually adjust for timing offsets. However, only half of the latched data is used to construct the data word. From the measurements shown in figure 6.5a and figure 6.5b, the time difference for the first phase step is extracted as 13 bins, or equivalently 79 ps.
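The conversion from bins to picoseconds used above follows from dividing one reference period over the 128 bins. This is a minimal sketch assuming that the 128 bins span exactly one reference period; the function names are hypothetical.

```python
def bin_size_ps(f_ref_hz: float, n_bins: int = 128) -> float:
    """Bin size when n_bins evenly divide one reference period."""
    return 1e12 / f_ref_hz / n_bins


def bins_to_ps(n: int, f_ref_hz: float, n_bins: int = 128) -> float:
    """Time span of n bins at the given reference frequency."""
    return n * bin_size_ps(f_ref_hz, n_bins)


print(f"{bin_size_ps(1.28e9):.2f} ps/bin")     # 6.10
print(f"{bins_to_ps(13, 1.28e9):.0f} ps")      # 79
print(f"{bins_to_ps(15, 1.28e9):.0f} ps")      # 92
print(f"{bin_size_ps(1.5625e9):.2f} ps/bin")   # 5.00
```

The same relation shows that the 5 ps LSB of the finest setting requires the 1.5625 GHz reference.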

For the *clock capture* scheme, the reconstruction is simpler: the full array is processed as latched. Since the negated fine-time code is distributed to the channel matrix, the 0-to-1 transition represents the rising edge of the signal. From the measurements shown in

<sup>1</sup>Whereas the tests of channel 7 have been performed with the power supply voltage set to 1.2 V, measurements of channel 1 have been performed with the power supply voltage set to 1.3 V.




**Figure 6.5:** *Event capture* scheme: data patterns for different phases of the event signal, panels (a) to (g) corresponding to phases of approximately 100 ps to 700 ps in steps of 100 ps. The DLL is operated with a reference signal of 1.28 GHz. The area shaded green represents the first half, the area shaded blue the second half of the captured data stream.

**Figure 6.6:** *Clock capture* scheme: data patterns for different phases of the event signal, panels (a) to (g) corresponding to phases of approximately 100 ps to 700 ps in steps of 100 ps. The DLL is operated with a reference signal of 1.28 GHz.

figure 6.6a and figure 6.6b, the measured time difference is extracted as 15 bins, or equivalently 92 ps. From this measurement, the duty cycle of the reference signal propagating down the DLL can be estimated at approximately 37 %.

# 6.2.2 DLL Locking Range

The TDC is designed to operate with reference frequencies ranging from 390.625 MHz up to 1.5625 GHz. During start-up, the DLL adjusts the total delay of the delay line to precisely match the period of the input signal. The DLL is considered to be in lock only when the phase detector (PD) output toggles continuously. Lock is acquired starting with the delay line in its fastest setting. For reference frequencies below approximately $0.5 \cdot f_{max}$, the PD wrongly indicates 'early', attempting to reduce the delay beyond the capabilities of the delay line. This requires a special start-up procedure in which the charge pump (CP) is forced to 'late' until the delay of the DLL is close enough to the reference period for the loop feedback to acquire lock correctly. This start-up control is implemented off-chip in the firmware of the FPGA.
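The start-up problem can be illustrated with a small behavioral model. It is a sketch under assumptions that are not taken from the design: the PD is modelled as comparing the delayed edge with the *nearest* reference edge, the total delay-line delay moves in fixed 0.5 ps steps, the forced-'late' phase is released at a hypothetical threshold of $0.75 \cdot T_{ref}$, and the delay range (672 ps, i.e. 32 cells of 21 ps, up to 4000 ps) is illustrative. Under this model the PD answers correctly only when the delay exceeds $T_{ref}/2$, which reproduces the $0.5 \cdot f_{max}$ limit with $f_{max} = 1/D_{min}$.

```python
def pd_output(delay_ps, t_ref_ps):
    """Hypothetical bang-bang PD: compares the delayed edge with the nearest reference edge.

    Returns +1 ('late': increase the delay) or -1 ('early': decrease the delay)."""
    nearest_edge = round(delay_ps / t_ref_ps) * t_ref_ps
    return +1 if delay_ps < nearest_edge else -1

def acquire_lock(t_ref_ps, d_min_ps, d_max_ps, step_ps=0.5, force_late=True, n_steps=20000):
    """Return (final delay, locked) for a simple start-up sequence."""
    delay = d_min_ps  # lock acquisition starts with the delay line in its fastest setting
    if force_late:
        # Forced-'late' phase: increase the delay until it is close to T_ref
        # (the 0.75 * T_ref release threshold is a hypothetical choice).
        while delay < 0.75 * t_ref_ps and delay < d_max_ps:
            delay = min(delay + step_ps, d_max_ps)
    outputs = []
    for _ in range(n_steps):
        direction = pd_output(delay, t_ref_ps)
        delay = min(max(delay + direction * step_ps, d_min_ps), d_max_ps)
        outputs.append(direction)
    # Locked if the PD output alternates on each of the last 10 cycles
    locked = all(x != y for x, y in zip(outputs[-10:], outputs[-9:]))
    return delay, locked

# 390.625 MHz reference (T_ref = 2560 ps) with a 672 ps minimum delay-line delay
print(acquire_lock(2560.0, 672.0, 4000.0, force_late=False))  # stuck at the fastest setting
print(acquire_lock(2560.0, 672.0, 4000.0, force_late=True))   # settles around 2560 ps
```

Without the forced phase, the modelled PD keeps requesting less delay and the loop stays clamped at its fastest setting; with it, the loop settles and toggles around one reference period.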

The measured samples reach a maximum frequency of approximately 1.490 GHz. In the slow corner (SS), the delay cell is expected to achieve a minimum delay (Vctrl = 0 V) of 21 ps, corresponding to a maximum frequency of 1.486 GHz. Since the fabricated samples are unlikely to lie in the SS corner, the parasitic estimates from the post-layout extracted view are likely somewhat optimistic. Nevertheless, the agreement between measurement and simulation is good. To compensate for the slower operation, the supply voltage has been increased to 1.3 V. This allows the TDC to operate within the foreseen reference frequency range and slightly beyond.
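The relation between the minimum cell delay and the maximum reference frequency follows from the DLL spanning exactly one reference period with its delay elements (32, as stated in the architecture summary of section 7.1). A plain arithmetic sketch with hypothetical helper names:

```python
N_CELLS = 32  # delay elements in the DLL

def f_max_ghz(t_min_ps, n_cells=N_CELLS):
    """Highest reference frequency the DLL can lock to: T_ref = n_cells * t_min."""
    return 1e3 / (n_cells * t_min_ps)  # 1 / ps = 1000 GHz

def required_cell_delay_ps(f_ref_ghz, n_cells=N_CELLS):
    """Cell delay needed so that n_cells delays span exactly one reference period."""
    return 1e3 / (f_ref_ghz * n_cells)

print(round(f_max_ghz(21.0), 3))        # SS-corner minimum delay of 21 ps
print(required_cell_delay_ps(1.5625))   # target for the 5 ps LSB setting
```

With the rounded 21 ps value the result is about 1.488 GHz, close to the quoted 1.486 GHz. The 1.5625 GHz setting requires each cell to reach 20 ps, slightly faster than the 21 ps expected in the SS corner, which illustrates why the additional headroom from the higher supply voltage is needed.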

# 6.2.3 TDC Transfer Function

To characterize the linearity of the TDC, ideally a signal with a well-controlled phase relationship to the reference signal is swept across the dynamic range of the TDC. This requires adjusting the phase of the event signal in fine steps across a large dynamic range. As it is very challenging to generate such fine time delay steps with high linearity, a statistical approach to characterize the TDC is usually preferred [77]. A schematic sketch of this method, usually referred to as a code density test, is shown in figure 6.7. A sequence of events with phases uniformly distributed with respect to the reference signal is generated. Based on the number of events collected for each code, the actual least-significant bit (LSB) size of code $i$ can be calculated from

$$LSB_i = \frac{NumEvents_i}{totalNumEvents} \cdot T_{ref} \tag{6.1}$$

where $NumEvents_i$ is the number of events recorded in code $i$, $totalNumEvents$ the total number of recorded events and $T_{ref}$ the period of the reference signal.

Figure 6.7: Functional diagram of a code-density-test.
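Equation (6.1) translates directly into code. A minimal sketch, assuming the code-density histogram is given as a list of event counts per code; the function name is hypothetical:

```python
def lsb_from_histogram(counts, t_ref_ps):
    """Equation (6.1): LSB_i = NumEvents_i / totalNumEvents * T_ref."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty code-density histogram")
    return [n_i / total * t_ref_ps for n_i in counts]

# 1.5625 GHz reference (T_ref = 640 ps): a perfectly flat histogram gives 5 ps per bin
lsb = lsb_from_histogram([781] * 128, 640.0)
print(lsb[0], sum(lsb))
```

Because the estimated widths always sum to $T_{ref}$, the method measures relative bin widths; its accuracy relies on the event phases being uniformly distributed over the reference period.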



Figure 6.8: Histogram of the event phase with respect to the reference signal's phase for a 1.28 GHz reference signal.



Figure 6.9: Measured LSB size of channel 5 for all 128 bins after global calibration.

To generate a uniform distribution of events, the reference signal and the event signal are derived from two independent, uncorrelated clock sources. The quality of the resulting distribution for a 781.25 ps clock period is shown in figure 6.8; a total of 1,000,000 events have been collected. For the subsequent tests, the number of events has been limited to 100,000 to reduce acquisition time.
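Reducing the number of events raises the question of how much counting noise enters each LSB estimate. Assuming ideal, equal-width bins and independent, uniformly distributed event phases, each bin count is binomial with probability $p = 1/128$, so one LSB estimate has a standard deviation of $T_{ref}\sqrt{p(1-p)/N}$ for $N$ events. The sketch below compares this analytic estimate with a small Monte-Carlo run; both helpers are hypothetical illustrations, not part of the test setup.

```python
import math
import random

def lsb_stat_error_ps(n_events, n_bins, t_ref_ps):
    """Analytic 1-sigma error of one LSB estimate from eq. (6.1) for ideal bins."""
    p = 1.0 / n_bins  # probability that an event lands in a given bin
    return t_ref_ps * math.sqrt(p * (1.0 - p) / n_events)

def simulated_lsb_spread_ps(n_events, n_bins, t_ref_ps, seed=1):
    """Monte-Carlo: uniformly distributed event phases into ideal bins, then eq. (6.1)."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_events):
        phase = rng.random()            # phase in [0, 1) of the reference period
        counts[int(phase * n_bins)] += 1
    lsb = [c / n_events * t_ref_ps for c in counts]
    mean = sum(lsb) / n_bins
    return math.sqrt(sum((x - mean) ** 2 for x in lsb) / (n_bins - 1))

print(lsb_stat_error_ps(100_000, 128, 640.0))        # 5 ps bins, 100,000 events
print(simulated_lsb_spread_ps(100_000, 128, 640.0))
```

Under these assumptions, 100,000 events give roughly 0.18 ps of counting noise per 5 ps bin, which is small compared with the measured LSB spread of about 1.3 ps.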

As an example, the LSB size of channel 5 after global calibration is shown for all 128 bins in figure 6.9. For this test, the reference signal's frequency is set to 1.5625 GHz, generating 5 ps LSB sizes. The standard deviation of the LSB size is calculated to be approximately 1.32 ps. No missing codes are observed. As representative measures of the TDC's linearity, the differential non-linearity (DNL) and the integral non-linearity (INL) can be extracted from this measurement. The DNL is the deviation of each code width from the ideal LSB size, whereas the INL is the deviation of the TDC's measured transfer function from its ideal behavior; their maximum values serve as figures of merit. The DNL is bounded below by -1 LSB, which represents a missing code, but has no upper bound. From this test, the transfer function of the TDC can be reconstructed as shown in figure 6.10.

Figure 6.10: Reconstructed input-output transfer function of channel 5.
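A minimal sketch of how DNL, INL and the reconstructed transfer function can be derived from the measured bin widths. It assumes end-point normalization (the ideal LSB is the mean bin width, i.e. $T_{ref}$ divided by the number of bins) and evaluates the INL at the upper edge of each code; other conventions, such as a best-fit line, and the weighted errors of equations (3.2) and (3.3) are not reproduced. The function names are hypothetical.

```python
def dnl_inl(lsb_ps):
    """DNL and INL (in LSB) from measured bin widths, e.g. the output of eq. (6.1)."""
    n = len(lsb_ps)
    ideal = sum(lsb_ps) / n                  # ideal LSB = T_ref / number of bins
    dnl = [w / ideal - 1.0 for w in lsb_ps]  # -1 indicates a missing code
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)                      # deviation of each code's upper edge
    return dnl, inl

def transfer_function(lsb_ps):
    """Reconstructed transition times (upper code edges) in ps."""
    edges, t = [], 0.0
    for w in lsb_ps:
        t += w
        edges.append(t)
    return edges

# Hypothetical widths (ps) including one missing code
dnl, inl = dnl_inl([4.0, 6.0, 5.0, 0.0, 10.0])
print(max(dnl), min(dnl), max(abs(x) for x in inl))
```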

**Channel Pair 1 & 2** Channels 1 and 2 make use of the custom-designed time-capture registers (TCRs) presented in section 5.3.2.1. Figure 6.12 and figure 6.13 show the DNL and INL for 5 ps LSB sizes after global calibration has been applied. The calibration vector is derived from the cumulative code-density-test histogram of channels 1 and 2. No missing codes are observed in either channel. The maximum DNL of the two channels is within $\pm 0.9$ LSB, whereas the INL is within $\pm 2$ LSB. From Monte-Carlo simulation, the 1-sigma variation in LSB size across all 128 bins is estimated to be 1.8 ps-rms. The measured maximum 1-sigma variation is 1.15 ps-rms, which is slightly better than predicted by simulation.

At the fastest reference frequency setting, the transition detector signal of segment B has been found to propagate slightly too slowly. This causes the first bits of segment B to be overwritten before they are correctly transferred to the readout registers.<sup>1</sup> To correct for this delay offset, the valid data is virtually shifted by 5 bins to the right. This shift is only necessary when running at the highest frequency. Figure 6.11 illustrates the construction of the final data word.

<sup>1</sup>This timing issue has been identified during post-layout extraction simulations but has not been corrected due to the close submission deadline.

Figure 6.11: Reconstruction of the data word of channel pair 1 & 2 with a virtual shift of 5 bins to correct for timing offset shifts. The area shaded green represents the first half and the area shaded blue the second half of the captured data stream.

Figure 6.12: Channels 1 and 2: Measured DNL of all 128 bins after global calibration for 5 ps LSB sizes. The boxes list the mean 1-sigma variation of the DNL error as well as its weighted rms error calculated according to equation (3.2).



Figure 6.13: Channels 1 and 2: Measured INL of all 128 bins after global calibration for 5 ps LSB sizes. The boxes list the RMS value of the INL as well as its weighted rms error calculated according to equation (3.3).

An inverter has been erroneously introduced into the readout clock distribution of channel 1. It generates a glitch during the readout process that corrupts the captured data. Keeping the event signal of channel 1 in the high state during readout circumvents this problem.

**Channel Pair 3 - 6** Channels 3 to 6 make use of the custom-designed TCRs presented in section 5.3.3.1. Figure 6.14 and figure 6.15 show the DNL and INL after global calibration has been applied. The calibration vector is derived from the cumulative code-density-test histogram of channels 3 to 6. No missing codes are observed in any of these channels. The maximum DNL across all four channels is within $\pm 0.9$ LSB, whereas the INL is within $\pm 1.3$ LSB. From Monte-Carlo simulation, the 1-sigma variation in LSB size across all 128 bins is estimated to be 1.8 ps-rms. The measured maximum 1-sigma variation is 1.3 ps-rms, which is slightly better than predicted by simulation.



**Figure 6.14:** Channels 3 - 6: Measured DNL of all 128 bins after global calibration for 5 ps LSB sizes. The boxes list the 1-sigma distribution of the DNL as well as the weighted quantization error calculated according to equation (3.2).



**Figure 6.15:** Channels 3 - 6: Measured INL of all 128 bins after global calibration for 5 ps LSB sizes. The boxes list the mean-free 1-sigma distribution of the INL as well as its weighted rms error calculated according to equation (3.3).

**Channel Pair 7 & 8** Channels 7 and 8 make use of the standard-cell TCRs. The 1-sigma LSB variation resulting from these registers has been evaluated from Monte-Carlo simulation to be 3.4 ps, which, for 5 ps LSB sizes, can lead to missing and/or very large codes. For this reason, the LSB size for channels 7 and 8 has been increased to 10 ps. The corresponding DNL and INL after global calibration are depicted in figure 6.16 and figure 6.17, respectively. No missing codes are observed with 10 ps LSB sizes. The maximum DNL across both channels is within $\pm 0.9$ LSB, whereas the INL is within $\pm 0.8$ LSB. The measured maximum 1-sigma variation is 2.8 ps-rms, which is slightly better than predicted by simulation.
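The risk of missing codes can be illustrated with a rough estimate. Assuming, as a simplification, that every bin width is independently Gaussian with the nominal LSB as mean and the simulated 3.4 ps as standard deviation, the probability of a bin collapsing to zero width is $\Phi(-LSB/\sigma)$. The helper below is hypothetical and ignores correlations between neighboring bins.

```python
import math

def expected_missing_codes(lsb_ps, sigma_ps, n_bins=128):
    """Expected number of bins with width <= 0 under an independent Gaussian width model."""
    # P(width <= 0) for width ~ N(lsb_ps, sigma_ps^2), written via the complementary error function
    p_missing = 0.5 * math.erfc(lsb_ps / (sigma_ps * math.sqrt(2.0)))
    return n_bins * p_missing

for lsb in (5.0, 10.0):
    print(lsb, round(expected_missing_codes(lsb, 3.4), 2))
```

Under this crude model, roughly nine of the 128 bins would be expected to vanish with 5 ps LSB sizes, against well below one with 10 ps, in line with the decision to operate channels 7 and 8 at 10 ps.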



**Figure 6.16:** Channels 7 and 8: Measured DNL of all 128 bins after global calibration for 10 ps LSB sizes. The boxes list the 1-sigma distribution of the DNL as well as the weighted quantization error calculated according to equation (3.2).



**Figure 6.17:** Channels 7 and 8: Measured INL of all 128 bins after global calibration for 10 ps LSB sizes. The boxes list the mean-free 1-sigma distribution of the INL as well as its weighted rms error calculated according to equation (3.3).

#### 6.2.3.1 Calibration

To compensate for LSB size mismatches originating from the interpolator structure and the distribution buffers, the fine-time code signals need to be delayed accordingly. Based on the cumulative INL error, a global calibration vector is derived. The vector is calculated for a set of channels employing the same TCRs. As an example, the calibration vector of channels 3 to 6 is shown in figure 6.18. The adjustment feature has been presented in section 5.3.1. Each code step delays the corresponding fine-time code signal by an additional 1 ps. To allow the adjustment feature to both add and subtract delay, a baseline calibration code of 16 is applied uniformly to all bins. In this example, a maximum correction of 8 steps has to be applied to the right edge of the fine-time code signals of bins 80, 83 and 89.
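The derivation of the calibration codes can be sketched as follows. The sign convention, the rounding to the nearest 1 ps step and the 5-bit code range (suggested by the 00000 adjustment setting used in the DLL tests) are assumptions, and `calibration_codes` is a hypothetical helper rather than the software used for the demonstrator.

```python
def calibration_codes(inl_ps, step_ps=1.0, mid_code=16, n_bits=5):
    """Hypothetical conversion of cumulative INL errors (ps) into per-signal delay codes.

    mid_code is the baseline applied to all bins so that delay can be both added and
    removed; one code step is assumed to add step_ps of delay."""
    max_code = (1 << n_bits) - 1
    codes = []
    for err in inl_ps:
        # Assumed sign: an edge that is too late (err > 0) receives less delay
        code = mid_code - round(err / step_ps)
        codes.append(min(max(code, 0), max_code))  # clip to the available code range
    return codes

print(calibration_codes([0.0, 3.2, -8.0, 20.0]))
```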

As an example, the LSB size of channel 5 before and after global calibration is shown in figure 6.19a and figure 6.19b. The 1-sigma LSB variation is reduced from 2.05 ps-rms without global calibration down to 1.32 ps-rms with global calibration. Previously, Monte-Carlo simulations had evaluated the 1-sigma variation in LSB size across all 128 bins at the output of the distribution buffers to be 1.76 ps-rms. Ideally, if this expected contribution of the fine-time interpolator and distribution buffers is removed from the measurement without global calibration, a 1-sigma variation of 1.18 ps-rms is expected once global variations have been corrected. This is reasonably close to the 1.32 ps-rms achieved. That is, after global calibration, the device mismatches contributed by the fine-time interpolator and distribution buffers can be neglected.

Figure 6.18: Calibration vector derived from the cumulative INL error of channels 3 to 6 to adjust for global device mismatches.

To demonstrate the precision of the adjustment feature, a single channel can also be calibrated individually. In this case, not only the device mismatches of the interpolator and distribution buffer structure are corrected but also those of the channel's TCRs. However, as the calibration affects all channels, the remaining channels then show weaker performance. As an example, the LSB size of channel 5 with an adjustment vector calculated specifically for this channel is shown in figure 6.19c. In this case, the 1-sigma variation of the LSB size can be reduced to as little as 0.55 ps-rms.



Figure 6.19: Measured LSB size of channel 5 for different calibration settings.

#### 6.2.3.2 Frequency Range

To test the performance of the TDC for different LSB size settings, the reference signal's frequency has been varied from 1562.5 MHz down to 320 MHz, generating LSB sizes ranging from 5 ps to 24.4 ps. As a representative example, the LSB size of channel 5 across all 128 bins of the interpolator has been recorded and is depicted in figure 6.20. To better investigate the effects of LSB size variations coming from the interpolator, a constant single-channel calibration has been applied to channel 5.
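The LSB sizes quoted above follow from dividing the reference period by the 32 DLL elements and the fourfold resistive interpolation (20 ps to 5 ps) summarized in section 7.1. A minimal arithmetic sketch; the constant and function names are hypothetical:

```python
N_DLL = 32     # DLL delay elements (20 ps each at 1.5625 GHz)
N_INTERP = 4   # resistive interpolation factor (20 ps -> 5 ps)

def lsb_ps(f_ref_mhz):
    """LSB size = reference period / (DLL elements x interpolation factor)."""
    t_ref_ps = 1e6 / f_ref_mhz  # period in ps for a frequency in MHz
    return t_ref_ps / (N_DLL * N_INTERP)

for f in (1562.5, 1280.0, 781.25, 640.0, 390.625, 320.0):
    print(f"{f:8.3f} MHz -> {lsb_ps(f):5.2f} ps")
```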



**Figure 6.20:** Measured LSB size of channel 5 for different LSB size settings with single channel calibration applied to channel 5.

#### 6.2.3.3 Expected RMS-Time Resolution

An ideal TDC with 5 ps LSB size is limited only by its quantization error, $LSB/\sqrt{12}$, and can achieve rms-time resolutions as small as 1.44 ps-rms. From the measured INL and DNL errors of the respective channels, the expected rms-time resolution, denoted as $\sigma_{TDCqnl}$, can be calculated by

$$\sigma_{TDCqnl} = \sqrt{\sigma_{qDNL}^2 + \sigma_{wINL}^2} \tag{6.2}$$

where the values of $\sigma_{qDNL}$ and $\sigma_{wINL}$ are listed in the corresponding DNL and INL plots, respectively. Table 6.1 lists the expected rms-time resolution of the respective channels for different LSB size settings. The values listed in the table incorporate the quantization noise as well as the measured DNL and INL errors. The best time resolutions, of approximately 2.1 ps-rms, are expected for channels 3 to 6 when the LSB size is set to 5 ps. For channels 1 and 2, due to larger INL errors, a slightly weaker timing performance of approximately 3 ps-rms is expected. Channels 7 and 8 are expected to achieve time resolutions of approximately 4 ps-rms when the LSB size is set to 10 ps. In the lowest-resolution mode (24.4 ps LSB), rms-time resolutions of approximately 10 ps-rms are expected.

**Table 6.1:** Expected rms-time resolution of the demonstrator, including quantization noise, INL and DNL errors.

| Channel | LSB     | $\sigma_{qDNL}$ | $\sigma_{wINL}$ | $\sigma_{TDCqnl}$ | $\sigma_{TDCideal}$ |
|---------|---------|-----------------|-----------------|-------------------|---------------------|
| 1       | 5 ps    | 1.55 ps-rms     | 2.8 ps-rms      | 3.20 ps-rms       | 1.44 ps-rms         |
| 2       | 5 ps    | 1.55 ps-rms     | 2.4 ps-rms      | 2.86 ps-rms       | 1.44 ps-rms         |
| 3       | 5 ps    | 1.60 ps-rms     | 1.35 ps-rms     | 2.09 ps-rms       | 1.44 ps-rms         |
| 4       | 5 ps    | 1.55 ps-rms     | 1.45 ps-rms     | 2.12 ps-rms       | 1.44 ps-rms         |
| 5       | 5 ps    | 1.60 ps-rms     | 1.30 ps-rms     | 2.06 ps-rms       | 1.44 ps-rms         |
| 6       | 5 ps    | 1.60 ps-rms     | 1.40 ps-rms     | 2.13 ps-rms       | 1.44 ps-rms         |
| 7       | 10 ps   | 3.2 ps-rms      | 2.3 ps-rms      | 3.94 ps-rms       | 2.89 ps-rms         |
| 8       | 10 ps   | 3.2 ps-rms      | 2.6 ps-rms      | 4.12 ps-rms       | 2.89 ps-rms         |
| $5^a$   | 5 ps    | 1.45 ps-rms     | 0.85 ps-rms     | 1.68 ps-rms       | 1.44 ps-rms         |
| $5^a$   | 6.1 ps  | 1.77 ps-rms     | 1.16 ps-rms     | 2.11 ps-rms       | 1.76 ps-rms         |
| $5^a$   | 10 ps   | 2.90 ps-rms     | 2.30 ps-rms     | 3.70 ps-rms       | 2.89 ps-rms         |
| $5^a$   | 12.2 ps | 3.54 ps-rms     | 1.56 ps-rms     | 3.89 ps-rms       | 3.52 ps-rms         |
| $5^a$   | 20 ps   | 5.80 ps-rms     | 5.80 ps-rms     | 8.20 ps-rms       | 5.77 ps-rms         |
| $5^a$   | 24.4 ps | 7.08 ps-rms     | 6.34 ps-rms     | 9.50 ps-rms       | 7.04 ps-rms         |

<sup>*a*</sup>Single channel calibration.
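The entries of table 6.1 can be checked with a few lines of code. The ideal limit is the rms value of a quantization error uniformly distributed over one LSB, $LSB/\sqrt{12}$, and equation (6.2) is a quadrature sum, which treats the DNL- and INL-related contributions as independent. The function names are hypothetical.

```python
import math

def sigma_ideal_ps(lsb_ps):
    """Quantization-only limit: a uniform error over one LSB has an rms value of LSB/sqrt(12)."""
    return lsb_ps / math.sqrt(12.0)

def sigma_tdc_qnl_ps(sigma_qdnl_ps, sigma_winl_ps):
    """Equation (6.2): quadrature sum of the DNL- and INL-related errors."""
    return math.hypot(sigma_qdnl_ps, sigma_winl_ps)

# Selected rows of table 6.1: (channel, LSB, sigma_qDNL, sigma_wINL)
rows = [(1, 5.0, 1.55, 2.8), (5, 5.0, 1.60, 1.30), (7, 10.0, 3.2, 2.3)]
for ch, lsb, q, w in rows:
    print(ch, round(sigma_tdc_qnl_ps(q, w), 2), round(sigma_ideal_ps(lsb), 2))
```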

# 6.2.4 RMS-Time Resolution

To measure the rms-time resolution of the TDC, a flat distribution of events is sent to two distinct channels and their propagation delay difference is recorded. This method, as reported in [78], is illustrated in figure 6.21. A resistive power splitter is employed to generate two signals out of one. The fixed delay between the two channels is generated by a fixed length of wire added to one of the channels. This produces delay differences that are robust against voltage and temperature variations without introducing additional jitter. With this technique, the timing contributions of the TDC itself as well as the jitter of the reference signal are captured by the measurement, whereas any contribution of the event signal itself is common to both channels and is therefore excluded.



Figure 6.21: Functional diagram to measure the rms-time resolution of the TDC. Different wire lengths are employed to apply a fixed delay difference between two channels.

Time delay differences are recorded at different positions distributed across the reference clock period. To extract the rms-time resolution of the TDC, the difference in bin count between both channels is histogrammed and the underlying Gaussian distribution is extracted. As an example, the histogram of channel pair 5 & 6 for a wire delay of 4 inch is shown in figure 6.22. The TDC is operated with a 1.56 GHz reference signal, generating 5 ps LSB sizes. In this specific example, an average delay of 100.9 bins between the two signals has been recorded, which equates to a propagation delay difference of 504.5 ps. The propagation delay of a 4 inch wire is quoted to be approximately 490 ps. From this measurement, the 1-sigma variation can be extracted. As the measured difference contains the errors of two independent time measurements, to a first approximation the obtained result needs to be divided by $\sqrt{2}$. The rms-resolution of the TDC, sometimes also referred to as the single-shot precision, can be derived from the following equation:

$$\sigma_{TDC} = \frac{\sigma_{\Delta bin} \cdot LSB}{\sqrt{2}} \tag{6.3}$$
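Equation (6.3) can be applied directly to the recorded bin differences. The sketch below assumes the differences are available as a list of integers, one per event pair, and, as a simplification, uses the sample standard deviation instead of the Gaussian fit used for figure 6.22; the function names are hypothetical.

```python
import math
import statistics

def single_shot_ps(delta_bins, lsb_ps):
    """Equation (6.3): sigma_TDC = sigma_dbin * LSB / sqrt(2)."""
    return statistics.stdev(delta_bins) * lsb_ps / math.sqrt(2)

def mean_delay_ps(delta_bins, lsb_ps):
    """Average propagation-delay difference between the two channels."""
    return statistics.mean(delta_bins) * lsb_ps

# Table 6.2, 4 inch wire: 3.07 ps-rms double-shot and 100.9 bins at 5 ps per bin
print(round(3.07 / math.sqrt(2), 2), 100.9 * 5)
```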



Figure 6.22: Measured time difference of a 4 inch wire delay applied between channels 5 and 6. The underlying Gaussian distribution of the histogram is also shown.

Three different measurement series have been conducted. The delay difference has been adjusted so that the event signal of the $2^{nd}$ channel arrives:

1. within one clock cycle,
2. one clock cycle later,
3. multiple clock cycles later.

Wire delays of 0, 1, 2, 4, 8 and 39 inch have been applied. As an example, the results of channel pair 5 & 6 are shown in figure 6.23. Across the measurement series, rms-time resolutions better than 2.5 ps-rms have been demonstrated. As listed in table 6.2, the obtained results are in good agreement with the expected time resolution calculated previously.



Figure 6.23: Measured wire delay difference of channel pair 5 & 6 for different wire delays.

| Channel | Wire Delay | $\Delta$ Bin | Double-Shot | Single-Shot | Expected    |
|---------|------------|--------------|-------------|-------------|-------------|
| 5 & 6   | 0 inch     | 3.2 LSB      | 3.38 ps-rms | 2.39 ps-rms | 2.10 ps-rms |
| 5 & 6   | 1 inch     | 27.5 LSB     | 3.45 ps-rms | 2.44 ps-rms | 2.10 ps-rms |
| 5 & 6   | 2 inch     | 52.1 LSB     | 3.10 ps-rms | 2.19 ps-rms | 2.10 ps-rms |
| 5 & 6   | 4 inch     | 100.9 LSB    | 3.07 ps-rms | 2.17 ps-rms | 2.10 ps-rms |
| 5 & 6   | 8 inch     | 79.1 LSB     | 3.00 ps-rms | 2.12 ps-rms | 2.10 ps-rms |
| 5 & 6   | 39 inch    | 74.1 LSB     | 3.02 ps-rms | 2.13 ps-rms | 2.10 ps-rms |

**Table 6.2:** Measured rms-time resolution of channel pair 5 & 6 for different wire delay settings. The LSB size is adjusted to 5 ps.

**Table 6.3:** Measured rms-time resolution of different channels and LSB size settings.

| Channel     | LSB     | $\Delta$ Bin | Double-Shot  | Single-Shot | Expected    |
|-------------|---------|--------------|--------------|-------------|-------------|
| 1 & 2       | 5 ps    | -20 LSB      | 5.25 ps-rms  | 3.71 ps-rms | 3.03 ps-rms |
| 3 & 4       | 5 ps    | 21.7 LSB     | 3.05 ps-rms  | 2.16 ps-rms | 2.10 ps-rms |
| 5 & 6       | 5 ps    | 3.2 LSB      | 3.37 ps-rms  | 2.39 ps-rms | 2.10 ps-rms |
| 7 & 8       | 10 ps   | -3.7 LSB     | 7.02 ps-rms  | 4.96 ps-rms | 4.03 ps-rms |
| 5 & 6$^a$   | 6.1 ps  | 8.8 LSB      | 4.09 ps-rms  | 2.89 ps-rms | 2.11 ps-rms |
| 5 & 6$^a$   | 10 ps   | 5.4 LSB      | 5.50 ps-rms  | 3.89 ps-rms | 3.70 ps-rms |
| 5 & 6$^a$   | 12.2 ps | 4.5 LSB      | 6.34 ps-rms  | 4.49 ps-rms | 3.89 ps-rms |
| 5 & 6$^a$   | 20 ps   | 2.7 LSB      | 9.79 ps-rms  | 6.92 ps-rms | 8.20 ps-rms |
| 5 & 6$^a$   | 24.4 ps | 2.2 LSB      | 11.26 ps-rms | 7.96 ps-rms | 9.50 ps-rms |

<sup>*a*</sup>Erroneously applied single channel calibration for channel 5.


Table 6.3 lists the achieved rms-time resolution for different channel configurations and LSB size settings. The discrepancy between measured and expected values in some configurations can be explained by a possible correlation between the INL of the two channels. Depending on the correlation, this can lead to slightly better or worse measured rms-time resolutions.

Nevertheless, the results are reasonably close to the expected values. The measurement also includes the jitter contribution of the reference signal and jitter introduced by thermal noise. The period jitter of the reference clock has been measured to be approximately 0.9 ps-rms, whereas the thermal noise contribution of the TDC has been evaluated by transient noise analysis to be below 1 ps-rms. As both contributions are small, the resolution of the demonstrator ASIC is mainly limited by the non-linearities of the TDC. For these measurements, the same calibration vectors as in the code density tests of the respective channels have been used.

# 6.2.5 Power Consumption

The power consumption of the TDC is extracted by means of current-voltage measurements. In total, three separate power domains have been implemented to distinguish the power contributions of the I/O, the fine-time interpolator and the channel matrix.<sup>1</sup> The power consumed by the demonstrator for different reference frequency settings is depicted in figure 6.24.

Most of the power is consumed by the distribution buffers and the TCRs; the fine-time interpolator structure itself contributes relatively little. For channels 3 - 8, the first latch of the TCRs needs to be transparent to capture the time of arrival of a hit, which causes additional power to be consumed during acquisition. In contrast, the power consumption of channels 1 & 2 does not change with the level of the event signal. The estimated power consumed by each channel has been analyzed in section 5.4.2. From simulation results, the power consumption of the fine-time interpolator and the channel matrix has been estimated to be 71 mW and 163 mW, respectively. Scaled to 1.3 V, this equates to 83 mW and 191 mW, respectively. If the acquisition of channels 3 - 8 is off, the scaled power consumption of the channel matrix is reduced to $127\,\mathrm{mW}$.

<sup>1</sup>For fail-safe operation, the power domain of the I/O buffer of the reference signal has been physically decoupled from the rest of the I/O cells. As this feature has not been used, the I/O is handled as one common supply.

Figure 6.24: Measured power consumption of the demonstrator chip for different reference clock frequencies when supplied with 1.3 V. A total of 8 channels are integrated on chip.

| Frequency [MHz] | Global Interpolator | Channel Matrix (acquisition on) | Channel Matrix (acquisition off) | I/O   |
|-----------------|---------------------|---------------------------------|----------------------------------|-------|
| 1562.5          | 85 mW               | 185 mW                          | 117 mW                           | 72 mW |
| 1280            | 70 mW               | 152 mW                          | 97 mW                            | 72 mW |
| 781.25          | 43 mW               | 95 mW                           | 60 mW                            | 72 mW |
| 640             | 36 mW               | 80 mW                           | 50 mW                            | 72 mW |
| 390.625         | 24 mW               | 50 mW                           | 33 mW                            | 72 mW |
| 320             | 20 mW               | 42 mW                           | 28 mW                            | 72 mW |

**Table 6.4:** Measured power consumption of respective blocks of the demonstrator supplied with 1.3 V.
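The scaling of the simulated power figures to 1.3 V can be reproduced under two assumptions: the simulated values refer to a 1.2 V supply, and the power is dominated by dynamic switching ($P \propto C V^2 f$ at fixed frequency). A minimal sketch with a hypothetical helper name:

```python
def scale_dynamic_power(p_mw, v_from=1.2, v_to=1.3):
    """Scale dynamic (C * V^2 * f) power from one supply voltage to another at fixed frequency."""
    return p_mw * (v_to / v_from) ** 2

for block, p_sim in (("fine-time interpolator", 71.0), ("channel matrix", 163.0)):
    print(f"{block}: {p_sim:.0f} mW at 1.2 V -> {scale_dynamic_power(p_sim):.0f} mW at 1.3 V")
```

With these assumptions, the scaling reproduces the 83 mW and 191 mW quoted for 1.3 V.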

With 5 ps LSB sizes on a 1.3 V supply and all channels in acquisition mode, the total power consumption has been measured to be 335 mW, equating to 42 mW/channel. This includes 72 mW consumed by the I/O cells, 85 mW by the fine-time interpolator and 185 mW by the channel matrix. The measurements are within a 3% range of the simulation results. If acquisition on channels 3 - 8 is turned off, the power consumption is reduced to 261 mW, or 33 mW/channel. In acquisition mode, 59% of the power is consumed by the channel matrix, 18% by the fine-time interpolator and 23% by the I/O structures. At lower reference frequencies, less power is consumed. Table 6.4 summarizes the results for different reference signal frequency settings.

# 6.2.6 Inter-Channel Crosstalk

In a multichannel system, events arriving in close succession can influence each other through power supply coupling or through capacitive coupling between adjacent lines. To investigate inter-channel crosstalk, two successive events are sent to two adjacent channels. One event is kept in a constant phase relationship to the reference signal, while the second event is swept in small time steps across the first edge. A block diagram of the test procedure is depicted in figure 6.25.



Figure 6.25: Functional diagram to measure inter-channel crosstalk between two adjacent channels.

Such a procedure requires two signals with a fixed phase relationship to the reference signal. A two-channel pattern generator, model HP 31130A, synchronized to the reference signal is employed to generate them. The delay of each channel can be adjusted in 2 ps steps. To average out the jitter present on the event signals, 250 measurements are recorded per delay setting.

As an example, the inter-channel crosstalk measurement of channel pair 5 & 6 is shown in figure 6.26. For some delay settings, the measurement equipment introduces a time offset of approximately 10 ps, which appears as the constant phase jumps in the diagram. In the region of interest, however, i.e. around the crossing point of the two signals shown in the zoomed box, the generator delay is swept in a nearly continuous manner. From the zoomed region, the inter-channel crosstalk between two adjacent channels is evaluated to be less than $\pm 1$ LSB. Inter-channel crosstalk effects can therefore be considered marginal.

Figure 6.26: Measured inter-channel crosstalk of channel pair 5 and 6. The edge of channel 5 has been swept across the edge of channel 6, which is held at a constant phase setting.
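One way to quantify the crosstalk from such a sweep is sketched below: for every delay setting of the swept channel, the codes recorded on the static channel are averaged, and the largest deviation from its undisturbed code is taken as the crosstalk figure. The data layout and function name are hypothetical; the evaluation above is read from the zoomed region of figure 6.26.

```python
import statistics

def crosstalk_lsb(codes_per_setting, reference_code):
    """Largest shift of the averaged code of the static channel, in LSB.

    codes_per_setting: one list of recorded codes per delay setting of the swept
    channel (e.g. 250 codes each); averaging suppresses the event-signal jitter."""
    return max(abs(statistics.mean(codes) - reference_code) for codes in codes_per_setting)

# Hypothetical data: three delay settings, static channel nominally at code 50
print(crosstalk_lsb([[50, 51, 50], [50, 50, 50], [49, 50, 51]], 50))
```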

# 6.2.7 Voltage-Temperature Variations

Any propagation delay that is not stabilized against temperature and voltage will drift with them; in general, the longer the uncontrolled delay path, the larger the effect. In the proposed architecture, the LSB size is set solely by the reference signal's frequency and held stable by the DLL. However, the event and reference signal receivers as well as the distribution buffers are not controlled by the loop. Any delay difference introduced in these two paths shows up as a timing error, which can be either positive or negative.



**Figure 6.27:** Functional diagram to measure voltage and temperature sensitivity of the TDC.




Figure 6.28: Measured delay variations of channel pair 5 & 6 due to voltage and temperature shifts. The measurements have been performed at nominally 1.3 V, 31 °C operating temperature and with a 1280 MHz reference clock frequency (REFCLK).

To characterize the circuit's sensitivity to voltage and temperature variations, an event with a fixed phase relationship to the reference clock is generated. An illustration of this test procedure is shown in figure 6.27. Any voltage or temperature change then shows up as a shift of the time recorded by the TDC. Two separate measurements have been conducted. In the first, the temperature has been kept constant and the voltage has been swept from 1.1 V to 1.4 V; in the second, the voltage has been kept constant and the temperature has been swept from 10 °C to 60 °C. As an example, the timing offset change of channel pair 5 & 6 due to voltage and temperature variations is shown in figure 6.28. A voltage sensitivity of -0.2 ps/V and a temperature sensitivity of 0.4 ps/°C have been recorded. These variations have been found to be negligible compared with the variations introduced by other building blocks within the measurement chain, as reported e.g. in [79].
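Sensitivities such as the ps/V and ps/°C figures above are slopes of the recorded timing offset versus the swept quantity. A minimal least-squares sketch, with a hypothetical function name and synthetic example data:

```python
def sensitivity(x, y):
    """Least-squares slope dy/dx, e.g. timing offset in ps per V or per degC."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Synthetic temperature sweep from 10 degC to 60 degC with a 0.4 ps/degC drift
temps = [10, 20, 30, 40, 50, 60]
offsets = [3.0 + 0.4 * t for t in temps]
print(round(sensitivity(temps, offsets), 3))
```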

# **References Chapter 6**

- [76] L. Perktold and J. Christiansen, "A high time-resolution (< 3 ps-rms) time-to-digital converter for highly integrated designs," in *Instrumentation and Measurement Technology Conference (I2MTC), 2013 IEEE International*, 2013.
- [77] J.-P. Jansson, A. Mantyniemi, and J. Kostamovaara, "A CMOS time-to-digital converter with better than 10 ps single-shot precision," *Solid-State Circuits, IEEE Journal of*, vol. 41, no. 6, pp. 1286–1296, 2006.
- [78] C. Ugur, E. Bayer, N. Kurz, and M. Traxler, "A 16 channel high resolution (< 11 ps RMS) time-to-digital converter in a field programmable gate array," *Journal of Instrumentation*, vol. 7, no. 02, p. C02004, 2012. [Online]. Available: http://stacks.iop.org/1748-0221/7/i=02/a=C02004
- [79] E. Martin *et al.*, "The 5 ns peaking time transimpedance front end amplifier for the silicon pixel detector in the NA62 Gigatracker," in *Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE*, 2009, pp. 381–388.

# 7 Conclusion

In this work, the problem of building a time-to-digital converter (TDC) with multiple channels integrated on a single chip has been investigated. The primary goal has been to demonstrate the feasibility of achieving fine-time resolutions in the sub-10 ps-rms domain while implementing a high number of channels on a single application specific integrated circuit (ASIC). Based on an extensive literature study, a novel TDC architecture has been developed and implemented in a 130 nm technology. The difficulties of achieving fine time resolutions in a multichannel environment have been identified and carefully analyzed. The constructed demonstrator has been thoroughly tested, achieving good time resolutions and relatively low power consumption, well in line with simulation results.

# 7.1 Short Architecture Description

The TDC architecture is based on a delay-locked-loop (DLL) with 32 elements. The DLL nominally runs at a clock frequency of 1.56 GHz and generates least-significant bit (LSB) sizes of 20 ps. In a subsequent stage, a resistive time-interpolation concept achieves LSB sizes of 5 ps. Only one instance of the DLL and the time-interpolation circuit is implemented per ASIC and shared across all channels. Distribution buffers distribute the fine-time code of the interpolator to the respective channels. Several channels are grouped into segments, and each segment is served by its own distribution buffer array. An on-chip adjustment feature is provided to compensate for device mismatches introduced by the fine-time interpolator and the distribution buffers. This allows each segment to be calibrated separately and corrects device mismatches on a per-segment basis, which avoids the need to calibrate each channel individually. To limit the susceptibility to power supply noise, the architecture has been designed to maintain fast signal slopes and short signal propagation paths for all timing-critical signals. Process-voltage-temperature (PVT) variations are compensated by the feedback mechanism of the DLL. This self-adjusting behavior of the DLL also makes it possible to trade time resolution against power consumption by changing the input clock frequency, so less demanding applications can benefit from lower power consumption. A clock-synchronous counter is added to extend the dynamic range of the interpolator.
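The bin sizes quoted above follow directly from the reference period: 32 DLL elements divide the period, and the interpolation stage divides each element by four. The following sketch encodes that arithmetic. It also shows how a time stamp could be assembled from a counter value, a DLL tap, and an interpolation step. The class name, the field names, and the linear (counter, tap, step) encoding are assumptions made for illustration. They do not describe the actual output format of the ASIC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdcTiming:
    """Bin sizes of a DLL plus resistive-interpolation TDC (illustrative model)."""
    f_ref_hz: float = 1.5625e9  # DLL input clock, nominally "1.56 GHz"
    dll_taps: int = 32          # delay elements in the DLL
    interp_steps: int = 4       # interpolation steps per DLL element (20 ps -> 5 ps)

    @property
    def t_ref(self) -> float:
        """Reference period in seconds, which is also the interpolator's range."""
        return 1.0 / self.f_ref_hz

    @property
    def dll_bin(self) -> float:
        """Delay of one DLL element in seconds."""
        return self.t_ref / self.dll_taps

    @property
    def lsb(self) -> float:
        """Final LSB after the interpolation stage, in seconds."""
        return self.dll_bin / self.interp_steps

    def timestamp(self, coarse: int, tap: int, step: int) -> float:
        """Combine a hypothetical (counter, DLL tap, interpolation step) triple."""
        if not 0 <= tap < self.dll_taps:
            raise ValueError("DLL tap out of range")
        if not 0 <= step < self.interp_steps:
            raise ValueError("interpolation step out of range")
        return coarse * self.t_ref + tap * self.dll_bin + step * self.lsb
```

With the defaults, `TdcTiming().lsb` is 5 ps and `t_ref` is 640 ps, which matches the on-chip dynamic range of the demonstrator. Halving the input clock to 781.25 MHz doubles every bin and gives a 10 ps LSB. The counter term is what extends the range beyond one reference period.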

# 7.2 Demonstrator Performance


A prototype, implemented in a commercial 130 nm technology, has been designed, fabricated, and successfully tested with promising results. The fine-time interpolator has been implemented together with 8 channels in different configurations. The best timing resolution is achieved by channels 3 - 6.

| Technology            | $130\mathrm{nm}$                                |
|-----------------------|-------------------------------------------------|
| Supply Voltage        | $1.3\mathrm{V}$                                 |
| Area                  | $1.2\mathrm{mm}^2$                              |
| Power Consumption     | $34\mathrm{mW}$ - $42\mathrm{mW}$ (per channel) |
| # of Channels         | 8                                               |
| LSB size              | $5\mathrm{ps}$                                  |
| DNL                   | $\pm 0.9\mathrm{LSB}$                           |
| INL                   | $\pm 1.3\mathrm{LSB}$                           |
| Single Shot Precision | $< 3\mathrm{ps}\text{-rms}$                     |
| Dynamic Range         | $640\mathrm{ps}$ (on chip)                      |

 Table 7.1: Performance summary of the demonstrator.

To characterize the linearity of the TDC, a code density test has been performed. After calibration, the better matching channels achieve a differential-non-linearity (DNL) of ±0.9 LSB and an integral-non-linearity (INL) of ±1.3 LSB.
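A code density test fills a histogram with hits that are uniformly distributed in time relative to the reference clock. Under that condition, every bin of an ideal TDC collects the same number of hits. The sketch below derives DNL and INL, in LSB, from such a histogram. The function name is hypothetical. The INL is computed as the running sum of the DNL, which is one common convention among several.

```python
from typing import List, Sequence, Tuple


def code_density(counts: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (dnl, inl) in LSB from a code-density histogram.

    counts: number of hits recorded in each TDC bin, assuming the input
            events were uniformly distributed over the full bin range.
    """
    if not counts or sum(counts) == 0:
        raise ValueError("histogram is empty")
    ideal = sum(counts) / len(counts)  # expected hits per bin for an ideal TDC
    # DNL: relative deviation of each bin width from one LSB
    dnl = [c / ideal - 1.0 for c in counts]
    # INL: accumulated bin-width error at the end of each bin
    inl: List[float] = []
    acc = 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl
```

For a four-bin histogram `[150, 50, 100, 100]`, the DNL is `[0.5, -0.5, 0.0, 0.0]` LSB and the INL is `[0.5, 0.0, 0.0, 0.0]` LSB. A wide bin followed by a narrow one leaves no accumulated error.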

| Channel | LSB             | DNL                   | INL                   | Single-Shot                      | Power per ch.<sup>a</sup> (acqu. on) | Power per ch.<sup>a</sup> (acqu. off) | # ch. |
|---------|-----------------|-----------------------|-----------------------|----------------------------------|--------------------------------------|---------------------------------------|-------|
| 1 & 2   | $5\mathrm{ps}$  | $\pm 0.9\mathrm{LSB}$ | $\pm 2\mathrm{LSB}$   | $<4\mathrm{ps}\text{-rms}$       | $28.9\mathrm{mW}$                    | $28.9\mathrm{mW}$                     | 7     |
| 3 - 6   | $5\mathrm{ps}$  | $\pm 0.9\mathrm{LSB}$ | $\pm 1.4\mathrm{LSB}$ | $<2.5\mathrm{ps}\text{-rms}$     | $35.5\mathrm{mW}$                    | $22.5\mathrm{mW}$                     | 9     |
| $5^{b}$ | $10\mathrm{ps}$ | $> 0.2\mathrm{LSB}$   | $> 0.8\mathrm{LSB}$   | $\sim 4\mathrm{ps}\text{-rms}$   | $18.0\mathrm{mW}$                    | $11.4\mathrm{mW}$                     | 9     |
| $5^{b}$ | $20\mathrm{ps}$ | $> 0.4\mathrm{LSB}$   | $> 0.9\mathrm{LSB}$   | $\sim 7\mathrm{ps}\text{-rms}$   | $9.6\mathrm{mW}$                     | $6.3\mathrm{mW}$                      | 9     |
| 7 & 8   | $10\mathrm{ps}$ | $\pm 0.9\mathrm{LSB}$ | $\pm 0.7\mathrm{LSB}$ | $\sim 5\mathrm{ps}\text{-rms}$   | $12.0\mathrm{mW}$                    | $9.2\mathrm{mW}$                      | 11    |

Table 7.2: Performance comparison of different channel configurations.

<sup>a</sup>The power is evaluated based on simulation results, excluding I/O.

<sup>b</sup>Single channel calibration for channel 5.

The single-shot precision of the TDC has been evaluated by means of time-difference measurements for different wire length differences. For the better matching channels, a single-shot precision better than 2.44 ps-rms has been demonstrated across the whole measurement series. The full prototype consumes between 34 mW and 42 mW per channel. Lowering the input clock frequency to 781 MHz (10 ps LSB size) reduces the power consumption to 21 mW and 26 mW per channel, respectively. The architecture exhibits a time shift of -0.19 ps/mV with supply voltage and a temperature dependence of 0.44 ps/°C. Within the measurement precision of the test setup, the crosstalk between two neighboring channels has been found to be below ±1 LSB. Table 7.1 summarizes the performance of the demonstrator ASIC for the better matching channels 3 - 6. With the given measurement setup, the input buffer architecture has not been found to degrade the timing performance of the TDC.
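The time-difference measurement involves two TDC edges, whereas the single-shot precision is quoted per edge. A common way to relate the two is to divide the rms of the differences by sqrt(2). This holds for two identical channels with uncorrelated errors. The sketch below assumes that convention rather than taking it from the measurement description, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence


def single_edge_rms(deltas: Sequence[float]) -> float:
    """Estimate per-edge rms precision from time-difference samples.

    deltas: measured time differences between two channels for ONE fixed
            wire-length difference; the constant offset drops out of the
            standard deviation.
    Assumes both channels are identical and their errors are uncorrelated,
    so var(t1 - t2) = 2 * var(t_edge).
    """
    if len(deltas) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(deltas) / math.sqrt(2.0)
```

Under this convention, the function would be applied separately to each wire-length setting. The largest result across the series is the figure to compare against a bound such as the one quoted above.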

Table 7.2 summarizes important performance parameters for the different channel configurations. The power consumption listed in the table is calculated assuming that the channel matrix of a final ASIC only contains identical channels. To keep the capacitive load of the distribution buffers equal, the number of channels is scaled to provide a load similar to that of the existing demonstrator. For example, a single segment with a total of 11 channels built from the time capture registers of channels 7 & 8 consumes 9.2 mW/channel, excluding I/O cells, when the TDC is supplied with 1.3 V and the LSB size is set to 10 ps.

| Ref. | Method                      | LSB                  | Single shot                    | Power$^{a}$          | Channels | Robustness$^{b}$ |
|------|-----------------------------|----------------------|--------------------------------|----------------------|----------|------------------|
| [80] | time amp.<sup>c</sup>       | $8.9\mathrm{ps}$     | -                              | -                    | 128      | $\sim$/+         |
| [81] | RC-delay                    | $24.4\mathrm{ps}$    | $15.8\mathrm{ps}\text{-rms}$   | $125\mathrm{mW}$     | 8        | $\sim/\sim$      |
| [82] | WaveUnion                   | $3.7\mathrm{ps}^{d}$ | $2.5\mathrm{ps}\text{-rms}$    | -                    | 10       | +/$\sim$         |
| [83] | cap. scaling                | $12.2\mathrm{ps}$    | $13\mathrm{ps}\text{-rms}$     | $20\mathrm{mW}$      | 2        | +/$\sim$         |
| [84] | pas. interpol.<sup>c</sup>  | $4.7\mathrm{ps}$     | $3.3\mathrm{ps}\text{-rms}$    | $3.6\mathrm{mW}^{e}$ | 1        | $\sim$/+         |
| [85] | time amp.<sup>c</sup>       | $1.25\mathrm{ps}$    | $0.6\mathrm{ps}\text{-rms}$    | $3\mathrm{mW}^{f}$   | 1        | -/$\sim$         |
| this | pas. interpol.              | $5\mathrm{ps}$       | $2.5\mathrm{ps}\text{-rms}$    | $43\mathrm{mW}$      | 8        | +/+              |

Table 7.3: Comparison with related TDC designs reported in literature.

<sup>a</sup>per channel

<sup>b</sup>PVT & mismatch / power supply noise

<sup>c</sup>START-STOP measurement

<sup>d</sup>equivalent LSB size

<sup>e</sup>180 MHz sampling frequency

<sup>f</sup>10 MHz sampling frequency

# 7.3 Related Work

Table 7.3 compares this design with previously published work. A fair comparison between the different designs is difficult, so the comparison focuses on the most important figures: the single-shot precision, the power consumption per channel, and the robustness of each architecture against PVT variations, device mismatches, and power supply noise. The listed single-shot precision corresponds to a single-edge measurement. Compared with other state-of-the-art multichannel TDC designs, the proposed architecture achieves good time resolution at low power consumption, with increased robustness against PVT variations, device mismatches, and power supply noise.

The power consumption of the different TDC implementations is difficult to compare directly. In a flexible TDC implementation, the event signal can arrive at any time, so the TDC must run continuously. Note that, in contrast to START-STOP architectures, the interpolator of a continuously running TDC cannot be stopped to reduce power consumption. The power reported in the table is the total power consumption divided by the number of channels.
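The interpolator is shared and cannot be stopped, so its power is spread over all channels, and the per-channel figure depends on how many channels share it. The sketch below illustrates this division with a simple split into a shared part and a per-channel part. That split is an assumption made for illustration, and the numbers in the usage note are hypothetical, not taken from the demonstrator.

```python
def power_per_channel(p_shared_w: float, p_channel_w: float, n_channels: int) -> float:
    """Total TDC power divided by the number of channels.

    p_shared_w:  power of blocks shared by all channels (e.g. DLL and
                 interpolator), which run continuously in a time-tagging TDC
    p_channel_w: power of the circuitry replicated in every channel
    n_channels:  number of channels sharing the common blocks
    """
    if n_channels < 1:
        raise ValueError("need at least one channel")
    total_w = p_shared_w + n_channels * p_channel_w
    return total_w / n_channels
```

With a hypothetical 100 mW shared part and 10 mW per channel, 8 channels give 22.5 mW per channel, while 128 channels give about 10.8 mW per channel. A START-STOP design could instead gate its interpolator between measurements, which this model does not capture.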

# 7.4 Scientific Contributions

In the course of this work, the following notable contributions have been made:

- The proposed architecture makes use of a novel calibration approach to compensate for local device mismatches that arise on a per-channel basis. Well-matched timing registers are employed to avoid the need for per-channel adjustment features, so calibration only needs to be applied to larger groups of channels. The feasibility of such a common calibration approach has been demonstrated and its timing performance has been extensively analyzed.
- Two different concepts for performing time-to-digital conversion have been implemented and compared. A novel time-capture scheme that reduces static current consumption has been proposed. Design subtleties have been identified and proper operation has been demonstrated.
- A fast voltage-controlled delay buffer cell with a nominal propagation delay of 16 ps in a 130 nm technology has been proposed. Based on a Maneatis delay buffer, the proposed cell makes use of higher current density loads and implements an additional zero in its signal path to effectively reduce the propagation delay of the cell.
- For the novel time-capture scheme, a novel flip-flop optimized to reduce the parasitic loading on its clock input has been developed and characterized. The proposed flip-flop avoids the need for an inverted clock edge, which halves the number of switching elements compared to a traditional flip-flop implementation.

# 7.5 Future Developments

The prototype has been found to be a suitable candidate for a full TDC development that meets the specific requirements of next-generation HEP detector designs. The delicate time interpolation circuit will need only minor changes to be expanded from 8 channels to a larger number of channels. A block diagram of a possible implementation, based on the more than ten-year-old HPTDC [81], is depicted in figure 7.1. To allow operation at lower input clock frequencies (e.g. 40 MHz), a PLL will be added to the final design, as well as a counter to extend the dynamic range of the demonstrator. A final ASIC is envisaged to implement 32 to 128 channels.

Figure 7.1: Architectural view of a prospective full TDC ASIC. The diagram is based on the previously developed TDC named HPTDC [81].

# **References Chapter 7**

- [80] S. Mandai and E. Charbon, "A 128-channel, 8.9-ps LSB, column-parallel two-stage TDC based on time difference amplification for time-resolved imaging," *Nuclear Science, IEEE Transactions on*, vol. 59, no. 5, pp. 2463–2470, 2012.
- [81] J. Christiansen, "Manual: HPTDC high performance time to digital converter," 2004. [Online]. Available: http://tdc.web.cern.ch/tdc/hptdc/docs/hptdc\_manual\_ver2.2.pdf
- [82] E. Bayer, P. Zipf, and M. Traxler, "A multichannel high-resolution (5 ps RMS between two channels) time-to-digital converter (TDC) implemented in a field programmable gate array (FPGA)," in *Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2011 IEEE*, 2011, pp. 876–879.
- [83] J.-P. Jansson, A. Mantyniemi, and J. Kostamovaara, "A CMOS time-to-digital converter with better than 10 ps single-shot precision," *Solid-State Circuits, IEEE Journal of*, vol. 41, no. 6, pp. 1286–1296, 2006.
- [84] S. Henzler, S. Koeppe, D. Lorenz, W. Kamp, R. Kuenemund, and D. Schmitt-Landsiedel, "A local passive time interpolation concept for variation-tolerant high-resolution time-to-digital conversion," *Solid-State Circuits, IEEE Journal of*, vol. 43, no. 7, pp. 1666–1676, 2008.
- [85] M. Lee and A. Abidi, "A 9 b, 1.25 ps resolution coarse-fine time-to-digital converter in 90 nm CMOS that amplifies a time residue," *Solid-State Circuits, IEEE Journal of*, vol. 43, no. 4, pp. 769–777, 2008.


# Acronyms

| Acronym | Meaning                                   |
|---------|-------------------------------------------|
| ADC     | Analog-to-Digital Converter               |
| ADPLL   | All-Digital Phase-Locked-Loop             |
| ASIC    | Application Specific Integrated Circuit   |
| CP      | Charge-Pump                               |
| DLL     | Delay-Locked-Loop                         |
| DNL     | Differential-Non-Linearity                |
| FLIM    | Fluorescence Lifetime Imaging Tomography  |
| FPGA    | Field Programmable Gate Array             |
| HEP     | High Energy Physics                       |
| I/O     | Input/Output                              |
| INL     | Integral-Non-Linearity                    |
| LDO     | Low Drop-Out                              |
| LF      | Loop-Filter                               |
| LSB     | Least-Significant Bit                     |
| OTA     | Operational Transconductance Amplifier    |
| PCB     | Printed Circuit Board                     |
| PD      | Phase-Detector                            |
| PET     | Positron Emission Tomography              |
| PLL     | Phase-Locked-Loop                         |
| PVT     | Process-Voltage-Temperature               |
| R&D     | Research & Development                    |
| SAR     | Successive Approximation Register         |
| TCR     | Time Capture Register                     |
| TDC     | Time-to-Digital Converter                 |
| TOT     | Time-over-Threshold                       |
| TTC     | Timing, Trigger and Clock                 |

# Terminology

## single-ASIC

An implementation approach in which the complete time measurement chain is implemented on a single ASIC. Usually only the sensor is implemented as a separate module.

## multi-ASIC

An implementation approach in which the complete time measurement chain is implemented using multiple ASICs.

## in-pixel

An implementation approach in which the TDC is integrated into the pixel area of the readout ASIC.

## end-of-column

An implementation approach in which the TDC is integrated outside the pixel array of the readout ASIC.

## start-stop

In a *start-stop* measurement principle, the time reference is intrinsically provided by the measurement signal.

## time-tagging

In a *time-tagging* principle, the measured time is referred to a common time reference. The measurement signal itself does not contain any time reference information.

## event

An *event* refers to the signal induced by a particle crossing the detector, after it has been amplified and discriminated in amplitude.

## reference

In a *time-tagging* principle, the *reference* represents the signal to which all measurements are referred.

## local interpolation

In a *local interpolation* approach, the fine-time generator, the calibration circuit, and the TCRs are implemented per channel.

## global interpolation

In a *global interpolation* approach, the fine-time generator and the calibration circuit are implemented centrally and shared across all channels. Only the TCRs are implemented per channel.

## event capture

In an *event capture* TDC architecture, the state of the event signal is stored in the TCRs.

## clock capture

In a *clock capture* TDC architecture, the state of the reference signal is stored in the TCRs.

# List of Symbols

| Symbol                   | Description                                                                                   | Unit                 |
|--------------------------|-----------------------------------------------------------------------------------------------|----------------------|
| $\sigma_{sys}$           | timing precision of the complete measurement chain                                            | second in rms        |
| $\sigma_{sensor}$        | timing precision of the sensor                                                                | second in rms        |
| $\sigma_{pre-amp}$       | timing precision of the pre-amplifier                                                         | second in rms        |
| $\sigma_{TDC}$           | timing precision of the TDC                                                                   | second in rms        |
| $\sigma_{discriminator}$ | timing precision of the discriminator                                                         | second in rms        |
| $\sigma_{TTC}$           | timing precision of the detector's TTC system                                                 | second in rms        |
| LSB                      | smallest code step of the TDC                                                                 | second/LSB           |
| $DNL_i$                  | DNL of bin i                                                                                  | second/LSB           |
| $\sigma_{qDNL}$          | timing uncertainty of the TDC due to quantization noise and DNL errors                        | second or LSB in rms |
| $INL_i$                  | INL of bin i                                                                                  | second/LSB           |
| $\sigma_{wINL}$          | timing uncertainty of the TDC due to INL errors                                               | second or LSB in rms |
| $\sigma_{noise}$         | timing uncertainty of the TDC due to asynchronous power supply noise and transient noise      | second in rms        |
| $\sigma_{ref}$           | timing uncertainty of the TDC due to jitter of the reference signal                           | second in rms        |
| $\Delta t_{TS}$          | timing offset introduced by PVT variations and wire delays                                    | second               |
| $\Delta t_{CT}$          | timing offset introduced by interchannel crosstalk for a given time difference between two signals | second          |


| Symbol                          | Description                                                                          | Unit            |
|---------------------------------|--------------------------------------------------------------------------------------|-----------------|
| $\tau$                          | propagation delay of a circuit                                                       | second          |
| $V_{osc}$                       | oscillation amplitude of a cell                                                      | volt            |
| $C_{eff}$                       | effective load capacitance of a cell                                                 | farad           |
| $I_D$                           | charging current of a cell                                                           | ampere          |
| $P$                             | power consumption of a cell                                                          | watt            |
| $V_{DD}$                        | power supply voltage of a cell                                                       | volt            |
| $f_{ref}$                       | reference signal's frequency                                                         | hertz           |
| $T_{ref}$                       | reference signal's period                                                            | second          |
| $V_{ctrl}$                      | DLL feedback voltage stored on the LF capacitor                                      | volt            |
| VBN                             | NMOS control voltage of the delay buffer cell of the DLL                             | volt            |
| $DE_{Gain}$                     | small-signal gain $\tau/V_{ctrl}$ of the delay buffer cell of the DLL                | second/volt     |
| $I_{CP-early}$, $I_{CP-late}$   | CP current when the PD indicates that the DLL is too fast or too slow, respectively  | ampere          |
| $C_{LF}$                        | LF capacitor value                                                                   | farad           |
| $N$                             | number of delay buffer cells in the DLL                                              | -               |
| $\Delta_{DL}$                   | propagation delay change of the delay line after one reference signal clock period   | second / period |
| $f_{max}$                       | highest operating frequency of the DLL                                               | hertz           |
| $\sigma_{\Delta bin}$           | propagation delay difference between two channels                                    | second          |

# **IEEE Copyright Notice**

Copyright 2012/2013 IEEE. Parts of the work presented in this thesis have been or are to be published in the proceedings of the 2012 IEEE Ph.D. Research in Microelectronics & Electronics conference, June 12-15, 2012, Aachen, Germany, and in the proceedings of the 2013 IEEE International Instrumentation and Measurement Technology Conference, May 6-9, 2013, Minneapolis, USA, respectively. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE.