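Written jointly by the 2017 Turing Award laureates Patterson and Hennessy, this book is a classic in the field of computer architecture that emphasizes hardware/software co-design and its effect on performance. Using the open RISC-V instruction set architecture, it covers hardware technology, instructions, arithmetic, pipelining, the memory hierarchy, I/O, and parallel processors. The second edition switches from RV64 to RV32 to make the material easier to learn, and adds a discussion of domain-specific architectures (DSAs) to reflect new technology trends. In addition, every chapter gains a performance-improvement section and a self-study section, and a large number of exercises have been updated. The book is suitable as a reference for professionals working in computer architecture and as reading for university students in computer-related majors.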
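Written jointly by Turing Award laureates Patterson and Hennessy, this book is a work for the new golden age of computer architecture. In response to reader requests, this edition switches from RV64 to RV32, dropping 10 instructions and making the material easier to learn. It adds a discussion of domain-specific architectures (DSAs) that uses Google's TPUv1 as an example, along with a new comparison of the TPUv3 DSA supercomputer with an NVIDIA Volta GPU cluster. Every chapter gains a performance-improvement section; these sections apply data-level, instruction-level, and thread-level parallelism, among other methods, to speed up a matrix multiply program by nearly 50,000 times with only 21 added lines of code, a direct demonstration of how much hardware matters for improving energy efficiency.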
Preface
The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.
Albert Einstein, What I Believe, 1930
About This Book
We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shaping computing. We also feel that readers in every specialty of computing need to appreciate the organizational paradigms that determine the capabilities, performance, energy, and, ultimately, the success of computer systems.
Modern computer technology requires professionals of every computing specialty to understand both hardware and software. The interaction between hardware and software at a variety of levels also offers a framework for understanding the fundamentals of computing. Whether your primary interest is hardware or software, computer science or electrical engineering, the central ideas in computer organization and design are the same. Thus, our emphasis in this book is to show the relationship between hardware and software and to focus on the concepts that are the basis for current computers.
The recent switch from uniprocessor to multicore microprocessors confirmed the soundness of this perspective, given since the first edition. While programmers could ignore the advice and rely on computer architects, compiler writers, and silicon engineers to make their programs run faster or be more energy-efficient without change, that era is over. For programs to run faster, they must become parallel. While the goal of many researchers is to make it possible for programmers to be unaware of the underlying parallel nature of the hardware they are programming, it will take many years to realize this vision. Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers.
The audience for this book includes those with little experience in assembly language or logic design who need to understand basic computer organization as well as readers with backgrounds in assembly language and/or logic design who want to learn how to design a computer or understand how a system works and why it performs as it does.
About the Other Book
Some readers may be familiar with Computer Architecture: A Quantitative Approach, popularly known as Hennessy and Patterson. (This book in turn is often called Patterson and Hennessy.) Our motivation in writing the earlier book was to describe the principles of computer architecture using solid engineering fundamentals and quantitative cost/performance tradeoffs. We used an approach that combined examples and measurements, based on commercial systems, to create realistic design experiences. Our goal was to demonstrate that computer architecture could be learned using quantitative methodologies instead of a descriptive approach. It was intended for the serious computing professional who wanted a detailed understanding of computers.
A majority of the readers for this book do not plan to become computer architects. The performance and energy efficiency of future software systems will be dramatically affected, however, by how well software designers understand the basic hardware techniques at work in a system. Thus, compiler writers, operating system designers, database programmers, and most other software engineers need a firm grounding in the principles presented in this book. Similarly, hardware designers must understand clearly the effects of their work on software applications.
Thus, we knew that this book had to be much more than a subset of the material in Computer Architecture, and the material was extensively revised to match the different audience. We were so happy with the result that the subsequent editions of Computer Architecture were revised to remove most of the introductory material; hence, there is much less overlap today than with the first editions of both books.
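David A. Patterson
He has taught computer architecture at the University of California, Berkeley, since joining the faculty in 1977, and holds the Pardee Chair of Computer Science there. His teaching has been honored by the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von Neumann Medal and the C & C Prize with Hennessy. Like Hennessy, Patterson is a member of the National Academy of Engineering and the National Academy of Sciences, a Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and he was inducted into the Silicon Valley Engineering Hall of Fame. He has served as chair of the Computer Science Division in the Department of Electrical Engineering and Computer Sciences (EECS) at Berkeley, as chair of the Computing Research Association, and as President of ACM. This record led to Distinguished Service Awards from ACM, CRA, and SIGARCH. He received the Tapia Achievement Award for contributions to science outreach and diversity in computing, and he shared the 2017 ACM Turing Award with Hennessy.
At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer and the foundation of the commercial SPARC architecture. He was also a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led many companies to develop dependable storage systems. He was involved as well in the Network of Workstations (NOW) project, which led to the cluster technology widely used by Internet companies and later to cloud computing. These projects earned four ACM best paper awards. In 2016, he became Professor Emeritus at Berkeley and a Distinguished Engineer at Google, where he works on domain-specific architectures for machine learning. He is also Vice Chair of RISC-V International and Director of the RISC-V International Open Source Laboratory.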
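John L. Hennessy
He was the tenth president of Stanford University, where he has been a member of the faculty since 1977 in the departments of electrical engineering and computer science. Hennessy is a Fellow of IEEE and ACM; a member of the National Academy of Engineering, the National Academy of Sciences, and the American Philosophical Society; and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 ACM Eckert-Mauchly Award for his contributions to RISC, the 2001 Seymour Cray Computer Engineering Award, the 2000 IEEE John von Neumann Medal, shared with Patterson, and the 2017 ACM Turing Award, also shared with Patterson. He has also received seven honorary doctorates.
In 1981, Hennessy started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems (now MIPS Technologies), which developed one of the early commercial RISC microprocessors. As of 2006, more than 2 billion MIPS microprocessors had been shipped in devices ranging from video games and palmtop computers to laser printers and network switches. Hennessy subsequently led the DASH (Directory Architecture for Shared Memory) project, which prototyped the first scalable cache-coherent multiprocessor; many of its key ideas have been adopted in modern multiprocessors. In addition to his research activities and university responsibilities, he has been involved in many start-ups as an early-stage advisor and investor, contributing notably to the commercialization of academic results in related fields.
He is currently Director of the Knight-Hennessy Scholars program and serves as non-executive chairman of Alphabet.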
Contents
CHAPTERS
1 Computer Abstractions and Technology
1.1 Introduction
1.2 Seven Great Ideas in Computer Architecture
1.3 Below Your Program
1.4 Under the Covers
1.5 Technologies for Building Processors and Memory
1.6 Performance
1.7 The Power Wall
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.9 Real Stuff: Benchmarking the Intel Core i7
1.10 Going Faster: Matrix Multiply in Python
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspective and Further Reading
1.14 Self-Study
1.15 Exercises
2 Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
2.4 Signed and Unsigned Numbers
2.5 Representing Instructions in the Computer
2.6 Logical Operations
2.7 Instructions for Making Decisions
2.8 Supporting Procedures in Computer Hardware
2.9 Communicating with People
2.10 RISC-V Addressing for Wide Immediates and Addresses
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
2.13 A C Sort Example to Put it All Together
2.14 Arrays versus Pointers
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: MIPS Instructions
2.17 Real Stuff: ARMv7 (32-bit) Instructions
2.18 Real Stuff: ARMv8 (64-bit) Instructions
2.19 Real Stuff: x86 Instructions
2.20 Real Stuff: The Rest of the RISC-V Instruction Set
2.21 Going Faster: Matrix Multiply in C
2.22 Fallacies and Pitfalls
2.23 Concluding Remarks
2.24 Historical Perspective and Further Reading
2.25 Self-Study
2.26 Exercises
3 Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
3.3 Multiplication
3.4 Division
3.5 Floating Point
3.6 Parallelism and Computer Arithmetic: Subword Parallelism
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86
3.8 Going Faster: Subword Parallelism and Matrix Multiply
3.9 Fallacies and Pitfalls
3.10 Concluding Remarks
3.11 Historical Perspective and Further Reading
3.12 Self-Study
3.13 Exercises
4 The Processor
4.1 Introduction
4.2 Logic Design Conventions
4.3 Building a Datapath
4.4 A Simple Implementation Scheme
4.5 Multicycle Implementation
4.6 An Overview of Pipelining
4.7 Pipelined Datapath and Control
4.8 Data Hazards: Forwarding versus Stalling
4.9 Control Hazards
4.10 Exceptions
4.11 Parallelism via Instructions
4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53
4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply
4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
4.15 Fallacies and Pitfalls
4.16 Concluding Remarks
4.17 Historical Perspective and Further Reading
4.18 Self-Study
4.19 Exercises
5 Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 Memory Technologies
5.3 The Basics of Caches
5.4 Measuring and Improving Cache Performance
5.5 Dependable Memory Hierarchy
5.6 Virtual Machines
5.7 Virtual Memory
5.8 A Common Framework for Memory Hierarchy
5.9 Using a Finite-State Machine to Control a Simple Cache
5.