Control & Operating Speed
1. CS 61C: Great Ideas in Computer Architecture
Lecture 12: Control & Operating Speed
Krste Asanović & Randy Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17
2. Agenda
- Finish Single-Cycle RISC-V Datapath
- Controller
- Instruction Timing
- Performance Measures
- Introduction to Pipelining
- Pipelined RISC-V Datapath
- And in Conclusion, ...
CS 61c Lecture 12: Control & Performance
3. Recap: Adding branches to datapath
[Figure: single-cycle datapath (IMEM, Reg[], Imm. Gen, Branch Comp., ALU, DMEM) with control signals ImmSel, RegWEn, BrUn, BrEq, BrLT, ASel, BSel, ALUSel, MemRW, WBSel, PCSel]
4. Implementing JALR Instruction (I-Format)
- jalr rd, rs1, immediate
- Writes PC+4 to Reg[rd] (return address)
- Sets PC = Reg[rs1] + immediate
- Uses the same immediates as arithmetic and loads: no multiplication by 2 bytes
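The register-transfer behavior listed above can be sketched in a few lines of Python. This is only an illustrative model, not the course's reference implementation: the names `sign_extend` and `exec_jalr` and the list-based register file are assumptions made for the example. One detail goes beyond the slide: the RISC-V specification also clears bit 0 of the computed target, and the sketch includes that step.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def exec_jalr(pc, regs, rd, rs1, imm12):
    """Model jalr: update regs in place and return the next PC.

    regs is a hypothetical 32-entry list standing in for Reg[].
    """
    # Compute the target first so that rd == rs1 still works.
    target = (regs[rs1] + sign_extend(imm12, 12)) & MASK32
    target &= ~1 & MASK32          # the ISA clears the least-significant bit
    if rd != 0:                    # x0 is hard-wired to zero
        regs[rd] = (pc + 4) & MASK32   # return address
    return target
```

Note that the immediate is added as a plain byte offset, matching the "no multiplication by 2" bullet.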
5. Adding jalr to datapath
[Figure: single-cycle datapath with pc+4 added as a third input (select value 2) to the write-back mux]
6. Adding jalr to datapath
[Figure: single-cycle datapath annotated with the control settings for jalr]
- ImmSel=I, RegWEn=1, BrUn=*, BrEq=*, BrLT=*, ASel=0, BSel=1, ALUSel=Add, MemRW=Read, WBSel=2, PCSel=ALU
7. Implementing jal Instruction
- JAL saves PC+4 in Reg[rd] (the return address)
- Sets PC = PC + offset (PC-relative jump)
- Target somewhere within ±2^19 locations, 2 bytes apart
  - ±2^18 32-bit instructions
- Immediate encoding optimized similarly to branch instruction to reduce hardware cost
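To make the "optimized immediate encoding" concrete, here is a minimal sketch of how the J-type offset can be reassembled from a 32-bit instruction word. The bit layout follows the RISC-V base ISA (inst[31] = imm[20], inst[30:21] = imm[10:1], inst[20] = imm[11], inst[19:12] = imm[19:12]); the function name `decode_j_imm` is an assumption for the example.

```python
def decode_j_imm(inst):
    """Reassemble the sign-extended byte offset of a jal instruction."""
    imm20 = (inst >> 31) & 0x1       # sign bit
    imm10_1 = (inst >> 21) & 0x3FF   # offset bits 10:1
    imm11 = (inst >> 20) & 0x1       # offset bit 11
    imm19_12 = (inst >> 12) & 0xFF   # offset bits 19:12
    # Bit 0 is always zero: targets are 2 bytes apart.
    imm = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
    return imm - (1 << 21) if imm20 else imm
```

For example, `decode_j_imm(0x008000EF)` (the encoding of `jal x1, 8`) gives 8, and `decode_j_imm(0xFFDFF06F)` (`jal x0, -4`) gives -4. The 20 encoded bits plus the implicit zero bit give the ±2^19 two-byte locations stated on the slide.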
8. Adding jal to datapath
[Figure: single-cycle datapath, unchanged from the jalr version]
9. Adding jal to datapath
[Figure: single-cycle datapath annotated with the control settings for jal]
- ImmSel=J, RegWEn=1, BrUn=*, BrEq=*, BrLT=*, ASel=1, BSel=1, ALUSel=Add, MemRW=Read, WBSel=2, PCSel=ALU
10. “Upper Immediate” instructions
- Has 20-bit immediate in upper 20 bits of 32-bit instruction word
- One destination register, rd
- Used for two instructions
  - LUI: Load Upper Immediate (add to zero)
  - AUIPC: Add Upper Immediate to PC
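A short sketch may help show what the two upper-immediate instructions compute. The helper names (`u_imm`, `exec_lui`, `exec_auipc`) are hypothetical; each function returns the value that would be written to rd.

```python
MASK32 = 0xFFFFFFFF

def u_imm(inst):
    """U-type immediate: inst[31:12] stays in bits 31:12, low 12 bits are zero."""
    return inst & 0xFFFFF000

def exec_lui(inst):
    """lui: rd receives the upper immediate itself."""
    return u_imm(inst)

def exec_auipc(inst, pc):
    """auipc: rd receives pc plus the upper immediate (wrapping at 32 bits)."""
    return (pc + u_imm(inst)) & MASK32
```

For instance, `exec_lui(0x123452B7)` (`lui x5, 0x12345`) yields 0x12345000, and `exec_auipc(0x12345297, 0x1000)` yields 0x12346000.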
11. Implementing lui
[Figure: single-cycle datapath annotated with the control settings for lui]
- ImmSel=U, RegWEn=1, BrUn=*, BrEq=*, BrLT=*, ASel=*, BSel=1, ALUSel=B, MemRW=Read, WBSel=1, PCSel=pc+4
12. Implementing auipc
[Figure: single-cycle datapath annotated with the control settings for auipc]
- ImmSel=U, RegWEn=1, BrUn=*, BrEq=*, BrLT=*, ASel=1, BSel=1, ALUSel=Add, MemRW=Read, WBSel=1, PCSel=pc+4
13. Recap: Complete RV32I ISA
- RV32I has 47 instructions total
- 37 instructions covered in CS61C
[Figure: RV32I instruction listing, with the remaining instructions marked “Not in CS61C”]
14. Single-Cycle RISC-V RV32I Datapath
[Figure: complete single-cycle datapath (IMEM, Reg[], Imm. Gen, Branch Comp., ALU, DMEM) with control signals ImmSel, RegWEn, BrUn, BrEq, BrLT, ASel, BSel, ALUSel, MemRW, WBSel, PCSel]
16. Processor
[Figure: block diagram of a Processor (Control; Datapath with PC, Registers, and Arithmetic & Logic Unit (ALU)) connected to Memory (Program, Data; Bytes) through the Processor-Memory Interface: Enable?, Read/Write, Address, Write Data, Read Data]
17. Single-Cycle RISC-V RV32I Datapath
[Figure: complete single-cycle datapath with a Control Logic block that takes inst[31:0], BrEq, and BrLT and drives ImmSel, RegWEn, BrUn, ASel, BSel, ALUSel, MemRW, WBSel, PCSel]
18. Control Logic Truth Table (incomplete)

Inst[31:0]  BrEq  BrLT  PCSel  ImmSel  BrUn  ASel  BSel  ALUSel  MemRW  RegWEn  WBSel
add         *     *     +4     *       *     Reg   Reg   Add     Read   1       ALU
sub         *     *     +4     *       *     Reg   Reg   Sub     Read   1       ALU
(R-R Op)    *     *     +4     *       *     Reg   Reg   (Op)    Read   1       ALU
addi        *     *     +4     I       *     Reg   Imm   Add     Read   1       ALU
lw          *     *     +4     I       *     Reg   Imm   Add     Read   1       Mem
sw          *     *     +4     S       *     Reg   Imm   Add     Write  0       *
beq         0     *     +4     B       *     PC    Imm   Add     Read   0       *
beq         1     *     ALU    B       *     PC    Imm   Add     Read   0       *
bne         0     *     ALU    B       *     PC    Imm   Add     Read   0       *
bne         1     *     +4     B       *     PC    Imm   Add     Read   0       *
blt         *     1     ALU    B       0     PC    Imm   Add     Read   0       *
bltu        *     1     ALU    B       1     PC    Imm   Add     Read   0       *
jalr        *     *     ALU    I       *     Reg   Imm   Add     Read   1       PC+4
jal         *     *     ALU    J       *     PC    Imm   Add     Read   1       PC+4
auipc       *     *     +4     U       *     PC    Imm   Add     Read   1       ALU
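The truth table can also be read as a pure function from (instruction, BrEq, BrLT) to control signals. The sketch below is one possible software rendering, not how the hardware is built: `Control`, `STATIC_ROWS`, `BRANCHES`, and `control` are names invented for the example, and instructions are identified by mnemonic rather than by decoding Inst[31:0]. The table lists only the taken case for blt and bltu; the sketch assumes the not-taken case selects +4, by analogy with beq and bne.

```python
from typing import NamedTuple

class Control(NamedTuple):
    PCSel: str
    ImmSel: str
    BrUn: str
    ASel: str
    BSel: str
    ALUSel: str
    MemRW: str
    RegWEn: int
    WBSel: str

# Rows whose outputs do not depend on BrEq/BrLT, copied from the truth table.
STATIC_ROWS = {
    "add":   Control("+4",  "*", "*", "Reg", "Reg", "Add", "Read",  1, "ALU"),
    "sub":   Control("+4",  "*", "*", "Reg", "Reg", "Sub", "Read",  1, "ALU"),
    "addi":  Control("+4",  "I", "*", "Reg", "Imm", "Add", "Read",  1, "ALU"),
    "lw":    Control("+4",  "I", "*", "Reg", "Imm", "Add", "Read",  1, "Mem"),
    "sw":    Control("+4",  "S", "*", "Reg", "Imm", "Add", "Write", 0, "*"),
    "jalr":  Control("ALU", "I", "*", "Reg", "Imm", "Add", "Read",  1, "PC+4"),
    "jal":   Control("ALU", "J", "*", "PC",  "Imm", "Add", "Read",  1, "PC+4"),
    "auipc": Control("+4",  "U", "*", "PC",  "Imm", "Add", "Read",  1, "ALU"),
}

# Branches: the BrUn setting and the comparator condition that means "taken".
BRANCHES = {
    "beq":  ("*", lambda eq, lt: eq),
    "bne":  ("*", lambda eq, lt: not eq),
    "blt":  ("0", lambda eq, lt: lt),   # not-taken case is an assumption
    "bltu": ("1", lambda eq, lt: lt),   # not-taken case is an assumption
}

def control(inst, br_eq=False, br_lt=False):
    """Look up the control signals for a mnemonic and comparator outputs."""
    if inst in STATIC_ROWS:
        return STATIC_ROWS[inst]
    br_un, taken = BRANCHES[inst]
    pc_sel = "ALU" if taken(br_eq, br_lt) else "+4"
    return Control(pc_sel, "B", br_un, "PC", "Imm", "Add", "Read", 0, "*")
```

For example, `control("beq", br_eq=True).PCSel` is "ALU" while `control("bne", br_eq=True).PCSel` is "+4". Only the branch rows consult the comparator outputs, and only to choose PCSel.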
19. Control Realization Options
- ROM (“Read-Only Memory”)
  - Regular structure
  - Can be easily reprogrammed: fix errors, add instructions
  - Popular when designing control logic manually
- Combinatorial Logic
  - Today, chip designers use logic synthesis tools to convert truth tables to networks of gates
20. RV32I, a nine-bit ISA!
- Instruction type encoded using only 9 bits: inst[30], inst[14:12], inst[6:2]
[Figure: RV32I instruction listing highlighting inst[30], inst[14:12], and inst[6:2], with some instructions marked “Not in CS61C”]
21. ROM-based Control
- 11-bit address (inputs): Inst[30,14:12,6:2] (9 bits), BrEq, BrLT
- 15 data bits (outputs): PCSel, ImmSel[2:0] (3 bits), BrUn, ASel, BSel, ALUSel[3:0] (4 bits), MemRW, RegWEn, WBSel[1:0] (2 bits)
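As a rough illustration of the sizing above, the sketch below forms the 11-bit ROM address from an instruction word and the two comparator outputs, and checks that the output fields add up to 15 bits. The ordering of bits within the address is an assumption made for the example (the slide fixes only which bits are used), and `rom_address` and `OUTPUT_WIDTHS` are hypothetical names.

```python
# Output field widths in bits, as listed on the slide.
OUTPUT_WIDTHS = {
    "PCSel": 1, "ImmSel": 3, "BrUn": 1, "ASel": 1, "BSel": 1,
    "ALUSel": 4, "MemRW": 1, "RegWEn": 1, "WBSel": 2,
}
ADDRESS_BITS = 11
ROM_WORDS = 1 << ADDRESS_BITS   # one control word per input combination

def rom_address(inst, br_eq, br_lt):
    """Pack inst[30], inst[14:12], inst[6:2], BrEq, BrLT into an 11-bit address.

    Assumed layout, most- to least-significant:
    inst[30] | inst[14:12] | inst[6:2] | BrEq | BrLT
    """
    bit30 = (inst >> 30) & 0x1
    funct3 = (inst >> 12) & 0x7
    opcode_hi = (inst >> 2) & 0x1F
    inst9 = (bit30 << 8) | (funct3 << 5) | opcode_hi
    return (inst9 << 2) | (int(br_eq) << 1) | int(br_lt)
```

With this layout, `add x1, x2, x3` (0x003100B3) and `sub x1, x2, x3` (0x403100B3) map to different addresses only because inst[30] differs, which is why that single bit is part of the nine.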
22. ROM Controller Implementation
[Figure: an address decoder takes the 11 input bits (Inst[], BrEQ, BrLT) and selects one stored control word (control word for add, for sub, for or, ..., for jal), which becomes the controller output (PCSel, ImmSel, …)]
23. Administrivia
- Homework 2: due tomorrow, 11:59 pm
- Project 1: Part 1 due Monday Oct. 9, Part 2 due Monday Oct. 16
- Midterm 1 regrades due next Tuesday
  - Talk to a TA if you don’t understand a midterm question or are unsure of a regrade
24. Break! (10/5/17)
26. Instruction Timing

Stage    IF      ID        EX      MEM     WB      Total
Block    I-MEM   Reg Read  ALU     D-MEM   Reg W
Latency  200 ps  100 ps    200 ps  200 ps  100 ps  800 ps
27. Instruction Timing
- Maximum clock frequency: f_max = 1/800 ps = 1.25 GHz
- Most blocks idle most of the time
  - E.g. f_max,ALU = 1/200 ps = 5 GHz!
- How can we keep ALU busy all the time?
  - 5 billion adds/sec, rather than just 1.25 billion?
- Idea: Factories use three employee shifts, so equipment is always busy!

Instr  IF = 200ps  ID = 100ps  ALU = 200ps  MEM = 200ps  WB = 100ps  Total
add    X           X           X            -            X           600ps
beq    X           X           X            -            -           500ps
jal    X           X           X            -            -           500ps
lw     X           X           X            X            X           800ps
sw     X           X           X            X            -           700ps
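The arithmetic behind these numbers can be checked with a small script. It is only a sketch: the stage latencies are the ones given on the slide, the stage usage is shown for a few of the instructions, and the names `STAGE_PS`, `USES`, `latency_ps`, and `f_max_ghz` are made up for the example.

```python
# Stage latencies in picoseconds, as given on the slide.
STAGE_PS = {"IF": 200, "ID": 100, "ALU": 200, "MEM": 200, "WB": 100}

# Stages exercised by a few instructions (illustrative subset).
USES = {
    "add": ["IF", "ID", "ALU", "WB"],
    "beq": ["IF", "ID", "ALU"],
    "lw":  ["IF", "ID", "ALU", "MEM", "WB"],
    "sw":  ["IF", "ID", "ALU", "MEM"],
}

def latency_ps(instr):
    """Sum the latencies of the stages an instruction passes through."""
    return sum(STAGE_PS[stage] for stage in USES[instr])

def f_max_ghz(period_ps):
    """Convert a clock period in ps to a frequency in GHz (1 GHz = 1000 ps)."""
    return 1000.0 / period_ps

# In a single-cycle design the clock must fit the slowest instruction.
critical_ps = max(latency_ps(i) for i in USES)
print(critical_ps, f_max_ghz(critical_ps))   # 800 1.25
print(f_max_ghz(STAGE_PS["ALU"]))            # 5.0
```

The slowest instruction (lw) sets the 800 ps period for every instruction, so add and beq leave part of each cycle unused. That gap motivates the factory-shift analogy.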
29. Performance Measures
- “Our” RISC-V executes instructions at 1.25 GHz
  - 1 instruction every 800 ps
- Can we improve its performance?
  - What do we mean by this statement?
- Not so obvious:
  - Quicker response time, so one job finishes faster?
  - More jobs per unit time (e.g. web server returning pages)?
  - Longer battery life?