Reorder the following code snippet to minimise execution time for each of the following configurations:
a. We use a software technique and have 2 delay slots.
b. We use interlocks and predict not taken.
c. We use forwarding and predict not taken.
add r1, r2, r3
sub r4, r1, r1
mul r8, r9, r10
cmp r8, r9
beq .foo
The snippet is in assembly language. The reordered code is:
add r1, r2, r3
mul r8, r9, r10
sub r4, r1, r1
cmp r8, r9
beq .foo
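We reorder the instructions so that dependent instructions are not adjacent and independent work can overlap in the pipeline. In the snippet given in the question, sub has to wait for add, because sub reads r1, the result of the addition.
Likewise, cmp directly follows mul and reads r8, the result of the multiplication. Moving mul up between add and sub separates both pairs: sub now comes two instructions after add, and cmp two instructions after mul, so neither immediately follows the instruction that produces its operand. beq still directly follows cmp, since the branch depends on the comparison.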
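The dependency argument above can be checked mechanically. The following is a minimal sketch, not a cycle-accurate pipeline model: it only measures how many instructions separate each producer from its consumer. The name `flags` is a hypothetical label for the condition codes that cmp is assumed to set and beq to read, and the helper names are made up for illustration.

```python
# Each instruction: (text, destination or None, list of source registers).
# "flags" is an assumed name for the condition codes set by cmp and read by beq.
ORIGINAL = [
    ("add r1, r2, r3", "r1", ["r2", "r3"]),
    ("sub r4, r1, r1", "r4", ["r1", "r1"]),
    ("mul r8, r9, r10", "r8", ["r9", "r10"]),
    ("cmp r8, r9", "flags", ["r8", "r9"]),
    ("beq .foo", None, ["flags"]),
]
# The answer's order: add, mul, sub, cmp, beq.
REORDERED = [ORIGINAL[0], ORIGINAL[2], ORIGINAL[1], ORIGINAL[3], ORIGINAL[4]]


def raw_distances(program):
    """Return (producer, consumer, distance) for every read-after-write pair."""
    last_writer = {}  # register -> index of the most recent instruction writing it
    pairs = []
    for i, (text, dest, srcs) in enumerate(program):
        for reg in dict.fromkeys(srcs):  # drop duplicate sources, keep order
            if reg in last_writer:
                j = last_writer[reg]
                pairs.append((program[j][0], text, i - j))
        if dest is not None:
            last_writer[dest] = i
    return pairs


def adjacent_dependencies(program):
    """Count consumers that directly follow the instruction producing their operand."""
    return sum(1 for _, _, distance in raw_distances(program) if distance == 1)


if __name__ == "__main__":
    for name, program in (("original", ORIGINAL), ("reordered", REORDERED)):
        print(name, adjacent_dependencies(program))
        for producer, consumer, distance in raw_distances(program):
            print(f"  {producer} -> {consumer}: distance {distance}")
```

Under these assumptions the original order has three back-to-back dependencies (add to sub, mul to cmp, cmp to beq), while the reordered version has only one (cmp to beq). How many stall cycles each remaining dependency costs depends on the configuration (delay slots, interlocks, or forwarding), which this sketch does not model.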
Consider the following assembly language code:
I0: add $R4,$R1,$R0 // ADD R4 = R1 + R0;
I1: lw $R1,100($R3) // LDW R1 = MEM[R3 + 100];
I2: lw $R9,4($R1) // LDW R9 = MEM[R1 + 4];
I3: add $R3,$R4,$R9 // ADD R3 = R4 + R9;
I4: lw $R1,0($R3) // LDW R1 = MEM[R3 + 0];
I5: sub $R3,$R1,$R4 // SUB R3 = R1 - R4;
I6: and $R9,$R9,$R7 // AND R9 = R9 & R7;
I7: sw $R2,100($R4) // STW MEM[R4 + 100] = R2;
I8: and $R4,$R2,$R1 // AND R4 = R2 & R1;
I9: add...
5.3 Rewrite the following program fragment that is written using the GPR instruction set for execution on a CISC processor that provides the same instruction set as the GPR processor but allows the register addressing mode to be used on the input operands or destination of any instruction. (Yes, the code fragment will execute correctly as written on such a processor. Your goal should be to reduce the number of instructions as much as possible.) Assume that the program...
Ch04.2. [3 points] Consider the following assembly language code: I0: ADD R4 = R1 + R0; I1: SUB R9 = R3 - R4; I2: ADD R4 = R5 + R6; I3: LDW R2 = MEM[R3 + 100]; I4: LDW R2 = MEM[R2 + 0]; I5: STW MEM[R4 + 100] = R3; I6: AND R2 = R2 & R1; I7: BEQ R9 == R1, Target; I8: AND R9 = R9 & R1. Consider a pipeline with forwarding, hazard detection, and 1 delay slot for branches. The pipeline is the typical 5-stage IF, ID, EX, MEM, WB MIPS...
Consider a VEX-executing VLIW machine with the following characteristics. The machine supports 4 slots (4-wide machine) with the following resources: 2 memory units, each with a load latency of 3 cycles; 2 integer-add/sub functional units with a latency of 2 cycles; 1 integer-multiply functional unit with a latency of 4 cycles. Each functional unit in the machine is pipelined and can be issued a new operation at each cycle. However, the results of an operation are only available after the...
Consider a standard 5-stage MIPS pipeline of the type discussed during the class sessions: IF-ID-EX-M-WB. Assume that forwarding is not implemented and only the hazard detection and stall logic is implemented, so that all data dependencies are handled by having the pipeline stall until the register fetch will result in the correct data being fetched. Furthermore, assume that the memory is written/updated in the first half of the clock cycle (i.e. on the rising edge of the clock) and...