Question

A processor pipeline with 1 ALU (4-way SIMD enabled), 1 load and 1 store execution units, and an L1-cache-to-CPU bandwidth of 48 bytes per cycle will likely run the optimized double-precision matrix multiplication benchmark code faster than one configured with 1 ALU (4-way SIMD enabled), 1 load and 1 store execution units, and 96 bytes per cycle of L1-cache-to-CPU bandwidth. True or False?

Answer #1

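False. Both configurations have the same execution units (1 ALU with 4-way SIMD, 1 load, 1 store), so the only difference is the L1-to-CPU bandwidth, and less bandwidth cannot make the benchmark run faster. Assuming 4-way SIMD means four 8-byte doubles per vector, each vector load or store moves 32 bytes, so a load and a store issued in the same cycle need 64 bytes. The 96 bytes/cycle design can supply that; the 48 bytes/cycle design cannot, so at best it ties and otherwise it runs slower.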
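The bandwidth comparison in the question can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes that "4-way SIMD" means four 8-byte doubles per vector and that the load and store units can each issue one full vector access per cycle. The function names are made up for illustration and do not come from the question.

```python
BYTES_PER_DOUBLE = 8
SIMD_LANES = 4  # assumption: 4-way SIMD = four doubles per vector


def peak_l1_demand(load_units=1, store_units=1):
    """Bytes per cycle needed if every load/store unit moves one vector."""
    vector_bytes = BYTES_PER_DOUBLE * SIMD_LANES  # 32 bytes
    return vector_bytes * (load_units + store_units)


def sustained_fraction(l1_bandwidth, demand):
    """Fraction of the peak load/store demand the L1 link can feed."""
    return min(1.0, l1_bandwidth / demand)


demand = peak_l1_demand()
for bandwidth in (48, 96):
    frac = sustained_fraction(bandwidth, demand)
    print(f"{bandwidth} B/cycle: demand {demand} B/cycle, sustained {frac:.2f}")
```

Under these assumptions the peak demand is 64 bytes per cycle, which the 48 bytes/cycle link can only feed at three quarters of the rate, while the 96 bytes/cycle link covers it fully. A real, register-blocked matrix multiply may not issue a load and a store every cycle, so the gap can be smaller in practice, but the narrower link never comes out ahead.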

Similar Homework Help Questions
  • A particular (fictional) CPU has the following internal units and timings: 1. IFD: Instruction fetch + decode : 160 ps...

    A particular (fictional) CPU has the following internal units and timings:
    1. IFD: instruction fetch + decode: 160 ps
    2. RR: register read: 80 ps
    3. ALU: 240 ps
    4. MA: memory access: 160 ps (assuming cache)
    5. RW: register write: 80 ps
    There are 5 basic instruction types:
    1. LOAD: IFD+RR+ALU+MA+RW: 720 ps
    2. STORE: IFD+RR+ALU+MA: 640 ps
    3. ARITHMETIC: IFD+RR+ALU+RW: 560 ps
    4. BRANCH: IFD+RR+ALU: 480 ps
    5. MEMOP: IFD+RR+MA+ALU+MA :...

  • 4. The following diagram show the Xeon processor pipeline. What do you think is the function...

    4. The following diagram shows the Xeon processor pipeline. What do you think is the function of each of the following blocks? a. 2x ALU Simple Instr b. Slow ALU Complex Instr c. FP MMX SSE d. FP Move (... 90nm Technology, Intel Technology Journal, Vol 8, Issue 1, Feb. 2004.) [Diagram labels: Front-End BTB 4K Entries, Instruction TLB/Prefetcher, Instruction Decoder, Execution Trace Cache (12K μops), System Bus, Microcode ROM, Trace Cache BTB 2K Entries, Bus Interface Unit, μop Queue, Allocator /...]

  • Table 1: Load 26% Compare 14% Shift left and shift right 4% Store 9% Load i...

    Table 1:
    Load: 26%
    Compare: 14%
    Shift left and shift right: 4%
    Store: 9%
    Load immediate: 4%
    AND: 3%
    Add: 14%
    Conditional branch: 17%
    OR: 5%
    Sub: 0%
    Jump: 1%
    Other register-register instructions (XOR, NOT, etc.): 1%
    Multiply: 0%
    Call: 1%
    Divide: 0%
    Return: 1%
    Using the data in Table 1, which of the following two enhancements will result in faster execution of the five benchmark programs that are described by the instruction frequency data? Assume that the computer used...
