



# The Role of Accelerated Computing in the Multi-Core Era

Chuck Moore, Senior Fellow, Advanced Micro Devices

# **Key Points in this Talk**



1. The semiconductor industry is dependent upon ongoing customer value.

   A virtuous cycle: customer value → invest → build → customer value

2. Programming for Multi-Core is a difficult challenge, but it is really just the leading edge of the bigger challenges yet to come.

## **Our industry is obsessed with Performance**





It's Time to Reorient Around Customer Value

## **Outline**



- Important Background
  - A Few High-level Trends
  - Some Thoughts on SMP and Multi-core Computing
- The Accelerated Computing Imperative
  - Dense Computing: GPUs and GP-GPUs
  - The broader potential
- A Framework for Accelerated Computing Enablement
  - The Role of Architecture
  - The Emerging Layers of Computation
- Summary

# **A Few High-level Trends**









The Complexity Wall







Single-thread Perf (!)



So, how can we add customer value?

# **Customer Value beyond just Performance**





SMP and Multi-Core to the long-term rescue?

*Figure: single-core and dual-core AMD Opteron processors.*





















# **Optimized SMP and Multi-core Platforms**



- In the near-term, there is definitely potential here
  - Commodity multi-core processors break the "chicken & egg" barrier
  - An impressive amount of interesting research is firing up:
    - TM (transactional memory), coherency filters, hierarchical scheduling, MREs (managed runtime environments), VMs, etc.
  - Lots of good activity on the Tools front → More to come
- Some workloads will do well with this, but many will not:
  - As it turns out, software isn't really that soft
    - The underlying structural assumption is often serial processing
    - Transitioning the concurrency model is a very big deal
  - Amdahl's Law seriously inhibits unstructured parallelism
- In reality, SMP/Multi-core challenges are just an early indicator of the shifts yet to come
  - Power constraints will force these to be "performance heterogeneous"
  - Advances in synchronization and NUMA will give rise to new options...
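Amdahl's Law makes the point about unstructured parallelism concrete. If a fraction $p$ of a program's work can run in parallel and the rest is serial, the speedup on $N$ cores is bounded as shown below; the values $p = 0.9$ and $N = 8$ are illustrative assumptions, not figures from the talk.

$$
S(N) = \frac{1}{(1-p) + \dfrac{p}{N}}, \qquad
S(8)\Big|_{p=0.9} = \frac{1}{0.1 + 0.1125} \approx 4.71, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{1-p} = 10
$$

Even unlimited cores cannot push this program past 10x, which is why transitioning the serial structure of the software matters as much as adding cores.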


# **The Accelerated Processing Imperative**







# **Compute Density:**







# **Ruby Statistics**



|                          | DoubleCross (2004) | The Assassin (2005) | Whiteout (2006) |
|--------------------------|--------------------|---------------------|-----------------|
| Ruby Polygons            | 80,000             | 80,000              | 200,000         |
| Avg. Triangles/Frame     | 227,212            | 546,087             | 1,069,503       |
| Max Triangles/Frame      | 556,305            | 1,018,312           | 2,150,521       |
| No. of Pixel Shaders     | 100                | 316                 | 210             |
| Avg. Pixel Shader Length | 20                 | 74                  | 142             |
| Facial Animation Targets | 4                  | 4                   | > 128           |
| ALU:Tex Ratio            | 4:1                | 7:1                 | 13:1            |


# **Realities of GP-GPU Power Efficiency**





1 TeraFLOPS in a CrossFire configuration

500 GigaFLOPS per GPU

Available <u>today</u> - not just theoretical

More than 2 GigaFLOPS per watt

\*Source: AMD

The generalized GPU provides an unprecedented opportunity for performance per watt
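Taking the slide's figures at face value, the arithmetic behind them works out as follows; the per-GPU power bound is derived here from the quoted numbers, not quoted from the source.

$$
2 \times 500\ \text{GFLOPS} = 1\ \text{TFLOPS}, \qquad
\frac{500\ \text{GFLOPS}}{P_{\text{GPU}}} > 2\ \frac{\text{GFLOPS}}{\text{W}} \;\Longrightarrow\; P_{\text{GPU}} < 250\ \text{W}
$$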

### **HPC: Remember Attack of the Killer Micros?**





- 1/10<sup>th</sup> the performance, but at 1/100<sup>th</sup> the cost
- Absolute performance "good enough"
- Productivity greater on a workstation than on a supercomputer

Chart Source: Gordon Bell and Jim Gray, ISCA 2000
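The "killer micros" ratio reduces to a simple price-performance comparison, normalizing the supercomputer's performance and cost to 1:

$$
\frac{\text{performance}/\text{cost}\ (\text{micro})}{\text{performance}/\text{cost}\ (\text{super})} = \frac{1/10}{1/100} = 10
$$

That tenfold advantage in performance per dollar, combined with "good enough" absolute performance, is the economic argument the slide is recalling.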

## **History Repeating Itself?**





- Traditional "computing" is an order of magnitude behind
- Familiar vector-style programming model
- \$1K to \$5K PCs get amazing computational power via the GPU

# You just can't ignore this...





# **GPU Performance = End of the CPU? NO!**



Amdahl's Law is Alive and Well.
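The same law applies when a GPU accelerates only part of a workload. The sketch below is a minimal model, assuming the serial fraction stays on the host CPU at its original speed and ignoring data-transfer and synchronization costs; the function name `offload_speedup` and the parameter values are illustrative, not from the talk.

```python
def offload_speedup(parallel_fraction: float, accel_speedup: float) -> float:
    """Overall speedup when only the parallel fraction runs on an accelerator.

    Simplifying assumptions: the serial fraction stays on the host CPU at its
    original speed, and data-transfer and synchronization costs are ignored.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if accel_speedup <= 0.0:
        raise ValueError("accel_speedup must be positive")
    serial_time = 1.0 - parallel_fraction           # still runs on the host
    accel_time = parallel_fraction / accel_speedup  # offloaded portion
    return 1.0 / (serial_time + accel_time)


if __name__ == "__main__":
    # Illustrative only: 90% of the work maps well onto the GPU.
    for a in (10, 100, 1_000, 1_000_000):
        print(f"accelerator {a:>9,}x -> overall {offload_speedup(0.9, a):.2f}x")
```

However large the accelerator speedup gets, the result approaches but never exceeds 1/(1 - 0.9) = 10x, because the host-side serial work is untouched. This is why the CPU remains essential.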




# **Accelerated Computing has very broad potential --** *A Continuum of Solutions*



# **Torrenza:** Enabling Partners to Build on the Concept of Accelerated Computing






### The Role of Architecture



- Architecture:
  - The contract between layers of Hardware and Software
- Provides formalism and standardization → Defines Compatibility
  - Compatibility has been a key enabler in our industry, and this will continue
  - History shows that viable products don't bet on wildly incompatible solutions
- Symbiotic Relationship between Hardware and Software
  - SW is typically the enabler for new HW features or new types of HW
    - Actual results dominated by the weakest link in this relationship
    - SW value chain often values features more than HW optimization
  - Software complexity has been driven to extreme levels, and this can't continue
- Architecture gives rise to The Emerging Layers of Computation
  - Can we use this to simplify the programming models?

## **The Emerging Layers of Computation**



Start with an Analogy to the Communications Industry





# **The Emerging Layers of Computation**



*Figure: The Emerging Layers of Computation.* Layers, from top to bottom: Network Runtime Layer, Network Layer, Native Runtime Layer, Platform Layer, Physical Layer. Elements shown across these layers include Data Center Applications, the Data Center Runtime Environment, Network-aware Applications (web services), Network Services, Platform Applications, APIs, Libs, MREs, the Traditional OS, the Hypervisor (virtual platform), x86 Compatible Hardware, Devices, and RAW Hardware.

# **Lots of Interesting Implications**





## **Summary:**



# The Case for Accelerated Computing

Traditional "host" → offload to dense compute accelerator

- Use APIs to enable this without heroic programming efforts
- Proven techniques already in use with DirectX & GPUs today
- ISA compatibility yields to API and Platform Compatibility
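One way to picture "ISA compatibility yields to API and Platform Compatibility" is a dispatcher in which applications call named kernels through a stable API while the platform chooses the executing device. This is a hypothetical sketch: `KernelDispatcher`, `saxpy_host`, and `saxpy_accel_stub` are invented for illustration, and the "accelerator" path is a pure-Python stand-in rather than a call into any real GPU library.

```python
from typing import Callable, Dict, List, Optional

# A kernel takes (a, x, y) and returns a new vector; signature chosen for this sketch.
Kernel = Callable[[float, List[float], List[float]], List[float]]


class KernelDispatcher:
    """Hypothetical API layer: applications call kernels by name, and the
    platform decides whether an accelerator or the host CPU runs them."""

    def __init__(self) -> None:
        self._host: Dict[str, Kernel] = {}
        self._accel: Dict[str, Kernel] = {}

    def register(self, name: str, host_impl: Kernel,
                 accel_impl: Optional[Kernel] = None) -> None:
        # Every kernel needs a host implementation; the accelerator one is optional.
        self._host[name] = host_impl
        if accel_impl is not None:
            self._accel[name] = accel_impl

    def backend(self, name: str) -> str:
        return "accelerator" if name in self._accel else "host"

    def run(self, name: str, a: float, x: List[float], y: List[float]) -> List[float]:
        # Prefer the accelerator when present; both paths must give identical results.
        impl = self._accel.get(name) or self._host[name]
        return impl(a, x, y)


def saxpy_host(a: float, x: List[float], y: List[float]) -> List[float]:
    """Reference host implementation of a*x + y."""
    out = []
    for xi, yi in zip(x, y):
        out.append(a * xi + yi)
    return out


def saxpy_accel_stub(a: float, x: List[float], y: List[float]) -> List[float]:
    """Stand-in for a call into an accelerator library (hypothetical)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    dispatcher = KernelDispatcher()
    dispatcher.register("saxpy", saxpy_host)
    print(dispatcher.backend("saxpy"), dispatcher.run("saxpy", 2.0, [1.0, 2.0], [10.0, 20.0]))
    # A platform with an accelerator registers its implementation;
    # the application's call site does not change.
    dispatcher.register("saxpy", saxpy_host, saxpy_accel_stub)
    print(dispatcher.backend("saxpy"), dispatcher.run("saxpy", 2.0, [1.0, 2.0], [10.0, 20.0]))
```

The design choice worth noting is that the host implementation is mandatory and the accelerator one optional, so the application keeps working on platforms without an accelerator.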

Many application classes have reasonably common "kernels"

- Video encoding; Encryption; Data Movement; Java/CLR...

Broad range of possible accelerator designs & attach points

- Coherent domain or non-coherent domain
- Dedicated special-purpose HW or programmable processor

### Lots of Challenges

- Managing context state → Virtualizing the context state
- Communications/Messaging: "It's the synchronization, stupid"
- Memory BW and Data Movement (keep up with computation)
- New and appropriate APIs
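The "Memory BW and Data Movement" challenge can be framed as a break-even test: offloading only pays when dispatch overhead, transfer time, and accelerator compute time together beat the host. The cost model below is a simplified, hypothetical sketch; the rates (a 10 GFLOPS host, a 500 GFLOPS accelerator echoing the per-GPU figure quoted earlier, a 4 GB/s link, 20 microseconds of dispatch overhead) are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class OffloadCostModel:
    """Simplified, hypothetical cost model for one offloaded kernel call.

    All rates are illustrative assumptions, not measured figures:
      host_flops   -- sustained host CPU throughput (FLOP/s)
      accel_flops  -- sustained accelerator throughput (FLOP/s)
      link_bytes_s -- host<->accelerator transfer bandwidth (bytes/s)
      launch_s     -- fixed per-call dispatch/synchronization overhead (s)
    """
    host_flops: float
    accel_flops: float
    link_bytes_s: float
    launch_s: float

    def host_time(self, flops: float) -> float:
        return flops / self.host_flops

    def offload_time(self, flops: float, bytes_moved: float) -> float:
        # Inputs and outputs must cross the link, and each call pays a fixed
        # synchronization cost, on top of the accelerator's compute time.
        return self.launch_s + bytes_moved / self.link_bytes_s + flops / self.accel_flops

    def should_offload(self, flops: float, bytes_moved: float) -> bool:
        return self.offload_time(flops, bytes_moved) < self.host_time(flops)


if __name__ == "__main__":
    model = OffloadCostModel(host_flops=10e9, accel_flops=500e9,
                             link_bytes_s=4e9, launch_s=20e-6)
    # Vector add of 1M single-precision floats: 1 FLOP per element,
    # 12 bytes moved per element (two inputs in, one output back).
    n = 1_000_000
    print("vector add:", model.should_offload(flops=n, bytes_moved=12 * n))
    # 2048x2048 single-precision matrix multiply: about 2*m^3 FLOPs,
    # three m x m matrices of 4-byte floats moved.
    m = 2048
    print("matmul:", model.should_offload(flops=2 * m**3, bytes_moved=3 * m * m * 4))
```

With these assumed rates the vector add stays on the host, because moving its data costs far more than computing it, while the matrix multiply's high arithmetic intensity makes offloading worthwhile.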



# **Thank You!**

## Questions?

© 2007. Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Opteron, and combinations thereof, are trademarks of Advanced Micro Devices, Inc.

Other names are for informational purposes only and may be trademarks of their respective owners.