Performance Tuning Techniques for Multi-core Architectures
Description:
This two-day course demonstrates several techniques for improving the performance of applications on multi-core systems, such as OSC's Glenn Opteron cluster. These techniques involve taking advantage of features common to most modern microprocessors, including multi-level caches and multiple pipelined functional units, as well as parallelism within and across nodes.
Topics covered in the course will include:
Day One: Single-Processor Performance
- Single-processor performance measurement and analysis tools
  - Timing
  - Compiler reports
  - Profiling
  - Hardware performance counters
- Processor and memory architecture
  - Processor architecture features
  - Hierarchical memory and caching
- Single-processor performance tuning techniques
  - Inlining
  - Loop optimization
  - Memory optimization
  - Floating point behavior
  - Optimized math libraries
Day Two: Multi-core and Parallel Performance
- Parallel performance measurement and analysis tools
- Threaded performance
  - Threaded programming interfaces
  - Common threaded performance bottlenecks
- Message passing performance
  - Message passing programming interfaces
  - Interconnect characteristics
  - Common message passing performance bottlenecks
Prerequisites:
Familiarity with UNIX and either Fortran 90 or C/C++ is preferred.
Knowledge of a parallel programming method (e.g., MPI or OpenMP) is helpful but not required.
Target Audience:
Those interested in improving the performance of their applications on multi-core systems, including PCs and workstations, as well as supercomputers.
Method of Delivery:
Lecture with hands-on exercises and demonstrations
Handouts:
May 2008 (PDF), by Troy Baer
Example Programs (zip), Example Programs (tar.gz)