Eduard Ayguade, Mats Brorsson, Sven Karlsson,
Xavier Martorell, Marc Gonzalez
OpenMP has become an important tool for bringing parallel computing to a larger
community. For the supercomputing community, however, first attempts to use
OpenMP are sometimes discouraging in terms of performance, since users sometimes
believe that all they need to do is insert some directives at suitable places,
e.g. at for/do loops with independent iterations.
To understand the performance of an OpenMP program, it is important
to understand how an OpenMP implementation can be built and how the
synchronization and communication of a shared memory program are actually
carried out in the real hardware.
In this tutorial, we will cover the design and implementation of an
OpenMP compilation system consisting of a source-to-source OpenMP translator
and a run-time library. The source code for this system is freely available and
will be distributed to tutorial participants. We will also discuss some shared
memory architecture details that affect performance, as well as some issues
regarding performance analysis of OpenMP applications.
The software described in this tutorial has been developed partly
within the EU project Intone under contract number IST-1999-20252.
Part I: OpenMP implementation: a case study
· Run-time library implementation
In this section we will describe a run-time library specifically targeted at
OpenMP program execution. This part of the tutorial will go through some of the
implementation details of this library.
· OpenMP compiler implementation
In this section we will go through the OpenMP constructs one by one and show
how the OpenMP translator maps them to calls to the run-time library.
· OdinMP: a hacker's guide
The OpenMP compilation system has been developed partly to provide a free
OpenMP implementation for application developers at universities and
elsewhere. It has also been developed to make it possible to experiment with
OpenMP extensions. This part of the tutorial will briefly introduce OdinMP,
an OpenMP translator for C (with some C++ extensions), and show how it has been
designed so that you can start modifying it yourself. The Fortran
NanosCompiler, also available in the distribution, will be briefly discussed.
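The mapping from OpenMP constructs to run-time library calls described in
Part I can be pictured with a small C sketch. It is not the code OdinMP
generates: the names rt_run_team, region_args, outlined_region and double_all
are hypothetical, and the stand-in run-time below simply calls the outlined
function once per team member in sequence, where a real run-time library would
hand each call to a separate thread. The sketch only illustrates the general
source-to-source idea of moving the body of a parallel loop into a function and
giving each team member a static block of iterations.

```c
#include <stddef.h>

/* Shared variables of the region, packed so one pointer can be passed. */
struct region_args {
    size_t n;
    const double *x;
    double *y;
};

/* Signature of an outlined region: team member `tid` out of `nthreads`. */
typedef void (*region_fn)(int tid, int nthreads, void *args);

/* Hypothetical stand-in for the run-time entry point. A real library
 * would run each call on its own thread and join them at the end;
 * here the calls are made one after another to keep the sketch free
 * of threading dependencies. Assumes nthreads >= 1. */
static void rt_run_team(region_fn fn, int nthreads, void *args)
{
    for (int tid = 0; tid < nthreads; tid++) {
        fn(tid, nthreads, args);
    }
}

/* Body of the original
 *     #pragma omp parallel for
 *     for (i = 0; i < n; i++) y[i] = 2.0 * x[i];
 * outlined into a function. Each member computes its own contiguous
 * block of iterations, as in a static schedule. */
static void outlined_region(int tid, int nthreads, void *args)
{
    struct region_args *a = args;
    size_t chunk = (a->n + (size_t)nthreads - 1) / (size_t)nthreads;
    size_t lo = (size_t)tid * chunk;
    size_t hi = lo + chunk;
    if (lo > a->n) lo = a->n;   /* members beyond the end get no work */
    if (hi > a->n) hi = a->n;   /* last block may be shorter */
    for (size_t i = lo; i < hi; i++) {
        a->y[i] = 2.0 * a->x[i];
    }
}

/* What the translated user function might look like: the directive is
 * gone and a call into the run-time takes its place. */
void double_all(size_t n, const double *x, double *y, int nthreads)
{
    struct region_args args = { n, x, y };
    rt_run_team(outlined_region, nthreads, &args);
}
```

Packing the shared variables into a structure is one common way to pass them
to an outlined function through a single pointer; an actual translator may
choose a different calling convention.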
Part II: OpenMP performance issues
· Performance analysis tools and techniques
In this section we will comment on the mechanisms available in the distribution
for carrying out performance analysis of OpenMP applications.
· Shared memory architectures
In this section we will go through some of the issues that influence
performance on shared memory architectures, both hardware coherent (SMP and
ccNUMA) and software coherent (SDSM).
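One well-known issue of this kind, chosen here as an illustration rather than
taken from the tutorial outline, is false sharing: threads that update
different variables located in the same cache line force that line to move
between caches. The C sketch below pads per-thread counters so that each one
occupies its own line. The 64-byte line size, the MAX_THREADS bound, and the
names padded_counter and count_positive are assumptions made for this example.
Compiled without OpenMP support, the pragma is ignored and the loop runs
serially with the same result.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define MAX_THREADS 8   /* hypothetical upper bound for this sketch */

/* One counter per thread, padded to a full cache line. Without the
 * padding, adjacent counters would sit in the same line and every
 * update by one thread would invalidate that line in other caches. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Count the positive elements of x using per-thread partial counts. */
long count_positive(size_t n, const double *x)
{
    struct padded_counter partial[MAX_THREADS] = {{0}};
    long total = 0;
    long i;

    /* num_threads caps the team size so tid always indexes partial[]. */
    #pragma omp parallel for num_threads(MAX_THREADS)
    for (i = 0; i < (long)n; i++) {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        if (x[i] > 0.0) {
            partial[tid].value++;
        }
    }

    /* Combine the per-thread counts after the parallel loop. */
    for (int t = 0; t < MAX_THREADS; t++) {
        total += partial[t].value;
    }
    return total;
}
```

In practice a reduction(+:total) clause would express the same computation
more simply; the explicit array is used here only to make the memory layout
visible.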