Compiling for Intel with Intel Composer XE, MKL, and Intel MPI

General Notes

  • Static linking is not possible because Red Hat does not distribute a static libm (standard math library)

Motivation

Intel Compilers + MKL can produce executables that run significantly faster on Intel CPUs than those produced by GCC. For example, see the metrics reported here, which compare the linear algebra performance of MKL against ATLAS (Automatically Tuned Linear Algebra Software).

Versions Available

The cluster management vendor Bright Computing provides the Intel Composer suite as multiple modules:

    [juser@proteusi01 ~]$ module avail intel
    
    ----------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------
    intel/compiler/64/14.0/2013_sp1.3.174      intel-cluster-checker/2.1.2                intel-mpi/64/4.1.1/036
    intel/ipp/64/8.1/2013_sp1.3.174            intel-cluster-runtime/ia32/3.6             intel-mpi/mic/4.1.1/036
    intel/mkl/64/11.1/2013_sp1.3.174           intel-cluster-runtime/intel64/3.6          intel-tbb-oss/ia32/42_20140601oss
    intel/sourcechecker/64/14.0/2013_sp1.3.174 intel-cluster-runtime/mic/3.6              intel-tbb-oss/intel64/42_20140601oss
    intel/tbb/32/4.2/2013_sp1.3.174            intel-itac/8.1.3/037
    intel/tbb/64/4.2/2013_sp1.3.174            intel-mpi/32/4.1.1/036
    
    ---------------------------------------------------- /mnt/HA/opt/modulefiles -----------------------------------------------------
    intel/composerxe/2013.3.174 intel/composerxe/2015.1.133 intel/composerxe/2016.0.109 intel/composerxe/current

The modules under /cm/shared/modulefiles are provided by Bright. The modules under /mnt/HA/opt/modulefiles are locally-installed.

For convenience, use the locally-installed modules.

Intel Composer XE

In each of the versions described below, all associated packages (MKL, TBB, IPP) are loaded with a single module.

Version 2013

Intel Composer XE is a suite of tools including compilers, parallel debugger, optimized libraries, the Math Kernel Library, and tools for profiling and tuning applications.[1]

    [juser@proteusi01 ~]$ module load intel/composerxe/2013.3.174

With Composer XE 2013.3.174, MKL 11.1 is installed.

Version 2015

Version 2015 is also installed, with all components loaded by a single module:

    [juser@proteusi01 ~]$ module load intel/composerxe/2015.1.133

With Composer XE 2015.1.133, MKL 11.2 is installed.

Version 2016

Version 2016 is installed, with all components loaded by a single module:

    [juser@proteusi01 ~]$ module load intel/composerxe/2016.0.109

With Composer XE 2016.0.109, MKL 11.3 is installed.

Optimization Flags

Please see Hardware for details on what hardware-specific optimizations may be used.

  • 2015-04-15: -xHost -- CPU architecture of proteusi01 is identical to all Intel compute nodes
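
As a rough illustration of the flag above, the following sketch assembles a compile command that uses -xHost. The source file hello.c, the output name, and the -O2 level are placeholders, not anything specified on this page, and the command is only executed where icc is actually on the PATH.

```shell
# Sketch: build a C program with host-specific optimizations.
# hello.c, the output name, and -O2 are illustrative placeholders.
CC=icc
CFLAGS="-O2 -xHost"
compile_cmd="$CC $CFLAGS -o hello hello.c"

# Show the command that would be run.
echo "$compile_cmd"

# Run it only if the Intel compiler is available and the source file exists.
if command -v "$CC" >/dev/null 2>&1 && [ -f hello.c ]; then
    $compile_cmd
fi
```

Because -xHost targets the CPU of the machine doing the compiling, this relies on the login node and the Intel compute nodes having the same architecture, as noted above.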

Intel Math Kernel Library (MKL)

For best performance on Intel CPUs, do not use generic linear algebra libraries (BLAS, LAPACK). Instead, use the MKL.[2][3]

  • MKL 11.1 is installed with Composer XE 2013
  • MKL 11.2 is installed with Composer XE 2015
  • MKL 11.3 is installed with Composer XE 2016

The installations on Proteus also include interfaces for BLAS95, LAPACK95, FFTW2 (double), and FFTW3 (double).

Choice of Integer Size

The MKL offers a choice of integer size: standard 32-bit integers (denoted LP64) or long 64-bit integers (denoted ILP64).[4][5] The installations on Proteus default to 32-bit integers.
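
A minimal sketch of how this choice shows up on the command line, assuming the usual MKL naming in which the interface layer library is libmkl_intel_lp64 or libmkl_intel_ilp64, and in which C code built for ILP64 is compiled with -DMKL_ILP64. The helper function name is hypothetical, and the remaining MKL libraries (threading layer, core) still have to be added to the link line.

```shell
# Hypothetical helper: print the interface-layer flags for a given integer size.
# Assumes standard MKL library names; it does not emit a complete link line.
mkl_interface_flags() {
    case "$1" in
        lp64)
            # 32-bit integers (the default described above)
            echo "-lmkl_intel_lp64"
            ;;
        ilp64)
            # 64-bit integers: C code must also be compiled with -DMKL_ILP64
            echo "-DMKL_ILP64 -lmkl_intel_ilp64"
            ;;
        *)
            echo "unknown interface: $1" >&2
            return 1
            ;;
    esac
}

mkl_interface_flags lp64
mkl_interface_flags ilp64
```

Mixing the two conventions in one executable (for example, ILP64 headers with the LP64 library) leads to mismatched integer arguments, so the compile-time and link-time choices should agree.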

Interfaces for BLAS95, LAPACK95, FFTW2, and FFTW3

The interfaces for BLAS95, LAPACK95, FFTW2, and FFTW3 are also available. They are provided as static library files, compiled locally against the MKL, and are located in the directory $MKLROOT/lib/intel64.

The library files themselves are:

    libmkl_blas95_lp64.a
    libmkl_blas95_ilp64.a
    libmkl_lapack95_lp64.a
    libmkl_lapack95_ilp64.a
    libfftw3xf_intel.a
    libfftw3x_cdft_ilp64.a
    libfftw3x_cdft_lp64.a
    libfftw3xc_intel.a
    libfftw2xf_single_intel.a
    libfftw2xf_double_intel.a
    libfftw2x_cdft_DOUBLE_lp64.a
    libfftw2x_cdft_SINGLE_lp64.a
    libfftw2xc_single_intel.a
    libfftw2xc_double_intel.a

As these are not part of the base MKL libraries, the Link Line Advisor will not generate link flags for these libraries. You should manually include them in your link line, e.g.

    -L$MKLROOT/lib/intel64 -lmkl_blas95_lp64 -lfftw3xc_intel
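
The fragment above can be combined with the base MKL libraries roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the base set shown (LP64, sequential, dynamically linked) is only one possible configuration; the Link Line Advisor should be used to obtain the base flags for your actual case.

```shell
# Hypothetical helper: assemble link flags for the BLAS95/LAPACK95 wrappers
# plus an assumed base MKL configuration (LP64, sequential, dynamic).
mkl_link_flags() {
    mklroot="$1"
    # Static wrapper libraries go first so the linker can resolve their
    # references against the MKL libraries that follow.
    wrappers="-lmkl_blas95_lp64 -lmkl_lapack95_lp64"
    # Assumed base set; confirm with the Link Line Advisor.
    base="-lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm"
    echo "-L$mklroot/lib/intel64 $wrappers $base"
}

# MKLROOT is set by the Composer XE module; the fallback path is a placeholder.
mkl_link_flags "${MKLROOT:-/path/to/mkl}"
```

Fortran code that uses the BLAS95 or LAPACK95 modules also needs an include path for the corresponding module files; their location is not stated on this page.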

Compiling Numpy and Scipy with MKL

Intel has instructions on using Intel Compilers + MKL to compile Numpy and Scipy:

   https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl 

They also include comparative performance numbers (against ATLAS).

MKL Link Line Advisor

Linking against the MKL can be complicated: consult the MKL User's Guide for detailed documentation.[6] The MKL Link Line Advisor web-based tool will generate the proper compilation options to compile and link against the MKL:[7]

    http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

MPI implementation

Intel MPI

NOTE: As of 2015-01-01, we do not have a license for Intel MPI. Please use MVAPICH2 or Open MPI (the latter is recommended).

IN PROGRESS

Intel MPI is Intel's implementation of MPI-2.[8][9] It is available via the module:

    [juser@proteusi01 ~]$ module load intel-mpi/64

The compiler commands are:

  • mpiicc (note the two letters "i")
  • mpiifort

See Intel MPI for Linux Getting Started Guide.[10] Also see the article on Message Passing Interface.

Open MPI

For Composer XE 2013, use the module:

    proteus-openmpi/intel/64/1.8.1-mlnx-ofed

For Composer XE 2015, use the module:

    proteus-openmpi/intel/2015/1.8.1-mlnx-ofed

Hybrid MPI-OpenMP

Intel MPI supports hybrid MPI-OpenMP code:[11]

  1. Use the thread-safe MPI library by passing the compiler option: -mt_mpi
  2. Set the environment variable I_MPI_PIN_DOMAIN to "omp": export I_MPI_PIN_DOMAIN=omp. This sets the pinning domain size to be equal to the value given by the environment variable OMP_NUM_THREADS. If OMP_NUM_THREADS is not set, Intel MPI will assume all cores are to be used.

NOTE: Grid Engine may assign a job only some of the cores in a node, so an MPI pinning domain may cover fewer cores than the node has.
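
The two steps above might be sketched in a job environment as follows. The core count, thread count, and program name are hypothetical values chosen for illustration, and the launch line assumes Intel MPI's mpirun with its -ppn (processes per node) option. The program itself would have been built with -mt_mpi as in step 1.

```shell
# Sketch: environment for a hybrid MPI-OpenMP run under Intel MPI.
# CORES_PER_NODE and the thread count are illustrative, not Proteus settings.
CORES_PER_NODE=16
export OMP_NUM_THREADS=4

# Make each MPI rank's pinning domain as large as OMP_NUM_THREADS.
export I_MPI_PIN_DOMAIN=omp

# One rank per group of OMP_NUM_THREADS cores.
RANKS_PER_NODE=$(( CORES_PER_NODE / OMP_NUM_THREADS ))

# Show the launch command; ./hybrid is a placeholder executable name.
echo "mpirun -ppn $RANKS_PER_NODE ./hybrid"
```

Setting OMP_NUM_THREADS explicitly avoids the fallback in which Intel MPI assumes all cores are to be used, which matters when the scheduler has granted only part of a node.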

Recommended Combination for Proteus

This is the combination of Intel compilers/libraries and MPI implementation that we recommend:

    intel/composerxe/2015.1.133
    proteus-openmpi/intel/2015/1.8.1-mlnx-ofed

This combination supports hybrid OpenMP-MPI code, though performance improvement of hybrid code over MPI-only may be small.
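
A minimal sketch of building hybrid code with this combination. It assumes that the Open MPI module provides the usual mpicc wrapper and that -qopenmp is the OpenMP flag for this compiler release (older releases use -openmp); the source file name and optimization level are placeholders.

```shell
# Sketch: load the recommended modules, then build a hybrid MPI-OpenMP program.
MODULES="intel/composerxe/2015.1.133 proteus-openmpi/intel/2015/1.8.1-mlnx-ofed"

# Load modules only where the module command exists (i.e. on the cluster).
if type module >/dev/null 2>&1; then
    for m in $MODULES; do
        module load "$m"
    done
fi

# hybrid.c and -O2 are placeholders; -qopenmp is assumed for the 2015 compiler.
BUILD_CMD="mpicc -O2 -xHost -qopenmp -o hybrid hybrid.c"
echo "$BUILD_CMD"
```

With Open MPI, running mpicc --showme prints the underlying compiler command, which is a quick way to confirm that the wrapper is invoking the Intel compiler and not GCC.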

References

  1. Intel Composer XE information website
  2. Intel MKL information website
  3. Intel MKL 11.1 Reference Manual
  4. Intel Math Kernel Library for Linux OS User's Guide: Using the ILP64 Interface vs. LP64 Interface
  5. Intel Math Kernel Library for Linux OS User's Guide: Support for ILP64 Programming
  6. Intel Math Kernel Library for Linux OS User's Guide: Linking Your Application with the Intel Math Kernel Library
  7. Intel MPI 4.1 Reference Manual
  8. Intel MPI Reference Manual - Interoperability with OpenMP* API
  9. File:IntelMPIforLinuxGettingStarted.pdf
  10. Intel Developer Zone - Hybrid applications: Intel MPI Library and OpenMP