If you have an Intel processor, you can take advantage of the Intel MKL (Math Kernel Library), which contains performance optimizations for math routines. It works on AMD processors too, although ATLAS seems to be a better choice there.
Intel provides its own guide, but it skips some details.
Below is how I compiled and tested numpy-1.8, scipy-0.13, and Intel MKL-11.0 on Ubuntu 12.10.
1. See how fast your current numpy is:
import numpy as np
a = np.random.rand(1000,1000)
b = np.random.rand(1000,1000)
%timeit np.dot(a,b)
I get the following on an i5-3570k machine:
10 loops, best of 3: 68.7 ms per loop
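The %timeit line above is an IPython magic and will not run in a plain Python interpreter. As a rough equivalent, here is a minimal sketch using the standard timeit module. The helper name time_dot is made up for illustration, and the numbers it prints will differ from machine to machine.

```python
import timeit

import numpy as np


def time_dot(n=1000, repeat=3, number=10):
    """Return the best per-call time, in ms, of np.dot on two n x n matrices."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    # timeit.repeat returns one total (for `number` calls) per repetition.
    totals = timeit.repeat(lambda: np.dot(a, b), repeat=repeat, number=number)
    return min(totals) / number * 1000.0


if __name__ == "__main__":
    print("10 loops, best of 3: %.1f ms per loop" % time_dot())
```

Saving this as a script makes it easy to rerun the same measurement after the rebuild in the later steps.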
You can also check the default configuration with numpy.show_config(). On Ubuntu, the default is the following:
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
    NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
    NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
    NOT AVAILABLE
lapack_mkl_info:
    NOT AVAILABLE
blas_mkl_info:
    NOT AVAILABLE
atlas_blas_info:
    NOT AVAILABLE
mkl_info:
    NOT AVAILABLE
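Reading that listing by eye gets tedious, so here is a small sketch that scans show_config-style text for an MKL library. It assumes the "libraries = [...]" layout shown above (newer NumPy releases print a different format) and Python 3 for contextlib.redirect_stdout. The function names are hypothetical.

```python
import contextlib
import io


def uses_mkl(config_text):
    """Return True if any 'libraries = [...]' line names an MKL library."""
    for line in config_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "libraries" and "mkl" in value.lower():
            return True
    return False


def current_numpy_config():
    """Capture what numpy.show_config() prints for the running interpreter."""
    import numpy  # imported lazily so uses_mkl() works without NumPy

    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        numpy.show_config()
    return buf.getvalue()


# Short excerpts in the style of the two listings quoted in this post.
DEFAULT = "blas_opt_info:\n    libraries = ['blas']\nmkl_info:\n    NOT AVAILABLE\n"
WITH_MKL = "blas_opt_info:\n    libraries = ['mkl_rt', 'pthread']\n"

print(uses_mkl(DEFAULT))   # False
print(uses_mkl(WITH_MKL))  # True
```

To check a live install, pass the captured text along: uses_mkl(current_numpy_config()).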
2. Get the software:
Intel Compilers and MKL:
Intel only provides MKL for free for non-commercial purposes. Get it from their site along with the compilers and a bunch of other useful stuff. I got Intel Parallel Studio, which includes the necessary icc and ifort compilers.
SciPy and NumPy from Github:
git clone https://github.com/scipy/scipy.git
git clone https://github.com/numpy/numpy.git
3. Get rid of your old numpy and scipy (optional):
On Linux:
sudo apt-get remove python-numpy python-scipy
# or use "pip uninstall numpy scipy" if they were installed with pip
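To confirm that no stray copy is still importable after the removal, a quick check like the following may help. It is only a sketch (Python 3, standard library only), and module_location is a made-up helper name.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would load from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


for name in ("numpy", "scipy"):
    # None means the interpreter can no longer find the package.
    print(name, "->", module_location(name))
```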
4. Install MKL and Intel/Fortran Compilers
Here you just download the software and run the bash script to install it. On Linux the default installation directory is /opt/intel/.
You should also add the environment variables to your shell. As Intel suggests at the end of the installation, add something like the following to .bashrc, .profile, or .bash_profile:
source /opt/intel/bin/compilervars.sh intel64
This runs a number of scripts that modify variables such as $PATH and $LD_LIBRARY_PATH.
Now you should be able to call icc --help in a new terminal window.
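If icc is not found, it helps to look at what the new shell actually sees. The sketch below reports where icc and ifort resolve and which LD_LIBRARY_PATH entries mention "intel". The function name and the substring match are assumptions for illustration, not something the Intel installer provides.

```python
import os
import shutil


def intel_env_report(environ=None):
    """Summarize compiler locations and Intel library dirs in an environment."""
    environ = os.environ if environ is None else environ
    search_path = environ.get("PATH", "")
    lib_path = environ.get("LD_LIBRARY_PATH", "")
    return {
        # shutil.which returns None when the executable is not on the path.
        "icc": shutil.which("icc", path=search_path),
        "ifort": shutil.which("ifort", path=search_path),
        # LD_LIBRARY_PATH is a colon-separated list on Linux.
        "intel_lib_dirs": [d for d in lib_path.split(":") if "intel" in d],
    }


if __name__ == "__main__":
    for key, value in intel_env_report().items():
        print(key, "=", value)
```

A None for icc usually means compilervars.sh was not sourced in the shell that launched Python.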
5. Prepare NumPy for compilation
Go into the numpy directory you created and add the following to the site.cfg file:
[mkl]
library_dirs = /opt/intel/composer_xe_2013/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
mkl_libs = mkl_rt
lapack_libs =
If you are building NumPy for 32 bit, use the following instead:
[mkl]
library_dirs = /opt/intel/composer_xe_2013/mkl/lib/ia32
include_dirs = /opt/intel/mkl/include
mkl_libs = mkl_rt
lapack_libs =
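The two variants differ only in the library subdirectory, so the section can also be generated. This is a sketch using the standard configparser module. The function name is hypothetical, and the default paths are the ones quoted in this post, so adjust them to your installation.

```python
import configparser
import io


def mkl_site_cfg(bits=64,
                 mkl_root="/opt/intel/composer_xe_2013/mkl",
                 include_dir="/opt/intel/mkl/include"):
    """Return the text of an [mkl] section for NumPy's site.cfg."""
    lib_subdir = {64: "lib/intel64", 32: "lib/ia32"}[bits]
    cfg = configparser.ConfigParser()
    cfg["mkl"] = {
        "library_dirs": mkl_root + "/" + lib_subdir,
        "include_dirs": include_dir,
        "mkl_libs": "mkl_rt",
        "lapack_libs": "",  # left empty, as in the sections above
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(mkl_site_cfg(bits=64))
```

Writing the returned text to site.cfg in the numpy directory should give the same result as editing the file by hand.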
Now modify intelccompiler.py in the numpy/numpy/distutils directory:
There are currently three classes in the file. You would need to modify the one that's based on your architecture.
If you use 64 bit, modify the IntelEM64TCCompiler class by changing self.cc_exe to something like
self.cc_exe = 'icc -O3 -g -fPIC -fp-model strict -fomit-frame-pointer -openmp -xhost'
If you use 32 bit, modify the IntelCCompiler class's self.cc_exe to something similar to the above.
The exact configuration is going to depend on your needs and computer, but you can check what all the options mean by running icc --help.
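For orientation, here is the 64-bit flag string from above taken apart, with a short gloss on each option. The descriptions are my own paraphrases, not Intel's wording, so treat icc --help as the authority. The ICC_FLAGS and cc_exe names exist only in this sketch.

```python
# (flag, rough meaning) pairs; the glosses are informal paraphrases.
ICC_FLAGS = [
    ("-O3", "aggressive optimization level"),
    ("-g", "emit debugging information"),
    ("-fPIC", "position-independent code, needed for shared libraries"),
    ("-fp-model strict", "strict floating-point semantics"),
    ("-fomit-frame-pointer", "do not reserve a register for the frame pointer"),
    ("-openmp", "enable OpenMP parallelization"),
    ("-xhost", "target the instruction sets of the build machine's CPU"),
]


def cc_exe(compiler="icc", flags=ICC_FLAGS):
    """Join the compiler name and flags into the string assigned to self.cc_exe."""
    return " ".join([compiler] + [flag for flag, _ in flags])


print(cc_exe())
```

Dropping or swapping an entry in the list is an easy way to try a different configuration, for example removing -xhost if the build must run on other machines.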
Intel also suggests modifying the Fortran compiler options, but they are already set to the suggested values. In case you want to change them, they are in numpy/numpy/distutils/fcompiler/intel.py.
6. Compile NumPy and SciPy
Run the following in the numpy directory (replace intelem with intel on 32-bit machines):
python setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install
If you want to install to your home directory, add --prefix=$HOME/.local.
Run the following in the scipy directory (replace intelem with intel on 32-bit machines):
python setup.py config --compiler=intelem --fcompiler=intelem build_clib \
    --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install
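Both commands follow the same pattern: each of config, build_clib, and build_ext gets the compiler option, and SciPy additionally gets --fcompiler. The sketch below assembles the argument list for either package. The setup_command helper is hypothetical and only builds the list without running anything.

```python
def setup_command(package, bits=64, prefix=None):
    """Build the setup.py argument list used in this step.

    package: "numpy" or "scipy"; bits: 64 (intelem) or 32 (intel).
    """
    if package not in ("numpy", "scipy"):
        raise ValueError("package must be 'numpy' or 'scipy'")
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    compiler = "intelem" if bits == 64 else "intel"
    opts = ["--compiler=" + compiler]
    if package == "scipy":
        # SciPy also needs the Fortran compiler selected.
        opts.append("--fcompiler=" + compiler)
    cmd = ["python", "setup.py"]
    for step in ("config", "build_clib", "build_ext"):
        cmd.append(step)
        cmd.extend(opts)
    cmd.append("install")
    if prefix:
        cmd.append("--prefix=" + prefix)
    return cmd


print(" ".join(setup_command("numpy")))
print(" ".join(setup_command("scipy", bits=32)))
```

The returned list can be handed to subprocess.check_call with cwd set to the matching source directory if you prefer to script the build.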
7. You're done!
Now, check the dot product again in a new terminal window:
import numpy as np
a = np.random.rand(1000,1000)
b = np.random.rand(1000,1000)
%timeit np.dot(a,b)
# 10 loops, best of 3: 23.1 ms per loop
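To put the two timings in perspective: multiplying two dense n x n matrices takes roughly 2n^3 floating-point operations, so the measurements convert to an approximate throughput. The short calculation below uses only the two numbers reported in this post; the gflops helper is just for illustration.

```python
def gflops(n, seconds):
    """Approximate GFLOP/s of an n x n matrix product done in `seconds`."""
    return 2.0 * n ** 3 / seconds / 1e9


before_ms, after_ms = 68.7, 23.1  # timings from steps 1 and 7

print("speedup: %.2fx" % (before_ms / after_ms))                 # speedup: 2.97x
print("before:  %.1f GFLOP/s" % gflops(1000, before_ms / 1000))  # before:  29.1 GFLOP/s
print("after:   %.1f GFLOP/s" % gflops(1000, after_ms / 1000))   # after:   86.6 GFLOP/s
```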
My numpy.show_config() looks like this:
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/mkl/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/mkl/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/mkl/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/mkl/include']
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/mkl/include']