On the occasion of the Aion supercomputer installation, we had to merge two InfiniBand islands, which came with the deployment of the Mellanox OFED stack. While the installation itself is quite straightforward, compatibility with GPFS/SpectrumScale and Lustre (based on DDN solutions in our case) requires non-standard compilation options, which motivated this blog post. In short: don’t forget the --add-kernel-support --kmp options.

Below are installation notes for Mellanox OFED (MOFED) 5.1-2.5.8.0 on CentOS 7.9 (kernel 3.10.0-1127.19.1.el7), kept compatible with the Lustre 2.12.5_ddn18 and GPFS/SpectrumScale 4.2.3.24 clients.

TL;DR;

### MOFED install - successfully tested on RHEL 8.3
# Beware of matching kernel
$ uname -r
$ yum info kernel-devel kernel-headers  kernel-rpm-macros
$ yum install kernel-devel kernel-headers  kernel-rpm-macros
### Pre-requisite packages
yum install createrepo rpm-build gcc elfutils-libelf-devel gdb-headless python36-devel bison flex tcl tk
# Alternative 1: Head/Compute node
./mlnxofedinstall --add-kernel-support --kmp --without-fw-update  --hpc --without-openmpi --without-mpi-selector --without-opensm --without-opensm-libs --without-ibdump
# Alternative 2: Minimal
./mlnxofedinstall --without-fw-update --basic --with-opensm --with-opensm-libs --with-ibutils2
# DO NOT FORGET - Rebuilding the initramfs - see also https://access.redhat.com/solutions/365693
cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).bak.$(date +%m-%d-%H%M%S).img
dracut -f -v
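
The "matching kernel" caveat above can be checked mechanically before launching the installer. The following is a minimal sketch, assuming that kernel-devel installs its tree under /usr/src/kernels/<release> (the path that shows up in the mmbuildgpl output later in this post). The function name `kernel_devel_present` and its optional second argument (an alternate root prefix, only useful for testing) are hypothetical:

```shell
# Succeeds if a kernel-devel tree exists for the given kernel release.
# $1: kernel release (default: the running kernel)
# $2: root prefix (default: empty, i.e. the real filesystem)
kernel_devel_present() {
    local release="${1:-$(uname -r)}"
    local root="${2:-}"
    [ -d "${root}/usr/src/kernels/${release}" ]
}

if kernel_devel_present; then
    echo "kernel-devel tree found for $(uname -r)"
else
    echo "no kernel-devel tree for $(uname -r): install the matching kernel-devel first" >&2
fi
```

If the check fails, --add-kernel-support has nothing to build against, so it is worth running before anything else.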

Preliminaries

Download the appropriate tarball archive MLNX_OFED_LINUX-<version>-rhel7.9-x86_64.tgz from the MLNX_OFED Download Center (adapt the OS accordingly). Copy it to a shared NFS directory /path/to/shared/dir (neither GPFS nor Lustre, as you’ll have to unmount them).

Uncompress it at the appropriate location:

mofed_version=5.1-2.5.8.0
cp MLNX_OFED_LINUX-${mofed_version}-rhel7.9-x86_64.tgz /path/to/shared/dir  # /!\ ADAPT accordingly
cd /path/to/shared/dir
tar xf MLNX_OFED_LINUX-${mofed_version}-rhel7.9-x86_64.tgz
cd MLNX_OFED_LINUX-${mofed_version}-rhel7.9-x86_64/
./mlnxofedinstall -h
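
Since the archive name only depends on the MOFED version and the OS tag, it can be assembled once and reused when the same notes are replayed on another OS. A small sketch, where the helper name `mofed_tarball` and its default OS tag are assumptions made for illustration:

```shell
# Build the MLNX_OFED archive name from the MOFED version and the OS tag,
# following the MLNX_OFED_LINUX-<version>-<os>-x86_64.tgz pattern used above.
# $1: MOFED version; $2: OS tag (default: rhel7.9)
mofed_tarball() {
    local version="$1"
    local os_tag="${2:-rhel7.9}"
    printf 'MLNX_OFED_LINUX-%s-%s-x86_64.tgz\n' "$version" "$os_tag"
}

mofed_version=5.1-2.5.8.0
tarball=$(mofed_tarball "$mofed_version" rhel7.9)
if [ -f "$tarball" ]; then
    echo "found $tarball"
else
    echo "missing $tarball in $(pwd): download it first" >&2
fi
```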

Then stop and unmount all storage depending on your IB network:

# Umount GPFS
systemctl stop gpfs
mmgetstate
#  Umount Lustre
umount /mnt/lscratch
systemctl stop lnet-peers.service
mount | egrep 'gpfs|lscratch' | wc -l
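
The final count above should be 0 before going any further. As a variation, the same guard can match on the filesystem type rather than on the mount point name, which avoids depending on the /mnt/lscratch path. This is only a sketch: the function name `count_ib_storage_mounts` is hypothetical, and it assumes the usual /proc/mounts layout where the third field is the filesystem type (`gpfs` or `lustre`):

```shell
# Count mounted filesystems whose type is gpfs or lustre.
# $1: mount table to read (default: /proc/mounts); field 3 is the fs type.
count_ib_storage_mounts() {
    awk '$3 == "gpfs" || $3 == "lustre" { n++ } END { print n + 0 }' "${1:-/proc/mounts}"
}

remaining=$(count_ib_storage_mounts)
if [ "$remaining" -ne 0 ]; then
    echo "$remaining GPFS/Lustre filesystem(s) still mounted: do not start the MOFED install yet" >&2
else
    echo "no GPFS/Lustre mount left"
fi
```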

Optionally, save the state of your system beforehand:

rpm -qa | egrep -i 'libib|rdma|ofa|ibutils|ipoib|infiniband' | tee $(date +%F)_list_IB_packages_BEFORE.txt
ibstat  | tee $(date +%F)_ibstat_BEFORE.txt
mlxup --query | tee $(date +%F)_mlxup--query_BEFORE.txt
ibswitches    | tee $(date +%F)_ibswitches_BEFORE.txt
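
These snapshots are mostly useful once compared with the same commands run after the installation. A minimal sketch of such a comparison for the package list, assuming a second file was produced the same way afterwards (the `_AFTER.txt` name and the `removed_packages` helper are hypothetical):

```shell
# Print, sorted, the packages listed in the BEFORE file that are absent
# from the AFTER file. Both files hold one package per line (rpm -qa output).
# $1: BEFORE list; $2: AFTER list
removed_packages() {
    # -F fixed strings, -x whole-line match, -v keep the lines NOT found in $2
    grep -Fxv -f "$2" "$1" | sort
}

# Example, once both snapshots exist:
#   removed_packages 2021-05-07_list_IB_packages_BEFORE.txt \
#                    2021-05-07_list_IB_packages_AFTER.txt
```

The output is expected to contain the distribution packages (rdma-core, libibverbs and so on) that the installer removes in favour of its own builds.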

Optionally, list the packages that will be installed with ./mlnxofedinstall -p:

### Default (bad): all packages
$ ./mlnxofedinstall -p
MLNX_OFED packages: ofed-scripts mlnx-ofa_kernel mlnx-ofa_kernel-devel kernel-mft-mlnx knem xpmem iser srp isert rdma-core libibverbs librdmacm libibumad infiniband-diags rdma-core-devel libibverbs-utils ibsim ibacm librdmacm-utils opensm-libs opensm opensm-devel opensm-static dapl dapl-devel dapl-devel-static dapl-utils perftest mstflint mft srp_daemon ibutils2 dump_pr ar_mgr ibdump dpcp mxm ucx ucx-devel sharp ucx-cma ucx-ib ucx-rdmacm ucx-knem libxpmem ucx-xpmem mpi-selector hcoll openmpi mlnx-ethtool mlnx-iproute2 rshim clusterkit mlnxofed-docs mpitests_openmpi
Created /tmp/ofed-all.conf
### HPC, yet still including OpenMPI
./mlnxofedinstall -p --hpc
MLNX_OFED packages: mlnxofed-docs ofed-scripts rdma-core libibverbs libibverbs-utils librdmacm libibumad infiniband-diags rdma-core-devel librdmacm-utils mstflint mft mlnx-ethtool mlnx-iproute2 knem mxm ucx ucx-devel ibacm ucx-cma ucx-ib ucx-knem ucx-rdmacm libxpmem ucx-xpmem dapl dapl-devel dapl-devel-static dapl-utils ibutils2 opensm-libs opensm dump_pr ar_mgr ibdump perftest mpi-selector sharp hcoll openmpi mpitests_openmpi mlnx-ofa_kernel mlnx-ofa_kernel-devel kernel-mft-mlnx iser srp isert
Created /tmp/ofed-hpc.conf
### Basic minimal
./mlnxofedinstall -p --basic
MLNX_OFED packages: mlnxofed-docs ofed-scripts rdma-core libibverbs libibverbs-utils librdmacm libibumad infiniband-diags rdma-core-devel librdmacm-utils mstflint mft mlnx-ethtool mlnx-iproute2 mlnx-ofa_kernel mlnx-ofa_kernel-devel kernel-mft-mlnx iser srp isert
Created /tmp/ofed-basic.conf
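
These lists are long, so checking by eye whether a given profile still pulls in an unwanted package (OpenMPI, for instance) is error-prone. A small sketch that tests for one exact package name in the `MLNX_OFED packages:` line, where the helper name `mofed_list_has` is hypothetical:

```shell
# Succeeds if package $1 appears as an exact name in the
# "MLNX_OFED packages:" line read from standard input.
mofed_list_has() {
    local pkg="$1"
    # one name per line, then an exact whole-line match on the package name
    grep '^MLNX_OFED packages:' | tr ' ' '\n' | grep -Fxq -- "$pkg"
}

# Only meaningful from within the extracted MLNX_OFED directory
if [ -x ./mlnxofedinstall ]; then
    if ./mlnxofedinstall -p --hpc | mofed_list_has openmpi; then
        echo "--hpc still includes openmpi"
    fi
fi
```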

MOFED 5.1 Installation

Important: Don’t forget to add --add-kernel-support --kmp --without-fw-update [...] !

$ yum install createrepo
$ ./mlnxofedinstall --add-kernel-support --kmp --without-fw-update  --hpc --without-openmpi --without-mpi-selector --without-opensm --without-opensm-libs --without-ibdump
Note: This program will create MLNX_OFED_LINUX TGZ for rhel7.9 under /tmp/MLNX_OFED_LINUX-5.1-2.5.8.0-3.10.0-1127.19.1.el7.x86_64 directory.
See log file /tmp/MLNX_OFED_LINUX-5.1-2.5.8.0-3.10.0-1127.19.1.el7.x86_64/mlnx_iso.8747_logs/mlnx_ofed_iso.8747.log
# [...]
Building MLNX_OFED_LINUX RPMS . Please wait...
Creating metadata-rpms for 3.10.0-1127.19.1.el7.x86_64 ...
# [...]
Created /tmp/MLNX_OFED_LINUX-5.1-2.5.8.0-3.10.0-1127.19.1.el7.x86_64/MLNX_OFED_LINUX-5.1-2.5.8.0-rhel7.9-ext.tgz
# [...]
rpm --nosignature -e --allmatches --nodeps rdma-core rdma-core.i686 rdma-core rdma-core.i686 rdma-core-devel
Installing /tmp/MLNX_OFED_LINUX-5.1-2.5.8.0-3.10.0-1127.19.1.el7.x86_64/MLNX_OFED_LINUX-5.1-2.5.8.0-rhel7.9-ext
/tmp/MLNX_OFED_LINUX-5.1-2.5.8.0-3.10.0-1127.19.1.el7.x86_64/MLNX_OFED_LINUX-5.1-2.5.8.0-rhel7.9-ext/mlnxofedinstall --force --kmp --without-fw-update --hpc --without-openmpi --without-mpi-selector --without-opensm --without-opensm-libs --without-ibdump
Logs dir: /tmp/MLNX_OFED_LINUX.16047.logs
General log file: /tmp/MLNX_OFED_LINUX.16047.logs/general.log
Verifying KMP rpms compatibility with target kernel...
# [...]
rpm --nosignature -e --allmatches --nodeps libibverbs libibverbs-utils libibverbs libibverbs libibverbs libibumad ibacm librdmacm librdmacm-utils opensm opensm-libs opensm-devel dapl dapl-devel dapl-utils perftest ibutils infiniband-diags infinipath-psm libibverbs libibverbs-utils libibumad ibacm librdmacm librdmacm-utils opensm opensm-libs opensm-devel compat-opensm-libs compat-dapl compat-dapl-devel compat-dapl-utils dapl dapl-devel dapl-utils perftest infiniband-diags infinipath-psm opensm opensm-devel opensm-libs ibutils ibutils-libs infiniband-diags-devel dapl-utils-2.1.5-3.el7.x86_64 dapl-devel-2.1.5-3.el7.x86_64 compat-dapl-utils-1.2.19-4.el7.x86_64 compat-dapl-devel-1.2.19-4.el7.x86_64 ibutils-1.5.7-14.el7.x86_64
# [...]
Installation finished successfully.
Preparing...                          ################################# [100%]
Updating / installing...
   1:mlnx-fw-updater-5.1-2.5.8.0      ################################# [100%]
Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf
Skipping FW update.
To load the new driver, run:
/etc/init.d/openibd restart

You then need to manually remove a few kernel modules before you can [re]start openibd, and then check the result with the openibd service:

rmmod rpcrdma ib_srpt ib_isert i40iw
/etc/init.d/openibd restart
/etc/init.d/openibd stop    # If OK
systemctl status openibd.service
systemctl start openibd.service
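
rmmod complains about any module that is not currently loaded, and the set of loaded modules may differ from one node to another. A minimal sketch that keeps only the modules really present, assuming the standard /proc/modules layout where the first field is the module name (the helper name `loaded_among` is hypothetical):

```shell
# Print, one per line, those of the given module names that are loaded.
# $1: module table (normally /proc/modules); remaining args: module names
loaded_among() {
    local table="$1"; shift
    local mod
    for mod in "$@"; do
        # the first field of each /proc/modules line is the module name
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$table"; then
            echo "$mod"
        fi
    done
}

# Only the modules that are really loaded would then be passed to rmmod
to_remove=$(loaded_among /proc/modules rpcrdma ib_srpt ib_isert i40iw)
echo "modules to unload: ${to_remove:-none}"
```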

Don’t forget to rebuild the initramfs:

# DO NOT FORGET - Rebuilding the initramfs - see also https://access.redhat.com/solutions/365693
cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).bak.$(date +%m-%d-%H%M%S).img
dracut -f -v

Lustre client rebuild

Once MOFED has been installed as above, you must rebuild the Lustre client.

cd /path/to/shared/dir
version=2.12.5_ddn18
sudo mkdir lustre-${version}_MOFED-${mofed_version}_$(uname -r)
# Extract Lustre sources into that directory
tar xvzf lustre-client-${version}.tar.gz -C lustre-${version}_MOFED-${mofed_version}_$(uname -r) --strip-components=1
# Lustre client RPMs build
cd lustre-${version}_MOFED-${mofed_version}_$(uname -r)
./configure --disable-server
make rpms

Now reinstall the generated client RPMs and mount the Lustre shares:

rpm -qa | grep lustre
ls --color=never /path/to/shared/dir/lustre-${version}_MOFED-${mofed_version}_$(uname -r)/*lustre*${version}* | grep -vE 'tests|src|debug' | sort
ls --color=never /path/to/shared/dir/lustre-${version}_MOFED-${mofed_version}_$(uname -r)/*lustre*${version}* | grep -vE 'tests|src|debug' | sort | xargs -n1 rpm -ivh --reinstall
systemctl start lnet-peers.service
systemctl status lnet-peers.service
mount -v /mnt/lscratch

GPFS Portability layer RPM rebuild and reinstall

Once MOFED has been installed as above, you must rebuild the portability layer of the GPFS/SpectrumScale client.

$ cd /usr/lpp/mmfs/bin
$ version=4.2.3.24
$ mmbuildgpl --build-package
--------------------------------------------------------
mmbuildgpl: Building GPL (4.2.3.24) module begins at Fri May  7 17:39:11 CEST 2021.
--------------------------------------------------------
Verifying Kernel Header...
  kernel version = 31000999 (310001127019001, 3.10.0-1127.19.1.el7.x86_64, 3.10.0-1127.19.1)
  module include dir = /lib/modules/3.10.0-1127.19.1.el7.x86_64/build/include
  module build dir   = /lib/modules/3.10.0-1127.19.1.el7.x86_64/build
  kernel source dir  = /usr/src/linux-3.10.0-1127.19.1.el7.x86_64/include
  Found valid kernel header file under /usr/src/kernels/3.10.0-1127.19.1.el7.x86_64/include
Verifying Compiler...
  make is present at /bin/make
  cpp is present at /bin/cpp
  gcc is present at /bin/gcc
  g++ is present at /bin/g++
  ld is present at /bin/ld
Verifying rpmbuild...
Verifying Additional System Headers...
  Verifying kernel-headers is installed ...
    Command: /bin/rpm -q kernel-headers
    The required package kernel-headers is installed
make World ...
make InstallImages ...
make rpm ...
Wrote: /root/rpmbuild/RPMS/x86_64/gpfs.gplbin-3.10.0-1127.19.1.el7.x86_64-4.2.3-24.x86_64.rpm
--------------------------------------------------------
mmbuildgpl: Building GPL module completed successfully at Fri May  7 17:39:29 CEST 2021.
--------------------------------------------------------
$ cd /path/to/shared/dir/
$ mkdir ${version}-MOFED-${mofed_version}
$ cp /root/rpmbuild/RPMS/x86_64/gpfs.gplbin-$(uname -r)-${version%.*}-${version##*.}.x86_64.rpm ${version}-MOFED-${mofed_version}/
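
The `Wrote:` line in the mmbuildgpl output shows how the RPM is named: the full kernel release (which already ends in .el7.x86_64) followed by the GPFS version with its last dot turned into a dash (4.2.3.24 becomes 4.2.3-24). A small sketch deriving that name, assuming the scheme holds for other versions (it is only checked here against the single file name above); `gplbin_rpm_name` is a hypothetical helper:

```shell
# Derive the gpfs.gplbin RPM file name from a kernel release and a GPFS version.
# $1: kernel release as printed by uname -r; $2: GPFS version, e.g. 4.2.3.24
gplbin_rpm_name() {
    local kernel="$1" version="$2"
    # ${version%.*} drops the last dotted field, ${version##*.} keeps only it
    printf 'gpfs.gplbin-%s-%s-%s.x86_64.rpm\n' "$kernel" "${version%.*}" "${version##*.}"
}

gplbin_rpm_name "$(uname -r)" 4.2.3.24
```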

Now reinstall the generated gpfs.gplbin RPM and start the GPFS service:

rpm -ivh --reinstall /path/to/shared/dir/${version}-MOFED-${mofed_version}/gpfs.gplbin-$(uname -r)-${version%.*}-${version##*.}.x86_64.rpm
systemctl start gpfs
systemctl status gpfs