Troubleshooting

Performance fluctuation due to CPU bottlenecks

  • Set the CPU governor to performance mode.
    ~$ sudo apt install cpufrequtils
    ~$ cat /etc/init.d/cpufrequtils
    ...
    GOVERNOR="performance"
    ...
    ~$ sudo systemctl daemon-reload && sudo /etc/init.d/cpufrequtils reload
    ~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    ...
    performance
    performance
    ...
  • Try disabling Hyper-Threading (SMT).
  • Use the taskset command to explicitly set the application’s CPU affinity to the same NUMA node as the Mango GPUBoost card.
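
The affinity step above can be sketched as follows. This is a minimal sketch, not a vendor-provided tool: it assumes the card shows up as RDMA device mb_0 and that its sysfs entry exposes a numa_node file, which may not hold on every system. The function name node_cpus_for_rdma_dev, the SYSFS_ROOT override, and ./my_app are hypothetical names introduced only for illustration.

```shell
# Print the CPU list of the NUMA node that an RDMA device is attached to.
# SYSFS_ROOT is a hypothetical override so the function can be tried on a
# fake directory tree; it defaults to the real /sys.
node_cpus_for_rdma_dev() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"
    # Assumption: the device directory exposes numa_node (true for devices
    # whose sysfs parent is a PCI device).
    node="$(cat "$root/class/infiniband/$dev/device/numa_node")" || return 1
    # A value of -1 means the kernel reports no NUMA affinity for the device.
    if [ "$node" -lt 0 ]; then
        return 1
    fi
    cat "$root/devices/system/node/node$node/cpulist"
}

# Usage (mb_0 and ./my_app are placeholders):
#   taskset -c "$(node_cpus_for_rdma_dev mb_0)" ./my_app
```

If the function fails or prints nothing, the NUMA node has to be determined another way, for example from the card's PCI address.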

Cannot allocate MR due to the memlock limitation

  • RDMA requires registered memory to be pinned so that it cannot be swapped out. If the limit on pinned (locked) memory is too small, memory region (MR) registration may fail. To resolve this, raise the memlock limit in /etc/security/limits.conf, or set it to unlimited if necessary. Log in again so that the new limit applies to your session.
    ~$ cat /etc/security/limits.conf
    ...
    * soft memlock unlimited
    * hard memlock unlimited
    ~$ ulimit -l
    unlimited
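
A quick way to verify the limit before launching an application is sketched below. It is only an illustration: the helper name memlock_ok and the 1 GiB figure are assumptions, and the value passed in is expected to be what `ulimit -l` prints (a size in KiB, or the word unlimited).

```shell
# Succeed if a locked-memory limit, as printed by `ulimit -l`, is at least
# the required number of KiB. "unlimited" always passes.
memlock_ok() {
    limit="$1"
    need_kib="$2"
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$need_kib" ]
}

# Usage: warn if the current shell cannot pin about 1 GiB (1048576 KiB).
#   memlock_ok "$(ulimit -l)" 1048576 || echo "memlock limit too low" >&2
```

The check must run in the same session that starts the application, since the limit is inherited from the login session.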

The RDMA interface name appears as rocep.. instead of mb_<i>

  • To change the interface name to mb_<i>, modify /usr/lib/udev/rules.d/60-rdma-persistent-naming.rules and reload the driver.
    ~$ sudo modprobe -r mango-aux-rdma
    ~$ cat /usr/lib/udev/rules.d/60-rdma-persistent-naming.rules
    ...
    ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_KERNEL"
    ...
    ~$ sudo modprobe mango-aux-rdma
    ~$ rdma link
    link mb_0/1 state DOWN physical_state DISABLED netdev ens102np0
    link mb_1/1 state DOWN physical_state DISABLED netdev ens104np0

Application terminates with completion status 12

...
Completion with error at client
Failed status 12: wr_id 0 syndrom 0x81
scnt=128, ccnt=64
  • Completion status 12 (IBV_WC_RETRY_EXC_ERR) means that the sender exceeded its transport retry limit without receiving an acknowledgment from the remote side. This issue may be mitigated by enabling congestion control to reduce packet drops.

Invalid argument error when loading the driver

~$ sudo modprobe mango-aux-rdma
modprobe: ERROR: could not insert 'mango_aux_rdma': Invalid argument
  • Make sure the linux-modules-extra package is installed (see Prerequisites).
  • Check whether Mellanox OFED is installed (see Prerequisites).
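
Both checks above can be sketched in a few lines. This is a rough sketch assuming an Ubuntu system, where the package is named linux-modules-extra-<kernel release>, and assuming that the ofed_info command is present whenever Mellanox OFED is installed. The function names modules_extra_pkg and check_prereqs are hypothetical. The sketch only reports what it finds; see Prerequisites for the expected state.

```shell
# Build the package name that matches a kernel release string.
modules_extra_pkg() {
    printf 'linux-modules-extra-%s\n' "$1"
}

# Report whether the matching package and Mellanox OFED appear to be present.
check_prereqs() {
    pkg="$(modules_extra_pkg "$(uname -r)")"
    if dpkg -s "$pkg" >/dev/null 2>&1; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed"
    fi
    if command -v ofed_info >/dev/null 2>&1; then
        echo "Mellanox OFED: $(ofed_info -s)"
    else
        echo "Mellanox OFED: ofed_info not found"
    fi
}

# Usage:
#   check_prereqs
```

After a failed modprobe, the kernel log (sudo dmesg | tail) often gives a more specific reason than "Invalid argument".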