Troubleshooting

Using multiple NTI cards in a single server

  • It is not recommended to use multiple NTI cards in a single server to expand bandwidth.
  • In the current version, such configurations have not been sufficiently validated for performance and stability.
  • Support for this feature is planned for future versions.

Requesting multiple target subsystem connections to the NTI agent

  • If at least one connection request succeeds, the NTI agent considers the overall operation successful.
  • In the current version, failed connection requests are not continuously retried.

Network failure or target server reboot

  • A network failure or a target server reboot will trigger an NVMe timeout on the host.
  • This timeout causes the NVMe controller to reset as part of the host NVMe driver’s recovery process.
  • If the target server returns to a normal state before the NVMe reset process completes, the NVMe device will recover automatically.
  • If the target server returns to a normal state after the NVMe reset process has completed, the NVMe device must be re-bound manually, as in the sketch below.
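  • A minimal re-bind sketch, assuming the NTI device is driven by the standard Linux nvme PCI driver (the PCI address 0000:3b:00.0 is a placeholder; find the actual address with lspci):
    ~$ echo 0000:3b:00.0 | sudo tee /sys/bus/pci/drivers/nvme/unbind
    ~$ echo 0000:3b:00.0 | sudo tee /sys/bus/pci/drivers/nvme/bind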

Unexpected low FIO performance

  • Ensure that both the initiator and target nodes are configured with the correct NUMA setup, as NVMe/TCP performance is highly sensitive to NUMA settings.
  • You can verify the NUMA status of PCIe devices with the following commands:
    ~$ sudo apt install hwloc
    ~$ lstopo
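  • As a hedged alternative that needs no extra packages, you can read a PCIe device's NUMA node directly from sysfs (the PCI address 0000:3b:00.0 is a placeholder):
    ~$ cat /sys/bus/pci/devices/0000:3b:00.0/numa_node
    0                      # a value of -1 means the platform reports no NUMA affinity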

Performance bottleneck in the NVMe-oF target node

  • Ensure that the target server has at least 32 cores and a proper NUMA configuration to achieve maximum full-duplex performance (5.5 million IOPS).
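  • As a hedged sketch, you can pin the target process and its memory to the NUMA node local to the NIC (node 0 is assumed here and should be verified with lstopo; <target-launch-command> is a placeholder for your target's startup command):
    ~$ sudo numactl --cpunodebind=0 --membind=0 <target-launch-command>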

High CPU cycles consumed due to spinlock contention in FIO

  • When many FIO threads access a single file concurrently, excessive CPU cycles can be spent on spinlock contention; a workaround sketch follows this list.
  • Example call graph
    • aio_read/aio_write -> security_file_permission -> spin_lock
  • This issue can lead to performance degradation and may also occur with local Samsung NVMe devices.
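  • A hedged workaround sketch: confirm the contention with perf, then give each FIO job its own file instead of a shared one (the directory and job parameters are illustrative):
    ~$ sudo perf top -g        # look for spin_lock under security_file_permission
    ~$ fio --name=randread --directory=/mnt/test --numjobs=8 --rw=randread \
           --bs=4k --iodepth=32 --size=1g --ioengine=libaio --direct=1
  • With --directory set and no shared --filename, FIO creates a separate file per job, so each job's I/O no longer contends on a single file's locks.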

Performance impact due to CPU scheduling

  • If workloads are scheduled on cores with a long NUMA distance from the NTI device, performance degradation may occur.
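  • A hedged mitigation sketch: check the inter-node distances, then pin the workload to the node local to the NTI device (node 0 is a placeholder; <job-file> stands for your FIO job file):
    ~$ numactl --hardware                  # prints the NUMA node distance matrix
    ~$ sudo numactl --cpunodebind=0 --membind=0 fio <job-file>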

Performance fluctuation caused by bandwidth disparity

  • A bandwidth gap between the NTI TOE engine and the target-side NIC can result in packet drops, which manifest as performance fluctuation.
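  • As a hedged first check, inspect the target-side NIC's drop counters and flow-control (pause frame) settings with ethtool (eth0 is a placeholder interface name):
    ~$ ethtool -S eth0 | grep -i drop
    ~$ ethtool -a eth0     # shows whether RX/TX pause frames are enabled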