Advanced Features for AI Networking
This page describes the advanced RDMA features that Mango BoostX™ RoCE AI provides to address the performance and scalability challenges of large-scale AI networking. Beyond standard RoCEv2 functionality, Mango BoostX™ RoCE AI adds endpoint-based enhancements for congestion management, multi-path load balancing, and loss recovery, all aimed at maximizing bandwidth utilization across the network fabric.
Configuration-free RoCEv2 (CFR)
Mango BoostX™ RoCE AI provides Configuration-free RoCEv2 (CFR) to enable automated congestion management across network fabrics without requiring switch-side Explicit Congestion Notification (ECN) configuration. When enabled, CFR activates built-in hardware logic that monitors network health by detecting out-of-order (OoO) packets. The feature operates in two stages:
- Autonomous CNP Generation: Upon detecting OoO packets, the receiver automatically generates Congestion Notification Packets (CNPs) and sends them to the sender, which triggers sender-side rate throttling.
- Transmission Window Adaptation: A hardware rate-limiter dynamically adjusts the transmission window in response to incoming CNPs to mitigate congestion.
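The first stage can be pictured with a small receiver-side model. This is only an illustrative sketch: the actual hardware detection logic is not documented here, and OooDetector and its fields are hypothetical names. It assumes that any packet whose 24-bit packet sequence number (PSN) differs from the expected one counts as out of order and triggers one CNP. The second stage (window adaptation on the sender) is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// RoCEv2 packet sequence numbers (PSNs) are 24 bits wide and wrap around.
constexpr std::uint32_t kPsnMask = 0xFFFFFF;
constexpr std::uint32_t kHalfRange = 0x800000;

// Hypothetical receiver-side model, not the actual hardware logic.
struct OooDetector {
    std::uint32_t expected_psn = 0;  // next PSN expected in order
    std::uint64_t cnps_sent = 0;     // CNPs this model would have generated

    // Returns true when the arriving packet is out of order, i.e. when the
    // model would send a CNP back to the sender.
    bool on_packet(std::uint32_t psn) {
        assert(psn <= kPsnMask);
        const std::uint32_t distance = (psn - expected_psn) & kPsnMask;
        if (distance == 0) {          // exactly the packet we expected
            expected_psn = (psn + 1) & kPsnMask;
            return false;
        }
        if (distance < kHalfRange) {  // a gap: the packet is ahead of us
            expected_psn = (psn + 1) & kPsnMask;
        }                             // otherwise a late packet: keep expectation
        ++cnps_sent;
        return true;
    }
};
```

In this model, the arrival order 0, 1, 3, 2, 4 produces two CNPs: one for the gap at PSN 3 and one for the late arrival of PSN 2.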
Usage
CFR is managed via the rdma cfr command.
Enabling a Programmable Congestion Control (PCC) application will automatically disable CFR logic. CFR remains disabled until the PCC process is terminated, at which point it reverts to its previous configuration.
# Enable CFR
~$ sudo mango-ctl rdma cfr enable mb_0
# Disable CFR
~$ sudo mango-ctl rdma cfr disable mb_0
# Check CFR status
~$ mango-ctl rdma cfr status mb_0
mb_0: CFR Enabled: Yes
Packet Spraying
RoCEv2 typically maps a single RDMA flow to a single path to maintain packet order, which can lead to hot spots in the network fabric. Mango BoostX™ RoCE AI implements Packet Spraying, which distributes packets across multiple paths by dynamically varying the UDP source port. This prevents congestion on a single link and maximizes aggregate fabric bandwidth. The level of routing entropy is controlled by the UDP port range and the OPS timer.
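To make the role of the two knobs concrete, the sketch below rotates the UDP source port of a flow through a fixed range. It is a simplified model under stated assumptions, not the device's actual selection policy: PortSprayer is a hypothetical name, the base port is an assumed value supplied by the caller, and packets_per_port stands in for the OPS timer, whose level-to-interval mapping is not documented here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of source-port rotation for one RDMA flow.
struct PortSprayer {
    std::uint16_t base_port;         // first port of the range (assumed value)
    std::uint16_t port_range;        // number of ports, like --udp-port-range
    std::uint32_t packets_per_port;  // stand-in for the OPS timer granularity
    std::uint64_t sent = 0;          // packets emitted so far

    // Returns the UDP source port to use for the next packet.
    std::uint16_t next_source_port() {
        assert(port_range > 0 && packets_per_port > 0);
        const std::uint64_t epoch = sent / packets_per_port;
        ++sent;
        return static_cast<std::uint16_t>(base_port + epoch % port_range);
    }
};
```

A larger port range gives the fabric's ECMP hashing more distinct flows to spread across paths, while a smaller packets_per_port changes the path more often. Both increase entropy at the cost of more reordering at the receiver.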
Usage
Packet Spraying is managed via the rdma ops command.
- --udp-port-range: Specifies the number of UDP ports used by an RDMA flow (default: 16).
- --ops-timer: Sets the granularity of spraying. Supported values: default, level1–level5.
# Enable Packet Spraying
~$ sudo mango-ctl rdma ops enable mb_0 --udp-port-range 16 --ops-timer level3
# Disable Packet Spraying
~$ sudo mango-ctl rdma ops disable mb_0
# Check Packet Spraying status
~$ mango-ctl rdma ops status mb_0
OPS Enabled: Yes
UDP Port Range: 16
OPS Timer: level3
Selective Retransmission
RoCEv2 error recovery typically relies on a Go-Back-N protocol. If a single packet is lost or arrives out of order, the receiver discards all subsequent valid packets, forcing the sender to retransmit the entire window starting from the lost sequence number.
To improve recovery efficiency, Mango BoostX™ RoCE AI implements Selective Retransmission (SR). When SR is active, the sender retransmits only the specific packets that were lost or corrupted. This mechanism is particularly beneficial in multi-path (Packet Spraying) environments, where packets may arrive out of order due to varying path latencies.
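The difference in retransmission cost can be illustrated with a short comparison. This is a sketch of the two recovery strategies in general, not of the device's implementation. The function names are hypothetical, and PSN wraparound is ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

using Psn = std::uint32_t;

// Go-Back-N: resend everything from the first lost PSN to the end of the window.
std::vector<Psn> go_back_n_resend(Psn last_sent, const std::set<Psn>& lost) {
    std::vector<Psn> resend;
    if (lost.empty()) return resend;
    assert(*lost.rbegin() <= last_sent);
    for (std::uint64_t psn = *lost.begin(); psn <= last_sent; ++psn) {
        resend.push_back(static_cast<Psn>(psn));
    }
    return resend;
}

// Selective retransmission: resend only the PSNs that were actually lost.
std::vector<Psn> selective_resend(const std::set<Psn>& lost) {
    return std::vector<Psn>(lost.begin(), lost.end());
}
```

For a window of PSNs 0 through 9 with packets 3 and 7 lost, Go-Back-N resends seven packets (3 through 9), whereas selective retransmission resends two.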
Usage
SR is managed via the rdma sr command.
# Enable SR
~$ sudo mango-ctl rdma sr enable mb_0
# Disable SR
~$ sudo mango-ctl rdma sr disable mb_0
# Check SR status
~$ mango-ctl rdma sr status mb_0
Selective Repeat Enabled: Yes
Programmable Congestion Control (PCC)
Mango BoostX™ RoCE AI provides Programmable Congestion Control (PCC), enabling flexible and customizable congestion control tailored to diverse deployment environments. Through the Mango SDK, users can implement custom congestion control algorithms that best fit their specific network and workload characteristics.
The Mango SDK exposes APIs that allow PCC algorithms to access per-QP network congestion signals from the RDMA hardware — such as Congestion Notification Packets (CNPs) and Round-Trip Time (RTT) measurements — and to dynamically adjust each QP's transmission window size accordingly.
Mango BoostX™ RoCE AI also includes a set of pre-implemented, industry-standard algorithms such as DCQCN, available out of the box.
Programmable Congestion Control (PCC) is currently provided as an experimental feature. Detailed API usage guidelines will be released in a future software update.
Prerequisites
All commands in this section are run on the host, not on the Mango BoostX™ RoCE AI card.
First, set the system's PCC mode to SoC mode:
~$ mango-ctl rdma pcc mode set all soc
Set PCC mode to SoC mode for device mb_0
Please reload mango-drivers to apply PCC mode
Set PCC mode to SoC mode for device mb_1
Please reload mango-drivers to apply PCC mode
Reload the driver to apply the PCC mode change:
~$ mango-ctl proj disable rdma
~$ mango-ctl proj enable rdma
Confirm the PCC mode from the host:
~$ mango-ctl rdma pcc mode get all
mb_0: SoC mode
mb_1: SoC mode
~$ dmesg | grep PCC
[ 4593.022832] mango_core 0000:2b:00.0: PCC mode: (soc)
[ 4593.130841] mango_core 0000:ac:00.0: PCC mode: (soc)
To allow the host to reach the PCC daemon on each card's SoC, configure the management IP for every Mango management interface (mango_mgmt0, mango_mgmt1, …) using mango-ctl mgmt net config:
~$ sudo mango-ctl mgmt net config mango_mgmt0 --host-ip 192.168.10.1 --soc-ip 192.168.10.2 --subnet 24
~$ sudo mango-ctl mgmt net config mango_mgmt1 --host-ip 192.168.20.1 --soc-ip 192.168.20.2 --subnet 24
The host-side and SoC-side IPs must be on the same subnet. When --host-ip, --soc-ip, and --subnet are omitted, the defaults 192.168.10.1, 192.168.10.2, and 24 are used.
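The same-subnet requirement can be checked before running the command. The helper below is a hypothetical pre-flight check, not part of mango-ctl: parse_ipv4 and same_subnet are illustrative names, and the sketch handles only dotted-quad IPv4 addresses with a prefix length such as the value passed to --subnet.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

// Parses dotted-quad IPv4 text into a host-order 32-bit value.
std::optional<std::uint32_t> parse_ipv4(const std::string& text) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    char extra = 0;
    // A fifth converted field means trailing garbage after the last octet.
    const int fields =
        std::sscanf(text.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra);
    if (fields != 4 || a > 255 || b > 255 || c > 255 || d > 255) return std::nullopt;
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// True when both addresses parse and share the same network prefix.
bool same_subnet(const std::string& host_ip, const std::string& soc_ip, int prefix_len) {
    assert(prefix_len >= 0 && prefix_len <= 32);
    const auto host = parse_ipv4(host_ip);
    const auto soc = parse_ipv4(soc_ip);
    if (!host || !soc) return false;
    const std::uint32_t mask =
        prefix_len == 0 ? 0u : 0xFFFFFFFFu << (32 - prefix_len);
    return (*host & mask) == (*soc & mask);
}
```

With the default values, same_subnet("192.168.10.1", "192.168.10.2", 24) holds, while pairing 192.168.10.1 with 192.168.20.2 under a /24 prefix does not.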
Basic Usage
PCC is managed via the rdma pcc command.
Check the target RDMA devices:
~$ mango-ctl rdma pcc dev
mb_0
mb_1
Check the loaded pre-defined PCC algorithms:
~$ mango-ctl rdma pcc algo list mb_0
aimd dcqcn rttvegas
Apply an algorithm to a target device (or all devices):
# Usage: mango-ctl rdma pcc apply {<device> | all} <algorithm>
# Apply to a specific device
~$ mango-ctl rdma pcc apply mb_0 aimd
# Apply to all available devices simultaneously
~$ mango-ctl rdma pcc apply all aimd
Start the algorithm. This spawns poller threads that handle window control for each QP:
# Usage: mango-ctl rdma pcc start {<device> | all}
# Start for a specific device
~$ mango-ctl rdma pcc start mb_0
# Start for all available devices simultaneously
~$ mango-ctl rdma pcc start all
Check the current PCC status:
# Usage: mango-ctl rdma pcc status [<dev>]
~$ mango-ctl rdma pcc status
Device: mb_0
============================================================
Active QPs:
QP PID Process Ctrl Count CNP Window Size
---------------------------------------------------------------------------
17 132044 ib_write_bw 150 0 8192
18 132044 ib_write_bw 148 2 8192
Algorithm: aimd
Num threads: 1
Parameters:
alpha: 100
beta: 0.5
NAME STATUS ALGO NUM_THREADS
--------------------------------------
mb_0 Active aimd 1
mb_1 Inactive - 1
Stop the algorithm:
# Usage: mango-ctl rdma pcc stop {<device> | all}
# Stop a specific device
~$ mango-ctl rdma pcc stop mb_0
# Stop all devices simultaneously
~$ mango-ctl rdma pcc stop all
Built-in Algorithms and Parameters
Mango BoostX™ RoCE AI provides several pre-implemented congestion control algorithms. The parameters for each algorithm can be updated dynamically at runtime via the mango-ctl rdma pcc update-params command.
DCQCN (Data Center Quantized Congestion Notification)
An implementation of the standard DCQCN algorithm, utilizing CNPs to adjust the congestion probability (CP) and control the transmission window size.
| Parameter | Default | Description |
|---|---|---|
| wai | 0x50 (80) | Active increase increment (bytes). |
| g | 0.0625 | Congestion Probability (CP) update weight. A value between 0 and 1. |
| max_fast_steps | 3 | Maximum number of fast recovery steps allowed before active increase. |
| mode | 1 | DCQCN operational mode (0 for Deterministic, 1 for Probabilistic). |
| threshold | 0 | CNP count threshold required to trigger a window size decrease. |
| min_window | 0x1000 (4096) | Minimum allowed window size (bytes). |
| max_window | 0x80000 (524288) | Maximum allowed window size (bytes). |
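
To show how g, wai, and the window bounds interact, the sketch below applies the update rule from the published DCQCN algorithm to a window instead of a rate. It is an assumption-laden illustration, not the built-in implementation: DcqcnSketch is a hypothetical name, the mapping from rate to window is assumed, and the fast recovery steps (max_fast_steps), mode, and threshold are deliberately left out. On a CNP the window shrinks by the factor (1 - CP/2) and CP moves toward 1 with weight g. Without a CNP, CP decays and the window grows by wai.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical window-based adaptation of the published DCQCN update rule.
struct DcqcnSketch {
    // Defaults copied from the DCQCN parameter table.
    double g = 0.0625;
    std::uint32_t wai = 80;
    std::uint32_t min_window = 0x1000;
    std::uint32_t max_window = 0x80000;
    double cp = 1.0;  // congestion estimate; starts at 1 as in published DCQCN

    // Returns the next window given the current window and the new CNP count.
    std::uint32_t step(std::uint32_t cwnd, std::uint32_t new_cnps) {
        assert(g > 0.0 && g < 1.0 && min_window <= max_window);
        double next = 0.0;
        if (new_cnps > 0) {
            next = cwnd * (1.0 - cp / 2.0);  // cut using the current estimate
            cp = (1.0 - g) * cp + g;         // then raise the estimate
        } else {
            cp = (1.0 - g) * cp;             // estimate decays while quiet
            next = static_cast<double>(cwnd) + wai;
        }
        next = std::clamp(next, static_cast<double>(min_window),
                          static_cast<double>(max_window));
        return static_cast<std::uint32_t>(next);
    }
};
```

A smaller g makes CP react more slowly, so isolated CNPs after a quiet period cause gentler cuts. A larger wai recovers bandwidth faster but probes the fabric more aggressively.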
RTT Vegas
A congestion control algorithm inspired by TCP Vegas, which uses Round-Trip Time (RTT) measurements to estimate the queue backlog and adjusts the window size to keep the backlog within a target range.
| Parameter | Default | Description |
|---|---|---|
| timeout_us | 20 | Total timeout budget for an RTT probe (microseconds). |
| poll_interval_us | 1 | Sleep interval between RTT polling attempts (microseconds). |
| alpha | 4096 | Target minimum backlog in the queue (bytes). |
| beta | 16384 | Target maximum backlog in the queue (bytes). |
| mss | 1024 | Maximum Segment Size. Used as the increment step when the backlog is less than alpha. |
| d_factor | 0.99 | Decrease factor. Multiplied by the current window size when the backlog exceeds beta. |
| min_window | 0x1000 (4096) | Minimum allowed window size (bytes). |
| max_window | 0x500000 (5242880) | Maximum allowed window size (bytes). |
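
The parameter descriptions suggest a control loop along the following lines. This is a sketch in the spirit of TCP Vegas, not the built-in rttvegas code: VegasSketch is a hypothetical name, and the backlog estimate cwnd × (RTT − base RTT) / RTT, with base RTT taken as the smallest RTT seen so far, is the classic Vegas formula assumed here. RTT probing (timeout_us, poll_interval_us) is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical Vegas-style window controller.
struct VegasSketch {
    // Defaults copied from the RTT Vegas parameter table.
    std::uint32_t alpha = 4096;
    std::uint32_t beta = 16384;
    std::uint32_t mss = 1024;
    double d_factor = 0.99;
    std::uint32_t min_window = 0x1000;
    std::uint32_t max_window = 0x500000;
    double base_rtt_us = 0.0;  // smallest RTT observed so far (0 = none yet)

    // Returns the next window given the current window and a fresh RTT sample.
    std::uint32_t step(std::uint32_t cwnd, double rtt_us) {
        assert(rtt_us > 0.0 && alpha <= beta && min_window <= max_window);
        if (base_rtt_us == 0.0 || rtt_us < base_rtt_us) base_rtt_us = rtt_us;
        // Estimated bytes of this flow sitting in queues along the path.
        const double backlog = cwnd * (rtt_us - base_rtt_us) / rtt_us;
        double next = cwnd;
        if (backlog < alpha) {
            next += mss;       // queue nearly empty: grow by one segment
        } else if (backlog > beta) {
            next *= d_factor;  // queue too deep: shrink multiplicatively
        }                      // in between: hold the window steady
        next = std::clamp(next, static_cast<double>(min_window),
                          static_cast<double>(max_window));
        return static_cast<std::uint32_t>(next);
    }
};
```

With a base RTT of 10 µs and a 100000-byte window, an 11 µs sample implies roughly 9 KB of backlog, which falls between alpha and beta, so the window is left unchanged.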
Algorithm and Parameter Management
PCC algorithm parameters can be updated dynamically at runtime without stopping the ongoing congestion control process or disrupting network traffic. Updated values take effect immediately.
List the parameters of an algorithm currently applied to a device:
~$ mango-ctl rdma pcc list-params mb_0 dcqcn
wai g max_fast_steps mode threshold max_window min_window
Parameters can be updated either via CLI key-value pairs or via a JSON file (the two methods cannot be combined in one command). You can also apply updates to all active devices using the all keyword.
Method 1 — CLI parameters. Use the --param key=value option (multiple allowed):
# Usage: mango-ctl rdma pcc update-params {<device> | all} <algo> --param <key>=<value>...
~$ mango-ctl rdma pcc update-params mb_0 dcqcn --param wai=80 --param g=0.0625
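Because updated values take effect immediately, it can be worth sanity-checking them on the host first. The sketch below is a hypothetical pre-flight check and not part of mango-ctl. check_dcqcn_params is an invented helper, the rules for g and mode are taken from the DCQCN parameter table, and the rule that min_window must not exceed max_window is an assumption. Whether the CLI performs its own validation is not documented here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Returns a list of problems found; an empty list means the values look sane.
std::vector<std::string> check_dcqcn_params(const std::map<std::string, double>& params) {
    static const std::set<std::string> known = {
        "wai", "g", "max_fast_steps", "mode", "threshold", "min_window", "max_window"};
    std::vector<std::string> errors;
    for (const auto& entry : params) {
        if (known.count(entry.first) == 0) {
            errors.push_back("unknown parameter: " + entry.first);
        }
    }
    const auto g = params.find("g");
    if (g != params.end() && (g->second < 0.0 || g->second > 1.0)) {
        errors.push_back("g must be between 0 and 1");
    }
    const auto mode = params.find("mode");
    if (mode != params.end() && mode->second != 0.0 && mode->second != 1.0) {
        errors.push_back("mode must be 0 or 1");
    }
    const auto lo = params.find("min_window");
    const auto hi = params.find("max_window");
    if (lo != params.end() && hi != params.end() && lo->second > hi->second) {
        errors.push_back("min_window must not exceed max_window");
    }
    return errors;
}

// Example usage: the values from the command above pass the check.
inline void check_dcqcn_params_demo() {
    assert(check_dcqcn_params({{"wai", 80}, {"g", 0.0625}}).empty());
}
```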
Method 2 — JSON file. Use the --params_json <file> option. Recommended for updating multiple parameters at once.
~$ cat /path/to/dcqcn_params.json
{
  "wai": 80,
  "g": 0.0625,
  "max_fast_steps": 3,
  "threshold": 10
}
# Apply parameters from the JSON file to all devices
~$ mango-ctl rdma pcc update-params all dcqcn --params_json /path/to/dcqcn_params.json
Verifying the update. Check the PCC status to confirm the new values:
~$ mango-ctl rdma pcc status mb_0
Device: mb_0
...
Algorithm: dcqcn
Num threads: 4
Parameters:
g: 0.0625
max_fast_steps: 3
max_window: 524288
min_window: 0
mode: 1
threshold: 10
wai: 80
...
Writing a Custom PCC Algorithm
Mango BoostX™ RoCE AI allows users to develop and deploy their own congestion control algorithms tailored to specific network environments. Custom algorithms are compiled as shared libraries (.so) and dynamically loaded by the PCC daemon via the MangoBoost API.
To create a custom algorithm, implement the required parameter management functions and the core congestion control logic, then expose them through the plugin interface defined in <libmango.h>.
1. Core API Overview
The MangoBoost PCC SDK provides essential functions to interact with the RDMA hardware:
- mango_pcc_window_current(): Retrieves the current transmission window size of the Queue Pair (QP).
- mango_pcc_event_get_val(event): Retrieves hardware events. For example, MANGO_PCC_EVENT_CNP returns the number of newly received CNPs since the last check.
- mango_pcc_window_update(new_window): Updates the transmission window size to control the hardware pacing rate.
- mango_pcc_log(level, fmt, ...): Prints logs that can be monitored via the PCC daemon.
2. Step-by-Step Implementation Guide
Step 2.1 — Define algorithm parameters. Define a struct for the algorithm's parameters and register them using the provided macros. This allows the PCC daemon to read and update the values at runtime.
#include <libmango.h>
#include <iterator>
/* 1. Define parameter structure */
struct my_algo_params {
    unsigned int alpha;
    double beta;
};
/* 2. Register parameters using macros */
static const mango_pcc_param_t my_params_meta[] = {
    REGISTER_UINT32_PARAM(my_algo_params, alpha),
    REGISTER_DOUBLE_PARAM(my_algo_params, beta),
};
/* 3. Set default values */
static const my_algo_params initial_params = {
    .alpha = 100,
    .beta = 0.5,
};
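One common way to make a plain struct readable and writable by name at runtime is a table of field names, types, and byte offsets. The sketch below illustrates that general idea only. It is not the implementation of REGISTER_UINT32_PARAM or mango_pcc_params_register, and every Demo-prefixed name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class DemoType { kUint32, kDouble };

// One metadata entry per field: its name, its type, and where it lives.
struct DemoParamMeta {
    const char* name;
    DemoType type;
    std::size_t offset;
};

#define DEMO_UINT32_PARAM(struct_type, field) \
    {#field, DemoType::kUint32, offsetof(struct_type, field)}
#define DEMO_DOUBLE_PARAM(struct_type, field) \
    {#field, DemoType::kDouble, offsetof(struct_type, field)}

struct DemoParams {
    std::uint32_t alpha;
    double beta;
};

static const DemoParamMeta kDemoMeta[] = {
    DEMO_UINT32_PARAM(DemoParams, alpha),
    DEMO_DOUBLE_PARAM(DemoParams, beta),
};

// Writes `value` into the named field; returns false for unknown names.
bool demo_set_param(void* params, const char* name, double value) {
    assert(params != nullptr && name != nullptr);
    for (const DemoParamMeta& meta : kDemoMeta) {
        if (std::strcmp(meta.name, name) != 0) continue;
        char* field = static_cast<char*>(params) + meta.offset;
        if (meta.type == DemoType::kUint32) {
            const auto as_u32 = static_cast<std::uint32_t>(value);
            std::memcpy(field, &as_u32, sizeof as_u32);
        } else {
            std::memcpy(field, &value, sizeof value);
        }
        return true;
    }
    return false;
}
```

With such a table, a generic updater needs no knowledge of the concrete struct, which is why a daemon can plausibly service key-value updates for any registered algorithm.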
Step 2.2 — Implement parameter management. Implement the initialization and destruction callbacks. The init function allocates parameter memory and registers the metadata with the PCC core.
void *my_algo_init_params(mango_pcc_h pcc, unsigned int slot, size_t *param_size) {
    *param_size = sizeof(my_algo_params);
    my_algo_params *param = new my_algo_params(initial_params);
    /* Register the parameter metadata to the PCC daemon */
    mango_pcc_params_register(pcc, slot, my_params_meta, std::size(my_params_meta));
    return param;
}
void my_algo_destroy_params(void *params) {
    delete static_cast<my_algo_params *>(params);
}
Step 2.3 — Implement the congestion control logic. This is the core function called periodically by the hardware poller. It reads congestion signals (such as CNPs) and updates the window size accordingly.
void my_algo_logic(void *params) {
    my_algo_params *p = (my_algo_params *)params;
    /* Read current window and HW events */
    unsigned int cwnd = mango_pcc_window_current();
    unsigned int cnp_diff = mango_pcc_event_get_val(MANGO_PCC_EVENT_CNP);
    unsigned int nwnd;
    /* Simple AIMD logic example */
    if (cnp_diff > 0) {
        /* Congestion detected: Multiplicative Decrease */
        nwnd = (unsigned int)(cwnd * p->beta);
    } else {
        /* No congestion: Additive Increase */
        nwnd = cwnd + p->alpha;
    }
    /* Enforce minimum window size to prevent stalling */
    if (nwnd < 1024) nwnd = 1024;
    /* Update the hardware window size */
    mango_pcc_window_update(nwnd);
}
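Because the decision depends only on the current window, the CNP count, and the parameters, it can be factored into a pure function and unit-tested on a host without the card. The following is a minimal sketch under that assumption. AimdParams and aimd_next_window are hypothetical names, and the saturating add is an extra overflow guard that the example above does not include.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct AimdParams {
    std::uint32_t alpha;  // additive increase step
    double beta;          // multiplicative decrease factor
};

// Pure AIMD step: no hardware calls, so it can be tested in isolation.
std::uint32_t aimd_next_window(const AimdParams& p, std::uint32_t cwnd,
                               std::uint32_t new_cnps,
                               std::uint32_t min_window = 1024) {
    assert(p.beta > 0.0 && p.beta < 1.0);
    constexpr std::uint32_t kMax = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t next = 0;
    if (new_cnps > 0) {
        // Congestion detected: multiplicative decrease.
        next = static_cast<std::uint32_t>(cwnd * p.beta);
    } else {
        // No congestion: additive increase, saturating instead of wrapping.
        next = (cwnd > kMax - p.alpha) ? kMax : cwnd + p.alpha;
    }
    return next < min_window ? min_window : next;
}
```

The plugin callback would then reduce to reading the window and CNP count, calling the pure function, and passing the result to mango_pcc_window_update().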
Step 2.4 — Export the plugin interface. Bundle the functions into the mango_pcc_algo_plugin structure. It must be wrapped in extern "C" so the PCC daemon can load it correctly.
extern "C" {
extern const mango_pcc_algo_plugin_t mango_pcc_algo_plugin = {
    .name = "my_custom_algo",
    .desc = "A simple custom AIMD algorithm tutorial",
    .init_params = my_algo_init_params,
    .destroy_params = my_algo_destroy_params,
    .algo_fn = my_algo_logic,
};
}
3. Building and Deploying
Save your code to a file (e.g., my_algo.cc). If you are running PCC in SoC mode, use the cross-build command provided by mango-ctl to compile and deploy your algorithm directly to the device.
# Cross-build and deploy to the target device (e.g., mb_0)
~$ mango-ctl rdma pcc algo cross-build ./my_algo.cc mb_0
# Load the newly deployed algorithm
~$ mango-ctl rdma pcc algo load mb_0
# Verify the algorithm is loaded
~$ mango-ctl rdma pcc algo list mb_0
aimd dcqcn rttvegas my_custom_algo
# Apply and start your custom algorithm
~$ mango-ctl rdma pcc apply mb_0 my_custom_algo
~$ mango-ctl rdma pcc start mb_0
Connection Check
To verify connectivity to the PCC daemon on the SoC:
# ping checks RPC connectivity
~$ mango-ctl rdma pcc ping mb_0
pong
# echo sends a test message
~$ mango-ctl rdma pcc echo "Hello" mb_0
Hello