Overview

Mango BoostX™ RoCE AI is an FPGA-based RoCEv2 accelerator card engineered for large-scale AI, ML, and HPC environments. It addresses the scalability and flexibility limitations of traditional RNICs by delivering 400GbE aggregate (2x 200GbE) line-rate performance with GPU peer-to-peer communication and interoperability with existing data center infrastructure. Beyond standard RDMA capabilities, Mango BoostX™ RoCE AI provides configurable congestion management that can be tailored to the user's environment, with the goal of maximizing fabric-wide bandwidth utilization for large-scale distributed AI and HPC workloads.

Highlights

High-Performance RDMA Solution

Deliver 2x200GbE line-rate performance
Support peer-to-peer communication with GPUs (e.g., GPUDirect RDMA, ROCmRDMA)

Interoperability and Standard Compatibility

Support major Linux distributions
Interoperate with commercial RNICs and network switches

Scalable AI Networking

Provide advanced features to optimize efficiency in large-scale network environments
Offer an easy-to-use SDK to implement tailored congestion management

Supported Hardware

RoCE AI 2x200GbE (HHHL)

Product brief: Download

Hardware Specification

Network Interface

1x QSFP-DD port
2x 200GbE support
8x lanes of PAM-4/NRZ SerDes
Support active and passive cables

Host Interface

2x PCIe Gen5 x8 (PCIe bifurcated)
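
As a rough sanity check on the host interface, the sketch below compares the raw data rate of a PCIe Gen5 x8 link with one 200GbE port. It uses only the public PCIe 5.0 figures (32 GT/s per lane, 128b/130b encoding). The one-link-per-port mapping is an assumption made for illustration, and the result ignores TLP/DLLP protocol overhead, so usable throughput is lower than the number printed.

```python
# Back-of-the-envelope check: raw PCIe Gen5 x8 data rate vs. one 200GbE port.
# Assumption (not stated in the spec above): each bifurcated x8 link serves one port.
PCIE_GEN5_GT_PER_LANE = 32.0      # GT/s per lane, per direction (PCIe 5.0)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES_PER_LINK = 8
LINKS = 2
PORT_RATE_GBPS = 200.0


def pcie_link_gbps(gt_per_lane: float, lanes: int, efficiency: float) -> float:
    """Raw data rate of one link in Gb/s after line encoding, before TLP overhead."""
    return gt_per_lane * lanes * efficiency


per_link = pcie_link_gbps(PCIE_GEN5_GT_PER_LANE, LANES_PER_LINK, ENCODING_EFFICIENCY)
total = per_link * LINKS
headroom = per_link / PORT_RATE_GBPS - 1.0

print(f"per x8 link: {per_link:.1f} Gb/s, both links: {total:.1f} Gb/s")
print(f"raw headroom over one 200GbE port: {headroom:.1%}")
```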

Form Factor

HHHL, single slot
PCIe add-in card

Processing Unit

2x Arm Cortex-A72
2x Arm Cortex-R5F

Memory

8GB LPDDR4, ECC support
256MB OSPI flash
64GB eMMC flash

Management

PCIe in-band management
MCTP over SMBus
FRU (Field Replaceable Unit)
UART

Environmental

Typical Power Consumption: 55W (at full RDMA performance with a passive cable)
12V, 3.3V, 3.3V_AUX input voltage via the PCIe edge connector (gold fingers)
Operating Temperature: 0°C to 55°C
Operating Relative Humidity: 20% to 80%
Storage Temperature: -20°C to 60°C
Storage Relative Humidity: 10% to 90%

Regulatory

FCC/CE/KC
cTUVus
RoHS

RDMA Features

GPU-RNIC peer-to-peer communication (AMD/NVIDIA)
Reliable Connection (RC) and Unreliable Datagram (UD) QPs
RDMA read, write, write with immediate, send, and receive operations
IPv4 support
Configurable MTU size
Zero-length operations
Event-based CQ handling in user-space

The table below lists the number of available resources. The effective count is the number of resources that a user can actually use after the driver is loaded.

Resource	Per Interface	Total
Max QPs	512	1024
Effective max QPs	508	1016
Max CQs	512	1024
Effective max CQs	508	1016
Max MRs	1024	2048
Effective max MRs	1023	2046
Max PDs	8192	16384
Effective max PDs	8191	16382
Max SRQs	256	512
Max SGEs per WR	N/A	2
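
When sizing a job, it can help to check a planned resource footprint against the effective limits in the table above. The following is a minimal sketch, not part of any Mango Boost tooling: the `Plan` fields and the `check` helper are hypothetical names, and it assumes resources can be spread evenly across the interfaces in use.

```python
from dataclasses import dataclass

# Effective per-interface limits copied from the resource table.
EFFECTIVE_PER_INTERFACE = {"qp": 508, "cq": 508, "mr": 1023, "pd": 8191}
INTERFACES = 2


@dataclass
class Plan:
    """Hypothetical description of what one host intends to allocate."""
    peers: int          # remote ranks this host connects to
    qps_per_peer: int   # QPs opened to each peer
    cqs: int
    mrs: int
    pds: int = 1


def check(plan: Plan, interfaces_used: int = INTERFACES) -> dict:
    """Return {resource: (needed, available)} for every limit the plan exceeds."""
    needed = {
        "qp": plan.peers * plan.qps_per_peer,
        "cq": plan.cqs,
        "mr": plan.mrs,
        "pd": plan.pds,
    }
    over = {}
    for name, need in needed.items():
        available = EFFECTIVE_PER_INTERFACE[name] * interfaces_used
        if need > available:
            over[name] = (need, available)
    return over


plan = Plan(peers=255, qps_per_peer=4, cqs=8, mrs=64)
print(check(plan))
```

In this example, 255 peers with 4 QPs each need 1020 QPs, which exceeds the 1016 effective QPs available across both interfaces, so the plan is flagged. An empty result means the plan fits.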

Advanced RDMA Features for Scalable AI Networking

Configuration-free RoCEv2: Automated congestion management without complex infrastructure tuning
Packet spraying: Maximizes fabric bandwidth by distributing packets across multiple paths
Selective retransmission: Improves network efficiency through faster recovery from packet loss
Programmable congestion control: Flexible, user-defined algorithms tailored to specific network environments

See Advanced Features for AI Networking for usage details.
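
To illustrate why packet spraying helps, the toy simulation below models a switch that picks an equal-cost path by hashing the flow tuple. A single RoCEv2 connection with a fixed UDP source port always hashes to the same path, while rotating the source port per packet spreads the same traffic over several paths. This is a generic sketch of one common spraying approach, not a description of how the card implements it: the CRC-based hash, the addresses, and the source-port range are all assumptions chosen for illustration.

```python
import struct
import zlib
from collections import Counter

ROCEV2_UDP_DST_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2


def ecmp_path(src_ip: int, dst_ip: int, src_port: int, dst_port: int, n_paths: int) -> int:
    """Toy ECMP: hash the flow tuple and pick a path (real switch hashes differ)."""
    key = struct.pack("!IIHH", src_ip, dst_ip, src_port, dst_port)
    return zlib.crc32(key) % n_paths


def send(n_packets: int, n_paths: int, spray: bool) -> Counter:
    """Count packets per path for one connection, with or without spraying."""
    src_ip, dst_ip = 0x0A000001, 0x0A000002   # 10.0.0.1 -> 10.0.0.2 (made-up addresses)
    base_port, port_range = 49152, 64         # hypothetical source-port entropy range
    load = Counter()
    for seq in range(n_packets):
        # Without spraying the source port never changes, so the hash never changes.
        src_port = base_port + (seq % port_range if spray else 0)
        load[ecmp_path(src_ip, dst_ip, src_port, ROCEV2_UDP_DST_PORT, n_paths)] += 1
    return load


single = send(1024, n_paths=4, spray=False)
sprayed = send(1024, n_paths=4, spray=True)
print("fixed source port:", dict(single))
print("sprayed          :", dict(sprayed))
```

Because sprayed packets take different paths, they can arrive out of order, so a spraying scheme relies on the receiver tolerating reordering.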
