Overview
Mango BoostX™ RoCE AI is an FPGA-based RoCEv2 accelerator card for large-scale AI, ML, and HPC environments. It addresses the scalability and flexibility limits of traditional RNICs by delivering 400GbE aggregate (2x200GbE) line-rate performance, GPU peer-to-peer communication, and interoperability with existing data center infrastructure. Beyond standard RDMA capabilities, Mango BoostX™ RoCE AI adds configurable congestion management that can be tuned to the user's environment, maximizing fabric-wide bandwidth utilization for large-scale distributed AI and HPC workloads.
Highlights
High-Performance RDMA Solution
- Deliver 2x200GbE line-rate performance
- Support peer-to-peer communication with GPUs (e.g., GPUDirect RDMA, ROCmRDMA)
Interoperability and Standard Compatibility
- Support major Linux distributions
- Interoperate with commercial RNICs and network switches
Scalable AI Networking
- Provide advanced features (packet spraying, selective retransmission, programmable congestion control) to improve efficiency in large-scale network environments
- Offer an easy-to-use SDK to implement tailored congestion management
Supported Hardware
RoCE AI 2x200GbE (HHHL)
Hardware Specification
Network Interface
- 1x QSFP-DD port
- 2x 200GbE support
- 8x lanes of PAM4/NRZ SerDes
- Support active and passive cables
Host Interface
- 2x PCIe Gen5 x8 (PCIe bifurcated)
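As a rough sanity check on the host interface, the raw bandwidth of each PCIe Gen5 x8 link can be compared with one 200GbE port. The sketch below uses the standard PCIe Gen5 figures (32 GT/s per lane, 128b/130b encoding). It assumes, for illustration only, that each bifurcated x8 link serves one 200GbE port, and it does not model TLP/DLLP protocol overhead, so achievable throughput will be lower than the raw number.

```python
# Back-of-the-envelope check: raw PCIe Gen5 x8 bandwidth vs. one 200GbE port.
# Assumption (not stated in the spec): one x8 link per 200GbE port.
# TLP/DLLP protocol overhead is NOT modeled here.

PCIE_GEN5_GT_PER_LANE = 32.0     # giga-transfers per second per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
LANES_PER_LINK = 8               # x8
LINKS = 2                        # 2x PCIe Gen5 x8 (bifurcated)
PORT_GBPS = 200.0                # one 200GbE port


def pcie_link_gbps(gt_per_lane: float, lanes: int) -> float:
    """Raw link bandwidth in Gb/s after line encoding."""
    return gt_per_lane * ENCODING_EFFICIENCY * lanes


per_link = pcie_link_gbps(PCIE_GEN5_GT_PER_LANE, LANES_PER_LINK)
total = per_link * LINKS

print(f"per link: {per_link:.1f} Gb/s, total: {total:.1f} Gb/s")
print(f"raw headroom per port: {per_link - PORT_GBPS:.1f} Gb/s")
```

Under these assumptions each link offers about 252 Gb/s raw, which leaves some margin above a single 200GbE port before protocol overhead is taken into account.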
Form Factor
- HHHL, single slot
- PCIe add-in card
Processing Unit
- 2x Arm Cortex-A72
- 2x Arm Cortex-R5F
Memory
- 8GB LPDDR4, ECC support
- 256MB OSPI flash
- 64GB eMMC flash
Management
- PCIe in-band management
- MCTP over SMBus
- FRU (Field Replaceable Unit)
- UART
Environmental
- Typical Power Consumption: 55W (with full RDMA performance & passive cable)
- Input Voltage: 12V, 3.3V, 3.3V_AUX via PCIe gold fingers (edge connector)
- Operating Temperature: 0°C to 55°C
- Operating Relative Humidity: 20% to 80%
- Storage Temperature: -20°C to 60°C
- Storage Relative Humidity: 10% to 90%
Regulatory
- FCC/CE/KC
- cTUVus
- RoHS
RDMA Features
- GPU-RNIC peer-to-peer communication (AMD/NVIDIA)
- Reliable Connection (RC) and Unreliable Datagram (UD) QPs
- RDMA read, write, write with immediate, send, and receive operations
- IPv4 support
- Configurable MTU size
- Zero-length operations
- Event-based CQ handling in user-space
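To illustrate why the configurable MTU matters, the sketch below estimates how many packets a message is split into and how much of the wire traffic is payload. It is a simplified model, not a description of the card's behavior: it assumes the standard InfiniBand path MTU values (256 to 4096 bytes), RoCEv2 over IPv4 on untagged Ethernet, and ignores extended transport headers (such as RETH or immediate data), the Ethernet preamble, and the inter-frame gap. The function names are hypothetical.

```python
import math

# Per-packet overhead in bytes for RoCEv2 over IPv4, untagged Ethernet.
# Extended headers (RETH, immediate data), preamble, and inter-frame gap
# are deliberately ignored in this simplified model.
OVERHEAD = {"eth": 14, "ipv4": 20, "udp": 8, "bth": 12, "icrc": 4, "fcs": 4}
OVERHEAD_BYTES = sum(OVERHEAD.values())  # 62 bytes

# Standard InfiniBand path MTU values (assumed; check the card's supported set).
IB_MTUS = (256, 512, 1024, 2048, 4096)


def packet_count(msg_len: int, mtu: int) -> int:
    """Packets needed for one message; a zero-length operation still uses one."""
    if mtu not in IB_MTUS:
        raise ValueError(f"unsupported MTU: {mtu}")
    if msg_len < 0:
        raise ValueError("message length must be non-negative")
    return max(1, math.ceil(msg_len / mtu))


def wire_efficiency(msg_len: int, mtu: int) -> float:
    """Fraction of transmitted bytes that are payload."""
    pkts = packet_count(msg_len, mtu)
    return msg_len / (msg_len + pkts * OVERHEAD_BYTES)


if __name__ == "__main__":
    one_mib = 1 << 20
    for mtu in IB_MTUS:
        print(mtu, packet_count(one_mib, mtu), f"{wire_efficiency(one_mib, mtu):.3f}")
```

Larger MTUs reduce the packet count and the share of header bytes, at the cost of requiring every hop on the path to carry the larger frames.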
The table below lists the number of available resources, per interface and in total. An "effective" count is the number of resources that a user can actually allocate after the driver is loaded.
| Resource | Per Interface | Total |
|---|---|---|
| Max QPs | 512 | 1024 |
| Effective max QPs | 508 | 1016 |
| Max CQs | 512 | 1024 |
| Effective max CQs | 508 | 1016 |
| Max MRs | 1024 | 2048 |
| Effective max MRs | 1023 | 2046 |
| Max PDs | 1024 | 2048 |
| Effective max PDs | 1023 | 2046 |
| Max SRQs | 256 | 512 |
| Max SGEs per WR | N/A | 2 |
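The effective limits above bound how many connections an application can open per interface. The sketch below checks a hypothetical full-mesh job against the effective QP limit. The connection model is an assumption for illustration: every local process opens one RC QP to every other process in the job, and all local processes share a single interface. The helper names are not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    qps: int
    cqs: int
    mrs: int
    pds: int


# Effective per-interface counts taken from the resource table.
EFFECTIVE_PER_INTERFACE = Limits(qps=508, cqs=508, mrs=1023, pds=1023)


def full_mesh_qps(world_size: int, local_procs: int, qps_per_peer: int = 1) -> int:
    """QPs needed on one interface when each local process connects to
    every other process in the job (hypothetical connection model)."""
    return local_procs * (world_size - 1) * qps_per_peer


def fits(needed: int, limit: int) -> bool:
    return needed <= limit


if __name__ == "__main__":
    for world_size in (64, 65):
        needed = full_mesh_qps(world_size, local_procs=8)
        ok = fits(needed, EFFECTIVE_PER_INTERFACE.qps)
        print(f"world_size={world_size}: {needed} QPs, fits={ok}")
```

With 8 local processes, a 64-process job needs 504 QPs and fits within the 508 effective QPs, while a 65-process job needs 512 and does not. Jobs beyond that point would need to spread connections across both interfaces or share QPs between peers.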
Advanced RDMA Features for Scalable AI Networking
- Configuration-free RoCEv2: Automated congestion management without complex infrastructure tuning
- Packet spraying: Maximizes fabric bandwidth by distributing packets across multiple paths
- Selective retransmission: Improves network efficiency through faster recovery from packet loss
- Programmable congestion control: Flexible, user-defined algorithms tailored to specific network environments
See Advanced Features for AI Networking for usage details.
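The intuition behind packet spraying and selective retransmission can be sketched with a toy model. This is not the card's implementation: the per-flow "hash" is a stand-in (`flow_id % n_paths`), spraying is plain round-robin, and the retransmission counts assume a single loss event per window with all resends succeeding. All names are hypothetical.

```python
from collections import Counter


def per_flow_load(flows: dict[int, int], n_paths: int) -> Counter:
    """All packets of a flow follow one path (stand-in hash: flow_id % n_paths)."""
    load = Counter({p: 0 for p in range(n_paths)})
    for flow_id, n_packets in flows.items():
        load[flow_id % n_paths] += n_packets
    return load


def sprayed_load(flows: dict[int, int], n_paths: int) -> Counter:
    """Round-robin every packet across all paths, regardless of flow."""
    load = Counter({p: 0 for p in range(n_paths)})
    for i in range(sum(flows.values())):
        load[i % n_paths] += 1
    return load


def go_back_n_resends(window: int, lost: set[int]) -> int:
    """Resend the first lost packet and every packet after it in the window."""
    return window - min(lost) if lost else 0


def selective_resends(window: int, lost: set[int]) -> int:
    """Resend only the packets that were actually lost."""
    return len(lost)


# Four equal flows (flow_id -> packet count); three of them collide on path 0.
flows = {0: 1000, 4: 1000, 8: 1000, 1: 1000}
hashed = per_flow_load(flows, n_paths=4)
sprayed = sprayed_load(flows, n_paths=4)
print("busiest path, per-flow:", max(hashed.values()))   # 3000
print("busiest path, sprayed: ", max(sprayed.values()))  # 1000

# A 64-packet window in which packets 10 and 40 are lost.
print("go-back-N resends:", go_back_n_resends(64, {10, 40}))  # 54
print("selective resends:", selective_resends(64, {10, 40}))  # 2
```

In this model, spraying removes the hot spot caused by flow collisions, and selective retransmission resends 2 packets where go-back-N resends 54. Spraying also means packets of one message can arrive out of order, so the receiver has to tolerate reordering.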