[Edgecore SONiC] SONiC RoCEv1 Tuning Tutorial
Reference model:
- Switch model name: AS7726-32X
Restriction:
Due to a chipset limitation, PFC can only be applied to 1 or 2 queues.
Topology:
Tuning Procedure:
Step 1. Measure the bandwidth and latency of each traffic flow with the Perftest tool. In this topology example, measure the bandwidth of CE0 → CE1 and CE2 → CE1. This measurement gives you a baseline of the current real performance to compare against after tuning. Also note the PFC queue number your traffic uses; it is needed for the port QoS settings in the next steps.
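As a reference, a minimal client-side sketch using the perftest tools is shown below. The RDMA device name, GID index, and server IP are assumptions to adapt to your NICs, and the matching ib_write_bw / ib_write_lat server process must already be running on the target host (same command without the IP argument); the exact flags depend on your perftest version.

import subprocess

SERVER_IP = "192.168.1.11"   # hypothetical address of the CE1 host
DEVICE = "mlx5_0"            # local RDMA device (list devices with ibv_devices)
GID_INDEX = "0"              # GID index that corresponds to RoCEv1 on your NIC

def run(cmd):
    # Print the command, run it, and show the perftest report.
    print("$ " + " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    print(result.stdout)

# Bandwidth test from the client side (e.g. on CE0 or CE2 toward CE1).
run(["ib_write_bw", "-d", DEVICE, "-x", GID_INDEX, "-F", "--report_gbits", SERVER_IP])

# Latency test from the client side.
run(["ib_write_lat", "-d", DEVICE, "-x", GID_INDEX, "-F", SERVER_IP])

On the switch, counters such as show queue counters and show pfc counters (available in recent SONiC releases) help confirm which queue the RoCE traffic actually lands in.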
Step 2. Configure the TC-to-priority-group map. Set up the TC-to-Priority group map by adding the group map configuration below to config_db.json. By default, the packet priority (PCP) is used to classify traffic into queues.
"TC_TO_PRIORITY_GROUP_MAP": {
"AZURE": { → group map profile name
"0": "0",
"1": "0",
"2": "0",
"3": "3",
"4": "4",
"5": "0",
"6": "0",
"7": "7"
}
},
Step 3. Enable PFC on the queue. In this example, we enable PFC on queue 3 of Ethernet0, Ethernet4, and Ethernet20.
"PORT_QOS_MAP": {
"Ethernet0": {
"pfc_enable": "3", → Enable PFC 3
"tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]" → Bind TC-to-Priority group map profile to Ethernet0
},
"Ethernet4": {
"pfc_enable": "3", → Enable PFC 3
"tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]" → Bind TC-to-Priority group map profile to Ethernet4
},
"Ethernet20": {
"pfc_enable": "3", → Enable PFC 3
"tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]" → Bind TC-to-Priority group map profile to Ethernet20
}
}
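A quick way to verify what CONFIG_DB actually contains is a short check run on the switch itself. The sketch below assumes the swsssdk Python package that ships with most SONiC images (newer releases expose the same ConfigDBConnector class from swsscommon); it reads CONFIG_DB and prints the PFC and map settings of the tuned ports.

from swsssdk import ConfigDBConnector

db = ConfigDBConnector()
db.connect()

for port in ("Ethernet0", "Ethernet4", "Ethernet20"):
    entry = db.get_entry("PORT_QOS_MAP", port)   # returns {} if the port has no QoS entry
    print(port,
          "pfc_enable =", entry.get("pfc_enable"),
          "tc_to_pg_map =", entry.get("tc_to_pg_map"))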
Step 4. The buffer pool for each model is listed in Appendix 1 below. If you cannot find your model, request its buffer pool settings by submitting a ticket through the support system at https://support.edge-core.com/hc/en-us/requests/new. Add the buffer pool setting to the config_db.json file. In this example, the AS7726-32X buffer pool setting is shown below.
"BUFFER_POOL": { → This is global buffer pool setting which is used for all ports and queues
"ingress_lossless_pool": {
"mode": "static",
"size": "10875072",
"type": "ingress",
"xoff": "4194112"
}
},
Step 5. Configure and tune the buffer profile. The configuration below is based on our best tuning result for the AS7726-32X. Details on how these values are calculated follow below.
"BUFFER_PROFILE": {
"pg_300m_profile": { → This buffer setting profile will be bound to certain or/and all ports and queues (Step 4)
"pool": "[BUFFER_POOL|ingress_lossless_pool]",
"size": "69632",
"static_th": "0",
"xoff": "13056",
"xon": "512"
}
},
The picture above shows the buffer design of one ingress queue on a port. The Xon/Xoff thresholds use headroom. When profile.size is full, the queue starts using headroom space for packet buffering. Once the profile.xoff buffer is full (the buffer level touches the upper limit of profile.xoff, the green block), the switch sends an 802.1Qbb pause frame to the link partner to request that it stop transmitting. When the buffer drops below profile.xon, the switch signals the partner to resume transmission. The AS7726-32X has 32 ports; with 1:1 traffic, the most congested case is 16 ports transmitting to a single port.
To achieve the best bandwidth together with low latency and lossless operation, several buffer parameters have to be calculated:
- MTU_cell = ⌈MTU / cell_size⌉ * cell_size → The RoCE MTU rounded up to a whole number of buffer cells. cell_size is 256 bytes for the AS7726-32X; check Appendix 2 below for the cell size of your switch model.
- profile.size = 16 * MTU_cell → Tune profile.size to the maximum size. In this example, 16 is the maximum number of ports that can transmit to a single port on the AS7726-32X, i.e. the most congested case.
- profile.xoff = 3 * MTU_cell → Tune profile.xoff for best performance. In this example, 3 is a good starting constant.
- profile.xon = 2 * cell_size → The Xon threshold is two cells, matching the profile above.
Check the MTU size of your NIC. In this example, the maximum RoCE packet size is 4174 bytes. Using the formulas above, the parameters for this example are:
MTU_cell = ⌈4174 / 256⌉ * 256 = 4352
profile.size = 16 * 4352 = 69632
profile.xoff = 3 * 4352 = 13056
profile.xon = 2 * 256 = 512
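The same calculation as a small helper sketch; only the NIC MTU and the cell size need to change for another switch model, and the multipliers are the starting constants from the formulas above.

import math

def buffer_profile(mtu_bytes, cell_size=256, max_congested_ports=16, xoff_cells=3):
    # Round the RoCE MTU up to a whole number of buffer cells.
    mtu_cell = math.ceil(mtu_bytes / cell_size) * cell_size
    return {
        "size": str(max_congested_ports * mtu_cell),   # profile.size
        "xoff": str(xoff_cells * mtu_cell),            # profile.xoff
        "xon": str(2 * cell_size),                     # profile.xon (two cells, as in the example)
    }

print(buffer_profile(4174))
# -> {'size': '69632', 'xoff': '13056', 'xon': '512'}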
Tuning tips: if there is packet drop, adjust profile.size; if the bandwidth result is low, increase profile.xoff.
Step 6. Bind the buffer profile (Step 5) to the port and queue number, then check the result. Compare the bandwidth and latency after tuning against the baseline from Step 1. Tune again following the tips in Step 5 until you get the best result, i.e. the bandwidth improves without increasing latency.
"BUFFER_PG": {
"Ethernet20|3": { → Port|PFC Queue number
"profile": "[BUFFER_PROFILE|pg_300m_profile]" → Bind pg_300m_profile buffer profile to Ethernet20 Queue3
},
"Ethernet4|3": {
"profile": "[BUFFER_PROFILE|pg_300m_profile]"
},
"Ethernet0|3": {
"profile": "[BUFFER_PROFILE|pg_300m_profile]"
}
},
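To apply the settings, the fragments from Steps 2 to 6 have to be merged into /etc/sonic/config_db.json on the switch and reloaded. The sketch below is one way to do this for the example topology (assuming root access on the switch): it backs up the file, merges only the tables shown above, and leaves the rest of the configuration untouched. Reloading with sudo config reload -y briefly disrupts traffic.

import json
import shutil

CONFIG = "/etc/sonic/config_db.json"
shutil.copy(CONFIG, CONFIG + ".bak")    # keep a backup before editing

PORTS = ("Ethernet0", "Ethernet4", "Ethernet20")
fragments = {
    "TC_TO_PRIORITY_GROUP_MAP": {
        "AZURE": {"0": "0", "1": "0", "2": "0", "3": "3",
                  "4": "4", "5": "0", "6": "0", "7": "7"}
    },
    "PORT_QOS_MAP": {
        port: {"pfc_enable": "3",
               "tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]"}
        for port in PORTS
    },
    "BUFFER_POOL": {
        "ingress_lossless_pool": {"mode": "static", "size": "10875072",
                                  "type": "ingress", "xoff": "4194112"}
    },
    "BUFFER_PROFILE": {
        "pg_300m_profile": {"pool": "[BUFFER_POOL|ingress_lossless_pool]",
                            "size": "69632", "static_th": "0",
                            "xoff": "13056", "xon": "512"}
    },
    "BUFFER_PG": {
        port + "|3": {"profile": "[BUFFER_PROFILE|pg_300m_profile]"}
        for port in PORTS
    },
}

with open(CONFIG) as f:
    cfg = json.load(f)
for table, entries in fragments.items():
    cfg.setdefault(table, {}).update(entries)    # merge without dropping existing entries
with open(CONFIG, "w") as f:
    json.dump(cfg, f, indent=4, sort_keys=True)

print("config_db.json updated; apply with: sudo config reload -y")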
Reference:
Test Result Comparison Table:
Test Item | Single RoCE QP: CE2 → CE1 | Single RoCE QP: CE0 → CE1 | Two Tuned RoCE QPs: CE2 & CE0 → CE1
Bandwidth (Mbps) | 76.51 | 83.32 | 79.13
Latency (µs) | 2.046 | 2.126 | 2.048
Note:
- QP: Queue Pair
- With tuning, two concurrent RoCE QPs remain lossless and achieve bandwidth and latency close to the average of the two single-QP results.
- Bandwidth and latency results are the average of 5 test runs.
Appendix 1:
Buffer Pool
Switch Model | Buffer Pool
AS4630-54PE | "BUFFER_POOL": { … }
AS5835-54X | "BUFFER_POOL": { … }
AS7326-56X | "BUFFER_POOL": { … }
AS7726-32X | "BUFFER_POOL": { … } (the setting shown in Step 4)
AS7816-64X | "BUFFER_POOL": { … }
AS9716-32D | "BUFFER_POOL": { … }
AS8000 | "BUFFER_POOL": { … }
Appendix 2:
Cell Size Table
Switch Model | Chipset Model | Chipset Version | Packet Buffer Cell Size
AS4630-54PE | Helix 5 (Trident 3 X3) | BCM56371 | 256 bytes
AS5835-54X | Trident 3 | BCM56771 | 256 bytes
AS7326-56X | Trident 3 | BCM56873 | 256 bytes
AS7726-32X | Trident 3 | BCM56870 | 256 bytes
AS7816-64X | Tomahawk 2 | BCM56970 | 208 bytes
AS9716-32D | Tomahawk 3 | BCM56980 | 254 bytes
AS8000 | Tomahawk 3 | BCM56980 | 254 bytes