
#333
0000
Participant

OK, I ran my first SmartBits test. Here is the setup I used:

Yocto with the latest changes, built following the software howto, with the CONFIG_NETFILTER options additionally enabled in the kernel.
root@cb-88f3720-ddr3-expbin:~# uname -a
Linux cb-88f3720-ddr3-expbin 4.4.8-armada-17.02.2-armada-17.02.2+g8148be9 #3 SMP PREEMPT Thu Apr 6 14:48:55 CEST 2017 aarch64 GNU/Linux

I ran this script before testing:

root@cb-88f3720-ddr3-expbin:~# cat setup.sh
#!/bin/sh -ex
ip link set lan0 up
ifconfig lan0 172.18.1.1 netmask 255.255.0.0 up
ip link set wan up
ifconfig wan 172.19.1.1 netmask 255.255.0.0 up
echo 1 > /proc/sys/net/ipv4/ip_forward

The SmartBits applies a bidirectional UDP load through the lan0 and wan ports and searches for the maximum throughput at which less than 0.5 % of the sent frames are lost:

smb (max 1 Gbps) -> lan0 -> CPU -> wan -> smb
smb <- lan0 <- CPU <- wan <- smb (max 1 Gbps)
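The search for the highest rate that stays under the loss limit can be pictured as a binary search over the offered frame rate. The following is only a sketch of that idea, not how the SmartBits implements it: `run_trial`, `loss_ok` and `find_max_rate` are hypothetical names, and `run_trial` is a toy stand-in (a device that forwards at most 85000 fps) for a real one-second trial on the traffic generator.

```sh
#!/bin/sh
# Hypothetical stand-in for one trial: prints how many frames arrive
# when "$1" frames are offered in one second. This toy device forwards
# at most 85000 fps; a real trial would drive the traffic generator.
run_trial() {
    offered=$1
    if [ "$offered" -gt 85000 ]; then
        echo 85000
    else
        echo "$offered"
    fi
}

# loss_ok SENT RECEIVED: succeeds if less than 0.5 % of SENT was lost.
# Integer form of lost/sent < 5/1000, i.e. lost*1000 < sent*5.
loss_ok() {
    sent=$1
    received=$2
    lost=$((sent - received))
    [ $((lost * 1000)) -lt $((sent * 5)) ]
}

# find_max_rate LOW HIGH: binary search for the highest offered rate
# in [LOW, HIGH] that still passes loss_ok. Assumes LOW itself passes
# and that loss only gets worse as the offered rate rises.
find_max_rate() {
    low=$1
    high=$2
    while [ "$low" -lt "$high" ]; do
        mid=$(( (low + high + 1) / 2 ))
        if loss_ok "$mid" "$(run_trial "$mid")"; then
            low=$mid
        else
            high=$((mid - 1))
        fi
    done
    echo "$low"
}
```

With the toy device, `find_max_rate 1 200000` prints 85427: slightly above the 85000 fps the device can forward, because up to 0.5 % loss is tolerated.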

The iptables netfilter rules are empty (default policy ACCEPT); only the conntrack entries for the UDP connections are used for fast forwarding of the frames:

root@cb-88f3720-ddr3-expbin:~# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

root@cb-88f3720-ddr3-expbin:~# cat /proc/net/ip_conntrack
udp 17 174 src=172.18.13.1 dst=172.19.13.1 sport=5000 dport=5000 src=172.19.13.1 dst=172.18.13.1 sport=5000 dport=5000 [ASSURED] use=2
udp 17 175 src=172.18.11.1 dst=172.19.11.1 sport=5000 dport=5000 src=172.19.11.1 dst=172.18.11.1 sport=5000 dport=5000 [ASSURED] use=2
udp 17 175 src=172.18.10.1 dst=172.19.10.1 sport=5000 dport=5000 src=172.19.10.1 dst=172.18.10.1 sport=5000 dport=5000 [ASSURED] use=2
udp 17 174 src=172.18.12.1 dst=172.19.12.1 sport=5000 dport=5000 src=172.19.12.1 dst=172.18.12.1 sport=5000 dport=5000 [ASSURED] use=2

Here are my first results. A frame rate that stays nearly constant across frame sizes is usually a good indicator of a functioning network offload engine of some kind (DMA etc.).

FrameSize | FrameRate | BitRate
[bytes]   | [fps]     | [bps] L3
64        | 87516     | 44814528
128       | 85916     | 87995547
256       | 85730     | 175595067
512       | 85601     | 350749484
1024      | 82315     | 674329501
1280      | 83947     | 859692307
1400      | 85002     | 952332746
1514      | 76186     | 922809647
Since this is my first attempt, I’m sure there is room to optimize the throughput. I have already checked the /proc/irq/9/smp_affinity setting for the interrupt allocated to eth0 (i.e. RGMII); it is set to “3”, meaning both cores can be used for eth0 activity. However, I only saw 50% CPU load for the ksoftirqd/0 thread during the high network load phases, so it looks like only one core is busy at the moment.
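For reference, the smp_affinity value is a hexadecimal bitmask in which bit N stands for CPU N, so “3” (binary 11) selects CPU 0 and CPU 1. A small helper to decode such a mask could look like this; `mask_to_cpus` is a made-up name, and the sketch only handles a single hex word without the comma-separated groups that larger systems use.

```sh
#!/bin/sh
# mask_to_cpus HEXMASK: print the CPU numbers whose bits are set in
# HEXMASK (given without a 0x prefix), separated by spaces.
mask_to_cpus() {
    mask=$(( 0x$1 ))
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            # Append the CPU number, with a space if out is not empty.
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}
```

On the board, `mask_to_cpus "$(cat /proc/irq/9/smp_affinity)"` should list both CPUs for a mask of 3. The mask only says where the interrupt is allowed to run; the per-CPU counters in /proc/interrupts show where it was actually handled.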

Any hints for improving the setup?
