I did another throughput test, this time with a mini PCIe Gbit network card (RTL8111 chipset, using the Linux rtl8169 driver). After recompiling the kernel with the required driver option, the card is recognized and works fine. I also added the card's firmware blob manually under /lib/firmware/rtl_nic/rtl8168e-3.fw (I took the firmware from the Debian package firmware-nonfree_0.43.tar.gz).

The setup is the same, but this time I tuned the SMP affinity of interrupts 9 and 32 to spread the workload of the LAN and WAN interfaces across both cores:

echo "3" > /proc/irq/9/smp_affinity
echo "2" > /proc/irq/32/smp_affinity
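For reference, the value written to smp_affinity is a hexadecimal bitmask in which bit N stands for CPU N. The following is only a small sketch to decode such a mask; mask_to_cpus is a made-up helper name, and it assumes a single mask word without commas (which is enough for a two-core board):

```shell
#!/bin/sh
# Decode an smp_affinity hex mask into the list of CPUs it allows.
# mask_to_cpus is a hypothetical helper, not an existing tool.
# Assumption: the mask is one hex word (no "ffffffff,ffffffff" style groups).
mask_to_cpus() {
    mask=$((0x$1))      # interpret the argument as hexadecimal
    cpu=0
    cpus=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            cpus="${cpus:+$cpus }$cpu"   # append this CPU number
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$cpus"
}

mask_to_cpus 3   # mask used for IRQ 9: CPU0 and CPU1 allowed
mask_to_cpus 2   # mask used for IRQ 32: CPU1 only
```

So "3" permits both cores for IRQ 9 while "2" pins IRQ 32 to the second core. A mask that permits several CPUs does not necessarily mean the interrupt is actually balanced between them.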

This worked as intended, as the following output after the throughput test run shows:
root@cb-88f3720-ddr3-expbin:~# cat /proc/interrupts
  9:   88062744          0   GICv3  74 Level   eth0
 32:          1   12487562   GICv3  61 Level   advk-pcie
100:          1   12487562   d0070000.pcie-msi   0 Edge   eth1
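To make the per-core split easier to read, the counters can be reduced to "device CPU0 CPU1". This is just a sketch: irq_per_cpu is a made-up function name, and it assumes a two-core system where fields 2 and 3 are the per-CPU counters and the last field is the device name:

```shell
#!/bin/sh
# Print "name cpu0 cpu1" for every /proc/interrupts-style line on stdin.
# irq_per_cpu is a hypothetical helper. Assumption: exactly two CPU columns.
irq_per_cpu() {
    # Only lines whose first field is a numeric IRQ ("9:", "32:", ...) match;
    # the header and summary lines such as "Err:" are skipped.
    awk '$1 ~ /^[0-9]+:$/ { print $NF, $2, $3 }'
}

# Sample lines copied from the output above. On the board itself the real
# file could be used instead: irq_per_cpu < /proc/interrupts
irq_per_cpu <<'EOF'
9: 88062744 0 GICv3 74 Level eth0
32: 1 12487562 GICv3 61 Level advk-pcie
100: 1 12487562 d0070000.pcie-msi 0 Edge eth1
EOF
```

With the sample lines this shows eth0 being counted on the first core and advk-pcie/eth1 on the second one.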

The output of top during the test also showed that both cores get really busy this time (during my last test, CPU load only went up to 50 %).

These are my results, which are notably better than the ones using lan0 and wan via the switch:

FrameSize (bytes)	FrameRate (fps)	BitRate (bit/s)

64		136064		69670588
128		133782		137000531
256		127721		261595952
512		122464		501616536
1024		116115		951226069
1280		111183		1139557695
1400		110145		1234474020
1514		106913		1294944515
1518		107531		1306439707
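As a sanity check on the units, the BitRate column appears to be roughly FrameSize x 8 x FrameRate, i.e. bits per second. A quick sketch to recompute it (bitrate is a made-up helper name; the small differences from the table are presumably due to how the test tool averages its measurements):

```shell
#!/bin/sh
# Recompute the bit rate from frame size and frame rate.
# bitrate is a hypothetical helper: $1 = frame size in bytes, $2 = frames/s.
bitrate() {
    echo $(( $1 * 8 * $2 ))
}

bitrate 64 136064     # table lists 69670588 for this row
bitrate 1518 107531   # table lists 1306439707 for this row
```

For the 64-byte row this gives 69664768 and for the 1518-byte row 1305856464, both within about 0.05 % of the reported values.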

So the throughput goes up to ~1.3 Gbps at L3 this time. The RTL8111 PCIe Ethernet controller used here is not the fastest one, so there may still be room for further gains. IMHO the results so far show that either the network driver's interrupt thread using only one core (maybe my smp_affinity settings were wrong?) or the single RGMII interface shared by the lan0, lan1 and wan interfaces on the switch is a bottleneck for the maximum throughput possible with the processor.

