Forum Replies Created

Viewing 7 posts - 31 through 37 (of 37 total)
    in reply to: Confused about network devices #390

    ifconfig -a

    will list all detected interfaces; you can then bring one up with

    ip link set dev XXX up
    ifconfig XXX netmask up

    eth0 is the RGMII device interfacing to the switch. The switch ports have names like “wan” and “lan0”. These names are defined in the flattened device tree file (*.dtb) loaded by U-Boot.

    in reply to: Performance (Router) #386

    I finally had the chance to repeat the throughput measurements on a db_88f3720_ddr3_modular board. It has the same 88F3720 CPU, but with two PHYs connected to two dedicated SGMII buses (via SerDes), and it runs at 1 GHz. Otherwise the setup is the same as above. Both cores can be fully utilized by using CPU affinity settings for the two SGMII-related interrupts. Here are the results:

    FrameSize         FrameRate       BitRate
    64                 349726          179166674  
    128                340851          349324332  
    256                313973          643659427  
    512                328397          1346240596 
    1024               237464          1946168575 
    1280               192286          1969230769
    1400               176001          1971830985 
    1514               162956          1973924380
    1518               162506          1973992197

    So this time the Gbit Ethernet is saturated for frames >= 1024 bytes, all while staying below 55°C without a heat sink, which was really impressive 😉
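As a rough cross-check of the saturation claim, the measured frame rates can be compared with the theoretical line rate. This is only a sketch under two assumptions that the post does not state explicitly: the frame size includes the FCS, and the FrameRate column is the sum of both directions. The 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap) are standard Ethernet; the function names are made up for illustration.

```python
# Per-frame overhead on the wire that is not part of the frame size:
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 20
LINK_BPS = 1_000_000_000  # one direction of a gigabit Ethernet link


def max_frame_rate(frame_size, directions=2):
    """Theoretical maximum frames per second, summed over all directions."""
    bits_on_wire = (frame_size + WIRE_OVERHEAD_BYTES) * 8
    return directions * LINK_BPS / bits_on_wire


def utilization(frame_size, measured_fps, directions=2):
    """Measured frame rate as a fraction of the theoretical maximum."""
    return measured_fps / max_frame_rate(frame_size, directions)


# Frame sizes and frame rates copied from the table above.
measured = {64: 349726, 512: 328397, 1024: 237464, 1518: 162506}

for size, fps in measured.items():
    print(f"{size:5d} bytes: {utilization(size, fps):6.1%} of line rate")
```

Under these assumptions the 1024-byte and 1518-byte rows come out above 99% of line rate, while the 64-byte row stays far below it, which matches a frame-rate-limited (rather than bandwidth-limited) router for small frames.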

    in reply to: Performance (Router) #359

    > If you consider routing between two 1Gb ethernet networks connected to two 1Gb ethernet ports (for example wan and lan0 in the EspressoBin case), maximum throughput is 2Gbps - just because each port can receive 1Gbps at maximum (2Gbps in sum) and there is no way how to get more data into the router. How did you calculate the value 4Gbps?

    By 4Gbps I meant “bidirectional”, which equals 2 Gbps full duplex. If you look at the setup again, it should become clear:

    Smartbits Sender  (port 1, max 1Gbps tx)  ->     lan0    ---RGMII-->   CPU     ---RGMII-->    wan -> Smartbits Receiver (port 2, max 1 Gbps rx)
    Smartbits Receiver(port 1, max 1Gbps rx)  <-     lan0    <--RGMII---   CPU     <---RGMII--    wan <- Smartbits Sender  (port 2, max 1 Gbps tx)

    Each Gbit port can manage 1Gbps full duplex, i.e. send 1Gbps and simultaneously receive 1Gbps. If the CPU handles each packet for routing, the packet has to pass the RGMII interface twice. This adds up to a total of 4Gbps of bidirectional traffic, or 2Gbps full duplex, passing the RGMII bus.
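The arithmetic behind the 4Gbps figure can be written out as a small sketch. It assumes two routed ports (lan0 and wan), each running at full gigabit rate in both directions, and that every frame is routed by the CPU; the function name is hypothetical.

```python
def rgmii_load_gbps(routed_ports=2, port_rate_gbps=1.0):
    """Return (to_cpu, from_cpu) load on the switch-to-CPU link in Gbps.

    Every frame received on a switch port crosses the link once towards the
    CPU, and crosses it a second time on the way back to the egress port.
    """
    to_cpu = routed_ports * port_rate_gbps    # everything received on any port
    from_cpu = routed_ports * port_rate_gbps  # everything sent out on any port
    return to_cpu, from_cpu


to_cpu, from_cpu = rgmii_load_gbps()
print(f"towards CPU: {to_cpu} Gbps, from CPU: {from_cpu} Gbps, "
      f"total: {to_cpu + from_cpu} Gbps")
```

That is 2 Gbps in each direction, or 4 Gbps in total, on a link that the kernel reports as 1Gbps/Full elsewhere in this thread.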

    The frame rate improvement comes from the dedicated interrupt of the PCIe interface: I can now split the workload between the two cores using the smp_affinity settings. IRQ 32 (PCIe) is handled by core 1, IRQ 9 (RGMII) by core 0. Both cores together can manage more frames.

    In my first test there was only the RGMII interrupt (irq 9) signalling all network traffic. This was exclusively handled by one core, while the other one was idling. This bottlenecked the frame rate.

    in reply to: Performance (Router) #356

    Thanks for the hints. My results above were all measured while running at 800MHz, and without touching the fdt or the cpufreq driver I think that’s all I can get:

    root@cb-88f3720-ddr3-expbin:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies 
    133333 200000 400000 800000 
    root@cb-88f3720-ddr3-expbin:~# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 

    I’m somewhat reluctant to overclock the board, as it is the only one I have currently ;).

    Regarding the affinity settings, I mainly used the interrupt distribution shown in /proc/interrupts to check whether they had any effect. Since I cannot split the interrupt handling between the interfaces or between rx and tx, I can only shift all of the network handling to cpu1, which is what the github commit you mentioned above does. I tried it during my test, but setting smp_affinity to 1, 2 or 3 did not make a real difference.
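For reference, the value in /proc/irq/N/smp_affinity is a hexadecimal bitmask in which bit n stands for CPU n. A minimal sketch for decoding such a value (the helper name is made up for illustration):

```python
def cpus_in_mask(mask_hex):
    """Return the CPU numbers enabled in an smp_affinity hex bitmask."""
    # On machines with many CPUs the kernel prints comma-separated 32-bit groups.
    mask = int(mask_hex.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


for value in ("1", "2", "3"):
    print(value, "->", cpus_in_mask(value))
```

On a two-core board, 1 selects only CPU0, 2 selects only CPU1, and 3 allows both.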

    in reply to: Performance (Router) #354

    I did another throughput test, this time with a mini PCIe Gbit network card (RTL8111 chipset using the Linux rtl8169 driver). After recompiling the kernel with the required driver option, the card is recognized and works OK. I also added the card’s firmware blob manually under /lib/firmware/rtl_nic/rtl8168e-3.fw (taken from the Debian package firmware-nonfree_0.43.tar.gz).

    The setup is the same, but this time I tuned the SMP affinity of interrupts 9 and 32 to spread the workload of the LAN and WAN interfaces across both cores:

    echo 3 > /proc/irq/9/smp_affinity
    echo 2 > /proc/irq/32/smp_affinity

    This worked as intended, as the following output after the throughput test run shows:

    root@cb-88f3720-ddr3-expbin:~# cat /proc/interrupts
               CPU0       CPU1
      9:   88062744          0  GICv3              74 Level  eth0
     32:          1   12487562  GICv3              61 Level  advk-pcie
    100:          1   12487562  d0070000.pcie-msi   0 Edge   eth1

    The output of top during the test also showed that both cores were really busy this time (during my last test the CPU load only went up to 50%).
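The per-core split can also be read off programmatically. The following is a minimal sketch that parses /proc/interrupts-style text and reports the share of each IRQ handled by each CPU; the sample is copied from the output above, and the function name is hypothetical.

```python
SAMPLE = """\
           CPU0       CPU1
  9:   88062744          0  GICv3              74 Level  eth0
 32:          1   12487562  GICv3              61 Level  advk-pcie
100:          1   12487562  d0070000.pcie-msi   0 Edge   eth1
"""


def per_cpu_share(text):
    """Map each IRQ label to {cpu_name: fraction of that IRQ's interrupts}."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    shares = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        irq = fields[0].rstrip(":")
        counts = []
        for field in fields[1:1 + len(cpus)]:
            if not field.isdigit():
                break  # some summary rows carry fewer counters than CPUs
            counts.append(int(field))
        total = sum(counts)
        shares[irq] = {
            cpu: (count / total if total else 0.0)
            for cpu, count in zip(cpus, counts)
        }
    return shares


for irq, share in per_cpu_share(SAMPLE).items():
    print(irq, {cpu: f"{frac:.1%}" for cpu, frac in share.items()})
```

On a live system the same function could be fed the contents of /proc/interrupts instead of the sample string.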

    These are my results, which are notably better than the ones using lan0 and wan via the switch:

    FrameSize	FrameRate	BitRate
    64		136064		69670588
    128		133782		137000531
    256		127721		261595952
    512		122464		501616536
    1024		116115		951226069
    1280		111183		1139557695
    1400		110145		1234474020
    1514		106913		1294944515
    1518		107531		1306439707

    So the throughput goes up to ~1.3 Gbps L3 this time. The RTL8111 PCIe Ethernet controller I used is not the fastest one, so there may still be room for further gains. IMHO the results so far show that either the network driver’s interrupt thread using only one core (maybe my smp_affinity settings were wrong?) or the single RGMII interface shared by the lan0, lan1 and wan interfaces on the switch poses a bottleneck on the maximum throughput possible with the processor.

    in reply to: Performance (Router) #333

    OK, I did my first Smartbits test. Here is the setup I used:

    Yocto with the latest changes, built following the software howto, with the CONFIG_NETFILTER options enabled in the kernel.
    root@cb-88f3720-ddr3-expbin:~# uname -a
    Linux cb-88f3720-ddr3-expbin 4.4.8-armada-17.02.2-armada-17.02.2+g8148be9 #3 SMP PREEMPT Thu Apr 6 14:48:55 CEST 2017 aarch64 GNU/Linux

    I ran this script before testing:

    root@cb-88f3720-ddr3-expbin:~# cat
    #!/bin/sh -ex
    ip link set lan0 up
    ifconfig lan0 netmask up
    ip link set wan up
    ifconfig wan netmask up
    echo 1 > /proc/sys/net/ipv4/ip_forward

    The Smartbits puts UDP load bidirectionally through the lan0 and wan ports and tries to find the maximum throughput at which less than 0.5% of the sent frames get lost:

    smb (max 1Gbps) -> lan0 -> CPU -> wan -> smb
    smb <- lan0 <- CPU <- wan <- smb (max 1 Gbps)

    The iptables netfilter rules are empty (default policy ACCEPT); only the conntrack entries for the UDP connection are used for fast forwarding of the frames:

    root@cb-88f3720-ddr3-expbin:~# iptables -L
    Chain INPUT (policy ACCEPT)
    target prot opt source destination

    Chain FORWARD (policy ACCEPT)
    target prot opt source destination

    Chain OUTPUT (policy ACCEPT)
    target prot opt source destination

    root@cb-88f3720-ddr3-expbin:~# cat /proc/net/ip_conntrack
    udp 17 174 src= dst= sport=5000 dport=5000 src= dst= sport=5000 dport=5000 [ASSURED] use=2
    udp 17 175 src= dst= sport=5000 dport=5000 src= dst= sport=5000 dport=5000 [ASSURED] use=2
    udp 17 175 src= dst= sport=5000 dport=5000 src= dst= sport=5000 dport=5000 [ASSURED] use=2
    udp 17 174 src= dst= sport=5000 dport=5000 src= dst= sport=5000 dport=5000 [ASSURED] use=2

    Here are my first results. A constant frame rate across different frame sizes is usually a good indicator of a functioning network offload engine of some kind (DMA etc.).

    FrameSize | FrameRate | BitRate
    [bytes] | [fps] | [bps] L3
    64 | 87516 | 44814528
    128 | 85916 | 87995547
    256 | 85730 | 175595067
    512 | 85601 | 350749484
    1024 | 82315 | 674329501
    1280 | 83947 | 859692307
    1400 | 85002 | 952332746
    1514 | 76186 | 922809647

    Since this is my first shot, I’m sure there is room to optimize the throughput. I have already checked the /proc/irq/9/smp_affinity setting for the interrupt allocated to eth0 (i.e. RGMII), and it is set to “3”, meaning both cores can be used for eth0 activity. But I only saw 50% CPU load for the ksoftirqd/0 thread during the high network load phases, so it looks like only one core gets busy currently.

    Any hints for improving the setup?

    in reply to: Performance (Router) #330

    I’m about to do some throughput tests as well. I am testing with an EspressoBin and buildroot-2015.11-16.08, and I am going to use a Smartbits device for throughput testing in router mode.
    I see that the switch is apparently connected via RGMII with a 2.5 Gbps link:

    U-Boot 2015.01-armada-17.02.0-gc80c919 (Mar 04 2017 - 15:51:07)

    I2C: ready
    DRAM: 1 GiB
    Board: DB-88F3720-ESPRESSOBin
    CPU @ 800 [MHz]
    L2 @ 800 [MHz]
    TClock @ 200 [MHz]
    DDR @ 800 [MHz]
    Comphy-0: PEX0 2.5 Gbps
    Comphy-1: USB3 5 Gbps
    Comphy-2: SATA0 5 Gbps


    and the link to the switch itself is reported as 1Gbps:

    # ip link set dev eth0 up
    # [ 404.704060] mvneta d0030000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

    I’m not sure about the switch’s capability to go faster than the reported 1Gbps via RGMII, but 2.5Gbps would be the upper limit, right?
    Then I would suspect that we will see no more than 1.25Gbps bidirectional throughput going from the lan1 to the wan interface while using the CPU for routing. That would be ~750Mbps unidirectional per port.

    The upper limit for a Gbit network is ~2Gbps bidirectional throughput, which means the connection between switch and CPU would have to handle ~4Gbps when routing/NATing between lan1 and wan, for example. I’ll also try to test with an additional USB 3.0 network adapter connected to Comphy-1, to relieve the bottleneck on the RGMII bus somewhat.
