Bug 1735638 - qede nic: PVP case throughput is 0 when the packet size is a jumbo frame
Summary: qede nic: PVP case throughput is 0 when the packet size is a jumbo frame
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovsdb2.15
Version: FDP 19.E
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1743813
 
Reported: 2019-08-01 07:54 UTC by liting
Modified: 2023-07-12 08:49 UTC (History)
CC List: 7 users

Fixed In Version: openvswitch2.15-2.15.0-133.el7fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1743813 (view as bug list)
Environment:
Last Closed:
Target Upstream Version:
Embargoed:



Description liting 2019-08-01 07:54:05 UTC
Description of problem:
qede nic: PVP case throughput is 0 when the packet size is a jumbo frame

Version-Release number of selected component (if applicable):
[root@dell-per730-52 vswitchperf]# rpm -qa|grep openv
openvswitch-selinux-extra-policy-1.0-13.el7fdp.noarch
openvswitch2.11-2.11.0-18.el7fdp.x86_64

[root@dell-per730-52 vswitchperf]# rpm -qa|grep dpdk
dpdk-tools-18.11.2-1.el7_6.x86_64
dpdk-18.11.2-1.el7_6.x86_64

[root@dell-per730-52 vswitchperf]# ethtool -i p4p1
driver: qede
version: 8.37.0.20
firmware-version: mfw 8.40.24.0 storm 8.37.7.0
expansion-rom-version: 
bus-info: 0000:82:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: yes
[root@dell-per730-52 vswitchperf]# lspci -s  0000:82:00.0
82:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)

How reproducible:

Steps to Reproduce:
Run the vsperf pvp_tput case with jumbo frames on dell52; it got 0 Mpps. Detailed steps:
The Dell52 qede NIC is connected directly to the Dell53 xxv NIC. Dell53 is used as the TRex sender.

1. Bind the two ports to DPDK
driverctl -v set-override 0000:82:00.0 vfio-pci
driverctl -v set-override 0000:82:00.1 vfio-pci
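As an optional sanity check (not part of the original steps), the override can be confirmed with standard driverctl/dpdk tooling before starting OVS:
driverctl list-overrides
/usr/share/dpdk/usertools/dpdk-devbind.py --status
Both 0000:82:00.0 and 0000:82:00.1 should show vfio-pci as the driver in use.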
2. Build the OVS topology
/usr/bin/ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
/usr/bin/ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,4096
/usr/bin/ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x2
/usr/bin/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
/usr/bin/ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x80000008000000
/usr/bin/ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:82:00.0 options:n_rxq=1 mtu_request=2000
/usr/bin/ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:82:00.1 options:n_rxq=1 mtu_request=2000
/usr/bin/ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface dpdkvhostuserclient0 type=dpdkvhostuserclient -- set Interface dpdkvhostuserclient0 options:vhost-server-path=/var/run/openvswitch/dpdkvhostuserclient0 mtu_request=2000
/usr/bin/ovs-vsctl add-port br0 dpdkvhostuserclient1 -- set Interface dpdkvhostuserclient1 type=dpdkvhostuserclient -- set Interface dpdkvhostuserclient1 options:vhost-server-path=/var/run/openvswitch/dpdkvhostuserclient1 mtu_request=2000
/usr/bin/ovs-ofctl -O OpenFlow13 del-flows br0 
/usr/bin/ovs-ofctl -O OpenFlow13 add-flow br0 in_port=1,idle_timeout=0,action=output:3 
/usr/bin/ovs-ofctl -O OpenFlow13 add-flow br0 in_port=3,idle_timeout=0,action=output:1
/usr/bin/ovs-ofctl -O OpenFlow13 add-flow br0 in_port=4,idle_timeout=0,action=output:2
/usr/bin/ovs-ofctl -O OpenFlow13 add-flow br0 in_port=2,idle_timeout=0,action=output:4
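As an optional sanity check (not part of the original run), the requested MTU and the OpenFlow port numbering assumed by the flows above can be verified with standard OVS commands:
/usr/bin/ovs-vsctl get Interface dpdk0 mtu_request mtu
/usr/bin/ovs-ofctl -O OpenFlow13 show br0
The flows assume ofports 1/2 are dpdk0/dpdk1 and ofports 3/4 are the two vhost-user ports; ovs-ofctl show prints the actual numbering.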

3. Use the following command to start the guest
sudo -E taskset -c 3,5,33 /usr/libexec/qemu-kvm -m 8192 -smp 3,sockets=3,cores=1,threads=1 -cpu host,migratable=off -drive if=ide,file=rhel7.6-vsperf-1Q-noviommu.qcow2 -boot c --enable-kvm -monitor unix:/tmp/vm0monitor,server,nowait -object memory-backend-file,id=mem,size=8192M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -nographic -vnc :0 -name Client0 -snapshot -net none -no-reboot -chardev socket,id=char0,path=/var/run/openvswitch/dpdkvhostuserclient0,server -netdev type=vhost-user,id=net1,chardev=char0,vhostforce,queues=1 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,rx_queue_size=1024,mq=on,vectors=4 -chardev socket,id=char1,path=/var/run/openvswitch/dpdkvhostuserclient1,server -netdev type=vhost-user,id=net2,chardev=char1,vhostforce,queues=1 -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=net2,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,rx_queue_size=1024,mq=on,vectors=4

4. Inside the guest, start testpmd to forward packets
modprobe -r vfio
modprobe -r vfio_iommu_type1
modprobe vfio enable_unsafe_noiommu_mode=Y
modprobe vfio-pci
/usr/share/dpdk/usertools/dpdk-devbind.py -b vfio-pci 00:03.0 00:04.0
/usr/bin/testpmd -l 0,1,2 -n 4 --socket-mem 1024 -- --burst=64 -i --txqflags=0xf00 --rxd=512 --txd=512 --disable-hw-vlan --nb-cores=2 --txq=1 --rxq=1 --max-pkt-len=2000 --forward-mode=io  --auto-start
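As an optional check (not part of the original steps), the maximum packet length actually applied on the guest ports can be inspected from the testpmd prompt:
testpmd> show port info 0
testpmd> show port stats all
--max-pkt-len=2000 should be at least as large as the frame size sent by TRex, otherwise the guest ports will drop the jumbo frames.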

5. Use TRex to send RFC 2544 traffic. The src MAC and dst MAC are the MACs of the two TRex ports.


Actual results:
Got 0 throughput.
According to the flows, dpdk0 did not receive any packets.
[root@dell-per730-52 ~]# ovs-ofctl dump-flows br0
 cookie=0x0, duration=53.014s, table=0, n_packets=0, n_bytes=0, in_port=dpdk0 actions=output:3
 cookie=0x0, duration=52.982s, table=0, n_packets=0, n_bytes=0, in_port=3 actions=output:dpdk0
 cookie=0x0, duration=52.950s, table=0, n_packets=0, n_bytes=0, in_port=4 actions=output:dpdk1
 cookie=0x0, duration=52.917s, table=0, n_packets=0, n_bytes=0, in_port=dpdk1 actions=output:4
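A way to tell whether the jumbo frames are dropped by the qede PMD itself rather than never reaching the port (this was not captured in the original run) is to look at the interface-level counters:
ovs-vsctl get Interface dpdk0 statistics
ovs-ofctl dump-ports br0
If rx error/drop counters increase while n_packets stays at 0, the NIC is receiving the frames but the PMD is discarding them; the exact counter names depend on the PMD.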

It works well with fdp 18.11 openvswitch2.10-2.10.0-10.el7fdp.
fdp 19.A openvswitch2.11-2.11.0-0.20190129gitd3a10db.el7fdp does not work.
fdp 18.12 openvswitch2.10-2.10.0-28.el7fdp.x86_64 also does not work.

Running the same case on fdp 18.11 openvswitch2.10-2.10.0-10.el7fdp, the flows are as follows, and throughput was 1.2 Mpps.
[root@dell-per730-52 vswitchperf]# ovs-ofctl dump-flows br0
 cookie=0x0, duration=45.323s, table=0, n_packets=16181, n_bytes=32297276, in_port=dpdk0 actions=output:3
 cookie=0x0, duration=45.286s, table=0, n_packets=16211, n_bytes=32357156, in_port=3 actions=output:dpdk0
 cookie=0x0, duration=45.248s, table=0, n_packets=16181, n_bytes=32297276, in_port=4 actions=output:dpdk1
 cookie=0x0, duration=45.211s, table=0, n_packets=16211, n_bytes=32357156, in_port=dpdk1 actions=output:4

Expected results:
The jumbo frame case should work well.

Additional info:

Comment 1 liting 2019-08-01 08:02:00 UTC
It also does not work with openvswitch2.11-2.11.0-18.el8fdp on RHEL 8.

Comment 2 Jean-Tsung Hsiao 2019-08-01 23:04:19 UTC
Hi Li Ting,

What kernel were you using --- 7.6, 7.7 RC or 8.0.0 ?

Thanks!

Jean

Comment 3 Jean-Tsung Hsiao 2019-08-03 00:06:47 UTC
I reproduced exactly the same issue. 

Further investigation indicated that, with frame-size=2000, Lua_trafficgen binary search failed right at the start.

Trex-core(t-rex-64) showed Total-Rx=0.00 bps. The "ovs-ofctl dump-ports ovsbr0" stats showed that each dpdk interface only received the first 13 pkts and transmitted only two pkts. Please check logs below. 

-Global stats enabled 
 Cpu Utilization : 78.6  %  15.4 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :      48.32 Gbps  
 Total-Rx        :       0.00  bps  
 Total-PPS       :       3.02 Mpps  
 Total-CPS       :       0.00  cps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :        0  Clients :        0   Socket-util : 0.0000 %    
 Open-flows      :        0  Servers :        0   Socket :        0 Socket/Clients :  -nan 
 Total_queue_full : 46924954         
 drop-rate       :      48.32 Gbps   
 current time    : 29.4 sec  


[root@netqe10 jhsiao]# ovs-ofctl dump-ports ovsbr0
OFPST_PORT reply (xid=0x2): 5 ports
  port vhost0: rx pkts=2, bytes=120, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=13, bytes=780, drop=0, errs=?, coll=?
  port "dpdk-11": rx pkts=13, bytes=832, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=2, bytes=128, drop=0, errs=0, coll=?
  port vhost1: rx pkts=2, bytes=120, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=12, bytes=720, drop=0, errs=?, coll=?
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
  port "dpdk-10": rx pkts=13, bytes=832, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=2, bytes=128, drop=0, errs=0, coll=?
[root@netqe10 jhsiao]# ovs-ofctl dump-ports ovsbr0
OFPST_PORT reply (xid=0x2): 5 ports
  port vhost0: rx pkts=2, bytes=120, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=13, bytes=780, drop=0, errs=?, coll=?
  port "dpdk-11": rx pkts=13, bytes=832, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=2, bytes=128, drop=0, errs=0, coll=?
  port vhost1: rx pkts=2, bytes=120, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=12, bytes=720, drop=0, errs=?, coll=?
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
  port "dpdk-10": rx pkts=13, bytes=832, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=2, bytes=128, drop=0, errs=0, coll=?
[root@netqe10 jhsiao]#

Comment 4 liting 2019-08-05 11:53:15 UTC
(In reply to Jean-Tsung Hsiao from comment #2)
> Hi Li Ting,
> 
> What kernel were you using --- 7.6, 7.7 RC or 8.0.0 ?
> 
> Thanks!
> 
> Jean

Both 7.7 and 8.0.0 have this issue. 7.6 has no issue.

thanks,
Li Ting

Comment 5 Aaron Conole 2019-08-05 14:39:41 UTC
> Both 7.7 and 8.0.0 has this issue. 7.6 has no issue.

Are you sure you're not hitting the issue in https://bugzilla.redhat.com/show_bug.cgi?id=1736517

Try the workaround posted at:
https://bugzilla.redhat.com/show_bug.cgi?id=1711739#c14

The short version is:

ovs-vsctl --no-wait set Open_vSwitch .  other_config:dpdk-extra='--iova-mode=va'
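Note that dpdk-extra is only read when DPDK is initialized, so ovs-vswitchd has to be restarted after setting it; the setting itself can be checked with standard OVS commands (not part of the original workaround note):
ovs-vsctl get Open_vSwitch . other_config:dpdk-extra
systemctl restart openvswitch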

Comment 6 liting 2019-08-06 09:17:04 UTC
(In reply to Aaron Conole from comment #5)
> > Both 7.7 and 8.0.0 has this issue. 7.6 has no issue.
> 
> Are you sure you're not hitting the issue in
> https://bugzilla.redhat.com/show_bug.cgi?id=1736517
> 
> Try the workaround posted at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1711739#c14
> 
> The short version is:
> 
> ovs-vsctl --no-wait set Open_vSwitch . 
> other_config:dpdk-extra='--iova-mode=va'

I didn't hit bug 1736517 on the qede NIC; the detailed versions are as follows.
RHEL 7.7 (kernel 3.10.0-1061.el7.x86_64)
OVS: openvswitch2.11-2.11.0-18.el7fdp.x86_64
DPDK: dpdk-18.11.2-1.el7_6.x86_64
driverctl: driverctl-0.95-1.el7fdparch.noarch

This issue (PVP case throughput is 0 when the packet size is a jumbo frame) only exists on the qede NIC. Other NIC drivers work well with the same case and OVS version.
And I think it should be an OVS issue, because it works well with fdp 18.11 openvswitch2.10-2.10.0-10.el7fdp, but the following OVS versions do not work:
fdp 19.E openvswitch2.11-2.11.0-18.el7fdp.x86_64: does not work.
fdp 19.A openvswitch2.11-2.11.0-0.20190129gitd3a10db.el7fdp: does not work.
fdp 18.12 openvswitch2.10-2.10.0-28.el7fdp.x86_64: does not work.


thanks,
Li Ting

Comment 7 Jean-Tsung Hsiao 2019-08-15 00:03:39 UTC
Hi Li Ting,

I got 2000-byte jumbo frames working with TRex running under 7.6 GA. You may want to try that as well.

RESULT:
[
{
    "rx_bandwidth": 141801044740,
    "rx_packets": 70198537,
    "rx_pps": 1166618.7102153124,
    "tx_bandwidth": 141801044740,
    "tx_packets": 70198537,
    "tx_pps": 1166618.7102153124,
    "tx_pps_target": 1269975.5859374998
}
,
{
    "rx_bandwidth": 141801044740,
    "rx_packets": 70198537,
    "rx_pps": 1166618.7102153124,
    "tx_bandwidth": 141801044740,
    "tx_packets": 70198537,
    "tx_pps": 1166618.7102153124,
    "tx_pps_target": 1269975.5859374998
}
]
[root@netqe29 lua-trafficgen]# 

[root@netqe29 lua-trafficgen]# cat search.mac.2000.sh
./binary-search.py \
--traffic-generator=trex-txrx --frame-size=2000 --run-bidirec=1 \
--src-macs=3c:fd:fe:bb:1c:90,3c:fd:fe:bb:1c:91 \
--dst-macs=3c:fd:fe:bb:1c:91,3c:fd:fe:bb:1c:90 \
--search-granularity=5 \
--search-runtime=60 --validation-runtime=60 --rate=3 \
--max-loss-pct=0 --use-device-stats --use-dst-mac-flows=0 \
--measure-latency=1 --latency-rate=100000 \
--rate-tolerance=20
[root@netqe29 lua-trafficgen]# uname -r
3.10.0-957.el7.x86_64
[root@netqe29 lua-trafficgen]#

Comment 8 Rasesh Mody 2019-08-21 00:41:13 UTC
Hi Jean, Li,

Can you please provide results when testing jumbo frames with the following?
FDP 19.B on RHEL8.0
FDP 19.B on RHEL7.6

Based on the discussion in the thread, it looks like the issue is observed when jumbo frames are tested with the latest FDP release, 19.E.
Once a baseline is established, move the FDP release upwards toward FDP 19.E to identify where the issue is first seen.

There seems to be some delta introduced between FDP 19.B (or the baseline version; in my opinion the baseline should be 19.B) and FDP 19.E which may be causing the jumbo test case to fail.
The delta could be in the test suite, the QEDE PMD, or somewhere else.

After understanding what is and is not working, the following data will be needed to narrow down whether some PMD change is causing the issue:
What is the delta between the openvswitch rpms in 19.B (or the baseline version) vs 19.E w.r.t. the QEDE PMD?
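One hypothetical way to extract that delta, assuming the FDP openvswitch source RPMs bundle the DPDK tree (package names, tarball names, and paths below are illustrative, not taken from the actual RPMs):
yumdownloader --source openvswitch2.10 openvswitch2.11
mkdir ovs-2.10 ovs-2.11
(cd ovs-2.10 && rpm2cpio ../openvswitch2.10-*.src.rpm | cpio -idm && tar xf dpdk-*.tar.xz)
(cd ovs-2.11 && rpm2cpio ../openvswitch2.11-*.src.rpm | cpio -idm && tar xf dpdk-*.tar.xz)
diff -ru ovs-2.10/dpdk-*/drivers/net/qede ovs-2.11/dpdk-*/drivers/net/qede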

Thanks!
-Rasesh

Comment 9 Jean-Tsung Hsiao 2019-08-21 15:53:15 UTC
(In reply to Rasesh Mody from comment #8)
> Hi Jean, Li,
> 
> Can you please provide results when testing jumbo frames with following?
> FDP 19.B on RHEL8.0
> FDP 19.B on RHEL7.6
> 
> Based on discussions in the thread, looks like issue is observed when jumbo
> frame is tested with latest FDP release, 19.E.
Hi Rasesh,

Just reproduced the same issue when I switched the TRex system kernel from 7.6 GA to 7.7 GA.

Under 7.7 GA, the dpdk installed on the TRex system is:
[root@netqe29 lua-trafficgen]# rpm -q dpdk
dpdk-18.11.2-1.el7.x86_64
[root@netqe29 lua-trafficgen]#

Let me find out what dpdk was used when Trex system was under 7.6 GA.

Thanks!

Jean


> Once a baseline is established, change FDP release upwards toward FDP 19.E
> to identify where issue is first seen.
> 
> There seems to be some delta introduced between FDP 19.B(or baseline
> version, in my opinion the baseline version should be 19.B) and FDP 19.E
> which may be causing Jumbo test case to fail.
> The delta could be in the test suite or QEDE pmd or somewhere else.
> 
> After understanding what is working and not working, following data will be
> needed to narrow down if some pmd change is causing the issue.
> What is the delta between openvswitch rpms in 19.B(or baseline version) vs
> 19.E w.r.t. QEDE pmd?
> 
> Thanks!
> -Rasesh

Comment 10 Rasesh Mody 2019-08-27 00:26:05 UTC
(In reply to Jean-Tsung Hsiao from comment #9)
> (In reply to Rasesh Mody from comment #8)
> > Hi Jean, Li,
> > 
> > Can you please provide results when testing jumbo frames with following?
> > FDP 19.B on RHEL8.0
> > FDP 19.B on RHEL7.6
> > 
> > Based on discussions in the thread, looks like issue is observed when jumbo
> > frame is tested with latest FDP release, 19.E.
> Hi Rasesh,
> 
> Just reproduced the same issue as I switched the Trex system kernel from 7.6
> GA to 7.7 GA.
> 
> Under 7.7 GA below is the dpdk installed on Trex system:
> [root@netqe29 lua-trafficgen]# rpm -q dpdk
> dpdk-18.11.2-1.el7.x86_64
> [root@netqe29 lua-trafficgen]#
> 
> Let me find out what dpdk was used when Trex system was under 7.6 GA.

When the issue is observed, the variable seems to be on the TRex/source side.

Which adapter is being used on the TRex system? Are the source and destination both QLogic adapters?

What are the versions of the dpdk rpms used on the TRex system under 7.6 GA vs 7.7 GA?

Thanks!
-Rasesh
 
> Thanks!
> 
> Jean
> 
> 
> > Once a baseline is established, change FDP release upwards toward FDP 19.E
> > to identify where issue is first seen.
> > 
> > There seems to be some delta introduced between FDP 19.B(or baseline
> > version, in my opinion the baseline version should be 19.B) and FDP 19.E
> > which may be causing Jumbo test case to fail.
> > The delta could be in the test suite or QEDE pmd or somewhere else.
> > 
> > After understanding what is working and not working, following data will be
> > needed to narrow down if some pmd change is causing the issue.
> > What is the delta between openvswitch rpms in 19.B(or baseline version) vs
> > 19.E w.r.t. QEDE pmd?
> > 
> > Thanks!
> > -Rasesh

