netmap: pkt-gen multi-thread crash (RHEL7.3, ixgbe)

I’m trying netmap on a pair of old machines (Xeon E5450). Because the receiver only gets about 5 Gbps while the sender is sending 9.8 Gbps, I wanted to try multi-threading by specifying -p and -c. However, with -p and -c specified, pkt-gen does not receive any packets. And when I stop it and run pkt-gen again, it hangs at the “Ready” stage and netmap locks up a CPU core. Is this a problem with netmap, pkt-gen, my usage, or my local environment settings? Note: there is no problem on the sending side.

Single-thread works but throughput is poor.

[root@m2 pkt-gen]# ./pkt-gen -i netmap:enp7s0f0 -s 10.0.0.1:1234 -d 10.0.0.2:1234 -f rx
208.951077 main [2647] interface is netmap:enp7s0f0
208.951121 main [2772] using default burst size: 512
208.951147 main [2780] running on 1 cpus (have 8)
208.951425 extract_ip_range [463] range is 10.0.0.1:1234 to 10.0.0.1:1234
208.951431 extract_ip_range [463] range is 10.0.0.2:1234 to 10.0.0.2:1234
209.311308 main [2880] mapped 334980KB at 0x7f3fc394c000
Receiving from netmap:enp7s0f0: 8 queues, 1 threads and 1 cpus.
209.311391 start_threads [2326] Wait 2 secs for phy reset
211.311428 start_threads [2328] Ready...
211.311531 receiver_body [1712] reading from netmap:enp7s0f0 fd 3 main_fd 3
...
224.324258 main_thread [2416] 608.851 Kpps (609.448 Kpkts 6.826 Gbps in 1000980 usec) 2.83 avg_batch 492 min_space

No packets received with 2 threads / 2 cores

[root@m2 pkt-gen]# ./pkt-gen -i netmap:enp7s0f0 -s 10.0.0.1:1234 -d 10.0.0.2:1234 -f rx -p 2 -c 2
330.974942 main [2647] interface is netmap:enp7s0f0
330.974988 main [2772] using default burst size: 512
330.975013 main [2780] running on 2 cpus (have 8)
330.975269 extract_ip_range [463] range is 10.0.0.1:1234 to 10.0.0.1:1234
330.975275 extract_ip_range [463] range is 10.0.0.2:1234 to 10.0.0.2:1234
331.404888 nm_open [850] overriding ifname enp7s0f0 ringid 0x0 flags 0x8004
331.662274 main [2880] mapped 334980KB at 0x7f664007c000
Receiving from netmap:enp7s0f0: 8 queues, 2 threads and 2 cpus.
331.662346 nm_open [850] overriding ifname enp7s0f0 ringid 0x1 flags 0x8004
331.877262 nm_mmap [959] do not mmap, inherit from parent
331.877269 start_threads [2326] Wait 2 secs for phy reset
333.877329 start_threads [2328] Ready...
333.877513 receiver_body [1712] reading from netmap:enp7s0f0 fd 3 main_fd 3
333.877546 receiver_body [1712] reading from netmap:enp7s0f0 fd 4 main_fd 3
334.878487 main_thread [2416] 0.000 pps (0.000 pkts 0.000 bps in 1000964 usec) 0.00 avg_batch 0 min_space
334.878557 receiver_body [1727] waiting for initial packets, poll returns 0 0
335.879437 main_thread [2416] 0.000 pps (0.000 pkts 0.000 bps in 1000950 usec) 0.00 avg_batch 199998 min_space
335.879439 receiver_body [1727] waiting for initial packets, poll returns 0 0
336.880446 main_thread [2416] 0.000 pps (0.000 pkts 0.000 bps in 1001009 usec) 0.00 avg_batch 199998 min_space
...
350.891992 receiver_body [1727] waiting for initial packets, poll returns 0 0
350.893544 main_thread [2416] 0.000 pps (0.000 pkts 0.000 bps in 1000675 usec) 0.00 avg_batch 199998 min_space
^C351.855576 sigint_h [518] received control-C on thread 0x7f665548c740
351.855606 main_thread [2416] 0.000 pps (0.000 pkts 0.000 bps in 962062 usec) 0.00 avg_batch 199998 min_space
351.893019 receiver_body [1727] waiting for initial packets, poll returns 0 0
352.856617 main_thread [2416] 0.000 pps (0.000 pkts 0.000 bps in 1001010 usec) 0.00 avg_batch 0 min_space
Received nothing.

And then the next run hangs here

[root@m2 pkt-gen]# ./pkt-gen -i netmap:enp7s0f0 -s 10.0.0.1:1234 -d 10.0.0.2:1234 -f rx
451.343137 main [2647] interface is netmap:enp7s0f0
451.343178 main [2772] using default burst size: 512
451.343205 main [2780] running on 1 cpus (have 8)
451.343470 extract_ip_range [463] range is 10.0.0.1:1234 to 10.0.0.1:1234
451.343476 extract_ip_range [463] range is 10.0.0.2:1234 to 10.0.0.2:1234
451.558281 main [2880] mapped 334980KB at 0x7fbaacbcd000
Receiving from netmap:enp7s0f0: 8 queues, 1 threads and 1 cpus.
451.558322 start_threads [2326] Wait 2 secs for phy reset

Related kernel messages

Aug 30 14:54:38 m2 kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [pkt-gen:3312]
Aug 30 14:54:38 m2 kernel: intel_powerclamp: No package C-state available
Aug 30 14:54:38 m2 kernel: Modules linked in:
Aug 30 14:54:38 m2 kernel: ixgbe(OE) netmap(OE) sch_fq tcp_htcp fuse tun ip_set nfnetlink bridge stp llc coretemp iTCO_wdt iTCO_vendor_support kvm gpio_ich i5000_edac irqbypa
ss edac_core sg i2c_i801 mdio i2c_core pcspkr lpc_ich ioatdma i5k_amb shpchp dca nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod
cdrom crct10dif_common ata_generic pata_acpi ata_piix libata serio_raw e1000e megaraid_sas e1000 ptp pps_core fjes dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ixgbe]
Aug 30 14:54:38 m2 kernel: CPU: 6 PID: 3312 Comm: pkt-gen Tainted: G          IOE  ------------   3.10.0-514.el7.x86_64 #1
Aug 30 14:54:38 m2 kernel: task: ffff8800b8866dd0 ti: ffff88040bb4c000 task.ti: ffff88040bb4c000
Aug 30 14:54:38 m2 kernel: RIP: 0010:[<ffffffffa03f3080>]  [<ffffffffa03f3080>] ixgbe_netmap_rxsync+0xc0/0x2a0 [ixgbe]
Aug 30 14:54:38 m2 kernel: RSP: 0018:ffff88040bb4f9f8  EFLAGS: 00000202
Aug 30 14:54:38 m2 kernel: RAX: 00000000000001ff RBX: 0000001d0000000a RCX: ffff88042d820e60
Aug 30 14:54:38 m2 kernel: RDX: 00000000000000e7 RSI: 00000000000000e7 RDI: ffff88042eb27500
Aug 30 14:54:38 m2 kernel: RBP: ffff88040bb4fa28 R08: ffff88041e7b6e70 R09: 0000000000000073
Aug 30 14:54:38 m2 kernel: R10: ffff88042d820000 R11: ffff8800b9bdccc0 R12: 0000000000169dd3
Aug 30 14:54:38 m2 kernel: R13: 0000001d0000000a R14: 00000000002a2564 R15: 0000000000167e61
Aug 30 14:54:38 m2 kernel: FS:  00007fbaacbcc700(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
Aug 30 14:54:38 m2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 30 14:54:38 m2 kernel: CR2: 00007fbac13dadd0 CR3: 000000041128a000 CR4: 00000000000407e0
Aug 30 14:54:38 m2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 30 14:54:38 m2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 30 14:54:38 m2 kernel: Stack:
Aug 30 14:54:38 m2 kernel: ffff8800b9bdd000 0000000000000003 ffff88042d820000 ffff88042eb27568
Aug 30 14:54:38 m2 kernel: ffff88042eb27500 0000000000000000 ffff88040bb4faf0 ffffffffa0460882
Aug 30 14:54:38 m2 kernel: ffffffff811a1f19 0100000000001000 0000000100000001 0000000100000001
Aug 30 14:54:38 m2 kernel: Call Trace:
Aug 30 14:54:38 m2 kernel: [<ffffffffa0460882>] netmap_poll+0x582/0x840 [netmap]
Aug 30 14:54:38 m2 kernel: [<ffffffff811a1f19>] ? zone_statistics+0x89/0xa0
Aug 30 14:54:38 m2 kernel: [<ffffffff8133304d>] ? list_del+0xd/0x30
Aug 30 14:54:38 m2 kernel: [<ffffffffa046344f>] linux_netmap_poll+0x3f/0x60 [netmap]
Aug 30 14:54:38 m2 kernel: [<ffffffff81213f77>] do_sys_poll+0x327/0x580
Aug 30 14:54:38 m2 kernel: [<ffffffff810c95a8>] ? __enqueue_entity+0x78/0x80
Aug 30 14:54:38 m2 kernel: [<ffffffff810d000c>] ? enqueue_entity+0x26c/0xb60
Aug 30 14:54:38 m2 kernel: [<ffffffff811895ab>] ? free_pcppages_bulk+0x34b/0x390
Aug 30 14:54:38 m2 kernel: [<ffffffff810cefb8>] ? check_preempt_wakeup+0x148/0x250
Aug 30 14:54:38 m2 kernel: [<ffffffff810c1955>] ? check_preempt_curr+0x85/0xa0
Aug 30 14:54:38 m2 kernel: [<ffffffff810c1989>] ? ttwu_do_wakeup+0x19/0xd0
Aug 30 14:54:38 m2 kernel: [<ffffffff81212840>] ? poll_initwait+0x50/0x50
Aug 30 14:54:38 m2 kernel: [<ffffffff811800ae>] ? __find_get_page+0x1e/0xa0
Aug 30 14:54:38 m2 kernel: [<ffffffff81183105>] ? filemap_fault+0x215/0x410
Aug 30 14:54:38 m2 kernel: [<ffffffffa023cbcc>] ? xfs_iunlock+0x11c/0x130 [xfs]
Aug 30 14:54:38 m2 kernel: [<ffffffffa022f64e>] ? xfs_filemap_fault+0x6e/0xf0 [xfs]
Aug 30 14:54:38 m2 kernel: [<ffffffff8118055b>] ? unlock_page+0x2b/0x30
Aug 30 14:54:38 m2 kernel: [<ffffffff811ac6d4>] ? do_read_fault.isra.42+0xe4/0x130
Aug 30 14:54:38 m2 kernel: [<ffffffff811b0dc1>] ? handle_mm_fault+0x6b1/0xfe0
Aug 30 14:54:38 m2 kernel: [<ffffffff810bbf44>] ? __wake_up+0x44/0x50
Aug 30 14:54:38 m2 kernel: [<ffffffff816919b1>] ? __do_page_fault+0x171/0x450
Aug 30 14:54:38 m2 kernel: [<ffffffff810eaa6c>] ? ktime_get_ts64+0x4c/0xf0
Aug 30 14:54:38 m2 kernel: [<ffffffff812142d4>] SyS_poll+0x74/0x110
Aug 30 14:54:38 m2 kernel: [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
Aug 30 14:54:38 m2 kernel: Code: cc 41 39 f1 41 0f 4e f4 49 c1 e0 04 4d 03 43 20 45 89 fd 45 31 e4 41 83 cd 20 45 8b 48 08 41 f6 c1 01 74 70 48 89 4d d0 0f 1f 00 <45> 0f b7 40 0c 89 f1 48 c1 e1 04 4c 01 d1 41 83 e1 02 66 44 89

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 35

Most upvoted comments

Hi @xmhaoyu, I think that the reason the multi-threaded pkt-gen is not receiving anything is trivial (my fault that this escaped me).

In multithreaded mode, pkt-gen only scans one ring per thread, while in single-threaded mode it scans the rings specified in the ‘netmap:…’ string (all the rings, in your case). Now, since you have 8 rings and you are only starting 2 threads, you are only scanning the first two rings. It may well be the case that all the traffic is going to the other ones.

If I am right, you should be able to see the traffic with ‘-p 8’. Moreover, with ‘-p 2’ you should be able to see the traffic in tcpdump (since netmap is not intercepting it).
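(For readers who want to see what this per-ring binding looks like at the API level: below is a minimal sketch, assuming the nm_open()/nm_nextpkt() helpers from netmap_user.h. The interface name and ring count are simply the ones from this report; the code is illustrative, not pkt-gen’s actual source.)

/* Sketch: one receiver thread per hardware ring. "netmap:enp7s0f0-N"
 * binds the descriptor to ring N only, which is roughly what each
 * pkt-gen thread does when -p is given. */
#include <stdio.h>
#include <poll.h>
#include <pthread.h>

#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>

#define NRINGS 8    /* this NIC exposes 8 RX rings */

static void *rx_ring_loop(void *arg)
{
    char name[64];
    snprintf(name, sizeof(name), "netmap:enp7s0f0-%ld", (long)arg);

    struct nm_desc *d = nm_open(name, NULL, 0, NULL);
    if (d == NULL)
        return NULL;

    struct pollfd pfd = { .fd = NETMAP_FD(d), .events = POLLIN };
    struct nm_pkthdr h;

    for (int iter = 0; iter < 60; iter++) {     /* poll for ~60 seconds */
        poll(&pfd, 1, 1000);
        while (nm_nextpkt(d, &h) != NULL)
            ;                                   /* count/process packets here */
    }
    nm_close(d);
    return NULL;
}

int main(void)
{
    pthread_t t[NRINGS];
    for (long i = 0; i < NRINGS; i++)
        pthread_create(&t[i], NULL, rx_ring_loop, (void *)i);
    for (int i = 0; i < NRINGS; i++)
        pthread_join(t[i], NULL);
    return 0;
}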

None of this, however, explains why killing pkt-gen hangs the kernel. I am not able to reproduce this behaviour locally. Does the log show the same bug as in the OP?

Is there any change in the multithreaded behaviour with the current master?

@leleobhz Thank you for your explanation. I have tried that patch, but multi-threaded pkt-gen still hangs a CPU. 😦 Anyway, I found that my machines can handle 2 x 10G with two single-threaded processes, and that setup is stable even without the patch, so this is no longer urgent for me.

DD is a flag in the receive descriptor of the ixgbe (the hardware counterpart of the netmap slot). The datasheet says that when you see this flag set, there is a new packet in the slot. Apparently, this is not always true.

Yes, that is a rate-limited debug message that I added in the patch.

Even though the datasheet only mentions the DD flag as the indicator of new packets, the Intel driver started checking for size != 0 instead in version 5.2.1. Indeed, the card can now set the DD flag even when the size is zero, maybe as a side effect of some other programming.
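(To make the difference between the two checks concrete, here is a self-contained toy model; the struct layout and bit value are simplified stand-ins, not the real ixgbe or netmap definitions.)

#include <stdint.h>
#include <stdbool.h>

#define RXD_STAT_DD 0x01        /* "descriptor done" bit (stand-in value) */

struct rx_desc_wb {             /* write-back half of an RX descriptor (toy model) */
    uint32_t status_error;
    uint16_t length;
};

/* old rule: a set DD bit alone means a new packet is ready */
static bool ready_dd_only(const struct rx_desc_wb *d)
{
    return (d->status_error & RXD_STAT_DD) != 0;
}

/* rule the newer Intel driver reportedly uses: DD may be set on a
 * zero-length descriptor, so additionally require a non-zero size */
static bool ready_dd_and_size(const struct rx_desc_wb *d)
{
    return (d->status_error & RXD_STAT_DD) != 0 && d->length != 0;
}

int main(void)
{
    /* a descriptor with DD set but size 0: the old check would (wrongly)
     * report a packet here, the new one does not */
    struct rx_desc_wb spurious = { .status_error = RXD_STAT_DD, .length = 0 };
    return (ready_dd_only(&spurious) && !ready_dd_and_size(&spurious)) ? 0 : 1;
}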

I need to do some more tests, then this can go into master (without the debug message).