seastar: Native stack bug when num_cores > num_queues
I just got started using Seastar, so it's entirely possible I'm configuring something incorrectly. However, I spent the last several days tracking down a nasty little issue I was having running the native stack on an EC2 instance. I noticed that I could complete a TCP connection a small percentage of the time, but it seemed completely random. Eventually I tracked it down to these offending lines of code:
In interface::dispatch_packet (net.cc line 328):
auto fw = _dev->forward_dst(engine().cpu_id(), [&p, &l3, this] () {
and
tcp::connect (tcp.hh line 844):
do {
    src_port = _port_dist(_e);
    id = connid{src_ip, dst_ip, src_port, dst_port};
} while (_inet._inet.netif()->hw_queues_count() > 1 &&
         (_inet._inet.netif()->hash2cpu(id.hash(_inet._inet.netif()->rss_key())) != engine().cpu_id()
          || _tcbs.find(id) != _tcbs.end()));
As you can see, CPU selection is done slightly differently when opening a connection than when receiving packets. Inside hash2cpu the CPU is selected like this:
return forward_dst(hash2qid(hash), [hash] { return hash; });
It passes in hash2qid rather than engine().cpu_id(), like tcp::connect does. What this ends up meaning is that my connection only works if, by chance, these two values happen to match, which ends up being a small percentage of the time on the instance I'm using.
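To make the mismatch concrete, here is a tiny standalone sketch. It is not Seastar code: hash2qid and forward_dst are simplified stand-ins for the real functions (which consult the NIC's RSS indirection table and the device's forwarding table), and the constants are made up:
// Toy model of the two code paths, not Seastar code.
#include <cstdint>
#include <cstdio>

constexpr unsigned num_queues = 2;   // hardware RX queues (fewer than cores)
constexpr unsigned num_cores  = 16;  // seastar reactor threads

// NIC side: RSS hash -> hardware queue (stand-in for the indirection table).
unsigned hash2qid(uint32_t hash) { return hash % num_queues; }

// Stand-in forwarding decision: the chosen cpu depends on the first argument.
unsigned forward_dst(unsigned src_cpu, uint32_t hash) {
    return (src_cpu + hash) % num_cores;
}

int main() {
    uint32_t hash = 0xdeadbeef;  // RSS hash of some connection id
    unsigned polling_cpu = 0;    // cpu that polls the queue and calls dispatch_packet

    // tcp::connect keeps picking source ports until hash2cpu(hash), i.e.
    // forward_dst(hash2qid(hash), ...), equals the local cpu:
    unsigned chosen_at_connect = forward_dst(hash2qid(hash), hash);

    // interface::dispatch_packet instead forwards based on engine().cpu_id(),
    // the cpu the packet happened to arrive on:
    unsigned chosen_at_dispatch = forward_dst(polling_cpu, hash);

    printf("connect picks cpu %u, dispatch picks cpu %u (%s)\n",
           chosen_at_connect, chosen_at_dispatch,
           chosen_at_connect == chosen_at_dispatch ? "match" : "mismatch");
}
The two picks only agree when hash2qid(hash) happens to equal the CPU that received the packet, which matches the "works a small percentage of the time" behavior described above.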
I know that this is the issue, because if I change the engine().cpu_id() call in dispatch_packet to hash2qid, everything works reliably again. However, I don't think that's going to spread the load over all the cores in the way that I want.
Is this an issue of me misunderstanding some aspect of configuration, or is this a real bug?
Alright guys… after fumbling around in the dark for a while, I have managed to get it working with the latest DPDK on my EC2 instance. Here's what I had to change: on line 116 of dpdk.cc I changed default_ring_size to 1024. This was necessary because if you pass the ENA driver a ring_size of 512, it decides to change that to 8192 for you, which creates problems because seastar has not allocated that much space.

Even with the new DPDK, my original issue doesn't seem to be fixed, so I had to reinstate the change of interface::dispatch_packet to use _dev->hash2qid(hash) rather than engine().cpu_id(). However, I do notice that with the new DPDK it seems to now be taking advantage of the "Low Latency Queue" option for the ENA NIC, which is great.

Unfortunately DPDK can't fix it. It's at the hardware/VM level on EC2. I tried going in and modifying the ENA DPDK driver, and that wasn't sufficient, because the NIC simply ignores commands related to RSS. It's not a huge deal to me; I've gone ahead and changed the key in my fork. The AWS guys say that they're going to enable configurable RSS "soon", but they've been saying that for a little while now, it seems.
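As a side note, one way to catch this kind of ring-size surprise up front is to ask the PMD what descriptor counts it will actually accept before sizing the mbuf pool. Below is a rough sketch using standard DPDK calls; port 0 and the sizes are just for illustration, it needs a working EAL environment to run, and I haven't verified exactly how the ENA PMD reports its limits:
#include <cstdio>
#include <rte_eal.h>
#include <rte_ethdev.h>

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0)
        return 1;

    uint16_t port = 0;       // first probed port, for illustration
    uint16_t nb_rxd = 512;   // the old seastar default_ring_size
    uint16_t nb_txd = 512;

    // Report the descriptor limits the driver advertises.
    struct rte_eth_dev_info info;
    rte_eth_dev_info_get(port, &info);
    printf("rx descriptors: min %u, max %u, align %u\n",
           info.rx_desc_lim.nb_min, info.rx_desc_lim.nb_max,
           info.rx_desc_lim.nb_align);

    // Clamp/align the requested counts to what this PMD supports.
    if (rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd) == 0)
        printf("adjusted ring sizes: rx %u, tx %u\n", nb_rxd, nb_txd);

    // Whatever comes back here is what the RX mempool has to cover per
    // queue, plus headroom for mbufs still in flight.
    return 0;
}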
Thanks, I appreciate the help.
Dmesg output when running seastar:
Testpmd does seem to work, which I guess is a good sign that it is at least possible in principle for VFIO to work on a non-metal EC2 instance.
Also, I traced exactly where the call is failing. It's inside of rte_eth_dev_init, which in turn ultimately calls the ENA (amzn NIC) driver's specific startup function ena_queue_start_all, which calls ena_queue_start, which fails on line 1185, calling ena_populate_rx_queue with (in my case) bufs_num equal to 8191. ena_populate_rx_queue fails and returns zero because rte_mempool_get_bulk on line 1382 returns -1.

That's about as far as I've gotten trying to debug it myself. It's clearly some problem allocating memory, but I don't yet understand why it's failing there.
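For what it's worth, rte_mempool_get_bulk is all-or-nothing: if the pool doesn't have enough free objects to satisfy the whole request, it hands back nothing and returns a negative value, which fits a pool sized for a 512-entry ring being asked to fill 8191 descriptors. A self-contained illustration (pool and request sizes made up, and it needs a working EAL environment to run):
#include <cstdio>
#include <rte_eal.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0)
        return 1;

    // A pool sized as if the RX ring were only 512 descriptors deep.
    struct rte_mempool *mp = rte_pktmbuf_pool_create(
        "rx_pool", 512, 0 /* cache */, 0 /* priv */,
        RTE_MBUF_DEFAULT_BUF_SIZE, (int)rte_socket_id());
    if (mp == nullptr)
        return 1;

    // Asking for 8191 mbufs in one shot fails outright: get_bulk either
    // satisfies the whole request or returns a negative value and no buffers.
    static void *bufs[8191];
    int rc = rte_mempool_get_bulk(mp, bufs, 8191);
    printf("get_bulk(8191) from a 512-mbuf pool -> %d\n", rc);

    // A request the pool can actually satisfy succeeds (rc == 0).
    rc = rte_mempool_get_bulk(mp, bufs, 256);
    printf("get_bulk(256) -> %d\n", rc);
    if (rc == 0)
        rte_mempool_put_bulk(mp, bufs, 256);

    return 0;
}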
EDIT: For comparison, when I run testpmd, this is all I see in dmesg: