seastar: Native stack bug when num_cores > num_queues
I just got started using Seastar, so it's entirely possible I'm configuring something incorrectly. However, I spent the last several days tracking down a nasty little issue I was having running the native stack on an EC2 instance. I noticed that I could complete a TCP connection a small percentage of the time, but it seemed completely random. Eventually I tracked it down to these offending lines of code:
In interface::dispatch_packet (net.cc line 328):
auto fw = _dev->forward_dst(engine().cpu_id(), [&p, &l3, this] () {
and
tcp::connect (tcp.hh line 844):
do {
    src_port = _port_dist(_e);
    id = connid{src_ip, dst_ip, src_port, dst_port};
} while (_inet._inet.netif()->hw_queues_count() > 1 &&
         (_inet._inet.netif()->hash2cpu(id.hash(_inet._inet.netif()->rss_key())) != engine().cpu_id()
          || _tcbs.find(id) != _tcbs.end()));
As you can see, CPU selection is done slightly differently when opening a connection than when receiving packets. Inside hash2cpu the CPU is selected like this:
return forward_dst(hash2qid(hash), [hash] { return hash; });
It passes in hash2qid rather than engine().cpu_id(), like tcp::connect does. What this ends up meaning is that my connection only works if, by chance, these two values happen to match, which ends up being a small percentage of the time on the instance I'm using.
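To make the mismatch concrete, here is a tiny standalone sketch. It is not Seastar code: hash2qid and forward_dst are simplified stand-ins for the real functions (which consult the NIC's RSS indirection table and the device's forwarding table), and the constants are made up:
// Toy model of the two code paths, not Seastar code.
#include <cstdint>
#include <cstdio>

constexpr unsigned num_queues = 2;   // hardware RX queues (fewer than cores)
constexpr unsigned num_cores  = 16;  // seastar reactor threads

// NIC side: RSS hash -> hardware queue (stand-in for the indirection table).
unsigned hash2qid(uint32_t hash) { return hash % num_queues; }

// Stand-in forwarding decision: the chosen cpu depends on the first argument.
unsigned forward_dst(unsigned src_cpu, uint32_t hash) {
    return (src_cpu + hash) % num_cores;
}

int main() {
    uint32_t hash = 0xdeadbeef;  // RSS hash of some connection id
    unsigned polling_cpu = 0;    // cpu that polls the queue and calls dispatch_packet

    // tcp::connect keeps picking source ports until hash2cpu(hash), i.e.
    // forward_dst(hash2qid(hash), ...), equals the local cpu:
    unsigned chosen_at_connect = forward_dst(hash2qid(hash), hash);

    // interface::dispatch_packet instead forwards based on engine().cpu_id(),
    // the cpu the packet happened to arrive on:
    unsigned chosen_at_dispatch = forward_dst(polling_cpu, hash);

    printf("connect picks cpu %u, dispatch picks cpu %u (%s)\n",
           chosen_at_connect, chosen_at_dispatch,
           chosen_at_connect == chosen_at_dispatch ? "match" : "mismatch");
}
The two picks only agree when hash2qid(hash) happens to equal the CPU that received the packet, which matches the "works a small percentage of the time" behavior described above.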
I know that this is the issue, because if I change the engine().cpu_id() call in dispatch_packet to hash2qid, everything works reliably again. However, I don't think that's going to spread the load over all the cores in the way that I want.
Is this an issue of me misunderstanding some aspect of configuration, or is this a real bug?
Alright guys… after fumbling around in the dark for a while, I have managed to get it working with the latest DPDK on my EC2 instance. Here's what I had to change: on line 116 of dpdk.cc I changed default_ring_size to 1024. This was necessary because if you pass the ENA driver a ring_size of 512, it decides to change that to 8192 for you, which creates problems because seastar has not allocated that much space.

Even with the new DPDK, my original issue doesn't seem to be fixed, so I had to reinstate the change of interface::dispatch_packet to use _dev->hash2qid(hash) rather than engine().cpu_id(). However, I do notice that with the new DPDK it seems to now be taking advantage of the "Low Latency Queue" option for the ENA NIC, which is great.

Unfortunately DPDK can't fix it. It's at the hardware/VM level on EC2. I tried going in and modifying the ENA DPDK driver, and that wasn't sufficient, because the NIC simply ignores commands related to RSS. It's not a huge deal to me; I've gone ahead and changed the key in my fork. The AWS guys say that they're going to enable configurable RSS "soon", but they've been saying that for a little while now, it seems.
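As a side note, one way to catch this kind of ring-size surprise up front is to ask the PMD what descriptor counts it will actually accept before sizing the mbuf pool. Below is a rough sketch using standard DPDK calls; port 0 and the sizes are just for illustration, it needs a working EAL environment to run, and I haven't verified exactly how the ENA PMD reports its limits:
#include <cstdio>
#include <rte_eal.h>
#include <rte_ethdev.h>

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0)
        return 1;

    uint16_t port = 0;       // first probed port, for illustration
    uint16_t nb_rxd = 512;   // the old seastar default_ring_size
    uint16_t nb_txd = 512;

    // Report the descriptor limits the driver advertises.
    struct rte_eth_dev_info info;
    rte_eth_dev_info_get(port, &info);
    printf("rx descriptors: min %u, max %u, align %u\n",
           info.rx_desc_lim.nb_min, info.rx_desc_lim.nb_max,
           info.rx_desc_lim.nb_align);

    // Clamp/align the requested counts to what this PMD supports.
    if (rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd) == 0)
        printf("adjusted ring sizes: rx %u, tx %u\n", nb_rxd, nb_txd);

    // Whatever comes back here is what the RX mempool has to cover per
    // queue, plus headroom for mbufs still in flight.
    return 0;
}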
Thanks, I appreciate the help.
Dmesg output when running seastar:
Testpmd does seem to work, which I guess is a good sign that it is at least possible in principle for VFIO to work on a non-metal EC2 instance.
Also, I traced exactly where the call is failing. It's inside of rte_eth_dev_init, which in turn ultimately calls the ENA (amzn NIC) driver's specific startup function ena_queue_start_all, which calls ena_queue_start, which fails on line 1185, calling ena_populate_rx_queue with (in my case) bufs_num equal to 8191. ena_populate_rx_queue fails and returns zero because rte_mempool_get_bulk on line 1382 returns -1.

That's about as far as I've gotten trying to debug it myself. It's clearly some problem allocating memory, but I don't yet understand why it's failing there.
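For what it's worth, rte_mempool_get_bulk is all-or-nothing: if the pool doesn't have enough free objects to satisfy the whole request, it hands back nothing and returns a negative value, which fits a pool sized for a 512-entry ring being asked to fill 8191 descriptors. A self-contained illustration (pool and request sizes made up, and it needs a working EAL environment to run):
#include <cstdio>
#include <rte_eal.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0)
        return 1;

    // A pool sized as if the RX ring were only 512 descriptors deep.
    struct rte_mempool *mp = rte_pktmbuf_pool_create(
        "rx_pool", 512, 0 /* cache */, 0 /* priv */,
        RTE_MBUF_DEFAULT_BUF_SIZE, (int)rte_socket_id());
    if (mp == nullptr)
        return 1;

    // Asking for 8191 mbufs in one shot fails outright: get_bulk either
    // satisfies the whole request or returns a negative value and no buffers.
    static void *bufs[8191];
    int rc = rte_mempool_get_bulk(mp, bufs, 8191);
    printf("get_bulk(8191) from a 512-mbuf pool -> %d\n", rc);

    // A request the pool can actually satisfy succeeds (rc == 0).
    rc = rte_mempool_get_bulk(mp, bufs, 256);
    printf("get_bulk(256) -> %d\n", rc);
    if (rc == 0)
        rte_mempool_put_bulk(mp, bufs, 256);

    return 0;
}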
EDIT: For comparison, when I run testpmd, this is all I see in dmesg: