frawk: Default field splitting behaving inconsistently
$ cat fields.txt
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx yyyyyyyyyyyyyyyyyyyyyyyyyyyy 111111
xxxxxxxxxxxxxxxxxxxxxxxxxxx yyyyyyyyyyyyyyyyyyyyyyyy 222222
xxxxxxxxxxxxxxxxxxxxxxxxxxxx yyyyyyyyyyyyyyyyyyyyyyyyyyyy 3333333
xxxxxxxxxxxxxxxxxxxxxxxxxx yyyyyyyyyyyyyyyyyyyyyyyy 4444444
# wrong output for 2nd and 4th lines, and the failure is different
$ frawk '{print NR ":" $3 }' fields.txt
1:111111
2:yyyyyyyyyyyyyyyyyyyyyyyy
3:3333333
4:
# works correctly if those lines are given as the sole input
$ sed -n '2p' fields.txt | frawk '{print NR ":" $3 }'
1:222222
$ sed -n '4p' fields.txt | frawk '{print NR ":" $3 }'
1:4444444
The failure also seems to depend on the length of the input lines or something like that, which is why I have those long x and y fields in the input. I couldn’t find a simpler failing case.
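Since the failure seems tied to field and line length, a small generator makes it easier to try other lengths. The following is only a sketch: `repeat`, `gen_line`, and `check_third_field` are hypothetical helper names, the lengths passed to `gen_line` are illustrative, and the system `awk` is assumed to be a trustworthy reference for default whitespace splitting.

```shell
#!/bin/sh
# Hypothetical helpers for building long-field inputs and comparing an awk
# implementation against the system awk. All names here are illustrative.

# repeat CHAR COUNT: print CHAR repeated COUNT times, followed by a newline.
repeat() {
    awk -v c="$1" -v n="$2" 'BEGIN { s = ""; for (i = 0; i < n; i++) s = s c; print s }'
}

# gen_line XLEN YLEN TAIL: print "xxx... yyy... TAIL" with the given run lengths.
gen_line() {
    printf '%s %s %s\n' "$(repeat x "$1")" "$(repeat y "$2")" "$3"
}

# check_third_field IMPL FILE: succeed only if IMPL prints the same NR:$3
# pairs for FILE as the system awk (assumed here to be correct).
check_third_field() {
    impl=$1
    file=$2
    expected=$(awk '{print NR ":" $3 }' "$file")
    actual=$("$impl" '{print NR ":" $3 }' "$file")
    [ "$expected" = "$actual" ]
}

# Example use (not run here, since it needs frawk on PATH):
#   gen_line 29 28 111111 >  fields.txt
#   gen_line 27 24 222222 >> fields.txt
#   check_third_field frawk fields.txt || echo "mismatch against system awk"
```

Looping over a range of lengths with `gen_line` and calling `check_third_field` on each file would be one way to search for a shorter failing input.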
About this issue
- State: closed
- Created 3 years ago
- Comments: 24 (15 by maintainers)
Commits related to this issue
- Add long lines, large fields test for whitespace This is relevant to the case mentioned in #60. — committed to ezrosent/frawk by ezrosent 3 years ago
- Fix non-AVX2 whitespace splitting. This ended up being the root cause of #60. Most of this change is refactoring to ensure that all available implementations run during testing, not just AVX2. That'l... — committed to ezrosent/frawk by ezrosent 3 years ago
- Fix non avx2 (#61) * Fix non-AVX2 whitespace splitting. This ended up being the root cause of #60. Most of this change is refactoring to ensure that all available implementations run during test... — committed to ezrosent/frawk by ezrosent 3 years ago
After some more reading, I think this is mostly a matter of passing the right compiler flags. I’ll follow up and try and post binaries with minimal dependencies, but for now I’m fairly confident that building from source as you have been doing so far should work, once the aforementioned bug fixes have been merged.
Thanks again for your patience. I’ll plan to close this issue out in the next few days and file other issues for follow-up items.
Had to take more time away than expected. I started up work again on the fix_non_avx2 branch this week. I think a number of things have been fixed, but I’m still seeing some avx instructions getting executed. I’ll spend some more time looking, but I may also merge the branch as is to fix your initial issue, depending on how much progress I make in the short term tracking down the inclusion of avx instructions.
Quick update: I’ve made progress on the fix_non_avx2 branch in fixing bugs and increasing test coverage, but I may not be able to merge until next week.
To your point on not being able to build from git directly, I suspect that is related to that “second problem” I mentioned above. I have some preliminary changes on the same branch that I think will help, but I’ll only be able to verify once I get a QEMU setup without AVX2 (I’m afraid my last computer without AVX2 isn’t functioning).
Okay, I have reproduced the initial issue: there is a bug in the non-AVX2 implementation of whitespace splitting. I’m going to focus on that bug first.
There’s a further bug, though, which is why you got SIGILL. I just re-checked my code and there isn’t anything obviously wrong with the runtime feature detection (in that case, I’d expect that you would have gotten the same, buggy, output when running the pre-built binary). That issue could take longer to track down, and it’s possibly an issue with one of frawk’s dependencies.
I have a theory about what might be going on. I should be able to test it out on my own later, but in the meantime could you confirm whether your CPU has AVX2 support (see “CPUs with AVX2” here)? I’m thinking that the runtime fallback to SSE2 may not be working. That would explain why you don’t get SIGILL when compiling from source; I should be able to confirm later by reproducing the initial bug while compiling without AVX2.
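For anyone wanting to answer the AVX2 question from a terminal, here is a minimal sketch. It assumes an x86 Linux system where the kernel lists CPU features on the `flags` lines of /proc/cpuinfo; `has_cpu_flag` is a hypothetical helper name, and the optional second argument exists only so the check can be pointed at a saved copy of cpuinfo from another machine.

```shell
#!/bin/sh
# has_cpu_flag FLAG [FILE]: succeed if FLAG appears as a whole word on a
# "flags" line of FILE (default: /proc/cpuinfo, x86 Linux assumed).
# Comparing whole fields means "avx" does not match "avx2" or "avx512f".
has_cpu_flag() {
    awk -v flag="$1" '
        /^flags[ \t]*:/ {
            for (i = 1; i <= NF; i++)
                if ($i == flag) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${2:-/proc/cpuinfo}"
}

# Example use:
#   if has_cpu_flag avx2; then echo "AVX2 reported"; else echo "no AVX2 reported"; fi
```

On other operating systems or architectures the feature list lives elsewhere, so a failing check there only means this particular lookup did not find the flag.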
Sure thing. I should be able to get to that in the next day or two. Thanks for bearing with me on this.