astropy: BUG: ascii.read converts - to float -1.0
This was a tricky one and caused a bug being hidden in astroquery for a while (as it made parsing a mock table possible, while the one from the server was failing):
>>> from astropy.io import ascii
>>> ascii.read('1 | 2 | 3 \n 4 | 5 | 6 \n')
<Table length=2>
col1 col2 col3
int64 int64 int64
----- ----- -----
1 2 3
4 5 6
>>> ascii.read('1 | 2 | 3 \n 4 | - | 6 \n')
<Table length=2>
col1 col2 col3
int64 float64 int64
----- ------- -----
1 2.0 3
4 -1.0 6
Changing the hyphen to a dash makes it work as expected, to return a string column.
>>> ascii.read('1 | 2 | 3 \n 4 | – | 6 \n')
<Table length=2>
col1 col2 col3
int64 str1 int64
----- ---- -----
1 2 3
4 – 6
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 30 (22 by maintainers)
@pllim my pleasure,yeah it was hard to dig-it-out with out the docs
Thanks, I think you deserve at least as much credit for tracking this down; but I am probably a bit more familiar with the C part as I contributed some of that code myself! 😃
@bsipocz - very unscientific profiling on my new MBP with SSD, for a space-delimited ASCII table with 25 columns by 10000 rows, which is about 1 Mb in size:
One can roughly scale from there, but I agree you are most likely to be limited by server speeds in getting the text file into memory/disk before trying to parse into a table. So it may be a consideration to go with the pure Python parser.
Having said that, I cannot thank @dhomeier and @Harshil-C enough for trying to maintain the fast C parser!! I don’t understand that code for the most part, so having people ready to step in and investigate bugs is hugely important.
Whoever ends up taking this up as a PR, there is an option to give the other person credit via GitHub’s Co-Author feature. 😄 Thanks!
https://github.blog/2018-01-29-commit-together-with-co-authors/
iam not confident enough to write c yet,in the future i hope for it (tracking it was pretty fun:) )i learned many things
i think you own this pr brother, go ahead (i didnt make much changes as iam new to c,i just touched cython and c for this issue )
@Harshil-C you are most welcome to take care of the fixes - attaching a diff for the changes I have tested so far. Note that I had to change it a bit to accommodate floats without leading digits like
-.25
. lone-signs.patch.gzThe crucial point is that no
CONVERSION_ERROR
is raised in these cases, although neither aNaN
norinf
are identified. So I think at https://github.com/astropy/astropy/blob/1cb6c81e874a785e2d7a3075a75a372188b0d844/astropy/io/ascii/src/tokenizer.c#L719is missing. But further, the
use_fast_converter=True
never jumps to theconversion_error
as itxstrtod
quietly skips through all the trailing spaces. Analogously tostrtod
- from the manpage:I’d suggest to add this check after the sign has been parsed:
Oh, I can get it with option+‘-’