astropy: Possible bug? `parse_single_table` loses dtype info for `char(*)` type.
Description
With a VOTable like:
<?xml version="1.0" encoding="utf-8"?>
<!-- Produced with astropy.io.votable version 5.1
http://www.astropy.org/ -->
<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/VOTable-1.4.xsd">
<RESOURCE type="results">
...
<FIELD ID="original_ext_source_id" arraysize="80" datatype="char" name="original_ext_source_id" ucd="meta.id.cross"/>
...
<DATA>
<TABLEDATA>
...
The field original_ext_source_id is of type char[80].
Running:
>>> from astropy.io.votable import parse_single_table
>>> table = parse_single_table(<filename>)
>>> for row in table.array:
...     print(type(row))
...     for cell in row:
...         print(type(cell))
...         print(cell.dtype)
<class 'numpy.ma.core.mvoid'>
<class 'numpy.int64'>
int64
<class 'str'>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[24], line 5
      3 for cell in row:
      4     print(type(cell))
----> 5     print(cell.dtype)
AttributeError: 'str' object has no attribute 'dtype'
The cell object is a built-in Python string, and the dtype information is lost.
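The sketch below uses NumPy only and shows one plausible mechanism behind this output. It is an assumption and has not been confirmed in this thread. Suppose the `char` column ends up stored with `object` dtype, for example because the field is variable-length, as the `char(*)` in the title implies. In that case each cell is whatever Python object was stored, here a plain `str`, which has no `dtype`. A fixed-width `U80` field instead yields `numpy.str_` scalars. The arrays and field names are made up for illustration. Plain structured arrays are used instead of astropy's masked `table.array` for simplicity.

```python
import numpy as np

# Two one-row structured arrays that differ only in how the string field is stored:
# "U80" is a fixed-width NumPy unicode field, "O" stores arbitrary Python objects.
fixed = np.array([(1, "abc")], dtype=[("id", "i8"), ("source_id", "U80")])
variable = np.array([(1, "abc")], dtype=[("id", "i8"), ("source_id", "O")])

for label, arr in (("U80", fixed), ("object", variable)):
    cell = arr[0]["source_id"]
    # A plain Python str has no .dtype attribute, hence the AttributeError above.
    print(label, type(cell), hasattr(cell, "dtype"))
```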
Expected behavior
I would expect the same behavior as with the unicodeChar type, using the following VOTable file:
<?xml version="1.0" encoding="utf-8"?>
<!-- Produced with astropy.io.votable version 5.1
http://www.astropy.org/ -->
<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/VOTable-1.4.xsd">
<RESOURCE type="results">
...
<FIELD ID="original_ext_source_id" arraysize="80" datatype="unicodeChar" name="original_ext_source_id" ucd="meta.id.cross"/>
...
<DATA>
<TABLEDATA>
...
The field original_ext_source_id is now of type unicodeChar[80].
Running:
>>> from astropy.io.votable import parse_single_table
>>> table = parse_single_table(<filename>)
>>> for row in table.array:
...     print(type(row))
...     for cell in row:
...         print(type(cell))
...         print(cell.dtype)
<class 'numpy.ma.core.mvoid'>
<class 'numpy.int64'>
int64
<class 'numpy.str_'>
<U80
In this case the cell object is a NumPy string (`numpy.str_`), and the dtype information is available.
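Whatever type the individual cells come back as, the per-column dtype is still stored on the structured array itself. One possible workaround is therefore to read it from `table.array.dtype` by field name rather than from each cell. The sketch below uses a hypothetical helper `column_dtypes` and a hand-built `numpy.ma` masked array standing in for `table.array`. It has not been checked against astropy's actual output for the files above. For a column stored as `object`, this lookup would report `object` rather than a declared width.

```python
import numpy as np

def column_dtypes(arr):
    """Hypothetical helper: return {field name: declared dtype} for a structured array."""
    return {name: arr.dtype[name] for name in arr.dtype.names}

# Stand-in for `table.array`: a masked structured array shaped like the one in the report.
arr = np.ma.array(
    [(1, "abc"), (2, "de")],
    dtype=[("id", "i8"), ("original_ext_source_id", "U80")],
)

for name, dt in column_dtypes(arr).items():
    # dt.str is the dtype string, dt.itemsize the width in bytes (4 per unicode character).
    print(name, dt.str, dt.itemsize)
```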
How to Reproduce
>>> from astropy.io.votable import parse_single_table
>>> table = parse_single_table(<filename>)
>>> for row in table.array:
...     print(type(row))
...     for cell in row:
...         print(type(cell))
...         print(cell.dtype)
<class 'numpy.ma.core.mvoid'>
<class 'numpy.int64'>
int64
<class 'str'>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[24], line 5
      3 for cell in row:
      4     print(type(cell))
----> 5     print(cell.dtype)
AttributeError: 'str' object has no attribute 'dtype'
Versions
Linux-4.19.0-9-amd64-x86_64-with-glibc2.28
Python 3.9.13+ (heads/3.9:e8f2fe355b, Jun 13 2022, 10:51:14)
[GCC 8.3.0]
astropy 5.1
Numpy 1.22.4
pyerfa 2.0.0.1
no scipy and no matplotlib
About this issue
- State: open
- Created a year ago
- Comments: 18 (8 by maintainers)
Dear @pllim, not really. I could change the title, though, now that we have better identified the issue.
Sure! Sorry 😄 @tomdonaldson, please see my comment above.
@pllim thank you for your fast answer. Indeed, I was not sure whether calling it a BUG was accurate. I would say it is more inconsistent than a real BUG.
The use case is the following: when parsing a VOTable, you have access to extra metadata, namely the type and length of each array. By using `numpy` types, it is possible to store and keep this metadata in the `dtype` attribute, and to use a clearer and richer denomination (`U20`, `S20`, `>F16`, …) than Python's built-in types. This is extremely useful and allows cleaner exports or interfaces to other types of services, such as databases or Spark…
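The following is a short illustration of what these NumPy dtype strings encode: the kind, the size in bytes, and the byte order. The `>f8` entry (a big-endian 64-bit float) is an illustrative stand-in chosen here and is not taken from the comment. Information like this is what an export to a database schema could be derived from, although no such mapping is shown.

```python
import numpy as np

# Illustrative dtype strings; each carries kind, size and (where relevant) byte order.
examples = {
    "U20": np.dtype("U20"),  # fixed-width unicode, 20 characters stored as 4 bytes each
    "S20": np.dtype("S20"),  # fixed-width byte string, 20 bytes
    ">f8": np.dtype(">f8"),  # big-endian 64-bit float
}

for code, dt in examples.items():
    print(code, dt.kind, dt.itemsize, dt.str)
```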
The last point is that this used to work, so at some point in the past `char` values must have been interpreted as `numpy.str_` or similar.
Therefore, my suggestion for our use case would be to always return values as numpy types rather than built-in Python types. Especially since `numpy` is a core dependency of `astropy`, I guess it would be legitimate to systematically use numpy objects.
Again, this is only a suggestion that would solve our problem; as an astropy non-expert, I am not sure about the implications.
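The sketch below shows the suggestion applied on the user side, pending any decision upstream. It is not a proposed astropy change. The hypothetical helper `to_fixed_width` copies a 1-D structured array and converts every `object` field to a fixed-width `U<n>` field. It assumes those fields hold strings, and takes `<n>` from the longest stored value. With astropy's masked `table.array`, one would presumably apply it to `table.array.data`, which drops the mask. That step is an assumption and is not shown here.

```python
import numpy as np

def to_fixed_width(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: copy a 1-D structured array, turning object fields
    (assumed to hold strings) into fixed-width unicode fields "U<n>"."""
    new_fields = []
    for name in arr.dtype.names:
        dt = arr.dtype[name]
        if dt.kind == "O":
            # Width = longest stored value, with a minimum of 1 character.
            longest = max((len(str(v)) for v in arr[name]), default=0)
            dt = np.dtype(f"U{max(longest, 1)}")
        new_fields.append((name, dt))
    out = np.empty(arr.shape, dtype=new_fields)
    for name in arr.dtype.names:
        out[name] = arr[name]  # object -> unicode cast happens on assignment
    return out

# Usage on a made-up table with a variable-length (object) string column.
table_like = np.array(
    [(1, "abc"), (2, "hello")],
    dtype=[("id", "i8"), ("original_ext_source_id", "O")],
)
converted = to_fixed_width(table_like)
print(converted.dtype)
print(type(converted[0]["original_ext_source_id"]))
```

The width is inferred from the data rather than from the VOTable FIELD, because a variable-length `char(*)` column declares no width to use.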