hydroshare: invalid XML characters present in abstracts
@martinseul @aphelionz @dtarb
When running `check_bag` on all resources, the command crashed while creating the bag for the resource below. The resource abstract contains characters that are invalid in XML, so generating `resourcemetadata.xml` fails and the bag for this resource cannot be created. This can only happen if the UI, the REST API, or both allowed the abstract to contain invalid characters. This was tested on cuahsi-dev-2.hydroshare.org using an image of the production system.
```
bag bags/1a7dc5d6b9fa4253bca442341eef500d.zip NOT FOUND
metadata_dirty is True
bag_modified is True
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 354, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 346, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/base.py", line 394, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/base.py", line 445, in execute
    output = self.handle(*args, **options)
  File "/hydroshare/hs_core/management/commands/check_bag.py", line 194, in handle
    check_bag(r.short_id, options)
  File "/hydroshare/hs_core/management/commands/check_bag.py", line 48, in check_bag
    create_bag_files(resource)
  File "/hydroshare/hs_core/hydroshare/hs_bagit.py", line 92, in create_bag_files
    out.write(resource.get_metadata_xml())
  File "/hydroshare/hs_core/models.py", line 1949, in get_metadata_xml
    include_format_elements=include_format_elements)
  File "/hydroshare/hs_core/models.py", line 3935, in get_xml
    dcterms_abstract.text = self.description.abstract
  File "src/lxml/lxml.etree.pyx", line 1031, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:51697)
  File "src/lxml/apihelpers.pxi", line 711, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:23071)
  File "src/lxml/apihelpers.pxi", line 699, in lxml.etree._createTextNode (src/lxml/lxml.etree.c:22931)
  File "src/lxml/apihelpers.pxi", line 1439, in lxml.etree._utf8 (src/lxml/lxml.etree.c:30219)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
```
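The traceback shows lxml raising `ValueError` at the point where the abstract is assigned to `dcterms_abstract.text`. As a minimal sketch (Python 3, whereas the system above runs Python 2.7), the offending characters can be located or stripped with a regular expression built from the XML 1.0 `Char` production. The names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are hypothetical, and nothing here says whether HydroShare's actual fix strips, escapes, or rejects such input. The filter follows the XML 1.0 specification, which may be somewhat stricter than the exact check lxml performs.

```python
import re
from typing import List, Tuple

# XML 1.0 "Char" production: tab, LF, CR, and the three ranges below.
# Characters outside this set (e.g. NULL, vertical tab, U+FFFE) cannot
# appear in an XML 1.0 document, so they are matched here for removal.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text: str) -> List[Tuple[int, str]]:
    """Return (offset, "U+XXXX") pairs for every character XML 1.0 forbids."""
    return [
        (m.start(), "U+{:04X}".format(ord(m.group())))
        for m in _INVALID_XML_CHARS.finditer(text)
    ]


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every XML 1.0-forbidden character removed."""
    return _INVALID_XML_CHARS.sub("", text)
```

For example, `strip_invalid_xml_chars("Soil\x0bmoisture")` returns `"Soilmoisture"`, while tabs, newlines, and non-ASCII letters such as `é` pass through unchanged.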
I have modified `check_bag` to watch for this error and am running it again on all resources to check the extent of the bug.
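A sketch of the kind of check described above, assuming the goal is to keep scanning past bad resources and report them instead of crashing on the first one. `scan_for_xml_errors` and the `build_metadata_xml` callable are hypothetical stand-ins; per the traceback, the real command calls `create_bag_files(resource)`. Only `ValueError` is caught because that is what lxml raised here, so any other exception still surfaces.

```python
from typing import Callable, Dict, List, Tuple


def scan_for_xml_errors(
    resource_ids: List[str], build_metadata_xml: Callable[[str], str]
) -> Tuple[List[str], Dict[str, str]]:
    """Build metadata for every resource, recording the ones that fail.

    build_metadata_xml is a hypothetical stand-in for whatever generates
    resourcemetadata.xml for a single resource id.
    """
    ok: List[str] = []
    failures: Dict[str, str] = {}
    for rid in resource_ids:
        try:
            build_metadata_xml(rid)
        except ValueError as exc:
            # lxml raises ValueError for text with NULL bytes or control characters.
            failures[rid] = str(exc)
        else:
            ok.append(rid)
    return ok, failures
```

Returning the failures keyed by resource id makes it easy to print a summary at the end of a full run and gauge how many abstracts are affected.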
About this issue
- State: closed
- Created 6 years ago
- Comments: 20 (20 by maintainers)
@horsburgh @dtarb @aphelionz @martinseul Try http://cuahsi-dev-2.hydroshare.org now.
This was in the spirit of the discussion on the call yesterday.
… `\n` or `\n\n` as appropriate.

- `\n\n` to paragraph break.
- `\n` to `<br>`.
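The fragments above appear to describe how line breaks in an abstract are rendered: a double newline becomes a paragraph break and a single newline becomes `<br>`. A minimal Python sketch of that mapping, assuming plain-text abstracts rendered as HTML; `abstract_to_html` is a hypothetical name, not HydroShare's actual code. The text is HTML-escaped before tags are added, so characters such as `<` or `&` typed into an abstract cannot inject markup.

```python
import html
import re


def abstract_to_html(text: str) -> str:
    """Render a plain-text abstract as HTML.

    Blank lines separate paragraphs; single line breaks become <br>.
    """
    # Normalise Windows and old Mac line endings to a single newline.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    if not text:
        return ""
    # A blank line (possibly containing whitespace) separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    rendered = []
    for para in paragraphs:
        # Escape each line of user text before joining with <br>.
        lines = [html.escape(line) for line in para.split("\n")]
        rendered.append("<p>" + "<br>".join(lines) + "</p>")
    return "".join(rendered)
```

For example, `abstract_to_html("a\nb\n\nc")` returns `"<p>a<br>b</p><p>c</p>"`.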