smart_open: Getting error when trying to write .txt file from local fs to webhdfs.
Getting the following error when trying to write a text file from local to webhdfs:
TypeError: can only concatenate str (not "bytes") to str
Hoping that I am just doing something wrong. My code is as follows:
from smart_open import open

def smart_copy(source_file, sync_file):
    with open(source_file, 'rb') as source:
        with open(sync_file, 'wb') as sync:
            for line in source:
                sync.write(line)

smart_copy('./test_file.txt', 'webhdfs://{username}@{host}:{port}/user/XXXX/smart_copy/test_file.txt')
The stack trace for the error is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-b63bd711d13c> in <module>
----> 1 smart_copy('./test_file.txt', 'webhdfs://XXXX@XXX.XXX.XXX.XXX:XXXXX/user/XXXX/smart_copy/test_file.txt')
<ipython-input-38-694f70cf0776> in smart_copy(source_file, sync_file)
3 '''
4 with open(source_file, 'rb') as source:
----> 5 with open(sync_file, 'wb') as sync:
6 for line in source:
7 sync.write(line)
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in open(uri, mode, buffering, encoding, errors, newline, closefd, opener, ignore_ext, transport_params)
346 except KeyError:
347 binary_mode = mode
--> 348 binary, filename = _open_binary_stream(uri, binary_mode, transport_params)
349 if ignore_ext:
350 decompressed = binary
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in _open_binary_stream(uri, mode, transport_params)
560 elif parsed_uri.scheme == "webhdfs":
561 kw = _check_kwargs(smart_open_webhdfs.open, transport_params)
--> 562 return smart_open_webhdfs.open(parsed_uri.uri_path, mode, **kw), filename
563 elif parsed_uri.scheme.startswith('http'):
564 #
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in open(uri, mode, min_part_size)
40 return BufferedInputBase(uri)
41 elif mode == 'wb':
---> 42 return BufferedOutputBase(uri, min_part_size=min_part_size)
43 else:
44 raise NotImplementedError('webhdfs support for mode %r not implemented' % mode)
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in __init__(self, uri_path, min_part_size)
129 params=payload, allow_redirects=False)
130 if not init_response.status_code == httplib.TEMPORARY_REDIRECT:
--> 131 raise WebHdfsException(str(init_response.status_code) + "\n" + init_response.content)
132 uri = init_response.headers['location']
133 response = requests.put(uri, data="", headers={'content-type': 'application/octet-stream'})
TypeError: can only concatenate str (not "bytes") to str
test_file.txt is just an ascii text file. Using python 3.7.
Any guidance that you could provide would be awesome. End use case is copying files from s3 to webHDFS and back again.
Thanks!!!
About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (5 by maintainers)
Commits related to this issue
- webhdfs improvements (#338) — committed to mrk-its/smart_open by mrk-its 5 years ago
- Various webhdfs improvements (#383) * webhdfs improvements (#338) * address reviewer comments * address reviewer comments, continued — committed to piskvorky/smart_open by mrk-its 5 years ago
That’s a good question. I’m guessing the tests pass because it still raises an exception, just not the informative one that was intended. I will have to look first to verify and then I’ll submit a PR. Thanks for looking at this!
Looking at this further, it looks like init_response.content on line 131 of webhdfs.py returns a byte string instead of a string, which causes the error. I changed it to init_response.text and now I see the intended error message. Happy to open a PR to address this issue. It looks like the same problem may occur on line 135 of the same module.
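For anyone landing here later, this is a minimal sketch of the mismatch described above, not the actual smart_open code. FakeResponse is a hypothetical stand-in for requests.Response, used so the snippet runs without a network. It models only the two attributes that matter here: content (raw bytes) and text (the decoded str). The two describe_failure_* helpers are also made up for illustration.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response (illustration only)."""

    def __init__(self, status_code, body, encoding="utf-8"):
        self.status_code = status_code
        self.content = body        # raw bytes, like requests.Response.content
        self.encoding = encoding

    @property
    def text(self):
        # Approximates requests.Response.text, which is the decoded body (str).
        return self.content.decode(self.encoding, errors="replace")


def describe_failure_buggy(response):
    # Same shape as the expression on line 131: str + bytes.
    # On Python 3 this raises TypeError before the message is ever built.
    return str(response.status_code) + "\n" + response.content


def describe_failure_fixed(response):
    # Using .text keeps both operands as str, so the message is built as intended.
    return str(response.status_code) + "\n" + response.text


response = FakeResponse(403, b"Permission denied")

try:
    describe_failure_buggy(response)
except TypeError as err:
    print("buggy:", err)

print("fixed:", repr(describe_failure_fixed(response)))
```

Note that the TypeError only hides the real problem: the server answered with something other than a 307 redirect, and the text of that response is what should have appeared in the WebHdfsException.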