smart_open: Error when trying to write a .txt file from the local filesystem to WebHDFS

I get the following error when trying to write a text file from the local filesystem to WebHDFS: TypeError: can only concatenate str (not "bytes") to str

I'm hoping I'm just doing something wrong. My code is as follows:

```python
from smart_open import open

def smart_copy(source_file, sync_file):
    with open(source_file, 'rb') as source:
        with open(sync_file, 'wb') as sync:
            for line in source:
                sync.write(line)

smart_copy('./test_file.txt', 'webhdfs://{username}@{host}:{port}/user/XXXX/smart_copy/test_file.txt')
```
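For reference, the copy loop above can also be written as a chunked copy. This is a minimal sketch, not a fix: it does not avoid the error, which is raised while smart_open opens the webhdfs:// target for writing, before any data is copied. It uses shutil.copyfileobj to move fixed-size chunks instead of iterating over lines, which suits binary files with very long lines or no line breaks at all. The opener and chunk_size parameters are hypothetical additions; opener defaults to the built-in open, and the assumption is that you pass smart_open's open to handle URIs such as s3:// or webhdfs://.

```python
import shutil


def smart_copy(source_file, sync_file, opener=open, chunk_size=1024 * 1024):
    """Copy bytes from source_file to sync_file in fixed-size chunks.

    opener is a hypothetical parameter: it defaults to the built-in open,
    and passing smart_open's open would allow URIs such as s3:// or webhdfs://.
    """
    with opener(source_file, "rb") as source:
        with opener(sync_file, "wb") as sync:
            # Read and write chunk_size bytes at a time rather than splitting on newlines.
            shutil.copyfileobj(source, sync, chunk_size)
```

With smart_open installed, a call might look like `from smart_open import open as smart_open_open` followed by `smart_copy('./test_file.txt', 'webhdfs://{username}@{host}:{port}/user/XXXX/smart_copy/test_file.txt', opener=smart_open_open)`.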

The stack trace for the error is:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-b63bd711d13c> in <module>
----> 1 smart_copy('./test_file.txt', 'webhdfs://XXXX@XXX.XXX.XXX.XXX:XXXXX/user/XXXX/smart_copy/test_file.txt')

<ipython-input-38-694f70cf0776> in smart_copy(source_file, sync_file)
      3     '''
      4     with open(source_file, 'rb') as source:
----> 5         with open(sync_file, 'wb') as sync:
      6             for line in source:
      7                 sync.write(line)

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in open(uri, mode, buffering, encoding, errors, newline, closefd, opener, ignore_ext, transport_params)
    346     except KeyError:
    347         binary_mode = mode
--> 348     binary, filename = _open_binary_stream(uri, binary_mode, transport_params)
    349     if ignore_ext:
    350         decompressed = binary

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in _open_binary_stream(uri, mode, transport_params)
    560         elif parsed_uri.scheme == "webhdfs":
    561             kw = _check_kwargs(smart_open_webhdfs.open, transport_params)
--> 562             return smart_open_webhdfs.open(parsed_uri.uri_path, mode, **kw), filename
    563         elif parsed_uri.scheme.startswith('http'):
    564             #

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in open(uri, mode, min_part_size)
     40         return BufferedInputBase(uri)
     41     elif mode == 'wb':
---> 42         return BufferedOutputBase(uri, min_part_size=min_part_size)
     43     else:
     44         raise NotImplementedError('webhdfs support for mode %r not implemented' % mode)

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in __init__(self, uri_path, min_part_size)
    129                                      params=payload, allow_redirects=False)
    130         if not init_response.status_code == httplib.TEMPORARY_REDIRECT:
--> 131             raise WebHdfsException(str(init_response.status_code) + "\n" + init_response.content)
    132         uri = init_response.headers['location']
    133         response = requests.put(uri, data="", headers={'content-type': 'application/octet-stream'})

TypeError: can only concatenate str (not "bytes") to str
```

test_file.txt is just an ASCII text file. I'm using Python 3.7.

Any guidance you could provide would be awesome. The end use case is copying files from S3 to WebHDFS and back again.

Thanks!!!

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (5 by maintainers)

Most upvoted comments

That’s a good question. I’m guessing the tests pass because the code still raises an exception, just not the informative one that was intended. I’ll verify that first and then submit a PR. Thanks for looking at this!
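The point about the tests can be illustrated with a minimal sketch. The functions below are hypothetical and only mirror the error path at line 131 of webhdfs.py; WebHdfsException here is a local stand-in for the class of the same name raised in that module. Both versions raise some Exception, so a test that only checks for Exception passes either way; only a check on the exception type separates the masked TypeError from the intended WebHdfsException.

```python
class WebHdfsException(Exception):
    """Hypothetical stand-in for the exception raised in smart_open/webhdfs.py."""


def init_write_buggy(status_code, content):
    # Mirrors the buggy path: building the message with str + bytes fails first.
    if status_code != 307:
        raise WebHdfsException(str(status_code) + "\n" + content)


def init_write_fixed(status_code, content):
    # Decoding the bytes body lets the intended exception be raised.
    if status_code != 307:
        message = str(status_code) + "\n" + content.decode("utf-8", errors="replace")
        raise WebHdfsException(message)


def raised_type(func, *args):
    """Return the class of the exception raised by func(*args), or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc)
    return None


if __name__ == "__main__":
    for func in (init_write_buggy, init_write_fixed):
        exc_type = raised_type(func, 403, b"denied")
        # A broad "raises Exception" check is True for both; the type name differs.
        print(func.__name__, issubclass(exc_type, Exception), exc_type.__name__)
    # init_write_buggy True TypeError
    # init_write_fixed True WebHdfsException
```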

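Looking at this further, init_response.content on line 131 of webhdfs.py is a byte string (bytes) rather than a str, so concatenating it to str(init_response.status_code) + "\n" raises the TypeError. That TypeError hides the WebHdfsException that smart_open was trying to raise because the initial request did not return a 307 TEMPORARY_REDIRECT. I changed this to ‘init_response.text’, and now I see the intended error message. I’m happy to open a PR to address this issue. The same problem may also occur on line 135 of the same module.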
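The difference between the two attributes can be shown without a live cluster. In this sketch, FakeResponse is a hypothetical stand-in for requests.Response that models only status_code, content (the raw body as bytes) and text (the body decoded to str); error_message is a hypothetical helper that could build the message at both line 131 and line 135, assuming both lines follow the same pattern.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response, modelling only the fields used here."""

    def __init__(self, status_code, content, encoding="utf-8"):
        self.status_code = status_code
        self.content = content  # raw body bytes, like requests.Response.content
        self.encoding = encoding

    @property
    def text(self):
        # Like requests.Response.text: the body decoded to str, undecodable bytes replaced.
        return self.content.decode(self.encoding, errors="replace")


def error_message(response):
    # str + str works; the original str + response.content (bytes) raises TypeError.
    return str(response.status_code) + "\n" + response.text


if __name__ == "__main__":
    response = FakeResponse(403, b"Permission denied")  # hypothetical server reply
    print(repr(error_message(response)))  # '403\nPermission denied'
    try:
        _ = str(response.status_code) + "\n" + response.content
    except TypeError as exc:
        print(exc)  # can only concatenate str (not "bytes") to str
```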