impyla: HS2Error when running as_pandas

I’m running a smallish query (the result is 8 MB of data) and get an HS2Error when I try to read the data. as_pandas works on smaller queries. Any idea what could be going on here?

Here’s what I’m running:

import impala.dbapi
from impala.util import as_pandas
c = impala.dbapi.connect(port=21050).cursor() # works fine
c.execute("[my query]") # works fine
df = as_pandas(c) # oh no!

and the error:

---------------------------------------------------------------------------
HS2Error                                  Traceback (most recent call last)
<ipython-input-5-bee92ca13acd> in <module>()
----> 1 df = as_pandas(c)

/Users/jocelyn/anaconda/lib/python2.7/site-packages/impyla-0.9.0_dev-py2.7.egg/impala/util.pyc in as_pandas(cursor)
     21     def as_pandas(cursor):
     22         names = [metadata[0] for metadata in cursor.description]
---> 23         return pd.DataFrame([dict(zip(names, row)) for row in cursor], columns=names)
     24 except ImportError:
     25     print "Failed to import pandas"

/Users/jocelyn/anaconda/lib/python2.7/site-packages/impyla-0.9.0_dev-py2.7.egg/impala/dbapi.pyc in next(self)
    246             rows = impala.rpc.fetch_results(self.service,
    247                     self._last_operation_handle, self.description,
--> 248                     self.buffersize)
    249             self._buffer.extend(rows)
    250             if len(self._buffer) == 0:

/Users/jocelyn/anaconda/lib/python2.7/site-packages/impyla-0.9.0_dev-py2.7.egg/impala/rpc.pyc in wrapper(*args, **kwargs)
    116                 if not transport.isOpen():
    117                     transport.open()
--> 118                 return func(*args, **kwargs)
    119             except socket.error as e:
    120                 pass

/Users/jocelyn/anaconda/lib/python2.7/site-packages/impyla-0.9.0_dev-py2.7.egg/impala/rpc.pyc in fetch_results(service, operation_handle, schema, max_rows, orientation)
    235                            maxRows=max_rows)
    236     resp = service.FetchResults(req)
--> 237     err_if_rpc_not_ok(resp)
    238
    239     rows = []

/Users/jocelyn/anaconda/lib/python2.7/site-packages/impyla-0.9.0_dev-py2.7.egg/impala/error.pyc in err_if_rpc_not_ok(resp)
     55     if (resp.status.statusCode != TStatusCode._NAMES_TO_VALUES['SUCCESS_STATUS'] and
     56             resp.status.statusCode != TStatusCode._NAMES_TO_VALUES['SUCCESS_WITH_INFO_STATUS']):
---> 57         raise HS2Error(resp.status.errorMessage)

HS2Error: Invalid session id
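The traceback shows that c.execute() does not pull the rows. They are fetched lazily when as_pandas iterates over the cursor: whenever the cursor's buffer is empty, next() calls rpc.fetch_results(..., self.buffersize), which issues a FetchResults RPC against the session. A larger result needs more of these round trips, and any problem with the session surfaces only then, inside as_pandas, rather than at execute time. The sketch below models that pattern without a server. FakeCursor and rows_as_dicts are hypothetical illustration names, not impyla APIs, and the batching is a simplified assumption based only on the traceback.

```python
class FakeCursor(object):
    """Hypothetical stand-in for an impyla cursor (not the real class).

    Rows are fetched lazily in batches of `buffersize`, mirroring the
    traceback, where dbapi.next() calls rpc.fetch_results() only when
    its internal buffer is empty.
    """

    def __init__(self, names, rows, buffersize=2):
        # DB-API style description: one tuple per column, name first
        self.description = [(name, None) for name in names]
        self._rows = list(rows)
        self._pos = 0
        self.buffersize = buffersize
        self._buffer = []
        self.fetch_calls = 0  # counts simulated FetchResults round trips

    def _fetch_results(self):
        # Simulates one FetchResults RPC returning at most buffersize rows
        self.fetch_calls += 1
        batch = self._rows[self._pos:self._pos + self.buffersize]
        self._pos += len(batch)
        return batch

    def __iter__(self):
        return self

    def __next__(self):
        if not self._buffer:
            self._buffer.extend(self._fetch_results())
            if not self._buffer:
                # An empty batch means the result set is exhausted
                raise StopIteration
        return self._buffer.pop(0)

    next = __next__  # Python 2 spelling, as used in the traceback


def rows_as_dicts(cursor):
    # Same shape as the as_pandas body in the traceback, minus pandas:
    # iterating the cursor is what triggers each fetch.
    names = [metadata[0] for metadata in cursor.description]
    return names, [dict(zip(names, row)) for row in cursor]
```

For example, three rows with buffersize=2 take three simulated fetches (two rows, then one row, then an empty batch that ends iteration):

```python
cur = FakeCursor(["id", "name"], [(1, "a"), (2, "b"), (3, "c")], buffersize=2)
names, records = rows_as_dicts(cur)
print(cur.fetch_calls)  # 3
```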

About this issue

  • State: closed
  • Created 10 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

Sounds like a timeout. You may want to increase your Connection’s timeout.

After a table is invalidated, it can take many Hive metastore calls before the table becomes operational again.