superset: jdbc+hive in sqlalchemy URI is not working

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven’t found one similar

Superset version

0.18.4

Expected results

Using jdbc+hive:// in the SQLAlchemy URI works.

Actual results

The Superset web server raises an exception:

sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:jdbc.hive

Steps to reproduce

  • pip install -U pyhive
  • create a new database in superset using jdbc+hive:// prefix, and then press the test button.
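The error can be reproduced without a Hive server at all: SQLAlchemy splits the scheme before `://` into `dialect+driver` and looks the pair up in its plugin registry as `dialect.driver`, so `jdbc+hive://` asks for a `jdbc.hive` plugin that no installed package provides (PyHive registers `hive`). A minimal sketch of that lookup:

```python
from sqlalchemy.engine.url import make_url
from sqlalchemy.exc import NoSuchModuleError
from sqlalchemy.dialects import registry

# The scheme is parsed as "<backend>+<driver>"
url = make_url("jdbc+hive://localhost:10000/default")
print(url.get_backend_name())  # "jdbc" -- not a real dialect

# Dialect lookup uses the "<backend>.<driver>" entry-point name,
# which is exactly the name shown in the error message above
try:
    registry.load("jdbc.hive")
except NoSuchModuleError as exc:
    print(exc)  # Can't load plugin: sqlalchemy.dialects:jdbc.hive
```

This is why dropping the `jdbc+` prefix and using a plain `hive://` URI resolves the lookup.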

More

I’ve read https://github.com/airbnb/superset/issues/241 and learned that this is a known issue. @shkr had posted a Databricks tutorial to guide newcomers through setting up this jdbc+hive connector, but the link within https://github.com/airbnb/superset/issues/241#issuecomment-234010902 is already gone, and I haven’t been able to find any related information on https://docs.databricks.com/user-guide/getting-started.html

That’s why I’m re-raising this issue, focusing on how to get jdbc+hive:// to work, and hopefully helping make the docs more complete and friendly.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

SQLAlchemy URI: hive://localhost:10000

I solved this problem by following this reference. Hoping it can help 😃

https://pypi.python.org/pypi/PyHive

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('hive://localhost:10000/default')
logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=logs).scalar())

Requirements

Install using

pip install pyhive[hive] for the Hive interface and pip install pyhive[presto] for the Presto interface.
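After installing, a quick sanity check (no Hive server needed) is to confirm that the dialect actually registered with SQLAlchemy; a sketch that degrades gracefully when pyhive is missing:

```python
from sqlalchemy.dialects import registry
from sqlalchemy.exc import NoSuchModuleError

try:
    # pyhive[hive] registers the "hive" entry point under sqlalchemy.dialects
    dialect_cls = registry.load("hive")
    print("hive dialect loaded:", dialect_cls)
except NoSuchModuleError:
    print("hive dialect not found - is pyhive[hive] installed?")
```

If this prints the "not found" branch inside the environment Superset runs in, no URI prefix will work, because the web server resolves the dialect from its own Python environment.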

pyhive==0.5.0 may also raise some errors:

pip install pythrifthiveapi

comment out getProgressUpdate in site-packages/pyhive/hive.py

@JazzChen, I followed the solution you provided, but I still have the same issue: Can’t load plugin: sqlalchemy.dialects:hive.jdbc

I am using superset version 0.20.5

Visualization works.

(screenshots omitted)


Here is my table column setup.

(screenshot omitted)

The type of the time column should be TIMESTAMP, with Is temporal selected.

impyla supports the TIMESTAMP type: https://github.com/cloudera/impyla/blob/master/impala/sqlalchemy.py#L99

_impala_type_to_sqlalchemy_type = {
    'BOOLEAN': BOOLEAN,
    'TINYINT': TINYINT,
    'SMALLINT': SMALLINT,
    'INT': INT,
    'BIGINT': BIGINT,
    'TIMESTAMP': TIMESTAMP,
    'FLOAT': FLOAT,
    'DOUBLE': DOUBLE,
    'STRING': STRING,
    'DECIMAL': DECIMAL}

I used impala://127.0.0.1:10000/default to connect a spark thrift server, and it works.

Yes, I have installed pyhive v0.3.0, and hive:// seems to be working correctly, but there are some other issues relating to pyhive and its dependencies that I can’t get around, so I’m trying SparkSQL on Superset. @xrmx