trino: Fetching all hive metadata fails

Hi all. I’m trying to find a solution to a problem with fetching metadata for all Hive tables. Here is the information.

Environment

trino version - 419

hive version - 2.3.3 or 3.1.3

hive.metastore-timeout=5m

Problem

When I execute the query below, it fails with a Hive metastore timeout error. On Trino 417, the same query completes within 3~4 minutes.

select * from hive.information_schema.tables

Root Cause

https://github.com/trinodb/trino/pull/17127

Since version 418, the logic for fetching all Hive metadata has changed.

For versions < 418, it follows the logic below:

  1. get all schemas
  2. get all tables in each schema
  3. concat the results and return them.

For versions >= 418, it follows the logic below:

  1. get all tables at once.
  2. concat the results and return them.
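
To make the difference concrete, here is a minimal sketch of the two listing strategies in Python. It is only an illustration under assumptions: `FakeMetastore` and its methods (`get_all_schemas`, `get_tables`, `get_all_tables`) are hypothetical names for an in-memory stand-in, not the real Trino or Hive metastore API. It counts calls and tracks the largest single response, which is roughly what the metastore has to build in one go.

```python
from typing import Dict, List, Tuple


class FakeMetastore:
    """Hypothetical in-memory stand-in for a Hive metastore client."""

    def __init__(self, tables_by_schema: Dict[str, List[str]]):
        self._tables = tables_by_schema
        self.calls = 0             # number of requests served
        self.largest_response = 0  # biggest single response, in entries

    def _record(self, size: int) -> None:
        self.calls += 1
        self.largest_response = max(self.largest_response, size)

    def get_all_schemas(self) -> List[str]:
        names = sorted(self._tables)
        self._record(len(names))
        return names

    def get_tables(self, schema: str) -> List[str]:
        names = list(self._tables[schema])
        self._record(len(names))
        return names

    def get_all_tables(self) -> List[Tuple[str, str]]:
        pairs = [(s, t) for s in sorted(self._tables) for t in self._tables[s]]
        self._record(len(pairs))
        return pairs


def list_tables_per_schema(ms: FakeMetastore) -> List[Tuple[str, str]]:
    """Strategy described for < 418: one call per schema, then concat."""
    result: List[Tuple[str, str]] = []
    for schema in ms.get_all_schemas():
        result.extend((schema, table) for table in ms.get_tables(schema))
    return result


def list_tables_at_once(ms: FakeMetastore) -> List[Tuple[str, str]]:
    """Strategy described for >= 418: a single call returning everything."""
    return ms.get_all_tables()
```

In this sketch both strategies return the same rows, but the per-schema version spreads the work over many small responses, while the all-at-once version makes a single request whose response contains every table.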

This change may put too much load on the Hive metastore, which then needs much more memory than before. In my case there are around 500,000 tables, so it definitely puts too much stress on the Hive metastore.

Is there any solution for this?

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 21 (13 by maintainers)

Most upvoted comments

So this is kind of a bug or issue in HMS. However, since we now expose the issue, should we have a kill switch for https://github.com/trinodb/trino/pull/17127? @findepi @huberty89

Hi @kokosing @huberty89. I’ve tested #18274 and confirmed that the query executes successfully. Thanks for the support!

@huberty89 would you like to post a PR and create a kill switch?