duckdb: SET memory_limit command is not working as expected

What happens?

For our usecase we are trying out duckdb. We have the requirement to run a few queries on parquet files and write the result to a new parquet files. These queries are simple sql queries. We have memory constraint. So I want to limit it and also we need performance too. We use java as our language. Before using the connection object I set the below parameters:

Connection conn = DriverManager.getConnection("jdbc:duckdb:");
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("SET memory_limit='6GB'");
            stmt.execute("SET threads TO 10");
        }
        return conn;

This connection we are using a single thread. But this memory limit doesn’t work. We saw it consuming almost 30 GB RAM for some of the instances. There are millions of rows which we are trying to process. But I expect it to may be work slowly but it shouldn’t cross the max_memory_allowed.

Is it because SET command doesn’t work and I have to use PRAGMA command ? Or is it because of the number of threads set to 10 and duckdb assumes that memory as 10 * 6 = 60 GB ? It should’t be.

To Reproduce

I could provide the set of files I am using. Those are some test data. But basically I was trying to merge 2 sets of parquet files using queries. 1 set contains around 77 number of 500 MB files and another set contains may be 10 number of 500 MB files or lesser.

OS:

linux

DuckDB Version:

duckdb_jdbc:0.4.0

DuckDB Client:

Jdbc

Full Name:

Kamaljit Pati

Affiliation:

Fivetran data pipelining company

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 23 (7 by maintainers)

Most upvoted comments

It’s hard to verify why this is happening without knowing your exact workload, but this might be related to the issues we found in glibc’s standard allocator. Are you running on Linux?

We are planning to include jemalloc with the next release. It is also possible to force DuckDB to use jemalloc using e.g. LD_PRELOAD which might be useful as a temporary work-around.

@kamaljit-1991 we’ve actually released 0.5.0 since you created this issue, so can you try with that version instead?

We understand not deploying a development version to production, but could you please do some tests with our latest build as described above? This issue may already be fixed with some of the work done after 0.4.0 was released. Our next release is planned for September 5th, so you don’t have too long to wait!