duckdb: Out of memory (OOM) during window function on large data
What happens?
I’m getting an OOM error on DuckDB 0.8.0 when running the query below against a 1-billion-row table stored in a DuckDB database file. I’m trying to create a new column whose values are ranked within each group defined by the window’s PARTITION BY. For this work I only have access to a machine with about 10 GB of RAM usable by DuckDB (16 GB total), given the security requirements of the data.
I do see that there are several GBs of temp files on disk being created while this query runs.
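For context, the connection is configured roughly like this; the database filename and temp directory below are illustrative placeholders, but the 10 GB cap matches what I described above:
import duckdb
# Open the on-disk database and cap how much memory DuckDB may use.
# The filename and temp directory are illustrative placeholders.
con = duckdb.connect("claims.duckdb")
con.sql("SET memory_limit='10GB'")
con.sql("SET temp_directory='duckdb_temp'")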
Below is the query that causes the error:
con.sql("""create or replace table new_table as SELECT
Pid,
Planid,
Claimid,
'CLAIM' || dense_rank() OVER(PARTITION BY Pid, Planid ORDER BY Claimid) AS Fake_Claimid
FROM big_table;""")
OutOfMemoryException: Out of Memory Error: failed to pin block of size 262KB (10.0GB/10.0GB used)
To Reproduce
The data types are DOUBLE, DOUBLE, and VARCHAR for Patid, Planid, and Claimid respectively.
Patid is fixed length of 15, e.g. 123456789012345
Planid is fixed length of 15, e.g. 123456789012345
Claimid is fixed length of 10, e.g. ABC123ABC12
@hawkfish helped me out by posting a proposed data generator script (below) as a starting point for generating example data.
CREATE TABLE big_table AS
SELECT
    (i % 500)::DOUBLE AS "Patid",
    (i % 5000)::DOUBLE AS "Planid",
    uuid()::VARCHAR AS "Claimid"
FROM range(1000000000) tbl(i);
I have also attached a Python-based faker script that gets pretty close as well, but it writes to Parquet as an intermediate step instead of generating the data directly within DuckDB. It produces about 10 million rows every ~90 seconds on a decent computer.
pip install -r requirements.txt
# Generate fake data
import os
import random

import pyarrow as pa
import pyarrow.parquet as pq
from faker import Faker

# Create a Faker instance
fake = Faker()

# Define the total number of rows and the number of rows in each chunk
num_rows = 1_000_000_000
chunk_size = 10_000_000

# Calculate the number of chunks
num_chunks = num_rows // chunk_size

# Cardinality is about 2% of all rows, so the number of unique Pids and unique Planids is 2% of total rows
num_unique_values = int(num_rows * 0.02)

# Generate unique Pid and Planid values
start_number = 100000000000000
values = list(range(start_number, start_number + num_unique_values))

# Create a PyArrow Table schema
fields = [
    pa.field('Pid', pa.int64()),
    pa.field('Planid', pa.int64()),
    pa.field('Claimid', pa.string()),
]
schema = pa.schema(fields)

# Make sure the output directory exists
os.makedirs('faker_data_output', exist_ok=True)

# Generate the data one chunk at a time
for i in range(num_chunks):
    data = {
        'Pid': [random.choice(values) for _ in range(chunk_size)],
        'Planid': [random.choice(values) for _ in range(chunk_size)],
        'Claimid': [''.join(fake.random_letters(length=10)).upper() for _ in range(chunk_size)]
    }
    # Convert the chunk to a PyArrow Table using the schema above
    table = pa.Table.from_pydict(data, schema=schema)
    # Write the chunk to a Parquet file
    pq.write_table(table, f'faker_data_output\\fake_data-{i}.parquet')
    print(f"Wrote chunk {i}")
OS:
Windows 10
DuckDB Version:
0.8.0
DuckDB Client:
Python
Full Name:
Patrick Brady
Affiliation:
University of Michigan
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 24 (10 by maintainers)
Commits related to this issue
- Issue #7808: Partition Using PartitionedTupleData The old code did not spool. This change makes Window and AsOf able to spool. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Add test with limited memory. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Compensate for more limited memory in CI systems. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Estimate maximum hash grouping size from memory budget. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Use verify_external. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData PR Feedback: Use Repartition instead of manual copy. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Replace `CreateShared`. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData * Require 64 bits for 5GB test * Update coverage for code motion. — committed to hawkfish/duckdb by deleted user a year ago
- Merge pull request #8085 from hawkfish/window-memory Issue #7808: Partition Using PartitionedTupleData — committed to duckdb/duckdb by Mytherin a year ago
@lschneiderbauer sorry for the radio silence on this. I got sucked into a couple of other large projects at work. I’ll try to give it another shot soon.
Hmm, I was hoping that my PR would have the built artefacts, but I don’t see them. I think we are very close to merging, though, so watch this space.
@hawkfish
ColumnDataCollection can go to disk. If you create a ColumnDataCollection using this constructor, all of it goes to disk (including the strings, using the same on-the-fly pointer recomputation as TupleDataCollection, no manual swizzling!). If you create a ColumnDataCollection like this, though, it will be in-memory only, no spilling to disk.
I think the problem is that the ColumnDataCollection class doesn’t page to disk (although the sort logic does). I will check with @lnkuiper and maybe we can use the newer TupleDataCollection data structure here.
Just tried with 100 million rows instead of 1 billion, and also set the memory limit to 1GB as a test, basically trying to do 10% of the data processing on 10% of the hardware. Out of memory occurred right away with the 1GB memory limit.
Bumped it up to a 5GB memory limit, kept the data size at 100 million, and that worked. Unfortunately I can’t bump my machine up to 50GB of RAM available to DuckDB to run against the full 1 billion rows.
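For reference, the scaled-down test looked roughly like this, using the generator @hawkfish posted above but with 100 million rows; the small_table name and database filename are just placeholders:
import duckdb

con = duckdb.connect("claims.duckdb")
# 1GB failed right away; 5GB succeeded at this scale.
con.sql("SET memory_limit='1GB'")
# Build a 100-million-row version of the test table.
con.sql("""
    CREATE OR REPLACE TABLE small_table AS
    SELECT
        (i % 500)::DOUBLE AS "Patid",
        (i % 5000)::DOUBLE AS "Planid",
        uuid()::VARCHAR AS "Claimid"
    FROM range(100000000) tbl(i);
""")
# The window query from the original report (column names follow the generator here).
con.sql("""
    CREATE OR REPLACE TABLE new_table AS
    SELECT
        Patid,
        Planid,
        Claimid,
        'CLAIM' || dense_rank() OVER (PARTITION BY Patid, Planid ORDER BY Claimid) AS Fake_Claimid
    FROM small_table;
""")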