duckdb: Out of memory (OOM) during window function on large data
What happens?
I’m getting an OOM error on DuckDB 0.8.0 when running the query below against a 1-billion-row table stored in a DuckDB database file. I’m trying to create a new column whose values are ranked within each group defined by the window’s PARTITION BY. For this work I only have access to a machine with about 10 GB of RAM usable by DuckDB (16 GB total), given the security requirements of the data.
I do see that there are several GBs of temp files on disk being created while this query runs.
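For context, the connection is configured roughly like this; the database filename and temp directory below are illustrative placeholders, but the 10 GB cap matches what I described above:
import duckdb
# Open the on-disk database and cap how much memory DuckDB may use.
# The filename and temp directory are illustrative placeholders.
con = duckdb.connect("claims.duckdb")
con.sql("SET memory_limit='10GB'")
con.sql("SET temp_directory='duckdb_temp'")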
Below is the query that causes the error:
con.sql("""create or replace table new_table as SELECT
Pid,
Planid,
Claimid,
'CLAIM' || dense_rank() OVER(PARTITION BY Pid, Planid ORDER BY Claimid) AS Fake_Claimid
FROM big_table;""")
OutOfMemoryException: Out of Memory Error: failed to pin block of size 262KB (10.0GB/10.0GB used)
To Reproduce
The data types are DOUBLE, DOUBLE, and VARCHAR for Patid, Planid, and Claimid respectively.
Patid is fixed length of 15, e.g. 123456789012345
Planid is fixed length of 15, e.g. 123456789012345
Claimid is fixed length of 10, e.g. ABC123ABC12
@hawkfish helped me out by posting a proposed data generator script (below) as a starting point for generating example data.
CREATE TABLE big_table AS
SELECT
    (i % 500)::DOUBLE AS "Patid",
    (i % 5000)::DOUBLE AS "Planid",
    uuid()::VARCHAR AS "Claimid"
FROM range(1000000000) tbl(i);
I have also attached a Python-based faker script that gets pretty close as well, but it writes to Parquet as an intermediate step instead of generating the data directly within DuckDB. It produces about 10 million rows every ~90 seconds on a decent computer.
pip install -r requirements.txt
# Generate fake data
import os
import random

import pyarrow as pa
import pyarrow.parquet as pq
from faker import Faker

# Create a Faker instance
fake = Faker()

# Define the total number of rows and the number of rows in each chunk
num_rows = 1_000_000_000
chunk_size = 10_000_000

# Calculate the number of chunks
num_chunks = num_rows // chunk_size

# Cardinality is about 2% of all rows, so the number of unique Pids and unique Planids is 2% of total rows
num_unique_values = int(num_rows * 0.02)

# Generate unique Pid and Planid values
start_number = 100000000000000
values = list(range(start_number, start_number + num_unique_values))

# Create a PyArrow Table schema
fields = [
    pa.field('Pid', pa.int64()),
    pa.field('Planid', pa.int64()),
    pa.field('Claimid', pa.string()),
]
schema = pa.schema(fields)

# Make sure the output directory exists
os.makedirs('faker_data_output', exist_ok=True)

# Generate the data one chunk at a time
for i in range(num_chunks):
    data = {
        'Pid': [random.choice(values) for _ in range(chunk_size)],
        'Planid': [random.choice(values) for _ in range(chunk_size)],
        'Claimid': [''.join(fake.random_letters(length=10)).upper() for _ in range(chunk_size)]
    }
    # Convert the chunk to a PyArrow Table using the schema above
    table = pa.Table.from_pydict(data, schema=schema)
    # Write the chunk to a Parquet file
    pq.write_table(table, f'faker_data_output\\fake_data-{i}.parquet')
    print(f"Wrote chunk {i}")
OS:
Windows 10
DuckDB Version:
0.8.0
DuckDB Client:
Python
Full Name:
Patrick Brady
Affiliation:
University of Michigan
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 24 (10 by maintainers)
Commits related to this issue
- Issue #7808: Partition Using PartitionedTupleData The old code did not spool. This change makes Window and AsOf able to spool. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Add test with limited memory. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Compensate for more limited memory in CI systems. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Estimate maximum hash grouping size from memory budget. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Use verify_external. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData PR Feedback: Use Repartition instead of manual copy. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData Replace `CreateShared`. — committed to hawkfish/duckdb by deleted user a year ago
- Issue #7808: Partition Using PartitionedTupleData * Require 64 bits for 5GB test * Update coverage for code motion. — committed to hawkfish/duckdb by deleted user a year ago
- Merge pull request #8085 from hawkfish/window-memory Issue #7808: Partition Using PartitionedTupleData — committed to duckdb/duckdb by Mytherin a year ago
@lschneiderbauer sorry for the radio silence on this. I got sucked into a couple of other large projects at work. I’ll try to give it another shot soon.
Hmm, I was hoping that my PR would have the built artefacts, but I don’t see them. I think we are very close to merging, though, so watch this space.
@hawkfish
ColumnDataCollection can go to disk. If you create a ColumnDataCollection using this constructor, all of it goes to disk (including the strings, using the same on-the-fly pointer recomputation as TupleDataCollection, no manual swizzling!). If you create a ColumnDataCollection like this, though, it will be in-memory only, no spilling to disk.
I think the problem is that the ColumnDataCollection class doesn’t page to disk (although the sort logic does). I will check with @lnkuiper and maybe we can use the newer TupleDataCollection data structure here.
Just tried with 100 million rows instead of 1 billion, and also set the memory limit to 1GB as a test, basically trying to do 10% of the data processing on 10% of the hardware. Out of memory occurred right away with the 1GB memory limit.
Bumped it up to a 5GB memory limit, kept the data size at 100 million, and that worked. Unfortunately I can’t bump my machine up to 50GB of RAM available to DuckDB to run against the full 1 billion rows.
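For reference, the scaled-down test looked roughly like this, using the generator @hawkfish posted above but with 100 million rows; the small_table name and database filename are just placeholders:
import duckdb

con = duckdb.connect("claims.duckdb")
# 1GB failed right away; 5GB succeeded at this scale.
con.sql("SET memory_limit='1GB'")
# Build a 100-million-row version of the test table.
con.sql("""
    CREATE OR REPLACE TABLE small_table AS
    SELECT
        (i % 500)::DOUBLE AS "Patid",
        (i % 5000)::DOUBLE AS "Planid",
        uuid()::VARCHAR AS "Claimid"
    FROM range(100000000) tbl(i);
""")
# The window query from the original report (column names follow the generator here).
con.sql("""
    CREATE OR REPLACE TABLE new_table AS
    SELECT
        Patid,
        Planid,
        Claimid,
        'CLAIM' || dense_rank() OVER (PARTITION BY Patid, Planid ORDER BY Claimid) AS Fake_Claimid
    FROM small_table;
""")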