cayley: bolt: Performance degradation while loading Freebase dump

Description: I’m trying to load the full Freebase dump into Cayley, but load performance keeps degrading, and it looks like the load will never finish.

First, I ran a series of experiments to determine the fastest way to load the data. Here are the results (each cell is the number of minutes it took to load the quad count in the column header; the sketch after the table shows how to read these cumulative numbers as per-slice times). As you can see, Bolt + pq + nosync + 2.5M had the best performance (though I’m not sure nosync contributed at all).

| Configuration | 5M | 10M | 15M | 20M | 25M |
| --- | --- | --- | --- | --- | --- |
| Bolt + nq + 10k | 2 | 7 | 16 | 20 | 29 |
| Bolt + nq + 50k | 2 | 5 | 13 | 15 | 23 |
| Bolt + nq + 100k | 2 | 5 | 12 | 14 | 20 |
| Bolt + nq + 200k | 2 | 5 | 10 | 12 | 17 |
| Bolt + nq + 500k | 2 | 5 | 8 | 10 | 13 |
| Bolt + nq + 500k | 2 | 5 | 8 | 9 | 12 |
| Bolt + nq + 1.25M | 2 | 5 | 7 | 9 | 12 |
| Bolt + nq + 2.5M | 2 | 5 | 7 | 9 | 12 |
| Bolt + nq + 5M | 3 | 5 | 8 | 9 | 12 |
| Bolt + nq + 1M + nosync | 2 | 5 | 7 | 9 | 12 |
| Bolt + nq + 2.5M + nosync | 2 | 5 | 7 | 9 | 11 |
| Bolt + pq.gz + 1.25M | 2 | 4 | 7 | 8 | 10 |
| Bolt + pq + nosync + 1.25M | 2 | 4 | 6 | 7 | 10 |
| Bolt + pq + nosync + 2.5M | 2 | 4 | 6 | 7 | 9 |
| Leveldb + nq + buffer 20 + 10k | 4 | 12 | 26 |   |   |
| Leveldb + pq.gz + buffer 20M + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + nq + buffer 20M + 5M | 2 | 18 |   |   |   |
| Leveldb + pq.gz + buffer 200M + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + pq.gz + buffer 1G + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + pq.gz + buffer 1G + 500k | 1 | 5 | 11 | 13 | 19 |
| Leveldb + pq.gz + buffer 4G + 500k | 1 | 5 | 11 | 13 | 19 |
| Leveldb + pq.gz + buffer 4G + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + pq.gz + buffer 4G + cache 200M + 1.25M | 1 | 8 | 16 | 18 | 28 |
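For reference, the cells above are cumulative, so degradation is easiest to see once they are converted into marginal minutes per 5M-quad slice. A minimal Go sketch of that conversion, using the "Bolt + nq + 10k" row as illustrative input:

```go
package main

import "fmt"

func main() {
	// Cumulative minutes at 5M, 10M, 15M, 20M, and 25M quads
	// ("Bolt + nq + 10k" row from the table above).
	cumulative := []int{2, 7, 16, 20, 29}
	prev := 0
	for i, c := range cumulative {
		fmt.Printf("%d-%dM quads: %d min\n", i*5, (i+1)*5, c-prev)
		prev = c
	}
	// Prints 2, 5, 9, 4, 9 minutes per 5M slice: already trending upward for
	// the small batch size, while the 2.5M-batch rows stay roughly flat.
}
```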

Steps to reproduce the issue:

  1. cayley load -c bolt.yml --verbose=3 -i freebase.pq
  2. bolt.yml:
```yaml
store:
  backend: bolt
  address: bolt
load:
  batch: 2500000
```

Received results: At first everything was OK, but then the load started to slow down. The graph demonstrates it better (image attached to the original issue).

Then I decided to use smaller batches and add nosync (a sketch of what nosync means at the Bolt level follows the config below):

  1. cayley load -c bolt.yml --verbose=3 -i freebase.pq
  2. bolt.yml:
```yaml
store:
  backend: bolt
  address: bolt
  options:
    nosync: true
load:
  batch: 500000
```
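For context, Bolt's nosync option corresponds to setting DB.NoSync on the database handle, which skips the fsync after each commit. A minimal sketch against go.etcd.io/bbolt (the file path is hypothetical; Cayley normally wires this up from the config above):

```go
package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Hypothetical path; Cayley manages its own database file under "address".
	db, err := bolt.Open("bolt/cayley.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Skip fsync after each commit: faster bulk loads, but the most recent
	// transactions can be lost on a crash, so only use this for imports.
	db.NoSync = true
}
```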

Things got a bit better, but not for long (image attached to the original issue).

htop shows the process consuming more and more memory (58 GB after 2 days).

Expected results: Freebase loaded in less than infinity

Output of cayley version or commit hash:

Cayley version: 0.7.5
Git commit hash: cf576babb7db

Environment details:

  • CPU: 8 × Intel® Xeon® CPU @ 2.30GHz
  • Memory: 29 GB
  • OS: Ubuntu 16.04.6 LTS
  • Disk: SSD

Backend database: Bolt (not sure about the version; Cayley handled that for me)

So the question is: am I doing something wrong, or is there a bug somewhere?

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 2
  • Comments: 19 (10 by maintainers)

Most upvoted comments

Just a follow-up: does importing Freebase into Cayley work now?

@hubyhuby Note that Bolt pre-allocates some space, so not all of it is actually used by the database.

@manishrjain Thanks for a suggestion, we will definitely consider that API 😃

The problem is not with Badger specifically, as I mentioned. It’s a problem with Cayley’s legacy write API, which tries to be suitable both for batch uploads and for regular transactions. The new importer can take advantage of Badger’s WriteBatch directly, and it works really well so far!
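As a rough illustration of that idea, a minimal WriteBatch sketch assuming the badger v2 API (not Cayley’s actual importer code):

```go
package importer

import (
	badger "github.com/dgraph-io/badger/v2"
)

// bulkLoad writes key/value pairs through a WriteBatch, which internally
// splits the work across multiple transactions so no single one grows too big.
func bulkLoad(db *badger.DB, kvs map[string][]byte) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel()
	for k, v := range kvs {
		if err := wb.Set([]byte(k), v); err != nil {
			return err
		}
	}
	// Flush blocks until every pending write has been committed.
	return wb.Flush()
}
```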

Thanks for taking the time to test the new version!

Badger died after 20 min and 14M quads (batch size 1k) with: Error: db: failed to load data: Txn is too big to fit into one request

I think we may give up on Badger for now. I’ll need to change the way we import data into it to avoid those large transactions; one way to do that is sketched below.
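One way to avoid the error is Badger’s commit-and-retry pattern: when a transaction reports ErrTxnTooBig, commit it and continue in a fresh one. A sketch assuming badger v2 signatures (the kvPair type and loadInChunks helper are made up for illustration):

```go
package importer

import (
	badger "github.com/dgraph-io/badger/v2"
)

type kvPair struct{ key, value []byte }

// loadInChunks keeps writing until a transaction reports ErrTxnTooBig, then
// commits it and retries the failed write in a fresh transaction.
func loadInChunks(db *badger.DB, kvs []kvPair) error {
	txn := db.NewTransaction(true)
	for _, kv := range kvs {
		err := txn.Set(kv.key, kv.value)
		if err == badger.ErrTxnTooBig {
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			err = txn.Set(kv.key, kv.value)
		}
		if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```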

Bolt failed after 13.5h and 207M quads

We are getting somewhere. But it seems like the SP index may also build up over time. I wonder what in the Freebase schema may cause it? I guess we will find out after a successful import 😃

In any case, I will get back to you after making a few more changes.

First, I want to add instrumentation so we can get more insight into the import process. I will add a “time per batch” metric, so we can get the same graph you’ve built directly from Prometheus/Grafana. I’m also particularly interested in the sizes of the entries in each index, and things like lookup durations may help speed up the import further.
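As a sketch of what such a metric could look like with Prometheus’ client_golang (the metric name and wrapper below are hypothetical, not the actual instrumentation added to Cayley):

```go
package importer

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metric name, exposed through the usual /metrics handler.
var batchDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "cayley_load_batch_duration_seconds",
	Help:    "Time spent writing one batch of quads to the KV backend.",
	Buckets: prometheus.ExponentialBuckets(0.5, 2, 12), // 0.5s up to ~17min
})

func init() { prometheus.MustRegister(batchDuration) }

// timeBatch wraps a single batch write and records how long it took.
func timeBatch(write func() error) error {
	start := time.Now()
	err := write()
	batchDuration.Observe(time.Since(start).Seconds())
	return err
}
```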

And second, I will add an option to set custom indexes for KV, so we can try an “SPO” index instead of “SP”. Since the import was able to proceed further than before, I think the “O” index was indeed the cause of the previous OOM.
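To illustrate the difference (a rough sketch of the idea, not Cayley’s actual key encoding): an “SP” index keys entries by subject and predicate only, so one popular subject/predicate pair accumulates a single huge value listing every matching quad, while an “SPO” index spreads the same data across many small entries.

```go
package importer

import "bytes"

// Rough illustration of the two key layouts; Cayley's real encoding differs.

// spKey groups every quad with the same subject and predicate under one key,
// so its value (a list of quad IDs) can grow without bound for hot predicates.
func spKey(s, p []byte) []byte {
	return bytes.Join([][]byte{[]byte("sp"), s, p}, []byte{0})
}

// spoKey includes the object too, so each key maps only to the few quads that
// share that exact triple: far more keys, but every entry stays small.
func spoKey(s, p, o []byte) []byte {
	return bytes.Join([][]byte{[]byte("spo"), s, p, o}, []byte{0})
}
```

Smaller individual entries avoid the giant value merges that can exhaust memory during import, at the cost of more keys on disk.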

@eawer I made a few changes to indexing in https://github.com/cayleygraph/cayley/pull/816. It should help with the OOM issue, but at the same time it will increase the size of the index on disk, so write performance may suffer. But it’s hard to make any statements because the new indexing strategy should also improve quad lookup performance, which may remove the need for multiple reads during the import process.

I will continue looking into it and will run a few tests locally as well. But help with testing is highly appreciated.

Thanks a lot for testing it @eawer. This definitely looks like a memory issue in the new code path. I will investigate it.