django-watson: ./manage.py buildwatson extremely slow on 0.5 million rows

In my PostgreSQL database there are 438,972 rows that should be tracked by watson. The problem is that a full index build (using the buildwatson management command) is extremely slow.

(cb)clime@vm6879 /srv/www/cb $ time ./manage.py buildwatson

Killed

real    123m22.753s

The process was killed, probably because it hit a system limit. It had been running for more than two hours without finishing.

These are the register commands I use:

  watson.register(Crag, fields=('normalized_name', 'country'))
  watson.register(Member.objects.all(), fields=('normalized_name', 'user', 'country'))
  watson.register(Event, fields=('normalized_name', 'country'))
  watson.register(Route, fields=('normalized_name', 'crag__name', 'crag__normalized_name'))

The majority of the objects are in the Route model (more than 400,000).

I would be very happy if the time could be reduced somehow.

About this issue

  • State: closed
  • Created 11 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

@clime, would you mind sharing the PL/pgSQL script you made? I have a similarly sized database that I need to build an index from.

Yeah, they don’t scale well. On my server machine it finally finished:

(cb)clime@vm6879 /srv/www/cb $ time ./manage.py buildwatson

Refreshed 439000 search entry(s) in u'default' search engine.
Deleted 0 stale search entry(s) in u'default' search engine.
Refreshed 0 search entry(s) in u'admin' search engine.
Deleted 0 stale search entry(s) in u'admin' search engine.

real    1094m11.385s
user    43m48.102s
sys     0m32.725s

Over 18 hours xD, and the server wasn’t under heavy load or anything. On my local machine it is much faster (around 40 minutes on the same data), so disk I/O probably makes the difference. The CPU was at 100% the whole time, but I don’t believe the CPU alone would account for such a gap, and the network is out of the question because the DB runs on the same machine as the application. I am not sure why I am posting this here. There is probably nothing that can be done, but still, 18 hours is a lot, right?
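To put the timings above in perspective, here is a small back-of-the-envelope sketch in Python. It only uses the numbers from the `time` output and the rough 40-minute local figure; the helper name `to_seconds` is made up for illustration. Note that `user` and `sys` count CPU time of the manage.py process itself, not of the PostgreSQL server, so this shows how much of the run the Python side spent waiting but does not by itself identify the bottleneck.

```python
def to_seconds(minutes, seconds=0.0):
    """Convert a `time`-style 'XmY.Zs' reading to seconds (hypothetical helper)."""
    return minutes * 60 + seconds


entries = 439000                      # "Refreshed 439000 search entry(s)"
real_s = to_seconds(1094, 11.385)     # wall-clock time on the server
user_s = to_seconds(43, 48.102)       # CPU time in user mode (manage.py only)
sys_s = to_seconds(0, 32.725)         # CPU time in kernel mode (manage.py only)
local_s = to_seconds(40)              # "around 40 mins" locally, an approximation

server_rate = entries / real_s        # entries indexed per second on the server
local_rate = entries / local_s        # entries indexed per second locally
cpu_fraction = (user_s + sys_s) / real_s  # share of wall time manage.py was on CPU
slowdown = real_s / local_s           # how many times slower the server run was

print(f"server: {real_s / 3600:.1f} h, {server_rate:.1f} entries/s")
print(f"local:  {local_rate:.1f} entries/s")
print(f"manage.py CPU share of wall time: {cpu_fraction:.1%}")
print(f"server run is about {slowdown:.0f}x slower than local")
```

With these inputs the server manages fewer than 7 entries per second against roughly 180 locally, and the manage.py process was on the CPU for only about 4% of the wall-clock time. The rest was spent waiting on something else, which could be the database or disk I/O.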

EDIT: I am also testing whether there is a difference between the first build and subsequent rebuilds.