django-watson: ./manage.py buildwatson extremely slow on 0.5 million rows
In my PostgreSQL database, there are around 438 972 rows that should be tracked by watson. The problem is that a full index build (using the buildwatson management command) is extremely slow.
(cb)clime@vm6879 /srv/www/cb $ time ./manage.py buildwatson
Killed
real 123m22.753s
Here the process was killed, probably because it hit some system limit. It had been running for more than two hours and hadn’t finished.
These are the register calls I use:
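If the kill came from memory exhaustion (an assumption; the output above does not say which limit was hit), one generic way to keep memory bounded during a long job is to work through the rows in fixed-size batches rather than all at once. Below is a minimal pure-Python sketch of that idea. `in_batches` is a hypothetical helper, not part of django-watson or Django, and the `range` merely stands in for the tracked rows:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # Only one batch is materialised in memory at a time.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for the 438 972 tracked rows (hypothetical data, not a real query).
row_ids = range(438_972)

batch_count = 0
largest_batch = 0
for batch in in_batches(row_ids, 1000):
    # A real job would index the batch here; this sketch only counts.
    batch_count += 1
    largest_batch = max(largest_batch, len(batch))

print(batch_count, largest_batch)  # prints "439 1000"
```

Whether batching would help here depends on what actually killed the process, which the log does not show.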
watson.register(Crag, fields=('normalized_name', 'country'))
watson.register(Member.objects.all(), fields=('normalized_name', 'user', 'country'))
watson.register(Event, fields=('normalized_name', 'country'))
watson.register(Route, fields=('normalized_name', 'crag__name', 'crag__normalized_name'))
The majority of the objects are in the Route model (more than 400 000).
I would be very happy if the time could be reduced somehow.
About this issue
- State: closed
- Created 11 years ago
- Comments: 17 (8 by maintainers)
@clime, would you mind sharing the PL/pgSQL script you made? I have a similarly sized database that I need to build an index from.
Yeah, they don’t scale well. On my server machine it has finally finished:
Over 18 hours xD, and the server wasn’t under heavy load or anything. On my local machine it is much faster (around 40 minutes on the same data), so disk I/O probably makes the difference. The CPU was at 100% the whole time, but I don’t believe the CPU alone would account for such a gap, and the network is out of the question because the database runs on the same machine as the application. I am not sure why I am posting this here. There is probably nothing that can be done, but still, 18 hours is a lot, right?
EDIT: I am also testing whether there is a difference between the first build and subsequent rebuilds.
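For a sense of scale, the figures quoted in this thread can be turned into rough throughput numbers. This is only back-of-the-envelope arithmetic: it assumes the 438 972-row count from the original post applies to both runs, and it rounds "over 18 hours" and "around 40 mins" to exactly 18 hours and 40 minutes:

```python
ROWS = 438_972                 # row count from the original post
SERVER_SECONDS = 18 * 60 * 60  # "over 18 hours", taken as exactly 18 h
LOCAL_SECONDS = 40 * 60        # "around 40 mins", taken as exactly 40 min


def rows_per_second(rows: int, seconds: int) -> float:
    """Average indexing throughput over the whole run."""
    return rows / seconds


server_rate = rows_per_second(ROWS, SERVER_SECONDS)
local_rate = rows_per_second(ROWS, LOCAL_SECONDS)
slowdown = SERVER_SECONDS / LOCAL_SECONDS

print(f"server: {server_rate:.1f} rows/s")  # server: 6.8 rows/s
print(f"local: {local_rate:.1f} rows/s")    # local: 182.9 rows/s
print(f"slowdown: {slowdown:.0f}x")         # slowdown: 27x
```

So the server run averaged under 7 rows per second, roughly 27 times slower than the local run on the same data.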