tile38: Question: Reloading a large AOF file (and running out of memory)
This is a by-product of issue #70
- We have an AOF file (produced by a version 1.5.1 instance of Tile38) that is ~17GB on disk and, when it’s all said and done, consumes about 24GB RAM on a 32GB machine. This is an AOF that has not been SHRINK-ed (so maybe that’s the root problem?)
- After backporting to version 1.4.2 we spun up Tile38 and watched it import the AOF file, eventually running out of memory.
- After upgrading to version 1.5.2 we spun up Tile38 and watched it import the AOF file, eventually running out of memory again.
Is this a known-known and, if so, are there any best practices for guarding against running out of memory when starting up a Tile38 server? Is there a relationship between the size of the AOF file on disk and the memory required for both startup and general operations?
(As I write this simply re-feeding the index is not a big deal.)
Cheers,
About this issue
- State: closed
- Created 8 years ago
- Comments: 33 (18 by maintainers)
Commits related to this issue
- optimized idprops field for #71 — committed to tidwall/tile38 by tidwall 8 years ago
@umpc Great tip. I went ahead and removed the map[string]interface{} and added a string field that contains the id+properties. This should be very small in memory without losing the Id. I didn’t actually see an improvement in HeapAlloc from the test bundle that @thisisaaronland provided, though I don’t think features were being used in that dataset.
I also removed the “encoding/json” package in favor of “tidwall/gjson” which sped up the parsing by about 25%.
These changes are posted to the memoptz branch.
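For readers following along, here is a minimal sketch of the idea described above: keeping id+properties as one raw string instead of a decoded map[string]interface{}, and comparing HeapAlloc between the two. This is not the code from the memoptz branch. The type names mapFeature and stringFeature, the sample records, and the record count are all hypothetical, and the sketch uses the standard encoding/json package (rather than tidwall/gjson) only to stay self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
)

// mapFeature is a hypothetical stand-in for the older layout:
// every member is decoded up front into a generic map.
type mapFeature struct {
	members map[string]interface{}
}

// stringFeature is a hypothetical stand-in for the newer layout:
// id+properties are kept as a single raw JSON string.
type stringFeature struct {
	idProps string
}

// ID decodes the id on demand from the raw string.
func (f stringFeature) ID() (string, error) {
	var v struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal([]byte(f.idProps), &v); err != nil {
		return "", err
	}
	return v.ID, nil
}

// heapAlloc forces a collection and returns the live heap size in bytes.
func heapAlloc() int64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return int64(m.HeapAlloc)
}

// sample builds a small, made-up id+properties document.
func sample(i int) string {
	return fmt.Sprintf(`{"id":"%d","properties":{"name":"venue-%d"}}`, i, i)
}

func main() {
	const n = 100000 // arbitrary record count for the comparison

	base := heapAlloc()
	maps := make([]mapFeature, n)
	for i := range maps {
		var m map[string]interface{}
		if err := json.Unmarshal([]byte(sample(i)), &m); err != nil {
			panic(err)
		}
		maps[i] = mapFeature{members: m}
	}
	withMaps := heapAlloc() - base
	runtime.KeepAlive(maps)
	maps = nil // release the maps before the second measurement

	base = heapAlloc()
	strs := make([]stringFeature, n)
	for i := range strs {
		strs[i] = stringFeature{idProps: sample(i)}
	}
	withStrings := heapAlloc() - base
	runtime.KeepAlive(strs)

	fmt.Printf("decoded maps: %d bytes\n", withMaps)
	fmt.Printf("raw strings:  %d bytes\n", withStrings)
}
```

The numbers will vary by Go version and platform, and, as noted above, a dataset whose records carry no id or properties would not be expected to show a difference.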
The average size is probably pretty small when you factor in that most of the records will be venues, meaning points or, on rare occasions, simple polygons. No 3D objects.
That said there are about 4M simple to very complex polygons for postal codes and administrative places:
https://whosonfirst.mapzen.com/spelunker/placetypes/
One of the tenets of Who’s On First is that any given record can have multiple geometries, including simplified or display geometries. That said right now they don’t (have simplified geometries) and some national boundaries are enormous and fiddly – like New Zealand:
https://whosonfirst.mapzen.com/data/856/333/45/85633345.geojson
So most records (20M+) will be very small but some of them will be very large, relative to all the others. Ideally we’d like to be able to index all the places (24M and growing) in a single index rather than breaking things out by placetype or some other grouping.
If you’re looking for test data you can grab stuff here:
https://whosonfirst.mapzen.com/bundles/
Venue bundles are also available, just not listed on the page, because venues are hard (the long version is over here). For example:
https://whosonfirst.mapzen.com/bundles/wof-venue-us-ca-latest-bundle.tar.bz2
All of which should be able to be imported into Tile38 using this:
https://github.com/whosonfirst/go-whosonfirst-tile38/blob/master/README.md#example