elasticsearch-dsl-py: Can't paginate using elasticsearch_dsl

I want to paginate through search results one page at a time, rather than iterate over every hit the way scan from elasticsearch.helpers does.

My current workaround looks something like this:

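from elasticsearch_dsl import Search, connections
from elasticsearch_dsl.response import Response

search = Search()
# ... construct your search query ...
# search_type='scan' is the ES 1.x/2.x scan mode (removed in Elasticsearch 5.0);
# its first response carries only the scroll_id, no hits
result = search.params(search_type='scan', scroll='1m').execute()
scroll_id = result['_scroll_id']
# Now page with the scroll_id, but Elasticsearch.scroll returns plain
# dictionaries, not a Response object, so each page has to be wrapped by hand
client = connections.get_connection()
while True:
    result = Response(search, client.scroll(scroll_id=scroll_id, scroll='1m'))
    if not result.hits:
        break  # no more pages
    scroll_id = result['_scroll_id']
    # ... process this page (result.hits) ...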

There should probably be a helper function that abstracts at least the following part:

client = connections.get_connection()
result = Response(search, client.scroll(scroll_id=scroll_id, scroll='1m'))

Ideally it would also read the scroll_id from the result itself. Users shouldn't have to fetch a client and construct a Response object by hand.

@HonzaKral what do you think? If we agree on an interface, I could implement it, since I will probably need it for my project anyway.

About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 1
  • Comments: 26 (16 by maintainers)

Most upvoted comments

@sbnajardhane you can always pass the preserve_order parameter through to the underlying helper by calling s = s.params(preserve_order=True) before calling s.scan(); that way the sort is still applied even when scanning.

+1 for a built-in way to paginate in elasticsearch-dsl 😃