label-studio: Filtering on Annotation Results is Very Slow
Describe the bug
Filtering on annotation results in Label Studio is very slow and sometimes times out.

On a dataset of ~10k tasks with only “Choice” annotations, running an annotation-results filter takes ~20 seconds.
Example annotation result:
[{"id": "n7JrgyJ0B8", "type": "choices", "value": {"choices": ["abc"]}, "to_name": "image", "from_name": "redacted_1"}, {"id": "mL0b2WOgeD", "type": "choices", "value": {"choices": ["def"]}, "to_name": "image", "from_name": "redacted_2"}]
I looked at the underlying query in Google Cloud Query Insights:
SELECT COUNT(*) AS "__count"
FROM "task_completion"
WHERE ("task_completion"."task_id" IN (
        SELECT U0."id"
        FROM "task" U0
        LEFT OUTER JOIN "task_completion" U2 ON (U0."id" = U2."task_id")
        INNER JOIN "task_completion" U3 ON (U0."id" = U3."task_id")
        INNER JOIN "task_completion" U4 ON (U0."id" = U4."task_id")
        WHERE (U0."project_id" = $1
               AND UPPER(U3."result"::text) LIKE UPPER($2)
               AND UPPER(U4."result"::text) LIKE UPPER($3))
        GROUP BY U0."id")
    AND NOT "task_completion"."was_cancelled")
After experimenting with it, the slow part appears to be the text search over the annotation results: each result is cast to text, upper-cased, and matched with LIKE, once per filter condition:
AND UPPER(U3."result"::text) LIKE UPPER('%"abc"%')
AND UPPER(U4."result"::text) LIKE UPPER('%"def"%'))
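To make the cost concrete, here is a rough Python model of what the inner subquery's two LIKE conditions do. It is a sketch under assumptions, not Label Studio code: tasks are a hypothetical {task_id: [result, ...]} mapping, and json.dumps stands in for Postgres's ::text cast (the real serialization differs slightly, e.g. in spacing). Each condition re-serializes and scans the whole result of every annotation, so the work grows with the total size of all results, which would explain why large brushlabel results time out. A plain substring match can also hit the wrong field, since "image" appears in to_name.

```python
import json


def like_contains(result, pattern_text):
    """Model UPPER(result::text) LIKE UPPER('%<pattern_text>%').

    json.dumps approximates the jsonb-to-text cast; the entire serialized
    result is scanned every time a row is checked.
    """
    return pattern_text.upper() in json.dumps(result).upper()


def filter_tasks_by_text(tasks, *patterns):
    """Return ids of tasks where every pattern matches at least one annotation.

    Mirrors the query's shape: one INNER JOIN per condition (U3, U4), so each
    pattern may be satisfied by a different annotation of the same task.
    `tasks` is a hypothetical {task_id: [result, result, ...]} mapping.
    """
    matched = []
    for task_id, results in tasks.items():
        if all(any(like_contains(r, p) for r in results) for p in patterns):
            matched.append(task_id)
    return matched


# The example annotation result from this issue.
example_result = [
    {"id": "n7JrgyJ0B8", "type": "choices", "value": {"choices": ["abc"]},
     "to_name": "image", "from_name": "redacted_1"},
    {"id": "mL0b2WOgeD", "type": "choices", "value": {"choices": ["def"]},
     "to_name": "image", "from_name": "redacted_2"},
]
tasks = {1: [example_result], 2: [[]]}

print(filter_tasks_by_text(tasks, '"abc"', '"def"'))  # [1]
# False positive: "image" is only a to_name, not a choice, yet it matches.
print(filter_tasks_by_text(tasks, '"image"'))  # [1]
```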
For this particular dataset (10k tasks), the filter takes 20 seconds, even though the annotation results aren’t that large.

For tasks with brushlabel annotations, however, the annotation results are huge and the corresponding filters always time out.

Is there any way to speed this query up?
The options I see to speed this up are:
- Some clever query optimization
- Adjust the task filtering to allow some sort of JSONPath-style querying rather than a full-text search.
- Restructure how annotation results are stored.
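For comparison, a minimal Python sketch of the second and third options, under stated assumptions: a structured ("JSONPath-style") match that inspects only value.choices, optionally restricted to one from_name, and a precomputed inverted index from (from_name, choice) to task ids, so that an AND of conditions becomes a set intersection instead of a scan over every result. The helper names and the {task_id: [result, ...]} task shape are hypothetical, and the brush region below is a made-up placeholder. In PostgreSQL the rough analogue would be a jsonb containment query (@>) backed by a GIN index, assuming the result column is stored as jsonb.

```python
from collections import defaultdict


def choice_matches(result, choice, from_name=None):
    """Structured check: does any 'choices' region select `choice`?

    Only value.choices is inspected, so ids or to_name can't cause false
    positives, and non-choice regions (e.g. brush masks) are skipped after
    a cheap type check instead of being scanned as text.
    """
    for region in result:
        if region.get("type") != "choices":
            continue
        if from_name is not None and region.get("from_name") != from_name:
            continue
        if choice in region.get("value", {}).get("choices", []):
            return True
    return False


def build_choice_index(tasks):
    """Inverted index: (from_name, choice) -> set of task ids.

    `tasks` is a hypothetical {task_id: [result, ...]} mapping. The index is
    built once (or maintained on save) so queries never re-read results.
    """
    index = defaultdict(set)
    for task_id, results in tasks.items():
        for result in results:
            for region in result:
                if region.get("type") == "choices":
                    for choice in region.get("value", {}).get("choices", []):
                        index[(region.get("from_name"), choice)].add(task_id)
    return index


def query_index(index, *conditions):
    """AND together (from_name, choice) conditions via set intersection."""
    sets = [index.get(cond, set()) for cond in conditions]
    return set.intersection(*sets) if sets else set()


example_result = [
    {"id": "n7JrgyJ0B8", "type": "choices", "value": {"choices": ["abc"]},
     "to_name": "image", "from_name": "redacted_1"},
    {"id": "mL0b2WOgeD", "type": "choices", "value": {"choices": ["def"]},
     "to_name": "image", "from_name": "redacted_2"},
]
# Placeholder brush region: never inspected beyond its "type" field.
brush_result = [{"id": "x", "type": "brushlabels", "value": {"rle": [0, 1, 2]},
                 "to_name": "image", "from_name": "mask"}]
tasks = {1: [example_result], 2: [brush_result]}

index = build_choice_index(tasks)
print(query_index(index, ("redacted_1", "abc"), ("redacted_2", "def")))  # {1}
print(choice_matches(example_result, "image"))  # False
```

The trade-off is that the index has to be kept in sync whenever annotations change, in exchange for filter cost that no longer depends on how large each result is.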
Thoughts?
To Reproduce
Steps to reproduce the behavior:
- Create a dataset with ~10k tasks and corresponding “Choice” annotations.
- Click on “Filters” and select “Where annotation results contains <foo>”
Expected behavior
The filter returns results within a few seconds.
Environment:
- Client OS: macOS Big Sur
- Client Browser: Brave | 1.31.87 Chromium: 95.0.4638.54 (Official Build) (x86_64)
- Label Studio Version 1.3.post1
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (8 by maintainers)
I’ve checked this on projects with 100k-500k tasks, and it worked almost immediately.
@ijmiller2 In this case, it’s better to send me a direct message in the Label Studio Slack: https://label-studio.slack.com/ (invite link: https://slack.labelstudio.heartex.com/?source=site)
@csaroff Probably in the first quarter of 2022.