argilla: [Bug] Token Classification emojis cause overlapping spans error & wrong annotations

Describe the bug: This happens when there is a prediction/annotation mismatch and the text contains an emoji, 💚 in my case (I haven’t tested other emojis). [screenshot]

On the UI this shows an error: [screenshot]

I was told clearing all annotations and then annotating and saving works sometimes!

On the server side, it is caused by ValueError: IOB tags cannot handle overlapping spans!

argilla_1        | ERROR:    Exception in ASGI application
argilla_1        | Traceback (most recent call last):
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
argilla_1        |     result = await app(  # type: ignore[func-returns-value]
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
argilla_1        |     return await self.app(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
argilla_1        |     await super().__call__(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
argilla_1        |     await self.middleware_stack(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
argilla_1        |     raise exc
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
argilla_1        |     await self.app(scope, receive, _send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
argilla_1        |     raise exc
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
argilla_1        |     await self.app(scope, receive, sender)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
argilla_1        |     raise e
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
argilla_1        |     await self.app(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
argilla_1        |     await route.handle(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 443, in handle
argilla_1        |     await self.app(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
argilla_1        |     await super().__call__(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
argilla_1        |     await self.middleware_stack(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
argilla_1        |     raise exc
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
argilla_1        |     await self.app(scope, receive, _send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/brotli_asgi/__init__.py", line 85, in __call__
argilla_1        |     await responder(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/gzip.py", line 44, in __call__
argilla_1        |     await self.app(scope, receive, self.send_with_gzip)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 92, in __call__
argilla_1        |     await self.simple_response(scope, receive, send, request_headers=headers)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 147, in simple_response
argilla_1        |     await self.app(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
argilla_1        |     raise exc
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
argilla_1        |     await self.app(scope, receive, sender)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
argilla_1        |     raise e
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
argilla_1        |     await self.app(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
argilla_1        |     await route.handle(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
argilla_1        |     await self.app(scope, receive, send)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
argilla_1        |     response = await func(request)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 235, in app
argilla_1        |     raw_response = await run_endpoint_function(
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
argilla_1        |     return await dependant.call(**values)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/argilla/server/apis/v0/handlers/token_classification.py", line 125, in bulk_records
argilla_1        |     result = await service.add_records(
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/argilla/server/services/tasks/token_classification/service.py", line 64, in add_records
argilla_1        |     failed = await self.__storage__.store_records(
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/argilla/server/services/storage/service.py", line 66, in store_records
argilla_1        |     record.metrics = metrics.record_metrics(record)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/argilla/server/services/tasks/token_classification/metrics.py", line 296, in record_metrics
argilla_1        |     annotated_tags = cls._compute_iob_tags(span_utils, record.annotation) or []
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/argilla/server/services/tasks/token_classification/metrics.py", line 340, in _compute_iob_tags
argilla_1        |     return span_utils.to_tags(spans)
argilla_1        |   File "/usr/local/lib/python3.9/site-packages/argilla/utils/span_utils.py", line 163, in to_tags
argilla_1        |     raise ValueError("IOB tags cannot handle overlapping spans!")
argilla_1        | ValueError: IOB tags cannot handle overlapping spans!

Steps to reproduce: I’m using character-level tokens:

import argilla as rg

text = "💚abcd💚efgjklm.04 / abcde fgjklm (aaaaa)"
# Character-level tokens: one token per non-whitespace character of the text.
tokens = ["💚", "a", "b", "c", "d", "💚", "e", "f", "g", "j", "k", "l", "m", ".", "0", "4", "/", "a", "b", "c", "d", "e", "f", "g", "j", "k", "l", "m", "(", "a", "a", "a", "a", "a", ")"]
# Spans are (label, start, end) offsets into the text, counted in code points.
entities = [("A", 6, 13), ("B", 25, 31), ("C", 33, 38)]
record = rg.TokenClassificationRecord(
    text=text,
    tokens=tokens,
    prediction=entities,
    prediction_agent="annotator",
    # annotation=entities,
    # annotation_agent="old",
    metadata={},
    status="Default",
    id=0,
)
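As a sanity check, the record above looks consistent on the Python side. The sketch below uses only plain Python (no Argilla calls); `has_overlap` is a hypothetical helper written for this illustration and is not Argilla's own check. It shows that the spans select clean substrings and do not overlap as submitted, which suggests the overlap is introduced later.

```python
text = "💚abcd💚efgjklm.04 / abcde fgjklm (aaaaa)"
entities = [("A", 6, 13), ("B", 25, 31), ("C", 33, 38)]

# Character tokens are the non-whitespace code points of the text.
char_tokens = [ch for ch in text if not ch.isspace()]

# Python slices by code point, so every span selects a clean substring.
selected = {label: text[start:end] for label, start, end in entities}


def has_overlap(spans):
    """Hypothetical helper: True if any two (label, start, end) spans overlap."""
    ordered = sorted(spans, key=lambda span: span[1])
    return any(prev[2] > nxt[1] for prev, nxt in zip(ordered, ordered[1:]))


print(selected)               # {'A': 'efgjklm', 'B': 'fgjklm', 'C': 'aaaaa'}
print(has_overlap(entities))  # False
```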

⚠️ This also causes the prediction alignment to be off by one character on the UI.

Environment (please complete the following information):

  • OS [e.g. iOS]: linux
  • Browser [e.g. chrome, safari]: chrome
  • Argilla Version [e.g. 1.0.0]: 1.2.1
  • ElasticSearch Version [e.g. 7.10.2]: elasticsearch:8.5.3
  • Docker Image (optional) [e.g. argilla:v1.0.0]: argilla-server:v1.2.1
  • Python 3.9.12

Additional context: If you validate multiple records at once and one of them fails, the others fail too. If the annotator misses the toast error message and moves on to the next page, their annotations on the previous page are lost. Maybe we could show an alert() popup when there are unsaved annotations and the user tries to navigate away?

About this issue

  • State: open
  • Created a year ago
  • Comments: 21 (19 by maintainers)

Most upvoted comments

I would strongly suggest not straying from the norm of using len() and list() on the Python side (i.e. counting code points), because that is how most tokenization libraries work (transformers, spaCy, etc.). Also, the grapheme library can become outdated when a new Unicode version is released. I saw a very old PR for a native grapheme splitter in CPython, but who knows when that will be adopted.

We should use graphemes only on the UI side because that is what makes sense to human eyes 👀 😄

I’ve found a related issue.

import argilla as rg

records = [
    rg.TokenClassificationRecord(
        text="I ❤️ you", tokens=["I", "❤️", "you"], prediction=[("I", 0, 1), ("emoji", 2, 4), ("you", 5, 8)]
    ),
    rg.TokenClassificationRecord(
        text="I 💚 you", tokens=["I", "💚", "you"], prediction=[("I", 0, 1), ("emoji", 2, 3), ("you", 4, 7)]
    ),
    rg.TokenClassificationRecord(
        text="I h you", tokens=["I", "h", "you"], prediction=[("I", 0, 1), ("emoji", 2, 3), ("you", 4, 7)]
    ),
]

rg.delete("issue_2353_emoji")
rg.log(records, "issue_2353_emoji")

produces: [screenshot] [screenshot]

Note also the following awkward Python behaviour:

>>> len("h")
1
>>> len("💚")
1
>>> len("❤️")
2
>>> 
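The difference becomes less surprising once the strings are broken into code points. A small sketch using only the standard library (`describe` is a name chosen for this example): the green heart is a single code point, while the red heart as typed here is a base character followed by a variation selector.

```python
import unicodedata


def describe(s):
    """List each code point of s as (U+XXXX, official Unicode name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in s]


green_heart = "\U0001F49A"   # 💚
red_heart = "\u2764\ufe0f"   # ❤️ (heart plus emoji variation selector)

print(describe(green_heart))
# [('U+1F49A', 'GREEN HEART')]
print(describe(red_heart))
# [('U+2764', 'HEAVY BLACK HEART'), ('U+FE0F', 'VARIATION SELECTOR-16')]
```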

bump

But that is how Python counts too: it counts code points. How humans perceive a single letter (A, B, C, etc.; think of this as a grapheme), how a single grapheme is represented (by one or more Unicode code points), and how those code points are encoded (UTF-16, UTF-8) are all different things. Also, the problem is not limited to emojis. It affects everything that is represented by more than one code point (some non-English letters, accented characters, emojis), but emojis are an easy way to debug it.
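To make the three layers concrete, here is a minimal sketch that counts the same string in code points, UTF-16 code units, and UTF-8 bytes. `unit_counts` is a hypothetical helper for this illustration; grapheme counts are left out because the standard library has no grapheme segmenter.

```python
def unit_counts(s):
    """Count a string in three different units."""
    return {
        "code_points": len(s),                               # what Python's len() sees
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,  # what JS .length sees
        "utf8_bytes": len(s.encode("utf-8")),
    }


samples = {
    "h": "h",
    "green heart": "\U0001F49A",
    "red heart + VS16": "\u2764\ufe0f",
    "e + combining acute": "e\u0301",
}
for name, s in samples.items():
    print(name, unit_counts(s))
# h {'code_points': 1, 'utf16_code_units': 1, 'utf8_bytes': 1}
# green heart {'code_points': 1, 'utf16_code_units': 2, 'utf8_bytes': 4}
# red heart + VS16 {'code_points': 2, 'utf16_code_units': 2, 'utf8_bytes': 6}
# e + combining acute {'code_points': 2, 'utf16_code_units': 2, 'utf8_bytes': 3}
```

Only the green heart gives different answers for code points and UTF-16 code units, because it lies outside the Basic Multilingual Plane and needs a surrogate pair.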


This was also helpful: https://stackoverflow.com/a/51422499/3726119

On the UI side we want graphemes (an Intl.Segmenter() polyfill can work for this); on the backend we want to count code points, as Python does. JS natively (just doing "str".length) counts UTF-16 code units instead.
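One way to bridge the two conventions is to convert offsets at the boundary. The following is only a sketch of that idea, written in Python for consistency with the rest of the thread; both function names are hypothetical and nothing here is taken from Argilla's code base.

```python
def codepoint_to_utf16_offset(text, index):
    """Convert a code point offset (Python) to a UTF-16 code unit offset (JS)."""
    return len(text[:index].encode("utf-16-le")) // 2


def utf16_to_codepoint_offset(text, unit_index):
    """Convert a UTF-16 code unit offset (JS) to a code point offset (Python).

    An offset that falls inside a surrogate pair is rounded up to the next
    code point boundary.
    """
    units = 0
    for i, ch in enumerate(text):
        if units >= unit_index:
            return i
        # Code points above U+FFFF take two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


text = "💚abcd💚efgjkl"
start, end = 6, 9  # "efg" in code points
js_span = (codepoint_to_utf16_offset(text, start), codepoint_to_utf16_offset(text, end))
print(js_span)  # (8, 11)
print(tuple(utf16_to_codepoint_offset(text, i) for i in js_span))  # (6, 9)
```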

@leiyre I’ve pushed my latest example to dev under issue_2353_emoji. I think you should be able to access it there now.

[screenshot] And this seems to be the issue. In Python, the green heart has length 1, while on the JS side it has length 2. As a result, the prediction start and end offsets computed in Python slice differently in JS. Note, for example, that in my last comment I had to use different prediction spans to select only “efg”.
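The mismatch can be reproduced without a browser by slicing in UTF-16 code units from Python. This is a rough emulation of how JS indexes strings, not the UI's actual code, and `js_style_slice` is a name made up for this example.

```python
def js_style_slice(text, start, end):
    """Slice text by UTF-16 code unit offsets, as JS string indexing does."""
    units = text.encode("utf-16-le")  # two bytes per code unit
    # errors="replace" covers slices that cut a surrogate pair in half.
    return units[2 * start:2 * end].decode("utf-16-le", errors="replace")


text = "💚abcd💚efgjkl"

print(text[6:9])                     # efg  (Python, code points)
print(js_style_slice(text, 6, 9))    # 💚e  (same offsets, UTF-16 code units)
print(js_style_slice(text, 8, 11))   # efg  (offsets shifted by the two hearts)
```

Each green heart before the span shifts the UTF-16 offsets by one extra unit, so the same numbers select different text on the two sides.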

I’ve expanded on @cceyda’s useful example with the red heart to show that the issue only exists with the green one:

import argilla as rg

rg.delete("issue_2353_emoji")

tokens = ["💚", "a", "b", "c", "d", "💚", "e", "f", "g", "j", "k", "l"]
text = "💚abcd💚efgjkl"
entities = [("A", 6, 9)]
assert text[6:9] == "efg"
record = rg.TokenClassificationRecord(
    text=text,
    tokens=tokens,
    prediction=entities,
)

rg.log(record, "issue_2353_emoji")

tokens = ["❤️", "a", "b", "c", "d", "❤️", "e", "f", "g", "j", "k", "l"]
text = "❤️abcd❤️efgjkl"
entities = [("A", 8, 11)]
assert text[8:11] == "efg"
record = rg.TokenClassificationRecord(
    text=text,
    tokens=tokens,
    prediction=entities,
)

rg.log(record, "issue_2353_emoji")

produces: [screenshot] [screenshot]