label-studio: Uploading individual .txt files as individual tasks fails in multiple ways

Describe the bug For the NER use case where each text file is a separate task, the export of annotated data contains only the text file names in place of the text content, even though labelling the content itself works without issues.

To Reproduce Steps to reproduce the behavior:

  1. Set up an NER labelling task with the following config. I used valueType="url" to have each .txt file as a separate task, based on the recommendation at https://labelstud.io/guide/tasks.html#Plain-text, which reads: "If you want to import entire plain text files without each line becoming a new labeling task, customize the labeling configuration to specify valueType="url" in the Text tag. See the Text tag documentation"
<View>
  <Labels name="label" toName="text">
    <Label value="label1" background="#FFA39E"/>
    <Label value="label2" background="#D4380D"/>
    <Label value="label3" background="#FFC069"/>
    <Label value="label4" background="#AD8B00"/>
    <Label value="label5" background="#D3F261"/>
    <Label value="label6" background="#389E0D"/>
    <Label value="labe7" background="#5CDBD3"/>
  </Labels>
  <Text name="text" value="$text" valueType="url"/>
</View>
  2. Upload a few .txt files and import them as Time Series (only the Time Series option keeps each .txt file as a separate task)
  3. Annotate the data
  4. Try to export the annotated data through the API or the UI, in any format. The exported data does not contain the content of the .txt files
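
A quick way to confirm the symptom from the last step is to scan an exported JSON file for tasks whose text field holds a file reference rather than the document text. The sketch below is not part of Label Studio; the function names `looks_like_file_reference` and `find_reference_only_tasks` are hypothetical, the path in the sample data is made up, and it assumes the export is a list of task objects with a `data` dictionary keyed by `text` (matching value="$text" in the config above).

```python
from urllib.parse import urlparse


def looks_like_file_reference(value: str) -> bool:
    """Heuristic: a single-line value ending in .txt that is a URL or an absolute path."""
    value = value.strip()
    if "\n" in value or not value.lower().endswith(".txt"):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") or value.startswith("/")


def find_reference_only_tasks(tasks, key="text"):
    """Return the IDs of tasks whose data[key] is a file reference, not content."""
    return [
        task.get("id")
        for task in tasks
        if isinstance(task.get("data", {}).get(key), str)
        and looks_like_file_reference(task["data"][key])
    ]


if __name__ == "__main__":
    # Illustrative data only; the path below is invented for this example.
    exported = [
        {"id": 1, "data": {"text": "/data/upload/my-doc.txt"}},
        {"id": 2, "data": {"text": "this is the first line\nthis is the second line"}},
    ]
    print(find_reference_only_tasks(exported))
```

Any task ID this returns is one where the export carries only the file name or path, which is the behavior reported here.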

Expected behavior

  1. The text content is visible in the exported data
  2. When uploading .txt files, there is an option to import them as .txt or .txtl (text or text lines, similar to json or jsonl)
  3. Perhaps a new valueType param in Text tag to indicate text or textline

Additional context https://label-studio.slack.com/archives/C01SKFX54QK/p1621488696013000

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (7 by maintainers)

Most upvoted comments

Using the latest Docker build, the workaround with valueType="url" works:

<Text name="text" value="$text" granularity="word" valueType="url"/>

Still no solution?

I think I had the same issue, and I want to explain why it is confusing to end users. I wanted to import a plain text document as one task. Let’s say the file is my-doc.txt and its content is:

this is the first line
this is the second line
this is the third line

When I import it into Label Studio, it asks me to choose "Treat CSV/TSV as: List of tasks / Time Series or Whole Text File", which is confusing because I want to import a plain text document, not a CSV/TSV file. But anyway:

When I choose List of tasks, I get 3 tasks, which is not what I expect. I want one task instead of 3.

task1: this is the first line
task2: this is the second line
task3: this is the third line

When I choose Time Series or Whole Text File, I get one task, but the content is not imported; the task holds only the file path:

task1: `/data/my-doc.txt`

What I’m actually expecting is:

task1: this is the first line\nthis is the second line\nthis is the third line

Temporary solution

Convert the plain text into JSON before import:

{
  "id": 1,
  "data": {
    "value": "this is the first line\nthis is the second line\nthis is the third line"
  },
  "annotations": [],
  "predictions": []
}
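
The conversion described above can be scripted. The following is a minimal sketch, assuming each .txt file should become exactly one task and that the data key matches the variable used in the labeling config (`text` for value="$text"; the JSON above uses `value`). The function names `txt_files_to_tasks` and `write_tasks` and the file names are hypothetical, and the output is assumed to be a JSON array of objects with a `data` field.

```python
import json
from pathlib import Path


def txt_files_to_tasks(paths, key="text", encoding="utf-8"):
    """Build one task per file, keeping the whole file content as a single string."""
    tasks = []
    for path in sorted(Path(p) for p in paths):
        content = path.read_text(encoding=encoding)
        tasks.append({"data": {key: content}})
    return tasks


def write_tasks(tasks, out_path):
    """Write the task list to out_path as a JSON array."""
    Path(out_path).write_text(
        json.dumps(tasks, ensure_ascii=False, indent=2), encoding="utf-8"
    )


if __name__ == "__main__":
    import tempfile

    # Self-contained demo: create a sample document, convert it, write tasks.json.
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "my-doc.txt"
        doc.write_text(
            "this is the first line\nthis is the second line\nthis is the third line",
            encoding="utf-8",
        )
        tasks = txt_files_to_tasks([doc])
        write_tasks(tasks, Path(tmp) / "tasks.json")
        print(json.dumps(tasks))
```

Because the file content is embedded in the task itself, this route needs neither valueType="url" nor the Time Series import option, and the exported data carries the text alongside the annotations.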

I don’t know whether we can make this behavior more intuitive for users.

I wanted to do the same thing as you - import a whole text file as a single task. I did what you said in the OP and I can’t reproduce - exporting shows me labels just fine.

  1. I set up a project with the NER template and changed valueType to "url". The preview showed errors that it couldn’t find a file at the URL from the example; the example (text about Jimi Hendrix) is plain text, so no wonder. [screenshot]

I clicked OK anyway (though it’d be cool if the example detected that I used valueType="url" and switched the example text to a URL). The template I used is also in the screenshot.

  2. Upload a .txt file as time series. This was very confusing: I’m uploading a text file, and the app asks me whether it should interpret TSV as time series or as separate tasks, even though I’m uploading plain text, not TSV or CSV. I have a similar finding: uploading as time series is the only way to get Label Studio to interpret the file as a single task.

  3. Annotate some text and save my annotations. I have a feeling maybe you didn’t save after annotating? [screenshot]

  4. Export as CSV: [screenshot]

  5. The exported CSV contains annotations that seem to be correct: [screenshot]

I downloaded the Label Studio Docker image 2 hours ago, so my version is “from today” 😛 Below is more detailed info

# docker exec -it label-studio /bin/bash
root@0a7d94e30c07:/label-studio# pip list | grep label-studio
label-studio               1.0.2         /label-studio
label-studio-converter     0.0.28