simpletransformers: ValueError: too many dimensions 'str'

To Reproduce

Steps to reproduce the behavior:

Here is my Colab notebook, which you can run to see the error: https://gist.github.com/lenyabloko/adcb84ac04e5e10b7391d49b5bd0539c

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-0b9dcdf94c77> in <module>()
     71 
     72 # Train the model
---> 73 model.train_model(train_df)
     74 
     75 # Evaluate the model

1 frames
/usr/local/lib/python3.6/dist-packages/simpletransformers/classification/classification_model.py in train_model(self, train_df, multi_label, output_dir, show_running_loss, args, eval_df, verbose, **kwargs)
    261             ]
    262 
--> 263         train_dataset = self.load_and_cache_examples(train_examples, verbose=verbose)
    264 
    265         os.makedirs(output_dir, exist_ok=True)

/usr/local/lib/python3.6/dist-packages/simpletransformers/classification/classification_model.py in load_and_cache_examples(self, examples, evaluate, no_cache, multi_label, verbose, silent)
    757 
    758         if output_mode == "classification":
--> 759             all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.long)
    760         elif output_mode == "regression":
    761             all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.float)

ValueError: too many dimensions 'str'

The problem arises when using

from simpletransformers.classification import ClassificationModel

import pandas as pd
prefix = '/content/'
train_df = pd.read_csv(prefix + 'train.csv', header=None)
train_df = train_df.drop(index=0)

model = ClassificationModel('roberta', 'roberta-base')

model.train_model(train_df)
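One likely cause in the snippet above, offered as a sketch rather than a confirmed diagnosis: reading a CSV that has a header row with header=None makes pandas treat the header as data, so every column (including the labels) is parsed as strings, and dropping row 0 afterwards does not change that. The CSV contents and the column names "text" and "labels" below are hypothetical stand-ins for train.csv.

```python
import io

import pandas as pd

# Hypothetical two-column CSV with a header row, standing in for train.csv.
csv_text = "text,labels\ngood movie,1\nbad movie,0\n"

# header=None treats the header row as data, so the label values stay strings
# even after the header row is dropped.
raw_df = pd.read_csv(io.StringIO(csv_text), header=None)
raw_df = raw_df.drop(index=0)
print(type(raw_df.iloc[0, 1]))

# header=0 consumes the header row, so the label column is parsed as integers.
clean_df = pd.read_csv(io.StringIO(csv_text), header=0)
print(clean_df["labels"].dtype)
```

If the labels reach torch.tensor(..., dtype=torch.long) as strings, the call fails with the "too many dimensions 'str'" message shown in the traceback.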

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 32 (9 by maintainers)

Most upvoted comments

For anyone else running into this: it is usually caused by having strings in the labels column.
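A minimal sketch of a pre-training check along these lines, assuming the labels are stored as numeric strings in a column named "labels". The helper name coerce_labels and the sample DataFrame are hypothetical and not part of simpletransformers.

```python
import pandas as pd


def coerce_labels(df, column="labels"):
    """Return a copy of df whose label column holds integers."""
    out = df.copy()
    # pd.to_numeric raises ValueError on non-numeric values such as "positive",
    # which surfaces bad labels before training starts.
    out[column] = pd.to_numeric(out[column], errors="raise").astype("int64")
    return out


# Hypothetical training data whose labels were read in as strings.
train_df = pd.DataFrame({"text": ["good movie", "bad movie"], "labels": ["1", "0"]})
fixed_df = coerce_labels(train_df)
print(fixed_df["labels"].dtype)
```

Labels that are not numeric at all (for example "positive" and "negative") would need an explicit mapping to integers instead.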

I figured it out: it comes from train_df["label"]. If you look closer, you actually have a list (of lists) of str, not a list (of lists) of int.

So you have to convert every row with this code: train_df["label"] = train_df["label"].apply(lambda x: list(map(int, x)))

I think I had a similar issue a few weeks ago. The problem was that the classification model expected the labels to start at 0 and count up from there, so I had to replace my labels. I'm not sure, though, whether that was the whole solution or just part of it.
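If the labels do need to be consecutive integers starting at 0, as the comment above suggests, the remapping could look roughly like this. The function name remap_labels and the sample values are hypothetical, and whether this step is required depends on the model configuration.

```python
def remap_labels(labels):
    """Map arbitrary label values to consecutive integers starting at 0."""
    # Sort the distinct labels so the mapping is deterministic.
    mapping = {label: index for index, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Hypothetical labels that neither start at 0 nor run consecutively.
original = [3, 5, 3, 9]
remapped, mapping = remap_labels(original)
print(remapped)
print(mapping)
```

Keeping the mapping around makes it possible to translate predictions back to the original label values.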

I have the same error in simpletransformers-0.34.1. When I have a single text column it works fine, but this error pops up when I add a second text column.

Basically, I can’t reproduce your example https://simpletransformers.ai/docs/sentence-pair-classification/

Out of the box, it gives the error AttributeError: 'InputExample' object has no attribute 'tokens_b'

and when I change model.train_model(train_df) to model.train_model(pd.DataFrame(train_df.values.tolist()[:10])) it gives the error we discuss here: ValueError: too many dimensions 'str'

Then, if I keep only one text column, it’s fine: model.train_model(pd.DataFrame(train_df[['text_a','labels']].values.tolist()[:10]))

Update: there’s no error in v0.32.3, so I’m going with that version.

Same problem, but solved by using all_labels = torch.tensor([int(f.label) for f in features], dtype=torch.long) in my case.

Can you try it in a fresh directory or with "reprocess_input_data": True, in case some cached features are causing the issue?