simpletransformers: ValueError: too many dimensions 'str'
To Reproduce
Steps to reproduce the behavior:
Here is my Colab notebook, which you can run to see the error: https://gist.github.com/lenyabloko/adcb84ac04e5e10b7391d49b5bd0539c
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-0b9dcdf94c77> in <module>()
71
72 # Train the model
---> 73 model.train_model(train_df)
74
75 # Evaluate the model
1 frames
/usr/local/lib/python3.6/dist-packages/simpletransformers/classification/classification_model.py in train_model(self, train_df, multi_label, output_dir, show_running_loss, args, eval_df, verbose, **kwargs)
261 ]
262
--> 263 train_dataset = self.load_and_cache_examples(train_examples, verbose=verbose)
264
265 os.makedirs(output_dir, exist_ok=True)
/usr/local/lib/python3.6/dist-packages/simpletransformers/classification/classification_model.py in load_and_cache_examples(self, examples, evaluate, no_cache, multi_label, verbose, silent)
757
758 if output_mode == "classification":
--> 759 all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.long)
760 elif output_mode == "regression":
761 all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.float)
ValueError: too many dimensions 'str'
The problem arises when using the following code:
from simpletransformers.classification import ClassificationModel
import pandas as pd
prefix = '/content/'
train_df = pd.read_csv(prefix + 'train.csv', header=None)
train_df=train_df.drop(index=0)
model = ClassificationModel('roberta', 'roberta-base')
model.train_model(train_df)
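A likely source of string labels in a snippet like this one is the combination of header=None and drop(index=0): the header row is parsed as data, so every column, including the label column, is read as strings, and dropping that row afterwards does not convert the remaining values back to integers. The sketch below uses a small made-up in-memory CSV (its column names and rows are illustrative, not taken from the notebook) to show the effect and two possible fixes.

```python
import io

import pandas as pd

# Made-up CSV standing in for train.csv; the real file's columns may differ.
csv_text = "text,labels\ngood movie,1\nbad movie,0\n"

# header=None parses the header row as data, so the label column holds strings.
df_bad = pd.read_csv(io.StringIO(csv_text), header=None).drop(index=0)
bad_labels = df_bad[1].tolist()  # str values, not ints

# Fix A: let pandas consume the header row, so labels are parsed as integers.
df_header = pd.read_csv(io.StringIO(csv_text))

# Fix B: keep the original loading code but cast the label column explicitly.
df_cast = df_bad.copy()
df_cast[1] = df_cast[1].astype(int)

print(bad_labels, df_header["labels"].tolist(), df_cast[1].tolist())
```

Either fix should leave the label column with an integer dtype before the frame is passed to train_model.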
About this issue
- State: closed
- Created 4 years ago
- Comments: 32 (9 by maintainers)
For anyone else running into this: it is usually caused by having strings in the labels column.
I figured it out: it comes from train_df["label"]. If you look closely, you actually have a list (of lists) of str, not a list (of lists) of int.
So you have to convert every row with this code:
train_df["label"] = train_df["label"].apply(lambda x: list(map(int, x)))
I think I had a similar issue a few weeks ago. The problem was that the classification model expected the labels to start at 0 and count up from there, so I had to replace my labels. I am not sure, though, whether that was the whole solution or just part of it.
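If the labels are single values that are not already the integers 0 to n-1, one way to replace them is sketched below. The frame, its column names, and the label values are hypothetical; only pandas is used, and the mapping is kept so predictions can be translated back afterwards.

```python
import pandas as pd

# Hypothetical frame whose labels are strings and do not start at 0.
df = pd.DataFrame({"text": ["a", "b", "c", "d"], "labels": ["3", "7", "3", "5"]})

# Sort the distinct labels so the mapping is deterministic, then number them from 0.
classes = sorted(df["labels"].unique())
label_to_id = {label: idx for idx, label in enumerate(classes)}
df["labels"] = df["labels"].map(label_to_id).astype(int)

# Keep the inverse mapping to translate predicted ids back to the original labels.
id_to_label = {idx: label for label, idx in label_to_id.items()}

print(df["labels"].tolist(), id_to_label)
```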
I have the same error in simpletransformers-0.34.1. When I have a single text column it works fine, but this error pops up when I add a second text column.
Basically, I can’t reproduce your example at https://simpletransformers.ai/docs/sentence-pair-classification/
Out of the box it gives the error
AttributeError: 'InputExample' object has no attribute 'tokens_b'
and when I change
model.train_model(train_df)
to
model.train_model(pd.DataFrame(train_df.values.tolist()[:10]))
it gives the error we are discussing here:
ValueError: too many dimensions 'str'
Then, if I keep only one text column, it’s fine:
model.train_model(pd.DataFrame(train_df[['text_a','labels']].values.tolist()[:10]))
Update: there’s no error in v0.32.3, so I am going with that version.
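One possible explanation for the second error in this comment, offered as an assumption rather than something confirmed in this thread: rebuilding the frame with pd.DataFrame(train_df.values.tolist()) discards the column names, and if the library then falls back to reading columns by position, the second column (text_b) would be taken as the label. The pandas-only sketch below, with made-up rows, shows that the names are lost and that column 1 then holds strings.

```python
import pandas as pd

# Made-up sentence-pair data using the text_a / text_b / labels column names.
train_df = pd.DataFrame(
    {
        "text_a": ["A man is eating.", "A dog runs."],
        "text_b": ["Someone eats.", "A cat sleeps."],
        "labels": [1, 0],
    }
)

# Round-tripping through a list of rows drops the column names.
rebuilt = pd.DataFrame(train_df.values.tolist())
print(list(rebuilt.columns))  # positional names instead of text_a, text_b, labels
print(rebuilt[1].tolist())    # the second column is text_b, i.e. strings

# Slicing the original frame keeps the names (and the integer labels) intact.
subset = train_df.iloc[:10]
print(list(subset.columns))
```

If a smaller sample is all that is needed, slicing with iloc as in the last step avoids rebuilding the frame at all.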
Same problem, but solved in my case by using
all_labels = torch.tensor([int(f.label) for f in features], dtype=torch.long)
Can you try it in a fresh directory or with "reprocess_input_data": True, in case some cached features are causing the issue?