System Info
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]), training_set[0]["labels"]):
    print('{0:10} {1}'.format(token, label))
The error I am getting is:
Traceback (most recent call last):
File "C:\Users\1632613\Documents\Anit\NER_Trans\test.py", line 108, in <module>
for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]), training_set[0]["labels"]):
File "C:\Users\1632613\Documents\Anit\NER_Trans\test.py", line 66, in __getitem__
encoding = self.tokenizer(sentence,
File "C:\Users\1632613\AppData\Local\conda\conda\envs\ner\lib\site-packages\transformers\tokenization_utils_base.py", line 2477, in __call__
return self.batch_encode_plus(
File "C:\Users\1632613\AppData\Local\conda\conda\envs\ner\lib\site-packages\transformers\tokenization_utils_base.py", line 2668, in batch_encode_plus
return self._batch_encode_plus(
TypeError: _batch_encode_plus() got an unexpected keyword argument 'is_pretokenized'
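For context: the is_pretokenized keyword was renamed to is_split_into_words in later transformers releases, and the old name was eventually removed entirely, which is why transformers 4.19.2 raises this TypeError. A minimal sketch of the corrected call inside __getitem__ (the dataset class is not shown in the issue, so the surrounding names are assumptions taken from the traceback):

encoding = self.tokenizer(sentence,                 # assumed to be a pre-split list of words
                          is_split_into_words=True, # renamed from is_pretokenized=True
                          padding='max_length',     # hypothetical; mirrors common NER tutorials
                          truncation=True,
                          max_length=self.max_len)  # self.max_len is a hypothetical attribute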
Who can help?
@SaulLu
Reproduction
- Download the NER dataset from the Kaggle link (https://www.kaggle.com/datasets/namanj27/ner-dataset)
- Use the script below with the versions of transformers and tokenizers mentioned further down (a runnable sketch follows the snippet):
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]), training_set[0]["labels"]):
    print('{0:10} {1}'.format(token, label))
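For reference, here is a minimal, self-contained version of the snippet above. training_set is never defined in the issue, so the toy data below is an assumption; the sketch uses is_split_into_words=True (the current name for the removed is_pretokenized) and word_ids() to carry each word-level tag onto its wordpieces:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Toy pre-split sentence with word-level NER tags (stand-in for the Kaggle data).
words = ['John', 'lives', 'in', 'London']
tags = ['B-per', 'O', 'O', 'B-geo']

encoding = tokenizer(words,
                     is_split_into_words=True,  # renamed from is_pretokenized
                     padding='max_length',
                     truncation=True,
                     max_length=16)

# word_ids() maps each wordpiece back to its source word (None for special
# and padding tokens), which lets us propagate the word-level tags.
labels = [tags[i] if i is not None else 'O' for i in encoding.word_ids()]

for token, label in zip(tokenizer.convert_ids_to_tokens(encoding['input_ids']), labels):
    print('{0:10} {1}'.format(token, label))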
Expected behavior
I expect the script above to print each token alongside its label.
Python: 3.9
tokenizers: 0.12.1
transformers: 4.19.2
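One way to confirm that an installed transformers version no longer accepts the old keyword is to inspect the tokenizer's __call__ signature; a small diagnostic sketch, assuming transformers is importable:

import inspect
from transformers import PreTrainedTokenizerBase

# On transformers 4.19.2 the signature lists is_split_into_words but not
# is_pretokenized, i.e. the old keyword was removed, not just deprecated.
params = inspect.signature(PreTrainedTokenizerBase.__call__).parameters
print('is_split_into_words' in params)  # expected: True
print('is_pretokenized' in params)      # expected: False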
Can anyone shed some light, please?
I am having the same problem.
Here is the output of transformers-cli env:
You can also find the Colab notebook here.