bert: Error: Trying to access flag --preserve_unused_tokens before flags were parsed
I had been using the following code without problems until this morning, when bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer) started raising an error.
Please let me know how to fix it.
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization
from tensorflow.contrib import predictor
import pkg_resources
pkg_resources.get_distribution("bert-tensorflow").version
input_words = "Hello"
DATA_COLUMN = "message"
LABEL_COLUMN = "category_label"
test = pd.DataFrame({DATA_COLUMN: [input_words], LABEL_COLUMN : [0]})
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
def create_tokenizer_from_hub_module():
    """Get the vocab file and casing info from the Hub module."""
    with tf.Graph().as_default():
        bert_module = hub.Module(BERT_MODEL_HUB)
        tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
        with tf.Session() as sess:
            vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                                  tokenization_info["do_lower_case"]])
    return bert.tokenization.FullTokenizer(
        vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()
test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None,
                                                                           text_a=x[DATA_COLUMN],
                                                                           text_b=None,
                                                                           label=x[LABEL_COLUMN]), axis=1)
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
label_list = [6,1,2,4,3,5,0]
# Convert our test features to InputFeatures that BERT understands.
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
Error:
INFO:tensorflow:Writing example 0 of 1
UnparsedFlagAccessError: Trying to access flag --preserve_unused_tokens before flags were parsed.
---------------------------------------------------------------------------
UnparsedFlagAccessError Traceback (most recent call last)
<command-35675914> in <module>
16 label_list = [6,1,2,4,3,5,0]
17 # Convert our test features to InputFeatures that BERT understands.
---> 18 test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
19
20 input_ids_list = [x.input_ids for x in test_features]
/databricks/python/lib/python3.7/site-packages/bert/run_classifier.py in convert_examples_to_features(examples, label_list, max_seq_length, tokenizer)
778
779 feature = convert_single_example(ex_index, example, label_list,
--> 780 max_seq_length, tokenizer)
781
782 features.append(feature)
/databricks/python/lib/python3.7/site-packages/bert/run_classifier.py in convert_single_example(ex_index, example, label_list, max_seq_length, tokenizer)
394 label_map[label] = i
395
--> 396 tokens_a = tokenizer.tokenize(example.text_a)
397 tokens_b = None
About this issue
- State: open
- Created 4 years ago
- Comments: 21
Hey, this error is caused by a recent version update of bert-tensorflow. Change pip install bert-tensorflow to pip install bert-tensorflow==1.0.1. This resolves the error by installing the previous version; you can stay on it until the developers fix this issue.
I set the flag manually. I'm not sure this is right, but it made my code work.
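Setting and parsing the flag manually can be sketched with absl like this. (A sketch, not the commenter's exact code: bert-tensorflow defines --preserve_unused_tokens in its tokenization module, but the flag is defined locally here, with an assumed default of False, so the snippet stands alone.)

```python
import sys
from absl import flags

# bert-tensorflow's tokenization.py normally defines this flag;
# it is defined here only so the snippet runs standalone.
if "preserve_unused_tokens" not in flags.FLAGS:
    flags.DEFINE_bool("preserve_unused_tokens", False,
                      "If True, Wordpiece tokenization is not applied to unused tokens.")

# Parse with a dummy argv so later reads of the flag do not raise
# UnparsedFlagAccessError. Do this before calling
# bert.run_classifier.convert_examples_to_features(...).
if not flags.FLAGS.is_parsed():
    flags.FLAGS(["program"])

print(flags.FLAGS.preserve_unused_tokens)  # prints False
```

Parsing with a dummy argv avoids passing your notebook's real sys.argv, which may contain arguments absl does not recognize.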
This fixes the issue
Hello friends, I had this issue with tf 2.8 and bert 1.0.4 as well. I just stuck these lines before the offending call:

import sys
sys.argv = ['preserve_unused_tokens=False']  # or True, if you like
flags.FLAGS(sys.argv)
Cheers!
For me, downgrading bert-tensorflow to 1.0.1 and tensorflow to 2.0.0 worked, but with one workaround. Context: after downgrading the libraries and running my script, this error was thrown:

with tf.gfile.GFile(vocab_file, "r") as reader:
AttributeError: module 'tensorflow' has no attribute 'gfile'

To fix this, I edited the code in site-packages/bert/tokenization.py where gfile was being called and replaced it with with tf.io.gfile.GFile(vocab_file, "r") as reader.
It's not the cleanest, but it worked for me.

Downgraded the bert-tensorflow version, resolved!
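An alternative to editing site-packages is to alias the removed name at the top of your own script; a sketch, assuming TF 2.x, where tf.gfile moved to tf.io.gfile:

```python
import tensorflow as tf

# TF 2.x removed the tf.gfile alias; point it at tf.io.gfile so older
# libraries such as bert-tensorflow 1.0.1 keep working unmodified.
# Run this before importing bert.
if not hasattr(tf, "gfile"):
    tf.gfile = tf.io.gfile
```

This keeps the workaround in your code instead of a locally patched package, so it survives reinstalls.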