tapas: KeyError: '[CLS]' when running create_data on WIKISQL

I’m trying to replicate the published results (to make sure my setup is correct) on the WikiSQL dataset, using the “tapas_wikisql_sqa_masklm_small_reset.zip” model. However, when I run:

!python tapas/tapas/run_task_main.py \
  --task="WIKISQL" \
  --input_dir="data/" \
  --output_dir="results/wsql/input_data" \
  --bert_vocab_file="tapas_model/bert_config.json" \
  --mode="create_data"

I encounter the following error:

I1005 02:55:27.819439 139825599944576 sqa_utils.py:102] Total	Valid	Failed	File
56355	55775	580	train.tsv
8421	8421	0	dev.tsv
15878	15878	0	test.tsv
Creating TF examples ...
I1005 02:55:36.143442 139825599944576 run_task_main.py:152] Creating TF examples ...
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tapas/scripts/prediction_utils.py:48: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
W1005 02:55:36.144381 139825599944576 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tapas/scripts/prediction_utils.py:48: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
I1005 02:55:36.159258 139825599944576 number_annotation_utils.py:149] Can't consolidate types: (None, text: "Current slogan"
) {0: [float_value: 2013.0
, date {
  year: 2013
}
], 1: [], 2: [], 3: [], 4: [], 5: [], 6: []} 1
Traceback (most recent call last):
  File "tapas/tapas/run_task_main.py", line 782, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tapas/tapas/run_task_main.py", line 743, in main
    output_dir=output_dir)
  File "tapas/tapas/run_task_main.py", line 178, in _create_all_examples
    test_mode=test_mode)
  File "tapas/tapas/run_task_main.py", line 231, in _create_examples
    examples.append(converter.convert(interaction, i))
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/tf_example_utils.py", line 1096, in convert
    drop_rows_to_fit=self._drop_rows_to_fit)
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/tf_example_utils.py", line 1018, in _to_trimmed_features
    serialized_example.tokens, feature_dict, table=table, question=question)
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/tf_example_utils.py", line 673, in _to_features
    input_ids = self._to_token_ids(tokens)
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/tf_example_utils.py", line 656, in _to_token_ids
    return self._tokenizer.convert_tokens_to_ids(_get_pieces(tokens))
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/tf_example_utils.py", line 332, in convert_tokens_to_ids
    return self._wp_tokenizer.convert_tokens_to_ids(word_pieces)
  File "/usr/local/lib/python3.6/dist-packages/official/nlp/bert/tokenization.py", line 190, in convert_tokens_to_ids
    return convert_by_vocab(self.vocab, tokens)
  File "/usr/local/lib/python3.6/dist-packages/official/nlp/bert/tokenization.py", line 150, in convert_by_vocab
    output.append(vocab[item])
KeyError: '[CLS]'

Is this an issue with the WikiSQL data itself, or with the conversion utilities in TAPAS?
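For context, the last frame of the traceback is a plain dict lookup: convert_by_vocab in the TF Models tokenization module indexes the vocab directly, with no [UNK] fallback, so a KeyError on '[CLS]' means the loaded vocab simply doesn’t contain the BERT special tokens. I notice my --bert_vocab_file above points at bert_config.json rather than a vocab.txt, which may be the culprit. A minimal sketch of the mechanism (the vocab path is hypothetical):

def load_vocab(path):
    # vocab.txt has one wordpiece per line; the line index is the token id.
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}

def convert_by_vocab(vocab, tokens):
    # Mirrors the lookup in official/nlp/bert/tokenization.py: a bare
    # vocab[token], so a vocab built from the wrong file (e.g. a JSON
    # config) raises KeyError on the special tokens.
    return [vocab[token] for token in tokens]

vocab = load_vocab("tapas_model/vocab.txt")  # hypothetical path from the model zip
print(convert_by_vocab(vocab, ["[CLS]", "hello", "[SEP]"]))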

Most upvoted comments

@alxblandin yes agreed, I’ll get back at it in a bit here. Please ping me if you manage to crack the nut!

@alxblandin, any luck with your predict_and_evaluate run? So far, a lot of sleeping for me:

, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 4.0, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I1005 16:13:07.712592 140186778113920 tpu_context.py:216] _TPUContext: eval_on_tpu True
Sleeping 5 mins before predicting
I1005 16:13:07.713266 140186778113920 run_task_main.py:152] Sleeping 5 mins before predicting
Sleeping 5 mins before predicting
I1005 16:18:07.792725 140186778113920 run_task_main.py:152] Sleeping 5 mins before predicting
Sleeping 5 mins before predicting
I1005 16:23:07.893801 140186778113920 run_task_main.py:152] Sleeping 5 mins before predicting
Sleeping 5 mins before predicting
I1005 16:28:07.994912 140186778113920 run_task_main.py:152] Sleeping 5 mins before predicting

UPDATE: My issue was that I didn’t have the model checkpoint in the right directory, so the script was just sitting there looking for a model every 5 minutes. Once I moved the checkpoint into the right folder, it worked.
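For anyone hitting the same loop: the script polls the model directory and sleeps five minutes whenever no checkpoint is found, so the log above just means nothing is there yet. A quick sanity check (the directory name below is an assumption; adjust it to your --output_dir layout):

import tensorflow as tf

# Hypothetical model directory; run_task_main.py derives the real one
# from --output_dir and the task name.
model_dir = "results/wsql/model"

# The predict loop keeps sleeping until a checkpoint appears here;
# None means the checkpoint files are in the wrong place.
print("latest checkpoint:", tf.train.latest_checkpoint(model_dir))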