transformers: hyperparameter_search raytune: ModuleNotFoundError: No module named 'datasets_modules'

Environment info

  • transformers version: 4.4.2
  • Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.10
  • Python version: 3.8.8
  • PyTorch version (GPU?): 1.6.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help

@richardliaw, @amogkam

Information

Model I am using (Bert, XLNet …): Bert (neuralmind/bert-base-portuguese-cased)

The problem arises when using:

  • the official example scripts: (give details below)
  • [ x ] my own modified scripts: (give details below)

The task I am working on is:

  • [ x ] an official GLUE/SQuAD task: (give the name)
  • [ x ] my own task or dataset: (give details below)

I’m running a modified run_ner example to use trainer.hyperparameter_search with Ray Tune. I’m using my own datasets, but I have run into the same issue with other GLUE scripts and official GLUE datasets, just like other people reported here:

https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/34 https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/35 Colab from @piegu

At first I was using the run_ner script and the transformers version from the current 4.6.0-dev branch, but I ran into the same issue reported here: #11249

So I downgraded transformers and ray to 4.4.2 and 1.2.0 respectively (in a fresh conda environment) and made the necessary adjustments to the run_ner script so that it works with 4.4.2.
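
For reference, a quick sanity check along these lines confirms that the fresh environment really picks up the pinned versions (a minimal sketch; the expected values just mirror the versions quoted above):

# Sanity check of the pinned environment used for this report.
import ray
import transformers

print("transformers:", transformers.__version__)  # expected: 4.4.2
print("ray:", ray.__version__)                    # expected: 1.2.0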

To reproduce

Steps to reproduce the behavior:

This is the full code from the script:

#!/usr/bin/env python
# coding: utf-8


import json
import logging
import os
import sys
import copy

from dataclasses import dataclass, field
from typing import Optional, Dict, Any

import numpy as np
from datasets import ClassLabel, load_dataset, load_metric

from ray import tune
from ray.tune.integration.wandb import WandbLogger
from ray.tune.logger import DEFAULT_LOGGERS
from ray.tune.schedulers import PopulationBasedTraining

import transformers
from transformers import (
    AutoConfig,
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    HfArgumentParser,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
    set_seed,
)
from transformers.trainer_utils import get_last_checkpoint, is_main_process
from transformers.utils import check_min_version

# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.4.0")

logger = logging.getLogger(__name__)


@dataclass
class RayArguments:
    """[summary]
    """

    time_budget_h: str = field(
        metadata={"help": "Time budget in hours."}
    )


@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
    """

    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
    )
    config_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
    )
    tokenizer_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
    )
    cache_dir: Optional[str] = field(
        default=None,
        metadata={"help": "Where do you want to store the pretrained models downloaded from huggingface.co"},
    )
    model_revision: str = field(
        default="main",
        metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
    )
    use_auth_token: bool = field(
        default=False,
        metadata={
            "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script "
                    "with private models)."
        },
    )


@dataclass
class DataTrainingArguments:
    """
    Arguments pertaining to what data we are going to input our model for training and eval.
    """

    task_name: Optional[str] = field(default="ner", metadata={"help": "The name of the task (ner, pos...)."})
    dataset_name: Optional[str] = field(
        default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."}
    )
    dataset_config_name: Optional[str] = field(
        default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."}
    )
    train_file: Optional[str] = field(
        default=None, metadata={"help": "The input training data file (a csv or JSON file)."}
    )
    validation_file: Optional[str] = field(
        default=None,
        metadata={"help": "An optional input evaluation data file to evaluate on (a csv or JSON file)."},
    )
    test_file: Optional[str] = field(
        default=None,
        metadata={"help": "An optional input test data file to predict on (a csv or JSON file)."},
    )
    overwrite_cache: bool = field(
        default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
    )
    preprocessing_num_workers: Optional[int] = field(
        default=None,
        metadata={"help": "The number of processes to use for the preprocessing."},
    )
    pad_to_max_length: bool = field(
        default=False,
        metadata={
            "help": "Whether to pad all samples to model maximum sentence length. "
                    "If False, will pad the samples dynamically when batching to the maximum length in the batch. More "
                    "efficient on GPU but very bad for TPU."
        },
    )
    max_train_samples: Optional[int] = field(
        default=None,
        metadata={
            "help": "For debugging purposes or quicker training, truncate the number of training examples to this "
                    "value if set."
        },
    )
    max_val_samples: Optional[int] = field(
        default=None,
        metadata={
            "help": "For debugging purposes or quicker training, truncate the number of validation examples to this "
                    "value if set."
        },
    )
    max_test_samples: Optional[int] = field(
        default=None,
        metadata={
            "help": "For debugging purposes or quicker training, truncate the number of test examples to this "
                    "value if set."
        },
    )
    label_all_tokens: bool = field(
        default=False,
        metadata={
            "help": "Whether to put the label for one word on all tokens of generated by that word or just on the "
                    "one (in which case the other tokens will have a padding index)."
        },
    )
    return_entity_level_metrics: bool = field(
        default=False,
        metadata={"help": "Whether to return all the entity levels during evaluation or just the overall ones."},
    )

    def __post_init__(self):
        if self.dataset_name is None and self.train_file is None and self.validation_file is None:
            raise ValueError("Need either a dataset name or a training/validation file.")
        else:
            if self.train_file is not None:
                extension = self.train_file.split(".")[-1]
                assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
            if self.validation_file is not None:
                extension = self.validation_file.split(".")[-1]
                assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
        self.task_name = self.task_name.lower()


def compute_objective(metrics: Dict[str, float]) -> float:
    """
    The default objective to maximize/minimize when doing a hyperparameter search. It is the evaluation loss if no
    metrics are provided to the :class:`~transformers.Trainer`, the sum of all metrics otherwise.
    Args:
        metrics (:obj:`Dict[str, float]`): The metrics returned by the evaluate method.
    Return:
        :obj:`float`: The objective to minimize or maximize
    """
    metrics = copy.deepcopy(metrics)
    loss = metrics.pop("eval_loss", None)
    _ = metrics.pop("epoch", None)
    # Remove speed metrics
    speed_metrics = [m for m in metrics.keys() if m.endswith("_runtime") or m.endswith("_samples_per_second")]
    for sm in speed_metrics:
        _ = metrics.pop(sm, None)
    return loss if len(metrics) == 0 else sum(metrics.values())


def main():
    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, RayArguments))
    model_args, data_args, training_args, ray_args = parser.parse_args_into_dataclasses()

    # Detecting last checkpoint.
    last_checkpoint = None
    if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
        last_checkpoint = get_last_checkpoint(training_args.output_dir)
        if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
            raise ValueError(
                f"Output directory ({training_args.output_dir}) already exists and is not empty. "
                "Use --overwrite_output_dir to overcome."
            )
        elif last_checkpoint is not None:
            logger.info(
                f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
                "the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
            )

    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        handlers=[logging.StreamHandler(sys.stdout)],
    )
    logger.setLevel(logging.INFO if is_main_process(training_args.local_rank) else logging.WARN)

    # Log on each process the small summary:
    logger.warning(
        f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
        + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
    )
    # Set the verbosity to info of the Transformers logger (on main process only):
    if is_main_process(training_args.local_rank):
        transformers.utils.logging.set_verbosity_info()
        transformers.utils.logging.enable_default_handler()
        transformers.utils.logging.enable_explicit_format()
    logger.info("Training/evaluation parameters %s", training_args)

    # Set seed before initializing model.
    set_seed(training_args.seed)

    # Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below)
    # or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/
    # (the dataset will be downloaded automatically from the datasets Hub).
    #
    # For CSV/JSON files, this script will use the column called 'text' or the first column if no column called
    # 'text' is found. You can easily tweak this behavior (see below).
    #
    # In distributed training, the load_dataset function guarantees that only one local process can concurrently
    # download the dataset.
    if data_args.dataset_name is not None:
        # Downloading and loading a dataset from the hub.
        datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name)
    else:
        data_files = {}
        if data_args.train_file is not None:
            data_files["train"] = data_args.train_file
        if data_args.validation_file is not None:
            data_files["validation"] = data_args.validation_file
        if data_args.test_file is not None:
            data_files["test"] = data_args.test_file
        extension = data_args.train_file.split(".")[-1]
        datasets = load_dataset(extension, data_files=data_files)
    # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
    # https://huggingface.co/docs/datasets/loading_datasets.html.

    if training_args.do_train:
        column_names = datasets["train"].column_names
        features = datasets["train"].features
    else:
        column_names = datasets["validation"].column_names
        features = datasets["validation"].features
    text_column_name = "tokens" if "tokens" in column_names else column_names[0]
    label_column_name = (
        f"{data_args.task_name}_tags" if f"{data_args.task_name}_tags" in column_names else column_names[1]
    )

    # In the event the labels are not a `Sequence[ClassLabel]`, we will need to go through the dataset to get the
    # unique labels.
    def get_label_list(labels):
        unique_labels = set()
        for label in labels:
            unique_labels = unique_labels | set(label)
        label_list = list(unique_labels)
        label_list.sort()
        return label_list

    if isinstance(features[label_column_name].feature, ClassLabel):
        label_list = features[label_column_name].feature.names
        # No need to convert the labels since they are already ints.
        label_to_id = {i: i for i in range(len(label_list))}
    else:
        label_list = get_label_list(datasets["train"][label_column_name])
        label_to_id = {l: i for i, l in enumerate(label_list)}
    num_labels = len(label_list)

    # Load pretrained model and tokenizer
    #
    # Distributed training:
    # The .from_pretrained methods guarantee that only one local process can concurrently
    # download model & vocab.
    config = AutoConfig.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        num_labels=num_labels,
        finetuning_task=data_args.task_name,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        use_fast=True,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
        model_max_length=512
    )
    model = AutoModelForTokenClassification.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )

    # Tokenizer check: this script requires a fast tokenizer.
    if not isinstance(tokenizer, PreTrainedTokenizerFast):
        raise ValueError(
            "This example script only works for models that have a fast tokenizer. Checkout the big table of models "
            "at https://huggingface.co/transformers/index.html#bigtable to find the model types that meet this "
            "requirement"
        )

    # Preprocessing the dataset
    # Padding strategy
    padding = "max_length" if data_args.pad_to_max_length else False

    # Tokenize all texts and align the labels with them.
    def tokenize_and_align_labels(examples):
        tokenized_inputs = tokenizer(
            examples[text_column_name],
            padding=padding,
            truncation=True,
            # We use this argument because the texts in our dataset are lists of words (with a label for each word).
            is_split_into_words=True,
        )
        labels = []
        for i, label in enumerate(examples[label_column_name]):
            word_ids = tokenized_inputs.word_ids(batch_index=i)
            previous_word_idx = None
            label_ids = []
            for word_idx in word_ids:
                # Special tokens have a word id that is None. We set the label to -100 so they are automatically
                # ignored in the loss function.
                if word_idx is None:
                    label_ids.append(-100)
                # We set the label for the first token of each word.
                elif word_idx != previous_word_idx:
                    label_ids.append(label_to_id[label[word_idx]])
                # For the other tokens in a word, we set the label to either the current label or -100, depending on
                # the label_all_tokens flag.
                else:
                    label_ids.append(label_to_id[label[word_idx]] if data_args.label_all_tokens else -100)
                previous_word_idx = word_idx

            labels.append(label_ids)
        tokenized_inputs["labels"] = labels
        return tokenized_inputs

    if training_args.do_train:
        if "train" not in datasets:
            raise ValueError("--do_train requires a train dataset")
        train_dataset = datasets["train"]
        if data_args.max_train_samples is not None:
            train_dataset = train_dataset.select(range(data_args.max_train_samples))
        train_dataset = train_dataset.map(
            tokenize_and_align_labels,
            batched=True,
            num_proc=data_args.preprocessing_num_workers,
            load_from_cache_file=not data_args.overwrite_cache,
        )

    if training_args.do_eval:
        if "validation" not in datasets:
            raise ValueError("--do_eval requires a validation dataset")
        eval_dataset = datasets["validation"]
        if data_args.max_val_samples is not None:
            eval_dataset = eval_dataset.select(range(data_args.max_val_samples))
        eval_dataset = eval_dataset.map(
            tokenize_and_align_labels,
            batched=True,
            num_proc=data_args.preprocessing_num_workers,
            load_from_cache_file=not data_args.overwrite_cache,
        )

    if training_args.do_predict:
        if "test" not in datasets:
            raise ValueError("--do_predict requires a test dataset")
        test_dataset = datasets["test"]
        if data_args.max_test_samples is not None:
            test_dataset = test_dataset.select(range(data_args.max_test_samples))
        test_dataset = test_dataset.map(
            tokenize_and_align_labels,
            batched=True,
            num_proc=data_args.preprocessing_num_workers,
            load_from_cache_file=not data_args.overwrite_cache,
        )

    # Data collator
    data_collator = DataCollatorForTokenClassification(tokenizer, pad_to_multiple_of=8 if training_args.fp16 else None)

    # Metrics
    metric = load_metric("seqeval")

    def compute_metrics(p):
        predictions, labels = p
        predictions = np.argmax(predictions, axis=2)

        # Remove ignored index (special tokens)
        true_predictions = [
            [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
            for prediction, label in zip(predictions, labels)
        ]
        true_labels = [
            [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
            for prediction, label in zip(predictions, labels)
        ]

        results = metric.compute(predictions=true_predictions, references=true_labels)
        if data_args.return_entity_level_metrics:
            # Unpack nested dictionaries
            final_results = {}
            for key, value in results.items():
                if isinstance(value, dict):
                    for n, v in value.items():
                        final_results[f"{key}_{n}"] = v
                else:
                    final_results[key] = value
            return final_results
        else:
            return {
                "precision": results["overall_precision"],
                "recall": results["overall_recall"],
                "f1": results["overall_f1"],
                "accuracy": results["overall_accuracy"],
            }

    def model_init():
        model = AutoModelForTokenClassification.from_pretrained(
            model_args.model_name_or_path,
            from_tf=bool(".ckpt" in model_args.model_name_or_path),
            config=config,
            cache_dir=model_args.cache_dir,
            revision=model_args.model_revision,
            use_auth_token=True if model_args.use_auth_token else None,
        )
        return model

    class CustomTrainer(Trainer):
        """Trainer subclass that strips the Ray Tune wandb logging config from each
        trial dict before the base class maps the remaining keys onto TrainingArguments."""

        def __init__(self, *args, **kwargs):
            super(CustomTrainer, self).__init__(*args, **kwargs)

        def _hp_search_setup(self, trial: Any):
            try:
                # "wandb" only configures the WandbLogger; it is not a TrainingArguments
                # field, so drop it before the default setup tries to set it.
                trial.pop('wandb', None)
            except AttributeError:
                # trial is not a dict (e.g. a different backend), nothing to strip.
                pass
            super(CustomTrainer, self)._hp_search_setup(trial)

    # Initialize our Trainer
    trainer = CustomTrainer(
        model_init=model_init,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )

    # Hyperparameter Search
    def hp_space_fn(*args, **kwargs):
        config = {
            "seed": tune.choice([42, 43, 44]),
            "weight_decay": tune.choice([0.0, 0.1, 0.2, 0.3]),
            "adam_epsilon": tune.choice([1e-6, 1e-7, 1e-8]),
            "max_grad_norm": tune.choice([1.0, 2.0]),
            "warmup_steps": tune.choice([50, 100, 500, 1000]),
            "learning_rate": tune.choice([2e-5, 3e-5, 4e-5, 5e-5]),
            "num_train_epochs": tune.quniform(0.0, 8.0, 0.5),
        }
        wandb_config = {
            "wandb": {
                "project": "hf-ner-testing",
                "api_key": os.environ.get("API_KEY"),
                "log_config": True
            }
        }
        config.update(wandb_config)
        return config

    time_budget_s = 60 * 60 * int(ray_args.time_budget_h)  # convert hours to seconds

    best_run = trainer.hyperparameter_search(
        direction="maximize",
        backend="ray",
        scheduler=PopulationBasedTraining(
            time_attr='time_total_s',
            metric='eval_f1',
            mode='max',
            perturbation_interval=600.0
        ),
        hp_space=hp_space_fn,
        loggers=DEFAULT_LOGGERS + (WandbLogger,),
        time_budget_s=time_budget_s,
        keep_checkpoints_num=1,
        checkpoint_score_attr='eval_f1',
        compute_objective=compute_objective
    )

    output_params_file = os.path.join(
        training_args.output_dir,
        "best_run.json"
    )

    with open(output_params_file, "w") as f:
        json.dump(
            best_run.hyperparameters,
            f,
            indent=4)

    return best_run


if __name__ == "__main__":
    main()

And these are the args I used for running it:

--model_name_or_path neuralmind/bert-base-portuguese-cased
--train_file train.json
--validation_file dev.json
--output_dir output
--do_train
--do_eval
--evaluation_strategy steps
--per_device_train_batch_size=2
--per_device_eval_batch_size=2
--time_budget_h 2

This is the full output log:

/media/discoD/anaconda3/envs/transformers/bin/python /media/discoD/pycharm-community-2019.2/plugins/python-ce/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 38419 --file /media/discoD/repositorios/transformers_pedro/examples/pytorch/token-classification/run_ner_hp_search_442.py --model_name_or_path neuralmind/bert-base-portuguese-cased --train_file train.json --validation_file dev.json --output_dir transformers-hp --do_train --do_eval --evaluation_strategy steps --per_device_train_batch_size=2 --per_device_eval_batch_size=2 --time_budget_h 2
Connected to pydev debugger (build 211.7142.13)
05/03/2021 08:10:04 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
05/03/2021 08:10:04 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=transformers-hp, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.STEPS, prediction_loss_only=False, per_device_train_batch_size=2, per_device_eval_batch_size=2, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/May03_08-10-04_user-XPS-8700, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=transformers-hp, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard', 'wandb'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, _n_gpu=1)
05/03/2021 08:10:04 - WARNING - datasets.builder -   Using custom data configuration default-438421c06175ed26
05/03/2021 08:10:04 - WARNING - datasets.builder -   Reusing dataset json (/home/user/.cache/huggingface/datasets/json/default-438421c06175ed26/0.0.0/83d5b3a2f62630efc6b5315f00f20209b4ad91a00ac586597caee3a4da0bef02)
[INFO|configuration_utils.py:463] 2021-05-03 08:10:06,050 >> loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /home/user/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
[INFO|configuration_utils.py:499] 2021-05-03 08:10:06,063 >> Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "finetuning_task": "ner",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8",
    "9": "LABEL_9",
    "10": "LABEL_10",
    "11": "LABEL_11",
    "12": "LABEL_12"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_10": 10,
    "LABEL_11": 11,
    "LABEL_12": 12,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8,
    "LABEL_9": 9
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "transformers_version": "4.4.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 29794
}

[INFO|configuration_utils.py:463] 2021-05-03 08:10:06,767 >> loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /home/user/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
[INFO|configuration_utils.py:499] 2021-05-03 08:10:06,777 >> Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "transformers_version": "4.4.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 29794
}

[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,936 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/vocab.txt from cache at /home/user/.cache/huggingface/transformers/aa6d50227b77416b26162efcf0cc9e9a702d13920840322060a2b41a44a8aff4.af25fb1e29ad0175300146695fd80069be69b211c52fa5486fa8aae2754cc814
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,936 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer.json from cache at None
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,937 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/added_tokens.json from cache at /home/user/.cache/huggingface/transformers/9188d297517828a862f4e0b0700968574ca7ad38fbc0832c409bf7a9e5576b74.5cc6e825eb228a7a5cfd27cb4d7151e97a79fb962b31aaf1813aa102e746584b
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,937 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/special_tokens_map.json from cache at /home/user/.cache/huggingface/transformers/eecc45187d085a1169eed91017d358cc0e9cbdd5dc236bcd710059dbf0a2f816.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,938 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer_config.json from cache at /home/user/.cache/huggingface/transformers/f1a9ba41d40e8c6f5ba4988aa2f7702c3b43768183e4b82483e04f2848841ecf.a6c00251b9344c189e2419373d6033016d0cd3d87ea59f6c86069046ac81956d
[INFO|modeling_utils.py:1051] 2021-05-03 08:10:10,709 >> loading weights file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin from cache at /home/user/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
[WARNING|modeling_utils.py:1158] 2021-05-03 08:10:13,606 >> Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1169] 2021-05-03 08:10:13,607 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at neuralmind/bert-base-portuguese-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 7/7 [00:02<00:00,  3.06ba/s]
100%|██████████| 2/2 [00:00<00:00,  3.13ba/s]
[INFO|modeling_utils.py:1051] 2021-05-03 08:10:19,160 >> loading weights file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin from cache at /home/user/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
[WARNING|modeling_utils.py:1158] 2021-05-03 08:10:22,280 >> Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1169] 2021-05-03 08:10:22,280 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at neuralmind/bert-base-portuguese-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|trainer.py:482] 2021-05-03 08:10:24,327 >> The following columns in the training set  don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, tokens.
[INFO|trainer.py:482] 2021-05-03 08:10:24,334 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, tokens.
[INFO|integrations.py:184] 2021-05-03 08:10:24,396 >> No `resources_per_trial` arg was passed into `hyperparameter_search`. Setting it to a default value of 1 CPU and 1 GPU for each trial.
2021-05-03 08:10:25,807	INFO services.py:1172 -- View the Ray dashboard at http://127.0.0.1:8265
2021-05-03 08:10:27,788	WARNING function_runner.py:540 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
== Status ==
Memory usage on this node: 21.2/31.4 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 1/8 CPUs, 1/1 GPUs, 0.0/7.67 GiB heap, 0.0/2.64 GiB objects (0/1.0 accelerator_type:GTX)
Result logdir: /home/user/ray_results/_inner_2021-05-03_08-10-27
Number of trials: 1/20 (1 RUNNING)
+--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------+
| Trial name         | status   | loc   |   adam_epsilon |   learning_rate |   max_grad_norm |   num_train_epochs |   seed |   warmup_steps |   weight_decay |
|--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------|
| _inner_2a8cd_00000 | RUNNING  |       |          1e-06 |           4e-05 |               2 |                  3 |     42 |            500 |              0 |
+--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------+


wandb: Currently logged in as: pvcastro (use `wandb login --relogin` to force relogin)
2021-05-03 08:10:31,794	ERROR trial_runner.py:616 -- Trial _inner_2a8cd_00000: Error processing event.
Traceback (most recent call last):
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1456, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train_buffered() (pid=4311, ip=172.16.9.2)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trainable.py", line 167, in train_buffered
    result = self.train()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trainable.py", line 226, in train
    result = self.step()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 366, in step
    self._report_thread_runner_error(block=True)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 512, in _report_thread_runner_error
    raise TuneError(
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train_buffered() (pid=4311, ip=172.16.9.2)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
    self._entrypoint()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
    return self._trainable_func(self.config, self._status_reporter,
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
    output = fn()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
    inner(config, checkpoint_dir=None)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
    fn_kwargs[k] = parameter_registry.get(prefix + k)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
    return ray.get(self.references[k])
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
    self._deserialize_object(data, metadata, object_ref))
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
    obj = pickle.loads(in_band, buffers=buffers)
ModuleNotFoundError: No module named 'datasets_modules'
(pid=4311) 2021-05-03 08:10:31,755	ERROR function_runner.py:254 -- Runner Thread raised error.
(pid=4311) Traceback (most recent call last):
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=4311)     self._entrypoint()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
(pid=4311)     return self._trainable_func(self.config, self._status_reporter,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=4311)     output = fn()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
Result for _inner_2a8cd_00000:
  {}
  
(pid=4311)     inner(config, checkpoint_dir=None)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
(pid=4311)     fn_kwargs[k] = parameter_registry.get(prefix + k)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
(pid=4311)     return ray.get(self.references[k])
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
(pid=4311)     return func(*args, **kwargs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1448, in get
(pid=4311)     values, debugger_breakpoint = worker.get_objects(
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 319, in get_objects
(pid=4311)     return self.deserialize_objects(data_metadata_pairs,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 282, in deserialize_objects
(pid=4311)     return context.deserialize_objects(data_metadata_pairs, object_refs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
(pid=4311)     self._deserialize_object(data, metadata, object_ref))
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
(pid=4311)     return self._deserialize_msgpack_data(data, metadata_fields)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
(pid=4311)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
(pid=4311)     obj = pickle.loads(in_band, buffers=buffers)
(pid=4311) ModuleNotFoundError: No module named 'datasets_modules'
(pid=4311) Exception in thread Thread-2:
(pid=4311) Traceback (most recent call last):
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/threading.py", line 932, in _bootstrap_inner
(pid=4311)     self.run()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 267, in run
(pid=4311)     raise e
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=4311)     self._entrypoint()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
(pid=4311)     return self._trainable_func(self.config, self._status_reporter,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=4311)     output = fn()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
(pid=4311)     inner(config, checkpoint_dir=None)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
(pid=4311)     fn_kwargs[k] = parameter_registry.get(prefix + k)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
(pid=4311)     return ray.get(self.references[k])
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
(pid=4311)     return func(*args, **kwargs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1448, in get
(pid=4311)     values, debugger_breakpoint = worker.get_objects(
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 319, in get_objects
(pid=4311)     return self.deserialize_objects(data_metadata_pairs,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 282, in deserialize_objects
(pid=4311)     return context.deserialize_objects(data_metadata_pairs, object_refs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
(pid=4311)     self._deserialize_object(data, metadata, object_ref))
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
(pid=4311)     return self._deserialize_msgpack_data(data, metadata_fields)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
(pid=4311)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
(pid=4311)     obj = pickle.loads(in_band, buffers=buffers)
(pid=4311) ModuleNotFoundError: No module named 'datasets_modules'
Problem at: /media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/integration/wandb.py 197 run
python-BaseException

CondaError: KeyboardInterrupt


Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 22 (6 by maintainers)

Most upvoted comments

ah yes! will put on todo list.

On Fri, Jun 25, 2021 at 8:14 AM Pedro Vitor Quinta de Castro < @.***> wrote:

@richardliaw https://github.com/richardliaw @amogkam https://github.com/amogkam anyone working on this?


@amogkam I believe the datasets_modules path is added upon load_dataset, which can occur before the creation of the Trainer.

To support this, I think we need to allow the custom path to be added before the invocation of each trial.
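
Concretely, the `datasets_modules` package is generated on the fly by the datasets library under its modules cache (by default something like ~/.cache/huggingface/modules) whenever load_dataset/load_metric prepares a builder, and that path is only registered on sys.path in the driver process. The Ray trial workers never import it, so deserializing the objects the Trainer registered for the trial fails inside the worker. Roughly, the kind of thing that would have to run at the start of every trial, in the worker process, is sketched below. This is only an illustration of the idea, not something you can plug into trainer.hyperparameter_search today, and it assumes datasets.load.init_dynamic_modules() is available in the installed datasets version:

import importlib.util
import os
import sys


def import_datasets_modules_in_worker():
    # Locate (or re-create) the dynamic modules cache on this worker and load its
    # top-level package, so that pickled objects referencing `datasets_modules`
    # can be deserialized. init_dynamic_modules() returns the path of the
    # `datasets_modules` directory and puts its parent directory on sys.path.
    import datasets.load

    dynamic_modules_path = os.path.join(datasets.load.init_dynamic_modules(), "__init__.py")
    spec = importlib.util.spec_from_file_location("datasets_modules", dynamic_modules_path)
    datasets_modules = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = datasets_modules
    spec.loader.exec_module(datasets_modules)

The important part is that this runs in the trial process itself (for example by wrapping the Tune trainable), not in the driver: the traceback above shows the ModuleNotFoundError being raised while the worker deserializes the parameters fetched from the Tune parameter registry.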

What I am saying is just that if you want one of the pre-packaged modules (json, csv, etc.) to be used, you have to make sure that the dataset name passed to load_dataset is “json”, “csv”, …

If, on the contrary, you pass a custom --dataset_name, the library will need to download the corresponding dataset module.
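
To make the distinction concrete, a minimal sketch (the hub dataset name below is just an example):

from datasets import load_dataset

# Pre-packaged builders ("json", "csv", "text", ...) ship inside the datasets
# library itself, so no dynamic `datasets_modules` package is generated:
local_ds = load_dataset("json", data_files={"train": "train.json", "validation": "dev.json"})

# A dataset referenced by name downloads its builder script into the dynamic
# modules cache and registers it under `datasets_modules`, which is exactly the
# package the Ray workers fail to import:
hub_ds = load_dataset("conll2003")

As far as I can tell, in the datasets 1.x releases load_metric goes through the same dynamic modules path, so the load_metric("seqeval") call in the script above is already enough to trigger the problem even though the data files themselves use the packaged json builder.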