datasets: ImportError: cannot import name 'SchemaInferenceError' from 'datasets.arrow_writer' (/opt/conda/lib/python3.10/site-packages/datasets/arrow_writer.py)
Describe the bug
While importing from packages getting the error Code:
import os
import torch
from datasets import load_dataset, Dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
HfArgumentParser,
TrainingArguments,
pipeline,
logging
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
from huggingface_hub import login
import pandas as pd
Error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[5], line 14
4 from transformers import (
5 AutoModelForCausalLM,
6 AutoTokenizer,
(...)
11 logging
12 )
13 from peft import LoraConfig, PeftModel
---> 14 from trl import SFTTrainer
15 from huggingface_hub import login
16 import pandas as pd
File /opt/conda/lib/python3.10/site-packages/trl/__init__.py:21
8 from .import_utils import (
9 is_diffusers_available,
10 is_npu_available,
(...)
13 is_xpu_available,
14 )
15 from .models import (
16 AutoModelForCausalLMWithValueHead,
17 AutoModelForSeq2SeqLMWithValueHead,
18 PreTrainedModelWrapper,
19 create_reference_model,
20 )
---> 21 from .trainer import (
22 DataCollatorForCompletionOnlyLM,
23 DPOTrainer,
24 IterativeSFTTrainer,
25 PPOConfig,
26 PPOTrainer,
27 RewardConfig,
28 RewardTrainer,
29 SFTTrainer,
30 )
33 if is_diffusers_available():
34 from .models import (
35 DDPOPipelineOutput,
36 DDPOSchedulerOutput,
37 DDPOStableDiffusionPipeline,
38 DefaultDDPOStableDiffusionPipeline,
39 )
File /opt/conda/lib/python3.10/site-packages/trl/trainer/__init__.py:44
42 from .ppo_trainer import PPOTrainer
43 from .reward_trainer import RewardTrainer, compute_accuracy
---> 44 from .sft_trainer import SFTTrainer
45 from .training_configs import RewardConfig
File /opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:23
21 import torch.nn as nn
22 from datasets import Dataset
---> 23 from datasets.arrow_writer import SchemaInferenceError
24 from datasets.builder import DatasetGenerationError
25 from transformers import (
26 AutoModelForCausalLM,
27 AutoTokenizer,
(...)
33 TrainingArguments,
34 )
ImportError: cannot import name 'SchemaInferenceError' from 'datasets.arrow_writer' (/opt/conda/lib/python3.10/site-packages/datasets/arrow_writer.py
transformers version: 4.36.2 python version: 3.10.12 datasets version: 2.16.1
Steps to reproduce the bug
- Install packages
!pip install -U datasets trl accelerate peft bitsandbytes transformers trl huggingface_hub
- import packages
import os
import torch
from datasets import load_dataset, Dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
HfArgumentParser,
TrainingArguments,
pipeline,
logging
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
from huggingface_hub import login
import pandas as pd
Expected behavior
No error while importing
Environment info
datasets
version: 2.16.0- Platform: Linux-5.15.133±x86_64-with-glibc2.35
- Python version: 3.10.12
huggingface_hub
version: 0.20.1- PyArrow version: 11.0.0
- Pandas version: 2.1.4
fsspec
version: 2023.10.0
About this issue
- Original URL
- State: closed
- Created 6 months ago
- Comments: 15 (2 by maintainers)
In kaggle I used:
%pip install -U datasets
and then restarted runtime and then everything works fine.Yes, this is working. When I restart the runtime after installing packages, it’s working perfectly. Thank you so much. But why do we need to restart runtime every time after installing packages?
I have the same issue - using with datasets version 2.16.1. Also this is on a kaggle notebook - other people with the same issue also seem to be having it on kaggle?