transformers: device_map='auto' gives bad results

System Info

  • transformers version: 4.25.1

  • Platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.17

  • Python version: 3.8.15

  • Huggingface_hub version: 0.11.1

  • PyTorch version (GPU?): 1.11.0 (True)

  • Tensorflow version (GPU?): not installed (NA)

  • Flax version (CPU?/GPU?/TPU?): not installed (NA)

  • Jax version: not installed

  • JaxLib version: not installed

  • Using GPU in script?: yes

  • Using distributed or parallel set-up in script?: no

  • GPUs: two A100

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Minimal test example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'EleutherAI/gpt-neo-125M'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(model_name)

sentence = 'Hello, nice to meet you. How are'
with torch.no_grad():
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    gen_tokens = model.generate(tensor_input, max_length=32)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)

Results:

Hello, nice to meet you. How are noise retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy

The above result is not the expected behavior. Without device_map='auto', it works correctly, i.e. the loading line becomes model = AutoModelForCausalLM.from_pretrained(model_name).

Results:

Hello, nice to meet you. How are you?

I’m a bit of a newbie to the world of web development, but I

My machine has two A100 (80 GB) GPUs, and I confirmed that the model is loaded across both GPUs when I use device_map='auto'.
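
For reference, when device_map='auto' is used, the split can be inspected via the hf_device_map attribute that from_pretrained attaches to the model (a minimal sketch, not part of the original report):

# Shows which device each module was dispatched to,
# e.g. {'transformer.wte': 0, ..., 'lm_head': 1}
print(model.hf_device_map)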

Expected behavior

Explained above

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 17 (3 by maintainers)

Most upvoted comments

I solved this problem by disabling ACS in the BIOS. This document might be helpful to some of you: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html
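
A quick sanity check for this kind of peer-to-peer problem (a sketch, not from the original thread) is to ask the driver whether peer access is reported and to round-trip a tensor between the two GPUs:

import torch

# Does the driver report peer access between GPU 0 and GPU 1?
print(torch.cuda.can_device_access_peer(0, 1))

# Round-trip a tensor: with broken P2P (e.g. ACS interfering),
# the values can come back corrupted instead of matching.
x = torch.randn(1024, device='cuda:0')
y = x.to('cuda:1').to('cuda:0')
print(torch.equal(x, y))  # True on a healthy setup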

Hello @younesbelkada, I'm using the same accelerate version, 0.15.0. I also get the correct result when I run with export CUDA_VISIBLE_DEVICES=0, but still get wrong results with two GPUs (export CUDA_VISIBLE_DEVICES=0,1).
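
For reference, the same single-GPU restriction can be applied from inside the script by setting the variable before torch initializes CUDA (a minimal sketch equivalent to the export above):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # must be set before CUDA is initialized

import torch
from transformers import AutoModelForCausalLM

# With only one GPU visible, device_map='auto' keeps the whole model on GPU 0.
model = AutoModelForCausalLM.from_pretrained('EleutherAI/gpt-neo-125M', device_map='auto')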

Mmmm, there is no reason for the script to give different results on different GPUs, especially since removing device_map='auto' gives the same results.

I also can't reproduce on my side. Are you absolutely certain your script is launched in the same Python environment you are reporting? E.g., can you print the versions of Accelerate/Transformers/PyTorch in the same script?
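
For example, a minimal snippet to print those versions from inside the same script:

import torch
import transformers
import accelerate

print('transformers:', transformers.__version__)
print('accelerate:', accelerate.__version__)
print('torch:', torch.__version__, '| CUDA:', torch.version.cuda)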

I am slightly unsure about what could be causing the issue, but I suspect it is highly correlated with the fact that you're running your script on two RTX A6000s. @sgugger, do you think the problem could be related to accelerate and to the script running on two RTX A6000s instead of other hardware (i.e., have you seen similar discrepancy errors in the past)? @youngwoo-yoon, could you also try the script with the latest PyTorch version (1.13.1)?