unsloth: Xformers and other dependency errors
I am attempting to replicate the Mistral 7B finetuning locally, as outlined under the "Unsloth Open" column in the README.md file. I have successfully downloaded the dataset and the model to my local machine; this is a deviation from the original Jupyter notebook, which loads them directly from the cloud.
model_name = "./Mistral-7B-v0.1"
...
dataset = load_dataset(path = "./SlimOrca", split = "train")
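For context, here is roughly how those local paths slot into the notebook's loading code. This is a sketch from memory, so treat everything other than the two paths as the notebook defaults rather than something I changed:

```python
# Sketch of my local-loading setup; only the paths differ from the notebook.
from unsloth import FastLanguageModel
from datasets import load_dataset

model_name = "./Mistral-7B-v0.1"            # local copy instead of a Hub id

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = 2048,   # notebook default
    dtype = None,            # auto-detect (bfloat16 on the 4090)
    load_in_4bit = True,
)

dataset = load_dataset(path = "./SlimOrca", split = "train")   # local copy of SlimOrca
```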
The initial stages of the Jupyter notebook, which include multiple code blocks, executed without any issues. However, I run into an error during the trainer_stats = trainer.train()
block: a ValueError stating that Query/Key/Value must all have BMHK or BMK shape.
ValueError: Query/Key/Value should all have BMHK or BMK shape.
query.shape: torch.Size([4, 2021, 8, 4, 128])
key.shape : torch.Size([4, 2021, 8, 4, 128])
value.shape: torch.Size([4, 2021, 8, 4, 128])
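My reading of the error (an assumption on my part, not something I have confirmed): xformers' memory-efficient attention accepts 3-D BMK tensors ([batch, seq, head_dim]) or 4-D BMHK tensors ([batch, seq, heads, head_dim]), but the tensors above are 5-D, presumably because the grouped key/value heads are kept in a separate dimension. A toy illustration of the mismatch:

```python
import torch

# Shapes the error message says are acceptable:
q_bmk  = torch.randn(4, 2021, 128)        # BMK : batch, seq, head_dim
q_bmhk = torch.randn(4, 2021, 32, 128)    # BMHK: batch, seq, heads, head_dim

# Shape actually reaching the kernel in my run (5-D, hence rejected):
q_here = torch.randn(4, 2021, 8, 4, 128)  # batch, seq, kv_heads, query heads per kv head, head_dim

print(q_bmk.ndim, q_bmhk.ndim, q_here.ndim)  # 3 4 5
```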
I even tried reverting to the HuggingFace-hosted dataset instead of the local one, to rule out the dataset location as the cause, but I hit the same error. Here's a brief overview of my setup:
==((====))==  Unsloth: Fast Mistral patching release 2023.12
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.647 GB
O^O/ \_/ \    CUDA compute capability = 8.9
\        /    Pytorch version: 2.1.0.post300. CUDA Toolkit = 11.8
 "-____-"     bfloat16 support = TRUE
GPU = NVIDIA GeForce RTX 4090. Max memory = 23.647 GB.
4.66 GB of memory reserved.
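In case a torch/xformers mismatch is the culprit, this is the quick check I can run and report back with (assuming the installed xformers exposes __version__, which recent releases do):

```python
import torch
import xformers

print("torch    :", torch.__version__)    # 2.1.0.post300 here
print("cuda     :", torch.version.cuda)   # 11.8 per the banner above
print("xformers :", xformers.__version__)
```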
Additionally, I had to work around an issue with the torch version format. The original Unsloth version check didn't handle version strings with four segments (mine is 2.1.0.post300), so I changed the parsing from major_torch, minor_torch, _ = torch.__version__.split(".")
to major_torch, minor_torch, _ = torch.__version__.split(".")[0:3]
to accommodate the extra segment.
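For clarity, this is the shape of the patch I applied (a paraphrase of my local edit, not the exact upstream code):

```python
import torch

# With torch 2.1.0.post300, splitting on "." yields four parts, so the stock
# three-way unpack raises "ValueError: too many values to unpack".
parts = torch.__version__.split(".")

# Keeping only the first three segments makes the unpack work again.
major_torch, minor_torch, _ = parts[0:3]
print(major_torch, minor_torch)   # "2" "1" on my machine
```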
Would anyone be able to point me in the right direction, or has anyone experienced a similar issue while working on model finetuning replication? Your help would be greatly appreciated as this error is currently standing in the way of my progress.
Thank you in advance for any assistance!
About this issue
- State: closed
- Created 6 months ago
- Comments: 21 (13 by maintainers)
@alvis233 Great you got it to work! So in fact I have an elaborate dispatching mechanism via `pyproject.toml`, which is the "new" `requirements.txt`. The issue is that if/else statements for dispatching dependencies aren't allowed in it as of yet - that would require a `setup.py`. Generally the setup process should be smooth, but I guess sometimes not so much.
But gladly it works now!! Hopefully Unsloth will make training and everything faster!! If you need any help - join our Discord!! https://discord.gg/u54VK8m8tk
Again, hope I was helpful!! (Closing this issue now!)
@alvis233 Including
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -c xformers -c conda-forge -y
right? Hmm, this sounds like a complex case - your error message is very weird since I can't seem to glean anything out of it.