intel-extension-for-pytorch: F32 Example Training Gets Stuck after One Iteration of For Loop
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ sudo -H ./build.sh
[a bunch of output here]
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ IMAGE_NAME=intel-extension-for-pytorch:gpu
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ VIDEO=$(getent group video | sed -E 's,^video:[^:]*:([^:]*):.*$,\1,')
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,')
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ test -z "$RENDER" || RENDER_GROUP="--group-add ${RENDER}"
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ sudo -H docker run --rm -v /home/tedliosu/intel_pytorch_workspace:/workspace --group-add ${VIDEO} ${RENDER_GROUP} --device=/dev/dri --ipc=host -it $IMAGE_NAME bash
[sudo] password for tedliosu:
groups: cannot find name for group ID 109
root@8e852a62c8b4:/# cd workspace/
root@d4958d53cb7c:/workspace# python3 -m trace -t ipex_f32_example.py 2>&1 | tee ipex_f32_example_py_trace.txt | grep ipex_f32_example
--- modulename: ipex_f32_example, funcname: <module>
ipex_f32_example.py(1): import torch
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(2): import torchvision
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(4): import intel_extension_for_pytorch as ipex
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(7): LR = 0.001
ipex_f32_example.py(8): DOWNLOAD = True
ipex_f32_example.py(9): DATA = 'datasets/cifar10/'
ipex_f32_example.py(11): transform = torchvision.transforms.Compose([
ipex_f32_example.py(12): torchvision.transforms.Resize((224, 224)),
ipex_f32_example.py(13): torchvision.transforms.ToTensor(),
ipex_f32_example.py(14): torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
ipex_f32_example.py(11): transform = torchvision.transforms.Compose([
ipex_f32_example.py(16): train_dataset = torchvision.datasets.CIFAR10(
ipex_f32_example.py(17): root=DATA,
ipex_f32_example.py(18): train=True,
ipex_f32_example.py(19): transform=transform,
ipex_f32_example.py(20): download=DOWNLOAD,
ipex_f32_example.py(16): train_dataset = torchvision.datasets.CIFAR10(
ipex_f32_example.py(22): train_loader = torch.utils.data.DataLoader(
ipex_f32_example.py(23): dataset=train_dataset,
ipex_f32_example.py(24): batch_size=128
ipex_f32_example.py(22): train_loader = torch.utils.data.DataLoader(
ipex_f32_example.py(27): model = torchvision.models.resnet50()
ipex_f32_example.py(28): criterion = torch.nn.CrossEntropyLoss().to("xpu")
ipex_f32_example.py(29): optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
ipex_f32_example.py(30): model.train()
ipex_f32_example.py(32): model = model.to("xpu")
ipex_f32_example.py(33): model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)
ipex_f32_example.py(36): for batch_idx, (data, target) in enumerate(train_loader):
ipex_f32_example.py(37): print("Begin 1 loop iteration")
ipex_f32_example.py(39): data = data.to("xpu")
ipex_f32_example.py(40): print("Moved data onto XPU")
ipex_f32_example.py(41): target = target.to("xpu")
ipex_f32_example.py(42): print("Moved target onto XPU")
ipex_f32_example.py(44): optimizer.zero_grad()
ipex_f32_example.py(45): print("About to apply model to data")
ipex_f32_example.py(46): output = model(data)
ipex_f32_example.py(47): print("Finished applying model to data")
ipex_f32_example.py(48): loss = criterion(output, target)
ipex_f32_example.py(49): print("About to execute loss.backward()")
ipex_f32_example.py(50): loss.backward()
ipex_f32_example.py(51): print("About to execute optimizer.step()")
ipex_f32_example.py(52): optimizer.step()
ipex_f32_example.py(53): print("Current batch id : %d" % (batch_idx))
ipex_f32_example.py(54): data = None
ipex_f32_example.py(55): target = None
ipex_f32_example.py(36): for batch_idx, (data, target) in enumerate(train_loader):
[I killed the process after ***90 minutes*** of being stuck here]
root@d4958d53cb7c:/workspace# tail -n35 ipex_f32_example_py_trace.txt
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
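Side note: python3 -m trace -t runs every Python line under a tracing hook, which slows pure-Python code such as the collate loop down dramatically, so the 90-minute hang under tracing may partly be trace overhead. To separate data loading from the XPU side, the following minimal sketch (no IPEX involved, same dataset path as the script below) just times a single batch fetch:

import time
import torch
import torchvision

# Same transform and dataset as the full script below, but CPU-only.
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_dataset = torchvision.datasets.CIFAR10(
    root='datasets/cifar10/', train=True, transform=transform, download=True
)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128)

start = time.time()
data, target = next(iter(train_loader))  # fetch and collate one batch of 128 images
print("Fetched batch of shape %s in %.1f s" % (tuple(data.shape), time.time() - start))

If this completes in seconds, the collate lines above are just the last thing traced before the real hang happens elsewhere.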
root@d4958d53cb7c:/workspace# pip list
Package Version
--------------------------- -------------------
contourpy 1.0.6
cycler 0.11.0
fonttools 4.38.0
intel-extension-for-pytorch 1.10.200+gpu
kiwisolver 1.4.4
matplotlib 3.6.1
numpy 1.23.4
packaging 21.3
Pillow 9.3.0
pip 20.0.2
pyparsing 3.0.9
python-dateutil 2.8.2
setuptools 45.2.0
six 1.16.0
torch 1.10.0a0+git3d5f2d4
torchvision 0.11.3
typing-extensions 4.4.0
wheel 0.34.2
Contents of ipex_f32_example.py (as you can see, it's basically the Float32 training example from the IPEX documentation):
import torch
import torchvision
############# code changes ###############
import intel_extension_for_pytorch as ipex
############# code changes ###############

LR = 0.001
DOWNLOAD = True
DATA = 'datasets/cifar10/'

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_dataset = torchvision.datasets.CIFAR10(
    root=DATA,
    train=True,
    transform=transform,
    download=DOWNLOAD,
)
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=128
)

model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss().to("xpu")
optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
model.train()
#################################### code changes ################################
model = model.to("xpu")
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)
#################################### code changes ################################

for batch_idx, (data, target) in enumerate(train_loader):
    print("Begin 1 loop iteration")
    ########## code changes ##########
    data = data.to("xpu")
    print("Moved data onto XPU")
    target = target.to("xpu")
    print("Moved target onto XPU")
    ########## code changes ##########
    optimizer.zero_grad()
    print("About to apply model to data")
    output = model(data)
    print("Finished applying model to data")
    loss = criterion(output, target)
    print("About to execute loss.backward()")
    loss.backward()
    print("About to execute optimizer.step()")
    optimizer.step()
    print("Current batch id : %d" % (batch_idx))
    data = None
    target = None

torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pth')
As you can see in the command-line output, the ipex_f32_example.py script froze for 90 minutes after it reached the for batch_idx, (data, target) in enumerate(train_loader): line for the second time; when I ran it without tracing, it froze at data = data.to("xpu") for over 8 hours before I had to kill the process. I have no idea whether this is a driver issue, a torchvision issue, or something else, but it is really frustrating, and I'd be more than happy to provide extra info about my system to help solve this freezing problem. Also note that tail -n35 ipex_f32_example_py_trace.txt displays the last 35 lines of the trace I ran on the script, showing exactly where execution was stuck.
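To help narrow this down, here is a minimal smoke test that I would expect to either finish instantly or hang the same way; note that torch.xpu.is_available() being exposed by this IPEX build is an assumption on my part, not something verified above:

import torch
# Importing IPEX registers the "xpu" device with PyTorch.
import intel_extension_for_pytorch as ipex

# Assumption: the GPU build exposes a torch.xpu namespace.
print("xpu available:", torch.xpu.is_available())
x = torch.randn(4, 4)
print("About to copy tensor to xpu")
y = x.to("xpu")
print("About to run a kernel on xpu")
z = (y @ y).cpu()  # force a kernel launch plus a device-to-host copy
print("Round trip OK:", z.shape)

If this tiny script also hangs at the copy, the problem is independent of torchvision and the DataLoader entirely.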
P.S. I already mentioned this problem in another thread before opening this separate issue, and I saw the reply there to my initial comment, but I have no idea how to apply that person's suggestion to help solve this 😕
About this issue
- State: open
- Created 2 years ago
- Comments: 20 (9 by maintainers)
Hi @tedliosu! Thanks again for the info! We'll investigate this issue while enabling Intel Extension for PyTorch for iGPUs. iGPUs are currently unsupported.

Awesome! Thanks, @tedliosu! Looks like your iGPU is indeed tgllp! 😃

Hi @tedliosu, thanks for checking it out! While the link you provided does not have the latest info (it doesn't even list discrete GPUs), you're right that you might have to use tgllp! Since we don't currently have official support for iGPUs, we haven't gotten a chance to check them out. Can you please use dpcpp instead of icpx for your example, BTW? Are you using the latest oneAPI Basekit? If so, ocloc compile --help would show you even more device targets. Thanks for taking this initiative; your solution here would also help others! 😃
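For reference, the ocloc check suggested above might look roughly like this (kernel.cl is a hypothetical test kernel, and the available -device names vary with the ocloc version):

# Sketch, assuming a standard oneAPI Basekit install.
ocloc compile --help                          # lists the supported -device targets
ocloc compile -file kernel.cl -device tgllp   # AOT-compile a hypothetical kernel for tgllp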
Thanks for your interest in Intel Extension for PyTorch, @tedliosu! We look forward to your response!

As @jingxu10 also mentioned, the current whls are for Flex Series 170 GPUs (which are discrete GPUs similar to the Intel Arc Alchemist series GPUs). Just FYI, AOT (or USE_AOT_DEVLIST) builds GPU kernels for the target device (in this case, the whls were generated for discrete GPUs), so they would not work with your iGPU, and you'd have to build from source for your own GPU.

There are several things involved. As Sanchit mentioned in the reply above, you can try compiling ipex from source with AOT configured for your graphics card. What needs to be mentioned is that official support for IPEX GPU is currently limited to the Flex Series 170.
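To make that suggestion concrete, a rough sketch of the source build with AOT targeting an iGPU might look like this; the clone URL is the real repository, but the branch layout and exact steps are assumptions rather than quotes from the official instructions:

# Rough sketch, assuming the standard from-source flow for the IPEX GPU backend
# (at the time, the GPU sources lived on a separate branch such as xpu-master).
git clone --recursive https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
export USE_AOT_DEVLIST=xe   # target the iGPU instead of the prebuilt Flex/Arc kernels
python setup.py install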
This current release is for discrete graphics cards. While it only mentions the Flex Series 170 GPU, it also supports the Intel Arc Alchemist series GPUs. Intel Extension for PyTorch is currently not officially supported on integrated GPUs; we may support them in the near future, though. However, if you'd like, you can build from source with these instructions, except that you'd have to set the environment variable USE_AOT_DEVLIST to xe, or modify USE_AOT_DEVLIST in CMakeLists.txt.

@jingxu10 Thank you so much for the info! I'll try building from source and testing it out within the next week or so, as I'm very busy with school right now 😄