plaidml: Stripe problems with AMD RX580
I just tested the Stripe backend with an AMD card (Radeon RX580). MNIST ran without problems (albeit a bit slower than without Stripe), but I ran into a crash with the ResNet50 model.
Traceback (most recent call last):
  File "resnet50.py", line 14, in <module>
    preprocess_input(img), np.zeros((SAMPLES, 1000), dtype='float32'), batch_size=BS, epochs=5)
  File "/home/nope/venvs/fs/lib/python3.7/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/nope/venvs/fs/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/nope/venvs/fs/lib/python3.7/site-packages/plaidml/keras/backend.py", line 176, in __call__
    self._invoker.invoke()
  File "/home/nope/venvs/fs/lib/python3.7/site-packages/plaidml/__init__.py", line 1440, in invoke
    return Invocation(self._ctx, self)
  File "/home/nope/venvs/fs/lib/python3.7/site-packages/plaidml/__init__.py", line 1449, in __init__
    self._as_parameter_ = _lib().plaidml_schedule_invocation(ctx, invoker)
  File "/home/nope/venvs/fs/lib/python3.7/site-packages/plaidml/__init__.py", line 764, in _check_err
    self.raise_last_status()
  File "/home/nope/venvs/fs/lib/python3.7/site-packages/plaidml/library.py", line 131, in raise_last_status
    raise self.last_status()
plaidml.exceptions.Unknown: AliasMap::AliasMap: Mismatched access dimensions on refinement: d1:X_T18 X_T18
code:
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
import numpy as np

if __name__ == '__main__':
    BS = 8
    SAMPLES = BS * 50
    model = ResNet50(weights='imagenet')
    img = np.random.rand(SAMPLES, 224, 224, 3)
    # Inference on a single batch works.
    preds = model.predict(preprocess_input(img)[:BS])
    print('Predicted:', decode_predictions(preds, top=3)[0])
    # Training on all-zero dummy labels is what crashes.
    model.compile("SGD", loss='categorical_crossentropy')
    preds = model.fit(
        preprocess_input(img), np.zeros((SAMPLES, 1000), dtype='float32'), batch_size=BS, epochs=5)
Prediction works fine, but training crashes with the above stack trace. I am using plaidml 0.6.0.
About this issue
- State: open
- Created 5 years ago
- Comments: 16 (11 by maintainers)
I minimized the above code a bit more:
With PLAIDML_USE_STRIPE=1 this leads to the above stack trace. With USE_REFLECTIVE_PADDING set to False in the snippet, Stripe hangs forever with one CPU core maxed out. The hang depends on the batch size: it works fine with BS < 8, hangs with BS = 8, works with 8 < BS < 13, and hangs again with BS >= 13. Tested with the current master branch.
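To map which batch sizes hang without babysitting each run, something like the following sweep might help. It is only a sketch: `sweep_batch_sizes` and the `repro.py` script named in the comment are hypothetical names, and it assumes the repro takes the batch size as its first command-line argument. Each batch size runs in its own child process with PLAIDML_USE_STRIPE=1 set, and a run that exceeds the timeout is recorded as a hang.

```python
import os
import subprocess
import sys


def sweep_batch_sizes(make_cmd, batch_sizes, timeout_s):
    """Run one child process per batch size and classify the outcome.

    make_cmd is a (hypothetical) callable that turns a batch size into an
    argv list, e.g. lambda bs: [sys.executable, "repro.py", str(bs)].
    """
    # Enable Stripe for the child processes only.
    env = dict(os.environ, PLAIDML_USE_STRIPE="1")
    results = {}
    for bs in batch_sizes:
        try:
            proc = subprocess.run(
                make_cmd(bs),
                env=env,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            # A non-zero exit code covers the AliasMap exception case.
            results[bs] = "ok" if proc.returncode == 0 else "crash"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child once the timeout expires.
            results[bs] = "hang"
    return results
```

The timeout needs to be comfortably longer than a healthy run for the largest batch size; otherwise slow runs get misreported as hangs.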