onnx-mlir: commit 694cdb033214e6baa314ae396fbeff1ce6a77ca0 breaks mnist model on s390x

I have an MNIST model and driver that worked fine before this patch but now produce wrong results on s390x. This looks like an endianness problem. I remember Tian asked about endianness for this patch, but there was no response.

Both the native library and the JNI jar give wrong results (expected, since the JNI wrapper simply calls the native library).
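To illustrate why an endianness mismatch is a plausible cause, here is a minimal sketch (not code from onnx-mlir; the function names are hypothetical) of what happens when the four bytes of a float are read in the opposite byte order. For example, 1.0f has the bit pattern 0x3F800000; read with its bytes reversed it becomes 0x0000803F, a denormal close to zero.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit word. */
uint32_t swap_bytes32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

/* Return the value obtained when the bytes of x are interpreted in the
 * opposite byte order. memcpy is used instead of pointer casts to avoid
 * strict-aliasing problems. */
float misread_float(float x) {
  uint32_t bits;
  float out;
  memcpy(&bits, &x, sizeof bits);
  bits = swap_bytes32(bits);
  memcpy(&out, &bits, sizeof out);
  return out;
}
```

Calling misread_float on a handful of model constants and printing the results is a quick way to check whether byte order alone can account for a given wrong value.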

Commands to compile the native library and run the inference:

# onnx-mlir --EmitLib model.onnx
# gcc -g -Iinclude_path mnist.c -o mnist model.so -Llib_path -lcruntime
# LD_LIBRARY_PATH=. ./mnist

include_path and lib_path are where the runtime headers and libraries are, respectively. With the commit applied, you should see the following (wrong) output:

input  signature: [    { "type" : "float" , "dims" : [1 , 1 , 28 , 28]  }

]
output signature:  [   { "type" : "float" , "dims" : [1 , 10]  }

]
prediction[0] = -118029507121385983488744457502720.000000
prediction[1] = -71.833557
prediction[2] = 1230729779023607448758206332928.000000
prediction[3] = 112626513890916027531402412032.000000
prediction[4] = -22.549494
prediction[5] = 9.097518
prediction[6] = 1172105306786765666460368896.000000
prediction[7] = -13.723551
prediction[8] = -2.159079
prediction[9] = -3669.105713
The digit is 2

The correct results should be:

input  signature: [    { "type" : "float" , "dims" : [1 , 1 , 28 , 28]  }

]
output signature:  [   { "type" : "float" , "dims" : [1 , 10]  }

]
prediction[0] = 84.358864
prediction[1] = -71.825768
prediction[2] = -2.669611
prediction[3] = -23.258825
prediction[4] = -22.675903
prediction[5] = 9.237737
prediction[6] = 18.885262
prediction[7] = -13.772935
prediction[8] = -2.074757
prediction[9] = 0.823274
The digit is 0
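The "The digit is N" line is consistent with the driver picking the index of the largest prediction. mnist.c itself is not shown here, so the following is only a sketch under that assumption, with hypothetical names, using the prediction values copied from the two outputs above.

```c
#include <stddef.h>

/* Return the index of the largest element (the first one wins on ties). */
size_t argmax(const float *v, size_t n) {
  size_t best = 0;
  for (size_t i = 1; i < n; ++i) {
    if (v[i] > v[best])
      best = i;
  }
  return best;
}

/* Predictions from the wrong output above. */
const float wrong_pred[10] = {
    -118029507121385983488744457502720.000000f, -71.833557f,
    1230729779023607448758206332928.000000f,
    112626513890916027531402412032.000000f, -22.549494f, 9.097518f,
    1172105306786765666460368896.000000f, -13.723551f, -2.159079f,
    -3669.105713f};

/* Predictions from the correct output above. */
const float right_pred[10] = {
    84.358864f, -71.825768f, -2.669611f, -23.258825f, -22.675903f,
    9.237737f,  18.885262f,  -13.772935f, -2.074757f, 0.823274f};
```

Under this assumption, argmax(wrong_pred, 10) selects index 2 and argmax(right_pred, 10) selects index 0, matching the two reported digits.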

Commands to compile the JNI jar and run the inference:

# onnx-mlir --EmitJNI model.onnx
# javac -cp model.jar MnistTestDriver.java
# ONNX_MLIR_JNI_LOG_LEVEL=debug java -cp .:model.jar MnistTestDriver

With the commit applied, you should see the following (wrong) output:

[    { "type" : "float" , "dims" : [1 , 1 , 28 , 28]  }

]
 [   { "type" : "float" , "dims" : [1 , 10]  }

]
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:data=[ -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 ... ]
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:shape=[ 1 1 28 28 ]
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:strides=[ 784 784 28 1 ]
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:dataType=1
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:bufferSize=3136
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:rank=4
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:owning=0
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:numElems=784
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:data=[ -118029507121385983488744457502720.000000 -71.833557 1230729779023607448758206332928.000000 112626513890916027531402412032.000000 -22.549494 9.097518 1172105306786765666460368896.000000 -13.723551 -2.159079 -3669.105713 ]
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:shape=[ 1 10 ]
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:strides=[ 10 1 ]
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:dataType=1
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:bufferSize=40
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:rank=2
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:owning=1
[2021-04-10 12:59:13 -0400][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:numElems=10
shape=[1, 10]
prediction[0] = -1.1802951E32
prediction[1] = -71.83356
prediction[2] = 1.2307298E30
prediction[3] = 1.1262651E29
prediction[4] = -22.549494
prediction[5] = 9.097518
prediction[6] = 1.1721053E27
prediction[7] = -13.723551
prediction[8] = -2.159079
prediction[9] = -3669.1057
The digit is 2

The data=[ -118029507121385983488744457502720.000000 ... line shows the results coming back from the native code, which are wrong. The prediction[0] = -1.1802951E32 ... lines are printed by the Java code and confirm that it received the same results from the native code.

The correct results should be (before this patch and on amd64):

[    { "type" : "float" , "dims" : [1 , 1 , 28 , 28]  }

]
 [   { "type" : "float" , "dims" : [1 , 10]  }

]
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:data=[ -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 -0.424213 ... ]
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:shape=[ 1 1 28 28 ]
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:strides=[ 784 784 28 1 ]
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:dataType=1
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:bufferSize=3136
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:rank=4
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:owning=0
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_java_to_native:246 omt[0]:numElems=784
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:data=[ 84.358864 -71.825768 -2.669611 -23.258825 -22.675903 9.237737 18.885262 -13.772935 -2.074757 0.823274 ]
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:shape=[ 1 10 ]
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:strides=[ 10 1 ]
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:dataType=1
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:bufferSize=40
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:rank=2
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:owning=1
[2021-04-10 17:10:09 +0000][debug]jniwrapper.c:omtl_native_to_java:313 omt[0]:numElems=10
shape=[1, 10]
prediction[0] = 84.358864
prediction[1] = -71.82577
prediction[2] = -2.6696112
prediction[3] = -23.258825
prediction[4] = -22.675903
prediction[5] = 9.237737
prediction[6] = 18.885262
prediction[7] = -13.772935
prediction[8] = -2.074757
prediction[9] = 0.82327354
The digit is 0
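The shape, strides, numElems and bufferSize fields in the debug logs above are identical in the good and bad runs, so the tensor metadata is not the problem. As a sanity check, the values match a plain row-major layout. The sketch below is not the jniwrapper.c code; the function name is hypothetical, and it assumes dataType=1 denotes a 4-byte float, as the "float" type in the signature suggests.

```c
#include <stdint.h>

/* Fill strides[] with row-major (C order) element strides for shape[]
 * and return the total number of elements. */
int64_t row_major_strides(const int64_t *shape, int rank, int64_t *strides) {
  int64_t n = 1;
  for (int i = rank - 1; i >= 0; --i) {
    strides[i] = n; /* elements skipped when index i advances by one */
    n *= shape[i];
  }
  return n;
}
```

For shape [ 1 1 28 28 ] this gives strides [ 784 784 28 1 ] and 784 elements, or 3136 bytes at 4 bytes per element. For shape [ 1 10 ] it gives strides [ 10 1 ] and 10 elements, or 40 bytes.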

mnist.zip

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 41 (4 by maintainers)

Most upvoted comments

Just a memo. PR #819 annotates ModuleOp with endianness information. With this information, LLVM optimizations correctly interpret our float array, which is stored as a string. So I can say this issue is completely solved on Z.
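The general point behind the memo is that a float array stored as a byte string only has a meaning once its byte order is stated. The sketch below is not the mechanism of PR #819 (which lets LLVM do the interpretation via the module annotation). It only shows, with hypothetical function names and assuming the string holds little-endian IEEE 754 floats, how such a string can be decoded so that the result is the same on amd64 and s390x.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode n floats from a little-endian byte string into out[].
 * The bytes are assembled with shifts, so the result does not depend
 * on the byte order of the host. */
void decode_le_floats(const unsigned char *bytes, size_t n, float *out) {
  for (size_t i = 0; i < n; ++i) {
    const unsigned char *p = bytes + 4 * i;
    uint32_t bits = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    memcpy(&out[i], &bits, sizeof bits);
  }
}

/* Return 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void) {
  const uint16_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
}
```

A plain memcpy of the whole string into a float array is only equivalent to decode_le_floats when host_is_little_endian() returns 1.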