sklearn-onnx: (OneVsOneClassifier) Not able to convert sklearn model using pipeline to ONNX format for real time inferencing

I have a multi-class text classification model built with sklearn. It uses OneVsOneClassifier to train on and predict 150 intents.

Data:

text          intents

text1         int1
text2         int2

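I convert these intents into labels using:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_train = le.fit_transform(y_train)
# Reuse the mapping fitted on the training labels; refitting on y_test could assign different integers
y_test = le.transform(y_test)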
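
The test labels need the same intent-to-integer mapping that was learned on the training labels. Refitting the encoder on the test set can silently assign different integers when the test set does not contain every intent. A minimal sketch of the difference, using made-up intent names (not the real 150 intents):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical intents, for illustration only
train_intents = ["greet", "order", "cancel", "order"]
test_intents = ["order", "greet"]

le = LabelEncoder()
y_train_enc = le.fit_transform(train_intents)  # learns cancel=0, greet=1, order=2
y_test_enc = le.transform(test_intents)        # reuses that mapping

# Refitting on the test labels alone builds a different mapping (greet=0, order=1)
refit_enc = LabelEncoder().fit_transform(test_intents)

print(list(le.classes_))
print(list(y_test_enc), "vs refit:", list(refit_enc))
```

With the shared encoder, `le.inverse_transform` maps predicted integers back to intent names consistently for both splits.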

Expectation:

Without changing the training pipeline or its parameters, I want to reduce the inference time. Currently it is slow, about 1 second per inference. The plan is to convert the pipeline to ONNX format and use that for inference on a single example.

Code:

import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

def create_pipe(clf):
    
    # Each pipeline uses the same column transformer.  
    column_trans = ColumnTransformer(
            [('Text', TfidfVectorizer(), 'text')
             ],
            remainder='drop') 
    
    pipeline = Pipeline([('prep',column_trans),                     
                         ('clf', clf)])
     
    return pipeline

def fit_and_print(pipeline):
    
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)

    print(metrics.classification_report(y_test, y_pred, 
                                        target_names=le.classes_, 
                                        digits=3))

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
%time fit_and_print(pipeline)

# convert input to df

def create_test_data(x):
    d = {'text' : x}
    df = pd.DataFrame(d, index=[0])
    return df

revs = []
for idx in [948, 5717, 458]:
    cur = test.loc[idx, 'text']
    revs.append(cur)
print(revs)

revs=sam['text'].values

%%time
for rev in revs:
    c_res = pipeline.predict(create_test_data(rev))
    print(rev, '=', labels[c_res[0]])
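
The %%time cell above measures the whole loop, including the cost of building a one-row DataFrame and printing on every call. To look at per-call latency instead, a small stdlib-only helper could be used. This is only a sketch: `time_calls` and `fake_predict` are hypothetical names, and `fake_predict` stands in for something like `lambda rev: pipeline.predict(create_test_data(rev))`.

```python
import statistics
import time


def time_calls(predict_one, inputs):
    """Return the latency in seconds of each predict_one(item) call."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        predict_one(item)
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_predict(text):
    # Stand-in predictor so the sketch runs on its own
    return len(text)


latencies = time_calls(fake_predict, ["text1", "text2", "text3"])
print(f"median: {statistics.median(latencies):.6f}s, max: {max(latencies):.6f}s")
```

Running the same helper against the sklearn pipeline and later against an ONNX session would give comparable numbers for the two approaches.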

ONNX conversion code:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, StringTensorType

initial_type = [('UTTERANCE', StringTensorType([None, 2]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)

Error:

MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.multiclass.OneVsOneClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

How can I resolve this? Also, how do I run predictions after converting to ONNX format?
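
For the prediction part, here is a hedged sketch of how inference with onnxruntime usually looks once conversion succeeds. It does not fix the error itself: the message says the installed sklearn-onnx has no converter for OneVsOneClassifier, so this only works with a release that ships one (I have not verified which release that is). The helper names `make_onnx_input` and `convert_and_predict` are hypothetical. It also assumes the ONNX input should be named after the DataFrame column the ColumnTransformer selects ('text') and have shape [None, 1], rather than 'UTTERANCE' with shape [None, 2].

```python
import numpy as np


def make_onnx_input(texts):
    """Shape a list of strings as an (N, 1) array, matching StringTensorType([None, 1])."""
    return np.array(texts, dtype=object).reshape(-1, 1)


def convert_and_predict(pipeline, texts, column="text"):
    """Convert a fitted pipeline to ONNX and return the predicted label ids.

    Assumes skl2onnx has a converter for every step of the pipeline.
    """
    # Imported lazily so make_onnx_input works without the ONNX packages
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import StringTensorType
    import onnxruntime as rt

    # Assumption: the input name matches the column selected by the ColumnTransformer
    initial_types = [(column, StringTensorType([None, 1]))]
    model_onnx = convert_sklearn(pipeline, initial_types=initial_types)

    sess = rt.InferenceSession(
        model_onnx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    # For a classifier pipeline the first output holds the predicted labels
    outputs = sess.run(None, {column: make_onnx_input(texts)})
    return outputs[0]


batch = make_onnx_input(["text1", "text2"])
print(batch.shape)
```

A call would look like `label_ids = convert_and_predict(pipeline, [rev])`, followed by `le.inverse_transform(label_ids)` to get intent names back. In real use the session should be created once and reused, since building it on every call would dominate the latency. It is also worth comparing the ONNX labels against `pipeline.predict` on a sample, because the converted TfidfVectorizer may tokenize some inputs differently.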

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21

Most upvoted comments

@pratikchhapolika done. please check.