Kashgari: [BUG] Different behavior in 0.1.8 and 0.2.1

Environment

  • Colab.research.google.com
  • Kashgari 0.1.8 / 0.2.1

Issue Description

The same code behaves differently on 0.1.8 and 0.2.1. In Kashgari 0.1.8, BLSTMModel converges during training and I see val_acc: 0.98 and train_acc: 0.9594. In Kashgari 0.2.1, BLSTMModel overfits and I see val_acc ~0.5 with train_acc ~0.96. There is no difference in my code, only the library version.

Reproduce

code:

from sklearn.model_selection import train_test_split
import pandas as pd
import nltk
from kashgari.tasks.classification import BLSTMModel

# get and process data
!wget https://www.dropbox.com/s/265kphxkijj1134/fontanka.zip

df1 = pd.read_csv('fontanka.zip')
df1.fillna(' ', inplace=True)
nltk.download('punkt')

# split on train/test
X_train, X_test, y_train, y_test = train_test_split(
    df1.full_text[:3570].values,
    df1.textrubric[:3570].values,
    test_size=0.2,
    random_state=42,
)
X_train = [nltk.word_tokenize(sentence) for sentence in X_train]
X_test  = [nltk.word_tokenize(sentence) for sentence in X_test]
y_train = y_train.tolist()
y_test  = y_test.tolist()

# train model
model = BLSTMModel()
model.fit(X_train, y_train, x_validate=X_test, y_validate=y_test, epochs=10)

code in colab: https://colab.research.google.com/drive/1yTBMeiBl2y7-Yw0DS_vTn2A4y_Vj3N-8

Result

Last epoch:

Kashgari 0.1.8

Epoch 10/10 55/55 [==============================] - 90s 2s/step - loss: 0.1378 - acc: 0.9615 - val_loss: 0.0921 - val_acc: 0.9769

Kashgari 0.2.1

Epoch 10/10 44/44 [==============================] - 76s 2s/step - loss: 0.0990 - acc: 0.9751 - val_loss: 2.3739 - val_acc: 0.5323

Other Comment

In 0.2.1 all models are now in separate files and the lr hyperparameter is set explicitly (1e-3). In 0.1.8 the lr hyperparameter was omitted; I suppose it used the Keras default, which is the same (1e-3).

Also, in 0.1.8 the classifier's dense layer had one extra unit (dense size = number of classes + 1, see https://github.com/BrikerMan/Kashgari/issues/21), which was removed in 0.2.1. I don't see how this could affect the training process.

I couldn't find any other differences between the versions. Could you help me understand why models started to overfit in the new version of the library?

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 17 (5 by maintainers)

Most upvoted comments

I have reproduced the problem. 0.2.1 overfits on my dataset too. Comparing 0.1.8 with 0.2.1, we changed BLSTMModel's activation function from sigmoid to softmax. Please try this with 0.2.1:

# restore the 0.1.8 behavior by overriding the activation
hyper_parameters = {
    'activation_layer': {
        'activation': 'sigmoid'
    }
}
model1 = BLSTMModel(hyper_parameters=hyper_parameters)  # sigmoid, as in 0.1.8

model2 = BLSTMModel()  # default softmax, as in 0.2.1
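To see why the output activation matters here, below is a minimal NumPy sketch (an illustration only, not Kashgari internals) contrasting the two activations on the same logits: sigmoid scores each class independently in [0, 1], while softmax makes the classes compete so the outputs form a probability distribution.

```python
import numpy as np

def sigmoid(z):
    # element-wise: each class is scored independently in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # classes compete: outputs form a distribution that sums to 1
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, -1.0])

print(sigmoid(logits))  # independent scores; need not sum to 1
print(softmax(logits))  # normalized scores; always sums to 1
```

With a single-label dataset and a categorical loss, the choice between these two output layers changes how gradients are distributed across classes, which is consistent with the training-vs-validation gap reported above.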