imbalanced-learn: [BUG] SMOTEENN and SMOTETomek run for ages on larger datasets after the latest update

I’ve been using SMOTETomek in production successfully for a while. Version 0.7.6 gets through the dataset in around 5-8 minutes. After updating, the new version ran for 1.5 hours before I killed the process.

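    from imblearn.combine import SMOTETomek

    def resample(dataframe, target):  # wrapper name is illustrative
        balancer = SMOTETomek(random_state=2425, n_jobs=-1)
        df_resampled, target_resampled = balancer.fit_resample(dataframe, target)
        return df_resampled, target_resampled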

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 29 (8 by maintainers)

Most upvoted comments

Hello, I’m trying to apply SMOTETomek to a dataset of shape 2,500,000 x 32, but it never finishes. What can I do?
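Before committing to a run on the full data, it may help to measure how the runtime grows on smaller slices. The following is a minimal sketch, not part of imbalanced-learn: `time_on_subsamples` and `growth_ratios` are hypothetical helper names, and it assumes `X` and `y` support slicing and have already been shuffled so that every slice contains both classes.

```python
import time


def time_on_subsamples(resample_fn, X, y, sizes):
    """Time resample_fn(X[:n], y[:n]) for each n in sizes.

    Assumption: X and y support slicing and are already shuffled,
    so the first n rows contain samples from every class.
    """
    timings = []
    for n in sizes:
        start = time.perf_counter()
        resample_fn(X[:n], y[:n])
        timings.append((n, time.perf_counter() - start))
    return timings


def growth_ratios(timings):
    """Return how many times slower each run was than the previous one."""
    return [
        later / earlier if earlier > 0 else float("inf")
        for (_, earlier), (_, later) in zip(timings, timings[1:])
    ]
```

With imbalanced-learn, `resample_fn` could be `SMOTETomek(random_state=2425).fit_resample`, called with sizes such as 10,000, 20,000 and 40,000. If each doubling of the size multiplies the time by clearly more than two, the cost on 2,500,000 rows can be extrapolated from those numbers instead of waiting on a run that may not finish.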