imbalanced-learn: [BUG] SMOTEENN and SMOTETomek run for ages on larger datasets after the latest update
I’ve been using SMOTETomek in production successfully for a while. Version 0.7.6 gets through the dataset in around 5-8 minutes. After updating, the new version ran for 1.5 hours before I killed the process.
from imblearn.combine import SMOTETomek

def balance(dataframe, target):
    balancer = SMOTETomek(random_state=2425, n_jobs=-1)
    df_resampled, target_resampled = balancer.fit_resample(dataframe, target)
    return df_resampled, target_resampled
About this issue
- State: closed
- Created 3 years ago
- Comments: 29 (8 by maintainers)
Hello, I’m trying to apply SMOTETomek to a dataset of shape 2,500,000 x 32, but it runs endlessly. What can I do?
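One way to get a feel for how the runtime scales before committing to the full dataset is to time the resampler on random subsets of increasing size. The following is only a minimal sketch, not something taken from this thread: the helper names `time_on_subsamples` and `_take` are hypothetical, it assumes NumPy is available, and it draws plain (not stratified) random subsets, so a very rare minority class may be underrepresented in small samples.

```python
import time

import numpy as np


def _take(data, idx):
    """Select rows by position from a pandas object or an array-like."""
    return data.iloc[idx] if hasattr(data, "iloc") else np.asarray(data)[idx]


def time_on_subsamples(fit_resample, X, y, sizes, seed=0):
    """Time fit_resample(X_sub, y_sub) on random row subsets of growing size.

    fit_resample is any callable taking (X, y); returns (n_rows, seconds) pairs.
    """
    rng = np.random.default_rng(seed)
    n_total = len(X)
    timings = []
    for size in sizes:
        size = min(size, n_total)  # never ask for more rows than exist
        idx = rng.choice(n_total, size=size, replace=False)
        X_sub, y_sub = _take(X, idx), _take(y, idx)
        start = time.perf_counter()
        fit_resample(X_sub, y_sub)
        timings.append((size, time.perf_counter() - start))
    return timings
```

With imbalanced-learn installed, `fit_resample` could be the bound method `SMOTETomek(random_state=2425).fit_resample`, called with sizes such as `[10_000, 50_000, 250_000]`. Comparing the timings between two installed versions on the same subsets would also give a small, reproducible case to attach to a report like this one.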