scikit-learn: sklearn.mixture.GaussianMixture doesn't sample properly, plus a problem with fitting
Description
The new sklearn.mixture.GaussianMixture has messed up several things. Two of them:
- IMPORTANT: why can't I sample from a GMM without fitting it first? I want to define all the parameters myself and then sample from that distribution.
- The trick I use is to set gmm.precisions_cholesky_ by hand, but I find it overcomplicated.
- Why does it assume the covariance has to be a 2-D matrix, so that I have to specify "spherical" when my data is 1-D?
- IMPORTANT: sampling returns fewer samples than requested, 665 instead of 1000. I think it's caused by line 384 in scikit-learn/sklearn/mixture/base.py:
n_samples_comp = np.round(self.weights_ * n_samples).astype(int)
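The undercount is easy to see in isolation: when the stored weights do not sum to 1, rounding each component's share of n_samples leaves samples unaccounted for. A minimal sketch (the weight values below are made up for illustration, not the ones from my script):

```python
import numpy as np

# Hypothetical unnormalized weights, deliberately summing to 0.665
weights = np.array([0.4, 0.2, 0.065])
n_samples = 1000

# Mirrors line 384 of sklearn/mixture/base.py
n_samples_comp = np.round(weights * n_samples).astype(int)
print(n_samples_comp)        # [400 200  65]
print(n_samples_comp.sum())  # 665 instead of 1000
```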
Steps/Code to Reproduce
import numpy as np
from sklearn.mixture import GaussianMixture

np.random.seed(0)

# Define 10 Gaussians for the mixture
n_gaussians = 10
means = 10 * (2 * np.random.rand(n_gaussians, 1) - 1)
weights = (0.4 * np.random.rand(n_gaussians, 1)) ** 2
to_normalize_weights = np.random.rand(n_gaussians, 1)
covars = to_normalize_weights / np.sum(to_normalize_weights)
precisions = np.array([x ** (-1) for x in covars])

gmm = GaussianMixture(n_gaussians, covariance_type='full')
gmm.means_ = means
gmm.weights_ = weights
gmm.covariances_ = covars
gmm.precisions_cholesky_ = precisions  # otherwise sklearn thinks it's not fit

samples = gmm.sample(n_samples=1000)
print(len(samples[0]), len(samples[1]))
Expected Results
1000,1000
Actual Results
665, 665
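For what it's worth, the shortfall disappears if the weights are normalized to sum to 1 before being assigned (fit() normalizes them internally; setting weights_ by hand bypasses that). A sketch of that workaround, using the same kind of raw weights as the script above; note the rounded total can still drift by a sample or two, since per-component rounding is not guaranteed to sum exactly to n_samples:

```python
import numpy as np

rng = np.random.RandomState(0)
raw_weights = (0.4 * rng.rand(10)) ** 2  # unnormalized, as in the script above

# Normalize so the per-component sample counts add up to ~n_samples
weights = raw_weights / raw_weights.sum()

n_samples = 1000
counts = np.round(weights * n_samples).astype(int)
print(counts.sum())  # close to 1000 (off by at most a few due to rounding)
```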
Versions
About this issue
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 16 (13 by maintainers)
Thanks a lot.
In the end I decided to use another package instead of sklearn:
http://pypr.sourceforge.net/mog.html
I don't know whether you can take inspiration from it, or how far-reaching you want your package to be. In my own interest I would want it as complete as possible, but clearly there are alternative solutions.
Best, Luca