scikit-learn: sklearn.mixture.GaussianMixture doesn't sample properly, plus a problem with fitting

Description

The new sklearn.mixture.GaussianMixture has broken several things. Three of them:

  • IMPORTANT: Why can’t I sample from a GMM without fitting it first? I want to define all the parameters myself and then sample from that distribution.
    • The workaround I use is to set gmm.precisions_cholesky_ by hand, but I find that overcomplicated.
  • Why does it assume that the covariance has to be a 2D matrix, so that I have to specify “spherical” when I am in 1D?
  • IMPORTANT: Sampling returns fewer samples than requested: 665 instead of 1000. I think it is caused by line 384 in scikit-learn/sklearn/mixture/base.py:
n_samples_comp = np.round(self.weights_ * n_samples).astype(int)
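
To illustrate the suspicion about that line, here is a minimal NumPy-only sketch, not the library's code. It shows two ways that rounding `weights * n_samples` per component can produce a total different from `n_samples`: rounding error with normalized weights, and weights that do not sum to 1 (the weights in the script below are not rescaled to sum to 1). The helper names `rounded_counts` and `multinomial_counts` are hypothetical, and the multinomial draw is just one possible alternative that always returns exactly `n_samples` counts.

```python
import numpy as np


def rounded_counts(weights, n_samples):
    # Same arithmetic as the quoted line: round each component's share.
    return np.round(np.asarray(weights, dtype=float) * n_samples).astype(int)


def multinomial_counts(weights, n_samples, seed=0):
    # Assumed alternative: normalize the weights, then draw the
    # per-component counts from a multinomial, which sums to n_samples.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    rng = np.random.default_rng(seed)
    return rng.multinomial(n_samples, weights)


# Normalized weights: each share is 333.33..., which rounds down.
thirds = np.full(3, 1.0 / 3.0)
print(rounded_counts(thirds, 1000).sum())        # 999

# Weights that sum to 0.6 instead of 1.
unnormalized = np.array([0.1, 0.2, 0.3])
print(rounded_counts(unnormalized, 1000).sum())  # 600

# The multinomial draw returns the requested total.
print(multinomial_counts(unnormalized, 1000).sum())  # 1000
```
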

Steps/Code to Reproduce

import numpy as np
from sklearn.mixture import GaussianMixture
np.random.seed(0)

# I define 10 gaussians for the mixture
n_gaussians = 10
means = np.array(10*(2*np.random.rand(n_gaussians,1)-1))
weights = np.array(.4*np.random.rand(n_gaussians,1)) ** 2
to_normalize_weights = np.random.rand(n_gaussians,1)
covars = np.array(to_normalize_weights/np.sum(to_normalize_weights))
precisions = np.array([ x**(-1) for x in covars ])

gmm = GaussianMixture(n_gaussians, covariance_type='full')
gmm.means_ = means
gmm.weights_ = weights
gmm.covariances_ = covars
gmm.precisions_cholesky_ = precisions    #otherwise sklearn thinks it's not fit

samples = gmm.sample(n_samples=1000)
print(len(samples[0]), len(samples[1]))
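
For comparison, the behaviour asked for above (sampling a 1-D mixture from explicitly chosen parameters, with no fitting step) can be sketched with NumPy alone. This is only an illustration under stated assumptions, not scikit-learn's implementation: the function `sample_gmm_1d` is hypothetical, it takes one variance per component, and it normalizes the weights before use.

```python
import numpy as np


def sample_gmm_1d(weights, means, variances, n_samples, seed=0):
    """Draw n_samples from a 1-D Gaussian mixture given its parameters."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float).ravel()
    means = np.asarray(means, dtype=float).ravel()
    stds = np.sqrt(np.asarray(variances, dtype=float).ravel())

    # Normalize so the weights form a probability vector.
    weights = weights / weights.sum()

    # Pick a component for every sample, then draw from that component.
    labels = rng.choice(len(weights), size=n_samples, p=weights)
    values = rng.normal(loc=means[labels], scale=stds[labels])
    return values, labels


values, labels = sample_gmm_1d(
    weights=[0.2, 0.3, 0.5],
    means=[-5.0, 0.0, 5.0],
    variances=[1.0, 0.5, 2.0],
    n_samples=1000,
)
print(len(values), len(labels))  # 1000 1000
```

Because a component label is drawn for each sample individually, the number of returned samples always equals `n_samples`, whatever the weights are.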

Expected Results

1000,1000

Actual Results

665, 665

Versions

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 16 (13 by maintainers)

Most upvoted comments

Thanks a lot.

In the end I decided to use another package instead of sklearn:

http://pypr.sourceforge.net/mog.html

I don’t know if you can get inspiration from there, or how far-reaching you want your package to be. For my purposes I would want it to be as complete as possible, but clearly there are alternative solutions.

Best, Luca