hmmlearn: MemoryError: HMM for MFCC features

I am trying to create an audio vocabulary from MFCC features by applying an HMM. Since I have 10 speakers in the MFCC features and need 50 states per speaker, I used N = 500 states, which throws a MemoryError. It works fine with N = 100 states.

Is the MemoryError caused by my machine running out of memory, or by improper initialization?

Here is my code

import numpy as np
from hmmlearn import hmm
import librosa
import matplotlib.pyplot as plt

def getMFCC(episode):
    filename = getPathToGroundtruth(episode)
    y, sr = librosa.load(filename)  # y: audio time series, sr: sampling rate
    data = librosa.feature.mfcc(y=y, sr=sr)
    return data

def hmm_init(n, data):  # n = number of hidden states
    states = []
    model = hmm.GaussianHMM(n_components=n, covariance_type="full")
    model.transmat_ = np.ones((n, n)) / n
    model.startprob_ = np.ones(n) / n
    fit = model.fit(data.T)
    z = fit.decode(data.T, algorithm='viterbi')[1]
    states.append(z)
    return states

data_m = getMFCC(1)  # MFCC features as a numpy array of shape [20 x 56829]

N = 500
D = len(data_m)  # number of features

states = hmm_init(N, data_m)

In [23]: run Final_hmm.py
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/home/elancheliyan/Final_hmm.py in <module>()
     73 D= len(data)
     74 
---> 75 states = hmm_init(N,data)
     76 states.dump("states")
     77 

/home/elancheliyan/Final_hmm.py in hmm_init(n, data)
     57     model.startprob_ = np.ones(N) / N
     58 
---> 59     fit = model.fit(data.T)
     60 
     61     z=fit.decode(data.T,algorithm='viterbi')[1]

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in fit(self, X, lengths)
    434                 self._accumulate_sufficient_statistics(
    435                     stats, X[i:j], framelogprob, posteriors, fwdlattice,
--> 436                     bwdlattice)
    437 
    438             # XXX must be before convergence check, because otherwise

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/hmm.py in _accumulate_sufficient_statistics(self, stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
    221                                           posteriors, fwdlattice, bwdlattice):
    222         super(GaussianHMM, self)._accumulate_sufficient_statistics(
--> 223             stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
    224 
    225         if 'm' in self.params or 'c' in self.params:

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in _accumulate_sufficient_statistics(self, stats, X, framelogprob, posteriors, fwdlattice, bwdlattice)
    620                 return
    621 
--> 622             lneta = np.zeros((n_samples - 1, n_components, n_components))
    623             _hmmc._compute_lneta(n_samples, n_components, fwdlattice,
    624                                  log_mask_zero(self.transmat_),

MemoryError:

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 25 (10 by maintainers)

Most upvoted comments

Sorry, I should’ve explained where the above number comes from.

As part of HMM parameter estimation you need to compute the probability of transitioning from component i to component j at observation t. Therefore, you need an array of shape n_observations × n_components × n_components. In your particular case the total number of elements in this array is

>>> 500 * 500 * 56829
14207250000

Each element is stored as a double, which takes 8 bytes. This gives us a total of

>>> 500 * 500 * 56829 * 8 / (2**20)  # in MiB
108392.71545410156
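To make the comparison with the working N = 100 case concrete, here is a small sketch (the helper name is mine, not part of hmmlearn) that estimates the size of this accumulator:

```python
def lneta_size_bytes(n_components, n_samples):
    # lneta has shape (n_samples - 1, n_components, n_components);
    # each element is a float64 (8 bytes). Using n_samples instead of
    # n_samples - 1 keeps the arithmetic simple as an upper bound.
    return n_components * n_components * n_samples * 8

print(lneta_size_bytes(500, 56829) / 2**20)  # ~108392 MiB, i.e. ~106 GiB
print(lneta_size_bytes(100, 56829) / 2**20)  # ~4336 MiB, i.e. ~4.2 GiB
```

So N = 100 needs roughly 4 GiB for this array, which fits on an ordinary machine, while N = 500 needs about 100 GiB, which does not.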

I’m not sure how to solve this without significantly changing the learning algorithm. Anyway, I’ll do a search in the literature and post the results here.