pitch-detection: Pitch Detection Algorithm Is Missing a Step from the McLeod Paper

Let me start by saying this is a great library and works quite well. However, after reviewing the code and working with it in a real application, I believe you are missing this part of the algorithm defined in the McLeod Paper:

From the key maxima we define a threshold which is equal to the value of the highest maximum, nmax, multiplied by a constant k. We then take the first key maximum which is above this threshold and assign its delay, τ, as the pitch period. The constant, k, has to be large enough to avoid choosing peaks caused by strong harmonics, such as those in Figure 2, but low enough to not choose the unwanted beat or sub-harmonic signals. Choosing an incorrect key maximum causes a pitch error, usually a 'wrong octave'.

Your “clarity threshold” partly implements this, but it does not scale by the maximum peak. In the two example NSDF plots below, picking the red line (~0.7) as the threshold works, but what if the signal has a bit more noise? Suppose the clarity of every peak drops by ~0.2. With the threshold still at 0.7, you might incorrectly pick the second peak in the first image, and pick no peak at all in the second image.

If you instead scale the selection threshold by the maximum peak, you can still select the correct peak in either case, because the threshold moves up and down with the clarity of the overall signal. You should still keep a minimum clarity threshold below which no result is returned. But when choosing among the maxima above that clarity threshold, scaling by the maximum peak gives much better octave selection.
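To make the argument concrete, here is a minimal sketch comparing the two selection rules. The peak values are made up for illustration (they are not read off the plots), the function names `pick_fixed` and `pick_scaled` are hypothetical and not part of the library, and `f64` is used instead of the library's generic float type.

```rust
/// Fixed rule: first peak whose NSDF value is above an absolute threshold.
/// `peaks` holds (lag, NSDF value) pairs in order of increasing lag.
fn pick_fixed(peaks: &[(usize, f64)], threshold: f64) -> Option<(usize, f64)> {
    peaks.iter().copied().find(|p| p.1 > threshold)
}

/// Scaled rule: first peak above both a minimum clarity floor and
/// `k * n_max`, where `n_max` is the value of the highest peak.
fn pick_scaled(peaks: &[(usize, f64)], clarity_floor: f64, k: f64) -> Option<(usize, f64)> {
    // Highest peak value; stays at NEG_INFINITY when there are no peaks,
    // in which case `find` below simply returns None.
    let n_max = peaks.iter().map(|p| p.1).fold(f64::NEG_INFINITY, f64::max);
    let scaled = n_max * k;
    peaks
        .iter()
        .copied()
        .find(|p| p.1 > clarity_floor && p.1 > scaled)
}
```

With assumed clean peaks `[(100, 0.88), (200, 0.95)]`, both rules pick lag 100 (fixed threshold 0.7; scaled rule with a floor of 0.5 and k = 0.9). If every peak drops by 0.2 to `[(100, 0.68), (200, 0.75)]`, the fixed rule skips lag 100 and returns lag 200, an octave too low, while the scaled rule still returns lag 100 because its threshold falls to 0.675.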


I was able to modify your code to accept a third input, which I’ve called “pick_threshold”, that implements the constant k from the McLeod paper. With my change, the code selects the first peak that is above both the “clarity threshold” and “maximum_peak * pick_threshold”. In my testing, this change led to far fewer octave errors. I used these settings: power_thresh = 3, clarity_thresh = 0.7, and pick_thresh = 0.98. The pick threshold can be lowered a bit if you are sometimes picking up subharmonics, but I found 0.98 to be a sweet spot for avoiding higher-order harmonics, which were picked up more often with the original code.

The only significant changes I made were in the pitch_from_peaks and choose_peak functions. I barely know Rust (I’m compiling this to wasm for a web application), so I’m sure there is a more efficient way to extract the maximum peak than what I did here (running the peak detection twice to get a second iterator), but this code compiled and worked a lot better at picking the correct octave.

```rust
use std::cmp::Ordering;

pub fn pitch_from_peaks<T>(
    input: &[T],
    sample_rate: usize,
    clarity_threshold: T,
    pick_threshold: T,
    correction: PeakCorrection,
) -> Option<Pitch<T>>
where
    T: Float,
{
    let sample_rate = T::from_usize(sample_rate).unwrap();
    let peaks = detect_peaks(input);
    // Second pass over the peaks, used only to find the highest one.
    let peaks2 = detect_peaks(input);
    let maxpeak = peaks2.max_by(|x, y| x.1.partial_cmp(&y.1).unwrap_or(Ordering::Equal));
    // Threshold from the McLeod paper: highest peak times the constant k.
    let thresh2 = match maxpeak {
        None => T::from(0).unwrap(),
        Some(p) => p.1 * pick_threshold,
    };
    choose_peak(peaks, thresh2, clarity_threshold)
        .map(|peak| correct_peak(peak, input, correction))
        .map(|peak| Pitch {
            frequency: sample_rate / peak.0,
            clarity: peak.1 / input[0],
        })
}

pub fn choose_peak<I: Iterator<Item = (usize, T)>, T: Float>(
    mut peaks: I,
    threshold: T,
    threshold2: T,
) -> Option<(usize, T)> {
    // First peak that clears both thresholds (argument order does not matter).
    peaks.find(|p| p.1 > threshold && p.1 > threshold2)
}
```
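One possible way to avoid running the peak detection twice is to collect the peaks once and reuse them. The sketch below is only an illustration under assumptions: `choose_peak_collected` is a hypothetical name, it works on `f64` rather than the library's generic `T: Float`, and it takes any iterator of (lag, value) pairs in place of the library's `detect_peaks` output.

```rust
use std::cmp::Ordering;

/// Hypothetical single-pass variant: collect the peaks once, find the highest
/// value, then return the first peak above both thresholds.
fn choose_peak_collected<I>(
    peaks: I,
    clarity_threshold: f64,
    pick_threshold: f64,
) -> Option<(usize, f64)>
where
    I: Iterator<Item = (usize, f64)>,
{
    // Store the peaks so they can be scanned twice without recomputing them.
    let peaks: Vec<(usize, f64)> = peaks.collect();

    // Highest peak value; `?` returns None early when there are no peaks.
    let max_value = peaks
        .iter()
        .map(|p| p.1)
        .max_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal))?;

    // Scale the maximum by the pick threshold (the constant k in the paper).
    let scaled = max_value * pick_threshold;

    peaks
        .into_iter()
        .find(|p| p.1 > clarity_threshold && p.1 > scaled)
}
```

The trade-off is one small allocation per call in exchange for a single detection pass. Whether that is actually faster than the duplicated iterator would need measuring.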

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 15 (1 by maintainers)

Most upvoted comments

And by the way, in case you are interested, this is my project that is currently running a wasm-compiled version of your library with the changes I mentioned above. It is now much less common to see random jumps to different octaves.