DeepSpeech: Transcription has a lot of spelling errors and wrong word segmentation (although sometimes phonetically correct)
- Have I written custom code (as opposed to running examples on an unmodified clone of the repository): no
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 16.04
- TensorFlow installed from (our builds, or upstream TensorFlow):
- TensorFlow version (use command below): 1.12.0
- Python version: 3.6
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: CUDA 9
- GPU model and memory: NVIDIA K80 GPUs, 12 GB memory, AWS p2 instance
- Exact command to reproduce:
You can obtain the TensorFlow version with
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
Hi, I was trying to transcribe two different audio samples. One has a bit of background music: I extracted the audio from an Apple ad in which Jonathan Ive speaks in a really clear voice over background music. I converted it to 16000 samples per second, as required by DeepSpeech, but I found a lot of spelling errors in the transcription.
For example, "evolution" is spelt "evil lution", and this is an Apple Watch ad. How do I correct this? I tried the latest lm and trie files, but the transcription is still bad.
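Before looking at the models, the 16000 samples per second conversion mentioned above can be sanity-checked. The following is a minimal sketch using only Python's standard `wave` module. It assumes the released models expect 16 kHz, mono, 16-bit PCM WAV input (only the sample rate is stated in this report; mono and 16-bit are assumptions), and the helper names `check_wav` and `make_wav` are hypothetical.

```python
import io
import wave

# Assumed input format: 16 kHz, mono, 16-bit PCM.
# Only the 16 kHz rate comes from the report; the rest is an assumption.
EXPECTED = {"rate": 16000, "channels": 1, "sample_width": 2}


def check_wav(source):
    """Return a list of mismatches between a WAV file and EXPECTED.

    `source` can be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        actual = {
            "rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
        }
    return [
        f"{key}: expected {EXPECTED[key]}, got {actual[key]}"
        for key in EXPECTED
        if actual[key] != EXPECTED[key]
    ]


def make_wav(rate, channels, seconds=0.1):
    """Build a silent in-memory WAV, used here only to demonstrate check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav(make_wav(16000, 1)))  # matches the assumed format
    print(check_wav(make_wav(44100, 2)))  # wrong rate and channel count
```

For a real file, `check_wav("audio.wav")` (the path is a placeholder) would list any mismatches; an empty list means the header matches the assumed format, which says nothing about background music or recording quality.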
I'll list what I used, but please tell me what I should use.
I used the latest alpha release of DeepSpeech, 0.4.0-alpha.3, because the stable release was giving really bad results. I used the output_graph from reuben's release because the 0.3.0 models produced only gibberish, with nothing of value in the transcription; this fix was suggested in the GitHub issue https://github.com/mozilla/DeepSpeech/issues/1156
Output graph from reuben's release: reuben/DeepSpeech on GitHub (a TensorFlow implementation of Baidu's DeepSpeech architecture)
The lm and trie I used are from https://github.com/mozilla/DeepSpeech/tree/master/data/lm
and the alphabet.txt I used is from the 0.3.0 models release linked in the GitHub README. The alphabet.txt may be from this link, but I am not sure right now: https://github.com/mozilla/DeepSpeech/tree/master/data
The first sample is the Apple ad: https://www.youtube.com/watch?v=6EiI5_-7liQ
transcription is : e e e in i an an an enemple agh seres for is more than an evil lution erepresents a fundamental redesin anryengineering of apple watchretaining the riginal i comicg design veloped ury find the for olsimanaging to make it fine be new display is now oven birty percen larger and is seemlessly integrated into the product the interface as been read deigned fron you tiplay providing more information with rich a detail the heard wore hand the software combine to define a very new and truly intergrated singular design novigating with the digital crown olready one of the most intricat makhalisms wit ever created has been intirely igreengineeredwith hapti feeback dilivering a presise ecannical field as idrol in addition to an obtea hasanco the is a new applepizine ilectrical hars and se to the lousutitake in electra cardia graham or easy ge to share with your doctor a momnentesichievement for a were of a divice placing a finger on the tigital crownd i eeplose cerkid with a lectrods on the bank providing dater the easy g busesanaliz your harid whole understanding hea health is a sential to ou well bei aditional features in in harmsmans in courag es ti live and overall healther or tantive life the excela romiter girescove an alfliter allow you to recall youtypes of workelse measure runs withincreased presision and tra your all day activity with great accuracy in hart selilar connectiv ity in tabu something prulyliberating the obility distaklinected with just your wach fon case music streaming and even a mergency essistence ol immediately evolable from your restch eries for is a device so powerful so postnal so liperating i con change the way ou liveach day
The second sample is: https://www.youtube.com/watch?v=GnGI76__sSA
and the transcription with the VAD transcriber is:
DEBUG:root:Processing chunk 00
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.720s for 5.880s audio file.
DEBUG:root:Transcript: stevies to um saye o me and heused to saye is a lut
DEBUG:root:Processing chunk 01
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.292s for 1.470s audio file.
DEBUG:root:Transcript: jonny
DEBUG:root:Processing chunk 02
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.337s for 1.620s audio file.
DEBUG:root:Transcript: is it that the idea
DEBUG:root:Processing chunk 03
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.282s for 1.530s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 04
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.772s for 3.750s audio file.
DEBUG:root:Transcript: and sometimes they wore
DEBUG:root:Processing chunk 05
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.639s for 3.180s audio file.
DEBUG:root:Transcript: really do pe
DEBUG:root:Processing chunk 06
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.918s for 4.410s audio file.
DEBUG:root:Transcript: sometimes they would tru to dreadful
DEBUG:root:Processing chunk 07
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.632s for 3.090s audio file.
DEBUG:root:Transcript: sometimes they of the air from the room
DEBUG:root:Processing chunk 08
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.638s for 3.000s audio file.
DEBUG:root:Transcript: an me liftis poth completely silent
DEBUG:root:Processing chunk 09
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.845s for 4.200s audio file.
DEBUG:root:Transcript: od crazy magninificen ideas
DEBUG:root:Processing chunk 10
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.403s for 2.010s audio file.
DEBUG:root:Transcript: whire simple ones
DEBUG:root:Processing chunk 11
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.371s for 1.890s audio file.
DEBUG:root:Transcript: hin this sufflety
DEBUG:root:Processing chunk 12
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.288s for 1.470s audio file.
DEBUG:root:Transcript: tee tal
DEBUG:root:Processing chunk 13
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.352s for 1.740s audio file.
DEBUG:root:Transcript: eatto e profound
DEBUG:root:Processing chunk 14
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.366s for 1.860s audio file.
DEBUG:root:Transcript: just i speve
DEBUG:root:Processing chunk 15
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.382s for 1.950s audio file.
DEBUG:root:Transcript: loved ydeas
DEBUG:root:Processing chunk 16
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.434s for 2.160s audio file.
DEBUG:root:Transcript: an loved maan stuff
DEBUG:root:Processing chunk 17
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.513s for 2.550s audio file.
DEBUG:root:Transcript: he treated the process
DEBUG:root:Processing chunk 18
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.094s for 5.370s audio file.
DEBUG:root:Transcript: treativeity with the rare and a wonderful reverence
DEBUG:root:Processing chunk 19
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.871s for 4.260s audio file.
DEBUG:root:Transcript: is the i think he better than any one understood
DEBUG:root:Processing chunk 20
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.017s for 5.010s audio file.
DEBUG:root:Transcript: wile ideas oltemately can be so powerful
DEBUG:root:Processing chunk 21
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.598s for 2.970s audio file.
DEBUG:root:Transcript: egin as fratile
DEBUG:root:Processing chunk 22
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.383s for 1.920s audio file.
DEBUG:root:Transcript: e fomd thoughts
DEBUG:root:Processing chunk 23
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.123s for 5.490s audio file.
DEBUG:root:Transcript: so esily mistd so easily compromise so isily josquift
DEBUG:root:Processing chunk 24
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.909s for 4.230s audio file.
DEBUG:root:Transcript: on love the way that he listened so intendly
DEBUG:root:Processing chunk 25
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.432s for 2.190s audio file.
DEBUG:root:Transcript: loved his perseption
DEBUG:root:Processing chunk 26
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.582s for 2.910s audio file.
DEBUG:root:Transcript: is remarkable sensitive ity
DEBUG:root:Processing chunk 27
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.544s for 2.700s audio file.
DEBUG:root:Transcript: nd his surgecly preciseieinion
DEBUG:root:Processing chunk 28
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.350s for 1.920s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 29
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.551s for 2.700s audio file.
DEBUG:root:Transcript: i really believe there was a beuty
DEBUG:root:Processing chunk 30
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.869s for 4.410s audio file.
DEBUG:root:Transcript: e sehela how meen his insih was
DEBUG:root:Processing chunk 31
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.456s for 2.280s audio file.
DEBUG:root:Transcript: sometimes et could spey
DEBUG:root:Processing chunk 32
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.585s for 3.030s audio file.
DEBUG:root:Transcript: as um suremany you know
DEBUG:root:Processing chunk 33
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.022s for 4.920s audio file.
DEBUG:root:Transcript: steve didn’t comfined his sensif excellent to make him products
DEBUG:root:Processing chunk 34
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.544s for 2.610s audio file.
DEBUG:root:Transcript: you a wo we travel together
DEBUG:root:Processing chunk 35
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.356s for 1.770s audio file.
DEBUG:root:Transcript: wold check hin
DEBUG:root:Processing chunk 36
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.387s for 1.920s audio file.
DEBUG:root:Transcript: t gop to my room
DEBUG:root:Processing chunk 37
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.868s for 4.260s audio file.
DEBUG:root:Transcript: nat leave my bags thery needly but te door
DEBUG:root:Processing chunk 38
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.239s for 6.390s audio file.
DEBUG:root:Transcript: with numat
DEBUG:root:Processing chunk 39
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.814s for 4.080s audio file.
DEBUG:root:Transcript: gon si on the bed
DEBUG:root:Processing chunk 40
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.061s for 5.220s audio file.
DEBUG:root:Transcript: on si on the bed next to the fhun
DEBUG:root:Processing chunk 41
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.283s for 1.470s audio file.
DEBUG:root:Transcript: wat
DEBUG:root:Processing chunk 42
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.434s for 2.130s audio file.
DEBUG:root:Transcript: n evetible fone cal
DEBUG:root:Processing chunk 43
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.631s for 12.990s audio file.
DEBUG:root:Transcript: ony this hoodself soctless go
DEBUG:root:Processing chunk 44
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.308s for 1.560s audio file.
DEBUG:root:Transcript: used to joe
DEBUG:root:Processing chunk 45
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.631s for 3.150s audio file.
DEBUG:root:Transcript: lunitics a takean over the assinem
DEBUG:root:Processing chunk 46
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.576s for 2.760s audio file.
DEBUG:root:Transcript: swe shard gedioxsignment
DEBUG:root:Processing chunk 47
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.090s for 5.070s audio file.
DEBUG:root:Transcript: spending months and months working on a part of a product
DEBUG:root:Processing chunk 48
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.493s for 2.310s audio file.
DEBUG:root:Transcript: nobody with ever see
DEBUG:root:Processing chunk 49
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.290s for 1.380s audio file.
DEBUG:root:Transcript: owith the rese
DEBUG:root:Processing chunk 50
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.872s for 4.020s audio file.
DEBUG:root:Transcript: did it because we because we really believed that it was right
DEBUG:root:Processing chunk 51
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.276s for 1.410s audio file.
DEBUG:root:Transcript: cause we cared
DEBUG:root:Processing chunk 52
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.542s for 2.520s audio file.
DEBUG:root:Transcript: elieved that there was a grammidty
DEBUG:root:Processing chunk 53
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.751s for 3.570s audio file.
DEBUG:root:Transcript: umast ascensive civic responsibility
DEBUG:root:Processing chunk 54
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.452s for 2.280s audio file.
DEBUG:root:Transcript: so care wavbyyongs
DEBUG:root:Processing chunk 55
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.619s for 2.940s audio file.
DEBUG:root:Transcript: and e sot of functional imperative
DEBUG:root:Processing chunk 56
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.108s for 0.630s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 57
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.340s for 1.800s audio file.
DEBUG:root:Transcript: wok
DEBUG:root:Processing chunk 58
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.488s for 2.340s audio file.
DEBUG:root:Transcript: hoopfully appeared in evi table
DEBUG:root:Processing chunk 59
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.309s for 1.560s audio file.
DEBUG:root:Transcript: hid simple
DEBUG:root:Processing chunk 60
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.225s for 1.140s audio file.
DEBUG:root:Transcript: teasy
DEBUG:root:Processing chunk 61
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.301s for 1.500s audio file.
DEBUG:root:Transcript: really cost
DEBUG:root:Processing chunk 62
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.323s for 1.650s audio file.
DEBUG:root:Transcript: cost te soledin i
DEBUG:root:Processing chunk 63
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.460s for 2.190s audio file.
DEBUG:root:Transcript: you know i cost him most
DEBUG:root:Processing chunk 64
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.312s for 1.500s audio file.
DEBUG:root:Transcript: cared the most
DEBUG:root:Processing chunk 65
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.956s for 4.620s audio file.
DEBUG:root:Transcript: he wo in the most deeply he constantly questioned
DEBUG:root:Processing chunk 66
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.290s for 1.380s audio file.
DEBUG:root:Transcript: this good enough
DEBUG:root:Processing chunk 67
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.245s for 1.230s audio file.
DEBUG:root:Transcript: this right
DEBUG:root:Processing chunk 68
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.530s for 2.610s audio file.
DEBUG:root:Transcript: dispite all his successis
DEBUG:root:Processing chunk 69
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.404s for 2.040s audio file.
DEBUG:root:Transcript: his achievements
DEBUG:root:Processing chunk 70
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.089s for 5.220s audio file.
DEBUG:root:Transcript: never presued he never assumed thet we would get there in the end
DEBUG:root:Processing chunk 71
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.397s for 2.010s audio file.
DEBUG:root:Transcript: nideas didn’t come
DEBUG:root:Processing chunk 72
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.529s for 2.640s audio file.
DEBUG:root:Transcript: the proace it types faled
DEBUG:root:Processing chunk 73
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.778s for 3.840s audio file.
DEBUG:root:Transcript: it was with great intent with faith
DEBUG:root:Processing chunk 74
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.477s for 2.400s audio file.
DEBUG:root:Transcript: he decided to believe
DEBUG:root:Processing chunk 75
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.298s for 1.530s audio file.
DEBUG:root:Transcript: then shally
DEBUG:root:Processing chunk 76
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.317s for 1.530s audio file.
DEBUG:root:Transcript: a something greaght
DEBUG:root:Processing chunk 77
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.539s for 2.730s audio file.
DEBUG:root:Transcript: joy of getting man
DEBUG:root:Processing chunk 78
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.526s for 2.640s audio file.
DEBUG:root:Transcript: i loved is infhusiasm
DEBUG:root:Processing chunk 79
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.484s for 2.430s audio file.
DEBUG:root:Transcript: simple thelight
DEBUG:root:Processing chunk 80
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.474s for 2.370s audio file.
DEBUG:root:Transcript: ma i mixed with serilief
DEBUG:root:Processing chunk 81
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.423s for 2.130s audio file.
DEBUG:root:Transcript: the year we got there
DEBUG:root:Processing chunk 82
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.319s for 1.590s audio file.
DEBUG:root:Transcript: we got there in the end
DEBUG:root:Processing chunk 83
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.233s for 1.140s audio file.
DEBUG:root:Transcript: ahe was good
DEBUG:root:Processing chunk 84
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.448s for 2.250s audio file.
DEBUG:root:Transcript: conceise smile conye
DEBUG:root:Processing chunk 85
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.010s for 4.710s audio file.
DEBUG:root:Transcript: selebration of making something grat for everybody
DEBUG:root:Processing chunk 86
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.662s for 3.270s audio file.
DEBUG:root:Transcript: enjoying the defeat of sinisism
DEBUG:root:Processing chunk 87
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.439s for 6.600s audio file.
DEBUG:root:Transcript: rjection of reason the rejection of being told a hundred times in condo that
DEBUG:root:Processing chunk 88
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.733s for 3.570s audio file.
DEBUG:root:Transcript: so hes i think was in victory for beauty
DEBUG:root:Processing chunk 89
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.307s for 1.560s audio file.
DEBUG:root:Transcript: pperity
DEBUG:root:Processing chunk 90
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.605s for 2.970s audio file.
DEBUG:root:Transcript: he would say for givein at dham
DEBUG:root:Processing chunk 91
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.840s for 4.140s audio file.
DEBUG:root:Transcript: he was my closeess and we must loa friend
DEBUG:root:Processing chunk 92
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.090s for 9.300s audio file.
DEBUG:root:Transcript: together fornerly fitteen years and he still laughed to the way i sad ali minum
DEBUG:root:Processing chunk 93
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.487s for 2.340s audio file.
DEBUG:root:Transcript: past tothe weeks
DEBUG:root:Processing chunk 94
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.968s for 4.410s audio file.
DEBUG:root:Transcript: wh we ill bing struggling to find ways to save tood by
DEBUG:root:Processing chunk 95
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.342s for 1.620s audio file.
DEBUG:root:Transcript: t smooning
DEBUG:root:Processing chunk 96
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.380s for 1.920s audio file.
DEBUG:root:Transcript: smply once who weren
DEBUG:root:Processing chunk 97
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.372s for 1.860s audio file.
DEBUG:root:Transcript: ank you staye
DEBUG:root:Processing chunk 98
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.628s for 3.000s audio file.
DEBUG:root:Transcript: f youl remarkable vision
DEBUG:root:Processing chunk 99
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.332s for 1.620s audio file.
DEBUG:root:Transcript: ichis inited
DEBUG:root:Processing chunk 100
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.319s for 1.590s audio file.
DEBUG:root:Transcript: nspired
DEBUG:root:Processing chunk 101
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.526s for 2.550s audio file.
DEBUG:root:Transcript: this extraordinary groups of people
DEBUG:root:Processing chunk 102
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.525s for 2.580s audio file.
DEBUG:root:Transcript: for the oll the weav hof men from you
DEBUG:root:Processing chunk 103
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.781s for 3.660s audio file.
DEBUG:root:Transcript: nfor all thet we will continue to learn from each other
DEBUG:root:Processing chunk 104
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.200s for 1.050s audio file.
DEBUG:root:Transcript: st
DEBUG:root:Processing chunk 105
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.926s for 9.900s audio file.
DEBUG:root:Transcript: ee
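To read the per-chunk output as one transcript, the DEBUG lines shown above can be collapsed with a small script. This is only a sketch built around the log format pasted here: the function name `summarize` is hypothetical, the patterns assume exactly the `Inference took …s for …s audio file.` and `Transcript:` wording in this log, and the real-time factor is simply total inference time divided by total audio time.

```python
import re

# Patterns follow the DEBUG lines pasted in this report; they work whether
# the log has one entry per line or is flattened onto a few long lines.
TIMING = re.compile(r"Inference took ([\d.]+)s for ([\d.]+)s audio file\.")
TRANSCRIPT = re.compile(r"DEBUG:root:Transcript:(.*?)(?=DEBUG:root:|\Z)", re.DOTALL)


def summarize(log_text):
    """Return (joined transcript, real-time factor) for a VAD transcriber log."""
    timings = [(float(t), float(a)) for t, a in TIMING.findall(log_text)]
    inference = sum(t for t, _ in timings)
    audio = sum(a for _, a in timings)
    # Normalize whitespace inside each chunk, then drop empty chunks.
    pieces = [" ".join(m.split()) for m in TRANSCRIPT.findall(log_text)]
    transcript = " ".join(p for p in pieces if p)
    rtf = inference / audio if audio else float("nan")
    return transcript, rtf


if __name__ == "__main__":
    # Chunks 01 to 03 copied from the log above.
    sample = (
        "DEBUG:root:Processing chunk 01 DEBUG:root:Running inference… "
        "DEBUG:root:Inference took 0.292s for 1.470s audio file. "
        "DEBUG:root:Transcript: jonny "
        "DEBUG:root:Processing chunk 02 DEBUG:root:Running inference… "
        "DEBUG:root:Inference took 0.337s for 1.620s audio file. "
        "DEBUG:root:Transcript: is it that the idea "
        "DEBUG:root:Processing chunk 03 DEBUG:root:Running inference… "
        "DEBUG:root:Inference took 0.282s for 1.530s audio file. "
        "DEBUG:root:Transcript: "
    )
    text, rtf = summarize(sample)
    print(text)
    print(round(rtf, 3))
```

A real-time factor below 1 means inference runs faster than the audio plays, so speed is not the problem in this log; the issue is purely the quality of the text.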
The results are sometimes phonetically correct, but the transcription is full of spelling errors, as shown above.
So how should I improve this transcription? Should I use different models, and if so, where do I get them from? How can I improve this without training, since I don't have annotated samples?
And if training is needed, what is the minimum amount of training required, and how do I train with the least effort to get a good transcription? How many samples, at minimum, would I need to annotate for training?
I asked on Discourse but didn't get any response.
Thanks in advance, Raghav
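One way to put a number on "phonetically correct but misspelt" is word error rate (WER): the word-level edit distance between a reference and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration, not part of DeepSpeech; the function name `word_error_rate` is hypothetical, and the short reference phrase is an assumption reconstructed from the "evolution" example in this report.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref) if ref else float("nan")


if __name__ == "__main__":
    # Assumed reference; "evil lution" is the output quoted in this report.
    reference = "more than an evolution"
    hypothesis = "more than an evil lution"
    print(word_error_rate(reference, hypothesis))  # 0.5
```

Here one misrecognized word costs two edits (a substitution plus an insertion), which is why split words such as "evil lution" inflate WER even when the audio was decoded almost phonetically.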
About this issue
- State: closed
- Created 5 years ago
- Comments: 36 (4 by maintainers)
As you appear to have a mix-and-match model, it might be easier, instead of tracking down the problem, to just wait until Monday, when we are planning to do the 0.4.0 release, and then use that.
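To make the "mix-and-match" point concrete, here is a small illustrative sketch that groups the pieces of a setup by where they came from. It does not inspect any model files; the function name `find_mixed_sources` is hypothetical, and the source labels are free-form strings copied from the description in the report.

```python
from collections import defaultdict


def find_mixed_sources(components):
    """Group component names by the release they came from.

    Returns an empty dict when every component shares one source.
    """
    by_source = defaultdict(list)
    for name, source in components.items():
        by_source[source].append(name)
    return dict(by_source) if len(by_source) > 1 else {}


if __name__ == "__main__":
    # Sources as described in the report (labels are free-form strings).
    setup = {
        "deepspeech client": "0.4.0-alpha.3",
        "output_graph": "reuben's release",
        "lm": "master",
        "trie": "master",
        "alphabet.txt": "0.3.0",
    }
    for source, names in find_mixed_sources(setup).items():
        print(f"{source}: {', '.join(names)}")
```

With the setup from the report this prints four different sources, which is the situation the reply describes; taking every file from a single release would make the function return an empty dict.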