kliontech.blogg.se - Open source speech to text


Did you know you can run speech recognition on your Pi, for free? If you want to dive straight in, just download, install, and run the spchcat command to get started. In the rest of this article, I'll tell you a bit more about how I built it, what it can do, and how you can help improve it.

It used to be that only big companies could afford the armies of engineers, mountains of training data, and time it took to build a usable speech recognition system. Thankfully the open source community, especially projects like Mozilla's Common Voice and Coqui's speech-to-text library, has changed all that. By gathering massive amounts of speech in many different languages, and open sourcing training code and completed models, they've made it possible to build speech recognizers that are useful for a lot of tasks. The results may not yet be as good as the best commercial systems, but the speed of improvement is impressive, and they can already enable a lot of interesting new applications.

I've been aware of Coqui's work since they launched, because they use TensorFlow Lite, a library I helped build, for the machine learning calculations. I knew they had impressive results, and I wanted to experiment with building simple voice interfaces on my Pi using their framework, but there wasn't an easy way for me to get started. Over the winter holidays I decided my fun project would be to write a simple command line tool to listen to my Raspberry Pi's microphone and write the text it heard to the terminal. The result is spchcat, a command line tool that reads in audio from microphones, system audio, or WAV files, and outputs text. All of the models and libraries it relies on are open source, and the code for the tool itself is available at /petewarden/spchcat.
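spchcat can take its input from WAV files as well as live audio. The Python sketch below illustrates what that WAV-input step involves in general. It is not spchcat's own code. It assumes the recognizer expects 16 kHz, 16-bit, mono PCM, which is the format Coqui's released English models use; check the model you download. `read_wav_samples` uses only the standard library. `transcribe` is a hypothetical helper that assumes the Coqui STT Python bindings (`pip install stt`) expose `Model(path)` and `Model.stt(audio)` as described in their documentation, and it needs NumPy and a downloaded model file.

```python
import array
import sys
import wave

# Assumption: the acoustic model expects 16 kHz, 16-bit, mono PCM audio.
EXPECTED_RATE = 16000
EXPECTED_WIDTH = 2  # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1


def read_wav_samples(source):
    """Read a WAV file (path or file-like object) into signed 16-bit samples.

    Raises ValueError if the audio is not 16 kHz, 16-bit mono PCM, because a
    recognizer fed audio in the wrong format usually returns nonsense.
    """
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_WIDTH:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"expected mono audio, got {wav.getnchannels()} channels")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers in native byte order
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    return samples


def transcribe(model_path, wav_path):
    """Hypothetical helper: run a Coqui STT model over a WAV file.

    Assumes `pip install stt numpy` and a downloaded model file; the imports
    are inside the function so the loader above works without them.
    """
    import numpy as np
    from stt import Model  # Coqui STT Python bindings (assumed API)

    model = Model(model_path)
    audio = np.frombuffer(read_wav_samples(wav_path).tobytes(), dtype=np.int16)
    return model.stt(audio)
```

Validating the format up front is a design choice: resampling or downmixing inside the loader would also work, but failing loudly makes a mismatched recording obvious instead of silently degrading the transcript.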