ECO - EMO — the emotion detector by speech

Nov 17, 2020

Speech is the most natural way for humans to express themselves, so it is only natural to extend this medium of communication to computer applications. ECO-EMO is an emotion detector: an application that detects emotion from human speech, which is a new step for machine learning in artificial intelligence. Of course, ECO-EMO is not as accurate as a human at detecting emotion, since this is still an ongoing area of work. Its accuracy is low for now, but it does detect emotion.

This application will later be available on a website, where it will have the option of uploading an audio file and, later on, recording from a microphone, along with more options such as an extension for some assistants. Users can find out their emotions. The main use of this is to give artificial intelligence a sense of humanity, helping machines understand humans better in a future where machines live with us, or maybe more. To successfully implement a speech emotion recognition system, we need to define and model emotion carefully. However, there is no consensus about the definition of emotion, and it is still an open problem in psychology.

SPEECH EMOTION RECOGNITION (SER): There are a few things we need to understand about how it works.

Preprocessing is the very first step after collecting data that will be used to train the classifier in an SER system.
Signal framing, also known as speech segmentation, is the process of partitioning continuous speech signals into fixed-length segments to overcome several challenges in SER.
After framing the speech signal, the next phase is generally to apply a window function to the frames. The window function reduces the spectral leakage that occurs during a Fast Fourier Transform (FFT) of the data, which is caused by discontinuities at the edges of each frame.
An utterance consists of three parts: voiced speech, unvoiced speech, and silence. Voiced speech is generated by the vibration of the vocal folds, which creates periodic excitation of the vocal tract during the pronunciation of phonemes. Unvoiced speech is the result of air passing through a constriction in the vocal tract, producing transient and turbulent noises that are aperiodic excitations of the vocal tract. It is hard to model silence and noise accurately in a dynamic environment; if silence and noise frames are removed, it becomes easier to model speech.
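
The preprocessing steps above can be sketched in a few lines of NumPy. This is only an illustration, not the ECO-EMO code: the function names, the 25 ms frame length, the 10 ms hop, the Hamming window, and the energy threshold of 0.1 are all assumptions chosen for the example, and the simple energy rule only stands in for a proper silence or voice-activity detector.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping fixed-length frames."""
    if len(signal) < frame_len:
        # Pad short signals with zeros so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def drop_silent_frames(frames, rel_threshold=0.1):
    """Keep frames whose energy exceeds a fraction of the loudest frame.

    The relative threshold is a hypothetical value for this sketch.
    """
    energy = np.sum(frames ** 2, axis=1)
    return frames[energy > rel_threshold * energy.max()]

def window_frames(frames):
    """Multiply each frame by a Hamming window to reduce spectral leakage."""
    return frames * np.hamming(frames.shape[1])

# Synthetic input: 0.5 s of silence followed by 0.5 s of a 200 Hz tone.
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 200 * t)])

frames = frame_signal(signal, frame_len=400, hop_len=160)  # 25 ms frames, 10 ms hop
kept = drop_silent_frames(frames)          # silence frames removed
windowed = window_frames(kept)             # tapered frame edges
spectrum = np.abs(np.fft.rfft(windowed, axis=1))  # magnitude spectrum per frame
print(frames.shape, kept.shape, spectrum.shape)
```

Here the silent first half of the signal is discarded before windowing, so only the frames that contain the tone reach the FFT. On real recordings the threshold would need tuning, since background noise carries energy too.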

In this application, I mostly use NLP (natural language processing) and TensorFlow to detect emotion from speech. Databases are an essential part of speech emotion recognition, since the classification process relies on labelled data. The quality of the data affects the success of the recognition process. Incomplete, low-quality, or faulty data may lead to incorrect predictions; hence, data should be carefully designed and collected. Natural speech emotion databases are one source of such data. Later, I will create the website with HTML/CSS, using Django as the backend; right now I am writing the code for emotion detection. ECO-EMO mostly detects aggression, joy, and sadness. Detecting emotion is a big step for AI.

It’s a big step in creation…

In the future, find your robot twin.


Written by Nagasekharnov23