We all know how frustrating it is to listen to bad sound on a computer, DVD player or even TV.
To some extent, extraneous (background, white, digital) noise in a microphone recording can be suppressed, removed or cleaned up - call it what you like, just keep in mind that along with the noise the volume often decreases too.
This article describes a free program for removing extraneous noise, Audacity (at the very bottom there is a direct link to download it for free).
I should immediately note that there are paid professional programs for completely suppressing interference in recordings.
But an amateur won't be able to get much out of them. The process is comparable to working in Photoshop, which, as everyone knows, requires a long period of training.
If you have difficulty suppressing noise, here are instructions on how to use the Audacity program.
What does the noise-cleaning program Audacity provide?
Audacity can be called a solid sound editor, particularly for cleaning (removing) noise from recordings.
Its functionality impresses even professionals, despite the fact that it is free. Of course, it doesn't claim to be the market leader, but it is an excellent assistant.
In addition to noise removal, this program works great with audio in Ogg Vorbis, FLAC, WAV and MP3 formats.
With its help, you can edit microphone recordings, digitize audio, and record and overlay tracks and effects.
For example, suppressing hum or changing tempo and pitch. Everything turns out simple, fast and of relatively high quality.
As I wrote above, you can download it for free in Russian from this site - without ads, registration or SMS. In addition to suppressing (cleaning) noise, you can also cut out the voice.
Audacity is designed to record and edit popular audio formats. You can also change the voice recorded from the microphone, not just remove noise.
Figuring out how to suppress noise in speech using deep learning and OpenVINO
This article will be useful to students and anyone who wants to understand how speech denoising is done using deep learning. There have already been articles on this topic on Habr a few years ago (here and here), but our goal is to give a somewhat deeper understanding of the process of working with sound.
Picture with sound
The problem of noise reduction using deep learning and OpenVINO fell into the hands of students at ITlab, an educational and research laboratory at Lobachevsky University with the support of Intel. Students, starting from the 2nd year, work on interesting engineering and scientific projects under the guidance of teachers. Creating high-performance software requires the use of special developer tools and parallel code execution technologies, and students become familiar with them as part of laboratory projects. This article is the result of the work of students Ivan Vikhrev, Azer Rustamov, Ksenia Zaitseva, Nikita Kim, Mikhail Burdukov, Andrey Filatov.
What is sound in a computer?
Suppressing background noise in speech has concerned people for a very long time, and with the advent of remote work and remote learning the importance of the problem has grown many times over. We have to talk more and more over the Internet, and audio messages are built into all popular instant messengers. However trivial it may sound, people perceive good, clear sound better. But a good microphone costs money, and the sound insulation in an average apartment leaves much to be desired: we hear everything from the fans in a laptop to a neighbor's annoying drill (drill noise especially interferes with following the teacher's material).
How to get clear sound without buying a studio microphone and without upholstering your entire apartment with soundproofing materials? People have been thinking about this question since the end of the last century. The solution was programs that digitize and edit the sound coming from the microphone.
Lizemijn Libgott / Vice.com
The recorded sound consists of many sound waves simultaneously hitting the microphone sensor over a certain period of time; as a result we get a long vector of numbers - the amplitudes (loudness) of the total signal measured over short intervals of time. The sampling rate of a landline phone is 8 kHz, which means the amplitude (loudness) of the total signal is measured 8000 times per second; sound cards usually use a rate of 44.1 or 48 kHz.
This picture shows 3 seconds of sound - a graph with 120 thousand values. With the X-axis compressed this much, it looks as if the area under the graph has been shaded.
You can play an audio file in Python using the soundfile and sounddevice libraries.
import sounddevice as sd
import soundfile as sf

path_wav = 'test_wav.wav'
data, fs = sf.read(path_wav)   # read the samples and the sample rate
sd.play(data, fs)              # start playback
status = sd.wait()             # wait until playback finishes
Recording data from a microphone is also very simple - take a look and run record.py.
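For reference, here is a minimal sketch of such a recording script (the file name, duration and sample rate are assumptions for illustration, not taken from record.py):

import sounddevice as sd
import soundfile as sf

fs = 16000      # sample rate, Hz (assumed)
duration = 5    # recording length in seconds (assumed)

recording = sd.rec(int(duration * fs), samplerate=fs, channels=1)
sd.wait()                                 # block until the recording is finished
sf.write('recorded.wav', recording, fs)   # save the result to a wav file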
By the way, this is important information for us. Our sequence is the sum of many sound waves, and we can calculate which waves contributed to that sum. In theory, any complex sound can be decomposed into a set of simple harmonic signals of different frequencies, each of which is a regular sinusoid and can be described by numerical parameters (and you said calculus wasn't needed). Research on this topic belongs to the field of digital signal processing; Habr has information about the open course "Fundamentals of Digital Signal Processing".
On the right we see the spectrum - the contribution of each wave to the frequency decomposition. Diagram from the website nuancesprog.ru
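As a small illustration of this idea, here is a hedged sketch (not from the original article) that builds a signal from two known sinusoids and recovers their frequencies with NumPy's FFT:

import numpy as np

fs = 16000                              # sample rate, Hz
t = np.arange(fs) / fs                  # one second of time stamps
# sum of a 440 Hz and a 1000 Hz sinusoid with different loudness
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(signal)) / len(signal) * 2   # amplitude spectrum
freqs = np.fft.rfftfreq(len(signal), d=1/fs)               # frequency of each bin

# the two largest peaks correspond to the original sinusoids
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks))   # -> [440.0, 1000.0]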
To do complex things with audio, such as human voice recognition, speech-to-text translation, or noise removal using deep learning, we will need to calculate the contribution of different frequencies to an audio sequence—the spectrum. The spectrum can be represented as a spectrogram - an image showing the dependence of the signal amplitude over time at different frequencies. One column in the spectrogram corresponds to the spectrum of a short section of the original signal, warmer tones indicate a higher value.
An example of a spectrogram obtained from an audio file using the Numpy library.
The spectrum for a spectrogram can be calculated using the discrete Fourier transform implemented in the Numpy library. Let's look at how the spectrogram is created in the sample code, using two functions from the features.py file:
import numpy as np

def calcSpec(y, params, channel=None):
    """compute complex spectrum from audio file"""
    fs = int(params["fs"])  # sampling rate; in our case 16000 - the wav file is recorded at 16 kHz
    if channel is not None and (len(y.shape) > 1):
        # if the audio contains two channels (stereo), take only one of them
        y = y[:, channel]

    # STFT parameters
    N_win = int(float(params["winlen"]) * fs)      # Hanning window size; 320 in our case
    if 'nfft' in params:
        N_fft = int(params['nfft'])
    else:
        N_fft = int(float(params['winlen']) * fs)  # width of the Fourier transform window; 320 in our case
    N_hop = int(N_win * float(params["hopfrac"]))  # hop between windows; 160 in our case
    win = np.sqrt(np.hanning(N_win))               # Hanning window

    Y = stft(y, N_fft, win, N_hop)
    return Y
The stft function performs the Fourier transform itself. The array is divided into frames of a certain length (calculated in calcSpec), the Fourier transform from Numpy is applied to each frame, and the finished spectrogram is returned.
def stft(x, N_fft, win, N_hop, nodelay=True):
    """
    short-time Fourier transform
    x - input signal
    N_fft - number of points used for the transform
    win - Hanning window
    N_hop - hop size
    nodelay - remove the first frames from the result (a side effect of the transform)
    """
    # get lengths
    if x.ndim == 1:
        x = x[:, np.newaxis]  # add an extra axis so a single signal has the same shape as several
    Nx = x.shape[0]           # number of samples in the input data (160000 in our case)
    M = x.shape[1]            # number of files in the input data (1 in our case)
    specsize = int(N_fft/2 + 1)
    N_win = len(win)          # Hanning window size
    N_frames = int(np.ceil((Nx + N_win - N_hop) / N_hop))  # how many frames the input array is split into
    Nx = N_frames * N_hop     # padded length
    x = np.vstack([x, np.zeros((Nx - len(x), M))])

    # init
    X_spec = np.zeros((specsize, N_frames, M), dtype=complex)  # zero matrix that will become the spectrogram
    win_M = np.outer(win, np.ones((1, M)))  # matrix in which each column equals the Hanning window
    x_frame = np.zeros((N_win, M))          # zero-filled frame buffer (one column per input file)
    for nn in range(0, N_frames):
        idx = int(nn * N_hop)
        x_frame = np.vstack((x_frame[N_hop:, :], x[idx:idx + N_hop, :]))  # slide the frame by N_hop samples
        x_win = win_M * x_frame
        X = np.fft.rfft(x_win, N_fft, axis=0)  # column of complex numbers: the magnitude is the amplitude, the angle is the phase
        X_spec[:, nn, :] = X                   # add the resulting column to the spectrogram

    if nodelay:
        delay = int(N_win/N_hop - 1)
        X_spec = X_spec[:, delay:, :]  # remove the extra columns at the beginning
    if M == 1:
        X_spec = np.squeeze(X_spec)    # remove the extra axis
    return X_spec
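To make the parameters concrete, here is a hedged usage sketch; the params values are assumptions that match the comments above (16 kHz audio, 20 ms window, 50% hop), not code taken verbatim from the sample:

import soundfile as sf

params = {"fs": 16000, "winlen": 0.02, "hopfrac": 0.5}   # assumed values

y, fs = sf.read('test_wav.wav')   # a mono 16 kHz recording is assumed
Y = calcSpec(y, params)           # complex spectrogram of shape (161, n_frames)
print(Y.shape)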
Another important function is calcFeat, which takes the logarithm of the spectrogram, stretching the lower frequencies and compressing the upper ones. The human voice lies roughly in the range of 85-3000 Hz, while our recording is sampled at 16 kHz (so it contains frequencies up to 8 kHz) - only a small slice of the whole range - and the logarithm "stretches" the low frequencies we need and "compresses" the unneeded high ones.
def calcFeat(Spec, cfg):
    """compute spectral features"""
    if cfg['feattype'] == "MagSpec":
        inpFeat = np.abs(Spec)
    elif cfg['feattype'] == "LogPow":
        pmin = 10**(-12)
        powSpec = np.abs(Spec)**2                      # square all spectrogram values
        inpFeat = np.log10(np.maximum(powSpec, pmin))  # take the logarithm, cutting off values that are too low
    else:
        raise ValueError('Feature not implemented.')
    return inpFeat
Our deep denoising model is trained on log-spectrograms, so preprocessing with this function is mandatory. To turn the spectrogram obtained by applying the filter (the output of the neural network) to the complex spectrum produced by calcSpec back into a waveform, the spec2sig function is used. It computes the parameters of the inverse transform and calls the istft (inverse short-time Fourier transform) function.
def spec2sig(Spec, params):
    """Converts a spectrogram to sound"""
    fs = int(params["fs"])                         # sample rate
    N_win = int(float(params["winlen"]) * fs)      # window width
    if 'nfft' in params:
        N_fft = int(params['nfft'])
    else:
        N_fft = int(float(params['winlen']) * fs)  # length of the Fourier transform
    N_hop = int(N_win * float(params["hopfrac"]))  # hop between window segments
    win = np.sqrt(np.hanning(N_win))               # Hanning window
    x = istft(Spec, N_fft, win, N_hop)             # inverse transform
    return x
In istft, the inverse Fourier transform is also performed using a function taken from Numpy.
def istft(X, N_fft, win, N_hop):
    # get lengths
    specsize = X.shape[0]     # height of the spectrogram (number of frequencies)
    N_frames = X.shape[1]     # number of frames
    if X.ndim < 3:
        X = X[:, :, np.newaxis]  # expand to 3 dimensions
    M = X.shape[2]            # number of channels
    N_win = len(win)          # length of the Hanning window
    Nx = N_hop * (N_frames - 1) + N_win
    win_M = np.outer(win, np.ones((1, M)))  # matrix in which each column equals the Hanning window
    x = np.zeros((Nx, M))     # zero matrix Nx x M to store the result
    for nn in range(0, N_frames):
        X_frame = np.squeeze(X[:, nn, :])             # spectrum of this frame
        x_win = np.fft.irfft(X_frame, N_fft, axis=0)  # inverse Fourier transform for X_frame
        x_win = x_win.reshape(N_fft, M)               # restore the shape
        x_win = win_M * x_win[0:N_win, :]             # apply the Hanning window of the required size
        # overlap-add the result for this frame
        idx1 = int(nn * N_hop)
        idx2 = int(idx1 + N_win)
        x[idx1:idx2, :] = x_win + x[idx1:idx2, :]
    if M == 1:
        x = np.squeeze(x)  # remove the extra axis if there is only one channel
    return x
An audio signal recorded under real acoustic conditions often contains unwanted noise, which may be generated by the environment or the recording equipment. This means that the resulting digital description will also contain unwanted noise.
Speech before and after noise removal
To "clean up" the sound, a filter must be applied to this digital description to remove the unwanted noise. But here another problem arises: each type of noise requires its own filter, which has to be chosen manually or looked up in filter banks. Noise at frequencies different from those of human speech is not a problem - people were getting rid of it long before neural networks. But removing a child's crying or the clatter of keys without noticeably degrading the voice was problematic.
Deep learning models can help solve this problem. The main advantage of neural networks over pre-built filters is their much wider coverage of different types of noise. A neural network can be retrained by continually adding new types of noise.
In our case, we will use the NSNet2 model. This neural network was used as the baseline in the Deep Noise Suppression Challenge conducted by Microsoft. The goal of developing this network was to create a model for cleaning noise from sound in real time. The model consists of a fully connected layer with ReLU, two recurrent GRU (Gated Recurrent Unit) blocks, and fully connected (FF, feed-forward) layers with ReLU and sigmoid activations.
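As a rough illustration of that layer structure, here is a hedged PyTorch sketch; the hidden sizes and the 161 input bins follow the numbers used elsewhere in this article, and the exact dimensions of the original NSNet2 may differ:

import torch
import torch.nn as nn

class NSNet2Like(nn.Module):
    """Sketch of an NSNet2-style suppressor: FC+ReLU, 2 x GRU, FF layers with ReLU and sigmoid."""
    def __init__(self, n_bins=161, hidden=400):
        super().__init__()
        self.fc_in = nn.Sequential(nn.Linear(n_bins, hidden), nn.ReLU())
        self.gru = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.fc_out = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.Sigmoid(),   # per-frequency gain mask in [0, 1]
        )

    def forward(self, log_pow_spec):
        # log_pow_spec: (batch, time, n_bins) - the output of calcFeat
        x = self.fc_in(log_pow_spec)
        x, _ = self.gru(x)
        return self.fc_out(x)   # gains to multiply the noisy spectrum by

mask = NSNet2Like()(torch.randn(1, 1000, 161))
print(mask.shape)   # torch.Size([1, 1000, 161])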
Speech is affected by a large number of external conditions. A person can speak loudly or quietly, quickly or slowly, in a large room or a small one, far from the microphone or close to it. To simulate these more complex conditions, augmentations were applied during training; in particular, random biquad filters were used to modify the sound. Thanks to such augmentations, the noisy audio becomes closer to real-world conditions.
Results on the quality of this approach can be found in the paper "Data augmentation and loss normalization for deep noise suppression". The model performs well on a variety of noise types.
Converting a model to OpenVINO
When working with our model, we used OpenVINO (Open Visual Inference & Neural Network Optimization), a product developed by Intel. As the name suggests, OpenVINO is a set of tools for neural network execution and optimization.
There are many frameworks for creating and training neural networks. In order to be able to run neural networks from various frameworks on any Intel hardware, OpenVINO includes a Model Optimizer module.
We take a model trained in some framework, convert it with OpenVINO, and can then run it on a CPU, an integrated or discrete GPU, or an FPGA.
In fact, Model Optimizer is a set of Python scripts that bring neural networks of various formats to a universal representation called IR (Intermediate Representation). This allows OpenVINO to work with any neural network, regardless of the framework it came from.
During its operation, Model Optimizer also optimizes the structure of convolutional neural networks. For example, combining the results of convolutions, replacing layers with a sequence of linear operations, etc.
Recently, with the advent of the API, less and less optimization is carried out in Model Optimizer, and its main work comes down to converting models without any major changes.
Conversion to IR representation differs between models from Open Model Zoo and other models. Open Model Zoo is a repository of deep neural network models containing a large number of trained models that can be executed using OpenVINO. This repository stores not only models, but also parameters for converting models from different frameworks into the OpenVINO intermediate format.
To convert models downloaded from Open Model Zoo, you need to use the Model Optimizer tool and the converter.py script included in it. This module has access to parameters for converting models from a zoo of models.
Console command to convert the loaded model:
python converter.py --name <model_name>
To convert your own model, you need to use the mo.py script with additional parameters:
python mo.py --input_model <path_to_model>
To convert our ONNX model to OpenVINO format (on Windows), the above command looks like this:
python mo.py --input_model <path_to_onnx_model> --input_shape [1,1000,161]
where 1 is the number of channels, 1000 the number of time intervals, and 161 the number of frequencies.
You can also specify additional parameters for convenience. The entire list of possible parameters can be viewed with the command:
python mo.py --help
The only difference between converter.py and mo.py is that converter.py takes the conversion parameters from the model description in Open Model Zoo and passes them to mo.py.
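For completeness, here is a hedged sketch of running the converted IR model from Python with the classic Inference Engine API. The file names, the params dictionary and the single-pass inference are assumptions for illustration, not the exact code of the sample; with a static --input_shape the number of frames must match (padding or chunking is omitted here for brevity):

import numpy as np
import soundfile as sf
from openvino.inference_engine import IECore

params = {"fs": 16000, "winlen": 0.02, "hopfrac": 0.5}   # assumed, see calcSpec above

ie = IECore()
net = ie.read_network(model='nsnet2.xml', weights='nsnet2.bin')   # hypothetical IR file names
exec_net = ie.load_network(net, 'CPU')
input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

y, fs = sf.read('noisy.wav')                     # noisy 16 kHz mono recording (assumed)
spec = calcSpec(y, params)                       # complex spectrogram, shape (161, T)
feat = calcFeat(spec, {'feattype': 'LogPow'})    # log-power features for the network
feat = feat.T[np.newaxis, :, :]                  # -> (1, T, 161), as in --input_shape

gain = exec_net.infer({input_name: feat})[output_name]   # predicted per-bin gains
enhanced_spec = spec * gain[0].T                 # apply the mask to the noisy spectrum
clean = spec2sig(enhanced_spec, params)          # back to a waveform
sf.write('denoised.wav', clean, fs)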
It should be noted that the problem of noise reduction has not yet been completely solved. Speech enhancement techniques using neural networks have recently received enormous attention in both scientific research and commercial applications. One of the most important benefits of using neural networks for noise reduction is that they are able to clean up transient noise from audio. Previously known approaches did not allow this to be done.
Neural networks are not perfect and it is not possible to get rid of absolutely all noise. However, the presented model showed good results in speech cleaning in “home” conditions.
The main features of the program to remove noise from recordings
The first important thing is that it is in Russian, which undoubtedly simplifies the process of removing noise. There shouldn't be any difficulties in your work.
Anyone who has already encountered similar software will be able to easily edit digital audio, digitize recordings from cassettes and suppress, or rather remove, noise. You can also glue or trim pieces of audio files if you wish.
Among the set of tools, the most notable are amplification, attenuation, normalization, and removal of extraneous noise.
You can also apply many effects to your recording: equalizer, reverb and echo, choruses, presets and more.
The tool is adapted for beginners and experienced users
If we just want to edit audio, we won't need much technical knowledge. To get a good result, we simply upload the file using the "Select File" button and then click "Start". The program takes care of the rest, performing the process automatically: it takes a sample of noise with no other sound in it and analyzes the map of frequencies that make it up in order to attenuate them.
But we don't always get the best result the first time. If you know a little about the subject, Audio Denoise has the options needed to adapt the audio file to our needs.
Here we find several parameters that, with a little practice, help us control the noise reduction. We can choose the type of noise model: a fitted distribution, an average or an autoregressive model. Depending on the noise model we select, the values of other settings such as the amount of noise reduction, noise modulation tracking, noise model complexity or smoothing will change. These values can also be adjusted manually.
The application shows a window in which we can listen to both the original audio and the processed audio. After making changes we can compare the versions and check whether we get the desired result. When finished, click the "Download" button to download the file with all the changes applied. Note that at the moment the application only supports the WAV format, so files in other formats such as MP3 or AAC must be converted first.
Disadvantages of noise removal software
Audacity is of course a little weaker than Sound Forge. It does not work with video files, and the LAME MP3 codec will have to be installed separately.
Another drawback is the absence of some modules, instruments and effects that connect via VST or DirectX.
It is also impossible to completely remove hum from a microphone recording, which means you probably won't be able to make the recording perfect - but improving its quality is quite possible.
The program is also not able to reduce the physical noise of a computer cooler - only cleaning the fan of dust helps with that, and no program can do it. Program features other than noise suppression:
- simultaneous use of several tracks;
- export, import and file encoding;
- listening to tracks while recording simultaneously;
- control of playback and recording levels;
- suppression (removal) of sound interference;
- recording from a microphone;
- support for WAV, AIFF, NeXT/Sun AU, MP3 and MPEG formats;
- a white noise source.
Audio Denoise helps improve the quality of your audio files
Audio Denoise is a web application that we can access from our favorite internet browser and that lets us eliminate background noise from recordings and audio files. There is thus no need to download any program to our computer: the application runs in the cloud and was created to clean and enhance all types of audio files, from the most homemade to the most professional.
The best thing about this online application is that it is 100% free and requires no registration - to use it, simply visit its website. It can be used by both beginners and professionals: it has a simple user interface, albeit in English, and plenty of adjustable controls to help us work with our audio.
AudioLab on Android
Another application that comes in handy when processing voice recordings is AudioLab. The app is feature-rich, and you can do a lot beyond basic editing. All the options such as crop, merge, convert and split are presented as tiles, so you can choose a specific function to work with.
Scroll down, find Silence Remover and select it. You can choose a track from the list of all tracks or open a file manager to pick a file manually; it is also possible to record audio. In addition, you can preview the audio and change the file name. You can choose from presets for removing silence: remove all of it, or only at the beginning and end. You must then select a decibel threshold below which everything will be considered silence. The detection mode can be left as is. There's also an option to use the output as an alarm, notification or even your default ringtone, but let's skip that for now. Then tap the checkmark, which removes the silence and creates another file. You can find the file in AudioLab's output section and also listen to the original file.
Get AudioLab for Android.
Alternative options for Audio Denoise
If we are looking for a tool to remove ambient noise from our audio files, we suggest several Audio Denoise alternatives that you should take into account:
Audacity
It is very popular digital audio recording and editing software because it is completely free. It has a large number of tools for editing our audio files; one of them is the ability to easily remove annoying external noise from any audio file in just a few steps. Audacity is cross-platform, so it can be used on Windows, Linux or macOS. We can download it for free from its website.
Adobe Audition
It is a professional audio editing and post-production tool that turns our computer into a multi-track recording studio. The program offers many advantages when it comes to improving audio quality, allowing us to quickly eliminate variable-bandwidth noise such as background sounds, whispers and wind. We can try Adobe Audition for free by downloading its seven-day trial from its website. To keep using it, a license must be paid for, starting at 24.19 euros per month.
Summary
- It's likely that your noise problem can be solved not with technology, but with a simple conversation with your household or neighbors: warn them about an important video conference or call and ask them to be quiet during this time.
- If the noise does not depend on the people around you, and you don't need to make calls on your computer very often (a couple of hours a week is enough), you can use the free version of Krisp - this program combats noise quite well.
- The paid version of the same Krisp will cost quite a lot of money (and not just once, but annually), so it may be more profitable to invest in a noise-cancelling microphone - more about this in our previous post.
- We were unable to find any applications specifically for noise reduction on smartphones. Probably because many smartphones have multiple microphones and built-in noise reduction. By the way, you can try using a smartphone instead of a computer for important video calls.
More about creating a comfortable environment at home - digital and not only - can be found in our blog.
Step-by-step instructions for removing noise
- Open the source file (File – Open or Ctrl+O) and get something like the following window:
- Find a section of the file containing only background noise. Use the block of buttons under the graph to scroll through the recording. You can also use the buttons in the lower right corner to increase/decrease the signal amplitude and to zoom the block of control buttons. After trying it a couple of times, I think you'll figure it out.
- Select the found area: simply drag over it with the left mouse button, without pressing any extra keys. The background color of the graph is inverted - the background turns black and the selected area turns white.
- Go to the menu at the top “Effects – Noise Reduction/Restoration – Noise Reduction (process)”. The following window will appear:
- Click Capture Noise Print to capture the noise profile of the selected area (1). Red, green and yellow dots will appear on the frequency diagram. To apply noise removal to the entire file, click Select Entire File (2). Move the points on the blue curve to change the sound; after each change, press the play button (5) below. I advise against raising the Noise Reduction parameter (4) above 60-70% to avoid distortion. If the noise sample contains traces of voice, do not exceed 20-30% (distortion will appear earlier). The "Reduce by" parameter can be left at its default.
Once you have found the best sound by experimenting in this window, click Apply (6).
- After applying the filter, the peaks in the graph become slightly smoother. Before applying the filter: After applying the filter:
- To save the file, go to "File – Save as". A window will appear:
Click Change next to Sample Type:
Here you can set the “Sample Rate” value higher and specify “Stereo” in the number of channels, or even better, “Same as source”.
Click Change next to Format Settings:
In the Type column, always specify Constant (constant bitrate). With a constant bitrate, the same amount of memory is allocated for every second of audio, regardless of its content. Say 5 seconds of one track contain the rapidly changing voice of an opera singer, and 5 seconds of another contain the constant monotonous hum of an engine. In which case is more space needed to store the signal? Logically, in the first. With a variable bitrate, less space is spent on the second 5 seconds, and so we save space; with a constant bitrate, the same (larger) amount of memory is spent in both cases.
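To put rough numbers on it (an illustrative calculation, not from the original text): at a constant 320 kbps, any 5 seconds of audio occupy 320 × 5 / 8 = 200 KB, whether it is the opera singer or the engine hum, while a variable-bitrate encoder could spend only a fraction of that on the monotonous hum.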
Why use a constant bitrate then? Because the sections that can be encoded at lower quality are chosen by the program, and not always correctly. Besides, compressed audio files don't take up so much space that the savings really matter.
In the Bitrate line, set it to the maximum: 320 kbps (44100 Hz).
After all the changes made in the saving settings, click “OK” and save the file.
Ferrite Recording Studio
If you're an iOS user looking for an easy way to remove silence from an audio file, this is the app you need.
To get started, download the application from the link below and open it. You can import audio by opening the menu in the top right corner. If you want to record your voice, just press the microphone button. Once you import or record audio, you'll find the track right below the recording section. Now let's move on to removing the silent parts.
Simply load a track and tap it to select it. Then open the additional options and select Silence Strip. There are options to customize which parts you want to remove: you select a threshold, and you can also reduce the silence and set the minimum silence duration. One thing to keep in mind is that this simply removes the silence - essentially it leaves empty gaps, so you'll have to drag the clips to line them up together. Simple, right?
Get Ferrite Recording Studio for iOS.
Audacity
This is already widely used software. It does everything from editing podcasts to recording entire albums. I use it for all audio editing and basic processing, and it fits the bill perfectly. The feature I use most is the silence removal tool. It comes in handy in all situations - removing large chunks of silence saves not only time but also storage space. Here's how to do it on Windows with Audacity.
The parts of silence are highlighted in the image above.
Download Audacity from the link below, install and open it. You can record audio or simply drag and drop the audio you want to process. You will see your sound as a waveform, and if you look closely there will be areas with no sound (a straight line). But when you play the track and watch the level meter, you will notice that it bounces even during the silent parts. So if your recording contains a lot of silences, relax - we will remove them in one go.
To get started, double-click the track or press Ctrl + A to select everything. Then click Effect, scroll down and find Truncate Silence. If you are not sure about the settings, leave them as they are and click OK. This will give you a trimmed version of the audio file with all the silence removed.
If you want to adjust the settings yourself, here are the attributes you need to know in the Truncate Silence dialog box. The Detect Silence section specifies the threshold and duration.
Detect Silence
Threshold - sound below the specified level is considered silence. To have more of the audio treated as silence, you can raise this level, for example from -36 dB to -20 dB, giving the program more material to work with.
Duration - the minimum length a quiet passage must have to be considered for removal. This value can be set as low as 0.001 seconds.
Action
Truncate Detected Silence - the track is trimmed according to the parameters in the Detect Silence box: if a passage falls below both the level and the duration set there, it is shortened to the value set in Truncate to.
Compress to - a more advanced feature. Passages that fall below the silence detection threshold but last longer than the set duration are reduced to the percentage specified here.
Truncate tracks independently - if several tracks are selected and you apply this tool to all of them, silence is only removed where it occurs in all tracks at the same point. Selecting this option removes silence in each track independently of the others.
You can also save a preset if you want to use the same settings more often.
Get Audacity here.