AQuA - Audio Quality Analyzer

Introduction

AQuA (Audio Quality Analyzer) is a simple but powerful tool for intrusive perceptual voice quality analysis. It is an easy way to compare two audio files and measure the voice quality loss between a reference file and a degraded file. The software can also test audio codecs and generate audio signals for voice quality testing.

AQuA lets you design your own voice quality testing solution without depending on particular hardware or software. It is available as a library for Windows and Linux and is portable to Java and mobile devices.

Functionality

Requirements

AQuA works with audio files in .wav or .pcm format. Audio files should have the following parameters, depending on the version used:

AQuA     Sample rate        Bits per sample    Compression                   Channels
Voice    8 kHz              8, 16, 24, 32      Uncompressed, a-law, u-law    Mono
WB       8 kHz – 192 kHz    8, 16, 24, 32      Uncompressed, a-law, u-law    Not limited (16 channels max)

Compare wav files

  • Allows intrusive testing of reference wav file against degraded wav file;
  • Allows measuring voice quality for any language;
  • Reports percentage of quality similarity and MOS score according to ITU-T P.800.

Testing parameters

AQuA supports the following parameters for voice quality testing:

  • Choosing type of quality measurement: overall quality loss or voice naturalness;
  • Choosing filenames for original, test and generated audio files;
  • Choosing type of weight coefficients: uniform, linear or logarithmic;
  • Allows energy normalization;
  • Allows setting envelope smoothing level from 1 to 10;
  • Allows choosing source for original sound (external file or generated internally);
  • Contains audio synchronization and voice activity detection;
  • Provides reasons for voice quality loss (quite unique feature on the market):
    • Duration distortion;
    • Changes in signal spectrum;
    • Distortion detection in low, medium and high frequency bands;
    • ASR;
    • Amplitude clipping;
    • Signal energy analysis;
  • Provides voice quality feedback in:
    • Percentage of similarity;
    • MOS value;
  • Enabling advanced psycho-acoustic model:
    • Psycho-acoustic filter;
    • Normalization to loudness level at 1kHz;
    • Spectrum transform into detectable range of loudness;
  • Audio synchronization – trimming silence in the beginning and end of the test file;
  • Adjusting ratio between calculation performance and quality score forecast accuracy.

 

 

Scientific Background

The human ear is a non-linear system, which produces an effect called masking. Masking occurs when a message is heard against a noisy background or against masking sounds.

From research on the masking of harmonic signals by narrow-band noise, Zwicker determined that the entire spectrum of audible frequencies can be divided into frequency groups, or bands, distinguishable by the human ear. Fletcher had drawn a similar conclusion before Zwicker and named these frequency groups critical bands of hearing.

The critical bands determined by Fletcher and Zwicker differ, since the former defined the bands by means of masking with noise and the latter from the relations of perceived loudness.

Sapozhkov defined a critical band as “a band of frequency speech range, perceptible as a single whole”. In his earlier research he even suggested that sound signals in a band could be substituted by an equivalent tone signal, but experiments did not confirm this assumption. The critical bands determined by Sapozhkov differ from those of Fletcher and Zwicker because Sapozhkov proceeded from the properties of the speech signal.

Pokrovskij also determined critical bands on the basis of speech signal properties. According to his definition, the bands provide equal probability of finding formants in them.

The spectrum energy in bands can be used for different purposes, one of which is sound signal quality estimation. However, using only one author’s critical bands (for example, Zwicker’s critical bands are used in the prototype) does not give a sufficiently objective estimation, since they reflect only one aspect of perception or speech production. AQuA can determine energy in various critical bands as well as in logarithmic and resonator bands, which allows more properties of hearing and speech processing to be taken into account.

Because the bands determined by Pokrovskij and Sapozhkov suit speech signals better than sound signals in general, choosing the bands according to the signal type can increase the accuracy of the estimation, depending on its purpose.
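The idea of measuring spectrum energy per band can be illustrated with a short Python sketch. It is not AQuA's implementation: it uses a naive DFT, and the band edges and the function name band_energies are hypothetical values chosen only for the illustration.

```python
import cmath
import math


def band_energies(samples, sample_rate_hz, band_edges_hz):
    """Sum the spectral power of a real signal inside each [low, high) band."""
    n = len(samples)
    energies = [0.0] * (len(band_edges_hz) - 1)
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq = k * sample_rate_hz / n
        # Naive DFT coefficient for bin k (fine for short illustrative signals).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        for b in range(len(energies)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                energies[b] += abs(coeff) ** 2
                break
    return energies


# A 1 kHz tone sampled at 8 kHz: its energy should land in the middle band.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(80)]
edges = [0, 500, 1500, 4000]  # hypothetical band edges in Hz
energies = band_energies(tone, rate, edges)
print([round(e, 3) for e in energies])
```

Swapping the list of band edges is all it takes to compare different band definitions on the same signal, which is the flexibility the text describes.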

Perceptual model and audio synchronization

AQuA builds on the research results of the above-mentioned scientists and implements different algorithms in one software solution. It also has several advantages compared to other existing voice quality measurement software.

Besides critical bands, the new AQuA implements a more advanced psycho-acoustic model, which consists of three layers:

  • psy-filtering;
  • level normalization;
  • transform into detectable range.

The psycho-acoustic model is based on dependencies obtained during experiments. The most complex phase is psy-filtering, shown in Fig. 1.

Fig. 1. General scheme of psy-filtering (Sevana AQuA psycho-acoustic model)

Masking procedure includes the following sequence of actions:

  1. hearing threshold processing;
  2. fluid level masking;
  3. spectrum separation into tones and noises;
  4. creating masks from tone components;
  5. creating masks from noise components;
  6. joining tone and noise mask components;
  7. joining current mask with post-mask;
  8. preparing post-mask for the next frame;
  9. creating mask for the previous frame.

The hearing threshold is the minimal sound pressure that produces a sensation of hearing; it reflects the sensitivity of the ear to the intensity of sound energy. The threshold level depends on the type of sound fluctuations and on the measuring conditions. One possible way to determine the hearing threshold (implemented in AQuA 7.x) is standardized in ISO/R 226.

The psycho-acoustic model implemented in AQuA 7.3 introduces the so-called range of detectable loudness, which is the minimal change of signal amplitude detectable by the human ear. Depending on the signal loudness level and frequency, this minimal detectable change varies from 2% up to 40%.

AQuA algorithms have certain advantages:

  • it is universal since it allows measuring signals quality from various sources and processed in different ways;
  • one can optimize quality estimation depending on the purpose:
    • for speed (for example, it is possible to receive rough estimation quickly);
    • by signal type (using different bands for speech signals and sound signals in general);
  • the resulting estimations correlate well with MOS;
  • quality estimations obtained for speech signals can be translated into values of various intelligibility scores.

The original and degraded signals are fed to the input of the sound quality measurement system. The quality of the reference signal is taken as 100%: the system treats it as the “ideal signal”, free of distortion and degradation. The test signal is the result of transmitting the reference over a communication channel, which may introduce various impairments.

 

 

AQuA Command Line parameters

AQuA Usage

AquA-XX <license file> [options]

Print AQuA command line help

-h    prints AQuA command line help;

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.314.

Copyright (c) 2018 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou. Internal use only

—————————————————————

-h [sndn | exam]

– prints this help;

sndn – prints list of sounds names;

exam – prints samples of program usage;

-mode <mod>   – defines program mode; The following modes are available

<mod> :

codec    – codec testing mode;

files    – audio file comparison mode;

folder   – folder comparison mode;

generate – test signals generation mode;

-clibf <file> – codec library file name;

-src <file <fname> | gen <mode> | folder <fname> <ext>>

determines source of initial sound: <file> – external sound

file, <gen> – internal signal generator or <folder> external

sound files from folder. In file mode one should specify name

of audio file. In generator mode – one of signal generation

modes: short, normal or long. In folder mode – path to audio

files and their extensions;

-ct <ctype>   – sets type of weight coefficients: uniform, linear, logarithmic

or htrsdelta;

-tstf <fname> – sets name of the file being tested; in <folder> mode sets

name of folder, with the files being tested;

-frep <fname> – sets name of report file for folder processing mode;

-dst <fname>  – sets name of the file generated by the speech model;

-sn <all-speech | short-noise | normal-noise | long-noise | <sname>>

– sets the name of the synthesized speech sound or runs

the generation of a sound signal corresponding to one

of the models:

all-speech – full distribution of speech sounds;

short-noise – short noise model;

normal-noise – the average noise model;

long-noise – the large noise model;

-voit <female | male>

– sets voice type for synthesized sound;

-slen <num>   – sets duration for synthesized sound equal to <num> samples.

num = 1..960000;

-qt <quality | naturalness>

– sets type of quality measurement overall quality loss

or voice naturality;

-power <on | off>

– enable/disable output of codec speed performance indicator;

-enorm <on | off | rms>

– enable/disable energy normalization;

-g711 <on | off>

– enable/disable normalization to G.711 quality scale;

-npnt <<num> | auto>

– sets numbers of links points. num = 1..10;

auto – enables detection of optimal amount of linking points;

-miter <num>  – sets envelope smoothing level. num = 1..10;

-gch          – turns waiting for key pressed after voice quality output on;

-fau <fname>  – prints reasons for quality loss;

-ratem %mp    – sets output estimations: %) voice quality in percentage,

m) MOS-like estimation, p) PESQ-like estimation;

-acr <<num> | auto>

– sets spectral analysis precision. num = 7..16,

auto – enables automated analysis  precision detection

according to sampling frequency;

-decor <on | off>

– enable/disable delta correction;

-emode <normal | log | 10log>

– integrating mode: normal – linear, [10]log – logarithmic;

-mprio <on | off>

– sets signal type: on – music, off – voice;

-tdel <num>   – delay from start of test file;

-spfrcor <on | off>

– On/off different frequencies perceptions correction;

-psyf <on | off>

– On/off psy-filter;

-psyn <on | off>

– On/off psy-normalyzer;

-smtnrm <on | off>

– enable/disable smart normalization of energy;

-avlp <on | off>

– enable/disable average levels correction;

-tmc <on | off>

– allow/forbid calculation time measure;

-voip <on | off>

– turns on/off processing of only speech related and specific frequency bands.

In particular this parameter forces AQuA to consider signals

only in the range between 300Hz and 3.4kHz (telephone frequency band);

-grad <on | off>

– allow/forbid amplitude gradation using;

-specp <num> <fname>

– out <num> spectral pair values to file <fname>. num = 8,16 or 32;

-fst <num>    – sets program performance speed. Increasing speed decreases accuracy;

num = 0.0 (slow) … 1.0 (fast);

-trim[-src|-tst] <a | r> <level>

sets trimming type ((a)bsolute or (r)elative threshold) and level

(0.0 <= level < 120.0). -trim-src run trimming for source file;

-trim-tst run trimming for degraded file; -trim run trimming for

both files;

-short <<sec> | -1>

sets the maximal value of the short file duration (2.0 <= sec <= 180.0);

-1 Disables operation with short files;

-hist-pitch <on | off>  <on | off>

Allows calculating pitch statistics; First flag enables or disables

metrics calculation. The second flag enables or disables metrics usage

for quality evaluation;

-hist-level <on | off> <on | off> <on | off>

Allows calculating signal levels and quantization step histogram.

First flag enables or disables metrics calculation. The second flag allows

signal levels histogram usage for quality evaluation, the third flag

allows quantization step histogram usage for quality evaluation;

-cut-src <start-time> <end-time>

This option removes audio from the beginning or end of reference file before

processing. <start-time> – remove <start-time> milliseconds in the beginning

of the correspondent audio file; <end-time> – remove

<end-time> milliseconds in the end of the correspondent audio file;

-cut-tst <start-time> <end-time>

This option removes audio from the beginning or end of test file before

processing. <start-time> – remove <start-time> milliseconds in the beginning

of the correspondent audio file; <end-time> – remove <end-time> milliseconds

<end-time> milliseconds in the end of the correspondent audio file;

-output <txt | json>

Sets the output format for the report file.

txt – the report file is displayed in a simple text form;

json – the report file is output in the json format;

-echo-src <on | off>

– allow/forbid calculation of the echo parameters on source file;

-echo-tst <on | off>

– allow/forbid calculation of the echo parameters on test file;

-echo-interval <milliseconds>

– specifies echo detection frame length (in milliseconds); default is 4096ms;

-echo-min-delay <milliseconds>

– specifies echo detection minimal delay (in milliseconds);

 

-echo-max-length <milliseconds>

– specifies echo detection max length (in milliseconds);

Define program mode: -mode <mod>

Defines AQuA mode of operation. The following modes are available:

mod:

codec       codec testing mode;

files       audio file comparison mode;

folder      folder comparison mode;

generate    test signals generation mode.

For example:

aqua-v.exe tst.lic -mode files -src file ORIGINAL_FILE -tstf TEST_FILE

aqua-v.exe tst.lic -mode codec -clibf <DLL_LIBRARY_NAME> -src file <TEST_AUDIO_FILE>

aqua-v.exe tst.lic -mode codec -clibf GSM610.dll -src file short.pcm

Set codec library file name: -clibf <file>

Sets the name of the codec library file used in codec testing mode.

Set source of reference sound: -src <file <fname> | gen <mode> | folder <fname> <ext>>

Determines source of initial sound:

file                                     external sound file;

gen                                    internal signal generator;

folder                                external sound files from folder.

In file mode one should specify name of audio file.

In generator mode – one of signal generation modes:

mode                                short, normal or long.

In folder mode – path to audio files and their extensions:

ext                                     .wav or .pcm.

Set type of weight coefficients: -ct <ctype>

ctype                                 uniform, linear, logarithmic or htrsdelta.

Set name of the file under test: -tstf <fname>

In folder mode, sets the name of the folder containing the files to test.

Set name of report file for folder processing mode: -frep <fname>

Set name of the file generated by the speech model: -dst <fname>

Generate full speech sounds distribution or synthesized sound: -sn <all-speech | short-noise | normal-noise | long-noise | <sname>>

Sets the name of the synthesized speech sound or runs the generation of a sound signal corresponding to one of the models:

all-speech                         full distribution of speech sounds;

short-noise                       short noise model;

normal-noise                   the average noise model;

long-noise                        the large noise model.

Examples:

aqua-v.exe tst.lic -mode generate -sn MODEL -dst SPEECH_MODEL_FILE

Here is an example of generating an audio signal with the full distribution of speech sounds:

aqua-v.exe tst.lic -mode generate -sn all-speech -dst generated_01.pcm

You can also specify separate sounds from the table of sounds in the manual:

aqua-v.exe tst.lic -mode generate -sn a0 -voit female -slen 8000 -dst generated_02.pcm

aqua-v.exe tst.lic -mode generate -sn i0 -voit male -slen 8000 -dst generated_03.pcm

For separate sounds one can also set type of voice “-voit male/female” and duration of the sound to be generated “-slen 8000”.

Set voice type: -voit <female | male>

Sets voice type for synthesized sound.

Set duration: -slen <num>

Sets duration for synthesized sound equal to <num> samples.

num                                   1 .. 960000.

Set quality loss or naturalness: -qt <quality | naturalness>

Sets the type of quality measurement: overall quality loss or voice naturalness.

Enable indication of codec speed performance: -power <on | off>

Enables output of codec speed performance indicator.

Enable energy normalization: -enorm <on | off | rms>

Enables energy normalization.

on/off                               these parameters manage amplitude normalization;

rms                                    this parameter turns RMS normalization on.

Turn compatibility with G.711 codec on or off: -g711 <on | off>

Enable/disable normalization to G.711 quality scale.

Set number of link points: -npnt <num | auto>

Sets number of link points:

num                                   1 .. 10;

auto                                  enables detection of optimal amount of linking points (recommended).

Set envelope smoothing level: -miter <num>

Smoothing level is in the range of [1..10].

Turn on “waiting for key press” after showing voice quality output: -gch

Turns on “waiting for a key press” after output of voice quality.

Print reason of quality loss: -fau <fname>

Prints reasons for quality loss to the file specified.

Set voice quality output type: -ratem <% | m | p>

%       voice/audio quality in percentage;

m       MOS score prediction (objective prediction of the P.800 MOS score);

p       PESQ-like estimation.

The values can be combined, for example -ratem %m.

Set spectral analysis precision: -acr <num | auto>

Sets spectral analysis precision.

num                                   7 .. 16;

auto                                 enables automated analysis precision detection according to sampling frequency.

Set delta correction mode: -decor <on | off>

Enable/disable delta correction.

Set spectrums integrating mode: -emode <normal | log | 10log>

Sets one of the integration modes:

normal                              linear;

[10]log                              logarithmic.

Set signals type: -mprio <on | off>

Sets signal type:

on                                      music;

off                                      voice.

Set initial delay: -tdel <num>

Sets the delay, in samples (<num>), from the beginning of the test file. To obtain the number of samples for a period given in milliseconds, use this formula:

samples = milliseconds * sampling rate (Hz) / 1000

and vice versa:

milliseconds = samples * 1000 / sampling rate (Hz)
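The conversion between milliseconds and samples follows directly from the sampling rate. A minimal Python sketch, assuming the sampling rate of the test file is known (the function names are illustrative and not part of AQuA):

```python
def ms_to_samples(milliseconds, sample_rate_hz):
    """Number of samples covering the given duration, rounded to the nearest sample."""
    return round(milliseconds * sample_rate_hz / 1000)


def samples_to_ms(samples, sample_rate_hz):
    """Duration in milliseconds covered by the given number of samples."""
    return samples * 1000 / sample_rate_hz


# A 100 ms delay in an 8 kHz file corresponds to 800 samples,
# which could then be passed as "-tdel 800".
print(ms_to_samples(100, 8000))  # 800
print(samples_to_ms(800, 8000))  # 100.0
```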

Enable perception correction: -spfrcor <on | off>

Turns perception correction on/off. This option applies additional coefficients to specific frequencies and is recommended only for VoIP or G.729 signals (8 kHz only).

Enable processing speech related frequency bands only: -voip <on | off>

Turns on/off processing of only speech related and specific frequency bands. In particular, this parameter forces AQuA to consider signals only in the range between 300 Hz and 3.4 kHz (telephone frequency band). When the option is turned on, differences in the signal spectrum outside this range are not considered. This option is recommended for VoIP, mobile, PSTN and converged networks transmitting telephone-like speech signals.

Set psycho-acoustic filter: -psyf <on | off>

Sets psycho-acoustic filter on/off.

Set psycho-acoustic normalizer: -psyn <on | off>

Sets psycho-acoustic normalizer on/off.

Set level gradation: -grad <on | off>

Allows/forbids amplitude gradation.

AQuA performance calculation: -tmc <on | off>

Allows/forbids quality score calculation time measurement.

Set average levels correction: -avlp <on | off>

Enables/disables average levels correction.

Smart energy normalization: -smtnrm <on | off>

Enables/disables smart energy normalization. Performs energy normalization according to energy levels in integral spectrums of the most significant frequency band.

Export spectral pairs into CSV file: -specp <num> <fname>

Exports specified amount (<num>) of spectral pairs into the file specified (<fname>).

num     may be equal to 8, 16 or 32. This is useful for visualizing differences between the spectra of the original and degraded signals.

Set program performance speed: -fst <num>

Sets program performance speed. Increasing the speed decreases score accuracy.

num     should be in the range from 0.0 (slow) to 1.0 (fast).

Set silence trimming: -trim[-src|-tst] <a | r> <level>

This option trims silence in the beginning and end of the file(s) up to the predefined silence level.

Sets the silence trimming type: absolute (a) threshold (should be below the average signal level) or relative (r) threshold (should be below the SNR level). The <level> parameter is set in dB and varies from 0.0 up to 120.0.

-trim        runs trimming for both files (synchronizes both audio files in the time domain);

-trim-src    runs trimming for the source file only;

-trim-tst    runs trimming for the degraded file only;

a            absolute threshold: audio below <level> (in dB) is removed at the beginning and end of the audio files;

r            relative threshold: audio with SNR below <level> (in dB) is removed at the beginning and end of the audio files;

level        signal trimming level, set in dB, from 0.0 up to 120.0.

Set short file duration: -short <sec | -1>

Sets the maximal duration of a short file. This option guarantees quality measurement for files whose audio lasts from 2 seconds up to the defined value <sec>, regardless of the amount of active sound in the recording.

sec     2 .. 180;

-1      disables operation with short files.

Calculate pitch statistics: -hist-pitch <on | off> <on | off>

Allows calculating pitch statistics.

First flag enables or disables metrics calculation;

The second flag enables or disables metrics usage for quality evaluation.

Calculate signal levels: -hist-level <on | off> <on | off> <on | off>

Allows calculating signal levels and quantization step histogram.

First flag enables or disables metrics calculation;

The second flag allows signal levels histogram usage for quality evaluation;

The third flag allows quantization step histogram usage for quality evaluation.

Remove audio from beginning or end of reference file: -cut-src <start-time> <end-time>

This option removes audio from the beginning or end of reference file before processing.

start-time                        remove <start-time> milliseconds in the beginning of the corresponding audio file;

end-time                          remove <end-time> milliseconds in the end of the corresponding audio file.

Remove audio from beginning or end of test file: -cut-tst <start-time> <end-time>

This option removes audio from the beginning or end of test file before processing.

start-time                        remove <start-time> milliseconds in the beginning of the corresponding audio file;

end-time                         remove <end-time> milliseconds in the end of the corresponding audio file.

Set the output format for the report file: -output <txt | json>

This option sets the output format for the report file.

txt                        the report file is displayed in a simple text form;

json                     the report file is output in the json format.

Calculate echo on reference file: -echo-src <on | off>

This option allows/forbids calculation of the echo parameters on source file.

Calculate echo on test file: -echo-tst <on | off>

This option allows/forbids calculation of the echo parameters on test file.

Set echo detection frame length: -echo-interval <milliseconds>

Specifies echo detection frame length (in milliseconds).

milliseconds                     5 .. 120000. Default is 4096ms.

Set minimal delay for echo detection: -echo-min-delay <milliseconds>

Specifies echo detection minimal delay (in milliseconds).

milliseconds                     5 .. 120000.

Set max length for echo detection: -echo-max-length <milliseconds>

Specifies echo detection max length (in milliseconds).

milliseconds                     5 .. 120000.

Select channel used in test audio file: -channel-tst

AQuA analysis runs on a single channel, so this option selects the audio channel to use in the test file. The default value is -1, which means “mix all channels together into a single mono channel”.

 

Select channel used in reference audio file: -channel-src

AQuA analysis runs on a single channel, so this option selects the audio channel to use in the reference file. The default value is -1, which means “mix all channels together into a single mono channel”.

 

Use audio impairments analysis: -new-impairments <on | off>

Enables collection of audio impairments. Depending on the configuration, the report can be saved or used to normalize the MOS value. The default is off (no audio impairments are collected).

Set output path for impairments report: -save-impairments-report <report_path>

Specifies the output file for the impairments report. Implicitly enables collection of audio impairments (-new-impairments on).

report_path                     path to the output file where the impairments are saved.

Enable MOS normalizing: -normalize-mos <on | off | all | diff>

Forces MOS normalizing depending on audio impairments analysis.

  • on – MOS is normalized for files whose audio impairments quality estimation is worse than the classical one (from AQuA).
  • off – MOS is not normalized. Default behavior.
  • all – MOS normalization runs on all audio files.
  • diff – MOS normalization uses the audio impairments difference (i.e. changes between reference and test files).

 

MOS weight coefficient: -mos-weight <weight>

Sets the weight coefficient of the non-normalized AQuA MOS value. When normalizing, final MOS = AQuA MOS (non-normalized) * weight + audio impairments MOS * (1.0 - weight). Default value is 0.5.
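The weighting can be reproduced with a few lines of Python. This is only a sketch of the arithmetic stated above: the function name is illustrative, the input scores are made-up values, and the check that the weight lies between 0.0 and 1.0 is an assumption rather than documented AQuA behavior.

```python
def normalized_mos(aqua_mos, impairments_mos, weight=0.5):
    """Blend the non-normalized AQuA MOS with the audio impairments MOS."""
    # Assumption: a weight outside [0.0, 1.0] is treated as an error here.
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return aqua_mos * weight + impairments_mos * (1.0 - weight)


# With the default weight of 0.5 both scores contribute equally.
print(normalized_mos(4.0, 3.0))  # 3.5
```

A weight of 1.0 keeps the AQuA MOS unchanged, while a weight of 0.0 uses only the audio impairments MOS.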

AQuA Command Line usage

Most of our customers represent the following business segments:

  • VoIP service providers
  • Mobile service providers
  • PSTN service providers
  • Satellite service providers
  • Audio and web conferencing providers
  • Radio communications
  • Unified communications
  • Solution providers for telecom

AQuA helps telecom business to solve a wide range of tasks:

  • test conference bridges quality when dialing from different locations
  • monitor quality on a conference bridge
  • monitor quality to certain destinations
  • monitor quality at different terminations by end-to-end testing with termination’s echo server
  • test quality in converged networks (e.g. Mobile-VoIP)
  • IVR system tests
  • device testing in various environments
  • audio improvement algorithms development

In all cases AQuA is the means for intrusive (active) end-to-end testing, in which a reference audio file is compared to a test file that has passed through a network, device or any other environment that may introduce degradation (e.g. a voice codec).

To show how AQuA performs perceptual voice quality assessment, we use WAV files that can be downloaded from the Microtronix web site (http://www.microtronix.ca/pesq.html). However, any audio files can be used with AQuA Wideband, as can files recorded at 8 kHz sampling in 16-bit mono with AQuA Voice.

Compare two audio files and learn about reasons for voice quality loss

To compare two audio files with the AQuA command line version and get extensive feedback from the software, we suggest invoking AQuA in the following manner (sample-01.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/Or272.wav  -tstf ./wavs/Dg002.wav  -acr auto -npnt auto -miter 1 -ratem %m -fau sample-01-log.txt

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.369.

Copyright (c) 2018 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou. Internal use only

—————————————————————

File Quality is

Percent value   39.31

MOS value       2.23

Thus the file comparison gives only 39.31% similarity, which corresponds to a MOS of 2.23. This is an example of a case where AQuA detects voice quality loss and PESQ does not (more details about this test case are on the Microtronix page).

After the test has run, the sample-01-log.txt file contains quantitative reasons for voice quality loss:

Duration distortion.

Audio stretching corresponds to 1.41 percent.

Delay of audio signal activity.

Signal delayed by    100 ms.

Audio signal activity mistiming (unsynchronization) is 1.25 percent.

 

Corrupted signal spectrum.

Overall spectral energy distortion approaches 62.18 %

Vibration along the whole spectrum [-19.73, 42.45] %

 

Significant distortion in low frequencies band.

Energy distortion approaches 32.27 %

Spectrum vibration in low frequency band [-16.91, 15.36] %

 

Significant distortion in medium frequencies band.

Energy distortion approaches 27.10 %

Amplification approaches 24.29 %

 

Value Name              Source      Degraded    Units

SNR                :    63.16       71.19       dB

ASR                :    43.96       41.93       %

RMS classic        :    0.0010      0.0015

RMS bounded        :    0.0040      0.0068

Is RMSbounded valid:    true        true

AvgEnergy          :    31.08       28.58       dB

MinEnergy          :    6.94        1.66        dB

MaxEnergy          :    70.10       72.84       dB

AvgSample          :    83          13

MinSample          :    -11460      -13220

MaxSample          :    11253       14210
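The value table at the end of the log can be post-processed by a script. The following Python sketch assumes the layout shown in the sample above (name, colon, source value, degraded value, optional units); the function name parse_metrics is hypothetical, and the actual layout may differ between AQuA versions.

```python
def parse_metrics(report_text):
    """Parse 'Name : source degraded [units]' lines into a dictionary."""
    metrics = {}
    for line in report_text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip lines without a colon or without both values (e.g. the header).
        if not sep or len(fields) < 2:
            continue
        metrics[name.strip()] = {
            "source": fields[0],      # values stay strings ("true" is not numeric)
            "degraded": fields[1],
            "units": fields[2] if len(fields) > 2 else "",
        }
    return metrics


sample = """\
Value Name              Source      Degraded    Units
SNR                :    63.16       71.19       dB
ASR                :    43.96       41.93       %
Is RMSbounded valid:    true        true
"""
metrics = parse_metrics(sample)
print(metrics["SNR"])
```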

Test two audio files and receive audio quality score

To simply compare two audio files and find out how close the quality of the file under test is to the reference audio, we suggest invoking AQuA in the following manner (sample-02.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/Or272.wav -tstf ./wavs/Dg001.wav  -acr auto -npnt auto -miter 1 -ratem %m

Result will be:

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   92.08

MOS value       4.89

or invoking it for the other degraded file (sample-03.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/Or272.wav -tstf ./wavs/Dg002.wav  -acr auto -npnt auto -miter 1 -ratem %m

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   39.31

MOS value       2.23
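Comparisons like the ones above can be automated from a script. The Python sketch below builds the same argument list as the sample batch files and extracts the two scores from the console output. It assumes that the scores are written to standard output in the form shown above; the function names are illustrative and not part of AQuA, and running compare() requires a licensed AQuA binary.

```python
import re
import subprocess


def build_compare_command(exe, license_file, reference, degraded):
    """Argument list for a file comparison, mirroring the sample batch files."""
    return [
        exe, license_file,
        "-mode", "files",
        "-src", "file", reference,
        "-tstf", degraded,
        "-acr", "auto", "-npnt", "auto", "-miter", "1",
        "-ratem", "%m",
    ]


def parse_scores(output):
    """Extract the percentage and MOS values from AQuA console output."""
    percent = re.search(r"Percent value\s+([0-9.]+)", output)
    mos = re.search(r"MOS value\s+([0-9.]+)", output)
    if percent is None or mos is None:
        raise ValueError("scores not found in output")
    return float(percent.group(1)), float(mos.group(1))


def compare(exe, license_file, reference, degraded):
    """Run AQuA and return (percent, mos). Assumes the scores go to stdout."""
    cmd = build_compare_command(exe, license_file, reference, degraded)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_scores(result.stdout)


sample_output = "File Quality is\n\nPercent value   39.31\n\nMOS value       2.23\n"
print(parse_scores(sample_output))  # (39.31, 2.23)
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.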

Adapting AQuA to actual environment

AQuA parameters have pre-set values by default, however, in some cases it is required to adapt the algorithm to actual environment, which is network, device, or specific codec. Majority of our customers don’t require adjusting AQuA parameters, but in some cases software tuning makes test results more consistent. There is no common case when it’s 100% required, but some of our customers mentioned that when doing tests in VoLTE networks, or VoIP-mobile this tuning gives better scores.

If your tests show unexpected results, the AQuA engine or the VAD (voice activity detector) may need tuning. We suggest starting with these parameters:

  • -npnt sets the number of linking points required to catch different “holes” inside the signal. The default value is 5.
  • -miter sets the number of voice activity detector frames used during smoothing, which is needed to smooth out the detector’s vibration. The default value is 5.

For example (sample-04.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/Or272.wav  -tstf ./wavs/Dg002.wav -acr auto -npnt 1 -miter 5 -ratem %m

Result is:

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   39.73

MOS value       2.25

or invoking it for the other degraded file (sample-05.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/Or272.wav  -tstf ./wavs/Dg001.wav  -acr auto -npnt 1 -miter 5 -ratem %m -fau sample-05-log.txt

In fact, this result is much closer to what one would hear; nevertheless, the file was degraded. The reasons for the voice quality loss can be found in the log file specified with -fau (sample-05-log.txt here), e.g.:

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   90.22

MOS value       4.82

Duration distortion.

Audio stretching corresponds to 14.15 percent.

 

Advancing of audio signal activity.

Signal advances the original by   -400 ms.

Audio signal activity mistiming (unsynchronization) is 1.34 percent.

 

Value Name          Source   Degraded Units

SNR                : 63.16        60.85        dB

ASR                : 44.27        39.63        %

RMS classic        : 0.0010   0.0010

RMS bounded        : 0.0040   0.0040

Is RMSbounded valid: true     true

AvgEnergy          : 31.08        30.71        dB

MinEnergy          : 6.94     9.25     dB

MaxEnergy          : 70.10        70.10        dB

AvgSample          : 83      74

MinSample          : -11460   -11460

MaxSample          : 11253        11247
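
To see how -npnt and -miter affect a particular pair of files, one can generate a small grid of command lines and compare the scores. This is only a sketch: the helper name build_sweep_commands is hypothetical, and the executable, licence path and remaining options are simply copied from the samples above.

```python
from itertools import product


def build_sweep_commands(reference, degraded, npnt_values, miter_values,
                         exe="aqua-wb.exe", license_file="./lic/aqua-wb.lic"):
    """Build one AQuA command line per (-npnt, -miter) combination.

    Only options that appear in the sample .bat files are used.
    """
    commands = []
    for npnt, miter in product(npnt_values, miter_values):
        commands.append([
            exe, license_file,
            "-mode", "files",
            "-src", "file", reference,
            "-tstf", degraded,
            "-acr", "auto",
            "-npnt", str(npnt),
            "-miter", str(miter),
            "-ratem", "%m",
        ])
    return commands


commands = build_sweep_commands("./wavs/Or272.wav", "./wavs/Dg002.wav",
                                npnt_values=[1, 5], miter_values=[1, 5])
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run; whether a given combination improves consistency has to be judged against listening results for the environment in question.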

Synchronizing reference and test files using AQuA 8.x

In many real-life voice quality monitoring cases, the degraded file received from the network contains pauses (silence) before and/or after the actual audio. Consider an example received from one of our customers during voice quality monitoring in a mobile network. The initial audio is a male voice pronouncing a phrase in English, with the following waveform:

This audio is sent over a mobile network and then recorded back, but because of delays before the call is established and after hang-up, the degraded file has delays at the beginning and end of the audio:

Furthermore, zooming into the “silence” reveals that it contains noise:

According to the AQuA algorithm, introducing silence or noise into an audio signal leads to quality degradation. Since establishing a test call and then detecting the disconnect tone may take a couple of seconds, this can significantly decrease the final quality score.

To trim the irrelevant parts at the beginning and end of the degraded file, one just needs to invoke AQuA with the -trim parameter (sample-06.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/male.wav  -tstf ./wavs/male_5s_delay_5s_end_-36db_whitenoise.wav  -acr auto -npnt auto -miter 1 -trim r 5 -ratem %m -fau sample-06-log.txt

AQuA output will be:

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   74.59

MOS value       3.98

or one can use another option as described above (sample-07.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/male.wav   -tstf ./wavs/male_5s_delay_5s_end_-36db_whitenoise.wav   -acr auto -npnt auto -miter 1 -trim a 45 -ratem %m -fau sample-07-log.txt

AQuA output will be:

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   75.28

MOS value       4.02

However, to be sure that the trimming works properly, let’s test it with an artificially created file containing silence (sample-08.bat):

 

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/male.wav   -tstf ./wavs/male_5s_delay_5s_beginning.wav  -acr auto -npnt auto -miter 1 -trim a 45 -ratem %m -fau sample-08-log.txt

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   100.00

MOS value       5.00

 

and with another file that has silence at the beginning and the end (sample-09.bat):

 

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/male.wav -tstf ./wavs/male_5s_delay_5s_end.wav -acr auto -npnt auto -miter 1 -trim a 45 -ratem %m -fau sample-09-log.txt

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   100.00

MOS value       5.00
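
The idea behind trimming can be illustrated with a generic energy-threshold sketch. This is not AQuA's algorithm: the frame length, the dB scale (relative to a sample amplitude of 1) and the function name trim_silence are all assumptions made for illustration.

```python
import math


def trim_silence(samples, sample_rate, threshold_db, frame_ms=10):
    """Drop leading and trailing frames whose energy is below threshold_db.

    Returns (trimmed_samples, head_ms, tail_ms). The energy scale
    (10*log10 of mean squared sample value) is an illustrative choice.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

    def energy_db(frame):
        power = sum(s * s for s in frame) / len(frame)
        return 10 * math.log10(power) if power > 0 else float("-inf")

    active = [i for i, frame in enumerate(frames) if energy_db(frame) >= threshold_db]
    if not active:
        return [], 0, 0

    start = active[0] * frame_len
    end = min(len(samples), (active[-1] + 1) * frame_len)
    head_ms = start * 1000 // sample_rate
    tail_ms = (len(samples) - end) * 1000 // sample_rate
    return samples[start:end], head_ms, tail_ms


# 5 s of silence, 1 s of signal, 5 s of silence at 8 kHz.
rate = 8000
silence = [0] * (5 * rate)
tone = [1000 if i % 2 == 0 else -1000 for i in range(rate)]
trimmed, head_ms, tail_ms = trim_silence(silence + tone + silence, rate, threshold_db=45)
print(len(trimmed), head_ms, tail_ms)
```

With noisy "silence", as in the customer example above, the threshold has to sit above the noise level; otherwise the noise frames count as active and nothing is removed.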

 

 

Analysis of possible reasons for voice and audio quality loss

Besides the audio quality score, AQuA makes it possible to determine and analyze the likely reasons for audio signal degradation. The software automatically prepares analysis results and stores them in a log file.

The additional audio quality metrics returned by the system may not be trivial to understand, so this chapter describes the main principles behind these metrics and how to interpret them.

AQuA returns additional metrics only when they fall outside their “typical” range of values (the exception is the Signal/Noise Ratio (SNR), which is always present in the report). If all metrics are within range, the system returns “Cannot determine the major reason for audio quality loss”.

Pitch and signal level statistics

AQuA builds pitch and signal level statistics from level histograms and from the rate of change of sample values and pitch, e.g.:

Duration distortion.

Audio stretching corresponds to 1.41 percent.

 

Delay of audio signal activity.

Signal delayed by 100 ms.

Corrupted signal spectrum.

Overall spectral energy distortion approaches 61.58 %

Vibration along the whole spectrum [-14.31, 47.27] %

Significant distortion in medium frequencies band.

Energy distortion approaches 31.47 %

Spectrum vibration in low frequency band [-12.52, 18.95] %

Significant distortion in medium frequencies band.

Energy distortion approaches 27.22 %

Amplification approaches 25.42 %

 

Value Name          Source   Degraded Units

SNR                : 63.16        69.59        dB

Average pitch      : 145.12   155.59   Hz

Pitch delta        : 2.44     2.99     Hz

ASR                : 44.27        42.22        %

RMS classic        : 0.0010   0.0015

RMS bounded        : 0.0040   0.0068

 

 

Is RMSbounded valid: true     true

AvgEnergy          : 31.08        26.82        dB

MinEnergy          : 6.94     1.02     dB

MaxEnergy          : 70.10        70.60        dB

AvgSample          : 83      10

MinSample          : -11460   -10216

MaxSample          : 11253        10981

 

Pitch frequencies distribution distortion  : 19.85.

Samples values distribution distortion     : 16.82.

Quantization steps distribution distortion : 56.66.

ASR (Active Speech Ratio)

ASR (Active Speech Ratio) is calculated from the VAD algorithm output as the number of active speech frames divided by the overall number of frames in the signal. ASR is expressed as a percentage, e.g.:

Value Name          Source   Degraded Units

SNR                : 73.21        73.28        dB

ASR                : 93.66        93.66        %

RMS classic        : 0.0078   0.0065

RMS bounded        : 0.0130   0.0130

Is RMSbounded valid: true     true

AvgEnergy          : 53.25        53.11        dB

MinEnergy          : 3.96     3.88     dB

MaxEnergy          : 77.18        77.16        dB

AvgSample          : 0       0

MinSample          : -28818   -28817

MaxSample          : 32766        32766

Trimmed            : true     true

Trimming Level     : 45.00        45.00        dB

Trimming in head   : 22      5022     ms

Trimming in tail   : 66      5066     ms
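
The ratio itself is simple once per-frame VAD decisions are available. A minimal sketch, assuming the VAD output is a list of booleans (one per frame); the VAD itself is not reproduced here and the function name is hypothetical:

```python
def active_speech_ratio(vad_flags):
    """Percentage of frames that the VAD marked as active speech."""
    if not vad_flags:
        return 0.0
    active = sum(1 for flag in vad_flags if flag)
    return 100.0 * active / len(vad_flags)


# Hypothetical VAD decisions: 44 active frames out of 100.
flags = [True] * 44 + [False] * 56
print(f"ASR: {active_speech_ratio(flags):.2f} %")
```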

RMS classic and RMS bounded

The “classic” RMS (Root Mean Square) of both signals is calculated at the first processing step if RMS normalization is turned on (-enorm rms). The system normalizes the RMS of the degraded signal to match the RMS of the reference signal.

Value Name          Source   Degraded Units

SNR                : 73.21        73.28        dB

ASR                : 93.88        93.87        %

RMS classic        : 0.0078   0.0065

RMS bounded        : 0.0130   0.0130

Is RMSbounded valid: true     true

AvgEnergy          : 53.25        53.11        dB

MinEnergy          : 3.96     3.88     dB

MaxEnergy          : 77.18        77.16        dB

AvgSample          : 0       0

MinSample          : -28818   -28817

MaxSample          : 32766        32766

Trimmed            : true     true
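
Classic RMS normalization can be sketched as a single gain applied to the degraded signal. The code below is the textbook form of the operation and assumes samples are plain numbers; it is not taken from AQuA's implementation, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root mean square of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_to_reference(reference, degraded):
    """Scale the degraded signal so that its RMS matches the reference RMS."""
    degraded_rms = rms(degraded)
    if degraded_rms == 0:
        return list(degraded)  # a silent signal cannot be scaled
    gain = rms(reference) / degraded_rms
    return [s * gain for s in degraded]


reference = [0.5, -0.5, 0.5, -0.5]
degraded = [0.25, -0.25, 0.25, -0.25]
normalized = normalize_to_reference(reference, degraded)
print(rms(reference), rms(normalized))
```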

RMS “bounded” excludes the low-level part of the signal and the peak sample values that occur due to degradation. This parameter is used to optimize RMS normalization. If the signal contains enough data to calculate it, it is used for normalization; otherwise classic RMS is applied. The validity of RMS bounded (meaning that there was enough data to calculate it) is shown in the AQuA report file, e.g.:

Trimming Level     : 8.96     8.88     dB

Trimming in head   : 0       5000     ms

Trimming in tail   : 0       5032     ms

Signal/Noise Ratio (SNR)

These metrics represent the SNR of both the original and the degraded file.

Value Name              Source      Degraded Units

SNR                :    XX.XX       XX.XX dB

Typically, signal quality gets lower when the SNR value decreases (or significantly increases) in the test audio.

Duration distortion

This metric represents the continuity of the compared audio files. Ideally, the amount of audio data in the original signal and in the file under test should be the same. During audio processing or transfer over communication channels, audio fragments may be lost or inserted. If such degradation took place, the value of this metric deviates from 100%. The bigger the difference, the stronger the degradation; however, this metric does not consider possible starting pauses.

When the value is less than 100%, audio data was lost and the analysis result will be:

Audio shrinking corresponds to ХХ.ХХ percent.

where ХХ.ХХ corresponds to deviation from 100%.

When the value is more than 100%, audio data was inserted and the analysis result will be:

Audio stretching corresponds to ХХ.ХХ percent.

where ХХ.ХХ corresponds to deviation from 100%.

Tolerance range for this value is set to 100% ± 1%.
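
The reporting rule above can be sketched as follows. The sketch assumes the metric is the amount of audio in the test file expressed as a percentage of the reference, which the text implies but does not state outright; the function name is hypothetical.

```python
def duration_distortion_message(reference_ms, degraded_ms, tolerance_pct=1.0):
    """Return the report string for duration distortion, or None if within tolerance.

    Assumes the metric is 100 * degraded / reference (an interpretation
    of the description, not a documented formula).
    """
    deviation = 100.0 * degraded_ms / reference_ms - 100.0
    if abs(deviation) <= tolerance_pct:
        return None
    kind = "stretching" if deviation > 0 else "shrinking"
    return f"Audio {kind} corresponds to {abs(deviation):.2f} percent."


print(duration_distortion_message(10000, 11415))
print(duration_distortion_message(10000, 9000))
print(duration_distortion_message(10000, 10050))
```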

Amplitude clipping

Delay of audio signal activity.

Signal delayed by    100 ms.

Audio signal activity mistiming (unsynchronization) is 2.73 percent.

 

Value Name          Source   Degraded Units

SNR                : 40.53        42.01        dB

Average pitch      : 186.42   193.60   Hz

Pitch delta        : 2.87     2.86     Hz

ASR                : 86.75        86.99        %

RMS classic        : 0.0547   0.0601

RMS bounded        : 0.0669   0.0666

Is RMSbounded valid: true     true

AvgEnergy          : 66.42        66.62        dB

MinEnergy          : 40.24        40.28        dB

MaxEnergy          : 80.78        82.29        dB

AvgSample          : -92     -216

MinSample          : -32766   -32768

MaxSample          : 32510        32767

Ampltude Clipping  : 0.04     0.23     %

Amplitude clipping impairment, the so-called “buzziness”, occurs when the signal amplitude is too high at some point along the analog voice path, so that clipping takes place when the voice signal is converted to digital form. Users report that speech may seem excessively loud and potentially “buzzy” or “fuzzy”. If more than 2% of the samples are clipped, the audio quality becomes considerably lower. A common reason for amplitude clipping in a network is the gateway amplification settings on the voice path, e.g.:

 

 

Pitch frequencies distribution distortion  : 0.12.

Samples values distribution distortion     : 6.19.

Quantization steps distribution distortion : 76.98.
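
A rough way to estimate the share of clipped samples is to count samples that sit at the extremes of the integer range. This is a sketch under that assumption only; how AQuA decides that a sample is clipped is not described here, and the function name is hypothetical.

```python
def clipping_percent(samples, bits_per_sample=16):
    """Percentage of integer PCM samples at the full-scale limits.

    Treating only full-scale samples as clipped is an illustrative
    simplification, not AQuA's documented criterion.
    """
    if not samples:
        return 0.0
    max_value = 2 ** (bits_per_sample - 1) - 1
    min_value = -(2 ** (bits_per_sample - 1))
    clipped = sum(1 for s in samples if s >= max_value or s <= min_value)
    return 100.0 * clipped / len(samples)


samples = [0] * 97 + [32767, -32768, 32767]
percent = clipping_percent(samples)
print(f"Clipped samples: {percent:.2f} %", "(above the 2% level)" if percent > 2 else "")
```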

Delay/Advancing of audio signal activity

This metric represents the signal shift in the test file compared to the original and determines by how much the active level of the test signal is delayed or advanced relative to the active level of the reference (original) signal. When it is delayed, the analysis returns the following:

Signal delayed by ХХ.ХХ ms.

where XX.XX is the delay time in milliseconds. Correspondingly, when the signal advances the original, the returned string is

Signal advances the original by -ХХ.ХХ ms.

where XX.XX is the advance time in milliseconds.

The tolerance range for this value is ±50 ms.
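
The two report strings and the ±50 ms tolerance can be combined in a small helper. The sign convention (positive shift means the test signal is delayed) and the function name are assumptions for this sketch; the negative number in the "advances" string follows the sample reports above.

```python
def activity_shift_message(shift_ms, tolerance_ms=50):
    """Return the report string for a signal shift, or None if within tolerance.

    Assumed convention: positive shift_ms = test signal delayed,
    negative shift_ms = test signal ahead of the original.
    """
    if abs(shift_ms) <= tolerance_ms:
        return None
    if shift_ms > 0:
        return f"Signal delayed by {shift_ms} ms."
    return f"Signal advances the original by {shift_ms} ms."


for shift in (100, -400, 30):
    print(shift, "->", activity_shift_message(shift))
```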

Audio signal activity mistiming

This metric represents the unsynchronization of active levels in the reference and test signals. The original (reference) and test signals are merged to determine the characteristics of audio activity, and whenever these characteristics do not match, the system increments an unsynchronization counter. After processing, the final unsynchronization value is presented as the percentage of cases in which unsynchronization was detected.

If the metric value is not zero, the analysis result represents it as:

Audio signal activity mistiming (unsynchronization) is ХХ.ХХ percent.

where XX.XX is the percentage of unsynchronization. The value is ignored if it is less than 1%.
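
One plausible reading of the counter described above is a frame-by-frame comparison of two activity masks. The sketch below follows that reading; the per-frame masks, the comparison rule and the function names are assumptions, not AQuA internals.

```python
def mistiming_percent(ref_activity, test_activity):
    """Percentage of frames where the two activity masks disagree."""
    frames = min(len(ref_activity), len(test_activity))
    if frames == 0:
        return 0.0
    mismatches = sum(
        1 for ref, test in zip(ref_activity, test_activity) if bool(ref) != bool(test)
    )
    return 100.0 * mismatches / frames


def mistiming_message(percent, threshold_pct=1.0):
    """Return the report string, or None when the value is below 1%."""
    if percent < threshold_pct:
        return None
    return f"Audio signal activity mistiming (unsynchronization) is {percent:.2f} percent."


ref_mask = [1] * 50 + [0] * 50
test_mask = [1] * 48 + [0] * 52
print(mistiming_message(mistiming_percent(ref_mask, test_mask)))
```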

Corrupted signal spectrum

This is a set of metrics reflecting differences between the integral energy spectra of the original signal and the audio under test. If the overall difference between the spectra is more than 15%, the analysis returns the following string:

Corrupted signal spectrum.

If the difference between the spectra is multidirectional (goes into both the positive and negative zones), the analysis returns the following string:

Vibration along the whole spectrum [-ХХ.XX, YY.YY] %

where XX.XX and YY.YY are the deviations into the negative and positive zones, respectively. The tolerance range of the deviation is ±5%.

If the spectrum distortions are unidirectional (only negative or only positive), the analysis returns this string:

Amplification approaches YY.YY %

when distortions are positive, or

Attenuation approaches ХХ.XX %

when distortions are negative.

Other metrics returned by the analysis correspond to distortions that occurred in different frequency bands. The analysis of frequency bands is performed in a similar manner to the spectrum analysis. The frequency bands in question are:

Low frequencies: below 1000 Hz;

Medium frequencies: from 1000 Hz to 3000 Hz;

High frequencies: greater than 3000 Hz.

When analyzing frequency bands we use a different tolerance range for each band. Distortion is reported when it is greater than 5% in low frequencies, 10% in medium frequencies and 30% in high frequencies.

Multidirectional spectrum changes (vibration) are considered when they are greater than 2.5% in low frequencies, 7% in medium frequencies and 15% in high frequencies.

Unidirectional distortions (no matter positive or negative) are considered when they are greater than 5% in low frequencies, 10% in medium frequencies and 25% in high frequencies.
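
The band limits and thresholds listed above fit naturally into a lookup table. The sketch below only encodes the numbers from the text; the table layout and function names are illustrative, and assigning the boundary frequencies (exactly 1000 Hz and 3000 Hz) to the medium band is an assumption.

```python
# Thresholds in percent, taken from the text above.
BAND_THRESHOLDS = {
    "low":    {"distortion": 5.0,  "vibration": 2.5,  "unidirectional": 5.0},
    "medium": {"distortion": 10.0, "vibration": 7.0,  "unidirectional": 10.0},
    "high":   {"distortion": 30.0, "vibration": 15.0, "unidirectional": 25.0},
}


def band_for_frequency(freq_hz):
    """Map a frequency to the low / medium / high band."""
    if freq_hz < 1000:
        return "low"
    if freq_hz <= 3000:
        return "medium"
    return "high"


def is_reportable(band, kind, value_pct):
    """True when a deviation exceeds the threshold for its band and kind."""
    return abs(value_pct) > BAND_THRESHOLDS[band][kind]


print(band_for_frequency(2000), is_reportable("medium", "distortion", 31.47))
print(band_for_frequency(4000), is_reportable("high", "vibration", 10.0))
```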

Visualizing signals spectrum for analysis

AQuA 7.x has a special parameter (-specp) that stores pairs of spectrum energy values in critical bands of the original and degraded audio in a .csv file (sample-10.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/male.wav -tstf ./wavs/male_5s_delay_5s_end.wav -npnt auto -miter 1 -ratem %p -fau sample-10-report.txt -tmc on -gch -psyn off -psyf off -smtnrm on -enorm on -grad on -specp 32 sample-10-spect.csv

This command produces the following output:

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

————————————————————–

Sevana Ou internal purpose only

————————————————————–

File Quality is

Percent value   91.35

MOS value       4.86

Calculating time 0.630000 sec.

Press any key to continue….

The file sample-10-spect.csv contains 32 pairs of spectrum energies for the two files, so after importing it into a spreadsheet we can plot a diagram visualizing the differences between the signals’ spectra:

As one can see, the difference is small, and the reasons for the resulting MOS score are stored in sample-10-report.txt:

 

 

Duration distortion.

Audio stretching corresponds to 9.81 percent.

 

Delay of audio signal activity.

Signal delayed by   4990 ms.

Audio signal activity mistiming (unsynchronization) is 1.15 percent.

Value Name          Source   Degraded Units

SNR                : 73.21        73.28        dB

ASR                : 93.88        94.60        %

RMS classic        : 0.0078   0.0065

RMS bounded        : 0.0130   0.0130

Is RMSbounded valid: true     true

AvgEnergy          : 53.25        53.11        dB

MinEnergy          : 3.96     3.88     dB

MaxEnergy          : 77.18        77.16        dB

AvgSample          : 0       0

MinSample          : -28818   -28817

MaxSample          : 32766        32766

As one can see, the main reasons are the delay and mistiming, which we have not removed. If we remove them by trimming, as described in the previous chapter (sample-11.bat):

aqua-wb.exe ./lic/aqua-wb.lic -mode files -src file ./wavs/male.wav -tstf ./wavs/male_5s_delay_5s_end.wav -npnt auto -miter 1 -ratem p -fau sample-11-report.txt -tmc on -gch -psyn off -psyf off -smtnrm on -enorm on -grad on -trim r 5 -specp 32 sample-11-spect.csv

we receive a result showing that the files are of practically identical quality:

Sevana Audio Quality Analyzer – AQuA-Wideband v.7.3.5.297.

Copyright (c) 2016 by Sevana Estonia. All rights reserved.

—————————————————————

Sevana Ou internal purposes only

—————————————————————

File Quality is

Percent value   99.94

MOS value       5.00

Calculating time 0.642000 sec.

Press any key to continue….
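
Instead of a spreadsheet, the spectrum pairs written by -specp can also be compared in a script. The exact layout of the .csv file is not described here, so the sketch below assumes one source/degraded pair per row separated by a semicolon; both the layout and the function names are assumptions to be adjusted to the real file.

```python
import csv
import io


def read_spectrum_pairs(text, delimiter=";"):
    """Parse (source, degraded) energy pairs from CSV text.

    Assumed layout: two numeric columns per row; non-numeric rows
    (for example a header) are skipped.
    """
    pairs = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        values = [cell.strip() for cell in row if cell.strip()]
        if len(values) < 2:
            continue
        try:
            pairs.append((float(values[0]), float(values[1])))
        except ValueError:
            continue
    return pairs


def relative_differences(pairs):
    """Per-band difference of degraded vs. source energy, in percent."""
    return [100.0 * (deg - src) / src if src else 0.0 for src, deg in pairs]


example = "source;degraded\n10.0;11.0\n20.0;18.0\n"
pairs = read_spectrum_pairs(example)
print(relative_differences(pairs))
```

For a real file the text would come from open("sample-10-spect.csv").read(), with the delimiter changed if the file uses commas.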

AQuA Benefits

Among AQuA benefits one will definitely appreciate that:

  • AQuA is available for Windows, Linux and macOS operating systems;
  • AQuA is available for both 32 and 64 bit systems;
  • AQuA is easy to deploy and use in any production development;
  • AQuA provides perceptual estimation of audio quality and can be utilized in VoIP, PSTN, ISDN, GSM, CDMA, LTE/4G, VoLTE, satellite and radio networks and combinations of those;
  • AQuA perceptual model is language independent;
  • AQuA can work with audio starting from 2 seconds duration;
  • AQuA can work with audio longer than 30 seconds;
  • AQuA is available as a native Android library for custom software development;
  • AQuA can work with stereo audio.

AQuA Error Messages

The following error messages may appear during AQuA run time:

Command line parameters errors

Parameters listing error.

Mode error (-mode).

Data source error (-src).

Sound data generator error (-src).

Weights of sound bands error (-ct).

Synthesized voice type error (-voit).

Incorrect sound duration to synthesize (-slen).

Unknown type of quality estimation defined (-qt).

Incorrect numbers of links points (-npnt).

Incorrect envelope smoothing level (-miter).

Incorrect out quality mode (-ratem).

Incorrect accuracy value (-acr).

Test file name is missing (-tstf).

Speech model file name is missing (-dst).

Name of synthethized sound is missing (-sn).

Codec performance measurement is not set (-power on|off).

Unknown parameter in codec performance measurement option (-power).

Parameter is missing in energy normalization management option (-enorm).

Unknown parameter in in energy normalization management option (-enorm).

File name to store reasons for audio quality loss is not specified.

Parameter is missing in delta-correction management option (-decor).

Unknown parameter in delta-correction management option (-decor).

Parameter is missing in integration mode option (-emode).

Unknown parameter in integration mode option (-emode).

Parameter is missing in signal type selection option (-mprio).

Unknown argument in signal type selection option (-mprio).

Value of initial delay is missing (-tdel).

Value of initial delay is negative or integer value set is incorrect (-tdel).

Parameter is missing in option (-spfrcor).

Unknown parameter in option (-spfrcor).

Parameter is missing in psycho-acoustic model option (-psyf).

Unknown parameter in option setting psycho-acoustic filter on (-psyf).

Parameter in time measurement option is missing (-tmc).

Unknown parameter in time measurement option (-tmc).

Parameter in smart energy normalization option is missing (-smtnrm).

Unknown parameter in smart energy normalization option (-smtnrm).

Parameter is missing in option (-avlp).

Unknown parameter in option (-avlp).

Parameter in psy-normalizer management option is missing (-psyn).

Unknown parameter in psy-normalizer management option (-psyn).

Incomplete parameters in spectrum pairs output option (-specp).

Incorrect amount of spectrum pairs (-specp).

Parameter is missing in filter management option (-voip).

Unknown parameter in filter management option (-voip).

Incompatible set of parameters of psy-normalization and amplitude ranges.

Parameter is missing in amplitude ranges management option (-grad).

Unknown parameter in amplitude ranges management option (-grad).

Parameter is missing in program performance speed option (-fst).

Program performance parameter is out of range (-fst).

Parameter(s) is missing in silence trimming option (-trim).

Unknown type of silence level detection (-trim).

Incorrect silence level (-trim).

Source audio files folder or extension are not defined (-src).

Data source type is not defined (-tstf).

Test files folder is not defined (-tstf).

Report file name is missing (-frep).

No value of the flag, enabling the shift range of estimates.

Unknown value of the flag, enabling the shift range of estimates.

No value of the parameter – the short file duration.

Unknown value of the parameter – the short file duration.

Unknown value of the parameter. Setting options (-short) must be a number [2; 180], -1.

Unknown value of the parameter. Setting options (-trim-tst) must be a number [0.00; 120.00].

Unknown value of the parameter. Setting options (-trim-src) must be a number [0.00; 120.00].

Unknown value of the parameter. Setting options (-trim) must be a number [0.00; 120.00].

Unknown value of the parameter. Setting options (-fst) must be a number [0.00; 1.00].

Unknown value of the parameter. Setting options (-specp) must be a number 8, 16 or 32.

Unknown value of the parameter. Setting options (-tdel) must be a positive number.

Unknown value of the parameter. Setting options (-acr) must be a number [7; 16].

Unknown value of the parameter. Setting options (-miter) must be “auto” or a number [1; 10].

Unknown value of the parameter. Setting options (-npnt) must be “auto” or a number [1; 10].

Unknown value of the parameter. Setting options (-slen) must be a positive number.

Histograms management options (-hist) format error! Use help.

Unknown switch ID in histograms management options (-hist)!

Unknown histogram ID in histograms management options (-hist)!

Management trimming reference file option (-cut-src) format error!

Option (-cut-src) format error! Time must be set a number greater or equal.

Management trimming degraded file option (-cut-tst) format error!

Option (-cut-tst) format error! Time must be set a number greater or equal 0.

Option (-output) format error!

Option (-echo) format error!

Incorrect value of echo-tst parameter. on/off is expected.

Incorrect value of echo-src parameter. on/off is expected.

Incorrect value of echo-interval parameter. Value from 5 to 120000 (milliseconds) is expected.

Incorrect value of echo-min-delay parameter. Value from 5 to 120000 (milliseconds) is expected.

Incorrect value of echo-max-length parameter. Value from 5 to 120000 (milliseconds) is expected.
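
Several of the messages above state the accepted value ranges, so a wrapper script can check values before launching AQuA. The sketch below encodes only the ranges quoted in those messages; accepting "auto" for -acr is an assumption based on the sample command lines, and the function name is hypothetical.

```python
# Numeric ranges quoted in the error messages above (inclusive).
NUMERIC_RANGES = {
    "-acr": (7, 16),
    "-miter": (1, 10),
    "-npnt": (1, 10),
    "-trim": (0.0, 120.0),      # numeric part of the -trim option
    "-trim-src": (0.0, 120.0),
    "-trim-tst": (0.0, 120.0),
    "-fst": (0.0, 1.0),
    "-short": (2, 180),
}
ALLOWED_VALUES = {"-specp": {8, 16, 32}}
# "auto" is named for -miter and -npnt; -acr auto appears in the samples.
AUTO_ALLOWED = {"-miter", "-npnt", "-acr"}


def is_valid(option, value):
    """Check a value against the ranges listed in the error messages."""
    if value == "auto":
        return option in AUTO_ALLOWED
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    if option == "-short" and number == -1:
        return True
    if option in ALLOWED_VALUES:
        return number in ALLOWED_VALUES[option]
    if option in NUMERIC_RANGES:
        low, high = NUMERIC_RANGES[option]
        return low <= number <= high
    return True  # no range known for this option


for option, value in (("-npnt", "auto"), ("-npnt", 11), ("-specp", 32), ("-fst", 1.5)):
    print(option, value, is_valid(option, value))
```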

Runtime errors

Error opening source file!

Error opening file under test (degraded)!

Error: files have different sampling frequencies!

Error: sampling frequency is not supported.

Error: files have different channels!

Error: sampling frequency (in source file) is not supported.

Error: sampling frequency (in degraded file) is not supported.

Error: Source sound signal duration is less than 4096 samples.

Error: Test sound signal duration is less than 4096 samples.

Error: After signal alignment source file duration became less than required.

Error: After signal alignment test file duration became less than required.

Error: Source file sound data is too short.

Error: Test file sound data is too short.

Error: Source file duration is too short.

Error: Test file duration is too short.

Error: Source file duration is too short (shorter than 2 seconds).

Error: Test file duration is too short (shorter than 2 seconds).

Error: Source file manual cut settings failed.

Error: Test file manual cut settings failed.
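
Some of these runtime errors can be caught before AQuA is started by inspecting the two files. The sketch below uses Python's standard wave module, which reads uncompressed PCM .wav files only (a-law and u-law files would need another reader); the 4096-sample and 2-second limits come from the messages above, and the function name precheck is hypothetical.

```python
import wave


def precheck(source_path, test_path, min_samples=4096, min_seconds=2.0):
    """Return a list of problems that would likely trigger AQuA runtime errors."""
    problems = []
    with wave.open(source_path, "rb") as src, wave.open(test_path, "rb") as tst:
        if src.getframerate() != tst.getframerate():
            problems.append("files have different sampling frequencies")
        if src.getnchannels() != tst.getnchannels():
            problems.append("files have different channels")
        for label, wav in (("Source", src), ("Test", tst)):
            frames = wav.getnframes()
            if frames < min_samples:
                problems.append(
                    f"{label} sound signal duration is less than {min_samples} samples"
                )
            if frames / wav.getframerate() < min_seconds:
                problems.append(
                    f"{label} file duration is shorter than {min_seconds:g} seconds"
                )
    return problems
```

An empty list means none of the checked conditions was found; it does not guarantee that AQuA will accept the files, since the other errors listed above depend on its internal processing.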

 

Introduction

In today’s fast-paced digital world, reliable mobile communication and high call quality are crucial. Mobile operators are constantly striving to provide their customers with seamless and high-quality call experiences. However, technical challenges such as silent calls, random echo, and short calls can significantly degrade the quality of service, leading to customer dissatisfaction. This case study explores how a leading mobile operator overcame these challenges by integrating Sevana PVQA Server into their existing call quality monitoring system.

The Challenge

The mobile operator in question was facing significant issues with call quality. Customers frequently reported experiencing silent calls, unexpected echo, and abrupt disconnections. Despite having an existing call quality monitoring system in place, the operator struggled to detect and diagnose these specific impairments. The limitations of their monitoring tools meant that many problematic calls went unnoticed, leaving the operator unable to address the root causes effectively.

Silent Calls

Silent calls occur when a call connects, but one or both parties cannot hear anything. This issue can be caused by network problems, faulty equipment, or software bugs. Silent calls not only frustrate customers but also increase the likelihood of them switching to a competitor.

Random Echo

Echo during a call can be highly disruptive. It typically results from network latency or misconfigured equipment. Echo can distort communication, making conversations difficult and leading to a poor user experience.

Short Calls

Short calls, where the connection drops unexpectedly after a brief period, are another significant issue. These interruptions can be caused by network instability, interference, or software errors. Frequent short calls can erode customer trust and loyalty.

The Solution: Sevana PVQA Server

To address these persistent issues, the mobile operator decided to implement Sevana PVQA Server as an additional module to their existing call quality monitoring system. Sevana PVQA Server is a sophisticated tool designed to analyze and monitor voice quality, providing detailed insights into various call impairments.

Implementation Process

The integration of Sevana PVQA Server was seamless and required minimal disruption to the operator’s existing infrastructure. The system began to work alongside the existing monitoring tools, augmenting their capabilities with advanced voice quality analysis.

Identifying Problems

Sevana PVQA Server quickly proved its value by identifying numerous instances of silent calls, echo, and short calls that had previously gone undetected. The system’s detailed analytics provided the operator with precise data on when and where these issues occurred, enabling them to pinpoint problematic areas within the network.

Root Cause Analysis

Armed with comprehensive data from Sevana PVQA Server, the operator conducted a thorough root cause analysis. They discovered that many silent calls were due to faulty network equipment, while echo issues often stemmed from improper configurations in their voice routing system. Short calls were primarily attributed to network instability in specific regions.

Resolving the Issues

With clear insights into the root causes, the operator took targeted actions to resolve the issues. They replaced malfunctioning equipment, reconfigured voice routing settings, and strengthened network stability in identified trouble spots. The proactive measures led to a significant reduction in call impairments.

Results and Benefits

The implementation of Sevana PVQA Server yielded impressive results. The mobile operator experienced a marked improvement in call quality, with a notable decrease in customer complaints regarding silent calls, echo, and short calls. Enhanced call quality monitoring allowed the operator to maintain higher service standards and improve customer satisfaction.

Increased Customer Satisfaction

With fewer call quality issues, customers enjoyed a more reliable and pleasant communication experience. This improvement in service quality helped in retaining existing customers and attracting new ones.

Operational Efficiency

By accurately identifying and addressing call impairments, the operator optimized their network operations. This efficiency not only reduced operational costs but also minimized the time and resources spent on troubleshooting.

Competitive Advantage

The proactive approach to call quality gave the mobile operator a competitive edge. Demonstrating a commitment to high-quality service reinforced their market position and brand reputation.

Conclusion

This case study highlights the critical role of advanced call quality monitoring tools like Sevana PVQA Server in enhancing mobile communication services. By effectively identifying, analyzing, and resolving call impairments, the mobile operator was able to significantly improve call quality and customer satisfaction. This successful implementation underscores the importance of investing in robust monitoring solutions to maintain high standards of service in the competitive telecom industry.


 

 

Voice quality issues can significantly impact communication experiences, whether you’re using traditional landlines, mobile devices, or Voice over IP (VoIP) systems. When users encounter problems like echo, noise, clipping, clicking, or dead air, it’s crucial to diagnose the root causes promptly. Let’s explore each issue and how to identify its source within the network.

  1. Echo:
  • Description: Echo occurs when you hear your own voice repeated with a slight delay during a call.
  • Causes:
    • Acoustic Echo: Sound from the speaker’s voice reflects back into the microphone due to room acoustics or speakerphone setups.
    • Network Echo: Delayed echoes caused by network latency or packet loss.
  • Diagnosis:
    • Listen for audible echoes during the call, or monitor your network with a tool that can detect and alert about echo in the payload, e.g. Sevana PVQA Server.
    • Test with different devices and environments.
  • Root Cause Analysis:
    • Check network latency and jitter.
    • Implement acoustic echo cancellation (AEC) algorithms.
  2. Noise:
  • Description: Background noise interferes with voice clarity.
  • Causes:
    • Environmental Noise: External sounds (e.g., traffic, wind) affect call quality.
    • Network Noise: Packet loss or glitches introduce artifacts.
  • Diagnosis:
    • Listen for unwanted sounds (buzzing, hissing, etc.), or monitor your network with a tool that can detect and alert about noise in the payload, e.g. Sevana PVQA Server.
    • Test in different environments.
  • Root Cause Analysis:
    • Optimize network infrastructure.
    • Use noise reduction techniques.
  3. Clipping:
  • Description: Clipping occurs when an audio signal exceeds its maximum allowed limit.
  • Causes:
    • Amplifier Overdrive: Amplifiers push signals beyond capacity.
    • Digital Clipping: Sample values exceed the maximum level the digital format can represent.
  • Diagnosis:
    • Listen for distorted audio during loud passages, or monitor your network with a tool that can detect and alert about clipping in the payload, e.g. Sevana PVQA Server.
    • Check for square-wave-like waveforms.
  • Root Cause Analysis:
    • Avoid excessive amplification.
    • Ensure symmetrical output swing.
  4. Clicking:
  • Description: Clicks are sudden, brief disruptions in the audio.
  • Causes:
    • Network Glitches: Packet loss or interruptions.
    • Hardware Issues: Faulty cables or connectors.
  • Diagnosis:
    • Listen for intermittent disruptions, or monitor your network with a tool that can detect and alert about clicking in the payload, e.g. Sevana PVQA Server.
    • Test with different devices.
  • Root Cause Analysis:
    • Monitor network stability.
    • Inspect hardware connections.
  5. Dead Air:
  • Description: Dead air refers to complete silence during a call.
  • Causes:
    • Network Drops: Temporary loss of connection.
    • Software Glitches: Call processing issues.
  • Diagnosis:
    • No audio for a period.
    • Abrupt silence during the call.
  • Root Cause Analysis:
    • Monitor network stability with a system that can detect and alert on audio gaps inside the payload, e.g. Sevana PVQA Server.
    • Update call processing software.
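
Two of the checks above, the square-wave-like flat tops of clipping and the silent stretches of dead air, can be approximated directly on decoded audio samples. The Python sketch below is illustrative only. The function names, the thresholds, and the assumption of 16-bit mono PCM samples are hypothetical and are not taken from Sevana PVQA Server.

```python
from typing import List, Tuple


def clipping_ratio(samples: List[int], full_scale: int = 32767,
                   min_run: int = 3) -> float:
    """Fraction of samples that sit in runs of at least `min_run`
    consecutive samples at (or within 1 of) full scale.

    Assumes 16-bit signed PCM; `min_run` is a hypothetical tuning knob
    that keeps isolated peaks from being counted as clipping.
    """
    if not samples:
        return 0.0
    threshold = full_scale - 1
    clipped = 0
    run = 0
    for s in samples:
        if abs(s) >= threshold:
            run += 1
        else:
            if run >= min_run:
                clipped += run
            run = 0
    if run >= min_run:  # a run that reaches the end of the signal
        clipped += run
    return clipped / len(samples)


def silent_gaps(samples: List[int], sample_rate: int = 8000,
                silence_level: int = 50,
                min_gap_s: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for every stretch in which all samples
    have magnitude <= `silence_level` for at least `min_gap_s` seconds.

    `silence_level` and `min_gap_s` are assumed values; real speech
    contains natural pauses, so they need tuning per deployment.
    """
    gaps = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) <= silence_level:
            if start is None:
                start = i
        else:
            if start is not None and (i - start) / sample_rate >= min_gap_s:
                gaps.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None and (len(samples) - start) / sample_rate >= min_gap_s:
        gaps.append((start / sample_rate, len(samples) / sample_rate))
    return gaps
```

A simple amplitude threshold like this cannot tell a muted microphone from a network drop, so in practice such checks would be combined with the network-level monitoring described above.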

Conclusion:

Identifying the root causes of voice quality problems involves understanding the symptoms, analyzing the call payload, and addressing specific issues within the network. Proactive troubleshooting ensures better communication experiences. In light of the above, an automatic call quality monitoring system that detects and alerts on voice quality impairments inside the payload will significantly improve root cause analysis of QoE and QoS problems. Remember that clear voice quality enhances productivity and user satisfaction! 🗣️🔊🚀

Introduction

In today’s interconnected world, reliable voice communication is essential for businesses, individuals, and service providers. Whether it’s a mobile call or a VoIP conversation, ensuring high-quality voice transmission is crucial. Enter Sevana QualTest, a cutting-edge solution designed to evaluate voice quality and network performance. In this article, we’ll explore the features, use cases, and metrics associated with QualTest.

Features of Sevana QualTest

  1. Mobile Test Probe: QualTest serves as a mobile test probe, compatible with Android-powered devices. It seamlessly integrates into both cellular and VoIP networks, allowing comprehensive testing.
  2. End-to-End and Single-Ended Testing: QualTest offers flexibility by supporting both end-to-end and single-ended call testing. Whether you need to assess the entire communication path or focus on specific segments, QualTest has you covered.
  3. User-Friendly Frontend: The intuitive frontend simplifies test setup. Users can specify calling and called parties, reference audio, and devices effortlessly. The frontend also provides real-time insights during testing.
  4. Detailed Reporting and Analysis: QualTest generates comprehensive reports, including overall quality Mean Opinion Score (MOS) over different time periods. It highlights successful and failed test calls, aiding troubleshooting efforts. Additionally, the system can analyze call audio contents using speech-to-text engines.
  5. Compatibility: QualTest works with both rooted and unrooted mobile phones. Analysis can be performed directly on the mobile device or on connected devices (e.g., Raspberry Pi for regular unrooted phones).

Use Cases

  1. Mobile Operators: QualTest empowers mobile operators to monitor call quality across their networks. By identifying bottlenecks and issues, operators can enhance user experience.
  2. Telecom Service Providers: Telecom providers leverage QualTest to assess voice quality in cellular and VoIP networks. It helps them optimize their services and maintain customer satisfaction.
  3. Telecom Solution Providers: Solution providers integrate QualTest into their offerings, ensuring robust voice quality testing for their clients.
  4. Test Engineers: QualTest is a valuable tool for test engineers involved in call quality assessment. It aids in diagnosing network-related problems and optimizing voice services.

Metrics and Analysis

  1. Network Metrics: QualTest measures critical network parameters, including Round-Trip Time (RTT), jitter, and packet loss for VoIP calls. These metrics directly impact call quality.
  2. Mean Opinion Score (MOS): QualTest calculates MOS scores, providing a quantitative measure of voice quality. Users can choose between reference audio or real call audio for MOS estimation.
  3. Waveform Analysis: By analyzing waveforms, QualTest correlates audio problems with network conditions. This helps pinpoint issues affecting voice quality.
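
The network metrics in item 1 can be derived from RTP packet data. As a rough sketch (not QualTest's implementation), the Python below computes interarrival jitter with the running estimator defined in RFC 3550 and a simple loss fraction from RTP sequence numbers. The function names and input shapes are assumptions made for illustration, and timestamp and sequence-number wrap-around are deliberately ignored.

```python
from typing import List, Tuple


def interarrival_jitter_ms(packets: List[Tuple[float, int]],
                           clock_rate: int = 8000) -> float:
    """RFC 3550 interarrival jitter, returned in milliseconds.

    `packets` holds (arrival_time_seconds, rtp_timestamp) pairs in
    arrival order. `clock_rate` is the RTP clock of the codec
    (8000 Hz for narrowband codecs such as G.711).
    """
    jitter = 0.0  # running estimate, in RTP timestamp units
    prev_transit = None
    for arrival_s, rtp_ts in packets:
        # Relative transit time: arrival expressed in timestamp units
        # minus the sender's RTP timestamp.
        transit = arrival_s * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0  # smoothing gain from RFC 3550
        prev_transit = transit
    return jitter / clock_rate * 1000.0


def loss_fraction(seq_numbers: List[int]) -> float:
    """Share of packets missing between the lowest and highest
    sequence number seen. Assumes no 16-bit wrap-around."""
    if not seq_numbers:
        return 0.0
    expected = max(seq_numbers) - min(seq_numbers) + 1
    received = len(set(seq_numbers))  # duplicates count once
    return (expected - received) / expected
```

Round-trip time is not covered here because it needs information from both directions of the call, for example RTCP reports.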

Conclusion

Sevana QualTest is a game-changer in the field of voice quality assessment. Its versatility, detailed reporting, and compatibility make it indispensable for telecom professionals. Whether you’re an operator, service provider, or test engineer, QualTest ensures crystal-clear voice communication in an ever-evolving digital landscape.


Note: If you’d like to learn more about Sevana technologies or schedule a call with their development team, feel free to reach out!


Sevana: Contact Us: QualTest

Sevana AQuA: Promising Audio Quality Assessment for AMR-WB, EVS-WB, and EVS-SWB Codecs

Introduction

In the ever-evolving landscape of telecommunication technologies, efficient audio compression codecs play a pivotal role in delivering high-quality voice communication over limited bandwidth. The quality of audio codecs is traditionally assessed through Mean Opinion Score (MOS) testing, which involves subjective human evaluations. However, the need for automated, reliable, and consistent methods for evaluating codec quality led to the development of Sevana AQuA (Audio Quality Analyzer), a solution capable of differentiating quality based on MOS scores among various codecs, including Adaptive Multi-Rate Wideband (AMR-WB), Enhanced Voice Services Wideband (EVS-WB), and Enhanced Voice Services Super-Wideband (EVS-SWB).

The Significance of Codec Quality Assessment

As telecommunication providers strive to deliver crystal-clear voice quality over diverse networks, codec development has become a critical aspect of ensuring optimal user experience. The efficient utilization of bandwidth, reduction of latency, and preservation of voice naturalness are among the key parameters codec developers aim to enhance. Traditional subjective testing involves a panel of human listeners providing subjective quality scores, often measured by MOS. However, this process can be time-consuming, expensive, and potentially inconsistent due to variations in human perception.

Enter Sevana AQuA

Sevana AQuA represents a paradigm shift in codec quality assessment by leveraging advanced algorithms to automatically evaluate audio quality. The software analyzes audio samples encoded using different codecs, such as AMR-WB, EVS-WB, and EVS-SWB, and assigns them MOS scores based on a robust computational model. This model emulates human auditory perception, enabling AQuA to provide objective and repeatable quality assessments.

Key Features and Functionalities

  1. Objective Quality Scoring: Sevana AQuA employs advanced signal processing techniques and perceptual models to generate objective quality scores for each codec. These scores are highly correlated with human-perceived quality, making them an accurate representation of user experience.
  2. Codecs Comparison: The software enables direct comparisons of codec performance. Telecommunication companies and developers can assess how different codecs perform under varying network conditions, aiding them in selecting the most suitable codec for their requirements.
  3. Scalability and Efficiency: Sevana AQuA’s automated assessment eliminates the need for extensive human listener panels, reducing time and costs associated with quality testing.
  4. Wide Applicability: The software is versatile, accommodating different codecs and network conditions. It can be integrated into the codec development pipeline or used to assess codec performance in real-world scenarios.
  5. Continuous Improvement: Sevana AQuA can be fine-tuned and updated as new codecs are introduced or existing ones are optimized. This ensures that the evaluation remains up-to-date with technological advancements.

Conclusion

Sevana AQuA represents a pivotal advancement in the field of audio codec quality assessment. By offering automated, objective, and reliable MOS scoring for codecs such as AMR-WB, EVS-WB, and EVS-SWB, the software empowers telecommunication companies, developers, and researchers to make informed decisions about codec selection and optimization. As the demand for high-quality voice communication continues to grow, Sevana AQuA plays a crucial role in enhancing user experience across various networks and devices.

PCAP Analyzer - Passive Voice Quality Analysis

Sevana PCAP Analyzer: Pinpointing Audio Impairments and Unveiling Quality of Experience Issues Beyond Network Problems

Introduction:

In the realm of audio communications, ensuring high-quality sound and a seamless user experience is paramount. While network issues often contribute to audio impairments, they are not the sole culprits. Sevana PCAP Analyzer, an advanced tool designed for PCAP file analysis, goes beyond traditional network analysis by accurately pinpointing audio impairments and notifying users about potential quality of experience (QoE) issues, even when the network appears problem-free. This article delves into the capabilities of Sevana PCAP Analyzer and highlights how it revolutionizes the identification and resolution of audio-related problems.

  1. Unveiling Audio Impairments:

Sevana PCAP Analyzer employs sophisticated algorithms and signal processing techniques to detect and identify audio impairments within captured network traffic. By analyzing PCAP files, the tool can uncover a wide range of audio-related problems such as packet loss, jitter, latency, and echo, among others. These impairments can significantly degrade the quality of audio communications and adversely impact user experience. With Sevana PCAP Analyzer, network analysts gain a deeper understanding of the audio stream and can identify the root causes of impairments with precision.
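
Any analysis of voice traffic in a capture starts by reading the RTP header of each packet, since the sequence number and timestamp are what loss and jitter calculations are built on. The sketch below parses the fixed 12-byte RTP header defined in RFC 3550 using only the Python standard library. It assumes the UDP payload has already been extracted from the PCAP file by some other means, and the names `RtpHeader` and `parse_rtp_header` are hypothetical. It does not represent how Sevana PCAP Analyzer works internally.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(udp_payload: bytes) -> RtpHeader:
    """Parse the fixed part of an RTP header (RFC 3550, section 5.1).

    CSRC entries and header extensions are not decoded here.
    """
    if len(udp_payload) < 12:
        raise ValueError("payload too short for an RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", udp_payload[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError("not an RTP version 2 packet")
    return RtpHeader(
        version=version,
        payload_type=b1 & 0x7F,
        marker=bool(b1 & 0x80),
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

Gaps in `sequence` within one `ssrc` indicate lost packets, while comparing `timestamp` with capture time gives the input for a jitter estimate.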

  2. Pinpointing Non-Network Related Issues:

One of the unique strengths of Sevana PCAP Analyzer (unlike Wireshark, for example) lies in its ability to detect audio impairments and QoE issues that are not directly related to the network. Even in scenarios where network metrics indicate optimal performance, users may still experience poor audio quality. The tool analyzes audio signals within PCAP files to identify artifacts, distortions, and anomalies that might arise from codec issues, faulty hardware, software glitches, or other non-network factors. By accurately pinpointing these non-network related problems, Sevana PCAP Analyzer helps streamline troubleshooting efforts and ensures a superior audio experience for end-users.

  3. Real-Time Quality of Experience Monitoring:

Sevana PCAP Analyzer incorporates real-time monitoring capabilities to assess the quality of audio communications during live sessions. By capturing and analyzing network traffic in real-time, the tool provides immediate notifications and alerts about potential QoE and QoS issues. This proactive approach enables network administrators and support teams to intervene promptly, preventing a negative impact on user satisfaction. Sevana PCAP Analyzer offers a comprehensive set of metrics, including audio quality scores and MOS (Mean Opinion Score), to quantify the perceived quality and facilitate precise QoE monitoring.

  4. Detailed Analysis and Reporting:

Sevana PCAP Analyzer offers an array of analysis features that provide in-depth insights into audio impairments and QoE issues. The tool enables users to visualize audio metrics and perform statistical analysis. Additionally, it generates comprehensive reports that capture the analysis results, facilitating documentation and sharing of findings with relevant stakeholders. These reports serve as valuable references for ongoing optimization efforts and troubleshooting collaborations.

  5. Integration and Collaboration:

Sevana PCAP Analyzer supports seamless integration with existing network analysis tools, audio processing systems, and communication platforms. This integration allows users to correlate audio impairments with network performance metrics, facilitating a holistic view of audio quality and enabling efficient troubleshooting. Moreover, the tool promotes collaboration among network analysts, audio engineers, and support teams by providing shared access to PCAP files, analysis results, and reports, thereby fostering a collaborative environment for problem resolution.

Conclusion:

Sevana PCAP Analyzer goes beyond traditional network analysis by accurately identifying audio impairments and shedding light on quality of experience issues, even when the network seems problem-free. With its ability to pinpoint non-network related problems, provide real-time monitoring, offer detailed analysis, and support integration and collaboration, Sevana PCAP Analyzer empowers organizations to deliver exceptional audio quality and ensure a superior user experience. By leveraging the capabilities

 

Messenger-to-messenger testing. How to test messenger-to-messenger voice and video calls?

This is how you can learn about your customers’ experience during audio and video calls and find the weak points and bottlenecks in your service.

No call recordings, no privacy violation, but full details on the user quality of experience: two types of MOS scores (QoS and QoE), reasons for call quality degradation, and pointers to the parts of the call audio where quality issues occurred.

We promise easy and flexible integration into your messenger.

 

Do you have a hardware solution that records RTP call traffic in real time? Add QoE analysis to your real-time RTP recording.

Are you happy with the E-Model quality scoring? Try out our user experience metrics. This will give you a competitive advantage and won’t take much integration effort. Matching network conditions (E-Model, packet loss, etc.) with user experience (our analysis is based on actual call audio) will give you a full picture of your network and of user quality of experience (QoE).
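
For context on what E-Model scoring provides: the E-Model (ITU-T G.107) combines network and equipment impairments into a transmission rating factor R, which is then mapped to an estimated MOS. The Python sketch below shows only that final R-to-MOS mapping as given in G.107. The function name is hypothetical, and computing R itself from delay, loss, and codec parameters is outside the scope of this example.

```python
def r_factor_to_mos(r: float) -> float:
    """Map an E-Model rating factor R to an estimated MOS using the
    conversion formula from ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# The G.107 default rating of R = 93.2 maps to a MOS of about 4.41.
print(round(r_factor_to_mos(93.2), 2))
```

Because this score is derived purely from network parameters, it stays the same whatever the audio actually sounds like, which is why pairing it with analysis of the call audio gives a more complete view.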

Call center agents quality monitoring: do you have a call center that records customer calls and your agents complain about call quality?

Figuring out call quality issues and problematic routes can be much easier with Sevana PVQA technology.
Setting up massive call recording analysis is a matter of a couple of minutes. In return, you receive an audio quality score (MOS) and a report on the audio impairments that influenced call quality.
By the end of the analysis, you will have a full picture of the calls that had the most quality problems and the potential reasons for them. Interested in receiving this information in real time? This is just as simple with PVQA technology implemented for real-time traffic analysis. Contact us for more information.