

Scientific Background

The human ear is a non-linear system, which gives rise to an effect called masking. Masking occurs when a message is heard against a noisy background or other masking sounds.

Studying the masking of harmonic signals by narrow-band noise, Zwicker determined that the entire spectrum of audible frequencies can be divided into frequency groups, or bands, distinguishable by the human ear. Fletcher had drawn a similar conclusion earlier and named these frequency groups the critical bands of hearing.

The critical bands determined by Fletcher and Zwicker differ because Fletcher defined the bands by means of masking with noise, whereas Zwicker derived them from relations of perceived loudness.

Sapozhkov defined a critical band as “a band of frequency speech range, perceptible as a single whole”. In his earlier research he even suggested that the sound signals in a band could be substituted by an equivalent tone signal, but experiments did not confirm this assumption. The critical bands determined by Sapozhkov differ from those of Fletcher and Zwicker because Sapozhkov proceeded from the properties of the speech signal.

Pokrovskij also defined critical bands on the basis of speech signal properties. According to his definition, the bands provide an equal probability of finding formants in them.

The spectrum energy in bands can be used for different purposes, one of which is estimating sound signal quality. However, using only one author’s critical bands (for example, the prototype uses Zwicker’s critical bands) does not yield a sufficiently objective estimate, since each set of bands reflects only one aspect of perception or speech production. AQuA can determine energy in various critical bands as well as in logarithmic and resonator bands, which allows taking more properties of hearing and speech processing into account.
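As an illustration of what "energy in bands" means, the following minimal sketch sums the spectral energy of one frame inside each critical band. It is not AQuA's implementation: the function name `band_energies`, the naive DFT, and the rectangular band boundaries are assumptions made for the example. The band edges are the commonly published values of Zwicker's critical-band scale, with the first band assumed to start at 0 Hz.

```python
import cmath
import math

# Band edges of Zwicker's critical-band scale, in Hz (first edge assumed to be 0).
ZWICKER_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                    1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                    6400, 7700, 9500, 12000, 15500]


def band_energies(samples, sample_rate, edges_hz=ZWICKER_EDGES_HZ):
    """Return the spectral energy of one frame in each band [edge[i], edge[i+1])."""
    n = len(samples)
    energies = [0.0] * (len(edges_hz) - 1)
    # Naive DFT over the non-negative frequencies; adequate for short frames.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # Add the bin's energy to the band that contains its frequency.
        for band in range(len(energies)):
            if edges_hz[band] <= freq < edges_hz[band + 1]:
                energies[band] += abs(coeff) ** 2
                break
    return energies


# Usage: a 1 kHz tone sampled at 8 kHz falls into the 920-1080 Hz band.
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
energies = band_energies(frame, 8000)
loudest_band = max(range(len(energies)), key=energies.__getitem__)
```

Passing a different `edges_hz` list (for example, logarithmically spaced edges) reuses the same routine for another band definition, which mirrors the idea of evaluating several band sets on the same signal.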

Because the bands determined by Pokrovskij and Sapozhkov suit speech signals rather than sound signals in general, choosing the bands according to the signal type makes it possible to increase the accuracy of the estimate for a given purpose.

AQuA utilizes the research results of the above-mentioned scientists by implementing different algorithms in one software solution. AQuA also has several advantages over other existing voice quality measurement software.

Besides critical bands, the new AQuA implements a more advanced psycho-acoustic model, which consists of three layers:

  • psy-filtering
  • level normalization
  • transform into detectable range

The psycho-acoustic model is based on dependencies obtained experimentally. The most complex phase is psy-filtering, shown in Pic. 1.

Pic. 1. General scheme of psy-filtering

The masking procedure includes the following sequence of actions:

  1. hearing threshold processing
  2. fluid level masking
  3. spectrum separation into tones and noises
  4. creating masks from tone components
  5. creating masks from noise components
  6. joining tone and noise mask components
  7. joining current mask with post-mask
  8. preparing post-mask for the next frame
  9. creating mask for the previous frame
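Steps 6 to 8 of this sequence can be sketched as follows. The source does not specify the combination rules, so everything here is an assumption for illustration: masks are per-band lists in linear power units, joining takes the larger value in each band, and the post-mask is the joined mask attenuated by a hypothetical `decay` factor. The function names are likewise invented for the example.

```python
def join_masks(tone_mask, noise_mask):
    """Step 6: combine tone and noise masks per band (assumed rule: the larger wins)."""
    return [max(t, n) for t, n in zip(tone_mask, noise_mask)]


def apply_post_mask(current_mask, post_mask):
    """Step 7: join the current mask with the tail left by the previous frame."""
    return [max(c, p) for c, p in zip(current_mask, post_mask)]


def next_post_mask(joined_mask, decay=0.5):
    """Step 8: carry an attenuated copy of the mask into the next frame."""
    return [m * decay for m in joined_mask]


def mask_frames(frames, decay=0.5):
    """Run steps 6-8 over a sequence of (tone_mask, noise_mask) pairs."""
    post_mask = None
    result = []
    for tone_mask, noise_mask in frames:
        current = join_masks(tone_mask, noise_mask)
        if post_mask is None:
            post_mask = [0.0] * len(current)  # nothing precedes the first frame
        joined = apply_post_mask(current, post_mask)
        result.append(joined)
        post_mask = next_post_mask(joined, decay)
    return result


# Usage: a loud first frame keeps masking the quieter second frame.
masks = mask_frames([([4.0, 0.0], [1.0, 2.0]),
                     ([0.0, 0.0], [0.5, 0.5])])
```

In this sketch the second frame's mask is dominated by the decayed first-frame mask, which is the post-masking effect that steps 7 and 8 describe. Step 9 (the mask for the previous frame) would need look-ahead and is not covered here.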

The hearing threshold is the minimal sound pressure that produces a sensation of hearing; it characterizes the ear's sensitivity to sound intensity. The threshold level depends on the type of sound fluctuations and on the measuring conditions. One possible way to determine the hearing threshold (implemented in AQuA 5.x) is standardized in ISO/R 226.
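To show how a hearing threshold can be applied to a spectrum, the sketch below uses Terhardt's closed-form approximation of the threshold in quiet. This is a substitute chosen for brevity, not the tabulated data of the standard and not necessarily what AQuA uses; the function names and the `(frequency, level)` input format are assumptions for the example.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in quiet, in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # the formula expects frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_components(spectrum_db):
    """Keep only (frequency_hz, level_db) pairs that lie above the threshold."""
    return [(f, level) for f, level in spectrum_db
            if level > threshold_in_quiet_db(f)]


# Usage: a 10 dB component is inaudible at 100 Hz but audible at 1 kHz.
kept = audible_components([(100, 10.0), (1000, 10.0), (3300, -2.0)])
```

The approximation reflects the ear's highest sensitivity around 3 to 4 kHz, which is why the weak component at 3300 Hz survives while the stronger one at 100 Hz does not.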

The psycho-acoustic model implemented in AQuA 5.3 introduces the so-called range of detectable loudness, which is the minimal change of signal amplitude detectable by the human ear. Depending on the signal's loudness level and frequency, this minimal detectable change varies from 2% up to 40%.
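The idea of a detectable change can be expressed as a simple comparison. In this sketch the relative threshold is passed in as `jnd_fraction` (a hypothetical parameter name), restricted to the 2% to 40% range stated above; how that fraction depends on loudness level and frequency is not given in the text and is left to the caller.

```python
def is_detectable(reference_amplitude, test_amplitude, jnd_fraction):
    """Return True if the amplitude change reaches the detectable fraction."""
    if not 0.02 <= jnd_fraction <= 0.40:
        raise ValueError("jnd_fraction must lie between 0.02 and 0.40")
    if reference_amplitude == 0:
        # Assumption: any signal is a detectable change from silence.
        return test_amplitude != 0
    relative_change = (abs(test_amplitude - reference_amplitude)
                       / abs(reference_amplitude))
    return relative_change >= jnd_fraction


# Usage: a 5% change is detectable at the most sensitive setting only.
sensitive = is_detectable(1.0, 1.05, 0.02)
insensitive = is_detectable(1.0, 1.05, 0.40)
```

Differences below the detectable fraction can then be treated as inaudible when two signals are compared.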

AQuA algorithms have certain advantages:

  • they are universal, since they allow measuring the quality of signals that come from various sources and have been processed in different ways;
  • quality estimation can be optimized for a given purpose:
      – for speed (for example, a rough estimate can be obtained quickly);
      – for signal type (using different bands for speech signals and for sound signals in general);
  • the resulting estimates correlate well with MOS scores;
  • quality estimates obtained for speech signals can be translated into values of various kinds of intelligibility.




Copyright 2003-2017 Sevana Oü