Does anyone here have a decent working knowledge of quantisation...

Coda II · Sep 20, 2010

Does anyone here have a decent working knowledge of quantisation in its application as the basis of lossy codecs?

As I understand it, the fundamental tool of data compression (in the context of audio files) is the quantisation of numbers and the degree to which the codec can get away with doing this.

Again, as I understand it, if I have a list of numbers that I want to store and they are too big, I can divide them by another number - which makes them smaller. When I want my data back I multiply by the original number and there's my data all back again.

So for example, if I have a list of numbers between 100 and 199 (sticking to base 10 for now as that's all I understand) but only have two columns to write them in, I could divide them all by 2 (dropping any remainder) and I now have a list of numbers between 50 and 99 which will fit my columns.

When I multiply them by 2 though, it becomes apparent that the even numbers have done alright but the odd numbers have now become even as well. There is therefore an error - the quantisation error.
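The divide-then-multiply example above can be tried out directly. This is only a minimal sketch of that arithmetic; the names `STEP`, `quantise` and `dequantise` are made up for illustration and are not taken from any codec.

```python
# Hypothetical illustration of the divide-then-multiply scheme described above.
STEP = 2  # the divisor, i.e. the size of the quantisation step


def quantise(value, step=STEP):
    """Store a smaller number by integer division (the remainder is discarded)."""
    return value // step


def dequantise(code, step=STEP):
    """Recover an approximation of the original by multiplying back."""
    return code * step


originals = list(range(100, 200))          # the numbers 100..199
codes = [quantise(v) for v in originals]   # what actually gets stored
restored = [dequantise(c) for c in codes]  # what comes back out
errors = [o - r for o, r in zip(originals, restored)]

print(min(codes), max(codes))  # 50 99
print(sorted(set(errors)))     # [0, 1]
```

Even inputs come back unchanged (error 0) and odd inputs come back one too low (error 1), which is the quantisation error the post describes.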

It is this quantisation error that the codec tries to hide by making use of 'audio masking'.*

I suspect that quantisation is vastly more complicated than this but would like to know whether my basic understanding is at least on the right track.

*Audio masking generally becomes the focus of discussions I have seen on lossy compression; it also appears to be widely misunderstood so if possible I'd like to leave it to one side for now.

pete693 · Sep 20, 2010

There are only 10 types of people.
Those that understand binary and those that don't.

felix · Sep 20, 2010

Coda - two different things going on:

Quantisation error is the bit (haha) 'missed' due to the minimum size of 'step' captured between available levels in a digital system. The smallest available step corresponds directly to the number of 'bits' available (e.g. for 16-bit audio at a nominal 2Vrms line level, the smallest 'step' is about 30 microvolts). Early digital implementations had a technical problem with this, and the answer is 'dither'.
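For anyone wanting to check the step-size arithmetic, here is a rough sketch. It assumes the roughly 30 microvolt figure comes from spreading 2 V across all 2^16 levels; that reading is a guess, and the constant names are invented for this example. A second line shows the result if the range is instead taken as the peak-to-peak swing of a 2 Vrms sine wave.

```python
import math

BITS = 16
FULL_SCALE_VOLTS = 2.0  # assumption: the nominal 2 V treated as the whole range

levels = 2 ** BITS                      # number of distinct codes available
step_volts = FULL_SCALE_VOLTS / levels  # size of one step under that assumption
print(f"{levels} levels, step = {step_volts * 1e6:.1f} microvolts")

# Alternative assumption: the range is the peak-to-peak swing of a 2 Vrms sine.
peak_to_peak = 2 * math.sqrt(2) * 2.0
step_pp_volts = peak_to_peak / levels
print(f"peak-to-peak reading: step = {step_pp_volts * 1e6:.1f} microvolts")
```

Either way the step is tens of microvolts, and each extra bit halves it.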

'Dither' is basically adding noise at *half* the level of the smallest step between levels (i.e. at a mean level of 15.5 bits for 16-bit CD). This reduces the maximum achievable S/N ratio from CD to about 93dB (still roughly 20dB better than the very best studio tape systems) but the pay-off for the little extra (3dB) noise is astonishing: theoretically such a system when correctly dithered has infinite resolution. Indeed Bob Stuart of Meridian demonstrated complete recovery of signals at the -115dB level, more than 20dB below the noise floor, over 20 years ago. The maths is not necessarily intuitive, but very elegant.
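A toy experiment can make the dither idea less abstract. The sketch below is an assumption-laden illustration, not how any real converter works: it uses simple uniform noise spanning plus or minus half a step (real systems commonly use triangular dither), a constant input of 0.3 of a step, and invented function names. Without dither the quantiser always outputs 0, so the input is lost; with dither the average of many outputs drifts back towards 0.3.

```python
import math
import random


def quantise(x):
    """Round to the nearest whole step (one step = one LSB)."""
    return math.floor(x + 0.5)


def average_output(level, n=100_000, dither=False, seed=0):
    """Quantise the same input n times and return the mean output."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = 0
    for _ in range(n):
        # Assumption: rectangular dither spanning -0.5..+0.5 of a step.
        noise = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += quantise(level + noise)
    return total / n


level = 0.3  # a signal smaller than half a step
plain = average_output(level)
dithered = average_output(level, dither=True)
print("no dither:  ", plain)
print("with dither:", dithered)
```

Each individual dithered sample is noisier, but the information below one step survives in the statistics rather than being rounded away.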

Second part - I think you have conflated with compression, lossy or otherwise. There are various algorithms to do such compression and since digital audio is 'sign-magnitude' coded the polarity of the signal *is* preserved throughout.

Lossless compression, such as FLAC, works a little like you describe by 'packing' sequences and re-inflating them later. Lossy compression routes, such as MP3, simply compress data by discarding detail based on prioritised models of how the ear works ('you can't hear this at this level...*toss*'). This is permanently destructive. But neither approach depends on 'quantisation' as such.

Coda II · Sep 21, 2010

still not entirely sure I've got the wrong end of the stick...

Thanks for the reply Martin.

It is entirely possible that I am taking 2 + 2 and making 5 (or 10 + 10 = 101), but the various articles I've read to try and understand lossy compression all leave me with the impression that further quantisation is applied and it is how aggressively this is applied that gives lower bitrates and more audible artifacts.

The psychoacoustic model you refer to is in itself very interesting, but I'm still trying to get to grips with what comes before that.

The following is from what appears to be an 'industry' overview:

The key to MPEG/audio compression is quantization. Although quantization is lossy, this algorithm can give "transparent", perceptually lossless, compression.

[snip]

...At lower frequencies a single subband covers several critical bands. In this circumstance the number of quantizer bits cannot be specifically tuned for the noise masking available for the individual critical bands. Instead, the critical band with the least noise masking dictates the number of quantization bits needed for the entire subband. Second, the filter bank and its inverse are not lossless transformations. Even without quantization, the inverse transformation cannot perfectly recover the original signal.

(original article here; easier to read but without diagrams here)

felix · Sep 21, 2010

Ah. I think 'quantisation' there is just referring to the process of sampling the data - chopping it into chunks, or 'bins' etc to apply the codec's model to. It doesn't necessarily correspond at all with 'quantisation' in the sense of linear PCM audio as I started off describing.

Coda II · Sep 27, 2010

Sorry to be a bore, but:

· The input samples are mapped into a subsampled spectral representation using an analysis filterbank.
· Using a perceptual model the signal's frequency and time dependent masking threshold is estimated. This gives the maximum coding error that can be introduced into the audio signal while still maintaining perceptually unimpaired signal quality.
· The spectral values are then quantized and coded according to requirements derived from the masking threshold estimate. In this way, the quantization noise is hidden ("masked") by the respective transmitted signal as far as possible and perceptibility of the coding error is minimized.

from: TEMPORAL NOISE SHAPING, QUANTIZATION AND CODING METHODS IN PERCEPTUAL AUDIO CODING: A TUTORIAL INTRODUCTION
JÜRGEN HERRE

There seems to be a rather deep rooted notion that lossy codecs discard (i.e. delete) data that (they estimate) would not be heard. This appears not to be the case; information is removed (in a frequency domain context as opposed to time domain with PCM) and this removal leaves not a blank, but noise.
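The third quoted step, quantising spectral values according to the masking threshold, can be caricatured in a few lines. This is a heavily simplified sketch under stated assumptions: the coefficient values are made up, `allowed_noise` stands in for whatever a real perceptual model would estimate, the error limit is treated as a simple maximum per value, and `quantise_band` is a hypothetical name rather than anything from the MPEG standards.

```python
import math


def quantise_band(coeffs, allowed_noise):
    """Quantise values with a step chosen so no error exceeds allowed_noise."""
    step = 2 * allowed_noise  # rounding to the nearest step errs by at most step / 2
    codes = [math.floor(c / step + 0.5) for c in coeffs]  # small integers to store
    decoded = [k * step for k in codes]                   # what the decoder rebuilds
    return codes, decoded


coeffs = [0.91, -0.42, 0.07, 0.33]  # made-up spectral values for one band
codes, decoded = quantise_band(coeffs, allowed_noise=0.1)
errors = [c - d for c, d in zip(coeffs, decoded)]

print(codes)  # [5, -2, 0, 2]
print([round(e, 2) for e in errors])
```

A larger `allowed_noise` gives a coarser step and smaller codes (fewer bits), and every decoded value carries some error. That error is the quantisation noise the quoted tutorial says is meant to sit under the masking threshold.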

Rather than applying correction fluid, or even a single neat line as we are instructed, the codec applies a good old fashioned schoolboy crossing-out. But it looks for what in the news media world would be a good day to hide bad news to cover its tracks.
