Fundamentals of 3D Audio

Genuine 3D audio is, quite simply, what we hear every day. When we listen, we perceive the different sound sources around us, e.g. a bird in a tree or a car driving by. We perceive the direction of and distance to each sound source and how it moves, i.e. the static and dynamic properties of the sound source. We also perceive information about the environment we are in, e.g. outdoors or indoors, a small or a large room. This information is generated by sound reflecting off surfaces in the environment, e.g. walls, ceiling, floor and objects. The first reflections we perceive, the 'Early Reflections', influence our perception of direction and distance, while the late reflections, the 'Reverberation', mainly give us information about the environment, e.g. whether we are in a living room or outside.

Binaural Technology

The idea behind producing 3D audio is either to shift an experience in time through recording and playback, or to create a fictitious experience through synthesis. The fundamental technique used to do this is called 'Binaural Technology'. The theory of binaural technology is:

'If the sound pressures at the eardrums are recorded, and later reproduced exactly, then the listener will perceive the sound as if he/she had been present when the signals were recorded'

If the signals are recorded, the process is called 'Binaural Recording'; if the signals are artificially produced, it is called 'Binaural Synthesis' or 'Binaural Simulation'. The recorded or synthesised signals are called the 'Binaural Signal', i.e. the signal to be reproduced at the two ears. The final step, reproducing the binaural signal at the two ears, is called 'Binaural Reproduction'.

Binaural Recording

Binaural recording can be done using an 'Artificial Head' or 'Dummy Head' with a microphone in each ear. The artificial head is a replica of an average human in terms of its acoustic properties.

Binaural Synthesis

Binaural synthesis is a lot more complex. The theory of binaural synthesis is:

'If the sound transmission from a point to the two ears can be synthesised, the listener would perceive the sound as in real life'

The complete sound transmission is described by the 'Binaural Room Impulse Response (BRIR)'. This transmission can be divided into two parts, both of which have a significant influence on how the sound is perceived:

  1. The first part relates to the influence of the environment (the room)
  2. The second part relates to the influence of the human body

The first part maps the transmission from the source point to a point at the centre of the head, i.e. the 'Room Impulse Response'. The second part maps how the human body affects sound arriving from different directions to produce the signals at the two ears, i.e. the 'Head-Related Transfer Function (HRTF)'. If these transfer functions can be determined for each sound source, we can in principle synthesise any situation by using the superposition principle to generate the binaural signal for the two ears. The accuracy of our localisation when listening to the synthesised binaural signal depends mainly on how well our own acoustic properties match the Head-Related Transfer Functions used. As determining the room impulse response requires a huge amount of computational power, this part is most often replaced by a 'simple' reverberation algorithm.
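The superposition principle can be illustrated with a minimal sketch in Python. It assumes that each source comes with one impulse response per ear describing the whole path from source to ear. The function names and the toy impulse responses below are hypothetical and made up for illustration; they are not measured HRTFs or room responses.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mix_into(target, contribution):
    """Add contribution to target sample by sample, growing target if needed."""
    if len(contribution) > len(target):
        target.extend([0.0] * (len(contribution) - len(target)))
    for n, value in enumerate(contribution):
        target[n] += value


def synthesise_binaural(sources):
    """Sum the contributions of all sources at each ear (superposition).

    sources: list of (mono_signal, left_ir, right_ir) tuples, where each
    impulse response stands in for the path from that source to one ear.
    Returns (left_ear_signal, right_ear_signal).
    """
    left, right = [], []
    for signal, left_ir, right_ir in sources:
        mix_into(left, convolve(signal, left_ir))
        mix_into(right, convolve(signal, right_ir))
    return left, right


# Toy data (assumed, not measured): a click heard from the left reaches the
# left ear directly and the right ear two samples later at half amplitude.
click = [1.0, 0.0, 0.0]
source_on_left = (click, [1.0], [0.0, 0.0, 0.5])
source_on_right = (click, [0.0, 0.0, 0.5], [1.0])

left_ear, right_ear = synthesise_binaural([source_on_left, source_on_right])
print(left_ear)   # [1.0, 0.0, 0.5, 0.0, 0.0]
print(right_ear)  # [1.0, 0.0, 0.5, 0.0, 0.0]
```

The direct-form convolution is written out only to keep the sketch self-contained. With impulse responses of realistic length, an FFT-based convolution from a numerical library would normally be used instead.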

Binaural Reproduction

The recorded or synthesised binaural signal can be reproduced using either headphones or loudspeakers. The best performance is obtained using headphones, but each method has its advantages and disadvantages. If the listener is equipped with a 'Head Tracker' that allows the sound source positions to be updated interactively according to the listener's position and orientation, then localisation performance is improved.
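As a rough illustration of what a head-tracker update might compute, the sketch below works out the direction of a source relative to where the listener is facing, and then picks the closest direction from a set of available HRTF measurements. Everything here is an assumption made for the example: the two-dimensional coordinate convention (angles counter-clockwise from the +x axis, positive azimuth to the listener's left), the function names, and the idea that HRTFs are stored at a handful of discrete azimuths.

```python
import math


def wrap_degrees(angle):
    """Wrap an angle to the range [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(listener_xy, head_yaw_deg, source_xy):
    """Azimuth of the source relative to the listener's facing direction.

    head_yaw_deg is the direction the listener faces, measured
    counter-clockwise from the +x axis (assumed convention).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_angle = math.degrees(math.atan2(dy, dx))
    return wrap_degrees(world_angle - head_yaw_deg)


def nearest_measured_azimuth(azimuth, measured_azimuths):
    """Pick the measured direction with the smallest angular distance."""
    return min(measured_azimuths, key=lambda m: abs(wrap_degrees(m - azimuth)))


# Hypothetical set of azimuths for which HRTFs are available.
measured = [-90, 0, 90, 180]

# The source stays put; the listener turns the head from yaw 0 to yaw 90.
for yaw in (0.0, 90.0):
    azimuth = relative_azimuth((0.0, 0.0), yaw, (0.0, 1.0))
    print(yaw, round(azimuth), nearest_measured_azimuth(azimuth, measured))
```

Each time the tracker reports a new position or orientation, the relative direction would be recomputed and the matching transfer functions swapped in, so that the source appears to stay fixed in the room while the head moves.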