Fundamentals of 3D Audio
Genuine 3D audio is, quite simply, what we hear every day.
When we listen, we perceive the different sound sources around us,
e.g. a bird in a tree or a car driving by. We perceive the
direction of and distance to each sound source and how it moves, i.e.
the static and dynamic properties of the sound source. We also
perceive information about the environment we are in, e.g.
outside/inside or small/large room. This information is generated by
sound reflecting from surfaces in the environment, e.g.
walls, ceiling, floor and objects. The first reflections we
perceive - the 'Early Reflections' -
influence our perception of direction and distance, while the
late reflections - the 'Reverberation'
- mainly give us information about the environment, e.g. being in a
living room or outside.
Binaural Technology
The idea behind producing 3D audio is either to shift an experience in
time by recording and playback, or to create a fictitious experience by
synthesis. The fundamental technique used to do this is called
'Binaural Technology'. The theory of
binaural technology is:
'If the sound pressures at the eardrums are recorded and
later reproduced exactly, then the listener will perceive the sound
as if he/she had been present when the signals were
recorded'
If the signals are recorded, this is called 'Binaural Recording', and if the signals are
artificially produced, it is called 'Binaural Synthesis' or 'Binaural Simulation'. The recorded
or synthesised signals are called the 'Binaural
Signal', i.e. the signal to be reproduced at the two ears.
The final step, reproducing the binaural signal at the two ears, is
called 'Binaural Reproduction'.
Binaural Recording
Binaural recording can be done using an 'Artificial Head' or 'Dummy Head'. The artificial head is a
replica of an average human in terms of its acoustic
properties.
Binaural Synthesis
Binaural synthesis is a lot more complex. The theory of binaural
synthesis is:
'If the sound transmission from a point to the two ears can
be synthesised, the listener will perceive the sound as in real
life'
The complete sound transmission is described by the 'Binaural Room Impulse Response (BRIR)'.
It can be divided into two parts, both of which
significantly influence how the sound is perceived:
- The first part relates to the influence of the environment (the
room)
- The second part relates to the influence of the human body
The first part maps the transmission from the source point to the
point at the centre of the head, i.e. the 'Room Impulse Response'. The second part
maps how the human body affects sound arriving from different
directions to produce the signals at the two ears, i.e. the 'Head-Related Transfer Function (HRTF)'. If
these transfer functions can be determined for each sound source,
we can in principle synthesise any situation by using the
superposition principle to generate the binaural signal for the two
ears. The accuracy of our localisation when listening to the
synthesised binaural signal mainly depends on how well our own acoustic
properties match the Head-Related Transfer Functions used. As
determining the room impulse response requires a huge amount of
computational power, this part is most often replaced by a 'simple'
reverberation algorithm.
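The superposition step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production renderer: the function names convolve and synthesise_binaural are hypothetical, the short impulse responses are made-up numbers rather than measured HRTF data, and a real system would typically use FFT-based convolution for speed.

```python
from typing import List, Sequence, Tuple

# One source = (mono signal, left-ear impulse response, right-ear impulse response).
# The impulse responses stand in for measured data; they are hypothetical here.
Source = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Direct-form linear convolution, O(N*M); adequate for a short sketch."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def synthesise_binaural(sources: Sequence[Source]) -> Tuple[List[float], List[float]]:
    """Filter every source for each ear, then sum the results (superposition)."""
    rendered = [(convolve(s, h_left), convolve(s, h_right))
                for s, h_left, h_right in sources]
    length = max((max(len(l), len(r)) for l, r in rendered), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for l, r in rendered:
        for i, value in enumerate(l):
            left[i] += value
        for i, value in enumerate(r):
            right[i] += value
    return left, right


# Two hypothetical sources, each emitting a single click.
click = [1.0, 0.0, 0.0]
source_on_left = (click, [1.0, 0.0], [0.0, 0.5])    # earlier and louder at the left ear
source_on_right = (click, [0.0, 0.25], [0.5, 0.0])  # earlier and louder at the right ear

left, right = synthesise_binaural([source_on_left, source_on_right])
```

Each source contributes independently to the left and right ear signals, so adding or removing a source only adds or removes one term in each sum.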
Binaural Reproduction
The recorded or synthesised binaural signal can be reproduced using
either headphones or loudspeakers. The best performance is obtained
using headphones, but each method has its advantages and
disadvantages. If the listener is equipped with a 'Head Tracker' that allows the
sound source positions to be updated interactively according to the listener's
position and orientation, then localisation performance is
improved.
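The head-tracker update can be illustrated with a small geometric sketch. It assumes a 2D world, a head orientation (yaw) measured counter-clockwise in degrees from the +x axis, and a positive relative azimuth meaning that the source is to the listener's left. The function name source_relative_to_head and these conventions are assumptions made for illustration; a real system would also handle elevation and use the resulting direction to select the matching HRTF.

```python
import math
from typing import Tuple


def source_relative_to_head(
    listener_xy: Tuple[float, float],
    listener_yaw_deg: float,
    source_xy: Tuple[float, float],
) -> Tuple[float, float]:
    """Return (azimuth in degrees, distance) of a source as seen from the head.

    The azimuth is relative to the direction the head is facing and lies in
    the range (-180, 180]. The distance is in the same units as the inputs.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    world_azimuth = math.degrees(math.atan2(dy, dx))
    # Subtract the head orientation, then wrap into (-180, 180].
    relative = (world_azimuth - listener_yaw_deg) % 360.0
    if relative > 180.0:
        relative -= 360.0
    return relative, distance


# The source stays fixed in the world; only the head turns.
facing_forward = source_relative_to_head((0.0, 0.0), 0.0, (0.0, 1.0))
turned_to_source = source_relative_to_head((0.0, 0.0), 90.0, (0.0, 1.0))
```

With the head facing along +x the source is about 90 degrees to the left. Once the head has turned by 90 degrees the same source is straight ahead, which is what keeps the source fixed in the world rather than fixed to the head.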