Auditory scene analysis is the process by which we perceive the distance, direction,
loudness, pitch, and tone of many individual sounds simultaneously.
Analyzing auditory scenes is a complex human ability. We are surrounded
by constant sound, and even the smallest vibrations and echoes help us
identify our surroundings. Sounds in a small room, for example, echo less
noticeably than sounds in a large hall. We can also infer the physical
properties of an object from the sounds it makes. A ball dropped onto a soft
surface sounds different from one dropped onto a hard surface, and as you
walk across a floor you can hear your footsteps change when you step from
carpet onto tile.
The simplest way to locate the source of a sound is to compare the sound's intensity at our two ears. If the sound is more intense (louder) in the right ear, we know that it is coming from somewhere to our right; conversely, a sound that is louder in the left ear is identified as coming from our left. We can also use the overall intensity of a sound (the combined intensity reaching both ears) to judge how close its source is: other things being equal, a softer sound is judged to be farther away than a louder one. Both the left-right comparison and the overall-intensity judgment happen automatically, without conscious thought, which lets us quickly identify the approximate location of a sound's origin.
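The intensity comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a model of human hearing: it assumes each ear's signal is available as a list of samples, measures each ear's level as a root-mean-square (RMS) amplitude, and treats the interaural level difference in decibels as the only cue. The function names and the 1 dB "straight ahead" threshold are hypothetical. The distance_ratio() helper additionally assumes the same source heard in open space, where the inverse-square law makes the level fall by about 6 dB each time the distance doubles.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def interaural_level_difference_db(left, right):
    """Right-ear level minus left-ear level, in decibels.

    Positive values mean the sound is louder in the right ear.
    """
    return 20 * math.log10(rms(right) / rms(left))


def judge_side(left, right, threshold_db=1.0):
    """Crude left/right judgment from the level difference alone.

    threshold_db is a hypothetical tolerance: smaller differences are
    reported as "center" (straight ahead or directly behind).
    """
    ild = interaural_level_difference_db(left, right)
    if ild > threshold_db:
        return "right"
    if ild < -threshold_db:
        return "left"
    return "center"


def distance_ratio(louder_db, softer_db):
    """How many times farther away the softer sound's source is.

    Assumes the same source heard in open space, so the inverse-square
    law applies and the level drops about 6 dB per doubling of distance.
    """
    return 10 ** ((louder_db - softer_db) / 20)


if __name__ == "__main__":
    fs = 8000  # samples per second (hypothetical)
    tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
    left = [0.5 * s for s in tone]  # half the amplitude at the left ear
    right = tone
    print(judge_side(left, right))  # right
    print(round(interaural_level_difference_db(left, right), 2))  # 6.02
    print(round(distance_ratio(70, 64), 2))  # 2.0
```

Because the level difference alone cannot separate front from back, a sound directly ahead and one directly behind both come out as "center" here; that ambiguity is one reason the additional cues described next matter.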
We can further pinpoint a sound's position in space by using the combination
of ear, body, and brain to decode localization cues. Localization cues fall
into two categories. Dynamic cues include vision, reverberation, early echo
response, and head motion. For example, a nearby source reaches our ears mostly as direct sound, whereas the sound from a distant source contains a larger proportion of echoes relative to the direct sound. Static cues include
shoulder echo, pinna response, head shadow, and interaural
time difference. The pinna response refers to the way the pinna (the visible outer part of the ear) filters out certain frequencies of sound depending on the direction from which the sound comes. Sounds coming from behind may, for example, have frequencies around 1000 Hz filtered out by the back of the pinna. We perceive this as a subtle change in the quality of the sound, and because we are used to having sounds from behind filtered in this way, we can use the change in quality to tell whether a sound comes from in front of us, behind us, above us, or below us.
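The pinna's direction-dependent filtering can be imitated, very crudely, with a notch filter that removes a narrow band of frequencies. The Python sketch below assumes, purely for illustration and following the example above, that a sound arriving from behind loses its energy near 1000 Hz. It uses the biquad notch design from Robert Bristow-Johnson's Audio EQ Cookbook; the function names, the 16 kHz sample rate, and the filter's Q of 2 are hypothetical choices, not measured properties of any real ear.

```python
import math


def notch_coefficients(f0, fs, q):
    """Biquad notch coefficients (Audio EQ Cookbook form), normalized so a0 == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [1 / a0, -2 * cos_w0 / a0, 1 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Apply a second-order IIR filter (direct form I) to a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x  # shift the input history
        y2, y1 = y1, y  # shift the output history
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def simulate_from_behind(samples, fs, notch_hz=1000.0, q=2.0):
    """Hypothetical 'heard from behind' coloring: suppress energy near notch_hz."""
    b, a = notch_coefficients(notch_hz, fs, q)
    return biquad(samples, b, a)


if __name__ == "__main__":
    fs = 16000
    for freq in (1000, 4000):
        tone = [math.sin(2 * math.pi * freq * n / fs) for n in range(fs)]
        filtered = simulate_from_behind(tone, fs)
        # Compare only the second half, after the filter's start-up transient has died away.
        ratio = rms(filtered[fs // 2:]) / rms(tone[fs // 2:])
        print(f"{freq} Hz tone keeps {ratio:.0%} of its level")
```

Run on its own, the script should show the 1000 Hz tone almost completely removed while the 4000 Hz tone passes nearly unchanged. A real pinna imposes a more complex, direction-dependent pattern of peaks and notches, so a single fixed notch only illustrates the idea of direction-dependent filtering.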
Sound recording has progressed from simple to increasingly complex
techniques in an attempt to replicate the way humans perceive sound. Early
monophonic recordings gave way to stereo, and newer technologies, such as
3D sound and other advances of the digital era, are refining the process
further. These recordings, however, are still crude imitations of
the process by which the human ear receives, and the brain interprets, sound.