ABSTRACT
It is reported that millions of people suffer from some kind of hearing
impairment, and the number is climbing as the proportion of elderly people in
the world's population increases. Because hearing loss is usually caused by
permanent mechanical damage to the ear, no medicine can reverse it, and surgery
offers little help. The electronic hearing aid, or hearing prosthetic, is
therefore the best known solution for patients. A neurofuzzy approach, in which
devices combine neural networks and fuzzy logic to meet optimal gain
requirements, is becoming the future of hearing prosthetics.
Introduction
This article describes an endeavor to help the hearing
impaired through the use of neurofuzzy methodologies to tune prosthetic hearing
devices in an efficient and tractable manner. The integration of a graphical
user interface, a hearing aid emulator module and a fuzzy inference engine into
the framework of an intelligent tool that can be used to tune prosthetic
hearing devices is described. The graphical user interface permits the
extraction of perceptual information pertaining to the patient's aural response
to test stimuli in the form of speech patterns. This interface could
significantly reduce the role of the acoustician in fine-tuning the hearing aid,
thereby reducing possible human error and involving the patient more directly
in the tuning procedure. A hearing aid emulator permits one to test the working
of the entire tool in a device-independent fashion. The test speech patterns
are passed through a filter bank that mimics
the frequency response of the hearing aid. Testing, simulation results and
possible future work form the remainder of the paper.
As we grow older, we lose the ability to hear very weak sounds. Ear aging
differs greatly between individuals, but hearing loss typically starts in the
40s and progresses rapidly in the 60s, especially for high-frequency sounds. One
of the main causes is damage to the cochlea. In this case, people cannot hear
speech sounds properly and find it difficult to understand what others say,
especially when speech is heard against a noisy background.
Research on digital hearing aids
In hearing-impaired listeners, the sound pressure level above which a sound is
audible (the threshold of hearing) is higher than in normal listeners. However,
the sound pressure level at which a sound becomes unbearably loud is almost the
same for normal and hearing-impaired listeners. Older hearing aids only
amplified input sounds using analog circuits. With such hearing aids, a loud
sound is often amplified to an uncomfortable level, so they were not broadly
accepted. The digital hearing aid CLAIDHA (Compensating Loudness by Analyzing
the Input-signal, Digital Hearing Aid) was then developed. It applies digital
signal processing based on an analysis of the input sounds and, for
hearing-impaired listeners, amplifies sounds so that they are nearly as loud as
they are for normal listeners. In our laboratory, we are also researching a new
speech enhancement algorithm that uses the newest generation of digital signal
processors and techniques. We intend to apply this algorithm in a new digital
hearing aid that has relatively low hardware requirements and processes sounds
in real time.
Conceptually, the hearing aid is nothing more than an amplifier that collects
and amplifies environmental sounds to compensate for hearing loss. However, the
hearing process is much too complex to be fully compensated by current hearing
aids. Hearing is, by its nature, an intelligent process: people try to
understand the sounds they receive, which means that the brain is involved.
This makes the design of a hearing aid much more complicated than finding a set
of linear amplifiers.
Hearing aid technology has progressed tremendously in the past couple of
decades. There are three main categories of hearing aid technology today:
analog, programmable, and digital. The most basic is the analog system. These
hearing aids are adjusted for a person using a fine screwdriver, a method of
fitting that has been around for a long time. For many people with hearing
loss, this type of technology works just fine.
A person with a harder-to-fit type of hearing loss, who requires more precise
tuning, may be fitted with a programmable hearing aid. The hearing professional
programs the hearing aid using a computer, which permits finer adjustments that
tailor the hearing aid program to the individual. Small changes to the program
are easier to make with this technology than with the analog system. It is very
beneficial for people who rely heavily on their hearing in their daily lives.
The newest technology in the hearing aid world is the digital hearing aid. With
this technology, acoustic signals are converted at high speed and with great
precision into binary code. This allows much more complex calculations and
adjustments of the amplified signal than are possible with the other two
technologies, and it gives greater flexibility in providing individualized
solutions to hearing loss. This paper also deals with digital processing of
speech as it pertains to the hearing impaired. At present, the available hearing
aids lag behind the technology curve, both in algorithm research and in
available hardware.
A digital hearing aid normally consists of a microphone, an analogue-to-digital
(A/D) converter, a microprocessor, a digital-to-analogue (D/A) converter and a
loudspeaker. The microprocessor replaces the amplifier circuit used in an
analogue hearing aid and is designed for special purposes such as providing
optimal speech intelligibility. Because sound processing in a digital hearing
aid is digitized, the sound signal can be refined, for instance by reducing
noise and improving speech signals, or by amplifying only the frequencies that
the user needs amplified. The digital processing itself adds essentially no
circuit noise. The central part of a digital hearing aid is the microprocessor,
or digital signal processor (DSP).
The mapping of the audiogram into the target is a process that preselects the
required gains of the hearing aid for the user. This first guess should be as
close to the real requirement as possible, to save effort and time in the
fine-tuning process that follows. One strategy for preselection is the
comparative technique, which was introduced by Carhart in the 1940s. In this
approach, the audiologist has a set of established targets obtained from
previous experience, and the user is asked to try them and select the best
match. This is a time-consuming process, which makes it less and less
attractive. In contrast to the comparative technique, another paradigm, called
the prescriptive procedure, is widely used.
The prescriptive procedure employs a designated formula to assess the gain
requirement of the user. The input to the formula is the audiological data from
the patient. Many prescriptive procedures exist. The half-gain rule, in which
the optimal gains are taken at the midpoint of the audiogram curve (half of the
measured hearing loss at each frequency), is one of the simplest. Others, such
as the prescription of gain and output (POGO), developed by McCandless and
Lyregaard, and the National Acoustic Laboratories (NAL) procedures, are also
available. However, the existence of so many procedures shows that there is no
general agreement on which one is best. In fact, to remain easy to apply, the
formulae oversimplify the hearing process to some extent.
Because an unknown, nonlinear relationship exists between the hearing loss of
the patient (audiogram) and the desired gain (target), it is very difficult to
derive a general formula manually. The neural network, which is a model-free
architecture capable of fitting virtually any function, is a promising way to
derive the mapping. A neural network consists of many neurons. All the
information is stored in the form of connection strengths (weights) between
neurons. The weights can be initially set to random numbers and then evolve
during the learning process. Learning is the process through which the network
adjusts its weights by comparing the actual outputs with the desired outputs
for given inputs.
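As a concrete baseline for the audiogram-to-target mapping, the half-gain rule
mentioned above can be sketched in a few lines of Python. The sketch assumes
that the rule prescribes half of the measured hearing loss at each audiogram
frequency. The function name, the frequency list and the example audiogram are
illustrative assumptions, not clinical values.

```python
# Audiogram frequencies used as network inputs later in the text.
AUDIOGRAM_FREQS_HZ = (250, 500, 1000, 2000, 4000)


def half_gain_targets(hearing_loss_db):
    """Return a gain target (dB) equal to half the hearing loss at each frequency."""
    if len(hearing_loss_db) != len(AUDIOGRAM_FREQS_HZ):
        raise ValueError("expected one hearing-loss value per audiogram frequency")
    return [0.5 * loss for loss in hearing_loss_db]


# Hypothetical audiogram (dB hearing loss), made up for illustration only.
example_loss_db = [20, 30, 40, 55, 70]
example_targets_db = half_gain_targets(example_loss_db)
```

A fixed formula of this kind cannot adapt to a clinic's own cases, which is
the limitation the neural network is meant to address.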
The target generated by the neural network reflects the overall requirement of
a large population of patients, so it cannot be expected to be optimal for a
particular user. Instead, the user is asked to take a fine-tuning step and give
a subjective evaluation of the performance of the current target. Some minor
changes are expected based on the user's evaluation. However, the user's
feelings are fuzzy and imprecise. The scores given are not quantified gains but
linguistic terms: bad, good, OK, etc. Recovering from these linguistic terms
the quantified corrections needed to tune the target is a difficult task. The
major obstacle is that there are too many degrees of freedom (too many
adjustable variables), and the adjustments sometimes contradict one another.
To address this difficulty, we turn to fuzzy logic, which is good at handling
fuzzy and contradictory information. Fuzzy logic uses linguistic terms to
describe a problem and operates on them. The fuzzy logic output can be
defuzzified to obtain crisp numbers.
Methodology
In this section, we introduce a methodology that combines neural networks and
fuzzy logic for use in next-generation hearing aids.
Systematic view
The overall picture of our approach is shown in the figure. In the first step,
hearing loss data are collected by an audiologist. The data drive a trained
neural network to generate initial targets, which are the first guess at the
real gain requirements of the user. It is worth pointing out that the response
of humans to environmental sounds is not a simple linear function of the sound
level. A patient with hearing loss may complain that soft sounds are too soft
to perceive but have no problems with normal or loud sounds. Such a patient
needs more gain for soft sounds and little or no gain for normal and loud
sounds, which implies that the target (gains) should depend on the input sound
level. We therefore employ three sets of targets that correspond to soft,
normal, and loud sounds respectively. The initial targets are programmed into
the hearing device using hearing aid parameter-setting algorithms. The user is
then asked to listen to specific speech and sound stimuli with the programmed
hearing aid and to evaluate the performance on six subjective indexes:
loudness, tone, clarity, comfort, distortion and noise. The user gives a
quantitative measurement (between 0 and 10, for example) for each index. If the
user is satisfied with all these aspects, the process is done. Otherwise, a
fine-tuning algorithm is invoked to adjust the target in the direction the user
desires. This new, finer target is then programmed and evaluated by the user,
and further tuning is performed based on that evaluation. The process continues
until the user is fully satisfied.
Neural Network for Target Generation
A neural network is used in this approach to generate the initial target for
the user. As stated earlier, the comparative approach for selecting the target
is inefficient because numerous alternatives are available, and the
prescriptive procedures are oversimplified. They are developed for generic
purposes, which implies that they are not adaptive. It is difficult for
physicians to modify these formulae to fit their specific situations, which is
a serious weakness of the prescriptive procedures. A neural network, on the
other hand, is a universal function-fitting tool: it can fit virtually any
function as long as sufficient examples are provided. A three-layer neural
network is used in our approach.
The input layer has five neurons that correspond to the hearing loss at 250 Hz,
500 Hz, 1 kHz, 2 kHz and 4 kHz respectively. The output layer has eight neurons
that correspond to the target gains at 250 Hz, 500 Hz, 750 Hz, 1 kHz, 1.5 kHz,
2 kHz, 3 kHz and 4 kHz respectively. The number of neurons in the hidden layer
may vary from 20 to 40, depending on the actual situation. The activation
function in the hidden layer is a nonlinear sigmoid function, and the output
layer uses a linear function.
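The layer sizes and activation functions described above can be made concrete
with a small forward pass. This is a minimal sketch in plain Python, assuming
20 hidden neurons (the low end of the stated range), a bias input on each
layer, random untrained weights and inputs scaled to roughly 0 to 1. None of
these details are specified in the text.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 5, 20, 8  # 5 audiogram inputs, 8 target outputs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_inputs, n_units, rng):
    """One weight row per unit; the last entry of each row is the bias weight."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]


def forward(hearing_loss, w_hidden, w_out):
    """Sigmoid hidden layer followed by a linear output layer."""
    inputs = list(hearing_loss) + [1.0]  # append the constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    hidden_with_bias = hidden + [1.0]
    return [sum(w * h for w, h in zip(row, hidden_with_bias)) for row in w_out]


rng = random.Random(0)
w_hidden = init_layer(N_IN, N_HIDDEN, rng)
w_out = init_layer(N_HIDDEN, N_OUT, rng)

# Hypothetical hearing loss at 250 Hz to 4 kHz, scaled as dB / 100.
scaled_loss = [0.20, 0.30, 0.40, 0.55, 0.70]
targets = forward(scaled_loss, w_hidden, w_out)  # eight untrained outputs
```

With untrained weights the eight outputs are meaningless; they only show the
shape of the mapping from five inputs to eight targets.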
Two modes exist in the operation of the neural network: learning and
simulation. In the learning mode, the neural network is given a set of training
data drawn from previous successful cases. The weights inside the network are
adjusted to minimize the error between the desired and actual output. The
training data consist of the patient's hearing loss as input and the actual
target as desired output. These data are obtained either from a public medical
database or from a physician's routine practice. Through this approach, it
becomes easy for a physician to incorporate his/her experience into the system
by training the neural network with his/her successful cases. After a certain
number of epochs of learning, the configuration of the network should approach
an optimal state for the given training data. In the simulation mode, the
network is then ready to compute the target for a specific hearing loss, that
is, the initial target for the user to evaluate. Three targets are generated,
for soft, normal, and loud sounds respectively.
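The learning mode can be illustrated with plain gradient descent on the squared
error between desired and actual outputs. The text does not say which training
algorithm was used, so backpropagation is an assumption here, as are the
learning rate, the epoch count and the three made-up training cases (hearing
loss and targets scaled as dB / 100).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_inputs, n_units, rng):
    # Last entry of each row is the bias weight.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]


def forward(x, w_hidden, w_out):
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = [sum(w * h for w, h in zip(row, hidden + [1.0])) for row in w_out]
    return hidden, output


def train_step(x, desired, w_hidden, w_out, rate):
    """One weight update on one case; returns the squared error before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    out_err = [o - d for o, d in zip(output, desired)]
    # Propagate the error back through the linear output layer, then through
    # the sigmoid derivative h * (1 - h) of each hidden neuron.
    hid_err = [
        sum(out_err[k] * w_out[k][j] for k in range(len(w_out)))
        * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]
    for k, row in enumerate(w_out):
        for j, h in enumerate(hidden + [1.0]):
            row[j] -= rate * out_err[k] * h
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] -= rate * hid_err[j] * v
    return sum(e * e for e in out_err)


def train(cases, w_hidden, w_out, epochs=200, rate=0.05):
    """Return the total squared error recorded in each epoch."""
    return [sum(train_step(x, d, w_hidden, w_out, rate) for x, d in cases)
            for _ in range(epochs)]


# Hypothetical "successful cases": (hearing loss at 5 freqs, target at 8 freqs).
cases = [
    ([0.20, 0.30, 0.40, 0.55, 0.70], [0.10, 0.15, 0.18, 0.20, 0.24, 0.28, 0.31, 0.35]),
    ([0.10, 0.15, 0.25, 0.40, 0.60], [0.05, 0.08, 0.10, 0.13, 0.16, 0.20, 0.25, 0.30]),
    ([0.40, 0.45, 0.50, 0.60, 0.65], [0.20, 0.23, 0.24, 0.25, 0.28, 0.30, 0.31, 0.33]),
]
rng = random.Random(0)
w_hidden = init_layer(5, 20, rng)
w_out = init_layer(20, 8, rng)
history = train(cases, w_hidden, w_out)
```

In this arrangement a physician's own cases would simply be appended to
`cases` before training. Three such networks, or one network with three output
sets, would be needed for the soft, normal and loud targets.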
What is fuzzy logic?
Fuzzy logic is a superset of conventional (Boolean) logic that has been
extended to handle the concept of partial truth, that is, truth values between
"completely true" and "completely false". Dr. Lotfi Zadeh of UC Berkeley
introduced it in the 1960s as a means to model the uncertainty of natural
language. Zadeh says that rather than regarding fuzzy theory as a single
theory, we should regard the process of "fuzzification" as a methodology to
generalize ANY specific theory from a crisp (discrete) to a continuous (fuzzy)
form (see "extension principle" in [2]). Thus researchers have recently also
introduced "fuzzy calculus", "fuzzy differential equations", and so on (see
[7]).
Fuzzy Subsets: Just as there is a strong relationship between
Boolean logic and the concept of a subset, there is a similar strong
relationship between fuzzy logic and fuzzy subset theory. In classical set
theory, a subset U of a set S can be defined as a mapping from the elements of
S to the elements of the set {0, 1}, U: S --> {0, 1}. This mapping may be
represented as a set of ordered pairs, with exactly one ordered pair present
for each element of S. The first element of the ordered pair is an element of
the set S, and the second element is an element of the set {0, 1}. The value
zero represents non-membership, and the value one represents membership. The
truth or falsity of the statement "x is in U" is determined by finding the
ordered pair whose first element is x. The statement is true if the second
element of the ordered pair is 1, and false if it is 0.
Similarly, a fuzzy subset F of a set S can be defined as a set of ordered
pairs, each with the first element from S and the second element from the
interval [0, 1], with exactly one ordered pair present for each element of S.
This defines a mapping between elements of the set S and values in the interval
[0, 1]. The value zero represents complete non-membership, the value one
represents complete membership, and values in between represent intermediate
DEGREES OF MEMBERSHIP. The set S is referred to as the UNIVERSE OF DISCOURSE
for the fuzzy subset F. Frequently, the mapping is described as a function, the
MEMBERSHIP FUNCTION of F. The degree to which the statement "x is in F" is true
is determined by finding the ordered pair whose first element is x. The DEGREE
OF TRUTH of the statement is the second element of the ordered pair. In
practice, the terms "membership function" and "fuzzy subset" are used
interchangeably.
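The two definitions can be written down directly as mappings. In this sketch
the universe of discourse, the crisp cutoff and the linear ramp of the fuzzy
set are hypothetical choices made only to show the difference between a
mapping into {0, 1} and a mapping into [0, 1].

```python
# Universe of discourse S: a few sound levels in dB (illustrative values).
S = [30, 50, 70, 90]

# Crisp subset U: S --> {0, 1}, stored as ordered pairs (element, 0 or 1).
crisp_loud = {x: 1 if x >= 70 else 0 for x in S}


def loud_membership(level_db):
    """Membership function of the fuzzy subset LOUD.

    Assumed shape: 0 at or below 40 dB, 1 at or above 90 dB, linear in between.
    """
    return min(1.0, max(0.0, (level_db - 40) / 50))


# Fuzzy subset F: S --> [0, 1], stored as ordered pairs (element, degree).
fuzzy_loud = {x: loud_membership(x) for x in S}
```

Under these assumptions a 70 dB sound is simply "in" the crisp set, while in
the fuzzy set it is LOUD to degree 0.6.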
Many decision-making and problem-solving tasks are too complex to be
understood quantitatively; however, people succeed by using knowledge that is imprecise
rather than precise. Fuzzy set theory, originally introduced by Lotfi Zadeh in
the 1960's, resembles human reasoning in its use of approximate information and
uncertainty to generate decisions. It was specifically designed to
mathematically represent uncertainty and vagueness and provide formalized tools
for dealing with the imprecision intrinsic to many problems. By contrast,
traditional computing demands precision down to each bit. Since knowledge can
be expressed in a more natural way by using fuzzy sets, many engineering and
decision problems can be greatly simplified.
Fuzzy set theory implements classes or groupings of data with boundaries
that are not sharply defined (i.e., fuzzy). Any methodology or theory
implementing "crisp" definitions such as classical set theory,
arithmetic, and programming, may be "fuzzified" by generalizing the
concept of a crisp set to a fuzzy set with blurred boundaries. The benefit of
extending crisp theory and analysis methods to fuzzy techniques is the strength
in solving real-world problems, which inevitably entail some degree of
imprecision and noise in the variables and parameters measured and processed
for the application. Accordingly, linguistic variables are a critical aspect of
some fuzzy logic applications, where general terms such as "large,"
"medium," and "small" are each used to capture a range of
numerical values. While similar to conventional quantization, fuzzy logic
allows these stratified sets to overlap (e.g., an 85-kilogram man may be
classified in both the "large" and "medium" categories,
with varying degrees of belonging or membership to each group). Fuzzy set
theory encompasses fuzzy logic, fuzzy arithmetic, fuzzy mathematical
programming, fuzzy topology, fuzzy graph theory, and fuzzy data analysis, though
the term fuzzy logic is often used to describe all of these.
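The overlapping categories in the 85-kilogram example can be illustrated with
triangular membership functions. The breakpoints below are hypothetical and
were chosen only so that "medium" and "large" overlap around 85 kg.

```python
def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed (left, peak, right) breakpoints in kilograms for each linguistic term.
WEIGHT_SETS_KG = {
    "small": (30, 50, 70),
    "medium": (55, 75, 95),
    "large": (80, 100, 120),
}


def weight_memberships(weight_kg):
    """Degree of membership of one weight in every linguistic category."""
    return {name: triangle(weight_kg, *abc) for name, abc in WEIGHT_SETS_KG.items()}


memberships = weight_memberships(85)
```

With these breakpoints the 85 kg man is "medium" to degree 0.5 and "large" to
degree 0.25 at the same time, which conventional quantization into disjoint
bins cannot express.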
Fuzzy Logic in the Fine Tuning Process
Fuzzy logic tuning plays an important role in the fitting process because success on
the first try is nearly impossible. The inputs to the tuning process are user
evaluation and the target being evaluated. The output is a finer target for
further evaluation. Several factors favor fuzzy logic in the tuning process
over other approaches. First, the user’s evaluations are fuzzy. Second, the
rules used to fine-tune are described using linguistic terms. Third, several
adjustments that are required at the same time might be contradictory.
The user is first exposed to three different sound environments separately:
soft, normal, and loud sounds. He/she is then asked to give quantitative
evaluations of loudness, tone, intelligibility, comfort, distortion, and noise.
These evaluations, normalized to the range -1 to 1 or 0 to 1, are then
fuzzified using the membership functions.
The output variables are modifications to the target in three frequency
bands: high frequency (HF), medium frequency (MF), and low frequency (LF). The
crisp outputs, which are the actual modifications to the targets, are obtained
by defuzzifying the fuzzy outputs using the following membership functions.
The rules used to fine-tune the target come from theoretical considerations or
clinical experience. Samples of these rules include:
- If (loudness is LOUD), then (DECREASE LF, DECREASE MF, DECREASE HF).
- If (tone is BRIGHT), then (DECREASE HF).
- If (tone is DULL), then (INCREASE LF).
- If (loudness is NORMAL) and (intelligibility is BAD), then (INCREASE HF
  SLIGHTLY).
- If (distortion is DISTURBING) and (intelligibility is BAD), then (INCREASE LF
  SLIGHTLY).
Most rules will be provided by the manufacturer, yet the physician still has
the freedom to add his/her own rules to the rule base.
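The five sample rules above can be turned into a small inference routine. This
is a sketch under several assumptions that the text does not fix: the
membership functions are simple linear ramps, AND is taken as the minimum, each
consequent is a single dB value, and defuzzification is a weighted average of
the fired rules. The step sizes in `STEP_DB` are invented for illustration.

```python
def clamp01(v):
    return max(0.0, min(1.0, v))


# Hypothetical crisp value (dB) attached to each linguistic action.
STEP_DB = {"DECREASE": -6.0, "INCREASE": 6.0, "INCREASE_SLIGHTLY": 3.0}


def fine_tune(loudness, tone, intelligibility, distortion):
    """Return the dB change for the LF, MF and HF bands.

    Assumed input ranges: loudness and tone in [-1, 1] (soft..loud,
    dull..bright); intelligibility and distortion in [0, 1].
    """
    # Fuzzification with assumed linear membership functions.
    loud = clamp01(loudness)
    normal = clamp01(1.0 - abs(loudness))
    bright = clamp01(tone)
    dull = clamp01(-tone)
    bad = clamp01(1.0 - intelligibility)
    disturbing = clamp01(distortion)

    # (firing strength, {band: action}) for the five sample rules.
    rules = [
        (loud, {"LF": "DECREASE", "MF": "DECREASE", "HF": "DECREASE"}),
        (bright, {"HF": "DECREASE"}),
        (dull, {"LF": "INCREASE"}),
        (min(normal, bad), {"HF": "INCREASE_SLIGHTLY"}),
        (min(disturbing, bad), {"LF": "INCREASE_SLIGHTLY"}),
    ]

    weighted = {"LF": 0.0, "MF": 0.0, "HF": 0.0}
    total = {"LF": 0.0, "MF": 0.0, "HF": 0.0}
    for strength, actions in rules:
        for band, action in actions.items():
            weighted[band] += strength * STEP_DB[action]
            total[band] += strength

    # Weighted-average defuzzification; a band with no fired rule is unchanged.
    return {band: weighted[band] / total[band] if total[band] > 0 else 0.0
            for band in total}


# Normal loudness, somewhat bright tone, mediocre intelligibility.
adjustment = fine_tune(loudness=0.0, tone=0.5, intelligibility=0.5, distortion=0.0)
```

In this example two rules pull HF in opposite directions ("tone is BRIGHT"
wants a decrease, "intelligibility is BAD" wants a slight increase), and the
weighted average resolves the conflict to a 1.5 dB reduction in HF.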
One important issue is the convergence of the tuning process. Because of
shortcomings in the rules and inaccuracy in the user's evaluations, the
fine-tuning process may not converge to an optimal result. To address this
problem, we also designed a set of rules that guide the tuning process by
adjusting the tuning step adaptively. Some samples include:
- If (gain is CURRENTLY INCREASED) and (gain was PREVIOUSLY INCREASED), then
  (SLIGHTLY INCREASE adjustment step).
- If (gain is CURRENTLY INCREASED) and (gain was PREVIOUSLY INCREASED) and
  (adjustment step was PREVIOUSLY SLIGHTLY INCREASED), then (MODERATELY
  INCREASE adjustment step).
- If (gain is CURRENTLY INCREASED) and (gain was PREVIOUSLY DECREASED), then
  (DECREASE adjustment step).
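The step-adaptation rules above can be approximated by a crisp (non-fuzzy)
function, which is easier to read than a full fuzzy rule base. This is a
simplification made here for illustration: the scaling factors are
hypothetical, and the rules are applied symmetrically to gain decreases, which
the text does not state.

```python
# Hypothetical scaling factors for the three linguistic actions.
SLIGHT_INCREASE = 1.25
MODERATE_INCREASE = 1.5
DECREASE = 0.5


def adapt_step(step, current_change, previous_change, step_was_increased):
    """Return (new_step, step_increased_flag).

    current_change / previous_change: signed gain changes of the last two
    iterations (positive = increased, negative = decreased).
    step_was_increased: whether the step was enlarged on the previous call.
    """
    product = current_change * previous_change
    if product > 0 and step_was_increased:
        # Same direction twice in a row and the step already grew: grow faster.
        return step * MODERATE_INCREASE, True
    if product > 0:
        # Same direction twice in a row: grow the step slightly.
        return step * SLIGHT_INCREASE, True
    if product < 0:
        # Direction reversed, so the target was probably overshot: shrink.
        return step * DECREASE, False
    return step, False


step, grew = adapt_step(2.0, +1, +1, False)   # same direction
step2, grew2 = adapt_step(step, +1, +1, grew)  # same direction again
step3, grew3 = adapt_step(step2, +1, -1, grew2)  # reversal
```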
Simulation and Results
To model the effect of a hearing aid, the input signal is passed through a
filter that emulates the frequency response of the hearing aid. This removes
the device-dependent nature of evaluating the tool and permits testing without
a prosthetic hearing device. The filter parameters are determined from the data
that the user enters while characterizing the speech sample that he/she hears.
The emulator is essentially a finite impulse response (FIR) filter constructed
from the desired output given by the fuzzy engine. The filter is shaped to
avoid sharp gradients, which is close to how a real hearing aid behaves. A
smooth response curve prevents the sound from being suddenly distorted at some
frequencies.
Once the filter shape is determined, the speech sample (with the loss imposed)
is passed through this filter and played to the user through a set of
speakers.
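One way to build such an emulator is the frequency-sampling method for a
linear-phase FIR filter. The text does not say how the filter was actually
designed, so the method, the 16 kHz sampling rate, the 65-tap length, the
linear interpolation between target frequencies and the example gains are all
assumptions of this sketch.

```python
import math

# Frequencies at which the target gains are specified (from the output layer).
TARGET_FREQS_HZ = [250, 500, 750, 1000, 1500, 2000, 3000, 4000]


def gain_db_at(freq_hz, gains_db):
    """Interpolate the gain curve linearly; hold the end values outside the range."""
    if freq_hz <= TARGET_FREQS_HZ[0]:
        return gains_db[0]
    for i in range(1, len(TARGET_FREQS_HZ)):
        if freq_hz <= TARGET_FREQS_HZ[i]:
            f0, f1 = TARGET_FREQS_HZ[i - 1], TARGET_FREQS_HZ[i]
            t = (freq_hz - f0) / (f1 - f0)
            return gains_db[i - 1] + t * (gains_db[i] - gains_db[i - 1])
    return gains_db[-1]


def design_fir(gains_db, sample_rate_hz=16000, n_taps=65):
    """Frequency-sampling design of a symmetric (linear-phase) FIR filter."""
    if n_taps % 2 == 0:
        raise ValueError("n_taps must be odd")
    mid = (n_taps - 1) // 2
    # Desired linear amplitude at each DFT bin from 0 up to the Nyquist side.
    amplitude = [
        10 ** (gain_db_at(k * sample_rate_hz / n_taps, gains_db) / 20)
        for k in range(mid + 1)
    ]
    taps = []
    for n in range(n_taps):
        acc = amplitude[0]
        for k in range(1, mid + 1):
            acc += 2 * amplitude[k] * math.cos(2 * math.pi * k * (n - mid) / n_taps)
        taps.append(acc / n_taps)
    return taps


def apply_fir(signal, taps):
    """Direct-form convolution, truncated to the length of the input signal."""
    return [
        sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(signal))
    ]


# Hypothetical target gains (dB) at the eight frequencies above.
example_gains_db = [10, 15, 18, 20, 24, 28, 31, 35]
taps = design_fir(example_gains_db)
impulse_response = apply_fir([1.0] + [0.0] * (len(taps) - 1), taps)
```

Interpolating between the eight target points keeps the response free of
abrupt steps, in line with the smooth shaping described above. A speech sample
would be passed through `apply_fir` in place of the unit impulse used here.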
Tests were performed to determine the functionality of the system. Testing
showed that a variable rate of approach to the final state may be necessary,
since large changes typically need to be made in the first few iterations while
progressively smaller changes are needed as the number of iterations increases.
To this end, a velocity factor $\alpha$ ($0 < \alpha < 1$) has been
incorporated into the target update process. Its effect may be illustrated by
considering
$$T_{k+1} = T_k + \alpha \, \Delta T_k$$
Here, $T_k$ represents the target vector at iteration $k$, and $\Delta T_k$
represents the change in the target vector at iteration $k$ as given by the
fuzzy engine. The value of $\alpha$ may be kept static through all the
iterations or varied as the number of iterations increases. Decreasing $\alpha$
as the number of iterations increases should yield faster convergence, since
this reduces the probability of overshooting the final state.
Also, a constant multiplicative gain factor $C$ is used to change the effect of
the target on the frequency response of the hearing aid emulator. This effect
is illustrated in
$$G_k = C \, T_k$$
Here, $G_k$ is the gain vector of the hearing aid at iteration $k$, $T_k$ is
the target vector at iteration $k$, and $C$ is the multiplicative constant.
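The target update with the velocity factor and the gain scaling with the
constant C translate directly into code. The geometric decay schedule for the
velocity factor is a hypothetical choice, as are all numeric values in the
example; the text only says that the factor may decrease with the iteration
count.

```python
def update_target(target, delta, alpha):
    """T_{k+1} = T_k + alpha * dT_k, applied element by element."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("the velocity factor must lie strictly between 0 and 1")
    return [t + alpha * d for t, d in zip(target, delta)]


def emulator_gain(target, c):
    """G_k = C * T_k."""
    return [c * t for t in target]


def decaying_alpha(k, alpha0=0.8, decay=0.7):
    """Assumed schedule: the velocity factor shrinks geometrically with k."""
    return alpha0 * decay ** k


# Illustrative numbers only: a three-band target and one fuzzy-engine change.
target_k = [20.0, 25.0, 30.0]
delta_k = [4.0, -2.0, 0.0]
target_next = update_target(target_k, delta_k, alpha=0.5)
gain_next = emulator_gain(target_next, c=1.5)
```

Here half of each suggested change is applied, so a later correction in the
opposite direction has less overshoot to undo.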
Frequency measures the number of wave cycles that pass a point in each unit of
time. It is usually measured in hertz (Hz); 1 hertz is 1 cycle per second, and
1 kHz = 1000 Hz. The range of human hearing is roughly 20 Hz to 20,000 Hz.
Intensity is a measure of the power in a sound as it reaches an area such as
the eardrum, and is directly proportional to the square of the amplitude of the
waveform. Intensity is expressed as power per unit area and measured in watts
per square meter. The decibel scale is logarithmic and provides the most
convenient physical measure of intensity. One bel (named after Alexander Graham
Bell) is defined as the level difference between two sounds whose intensities
have a ratio of 10:1, and a decibel (dB) is one tenth of a bel. Note that the
decibel is a unit for the level difference between two sounds. In acoustics, dB
is often used as an absolute measure of intensity; in this case, it is measured
relative to the threshold of hearing, $10^{-12}$ watts per square meter. Here
are some sample intensity levels in decibels:
Sound                            Level (dB)
Threshold of hearing             0
Leaves rustling in the breeze    20
A quiet restaurant               50
Busy traffic                     70
Vacuum cleaner                   80
Threshold of pain                120
Jet at takeoff                   140
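The relation between intensity and the decibel levels listed above follows
from the definition: a level in dB is ten times the base-10 logarithm of the
intensity ratio, taken here relative to the threshold of hearing. The function
names in this short sketch are illustrative.

```python
import math

I0_W_PER_M2 = 1e-12  # threshold of hearing, the 0 dB reference


def intensity_level_db(intensity_w_per_m2):
    """Level in dB relative to the threshold of hearing."""
    return 10.0 * math.log10(intensity_w_per_m2 / I0_W_PER_M2)


def intensity_from_db(level_db):
    """Inverse mapping: intensity in W/m^2 for a given level in dB."""
    return I0_W_PER_M2 * 10.0 ** (level_db / 10.0)


pain_threshold_intensity = intensity_from_db(120)   # about 1 W/m^2
traffic_intensity = intensity_from_db(70)           # about 1e-5 W/m^2
```

Each 10 dB step in the list corresponds to a tenfold increase in intensity, so
the threshold of pain is a factor of 10^12 above the threshold of hearing.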
Conclusions
Neural networks and fuzzy logic are powerful tools for next-generation hearing
prosthetics. A neural network, used as a function fitter that maps the hearing
loss to the desired gain requirements, provides many benefits over other
approaches. The network is able to learn dynamically through experience. It is
open and expandable: a physician can easily incorporate new knowledge into the
system. Fuzzy logic, on the other hand, is an indispensable tool for the tuning
process. It builds a direct and reasonable link between a user's subjective
evaluation and the modifications actually required to the gain targets. Again,
physicians are free to add new rules to the rule base to reflect specific needs
and patterns.
The neurofuzzy approach presented here helps hearing prosthetic devices not
only in an offline fitting process, but also in online operation. Next
generation hearing prosthetics will differ from current devices in that they
will be more than mechanical devices. The effects of the brain should be taken
into account in order to design a successful hearing aid. For example, the
hearing aid should be situation dependent; it should be capable of evolving or
adapting. The neurofuzzy approach presented makes these features possible.