Source: ocw.snu.ac.kr/sites/default/files/note/lecture (11)_3.pdf
Neural Prosthetic Engineering NB
Today: Oct. 26th
• Questions on projects?
• Review
  • Telemetry
  • Formant-based speech processing
• Speech processing strategies (continued)
  • CA
  • Lessons learned
  • CIS
  • CIS-based
  • Fine structure
Review
Telemetry
Data Telemetry – Inductive Link
Downlink using PWM scheme
[Figure: block diagram of the inductive downlink. Blocks: PWM data, modulated and amplified signal, received signal, envelope detector/comparator, rectifier and regulator, load. Data path output: recovered data. Power path output: recovered power. Waveform panels: voltage at implanted coil, recovered data, generated biphasic current.]
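The data path above (PWM data, envelope detector, comparator, recovered data) can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not give the coding rule, so the 75%/25% carrier on-times for a 1 and a 0, the carrier period, the envelope decay and the threshold are all hypothetical values chosen for the example.

```python
import math

def pwm_encode(bits, samples_per_bit=100, carrier_period=10):
    """Carrier is switched on for 75% of the bit time for a 1, 25% for a 0 (assumed rule)."""
    signal = []
    for bit in bits:
        on_samples = int(samples_per_bit * (0.75 if bit else 0.25))
        for n in range(samples_per_bit):
            carrier = math.sin(2 * math.pi * n / carrier_period)
            signal.append(carrier if n < on_samples else 0.0)
    return signal

def envelope_detect(signal, decay=0.9):
    """Peak-hold with exponential decay, a rough stand-in for a diode-RC detector."""
    level, env = 0.0, []
    for x in signal:
        level = max(abs(x), level * decay)
        env.append(level)
    return env

def pwm_decode(signal, samples_per_bit=100, threshold=0.3):
    """Comparator plus pulse-width decision: a bit is 1 if the envelope is high for over half the bit time."""
    env = envelope_detect(signal)
    bits = []
    for start in range(0, len(env), samples_per_bit):
        frame = env[start:start + samples_per_bit]
        high = sum(1 for v in frame if v > threshold)
        bits.append(1 if high > samples_per_bit / 2 else 0)
    return bits

recovered = pwm_decode(pwm_encode([1, 0, 1, 1, 0]))
```

In a real implant the same received carrier also feeds the rectifier and regulator on the power path; that part is not modeled here.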
Backward telemetry
[Figure: backward telemetry schematic and timing diagram. Schematic labels: Vin, Vs, Vref, unity-gain stage (Gain = 1), AMP, COMP, switches s1, s2, s3, out. Timing diagram traces versus time: s1, s2, s3, out, Vs, Vref. Link blocks: PWM data, modulated and amplified signal, received signal, rectifier/regulator, envelope detector, load, recovered data, recovered power.]
Strategies for Representing Speech Information with Cochlear Implants
Making of voice: Vocal Tract
[Figure: anatomy of the vocal tract. Labels: nasal cavity, nostril, hard palate, velum (soft palate), alveolar ridge, lips, teeth, tongue, larynx, glottis, vocal folds, ventricular fold, aryepiglottic fold, trachea.]
• The sound source is in the larynx (vocal folds).
• The vocal tract is the cavity where the sound is filtered.
• The vocal tract consists of the laryngeal cavity, the pharynx, the oral cavity, and the nasal cavity.
• The average length of the vocal tract in adult humans is 17 cm (male) and 14 cm (female).
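To see why the tract length matters for the filtering, a common textbook idealization (not taken from these slides) treats the vocal tract as a uniform tube closed at the glottis and open at the lips, which resonates at odd multiples of c/(4L). The sketch below assumes that model and a speed of sound of 350 m/s in warm air; the function name is hypothetical.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances of a uniform tube closed at one end and open at the other:
    F_k = (2k - 1) * c / (4 * L), an idealized stand-in for a neutral vocal tract."""
    return [(2 * k - 1) * speed_of_sound / (4 * length_m)
            for k in range(1, n_resonances + 1)]

male = tube_resonances(0.17)    # 17 cm tract
female = tube_resonances(0.14)  # 14 cm tract
print([round(f) for f in male])
print([round(f) for f in female])
```

Under these assumptions the shorter tract gives higher resonances (first resonance near 515 Hz for 17 cm and 625 Hz for 14 cm). Real vowels shift these values because the tract is not a uniform tube.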
Vocal fold at low and high pitches
120 Hz and 200 Hz
http://www.vowelsandconsonants3e.com/chapter_2.html
https://www.youtube.com/watch?v=v9Wdf-RwLcs
Sound: Voiced or unvoiced
• Voicing means the vocal folds vibrate as air is forced through the larynx into the vocal tract.
• All the vowels are voiced sounds.
• Consonants are voiced or unvoiced sounds.
• Voiced sounds are resonant (vibrant).
• Unvoiced sounds are noisy.
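The contrast between resonant (periodic) voiced sounds and noisy unvoiced sounds can be illustrated with a zero-crossing-rate test. This is a minimal sketch, not a method given in the lecture: the synthetic test signals, the 8 kHz sampling rate and the 0.1 threshold are assumptions made for the example.

```python
import math
import random

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings / (len(x) - 1)

def is_voiced(x, zcr_threshold=0.1):
    """Low zero-crossing rate suggests a periodic, low-frequency (voiced) sound."""
    return zero_crossing_rate(x) < zcr_threshold

fs = 8000   # assumed sampling rate in Hz
n = 320     # 40 ms of signal
# Voiced stand-in: 120 Hz fundamental plus its second harmonic.
voiced = [math.sin(2 * math.pi * 120 * t / fs) + 0.5 * math.sin(2 * math.pi * 240 * t / fs)
          for t in range(n)]
# Unvoiced stand-in: white noise from a seeded generator.
rng = random.Random(0)
unvoiced = [rng.uniform(-1.0, 1.0) for _ in range(n)]

print(is_voiced(voiced), is_voiced(unvoiced))
```

Practical voicing detectors usually combine this with short-time energy, since quiet voiced sounds and background noise can fool a single threshold.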
Articulation in Vocal Tract
• Place of articulation
  • Where the vocal tract is shut off or narrowed
• Manner of articulation
  • How the vocal tract is articulated
• Voicing
  • Whether the vocal folds vibrate as air is forced through the larynx
Articulation for Vowels
Place of articulation: high (u), mid (o), low (a)
Shape of the lips: rounded (o) or not (i)
Wikipedia, Wikimedia 2016
Articulation for Consonants
Stop (plosive): a consonant in which airflow is completely blocked for a short time. [p], [t], [k] / [b], [d], [g]
Nasal: made by lowering the velum and allowing air to pass into the nasal cavity. [m], [n], [ŋ]
Fricative: airflow is constricted but not cut off completely. [s] / [z]
Affricate: a stop that is followed immediately by a fricative. [ts] / [dj]
Liquid: a consonant in which the tongue produces a partial closure in the mouth, resulting in a resonant, vowel-like consonant. [l], [r]
Glide: a consonant with no stop or friction, consisting of a glide (a quick, smooth movement) towards a following vowel. [w], [y]
Formants in spectrogram
• Distinctive frequency components of the sound
• Peaks in the amplitude/frequency spectrum (spectrogram)
• The formant with the lowest frequency is called F1, the second F2, and the third F3.
• Most often the first two formants, F1 and F2, are enough to disambiguate the vowel.
• An interactive demonstration of this can be found here:
  http://auditoryneuroscience.com/topics/two-formant-artificial-vowels
Formants of consonants
• Nasal and liquid consonants have an added formant (F3) at higher frequencies
• Plosives and fricatives modify the placement of the formants of the vowels
  • Bilabial sounds (b, p) cause lowering of the formants
  • Velar sounds (k and g) show F2 and F3 coming together
  • Alveolar sounds (t and d) cause less systematic changes in neighboring vowel formants
Formants
• The component sounds that build up the phrase "A bird in the hand is worth two in the bush".
http://www.vowelsandconsonants3e.com/chapter_7.html#
Frequencies of sounds
• C1: 32.70 Hz (lowest C on a standard 88-key piano)
• C4: 261.63 Hz (middle C on an 88-key piano)
• C6: 1046.50 Hz (highest note reproducible by the average female human voice)
• C8: 4186 Hz (highest note on an 88-key piano)
https://www.youtube.com/watch?v=qNf9nzvnd1k
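The note frequencies listed above follow from twelve-tone equal temperament, where each semitone multiplies the frequency by 2^(1/12). The sketch below assumes the usual convention that A4 is key 49 of an 88-key piano and is tuned to 440 Hz; the function name is hypothetical.

```python
def piano_key_frequency(key_number, a4_hz=440.0):
    """Equal-tempered frequency of a piano key, with A4 (key 49) as the reference."""
    return a4_hz * 2 ** ((key_number - 49) / 12)

# Key numbers on an 88-key piano: C1 = 4, C4 = 40, C6 = 64, C8 = 88.
for name, key in [("C1", 4), ("C4", 40), ("C6", 64), ("C8", 88)]:
    print(name, round(piano_key_frequency(key), 2))
```

Each octave doubles the frequency, so C4, C6 and C8 differ by factors of 4 and 16.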
Sound Waveforms: Voiced or unvoiced
40 msec view
http://clas.mq.edu.au/speech/acoustics/waveforms/speech_waveforms.html
Vocoder
Vocoder (voice coder)
Invented by Dudley in the 1930s
A means of reproducing an intelligible facsimile of a voice for recorded messages on telephone systems
Two stages: analysis (encoding) and synthesis (decoding)
The analysis stage extracts a limited set of parameters from the speech input and transmits them to the receiver
The information rate required to transmit the parameters is much lower than that required to transmit the unprocessed speech signal
Model for Voice Coding
[Figure: source-filter model for voice coding. A periodic wave generator, set by the fundamental frequency, produces voiced sound; a random noise generator produces unvoiced sound. Either source drives the vocal tract filter.]
Channel Vocoder : analysis part
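[Figure: analysis part of the channel vocoder. The speech input feeds n parallel channels, each consisting of a bandpass filter, a rectifier, a lowpass filter and an A/D converter. A pitch detector (fundamental frequency) and a voicing detector run alongside. A multiplexer combines all outputs into the transmitted channel.]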
• The voicing detector determines whether the sound is voiced or not.
• The pitch detector determines the frequency of the glottal openings for voiced sounds.
• The configuration of the vocal tract is found with a bank of bandpass filters and envelope detectors (lowpass filters).
• This analysis provides information about the vocal tract at 5-30 msec intervals.
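The per-channel chain described above (bandpass filter, rectifier, lowpass filter, then one value per analysis frame) can be sketched as follows. This is an illustration under assumptions, not the lecture's design: the second-order bandpass uses the widely published audio-EQ-cookbook biquad formulas, and the Q of 4, the 50 Hz envelope cutoff and the 10 ms frame are hypothetical choices (10 ms falls inside the 5-30 msec range on the slide).

```python
import math

def bandpass(x, f0, fs, q=4.0):
    """Second-order bandpass (biquad, unity gain at f0)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y

def envelope(x, fs, cutoff_hz=50.0):
    """Full-wave rectifier followed by a one-pole lowpass filter."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    level, env = 0.0, []
    for xn in x:
        level += a * (abs(xn) - level)
        env.append(level)
    return env

def analyze(x, fs, centers_hz, frame_ms=10):
    """Return one list of envelope samples per channel, one sample per frame."""
    hop = int(fs * frame_ms / 1000)
    return [envelope(bandpass(x, f0, fs), fs)[hop - 1::hop] for f0 in centers_hz]

fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]  # 100 ms at 1 kHz
frames = analyze(tone, fs, centers_hz=[1000.0, 3000.0])
```

For the 1 kHz test tone the channel centered at 1 kHz carries most of the envelope energy, which is the sense in which the filter bank captures the spectral shape imposed by the vocal tract. A complete analyzer would add the pitch and voicing detectors and multiplex everything.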
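Channel Vocoder: synthesis part
[Figure: synthesis part of the channel vocoder. A demultiplexer splits the received channel into n envelope channels (each through a D/A converter), the voicing information, and the pitch (fundamental frequency, F0). The excitation comes from a noise source or a voice source, and each of the n channels ends in a bandpass filter.]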
• A synthesized speech signal is formed by summing the outputs of the bandpass filters.
• Voicing information is a binary indication.
• Each output is a smoothed envelope energy.
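The synthesis side can be sketched in the same spirit: a binary voicing flag selects a pulse train or noise as excitation, each channel scales the excitation by its envelope and bandpass filters it, and the channel outputs are summed. This is a minimal sketch under assumptions: the order "scale, then filter" is one common arrangement and my reading of the diagram, and the filter design, the 120 Hz pitch and the 10 ms frame are hypothetical.

```python
import math
import random

def bandpass(x, f0, fs, q=4.0):
    """Second-order bandpass (biquad, unity gain at f0)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y

def synthesize(envelopes, centers_hz, fs, voiced, pitch_hz=120.0, frame_ms=10, seed=0):
    """envelopes: one list of per-frame values per channel; voiced: one bool per frame."""
    hop = int(fs * frame_ms / 1000)
    n = hop * len(voiced)
    period = max(1, round(fs / pitch_hz))
    rng = random.Random(seed)
    source = []
    for i in range(n):
        if voiced[i // hop]:
            source.append(1.0 if i % period == 0 else 0.0)  # pulse train at the pitch
        else:
            source.append(rng.uniform(-1.0, 1.0))           # noise for unvoiced frames
    out = [0.0] * n
    for env, f0 in zip(envelopes, centers_hz):
        modulated = [env[i // hop] * source[i] for i in range(n)]
        for i, y in enumerate(bandpass(modulated, f0, fs)):
            out[i] += y
    return out

speech_like = synthesize([[1.0] * 5, [0.5] * 5], [500.0, 1500.0], 8000, [True] * 5)
```

Only the envelopes, the voicing flags and the pitch are needed to drive this stage, which is why the transmitted information rate is so much lower than that of the raw waveform.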
Speech Processing Strategies
Formant-based Speech Processing Strategies
Vocoder theory and models played major roles in the early designs.
The fundamental frequency (F0) and two formants (F1 and F2) are used.
F0 is the fundamental frequency and determines the stimulation rate.
F1 gives information about vowels.
F2 gives information about consonants.
Speech Processing Strategies – F0/F1/F2
[Figure: F0/F1/F2 processor. MIC and AGC (automatic gain control) feed three branches: a 270 Hz low-pass filter with a zero-crossing detector (F0, sets the pulse rate); a 300-1000 Hz filter with a zero-crossing detector (F1) and an envelope detector (A1); a 1000-3000 Hz filter with a zero-crossing detector (F2) and an envelope detector (A2). Two pulse generators drive the electrodes, labeled from apex to base.]
P. C. Loizou (IEEE Engineering in Medicine and Biology, 1999)
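The zero-crossing detectors in this processor estimate the dominant frequency of a band-limited signal: a sinusoid crosses zero twice per period, so the frequency is roughly half the crossing rate. The sketch below shows only that step, under the assumption that the input has already been band-filtered; the function name and the test tone are hypothetical.

```python
import math

def zero_crossing_frequency(x, fs):
    """Estimate the dominant frequency as (sign changes) / (2 * duration)."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    duration_s = (len(x) - 1) / fs
    return crossings / (2 * duration_s)

fs = 16000
# Stand-in for the output of the 300-1000 Hz filter: a 500 Hz tone, 100 ms long.
band_signal = [math.sin(2 * math.pi * 500 * n / fs + 0.1) for n in range(1600)]
f1_estimate = zero_crossing_frequency(band_signal, fs)
print(round(f1_estimate))
```

The estimate is only as good as the assumption that one component dominates the band. Added noise creates extra zero crossings and biases the estimate, which is one way to see why formant extraction becomes unreliable in noisy environments.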
MPEAK Speech Processing Strategy
In addition to formant information, MPEAK extracts higher-frequency information from speech through additional channels.
MPEAK, like the F1/F2 strategies, tends to make errors in formant extraction in noisy environments.
Speech Processing Strategies – MPEAK
[Figure: MPEAK processor. MIC and AGC (automatic gain control) feed six branches: a 270 Hz low-pass filter (F0, sets the pulse rate); a 300-1000 Hz filter (F1, A1); an 800-4000 Hz filter (F2, A2), each with a zero-crossing detector/envelope detector; and three high-frequency filters (2-2.8 kHz, 2.8-4 kHz, 4-6 kHz), each with an envelope detector. A pulse generator drives the electrodes; the labeled fixed electrodes are Electrode 7, Electrode 4 and Electrode 1.]
P. C. Loizou (IEEE Engineering in Medicine and Biology, 1999)
Recent Speech Processing Strategies
Compressed Analog (CA)
Continuous Interleaved Sampling (CIS)
ACE and SPEAK (Cochlear)
Harmony HiRes Virtual Channels (Clarion)
Speech Processing Strategies - CA
[Figure: CA processor. MIC and AGC (automatic gain control) feed four band-pass filters (1-4), each followed by a current source. Signal labels: s(t), s'(t), x(t), i(t). Inset plot: filter magnitude in dB (about -16 to 4) versus frequency in kHz (0.1 to 10) for bands 1-4.]
B. Wilson et al. (Nature, 1991)
Lessons learned
Lessons learned from the formant-based strategies and the CA strategy:
The amount of information perceived by CI users is much less than in normal hearing.
Perception of electrical stimuli is different from that of acoustic stimuli.
Pitch saturation limit: typically around 300 pulses/s for electrical pulses or 300 Hz for electrical sinusoids. Higher rates or frequencies do not produce increases in pitch.
In normal hearing, different pitches are heard over much wider ranges of rates or frequencies (up to ~5 kHz), probably through combinations of rate and place cues ("Volley" theory and Place theory).
Theories
Place Code Theory
Time (Rate) Code Theory
Volley Theory
Wikipedia, File:Volley Principle of Hearing.png
CIS (Continuous Interleaved Sampling)
Pulsatile processing
Biphasic pulse trains are delivered to the electrodes in a non-simultaneous (interleaved) pattern.
No patent
Commercial devices use modified versions of CIS
Speech Processing Strategies - CIS
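[Figure: CIS block diagram. A pre-amp feeds n channels. Each channel has a bandpass filter (BPF 1 to BPF n), a rectifier and lowpass filter, a nonlinear map, and a multiplier (X) driving electrodes EL-1 to EL-n. Stage labels: linear filter band, envelope, compression, modulation.]
B. Wilson et al. (Nature, 1991)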
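Two steps of the CIS chain lend themselves to a short sketch: the compression stage, which maps a wide range of envelope amplitudes onto the narrow electrical range between threshold and comfortable loudness, and the interleaving, which staggers pulse onsets so that no two electrodes are stimulated at the same time. The logarithmic map and all numeric values below are assumptions for illustration; the slides do not specify the map or the rates.

```python
import math

def log_compress(env, env_min, env_max, threshold_level, comfort_level):
    """Map an envelope value logarithmically onto [threshold_level, comfort_level]."""
    env = min(max(env, env_min), env_max)  # clip to the acoustic input range
    slope = (comfort_level - threshold_level) / (math.log(env_max) - math.log(env_min))
    return threshold_level + slope * (math.log(env) - math.log(env_min))

def interleaved_onsets(n_channels, rate_pps, n_cycles=1):
    """Pulse onset times (s) and channel indices; each channel gets its own time slot."""
    period = 1.0 / rate_pps
    slot = period / n_channels
    return [(cycle * period + ch * slot, ch)
            for cycle in range(n_cycles)
            for ch in range(n_channels)]

def fits_in_slot(phase_duration_s, n_channels, rate_pps):
    """A biphasic pulse (two phases) must fit inside one channel's time slot."""
    return 2 * phase_duration_s <= 1.0 / (rate_pps * n_channels)

onsets = interleaved_onsets(n_channels=4, rate_pps=1000.0, n_cycles=2)
level = log_compress(0.1, env_min=0.01, env_max=1.0, threshold_level=100.0, comfort_level=200.0)
```

With 4 channels at 1000 pulses/s per channel, each channel has a 250 microsecond slot, so the per-channel rate, the channel count and the pulse width trade off against one another.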
Speech Processing Strategies – n of m, SPEAK, ACE
[Figure: n-of-m block diagram. MIC and pre-amp feed a bank of band-pass filters. Stage labels: band-pass filters, envelope extraction, amplitude compression, pulses, current source (V/I), electrodes. The "n-of-m" map selects n peaks from m bands in a frame (m inputs, n outputs).]
F. G. Zeng et al. (IEEE Reviews in Biomedical Engineering, 2008)
Speech Processing Strategies – n of m, SPEAK, ACE
• The pre-processing is similar to that of the CIS strategy.
• The n-of-m strategy uses a greater number of bandpass filters.
• The SPEAK strategy selects the 6-8 largest peaks and has a fixed rate of 250 Hz per channel.
• The ACE strategy has a larger range of peak selection (8-12) and a higher rate (900-1200 Hz) than the SPEAK strategy.
F. G. Zeng et al. (IEEE Reviews in Biomedical Engineering, 2008)
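The selection step that distinguishes n-of-m strategies from CIS is simple to state in code: in every frame, keep only the n bands with the largest envelopes and stimulate the corresponding electrodes. The sketch below assumes one envelope value per band per frame; the function name and the example values are hypothetical.

```python
def n_of_m(envelopes, n):
    """Return the indices of the n largest envelope values, in band order."""
    ranked = sorted(range(len(envelopes)), key=lambda i: envelopes[i], reverse=True)
    return sorted(ranked[:n])

frame = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]  # envelopes of m = 6 bands in one frame
selected = n_of_m(frame, n=3)
print(selected)
```

Because only n electrodes are stimulated per frame, a given total pulse budget allows either more bands or a higher rate per selected channel, which is the trade-off that separates SPEAK from ACE in the bullets above.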
Summary
Speech processing strategies advance with time: formant-based, then CA, then CIS.
Need to implement finer features (more detailed sounds)
• Tonal languages
• Music
Discussion: Fine Structure Representation
Typical frequency range of CI frequency filters: 300-8000 Hz
Normal audible frequency range: 20-20,000 Hz
Low-frequency cues (20-50 Hz) give prosody information (stress, syllabification): "Envelope Cues"
Mid-frequency cues (50-500 Hz) give segmental information such as consonant manner, voicing, and intonation: "Periodicity Cues"
High-frequency cues (600-10,000 Hz) give consonant place and vowel quality: "Fine Structure Cues"
Advanced Bionics HiRes is an example of a speech processing strategy intended to provide better fine structure cues
HiRes samples temporal fluctuations up to 2800 Hz across 16 channels
16 independent current sources enable simultaneous analog stimulation (SAS) as well as CIS
"Current steering" provides virtual channel capability (HiRes 120 = 15 channels times 8 spectral bands per channel)
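The arithmetic behind "HiRes 120" can be made concrete: 16 electrodes form 15 adjacent pairs, and 8 steering positions per pair give 15 x 8 = 120 virtual channels. The sketch below assumes that steering splits a total current between the two electrodes of a pair in proportion to a weight alpha, and that the 8 positions are uniformly spaced; both are illustrative assumptions, not details taken from the slides or from Advanced Bionics documentation.

```python
def virtual_channels(n_electrodes=16, steps_per_pair=8):
    """Enumerate (electrode_a, electrode_b, alpha) for every steering position."""
    channels = []
    for pair in range(n_electrodes - 1):       # adjacent electrode pairs
        for step in range(steps_per_pair):     # assumed uniform steering steps
            alpha = step / steps_per_pair
            channels.append((pair, pair + 1, alpha))
    return channels

def steer(total_current, alpha):
    """Split a current between the two electrodes of a pair (assumed linear split)."""
    return (1 - alpha) * total_current, alpha * total_current

channels = virtual_channels()
print(len(channels))
```

With alpha = 0 all current goes to the first electrode of the pair, and intermediate values shift the locus of stimulation towards the second electrode. Simultaneous delivery to both electrodes is what requires the independent current sources mentioned above.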
[1] HiResolution Sound Processing, by Jill B. Firszt, www.advancedbionics.com
[2] HiRes Fidelity 120 Sound Processing, Advanced Bionics Technical Report, www.advancedbionics.com
[3] Rosen, Temporal information in speech and its relevance for cochlear implants, Cochlear Implant: Acquisition and Controversies, ed. B. Fraysse, N. Couchard, pp. 3-26 (1989)
Related Videos
Hearing CI
https://www.youtube.com/watch?v=00WOao4kpwM
CI simulations
https://www.youtube.com/watch?v=iwbwhfCWs2Q
A day of a CI user
https://www.youtube.com/watch?v=pk_7MVqpnIk