computer vision systems for the blind and visually disabled. stats 19 sem 2. 263057202. talk 3. alan...
Computer Vision Systems for the Blind and Visually Disabled.
STATS 19 SEM 2. 263057202.
Talk 3.
Alan Yuille.
UCLA. Dept. Statistics and Psychology.
www.stat.ucla/~yuille
Computer Vision Systems
• Digital Camera + Portable Computer + Speech Synthesizer.
(I) Input image from camera.
(II) Algorithm on PC searches the image to detect and read text.
(III) Speech Synthesizer speaks the text.
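The three stages above form a simple pipeline. The sketch below wires them together under stated assumptions: the stage names (`capture`, `detect_and_read`, `speak`) and their signatures are hypothetical placeholders, not taken from the original system, and the toy stages at the bottom only stand in for a real camera and speech engine.

```python
from typing import Callable, List

# Hypothetical stage signatures: none of these names come from the slides.
Image = object                           # stand-in for whatever the camera returns
Capture = Callable[[], Image]            # (I) grab a frame from the camera
Reader = Callable[[Image], List[str]]    # (II) detect and read text in the frame
Speaker = Callable[[str], None]          # (III) send one string to the synthesizer


def run_pipeline(capture: Capture, detect_and_read: Reader, speak: Speaker) -> List[str]:
    """Run one pass of the camera -> text reader -> speech pipeline."""
    image = capture()
    texts = detect_and_read(image)
    for text in texts:
        speak(text)
    return texts


if __name__ == "__main__":
    # Toy stages so the sketch runs without a camera or a speech engine.
    run_pipeline(lambda: "frame", lambda img: ["EXIT", "12:30"], print)
```

Passing the stages in as functions keeps the control flow separate from the hardware, so each stage can be swapped or tested on its own.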
LED Reader
• LED/LCD displays are very common, but impossible for the blind to use.
• Controlled domain. Design a system to detect and read the displays.
LED Reader.
• Prototype System (1999).
• Subjects using the LED Reader.
• Implementation using special-purpose hardware is being built.
Blind Volunteer with Camera
• Blind volunteers take photographs with a still digital camera or a video camera.
• Automatic camera settings: gain control.
• The dynamic range of the eye is far larger than the range of a camera.
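Automatic gain control is what lets a user who cannot see the viewfinder still get a usable exposure. The slides do not say how the cameras meter, so the following is only a sketch of one simple proportional scheme, assumed for illustration: scale the gain until the mean of the (saturating) image reaches a target level. The target of 0.45 and the iteration count are made-up parameters.

```python
import numpy as np


def auto_gain(scene: np.ndarray, target_mean: float = 0.45,
              iterations: int = 10, gain: float = 1.0) -> float:
    """Proportional AGC sketch: rescale gain until the clipped image mean hits target_mean.

    `scene` holds relative scene intensities; the simulated sensor saturates at 1.0.
    The target and iteration count are illustrative assumptions.
    """
    for _ in range(iterations):
        image = np.clip(scene * gain, 0.0, 1.0)   # what the sensor would record
        mean = float(image.mean())
        if mean == 0.0:
            gain *= 2.0                           # completely dark frame: open up and retry
            continue
        gain *= target_mean / mean                # push the mean toward the target
    return gain
```

For a dim, uniform scene at 0.1 the gain settles at 4.5 after one step. When many pixels saturate, the clipped mean underestimates the true brightness, which is one reason a single global gain struggles with the large intensity ranges discussed next.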
Gain Control: Digital Cameras
• Limitation due to the quality of the input images.
• Blind users cannot point the camera, focus, adjust camera gain, or keep the camera steady.
• Enormous variation in intensity in natural images: the range is 10,000,000, while the camera range is 100.
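To make the gap between those two figures concrete, they can be expressed in the units usually quoted for sensors. The sketch assumes the common 20·log10 decibel convention for image-sensor dynamic range; the two ratios are the slide's figures. A range of 10,000,000 is 140 dB (about 23 photographic stops), while 100 is 40 dB (under 7 stops), so the camera falls short by a factor of 100,000.

```python
import math

SCENE_RANGE = 10_000_000   # intensity range in natural images (figure from the slide)
CAMERA_RANGE = 100         # range a camera can capture (figure from the slide)


def to_decibels(ratio: float) -> float:
    """Dynamic range in dB, using the 20*log10 convention common for image sensors."""
    return 20 * math.log10(ratio)


def to_stops(ratio: float) -> float:
    """Dynamic range in photographic stops (factors of two)."""
    return math.log2(ratio)


if __name__ == "__main__":
    for name, r in [("scene", SCENE_RANGE), ("camera", CAMERA_RANGE)]:
        print(f"{name}: {to_decibels(r):.0f} dB, {to_stops(r):.1f} stops")
    print("shortfall factor:", SCENE_RANGE // CAMERA_RANGE)
```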
Biologically Inspired Cameras.
• Ideal: cameras with the abilities of the human retina:
(I) Large gain control (from 100 to 100,000,000).
(II) More than 30 frames/second (to decrease motion blur).
• Companies are designing cameras with these abilities (Carver Mead).
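Why a higher frame rate reduces motion blur: each exposure can last at most one frame period, and the blur streak is roughly the image-plane speed times the exposure time. The sketch below computes that upper bound; the 300 pixels/second drift in the example is a hypothetical figure for a hand-held camera, not a measurement from the slides.

```python
def max_blur_pixels(image_speed_px_per_s: float, frames_per_second: float) -> float:
    """Upper bound on blur length: exposure cannot last longer than one frame period."""
    max_exposure_s = 1.0 / frames_per_second
    return image_speed_px_per_s * max_exposure_s


if __name__ == "__main__":
    # Hypothetical drift of 300 px/s caused by an unsteady hand.
    for fps in (30, 60, 120):
        print(f"{fps} fps -> at most {max_blur_pixels(300, fps):.1f} px of blur")
```

At 30 frames/second that drift can smear text by up to 10 pixels; at 120 frames/second the bound drops to 2.5 pixels.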
Images taken by the Blind
Top two rows: images taken by blind volunteers.
Bottom two rows: images taken by scientists. Scientists are better at orienting the camera and centering text.
Experiments with Blind Volunteers
• Experiments with blind volunteers in San Francisco.
• Experiments showed:
1. Blind volunteers could keep the camera approximately horizontal.
2. They could hold it steady, so there is little motion blur.
3. Automatic gain control was usually sufficient to give good-quality images.
Visual Search to Detect Text.
• The human visual system has mechanisms for directing attention to “interesting parts” of images.
• Known as “Visual Attention”.
• Visual attention causes eye movements and directs gaze.
• We need a form of visual attention to detect text.
• This must be fast: we want to quickly reject non-text areas of the image.
Strategy I: Twenty Questions.
• Divide the image up into many small windows.
• Apply “filter tests” to each window.
• If the window fails a test, eliminate it.
• If it passes, proceed to the next test.
• Apply tests until only a few (1-5) windows in the image pass all tests.
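The “Twenty Questions” strategy is a rejection cascade: every window meets the tests in order and is dropped at its first failure, so most of the image is discarded cheaply. Below is a minimal sketch under stated assumptions: the window size, step, and the two example tests (`bright_enough`, `variable_enough`, loosely modelled on the brightness and variability tests named on the next slide) use hypothetical thresholds, not values from the original system.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

import numpy as np

Window = Tuple[int, int]                  # (row, col) of a window's top-left corner
FilterTest = Callable[[np.ndarray], bool]


def sliding_windows(shape: Tuple[int, int], size: int, step: int) -> Iterator[Window]:
    """Tile the image with square windows of side `size`, `step` pixels apart."""
    rows, cols = shape
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            yield r, c


def cascade(image: np.ndarray, tests: Sequence[FilterTest],
            size: int = 16, step: int = 8) -> List[Window]:
    """Return the windows that pass every test, applied in order."""
    survivors = []
    for r, c in sliding_windows(image.shape, size, step):
        patch = image[r:r + size, c:c + size]
        # all() short-circuits: a window is dropped at its first failed test,
        # so cheap early tests do most of the rejecting.
        if all(test(patch) for test in tests):
            survivors.append((r, c))
    return survivors


# Hypothetical filter tests with made-up thresholds (not from the slides).
def bright_enough(patch: np.ndarray) -> bool:
    return patch.mean() > 0.4


def variable_enough(patch: np.ndarray) -> bool:
    return patch.std() > 0.1
```

On a blank 64×64 image with one high-contrast 16×16 patch, only the window aligned with that patch survives both tests. Ordering the tests from cheapest to most expensive is what keeps this kind of search fast.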
Strategy II: Test Selection.
• Choose a vocabulary of tests, e.g. average image brightness, local image variability.
• Use a machine learning algorithm, “AdaBoost”, to select and combine tests.
• Requires a training dataset of text and non-text (learning with a teacher).
• AdaBoost combines “weak tests” into a “strong test”.
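As an illustration of how weak tests are combined into a strong test, here is a sketch of standard discrete AdaBoost with single-threshold tests (“decision stumps”). It is the textbook algorithm, not necessarily the exact variant used in this work. Assumptions: each row of `X` holds per-window feature values (for example average brightness and local variability), and labels are +1 for text and -1 for non-text.

```python
import numpy as np


def stump_predict(X, feature, threshold, polarity):
    """Weak test: +1 (text) if polarity * (x - threshold) > 0, else -1 (non-text)."""
    return np.where(polarity * (X[:, feature] - threshold) > 0, 1, -1)


def train_stump(X, y, w):
    """Pick the single-feature threshold test with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: y must be a +1/-1 array; returns weighted weak tests."""
    w = np.full(len(y), 1.0 / len(y))              # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) for a perfect test
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight of this weak test
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)          # up-weight the examples it got wrong
        w = w / w.sum()
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble


def strong_test(X, ensemble):
    """Weighted vote of the weak tests: +1 = text, -1 = non-text."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

Each round re-weights the training set toward the windows the previous tests misclassified, so later weak tests specialize on the hard cases; the strong test is their weighted vote.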
AdaBoost Example: Face Detection.
• AdaBoost was used in Computer Vision to detect faces.
• Best test: forehead brighter than eyes.
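A test like “forehead brighter than eyes” compares the summed brightness of two adjacent rectangles. A common way to make such tests fast is an integral image, where any rectangle sum costs four lookups. The slides do not state how the face-detection tests were computed, so this is a sketch of that standard technique under assumed rectangle placement; `forehead_minus_eyes` is a hypothetical helper name.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """ii[r, c] = sum of img[:r, :c]; padded with a leading zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii: np.ndarray, r0: int, c0: int, r1: int, c1: int) -> float:
    """Sum of img[r0:r1, c0:c1] from four integral-image lookups."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]


def forehead_minus_eyes(ii: np.ndarray, top: int, left: int,
                        height: int, width: int) -> float:
    """Two-rectangle test: upper band (forehead) minus the band just below it (eyes)."""
    forehead = box_sum(ii, top, left, top + height, left + width)
    eyes = box_sum(ii, top + height, left, top + 2 * height, left + width)
    return forehead - eyes
```

A large positive value means the upper band is brighter than the lower one; a weak test would then compare that value against a threshold learned by AdaBoost.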
Example Sequence I:
• Series of tests, selected by AdaBoost.
Example II.
Results of AdaBoost.
Strong performance: very high detection rate.
Failures of AdaBoost.
• AdaBoost fails to detect some text.
Next Stage: Binarization.
• AdaBoost detects regions of text in windows of the image.
• Apply a binarization algorithm: label the points within the window as letters/digits or as background.
• Extend the binarization to areas outside the window, to include letters/digits that lie just outside the window.
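The slides do not name the binarization algorithm, so the sketch below swaps in Otsu's method, a standard global threshold, as an assumption. It picks the threshold from the pixels inside the detected window and then applies it to the window grown by a margin, which is one simple way to capture letters that spill past the window edge. Intensities are assumed to lie in [0, 1], text is assumed darker than background by default, and `margin=4` is an arbitrary choice.

```python
import numpy as np


def otsu_threshold(pixels: np.ndarray, bins: int = 256) -> float:
    """Otsu's method for intensities in [0, 1]: maximise between-class variance."""
    hist, edges = np.histogram(pixels, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                  # probability of the dark class for each cut
    mu = np.cumsum(p * centers)        # unnormalised mean of the dark class
    mu_total = mu[-1]
    denom = w0 * (1.0 - w0)
    between = np.zeros_like(denom)
    valid = denom > 1e-12              # skip cuts that leave one class empty
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / denom[valid]
    k = int(np.argmax(between))
    return float(edges[k + 1])         # pixels below this value form the dark class


def binarize_window(image, r0, c0, r1, c1, margin=4, dark_text=True):
    """Threshold chosen inside the detected window, applied to the window grown by `margin`."""
    t = otsu_threshold(image[r0:r1, c0:c1].ravel())
    R0, C0 = max(r0 - margin, 0), max(c0 - margin, 0)
    R1, C1 = min(r1 + margin, image.shape[0]), min(c1 + margin, image.shape[1])
    region = image[R0:R1, C0:C1]
    mask = region < t if dark_text else region >= t   # True = letter/digit pixel
    return mask, (R0, C0, R1, C1)
```

In a test image where a dark stroke runs past the right edge of the detected window, the grown region still labels the whole stroke as letter pixels. A locally adaptive threshold may behave better under uneven lighting; Otsu is used here only because it is short and well known.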
Results of Binarization.
Optical Character Recognition (OCR)
• OCR has been developed for reading text in documents.
• Black and white images. High resolution.
• We apply it to the binarized output of AdaBoost.
• OCR will read the text and reject regions that are not text.
Text detected by AdaBoost, binarized, and read by OCR.
Text detected, but not read.
Non-text detected, rejected by OCR.
Non-text detected, read by OCR.
Performance
• Can detect text within our dataset (San Francisco) with a false negative rate of 2.8%.
• We can read the detected text correctly 93.0% of the time.
• Detected non-text is read as text at a rate of 1.0%.
• Prototype System: room for improvement.
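The detection and reading figures combine into an end-to-end rate. This rests on one assumption: that the 93.0% reading accuracy is measured on detected text only, as the wording “read the detected text” suggests, so the two stages multiply. Under that reading, about 90.4% of the text in the dataset is both detected and read correctly.

```python
FALSE_NEGATIVE_RATE = 0.028   # text the detector misses (slide figure)
READ_ACCURACY = 0.930         # detected text read correctly (slide figure)

# Assumes the 93.0% is conditional on detection, so the two stages multiply.
end_to_end = (1 - FALSE_NEGATIVE_RATE) * READ_ACCURACY

if __name__ == "__main__":
    print(f"text both detected and read correctly: {end_to_end:.1%}")
```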
Summary
• It will soon be practical to build Computer Vision systems for text detection and reading that work in unconstrained domains.