TRANSCRIPT
© 2014 Embedded Vision Alliance 1
“Half of the human brain is devoted
directly or indirectly to vision.”
– Paraphrased from Prof. Mriganka Sur, MIT
Jeff Bier, Founder, Embedded Vision Alliance / President, BDTI
Presentation for Cubic Corporation, November 10, 2014
The Era of Machines That See:
Opportunities, Challenges, and Trends in
Embedded Vision
Computer vision: systems that extract meaning from visual inputs
Computer vision has been an
active research field for decades,
with limited commercial applications
Embedded vision: the practical, widely
deployable evolution of computer vision
• Applications: industrial, automotive,
medical, defense, retail, gaming,
consumer electronics, security, education, …
• Embedded systems, mobile devices, PCs and the cloud
Computer Vision Embedded Vision
The proliferation of embedded vision is enabled by:
• Hardware: processors, sensors, etc.
• Software: tools, algorithms, libraries, APIs
Why is Embedded Vision Proliferating Now?
[Chart: DSP Performance: High-end, Single-core DSPs from TI. MMACs/second by year, 1996-2012, rising to roughly 10 GMACs/second by 2012. Source: BDTI Analysis]
Enabling Embedded Vision:
Processor Performance
Most machines are useful only to the extent that they interact with the
physical world
Visual information is the richest source of information about the real
world: People, places, and things
Vision is the highest-bandwidth mode for machines to obtain info from
the real world
Embedded vision can:
• Boost efficiency: Improving throughput and quality
• Enhance safety: Detecting danger and preventing accidents
• Simplify usability: Making the “user interface” disappear
• Fuel innovation: Enabling us to do things that were impossible
The Highest Bandwidth Input Channel
What Does Embedded Vision Enable?
Augmented Reality for Realistic Simulations
www.youtube.com/watch?v=v_bo1m7kXgQ
Augmenting Human Capabilities: OrCam
Visual Interpreter for the Sight Impaired
www.youtube.com/watch?v=ykDDxWbt5Nw
Dyson 360 Robot Vacuum
www.youtube.com/watch?v=oguKCHP7jNQ
Touch+ Makes Any Surface Multi-touch
www.youtube.com/watch?v=v_bo1m7kXgQ
Embedded Vision Increases
Human Productivity and Safety
www.youtube.com/watch?v=9Wv9k_ssLcI
Mercedes: www.youtube.com/watch?v=WGgSyA8HXyY
Philips: www.youtube.com/watch?v=2M7AFoqJyDI
IKEA: www.youtube.com/watch?v=DhbHnec4se0
LEGO: www.youtube.com/watch?v=mUuVvY4c4-A
www.youtube.com/watch?v=Td7cKB2BxIo
Amazon: www.youtube.com/watch?v=bnqnvL8B0k0
www.youtube.com/watch?v=8gy5tYVR-28
Stanley: www.youtube.com/watch?v=orTO3E0Vvok
Audi: www.youtube.com/watch?v=2YqflcbCVZg
Tesco: www.youtube.com/watch?v=bMCw7-lYUKw
Major League Baseball: bit.ly/1qylyRI
CENTR Cam: vimeo.com/91037496
More Videos for Later
How Does Embedded Vision Work?
[Diagram: each successive pipeline stage performs data reduction]
How Does Embedded Vision Work?
A simplified embedded vision pipeline:
Typical total compute load: ~10-100 billion operations/second
Loads can vary dramatically with pixel rate and algorithm complexity
Image Acquisition → Image Pre-processing → Feature Detection → Segmentation → Object Analysis → Heuristics or Expert System
• Early stages (acquisition, pre-processing): ultra-high data rates; low to medium algorithm complexity
• Middle stages (feature detection, segmentation): high to medium data rates; medium algorithm complexity
• Late stages (object analysis, heuristics or expert system): low data rates; high algorithm complexity
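As a sanity check on the compute-load figure above, here is a back-of-envelope estimate in Python. The frame size, frame rate, and per-pixel operation counts are illustrative assumptions, not measurements:

```python
# Back-of-envelope estimate of early-pipeline compute load.
# All figures below are illustrative assumptions.
width, height, fps = 1920, 1080, 30          # 1080p30 camera
pixel_rate = width * height * fps            # pixels per second

# Assume pixel-level stages cost roughly 100-500 operations per pixel
# (filtering, color conversion, gradient computation).
low = pixel_rate * 100
high = pixel_rate * 500

print(f"{pixel_rate / 1e6:.0f} Mpixels/s -> "
      f"{low / 1e9:.0f}-{high / 1e9:.0f} billion ops/s")
```

Even with these modest assumptions, a single 1080p30 stream lands in the tens of billions of operations per second, consistent with the ~10-100 billion figure cited above.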
How Does Embedded Vision Work?
A typical embedded vision algorithm:
Vision Algorithm Development Challenges
Image Acquisition → Lens Correction → Image Pre-processing → Segmentation → Object Analysis → Heuristics or Expert System
• Infinitely varying inputs in many applications…
• Uncontrolled conditions: lighting, orientation, motion, occlusion
• Leads to ambiguity…
• Leads to the need for complex,
multi-layered algorithms to extract
meaning from pixels
• Plus:
• Lack of analytical models means
exhaustive experimentation is required
• Numerous algorithms and algorithm
parameters to choose from
• It’s a whole-system problem
What Makes Embedded Vision Hard?
• Most vision applications involve high data rates and complex algorithms
• For vision to be widely deployed, it must be implemented in many
designs that are constrained in cost, size, and power consumption
• These constraints, combined with high performance and bandwidth
demands, create challenging design problems
• Algorithms are diverse and dynamic, so fixed-function compute engines
are less attractive
• Modern embedded CPUs may have the muscle, but are often too
expensive or power-hungry
• Many vision applications require parallel or specialized hardware
• E.g., DSP, GPU, FPGA or other co-processor
• Most product creators lack experience with embedded vision
Implementing Embedded Vision
is Challenging
Example: Lane Marking Detection
Detect lane markings on the road and warn when car veers out of the lane
The “textbook” solution:
• Acquire road images from front-facing camera (often with fish-eye
lens)
• Apply pre-processing (primarily lens correction)
• Perform edge detection
• Detect lines in the image with Hough transform
• Determine which lines are lane markings
• Track lane markings and estimate positions in the next frame
• Assess car’s trajectory with respect to lane and warn driver in case
of lane departure
Lane-Departure Warning—The Problem
In a lane-departure warning system, edge detection is the first step in
detecting lines (which may correspond to lane markings)
• Edge detection is a well-understood technique
• Primarily comprises 2D FIR filtering
• Computationally-intensive pixel processing
• Many algorithms are available (Canny, Sobel, etc.)
Lane-Departure Warning: Edge Detection
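To make the 2D FIR filtering point concrete, here is a minimal Sobel edge detector in NumPy. This is an illustrative sketch: the naive convolution loop and the synthetic test frame are assumptions of this example, and a real system would use an optimized library such as OpenCV:

```python
import numpy as np

# Sobel edge detection expressed as 2D FIR filtering (illustrative
# sketch; the naive convolution is written for clarity, not speed).
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=np.float32)  # horizontal gradient
KY = KX.T                                      # vertical gradient

def filter2d(img, kernel):
    """Naive 'valid'-mode 2D correlation."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1),
                   dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def sobel_magnitude(img):
    gx = filter2d(img, KX)
    gy = filter2d(img, KY)
    return np.hypot(gx, gy)    # per-pixel gradient magnitude

# Synthetic 8x8 frame with a vertical step edge between columns 3 and 4:
frame = np.zeros((8, 8), dtype=np.float32)
frame[:, 4:] = 255.0
mag = sobel_magnitude(frame)
# The strongest responses straddle the step; elsewhere the magnitude is 0.
```

The per-pixel multiply-accumulate loop is exactly why this stage dominates the compute budget: every output pixel costs nine multiplies and adds per kernel.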
Edge thinning removes spurious edge pixels
• Improves output of Hough transform
• Often performed in multiple passes over the frame
• Also useful in other applications
Lane-Departure Warning: Edge Thinning
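One simple form of edge thinning is non-maximum suppression. The sketch below keeps an edge pixel only if it is a horizontal local maximum of gradient magnitude, which is adequate for the near-vertical edges of lane markings; the threshold and tie-breaking rule are assumptions for illustration (full NMS would follow each pixel's gradient direction):

```python
import numpy as np

# Edge thinning via horizontal non-maximum suppression (illustrative).
def thin_edges(mag, threshold=100.0):
    padded = np.pad(mag, ((0, 0), (1, 1)))       # zero-pad left/right
    left, right = padded[:, :-2], padded[:, 2:]  # horizontal neighbors
    # Strict comparison on the left keeps exactly one pixel of a plateau.
    keep = (mag >= threshold) & (mag > left) & (mag >= right)
    return keep

# A thick (two-pixel-wide) edge response collapses to one pixel per row:
mag = np.array([[0.0, 80.0, 200.0, 200.0, 80.0, 0.0]])
thin = thin_edges(mag)
print(thin.astype(int))    # [[0 0 1 0 0 0]]
```

Thinning like this is what keeps the Hough accumulator from being flooded with near-duplicate votes from thick edge responses.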
• Hough transform examines the edge pixels found in the image, and
detects predefined shapes (typically lines or circles)
• In a lane-departure warning system, Hough transform is used to detect
lines, which may correspond to lane markings on the road
• For good results, lens distortion correction is important
Lane-Departure Warning: Hough Transform
Original image Edges detected Lines detected
Similar to a histogram
• Each detected edge pixel is a “vote” for all of the lines that pass
through the pixel’s position in the frame
• Lines with the most “votes” are detected in the image
• Uses a quantized line-parameter space (e.g. angle and distance
from origin)
• Must compute all possible line-parameter values for each detected
edge pixel
Lane-Departure Warning: Hough Transform
[Diagram: edge pixels voting in the quantized line-parameter space. Every possible line through an edge pixel gets one vote when that pixel is processed; a line passing through many edge pixels accumulates many votes.]
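The voting scheme described above can be sketched directly in NumPy. This is a toy implementation with an assumed 1-degree, 1-pixel quantization of the (angle, distance) space; in practice an optimized routine such as OpenCV's HoughLines would be used:

```python
import numpy as np

# Toy Hough line transform: each edge pixel votes for every quantized
# (theta, rho) line through it, where rho = x*cos(theta) + y*sin(theta).
def hough_lines(edge_pixels, n_theta=180):
    thetas = np.deg2rad(np.arange(n_theta))      # 1-degree angle bins
    max_rho = int(np.ceil(max(np.hypot(x, y) for y, x in edge_pixels)))
    acc = np.zeros((2 * max_rho + 1, n_theta), dtype=np.int32)
    for y, x in edge_pixels:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + max_rho, np.arange(n_theta)] += 1  # one vote per bin
    return acc, max_rho

# Five collinear edge pixels on the vertical line x = 3:
pixels = [(y, 3) for y in range(5)]
acc, max_rho = hough_lines(pixels)
votes = acc[3 + max_rho, 0]    # bin for theta = 0 degrees, rho = 3
print(votes)                   # 5: every pixel voted for the line x = 3
```

Note the cost structure the slide describes: each edge pixel computes a rho for all 180 angle bins, so total work scales with the number of edge pixels times the parameter-space resolution.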
Filter the detected lines to discard lines that are not likely to be lane
markings
• Find start and end points of line segments
• Filter by length, position, and angle
• Filter by line color and background color
• Additional heuristics may apply (e.g. dashed or solid lines are likely
to be lane markings, but lines with uneven gaps are not)
Possibly classify the lines as lane markings or other lane indication (e.g.
curb)
Lane-Departure Warning: Detecting Lane
Markings
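A filtering pass like the one above might look like this in Python. The thresholds and the specific length/angle tests are illustrative assumptions, not values from the presentation:

```python
import math

# Heuristic filter over candidate line segments, keeping only those
# plausible as lane markings. Thresholds are illustrative assumptions.
def is_lane_candidate(x0, y0, x1, y1,
                      min_length=40.0,
                      min_angle_deg=20.0, max_angle_deg=160.0):
    length = math.hypot(x1 - x0, y1 - y0)
    # Segment angle folded into [0, 180) degrees.
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
    # Reject short segments and near-horizontal lines, which are
    # unlikely to be lane markings in a forward-facing camera view.
    return length >= min_length and min_angle_deg <= angle <= max_angle_deg

print(is_lane_candidate(0, 0, 30, 100))   # True: long, steep segment
print(is_lane_candidate(0, 0, 100, 5))    # False: nearly horizontal
```

Color, dash-pattern, and background tests from the slide would be added as further predicates in the same style.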
• Tracking lane markings from each frame to the next
• Helps eliminate spurious errors
• Provides a measure of the car’s trajectory with respect to the lane
• Typically done using a predictive filter:
• Predict new positions of lane markings in the current frame
• Match the lane markings to the predicted positions and compute the
prediction error
• Update the predictor for future frames
• Kalman filters are often used for prediction in vision applications
• Under linear-Gaussian assumptions, the Kalman filter is the optimal (minimum mean-square-error) estimator
• Simpler filters are often sufficient
• Very low computational demand due to low data rates
• E.g. 2 lane-marking positions × 30 fps = 60 samples per second
Lane-Departure Warning: Tracking Lane
Markings
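As the slides note, simpler predictors than a full Kalman filter often suffice at these data rates. A minimal alpha-beta (g-h) tracker implementing the predict/match/update loop above might look like this; the gains and the measurement sequence are illustrative assumptions:

```python
# Alpha-beta (g-h) filter as a simple predictive tracker: predict the
# new position, measure, compute the prediction error, update.
# Gains here are illustrative assumptions; a Kalman filter would
# derive equivalent gains from noise models.
class AlphaBetaTracker:
    def __init__(self, position, alpha=0.5, beta=0.1):
        self.position = position    # e.g. lane-marking x offset (pixels)
        self.velocity = 0.0         # pixels per frame
        self.alpha, self.beta = alpha, beta

    def predict(self):
        # Constant-velocity prediction for the next frame.
        return self.position + self.velocity

    def update(self, measurement):
        predicted = self.predict()
        error = measurement - predicted          # prediction error
        self.position = predicted + self.alpha * error
        self.velocity += self.beta * error
        return self.position

tracker = AlphaBetaTracker(position=100.0)
for measured in [102.0, 104.0, 106.0]:           # marking drifting right
    tracker.update(measured)
# After three frames the estimated velocity is converging toward the
# 2 px/frame drift:
print(round(tracker.velocity, 2))                # 0.77
```

At 60 samples per second and a handful of multiply-adds per update, the computational demand is negligible next to the pixel-processing stages.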
• The basic algorithm presented is not robust; it will need significant
enhancements for real-world conditions
• Must work properly on curved roads
• Must handle diverse conditions (e.g. glare on wet road at night)
• Integration with other automotive safety functions
• May impose memory, processing load, or synchronization constraints
• Tradeoffs between computational demand and quality
• E.g. video resolution, choice of algorithms, etc.
• Billions to hundreds of billions of operations per second required,
depending on tradeoffs
Lane-Departure Warning: Challenges
Processors for Embedded Vision
Though challenged with respect to performance and efficiency, unaided high-performance embedded CPUs are attractive for some vision applications
Vision algorithms are initially developed on PCs with general-purpose
CPUs
CPUs are easiest to use: tools, operating systems, middleware, etc.
Most systems need a CPU for other tasks
However:
Performance and/or efficiency is often inadequate
Memory bandwidth is a common bottleneck
Example: Intel Atom (used in NI 1772C Smart Camera)
Best for: Applications with modest performance needs;
quick time to market
Trend to watch: Integration of GPGPUs with embedded CPUs
High-performance Embedded CPUs
Trend: Heterogeneous Architectures
[Diagram: trade-offs among performance/$, performance/W, and development effort]
• Very heterogeneous processors
• Benefit from huge investments by suppliers
• Hardware performance, efficiency, integration
• Application development infrastructure
• Mobile apps have become a primary locus of software development
• APs can be difficult to buy and use for embedded applications
• APs are used in some embedded applications (sometimes in mobile
device form, sometimes via a system-on-module)
Trend: Mobile Application Processors
[Diagram: a mobile application processor integrating CPU, GPU, DSP, ISP, and VPU cores]
• Graphics processing units (GPUs) are massively parallel machines
• GPUs and their tools have evolved to support non-graphics workloads
(“general-purpose GPU” or “GPGPU”)
• Important recent developments:
• Now in mobile application processors and embedded processors
• OpenCL support
• HSA (Heterogeneous System Architecture)
• Generalized to CPUs, DSPs, FPGAs
Trend: General-Purpose GPU
• As more vision applications achieve high volumes, vision-specific
processors are emerging
• All are co-processors, working in tandem with a CPU
• Many are sold as licensable IP for custom chips:
• Apical, Cadence (Tensilica), CEVA, CogniVue, videantis
• A few are sold as chips: Mobileye, Movidius, TI (EVE)
• Some are do-it-yourself kits:
• For chip designers: Synopsys Processor Designer
• For system designers: Xilinx Zynq
• Challenges:
• Unique programming models and environments
• Limited libraries
• Important (potential) trend: OpenVX
Trend: Vision-Specific (Co-)Processors
FPGA flexibility is very valuable for embedded vision applications
Enables custom specialization and enormous parallelism
Enables selection of I/O interfaces and on-chip peripherals
However:
FPGA design is hardware design, typically done at a low level (register transfer level)
Ease of use improving due to:
Platforms
IP block libraries
Emerging high-level tools
Example: Xilinx Spartan-3 XC3S4000 (in Eutecus Bi-i V301HD smart camera)
Best for: High performance needs with tight size/power/cost budgets
Trends to watch: Vision IP libraries and reference designs; high-level tools
FPGA + CPU
The Embedded Vision Alliance (www.Embedded-Vision.com) is a
partnership of 42 leading embedded vision technology
and services suppliers
Mission: Inspire and empower product creators (including
mobile app developers) to incorporate visual intelligence into
their products
The Alliance provides low-cost, high-quality technical
educational resources for engineers
• The Alliance website offers in-depth tutorial articles,
video “chalk talks,” code examples, discussion forums
• The Embedded Vision Insights newsletter delivers news,
Alliance updates and new resources
Empowering Product Creators to
Harness Embedded Vision
• “Embedded vision” refers to practical systems that extract meaning
from visual inputs
• Embedded vision upgrades what machines know about the physical
world, and how they interact with it, enabling dramatic improvements
in existing products—and creation of new types of products
• Thanks to the emergence of high-performance, low-cost, energy-efficient programmable processors, embedded vision is proliferating
into almost every segment of the electronics industry
• Embedded vision is a huge opportunity for the electronics industry
• Engineers can leverage the Embedded Vision Alliance to learn practical
techniques in embedded vision
Conclusions
Thank You!
ANALYSIS • ADVICE • ENGINEERING FOR EMBEDDED PROCESSING TECHNOLOGY
© 2014 BDTI
BDTI: Over 20 Years of Embedded Processing Leadership
Since 1992, BDTI has helped technology suppliers and users build better products, win business, and reduce risk through:
1. Published Analysis and Training
• Building awareness of technology options, capabilities, and trade-offs through webinars, articles, blogs, and reports
• Teaching effective development techniques live and online
2. Contract Analysis Services
• Strengthening technology marketing and strategy through an independent view, hands-on evaluation, and deep insight
• Enabling rapid, confident technology-selection decisions via benchmarking and analysis
3. Contract Engineering Services
• Delivering specialized engineering services to prove feasibility, increase efficiency, and speed time-to-market
• Solving difficult problems in performance and power consumption
BDTI Services for Vision Systems Design
BDTI provides vision system design services for product development.
• A highly trusted partner—consistently delivering projects right the first time, on time and on budget
• Knowledge of vision applications, algorithms and tools, including OpenCV
“BDTI has set a new standard for quality. BDTI’s deliverables were orders of magnitude better than other vendors’.” - Group Program Manager, Fortune 500 systems and software company
BDTI provides onsite training in embedded engineering
• Workshop and hands-on training in design and implementation of vision systems
Learn more about BDTI’s capabilities in vision engineering at http://www.embedded-vision.com/platinum-members/bdti/overview