yao wang tandon school of engineering, new york …image formation and representation yao wang...

57
Image Formation and Representation Yao Wang Tandon School of Engineering, New York University

Upload: others

Post on 14-Apr-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Image Formation and Representation

Yao WangTandon School of Engineering, New York University

Yao Wang, 2017 EL-GY 6123

Why Study Image and Video Processing?

• Purpose of image and video processing– Improvement of pictorial information for human interpretation– Compression of image and video data for storage and

transmission– Feature extraction and image analysis to enable object

detection, classification, and tracking• Typical application areas

– Television Signal Processing– Satellite Image Processing– Medical Image Processing– Robotics– Visual Communications– Law Enforcement– Etc.

2

Breakdown of the Course

• Image processing basics• Video processing basics• Feature extraction• Image alignment and stitching based on feature

correspondence• Image segmentation• Image and video compression• Stereo and multiview image/video basics

Yao Wang, 2017 EL-GY 6123 3

Yao Wang, 2017 EL-GY 6123

Today’s Lecture

• Color perception • Color production using primary colors• Color specification • Color image representation• Image capture and display• 3D->2D Projection • Video format (SD, HD, UHD)

4

Yao Wang, 2017 EL-GY 6123

Color Perception and Specification

• Light -> color perception• Human perception of color• Type of light sources• Trichromatic color mixing theory• Specification of color

– Tristimulus representation – Luminance/Chrominance representation

• Color coordinate conversion

5

Yao Wang, 2017 EL-GY 6123

Light is part of the EM wave

from [Gonzalez2008]

6

Yao Wang, 2017 EL-GY 6123

Illuminating and Reflecting Light

• Illuminating sources: – emit light (e.g. the sun, light bulb, TV monitors)– perceived color depends on the emitted freq. – follows additive rule

• R+G+B=White

• Reflecting sources: – reflect an incoming light (e.g. the color dye, matte surface,

cloth)– perceived color depends on reflected freq (=emitted freq-

absorbed freq.)– follows subtractive rule

• R+G+B=Black

7

Yao Wang, 2017 EL-GY 6123

Eye Anatomy

From http://www.stlukeseye.com/Anatomy.asp

8

Yao Wang, 2017 EL-GY 6123

Eye vs. Camera

Camera components Eye components

Lens Lens, cornea

Shutter Iris, pupil

Film Retina

Cable to transfer images Optic nerve send the info to the brain

9

Yao Wang, 2017 EL-GY 6123

Human Perception of Color

• Retina contains photo receptors– Cones: day vision, can perceive color

tone• Red (560-580nm), green (530-540nm),

and blue (420-440nm) cones• Different cones have different frequency

responses• Tri-receptor theory of color vision

[Young1802]– Rods: night vision, perceive brightness

only• Color sensation is characterized by

– Luminance (brightness)– Chrominance

• Hue (color tone)• Saturation (color purity)

From http://www.macula.org/anatomy/retinaframe.html

10

Yao Wang, 2017 EL-GY 6123

Frequency Responses of Cones and the Luminous Efficiency Function

ybgridaCC ii ,,,,)()( == ò lll

11

Yao Wang, 2017 EL-GY 6123

Three Attributes of Color

• Luminance (brightness)• Chrominance

– Hue (color tone) and Saturation (color purity)• Represented by a “color cone”

θ

12

Yao Wang, 2017 EL-GY 6123

Trichromatic Color Mixing

• Trichromatic color mixing theory– Any (actually most) color can be obtained by mixing three primary

colors with a right proportion

• Primary colors for illuminating sources:– Red, Green, Blue (RGB)– Color monitor works by exciting red, green, blue phosphors using

separate electronic guns• Primary colors for reflecting sources (also known as secondary

colors):– Cyan, Magenta, Yellow (CMY)– Color printer works by using cyan, magenta, yellow and black (CMYK)

dyes

valuessTristimulu :,3,2,1

kk

kk TCTC å=

=

13

Yao Wang, 2017 EL-GY 6123

RGB vs CMY

Magenta = Red + BlueCyan = Blue + GreenYellow = Green + Red

Magenta = White - GreenCyan = White - RedYellow = White - Blue

14

Yao Wang, 2017 EL-GY 6123

red

Green Blue

15

CIE 1931 RGB Primaries

Yao Wang, 2017 EL-GY 6123 16

• Monochromatic primaries: Red (700nm), Green (546.1nm), Blue (435.8nm)

• Color matching functions: specify the amount of each primary to match a particular monochromatic color

• Need “negative amount” of red to match some colors: add this amount of red to the test color to match the sum of remaining primaries

from [Szeliski2010]

CIE 1931 XYZ Primaries

• XYZ do not correspond to real color primaries (Y=luminance, Z=blue, X: mixed). They are imaginary primary colors.

• Using these color matching functions can however represent any single spectral color using non-negative tristimulus values

Yao Wang, 2017 EL-GY 6123 17

from [Szeliski2010]

Yao Wang, 2017 EL-GY 6123

Tristimulus Values

• Tristimulus values– The amounts of three primary colors needed to form any

particular color are called the tristimulus values, denoted by X, Y, Z (or any other three symbols).

– For any color with spectrum

• Trichromatic (or chromaticity) coefficients

– Only two chromaticity coefficients are necessary to specify the chrominance of a light.

.,,ZYX

ZzZYX

YyZYX

Xx++

=++

=++

=

1=++ zyx

18

!!X = S(λ∫ )x(λ)dλ , Y = S(λ∫ )y(λ)dλ , Z = S(λ∫ )z(λ)dλ

S(λ)

Conversion between CIE 1931 RGB and XYZ

Yao Wang, 2017 EL-GY 6123 19

The three columns in the RGB-> XYZ conversion are the tristimulus values of the R,G,B primaries defined in terms of CIE XYZ primaries. From these, you could derive the chromaticity coefficients of RGB primaries and correspondingly locate them in the CIE diagram.

Ex: Red: x=0.49/(0.49+0.17697)=0.735, y=0.17697/(0.49+0.17697)=0.265.

from [Szeliski2010]

Yao Wang, 2017 EL-GY 6123

CIE 1931 XYZ Chromaticity Diagram

• Colors on the boundary: spectrum colors, highest saturation

• Shows all visible colors by humans

• Mixing any three colors in the visible range can generate colors in the triangle formed by these three points.

• Not all visible colors can be reproduced by RGB primaries used for display, or CMY primaries used for printing.

• Printing gamut using CMY is smaller than display gamut using RGB

20

from [https://commons.wikimedia.org/wiki/File:CIE1931xy_CIERGB.svg]

RGB Color Primary Defined by ITU Rec. 709 for Digital TV (BT. 601)

Yao Wang, 2017 EL-GY 6123 21

RGB

!

"

###

$

%

&&&=

3.240479 −1.537150 −0.498535−0.969256 1.875992 0.0415560.055648 −0.204043 1.057311

!

"

###

$

%

&&&

XYZ

!

"

###

$

%

&&&

XYZ

!

"

###

$

%

&&&=

0.412453 0.357580 0.1804230.212671 0.715160 0.0721690.019334 0.119193 0.950277

!

"

###

$

%

&&&

RGB

!

"

###

$

%

&&&

The three columns in the RGB-> XYZ conversion are the tristimulus values of the R,G,B primaries defined in terms of CIE XYZ primaries. From these, you could derive the chromaticity coefficients of RGB primaries and correspondingly locate them in the CIE diagram (HW!)

BT.709 primaries shown on theCIE 1931 x, y chromaticity diagram

Yao Wang, 2017 EL-GY 6123 22

https://en.wikipedia.org/wiki/Rec._709#/media/File:CIExy1931_Rec_709.svg

Yao Wang, 2017 EL-GY 6123

Color Representation Models

• Specify the tristimulus values associated with the three primary colors– RGB, CMY, XYZ

• Specify the luminance and chrominance– HSI (Hue, saturation, intensity)– YIQ (used in NTSC color TV)– YCbCr (used in digital color TV)

• Amplitude specification (Standard Dynamic Range): – 8 bits for each color component, or 24 bits total for each pixel– Total of 16 million colors – A true RGB color display of size 1Kx1K requires a display buffer

memory size of 3 MB• High dynamic range (HDR) Image: up to 16 bits/component

23

Yao Wang, 2017 EL-GY 6123

HSI Color Model

• Hue represents dominant color as perceived by an observer. It is an attribute associated with the dominant wavelength.

• Saturation refers to the relative purity or the amount of white light mixed with a hue. The pure spectrum colors are fully saturated. Pink and lavender are less saturated.

• Intensity reflects the brightness.

24

Yao Wang, 2017 EL-GY 6123

Conversion Between RGB and HSI

• Converting from RGB to HSI

• Converting from HSI to RGB

[ ]

[ ]

][31

)],,[min()(

31

))(()(

)()(21

cos360

21

2

1

BGRI

BGRBGR

S

BGBRGR

BRGRwith

GBifGBif

H

++=

++-=

ïïþ

ïïý

ü

ïïî

ïïí

ì

--+-

-+-=

îíì

>-£

= -qq

q

)(1)60cos(

cos1

)1(

BRGHHSIR

SIB

+-=

úû

ùêë

é-

+=

-=

RG sector (0≤H<120)

)(1))120(60cos()120cos(1

)1(

GRBHHSIG

SIR

+-=

úû

ùêë

é---

+=

-=

GB sector (120≤H<240)

)(1))240(60cos()240cos(1

)1(

BGRHHSIB

SIG

+-=

úû

ùêë

é---

+=

-=

BR sector (240≤H<360)

25

Yao Wang, 2017 EL-GY 6123

YIQ Color Coordinate System

• YIQ is defined by the National Television System Committee (NTSC) for US analog color TV system– Y describes the luminance, I and Q describes the chrominance.– A more compact representation of the color.– YUV plays similar role in PAL and SECAM.

• Conversion between RGB and YIQ

úúú

û

ù

êêê

ë

é

úúú

û

ù

êêê

ë

é

---=

úúú

û

ù

êêê

ë

é

úúú

û

ù

êêê

ë

é

úúú

û

ù

êêê

ë

é

---=

úúú

û

ù

êêê

ë

é

QIY

BGR

BGR

QIY

703.1106.10.1649.0272.00.1621.0956.00.1

,311.0523.0211.0322.0274.0596.0114.0587.0299.0

26

Yao Wang, 2017 EL-GY 6123

YUV/YCbCr Coordinate

• YUV is the color coordinate used in color TV in PAL system, somewhat different from YIQ.

• YCbCr is the digital equivalent of YUV, used for digital TV, with 8 bit for each component, in range of 0-255

27

Yao Wang, 2017 EL-GY 6123

Comparison of Different Color Spaces

28

Yao Wang, 2017 EL-GY 6123

Color Coordinate Conversion

• Conversion between different primary sets are linear (3x3 matrix)

• Conversion between primary and XYZ/YIQ/YUV are also linear

• Conversion to LSI/Lab are nonlinear– LSI and Lab coordinates

• coordinate Euclidean distance proportional to actual color difference

• Conversion formulae between many color coordinates can be found in [Gonzalez2008]

• Color space of HDTV by Recommendation 709 can be found in [Woods2012]

29

Yao Wang, 2017 EL-GY 6123

Grayscale Image Specification

• Each pixel value represents the brightness of the pixel. With 8-bit image, the pixel value of each pixel is 0 ~ 255

• Matrix representation: An image of MxN pixels is represented by an MxN array, each element being an unsigned integer of 8 bits

úúúúúú

û

ù

êêêêêê

ë

é

=

10999555894795560

69122158162154166162160

!

!

""#""

!

!

M

30

Yao Wang, 2017 EL-GY 6123

Color Image Specification

• Three components – M = {R, G, B}

R G B

úúú

û

ù

êêê

ë

é=

úúú

û

ù

êêê

ë

é=

úúú

û

ù

êêê

ë

é=

1436

6131,

1336

9866,

1727

8773

!

"#"

!

!

"#"

!

!

"#"

!

BGR

Red nose is brightest! Blue Cheek is brightest31

Yao Wang, 2017 EL-GY 6123

Chrominance Subsampling Formats

4:2:0For every 2x2 Y Pixels

1 Cb & 1 Cr Pixel(Subsampling by 2:1 bothhorizontally and vertically)

4:2:2 For every 2x2 Y Pixels

2 Cb & 2 Cr Pixel(Subsampling by 2:1

horizontally only)

4:4:4 For every 2x2 Y Pixels

4 Cb & 4 Cr Pixel(No subsampling)

Y Pixel Cb and Cr Pixel

4:1:1For every 4x1 Y Pixels

1 Cb & 1 Cr Pixel(Subsampling by 4:1

horizontally only)

32

Yao Wang, 2017 EL-GY 6123

Image Capture and Display

• Light reflection physics• Imaging operator• Color capture• Color display

33

Photometric Image Formation

Yao Wang, 2017 EL-GY 6123 34

from [Szeliski2010]

Yao Wang, 2017 EL-GY 6123

Image and Video Capture

Courtesy of Onur Guleryuz

35

Yao Wang, 2017 EL-GY 6123

More on Video Capture

• Camera absorption function

• Projection from 3-D to 2-D

• The projection operator is non-linear – Perspective projection– Othographic projection

llly datCt c )(),,(),( ò= XX

)),((),(or ),()),(( 1 tPtttPP

xxXX

xX-==

®

yyyy

36

Yao Wang, 2017 EL-GY 6123

Pinhole Camera Model

Cameracenter

Imageplane

2-Dimage

3-Dpoint

The image of an object is reversed from its 3-D position. The object appears smaller when it is farther away.

ZYFy

ZXFx == ,

37

Perspective Projection

Yao Wang, 2017 EL-GY 6123 38

ZyxZYFy

ZXFx

ZY

Fy

ZX

Fx

torelatedinversely are ,

,, ==Þ==

All points in this ray will have the same image

Yao Wang, 2017 EL-GY 6123

Gray and Color Image Capture

• Gray images are captured by a single sensor, sensitive to the entire visible spectrum, similar to the rods

• Color images are captured by having three sensors, each sensitive to a different primary color, similar to the cones– Alternatively using a single sensor proceeded by different color

filters

• Each sensor or filter has its own frequency response, which may differ from the cones responses in the human retina.

39

Yao Wang, 2017 EL-GY 6123

Color Imaging Using Color Filter Arrays

• Single sensor array, with different color filters to separate the RGB primary components

• Sensors: CCD, CMOS

From http://en.wikipedia.org/wiki/Bayer_filter

Bayer RGB Pattern

40

Yao Wang, 2017 EL-GY 6123

From http://en.wikipedia.org/wiki/Bayer_filter

1: original scene2: output of a 120x80 pixel sensor with a Bayer filter3: output color coded4: Reconstructed image after interpolating missing colors (demosaicing)

41

Yao Wang, 2017 EL-GY 6123

Gray and Color Image Display/Printing

• Gray images are displayed by a single light sensitive diode, with intensity proportional to gray level.

• LCD monitors display color images by having three phosphors at slightly shifted positions near each pixel, each generating a different primary color (red, green, blue)

• Color images are printed by having three color inks (cyan, magenta, yellow). High end printers use more inks to produce a larger color gamut and more vivid colors.

42

Captured and Displayed Color

• Original lighted scene has a spectrum distribution (in terms of wavelength) at a particular space and time (x,y,z,t). Call it c(l)

• If the human oberserves it directly, the cones’ responses (each with frequency responses ai(l)) will be

• If the scene is captured by a camera, with three sensors, each with frequency response bi(l), the captured RGB values will be

• The display will use three phophors, each with different spectrum di(l) and the displayed spectrum is the weighted sum of the three spectrums

• The displayed spectrum again is viewed by the human, generating three other responses

• Bi should be close to Ai for the perceived scene to be similar to the original scene

Yao Wang, 2017 EL-GY 6123 43

Ai = C(λ)ai∫ (λ)dλ, i = r,g,b, y

Ci = C(λ)ci∫ (λ)dλ, i = r,g,b

Bi = D(λ)ai∫ (λ)dλ, i = r,g,b, y

Color Calibration

• Each camera/display may use R,G,B primaries somewhat different

• Captured images also depend on scene lighting• Using a reference white, rescale the RGB values so

that “white” will have equal R,G,B values• BT. 709 uses daylight illuminate D_65

Yao Wang, 2017 EL-GY 6123 44

Gamma Correction

• Displayed light intensity is nonlinearly related to the actual intensity following the Gamma rule

• Gamma correction precompensate this nonlinearity inside the camera

• Gamma value depends on the display device, typically gamma~2.2

Yao Wang, 2017 EL-GY 6123 45

g = af r

h = f 1/r

g = ahr = af

Video Formats

• Standard Definition (SD) video (BT. 601)– Resolution: 720x480, 25-30Hz– Standard Dynamic Range (SDR): 0.0002 to 100 cd/m2 (nits) in

luminance range, represented in 8bits/color– Color spec: BT. 709

• Narrow color range: 33.5% of visible range

• High Definition (HD) video (BT. 709)– Resolution increase to 1920x1080, up to 60 Hz– Color spec: BT. 709

• Ultra High Definition (UHD) video– Resolution further increase to 3840x2160 or higher, up to 120Hz– High dynamic range (HRD): 0,00005 to 1000 cs/m2, represented up to

16bits/color (10-12 bpp for distribution, 14-16bpp for production)– Wide color gamut: DCI-P3 (41.8%) and BT.2020 (57.3%).

Yao Wang, 2017 EL-GY 6123 46

UHD Alliance Definition of UHD TV

• Resolution: 3840x2160 for content, distribution, and playback displays.

• Color bit depth: 10 bits minimum for content and distribution, 10 bits for playback displays.

• Color representation: BT.2020 for content, distribution, and playback displays.

• Mastering display: 0.03- 1,000 nits• Playback display: 0.05-1,000 nits, or 0.0005-540nits• UHD content backwards compatible with SDR displays

Yao Wang, 2017 EL-GY 6123 47

Color Gamut Specifications

• BT.709: Used in HDTV and SDTV• DCI-P3: Used in cinema presentation• Pointer’s gamut: gamut of 4,000 diffuse surface colors

found in nature• BT.2020: Specified for UHD, covers 99.9% of Pointer’s

gamut

• BT.2020 primaries are maximum saturation pure colors, created by extremely narrow spectral slices of light energy: Red (630nm), Green (532 nm), Blue (467nm)

Yao Wang, 2017 EL-GY 6123 48

Yao Wang, 2017 EL-GY 6123 49From [Schulte2016]

Yao Wang, 2017 EL-GY 6123 50From [Schulte2016]

New Color Space for UHD: BT.2020

Yao Wang, 2017 EL-GY 6123 51

From [BT2020]

SDR vs. HDR

Yao Wang, 2017 EL-GY 6123 52

From: http://files.spectracal.com/Documents/White%20Papers/HDR_Demystified.pdf

HDR TV: captured using HDR camera and displayed by HDR displayHDR photography: computed HDR image using multiple captured images using SDR cameras

HDR Delivery Systems

• HDR10: Specified by UHD Alliance. Incompatible with SDR TV. Compression using HEVC main 10 profile

• Dolby Vision: for UHD Blue Ray specified by Dolby, backward compatible with SDR using two layer representation.

Yao Wang, 2017 EL-GY 6123 53

HDR10 System

Yao Wang, 2017 EL-GY 6123 54

HDR10 content mastering process

HDR10 content decoding process

From [Schulte2016]

Mapping between sensed light and color values

Yao Wang, 2017 EL-GY 6123 55

From [Schulte2016]

A majority of the ST2084 codes represent lower luminance levels, to complement the human visual system’s greater contrast sensitivity at lower luminance levels and to minimize visible banding at those lower levels.

HDR Electro Optical Transfer Function (ST2084)

Inverse Transfer Function (ST2084PQ)

Yao Wang, 2017 EL-GY 6123

Summary

• What is light? Difference between illuminating and reflecting light. Which light attribute determines the color?

• How human perceive color? Functions of cones and rods in the retina.

• Three attributes of color: luminance, hue, saturation• How to produce different colors ?• How to represent a color? Different color models. Meaning of

chromaticity.• Color image representation using a matrix• Projection of 3D to 2D• Color image capture using color filters• Color image display• Different image formats, color formats

56

Reading Assignments

Yao Wang, 2017 EL-GY 6123 57

• [Szeliski2012] Richard Szeliski, Computer Vision: Algorithms and Applications. 2012. Ch. 2. (Sec. 2.1.5, 2.2, 2.3)

• [Schulte2016] Tom Schulte, Joel Barsotti, “HDR Demystified – Emerging UHDTV Systems,” Mar. 2016. http://files.spectracal.com/Documents/White%20Papers/HDR_Demystified.pdf

• Other reference:• [Wang2002] Wang et al, Digital video processing and communications. Chap 1

(Sec. 1.3-1.4 optional)• [Gonzalez2008] Gonzalez & Woods, “Digital Image Processing”, Prentice Hall,

2008, 3rd ed. Chapter 3 (Section 6.1 – 6.2)• [BT2020] "BT.2020 : Parameter values for ultra-high definition television

systems for production and international programme exchange". International Telecommunication Union. 2012-08-23. Retrieved 2014-08-31.