
OpenCV intro

References:
1. "Learning OpenCV: Computer Vision with the OpenCV Library", Bradski & Kaehler (O'Reilly, 2008)
2. http://opencv.willowgarage.com/wiki/

What is Computer Vision?

• 3D Graphics: Scene "Information" → (2D Image)
– "Apple on a table"
– Triangle Mesh
– Virtual lights
– Camera
– …

• Computer Vision: (2D Image) → Scene "Information"
– Skin Tones
– Parallel Lines (building)
– Depth calculations
– ...
– "Man in front of building"

What is OpenCV?

• ~2500 computer vision algorithms
• Highly optimized (originally for Intel)
• BSD license
• Languages:
– C (function-based): OpenCV 1.x
– C++ (class-based): OpenCV 2.x
– Python (2.6 and 2.7)
  • Import cv2 for the OpenCV 2.x style bindings
  • Import cv2.cv for the OpenCV 1.x style bindings
  • Very poor documentation!
– Java (not yet, though)
• Ported to Windows, OSX, Linux, iOS, Android

Links

• Download:
– http://opencv.org/
• C / C++:
– http://docs.opencv.org/
• Python:
– Tutorials: https://opencv-python-tutroals.readthedocs.org/en/latest/
– [Not really any documentation; just read the C++ docs and translate them yourself]


Python setup

• Copy [OpenCVdir]\build\Python\2.7\Lib\site-packages\cv2.pyd (a .dll file)
• Paste it in the same directory as your script
• (or put it in your Python install folder)

Example01: Absolute basics

import cv2

# Creates a numpy.ndarray object (basically a fast, C-based
# array of numeric values).
img = cv2.imread("apple.jpg", cv2.IMREAD_COLOR)

# Creates a window (title = 'an apple!') and displays img in it.
cv2.imshow('an apple!', img)

# Waits for any key to be pressed.
cv2.waitKey(0)

# Destroys all windows.
cv2.destroyAllWindows()

Example02: Video reading / game loop

from cv2 import cv
import time
#import cv2

# Create a new window
cv.NamedWindow("main")

writing = 0
if writing:
    # Start capturing from camera
    cam = cv.CaptureFromCAM(-1)

    # Create a writer for offline processing (without
    # a webcam). Capture properties come back as floats.
    width = int(cv.GetCaptureProperty(cam, cv.CV_CAP_PROP_FRAME_WIDTH))
    height = int(cv.GetCaptureProperty(cam, cv.CV_CAP_PROP_FRAME_HEIGHT))

    # Note: Indeo video 5.10 is the only codec I could
    # read and write to.
    # Assumption: 'IV50' is the FourCC for that codec; the original
    # slide passed `code` without showing its definition.
    code = cv.CV_FOURCC('I', 'V', '5', '0')
    writer = cv.CreateVideoWriter("example01.avi", code, 30.0, (width, height), 0)
else:
    # Open a video file for reading (treat it as if it
    # were a camera)
    cam = cv.CaptureFromFile("example01.avi")

# The current "window grab"
captureNum = 0

# "Game" Loop
while True:
    # Capture the current frame and show
    # it in the window
    img = cv.QueryFrame(cam)
    if img is None: # No more frames (end of file)
        break
    cv.ShowImage("main", img)

    if writing:
        # Save to the avi file
        cv.WriteFrame(writer, img)

    # Get Keyboard events
    keyCode = cv.WaitKey(5)
    if keyCode == 27: # Escape
        break
    elif keyCode == ord("s"):
        # Save the current image to a file
        cv.SaveImage("F" + str(captureNum) + ".jpg", img)
        captureNum += 1

# End capturing
del cam # Should release the camera / file

if writing:
    del writer

# Destroy the window
cv.DestroyWindow("main")

A tour of CV algorithms

1. Noise reduction:
   a. Blurring
   b. Thresholding
   c. Erode / Dilate
2. Edge / shape detection
   a. (Hu) moments
3. Histograms
   a. Back-projection
4. Background-subtraction

1a. Noise Reduction (Blur)

im = cv.LoadImage("apple.jpg", cv.CV_LOAD_IMAGE_GRAYSCALE)

cv.NamedWindow("orig")
cv.ShowImage("orig", im)
print dir(im)

blurred = cv.CreateImage((im.width, im.height), im.depth, 1)
cv.Smooth(im, blurred, cv.CV_GAUSSIAN, 9, 9)
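To show why blurring suppresses isolated noise, here is a minimal pure-Python sketch. It uses a 3x3 box (mean) blur rather than the 9x9 Gaussian passed to cv.Smooth above, and `box_blur` is an illustrative helper, not an OpenCV function.

```python
def box_blur(img):
    """Replace each pixel by the mean of its 3x3 neighborhood (edges use what exists)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(window) / float(len(window)))
        out.append(row)
    return out

# Flat gray image with a single bright "noise" pixel in the middle (made-up values).
noisy = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
smoothed = box_blur(noisy)
```

The bright pixel is pulled toward its neighbors, which makes a later threshold less likely to pick it up.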

1b. Noise Reduction (Threshold)

edImg = cv.CreateImage((im.width, im.height), im.depth, 1)
cv.Threshold(blurred, edImg, 128, 255.0, cv.CV_THRESH_BINARY)

1c. Noise Reduction (Erode / Dilate)

cv.Erode(edImg, edImg, None, 20)
cv.Dilate(edImg, edImg, None, 20)

[Figures: just erode; just dilate; erode, then dilate]
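To make the effect of erode-then-dilate concrete, here is a minimal pure-Python sketch on a tiny binary grid. It assumes a 3x3 square neighborhood and treats out-of-bounds pixels as black, which may differ from OpenCV's border handling. The helper names are illustrative, not OpenCV calls.

```python
def _neighborhood(img, x, y):
    """Values in the 3x3 window around (x, y); out-of-bounds counts as 0."""
    h, w = len(img), len(img[0])
    return [img[j][i] if 0 <= i < w and 0 <= j < h else 0
            for j in range(y - 1, y + 2)
            for i in range(x - 1, x + 2)]

def erode(img):
    """A pixel stays white only if its whole 3x3 neighborhood is white."""
    return [[min(_neighborhood(img, x, y)) for x in range(len(img[0]))]
            for y in range(len(img))]

def dilate(img):
    """A pixel becomes white if any pixel in its 3x3 neighborhood is white."""
    return [[max(_neighborhood(img, x, y)) for x in range(len(img[0]))]
            for y in range(len(img))]

# A 3x3 white blob plus one isolated "noise" pixel in the top-right corner.
noisy = [
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Erode first (removes the speck, shrinks the blob), then dilate (grows the blob back).
opened = dilate(erode(noisy))
```

Erosion alone would shrink the blob to its center pixel. The dilation restores the blob's size, while the speck, already erased, does not come back.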

2a. Moments

• An easy way to analyze a shape
– Assumptions (here):
  • Binary image (0=black, 1=white)
  • Mainly one shape: the white part (pass CV_THRESH_BINARY_INV instead of CV_THRESH_BINARY to Threshold)
– Notation:
  • I(x, y): intensity of pixel (x, y) (a 0 or 1)

2a. Moments, cont.

• Note:
– M00 is the number of pixels (area)
– Centroid is (M10/M00, M01/M00)
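A minimal sketch of these two quantities, assuming the standard raw-moment definition $M_{pq} = \sum_x \sum_y x^p y^q I(x, y)$ (the formula itself does not appear in the text here). `raw_moment` is an illustrative helper, not an OpenCV call.

```python
def raw_moment(img, p, q):
    """Sum of x**p * y**q * I(x, y) over all pixels of a binary image."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(img)
               for x, value in enumerate(row))

# Binary image: a white rectangle covering columns 1-3 and rows 1-2.
shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

m00 = raw_moment(shape, 0, 0)   # area in pixels
m10 = raw_moment(shape, 1, 0)   # sum of x over white pixels
m01 = raw_moment(shape, 0, 1)   # sum of y over white pixels
centroid = (m10 / float(m00), m01 / float(m00))
```

Here the area is 6 pixels and the centroid is the middle of the rectangle, (2.0, 1.5).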

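2a. Hu Moments

• Invariant (mostly) to
– Scale
– Rotation
– Reflection
  • the seventh changes sign under reflection

• With $\eta_{pq}$ the normalized central moments (not the raw $M_{pq}$):
– $Hu_1 = \eta_{20} + \eta_{02}$
– $Hu_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$
– $Hu_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$
– …
– $Hu_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$

• If you were to treat this as a Vector7, you could compare it to a database of other Vector7's to do simple shape-matching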

2a. Hu Moments, cont.

Reference shape:
• 0.1730651754
• 0.0002714368
• 0.0000133760
• 0.0000289668
• 0.00000000056837
• -0.0000004641061
• 0.00000000004541

cv.Threshold(blurred, edImg, 128, 255.0, cv.CV_THRESH_BINARY_INV)

cv.Dilate(edImg, edImg, None, 10)
cv.Erode(edImg, edImg, None, 10)

edMat = cv.GetMat(edImg)
moments = cv.Moments(edMat, 1)
hu = cv.GetHuMoments(moments)

Second shape (diff2 = squared difference from the reference value):
• 0.1606958551 (diff2 = 0.000153)
• 0.0000105604 (diff2 = 0.00000731)
• 0.0000937233 (diff2 = 6.455e-9)
• 0.0000001183 (diff2 = 8.322e-10)
• 0.000000000000341 (diff2 = 3.227e-19)
• 0.000000000269809 (diff2 = 2.15e-13)
• -0.000000000000197 (diff2 = 2e-21)
• Total = 0.0016

Third shape:
• 0.3608422951 (diff2 = 0.0353)
• 0.0568850707 (diff2 = 0.003)
• 0.0170774107 (diff2 = 0.0003)
• 0.0026356011 (diff2 = 6.8e-6)
• -0.00001702199712 (diff2 = 2.9e-7)
• -0.00062800188755 (diff2 = 3.94e-7)
• 0.000004785759613 (diff2 = 2.3e-11)
• Total = 0.0385 (~30x "farther")
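The totals above are sums of squared differences between 7-element Hu vectors. A minimal sketch of that comparison against a small "database" follows. The shape names and all numbers are made up for illustration, and `squared_distance` and `closest_shape` are hypothetical helpers.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_shape(query, database):
    """Name of the database entry whose vector is nearest to `query`."""
    return min(database, key=lambda name: squared_distance(query, database[name]))

# Hypothetical Hu vectors (7 values each); not taken from real images.
database = {
    "shape_a": [0.20, 0.010, 0.002, 0.001, 0.0, 0.0, 0.0],
    "shape_b": [0.40, 0.060, 0.020, 0.003, 0.0, 0.0, 0.0],
}
query = [0.21, 0.012, 0.002, 0.001, 0.0, 0.0, 0.0]

best = closest_shape(query, database)
```

Because the later moments are orders of magnitude smaller than the first, a plain squared distance is dominated by the first one or two entries. Rescaling the entries before comparing is one possible refinement.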

3. Histograms

• Basically, an n-dimensional plot
• Bins (buckets)
• Examples:
– 1D: In a grayscale image, number of pixels in a bin (0-5 intensity, 5-10 intensity, …, 250-255 intensity)
– 2D: Hue-Saturation graph (x-axis = hue, y-axis = saturation)
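As a rough illustration of the 1D case, the sketch below counts grayscale values into equal-width bins in plain Python. The pixel values are made up and `gray_histogram` is an illustrative helper, not an OpenCV call.

```python
def gray_histogram(pixels, num_bins, lo=0, hi=256):
    """Count how many values in `pixels` fall into each of `num_bins` equal bins."""
    counts = [0] * num_bins
    bin_width = (hi - lo) / float(num_bins)
    for value in pixels:
        index = int((value - lo) / bin_width)
        counts[min(index, num_bins - 1)] += 1   # clamp the top edge into the last bin
    return counts

# Hypothetical grayscale intensities (0-255), flattened into one list.
pixels = [0, 3, 10, 64, 65, 70, 128, 200, 254, 255]
hist = gray_histogram(pixels, num_bins=4)   # bins: 0-63, 64-127, 128-191, 192-255
```

Every pixel lands in exactly one bin, so the counts always add up to the number of pixels.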

3. Histograms, cont.

3. Histograms (creating, 1D)

• Assumption: img is a grayscale image (1 channel)

# Create the histogram
num_bins = 25
hist = cv.CreateHist([num_bins,], cv.CV_HIST_ARRAY, [[0,255],], 1)
cv.CalcHist((img,), hist)

# Create the image to visualize it (optional)
scale = 15 # Size of each "bar" in the plot
hist_img = cv.CreateImage((num_bins * scale, 256), 8, 3)
(_, max_val, _, _) = cv.GetMinMaxHistValue(hist)
# Green background, then one red bar per bin
cv.Rectangle(hist_img, (0,0), (num_bins * scale - 1, 255), cv.RGB(0,255,0), cv.CV_FILLED)
for b in range(num_bins):
    val = 255.0 * cv.QueryHistValue_1D(hist, b) / max_val
    # Point coordinates must be integers
    cv.Rectangle(hist_img, (b*scale, 255), ((b+1)*scale-1, int(255-val)), cv.RGB(255,0,0), cv.CV_FILLED)

3. Histograms (creating, 2D)

• Assumption:
– img is an RGB image
– mask is a gray image (black = don't count, white = do)

# Convert from RGB to HSV (storing the hue and sat in different (grayscale) images)
hsv = cv.CreateImage(cv.GetSize(img), 8, 3)
cv.CvtColor(img, hsv, cv.CV_BGR2HSV)

h_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
s_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
cv.Split(hsv, h_plane, s_plane, None, None)
planes = [cv.GetImage(h_plane), cv.GetImage(s_plane)] # CvMat => IplImage

# Create the histogram
hue_bins = 30
sat_bins = 32
hist_size = [hue_bins, sat_bins]
h_ranges = [0, 180] # Red is ~0 degrees
s_ranges = [15, 255] # 0=grayscale, 255=pure color
ranges = [h_ranges, s_ranges]
hist = cv.CreateHist(hist_size, cv.CV_HIST_ARRAY, ranges, 1)
cv.CalcHist(planes, hist, 0, mask)

3a. Back-projection

• A histogram tells us how often a value (color) appears in the image the histogram was built from.
– "Fuller" bin = more prevalent
– "Emptier" bin = less prevalent
• Back-Projection example:
– Create an image with flesh tones.
– Create a histogram from it
  • Hue-Saturation generally ignores race
– Now, given a new color, we can determine how likely it is to be flesh-toned by looking up that spot in the histogram.
  • If it's a full bin, it's probably a flesh-tone
  • If it's an empty bin, it's probably not a flesh-tone
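The lookup described above can be sketched in plain Python. This is only an illustration under simple assumptions: hue in 0-179, saturation in 0-255, and a histogram normalized so its fullest bin is 1.0. The sample values and helper names are hypothetical.

```python
def build_histogram(samples, h_bins, s_bins, h_max=180, s_max=256):
    """2D hue/saturation histogram, normalized so the fullest bin is 1.0."""
    hist = [[0.0] * s_bins for _ in range(h_bins)]
    for hue, sat in samples:
        hist[hue * h_bins // h_max][sat * s_bins // s_max] += 1
    peak = max(max(row) for row in hist)
    return [[count / peak for count in row] for row in hist]

def back_project(pixels, hist, h_bins, s_bins, h_max=180, s_max=256):
    """Replace each (hue, sat) pixel by the value of the bin it falls into."""
    return [hist[hue * h_bins // h_max][sat * s_bins // s_max]
            for hue, sat in pixels]

# Hypothetical (hue, saturation) samples taken from a "flesh tone" image.
samples = [(10, 120), (12, 130), (11, 125), (14, 140), (100, 200)]
hist = build_histogram(samples, h_bins=6, s_bins=4)

# Two new pixels: one close to the samples, one far from all of them.
scores = back_project([(13, 135), (60, 30)], hist, h_bins=6, s_bins=4)
```

The first pixel falls into one of the fullest bins and scores 1.0. The second falls into an empty bin and scores 0.0.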

3a. Back-projection

# Convert img (RGB) to HS(V)
hsv = cv.CreateImage(cv.GetSize(img), 8, 3)
cv.CvtColor(img, hsv, cv.CV_BGR2HSV)

# Get images for the hue and sat "planes" of hsv
h_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
s_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
cv.Split(hsv, h_plane, s_plane, None, None)
h_plane = cv.GetImage(h_plane) # CvMat => CvImg
s_plane = cv.GetImage(s_plane)
hsPlanes = [h_plane, s_plane]

# Do the back-projection. Note: hist was created on a
# previous slide
backPropImg = cv.CreateImage((img.width, img.height), 8, 1)
cv.CalcBackProject(hsPlanes, backPropImg, hist)

3a. Patch-based Back-projection

• Similar to regular back-projection, but uses a (w x h) "window" (I used 5 for each)
• The window "slides" over each pixel in the image. Let's say it's at pixel (i, j):
– Look in the w x h neighborhood of (i, j)
– Calculate a new histogram from it
– Compare it to a reference histogram. The degree of similarity is the value to set (i, j) to in the "probability" image

# hist and hsPlanes are computed as before.

patchW = patchH = 5
backPropImg = cv.CreateImage((img.width - patchW + 1, img.height - patchH + 1), cv.IPL_DEPTH_32F, 1)
cv.CalcBackProjectPatch(hsPlanes, backPropImg, (patchW, patchH), hist, cv.CV_COMP_CORREL, 1)
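The sliding-window idea can be sketched in plain Python on a single-channel image. To stay short, this sketch compares histograms by histogram intersection rather than the correlation measure (CV_COMP_CORREL) used in the call above. The image values and helper names are hypothetical.

```python
def patch_histogram(img, left, top, w, h, num_bins, max_value=256):
    """Normalized histogram of the w x h patch whose top-left corner is (left, top)."""
    counts = [0.0] * num_bins
    for y in range(top, top + h):
        for x in range(left, left + w):
            counts[img[y][x] * num_bins // max_value] += 1
    total = float(w * h)
    return [c / total for c in counts]

def intersection(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

def back_project_patch(img, ref_hist, w, h, num_bins):
    """Similarity map of size (W - w + 1) x (H - h + 1), one score per patch position."""
    rows = len(img) - h + 1
    cols = len(img[0]) - w + 1
    return [[intersection(patch_histogram(img, x, y, w, h, num_bins), ref_hist)
             for x in range(cols)]
            for y in range(rows)]

# Hypothetical 4x4 image: a bright 2x2 block in the top-left corner.
img = [
    [250, 250, 10, 10],
    [250, 250, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
ref_hist = [0.0, 1.0]   # reference: "every pixel is bright" (2 bins)
scores = back_project_patch(img, ref_hist, w=2, h=2, num_bins=2)
```

The output is smaller than the input by (w - 1, h - 1), which matches the size of backPropImg in the OpenCV call. The score is highest where the patch covers the bright block completely.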

4. Background-Subtraction

• Goal:
– Mark "non-background" pixels in a mask (1=non-background, 0=background)
– Analyze the shape of the non-background pixels.

4. Background Subtraction

• Naïve Approach:
cv.AbsDiff(curFrame, bgOnlyFrame, diffImg)
# Maybe a threshold now, erosion, dilate, etc.
• Problems:
– A lot of frame-to-frame "noise"
– Webcam auto-adjusting intensity (@#$! Logitechs)
– Clouds passing by, trees waving in wind, …
• A better approach…[see example04]
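For reference, the naïve approach (absolute difference followed by a threshold) can be sketched in plain Python. The frames and the cutoff of 30 are made-up values, the helpers are illustrative, and this is not the improved method from example04.

```python
def abs_diff(frame, background):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [[abs(f - b) for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def threshold(img, cutoff):
    """1 where the pixel exceeds `cutoff`, else 0."""
    return [[1 if value > cutoff else 0 for value in row] for row in img]

# Hypothetical 3x4 grayscale frames (0-255).
background = [
    [50, 52, 51, 50],
    [49, 50, 50, 51],
    [50, 51, 49, 50],
]
current = [
    [51, 50, 53, 50],     # small differences: sensor noise
    [49, 200, 210, 51],   # large differences: a foreground object
    [50, 51, 47, 50],
]

# 1 = non-background, 0 = background
mask = threshold(abs_diff(current, background), cutoff=30)
```

The cutoff absorbs small frame-to-frame noise, but a global brightness change (for example, a webcam auto-adjusting) would push every pixel past it. That is the weakness listed under "Problems".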
