OpenCV intro
References:
1. "Learning OpenCV: Computer Vision with the OpenCV Library", Bradski & Kaehler (O'Reilly, 2008)
2. http://opencv.willowgarage.com/wiki/
What is Computer Vision?
• 3D Graphics: Scene "Information" → (2D Image)
– "Apple on a table"
– Triangle Mesh
– Virtual lights
– Camera
– …
• Computer Vision: (2D Image) → Scene "Information"
– Skin Tones
– Parallel Lines (building)
– Depth calculations
– ...
– "Man in front of building"
What is OpenCV?
• ~2500 computer vision algorithms
• Highly optimized (originally for Intel)
• BSD license
• Languages:
– C (function-based): OpenCV 1.x
– C++ (class-based): OpenCV 2.x
– Python (2.6 and 2.7)
  • Import cv2 for the OpenCV 2.x style bindings
  • Import cv2.cv for the OpenCV 1.x style bindings
  • Very poor documentation!
– Java (not yet, though)
• Ported to Windows, OSX, Linux, iOS, Android
Links
• Download:
– http://opencv.org/
• C / C++
– http://docs.opencv.org/
• Python
– Tutorials: https://opencv-python-tutroals.readthedocs.org/en/latest/
– [Not really any documentation: just read the C++ docs and translate them yourself]
Python setup
• Copy cv2.pyd (a .dll file) from [OpenCVdir]\build\Python\2.7\Lib\site-packages
• Paste it in the same directory as your script
• (or put it in your Python install folder)
Example01: Absolute basics
import cv2

# Creates a numpy.ndarray object (basically a fast, C-based
# array of numeric values)
img = cv2.imread("apple.jpg", cv2.IMREAD_COLOR)

# Creates a window (title = 'an apple!') and displays img in it.
cv2.imshow('an apple!', img)

# Waits for any key to be pressed.
cv2.waitKey(0)

# Destroys all windows.
cv2.destroyAllWindows()
Example02: Video reading / game loop
from cv2 import cv

# Create a new window
cv.NamedWindow("main")

writing = 0
if writing:
    # Start capturing from camera
    cam = cv.CaptureFromCAM(-1)

    # Create a writer for offline processing (without a webcam).
    # GetCaptureProperty returns floats; the frame size must be ints.
    width = int(cv.GetCaptureProperty(cam, cv.CV_CAP_PROP_FRAME_WIDTH))
    height = int(cv.GetCaptureProperty(cam, cv.CV_CAP_PROP_FRAME_HEIGHT))

    # Note: Indeo video 5.10 is the only codec I could read and write to.
    # Assumption: 'IV50' is the FOURCC for Indeo 5 (the slide does not show
    # how `code` was defined).
    code = cv.CV_FOURCC('I', 'V', '5', '0')
    writer = cv.CreateVideoWriter("example01.avi", code, 30.0, (width, height), 0)
else:
    # Open a video file for reading (treat it as if it were a camera)
    cam = cv.CaptureFromFile("example01.avi")

# The current "window grab"
captureNum = 0

# "Game" Loop
while True:
    # Capture the current frame and show it in the window
    img = cv.QueryFrame(cam)
    if img is None:  # No more frames (e.g. end of the video file)
        break
    cv.ShowImage("main", img)

    if writing:
        # Save to the avi file
        cv.WriteFrame(writer, img)

    # Get keyboard events
    keyCode = cv.WaitKey(5)
    if keyCode == 27:  # Escape
        break
    elif keyCode == ord("s"):
        # Save the current image to a file
        cv.SaveImage("F" + str(captureNum) + ".jpg", img)
        captureNum += 1

# End capturing
del cam  # Should release the camera / file
if writing:
    del writer

# Destroy the window
cv.DestroyWindow("main")
A tour of CV algorithms
1. Noise reduction
   a. Blurring
   b. Thresholding
   c. Erode / Dilate
2. Edge / shape detection
   a. (Hu) moments
3. Histograms
   a. Back-projection
4. Background-subtraction
1a. Noise Reduction (Blur)
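from cv2 import cv

# Load the image as a single-channel (grayscale) image and show it
im = cv.LoadImage("apple.jpg", cv.CV_LOAD_IMAGE_GRAYSCALE)
cv.NamedWindow("orig")
cv.ShowImage("orig", im)
print dir(im)

# Gaussian blur with a 9x9 kernel, written into a new image of the same size
blurred = cv.CreateImage((im.width, im.height), im.depth, 1)
cv.Smooth(im, blurred, cv.CV_GAUSSIAN, 9, 9)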
1b. Noise Reduction (Threshold)
# Pixels brighter than 128 become 255 (white); all others become 0 (black)
edImg = cv.CreateImage((im.width, im.height), im.depth, 1)
cv.Threshold(blurred, edImg, 128, 255.0, cv.CV_THRESH_BINARY)
1c. Noise Reduction (Erode / Dilate)
# 20 iterations each, using the default 3x3 structuring element (None)
cv.Erode(edImg, edImg, None, 20)
cv.Dilate(edImg, edImg, None, 20)
[Figures: just erode; just dilate; erode, then dilate]
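To make the erode-then-dilate idea concrete, here is a minimal pure-Python sketch (no OpenCV) of binary erosion and dilation with a 3x3 square window. The helper names and the tiny test image are made up for illustration, and ignoring pixels outside the image is an assumption; OpenCV's own border handling may differ.

```python
# Images are lists of rows holding 0 (black) or 1 (white).

def _neighbourhood(img, x, y):
    """Values in the 3x3 window centred on (x, y), clipped to the image."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]

def erode(img):
    """A pixel stays white only if every pixel in its window is white."""
    return [[min(_neighbourhood(img, x, y)) for x in range(len(img[0]))]
            for y in range(len(img))]

def dilate(img):
    """A pixel becomes white if any pixel in its window is white."""
    return [[max(_neighbourhood(img, x, y)) for x in range(len(img[0]))]
            for y in range(len(img))]

noisy = [
    [1, 0, 0, 0, 0, 0, 0],   # isolated white speck at (0, 0)
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],   # a 4x4 white block: the "real" shape
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Erosion deletes the speck and shrinks the block; dilation grows the block back.
opened = dilate(erode(noisy))
```

In this example the speck disappears while the block returns to its original size, which is why the pair is useful for noise removal.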
2a. Moments
• An easy way to analyze a shape
– Assumptions (here):
  • Binary image (0=black, 1=white)
  • Mainly one shape: the white part (pass CV_THRESH_BINARY_INV instead of CV_THRESH_BINARY to Threshold)
– Notation:
  • I(x, y): intensity of pixel (x, y) (a 0 or 1)
2a. Moments, cont.
• Note:
– $M_{00}$ is the number of pixels (area)
– Centroid is $(M_{10}/M_{00},\ M_{01}/M_{00})$
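As a sketch of the two notes above, the following pure-Python snippet uses the standard raw-moment definition $M_{pq} = \sum_x \sum_y x^p y^q I(x, y)$ on a tiny binary image. The function name and the sample image are illustrative and do not come from the slides.

```python
def raw_moment(img, p, q):
    """M_pq: sum over all pixels of x**p * y**q * I(x, y)."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(img)
               for x, value in enumerate(row))

# A 3x2 white rectangle on a black background (1 = white, 0 = black).
shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

m00 = raw_moment(shape, 0, 0)               # area: number of white pixels
cx = raw_moment(shape, 1, 0) / float(m00)   # centroid x = M10 / M00
cy = raw_moment(shape, 0, 1) / float(m00)   # centroid y = M01 / M00
```

Here the area is 6 pixels and the centroid lands at (2.0, 1.5), the middle of the rectangle.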
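2a. Hu Moments
• Invariant (mostly) to
– Scale
– Rotation
– Reflection
  • the seventh changes sign under reflection
• Defined in terms of the normalized central moments $\eta_{pq}$ (moments taken about the centroid and normalized for scale):
• $Hu_1 = \eta_{20} + \eta_{02}$
• $Hu_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$
• $Hu_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$
• …
• $Hu_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$
• If you were to treat this as a Vector7, you could compare it to a database of other Vector7's to do simple shape-matching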
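The shape-matching idea can be sketched as a nearest-neighbour lookup over 7-element vectors. This is only an illustration: the function names, the database entries, and their values are hypothetical, and a plain sum of squared differences is an assumed distance measure.

```python
def squared_distance(a, b):
    """Sum of squared element-wise differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_shape(query, database):
    """Return the name of the database entry nearest to the query vector."""
    return min(database, key=lambda name: squared_distance(query, database[name]))

# Hypothetical Hu-moment vectors (values invented for this example).
database = {
    "circle": [0.16, 0.0001, 0.0, 0.0, 0.0, 0.0, 0.0],
    "bar":    [0.36, 0.0570, 0.017, 0.003, 0.0, 0.0, 0.0],
}
query = [0.17, 0.0003, 0.0, 0.0, 0.0, 0.0, 0.0]

best = closest_shape(query, database)
```

Because the seven values usually span many orders of magnitude, the first one or two tend to dominate a plain squared distance. Comparing log-scaled values is a common remedy if the later moments should carry more weight.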
2a. Hu Moments, cont.
cv.Threshold(blurred, edImg, 128, 255.0, cv.CV_THRESH_BINARY_INV)
cv.Dilate(edImg, edImg, None, 10)
cv.Erode(edImg, edImg, None, 10)
edMat = cv.GetMat(edImg)
moments = cv.Moments(edMat, 1)
hu = cv.GetHuMoments(moments)

Hu moments of the first shape:
• 0.1730651754
• 0.0002714368
• 0.0000133760
• 0.0000289668
• 0.00000000056837
• -0.0000004641061
• 0.00000000004541

Hu moments of a second shape (diff2 = squared difference from the first shape's value):
• 0.1606958551 (diff2 = 0.000153)
• 0.0000105604 (diff2 = 0.00000731)
• 0.0000937233 (diff2 = 6.455e-9)
• 0.0000001183 (diff2 = 8.322e-10)
• 0.000000000000341 (diff2 = 3.227e-19)
• 0.000000000269809 (diff2 = 2.15e-13)
• -0.000000000000197 (diff2 = 2e-21)
• Total = 0.0016

Hu moments of a third shape:
• 0.3608422951 (diff2 = 0.0353)
• 0.0568850707 (diff2 = 0.003)
• 0.0170774107 (diff2 = 0.0003)
• 0.0026356011 (diff2 = 6.8e-6)
• -0.00001702199712 (diff2 = 2.9e-7)
• -0.00062800188755 (diff2 = 3.94e-7)
• 0.000004785759613 (diff2 = 2.3e-11)
• Total = 0.0385 (~30x "farther")
3. Histograms
• Basically, an n-dimensional plot
• Bins (buckets)
• Examples:
– 1D: In a grayscale image, number of pixels in a bin (0-5 intensity, 5-10 intensity, …, 250-255 intensity)
– 2D: Hue-Saturation graph (x-axis = hue, y-axis = saturation)
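To show what "bins" means in the 1D case, here is a small pure-Python sketch that counts grayscale values into equal-width bins. The function name, the half-open range [0, 256), and the sample pixel values are assumptions made for the example.

```python
def histogram_1d(pixels, num_bins, lo=0, hi=256):
    """Count how many values fall into each of num_bins equal-width bins over [lo, hi)."""
    counts = [0] * num_bins
    width = (hi - lo) / float(num_bins)
    for v in pixels:
        if lo <= v < hi:
            # Clamp guards against floating-point rounding at the top edge.
            index = min(int((v - lo) / width), num_bins - 1)
            counts[index] += 1
    return counts

# Eight grayscale pixel values, sorted into 4 bins of width 64.
pixels = [0, 3, 64, 70, 128, 200, 255, 255]
hist = histogram_1d(pixels, 4)
```

With 4 bins, the values 0 and 3 share the first bin, 64 and 70 the second, 128 sits alone in the third, and 200, 255, 255 fill the last.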
3. Histograms, cont.
3. Histograms (creating, 1D)
• Assumption: img is a grayscale image (1 channel)

# Create the histogram
num_bins = 25
hist = cv.CreateHist([num_bins,], cv.CV_HIST_ARRAY, [[0,255],], 1)
cv.CalcHist((img,), hist)

# Create the image to visualize it (optional)
scale = 15  # Size of each "bar" in the plot
hist_img = cv.CreateImage((num_bins * scale, 256), 8, 3)
(_, max_val, _, _) = cv.GetMinMaxHistValue(hist)
# Fill the background, then draw one bar per bin
cv.Rectangle(hist_img, (0,0), (num_bins * scale - 1, 255), cv.RGB(0,255,0), cv.CV_FILLED)
for b in range(num_bins):
    # Bar height scaled to 0..255; pixel coordinates must be ints
    val = int(255.0 * cv.QueryHistValue_1D(hist, b) / max_val)
    cv.Rectangle(hist_img, (b*scale, 255), ((b+1)*scale-1, 255-val), cv.RGB(255,0,0), cv.CV_FILLED)
3. Histograms (creating, 2D)
• Assumption:
– img is an RGB image
– mask is a gray image (black = don't count, white = do)

# Convert from RGB to HSV (storing the hue and sat in different (grayscale) images)
hsv = cv.CreateImage(cv.GetSize(img), 8, 3)
cv.CvtColor(img, hsv, cv.CV_BGR2HSV)
h_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
s_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
cv.Split(hsv, h_plane, s_plane, None, None)
# CalcHist expects images, so convert CvMat => IplImage
planes = [cv.GetImage(h_plane), cv.GetImage(s_plane)]

# Create the histogram
hue_bins = 30
sat_bins = 32
hist_size = [hue_bins, sat_bins]
h_ranges = [0, 180]   # Red is ~0 degrees
s_ranges = [15, 255]  # 0=grayscale, 255=pure color
ranges = [h_ranges, s_ranges]
hist = cv.CreateHist(hist_size, cv.CV_HIST_ARRAY, ranges, 1)
cv.CalcHist(planes, hist, 0, mask)
3a. Back-projection
• A histogram tells us how often a value (color) appears in the image the histogram was built from.
– "Fuller" bin = more prevalent
– "Emptier" bin = less prevalent
• Back-projection example:
– Create an image with flesh tones.
– Create a histogram from it
  • A hue-saturation histogram is generally insensitive to differences in skin tone between people
– Now, given a new color, we can determine how likely it is to be flesh-toned by looking up that spot in the histogram.
  • If it's a full bin, it's probably a flesh-tone
  • If it's an empty bin, it's probably not a flesh-tone
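The lookup described above can be sketched in pure Python. This is an illustration rather than OpenCV's implementation: the function names, the bin counts, the hue range [0, 180) and saturation range [0, 256), and the sample pixel values are all assumptions.

```python
def build_hs_histogram(samples, h_bins, s_bins):
    """Build a 2D hue-saturation histogram, scaled so the fullest bin is 1.0.

    samples: (hue, sat) integer pairs, hue in [0, 180), sat in [0, 256).
    """
    hist = [[0.0] * s_bins for _ in range(h_bins)]
    for h, s in samples:
        hist[h * h_bins // 180][s * s_bins // 256] += 1.0
    peak = max(max(row) for row in hist)
    if peak > 0:
        hist = [[v / peak for v in row] for row in hist]
    return hist

def back_project(image_hs, hist, h_bins, s_bins):
    """Replace every (hue, sat) pixel with the value of the bin it falls into."""
    return [[hist[h * h_bins // 180][s * s_bins // 256] for (h, s) in row]
            for row in image_hs]

# Hypothetical "flesh tone" training pixels, plus one stray bluish pixel.
samples = [(10, 150), (12, 160), (11, 140), (100, 200)]
hist = build_hs_histogram(samples, 18, 8)

# A 1x2 test image: one pixel close to the samples, one far from all of them.
probability = back_project([[(10, 145), (90, 30)]], hist, 18, 8)
```

The first test pixel falls into the fullest bin and scores 1.0; the second falls into an empty bin and scores 0.0.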
3a. Back-projection
# Convert img (RGB) to HS(V)
hsv = cv.CreateImage(cv.GetSize(img), 8, 3)
cv.CvtColor(img, hsv, cv.CV_BGR2HSV)

# Get images for the hue and sat "planes" of hsv
h_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
s_plane = cv.CreateMat(img.height, img.width, cv.CV_8UC1)
cv.Split(hsv, h_plane, s_plane, None, None)
h_plane = cv.GetImage(h_plane)  # CvMat => CvImg
s_plane = cv.GetImage(s_plane)
hsPlanes = [h_plane, s_plane]

# Do the back-projection. Note: hist was created on a
# previous slide
backPropImg = cv.CreateImage((img.width, img.height), 8, 1)
cv.CalcBackProject(hsPlanes, backPropImg, hist)
3a. Patch-based Back-projection
• Similar to regular back-projection, but uses a (w x h) "window" (I used 5 for each)
• The window "slides" over each pixel in the image. Let's say it's at pixel (i, j)
– Look in the 11x11 neighborhood of (i, j)
– Calculate a new histogram
– Compare it to a reference histogram. The degree of similarity is the value to set (i, j) to in the "probability" image
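The sliding-window procedure above can be sketched in pure Python. To keep it short, this version assumes a single-channel image, anchors each window at its top-left corner, and uses Pearson correlation as the similarity measure. All names and sample values are illustrative, and OpenCV's own implementation may differ in these details.

```python
def patch_histogram(img, x0, y0, w, h, bins, max_value=256):
    """Histogram of the w x h patch whose top-left corner is (x0, y0)."""
    counts = [0.0] * bins
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            counts[img[y][x] * bins // max_value] += 1.0
    return counts

def correlation(a, b):
    """Pearson correlation between two histograms; 0.0 if either is flat."""
    n = float(len(a))
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (sum((x - mean_a) ** 2 for x in a) *
           sum((y - mean_b) ** 2 for y in b)) ** 0.5
    return num / den if den > 0 else 0.0

def back_project_patch(img, ref_hist, w, h, bins):
    """Slide the window over the image; each output value is the similarity score."""
    rows, cols = len(img), len(img[0])
    return [[correlation(patch_histogram(img, x, y, w, h, bins), ref_hist)
             for x in range(cols - w + 1)]
            for y in range(rows - h + 1)]

# Left half dark, right half bright; the reference histogram is "all bright".
img = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
result = back_project_patch(img, [0.0, 4.0], 2, 2, 2)
```

The result has (width - w + 1) columns and (height - h + 1) rows, the same output size used when allocating the destination image for cv.CalcBackProjectPatch.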
3a. Patch-based Back-projection
# hist, and hsPlanes are computed as before.
patchW = patchH = 5
# The result is smaller than the input: one value per window position
backPropImg = cv.CreateImage((img.width - patchW + 1, img.height - patchH + 1), cv.IPL_DEPTH_32F, 1)
cv.CalcBackProjectPatch(hsPlanes, backPropImg, (patchW, patchH), hist, cv.CV_COMP_CORREL, 1)
4. Background-Subtraction
• Goal:
– Mark "non-background" pixels in a mask (1=non-background, 0=background)
– Analyze the shape of the non-background pixels.
4. Background Subtraction
• Naïve Approach:
cv.AbsDiff(curFrame, bgOnlyFrame, diffImg)
# Maybe a threshold now, erosion, dilate, etc.
• Problems:
– A lot of frame-to-frame "noise"
– Webcam auto-adjusting intensity (@#$! Logitechs)
– Clouds passing by, trees waving in wind, …
• A better approach… [see example04]
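For reference, the naïve approach amounts to a per-pixel absolute difference followed by a threshold. The pure-Python sketch below shows that on small grayscale frames; the function name, threshold value, and pixel data are invented for the example, and it does not represent the better approach from example04.

```python
def foreground_mask(cur_frame, bg_frame, threshold):
    """Mark a pixel 1 (non-background) if it differs from the background by more than threshold."""
    return [[1 if abs(c - b) > threshold else 0
             for c, b in zip(cur_row, bg_row)]
            for cur_row, bg_row in zip(cur_frame, bg_frame)]

# Background-only frame and a current frame with two changed pixels plus slight noise.
bg_only = [
    [50, 50, 50],
    [50, 50, 50],
]
current = [
    [52, 49, 200],
    [50, 180, 51],
]

mask = foreground_mask(current, bg_only, 30)
```

Small frame-to-frame changes (52 vs 50) stay below the threshold, while the two large changes are marked as non-background. A global brightness shift, such as a webcam auto-adjusting, would push every pixel over the threshold, which is one of the problems listed above.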