Transcript
Page 1: Monitoring%Illegal%Fishing%through%Image%Classificaon%cs229.stanford.edu/proj2016/poster/GeislerFrostMahajan-Monitoring... · Nature Conservancy, an NGO, monitors illegal fishing

Modeling

SIFT  

Fisher  Encoding  (FE)  (GMM  Model  on  SIFT  to  construct  

a  visual  dic:onary)  

Linear  SVM  

Bag  of  Visual  Words  (K-­‐Means  on  SIFT)  

Linear  SVM  

CIFA

R-­‐10  

CNN  

VGG-­‐16   Random  Forest  (15  trees)  

SVM  (RBF,  linear)  

Monitoring  Illegal  Fishing  through  Image  Classifica:on  Jason  Frost  

[email protected]  Taylor  Geisler  

[email protected]  Antariksh  Mahajan  [email protected]  

ProblemNature Conservancy, an NGO, monitors illegal fishing by installing cameras on fishing boats. Image classification using computers can help them to speed up and automate the process. We built a multiclass classification system for 8 different species.

Feature Extraction

1.  Scale-Invariant Feature Transform (SIFT)•  Identifies keypoints (shapes with

directional orientations)•  Invariant to rotations and translations•  Dense SIFT (dSIFT) runs SIFT on a grid

2.  VGG-16 Convolutional Neural Network•  16-layer CNN, has won image recognition

competitions by ImageNet•  Pre-trained weights, outputs 1000 features

3.  CIFAR-10 Convolutional Neural Network•  10 convolutional layers•  Trained our own weights

Data•  Original dataset: 3777 images, 8 labels•  Data preprocessing and augmentation: •  Feature-wise centering set input mean to 0•  Random rotations, shifts, flips increased

dataset size to >20,000 images•  Rescaled to 224x224 for easier processing

•  Challenge: fish are tiny part of most images

Label: ALB BET

Baseline models•  Naïve Bayes (by occurrences of colors): 8%•  1-nearest neighbor: 97%!•  But this is because similar boats catch

similar fish. Need more complex model for greater generalizability.

Training image with SIFT keypoints

Image  2(conv-­‐64),  maxpool,  ReLU  

2(conv-­‐128),  maxpool,  ReLU  

3(conv-­‐256),  maxpool,  ReLU  

6(conv-­‐512),  maxpool,  ReLU  

2(FC-­‐4096),  1(FC-­‐1000)  

ResultsExtrac'on  Method   Model   Training  Acc.   Test  Acc.  

VGG-­‐16  RF   99.5%   83.4%  

SVM-­‐RBF   44.3%   45.6%  SVM-­‐Linear   54.1%   53.3%  

-­‐   CIFAR-­‐10   95.6%   77.3%  

Conclusions•  VGG-16 to RF performed best•  However, difficult to extract most important

features from VGG-16•  Better and more generalizable results can

be obtained by extracting fish from images, possibly using another CNN

Top Related