Monitoring Illegal Fishing through Image Classification (cs229.stanford.edu/proj2016/poster/GeislerFrostMahajan-Monitoring...)



Modeling

•  SIFT -> Fisher Encoding (FE) (GMM model on SIFT to construct a visual dictionary) -> Linear SVM
•  SIFT -> Bag of Visual Words (K-Means on SIFT) -> Linear SVM
•  CIFAR-10 CNN
•  VGG-16 -> Random Forest (15 trees)
•  VGG-16 -> SVM (RBF, linear)
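The Bag of Visual Words step listed above (K-Means on SIFT descriptors, then one histogram per image) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: random 128-dimensional vectors stand in for real SIFT descriptors, and the vocabulary size `k=16`, the helper names `kmeans` and `bovw_histogram`, and the iteration count are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Start from k distinct descriptors picked at random (fancy indexing copies them).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every descriptor to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, centers):
    """Encode one image as a normalized histogram of visual-word counts."""
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Stand-in data: random vectors in place of real 128-dimensional SIFT descriptors.
rng = np.random.default_rng(0)
all_descriptors = rng.random((500, 128))      # descriptors pooled over the training set
vocabulary = kmeans(all_descriptors, k=16)    # the "visual dictionary"
image_descriptors = rng.random((40, 128))     # descriptors from a single image
feature = bovw_histogram(image_descriptors, vocabulary)
```

Each image then becomes a fixed-length vector (`feature`), which is what makes it usable as input to a linear SVM regardless of how many keypoints the image produced.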

Monitoring Illegal Fishing through Image Classification

Jason Frost, Taylor Geisler, Antariksh Mahajan

Problem

The Nature Conservancy, an NGO, monitors illegal fishing by installing cameras on fishing boats. Automated image classification can help speed up and automate this monitoring. We built a multiclass classification system for 8 different species.

Feature Extraction

1.  Scale-Invariant Feature Transform (SIFT)
    •  Identifies keypoints (shapes with directional orientations)
    •  Invariant to rotations and translations
    •  Dense SIFT (dSIFT) runs SIFT on a grid

2.  VGG-16 Convolutional Neural Network
    •  16-layer CNN that has won ImageNet image recognition competitions
    •  Pre-trained weights, outputs 1000 features

3.  CIFAR-10 Convolutional Neural Network
    •  10 convolutional layers
    •  Trained our own weights

Data

•  Original dataset: 3777 images, 8 labels
•  Data preprocessing and augmentation:
    •  Feature-wise centering set input mean to 0
    •  Random rotations, shifts, flips increased dataset size to >20,000 images
    •  Rescaled to 224x224 for easier processing
•  Challenge: fish are a tiny part of most images
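The preprocessing steps listed above can be illustrated with a small sketch. It assumes that "feature-wise centering" means subtracting a per-channel mean computed over the whole dataset, uses random arrays as placeholders for real images, and covers only flips and shifts (rotations are omitted to keep the example dependency-free). The helper names, `max_shift=20`, and the five copies per image are illustrative choices, not values from the poster.

```python
import numpy as np

def featurewise_center(images):
    """Subtract the per-channel mean computed over the whole dataset."""
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    return images - mean

def random_flip_and_shift(image, rng, max_shift=20):
    """Return a randomly flipped and shifted copy of one (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; a real pipeline would likely pad instead.
    return np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(0)
images = rng.random((4, 224, 224, 3))  # placeholder for four 224x224 RGB images
centered = featurewise_center(images)

# Five augmented copies per image (an assumed factor).
augmented = np.stack(
    [random_flip_and_shift(img, rng) for img in centered for _ in range(5)]
)
```

For scale: keeping each original image plus five augmented copies would give 3777 × 6 = 22,662 images, which is consistent with the ">20,000" figure, although the poster does not state the actual augmentation factor.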

Figure: example images labeled ALB and BET

Baseline models

•  Naïve Bayes (by occurrences of colors): 8%
•  1-nearest neighbor: 97%! But this is because similar boats catch similar fish. A more complex model is needed for greater generalizability.
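A toy example may help show why a 1-nearest-neighbor baseline can score so highly without learning anything about fish. The data below is entirely made up: the first feature mimics "which boat" (the background), the second is fish-related noise, and each boat is assumed to catch only one species.

```python
import math

def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical training set: boat 0 only catches ALB, boat 1 only catches BET.
train = [
    ((0.0, 0.3), "ALB"),
    ((0.1, 0.7), "ALB"),
    ((5.0, 0.4), "BET"),
    ((5.1, 0.6), "BET"),
]

# Any frame from boat 0 lands next to boat 0's training frames,
# so the prediction is driven by the background rather than the fish.
prediction_boat_0 = nearest_neighbor_predict(train, (0.05, 0.6))
prediction_boat_1 = nearest_neighbor_predict(train, (5.05, 0.5))
```

In this setup the classifier is effectively recognizing boats. It would label a BET photographed on boat 0 as ALB, which is the generalization problem the baseline result hints at.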

Figure: training image with SIFT keypoints

Image
-> 2(conv-64), maxpool, ReLU
-> 2(conv-128), maxpool, ReLU
-> 3(conv-256), maxpool, ReLU
-> 6(conv-512), maxpool, ReLU
-> 2(FC-4096), 1(FC-1000)
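As a quick sanity check, the layer groups in the diagram above can be tallied. This assumes the diagram depicts the VGG-16 network described under Feature Extraction; the tuple layout below is just a convenient encoding for the example.

```python
# (number of layers, layer type, width) for each group in the diagram
groups = [
    (2, "conv", 64),
    (2, "conv", 128),
    (3, "conv", 256),
    (6, "conv", 512),
    (2, "fc", 4096),
    (1, "fc", 1000),
]

conv_layers = sum(n for n, kind, _ in groups if kind == "conv")
fc_layers = sum(n for n, kind, _ in groups if kind == "fc")
total_layers = conv_layers + fc_layers
output_features = groups[-1][2]  # width of the final fully connected layer

print(conv_layers, fc_layers, total_layers, output_features)  # 13 3 16 1000
```

The 13 convolutional and 3 fully connected layers add up to the 16 weight layers in the network's name, and the final FC-1000 layer accounts for the 1000 output features mentioned earlier.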

Results

Extraction Method   Model        Training Acc.   Test Acc.
VGG-16              RF           99.5%           83.4%
VGG-16              SVM-RBF      44.3%           45.6%
VGG-16              SVM-Linear   54.1%           53.3%
-                   CIFAR-10     95.6%           77.3%
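One way to read the table is to compare the gap between training and test accuracy for each model. The short sketch below only re-uses the numbers reported above; the dictionary layout and variable names are illustrative.

```python
# (extraction method, model) -> (training accuracy %, test accuracy %)
results = {
    ("VGG-16", "RF"): (99.5, 83.4),
    ("VGG-16", "SVM-RBF"): (44.3, 45.6),
    ("VGG-16", "SVM-Linear"): (54.1, 53.3),
    (None, "CIFAR-10"): (95.6, 77.3),
}

# Train-minus-test gap in percentage points, rounded to one decimal.
gaps = {key: round(train - test, 1) for key, (train, test) in results.items()}

# Model with the highest test accuracy.
best = max(results, key=lambda key: results[key][1])

for key, gap in gaps.items():
    print(key, gap)
print("best on test:", best)
```

The two strongest models on the test set (VGG-16 with RF, and the CIFAR-10 CNN) also show the largest train/test gaps, 16.1 and 18.3 points respectively, while the SVMs fit the training data poorly to begin with.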

Conclusions

•  VGG-16 to RF performed best
•  However, it is difficult to extract the most important features from VGG-16
•  Better and more generalizable results can be obtained by extracting fish from images, possibly using another CNN