crimes data in bogota - repository.ucatolica.edu.co · PowerPoint presentation author: sala1...
TRANSCRIPT
• Problem Statement
• Justification
• Objectives
• Methodology
• Results
• Discussion
• Conclusion
• Future work
• References
• Crimes Data in Bogota
Theft reports in the first half of 2019: 17% increase.
Figure [1]. Insecurity.
Figure [2]. Increase in theft reports.
[1]. Unión digital. Public insecurity. [online]. Available in: https://launiondigital.com.ar/noticias/195103-inseguridad-publica-recrudecen-robos-armas-y-arrebatos-violentos
[2]. La FM. Bogota, increasingly insecure?. [online]. Available in: https://www.lafm.com.co/bogota/revelan-preocupantes-cifras-de-hurtos-en-bogota
[3]. Policía nacional de Colombia. Confiscated weapons in 2018. [online]. Available in: https://www.policia.gov.co/contenido/incautacion-armas-fuego-2018-1
Figure [3]. Report of confiscated weapons: Confiscated Weapons in 2018.
Categories (in slide order): Shotgun; Rifle; Pistol; Revolver.
Values (in slide order): 5,572; 142; 4,306; 7,214.
• Incursions
Figure [4]. Police Monitoring Centers
Figure [6]. Machine Learning algorithms
Figure [5]. Problems of monitoring
[4]. Ministry of interior. Citizen security projects implemented between the National Government. [online]. Available in: https://www.mininterior.gov.co/sala-de-prensa/noticias/desde-que-se-instalaron-las-camaras-criminalidad-disminuyo-en-un-75-por-ciento-en-itagui
[5]. Flaticon. Symbols. [online]. Available in: https://www.flaticon.com/
[7]. IBM. Deep learning performance breakthrough. [online]. Available in: https://www.ibm.com/blogs/systems/deep-learning-performance-breakthrough/
Figure [7]. Performance in deep learning
According to the problem statement, the research question is:
• How can the monitoring and detection of handguns be supported by
implementing an automatic recognition algorithm for videos
using deep learning?
Effective monitoring requires availability and efficiency. Deep learning
is a subset of Artificial Intelligence whose algorithms have achieved the
best performance in object detection tasks in recent years, and it offers
greater productivity and efficiency by improving response times [8].
Figure [8]. Machine Learning Vs. Deep Learning
[8]. Medium. Why Deep Learning over Traditional Machine Learning? [online]. Available in: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063
General objective
To implement a method, using deep learning algorithms, for the automatic detection of
handguns in videos.
Specific objectives
- To build a dataset of videos that include handguns, collected from online
platforms (YouTube), to model the algorithm for automatic detection of handguns
in videos.
- To design a methodological strategy based on the processes used in the state of
the art for the automatic detection of handguns in video using deep learning.
- To implement a convolutional neural network algorithm (Faster-RCNN),
taking an existing architecture and adapting it to the experimental model for the
automatic detection of handguns in video.
- To evaluate the performance of the detection algorithm, through a set of metrics, to
verify its efficiency.
Figure [9]. Methodology Workflow. Stages shown: Set of Videos; Extracting frames; Resizing frame; Labeled Frames (frame and XML); Simple sampling (training/testing splits of 70/30, 75/25 and 80/20); Inception v2 and Resnet101 (feature map); Average Precision and Average Recall.
• Collecting videos from online platforms
Figure [10]. Video of pistol
Table [1]. Set of Videos
Videos with Revolvers                24
Videos with Pistols                  24
Videos with Revolvers and Pistols     7
Total                                55
• Selection and Extraction of Frames
Figure [11]. Frames selection
Table [2]. Frames Extraction
Rate of Extraction   Total Number of Frames   Selected Frames
1 Second             8,130                    2,590
0.5 Second           16,271                   9,524
0.02 Second          244,045                  -
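The extraction rates in Table [2] can be illustrated with a minimal sketch that chooses which frame indices to keep for a given sampling interval. The helper name `frame_indices`, the constant frame rate, and the example numbers are assumptions for illustration; the slides do not say which tool was used to extract frames.

```python
def frame_indices(total_frames, fps, interval_s):
    """Return the indices of frames sampled every `interval_s` seconds.

    Hypothetical helper: assumes the video has a constant frame rate `fps`.
    """
    if total_frames < 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval_s must be > 0")
    indices = []
    k = 0
    while True:
        # Frame closest to the k-th sampling instant.
        idx = int(round(k * interval_s * fps))
        if idx >= total_frames:
            break
        # Skip repeats when the interval is shorter than one frame.
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices


# Example: a 5-second clip at 2 frames per second (made-up numbers).
every_second = frame_indices(10, fps=2, interval_s=1.0)
every_half_second = frame_indices(10, fps=2, interval_s=0.5)
```

Halving the interval roughly doubles the number of extracted frames, which matches the trend between the 1-second and 0.5-second rows of the table.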
Figure [14]. Labeled Frames and bounding box coordinates
• Resizing Frames
Figure [12]. Original Frames
Figure [13]. Resized Frames (dimensions shown: 800px and 600px)
• Generate coordinates in XML format
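As a rough illustration of storing bounding box coordinates in XML, the sketch below builds one annotation with Python's standard library. The Pascal VOC-style element names, the file name, the 800x600 orientation, and the box coordinates are all assumptions; the slides do not specify the XML schema or the labeling tool.

```python
import xml.etree.ElementTree as ET


def make_annotation(filename, width, height, boxes):
    """Build an XML annotation string for one frame.

    `boxes` is a list of (label, xmin, ymin, xmax, ymax) tuples.
    The layout is an assumed Pascal VOC-style structure.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical frame with one labeled pistol.
xml_text = make_annotation("frame_0001.jpg", 800, 600,
                           [("Pistola", 120, 200, 260, 310)])
```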
Table [3]. Simple random sampling
Training   Testing
70%        30%
75%        25%
80%        20%
Figure [15]. Sampling Workflow. Stages shown: Dataset; simple random sampling distribution; Training and Testing sets of frames; XML's to TFRecord.
Dataset      Type       70 - 30       75 - 25       80 - 20
1 Second     Training   1812 frames   1942 frames   2072 frames
1 Second     Testing    778 frames    648 frames    518 frames
0.5 Second   Training   6666 frames   7143 frames   7619 frames
0.5 Second   Testing    2858 frames   2381 frames   1905 frames
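The simple random sampling step can be sketched as a shuffle followed by a cut at the training fraction. The function name `split_frames`, the fixed seed, and the use of integer frame ids are assumptions for illustration, not the authors' implementation.

```python
import random


def split_frames(frame_ids, train_fraction, seed=0):
    """Randomly split frame ids into (training, testing) lists.

    The training size is rounded down, so the remainder goes to testing.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(frame_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 2,590 selected frames (the 1-second dataset) with a 75% / 25% split.
train_ids, test_ids = split_frames(range(2590), 0.75)
```

With 2,590 frames and a 75% training fraction this gives 1942 training and 648 testing frames, the same counts listed for the 1-second dataset.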
• Configuring training
Figure [16]. Training Workflow. Stages shown: input files (label map, training path, testing path); edit configuration files (batch size, epochs, steps); train with CPU or GPU.
Configuration files:
Inception v2: faster_rcnn_inception_v2.config
Resnet101: faster_rcnn_resnet101.config
Label map:
item { id: 1 name: 'Pistola' }
item { id: 2 name: 'Revolver' }
• Faster-RCNN training workflow
Figure [17]. Faster-RCNN Training Workflow. Stages shown: Input Layer (RGB frame, 800px and 600px); Convolutional layers; Rectified Linear Unit (ReLU) Layer; Max Pooling Layer (2x2); Feature map.
Convolution of the input image f with the filter/kernel h:
$G[m,n] = (f * h)[m,n] = \sum_{j} \sum_{k} h[j,k] \, f[m-j,\, n-k]$
Input image (as shown on the slide):
0 1 0 1 0
1 1 0 0 0
0 1 0 1 0
0 1 0 1 1
1 0 1 0 1
Filter/Kernel (as shown on the slide):
-1 0
1 0
Feature map (as shown on the slide):
1 0 1 0
1 0 0 0
1 0 1 0
1 0 1 1
Rectified Linear Unit (ReLU):
$f(x) = \max(0, x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \geq 0 \end{cases}$
Feature map before ReLU:
1 0 1 -1
-1 0 0 0
1 0 -1 0
1 0 1 1
Feature map after ReLU:
1 0 1 0
0 0 0 0
1 0 0 0
1 0 1 1
Max Pooling 2x2 of the feature map after ReLU:
1 1
1 1
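The three layer operations on this slide (convolution, ReLU and 2x2 max pooling) can be sketched in plain Python. This is a minimal illustration on small lists of lists, not the network code used in the project; the function names are hypothetical, and the convolution is restricted to positions where the kernel fits entirely inside the input.

```python
def conv2d_valid(f, h):
    """G[m][n] = sum_j sum_k h[j][k] * f[m-j][n-k], where the kernel fits."""
    fh, fw = len(f), len(f[0])
    kh, kw = len(h), len(h[0])
    return [
        [
            sum(h[j][k] * f[m - j][n - k] for j in range(kh) for k in range(kw))
            for n in range(kw - 1, fw)
        ]
        for m in range(kh - 1, fh)
    ]


def relu(feature_map):
    """Apply f(x) = max(0, x) to every element."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, cols - size + 1, size)
        ]
        for i in range(0, rows - size + 1, size)
    ]


# Feature map before ReLU, copied from the slide.
before_relu = [
    [1, 0, 1, -1],
    [-1, 0, 0, 0],
    [1, 0, -1, 0],
    [1, 0, 1, 1],
]
after_relu = relu(before_relu)
pooled = max_pool(after_relu, size=2)
```

Running ReLU and then 2x2 max pooling on the slide's feature map reproduces the slide's post-ReLU matrix and its 2x2 pooled output.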
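• Region Proposal Network (RPN)
Figure [18]. Region Proposal Network Architecture
• Loss Function
$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_{i} L_{cls}(p_i, p_i^{*}) + \lambda \frac{1}{N_{reg}} \sum_{i} p_i^{*} \, L_{reg}(t_i, t_i^{*})$
• Softmax function
$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$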
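The softmax function and the two-term RPN loss can be sketched as follows. The slide does not define the individual terms, so this sketch assumes that L_cls is the binary log loss and L_reg is the smooth L1 loss, and it treats lambda, N_cls and N_reg as plain parameters with illustrative defaults. The function names are hypothetical.

```python
import math


def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_l1(x):
    """Smooth L1 penalty (assumed form of L_reg, applied per coordinate)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(p, p_star, t, t_star, lam=1.0, n_cls=None, n_reg=None):
    """Classification term plus lambda-weighted regression term.

    p: predicted objectness probabilities; p_star: 0/1 ground-truth labels;
    t, t_star: predicted and target box parameters (4 values per anchor).
    Defaults for lam, n_cls and n_reg are illustrative assumptions.
    """
    n_cls = n_cls if n_cls is not None else len(p)
    n_reg = n_reg if n_reg is not None else len(p)
    eps = 1e-12  # avoids log(0)
    cls_term = 0.0
    reg_term = 0.0
    for p_i, ps_i, t_i, ts_i in zip(p, p_star, t, t_star):
        p_i = min(max(p_i, eps), 1.0 - eps)
        cls_term += -(ps_i * math.log(p_i) + (1 - ps_i) * math.log(1.0 - p_i))
        # The regression term only counts for positive anchors (p_star = 1).
        reg_term += ps_i * sum(smooth_l1(a - b) for a, b in zip(t_i, ts_i))
    return cls_term / n_cls + lam * reg_term / n_reg


probs = softmax([1.0, 2.0, 3.0])
loss = rpn_loss([0.5], [1], [[2, 0, 0, 0]], [[0, 0, 0, 0]])
```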
• Regions of Interest
Feature map:
5 8 3 0
1 6 4 9
5 0 1 3
1 4 3 1
Matrix Nx5 of regions: index and coordinates (Pixel X, Pixel Y, Height and Width).
Example of region proposals: 1 2 4 2 4
• RoI Pooling
• Regression Values
Figure [19]. Classification and Detection
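RoI pooling can be sketched as max pooling a rectangular region of the feature map into a fixed-size grid. The sketch below uses the 4x4 feature map from the slide; the function name, the (x, y, width, height) argument order, the zero-based coordinates and the 2x2 output size are assumptions for illustration.

```python
import math


def roi_max_pool(feature_map, x, y, width, height, out_h=2, out_w=2):
    """Max-pool the region starting at column x, row y into out_h x out_w bins."""
    pooled = []
    for i in range(out_h):
        # Row range of bin i inside the region (bins may overlap by rounding).
        r0 = y + (i * height) // out_h
        r1 = y + math.ceil((i + 1) * height / out_h)
        row = []
        for j in range(out_w):
            c0 = x + (j * width) // out_w
            c1 = x + math.ceil((j + 1) * width / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Feature map copied from the slide.
feature_map = [
    [5, 8, 3, 0],
    [1, 6, 4, 9],
    [5, 0, 1, 3],
    [1, 4, 3, 1],
]
whole_map = roi_max_pool(feature_map, x=0, y=0, width=4, height=4)
```

Regions of different sizes all map to the same output shape, which is what lets the later classification and regression layers take a fixed-size input.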
$Precision = \frac{True\ Positive}{True\ Positive + False\ Positive}$
$Recall = \frac{True\ Positive}{True\ Positive + False\ Negative}$
Figure [20]. Intersection over union
Table [5]. Confusion matrix
Figure [22]. Precision
Figure [23]. Recall
Figure [21]. Intersection over union values
[21]. Gitbooks. Object Localization and Detection [online]. Available in: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html
AP / AR
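The metrics on this slide can be sketched in a few lines. The box format (xmin, ymin, xmax, ymax), the function names and the example boxes are assumptions for illustration; the counts of true positives, false positives and false negatives are made-up numbers.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(tp, fp):
    """True Positive / (True Positive + False Positive)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """True Positive / (True Positive + False Negative)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def is_true_positive(predicted, labeled, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(predicted, labeled) >= threshold


# Hypothetical predicted box and labeled box.
overlap = iou((10, 10, 110, 110), (30, 10, 130, 110))
```

In this example the IoU is 8000 / 12000, about 0.67, so the same detection counts as a true positive at a threshold of 0.5 but not at 0.75.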
• Dataset
Figure [24]. Dataset results: dataset of videos; dataset of frames and XML.
Table [4]. Collected videos
Dataset                             Number of videos
Videos of Revolvers                 24
Videos of Pistols                   24
Videos with Revolvers and Pistols   7
Table [5]. Selected frames and XML's
Dataset      Total Number of Frames   Selected Frames   Number of XML's
0.5 Second   16,271                   9,524             9,524
1 Second     8,130                    2,590             2,590
Figure [25]. Methodology results. Stages shown: Set of Videos; Extracting frames; Resizing frame (800px and 600px); Labeled Frames (frame and XML); Simple sampling; Faster RCNN ResNet101 applied to video frames.
Average Precision   Average Recall
0.648261            0.673748
Table [6]. Precision and Recall of the best model.
Sampling    Dataset      Faster RCNN    Average Precision   Precision IoU=0.5   Precision IoU=0.75   Average Recall   Recall AR@10   Recall AR@100
70% - 30%   0.5 second   Inception v2   0.615094            0.970359            0.692229             0.644959         0.698864       0.701883
70% - 30%   0.5 second   Resnet 101     0.627137            0.974953            0.712473             0.648633         0.705599       0.706734
70% - 30%   1 second     Inception v2   0.612784            0.974205            0.699056             0.638394         0.704711       0.707376
70% - 30%   1 second     Resnet 101     0.646496            0.983529            0.771682             0.652957         0.713456       0.714902
75% - 25%   0.5 second   Inception v2   0.618180            0.970136            0.693906             0.646152         0.700650       0.702060
75% - 25%   0.5 second   Resnet 101     0.631681            0.974111            0.718193             0.653793         0.705945       0.707592
75% - 25%   1 second     Inception v2   0.630636            0.980002            0.745703             0.646902         0.714352       0.717672
75% - 25%   1 second     Resnet 101     0.639993            0.981227            0.762834             0.654614         0.720448       0.721750
80% - 20%   0.5 second   Inception v2   0.599535            0.965547            0.684900             0.624573         0.681491       0.683900
80% - 20%   0.5 second   Resnet 101     0.633470            0.974359            0.729347             0.648402         0.705234       0.706176
80% - 20%   1 second     Inception v2   0.622743            0.981432            0.701254             0.652287         0.708068       0.709454
80% - 20%   1 second     Resnet 101     0.648261            0.987471            0.780117             0.673748         0.724240       0.724409
• Table 6 shows the performance metrics of the handgun detection models
trained in this project, using three training samplings (80%, 75% and 70%)
and two datasets (frames extracted every 1 second and every 0.5 second).
• The highest Average Precision and Average Recall come from the Faster RCNN
Resnet 101 network with the 1-second dataset: 0.6482 and 0.6737
respectively.
• Precision is higher at IoU = 0.5, where it reaches about 0.98, because a
detection only needs an IoU of at least 0.5 with the labeled box to count
as correct. At IoU = 0.75 or greater the margin of error is smaller, so it
is more difficult for a detection to count as correct.
• This model was also tested separately on videos that were not part of the
dataset, and it produced many false positives on objects similar to a handgun.
• Of the models implemented in this automatic handgun detection experiment, Faster RCNN
Resnet 101 had the highest Precision and the highest Recall, which is attributed to
its robust architecture.
• Of the datasets created, the one with the best metric values was the dataset of 1-
second frames with 80% / 20% sampling, which has a greater variety of frames
collected from movies and YouTube.
• The methodological strategy gave greater clarity about the methods and phases
that make up a deep learning algorithm, with Faster RCNN as the architecture
supporting the automatic detection of revolvers and pistols in the experiment.
• Developing the algorithm made it evident that a greater variety of videos is
needed, showing handguns from more angles and weapons that occupy a small
area.
• Based on average Recall and average Precision, the algorithm has an acceptable rate
for the detection of handguns, but it sometimes produces many false positives (it predicts
that an object is a weapon when it really isn't).
• To implement a method of automatic detection in real time and deploy
it in areas of insecurity in Bogotá, using security camera videos as the
dataset.
• To build a larger dataset with a large number of handgun frames taken at
different angles to improve the performance of the algorithm; the dataset
must also be balanced across classes and include handguns with
small, medium and large areas.
• To work with a higher-capacity GPU to reduce times; training generally
takes several hours, and this can also help to reduce test times.
• To implement a Mask RCNN for detection of handguns in video and to
compare its performance metrics with those of the Faster RCNN implemented
in this experiment.
Figure [26]. Frames test 1
Figure [1]. Unión digital. Public insecurity. [online]. Available in: https://launiondigital.com.ar/noticias/195103-inseguridad-publica-recrudecen-robos-armas-y-arrebatos-violentos
Figure [2]. La FM. Bogota, increasingly insecure?. [online]. Available in: https://www.lafm.com.co/bogota/revelan-preocupantes-cifras-de-hurtos-en-bogota
Figure [3]. Policía nacional de Colombia. Confiscated weapons in 2018. [online]. Available in: https://www.policia.gov.co/contenido/incautacion-armas-fuego-2018-1
Figure [4]. Ministry of interior. Citizen security projects implemented between the National Government. [online]. Available in: https://www.mininterior.gov.co/sala-de-prensa/noticias/desde-que-se-instalaron-las-camaras-criminalidad-disminuyo-en-un-75-por-ciento-en-itagui
Figure [5]. Flaticon. Symbols. [online]. Available in: https://www.flaticon.com/
Figure [7]. IBM. Deep learning performance breakthrough. [online]. Available in: https://www.ibm.com/blogs/systems/deep-learning-performance-breakthrough/
Figure [8]. Medium. Why Deep Learning over Traditional Machine Learning? [online]. Available in: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063