crimes data in bogota - repository.ucatolica.edu.co · PowerPoint presentation author: sala1...

• Problem Statement

• Justification

• Objectives

• Methodology

• Results

• Discussion

• Conclusion

• Future work

• References

• Crimes Data in Bogota

Figure [1]. Insecurity.

Figure [2]. Increase in theft reports (17%, first half of 2019).

[1]. Unión digital. Public insecurity. [online]. Available at: https://launiondigital.com.ar/noticias/195103-inseguridad-publica-recrudecen-robos-armas-y-arrebatos-violentos

[2]. La FM. Bogota, increasingly insecure? [online]. Available at: https://www.lafm.com.co/bogota/revelan-preocupantes-cifras-de-hurtos-en-bogota

[3]. Policía nacional de Colombia. Confiscated weapons in 2018. [online]. Available at: https://www.policia.gov.co/contenido/incautacion-armas-fuego-2018-1

Figure [3]. Report of confiscated weapons.

Confiscated Weapons in 2018: Shotgun 5,572; Rifle 142; Pistol 4,306; Revolver 7,214.

• Incursions

Figure [4]. Police Monitoring Centers

Figure [6]. Machine Learning algorithms

Figure [5]. Problems of monitoring

[4]. Ministry of interior. Citizen security projects implemented between the National Government. [online]. Available at: https://www.mininterior.gov.co/sala-de-prensa/noticias/desde-que-se-instalaron-las-camaras-criminalidad-disminuyo-en-un-75-por-ciento-en-itagui

[5]. Flaticon. Symbols. [online]. Available at: https://www.flaticon.com/

[7]. IBM. Deep learning performance breakthrough. [online]. Available at: https://www.ibm.com/blogs/systems/deep-learning-performance-breakthrough/

Figure [7]. Performance in deep learning

Based on the problem statement, the research question is:

• How can the monitoring and detection of handguns be supported by an automatic deep learning recognition algorithm applied to videos?

Effective monitoring requires both availability and efficiency. Deep learning is a subset of artificial intelligence whose algorithms have achieved the best performance in object detection tasks in recent years. It also offers greater productivity and efficiency by improving response times [8].

Figure [8]. Machine Learning Vs. Deep Learning

[8]. Medium. Why Deep Learning over Traditional Machine Learning? [online]. Available at: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

General objective

To implement a method that uses deep learning algorithms to detect handguns in videos automatically.

Specific objectives

- To build a dataset of videos containing handguns, collected from online platforms (YouTube), and use it to train the algorithm for automatic detection of handguns in videos.

- To design a methodological strategy for the automatic detection of handguns in video using deep learning, based on the processes used in the state of the art.

- To implement a convolutional neural network algorithm (Faster R-CNN) by taking an existing architecture and adapting it to the experimental model for automatic detection of handguns in video.

- To evaluate the performance of the detection algorithm with a set of metrics in order to verify its efficiency.

Workflow stages: set of videos → extracting frames (video frame) → resizing frames → labeled frames and XML of frames → frame and XML → simple sampling (70/30, 75/25, 80/20) → feature map (Inception v2, Resnet101) → Average Precision and Average Recall.

Figure [9]. Methodology Workflow

• Collecting videos from online platforms

Figure [10]. Video of pistol

Videos with Revolvers 24

Videos with Pistols 24

Videos with Revolvers and Pistols 7

Total 55

Rate of extraction | Total number of frames | Selected frames
1 second | 8,130 | 2,590
0.5 second | 16,271 | 9,524
0.02 second | 244,045 | -

• Selection and Extraction of Frames

Figure [11]. Frames selection

Table [1]. Set of Videos

Table [2]. Frames Extraction
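Extracting one frame every 1 s or 0.5 s means keeping a subset of frame indices, and which indices depend on the video's frame rate. The following is a minimal sketch that assumes a constant frame rate. The helper `sample_frame_indices` is a hypothetical name, not the project's code. Reading the actual frames (for example with OpenCV's `VideoCapture`) is left out so the selection logic stays self-contained.

```python
def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> list[int]:
    """Indices of the frames kept when sampling one frame every `interval_s` seconds.

    Hypothetical helper; assumes the video has a constant frame rate `fps`.
    """
    if total_frames < 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval_s must be > 0")
    step = fps * interval_s  # number of frames between two consecutive samples
    indices: list[int] = []
    next_sample = 0.0  # position (in frames) of the next sample to take
    for i in range(total_frames):
        if i >= next_sample:
            indices.append(i)
            next_sample += step
    return indices


if __name__ == "__main__":
    # A 3-second clip at 30 fps has 90 frames.
    print(sample_frame_indices(90, 30, 1.0))  # [0, 30, 60]
    print(sample_frame_indices(90, 30, 0.5))  # [0, 15, 30, 45, 60, 75]
```

If the interval is shorter than one frame period (for example 0.02 s at 30 fps), the step falls below one frame, so every frame is kept.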

Figure [14]. Labeled Frames and bounding box coordinates

• Resizing Frames

Frame dimensions labeled in the figures: 600 px and 800 px.

Figure [12]. Original Frames

Figure [13]. Resized Frames

• Generate coordinates in XML format
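The slides do not show the XML schema. Labeling tools such as LabelImg write Pascal VOC-style XML, so this sketch assumes that layout. It uses the standard library's `xml.etree.ElementTree` to build one annotation and then reads the boxes back. On the way out, it normalizes the box coordinates to [0, 1], which is the form the TensorFlow Object Detection API stores in its TFRecord examples. The file name, the 800 × 600 frame size, and the box values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(filename, width, height, objects):
    """Build a Pascal VOC-style annotation (assumed format) for one labeled frame.

    `objects` is a list of (class_name, xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # RGB frames
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_normalized_boxes(xml_text):
    """Return (class_name, xmin, ymin, xmax, ymax) with coordinates scaled to [0, 1]."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            float(bb.findtext("xmin")) / width,
            float(bb.findtext("ymin")) / height,
            float(bb.findtext("xmax")) / width,
            float(bb.findtext("ymax")) / height,
        ))
    return boxes


xml_text = make_voc_annotation("frame_0001.jpg", 800, 600, [("Pistola", 200, 150, 400, 300)])
print(read_normalized_boxes(xml_text))  # [('Pistola', 0.25, 0.25, 0.5, 0.5)]
```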

Table [3]. Simple random sampling

Sampling distribution: simple random sampling splits the dataset (set of frames and their XMLs) into training and testing sets of 70%/30%, 75%/25% and 80%/20%. The XMLs of each set are then converted to TFRecord.

Dataset | Set | 70% - 30% | 75% - 25% | 80% - 20%
1 second | Training | 1,812 frames | 1,942 frames | 2,072 frames
1 second | Testing | 778 frames | 648 frames | 518 frames
0.5 second | Training | 6,666 frames | 7,143 frames | 7,619 frames
0.5 second | Testing | 2,858 frames | 2,381 frames | 1,905 frames

Figure [15]. Sampling Workflow
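Simple random sampling without replacement can be sketched with the standard library's `random` module. The fixed seed and the floor rounding of the training-set size are assumptions; the slides do not state how the sizes were rounded. The helper `simple_random_split` is a hypothetical name.

```python
import random


def simple_random_split(items, train_fraction, seed=0):
    """Split `items` into (train, test) by simple random sampling without replacement.

    The fixed seed and the floor rounding of the training size are assumptions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # floor of the training size
    return shuffled[:n_train], shuffled[n_train:]


frames = [f"frame_{i:05d}.jpg" for i in range(9524)]  # size of the 0.5-second dataset
for fraction in (0.70, 0.75, 0.80):
    train, test = simple_random_split(frames, fraction)
    print(fraction, len(train), len(test))
```

For the 0.5-second dataset (9,524 frames), floor rounding gives the same training and testing sizes as the table above. Whether the original project rounded the same way is not stated.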

• Configuring training

Edit the configuration files of each network:

Inception v2: faster_rcnn_inception_v2.config

Resnet101: faster_rcnn_resnet101.config

Label map:

item { id: 1 name: 'Pistola' } item { id: 2 name: 'Revolver' }

Input files: training path and testing path.

Train with CPU or GPU, setting the batch size, epochs and steps.

Figure [16]. Training Workflow
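The label map is written in protobuf text format. The sketch below generates it from a list of class names; `label_map_pbtxt` is a hypothetical helper. It uses straight quotes because typographic quotes are not valid string delimiters in that format. Class ids start at 1, since the TensorFlow Object Detection API reserves 0 for the background. In that API's pipeline `.config` files, `num_classes` should match the number of items (2 here), and `label_map_path` and `input_path` point to this file and to the TFRecords.

```python
def label_map_pbtxt(class_names):
    """Render a label map in protobuf text format (ids start at 1; 0 is background)."""
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        # Straight ASCII quotes: curly quotes are not valid in protobuf text format.
        entries.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}\n")
    return "".join(entries)


text = label_map_pbtxt(["Pistola", "Revolver"])
print(text)
```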

• Faster-RCNN training workflow

Input image (RGB, 600 px / 800 px) → input layer → convolutional layers. A filter/kernel is convolved with the input image to produce a feature map:

$$G[m,n] = (f * h)[m,n] = \sum_{j}\sum_{k} h[j,k]\, f[m-j,\, n-k]$$

Example input image:
1 0 1 0
1 0 0 0
1 0 1 0
1 0 1 1

Example filter/kernel:
-1 0
1 0

Rectified Linear Unit (ReLU) layer, applied to the feature map:

$$f(x) = \max(0, x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases}$$

Feature map before ReLU:
1 0 1 -1
-1 0 0 0
1 0 -1 0
1 0 1 1

Feature map after ReLU:
1 0 1 0
0 0 0 0
1 0 0 0
1 0 1 1

Max pooling layer (2x2), resulting feature map:
1 1
1 1

Figure [17]. Faster-RCNN Training Workflow
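The convolution, ReLU and 2x2 max-pooling steps can be sketched in plain Python on nested lists. This is an illustrative re-implementation, not the TensorFlow code the project trained with. The convolution follows the formula above, including the kernel flip; deep learning libraries usually compute cross-correlation (no flip) instead. The example uses the feature map from the slide and reproduces its ReLU output and its 2x2 pooled map of ones.

```python
def conv2d_valid(image, kernel):
    """Discrete 2-D convolution over the 'valid' region:
    G[m][n] = sum_j sum_k h[j][k] * f[m - j][n - k]  (the kernel is flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    return [
        [
            sum(kernel[j][k] * image[m - j][n - k] for j in range(kh) for k in range(kw))
            for n in range(kw - 1, iw)
        ]
        for m in range(kh - 1, ih)
    ]


def relu(feature_map):
    """Element-wise f(x) = max(0, x)."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: size x size window, stride = size."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, cols - size + 1, size)
        ]
        for i in range(0, rows - size + 1, size)
    ]


# Feature map from the slide (before ReLU).
feature_map = [
    [1, 0, 1, -1],
    [-1, 0, 0, 0],
    [1, 0, -1, 0],
    [1, 0, 1, 1],
]
activated = relu(feature_map)
print(activated)            # [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 1, 1]]
print(max_pool(activated))  # [[1, 1], [1, 1]]
print(conv2d_valid([[1, 2], [3, 4]], [[1, 0], [0, 0]]))  # [[4]]
```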

• Region Proposal Network (RPN)

Figure [18]. Region Proposal Network Architecture

• Loss Function

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)$$
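The two terms of the RPN loss can be made concrete with a short sketch. Following the Faster R-CNN paper, it uses a log loss for L_cls and the smooth L1 loss for L_reg. As a simplifying assumption, both N_cls and N_reg are taken to be the number of anchors in the batch, and λ defaults to 1; the paper normalizes the terms differently. The function names are hypothetical.

```python
import math


def smooth_l1(x):
    """Smooth L1 loss, used for L_reg in Faster R-CNN."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """L({p_i}, {t_i}) = 1/N_cls * sum L_cls(p_i, p_i*) + lam * 1/N_reg * sum p_i* L_reg(t_i, t_i*).

    p[i]      predicted probability that anchor i contains an object
    p_star[i] 1 for a positive (object) anchor, 0 for a negative one
    t[i], t_star[i]  predicted and target box offsets (4 numbers each)
    Simplification (assumption): N_cls = N_reg = number of anchors in the batch.
    """
    eps = 1e-7  # keeps log() away from 0
    n = len(p)
    l_cls = sum(
        -(ps * math.log(max(pi, eps)) + (1 - ps) * math.log(max(1 - pi, eps)))
        for pi, ps in zip(p, p_star)
    )
    # Only positive anchors (p_star = 1) contribute to the regression term.
    l_reg = sum(
        ps * sum(smooth_l1(a - b) for a, b in zip(ti, tsi))
        for ps, ti, tsi in zip(p_star, t, t_star)
    )
    return l_cls / n + lam * l_reg / n


# Two anchors: one positive with a small box error, one negative.
loss = rpn_loss(
    p=[0.9, 0.2],
    p_star=[1, 0],
    t=[[0.5, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0]],
    t_star=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
)
print(round(loss, 4))  # 0.2268
```

The large offset error of the negative anchor does not contribute to the loss, because p_i* = 0 switches off the regression term for that anchor.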

• Softmax function

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$
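The softmax converts the scores a classifier assigns to one region into probabilities that sum to 1. Below is a minimal sketch. Subtracting the maximum score first leaves the result unchanged but avoids overflow. The three scores (background, Pistola, Revolver) are made-up values.

```python
import math


def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k) for k = 1..K.

    Subtracting max(z) first leaves the result unchanged but avoids overflow.
    """
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores of one region for: background, Pistola, Revolver.
probs = softmax([0.5, 2.0, 1.0])
print([round(q, 3) for q in probs])
```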

• Regions of Interest

• Regression Values

Feature map (example):
5 8 3 0
1 6 4 9
5 0 1 3
1 4 3 1

Region proposals are stored as an N×5 matrix of regions. Each row holds an index followed by the region's coordinates; an example row is 1 2 4 2 4.

• RoI Pooling

Region coordinates: pixel X, pixel Y, height and width.

Figure [19]. Classification and Detection
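RoI pooling max-pools each region of interest into a fixed-size grid so that the classifier always receives inputs of the same shape. The sketch below is illustrative and uses hypothetical names. It takes the region as (x, y, height, width) in feature-map cells, following the coordinate order listed on the slide. Bin edges use floor/ceil so that every cell of the region falls into at least one bin. The example applies it to the slide's 4×4 feature map with a region assumed to cover the whole map.

```python
import math


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of interest into a fixed out_h x out_w grid.

    roi = (x, y, height, width) in feature-map cells (order taken from the slide).
    """
    x, y, h, w = roi
    pooled = []
    for i in range(out_h):
        r0 = y + math.floor(i * h / out_h)
        r1 = y + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            c0 = x + math.floor(j * w / out_w)
            c1 = x + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature_map = [
    [5, 8, 3, 0],
    [1, 6, 4, 9],
    [5, 0, 1, 3],
    [1, 4, 3, 1],
]
print(roi_max_pool(feature_map, (0, 0, 4, 4)))  # [[8, 9], [5, 3]]
```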

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$

Figure [20]. Intersection over union

Table [5]. Confusion matrix

Figure [22]. Precision

Figure [23]. Recall

Figure [21]. Intersection over union values
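Intersection over union decides whether a detection counts as a true positive, and therefore affects both precision and recall. The sketch below is a simplified, single-image, single-class version with hypothetical names. It matches detections greedily, highest score first, to ground-truth boxes that are not yet matched. Boxes are (xmin, ymin, xmax, ymax). The example shows how one detection with IoU ≈ 0.68 counts as correct at an IoU threshold of 0.5 but not at 0.75.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, threshold=0.5):
    """predictions: list of (score, box); ground_truths: list of boxes (one class).

    A prediction is a true positive if it overlaps a not-yet-matched
    ground-truth box with IoU >= threshold; otherwise it is a false positive.
    """
    matched = set()
    tp = fp = 0
    for _, box in sorted(predictions, key=lambda pred: pred[0], reverse=True):
        best, best_iou = None, threshold
        for k, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            if k not in matched and overlap >= best_iou:
                best, best_iou = k, overlap
        if best is None:
            fp += 1
        else:
            matched.add(best)
            tp += 1
    fn = len(ground_truths) - len(matched)  # ground truths never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


ground_truths = [(0, 0, 10, 10)]
predictions = [(0.9, (1, 1, 11, 11)), (0.6, (20, 20, 30, 30))]
print(precision_recall(predictions, ground_truths, 0.5))   # (0.5, 1.0)
print(precision_recall(predictions, ground_truths, 0.75))  # (0.0, 0.0)
```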

[21]. Gitbooks. Object Localization and Detection. [online]. Available at: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html

AP / AR
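Average Precision summarizes the whole precision-recall curve in a single number. The results columns (IoU=0.5, IoU=0.75, AR@10, AR@100) appear to be the COCO detection metrics, which average precision over IoU thresholds from 0.50 to 0.95 and use 101-point interpolation. The sketch below computes the simpler all-point interpolated AP for one class at one IoU threshold, so it will not reproduce the table values exactly. The function name is hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class at one IoU threshold.

    scores[i]: confidence of detection i
    is_true_positive[i]: True if detection i matched a ground-truth box
    num_ground_truth: number of ground-truth boxes of the class
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk down the ranked list, highest score first
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the best precision at any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the stepwise precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


# Three detections, two ground-truth handguns; the second-ranked detection is a false positive.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2))
```

In this example, recall reaches 0.5 at precision 1 and then 1.0 at interpolated precision 2/3, so AP = 0.5 · 1 + 0.5 · 2/3 = 5/6 ≈ 0.833.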

• Dataset

Dataset of frames and XML

Dataset of videos

Dataset | Number of videos
Videos of Revolvers | 24
Videos of Pistols | 24
Videos with Revolvers and Pistols | 7

Dataset | Total number of frames | Selected frames | Number of XMLs
0.5 second | 16,271 | 9,524 | 9,524
1 second | 8,130 | 2,590 | 2,590

Table [4]. Collected videos

Table [5]. Selected frames and XML’s

Figure [24]. Dataset results

Workflow stages: set of videos → extracting frames (video frame) → resizing frames (600 px, 800 px) → labeled frames → frame and XML → simple sampling → Faster R-CNN with ResNet101.

Figure [25]. Methodology results

Average Precision | Average Recall
0.648261 | 0.673748

Table [6]. Precision and Recall of the best model.

Split | Dataset | Faster R-CNN | Average Precision | Precision IoU=0.5 | Precision IoU=0.75 | Average Recall | Recall AR@10 | Recall AR@100
70% - 30% | 0.5 second | Inception v2 | 0.615094 | 0.970359 | 0.692229 | 0.644959 | 0.698864 | 0.701883
70% - 30% | 0.5 second | ResNet 101 | 0.627137 | 0.974953 | 0.712473 | 0.648633 | 0.705599 | 0.706734
70% - 30% | 1 second | Inception v2 | 0.612784 | 0.974205 | 0.699056 | 0.638394 | 0.704711 | 0.707376
70% - 30% | 1 second | ResNet 101 | 0.646496 | 0.983529 | 0.771682 | 0.652957 | 0.713456 | 0.714902
75% - 25% | 0.5 second | Inception v2 | 0.618180 | 0.970136 | 0.693906 | 0.646152 | 0.700650 | 0.702060
75% - 25% | 0.5 second | ResNet 101 | 0.631681 | 0.974111 | 0.718193 | 0.653793 | 0.705945 | 0.707592
75% - 25% | 1 second | Inception v2 | 0.630636 | 0.980002 | 0.745703 | 0.646902 | 0.714352 | 0.717672
75% - 25% | 1 second | ResNet 101 | 0.639993 | 0.981227 | 0.762834 | 0.654614 | 0.720448 | 0.721750
80% - 20% | 0.5 second | Inception v2 | 0.599535 | 0.965547 | 0.684900 | 0.624573 | 0.681491 | 0.683900
80% - 20% | 0.5 second | ResNet 101 | 0.633470 | 0.974359 | 0.729347 | 0.648402 | 0.705234 | 0.706176
80% - 20% | 1 second | Inception v2 | 0.622743 | 0.981432 | 0.701254 | 0.652287 | 0.708068 | 0.709454
80% - 20% | 1 second | ResNet 101 | 0.648261 | 0.987471 | 0.780117 | 0.673748 | 0.724240 | 0.724409

• Table 6 shows the performance metrics of the handgun detection models trained in this project. The models were trained with three samplings (80%, 75% and 70% of the frames for training) and two datasets (frames extracted every 1 second and every 0.5 second).

• The highest Average Precision and Average Recall were obtained by Faster R-CNN with ResNet 101 on the 1-second dataset with the 80%/20% sampling: 0.6482 and 0.6737, respectively.

• Precision is higher at IoU = 0.5 because a detection counts as correct when the intersection over union between the predicted and ground-truth boxes is at least 0.5, and precision reaches about 0.98. At IoU = 0.75 or higher the required overlap is stricter, so it is harder for a detection to count as a correctly detected handgun, and precision drops (0.78 for the best model).

• This model was also tested separately on videos that were not part of the dataset. It produced many false positives on objects similar to a handgun.

• Of the models implemented in this automatic handgun detection experiment, Faster R-CNN with ResNet 101 achieved the highest Precision and the highest Recall, most likely because of its deeper, more robust architecture.

• Of the datasets created, the 1-second frame dataset with an 80%/20% sampling obtained the best metric values. This dataset has a greater variety of frames collected from movies and YouTube.

• The methodological strategy clarified the methods and phases that make up a deep learning algorithm, with Faster R-CNN as the architecture supporting the automatic revolver and pistol detection experiment.

• Developing the algorithm showed that a greater variety of videos is needed, with handguns seen from more angles and weapons that occupy a small area of the frame.

• Based on Average Recall and Average Precision, the algorithm has an acceptable handgun detection rate. However, it sometimes produces many false positives (it predicts that an object is a weapon when it is not).

• To implement a real-time automatic detection method and deploy it in unsafe areas of Bogotá, using security camera videos as the dataset.

• To build a larger dataset with many more handgun frames taken from different angles to improve the algorithm's performance. The dataset should also be balanced across classes and include handguns with small, medium and large areas.

• To use a GPU with more capacity to reduce training times, which are generally several hours; this would also reduce testing times.

• To implement Mask R-CNN for handgun detection in video and compare its performance metrics with those of the Faster R-CNN models implemented in this experiment.
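The dataset item above asks for class balance and a mix of small, medium and large handguns. A quick way to check both is to count annotations per class and per size bucket. The sketch below assumes the COCO area convention (small below 32² px, medium below 96² px, large otherwise) and boxes given as (class_name, xmin, ymin, xmax, ymax) in pixels. The helpers `size_bucket` and `dataset_summary` are hypothetical names.

```python
from collections import Counter

# COCO-style area thresholds in pixels (assumed convention).
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def size_bucket(xmin, ymin, xmax, ymax):
    """Classify a box as small, medium or large by its pixel area."""
    area = (xmax - xmin) * (ymax - ymin)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def dataset_summary(annotations):
    """annotations: list of (class_name, xmin, ymin, xmax, ymax) in pixels."""
    per_class = Counter(name for name, *_ in annotations)
    per_size = Counter(size_bucket(*box) for _, *box in annotations)
    return per_class, per_size


annotations = [
    ("Pistola", 0, 0, 20, 20),      # 400 px^2
    ("Pistola", 0, 0, 50, 50),      # 2,500 px^2
    ("Revolver", 0, 0, 200, 100),   # 20,000 px^2
]
print(dataset_summary(annotations))
```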

Figure [26]. Frames test 1

Figure [1]. Unión digital. Public insecurity. [online]. Available at: https://launiondigital.com.ar/noticias/195103-inseguridad-publica-recrudecen-robos-armas-y-arrebatos-violentos

Figure [2]. La FM. Bogota, increasingly insecure? [online]. Available at: https://www.lafm.com.co/bogota/revelan-preocupantes-cifras-de-hurtos-en-bogota

Figure [3]. Policía nacional de Colombia. Confiscated weapons in 2018. [online]. Available at: https://www.policia.gov.co/contenido/incautacion-armas-fuego-2018-1

Figure [4]. Ministry of interior. Citizen security projects implemented between the National Government. [online]. Available at: https://www.mininterior.gov.co/sala-de-prensa/noticias/desde-que-se-instalaron-las-camaras-criminalidad-disminuyo-en-un-75-por-ciento-en-itagui

Figure [5]. Flaticon. Symbols. [online]. Available at: https://www.flaticon.com/

Figure [7]. IBM. Deep learning performance breakthrough. [online]. Available at: https://www.ibm.com/blogs/systems/deep-learning-performance-breakthrough/

Figure [8]. Medium. Why Deep Learning over Traditional Machine Learning? [online]. Available at: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063
