
Real Time Driver Drowsiness Detection with Histogram of Oriented Gradients

Hafizh Budiman - 13516137

Program Studi Teknik Informatika, Sekolah Teknik Elektro dan Informatika. Institut Teknologi Bandung, Jl. Ganesha 10 Bandung 40132, Indonesia. E-mail: [email protected]

Abstract. The risk, danger, and sometimes tragic consequences of drowsy driving are reaching an alarming level. This paper proposes a solution that detects a car driver’s drowsiness level using image processing techniques. Facial landmarks are detected from a video stream using the histogram of oriented gradients technique to obtain the location and shape of the eyes. From the detection result, the system builds the convex hull of each eye and estimates the driver’s drowsiness level from the eye aspect ratio. If the ratio falls below a predetermined threshold, the system sounds an alarm to raise the driver’s alertness.

1. Introduction

Driver drowsiness is one of the major causes of casualties in traffic accidents. After driving for a long time, drivers tend to become less alert and tire easily, which leads to fatigue and drowsiness. Research studies have stated that the majority of accidents occur due to driver fatigue and drowsiness. The main challenge in this work is developing technology that detects driver fatigue in order to reduce accidents. For instance, many vehicles, such as loaded trucks, are driven mostly at night. Drivers who operate such vehicles continuously for long periods become more susceptible to these kinds of accidents. Detecting driver drowsiness is still an area of ongoing research. Methods used to identify drowsy drivers can be physiological, vehicle based, or behavioral. Physiological methods identified by [5], such as heartbeat, pulse rate, and electrocardiogram, are used to detect the fatigue level. Vehicle-based methods include accelerator pattern, acceleration, and steering movements. Behavioral methods include yawning, eye closure, and eye blinking. To counter this worldwide problem, we propose a solution that captures images in succession, transmits the driver’s data to a server in real time, detects facial landmarks using histogram of oriented gradients, and determines drowsiness using the eye aspect ratio (EAR).

2. Facial Landmark Detection

In order to calculate the drowsiness level with the eye aspect ratio, we first need to obtain the facial landmark locations. There are many approaches to detecting them. In this paper we will not be using the Viola-Jones algorithm, because tuning its parameters is time-consuming. Even after a long tuning process, there is no guarantee that the parameters will work from image to image. The Viola-Jones detector is not the only choice for object detection: alternatives include keypoints, local invariant descriptors, and bag-of-visual-words models. In this paper, we use the histogram of oriented gradients descriptor because, even though it is an older method, it still works well and produces strong results, as demonstrated in [1].

2.1. Histogram of Oriented Gradients

Histogram of oriented gradients (HOG) is a feature descriptor that is often used to extract features from image data. HOG focuses on the structure or shape of an object. It captures not only the edges but also their direction, by extracting the gradient magnitude and orientation. These orientations are calculated in localized portions: the complete image is broken down into smaller regions, and for each region the gradients and their orientations are calculated. The method then generates a histogram for each of these regions separately [1].
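To make the per-region histogram concrete, the following is a minimal sketch of the orientation histogram for a single region (cell). It is not the full descriptor of [1]: it assumes unsigned gradients over 0 to 180 degrees, 9 bins, centered differences, and hard assignment of each pixel to one bin, and it omits block normalization. The function name `cell_histogram` is hypothetical.

```python
import math


def cell_histogram(cell, bins=9):
    """Orientation histogram of one grayscale cell (list of pixel rows).

    Each interior pixel votes for the bin of its gradient orientation,
    weighted by its gradient magnitude. Border pixels are skipped so that
    every gradient can use a centered difference.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins  # assumption: unsigned gradients, 0..180 degrees
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # "% bins" guards against a rounding result of exactly 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A vertical edge (intensity changes left to right) fills the 0-degree bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
# A horizontal edge (intensity changes top to bottom) fills the 90-degree bin.
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]
```

Concatenating such histograms over all cells of a window gives a descriptor vector that a classifier can consume.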

Figure 1. Visualization of HOG on an image containing a person. [3]

2.2. Facial Landmark Detection with HOG

Detecting facial landmarks is a subset of the shape prediction problem. Given an input image (and normally an ROI that specifies the object of interest), a shape predictor attempts to localize key points of interest along the shape. In the context of facial landmarks, our goal is to detect important facial structures on the face using shape prediction methods. Detecting facial landmarks consists of two steps: localizing the face in the image, and detecting the key facial structures in the face ROI. Histogram of oriented gradients is used for the first of these steps. Training the detector begins by sampling P positive samples from the training data of the object, which is the eye, and extracting HOG descriptors from these samples.

Figure 2. Example of positive training samples and visualization of their corresponding HOG features from HELEN dataset.

After that, N negative samples are drawn from a negative training set that does not contain any of the objects to be detected, and HOG descriptors are extracted from these samples as well.

Figure 3. Example of negative training samples extracted from a face from HELEN dataset.

From these two sets of descriptors, we train a classifier. Then, for each image in the negative training set, we slide a window across the image. At each window, we compute the HOG descriptor and apply the classifier to it. If the classifier incorrectly classifies a window as an object, the feature vector of that false-positive patch is recorded together with its classification confidence. These false positives are then sorted by confidence and used to re-train the classifier. The re-trained classifier can then be applied to the test dataset. These are the minimum steps required to create the detector. In the implementation, however, dlib’s facial landmark predictor is used for convenience. This predictor is an implementation of [4], which uses an ensemble of regression trees to estimate facial landmark positions.
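The sliding-window pass over the negative images can be sketched as follows. This is an illustration under stated assumptions, not the code used in the implementation: `sliding_windows`, `mine_hard_negatives`, `describe`, and `score` are hypothetical names, and the convention that a confidence above zero means "object" is assumed here. Any descriptor function and any trained classifier can be passed in.

```python
def sliding_windows(image, win_h, win_w, step):
    """Yield (top, left, patch) for every window that fits inside `image`."""
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - win_h + 1, step):
        for left in range(0, cols - win_w + 1, step):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            yield top, left, patch


def mine_hard_negatives(negative_images, describe, score, win_h, win_w, step):
    """Collect descriptors of windows that the classifier wrongly accepts.

    `describe(patch)` returns a feature vector (for example a HOG descriptor).
    `score(descriptor)` returns a confidence; assumed > 0 means "object".
    The result is sorted so the most confident mistakes come first.
    """
    hard = []
    for image in negative_images:
        for _, _, patch in sliding_windows(image, win_h, win_w, step):
            descriptor = describe(patch)
            confidence = score(descriptor)
            if confidence > 0:  # false positive: the image contains no object
                hard.append((confidence, descriptor))
    hard.sort(key=lambda pair: pair[0], reverse=True)
    return hard


# Toy stand-ins for the descriptor and the classifier, for illustration only.
toy_image = [[0, 0, 0, 0],
             [0, 0, 9, 9],
             [0, 0, 9, 9],
             [0, 0, 9, 0]]
toy_describe = lambda patch: [sum(map(sum, patch))]
toy_score = lambda descriptor: descriptor[0] - 10
hard_negatives = mine_hard_negatives([toy_image], toy_describe, toy_score, 2, 2, 2)
```

The returned descriptors would be appended to the negative set before the classifier is trained again.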

3. Eye Aspect Ratio

Figure 4. Open and closed eyes with landmarks automatically detected. The eye aspect ratio EAR plotted for several frames of a video sequence. A single blink is present.

To calculate the drowsiness level, we propose to exploit the facial landmark detector to localize the eyes and eyelid contours. From the landmarks detected in the image, we derive the eye aspect ratio (EAR), a measure proposed by [2] as an estimate of the state of the eyes. In every frame of the video, the eye landmarks are detected with the detector explained in the previous section, and the ratio between the height and the width of the eye is computed.

$$\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2 \lVert p_1 - p_4 \rVert} \tag{1}$$

Here $p_1, \ldots, p_6$ are the landmark locations of the eye depicted in Figure 4. The EAR is mostly constant while an eye is open and approaches zero as the eye closes. It is partially insensitive to the person and the head pose. The aspect ratio of the open eye has a small variance among individuals, and it is fully invariant to uniform scaling of the image and in-plane rotation of the face. Since both eyes blink synchronously, the EAR of both eyes is averaged.
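As an illustration, Eq. (1) and the averaging over both eyes can be sketched as below. The function names `eye_aspect_ratio` and `mean_ear` are hypothetical, and the code assumes each eye is given as six (x, y) points in the order p1 to p6 of Figure 4, with p1 and p4 at the eye corners.

```python
import math


def eye_aspect_ratio(eye):
    """EAR of one eye from six (x, y) landmarks ordered p1..p6."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)  # two eyelid distances
    horizontal = math.dist(p1, p4)                    # corner-to-corner width
    return vertical / (2.0 * horizontal)


def mean_ear(left_eye, right_eye):
    """Average the EAR of both eyes, since both eyes blink together."""
    return (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0


# Synthetic landmarks: an open eye, and a fully closed eye with flat eyelids.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0), (2, 0), (3, 0), (2, 0), (1, 0)]
# Scaling all coordinates uniformly leaves the ratio unchanged.
scaled_open_eye = [(2 * x, 2 * y) for x, y in open_eye]
```

For the synthetic open eye the eyelid distances are 2 and 2 and the width is 3, so the ratio is 4 / 6; for the closed eye both eyelid distances are zero and the ratio is zero.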

Figure 5. Example of detected drowsiness. The plots of the eye aspect ratio EAR in Eq. (1), results of the EAR thresholding (threshold set to 0.2), the blinks detected by EAR SVM and the ground-truth labels over the video sequence. Input image with detected landmarks (depicted frame is marked by a red line).

4. Implementation

In this section we put all the parts together into a driver drowsiness detection system that alerts the driver when it detects drowsiness. The threshold for the eye aspect ratio was set to 0.2 by trial and error, because the appropriate value is heavily influenced by factors such as the distance between the driver and the camera and the face orientation.
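The thresholding step can be sketched as follows. The threshold of 0.2 is the value stated above; the requirement that the EAR stay below the threshold for several consecutive frames is an assumption added here so that an ordinary blink does not trigger the alarm, and the value of `MIN_CLOSED_FRAMES` is hypothetical. The function name `alarm_frames` is likewise illustrative.

```python
EAR_THRESHOLD = 0.2    # threshold reported in this section
MIN_CLOSED_FRAMES = 3  # hypothetical value, not specified in this paper


def alarm_frames(ear_values, threshold=EAR_THRESHOLD,
                 min_frames=MIN_CLOSED_FRAMES):
    """Return the indices of frames at which the alarm would be active.

    The alarm is active once the EAR has been below `threshold` for at
    least `min_frames` consecutive frames, and stops when the eyes reopen.
    """
    closed_run = 0
    alarms = []
    for index, ear in enumerate(ear_values):
        closed_run = closed_run + 1 if ear < threshold else 0
        if closed_run >= min_frames:
            alarms.append(index)
    return alarms


# One short blink at frame 1, then a longer closure from frame 3 to frame 6.
ear_sequence = [0.30, 0.15, 0.30, 0.10, 0.10, 0.10, 0.10, 0.30]
active = alarm_frames(ear_sequence)
```

Setting `min_frames=1` reproduces the plain rule of alerting on any frame whose EAR is below the threshold.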

Figure 6. System monitors driver’s eye aspect ratio.

Figure 7. System alerts the driver because the eye aspect ratio is detected to be under 0.2.

5. Conclusion

In this work, a real-time system that monitors and detects the loss of attention of vehicle drivers is proposed. The face of the driver is detected using histogram of oriented gradients to capture facial landmarks, and a warning is given to the driver in the form of an alarm to help avoid crashes. The proposed approach uses the eye aspect ratio to detect the driver’s drowsiness in real time. This is useful in situations where drivers face a strenuous workload and drive continuously for long distances. This work shows that it is feasible to implement real-time driver drowsiness detection using histogram of oriented gradients and image processing techniques in general.

6. References

[1]. N. Dalal & B. Triggs 2005 Histograms of Oriented Gradients for Human Detection.

[2]. T. Soukupova & J. Cech 2016 “Real-time eye blink detection using facial landmarks,” Computer Vision Winter Workshop (CVWW).

[3]. E. Fotiadis, M. Garzon, & A. Barrientos 2013 Human Detection from a Mobile Robot Using Fusion of Laser and Vision Information.

[4]. V. Kazemi & J. Sullivan 2014 One millisecond face alignment with an ensemble of regression trees (CVPR 2014).

[5]. S. Mehta, S. Dadhich, S. Gumber, & A. Bhatt 2019 Real-Time Driver Drowsiness Detection System Using Eye Aspect Ratio and Eye Closure Ratio.

Acknowledgments

The author would like to thank Dr. Rinaldi Munir, the lecturer of this class, for the chance to explore interesting applications of what was learned in the class, namely designing and implementing various image processing techniques. The author also wishes to thank family and friends for their motivation and encouragement.
