
Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. ISSN 0924-669X. Volume 44, Number 3. Appl Intell (2016) 44:645-664. DOI 10.1007/s10489-015-0722-6

Profiling drivers based on driver dependent vehicle driving features

Zahid Halim, Rizwana Kalsoom & Abdul Rauf Baig



Zahid Halim¹ · Rizwana Kalsoom¹ · Abdul Rauf Baig²

Published online: 2 November 2015. © Springer Science+Business Media New York 2015

Abstract This work addresses the problem of profiling drivers based on their driving features. Purpose-built hardware integrated with a software tool is used to record data from multiple drivers. The recorded data is then profiled using clustering techniques. k-means is used for clustering, and the results are cross-checked with Fuzzy c-means (FCM) and Model Based Clustering (MBC). Based on the clustering results, a classifier, i.e., an Artificial Neural Network (ANN), is trained to classify a driver, while driving, into one of the four discovered clusters (profiles). The performance of the ANN is compared with that of a Support Vector Machine (SVM). Comparison of the clustering techniques shows that different subsets of the recorded dataset with a diverse combination of attributes provide approximately the same number of profiles, i.e., four. Analysis of the features shows that average speed, maximum speed, number of times brakes were applied, and number of times the horn was used provide the information about drivers' driving behavior that is useful for clustering. Both the one-versus-one and one-versus-rest SVM methods for classification have been applied. The average accuracy and average mean square error achieved with the ANN were 84.2 % and 0.05, respectively. The average accuracy of the SVM was 47 %, whereas its maximum accuracy was 86 % using the RBF kernel. The proposed system can be used in modern vehicles as an early warning system, based on drivers' driving features, to avoid accidents.

1 Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan

2 College of Computer and Information Sciences, Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia

Keywords Driver behavior modeling · Road safety · Artificial neural networks · Clustering methods · Intelligent systems

1 Introduction

Road safety is a multidimensional subject that can be addressed from different perspectives. A few of these include road/track design, driver training, enforcement of traffic rules, and pedestrian awareness. The World Health Organization (WHO) has reported that, globally, the average annual human fatality rate is 18 per 100,000 [1]. This figure varies across countries and excludes major and minor injuries. Road accidents not only cause loss of precious lives but also damage vehicles and property. Many countries maintain records of traffic accidents through their transportation departments, traffic police, or hospitals [2]. These records show that the number of casualties and injuries due to road accidents is very high. Among these fatalities, car occupants are at a higher risk than the rest (pedestrians and pedalists).

Driving in a stressed and anxious state of mind makes it difficult to recognize hazardous road conditions, and a minor mistake can cause an accident [2]. It is a challenging task to detect a hazardous change in the vehicle's pattern, influenced by the individual driver's style of driving, and to alert the driver accordingly. This can be done by analysing the driver's driving behaviour. Some work already exists on studying unsafe driving patterns, e.g. [3–5]. However, these studies do not incorporate the individual driving features of the driver; rather, they focus on driving while eating/drinking or using mobile phones, or on predicting crashes based on weather, road, and traffic conditions.

This work aims at recording different actions of drivers while they drive a vehicle and uses the recorded data to profile the drivers into different types of driving behavior. The work also aims at using these profiles to train a classifier that predicts a driver's current profile, given his/her driving features. For this purpose, we identified heterogeneous profiles using clustering techniques. The driving features taken into consideration include: number of times footbrakes are applied, use of the horn, maximum speed achieved while driving, driver's average speed, ratio of left turns to use of left indicators, ratio of right turns to use of right indicators, maximum gear used by the driver, driver's average gear, and number of times the vehicle is put into reverse gear. The driver's driving features are recorded using custom-built hardware, which is integrated with a computer software tool. The clusters are extracted using k-means clustering, and the results are cross-checked by applying model-based and fuzzy c-means clustering approaches.

During the clustering experiments, we have also studied which of the recorded features influence the clustering process the most and the least. The clustering results suggest four different clusters in the underlying data. The information extracted from the clusters is used to train a classifier (ANN and SVM are used in our experiments). The trained classifier can then be used to predict a driver's profile when given the current driving features as input. The proposed system, in addition to many other uses, can serve in generating early warnings to drivers to avoid fatal driving states. The complete system design is illustrated in Fig. 1.

The rest of the paper is organized as follows: Section 2 presents the related work. Section 3 provides the details of the data recording and dataset creation module. Section 4 explains in detail the clustering techniques that have been used for data profiling. Section 5 explains the classification techniques used. Section 6 lists the results of clustering and classification. Section 7 covers the discussion. Finally, Section 8 concludes the paper with some future directions.

Fig. 1 Framework for the driver profiling and profile prediction system. [Block diagram: a hardware module (microcontroller, pressure sensor, and push buttons on the clutch, brake, accelerator, steering wheel, and gear) feeds a software module (S/W simulator, data in bits, pre-processing, dataset in Excel format), followed by a machine learning stage consisting of attribute selection (mutual information algorithm, principal component analysis), clustering (k-means, model based, fuzzy c-means; how many clusters, find different classes, assign class label to data), and classification (training, testing, performance).]

2 Related work

Road traffic safety deals with the techniques and actions taken to reduce the risk of death or injury for road users [49]. Accident prediction is one of the most critical aspects of road safety: an accident is predicted before it actually occurs so that precautionary measures can be taken to avoid it. Accident prediction models are also popular in road safety analysis. There are many existing approaches to detect unsafe driving patterns for accident prediction. Some of these are based on biometric detection and a few on facial movement for driving fatigue assessment [6, 7]. There is no fixed list of actions and/or data upon which an accident can be predicted accurately. It is therefore difficult to consider all the causes at once and to label all the data for accident prediction. A number of techniques have been implemented using different methods and different datasets for the detection of safe and unsafe driving patterns [8, 9].

A genetic programming (GP)-based model is developed in [10] using traffic, weather, and crash data for freeway crash prediction. To find the more effective variables for predicting a crash, the authors used the random forest technique to select candidate variables. Each state is analysed using the GP model, and the results show that the traffic flow characteristics that may lead to a crash differ between congested and uncongested states. The authors of [10] also show that GP performs better than the binary logit model [11, 12]. In [13] the authors used multichannel sequential data collected from the driving simulator STISIM [14–17] for the detection of unsafe driving patterns. Multichannel sequential data cannot be applied directly to a discriminative classifier like SVM because the data is temporally correlated. A Conditional Random Field (CRF) is therefore used for the fusion of the multichannel data. CRF enhances the performance of the system by directly modelling and maximizing the conditional probability. It also does not need all the data to be labelled and uses both labelled and unlabelled data for training. According to the results in [18], CRF outperforms SVM and the Hidden Markov Model (HMM) in the classification of both labelled and unlabelled data.

A number of ANN-based methods are presented in the literature for the detection of accidents and of accident severity using different datasets and conditions. Students of Oklahoma State University have presented a model for the severity of injury caused by traffic accidents [19]. They use a decision tree and neural networks employing back-propagation with different numbers of iterations to train the network. The analysis in [19] shows that the driver's seat belt usage, the light condition of the roadway, and the driver's alcohol usage are the most critical features in fatal injuries. In another work, three categories are used for the prediction of accident severity: light, serious, or dead. A fused methodology of a probabilistic neural network (PNN) and a decision tree is applied to data collected by the Republic of Cyprus police, and the results show an improvement in classification accuracy [20]. A mobile application named crash prediction is reported in [21]; it takes various data as input, including age, gender, disability (if any), vision, date of license expiry, and driving experience. This data is treated as a single variable, and Principal Component Analysis (PCA) is applied to it. Afterwards, it is converted into driver dependent variables, and an HMM is applied to determine whether a driver is fit, unfit, or partially fit for driving.

In [9] unsafe driving behavior is quantified under the assumption that it can be measured from three perspectives: first, the passenger's point of view; second, the driver's point of view; and third, the vehicle's status. For the passenger's point of view, the authors detect heavy jolts caused by sudden turns or braking using a 3-axis accelerometer mounted on the passenger seat. For the driver's point of view, a video camera is mounted on the car's console to emulate the driver's vision and focus on the road. For the vehicle status, the velocity and engine speed are read directly from the Engine Control Unit (ECU) through the On-Board Diagnosis II (OBD II) protocol. The results in [9] show that the proposed system agrees with human opinion.

Based on statistical theory, a prediction model using SVM is proposed in [19]. The model is analysed on rural frontage road data from Texas. The results of the SVM model are compared with a negative binomial (NB) regression model, showing that SVM predicts more accurately than the NB model. A previous work of the same authors is based on the back propagation neural network (BPNN), and the results show SVM to be faster than BPNN. Correspondingly, the work in [22] implements SVM to predict hazardous and normal traffic conditions using microscopic data from the traffic simulation software Traffic Software Integrated System (TSIS).

As is evident from the preceding literature review and to the best of our knowledge [5–46], [52], there has been no reported research that deals with the problem of profiling drivers based on the actual driving features under regular driving conditions.

The work in this paper is focused on profiling drivers based on the driver dependent vehicle driving features only. Factors like road condition, weather, driver's health or mood, and vehicle condition have not been incorporated into the study. Nevertheless, these factors will ultimately affect the driver's driving features and are indirectly recorded in the dataset.

Fig. 2 Hardware part of the driving simulator: (a) camera image of the hardware; (b) vehicle components integrated with the microcontroller (pressure sensor and push buttons on the clutch, brake, accelerator, steering wheel, and gear)

3 Data recording and the dataset

To acquire the driving data, a custom-built data acquisition system has been utilized [50, 51]. This system consists of hardware and software modules. The hardware module consists of all the essential components required for driving a functional vehicle. These include indicators, steering, footbrake, handbrake, clutch, accelerator, and gear shift. The software module is a driving simulator that creates a virtual driving environment. Combined, the two modules create the desired effect of real-time driving. The software can be used to create various driving environments, such as crowded cities, open roads, and highways. The obvious advantage of using a simulator is that it prevents actual road accidents while recording data.

As mentioned earlier, the hardware part of the data acquisition system consists of a mechanical assembly comprising various vehicle components. The rotation of the steering wheel is identified by the simulator environment with the help of push buttons. Whenever the steering wheel is rotated, one of the push buttons is pressed to indicate either clockwise or anticlockwise rotation. Similar push buttons are used to detect whether the clutch, footbrake pedal, or accelerator pedal is pressed. For the gear assembly, customized switches have been used to indicate the current gear position (Fig. 2). The incoming data from all of these push buttons and switches is fed to the input port pins of an 8051 microcontroller. The data on the port pins is serially transmitted to the software application, which controls the dynamics of the vehicle in the software simulator environment. The serial communication between the 8051 microcontroller and the software application occurs at a baud rate of 9600. Because of this high-speed data transfer between the microcontroller and the computer, the vehicle in the software driving environment is controlled in real time. The incoming serial data is also fed to the profiler module of the driver profiling system (discussed in Section 4), which categorizes the driver based on this data.

The software module of the system serially receives hexadecimal values from the microcontroller. These hexadecimal values are converted into binary strings so that the individual bits of the binary strings change with any activity detected by the electronic switches attached to the various components of the vehicle.
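The conversion from received hexadecimal values to bit strings can be sketched as follows. This is a minimal illustration, not the authors' software: the one-byte frame and the function names `byte_to_bits` and `changed_bits` are assumptions, and the text does not state which bit corresponds to which switch.

```python
def byte_to_bits(hex_value):
    """Convert a two-digit hexadecimal string such as '0A' into an 8-character bit string."""
    return format(int(hex_value, 16), "08b")


def changed_bits(previous, current):
    """Return the positions (0 = most significant bit) that differ between two bit strings."""
    return [i for i, (p, c) in enumerate(zip(previous, current)) if p != c]


# Hypothetical readings: the bit-to-switch assignment is not given in the text.
before = byte_to_bits("08")  # '00001000'
after = byte_to_bits("0A")   # '00001010'
print(changed_bits(before, after))  # [6]
```

A changed position in consecutive readings would then be counted as one activation of the switch wired to that bit.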

For our experiments, we recorded the data of 50 different subjects. Each subject drove the car in three different traffic scenarios (maximum traffic, average traffic, and minimum traffic) for a 15-minute interval each (a total of 45 minutes per subject). The driver's actions are recorded over a time window of 30 seconds. Initially, a time window of 15 seconds was also tested; however, in that case most of the records had at most one control with an updated value. Moreover, windows longer than 60 seconds caused changes in many controls to go unnoticed. The 30-second window was found to be the most appropriate. However, the selection of an appropriate sampling rate depends on the particular driving track and can be set accordingly. For the 15 minutes of driving by a subject, 30 samples were recorded. These 30 samples were used to create a signature that identified a subject. The signature is created by taking the average of each attribute over the 30 recorded samples. Table 1 gives the details of the recorded dataset. Although the total sample size becomes 150, it is representative of a larger collection of data readings collected for clustering. A total of 50 subjects participated in this study, each giving a reading of 45-minute duration at a sampling interval of 30 seconds. This makes a total of 4500 samples in the dataset. These 4500 samples are then represented by 150 signatures, each representing a specific subject in a particular traffic scenario. Other studies, like [56], [58], and [57], have used 18, 11, and 20 subjects, respectively.

Table 1 Dataset detail

Item              Detail
No. of subjects   Total: 50 (male: 42, female: 8)
Traffic scenario  Total: 3 (minimum, average, and maximum traffic)
Time duration     Total time: 45 min (each track: 15 min)
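The sample counts and the signature computation described above can be checked with a short sketch. The function name `signature` and the toy two-attribute drive are illustrative assumptions; only the counts (50 subjects, 3 scenarios, 15-minute drives, 30-second windows) come from the text.

```python
from statistics import mean

SUBJECTS, SCENARIOS = 50, 3
DRIVE_SECONDS, WINDOW_SECONDS = 15 * 60, 30

samples_per_drive = DRIVE_SECONDS // WINDOW_SECONDS        # 30 windows per drive
total_samples = SUBJECTS * SCENARIOS * samples_per_drive   # 4500 samples
total_signatures = SUBJECTS * SCENARIOS                    # 150 signatures


def signature(samples):
    """Average each attribute over the windows of one drive."""
    return [mean(column) for column in zip(*samples)]


# Toy drive with three windows and two attributes (brake count, average speed).
toy_drive = [[2.0, 40.0], [4.0, 50.0], [3.0, 45.0]]
print(signature(toy_drive))  # [3.0, 45.0]
```

Each of the 150 signatures is thus one row of averaged attributes for one subject in one traffic scenario.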

The contemporary work in intelligent transportation research makes use of diverse technologies to record vehicle and/or drivers' data. The choice of a particular technology primarily depends on the type of data required. However, other factors like cost, availability, and accuracy of the data recording technology also influence this choice. The works in [5–8], [40], and [57] have used a custom-built device for data acquisition. Approaches in [14–17], [52], [57], and [58] have utilized the STISIM simulator [15], and the contributions in [9], [56], and [35] have used OBD-II [37] for data recording. Table 2 lists a summary of the data acquisition technology and the type of data used in recent work on intelligent transportation systems.

We have used custom-built hardware and software tools for data acquisition for multiple reasons. The primary reason was that other systems do not provide all the required attributes (ten features). Considering OBD II as an example, we are gathering driver dependent features, whereas OBD II collects the status attributes of a vehicle. Using OBD II, we could gather only three of the required attributes (maximum speed, average speed, and gear), obtained through fuel pressure and throttle position. For the remaining six attributes, we had to use a customized solution. The secondary reasons were cost effectiveness (systems like the STISIM simulator [15] are relatively expensive) and the ability to collect data samples in a laboratory environment (unlike OBD-II, which requires data collection in real time only). As already shown in Table 2, other approaches in the contemporary work have also used a mix of custom-built solutions, off-the-shelf technologies, real-time data, and simulated data, depending upon the requirements of the experiments.

4 Profiling drivers

Clustering is used for profiling the recorded driving features. The data consists of only the observed values, and no additional information, like age, education, or license status, is provided. In order to reduce the risk of unsuitable clustering [53] and to discover good clusters (those with high intra-cluster similarity and low inter-cluster similarity) in the underlying data, we have used three different clustering techniques. These are k-means [23, 24], model-based clustering [25, 26], and fuzzy c-means clustering [27].

k-means is a partition-based unsupervised method for clustering. k-means partitions the given n objects into k clusters, where each object belongs to the cluster with the nearest mean [24].
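The assignment and update steps of k-means can be sketched in plain Python as below. This is a generic illustration of the technique under stated assumptions (random initial centroids drawn from the data, squared Euclidean distance, a fixed seed), not the tool or settings used in the study.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Partition points into k clusters; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # assumed initialisation
    assignment = None
    for _ in range(iterations):
        # Assignment step: each object joins the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


toy_points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
toy_centroids, toy_labels = k_means(toy_points, k=2)
print(toy_labels)
```

In the paper's setting, each point would be one 10-attribute driver signature rather than a 2-D toy vector.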

For model-based clustering [25, 26], it is assumed that the data is generated by a mixture of basic probability distributions, each representing a different group or cluster. Consider a dataset $X$ of multivariate independent observations $x_1, x_2, \ldots, x_n$. Let $G$ be the number of mixture components, $f_k(x_i \mid \theta_k)$ the density of an observation $x_i$ from the $k$th component, and $\theta_k$ the corresponding parameters. The likelihood for the mixture model is shown in (1).

$$L_{mix}(\theta_1, \ldots, \theta_G; \tau_1, \ldots, \tau_G \mid x) = \prod_{i=1}^{n} \sum_{k=1}^{G} \tau_k f_k(x_i \mid \theta_k) \tag{1}$$

where $\tau_k$ is the probability that an observation belongs to the $k$th component ($\tau_k \ge 0$; $\sum_{k=1}^{G} \tau_k = 1$).

Table 2 Technologies used in previous work for data acquisition

Technology/Data                     Works
Custom                              [5–8], [40], [57]
OBD II (or variants)                [9], [56], [34]
STISIM (or variants)                [14–17], [52], [57, 58]
Real-life data                      [5, 6], [9], [11], [40], [54], [58]
Simulations data                    [7, 8], [14–17], [52], [58]
Test/Historical/Traffic/Route data  [2], [10, 11], [13], [18, 19], [22], [43], [56]


Table 3 Parameterizations of the covariance matrix $\Sigma_k$

Identifier  $\Sigma_k$                    Distribution  Volume    Shape     Orientation
EII         $\lambda I$                   Spherical     Equal     Equal     NA
VII         $\lambda_k I$                 Spherical     Variable  Equal     NA
EEI         $\lambda A$                   Diagonal      Equal     Equal     Coordinate axes
VEI         $\lambda_k A$                 Diagonal      Variable  Equal     Coordinate axes
EVI         $\lambda A_k$                 Diagonal      Equal     Variable  Coordinate axes
VVI         $\lambda_k A_k$               Diagonal      Variable  Variable  Coordinate axes
EEE         $\lambda D A D^T$             Ellipsoidal   Equal     Equal     Equal
EEV         $\lambda D_k A D_k^T$         Ellipsoidal   Equal     Equal     Variable
VEV         $\lambda_k D_k A D_k^T$       Ellipsoidal   Variable  Equal     Variable
VVV         $\lambda_k D_k A_k D_k^T$     Ellipsoidal   Variable  Variable  Variable

We are generally concerned with multivariate observations that have a normal density $f_k(x_i \mid \theta_k)$. In a Gaussian mixture model, each component $k$ is modelled by a mean vector $\mu_k$, a covariance matrix $\Sigma_k$, and the density given in (2):

$$f_k(x_i \mid \mu_k, \Sigma_k) = \frac{\exp\left\{-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)\right\}}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \tag{2}$$

where $p$ is the dimension of the observations.
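To make (1) and (2) concrete, the sketch below evaluates the Gaussian density and the log of the mixture likelihood for components with diagonal covariance matrices (as in the diagonal models of Table 3). The restriction to diagonal covariances, the use of the logarithm for numerical stability, and all function names are assumptions made for this illustration.

```python
import math


def diagonal_gaussian_density(x, mean, variances):
    """Density (2) for a covariance matrix that is diagonal with the given variances."""
    p = len(x)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    det = math.prod(variances)  # determinant of a diagonal matrix
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (p / 2.0) * math.sqrt(det))


def mixture_log_likelihood(data, weights, means, variances):
    """Logarithm of the mixture likelihood (1): sum over i of log(sum over k of tau_k f_k(x_i))."""
    total = 0.0
    for x in data:
        total += math.log(sum(
            w * diagonal_gaussian_density(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ))
    return total


# Toy two-component mixture in two dimensions (all values are illustrative).
toy_data = [[0.0, 0.0], [0.2, -0.1], [5.0, 5.0]]
toy_weights = [0.5, 0.5]
toy_means = [[0.0, 0.0], [5.0, 5.0]]
toy_variances = [[1.0, 1.0], [1.0, 1.0]]
print(mixture_log_likelihood(toy_data, toy_weights, toy_means, toy_variances))
```

Model-based clustering would maximize this quantity over the weights, means, and covariances, typically with the EM algorithm.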

Geometric characteristics like the shape, volume, distribution, and orientation of each component are determined by the covariance matrix $\Sigma_k$.

The covariance matrix in terms of its eigenvalue decomposition for model-based clustering [25] is represented in (3):

$$\Sigma_k = \lambda_k D_k A_k D_k^T \tag{3}$$

where $D_k$ is the orthogonal matrix of eigenvectors, $A_k$ is a diagonal matrix whose elements are proportional to the eigenvalues of the covariance matrix $\Sigma_k$, and $\lambda_k$ is a scalar. The orthogonal matrix $D_k$ provides the orientation of the components of the covariance matrix, $A_k$ determines its shape, and $\lambda_k$ specifies the volume. Constraining these three factors yields a number of geometric parameterizations of the covariance matrix.
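The decomposition in (3) can be illustrated for a symmetric 2 x 2 covariance matrix, where the eigenvalues have a closed form. The normalisation used here (the diagonal matrix has determinant 1, so the scalar equals the square root of the determinant of the covariance) is an assumption of this sketch; the text only states that the diagonal entries are proportional to the eigenvalues. The function names are hypothetical.

```python
import math


def decompose_2x2(cov):
    """Split a symmetric positive definite 2x2 covariance into (lam, D, A)
    such that cov = lam * D * diag(A) * D^T."""
    (a, b), (_, c) = cov
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    centre = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    e1, e2 = centre + radius, centre - radius
    # The angle of the principal axis defines the orthogonal matrix D.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    D = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    # Assumed normalisation: det(diag(A)) = 1, hence lam = det(cov) ** (1 / 2).
    lam = math.sqrt(e1 * e2)
    A = [e1 / lam, e2 / lam]  # diagonal entries only
    return lam, D, A


def recompose_2x2(lam, D, A):
    """Rebuild lam * D * diag(A) * D^T."""
    return [[lam * sum(D[i][k] * A[k] * D[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]


toy_cov = [[4.0, 1.0], [1.0, 2.0]]
toy_lam, toy_D, toy_A = decompose_2x2(toy_cov)
print(recompose_2x2(toy_lam, toy_D, toy_A))  # approximately [[4.0, 1.0], [1.0, 2.0]]
```

Here the scalar carries the volume, the diagonal entries carry the shape, and the rotation carries the orientation, matching the roles described for (3).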

Table 3 shows the multivariate model options currently available for the covariance matrix. For more than one dimension, the model identifier (first column in Table 3) encodes the geometric characteristics of the model. For example, VEI denotes a model in which the volumes of the clusters may vary, the shapes of the clusters are equal, and the orientation is the identity. Clusters in this model have diagonal covariance with an orientation parallel to the coordinate axes. Parameters associated with characteristics designated as variable are determined from the data. A subset of these parameterizations is discussed in [26], which gives details of the EM algorithm for maximum likelihood estimation for these models.

Table 4 Mutual information of all attributes, by traffic scenario

Feature                                        Maximum  Minimum  Average  Combined
No. of left indicator use/No. of left turns    3.564    3.322    3.407    3.595
No. of right indicator use/No. of right turns  3.423    3.407    3.565    3.694
Brake use                                      3.774    3.936    4.166    4.19
Horn use                                       2.547    2.967    2.844    2.962
Reverse gear use                               1.989    2.208    2.228    2.178
Average gear                                   1.143    1.159    1.124    1.143
Max. gear                                      0.818    0.667    0.529    0.704
Average speed                                  4.288    4.368    4.414    4.649
Max speed                                      4.458    4.679    4.553    5.07
Gender                                         0.634    0.634    0.634    0.634

Table 5 Variance of the principal components, by traffic scenario

Feature                                        Maximum  Minimum  Average  Combined
No. of left indicator use/No. of left turns    0.2936   0.3038   0.3101   0.3014
No. of right indicator use/No. of right turns  0.3002   0.2883   0.2738   0.2881
Brake use                                      0.333    0.3038   0.3332   0.3315
Horn use                                       0.3237   0.3332   0.3318   0.3221
Reverse gear use                               0.3331   0.332    0.3256   0.3204
Average gear                                   0.314    0.3285   0.3259   0.3301
Max. gear                                      0.3159   0.3189   0.3061   0.3161
Average speed                                  0.3317   0.3173   0.3233   0.3206
Max speed                                      0.3093   0.3322   0.3247   0.3291
Gender                                         0.305    0.3005   0.3033   0.2999

Fuzzy c-means provides soft clustering, where every sample has a membership degree with each cluster [27]. In fuzzy c-means, a sample may belong to one or more clusters. It is based on the minimization of an objective function until a threshold value is reached. To obtain the best clustering solution, the objective function works as a cost function that has to be minimized [28]. Membership degrees can also indicate how indistinctly or absolutely a data point belongs to a cluster [29]. The partitioning is carried out through an iterative optimization of the objective function, with updates of the membership $u_{ij}$ and the cluster centre $c_j$. The function is listed in (4).

$$F_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\lVert x_i - c_j \right\rVert^{2}, \qquad 1 \le m < \infty \tag{4}$$

Table 6 Standard deviation values of the recorded features

Feature                                        Std.
No. of left indicator use/No. of left turns    4.3054047
No. of right indicator use/No. of right turns  3.7630073
Brake use                                      5.0442161
Horn use                                       5.0400518
Reverse gear use                               1.2183533
Average gear                                   0.5403206
Max. gear                                      0.389401
Average speed                                  7.1134747
Max speed                                      11.217763
Gender                                         0.3678342

The highest membership value of a sample indicates the cluster to which it is most similar. Iterative optimization of the function in (4) is carried out by updating the membership function $u_{ij}$ and the cluster centers $c_j$. The center $c_j$ and membership $u_{ij}$ are calculated from the relationships given in (5) and (6), respectively [30].

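$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \tag{5}$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}} \tag{6}$$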

The iteration stops when $\lVert u_{ij}^{k+1} - u_{ij}^{k} \rVert < \text{threshold}$, where the threshold is a termination criterion with a value between 0 and 1 and $k$ is the iteration index.
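The update loop defined by (5), (6), and the stopping rule can be sketched as follows. This is a generic fuzzy c-means illustration, not the implementation used in the study: the random initial memberships, the fuzzifier value m = 2, the largest absolute membership change as the norm, and the small floor on distances (to avoid division by zero) are all assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, threshold=1e-6, max_iter=300, seed=0):
    """Return (centers, memberships) after iterating the updates (5) and (6)."""
    rng = random.Random(seed)
    dims = len(points[0])
    # Assumed initialisation: random memberships, normalised to sum to 1 per sample.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        u.append([value / sum(row) for value in row])
    centers = []
    for _ in range(max_iter):
        # Eq. (5): centers are means weighted by u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ])
        # Eq. (6): memberships from relative distances to all centers.
        new_u = []
        for p in points:
            dist = [max(math.dist(p, cj), 1e-12) for cj in centers]
            new_u.append([
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for j in range(c)
            ])
        # Stopping rule: largest membership change below the threshold.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < threshold:
            break
    return centers, u


fcm_points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [5.1, 4.9]]
fcm_centers, fcm_u = fuzzy_c_means(fcm_points, c=2)
fcm_labels = [row.index(max(row)) for row in fcm_u]
print(fcm_labels)
```

Taking the cluster with the highest membership for each sample, as in the last lines, turns the soft partition into hard labels comparable with the k-means result.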

5 Profile prediction

Classification is employed to predict the profile of a driver while s/he is driving the vehicle. Both SVM- and ANN- [31] based classifiers are used so that their performance can be observed and compared. Though many types of neural networks can be used for classification problems [32, 44], we focused on the commonly used feed-forward neural networks known as multilayer perceptrons (MLPs). We have used a 3-layered neural network (two layers of weights) with 10 neurons in the input layer (corresponding to the 10 recorded attributes of the data) and experimented with 2 to 10 neurons in the hidden layer. Four neurons were used in the output layer, each representing one of the four profiles (a discussion of the profiles follows in Section 6). The network was trained using the scaled conjugate gradient method, which learns faster than standard gradient methods in classification and pattern recognition problems. The performance of the networks was examined through the mean square error (MSE) given in (7),

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (t_i - a_i)^2 \tag{7}$$


Fig. 3 Results of k-means clustering for the value of k from 1 to 28: (a) top 4 attributes, (b) all attributes, and (c) bottom 6 attributes. [Plots: x-axis, value of k; y-axis, average cluster centroid distance; one series each for the minimum, average, maximum, and combined (all) traffic datasets.]

where $t_i$ represents the target output value and $a_i$ is the network's actual output.
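As a small worked instance of (7), the sketch below computes the MSE for one output vector of a four-neuron output layer. The target and output values are made up for illustration, and the function name `mean_squared_error` is an assumption.

```python
def mean_squared_error(targets, outputs):
    """Eq. (7): average of the squared differences between targets and outputs."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((t - a) ** 2 for t, a in zip(targets, outputs)) / len(targets)


# Illustrative values: the third of four profiles is the target class.
toy_target = [0.0, 0.0, 1.0, 0.0]
toy_output = [0.1, 0.2, 0.6, 0.1]
print(mean_squared_error(toy_target, toy_output))  # about 0.055
```

The squared errors are 0.01, 0.04, 0.16, and 0.01, so their average is 0.22 / 4 = 0.055.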

Support Vector Machine (SVM) was also chosen for the experiments because of its ability to generalize in high-dimensional spaces. For linearly separable data, SVM uses the training data to construct a hyperplane with the maximum margin. However, there can be non-linear problems, as is the case for the problem at hand. One answer to this is mapping the data into a higher dimensional space and then characterizing a hyperplane there. With an appropriately selected feature space of adequate dimensionality, the training set can be made separable [47]. The mapping is often not given explicitly, and kernels [48] can be used instead. Some commonly used kernel functions are the Radial Basis Function (RBF), polynomial, and sigmoid [48]. The RBF kernel is a reasonable first choice of kernel function because other non-linear kernels require more parameters to be chosen [3]. The accuracy of SVM depends on the selection of the cost (C) and gamma parameters. We have used 5-fold cross validation for best model evaluation.

Table 7 Average of the average distance between centroids of clusters for all traffic scenarios datasets

Clusters  Min.   Avg.   Max.   All    Std.
2         23.21  17.14  24.5   11.02  6.2
3         14.16  19.27  9.12   14.76  4.16
4         15.36  17.21  15.55  15.42  0.89
5         20.97  17.79  14.48  14.44  3.12
6         21.84  17.14  14.68  14.98  3.31
7         21.29  16.45  17.39  15.06  2.67
8         21.61  17.99  16.04  17.27  2.4
9         21.97  18.01  18.28  17.57  2.03
10        22.21  15.2   17.34  16.83  3.02
11        21.8   22.28  17.81  19.75  2.05
12        22.37  21.23  17.85  19.05  2.05
13        21.72  20.89  17.58  20.25  1.79
14        21.84  22.62  15.18  19.38  3.34
15        21.11  20.25  17.59  18.87  1.55
16        21.21  22.75  19.45  18.9   1.75
17        22.6   22.34  20.36  19.49  1.52
18        22.65  21.76  19.32  19.81  1.58
19        20.96  21.53  17.22  19     1.96
20        21.66  21.31  17.65  19.23  1.88
21        20.22  21.85  17.99  19.2   1.63
22        21.44  21.63  19.42  19.84  1.12
23        20.43  21.52  20.52  18.37  1.32
24        21.3   20.92  19.65  18.84  1.14
25        21.08  20.82  17.48  19.02  1.68
26        21.27  22.35  17.99  19.09  1.99
27        20.82  21     17.96  20.38  1.41

Fig. 4 Standard deviation of average of the average distance between centers of clusters of all datasets. [Plot titled "Average distance between cluster centroids": x-axis, number of clusters; y-axis, avg. distance of all samples with 4 features; series, Std value.]

Table 8 Top 3 models based on the BIC criterion

Traffic scenario                        Model  No. of components  BIC value
Complete dataset having 10 features     VEI    4                  −6197.195
                                        VEI    7                  −6201.844
                                        VEI    8                  −6205.559
Complete dataset having 4 features      VEV    4                  −3757.928
                                        EEE    8                  −3759.372
                                        EEE    9                  −3759.906
Maximum dataset having 10 features      VEI    3                  −2166.13
                                        VEI    4                  −2179.644
                                        VEI    5                  −2189.315
Maximum dataset having 4 features       VEI    5                  −1219.188
                                        VEI    6                  −1242.719
                                        VEI    7                  −1250.175
Average dataset having 10 features      VEI    4                  −2240.913
                                        VEI    5                  −2278.364
                                        EEI    4                  −2286.836
Average dataset having 4 features       EEE    1                  −1228.018
                                        EEV    1                  −1228.018
                                        VEV    1                  −1228.018
Minimum dataset having 10 features      VEI    4                  −2240.913
                                        VEI    5                  −2278.364
                                        VEI    4                  −2286.836
Minimum dataset having 4 features       EEE    4                  −1328.43
                                        EEI    7                  −1335.478
                                        EEE    6                  −1336.515
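Two ingredients mentioned above, the RBF kernel and a 5-fold split, can be sketched in plain Python. The kernel form exp(-gamma * ||x - z||^2) is the standard definition; the contiguous, unshuffled fold layout and the function names are assumptions of this sketch, and the text does not describe how the folds were actually drawn.

```python
import math


def rbf_kernel(x, z, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def k_fold_indices(n_samples, n_folds=5):
    """Yield (train, validation) index lists for contiguous folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# 150 signatures split into 5 folds of 30 validation samples each.
folds = list(k_fold_indices(150, n_folds=5))
print([len(validation) for _, validation in folds])  # [30, 30, 30, 30, 30]
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0
```

In a grid search, each candidate pair of cost and gamma values would be scored by its average validation accuracy over the five folds.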

6 Data analysis and results

This section describes the experiments for data clustering and classification, followed by a discussion of the results. The main objective of the clustering experiment is to identify the number of natural clusters in the underlying data. For this purpose, the results from three different clustering techniques, i.e., k-means, MBC, and fuzzy c-means, are analysed to select the number of clusters k. Since we have recorded data for each subject in three traffic scenarios, the clustering is performed separately on these datasets and also on the combination of the three. From this point onward, we refer to these as the minimum traffic dataset, average traffic dataset, maximum traffic dataset, and combined (all) dataset. Furthermore, we have divided the 10 recorded attributes into two groups, the top 4 and bottom 6 attributes. This categorization is done on the basis of mutual information [33] and PCA [34]. Its purpose is to observe which attributes contribute more toward better profiling (clustering) of drivers. Table 4 lists the mutual information values for the 10 recorded attributes. For all four datasets, the mutual information value is highest for maximum speed, average speed, and number of times brakes are applied. The fourth highest value of mutual information is for the ratio of left indicator use to the number of left turns taken (for the maximum traffic dataset) and, for the remaining datasets, for the ratio of right indicator use to the number of right turns taken.

Table 5 lists the variance of the attributes after applying PCA. For the maximum traffic dataset, the number of times brakes are applied, average speed, number of times



Fig. 5 Objective function versus number of iterations of FCM, each plotted against the number of clusters (2 to 6): (a) 10 features for complete dataset; (b) top 4 features for complete dataset; (c) 10 features for max. dataset; (d) top 4 features for max. dataset; (e) 10 features for min. dataset; (f) top 4 features for min. dataset; (g) 10 features for avg. dataset; (h) top 4 features for avg. dataset



Table 9 Value of objective function and number of iterations for the first 6 clusters obtained through FCM

                      Top 4 features               All features
Dataset    Clusters   Iterations   Obj. function   Iterations   Obj. function
All        2          22           12624.83        21           17015.31
           3          27           7598.61         26           10788.67
           4          35           5465.66         44           7970.94
           5          70           4311.1          100          6363.6
           6          97           3508.98         100          5274.23
Maximum    2          22           12624.83        20           5481.98
           3          27           7598.61         23           3372.08
           4          35           5465.66         44           2478.63
           5          70           4311.1          100          1984.32
           6          97           3508.98         100          1984.32
Average    2          16           3285.28         16           4537.88
           3          25           1997.37         35           2926.76
           4          29           1359.26         39           2131.84
           5          37           1043.53         100          1659.27
           6          78           840.62          100          1364.92
Minimum    2          18           4945.74         16           6370.86
           3          61           3058.67         33           4046.36
           4          38           2179.39         28           2999.9
           5          27           1665.17         57           2376.1
           6          40           1373.72         68           1941.13

horn is used, and maximum speed have the highest variance for the principal components. Similarly, the combined traffic dataset shows the maximum variance for the same driving features.

Since the standard deviation measures how spread out the data are, attributes with a high standard deviation could help in clustering the records. Based on this assumption, Table 6 lists the standard deviation values for the 10 recorded features. This leads to the conclusion that average speed, maximum speed, number of times brakes are applied, and number of times horn is used are the top 4 attributes that may influence clustering. The remaining 6 attributes are considered to be the bottom 6.

6.1 k-means clustering results

k-means clustering was applied to the combined dataset and the 3 column-wise distributions (top 4, bottom 6, and all attributes). k-means is input dependent: the number of clusters must be supplied before the algorithm starts clustering. Values of k from 1 to 28 were used. For higher values of k, i.e., 29 and above, it was observed that empty clusters were being generated. All clusters were examined based upon their inter-cluster similarity. Inter-cluster similarity was calculated by finding the average pair-wise distance of the samples in a cluster. Figure 3 shows the graphs for the average distance between all the cluster centroids for the values of k from 1 to 28. Because empty clusters were generated for larger values of k, some of the lines in Fig. 3 finish before reaching the right-hand side of the figure.
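The procedure described above (run k-means for a given k, check for empty clusters, and measure the average distance between cluster centroids) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the function names, the random initialisation from data points, the Euclidean distance, and the iteration cap are all choices made here.

```python
import math
import random
from itertools import combinations


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k data points
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def has_empty_cluster(clusters):
    """True if any cluster received no samples."""
    return any(len(c) == 0 for c in clusters)


def avg_centroid_distance(centroids):
    """Average pairwise Euclidean distance between centroids (needs k >= 2)."""
    pairs = list(combinations(centroids, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

Calling `kmeans` for each candidate k and plotting `avg_centroid_distance` against k would give a curve of the kind described for Fig. 3; values of k for which `has_empty_cluster` returns True would be dropped.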

Figure 3a shows the results of k-means when applied to the top 4 attributes with the combined dataset. From Fig. 3a it is not clear for which value of k all datasets have the highest intra-cluster similarity and the lowest inter-cluster similarity. However, for the value of k = 4, three

Table 10 SVM accuracy using one versus one method

Cross validation accuracy   Testing accuracy   Kernel       C          Gamma
98 %                        86 %               RBF          4.5552     0.028656
97 %                        30 %               Linear       0.074325   3.3636
96 %                        32 %               Polynomial   81.0193    8
35 %                        32 %               Sigmoid      76.1093    19.0273



Table 11 SVM accuracy using one versus rest method

Cross validation accuracy   Testing accuracy   Kernel       C          Gamma
39 %                        30 %               RBF          0.03125    0.0078125
100 %                       68 %               Linear       0.03125    0.044194
98 %                        70 %               Polynomial   0.17678    1.4142
30 %                        28 %               Sigmoid      1          0.0078125

datasets seem to have the same result (the exception beingthe average traffic dataset).

Figure 3b shows the results of k-means when applied to all 10 attributes in the four traffic scenarios. It is clear from the graph that the minimum traffic dataset shows the highest distance between cluster centroids when the value of k is 4. The average and maximum traffic datasets show relatively higher values when k is 4, while the combined dataset does not peak at 4. Figure 3c shows the results of k-means when applied to the bottom 6 attributes in the four traffic scenarios. The results show a random behavior for different values of k. However, these results are irrelevant because these attributes carry less information and have already been covered in Fig. 3b. We have not used these 6 features in our analysis. As shown in Fig. 3b, there is no significant change in centroid distances across the 27 clustering formations.

Table 7 lists the values of the average of the average distance between centers of clusters. The value of k in Table 7 ranges from 2 to 27. Each subset of the dataset reaches its highest value at a different number of clusters. We have therefore computed the standard deviation across the four datasets for each number of clusters, to find the point where all datasets show a relatively high value. The last column of Table 7 lists these standard deviation values, which show that with four clusters we get the minimum difference among all datasets using the top 4 attributes. The same information is illustrated using a graph in Fig. 4. Hence, based on k-means clustering, we can conclude that 4 profiles exist in the underlying dataset(s).
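The agreement check described above can be sketched in a few lines: for every k, take the average centroid distance of each dataset, compute the spread across datasets, and pick the k where the datasets differ least. The function name and the use of the population standard deviation are assumptions made here; the paper does not state which variant was used, and the numbers in the usage note are placeholders, not values from Table 7.

```python
from statistics import pstdev


def most_consistent_k(avg_distances):
    """Find the k at which the datasets agree most closely.

    avg_distances maps k -> list of average centroid distances,
    one entry per dataset (e.g. minimum, average, maximum, combined).
    Returns (best_k, spread) where spread maps k -> standard deviation.
    """
    spread = {k: pstdev(values) for k, values in avg_distances.items()}
    best_k = min(spread, key=spread.get)
    return best_k, spread
```

For example, `most_consistent_k({3: [10.0, 14.0, 9.0, 12.0], 4: [15.0, 15.5, 15.2, 15.1]})` selects 4, because the four placeholder values for k = 4 lie closest together.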

6.2 Model-based clustering results

The probabilistic framework in model-based clustering allows the choice of the best clustering. Large Bayesian Information Criterion (BIC) values indicate strong evidence for a specific model. Missing BIC values correspond to combinations of model and number of clusters for which the parameters could not be fitted. For multivariate data, the default initialization for all models uses the classification from hierarchical clustering based on an unconstrained model. In our experimental setup, we have observed the BIC values for up to 9 components with 10 covariance models (as listed in Table 3) for all categories of the dataset with the aforementioned three combinations of driving features. Table 8 shows the first three models which have maximum likelihood and the highest BIC value for each dataset, considering all models for the first 9 components. For all of the datasets, with the exception of the average traffic dataset, both with the top four and all attributes, a 4-component model is among the top three models. Based upon BIC, it can be concluded that the datasets show good clusters with 4 components.
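As an illustration of how BIC-based model selection works, the sketch below uses the sign convention in which larger BIC values are better, BIC = 2 log L − p log n, which matches the statement that large BIC values indicate strong evidence for a model. The function names are assumptions made here; the candidate values are the three BIC values reported in Table 8 for the complete dataset with 10 features.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """BIC in the 'larger is better' convention: 2*logL - p*ln(n)."""
    return 2.0 * log_likelihood - n_params * math.log(n_samples)


def best_model(candidates):
    """Pick the (model_name, n_components, bic_value) tuple with the highest BIC."""
    return max(candidates, key=lambda c: c[2])


# BIC values reported in Table 8 for the complete dataset with 10 features.
table8_complete_10 = [
    ("VEI", 4, -6197.195),
    ("VEI", 7, -6201.844),
    ("VEI", 8, -6205.559),
]
```

Applied to `table8_complete_10`, `best_model` selects the VEI model with 4 components, since −6197.195 is the largest of the three values.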

6.3 Fuzzy c-means clustering results

The third and final experiment was the application of fuzzy c-means clustering for extraction of clusters from the dataset. The main advantage of fuzzy c-means is its ability to provide the degree of membership in clusters instead of hard clustering, like k-means. All experiments were done with 1e-5 as a stopping criterion. Figure 5 shows the results of FCM for all datasets by plotting the objective function and the number of iterations for up to 6 clusters.
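A minimal pure-Python sketch of fuzzy c-means is given below to make the two reported quantities, objective function and number of iterations, concrete. Several details are assumptions made here rather than facts from the paper: the fuzzifier m = 2, the random initial memberships, the cap of 100 iterations, and the interpretation of the 1e-5 stopping criterion as the minimum change in the objective function between iterations.

```python
import math
import random


def fcm(points, n_clusters, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means; returns (centers, memberships, objective, iterations)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    prev_obj = math.inf
    for iteration in range(1, max_iter + 1):
        # Centres are means of the points weighted by membership**m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Squared distances, floored to avoid division by zero.
        d2 = [[max(sum((p[d] - c[d]) ** 2 for d in range(dim)), 1e-12)
               for c in centers] for p in points]
        obj = sum(u[i][j] ** m * d2[i][j]
                  for i in range(n) for j in range(n_clusters))
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
        # Membership update: inversely related to the distance to each centre.
        power = 1.0 / (m - 1.0)
        for i in range(n):
            inv = [(1.0 / d2[i][j]) ** power for j in range(n_clusters)]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u, obj, iteration
```

Each row of the returned membership matrix sums to 1, so a sample can belong partly to several clusters; taking the largest entry per row recovers a hard assignment comparable to k-means.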

Fig. 6 Performance of the neural network (primary y-axis: accuracy; secondary y-axis: MSE; x-axis: number of neurons in the hidden layer, 1 to 10)



Table 12 Clustering results for a real life dataset

                     k-means                      FCM
Number of clusters   Average centroids distance   Number of iterations   Objective function
2                    11                           23                     11212.977
3                    14.7                         32                     6560.24
4                    16.4                         35                     4620.361
5                    15.4                         60                     3594.929
6                    15.8                         100                    2937.026
7                    16.3                         95                     2418.755
8                    16                           94                     2084.192
9                    15.2                         100                    1841.212
10                   15.7                         94                     1632.775

Figure 5a and b plot the results of FCM for the combined dataset for all and the top four driving features, respectively. It is clear from these figures that the objective function value is very high for 2 and 3 clusters, where the number of iterations is at its minimum. With four clusters, the objective function has dropped substantially while the number of iterations is still low compared to 5 and 6 clusters. Figure 5c and d plot the objective function and the number of iterations for the maximum traffic dataset. Both of these plots show that, considering the value of the objective function together with the number of iterations, the best results are found at 4 clusters. The average traffic dataset also shows an inverse relation between the number of iterations and the objective function value in Fig. 5g and h. The graphs indicate that four clusters give the lowest objective function once the number of iterations is taken into consideration.

Table 9 summarizes the FCM clustering results, including the values of the objective function and the number of iterations for 2 to 6 clusters. It can be seen from the table that, considering all attributes for clustering, the number of iterations changes very sharply after four clusters. For example, in the case of the combined dataset, the objective function value changes little while the number of iterations goes from 44 to 100 as the number of clusters goes from 4 to 5. Similarly, in the case of the average traffic dataset, iterations go from 39 to 100 with a small decrease in the objective function. After comparing all clustering results, with both four and ten attributes, we can conclude that there are four different driving profiles in the underlying data.
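The "sharp change after four clusters" argument can be made explicit with a small helper that finds the cluster count after which the number of iterations rises the most. The helper name is an assumption made here; the iteration counts are those reported in Table 9 for the combined (All) dataset.

```python
def sharpest_iteration_jump(iterations_by_k):
    """Return (k, jumps): the k after which the iteration count rises most.

    iterations_by_k maps number of clusters -> number of FCM iterations.
    jumps maps k -> iterations at the next cluster count minus iterations at k.
    """
    ks = sorted(iterations_by_k)
    jumps = {k: iterations_by_k[k_next] - iterations_by_k[k]
             for k, k_next in zip(ks, ks[1:])}
    return max(jumps, key=jumps.get), jumps


# Iterations reported in Table 9 for the combined (All) dataset.
all_features = {2: 21, 3: 26, 4: 44, 5: 100, 6: 100}
top4_features = {2: 22, 3: 27, 4: 35, 5: 70, 6: 97}
```

For both feature sets the largest jump follows k = 4 (44 to 100 and 35 to 70, respectively), which is the pattern the text relies on.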

6.4 Classification results

For the purpose of predicting a driver’s profile while s/he is driving the vehicle, SVM- and ANN-based classification is utilized. For SVM-based classification, we have applied both the one versus one and the one versus rest methods. A total of 100 samples were used for 5-fold cross validation and 50 samples for testing. A total of 4 kernels were applied to find the best SVM model for classification. From Table 10 it is clear that only the RBF kernel shows good accuracy, with a small trade-off and a low value of gamma. The low value of parameter C makes a more flexible hyper plane and tries to minimize the error margin. The remaining kernels show a very low testing accuracy, and sigmoid shows a very high trade-off value, which may not provide good results in general. The maximum performance achieved is 86 %, for RBF. Both sigmoid and polynomial show 32 % testing accuracy, while polynomial and linear perform well in cross validation.

Since SVM was originally designed for binary classification problems, we used the one versus rest method to apply SVM to the multi-class classification problem. In this method, the k-class classification problem is divided into a group of binary classification sub-problems. The kth classifier forms a hyper plane between the kth class and the other k − 1 classes. We then use a majority vote across classifiers. Results of the one versus rest SVM model are shown in Table 11.
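The decomposition into binary sub-problems can be sketched as follows. The function names are hypothetical, and the binary classifiers are passed in as plain callables that stand in for trained SVMs. The sketch combines their outputs by picking the class with the largest decision value, a common rule for one versus rest that may differ from the voting rule used in the paper.

```python
def one_vs_rest_labels(labels, positive_class):
    """Relabel a multi-class target: +1 for positive_class, -1 for the rest."""
    return [1 if y == positive_class else -1 for y in labels]


def predict_one_vs_rest(decision_functions, sample):
    """Predict the class whose binary classifier is most confident.

    decision_functions maps class -> callable returning a signed score for
    the sample (positive means 'belongs to this class'). These callables are
    placeholders for trained binary classifiers.
    """
    scores = {cls: f(sample) for cls, f in decision_functions.items()}
    return max(scores, key=scores.get)
```

For four driver profiles, `one_vs_rest_labels` would be called four times, once per profile, to build the training targets of the four binary classifiers.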

With the one versus rest method, the best testing accuracy of an SVM model is 70 %, obtained with the polynomial kernel. In terms of model parameters, the polynomial kernel shows good accuracy with a high gamma value compared to the other kernels.

We have also applied feed forward neural networks to the dataset in order to compare their performance with SVM. The scaled conjugate gradient method was applied for training, with the tangent sigmoid as the activation function for neurons in each layer. The number of epochs was 1000, the minimum gradient was 1e-6, and validation checks were set to 50. The network was trained using 70 % of the sample data, while 15 % was used for validation and 15 % for testing. The performance was evaluated based upon the accuracy of classification. To find the accuracy, we calculated the confusion matrix, which helped in visualizing network performance. With c classes, the confusion matrix is a c × c matrix whose c diagonal entries contain the correct classifications. Figure 6 shows the neural network’s accuracy on the primary y-axis and the mean square error on the secondary y-axis for a varying number of neurons in the hidden layer. Average accuracy and average mean square error were 84.2 % and 0.05, respectively. With 7 or more neurons in the hidden layer, accuracy is high. Similarly, as shown by the secondary y-axis, the mean square error gets very close to zero with 7 or more neurons in the hidden layer. When we compare the results of SVM and the neural network, the neural network shows better average accuracy for classification.
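The relation between the confusion matrix and the reported accuracy can be sketched as below. The function names and the convention that rows are true classes and columns are predicted classes are assumptions made here for illustration.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Build a c x c matrix; rows are true classes, columns are predictions."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

With four driver profiles the matrix is 4 × 4, and the off-diagonal entries show which profiles the classifier confuses with one another.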

6.5 Results using real life data

To demonstrate the utility of the proposed approach on a real life dataset, we conducted an additional experiment. For this experiment we used the same group of 50 subjects as in the preceding sections. However, for data recording we used an OBD II device. As already mentioned in Section 3, there are multiple options for the recording of real life datasets. Some of these are too costly, like the UTDrive platform [57], [6], and [54]. OBD II is used due to its immediate availability and to avoid expensive solutions. The recorded data consists of three of the top 4 attributes: maximum speed, average speed, and use of the brake pedal. We have applied k-means and fuzzy c-means clustering for up to 10 clusters on the data recorded through the OBD II device. Table 12 shows that k-means has the maximum distance between centroids with four clusters. Similarly, fuzzy c-means shows a low objective function value with fewer iterations for four clusters. For fuzzy c-means clustering, there is a drastic change in the number of iterations as we increase the number of clusters from 4 to 5. Based on the aforementioned results, it can be concluded that the results obtained through the custom hardware/software correlate with the real life data readings.

7 Discussion

The profiling of drivers using clustering has been presented in Section 4. For the purpose of clustering, the data is recorded using software integrated with the hardware assembly. The work in this paper deals with profiling drivers based on driver-dependent driving features only. This leads us to drop other vehicle features like fuel system status, fuel pressure, throttle position, oxygen sensor voltage, and a few others. These vehicle-specific features can be captured using OBD-II. Since not all of the features that we have considered for our study are provided by OBD-II, a custom-built hardware assembly was used. The complete feature set consisted of nine items: number of times footbrakes were applied, use of horns, maximum speed achieved while driving, driver’s average speed, ratio of left turns to use of left indicators, ratio of right turns to use of right indicators, maximum gear used by the driver, driver’s average gear, and number of times a vehicle gets into reverse gear. In addition to these nine features, gender was also added to the feature set. However, the value of gender does not vary during driving; the statistics for gender in Tables 4, 5, and 6 are reported for the sake of completeness. To see which of these ten attributes contribute toward better profiling, PCA and mutual information were applied to them. This resulted in two sets, where four features were found to be the most influential in deciding a driver profile. These are average speed, maximum speed, number of times brakes were applied, and number of times horn was used. These four most influential features also correlate with the findings in [56, 57]. There are other studies [54] which also explore the factors associated with individual driving risk by associating the risk factors with driver personality. In the future, the extracted four profiles can also be mapped to the big five personality traits [55] to see whether a correlation exists.

For the purpose of extracting similar groups of drivers (profiles) from the underlying data, we used clustering. Various clustering methods may produce different numbers of clusters for the same dataset. Keeping this in mind, we used three different clustering approaches and decided upon the number of clusters based on the commonalities in the results of these three methods. The k-means clustering groups similar items based on a distance measure; however, it requires the number of clusters to be supplied as input beforehand. We used values of k from 1 to 28. The clustering formation using four as the value of k is the most suitable one, i.e., clusters are formed with maximum similarity within the clusters and minimum similarity with other clusters. The model based clustering and fuzzy c-means clustering also show the same trend. The hard clustering approaches, k-means and model based clustering, and the soft clustering approach of fuzzy c-means show the same number of clusters in the underlying dataset.

Figures 7, 8, 9, and 10 in the Appendix show the average value of each recorded attribute for the minimum, average, maximum, and combined traffic scenarios, respectively. The figures in the Appendix are prepared cluster-wise to give a descriptive analysis of the formed clusters. After analysis of the four clusters of the minimum traffic dataset, we have found that clusters 1 and 4 contain a group of careful drivers who drive the vehicle with a high average speed. They do so by carefully applying the brakes an appropriate number of times. The standard deviation of the ratio of turns to indicator use, for the drivers in these clusters, also shows that indicators are used according to the path requirement. A comparison with additional information acquired through a separate questionnaire also shows that drivers of this group usually drive only a few days a week and try to avoid using a mobile phone and other activities, like eating, while driving. Similarly, cluster 2 indicates



Table 13 Key benefits and limitations

Merits:
Low cost hardware and software simulation environment
Profiling of drivers for an early warning system to avoid accidents
Prediction accuracy of 84.2 %

Limitations:
Influence of delay factor needs further analysis
Real-life and real-time data issue

the class of drivers who drive slowly and are sluggish in using indicators and brakes. This includes both male and female drivers, but all have a very slow speed with a high ratio of turns to indicators. The third cluster of this dataset represents the class of good drivers, who drive at a moderate speed and apply brakes and indicators properly.

In the case of the average traffic dataset, the first cluster shows a group of careful, speedy drivers who have gear 4 as their average gear, apply the brakes fewer times over the whole track, and have a good average ratio of turns to indicators. The second cluster has the same driving properties, except that the speed of the drivers in this cluster is not high. Cluster three drivers prefer slow driving, with a good number of brakes applied according to the path. Finally, the last group of drivers falls in the category of careful drivers with a high average ratio of turns to indicators; however, the standard deviation of this ratio is small, which indicates that they do not apply indicators regularly when turning.

Clustering based on the maximum traffic dataset shows similar groups of drivers, including moderate drivers, those who drive vigilantly, and the class of sluggish and rash drivers. When examining the combined dataset, which includes samples from all traffic scenarios, i.e., minimum, maximum, and average traffic, the clusters are separated distinctly. Analysis of these clusters shows that the first cluster has a very high average speed, a high ratio of turns to indicators, and fewer brake applications, which suggests that these drivers show an impulsive driving behavior. In the case of the second cluster, the average and standard deviation values of the attributes show that the drivers are driving in a reckless mode.

Like any other research, the proposed work has some limitations. Although this work has performed a detailed set of experiments on collecting various driver dependent vehicle driving features, the influence of the delay factor on the prediction system is yet to be studied in detail. Another limitation of the work is the use of data collected only in a controlled environment. The data is collected from each subject through the custom-built hardware integrated with a software system in a laboratory environment. This has the advantages of avoiding any possible accidents, repeating an experiment at low cost, and collecting a larger number of samples per subject. However, there are a few factors that may indirectly influence the driver while s/he is driving a vehicle. Some of these factors include pollution, traffic noise, and extreme weather. Considering real-life and real-time data may further enrich the results. Additionally, this may also give insight into various non-vehicle and non-driver factors that are important in studying drivers’ behavior. For further study on this, the reader is referred to [10]. Table 13 lists the key benefits and limitations of the proposed work.

8 Conclusion and future recommendations

This paper addressed the problem of driver profiling based on the individual driver’s driving features. For the purpose of recording drivers’ data, custom-built hardware integrated with a software environment was used. Three clustering techniques (k-means, FCM, and MBC) were applied to discover the unique classes of drivers. Comparison of these clustering techniques showed that various datasets with different combinations of attributes provide approximately the same number of clusters, i.e., four. Analysis of various features showed that speed, number of times brakes were applied, and number of times horn was used provided rich information regarding drivers’ driving behavior that is useful for clustering. Feed-forward neural network and SVM-based classification was performed, and the classifiers were trained using the clustered data. The average accuracy achieved by the ANN was 84.2 %, while the average performance for SVM was 47 % and the maximum performance was 86 % using the RBF kernel. In addition to the safety of the vehicle and its passengers, the recorded data from the system can also be utilized for better decision-making by transportation and traffic departments and for the management of drivers, vehicles, and roads.

In the future, this work can be expanded by scaling up the number of subjects for the same experimentation and observing the effect on the number of profiles created. Another interesting direction would be to study the behavior of subjects on different types of vehicles, like automatic, manual, light traffic vehicles, and heavy traffic vehicles. A more challenging prospect is the collection of data in real cars and streets. From the computational point of view, there is still good potential to increase the accuracy of the classifier for predicting the correct profile. This usually depends on the number and quality of samples in the dataset. However, classifiers that can perform better with smaller training datasets may also be investigated.



Appendix

Fig. 7 Clusters properties using minimum traffic scenario (y-axis: values, 0 to 100; x-axis: driving features, i.e., left indicator/left turns ratio, right indicator/right turns ratio, brakes use, horns use, reverse gear use, average gear, maximum gear, average speed, maximum speed; series: Cluster 1 to Cluster 4)

Fig. 8 Clusters properties using average traffic scenario (y-axis: values, 0 to 100; x-axis: driving features, i.e., left indicator/left turns ratio, right indicator/right turns ratio, brakes use, horns use, reverse gear use, average gear, maximum gear, average speed, maximum speed; series: Cluster 1 to Cluster 4)



Fig. 9 Clusters properties using maximum traffic scenario (y-axis: values, 0 to 100; x-axis: driving features; series: Cluster 1 to Cluster 4)

Fig. 10 Clusters properties using all traffic scenarios (y-axis: values, 0 to 100; x-axis: driving features, i.e., left indicator/left turns ratio, right indicator/right turns ratio, brakes use, horns use, reverse gear use, average gear, maximum gear, average speed, maximum speed; series: Cluster 1 to Cluster 4)



References

1. Dagal A, Greer SE, McCunn M (2014) International dispari-ties in trauma care. Current Opinion in Anesthesiology 27(2):233–239

2. Agbonkhese O, Yisa GL, Agbonkhese EG, Akanbi DO, Aka EO,Mondigha EB (2013) Road traffic accidents in nigeria: causes andpreventive measures. Civil and Environmental Research 3(13):90–99

3. Ran B, Jin PJ, Boyce D, Qiu TZ, Cheng Y (2012) Per-spectives on future transportation research: Impact ofintelligent transportation system technologies on next-generation transportation modeling. J Intell Transp Syst 16(4):226–242

4. Zhang J, Wang FY, Wang K, Lin WH, Xu X, Chen C (2011) Data-driven intelligent transportation systems: A survey. IEEE TransIntell Transp Syst 12(4):1624–1639

5. Fazeen M, Gozick B, Dantu R, Bhukhiya M, Gonzalez MC (2012)Safe driving using mobile phone. IEEE Trans Intell Transp Syst13(3):1462–1468

6. Zhiwei QJ, Lan P (2004) Real-time nonintrusive monitoring andprediction of driver fatigue. IEEE Trans Veh Technol 53(4):1052–1068

7. D’Orazio DT, Leo M, Spagnolo P, Guaragnella C (2004) A neuralsystem for eye detection in a driver vigilance application. In: The7th International IEEE Conference on Intelligent TransportationSystems, pp 320–325

8. Carl E, Lippitt J, Forsythe C, Dixon R (2005) Super-vised machine learning for modeling human recognitionof vehicle-driving situations. In: IEEE/RSJ Interna-tional Conference on Intelligent Robots and Systems,pp 604-609

9. Saensom P, Tangamchit P, Pongpaibool P, Imkamon T (2008)Detection of hazardous driving behavior using fuzzy logic. In:Proceedings of ECTI-CON, pp 657-660

10. Xu C, Liu P, Wang W (2013) A genetic programming model forreal-time crash prediction on freeways. IEEE Trans Intell TranspSyst 14(2):574–586

11. Abdel-At M, Rajashekar P (2006) Calibrating a real-time traffic crash prediction model using archived weatherand its traffic data. IEEE Trans Intell Transp Syst 7(2):167–174

12. Breiman L (2001) Random forests. Mach Learn 45(1):5–3213. Abdel-Aty M, Uddin N, Pande A (2005) Split models for pre-

dicting multivehicle crashes during high-speed and low-speedoperating conditions on freeways. Transportation research record1908(1):51–58

14. Huazhong N, Thomas SH, Xu W, Zhou Y (2007) Detecting unsafedriving patterns using discriminative learning In: ICME07, pp.1431–1434

15. Huazhong N, Xu W, Zhou Y, Gong Y, Huang TS (2009) Ageneral framework to detect unsafe system states from mul-tisensor data stream. IEEE Trans Intell Transp Syst 11(1):4–15

16. Ning H, Xu W, Zhou Y, Gong Y, Huang TS (2008) Temporaldifference learning to detect unsafe system states. In: Proceed-ings of the International Conference on Pattern Recognition,pp. 1–4.

17. Jabon ME, Bailenson JN, Pontikakis E, Takayama L (2009)Facial expression analysis for predicting unsafe drivingbehavior car driving simulator. Pervasive Computing 10(4):84–95

18. Yuejing L, Jie L, Ming L, Xing-lin Z, Haixia Z (2010) Researchon accident prediction of intersection and identification method ofprominent accident form based on back propagation neural net-work. In: International Conference on Computer Application andSystem Modeling (ICCASM ), pp. V1–434

19. Chong M (2004) Traffic accident analysis using decision trees andneural networks. In: IADIS International Conference on AppliedComputing, pp. 1–4

20. Tambouratzis T, Souliou D, Chalikias M, Gregoriades A (2010)Combining probabilistic neural networks and decision trees formaximally accurate and efficient accident prediction. In: Interna-tional Joint Conference on Neural Networks (IJCNN), pp.1–8

21. Garima RS, Dongre S (2012) Crash prediction system for mobiledevice on android by using data stream mining techniques. In:Sixth Asia Modeling Symposium, pp. 185–190

22. Lv Y, Tang S, Zhao H, Li S (2009) Real-time highway acci-dent prediction based on support vector machines. In: Control andDecision Conference, CCDC ’09, pp. 4403–4407

23. Na S, Xumin L, Yong G (2010) Research on k-means Clus-tering Algorithm. In: Third International Symposium on Intelli-gent Information Technology and Security Informatics, pp. 63–67.

24. Har-Peled S, Mazumdar S (2004) On coresets for k-means andk-median clustering. In: Proceedings of the thirty-sixth annualACM symposium on Theory of computing, New York, pp 291–300

25. Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821

26. Celeux G, Govaert G (1995) Gaussian parsimonious clusteringmodels. Pattern Recognit 28(50):781–793

27. Dunn JC (1973) A fuzzy relative of the ISODATA process andits use in detecting compact well-separated clusters. J. Cybernet3:32–57

28. Rezankova H, Loster T, Husek D (2011) Evaluation of categoricaldata clustering. Advances in Intelligent Web Mastering 86:173–182

29. Loster T, Langhamrova. J (2012) Disparities between regions ofthe Czech Republic for non-business aspects of labour market. In:International Days of Statistics and Economics, 6th ed., pp. 689–702

30. Oliveira JV (2007) Advances in fuzzy clustering and its applica-tions. 1 ed., Wiley, pp. 4–69

31. Zhang GP (2000) Neural Networks for Classification: A Survey. IEEE Trans Syst Man Cybern Part C Appl Rev 30(4):451–462

32. Lippmann RP (1989) Pattern classification using neural networks. IEEE Communications Magazine 27(11):47–64

33. Cover TM, Thomas JA (1991) Elements of information theory. John Wiley & Sons

34. Mazzei M, Palma AL (2014) Evaluating principal components analysis of particular spatial statistical models. In: Sixth International Conference on Advanced Geographic Information Systems, Applications, and Services, pp. 24–30

35. Zaldivar J, Calafate CT, Cano JC, Manzoni P (2011) Providing accident detection in vehicular networks through OBD-II devices and Android-based smartphones. In: IEEE 36th Conference on Local Computer Networks (LCN), pp. 813–819

36. Chong M (2004) Traffic accident analysis using decision trees and neural networks. In: IADIS International Conference on Applied Computing, Portugal, pp. 1–4

37. Birnbaum RT, Truglia J (2000) Getting to know OBD II. Manufactured and United States



38. Akin D, Akbas B (2010) A neural network (NN) model to predict intersection crashes based upon driver, vehicle and roadway surface characteristic. Sci Res Essays 5(19):2837–2847

39. Moghaddam R, Afandizadeh S, Ziyadi M (2010) Prediction of accident severity using artificial neural networks. International Journal of Civil Engineering 9(1):41–48

40. Wahab A, Quek C, Keong T, Takeda K (2009) Driving profile modeling and recognition based on soft computing approach. IEEE Trans Neural Netw 20(4):563–582

41. Lv Y, Tang S, Zhao H, Li S (2009) Real-time highway accident prediction based on support vector machines. In: Control and Decision Conference, CCDC ’09, pp. 4403–4407

42. Qu A, Wang W, Liu P, Noyce D (2012) Real-time prediction of freeway rear-ends crash potential by support vector machine. In: Annual Meeting Transport. Res. Board, Washington

43. Li X, Zhang DL, Xie Y (2008) Predicting motor vehicle crashes using support vector machine models. Accid Anal Prev 40(4):1611–1618

44. Halim Z, Baig AR, Zafar Z (2014) Evolutionary search in the space of rules for creation of new two-player board games. Int J Artif Intell Tools 23(2):1–26

45. Abdel-Aty M, Rajashekar P (2006) Calibrating a real-time traffic crash prediction model using archived weather and ITS traffic data. IEEE Trans Intell Transp Syst 7(2):167–174

46. Xiao WH, Tan D (2004) Traffic accident prediction using 3-d model-based vehicle tracking. IEEE Trans Veh Technol 53(3):677–694

47. Ren J, Shen Y, Ma S, Guo L (2004) Applying multi-class SVMs into scene image classification. Innovations in Applied Artificial Intelligence 3029:924–934

48. Wu K, Wang S (2009) Choosing the kernel parameters for support vector machines by the inter-cluster distance in the feature space. Pattern Recognit 42(5):710–717

49. Young W, Sobhani A, Lenne M, Sarvi M (2014) Simulation of safety: a review of the state of the art in road safety simulation modeling. Accid Anal Prev 66(5):89–103

50. Faizan A, Arif U, Abbasi R, Inam H (2013) Driver profiling project report, Faculty of Computer Science and Engineering, GIK Institute, Topi, Pakistan

51. Kalsoom R, Halim Z (2013) Clustering the driving features based on data streams. In: 16th International Multi Topic Conference, INMIC13, Lahore, pp. 89–94

52. Das S, Zhou S, Lee JD (2012) Differentiating alcohol-induced driving behavior using steering wheel signals. IEEE Trans Intell Transp Syst 13(3):1355–1368

53. Halim Z, Waqas M, Hussain SF (2015) Clustering large probabilistic graphs using multi-population evolutionary algorithm. Inf Sci 317:78–95

54. Guo F, Fang Y (2013) Individual driver risk assessment using naturalistic driving data. Accid Anal Prev 61:3–9

55. Keyes CL, Kendler KS, Myers JM, Martin CC (2015) The genetic overlap and distinctiveness of flourishing and the big five personality traits. J Happiness Stud 16(3):655–668

56. Shi B, Xu L, Hu J, Tang Y, Jiang H, Meng W, Liu H (2015) Evaluating driving styles by normalizing driving behavior based on personalized driver modeling. IEEE Trans Syst Man Cybern Syst. doi:10.1109/TSMC.2015.2417837

57. Li N, Busso C (2015) Predicting perceived visual and cognitive distractions of drivers with multimodal features. IEEE Trans Intell Transp Syst 16(1):51–65

58. Dovgan E, Javorski M, Tusar T, Gams M, Filipic B (2013) Comparing a multiobjective optimization algorithm for discovering driving strategies with humans. Expert Systems with Applications 40(1):2687–2695

Zahid Halim received the B.S. degree in computer science from the University of Peshawar, Pakistan, in 2004, the M.S. degree in computer science from the National University of Computer and Emerging Sciences, Pakistan, in 2007, and the Ph.D. degree in computer science from the National University of Computer and Emerging Sciences, Pakistan, in 2010. He was with the National University of Computer and Emerging Sciences, Islamabad, Pakistan, as a Faculty Member (Lecturer and then Assistant Professor) from 2007 to 2010. Currently he is an Assistant Professor with Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Pakistan. He has more than 35 publications in journals and international conferences. His current research interests include machine learning and intelligent systems.

Rizwana Kalsoom received the BSc degree in computer engineering from the University of Engineering and Technology, Taxila, Pakistan, in 2011 and the MS degree in computer system engineering from Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology, Pakistan, in 2014. She is currently working as a research associate at the Faculty of Computer Science and Engineering of GIK Institute. She obtained professional association and registration from the Pakistan Engineering Council (PEC) after completing the BSc degree. Engr. Kalsoom's research interests include data mining and image processing, specifically the dynamics of vehicle control.

Abdul R. Baig received the B.E. degree in electrical engineering from the NED University of Engineering and Technology, Karachi, Pakistan, in 1987, the Diplome de Specialisation degree in computer science from Supelec, Rennes, France, in 1996, and the Ph.D. degree in computer science from the University of Rennes-I, Rennes, France, in 2000. He was with the National University of Computer and Emerging Sciences, Islamabad, Pakistan, as a Faculty Member (Assistant Professor, Associate Professor, and then Professor) from 2001 to 2010. Currently he is a Professor with Imam Muhammad bin Saud Islamic University, Riyadh, Saudi Arabia. He has more than 90 publications in journals and international conferences. His current research interests include data mining and evolutionary algorithms.


