Proceedings of the 12th INDIACom; INDIACom-2018; IEEE Conference ID: 42835
2018 5th International Conference on “Computing for Sustainable Global Development”, 14th - 16th March, 2018
Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)
A Virtual Sensor Network Framework for Vehicle
Quality Evaluation
Mohammad Alwadi
Faculty of ESTeM,
University of Canberra,
Canberra, AUSTRALIA
Girija Chetty
Faculty of ESTeM
University of Canberra
AUSTRALIA
Mohammad Yamin
Faculty of Economics and Admin
King Abdulaziz University
SAUDI ARABIA
Abstract—In this paper we propose a novel virtual sensor network
framework for assessing the value and quality of the vehicle based
on a data driven approach, using machine learning and data
science techniques. The evaluation of the proposed approach, done on two publicly available datasets, showed the capability for
automatic prediction of vehicle quality, based on different
characteristics of vehicle captured at different levels of vehicle
supply chain from manufacturing to the end user.
Keywords—Machine learning; Vehicle Quality, Risk Assessment
I. INTRODUCTION
Artificial Intelligence (AI) and machine learning systems have become a focus of intensive research, especially in the
area of cyber physical systems, for monitoring large physical environments, and for tracking environments within homes,
cars, and different types of indoor and outdoor spaces. Recently, there has been a tectonic shift in the vehicle and
automotive industry, moving away from traditional methods of selling cars to the adoption of data driven solutions based
on machine learning, big data and artificial intelligence. As car and ride sharing gain in popularity, assessing vehicle
quality, safety and risk from historical data and data driven models is gaining increasing importance. Further, several
new services are now in demand, such as remote diagnostics, user behavior analysis, automatic owner identification, and
many more. As a result, the car is being transformed into a connected gadget on wheels, including a browser for the real
world, and a powerful sensor or tracking device for everyone. This has led to the need for the traditional players in the
vehicle industry, the manufacturers and dealerships, to share the market opportunity with the IT industry. So, how can
the very latest trends in IT (artificial intelligence, machine learning, and big data analytics) help in the search for
new data driven models in a changing car market?
Traditionally, vehicles, particularly cars, have been purely mechanical transportation solutions, and were not designed
to provide any digital services. Over the years, car manufacturers invested in the quality of their engines and chassis,
and focused on improving safety and productivity. But the growth of internet technologies and the innovative developments
achieved in the fields of AI, machine learning, and big data analytics are fueling a transformation in the auto industry
as well. With these cutting edge technologies, car manufacturers have been able to improve the driver experience, meet
customer expectations, and enhance safety by ensuring that the vehicle is built safe enough for road use. They can also
provide decision support to the driver through emergency road assistance during accidents and vehicle breakdowns, and
allow remote monitoring of the vehicle for better driver support, navigation, and tracking and surveillance by control
centers for law enforcement agencies. This tremendous improvement in the capabilities of vehicles has become possible due
to improvements in hardware and software technologies, as well as novel data driven approaches that learn from previous
mistakes, in the form of historical data, allowing users and customers to choose a safe and acceptable car.
According to automotive industry forecasts [1], by 2020 there will be roughly 152 million connected cars on the roads,
with each generating up to 30 terabytes of data daily. This massive data provides immense opportunities to use machine
learning and data mining algorithms to improve vehicle and driver safety, preserve customer privacy, and identify driver
behavior patterns, allowing manufacturers to offer services that uniquely meet specific customer needs on a long term
basis, long after the vehicle has been purchased. For example, with data processing and the car's management block, it is
possible to identify when a car is about to break down, before it happens. The car owner can be alerted, and the
manufacturer can proactively contact a service center and register for repairs with a single click [1]. Further,
providing value added connected car services can bring manufacturers a higher marginal return compared to selling cars as
standalone products. AI, machine learning and data science algorithms make it possible to sell each separate service
efficiently to each user, and thus to earn more. More importantly, a car equipped with connected services is not just a
product, but a complete vehicle ecosystem with services, enabling continuous channels of communication between the
manufacturer and the customer.
II. BACKGROUND
The automobile development process has long cycles, from manufacturers to dealers/resellers to end users/customers. The
massive data stores that log and track the different activities in each stage of this process provide immense
opportunities to achieve efficiencies in the process, by analyzing this large data in depth using AI, machine learning
and big data analysis algorithms, and to enrich the
Copy Right © INDIACom-2018; ISSN 0973-7529; ISBN 978-93-80544-28-1 1416
product planning process, in addition to providing clear direction about the nature of customers and their consumption
needs. AI can indicate a good time for changing cars, recognize that a lifestyle has evolved, and offer drivers a new car
already customized for their needs. Big data analysis and AI algorithms can help manufacturers to forecast customer
behavior. Collecting data from networked and connected cars, together with other subsystems in the vehicle manufacturing
chain, including several internal production systems, and analyzing this data with machine learning and data mining
algorithms, can speed up the business processes and decision-making of car manufacturers. These technologies can help
vehicle manufacturers to be more customer centric, allowing them to choose the right solution and marketing strategy for
different demographics and customer profiles, provide better sales and after-sales services by reaching out easily to end
users, and improve product quality with continuous customer feedback.
In this paper, we present details of our research work towards the development of an innovative AI and machine learning
technology platform for assessing car/automobile quality based on historical information, and solutions based on
forecasting user preferences and interests. The technology platform provides value added services based on the
requirements of the end users, by automatically understanding their needs, interests, and requirements, and offers
customized solutions within the vehicle as well as outside the vehicle in monitoring and maintenance of the vehicle. This
creates a unique and innovative solution of soft sensors that is seamless throughout the production chain, right from the
manufacturer, to the network of dealers and resellers, up to the end users and customers.
We use a novel formulation of the research problem, by casting it as a virtual soft sensor network problem, and achieve
dynamic node allocation using a machine learning based strategy, which optimizes the network nodes participating in the
human machine communication subsystem. The optimal machine learning solution is obtained by selecting the most
significant set of sensors, instead of using a large number of data collecting sensor nodes that collect and store data
continuously from the time the vehicle is purchased to the time it is discarded or changes ownership [2, 3]. This is
analogous to sensor networks, which have been utilized in numerous applications, such as wildlife monitoring [4],
military target tracking and surveillance [5], hazardous environment exploration [6], and natural disaster relief [7]. As
many of the soft sensors collect data continuously and run unsupervised for long periods of time, spanning months and
years, they could put immense pressure on energy resources. Therefore, we need to design suitable data collection schemes
that cap the quantity of transmitted data in the virtual soft sensor network.
In this article, we propose an effective machine learning based mechanism for a virtual soft sensor network (SSN) for the
purpose of monitoring the automobile supply chain system, which we claim would be energy efficient. The proposed approach
models the virtual SSN along the lines of wireless sensor networks (WSN) in data transmission networks, and uses an
adaptive routing scheme for energy efficiency. The adaptive routing scheme eliminates redundant nodes in the SSN by
selecting the most significant soft sensors for accurate modeling of the data centric virtual SSN environment. The
information source is the historical data collected on vehicle/automobile characteristics, usage information and quality
assessment, which is used to predict quality of service parameters, including vehicle quality, safety and
associated risks.
The experimental evaluation of the proposed virtual energy efficient SSN scheme was done with two publicly available car
and automobile datasets, and validates that the proposed scheme provides a good solution for the complex communication
and interaction aspects between internal and external vehicle systems. The proposed scheme allows visualizing the virtual
SSN in a way similar to the traditional wireless sensor network (WSN) paradigm, but includes a machine learning
formulation that can provide better decision support by leveraging the benefits of data centric modelling techniques. We
handle the complexity of the virtual SSN with a data mining mechanism where each virtual sensor is treated as an
attribute of the data set, and all the virtual sensor nodes together constitute the SSN setup, equivalent to the multiple
features or attributes of the data set. With this, it is possible to dynamically adapt the sensor nodes participating in
the decision making loop, by using powerful feature selection, dimensionality reduction and learning classifier
algorithms from the machine learning/data mining field. In this way, we end up with an energy efficient
monitoring/tracking system for assessing several endogenous (internal) and exogenous (external) aspects of the vehicle,
such as vehicle quality, safety, and risks, based on the historical data [7]. This amounts to sourcing effective and
efficient selection and classification algorithms. For example, we can acquire an energy efficient solution even with
missing and poor quality noisy data, and sparse and insufficient information. The accuracy of data mining schemes depends
on the quantity of historical data used for forecasting the future state of the environment, so the virtual SSN may
learn adaptively as the quantity of data increases. This triggers a trade-off between energy efficiency and prediction
accuracy. We demonstrate that this trade-off can be managed, through an experimental validation of our proposed scheme
with two publicly available datasets, the car evaluation data set [9] and the automobile data set [10]. In the next
section, we discuss the concept of the virtual soft sensor network and describe the two datasets used. The classification
and regression algorithms developed for experimental validation are described in Section IV. In Section V, we present the
experimental results obtained, and finally in Section VI we conclude our discussion.
III. VIRTUAL SOFT SENSOR NETWORKS IN AUTOMOBILES
Depending on the model of the car or automobile, there are
more than 100 sensors deployed in a modern vehicle these
days, to measure wear and tear of the brakes, tire pressure,
temperature, and whether a person is too close to the car. The focus
of the majority of these sensors is to monitor the state of the car
and its safety. Recently, soft sensors have opened new avenues
to monitor and strengthen the safety and comfort of the riders.
For example, soft sensors embedded in a car seat can be
utilized to determine how comfortably the riders sit in the
vehicle, clearly exhibiting the weight distribution and posture
of the driver or the riders. Moreover, the seats can be
automatically adjusted to the personal liking of the riders,
to ensure their comfort continuously throughout the journey.
Sensors for safety features such as the airbag can be
dynamically geared toward the individual sitting in the seat,
whether an adult or a child, enabling the car to deploy
the airbag with appropriate pressure and height in the event of
an accident. The two publicly available datasets used in this
study use similar information for monitoring the quality and
safety aspects of a particular type of vehicle, based on the data
logged from several vehicles, and can provide decision support
for a future enthusiast or car purchaser in making appropriate
decisions. The details of the two datasets are described next.
A. Car Evaluation Dataset
This publicly available multivariate data set [9] consists of
information about car evaluation, using a single performance
measure, called the car acceptability metric, derived from
several attributes, including overall price, buying price,
maintenance price, technical characteristics and the comfort level
offered, which is represented as the number of doors, the
number of people it can carry, the boot size, and the estimated
car safety. The statistical summary for the data set is given in
Table 1 below. The prediction variable here is the car
acceptability metric, posed as a 4-class classification problem:
car quality is evaluated as unacceptable, acceptable, good or
very good, using the attributes (aka soft sensor values),
including buying and maintenance price values, technical and
comfort level measurements and the safety assessment metric.
Since the soft sensor attributes are not physically located or
connected in the hard sense, as they would be in a physical
wireless sensor network with hard sensors, what we have here is
a virtual sensor network consisting of soft sensor nodes. We
model this network with a machine learning formulation, and
obtain the prediction of car quality based on the data collected
from several such vehicles and different attributes. Table 1 shows
the structure of the car evaluation dataset (Dataset 1), and the class
distribution (instances per class) is shown in Table 2.
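As a rough illustration of treating each attribute as a soft sensor, the sketch below maps one car record to a vector of readings. The attribute domains follow Table 1; the ordinal (lowest to highest) encoding and the names `DOMAINS` and `encode_reading` are assumptions made for this example, not part of the original study.

```python
# Attribute domains as listed in Table 1, ordered from lowest to highest level.
# The ordinal encoding is an assumption made for illustration.
DOMAINS = {
    "buying": ("low", "med", "high", "vhigh"),
    "maint": ("low", "med", "high", "vhigh"),
    "doors": ("2", "3", "4", "5more"),
    "persons": ("2", "4", "more"),
    "lug_boot": ("small", "med", "big"),
    "safety": ("low", "med", "high"),
}
CLASSES = ("unacc", "acc", "good", "vgood")


def encode_reading(record):
    """Map one car record to a vector of ordinal soft sensor readings."""
    vector = []
    for sensor, levels in DOMAINS.items():
        value = record[sensor]
        if value not in levels:
            raise ValueError(f"{sensor}: unexpected value {value!r}")
        vector.append(levels.index(value))
    return vector


example = {"buying": "low", "maint": "med", "doors": "4",
           "persons": "more", "lug_boot": "big", "safety": "high"}
print(encode_reading(example))  # [0, 1, 2, 2, 2, 2]
```

Each position in the resulting vector corresponds to one virtual sensor node, so dropping a node from the network amounts to dropping a column.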
TABLE I. CAR EVALUATION DATA SET
Car evaluation data set description
Buying   Maint   Doors   Persons   Lug_boot   Safety   Car Quality/Price
vhigh    vhigh   2       2         small      low      unacc
high     high    3       4         med        med      acc
med      med     4       more      big        high     good
low      low     5more                                 vgood
TABLE II. CLASS DISTRIBUTION (NUMBER OF INSTANCES PER CLASS)
Class    N      N [%]
unacc    1210   70.023 %
acc      384    22.222 %
good     69     3.993 %
vgood    65     3.762 %
This dataset is highly imbalanced, with far more instances of one
class (unacc) than of the other classes.
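The degree of imbalance can be checked directly from the counts in Table 2. The short sketch below recomputes the class shares and the accuracy of a majority-class predictor (the ZeroR baseline in WEKA); the imbalance ratio is an extra summary added here for illustration and is not reported in the paper.

```python
# Instances per class, copied from Table 2.
counts = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}

total = sum(counts.values())
shares = {label: 100.0 * n / total for label, n in counts.items()}

# A majority-class predictor (ZeroR in WEKA) always answers "unacc".
majority = max(counts, key=counts.get)
baseline_accuracy = shares[majority]

# Imbalance ratio: largest class size divided by smallest class size.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)                        # 1728
print(round(baseline_accuracy, 2))  # 70.02
print(round(imbalance_ratio, 1))    # 18.6
```

Any classifier therefore has to be judged against a floor of about 70% accuracy, which a predictor that ignores all soft sensor readings already reaches.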
B. Automobile data set
This is the second publicly available data set used in the study,
and consists of three types of entities, namely (a) the
specification of the auto in terms of various characteristics, (b)
its assigned insurance risk rating, and (c) its normalized losses in
use as compared to other cars. The second entity, the risk rating,
corresponds to the degree to which the auto is more risky than its
price indicates. Cars are initially assigned a risk factor symbol
associated with their price. Then, if a car is more risky (or less), this
symbol is adjusted by moving it up (or down) the scale. This
"symboling" process assigns a value of +3 to indicate that the
auto is risky, and -3 that it is probably pretty safe.
TABLE III. AUTOMOBILE DATA SET
Attribute number   Type                Attribute number   Type
1                  Symboling           14                 Curb weight
2                  Normalized losses   15                 Engine type
3                  Make                16                 Number of cylinders
4                  Fuel type           17                 Engine size
5                  Aspiration          18                 Fuel system
6                  Number of doors     19                 Bore
7                  Body style          20                 Stroke
8                  Drive wheels        21                 Compression-ratio
9                  Engine location     22                 Horsepower
10                 Wheel base          23                 Peak rpm
11                 Length              24                 City mpg
12                 Width               25                 Highway mpg
13                 Height              26                 Price
The third factor is the relative average loss payment per
insured vehicle year. This value is normalized for all autos
within a particular size classification (two-door small, station
wagons, sports/specialty, etc.), and represents the average
loss per car per year. This is a sparse regression dataset,
with a large set of attributes (26 attributes) and few
instances available for the machine to learn from (just around 205
instances). Further, data is missing for several of
the instances. Table 3 shows the structure of the automobile
data set. The price of the vehicle is the output regression
variable, to be predicted from 24
different vehicle attributes.
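One simple way to cope with missing readings before learning is sketched below: fill each gap with the column mean and then standardize the column. This is only an illustration; the paper does not state how missing values were filled, so mean imputation, the `?` marker, and the function name `impute_and_standardize` are assumptions.

```python
from statistics import mean, pstdev

MISSING = "?"  # assumed marker for a missing entry


def impute_and_standardize(column):
    """Fill missing entries with the column mean, then z-score the column."""
    observed = [float(v) for v in column if v != MISSING]
    mu = mean(observed)
    filled = [mu if v == MISSING else float(v) for v in column]
    sigma = pstdev(filled)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]


# Toy horsepower-like column with one missing reading (values are illustrative).
readings = ["100", "?", "140", "120"]
print(impute_and_standardize(readings))
```

Mean imputation leaves the column mean unchanged, so the imputed entry maps to exactly zero after standardization.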
The two datasets differ in their level of complexity
(classification vs. regression), in the type and number of
attributes, and in the size of data available. With the
proposed virtual soft sensor network characterization of the
problem, and the use of a machine learning approach to learn from the
historical information, it is possible to develop a data driven
decision support model, in spite of the complex and poor
quality information, including class imbalance and sparsity.
The combined machine learning based virtual SSN strategy
allows leveraging the benefits of both technologies to predict
the output (here, the car quality, price, safety and associated
risk), even with incomplete, sparse and imbalanced
information available. The next section discusses the algorithms
used for the proposed study.
IV. ALGORITHMS USED FOR THE PROPOSED STUDY
We examined two different sets of learning algorithms for
Dataset 1 (Car Evaluation dataset) and Dataset 2 (Automobile
dataset). For Dataset 1, six different classification
algorithms were examined in this work: Naive Bayes, lazy
learning (kNN), the Logistic classifier, Bagging with Random
Forest as the base learner, the J48 (decision tree) classifier,
and CV Parameter Selection (cross validation parameter
selection) with the random forest classifier as the base
learner. We used stratified cross validation with
different numbers of folds for examining the different classifier algorithms.
For Dataset 2, being a regression task,
five different regression learning algorithms were examined,
including linear regression, the Random Forest learner, CV
Parameter Selection, the multilayer perceptron and support
vector regression with a regularized optimizer and polynomial
kernel. As this dataset has several missing values, in addition
to a large number of attributes (26 attributes) and a small data size (205
instances), we used preprocessing algorithms including
standardization and resampling, in addition to feature
selection algorithms to reduce the dimensionality, namely
correlation based feature selection with two different search
strategies: best first search, and greedy forward and
backward search. Further details of each of these
algorithms are available in [3] and [9]. The next section discusses
the experimental results achieved for each set of experiments.
V. EXPERIMENTAL RESULTS
Different sets of experiments were performed to examine the
relative performance of the classification and regression learning
algorithms of the proposed vSSN learning framework. We
used the k-fold stratified cross validation technique with different
numbers of folds for performing experiments, with k=10 and k=5. For
regression learning, since the data size was too small, we used the
full training dataset for examining the baseline benchmark
performance measures. For Dataset 1, as the attributes
available were few, we did not use a feature selection stage for
extracting the most significant features. As can be seen in Table 4
and Figure 1 below, for Dataset 1, it was possible to achieve
96% car quality prediction accuracy (as unacceptable,
acceptable, good and very good) based on 6 attributes (or
soft sensor information).
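The idea behind stratified k-fold cross validation can be sketched in a few lines: instances are dealt into folds class by class, so every fold keeps roughly the class proportions of Table 2. This is a simplified illustration (no shuffling, hypothetical function names) and not the WEKA procedure used in the experiments.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each instance index to one of k folds, class by class, round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = stratified_folds(labels, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Class sizes follow Table 2; the labels stand in for the real instances.
labels = ["unacc"] * 1210 + ["acc"] * 384 + ["good"] * 69 + ["vgood"] * 65
for train, test in train_test_splits(labels, k=5):
    share = sum(labels[i] == "unacc" for i in test) / len(test)
    print(len(test), round(100 * share, 1))
```

With stratification, the share of the majority class in each test fold stays close to 70%, so a per-fold accuracy is comparable to the ZeroR baseline in Table 4.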
TABLE IV. CAR EVALUATION PREDICTION ACCURACY (UNACC, ACC, VGOOD AND GOOD)
Classifier Algorithm              10 Fold CV   5 Fold CV
ZeroR                             70.02%       70.02%
Naïve Bayes                       77.31%       76.85%
Lazy learning (kNN)               80.38%       80.72%
Logistic Classifier               82.35%       82.46%
Bagging (Random Forest learner)   96.4%        96.6%
J48 (Decision Trees)              96.35%       95.83%
CVParameter Selection             96.64%       95.5%
TABLE V. CAR EVALUATION PERFORMANCE MATRIX
Further, the data size for building the models and the class
imbalance did not impact the performance, as the prediction
performance shown in Table 4 for k = 10 and k = 5 folds is
nearly identical. The three best performing classifiers are J48
(Decision Trees), the Bagging classifier with the Random Forest
algorithm as the base learner, and the CV Parameter Selection
classifier, which have a prediction accuracy higher than 95%,
in addition to better performance in terms of other metrics such
as the confusion matrix, true positive and false positive rates,
precision and recall. Table 5 shows these metrics for one of the
best performing classifiers.
Fig. 1. Results from car evaluation dataset
For Dataset 2, being a regression learning task, we used %
RMSE as the evaluation metric, and we derive the prediction
accuracy as (100 - %RMSE). Since the size of the data
available was too small, we used 3 fold CV for building the
model, instead of 5 and 10 folds, and also show the performance
achieved when the full training set was used for building the
model, just as a baseline benchmark performance measure.
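The metric can be sketched as follows. The paper does not state how RMSE is converted to a percentage, so normalizing by the range of the actual values is an assumption made here, and the prices are illustrative rather than taken from the Automobile dataset.

```python
from math import sqrt


def percent_rmse(actual, predicted):
    """RMSE expressed as a percentage of the range of the actual values.

    Normalizing by the range is an assumption made for this sketch.
    """
    n = len(actual)
    rmse = sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return 100.0 * rmse / (max(actual) - min(actual))


def derived_accuracy(actual, predicted):
    """Accuracy-like score used for cross-dataset comparison: 100 - %RMSE."""
    return 100.0 - percent_rmse(actual, predicted)


# Illustrative prices only.
actual = [10000.0, 20000.0, 30000.0]
predicted = [11000.0, 19000.0, 30000.0]
print(round(percent_rmse(actual, predicted), 2))      # 4.08
print(round(derived_accuracy(actual, predicted), 2))  # 95.92
```

Whatever normalization is chosen, it has to be the same for every algorithm, otherwise the derived accuracies are not comparable.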
TABLE VI. % RMSE FOR PRICE PREDICTION FOR DATASET 2
Algorithm           Full training     3 fold CV           3 fold CV with BestFirst     3 fold CV with GreedySearch
                    (all features)    (all 26 features)   FeatureSelect (8 features)   FeatureSelect (5 features)
Linear Regression   3.69              7.19                7.19                         7.37
Random Forest       0.75              2.03                2.38                         2.42
Bagging             0.58              2.23                2.74                         2.88
CVParam Selection   0.72              1.97                2.40                         2.43
MLP                 2.13              3.19                5.44                         6.03
SMOReg (Poly)       4.37              5.20                7.93                         8.24
Table 6 shows the % RMSE (Root Mean
Square Error) achieved on the second dataset, the
Automobile data set, with %RMSE between 0.58 and 8.24.
The results are reported as %RMSE rather than % accuracy
because the Automobile data set is a regression task, not
a classification task. However, by subtracting the % RMSE
from 100 (100 - %RMSE) we can compare the performance
achieved for each algorithm across the different datasets.
Fig. 2. Automobile data set experiments
As can be seen in Table 6, the performance achieved with the two
different soft sensor (feature) selection algorithms (BestFirst
search and Greedy search) is comparable to the 3 fold CV results
with all features. However, instead of all 26 virtual sensors
(attributes), only 8 virtual sensors are needed with BestFirst
feature selection, and only 5 with Greedy search, to achieve
similar performance. So, we could achieve an energy
efficiency gain by a factor of 26/8 = 3.25 and 26/5 = 5.2 for the two
different automatic sensor selection strategies used.
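The arithmetic behind these factors, together with the accuracy cost, can be laid out as a small worked example. It assumes that energy use scales with the number of active virtual sensors, which is implicit in the factors above; the %RMSE values are the Random Forest row of Table 6, and the function name is hypothetical.

```python
def reduction_factor(total_sensors, selected_sensors):
    """Ratio of all virtual sensors to the sensors kept after selection."""
    return total_sensors / selected_sensors


TOTAL = 26  # attributes (virtual sensors) in the Automobile data set

# (sensors kept, Random Forest %RMSE with 3 fold CV), taken from Table 6.
strategies = {"BestFirst": (8, 2.38), "GreedySearch": (5, 2.42)}
baseline_rmse = 2.03  # Random Forest, 3 fold CV with all 26 features

for name, (kept, rmse) in strategies.items():
    factor = reduction_factor(TOTAL, kept)
    cost = round(rmse - baseline_rmse, 2)
    print(name, factor, cost)
# BestFirst 3.25 0.35
# GreedySearch 5.2 0.39
```

For this learner, using about a fifth of the sensors raises the %RMSE by less than half a percentage point, which is the trade-off between energy efficiency and prediction accuracy discussed earlier.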
VI. CONCLUSION
In this paper we proposed a novel virtual sensor network
framework, based on a data driven formulation, for assessing
vehicle price and quality. The experimental validation of the
proposed framework, based on two publicly available datasets
and different classification and regression learning algorithms,
showed promising results, and provides several opportunities
for better connection and communication in the automobile supply
chain ecosystem, with a data driven strategy for providing
value added services.
REFERENCES
[1] Automotive. Decisions fueled by insight, [Online], Last accessed on 1/11/2017 from https://www.ihs.com/industry/automotive.html
[2] Ping, S., Delay measurement time synchronization for wireless sensor networks. Intel Research Berkeley Lab, 2003.
[3] Hall, M., et al., The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 2009. 11(1): p. 10-18.
[4] Csirik, J., P. Bertholet, and H. Bunke. Pattern recognition in wireless sensor networks in presence of sensor failures. 2011.
[5] Nakamura, E.F. and A.A.F. Loureiro, Information fusion in wireless sensor networks, in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008, ACM: Vancouver, Canada. p. 1365-1372.
[6] Bashyal, S. and G.K. Venayagamoorthy. Collaborative routing algorithm for wireless sensor network longevity. 2007. IEEE.
[7] Richter, R., Distributed Pattern Recognition in Wireless Sensor Networks, 2008, [Online], Last accessed on November 1, 2017 from https://www.semanticscholar.org/paper/Distributed-Pattern-Recognition-in-Wireless-Sensor-Richter/d889fc994f21c0dad4eba693556de67ab1bf0e2b?tab=references
[8] Alwadi, M. and G. Chetty, Energy Efficient Data Mining Scheme for High Dimensional Data, Biodiversity Environment. Procedia Computer Science, Volume 46, 2015, Pages 483-490.
[9] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
[10] B. Zupan, M. Bohanec, I. Bratko, J. Demsar: Machine learning by function decomposition. ICML-97, Nashville, TN. 1997.
[11] Kibler, D., Aha, D.W., & Albert, M. (1989). Instance-based prediction of real-valued attributes. Computational Intelligence, Vol 5, 51-57.