filtering erroneous positioning data with moving window...

6
Filtering Erroneous Positioning Data with Moving Window Approach HA YOON SONG Department of Computer Engineering Hongik University 72-1, Sangsu, Mapo, Seoul KOREA [email protected] HAN GYOO KIM Department of Computer Engineering Hongik University 72-1, Sangsu, Mapo, Seoul KOREA [email protected] Abstract: Positioning data systems such as GPS, GLONASS, and Galileo are famous ones, in addition cellu- lar network positioning with base station locations or crowd source WIFI positioning approaches are available. Therefore, users with smart phones or other mobile devices can obtain current position regardless of the position- ing system. However, positioning data sets usually have erroneous data for various reasons. Authors experienced quite a large number of erroneous positioning data using Apple iPhone or Samsung Galaxy devices, and thus need to filter evident errors. In this paper, we will suggest relatively simple, but efficient filtering method based on statistical approach. From the user’s mobile positioning data in a form of < latitude, longitude, time > obtained by mobile devices, we can calculate user’s speed and acceleration. From the idea of sliding window or moving window, we can calculate statistics of speed and acceleration of user position data and thus filtering can be made with controllable parameters. We expect that the simplicity of our algorithm can be applied on portable mobile device with low computation power. Key–Words: Human Mobility, Error Filtering, Positioning Data, Moving Window, Sliding Window 1 Introduction The recent advances of mobile devices enable vari- ous location based services over human mobility, es- pecially the introduction of smart phone with GPS or other positioning equipment. However these posi- tioning data sometimes have position errors accord- ing to the operational environment. In such cases, many of applications require filtering of such erro- neous positioning data. As we experienced by our ex- periments, more than 12% of positioning data were erroneous by use of smart phones. This basic experi- ment was done by use of smart phone app over Sam- sung Galaxy Tab which internally uses the position of cellular base station, over portable GPS device [1], and Apple iPhone 3GS with iOS5 which uses com- bination of crowd sourced WIFI positioning, cellu- lar networks, and GPS [2]. More precise result can be found in Kim and Song [3] along with researches on human mobility model. Another research field of complex system physics showed that up to 93% of hu- man mobility can be predicted since peoples avoid the random selection of next destination instead selects their place frequented and their route frequented [4]. The sets of positioning data will be a basis for hu- man mobility model construction as shown in [3]. In this paper, we will propose a filtering technique which filters erroneous positioning data with the use of mov- ing window approach. Section 2 shows our idea using moving window with pre-experiments for algorithm set-up. Section 3 shows filtering algorithm and its de- tailed description. Section 4 shows our consideration on user controllable parameters for experiment design and shows our experimental results. We will conclude and discuss about our future research in section 5. 2 Backgrounds 2.1 Idea on Moving Window Collected user position in a form of < latitude, longitude, time > composes a set of user mobile trace and adding an identification param- eter to the tuple will represent user’s mobility data set. We call one tuple at time t as P t , and latitude of P t as lat t , longitude of P t as lon t . From two consecutive position data tuples, we can calculate the distance D i moved at time P i with < lat i , lon i > and < lat i-1 , lon i-1 > according Vincenty’s formula [5]. Recent Researches in Communications, Signals and Information Technology ISBN: 978-1-61804-081-7 47

Upload: others

Post on 26-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Filtering Erroneous Positioning Data with Moving Window ...wseas.us/e-library/conferences/2012/SaintMalo/SITE/SITE-06.pdf · iPhone 3GS with positioning data collection app. We will

Filtering Erroneous Positioning Datawith Moving Window Approach

HA YOON SONGDepartment of Computer Engineering

Hongik University72-1, Sangsu, Mapo, Seoul

[email protected]

HAN GYOO KIMDepartment of Computer Engineering

Hongik University72-1, Sangsu, Mapo, Seoul

[email protected]

Abstract: Positioning data systems such as GPS, GLONASS, and Galileo are famous ones, in addition cellu-lar network positioning with base station locations or crowd source WIFI positioning approaches are available.Therefore, users with smart phones or other mobile devices can obtain current position regardless of the position-ing system. However, positioning data sets usually have erroneous data for various reasons. Authors experiencedquite a large number of erroneous positioning data using Apple iPhone or Samsung Galaxy devices, and thus needto filter evident errors. In this paper, we will suggest relatively simple, but efficient filtering method based onstatistical approach. From the user’s mobile positioning data in a form of < latitude, longitude, time > obtainedby mobile devices, we can calculate user’s speed and acceleration. From the idea of sliding window or movingwindow, we can calculate statistics of speed and acceleration of user position data and thus filtering can be madewith controllable parameters. We expect that the simplicity of our algorithm can be applied on portable mobiledevice with low computation power.

Key–Words: Human Mobility, Error Filtering, Positioning Data, Moving Window, Sliding Window

1 IntroductionThe recent advances of mobile devices enable vari-ous location based services over human mobility, es-pecially the introduction of smart phone with GPSor other positioning equipment. However these posi-tioning data sometimes have position errors accord-ing to the operational environment. In such cases,many of applications require filtering of such erro-neous positioning data. As we experienced by our ex-periments, more than 12% of positioning data wereerroneous by use of smart phones. This basic experi-ment was done by use of smart phone app over Sam-sung Galaxy Tab which internally uses the positionof cellular base station, over portable GPS device [1],and Apple iPhone 3GS with iOS5 which uses com-bination of crowd sourced WIFI positioning, cellu-lar networks, and GPS [2]. More precise result canbe found in Kim and Song [3] along with researcheson human mobility model. Another research field ofcomplex system physics showed that up to 93% of hu-man mobility can be predicted since peoples avoid therandom selection of next destination instead selectstheir place frequented and their route frequented [4].The sets of positioning data will be a basis for hu-

man mobility model construction as shown in [3]. Inthis paper, we will propose a filtering technique whichfilters erroneous positioning data with the use of mov-ing window approach. Section 2 shows our idea usingmoving window with pre-experiments for algorithmset-up. Section 3 shows filtering algorithm and its de-tailed description. Section 4 shows our considerationon user controllable parameters for experiment designand shows our experimental results. We will concludeand discuss about our future research in section 5.

2 Backgrounds

2.1 Idea on Moving WindowCollected user position in a form of <latitude, longitude, time > composes a set ofuser mobile trace and adding an identification param-eter to the tuple will represent user’s mobility dataset. We call one tuple at time t as Pt, and latitudeof Pt as latt, longitude of Pt as lont. From twoconsecutive position data tuples, we can calculate thedistance Di moved at time Pi with < lati, loni > and< lati−1, loni−1 > according Vincenty’s formula [5].

Recent Researches in Communications, Signals and Information Technology

ISBN: 978-1-61804-081-7 47

Page 2: Filtering Erroneous Positioning Data with Moving Window ...wseas.us/e-library/conferences/2012/SaintMalo/SITE/SITE-06.pdf · iPhone 3GS with positioning data collection app. We will

Of course, from the two consecutive distance, we cancalculate speed at time of Pi, Vi, and acceleration attime of Pi, ai. Therefore a tuple Pt has a core formof < t, latt, lont, Dt, Vt, at > with possible auxiliaryattributes.

Based on the speed values on actual positiondata set, we find a glitch on a series of speed suchas 600m/sec, which are meaningless for usual life-time environment. Thus we investigated maximumpossible speed values of usual human mobility asshown in table 1. We define max speed MAXspeed as250 m/s. In addition, those maximum values cannotbe reached instantly, i.e. the acceleration cannotbe made abruptly. In addition to the MAXspeed,MAXacceleration can be defined. For the next stepwe introduced the idea of moving average andmoving standard deviation of speed. We definethe moving average of speed at current time t,MAspeed(n) where n stands for the number of pastdata of {Px : t − n + 1 ≤ x ≤ t}. Similarly, wecan define the moving standard deviation at time t,MSDspeed(n) where n stands for the number ofpast data. Here, n is referred as window size bymost of researchers. Once we obtain a new tuplePt, then we can determine whether Vt is in the usualrange of human mobility, and for the at, the samecan be applicable. Once Vt is out of range for thenormal distribution with average of MAspeed(n) andstandard deviation of MSDspeed(n) we can discardPt and this tuple will be filtered out from the series ofhuman trace. The condition of filtering out Pt is:

Vt > MAspeed(n) + s×MSDspeed(n) (1)

where s stands for the sensitivity level of filteringand user controllable parameters. Otherwise, we caninclude the tuple Pt to the series as a valid position-ing data, and thus can recalculate MAspeed(n) andMSDspeed(n). Note that this calculation can be madein real-time, and we intentionally introduced this ap-proach because it requires relatively simple calcula-tion. In other words, this algorithm can be executedon a device with low computing power such as smartphones or similar mobile devices. Our previous re-search includes similar work with more complicatedstatistical theory [6] however it is not likely to be in-troduced for real time environment.

2.2 Pre-experimentsWe will conduct experiments to see the effect of win-dow size. For this experiment, we used a set of po-sitioning data over the area of Seoul, Korea collected

Table 1: Maximum Speed of Transportation Methods

Method Speed (m/s)

Ambulation 3.00Bicycle 33.33

Automobile 92.78Sports-Car 244.44

High-Speed Train 159.67Air-plain 528.00

Table 2: Typical Error Examples (unit: meter)Method Average Std. Dev. Max. Err. Ratio

GPS-Out 4.449 7.169 51.77 12.30%GPS-In N/A N/A 10769 N/A

3GBS-Out 52.661 23.595 206.35 36.75%3GBS-In 52.553 32.685 156.75 48.66%

Figure 1: A Trail of Positioning Data Set

Figure 2: Effects of Different Window Size

Recent Researches in Communications, Signals and Information Technology

ISBN: 978-1-61804-081-7 48

Page 3: Filtering Erroneous Positioning Data with Moving Window ...wseas.us/e-library/conferences/2012/SaintMalo/SITE/SITE-06.pdf · iPhone 3GS with positioning data collection app. We will

during more than 20 days. Note that this data is vol-untarily collected by the author of this paper usingiPhone 3GS with positioning data collection app. Wewill call it iPhone data set. The app records positiondata whenever it senses the location change of iPhoneor every user-defined interval (3 to 60 seconds) if theiPhone is in immobile state. The visualization of rawdata sets is shown in figure 1 and contains erroneousdata. One of the notable phenomenon is that iOS5sometimes give simultaneous report of position datafrom three different schemes. Our guess is that cel-lular base station locations, crowd source WIFI po-sitioning, and GPS sometimes reports different posi-tioning data at the same time. In case we meet multi-ple position values at the same time, the position datawith smallest speed value was the correct one empir-ically. Therefore the first stage of filtering is triviallyto choose the position data with the smallest distanceto previous position among multiple position data ofthe same time. In addition, data sets are composedof several set of discontinued data, for example, po-sition data collection was unable on the subway trainor there was no need to collect data in the home bed.While traveling by subway a train, only at the stationswas it possible to collect positioning data.

In order to determine n, we must be more consid-erate. With varying size of n with the algorithm overdata set, we made several set of experiments. We canexpect that large windows size cannot react to the situ-ation of rapid speed change while it is useful in stable,immobile states, successfully coping with continuouserrors. On the contrary small n will show rapid reac-tion to abrupt speed change for mobile states whichmaybe is considered to include continuous error tu-ples. For example, once we met m continuous errorswe cannot filter out such errors if m > n. In orderto figure out our conjecture, we make experiments onwindow size upon the iPhone data set. We calculatedmoving average and moving standard deviation overvarious window size such as 5, 10, 25, 50, and 100.Figure 2 shows the result of our simple experiment. Itis clear that larger window size is dull. Once we metvery large speed value, large window size tails the ef-fect of large error speed thus leads to under-filteringof erroneous data. In case of window size with 100or 50, we can clearly see the tailing effect on figure 2.On the contrary, small window size is sensitive andrapidly reacts on speed change while it over-filterscorrect data especially at the starting phase of speedchange. We experienced two or more plausible tupleswere discarded by the small window size while theylook like correct speed data with window size 5 or 10.This phenomena implies that we must introduce the

throttling mechanism to moving window in order toavoid tailing effect of window size.

3 Filtering AlgorithmAccording to the considerations in section 2, we buildan algorithm for erroneous positioning data filteringas shown in algorithm 1. With the new acquisitionof new position data Pt, this algorithm can determinewhether Pt be filtered out or not. In practice, we mustconsider several situation:◦ Initial construction of Moving Window: in case

there are less than n tuples, we cannot constructcomplete moving average and moving standard de-viation. Instead, we have to incomplete windowwith less number of data: lines 2 to 5 in algorithm 1.

◦ Both acceleration and speed are considered as throt-tling parameters even though they have difference infiltering details. Positioning data tuple with out ofrange speed or with unreasonable acceleration willbe filtered out: lines 6 to 8 in algorithm 1.

◦ Once the speed of a tuple is too big, which can affectthe MA and MSD values and leads to filtering er-ror as expansion of confidence interval which leadsto erroneous filter-in of to-be-filtered tuple, we willcalibrate the speed value with MAspeed(n)+2.57×MSDspeed in order to include possibly rapid changeof speed and to avoid erroneous expansion of confi-dence interval: lines 9 to 11 in algorithm 1. Thevalue s = 2.57 stands for 99.5% confidence intervalof normal distribution. This is throttling which re-duces the effect of erroneous speed to moving win-dow: s99.5 in lines 9 and 10 of algorithm 1.

◦ Window Construction: Even though a tuple be fil-tered out, we will include the speed of the tuple un-less it is out of 99.5% confidence interval of normaldistribution. It is intended to update moving windowin order to cope with rapid change of speed, i.e. atuple can be filtered out with rapid change of speedwhile it is a genuine one. In such cases, even thoughthe tuple was filtered, moving window can reflectthe change of speed for the next incoming tuples:line 10 in algorithm 1.

◦ Small speed less than 2.77 m/s (10Km/h) will notbe filtered since it is always possible for a humanambulation within this small speed, and it is also inGPS error range: MINvelocity line 9 in algorithm 1.

◦ Tuples with unreal acceleration must be filtered. Asreported in [7], 10.8m/s is the current biggest value

Recent Researches in Communications, Signals and Information Technology

ISBN: 978-1-61804-081-7 49

Page 4: Filtering Erroneous Positioning Data with Moving Window ...wseas.us/e-library/conferences/2012/SaintMalo/SITE/SITE-06.pdf · iPhone 3GS with positioning data collection app. We will

for sport-cars: MAXacceleration as line 12 in algo-rithm 1.

◦ Once a tuple be filtered out due to excessive ac-celeration, the tuple must be filtered, the accel-eration value of the tuple is forced to set asMAXacceleration, and the speed value is forced toset as MAspeed(n) to nullify the effect of unrealspeed value: lines 12 to 16 in algorithm 1.

◦ Tuples with positive acceleration values more thanMAXacceleration will be regarded as errors whilepositioning data tuples with negative accelerationvalues will not be regarded as errors since it is al-ways possible for a vehicle to stop emergently withlarge negative acceleration.

Algorithm 1 Moving Window Filtering at real-time t

Require: P0 . At least one initial tuple is requiredRequire: window size nRequire: user sensitivity level sEnsure: Check validness of new position tupleEnsure: Calibrated series of tuple {Pi : t ≥ i > 0}

for t inputsRequire: i=0

1: repeat Get Pi+1 . Acquisition of new tuple, ifexist

2: Construct MAspeed(n) with {Px : max(i −n+ 1, 0) ≤ x ≤ i}

3: Construct MSDspeed(n) with {Px : max(i−n+ 1, 0) ≤ x ≤ i}

4: Set MAspeed = MAspeed(n)5: Set MSDspeed = MSDspeed(n)6: if (Vi+1 > MAspeed + s ×MSDspeed) OR

(ai+1 ≥MAXacceleration) then7: Mark Pi+1 as filtered.8: end if9: if (Vi+1 ≥ MAspeed + s99.5 × MSDspeed)

AND (Vi+1 > MINvelocity) then10: Set Vi+1 = MAspeed+s99.5×MSDspeed

11: end if12: if ai+1 ≥MAXacceleration then13: Mark Pi+1 as filtered14: Set Vi+1 = MAspeed

15: Set ai+1 = MAXacceleration

16: end if17: Set i = i + 118: until Exist no more tuple input

Figure 3: Trail of Filtered Positioning Data with n=10and s=0.34 (63% confidence interval)

4 Experiments4.1 Experiment DesignFor the algorithm from section 3, we left two param-eters unspecified: n as a number of tuples in the win-dow and s as a sensitivity level of filtering. Both ofthe parameters can be specified by user of this algo-rithm. The user sensitivity level s has relatively sim-ple to determine. From the properties of normal dis-tribution, we can obtain s with proper confidence in-terval. Since we use only the positive part of normaldistribution for filtering, we can set s = 1.64 for con-fidence interval of 95% and s = 2.33 for confidencelevel of 99%. Users can determine s as their own pur-pose. For example, we choose the sensitivity level sas follows. Table 2 shows a typical error rate for posi-tion data collecting when the device is immobile. Thepostmark ”-in” stands for the result inside a buildingwhile ”-out” stands for the data was collected outsidearea. For GPS case, 12.3% data was erroneous andfor 3GBS cellular positioning system, 36.75% of datahas errors. So we can choose s = 1.16 for GPS dataor s = 0.34 for cellular positioning data in our ex-periment. We already discussed the effect of windowsize in section 2. We will choose window size as (5,10, 25, 50, 100) according to the pre-experiments andwill show the effect of speed calibration to the variouswindow size.

4.2 Filtering ResultsWe make a combination of window size and sensitiv-ity level. Table 3 shows the percentage of filtered-outtuples in each combination of parameters. Among thevarious window size, we focus on results of n = 10.

Recent Researches in Communications, Signals and Information Technology

ISBN: 978-1-61804-081-7 50

Page 5: Filtering Erroneous Positioning Data with Moving Window ...wseas.us/e-library/conferences/2012/SaintMalo/SITE/SITE-06.pdf · iPhone 3GS with positioning data collection app. We will

Table 3: Ratio (%) of Filtered-out Tupleswindow size s=0.34 s=1.16 s=1.64 s=2.33

5 42.20 29.07 24.06 19.3110 38.67 24.10 18.81 14.3125 37.37 21.43 16.07 11.8550 37.71 20.54 15.23 11.07

100 37.05 20.03 14.90 10.65

We experienced up to four consecutive errors and thusconcluded that n = 5 is deficient window size whilen = 10 will cover consecutive errors and will re-duce the tailing effect of larger window size. Figure 3shows the filtering result of n = 10 and s = 0.34.Figure 4 shows the filtering result of 88% confidenceinterval for n = 10. The collector cannot find errorsin this figure while it contains more data for position-ing than the result shown in figure 3. Therefore, wecan conclude that proper selection of window size andsensitivity level will leads to adequate filtering results.Finally, figure 5 shows the effect of window size un-der filtering process. The x-axis stands for wall clocktime on 11th of November, 2011. Thin black solidline shows the change of real speed in m/s and thingray dashed line shows calibrated speed by our filter-ing algorithm. Calibrated speed usually overlappedwith speed while it shows calibration once a tuple bedecided by filtering algorithm. It is interesting that thecalibrated speed naturally provides the effect of inter-polation. Other lines shows calibrated moving win-dow. Thick black solid line denotes the size of movingwindow with n=10. It reacts rapidly with the changeof speed while successfully filters erroneous tuples. Itis also true for thick gray solid line with n=5, while itstill have tendency of over-filtering. Other windowswith bigger n show the tendency of under-filtering,and they cannot reflect the speed change efficiently.

5 ConclusionIn this research we build algorithm for erroneous posi-tion data filtering and experienced the combinationaleffect of window size and sensitivity level. A real setof positioning data collected by author is used for al-gorithm verification and we found successful filteringresults in general. Several parameters of the algorithmmust be defined by user such as window size, sensitiv-ity level, maximum speed and maximum acceleration.Even it is possible for a user can change constant ofthe algorithm such as MAXacceleration, s99.5 (max-imum sensitivity level) and MINvelocity (minimum

Figure 4: Trail of Filtered Positioning Data with n=10and s=1.16 (88% confidence interval)

threshold of speed for filtering) of the algorithm.While investigating filtering process one by one,

we find several minute frailties of our algorithm. Thefirst one is that out algorithm cannot work at the start-ing phase of data collection because at the stage ofinitial window construction there were not enough tu-ples to fill the whole window. We think it is a com-pulsory demerit for every approach based on movingwindow. The second problem is a tendency of over-filtering and under-filtering. The arrival of new tuplewith rapid increase of speed will be filtered out regard-less of its correctness. This tendency is clear whenwe have a large window size since large window can-not react fast enough to catch the change of speed. Incompensation, we include the velocity of filtered tupleto keep the window width to cope with speed changeunless the change of speed is out of 99% of confidenceinterval of normal distribution. In case the velocity isout of 99% confidence interval, we calibrate the veloc-ity for the future construction of moving window. Aswe noticed in figure 5 the tendency of under-filteringis clear with larger window size. With smaller win-dows we cannot filter if number of consecutive errorsis bigger than window size, as we experienced fourconsecutive errors in our data. Thus we think n=10 isbetter choice than n=5. Another benefit of small win-dow size is a small computation time and small mem-ory capacity for moving window so that this algorithmcan work on mobile devices with low computationalpower in real time.

We must express three sorts of further considera-tions. The first one is windows size as a time durationinstead of number of tuples in a window. This will beeffective once we regularly collect position data and

Recent Researches in Communications, Signals and Information Technology

ISBN: 978-1-61804-081-7 51

Page 6: Filtering Erroneous Positioning Data with Moving Window ...wseas.us/e-library/conferences/2012/SaintMalo/SITE/SITE-06.pdf · iPhone 3GS with positioning data collection app. We will

Figure 5: Filtering with n=10 and s=1.16

somewhat accurate since speed is a function of time.The second one is pseudo real time algorithm ratherthan real time one as algorithm 1. For a window sizen, we can decide filtering of tuple dne instead of fil-tering (n+1)th tuple. Even though it cannot be usedin real time, this approach can reduce the tendencyof under-filtering and over-filtering. We consider thepseudo real time version of filtering algorithm as ournext research. We expect to see the tradeoff of real-timeness of filtering versus accuracy of filtering soon.Finally, we need to investigate the effect of probabilitydistribution for filtering. In general, normal distribu-tion is a usual candidate for various sources of errorsand filtering. However, a research showed that humanmobility pattern is in a heavy tailed distribution suchas Levy Walk [4]. We will therefore see the effect ofLevy Walk for filtering since the distribution of posi-tioning data will be likely to be in a Levy Walk form.

Acknowledgements: This research was supported byBasic Science Research Program through the NationalResearch Foundation of Korea(NRF) funded by theMinistry of Education, Science and Technology (No.2011-0025875).

References:

[1] Garmin GPSMAP62s, https://buy.garmin.com/shop/shop.do?pID=

63801

[2] iOS 5: Understanding Location Services,http://support.apple.com/kb/ht4995

[3] Hyunuk Kim and Ha Yoon Song, Daily Life Mo-bility of A Student: From Position Data to Hu-man Mobility Model through Expectation Max-imization Clustering, Communications in Com-puter and Information Science, Volume 263, De-cember 2011, pp. 88-97.

[4] Marta C. Gonzalez and A. Hidalgo and Albert-Laszlo Barabasi: Understanding individual hu-man mobility patterns: Nature, (2008)

[5] T. Vincenty, Direct and Inverse Solutions ofGeodesics on the ellipsoid with Application ofNested Equations, Survey Review, Volume 23,Number 176, April 1975 , pp. 88-93(6).

[6] Woojoong Kim and Ha Yoon Song, Optimiza-tion Conditions of OCSVM for Erroneous GPSData Filtering, Communications in Computerand Information Science, Volume 263, Decem-ber 2011, pp. 62-70.

[7] List of fastest production cars by acceler-ation, http://en.wikipedia.org/wiki/LisP_of_fastesP_cars_by_acceleration

Recent Researches in Communications, Signals and Information Technology

ISBN: 978-1-61804-081-7 52