user clustering in smartphone applicationsprodei/dsie12/papers/paper_8.pdf · the problem of user...

12
User Clustering in Smartphone Applications Klaus Schaefers Doctoral Program in Informatics Engineering, Faculdade de Engenharia da Universidade do Porto, [email protected] http://fe.up.pt Abstract. Successful user interface design requires a deep understand- ing of the targeted audience. This paper explores the application of the K-Means algorithm to cluster smartphone users, in order to offer Human Computer Interaction (HCI) specialists a better insight into their user group. Different hierarchical ordered feature space representations are explored and their performance is evaluated against a synthetic data set. By applying the best performing features against a real world data set, obtained from a public available smartphone application, three distinct user groups could be identified. Keywords: Data Mining, Usage Mining, Cluster Analysis, HCI, Smart- phone 1 Introduction Designing user interfaces is a challenging task. The success and acceptance of the user interface depends crucially on the HCI specialists understanding of the target group and their needs. Wrong expectation of the target group often lead to severe shortcomings in terms of usability and a low market success. The main objective of this work is it to explore different clustering techniques and their applicability in the context of smartphone applications. Common design pro- cesses like user centered design pay special attention to the detailed description of the different user types and their needs. The users stereotypes are in these processes presented as so called personas, each having a background story and a detailed description of their requirements and skills. The precise construction of these personas is a very cumbersome and expensive task, as it requires for instance a large number of interviews [7,15]. However it is not always possible to get access to members of the target group. This is in particular a challenge for web or smartphone applications, that might be targeted at a very brought and heterogeneous, global audience. In the context of web applications it is there- fore common to monitor the user interaction and to perform data mining in the gathered data, to better understand the targeted user group. Among techniques like pattern mining, particularly clustering techniques have received much attention, as they provide the necessary information to parti- tion the audience into different persona like groups[14, 18]. The knowledge about

Upload: others

Post on 21-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

User Clustering in Smartphone Applications

Klaus Schaefers

Doctoral Program in Informatics Engineering,Faculdade de Engenharia da Universidade do Porto,

[email protected]

http://fe.up.pt

Abstract. Successful user interface design requires a deep understand-ing of the targeted audience. This paper explores the application of theK-Means algorithm to cluster smartphone users, in order to offer HumanComputer Interaction (HCI) specialists a better insight into their usergroup. Different hierarchical ordered feature space representations areexplored and their performance is evaluated against a synthetic data set.By applying the best performing features against a real world data set,obtained from a public available smartphone application, three distinctuser groups could be identified.

Keywords: Data Mining, Usage Mining, Cluster Analysis, HCI, Smart-phone

1 Introduction

Designing user interfaces is a challenging task. The success and acceptance ofthe user interface depends crucially on the HCI specialists understanding of thetarget group and their needs. Wrong expectation of the target group often leadto severe shortcomings in terms of usability and a low market success. The mainobjective of this work is it to explore different clustering techniques and theirapplicability in the context of smartphone applications. Common design pro-cesses like user centered design pay special attention to the detailed descriptionof the different user types and their needs. The users stereotypes are in theseprocesses presented as so called personas, each having a background story anda detailed description of their requirements and skills. The precise constructionof these personas is a very cumbersome and expensive task, as it requires forinstance a large number of interviews [7, 15]. However it is not always possibleto get access to members of the target group. This is in particular a challenge forweb or smartphone applications, that might be targeted at a very brought andheterogeneous, global audience. In the context of web applications it is there-fore common to monitor the user interaction and to perform data mining in thegathered data, to better understand the targeted user group.

Among techniques like pattern mining, particularly clustering techniqueshave received much attention, as they provide the necessary information to parti-tion the audience into different persona like groups[14, 18]. The knowledge about

Page 2: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

these user clusters is also a key factor for successful user interface design in thecontext of smartphone applications, especially if no in depth use case analysishas been carried out before [19]. For example the target user group could be sep-arated into distinct sub sets, each utilizing only parts of the application features.In such a case it might make sense to offer each group a different user interface,focusing on the required features.

In section one the paper introduces the related work, before the experimen-tal setup is presented and the different features spaces are discusses in detail.Afterwards the results will be presented and discussed. The paper ends with ashort presentation of future work.

2 Related Work

The problem of user clustering is a field that has been intensively researched inthe context of web usage mining, for instance to provide a customized shoppingexperience or to help search engines in scoring the retrieved documents based inthe users interests [14]. Most web mining systems are based on web server logfiles, from which the user sessions are extracted. Each session consists out of aset of URLs which have been visited and the timestamp of the request [21, 6, 5].

In the current literature the most common approaches try to capture thesession data in a vector space model, researching different features set and algo-rithms. For instance the session data might be transformed into a binary vectormodel, where each component indicates the presence of the absence of a certainURL in the session [11]. Others have used the page frequency instead as themain feature. Yan et al. for instance use the frequency to construct the featurespace and apply the Leader algorithm to perform the clustering [20]. Antonelliset al. use also the page frequency, but deploy the K-mean algorithm [8] insteadto determine the clusters [2]. However these two algorithms have in common theneed to specify the number of cluster k in advance. Different methods have beensuggested to overcome this issue. One approach is the so called “elbow method”,which calculates the sum of squared errors SSE for each k, and tries to detecta sudden decrease between k and k + 1. Another approach is to calculate theSilhouette coefficient, s, or the Davies Bouldin coefficient, d, for each k, andchoose the k with the best value of s or d [16]. We may refer the work of Kuoet al. who explored a new method for determining k. They applied an artificialneural network (ANN) to decide the optimal number of clusters k in advance,before using K-Means [11]. Later Kuo et al. also used an hierarchical algorithminstead of K-Means to solve this issue [9]. Due to the potentially large numberof web pages, the feature spaces might suffer from a high dimensionality, whichcauses problems during clustering. This phenomenon is also know as the “curseof dimensionality”, and leads to difficulties in the visualization of the clusteringresults and a poor performance of certain similarity measures [10, 4]. As a solu-tion Fu et al. introduce the concept of a generalized session, which uses attribute- oriented - induction to reduce the number of dimensions. Afterwards they usedthe BIRCH algorithm [22] to perform a hierarchical clustering [21]. Beside the

Page 3: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

URL of the requested page, the temporal dimension of the user interaction isseen as a valuable sign for the personal interest of a user regarding a certainpage. Generally it is assumed that a long duration of a page view indicates astrong interest [18, 1], anyhow it is to point out that the duration might differsignificantly between users and might be prone to noise. [3, 12].

Although web based applications share certain similarities with smartphoneapplications, there are some significant differences. Web applications consist nor-mally out of a large number of pages, where the user interaction can be describedby the transition from page to page, whereas smartphone applications are onlycomposed out of a few pages. Most of the interaction takes place within thepages, e.g. a click on a button. Therefore the obtained usage data must be morefine grained then the web server log files, and must capture each interactionevent.

3 Data Sets

The user interaction can be divided into distinct atomic interaction events. Eachinteraction event was triggered by the invocation of a widget wi which is elementof the set of all widgets W in the application. Each widget wi belongs one pagepi, which is an element of the set of all pages P . Each event has a certaintype yi, which is an element of the set of all event types Y that can occur inthe application. Typical examples are an onclick event which occurs in case abutton was pressed, or an onchange event if the value of a text field was changed.Moreover each event occurred at one point of time ti ∈ N. An event is assignedto a session si, which starts and ends with the application and is element of S,the set of all session. Each session is assigned to an element ui of the set of allusers U . Let E be the entire set of events

E = (e1, e2, .., en) with ei ∈W × Y × P × N× S × U (1)

Each interaction event ei be can described by the tuple

ei = (eiw, eiy, eip, ein, eis, eiu) (2)

with

eiw ∈W, eiy ∈ Y, eip ∈ Y, ein ∈ N, eis ∈ S, eiu ∈ U (3)

Due to the lack of a publicily available data set, a custom Android applica-tion was created in the scope of this work. The application offers the users thepossibility to play a simple memory game, in which they can test there knowl-edge in certain domains. In addition the users can select between the two levelsof difficulty easy and hard. The application implements two main use cases. Thefirst use cases targets new and inexperienced users. These users will face thestart page after launching the application and can start an easy memory gamedirectly pressing a large “Start” button. After finishing the game, users have thechoice between playing the same game again or moving on to another domain.

Page 4: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

Furthermore the users can view their high scores or navigate back to the “Wel-come” page. The second use case extends the first user case and targets at moreexperiences users, that are already familiar with the application. These users cannavigate from the “Welcome” page to several configuration pages, to customizethe domain of the game or to change level of difficulty from easy to hard. In casethe difficulty is set to hard, the users will face a more complex game field whichoffers more options, making it harder to solve the game in short time. The basicnavigation structure of the application is can be seen in Figure 1

Fig. 1. Conceptual model of the real world applicaton.

The application was published in the Android Market and all interactionevents were monitored and transmitted in regular intervals to a central server.After a period of three weeks 41 users had installed the application, and morethen 6500 events were submitted, belonging to 68 sessions.

4 Feature Spaces

The clustering was performed using nine different normalized feature spaces, inwhich each user is presented as an instance. Before the different presentationswhere calculated, the event stream was filtered to only include onclick events.This step is mandatory as a magnitude of events was logged in the real androidapplication. For instance changing the value of a text field, would result in oneonclick, one onfocus, one onchange and one onblur event. As result certain type ofwidgets, for instance text fields, would have a significant higher event frequencythen other widget, e.g. buttons, and would hence distort the result.

Page 5: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

The features spaces utilize on one hand the event frequency to describe theuser behavior, and on the other hand use the time stamps to model the temporaldimension of the user interaction. The temporal dimension can reveal the usersinterest in a certain page [18], but can also be seen as a hint regarding the level ofexperience the user has archived so far with the application. New users will havea longer latency between the interaction events, as they have to orient themselfin the new application, whereas experienced users will know the application, andhence have a lower latency [19].

Following the natural hierarchy of modern smart phone user interfaces, thefeatures space form a hierarchy as well. A smartphone application consists out ofn pages, each containing m widgets. Thus it is possible to calculate aggregatedvalues for different dimensions, starting at the widget level, over to the pagelevel and ending on the application level.

The first feature space V1 can been seen as a application level summary ofthe entire interaction. It contains the number of session s(ui)→ N, the numberof visited pages p(ui) → N, the number of invoked widgets w(ui) → N, themean session duration m(ui) → R and the mean latency l(ui) → R betweeninteraction events. Hence v1i ∈ V1 for one user ui ∈ U is

v1i = {s(ui), p(ui), w(ui),m(ui), l(ui)} (4)

Feature space V2 targets the page level, and can be considered a kind of “drilldown”, offering more details about the user interaction. It is composed out ofthe page frequencies pf(ui, pj) → R, this means v2i ∈ V2 contains the relativeamount of events the were observed in each page pj ∈ P for a given user ui ∈ U .

v2i = {pf(ui, p1), pf(ui, p2), ..., pf(ui, pn)} (5)

The temporal dimension of the user interaction is considered in the thirdfeature space V3. Instead of the event frequency, v3i ∈ V3 for a user ui ∈ U willcontain the mean time pt(ui, pj)→ R ui spend on each page p : j ∈ P .

v3i = {pt(ui, p1), pt(ui, p2), ..., pt(ui, pn)} (6)

V4 is the union of V2 and V3. It is composed out of the page frequencyand the mean time users spend on a page, and tries to capture the temporalcharacteristics and as well the preferred pages for a user ui ∈ U . v4i is definedas:

v4i = {pf(ui, p1), pt(ui, p1), pf(ui, p2), pt(ui, p2), ..., pf(ui, pn), pt(ui, pn)} (7)

The following three feature spaces will explore similar feature space repre-sentations, but on the widget level. v5i ∈ V5 is composed out of the frequencieswf(ui, wi)→ R of a widget wj ∈W for each user ui ∈ U .

v5i = {wf(ui, p1), wf(ui, p2), ..., wf(ui, pn)} (8)

Page 6: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

V6 presents the mean widget latency wl(ui, wj)→ R for a given user ui ∈ U .v6i ∈ V6 is therefore defined as

v6i = {wl(ui, w1), wl(ui, w2), ..., wl(ui, wn)} (9)

The union of V5 and V6 is V7. Like V4 it tries to model the temporal charac-teristics for a user ui ∈ U and his preferred widgets.

v7i = {wf(ui, w1), wl(ui, w1), wf(ui, w2), wl(ui, w2), ..., wf(ui, wn), wl(ui, wn)}(10)

Moreover two additional feature spaces will be investigated on the widgetlevel. v8 is inspired by the popular tf-idf measure form the field of naturallanguage processing [13]. The tf-idf measure gives a higher importance to theterm occurrence in a document, if only a small set of other documents containthe term as well. As a result it is possible to recognize characteristic terms fora each document. Each widget wj is considered the equivalent to a term andthe set of all events Ei ⊂ E of an user ui is the considered the equivalent of adocument. By doing so, v8 ∈ V8 will have a higher value for the characteristicwidgets per user. v8 ∈ V8 is defined as

v8i = {tfidf(ui, w1), tfidf(ui, w2), ..., tfidf(ui, wn)} (11)

where

tfidf(ui, wj) =|(ek ∈ Ei : ekw = wj)|

|Ei|× log(

|U ||x : wj ∈ Ej , j ∈ 1, ..|U ||

) (12)

V9 is a binary feature space which is composed out of the k most frequentwidgets. Let Tki be the set of the k most frequent used widgets of a given userui. Then tw(ui, wj) ∈ {0, 1} is a boolean function which returns 1 iff widgetwj ∈ Tki and 0 otherwise. For this paper k = 10 was found to give good results.v9i ∈ V9 can be defined as

v9i = {tw(ui, w1), tw(ui, w2), ..., tw(ui, wn)} (13)

5 User clustering

The K-Means algorithm, implemented in the R toolkit [17], was used to findthe clusters within the data sets. This paper will use the “elbow” method, theSilhouette coefficient and the Davies Bouldin coefficient for k ∈ {2..15} to deter-mine the best k. The maximum number of 15 was chosen, since a higher numberof clusters seems to be unreasonable for a population of only 41 users.

The different feature representations might result in different values for anoptimal k. To ensure the general feasibility of the taken approach, four synthetic

Page 7: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

data sets were created. These data sets were generated for a prototypical appli-cation which consists out of two pages, each containing a number of widgets. Thewidgets can be grouped into four distinct sets A,B,C and D, each presentingone application feature. To model different kind of users, for instance beginnersand professionals, several simulated users have been created, each interactingdifferently with the application. The main variables of these simulated users arethe average number and length (in events) of sessions and the average latencybetween interaction events. Each of the variables is modeled through a Gaussiandistribution, with a default standard deviation per parameter. The number ofsessions and the session length can be used to simulate the personal interest of auser. The latency related parameters allow the simulation of the different levelsof experience. Additionally the users are parametrized with a weighted set of ap-plication features, they use most of the times. An application feature is modeledthrough the set of widgets A, B, C, and D. By utilizing different combinationsof all these parameters the following four synthetic data set were created:

Data Set 1: This data set simulates two very heterogeneous groups of usersand consists out of 200 users. The first group only utilizes feature A and B ,has a short session length and in general a low number of sessions per user. Thesecond group utilizes feature C and D, has a medium session length and a largernumber of sessions.

Data Set 2: The data set consists out of 300 users, which can are divided intothree heterogeneous groups. Each group has a medium session length, a smallnumber of sessions and invokes the features A, B and C, but with a significantdifferent weight. The first group focuses on feature A and has a low latencybetween the events. The second group on prefers feature B and shows a mediumlatency and the thirst group invokes feature C mostly with a large latency.

Data Set 3: This data set simulates experienced and inexperienced usersand consists out of 200 users. The inexperienced group only utilizes feature A,has a medium latency but only short number of sessions and session events. Theexperienced group uses features A and B, has a short latency and a large numberof session and a medium session length.

Data Set 4: This data set model four set of users, which are partly overlap-ping. The data set consists out of 400 users. Two groups have a small numberand length of sessions and a medium latency, whereas the two remaining groupsshow a higher value for the session length and number of sessions and in addi-tion have a lower latency. Each group utilizes three of the four features, with adifferent weight, and each group has a different latency.

The expected results for each data set and each feature space are summarizedin Table 1.

6 Results and Conclusions

To ensure that the random seed of the K-Means algorithm does not influencethe results, the clustering was performed 100 times for all synthetic data sets,

Page 8: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

Table 1. Expected best k for synthetic data sets per feature space

Feature Space Data Set 1 Data Set 2 Data Set 3 Data Set 4

V1 2 3 2 2

V2 2 3 2 3

V3 2 3 2 2

V4 2 3 2 2

V5 2 3 2 4

V6 2 3 2 2

V7 2 3 2 4

V8 2 3 2 4

V9 2 3 2 4

and the mean values for the “elbow” method, the Silhouette coefficient and theDavies Bouldin coefficient are shown in Table 2.

Before the result will be discussed in detail, some general issues have to beexplained. It can be seen, that the Davies Bouldin coefficient returned in severalcases, which are marked with * in Table 2, a value of 15. For the further analysisthese values will not be taken into consideration, and will be treated as outliers.In some cases the values for the best and the second best k were quite close forthe Davies Bouldin coefficient. In such a case the second best k is noted behindthe best k in parenthesis. Moreover the “elbow” method was not conclusive infour cases, which are marked with ∗∗, because the “elbow” could not be clearlylocalized. This happened for data set 2 and 4 and the feature set V5, V8 andV9, anyhow it is to mention that the value for k was close to the expectedvalue, therefore a value range is given for the k. For the real world applicationthe “elbow” method turned out to be not useful in most of the cases, as thecurve rendered by the SSE / k pairs did not show any “elbow”. These result aredenoted with a “-”.

When comparing the outcome of the analysis of the synthetic data sets withthe expected values for the data sets it can be seen, that V1 performed generallywell. All methods return the expected k value, the only exception the “elbow”value for data set 2. This is remarkable, because Data Set 3 was constructed tohave three clearly separated user groups, but might be explained with the factthat the number of sessions and number of events per session was equal for allthree groups. V9 performed also well, if the outlier value of 15 for the DaviesBouldin in data set 4 is not considered. The “elbow” method anyhow does notdeliver a non-ambiguous result, and should therefore be interpreted with carein the context of V9. V5 seemed to perform well in the more simply data sets1,2 and 3, where the user groups are clearly separated and there is only a smalloverlap in the invoked features. The temporal dimension of the user interactionseems to be well captured by V3 on the page level, whereas the result in thewidget level V6 show only a good result for the “elbow” method. The combinedfeature spaces V4 and V7 do not show satisfying results, and should therefore notbe considered in the analysis of the real world application.

Page 9: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

Table 2. Best k for all data sets per feature space and evaluation method

Feature Space Method Data Set 1 Data Set 2 Data Set 3 Data Set 4 Real Data

V1 Elbow 2 5 2 2 4Silhouette 2 3 2 2 2Davies Bouldin 2 3 2 2 4

V2 Elbow 2 2 2 2 6Silhouette 2 2 2 3 2Davies Bouldin 2 2 2 3 6

V3 Elbow 2 3 2 2 -Silhouette 4 3 2 2 3Davies Bouldin 15 * 3 2 2 3

V4 Elbow 2 3 2 2 -Silhouette 4 3 2 2 3Davies Bouldin 15 * 3 2 2 3

V5 Elbow 2 3-4 ** 2 5 -Silhouette 2 3 2 5 2Davies Bouldin 2 3 2 3 2

V6 Elbow 2 3 2 2 -Silhouette 2 2 2 3 2Davies Bouldin 2 2 2 2 6 (2)

V7 Elbow 2 4 2 3 -Silhouette 2 2 2 2 2Davies Bouldin 2 2 2 2 6 (2)

V8 Elbow 2 3-4 ** 2 5 6Silhouette 2 2 2 3 2Davies Bouldin 2 2 2 3 2

V9 Elbow 2 3-4 ** 2 4-5 ** -Silhouette 2 3 2 4 2Davies Bouldin 2 3 15 * 4 15

The real word usage data shows problems regarding the application of the“elbow” method. The reason for this might be the low number of users in thedata set, which makes it hard to find large and consistent clusters. Only in V1 thismethod returns a clear result of 4. The Davis Bouldin method indicates 4 as well,only the Silhouette coefficient returns 2. The widget frequency represented byV5 has a non ambiguous result of 2 .V9, which performed well on the syntheticdata set, confirms the value of k = 2, but like in the data set 4 the DaviesBouldin index returns k = 15. The screen latency which is model by V3 returnsthe value 3 for k for the Silhouette and the Davies Bouldin method. Table 3shows the cluster sizes, the Silhouette and the Davies Bouldin coefficient for themost promising features spaces and the corresponding best k.

The resulting clusters of V1 and V3 have both one cluster which only con-tains a single user. As V1 and V3 both contain the the number of invoked pages,we might conclude that this user distinguishes himself only through the num-

Page 10: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

Table 3. Cluster sizes and cluster quality

Feature Space k Cluster Sizes Silhouette Davies Bouldin

V1 4 25, 1, 8,7 0.71 0.39

V3 3 33, 7, 1 0.65 0.69

v5 2 35, 6 0.57 1.00

ber of pages he has used. Furthermore all clusters contain one cluster which issignificantly larger then the remaining ones, which can be seen as an indicator,that the entire set of users can be split into one main group and several smallergroups.

Figure 2 shows the plot of the mean latency l(ui) against the number ofinvoked widgets w(ui) per user. The size of the circles presents the numberof sessions per users. The figure reveals, like the cluster analysis before, theexistence of a larger group which utilizes only a subset of the application widgetsand which has a low latency. In contrast to this group, two smaller subsets can beidentified. One group with a larger latency, which only uses parts of the widgetsand another group with a larger number of widgets and a a low latency. It canbe also seen, that this group has in general a larger number of sessions, which isin indication for an particular interest in the application.

This observations are reasonable in the context of the real world application,since the application offers one main use case for new users, and one extendeduse case for more experienced users. The correlation between the larger numberof sessions and the low latency seems to be sound as well, as users gather moreexperience in the application, and can therefore full fill there goals faster.

7 Conclusion and Future Work

In this work nine different feature spaces were subject to investigation, regardingtheir feasibility to cluster smartphone usage data. The feature spaces were eval-uated against four synthetic data sets. A features space summarising the userinteraction in only 5 dimensions showed the best results. However, to capturethe temporal dimension of the user interaction, the mean latency per screen wasfound to form a good feature space. The widget frequency performed also well,but focuses on the most frequent used functions of the application.

The major problem during the elaboration of the results was the lack of asufficient data set. To overcome this issue, a custom smartphone application wasdeveloped to harvest real world usage data. Within three weeks, the data of 41users could be collected. After applying the best performing feature spaces inconjunction with the K-Means algorithm, three user groups could be identified.The discovered clusters show good values for the Silhouette and Davies Bouldincoefficient, and fit the use case structure of the developed application.

Nonetheless the collected data set was quite small, which made it challeng-ing to draw valid conclusions. Therefore one of the next goals should be the

Page 11: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

Fig. 2. Mean latency l(ui) against the number of invoked widgets w(ui). The size ofthe circle indicates the number of sessions s(ui).

acquisition of a larger data set. Once this data set is available other algorithmsshould be subject to further research, for instance hierarchical or density basedalgorithms. Furthermore new ways of cluster visualization should be explored,as it might be hard to interpret the results, especially in the context of highdimensional feature spaces, like those working on the widget level.

References

1. Shafiq Alam, Gillian Dobbie, and Patricia Riddle. Particle Swarm OptimizationBased Clustering of Web Usage Data. In 2008 IEEE/WIC/ACM InternationalConference on Web Intelligence and Intelligent Agent Technology, pages 451–454.IEEE, December 2008.

2. Panagiotis Antonellis, Christos Makris, and Nikos Tsirakis. Algorithms for clus-tering clickstream data. Information Processing Letters, 109(8):381–385, March2009.

3. Arindam Banerjee and Joydeep Ghosh. Clickstream Clustering Using WeightedLongest Common Subsequences. Proceedings of the Web Mining Workshop at the1st SIAM Conference on Data Mining, May 2001.

4. Richard Ernest Bellman. Dynamic programming. Princeton University Press, 1957.5. Jose Borges, Mark Levene, Brij Masand, and Myra Spiliopoulou. Web Usage Analy-

sis and User Profiling, volume 1836 of Lecture Notes in Computer Science. SpringerBerlin Heidelberg, Berlin, Heidelberg, June 2000.

6. R. Cooley, B. Mobasher, and J. Srivastava. Web mining: information and patterndiscovery on the World Wide Web. In Proceedings Ninth IEEE International Con-ference on Tools with Artificial Intelligence, pages 558–567. IEEE Comput. Soc,November 1997.

Page 12: User Clustering in Smartphone Applicationsprodei/dsie12/papers/paper_8.pdf · The problem of user clustering is a eld that has been intensively researched in the context of web usage

7. Alan Dix, Janet E. Finlay, Gregory D. Abowd, and Russell Beale. Human-Computer Interaction (3rd Edition). Prentice Hall, 2003.

8. Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Tech-niques, Second Edition (The Morgan Kaufmann Series in Data Management Sys-tems). Morgan Kaufmann, 2005.

9. Gang Kou and Chunwei Lou. Multiple factor hierarchical clustering algorithmforlarge scale web page and search engine clickstream data. Annals of OperationsResearch, pages 1–12, February 2010.

10. Hans-Peter Kriegel, Peer Kroger, and Arthur Zimek. Clustering high-dimensionaldata. ACM Transactions on Knowledge Discovery from Data, 3(1):1–58, March2009.

11. R.J. Kuo, J.L. Liao, and C. Tu. Integration of ART2 neural network and geneticK-means algorithm for analyzing Web browsing paths in electronic commerce.Decision Support Systems, 40(2):355–374, August 2005.

12. Pawan Lingras and Chad West. Interval Set Clustering of Web Users with RoughK-Means. Journal of Intelligent Information Systems, 23(1):5–16, July 2004.

13. Christopher D. Manning. Foundations of Statistical Natural Language Processing[Gebundene Ausgabe]. Mit Press, 1999.

14. Bamshad Mobasher, Robert Cooley, and Jaideep Srivastava. Automatic personal-ization based on Web usage mining. Communications of the ACM, 43(8):142–151,August 2000.

15. Jakob Nielsen. Usability Engineering [Taschenbuch]. Morgan Kaufmann; Auflage:New edition, 1994.

16. Tan Pang-Ning, Steinbach Michael, and Kumar Vipin. Introduction to Data Mining[Hardcover]. Addison Wesley; 1 edition, 2005.

17. RProject. The R Project for Statistical Computing, 2011.18. Jaideep Srivastava, Robert Cooley, Mukund Deshpande, P.N. Tan, Birgit Hay,

Geert Wets, Koen Vanhoof, Alan Dix, Janet E. Finlay, Gregory D. Abowd, andRussell Beale. Web usage mining: Discovery and applications of usage patternsfrom web data. In Proc. Intelligent Techniques for Web Personalization: 17th Int.Joint Conf. Artificial Intelligence, volume 1, pages 1–6. Citeseer, 2000.

19. Thomas Tullis and William Albert. Measuring the User Experience: Collecting,Analyzing, and Presenting Usability Metrics. Morgan Kaufmann, 2008.

20. Tak Woon Yan, Matthew Jacobsen, Hector Garcia-Molina, and Umeshwar Dayala.From user access patterns to dynamic hypertext linking. Computer Networks andISDN Systems, 28(7-11):1007–1014, May 1996.

21. Kanwalpreet Sandhu Yongjian Fu. Clustering of Web Users Based on Access Pat-terns. In In Proceedings of the 1999 KDD Workshop on Web Mining. Springer-Verlag, 1999.

22. Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: A New Data Clus-tering Algorithm and Its Applications, June 1997.