
Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2013, Article ID 461363, 13 pages
http://dx.doi.org/10.1155/2013/461363

Research Article
Parametric Rough Sets with Application to Granular Association Rule Mining

Xu He,1 Fan Min,1,2 and William Zhu1

1 Lab of Granular Computing, Minnan Normal University, Zhangzhou 363000, China
2 School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

Correspondence should be addressed to Fan Min; minfanphd@163.com

Received 2 August 2013; Accepted 12 October 2013

Academic Editor: Gerhard-Wilhelm Weber

Copyright © 2013 Xu He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Granular association rules reveal patterns hidden in many-to-many relationships, which are common in relational databases. In recommender systems, these rules are appropriate for cold-start recommendation, where a customer or a product has just entered the system. An example of such rules might be "40% of men like at least 30% of the kinds of alcohol; 45% of customers are men and 6% of products are alcohol." Mining such rules is a challenging problem due to pattern explosion. In this paper, we build a new type of parametric rough sets on two universes and propose an efficient rule mining algorithm based on the new model. Specifically, the model is deliberately defined such that the parameter corresponds to one threshold of rules. The algorithm benefits from the lower approximation operator in the new model. Experiments on two real-world data sets show that the new algorithm is significantly faster than an existing algorithm, and the performance of recommender systems is stable.

1. Introduction

Relational data mining approaches [1–4] look for patterns that involve multiple tables in a database. These works generate rules through two measures, namely, support and confidence. Granular association rule mining [5, 6], combined with granules [7–14], is a new approach to look for patterns hidden in many-to-many relationships. This approach generates rules with four measures to reveal connections between granules in two universes. A complete example of a granular association rule might be "40% of men like at least 30% of the kinds of alcohol; 45% of customers are men and 6% of products are alcohol." Here 45%, 6%, 40%, and 30% are the source coverage, the target coverage, the source confidence, and the target confidence, respectively. With these four measures, the strength of the rule is well defined. Therefore, granular association rules are semantically richer than other relational association rules.

Granular association rules [5, 6] are appropriate for solving the cold-start problem in recommender systems [15, 16]. Recommender systems [17–20] suggest products of interest to customers; therefore, they have gained much success in E-commerce and similar applications. The cold-start problem [15, 16] is difficult in recommender systems. Recently, researchers have addressed the cold-start problem where either the customer or the product is new. Naturally, content-based filtering approaches [17, 21] are used for these problems. However, the situation with both a new customer and a new product has seldom been considered. A cold-start recommendation approach based on granular association rules has been proposed for this new situation. First, both customers and products are described through a number of attributes, thus forming different granules [8, 10, 22–24]. Then we generate rules between customers and products that satisfy four measures of granular association rules. Examples of granular association rules include "men like alcohol" and "Chinese men like French alcohol"; they are proposed at different degrees of granularity. Finally, we match the suitable rules to recommend products to customers.

A granular association rule mining problem is defined as finding all granular association rules given thresholds on four measures [5]. Similar to other relational association rule mining problems (see, e.g., [25–27]), this problem is challenging due to pattern explosion. A straightforward sandwich algorithm has been proposed in [5]. It starts from both entities and proceeds to the relation. Unfortunately, its time complexity is rather high and its performance is not satisfactory.

In this paper, we propose a new type of parametric rough sets on two universes to study the granular association rule mining problem. We borrow some ideas from variable precision rough sets [28] and rough sets on two universes [29–31] to build the new model. The model is deliberately adjusted such that the parameter coincides with the target confidence threshold of rules. In this way, the parameter is semantic and can be specified by the user directly. We compare our definitions with alternative ones and point out that they should be employed in different applications. Naturally, our definition is appropriate for cold-start recommendation. We also study some properties, especially the monotonicity of the lower approximation, of our new model.

With the lower approximation of the proposed parametric rough sets, we design a backward algorithm for rule mining. This algorithm starts from the second universe and proceeds to the first one; hence it is called a backward algorithm. Compared with the existing sandwich algorithm [5], the backward algorithm avoids some redundant computation. Consequently, it has a lower time complexity.

Experiments are undertaken on two real-world data sets. One is the course selection data from Minnan Normal University during the semester between 2011 and 2012. The other is the publicly available MovieLens data set. Results show that (1) the backward algorithm is more than 2 times faster than the sandwich algorithm, (2) the run time is linear with respect to the data set size, (3) sampling might be a good choice to decrease the run time, and (4) the performance of recommendation is stable.

The rest of the paper is organized as follows. Section 2 reviews granular association rules through some examples; the rule mining problem is also defined. Section 3 presents a new model of parametric rough sets on two universes, defined to cope with the formalization of granular association rules. Then Section 4 presents a backward algorithm for the problem. Experiments on two real-world data sets are discussed in Section 5. Finally, Section 6 presents some concluding remarks and further research directions.

2. Preliminaries

In this section, we revisit granular association rules [6]. We discuss the data model, the definition, and the four measures of such rules. A rule mining problem will also be presented.

2.1. The Data Model. The data model is based on information systems and binary relations.

Definition 1. $S = (U, A)$ is an information system, where $U = \{x_1, x_2, \ldots, x_n\}$ is the set of all objects, $A = \{a_1, a_2, \ldots, a_m\}$ is the set of all attributes, and $a_j(x_i)$ is the value of $x_i$ on attribute $a_j$ for $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, m\}$.

An example of an information system is given by Table 1(a), where $U = \{c_1, c_2, c_3, c_4, c_5\}$ and $A = \{$Age, Gender, Married, Country, Income, NumCars$\}$. Another example is given by Table 1(b).

In an information system, any $A' \subseteq A$ induces an equivalence relation [32, 33]

$$E_{A'} = \{(x, y) \in U \times U \mid \forall a \in A',\ a(x) = a(y)\} \qquad (1)$$

and partitions $U$ into a number of disjoint subsets called blocks or granules. The block containing $x \in U$ is

$$E_{A'}(x) = \{y \in U \mid \forall a \in A',\ a(y) = a(x)\}. \qquad (2)$$

The following definition was employed by Yao and Deng [23].

Definition 2. A granule is a triple [23]

$$G = (g, i(g), e(g)), \qquad (3)$$

where $g$ is the name assigned to the granule, $i(g)$ is a representation of the granule, and $e(g)$ is a set of objects that are instances of the granule.

$g = g(A', x)$ is a natural name for the granule. The representation of $g(A', x)$ is

$$i(g(A', x)) = \bigwedge_{a \in A'} \langle a, a(x)\rangle. \qquad (4)$$

The set of objects that are instances of $g(A', x)$ is

$$e(g(A', x)) = E_{A'}(x). \qquad (5)$$

The support of the granule is the size of $e(g)$ divided by the size of the universe, namely,

$$\mathrm{supp}(g(A', x)) = \mathrm{supp}\Big(\bigwedge_{a \in A'} \langle a, a(x)\rangle\Big) = \mathrm{supp}(E_{A'}(x)) = \frac{|E_{A'}(x)|}{|U|}. \qquad (6)$$
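To make the notions of granule and support concrete, the short Python sketch below partitions a toy information system by an attribute subset and computes (6). The miniature customer records and the helper names granule_blocks and support are illustrative assumptions, not part of the original formulation.

```python
# A toy information system (U, A): five customers described by two attributes.
customers = [
    {"id": "c1", "Gender": "Male",   "Country": "USA"},
    {"id": "c2", "Gender": "Female", "Country": "USA"},
    {"id": "c3", "Gender": "Male",   "Country": "China"},
    {"id": "c4", "Gender": "Female", "Country": "Japan"},
    {"id": "c5", "Gender": "Male",   "Country": "China"},
]

def granule_blocks(objects, attrs):
    """Partition the universe into blocks E_{A'}(x) induced by the attribute subset A'."""
    blocks = {}
    for obj in objects:
        key = tuple(obj[a] for a in attrs)            # the description i(g)
        blocks.setdefault(key, set()).add(obj["id"])  # the extension e(g)
    return blocks

def support(block, universe_size):
    """supp(g) = |E_{A'}(x)| / |U|, as in (6)."""
    return len(block) / universe_size

blocks = granule_blocks(customers, ["Gender"])
for description, extension in blocks.items():
    print(description, extension, support(extension, len(customers)))
# e.g. ('Male',) -> {'c1', 'c3', 'c5'} with support 0.6, i.e. the granule <Gender, Male>
```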

Definition 3. Let $U = \{x_1, x_2, \ldots, x_n\}$ and $V = \{y_1, y_2, \ldots, y_k\}$ be two sets of objects. Any $R \subseteq U \times V$ is a binary relation from $U$ to $V$. The neighborhood of $x \in U$ is

$$R(x) = \{y \in V \mid (x, y) \in R\}. \qquad (7)$$

When $U = V$ and $R$ is an equivalence relation, $R(x)$ is the equivalence class containing $x$. From this definition we know immediately that, for $y \in V$,

$$R^{-1}(y) = \{x \in U \mid (x, y) \in R\}. \qquad (8)$$

A binary relation is more often stored in a database as a table with two foreign keys; in this way, storage is saved. For the convenience of illustration, here we represent it with an $n \times k$ Boolean matrix. An example is given in Table 1(c), where $U$ is the set of customers indicated in Table 1(a) and $V$ is the set of products indicated in Table 1(b).
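As a minimal illustration of (7) and (8), the sketch below reads the neighborhoods directly off a Boolean relation matrix; the matrix copies the Buys relation of Table 1(c), while the function names are assumptions made for this example.

```python
# Buys relation of Table 1(c): rows are customers c1..c5, columns are products p1..p6.
R = [
    [1, 1, 0, 1, 1, 0],  # c1
    [1, 0, 0, 1, 0, 1],  # c2
    [0, 1, 0, 0, 1, 1],  # c3
    [0, 1, 0, 1, 1, 0],  # c4
    [1, 0, 0, 1, 1, 1],  # c5
]

def neighborhood(R, i):
    """R(x_i) = {y_j | (x_i, y_j) in R}, returned as column indices, as in (7)."""
    return {j for j, bit in enumerate(R[i]) if bit == 1}

def inverse_neighborhood(R, j):
    """R^{-1}(y_j) = {x_i | (x_i, y_j) in R}, returned as row indices, as in (8)."""
    return {i for i, row in enumerate(R) if row[j] == 1}

print(neighborhood(R, 0))          # products bought by c1 -> {0, 1, 3, 4}
print(inverse_neighborhood(R, 4))  # customers buying p5   -> {0, 2, 3, 4}
```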

With Definitions 1 and 3, we propose the following definition.


Table 1: A many-to-many entity-relationship system.

(a) Customer

CID  Name      Age    Gender  Married  Country  Income     NumCars
c1   Ron       20–29  Male    No       USA      60 k–69 k  0-1
c2   Michelle  20–29  Female  Yes      USA      80 k–89 k  0-1
c3   Shun      20–29  Male    No       China    40 k–49 k  0-1
c4   Yamago    30–39  Female  Yes      Japan    80 k–89 k  2
c5   Wang      30–39  Male    Yes      China    90 k–99 k  2

(b) Product

PID  Name    Country    Category  Color  Price
p1   Bread   Australia  Staple    Black  1–9
p2   Diaper  China      Daily     White  1–9
p3   Pork    China      Meat      Red    1–9
p4   Beef    Australia  Meat      Red    10–19
p5   Beer    France     Alcohol   Black  10–19
p6   Wine    France     Alcohol   White  10–19

(c) Buys

CID\PID  p1  p2  p3  p4  p5  p6
c1       1   1   0   1   1   0
c2       1   0   0   1   0   1
c3       0   1   0   0   1   1
c4       0   1   0   1   1   0
c5       1   0   0   1   1   1

Definition 4. A many-to-many entity-relationship system (MMER) is a 5-tuple $\mathrm{ES} = (U, A, V, B, R)$, where $(U, A)$ and $(V, B)$ are two information systems and $R \subseteq U \times V$ is a binary relation from $U$ to $V$.

An example of an MMER is given in Tables 1(a), 1(b), and 1(c).

2.2. Granular Association Rules with Four Measures. Now we come to the central definition of granular association rules.

Definition 5. A granular association rule is an implication of the form

$$(\mathrm{GR})\colon \bigwedge_{a \in A'} \langle a, a(x)\rangle \Rightarrow \bigwedge_{b \in B'} \langle b, b(y)\rangle, \qquad (9)$$

where $A' \subseteq A$ and $B' \subseteq B$.

According to (6), the set of objects meeting the left-hand side of the granular association rule is

$$\mathrm{LH}(\mathrm{GR}) = E_{A'}(x), \qquad (10)$$

while the set of objects meeting the right-hand side of the granular association rule is

$$\mathrm{RH}(\mathrm{GR}) = E_{B'}(y). \qquad (11)$$

From the MMER given in Tables 1(a), 1(b), and 1(c), we may obtain the following rule.

Rule 1. One has

$$\langle \mathrm{Gender}, \mathrm{Male}\rangle \Rightarrow \langle \mathrm{Category}, \mathrm{Alcohol}\rangle. \qquad (12)$$

Rule 1 can be read as "men like alcohol." There are some issues concerning the strength of the rule. For example, we may ask the following questions about Rule 1:

(1) How many customers are men?
(2) How many products are alcohol?
(3) Do all men like alcohol?
(4) Do all kinds of alcohol favor men?

An example of a complete granular association rule with measures specified is "40% of men like at least 30% of the kinds of alcohol; 45% of customers are men and 6% of products are alcohol." Here 45%, 6%, 40%, and 30% are the source coverage, the target coverage, the source confidence, and the target confidence, respectively. These measures are defined as follows.

The source coverage of a granular association rule is

$$\mathrm{scov}(\mathrm{GR}) = \frac{|\mathrm{LH}(\mathrm{GR})|}{|U|}. \qquad (13)$$

The target coverage of GR is

$$\mathrm{tcov}(\mathrm{GR}) = \frac{|\mathrm{RH}(\mathrm{GR})|}{|V|}. \qquad (14)$$


There is a tradeoff between the source confidence and the target confidence of a rule. Consequently, neither value can be obtained directly from the rule. To compute either one of them, we should specify the threshold of the other. Let tc be the target confidence threshold. The source confidence of the rule is

$$\mathrm{sconf}(\mathrm{GR}, \mathrm{tc}) = \frac{\big|\{x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| / |\mathrm{RH}(\mathrm{GR})| \ge \mathrm{tc}\}\big|}{|\mathrm{LH}(\mathrm{GR})|}. \qquad (15)$$

Let sc be the source confidence threshold, and let $K$ satisfy

$$\big|\{x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| \ge K + 1\}\big| < \mathrm{sc} \times |\mathrm{LH}(\mathrm{GR})| \le \big|\{x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| \ge K\}\big|. \qquad (16)$$

This inequality means that at least $\mathrm{sc} \times 100\%$ of the elements in LH(GR) have connections with at least $K$ elements in RH(GR), but fewer than $\mathrm{sc} \times 100\%$ of the elements in LH(GR) have connections with at least $K + 1$ elements in RH(GR). The target confidence of the rule is

$$\mathrm{tconf}(\mathrm{GR}, \mathrm{sc}) = \frac{K}{|\mathrm{RH}(\mathrm{GR})|}. \qquad (17)$$

In fact, the computation of $K$ is nontrivial. First, for each $x \in \mathrm{LH}(\mathrm{GR})$, we compute $\mathrm{tc}(x) = |R(x) \cap \mathrm{RH}(\mathrm{GR})|$ and obtain an array of integers. Second, we sort the array in descending order. Third, let $k = \lfloor \mathrm{sc} \times |\mathrm{LH}(\mathrm{GR})| \rfloor$; then $K$ is the $k$th element in the array.
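The following Python sketch mirrors the three-step procedure just described for computing $K$, together with the source confidence of (15). The representation of the rule as index sets LH and RH over neighborhood sets, and the function names, are illustrative assumptions.

```python
import math

def source_confidence(R, LH, RH, tc):
    """sconf(GR, tc) as in (15): fraction of x in LH connected to at least tc*|RH| elements of RH."""
    hits = sum(1 for x in LH if len(R[x] & RH) / len(RH) >= tc)
    return hits / len(LH)

def target_confidence(R, LH, RH, sc):
    """tconf(GR, sc) = K / |RH| with K chosen as in (16) and (17)."""
    counts = sorted((len(R[x] & RH) for x in LH), reverse=True)  # steps 1 and 2
    k = math.floor(sc * len(LH))                                 # step 3
    K = counts[k - 1] if k >= 1 else counts[0]                   # the kth element (1-based)
    return K / len(RH)

# R as a list of neighborhoods: R[x] is the set of products liked by customer x (Table 1(c)).
R = [{0, 1, 3, 4}, {0, 3, 5}, {1, 4, 5}, {1, 3, 4}, {0, 3, 4, 5}]
LH = {0, 2, 4}   # e.g. e(<Gender, Male>) = {c1, c3, c5}
RH = {4, 5}      # e.g. e(<Category, Alcohol>) = {p5, p6}
print(source_confidence(R, LH, RH, tc=0.5))  # 1.0
print(target_confidence(R, LH, RH, sc=0.6))  # 1.0 (K = 2 out of |RH| = 2)
```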

The relationships between rules are interesting to us. As an example, let us consider the following rule.

Rule 2. One has

$$\langle \mathrm{Gender}, \mathrm{Male}\rangle \wedge \langle \mathrm{Country}, \mathrm{China}\rangle \Rightarrow \langle \mathrm{Category}, \mathrm{Alcohol}\rangle \wedge \langle \mathrm{Country}, \mathrm{France}\rangle. \qquad (18)$$

Rule 2 can be read as "Chinese men like French alcohol." One may say that we can infer Rule 2 from Rule 1, since the former involves finer granules. However, with the four measures, we know that the relationship between these two rules is not so simple. A detailed explanation of Rule 2 might be "60% of Chinese men like at least 50% of the kinds of French alcohol; 15% of customers are Chinese men and 2% of products are French alcohol." Compared with Rule 1, Rule 2 is stronger in terms of source/target confidence; however, it is weaker in terms of source/target coverage. Therefore, if we need rules covering more people and products, we prefer Rule 1; if we need more confidence in the rules, we prefer Rule 2. For example, if the source confidence threshold is 55%, Rule 2 might be valid while Rule 1 is not; if the source coverage threshold is 20%, Rule 1 might be valid while Rule 2 is not.

2.3. The Granular Association Rule Mining Problem. A straightforward rule mining problem is stated as follows.

Input: An $\mathrm{ES} = (U, A, V, B, R)$, a minimal source coverage threshold ms, a minimal target coverage threshold mt, a minimal source confidence threshold sc, and a minimal target confidence threshold tc.

Output: All granular association rules satisfying scov(GR) ≥ ms, tcov(GR) ≥ mt, sconf(GR) ≥ sc, and tconf(GR) ≥ tc.

Since both sc and tc are specified, we can choose either (15) or (17) to decide whether or not a rule satisfies these thresholds; (15) is the better choice.
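Putting the four measures together, a candidate rule can be screened against the thresholds as in the sketch below, which follows the choice of (15) made above; the data layout and the function name rule_is_valid are assumptions for illustration.

```python
def rule_is_valid(R, LH, RH, n_U, n_V, ms, mt, sc, tc):
    """Screen a candidate rule: scov >= ms, tcov >= mt, and sconf(GR, tc) >= sc, following (15)."""
    if len(LH) / n_U < ms or len(RH) / n_V < mt:
        return False
    hits = sum(1 for x in LH if len(R[x] & RH) / len(RH) >= tc)
    return hits / len(LH) >= sc

# Reusing the toy Buys neighborhoods: R[x] is the set of products customer x buys.
R = [{0, 1, 3, 4}, {0, 3, 5}, {1, 4, 5}, {1, 3, 4}, {0, 3, 4, 5}]
print(rule_is_valid(R, LH={0, 2, 4}, RH={4, 5}, n_U=5, n_V=6,
                    ms=0.1, mt=0.1, sc=0.5, tc=0.5))
# True for <Gender, Male> => <Category, Alcohol> under these illustrative thresholds
```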

3. Parametric Rough Sets on Two Universes

In this section, we first review rough approximations [32] on one universe. Then we present rough approximations on two universes. Finally, we present parametric rough approximations on two universes. Some concepts are dedicated to granular association rules; we will explain in detail, from a semantic point of view, the way they are defined.

3.1. Classical Rough Sets. The classical rough sets [32, 34] are built upon lower and upper approximations on one universe. We adopt the ideas and notions introduced in [35] and define these concepts as follows.

Definition 6. Let $U$ be a universe and $R \subseteq U \times U$ an indiscernibility relation. The lower and upper approximations of $X \subseteq U$ with respect to $R$ are

$$\underline{R}(X) = \{x \in U \mid R(x) \subseteq X\}, \qquad (19)$$

$$\overline{R}(X) = \{x \in U \mid R(x) \cap X \ne \emptyset\}, \qquad (20)$$

respectively.

These concepts can be employed for set approximation or classification analysis. For set approximation, the interval $[\underline{R}(X), \overline{R}(X)]$ is called the rough set of $X$, which provides an approximate characterization of $X$ by the objects that share the same description as its members [35]. For classification analysis, the lower approximation operator helps find certain rules, while the upper approximation helps find possible rules.
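A small sketch of the classical approximations (19) and (20), assuming the indiscernibility relation is stored as a map from each object to its equivalence class; the names are illustrative.

```python
def lower_approximation(R, X, U):
    """Lower approximation {x in U | R(x) ⊆ X}, as in (19)."""
    return {x for x in U if R[x] <= X}

def upper_approximation(R, X, U):
    """Upper approximation {x in U | R(x) ∩ X ≠ ∅}, as in (20)."""
    return {x for x in U if R[x] & X}

# A toy universe partitioned into blocks {1, 2}, {3}, {4, 5}.
U = {1, 2, 3, 4, 5}
blocks = [{1, 2}, {3}, {4, 5}]
R = {x: b for b in blocks for x in b}  # map each object to its block

X = {1, 2, 3, 4}
print(lower_approximation(R, X, U))  # {1, 2, 3}
print(upper_approximation(R, X, U))  # {1, 2, 3, 4, 5}
```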

3.2. Existing Parametric Rough Sets. Ziarko [28] pointed out that the classical rough sets cannot handle classification with a controlled degree of uncertainty or a misclassification error. Consequently, he proposed variable precision rough sets [28] with a parameter to indicate the admissible classification error.

Let $X$ and $Y$ be nonempty subsets of a finite universe $U$. The measure $c(X, Y)$ of the relative degree of misclassification of the set $X$ with respect to the set $Y$ is defined as

$$c(X, Y) = \begin{cases} 1 - \dfrac{|X \cap Y|}{|X|}, & |X| > 0, \\ 0, & |X| = 0, \end{cases} \qquad (21)$$

where $|\cdot|$ denotes set cardinality.


The equivalence relation $R$ corresponds to a partitioning of $U$ into a collection of equivalence classes, or elementary sets, $R^* = \{E_1, E_2, \ldots, E_n\}$. The $\alpha$-lower and $\alpha$-upper approximations of the set $X \subseteq U$ are defined as

$$\underline{R}_\alpha(X) = \bigcup \{E \in R^* \mid c(E, X) \le \alpha\},$$
$$\overline{R}_\alpha(X) = \bigcup \{E \in R^* \mid c(E, X) < 1 - \alpha\}, \qquad (22)$$

where $0 \le \alpha < 0.5$.

For the convenience of discussion, we rewrite his definition as follows.

Definition 7. Let $U$ be a universe and $R \subseteq U \times U$ an indiscernibility relation. The lower and upper approximations of $X \subseteq U$ with respect to $R$ under precision $\beta$ are

$$\underline{R}_\beta(X) = \left\{x \in U \,\Big|\, \frac{|R(x) \cap X|}{|R(x)|} \ge \beta\right\},$$
$$\overline{R}_\beta(X) = \left\{x \in U \,\Big|\, \frac{|R(x) \cap X|}{|R(x)|} > 1 - \beta\right\}, \qquad (23)$$

respectively, where $R(x)$ is the equivalence class containing $x$.

Note that $0.5 < \beta \le 1$ indicates the classification accuracy (precision) threshold rather than the misclassification error threshold employed in [28]. Wong et al. [31] extended the definition to an arbitrary binary relation which is at least serial.
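A minimal sketch of the precision-based approximations in (23), under the same object-to-block representation as the previous snippet; the thresholding follows (23), and the function names are assumptions.

```python
def beta_lower(R, X, U, beta):
    """β-lower approximation: objects whose class overlaps X with ratio >= beta, as in (23)."""
    return {x for x in U if len(R[x] & X) / len(R[x]) >= beta}

def beta_upper(R, X, U, beta):
    """β-upper approximation: objects whose class overlaps X with ratio > 1 - beta, as in (23)."""
    return {x for x in U if len(R[x] & X) / len(R[x]) > 1 - beta}

U = {1, 2, 3, 4, 5}
R = {1: {1, 2}, 2: {1, 2}, 3: {3}, 4: {4, 5}, 5: {4, 5}}
X = {1, 3, 4}
print(beta_lower(R, X, U, beta=0.6))  # {3}: only the block fully inside X passes the 60% test
print(beta_upper(R, X, U, beta=0.6))  # {1, 2, 3, 4, 5}
```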

Ziarko [28] introduced variable precision rough sets with a parameter; however, the model offered little explanation of the parameter. Yao and Wong [36] studied the condition on the parameter and introduced the decision-theoretic rough set (DTRS) model. The DTRS model is a probabilistic rough set model in the framework of Bayesian decision theory. It requires a pair of thresholds $(\alpha, \beta)$ instead of only one, as in variable precision rough sets. Its main advantage is its solid foundation in Bayesian decision theory: $(\alpha, \beta)$ can be systematically computed by minimizing the overall ternary classification cost [36]. Therefore, this theory has drawn much research interest in both theory (see, e.g., [37, 38]) and application (see, e.g., [39–41]).

Gong and Sun [42] first defined the concept of the probabilistic rough set over two universes. Ma and Sun [43] presented in detail the parameter dependence, or continuity, of the lower and upper approximations with respect to the two parameters $\alpha$ and $\beta$ for every type of probabilistic rough set model over two universes. Moreover, a new model of the probabilistic fuzzy rough set over two universes [44] has been proposed. These theories have broad application prospects.

3.3. Rough Sets on Two Universes for Granular Association Rules. Since our data model is concerned with two universes, we should consider computation models for this type of data. Rough sets on two universes have been defined in [31], and some later works adopt the same definitions (see, e.g., [29, 30]). We will present our definitions, which cope with granular association rules, and then discuss why they differ from existing ones.

Definition 8. Let $U$ and $V$ be two universes and $R \subseteq U \times V$ a binary relation. The lower and upper approximations of $X \subseteq U$ with respect to $R$ are

$$\underline{R}(X) = \{y \in V \mid R^{-1}(y) \supseteq X\}, \qquad (24)$$

$$\overline{R}(X) = \{y \in V \mid R^{-1}(y) \cap X \ne \emptyset\}, \qquad (25)$$

respectively.

From this definition we know immediately that, for $Y \subseteq V$,

$$\underline{R}^{-1}(Y) = \{x \in U \mid R(x) \supseteq Y\},$$
$$\overline{R}^{-1}(Y) = \{x \in U \mid R(x) \cap Y \ne \emptyset\}. \qquad (26)$$

Now we explain these notions through our example. $\underline{R}(X)$ contains the products that favor all people in $X$, and $\underline{R}^{-1}(Y)$ contains the people who like all products in $Y$; $\overline{R}(X)$ contains the products that favor at least one person in $X$, and $\overline{R}^{-1}(Y)$ contains the people who like at least one product in $Y$.
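The sketch below implements the two-universe operators of (24) and (25) over the Buys matrix used earlier; the function names are assumptions made for illustration.

```python
def lower_two_universes(R, X):
    """{y in V | R^{-1}(y) ⊇ X}: products liked by everyone in X, as in (24)."""
    n_cols = len(R[0])
    return {j for j in range(n_cols) if all(R[i][j] == 1 for i in X)}

def upper_two_universes(R, X):
    """{y in V | R^{-1}(y) ∩ X ≠ ∅}: products liked by someone in X, as in (25)."""
    n_cols = len(R[0])
    return {j for j in range(n_cols) if any(R[i][j] == 1 for i in X)}

# Buys relation of Table 1(c) again (rows c1..c5, columns p1..p6).
R = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1],
]
X = {0, 2, 4}  # the men {c1, c3, c5}
print(lower_two_universes(R, X))  # {4}: only p5 is bought by all three
print(upper_two_universes(R, X))  # {0, 1, 3, 4, 5}
```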

We have the following property concerning the monotonicity of these approximations.

Property 1. Let $X_1 \subset X_2$. Then

$$\underline{R}(X_1) \supseteq \underline{R}(X_2), \qquad (27)$$

$$\overline{R}(X_1) \subseteq \overline{R}(X_2). \qquad (28)$$

That is, as the object subset grows, the lower approximation decreases while the upper approximation increases. It is somewhat counterintuitive to the rough set community that the lower approximation decreases in this case. In fact, according to Wong et al. [31], Liu [30], and Sun et al. [44], (24) should be rewritten as

$$\underline{R}'(X) = \{y \in V \mid R^{-1}(y) \subseteq X\}, \qquad (29)$$

where the prime is employed to distinguish between the two definitions. In this way, $\underline{R}'(X_1) \subseteq \underline{R}'(X_2)$. Moreover, it would coincide with (19) when $U = V$.

We argue that different definitions of the lower approximation are appropriate for different applications. Suppose that there is a clinic system where $U$ is the set of all symptoms and $V$ is the set of all diseases [31], and $X \subseteq U$ is a set of symptoms. According to (29), $\underline{R}'(X)$ contains the diseases that are induced only by symptoms in $X$. That is, if a patient has no symptom in $X$, she never has any disease in $\underline{R}'(X)$. This type of rule is natural and useful.

In our example presented in Section 2, $X \subseteq U$ is a group of people, and $\underline{R}'(X)$ contains the products that favor people only in $X$. That is, if a person does not belong to $X$, she never likes products in $\underline{R}'(X)$. Unfortunately, this type of rule is not interesting to us. This is why we employ (24) for the lower approximation.

We are looking for very strong rules through the lower approximation indicated in Definition 8, for example, "all men like all kinds of alcohol." This kind of rule is called a complete match rule [6]. However, such rules seldom exist in applications.


On the other hand, we are looking for very weak rules through the upper approximation, for example, "at least one man likes at least one kind of alcohol." Another extreme example is "all people like at least one kind of product," which holds for any data set. Therefore, this type of rule is useless. These issues will be addressed through a more general model in the next section.

3.4. Parametric Rough Sets on Two Universes for Granular Association Rules. Given a group of people, the number of products that favor all of them is often quite small. On the other hand, the set of products that favor at least one of them is not very meaningful. Similar to probabilistic rough sets, we need to introduce one or more parameters into the model.

To cope with the source confidence measure introduced in Section 2.2, we propose the following definition.

Definition 9. Let $U$ and $V$ be two universes, $R \subseteq U \times V$ a binary relation, and $0 < \beta \le 1$ a user-specified threshold. The lower approximation of $X \subseteq U$ with respect to $R$ for threshold $\beta$ is

$$R_\beta(X) = \left\{y \in V \,\Big|\, \frac{|R^{-1}(y) \cap X|}{|X|} \ge \beta\right\}. \qquad (30)$$

We do not discuss the upper approximation in the new context due to its lack of semantics.

From this definition, we know immediately that the lower approximation of $Y \subseteq V$ with respect to $R$ is

$$R^{-1}_\beta(Y) = \left\{x \in U \,\Big|\, \frac{|R(x) \cap Y|}{|Y|} \ge \beta\right\}. \qquad (31)$$

Here $\beta$ corresponds to the target confidence instead. In our example, $R_\beta(X)$ contains the products that favor at least $\beta \times 100\%$ of the people in $X$, and $R^{-1}_\beta(Y)$ contains the people who like at least $\beta \times 100\%$ of the products in $Y$.

The following property indicates that $R_\beta(X)$ is a generalization of both $\underline{R}(X)$ and $\overline{R}(X)$.
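Before the property is stated formally, a small sketch of (30) over the Buys matrix may help: with $\beta = 1$ the operator returns the products bought by everyone in $X$, and with a tiny $\beta$ the products bought by at least one member of $X$. The function name is an assumption for illustration.

```python
def parametric_lower(R, X, beta):
    """R_beta(X) = {y in V | |R^{-1}(y) ∩ X| / |X| >= beta}, as in (30)."""
    n_cols = len(R[0])
    return {j for j in range(n_cols)
            if sum(R[i][j] for i in X) / len(X) >= beta}

R = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1],
]
X = {0, 2, 4}                          # the men {c1, c3, c5}
print(parametric_lower(R, X, 1.0))     # {4}: bought by all men (the Definition 8 lower)
print(parametric_lower(R, X, 0.5))     # {0, 1, 3, 4, 5}: bought by at least half of the men
print(parametric_lower(R, X, 1e-9))    # {0, 1, 3, 4, 5}: bought by at least one man (the upper)
```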

Property 2. Let $U$ and $V$ be two universes and $R \subseteq U \times V$ a binary relation. Then

$$R_1(X) = \underline{R}(X), \qquad (32)$$

$$R_\varepsilon(X) = \overline{R}(X), \qquad (33)$$

where $\varepsilon \approx 0$ is a small positive number.

Proof. One has

$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge 1 \Longleftrightarrow |R^{-1}(y) \cap X| = |X| \Longleftrightarrow R^{-1}(y) \supseteq X,$$
$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge \varepsilon \Longleftrightarrow |R^{-1}(y) \cap X| > 0 \Longleftrightarrow R^{-1}(y) \cap X \ne \emptyset. \qquad (34)$$

The following property shows the monotonicity of $R_\beta(X)$ with respect to $\beta$.

Property 3. Let $0 < \beta_1 < \beta_2 \le 1$. Then

$$R_{\beta_2}(X) \subseteq R_{\beta_1}(X). \qquad (35)$$

However, given $X_1 \subset X_2$, we obtain neither $R_\beta(X_1) \subseteq R_\beta(X_2)$ nor $R_\beta(X_1) \supseteq R_\beta(X_2)$. The relationship between $R_\beta(X_1)$ and $R_\beta(X_2)$ depends on $\beta$. Generally, if $\beta$ is big, $R_\beta(X_1)$ tends to be bigger; otherwise, $R_\beta(X_1)$ tends to be smaller. Equation (27) indicates the extreme case for $\beta = 1$, and (28) indicates the other extreme case for $\beta \approx 0$.

$\beta$ is the coverage of $R(x)$ (or $R^{-1}(y)$) with respect to $Y$ (or $X$); it does not mean the precision of an approximation. This is why we call this model parametric rough sets instead of variable precision rough sets [28] or probabilistic rough sets [45].

Similar to the discussion in Section 3.3, in some cases we would like to employ the following definition:

$$R'_\beta(X) = \left\{y \in V \,\Big|\, \frac{|R^{-1}(y) \cap X|}{|R^{-1}(y)|} \ge \beta\right\}. \qquad (36)$$

It coincides with $\underline{R}_\beta(X)$ defined in Definition 7 if $U = V$. Take the clinic system again as the example: $R'_\beta(X)$ is the set of diseases that are caused mainly (with a probability no less than $\beta \times 100\%$) by symptoms in $X$.

4. A Backward Algorithm for Granular Association Rule Mining

In our previous work [5], we proposed an algorithm based on (15). The algorithm starts from both sides and checks the validity of all candidate rules; therefore, it was named a sandwich algorithm.

To make use of the concept proposed in Section 3, we rewrite (15) as follows:

$$\mathrm{sconf}(\mathrm{GR}, \mathrm{tc}) = \frac{\big|\{x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| / |\mathrm{RH}(\mathrm{GR})| \ge \mathrm{tc}\}\big|}{|\mathrm{LH}(\mathrm{GR})|}$$
$$= \frac{\big|\{x \in U \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| / |\mathrm{RH}(\mathrm{GR})| \ge \mathrm{tc}\} \cap \mathrm{LH}(\mathrm{GR})\big|}{|\mathrm{LH}(\mathrm{GR})|}$$
$$= \frac{\big|R^{-1}_{\mathrm{tc}}(\mathrm{RH}(\mathrm{GR})) \cap \mathrm{LH}(\mathrm{GR})\big|}{|\mathrm{LH}(\mathrm{GR})|}. \qquad (37)$$

With this equation, we propose an algorithm to deal with the rule mining problem defined in Section 2.3. The algorithm is listed in Algorithm 1; it essentially has four steps.

Step 1. Search in $(U, A)$ for all granules meeting the minimal source coverage threshold ms. This step corresponds to Line 1 of the algorithm, where SG stands for source granule.

Step 2. Search in $(V, B)$ for all granules meeting the minimal target coverage threshold mt. This step corresponds to Line 2 of the algorithm, where TG stands for target granule.


Input: $\mathrm{ES} = (U, A, V, B, R)$, ms, mt, tc.
Output: All complete match granular association rules satisfying the given constraints.
Method: complete-match-rules-backward.

(1) SG(ms) = $\{(A', x) \in 2^A \times U \mid |E_{A'}(x)| / |U| \ge \mathrm{ms}\}$; // candidate source granules
(2) TG(mt) = $\{(B', y) \in 2^B \times V \mid |E_{B'}(y)| / |V| \ge \mathrm{mt}\}$; // candidate target granules
(3) for each $g' \in \mathrm{TG}(\mathrm{mt})$ do
(4)   $Y = e(g')$;
(5)   $X = R^{-1}_{\mathrm{tc}}(Y)$;
(6)   for each $g \in \mathrm{SG}(\mathrm{ms})$ do
(7)     if $e(g) \subseteq X$ then
(8)       output rule $i(g) \Rightarrow i(g')$;
(9)     end if
(10)  end for
(11) end for

Algorithm 1: A backward algorithm.

Step 3. For each granule obtained in Step 2, construct a block in $U$ according to $R$. This step corresponds to Lines 4 and 5 of the algorithm. The function $e$ has been defined in (5). According to Definition 9, in our example $R^{-1}_{\mathrm{tc}}(Y)$ is the set of people who like at least $\mathrm{tc} \times 100\%$ of the products in $Y$.

Step 4. Check the possible rules regarding $g'$ and $X$ and output all valid rules. This step corresponds to Lines 6 through 10 of the algorithm. In Line 7, since $e(g)$ and $X$ can be stored in sorted arrays, the complexity of checking $e(g) \subseteq X$ is

$$O(|e(g)| + |X|) = O(|U|), \qquad (38)$$

where $|\cdot|$ denotes the cardinality of a set.
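Assuming the granules of Steps 1 and 2 are already available as (description, extension) pairs and the relation is a Boolean matrix, the following Python sketch walks through Steps 3 and 4 of Algorithm 1, keeping the complete match test of Line 7. All names are illustrative, and this is a sketch rather than the authors' implementation.

```python
def inverse_lower_tc(R, Y, tc):
    """R^{-1}_tc(Y): rows (customers) related to at least tc*100% of the columns in Y (Definition 9)."""
    return {i for i in range(len(R)) if sum(R[i][j] for j in Y) / len(Y) >= tc}

def backward_mining(R, source_granules, target_granules, tc):
    """Backward algorithm: start from target granules, then check source granules (Steps 3 and 4)."""
    rules = []
    for desc_t, Y in target_granules:                # Line 3
        X = inverse_lower_tc(R, Y, tc)               # Lines 4-5
        for desc_s, extension in source_granules:    # Line 6
            if extension <= X:                       # Line 7: complete match test e(g) ⊆ X
                rules.append((desc_s, desc_t))       # Line 8
    return rules

# Granules assumed to be precomputed by Steps 1 and 2 (descriptions are plain strings here).
source_granules = [("<Gender, Male>", {0, 2, 4}), ("<Gender, Female>", {1, 3})]
target_granules = [("<Category, Alcohol>", {4, 5}), ("<Category, Meat>", {2, 3})]
R = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1],
]
print(backward_mining(R, source_granules, target_granules, tc=0.5))
```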

Because the algorithm starts from the right-hand side of the rule and proceeds to the left-hand side, it is called a backward algorithm. It is necessary to compare the time complexities of the existing sandwich algorithm and our new backward algorithm. Both algorithms share Steps 1 and 2, which do not incur the pattern explosion problem; therefore, we focus on the remaining steps. The time complexity of the sandwich algorithm is [5]

$$O(|\mathrm{SG}(\mathrm{ms})| \times |\mathrm{TG}(\mathrm{mt})| \times |U| \times |V|), \qquad (39)$$

where $|\cdot|$ denotes the cardinality of a set. According to the loops, the time complexity of Algorithm 1 is

$$O(|\mathrm{SG}(\mathrm{ms})| \times |\mathrm{TG}(\mathrm{mt})| \times |U|), \qquad (40)$$

which is lower than that of the sandwich algorithm.

Intuitively, the backward algorithm avoids computing $R(x) \cap \mathrm{RH}(\mathrm{GR})$ repeatedly for different rules with the same right-hand side. Hence, it should be less time consuming than the sandwich algorithm. We will compare the run time of these algorithms in the next section through experimentation.

The space complexities of the two algorithms are also important. To store the relation $R$, a $|U| \times |V|$ Boolean matrix is needed.

5. Experiments on Two Real-World Data Sets

The main purpose of our experiments is to answer the following questions.

(1) Does the backward algorithm outperform the sandwich algorithm?
(2) How does the number of rules change for different numbers of objects?
(3) How does the algorithm run time change for different numbers of objects?
(4) How does the number of rules vary for different thresholds?
(5) How does the performance of cold-start recommendation vary on the training and testing sets?

5.1. Data Sets. We collected two real-world data sets for experimentation. One is course selection data and the other is movie rating data. These data sets are quite representative of real applications.

5.1.1. A Course Selection Data Set. The course selection system often serves as an example in textbooks to explain the concept of many-to-many entity-relationship diagrams. Hence, it is appropriate for producing meaningful granular association rules and testing the performance of our algorithm. We obtained a data set from the course selection system of Minnan Normal University. (The authors would like to thank Mrs. Chunmei Zhou for her help in the data collection.) Specifically, we collected data during the semester between 2011 and 2012. There are 145 general education courses in the university, and 9654 students took part in course selection. The database schema is as follows:

(i) Student (student ID, name, gender, birth-year, politics-status, grade, department, nationality, and length of schooling);
(ii) Course (course ID, credit, class-hours, availability, and department);
(iii) Selects (student ID, course ID).

Our algorithm supports only nominal data at this time. For this data set, all data are viewed as nominal and used directly; no discretization approach is employed to convert numeric attributes into nominal ones. We also removed student names and course names from the original data, since they are useless for generating meaningful rules.

5.1.2. A Movie Rating Data Set. The MovieLens data set, assembled by the GroupLens project, is widely used in recommender systems (see, e.g., [15, 16]). (GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org.) We downloaded the data set from http://www.movielens.org. Originally, the data set contained 100,000 ratings (1–5) from 943 users on 1682 movies, with each user rating at least 20 movies [15]. Currently, the available data set contains 1,000,209 anonymous ratings of 3952 movies made by 6040 MovieLens users who joined MovieLens in 2000. In order to run our algorithm, we preprocessed the data set as follows.

(1) Remove movie names. They are not useful for generating meaningful granular association rules.

(2) Use the release year instead of the release date. In this way, the granule is more reasonable.

(3) Select the movie genre. In the original data, the movie genre is multivalued, since one movie may fall into more than one genre; for example, a movie can be both animation and children's. Unfortunately, granular association rules do not support this type of data at this time. Since the main objective of this work is to compare the performance of algorithms, we use a simple approach to deal with this issue: we sort movie genres according to the number of users they attract and keep only the highest-priority genre for each movie (a small sketch follows this list). We adopt the following priority (from high to low): comedy, action, thriller, romance, adventure, children, crime, sci-fi, horror, war, mystery, musical, documentary, animation, western, film-noir, fantasy, and unknown.
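As referenced in item (3), a small sketch of the genre-selection step follows; the priority order is taken from the text, while the '|'-separated genre field and the function name are assumptions about the input format.

```python
GENRE_PRIORITY = [
    "Comedy", "Action", "Thriller", "Romance", "Adventure", "Children",
    "Crime", "Sci-Fi", "Horror", "War", "Mystery", "Musical",
    "Documentary", "Animation", "Western", "Film-Noir", "Fantasy", "Unknown",
]
RANK = {g: r for r, g in enumerate(GENRE_PRIORITY)}

def keep_highest_priority_genre(genre_field):
    """Collapse a multivalued genre field (e.g. 'Animation|Children') to its highest-priority genre."""
    genres = genre_field.split("|")
    return min(genres, key=lambda g: RANK.get(g, len(GENRE_PRIORITY)))

print(keep_highest_priority_genre("Animation|Children"))  # 'Children' outranks 'Animation'
```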

Our database schema is as follows:

(i) User (user ID, age, gender, and occupation);
(ii) Movie (movie ID, release year, and genre);
(iii) Rates (user ID, movie ID).

There are 8 user age intervals, 21 occupations, and 71 release years. Similar to the course selection data set, all these data are viewed as nominal and processed directly. We employ neither discretization nor symbolic value partition [46, 47] approaches to produce coarser granules. The genre is a multivalued attribute; therefore, we scale it to 18 Boolean attributes and deal with it using the approach proposed in [48].

5.2. Results. We undertake five sets of experiments to answer the questions proposed at the beginning of this section.

5.2.1. Efficiency Comparison. We compare the efficiencies of the backward and the sandwich algorithms. We consider only the run time of Lines 3 through 11, since these lines are where the two algorithms differ.

For the course selection data set, when ms = mt = 0.06, sc = 0.18, and tc = 0.11, we obtain only 40 rules; for higher thresholds, no rule can be obtained. Therefore, we use the following settings: sc = 0.18, tc = 0.11, ms = mt, and ms ∈ {0.02, 0.03, 0.04, 0.05, 0.06}. Figure 1(a) shows the actual run time in milliseconds, and Figure 2(a) shows the number of basic operations, including additions and comparisons of numbers. Here we observe that, for different settings, the backward algorithm is more than 2 times faster than the sandwich algorithm; it takes less than 1/3 of the basic operations of the sandwich algorithm.

For the MovieLens data set, we employ the data set with 3800 users and 3952 movies. We use the following settings: sc = 0.15, tc = 0.14, ms = mt, and ms ∈ {0.01, 0.02, 0.03, 0.04, 0.05}. Figure 1(b) shows the actual run time in milliseconds, and Figure 2(b) shows the number of basic operations, including additions and comparisons of numbers. Here we observe that, for different settings, the backward algorithm is more than 3 times faster than the sandwich algorithm; it takes less than 1/4 of the basic operations of the sandwich algorithm.

5.2.2. Change of the Number of Rules for Different Data Set Sizes. Now we study how the number of rules changes with the increase of the data set size. The experiments are undertaken only on the MovieLens data set. We use the following settings: sc = 0.15, tc = 0.14, ms ∈ {0.01, 0.02}, and |U| ∈ {500, 1000, 1500, 2000, 2500, 3000, 3500}. The number of movies is always 3952. When selecting k users, we always select from the first user to the kth user.

First, we look at the number of concepts satisfying the source coverage threshold ms. According to Figure 3(a), the number of source concepts decreases with the increase of the number of users. However, Figure 3(b) indicates that this trend may not hold in general. In fact, from Figure 3, the most important observation is that the number of source concepts does not vary much with the change of the number of objects. When the number of users is more than 1500, this variation is no more than 3, which is less than 5% of the total number of concepts.

Second, we look at the number of granular association rules satisfying all four thresholds. Figure 4 indicates that the number of rules varies more than the number of source concepts. However, this variation is less than 20% when there are more than 1500 users, and less than 10% when there are more than 2500 users.

5.2.3. Change of Run Time for Different Data Set Sizes. We look at how the run time changes with the increase of the number of users. The time complexity of the algorithm is given by (40). Since the number of movies does not change, |TG(mt)| is fixed.


Figure 1: Run time information. (a) Course selection; (b) MovieLens (3800 users). [Plots of run time (ms) versus ms (mt) for the sandwich and backward algorithms.]

Figure 2: Basic operations information. (a) Course selection; (b) MovieLens (3800 users). [Plots of the number of basic operations versus ms (mt) for the sandwich and backward algorithms.]

Moreover, according to our earlier discussion, |SG(ms)| does not vary much for different numbers of users. Therefore, the time complexity is nearly linear with respect to the number of users. Figure 5 validates this analysis.

5.2.4. Number of Rules for Different Thresholds. Figure 6 shows that the number of rules decreases dramatically with the increase of ms and mt. For the course selection data set, the number of rules would be 0 when ms = mt = 0.07. For the MovieLens data set, the number of rules would be 0 when ms = mt = 0.06.

5.2.5. Performance of Cold-Start Recommendation. Now we study how the performance of cold-start recommendation varies on the training and testing sets. The experiments are undertaken only on the MovieLens data set. Here we employ two data sets: one with 1000 users and 3952 movies, and the other with 3000 users and 3952 movies.


Figure 3: Number of source granules (concepts) on users for MovieLens. (a) ms = 0.01; (b) ms = 0.02. [Plots of the number of source granules versus the number of users.]

Figure 4: Number of granular association rules for MovieLens. (a) ms = 0.01; (b) ms = 0.02. [Plots of the number of rules versus the number of users.]

We divide the users into two parts, the training set and the testing set. The other settings are as follows: the training set percentage is 60%, sc = 0.15, and tc = 0.14. Each experiment is repeated 30 times with different samplings of the training and testing sets, and the average accuracy is computed.

Figure 7 shows that, with the variation of ms and mt, the recommendation performs differently. We observe some especially interesting phenomena from this figure, as follows.

(1) Figures 7(a) and 7(b) are similar in general. As the number of users increases, the accuracy of the recommender also increases. The reason is that we get more information from the training sets to generate better rules for the recommendation.

(2) When ms and mt are suitably set (0.03), the recommender has the maximal accuracy on both data sets. As these thresholds increase or decrease away from this value, the performance of the recommendation decreases rapidly.

(3) The performance of the recommendation does not change much between the training and the testing sets. This phenomenon indicates that the recommender is stable.

5.3. Discussion. Now we can answer the questions proposed at the beginning of this section.

(1) The backward algorithm outperforms the sandwich algorithm. It is more than 2 times and 3 times faster than the sandwich algorithm on the course selection and MovieLens data sets, respectively. Therefore, our parametric rough sets on two universes are useful in applications.


Figure 5: Run time on MovieLens. (a) ms = 0.01; (b) ms = 0.02. [Plots of run time (ms) versus the number of users.]

Figure 6: Number of granular association rules. (a) Course selection; (b) MovieLens. [Plots of the number of rules versus ms (mt).]

Figure 7: Accuracy of cold-start recommendation on MovieLens. (a) 1000 users and 3952 movies; (b) 3000 users and 3952 movies. [Plots of accuracy versus ms (mt) on the training and testing sets.]


(2) The number of rules does not change much for different numbers of objects. Therefore, it is not necessary to collect too much data to obtain meaningful granular association rules; for example, for the MovieLens data set, 3000 users are quite enough.

(3) The run time is nearly linear with respect to the number of objects. Therefore, the algorithm is scalable from the viewpoint of time complexity. However, we observe that the relation table might be rather big; therefore, it could be a bottleneck of the algorithm.

(4) The number of rules decreases dramatically with the increase of the thresholds ms and mt. It is important to specify appropriate thresholds to obtain useful rules.

(5) The performance of cold-start recommendation is stable on the training and the testing sets with the increase of the thresholds ms and mt.

6. Conclusions

In this paper, we have proposed a new type of parametric rough sets on two universes to deal with the granular association rule mining problem. The lower approximation operator has been defined, and its monotonicity has been analyzed. With the help of the new model, a backward algorithm for the granular association rule mining problem has been proposed. Experimental results on two real-world data sets indicate that the new algorithm is significantly faster than the existing sandwich algorithm, and the performance of recommender systems is stable on the training and the testing sets. To sum up, this work applies rough set theory to recommender systems and is one step toward the broader application of rough set theory and granular computing. In the future, we will improve our approach and compare its performance with that of other recommendation approaches.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is in part supported by the National Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128, the Fujian Province Foundation of Higher Education under Grant no. JK2012028, the Key Project of the Education Department of Fujian Province under Grant no. JA13192, and the Postgraduate Education Innovation Base for Computer Application Technology, Signal and Information Processing of Fujian Province (no. [2008]114, High Education of Fujian).

References

[1] L. Dehaspe, H. Toivonen, and R. D. King, "Finding frequent substructures in chemical compounds," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 30–36, 1998.
[2] B. Goethals, W. Le Page, and M. Mampaey, "Mining interesting sets and rules in relational databases," in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 997–1001, March 2010.
[3] S. Dzeroski, "Multi-relational data mining: an introduction," SIGKDD Explorations, vol. 5, pp. 1–16, 2003.
[4] V. C. Jensen and N. Soparkar, "Frequent item set counting across multiple tables," in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining: Current Issues and New Applications, vol. 1805 of Lecture Notes in Computer Science, pp. 49–61, 2000.
[5] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules on two universes with four measures," 2012, http://arxiv.org/abs/1209.5598.
[6] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules with four subtypes," in Proceedings of the IEEE International Conference on Granular Computing, pp. 432–437, 2012.
[7] J. Hu and C. Guan, "An emotional agent model based on granular computing," Mathematical Problems in Engineering, vol. 2012, Article ID 601295, 10 pages, 2012.
[8] T. Y. Lin, "Granular computing on binary relations I: data mining and neighborhood systems," Rough Sets in Knowledge Discovery, vol. 1, pp. 107–121, 1998.
[9] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," in Proceedings of SPIE, vol. 5098, pp. 138–146, April 2003.
[10] Y. Y. Yao, "Granular computing: basic issues and possible solutions," in Proceedings of the 5th Joint Conference on Information Sciences (JCIS '00), vol. 1, pp. 186–189, March 2000.
[11] W. Zhu and F.-Y. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, pp. 217–230, 2003.
[12] W. Zhu, "Generalized rough sets based on relations," Information Sciences, vol. 177, no. 22, pp. 4997–5011, 2007.
[13] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[14] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 167–179, 2014.
[15] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," SIGIR Forum, pp. 253–260, 2002.
[16] H. Guo, "Soap: live recommendations through social agents," in Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 1–17, 1997.
[17] M. Balabanovic and Y. Shoham, "Content-based, collaborative recommendation," Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.
[18] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[19] R. Burke, "Hybrid recommender systems: survey and experiments," User Modelling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.
[20] A. H. Dong, D. Shan, Z. Ruan, L. Y. Zhou, and F. Zuo, "The design and implementation of an intelligent apparel recommend expert system," Mathematical Problems in Engineering, vol. 2013, Article ID 343171, 8 pages, 2013.
[21] M. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321, pp. 325–341, Springer, 2007.
[22] W. Zhu, "Relationship among basic concepts in covering-based rough sets," Information Sciences, vol. 179, no. 14, pp. 2478–2486, 2009.
[23] Y. Y. Yao and X. F. Deng, "A granular computing paradigm for concept learning," in Emerging Paradigms in Machine Learning, vol. 13, pp. 307–326, Springer, Berlin, Germany, 2013.
[24] J. Yao, "Recent developments in granular computing: a bibliometrics study," in Proceedings of the IEEE International Conference on Granular Computing (GRC '08), pp. 74–79, August 2008.
[25] F. Aftrati, G. Das, A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, "Mining chains of relations," in Data Mining: Foundations and Intelligent Paradigms, vol. 24, pp. 217–246, Springer, 2012.
[26] L. Dehaspe and H. Toivonen, "Discovery of frequent datalog patterns," Expert Systems with Applications, vol. 3, no. 1, pp. 7–36, 1999.
[27] B. Goethals, W. Le Page, and H. Mannila, "Mining association rules of simple conjunctive queries," in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 96–107, April 2008.
[28] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.
[29] T. J. Li and W. X. Zhang, "Rough fuzzy approximations on two universes of discourse," Information Sciences, vol. 178, no. 3, pp. 892–906, 2008.
[30] G. Liu, "Rough set theory based on two universal sets and its applications," Knowledge-Based Systems, vol. 23, no. 2, pp. 110–115, 2010.
[31] S. Wong, L. Wang, and Y. Y. Yao, "Interval structure: a framework for representing uncertain information," in Uncertainty in Artificial Intelligence, pp. 336–343, 1993.
[32] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[33] A. Skowron and J. Stepaniuk, "Approximation of relations," in Proceedings of Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, Ed., pp. 161–166, 1994.
[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Boston, Mass, USA, 1991.
[35] S. K. M. Wong, L. S. Wang, and Y. Y. Yao, "On modelling uncertainty with interval structures," Computational Intelligence, vol. 12, no. 2, pp. 406–426, 1995.
[36] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.
[37] H. X. Li, X. Z. Zhou, J. B. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Proceedings of Rough Sets and Knowledge Technology, vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.
[38] D. Liu, T. Li, and D. Ruan, "Probabilistic model criteria with decision-theoretic rough sets," Information Sciences, vol. 181, no. 17, pp. 3709–3722, 2011.
[39] X. Y. Jia, W. H. Liao, Z. M. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.
[40] D. Liu, Y. Yao, and T. Li, "Three-way investment decisions with decision-theoretic rough sets," International Journal of Computational Intelligence Systems, vol. 4, no. 1, pp. 66–74, 2011.
[41] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.
[42] Z. T. Gong and B. Z. Sun, "Probability rough sets model between different universes and its applications," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 561–565, July 2008.
[43] W. Ma and B. Sun, "Probabilistic rough set over two universes and rough entropy," International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 608–619, 2012.
[44] B. Sun, W. Ma, H. Zhao, and X. Wang, "Probabilistic fuzzy rough set model over two universes," in Rough Sets and Current Trends in Computing, pp. 83–93, Springer, 2012.
[45] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logic," Intelligent Automation and Soft Computing, vol. 2, pp. 103–120, 1996.
[46] X. He, F. Min, and W. Zhu, "A comparative study of discretization approaches for granular association rule mining," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 725–729, 2013.
[47] F. Min, Q. Liu, and C. Fang, "Rough sets approach to symbolic value partition," International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 689–700, 2008.
[48] F. Min and W. Zhu, "Granular association rules for multi-valued data," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 799–803, 2013.



sandwich algorithm has been proposed in [5]. It starts from both entities and proceeds to the relation. Unfortunately, the time complexity is rather high and the performance is not satisfactory.

In this paper, we propose a new type of parametric rough sets on two universes to study the granular association rule mining problem. We borrow some ideas from variable precision rough sets [28] and rough sets on two universes [29–31] to build the new model. The model is deliberately adjusted such that the parameter coincides with the target confidence threshold of rules. In this way, the parameter is semantic and can be specified by the user directly. We compare our definitions with alternative ones and point out that they should be employed in different applications. Naturally, our definition is appropriate for cold-start recommendation. We also study some properties, especially the monotonicity of the lower approximation of our new model.

With the lower approximation of the proposed parametric rough sets, we design a backward algorithm for rule mining. This algorithm starts from the second universe and proceeds to the first one; hence, it is called a backward algorithm. Compared with the existing sandwich algorithm [5], the backward algorithm avoids some redundant computation. Consequently, it has a lower time complexity.

Experiments are undertaken on two real-world data sets. One is the course selection data from Minnan Normal University during the semester between 2011 and 2012. The other is the publicly available MovieLens data set. Results show that (1) the backward algorithm is more than 2 times faster than the sandwich algorithm; (2) the run time is linear with respect to the data set size; (3) sampling might be a good choice to decrease the run time; and (4) the performance of recommendation is stable.

The rest of the paper is organized as follows. Section 2 reviews granular association rules through some examples; the rule mining problem is also defined. Section 3 presents a new model of parametric rough sets on two universes; the model is defined to cope with the formalization of granular association rules. Then Section 4 presents a backward algorithm for the problem. Experiments on two real-world data sets are discussed in Section 5. Finally, Section 6 presents some concluding remarks and further research directions.

2. Preliminaries

In this section, we revisit granular association rules [6]. We discuss the data model, the definition, and the four measures of such rules. A rule mining problem will also be presented.

2.1. The Data Model. The data model is based on information systems and binary relations.

Definition 1. $S = (U, A)$ is an information system, where $U = \{x_1, x_2, \ldots, x_n\}$ is the set of all objects, $A = \{a_1, a_2, \ldots, a_m\}$ is the set of all attributes, and $a_j(x_i)$ is the value of $x_i$ on attribute $a_j$ for $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, m\}$.

An example of an information system is given by Table 1(a), where $U = \{c_1, c_2, c_3, c_4, c_5\}$ and $A = \{\text{Age}, \text{Gender}, \text{Married}, \text{Country}, \text{Income}, \text{NumCars}\}$. Another example is given by Table 1(b).

In an information system, any $A' \subseteq A$ induces an equivalence relation [32, 33]
$$E_{A'} = \{(x, y) \in U \times U \mid \forall a \in A',\ a(x) = a(y)\}, \quad (1)$$
and partitions $U$ into a number of disjoint subsets called blocks or granules. The block containing $x \in U$ is
$$E_{A'}(x) = \{y \in U \mid \forall a \in A',\ a(y) = a(x)\}. \quad (2)$$

The following definition was employed by Yao and Deng [23].

Definition 2. A granule is a triple [23]
$$G = (g, i(g), e(g)), \quad (3)$$
where $g$ is the name assigned to the granule, $i(g)$ is a representation of the granule, and $e(g)$ is a set of objects that are instances of the granule.

$g = g(A', x)$ is a natural name of the granule. The representation of $g(A', x)$ is
$$i(g(A', x)) = \bigwedge_{a \in A'} \langle a, a(x) \rangle. \quad (4)$$
The set of objects that are instances of $g(A', x)$ is
$$e(g(A', x)) = E_{A'}(x). \quad (5)$$
The support of the granule is the size of $e(g)$ divided by the size of the universe, namely,
$$\mathrm{supp}(g(A', x)) = \mathrm{supp}\Big(\bigwedge_{a \in A'} \langle a, a(x) \rangle\Big) = \mathrm{supp}(E_{A'}(x)) = \frac{|E_{A'}(x)|}{|U|}. \quad (6)$$
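To make these notions concrete, the following sketch (our own illustration, not the authors' code; the helper names `block` and `support` and the dictionary encoding are assumptions) computes $E_{A'}(x)$ and the support of the corresponding granule for a few attributes of Table 1(a).

```python
# A minimal sketch of (2) and (6) on a subset of Table 1(a); names are ours.
customers = {
    "c1": {"Age": "20-29", "Gender": "Male",   "Married": "No",  "Country": "USA"},
    "c2": {"Age": "20-29", "Gender": "Female", "Married": "Yes", "Country": "USA"},
    "c3": {"Age": "20-29", "Gender": "Male",   "Married": "No",  "Country": "China"},
    "c4": {"Age": "30-39", "Gender": "Female", "Married": "Yes", "Country": "Japan"},
    "c5": {"Age": "30-39", "Gender": "Male",   "Married": "Yes", "Country": "China"},
}

def block(info_system, attrs, x):
    """E_{A'}(x): objects agreeing with x on every attribute in attrs."""
    return {y for y, vals in info_system.items()
            if all(vals[a] == info_system[x][a] for a in attrs)}

def support(info_system, attrs, x):
    """supp(g(A', x)) = |E_{A'}(x)| / |U|."""
    return len(block(info_system, attrs, x)) / len(info_system)

print(block(customers, ["Gender"], "c1"))    # {'c1', 'c3', 'c5'}
print(support(customers, ["Gender"], "c1"))  # 0.6, i.e., 60% of customers are men
```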

Definition 3. Let $U = \{x_1, x_2, \ldots, x_n\}$ and $V = \{y_1, y_2, \ldots, y_k\}$ be two sets of objects. Any $R \subseteq U \times V$ is a binary relation from $U$ to $V$. The neighborhood of $x \in U$ is
$$R(x) = \{y \in V \mid (x, y) \in R\}. \quad (7)$$

When $U = V$ and $R$ is an equivalence relation, $R(x)$ is the equivalence class containing $x$. From this definition, we know immediately that for $y \in V$,
$$R^{-1}(y) = \{x \in U \mid (x, y) \in R\}. \quad (8)$$

A binary relation is more often stored in the database as a table with two foreign keys; in this way, storage is saved. For the convenience of illustration, here we represent it as an $n \times k$ Boolean matrix. An example is given in Table 1(c), where $U$ is the set of customers, as indicated in Table 1(a), and $V$ is the set of products, as indicated in Table 1(b).

With Definitions 1 and 3, we propose the following definition.


Table 1: A many-to-many entity-relationship system.

(a) Customer

CID  Name      Age    Gender  Married  Country  Income   NumCars
c1   Ron       20-29  Male    No       USA      60k-69k  0-1
c2   Michelle  20-29  Female  Yes      USA      80k-89k  0-1
c3   Shun      20-29  Male    No       China    40k-49k  0-1
c4   Yamago    30-39  Female  Yes      Japan    80k-89k  2
c5   Wang      30-39  Male    Yes      China    90k-99k  2

(b) Product

PID  Name    Country    Category  Color  Price
p1   Bread   Australia  Staple    Black  1-9
p2   Diaper  China      Daily     White  1-9
p3   Pork    China      Meat      Red    1-9
p4   Beef    Australia  Meat      Red    10-19
p5   Beer    France     Alcohol   Black  10-19
p6   Wine    France     Alcohol   White  10-19

(c) Buys

CID\PID  p1  p2  p3  p4  p5  p6
c1       1   1   0   1   1   0
c2       1   0   0   1   0   1
c3       0   1   0   0   1   1
c4       0   1   0   1   1   0
c5       1   0   0   1   1   1

Definition 4. A many-to-many entity-relationship system (MMER) is a 5-tuple $ES = (U, A, V, B, R)$, where $(U, A)$ and $(V, B)$ are two information systems and $R \subseteq U \times V$ is a binary relation from $U$ to $V$.

An example of an MMER is given in Tables 1(a), 1(b), and 1(c).

2.2. Granular Association Rules with Four Measures. Now we come to the central definition of granular association rules.

Definition 5. A granular association rule is an implication of the form
$$(\mathrm{GR})\colon \bigwedge_{a \in A'} \langle a, a(x) \rangle \Rightarrow \bigwedge_{b \in B'} \langle b, b(y) \rangle, \quad (9)$$
where $A' \subseteq A$ and $B' \subseteq B$.

According to (6), the set of objects meeting the left-hand side of the granular association rule is
$$LH(\mathrm{GR}) = E_{A'}(x), \quad (10)$$
while the set of objects meeting the right-hand side of the granular association rule is
$$RH(\mathrm{GR}) = E_{B'}(y). \quad (11)$$

From the MMER given in Tables 1(a), 1(b), and 1(c), we may obtain the following rule.

Rule 1. One has
$$\langle \text{Gender: Male} \rangle \Rightarrow \langle \text{Category: Alcohol} \rangle. \quad (12)$$

Rule 1 can be read as "men like alcohol." There are some issues concerning the strength of the rule. For example, we may ask the following questions on Rule 1.

(1) How many customers are men?
(2) How many products are alcohol?
(3) Do all men like alcohol?
(4) Do all kinds of alcohol favor men?

An example of complete granular association rules with measures specified is "40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol." Here 45%, 6%, 40%, and 30% are the source coverage, the target coverage, the source confidence, and the target confidence, respectively. These measures are defined as follows.

The source coverage of a granular association rule is
$$\mathrm{scov}(\mathrm{GR}) = \frac{|LH(\mathrm{GR})|}{|U|}. \quad (13)$$
The target coverage of GR is
$$\mathrm{tcov}(\mathrm{GR}) = \frac{|RH(\mathrm{GR})|}{|V|}. \quad (14)$$


There is a tradeoff between the source confidence and the target confidence of a rule. Consequently, neither value can be obtained directly from the rule. To compute either of them, we should specify the threshold of the other. Let tc be the target confidence threshold. The source confidence of the rule is
$$\mathrm{sconf}(\mathrm{GR}, tc) = \frac{\left|\left\{ x \in LH(\mathrm{GR}) \;\middle|\; |R(x) \cap RH(\mathrm{GR})| / |RH(\mathrm{GR})| \ge tc \right\}\right|}{|LH(\mathrm{GR})|}. \quad (15)$$

Let sc be the source confidence threshold, and let $K$ be the integer satisfying
$$\left|\left\{ x \in LH(\mathrm{GR}) \mid |R(x) \cap RH(\mathrm{GR})| \ge K+1 \right\}\right| < sc \times |LH(\mathrm{GR})| \le \left|\left\{ x \in LH(\mathrm{GR}) \mid |R(x) \cap RH(\mathrm{GR})| \ge K \right\}\right|. \quad (16)$$
This equation means that at least $sc \times 100\%$ of the elements in $LH(\mathrm{GR})$ have connections with at least $K$ elements in $RH(\mathrm{GR})$, but less than $sc \times 100\%$ of the elements in $LH(\mathrm{GR})$ have connections with at least $K+1$ elements in $RH(\mathrm{GR})$. The target confidence of the rule is
$$\mathrm{tconf}(\mathrm{GR}, sc) = \frac{K}{|RH(\mathrm{GR})|}. \quad (17)$$

In fact, the computation of $K$ is nontrivial. First, for any $x \in LH(\mathrm{GR})$, we need to compute $tc(x) = |R(x) \cap RH(\mathrm{GR})|$ and obtain an array of integers. Second, we sort the array in descending order. Third, let $k = \lfloor sc \times |LH(\mathrm{GR})| \rfloor$; then $K$ is the $k$th element in the array.
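The interplay of the two confidence measures can be illustrated with a short sketch (ours, not the authors' implementation; the function names and the dictionary encoding of the Buys relation are assumptions based on Table 1) that evaluates (15) and computes $K$ and (17) for the rule "men like alcohol."

```python
import math

# Illustrative sketch of (15)-(17); names and toy data are ours.
def sconf(lhs, rhs, R, tc):
    """Fraction of x in LH(GR) liking at least tc*100% of RH(GR); see (15)."""
    hits = sum(1 for x in lhs if len(R[x] & rhs) / len(rhs) >= tc)
    return hits / len(lhs)

def tconf(lhs, rhs, R, sc):
    """K / |RH(GR)|, where K is determined by (16)."""
    counts = sorted((len(R[x] & rhs) for x in lhs), reverse=True)
    k = math.floor(sc * len(lhs))              # position of the k-th largest count
    K = counts[k - 1] if k >= 1 else counts[0]
    return K / len(rhs)

# Men (c1, c3, c5) and alcohol (p5, p6) from the Buys relation of Table 1(c).
R = {"c1": {"p1", "p2", "p4", "p5"}, "c3": {"p2", "p5", "p6"},
     "c5": {"p1", "p4", "p5", "p6"}}
men, alcohol = {"c1", "c3", "c5"}, {"p5", "p6"}
print(sconf(men, alcohol, R, tc=1.0))  # 0.67: two thirds of the men like all alcohol
print(tconf(men, alcohol, R, sc=1.0))  # 0.50: all men like at least half of the alcohol
```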

The relationships between rules are interesting to us. As an example, let us consider the following rule.

Rule 2. One has
$$\langle \text{Gender: Male} \rangle \wedge \langle \text{Country: China} \rangle \Rightarrow \langle \text{Category: Alcohol} \rangle \wedge \langle \text{Country: France} \rangle. \quad (18)$$

Rule 2 can be read as "Chinese men like France alcohol." One may say that we can infer Rule 2 from Rule 1, since the former has a finer granule. However, with the four measures, we know that the relationships between these two rules are not so simple. A detailed explanation of Rule 2 might be "60% Chinese men like at least 50% kinds of France alcohol; 15% customers are Chinese men and 2% products are France alcohol." Compared with Rule 1, Rule 2 is stronger in terms of source/target confidence; however, it is weaker in terms of source/target coverage. Therefore, if we need rules covering more people and products, we prefer Rule 1; if we need more confidence in the rules, we prefer Rule 2. For example, if the source confidence threshold is 55%, Rule 2 might be valid while Rule 1 is not; if the source coverage threshold is 20%, Rule 1 might be valid while Rule 2 is not.

2.3. The Granular Association Rule Mining Problem. A straightforward rule mining problem is stated as follows.

Input: An $ES = (U, A, V, B, R)$, a minimal source coverage threshold ms, a minimal target coverage threshold mt, a minimal source confidence threshold sc, and a minimal target confidence threshold tc.

Output: All granular association rules satisfying $\mathrm{scov}(\mathrm{GR}) \ge ms$, $\mathrm{tcov}(\mathrm{GR}) \ge mt$, $\mathrm{sconf}(\mathrm{GR}) \ge sc$, and $\mathrm{tconf}(\mathrm{GR}) \ge tc$.

Since both sc and tc are specified, we can choose either (15) or (17) to decide whether or not a rule satisfies these thresholds. Equation (15) is a better choice.

3. Parametric Rough Sets on Two Universes

In this section, we first review rough approximations [32] on one universe. Then we present rough approximations on two universes. Finally, we present parametric rough approximations on two universes. Some concepts are dedicated to granular association rules; we will explain in detail the way they are defined from a semantic point of view.

3.1. Classical Rough Sets. The classical rough sets [32, 34] are built upon lower and upper approximations on one universe. We adopt the ideas and notions introduced in [35] and define these concepts as follows.

Definition 6. Let $U$ be a universe and $R \subseteq U \times U$ an indiscernibility relation. The lower and upper approximations of $X \subseteq U$ with respect to $R$ are
$$\underline{R}(X) = \{x \in U \mid R(x) \subseteq X\}, \quad (19)$$
$$\overline{R}(X) = \{x \in U \mid R(x) \cap X \neq \emptyset\}, \quad (20)$$
respectively.

These concepts can be employed for set approximation or classification analysis. For set approximation, the interval $[\underline{R}(X), \overline{R}(X)]$ is called the rough set of $X$, which provides an approximate characterization of $X$ by the objects that share the same description as its members [35]. For classification analysis, the lower approximation operator helps find certain rules, while the upper approximation helps find possible rules.

3.2. Existing Parametric Rough Sets. Ziarko [28] pointed out that the classical rough sets cannot handle classification with a controlled degree of uncertainty or a misclassification error. Consequently, he proposed variable precision rough sets [28] with a parameter to indicate the admissible classification error.

Let $X$ and $Y$ be nonempty subsets of a finite universe $U$ with $Y \subseteq X$. The measure $c(X, Y)$ of the relative degree of misclassification of the set $X$ with respect to the set $Y$ is defined as
$$c(X, Y) = \begin{cases} 1 - \dfrac{|X \cap Y|}{|X|}, & |X| > 0, \\ 0, & |X| = 0, \end{cases} \quad (21)$$
where $|\cdot|$ denotes set cardinality.


The equivalence relation $R$ corresponds to a partitioning of $U$ into a collection of equivalence classes or elementary sets $R^* = \{E_1, E_2, \ldots, E_n\}$. The $\alpha$-lower and $\alpha$-upper approximations of the set $X \subseteq U$ are defined as
$$\underline{R}_{\alpha}(X) = \bigcup \{E \in R^* \mid c(E, X) \le \alpha\},$$
$$\overline{R}_{\alpha}(X) = \bigcup \{E \in R^* \mid c(E, X) < 1 - \alpha\}, \quad (22)$$
where $0 \le \alpha < 0.5$.

For the convenience of discussion, we rewrite his definition as follows.

Definition 7. Let $U$ be a universe and $R \subseteq U \times U$ an indiscernibility relation. The lower and upper approximations of $X \subseteq U$ with respect to $R$ under precision $\beta$ are
$$\underline{R}_{\beta}(X) = \left\{ x \in U \,\middle|\, \frac{|R(x) \cap X|}{|R(x)|} \ge \beta \right\},$$
$$\overline{R}_{\beta}(X) = \left\{ x \in U \,\middle|\, \frac{|R(x) \cap X|}{|R(x)|} > 1 - \beta \right\}, \quad (23)$$
respectively, where $R(x)$ is the equivalence class containing $x$.

Note that $0.5 < \beta \le 1$ indicates the classification accuracy (precision) threshold, rather than the misclassification error threshold employed in [28]. Wong et al. [31] extended the definition to an arbitrary binary relation which is at least serial.
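As a quick illustration of Definition 7 (ours; the partition and the set $X$ are invented for the example, and the helper name is an assumption), the $\beta$-lower and $\beta$-upper approximations can be computed block by block.

```python
# Illustrative sketch of Definition 7 (variable precision rough sets); names are ours.
def vprs_approximations(partition, X, beta):
    """Return the beta-lower and beta-upper approximations of X, 0.5 < beta <= 1."""
    lower, upper = set(), set()
    for block in partition:                      # each block is an equivalence class R(x)
        inclusion = len(block & X) / len(block)  # |R(x) ∩ X| / |R(x)|
        if inclusion >= beta:
            lower |= block
        if inclusion > 1 - beta:
            upper |= block
    return lower, upper

partition = [{"x1", "x2", "x3"}, {"x4", "x5"}, {"x6"}]
X = {"x1", "x2", "x4", "x6"}
print(vprs_approximations(partition, X, beta=0.6))
# lower = {'x1','x2','x3','x6'} (blocks with at least 60% of members in X); upper = all objects
```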

Ziarko [28] introduced variable precision rough sets with a single parameter, but the model gave little explanation of that parameter. Yao and Wong [36] studied the conditions on the parameter and introduced the decision-theoretic rough set (DTRS) model. The DTRS model is a probabilistic rough set model in the framework of Bayesian decision theory. It requires a pair of thresholds $(\alpha, \beta)$ instead of only one as in variable precision rough sets. Its main advantage is the solid foundation based on Bayesian decision theory: $(\alpha, \beta)$ can be systematically computed by minimizing the overall ternary classification cost [36]. Therefore, this theory has drawn much research interest in both theory (see, e.g., [37, 38]) and application (see, e.g., [39–41]).

Gong and Sun [42] first defined the concept of the probabilistic rough set over two universes. Ma and Sun [43] presented in detail the parameter dependence, or continuity, of the lower and upper approximations with respect to the two parameters $\alpha$ and $\beta$ for every type of probabilistic rough set model over two universes. Moreover, a new model of probabilistic fuzzy rough sets over two universes [44] has been proposed. These theories have broad application prospects.

3.3. Rough Sets on Two Universes for Granular Association Rules. Since our data model is concerned with two universes, we should consider computation models for this type of data. Rough sets on two universes have been defined in [31], and some later works adopt the same definitions (see, e.g., [29, 30]). We will present our definitions, which cope with granular association rules, and then discuss why they are different from the existing ones.

Definition 8. Let $U$ and $V$ be two universes and $R \subseteq U \times V$ a binary relation. The lower and upper approximations of $X \subseteq U$ with respect to $R$ are
$$\underline{R}(X) = \{y \in V \mid R^{-1}(y) \supseteq X\}, \quad (24)$$
$$\overline{R}(X) = \{y \in V \mid R^{-1}(y) \cap X \neq \emptyset\}, \quad (25)$$
respectively.

From this definition, we know immediately that for $Y \subseteq V$,
$$\underline{R}^{-1}(Y) = \{x \in U \mid R(x) \supseteq Y\},$$
$$\overline{R}^{-1}(Y) = \{x \in U \mid R(x) \cap Y \neq \emptyset\}. \quad (26)$$

Now we explain these notions through our example. $\underline{R}(X)$ contains the products that favor all people in $X$, $\underline{R}^{-1}(Y)$ contains the people who like all products in $Y$, $\overline{R}(X)$ contains the products that favor at least one person in $X$, and $\overline{R}^{-1}(Y)$ contains the people who like at least one product in $Y$.
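The following sketch (ours) computes $R^{-1}(y)$, $\underline{R}(X)$, and $\overline{R}(X)$ on the Buys relation of Table 1(c); the dictionary encoding of the relation and the helper names are assumptions of the example.

```python
# Illustrative sketch of Definition 8 on the Buys relation of Table 1(c); names are ours.
buys = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p5", "p6"},       "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p4", "p5", "p6"}}
V = {"p1", "p2", "p3", "p4", "p5", "p6"}

def R_inv(y):                  # R^{-1}(y): customers who buy product y
    return {x for x, ps in buys.items() if y in ps}

def lower(X):                  # (24): products bought by everybody in X
    return {y for y in V if R_inv(y) >= X}

def upper(X):                  # (25): products bought by at least one member of X
    return {y for y in V if R_inv(y) & X}

men = {"c1", "c3", "c5"}
print(lower(men))              # {'p5'}: only beer is bought by all three men
print(upper(men))              # every product except 'p3'
```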

We have the following property concerning the monotonicity of these approximations.

Property 1. Let $X_1 \subset X_2$. Then
$$\underline{R}(X_1) \supseteq \underline{R}(X_2), \quad (27)$$
$$\overline{R}(X_1) \subseteq \overline{R}(X_2). \quad (28)$$

That is, with the increase of the object subset, the lower approximation decreases, while the upper approximation increases. It is somewhat ad hoc to people in the rough set community that the lower approximation decreases in this case. In fact, according to Wong et al. [31], Liu [30], and Sun et al. [44], (24) should be rewritten as
$$\underline{R}'(X) = \{y \in V \mid R^{-1}(y) \subseteq X\}, \quad (29)$$
where the prime is employed to distinguish between the two definitions. In this way, $\underline{R}'(X_1) \subseteq \underline{R}'(X_2)$. Moreover, it would coincide with (19) when $U = V$.

We argue that different definitions of the lower approximation are appropriate for different applications. Suppose that there is a clinic system where $U$ is the set of all symptoms and $V$ is the set of all diseases [31], and $X \subseteq U$ is a set of symptoms. According to (29), $\underline{R}'(X)$ contains the diseases that are induced by symptoms only in $X$. That is, if a patient has no symptom in $X$, she never has any disease in $\underline{R}'(X)$. This type of rules is natural and useful.

In our example presented in Section 2, $X \subseteq U$ is a group of people, and $\underline{R}'(X)$ contains the products that favor people only in $X$. That is, if a person does not belong to $X$, she never likes the products in $\underline{R}'(X)$. Unfortunately, this type of rules is not interesting to us. This is why we employ (24) for the lower approximation.

We are looking for very strong rules through the lower approximation indicated in Definition 8, for example, "all men like all alcohol." This kind of rules is called complete match rules [6]. However, they seldom exist in applications.


On the other hand, we are looking for very weak rules through the upper approximation, for example, "at least one man likes at least one kind of alcohol." Another extreme example is "all people like at least one kind of product," which holds for any data set. Therefore, this type of rules is useless. These issues will be addressed through a more general model in the next section.

3.4. Parametric Rough Sets on Two Universes for Granular Association Rules. Given a group of people, the number of products that favor all of them is often quite small. On the other hand, the number of products that favor at least one of them is not quite meaningful. Similar to probabilistic rough sets, we need to introduce one or more parameters into the model.

To cope with the source confidence measure introduced in Section 2.2, we propose the following definition.

Definition 9. Let $U$ and $V$ be two universes, $R \subseteq U \times V$ a binary relation, and $0 < \beta \le 1$ a user-specified threshold. The lower approximation of $X \subseteq U$ with respect to $R$ for threshold $\beta$ is
$$\underline{R}_{\beta}(X) = \left\{ y \in V \,\middle|\, \frac{|R^{-1}(y) \cap X|}{|X|} \ge \beta \right\}. \quad (30)$$

We do not discuss the upper approximation in the new context due to the lack of semantics.

From this definition, we know immediately that the lower approximation of $Y \subseteq V$ with respect to $R$ is
$$\underline{R}^{-1}_{\beta}(Y) = \left\{ x \in U \,\middle|\, \frac{|R(x) \cap Y|}{|Y|} \ge \beta \right\}. \quad (31)$$

Here $\beta$ corresponds with the target confidence instead. In our example, $\underline{R}_{\beta}(X)$ are the products that favor at least $\beta \times 100\%$ of the people in $X$, and $\underline{R}^{-1}_{\beta}(Y)$ are the people who like at least $\beta \times 100\%$ of the products in $Y$.

The following property indicates that $\underline{R}_{\beta}(X)$ is a generalization of both $\underline{R}(X)$ and $\overline{R}(X)$.

Property 2. Let $U$ and $V$ be two universes and $R \subseteq U \times V$ a binary relation. Then
$$\underline{R}_{1}(X) = \underline{R}(X), \quad (32)$$
$$\underline{R}_{\varepsilon}(X) = \overline{R}(X), \quad (33)$$
where $\varepsilon \approx 0$ is a small positive number.

Proof. One has
$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge 1 \Longleftrightarrow |R^{-1}(y) \cap X| = |X| \Longleftrightarrow R^{-1}(y) \supseteq X,$$
$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge \varepsilon \Longleftrightarrow |R^{-1}(y) \cap X| > 0 \Longleftrightarrow R^{-1}(y) \cap X \neq \emptyset. \quad (34)$$

The following property shows the monotonicity of $\underline{R}_{\beta}(X)$.

Property 3. Let $0 < \beta_1 < \beta_2 \le 1$. Then
$$\underline{R}_{\beta_2}(X) \subseteq \underline{R}_{\beta_1}(X). \quad (35)$$

However, given $X_1 \subset X_2$, we obtain neither $\underline{R}_{\beta}(X_1) \subseteq \underline{R}_{\beta}(X_2)$ nor $\underline{R}_{\beta}(X_1) \supseteq \underline{R}_{\beta}(X_2)$. The relationship between $\underline{R}_{\beta}(X_1)$ and $\underline{R}_{\beta}(X_2)$ depends on $\beta$. Generally, if $\beta$ is big, $\underline{R}_{\beta}(X_1)$ tends to be bigger; otherwise, $\underline{R}_{\beta}(X_1)$ tends to be smaller. Equation (27) indicates the extreme case for $\beta = 1$, and (28) indicates the other extreme case for $\beta \approx 0$.

$\beta$ is the coverage of $R(x)$ (or $R^{-1}(y)$) to $Y$ (or $X$). It does not mean the precision of an approximation. This is why we call this model parametric rough sets, instead of variable precision rough sets [28] or probabilistic rough sets [45].

Similar to the discussion in Section 3.3, in some cases we would like to employ the following definition:
$$\underline{R}'_{\beta}(X) = \left\{ y \in V \,\middle|\, \frac{|R^{-1}(y) \cap X|}{|R^{-1}(y)|} \ge \beta \right\}. \quad (36)$$
It coincides with $\underline{R}_{\beta}(X)$ defined in Definition 7 if $U = V$. Take the clinic system again as the example: $\underline{R}'_{\beta}(X)$ is the set of diseases that are caused mainly (with a probability no less than $\beta \times 100\%$) by symptoms in $X$.
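Before moving to the algorithm, a small sketch (ours, reusing the `buys`, `V`, `R_inv`, and `men` objects from the earlier example; the helper names are assumptions) contrasts the lower approximation of Definition 9 with the alternative (36).

```python
# Illustrative sketch of (30) and (36); it reuses buys, V, R_inv, and men defined above.
def lower_beta(X, beta):        # (30): products liked by at least beta*100% of the people in X
    return {y for y in V if len(R_inv(y) & X) / len(X) >= beta}

def lower_beta_prime(X, beta):  # (36): products whose buyers are mostly (>= beta) within X
    return {y for y in V if R_inv(y) and len(R_inv(y) & X) / len(R_inv(y)) >= beta}

print(lower_beta(men, 1.0))     # {'p5'}: coincides with the complete match lower approximation
print(lower_beta(men, 0.6))     # {'p1','p2','p4','p5','p6'}: liked by at least two of the three men
print(lower_beta_prime(men, 0.6))  # products at least 60% of whose buyers are men
```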

4. A Backward Algorithm to Granular Association Rule Mining

In our previous work [5], we have proposed an algorithm according to (15). The algorithm starts from both sides and checks the validity of all candidate rules; therefore, it was named a sandwich algorithm.

To make use of the concept proposed in Section 3, we should rewrite (15) as follows:
$$\mathrm{sconf}(\mathrm{GR}, tc) = \frac{\left|\left\{ x \in LH(\mathrm{GR}) \mid |R(x) \cap RH(\mathrm{GR})| / |RH(\mathrm{GR})| \ge tc \right\}\right|}{|LH(\mathrm{GR})|} = \frac{\left|\left\{ x \in U \mid |R(x) \cap RH(\mathrm{GR})| / |RH(\mathrm{GR})| \ge tc \right\} \cap LH(\mathrm{GR})\right|}{|LH(\mathrm{GR})|} = \frac{\left|\underline{R}^{-1}_{tc}(RH(\mathrm{GR})) \cap LH(\mathrm{GR})\right|}{|LH(\mathrm{GR})|}. \quad (37)$$

With this equation, we propose an algorithm to deal with the rule mining problem of Section 2.3. The algorithm is listed in Algorithm 1. It essentially has four steps.

Step 1. Search in $(U, A)$ for all granules meeting the minimal source coverage threshold ms. This step corresponds to Line 1 of the algorithm, where SG stands for source granule.

Step 2. Search in $(V, B)$ for all granules meeting the minimal target coverage threshold mt. This step corresponds to Line 2 of the algorithm, where TG stands for target granule.


Input: $ES = (U, A, V, B, R)$, ms, mt, tc.
Output: All complete match granular association rules satisfying the given constraints.
Method: complete-match-rules-backward.

(1) $SG(ms) = \{(A', x) \in 2^A \times U \mid |E_{A'}(x)|/|U| \ge ms\}$;  // candidate source granules
(2) $TG(mt) = \{(B', y) \in 2^B \times V \mid |E_{B'}(y)|/|V| \ge mt\}$;  // candidate target granules
(3) for each $g' \in TG(mt)$ do
(4)    $Y = e(g')$;
(5)    $X = \underline{R}^{-1}_{tc}(Y)$;
(6)    for each $g \in SG(ms)$ do
(7)       if $e(g) \subseteq X$ then
(8)          output the rule $i(g) \Rightarrow i(g')$;
(9)       end if
(10)   end for
(11) end for

Algorithm 1: A backward algorithm.

Step 3. For each granule obtained in Step 2, construct a block in $U$ according to $R$. This step corresponds to Lines 4 and 5 of the algorithm. The function $e$ has been defined in (5). According to Definition 9, in our example $\underline{R}^{-1}_{tc}(Y)$ are the people who like at least $tc \times 100\%$ of the products in $Y$.

Step 4. Check possible rules regarding $g'$ and $X$, and output all valid rules. This step corresponds to Lines 6 through 10 of the algorithm. In Line 7, since $e(g)$ and $X$ can be stored in sorted arrays, the complexity of checking $e(g) \subseteq X$ is
$$O(|e(g)| + |X|) = O(|U|), \quad (38)$$
where $|\cdot|$ denotes the cardinality of a set.

Because the algorithm starts from the right-hand side of the rule and proceeds to the left-hand side, it is called a backward algorithm. It is necessary to compare the time complexities of the existing sandwich algorithm and our new backward algorithm. Both algorithms share Steps 1 and 2, which do not incur the pattern explosion problem; therefore, we will focus on the remaining steps. The time complexity of the sandwich algorithm is [5]
$$O(|SG(ms)| \times |TG(mt)| \times |U| \times |V|), \quad (39)$$
where $|\cdot|$ denotes the cardinality of a set. According to the loops, the time complexity of Algorithm 1 is
$$O(|SG(ms)| \times |TG(mt)| \times |U|), \quad (40)$$
which is lower than that of the sandwich algorithm.

Intuitively, the backward algorithm avoids computing $R(x) \cap RH(\mathrm{GR})$ for different rules with the same right-hand side. Hence, it should be less time consuming than the sandwich algorithm. We will compare the run time of these algorithms in the next section through experimentation.

The space complexities of these two algorithms are also important. To store the relation $R$, a $|U| \times |V|$ Boolean matrix is needed.

5. Experiments on Two Real-World Data Sets

The main purpose of our experiments is to answer the following questions.

(1) Does the backward algorithm outperform the sandwich algorithm?
(2) How does the number of rules change for different numbers of objects?
(3) How does the algorithm run time change for different numbers of objects?
(4) How does the number of rules vary for different thresholds?
(5) How does the performance of cold-start recommendation vary on the training and testing sets?

5.1. Data Sets. We collected two real-world data sets for experimentation. One is course selection, and the other is movie rating. These data sets are quite representative of applications.

5.1.1. A Course Selection Data Set. The course selection system often serves as an example in textbooks to explain the concept of many-to-many entity-relationship diagrams. Hence, it is appropriate to produce meaningful granular association rules and test the performance of our algorithm. We obtained a data set from the course selection system of Minnan Normal University. (The authors would like to thank Mrs. Chunmei Zhou for her help in the data collection.) Specifically, we collected data during the semester between 2011 and 2012. There are 145 general education courses in the university, and 9,654 students took part in course selection. The database schema is as follows:

(i) Student (student ID, name, gender, birth-year, politics-status, grade, department, nationality, and length of schooling);


(ii) Course (course ID, credit, class-hours, availability, and department);

(iii) Selects (student ID, course ID).

Our algorithm supports only nominal data at this time. For this data set, all data are viewed nominally and processed directly; in this way, no discretization approach is employed to convert numeric attributes into nominal ones. Also, we removed student names and course names from the original data, since they are useless in generating meaningful rules.

5.1.2. A Movie Rating Data Set. The MovieLens data set, assembled by the GroupLens project, is widely used in recommender systems (see, e.g., [15, 16]). (GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org.) We downloaded the data set from the MovieLens website (http://www.movielens.org). Originally, the data set contains 100,000 ratings (1–5) from 943 users on 1,682 movies, with each user rating at least 20 movies [15]. Currently, the available data set contains 1,000,209 anonymous ratings of 3,952 movies made by 6,040 MovieLens users who joined MovieLens in 2000. In order to run our algorithm, we preprocessed the data set as follows.

(1) Remove movie names. They are not useful in generating meaningful granular association rules.

(2) Use the release year instead of the release date. In this way, the granule is more reasonable.

(3) Select the movie genre. In the original data, the movie genre is multivalued, since one movie may fall in more than one genre; for example, a movie can be both animation and children's. Unfortunately, granular association rules do not support this type of data at this time. Since the main objective of this work is to compare the performances of algorithms, we use a simple approach to deal with this issue; that is, we sort movie genres according to the number of users they attract and only keep the highest-priority genre for the current movie. We adopt the following priority (from high to low): comedy, action, thriller, romance, adventure, children, crime, sci-fi, horror, war, mystery, musical, documentary, animation, western, film-noir, fantasy, and unknown (a sketch of this step is given after this list).
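Item (3) amounts to a one-line selection per movie; the sketch below (ours; the genre spellings and helper names are assumptions) keeps the highest-priority genre.

```python
# Illustrative sketch of step (3): keep only the highest-priority genre per movie.
PRIORITY = ["Comedy", "Action", "Thriller", "Romance", "Adventure", "Children",
            "Crime", "Sci-Fi", "Horror", "War", "Mystery", "Musical",
            "Documentary", "Animation", "Western", "Film-Noir", "Fantasy", "Unknown"]
RANK = {g: i for i, g in enumerate(PRIORITY)}

def main_genre(genres):
    """Pick the highest-priority genre from a movie's multi-valued genre field."""
    return min(genres, key=lambda g: RANK.get(g, len(PRIORITY)))

print(main_genre(["Animation", "Children"]))   # 'Children' outranks 'Animation'
```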

Our database schema is as follows:

(i) User (user ID, age, gender, and occupation);
(ii) Movie (movie ID, release year, and genre);
(iii) Rates (user ID, movie ID).

There are 8 user age intervals, 21 occupations, and 71 release years. Similar to the course selection data set, all these data are viewed nominally and processed directly. We employ neither discretization nor symbolic value partition [46, 47] approaches to produce coarser granules. The genre is a multivalued attribute; therefore, we scale it to 18 Boolean attributes and deal with it using the approach proposed in [48].

5.2. Results. We undertake five sets of experiments to answer the questions proposed at the beginning of this section.

5.2.1. Efficiency Comparison. We compare the efficiencies of the backward and the sandwich algorithms. We look only at the run time of Lines 3 through 11, since these lines are where the two algorithms differ.

For the course selection data set, when ms = mt = 0.06, sc = 0.18, and tc = 0.11, we obtain only 40 rules. For higher thresholds, no rule can be obtained. Therefore, we use the following settings: sc = 0.18, tc = 0.11, ms = mt, and ms ∈ {0.02, 0.03, 0.04, 0.05, 0.06}. Figure 1(a) shows the actual run time in milliseconds. Figure 2(a) shows the number of basic operations, including additions and comparisons of numbers. Here we observe that, for different settings, the backward algorithm is more than 2 times faster than the sandwich algorithm; it takes less than 1/3 of the operations of the sandwich algorithm.

For the MovieLens data set, we employ the data set with 3,800 users and 3,952 movies. We use the following settings: sc = 0.15, tc = 0.14, ms = mt, and ms ∈ {0.01, 0.02, 0.03, 0.04, 0.05}. Figure 1(b) shows the actual run time in milliseconds. Figure 2(b) shows the number of basic operations, including additions and comparisons of numbers. Here we observe that, for different settings, the backward algorithm is more than 3 times faster than the sandwich algorithm; it takes less than 1/4 of the operations of the sandwich algorithm.

5.2.2. Change of the Number of Rules for Different Data Set Sizes. Now we study how the number of rules changes with the increase of the data set size. The experiments are undertaken only on the MovieLens data set. We use the following settings: sc = 0.15, tc = 0.14, ms ∈ {0.01, 0.02}, and |U| ∈ {500, 1000, 1500, 2000, 2500, 3000, 3500}. The number of movies is always 3,952. While selecting k users, we always select from the first user to the kth user.

First, we look at the number of concepts satisfying the source coverage threshold ms. According to Figure 3(a), the number of source concepts decreases with the increase of the number of users. However, Figure 3(b) indicates that this trend may not hold. In fact, from Figure 3, the most important observation is that the number of source concepts does not vary much with the change of the number of objects. When the number of users is more than 1,500, this variation is no more than 3, which is less than 5% of the total number of concepts.

Second, we look at the number of granular association rules satisfying all four thresholds. Figure 4 indicates that the number of rules varies more than the number of source concepts. However, this variation is less than 20% when there are more than 1,500 users, and less than 10% when there are more than 2,500 users.

5.2.3. Change of Run Time for Different Data Set Sizes. We look at the run time change with the increase of the number of users. The time complexity of the algorithm is given by (40). Since the number of movies is not changed, |TG(mt)| is fixed.


Figure 1: Run time information: (a) course selection; (b) MovieLens (3,800 users).


Figure 2: Basic operations information: (a) course selection; (b) MovieLens (3,800 users).

Moreover, according to our earlier discussion, |SG(ms)| does not vary much for different numbers of users. Therefore, the time complexity is nearly linear with respect to the number of users. Figure 5 validates this analysis.

5.2.4. Number of Rules for Different Thresholds. Figure 6 shows that the number of rules decreases dramatically with the increase of ms and mt. For the course selection data set, the number of rules would be 0 when ms = mt = 0.07. For the MovieLens data set, the number of rules would be 0 when ms = mt = 0.06.

5.2.5. Performance of Cold-Start Recommendation. Now we study how the performance of cold-start recommendation varies on the training and testing sets. The experiments are undertaken only on the MovieLens data set. Here we employ two data sets: one with 1,000 users and 3,952 movies, and the other with 3,000 users and 3,952 movies.


Figure 3: Number of concepts on users for MovieLens: (a) ms = 0.01; (b) ms = 0.02.


Figure 4: Number of granular association rules for MovieLens: (a) ms = 0.01; (b) ms = 0.02.

We divide the users into two parts as the training and testing sets. The other settings are as follows: the training set percentage is 60%, sc = 0.15, and tc = 0.14. Each experiment is repeated 30 times with different samplings of the training and testing sets, and the average accuracy is computed.

Figure 7 shows that, with the variation of ms and mt, the recommendation performs differently. We observe some especially interesting phenomena from this figure, as follows.

(1) Figures 7(a) and 7(b) are similar in general. As the user number increases, the accuracy of the recommender also increases. The reason is that we get more information from the training sets to generate much better rules for the recommendation.

(2) When ms and mt are suitably set (0.03), the recommender on both data sets has the maximal accuracy. As these thresholds increase or decrease away from this value, the performance of the recommendation changes rapidly.

(3) The performance of the recommendation does not change much between the training and the testing sets. This phenomenon indicates that the recommender is stable.

5.3. Discussions. Now we can answer the questions proposed at the beginning of this section.

(1) The backward algorithm outperforms the sandwich algorithm. The backward algorithm is more than 2 times and 3 times faster than the sandwich algorithm on the course selection and MovieLens data sets, respectively. Therefore, our parametric rough sets on two universes are useful in applications.


Figure 5: Run time on MovieLens: (a) ms = 0.01; (b) ms = 0.02.


Figure 6: Number of granular association rules: (a) course selection; (b) MovieLens.


Figure 7: Accuracy of cold-start recommendation on MovieLens: (a) 1,000 users and 3,952 movies; (b) 3,000 users and 3,952 movies.


(2) The number of rules does not change much for different numbers of objects. Therefore, it is not necessary to collect too much data to obtain meaningful granular association rules. For example, for the MovieLens data set, 3,000 users are quite enough.

(3) The run time is nearly linear with respect to the number of objects. Therefore, the algorithm is scalable from the viewpoint of time complexity. However, we observe that the relation table might be rather big; therefore, this would be a bottleneck of the algorithm.

(4) The number of rules decreases dramatically with the increase of the thresholds ms and mt. It is important to specify appropriate thresholds to obtain useful rules.

(5) The performance of cold-start recommendation is stable on the training and the testing sets with the increase of the thresholds ms and mt.

6. Conclusions

In this paper, we have proposed a new type of parametric rough sets on two universes to deal with the granular association rule mining problem. The lower approximation operator has been defined, and its monotonicity has been analyzed. With the help of the new model, a backward algorithm for the granular association rule mining problem has been proposed. Experimental results on two real-world data sets indicate that the new algorithm is significantly faster than the existing sandwich algorithm. The performance of recommender systems is stable on the training and the testing sets. To sum up, this work applies rough set theory to recommender systems and is one step toward the application of rough set theory and granular computing. In the future, we will improve our approach and compare its performance with other recommendation approaches.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is in part supported by the National Science Foundation of China under Grants nos. 61379089, 61379049, and 61170128, the Fujian Province Foundation of Higher Education under Grant no. JK2012028, the Key Project of the Education Department of Fujian Province under Grant no. JA13192, and the Postgraduate Education Innovation Base for Computer Application Technology, Signal and Information Processing of Fujian Province (no. [2008]114, High Education of Fujian).

References

[1] L. Dehaspe, H. Toivonen, and R. D. King, "Finding frequent substructures in chemical compounds," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 30–36, 1998.
[2] B. Goethals, W. Le Page, and M. Mampaey, "Mining interesting sets and rules in relational databases," in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 997–1001, March 2010.
[3] S. Dzeroski, "Multi-relational data mining: an introduction," SIGKDD Explorations, vol. 5, pp. 1–16, 2003.
[4] V. C. Jensen and N. Soparkar, "Frequent item set counting across multiple tables," in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications, vol. 1805 of Lecture Notes in Computer Science, pp. 49–61, 2000.
[5] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules on two universes with four measures," 2012, http://arxiv.org/abs/1209.5598.
[6] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules with four subtypes," in Proceedings of the IEEE International Conference on Granular Computing, pp. 432–437, 2012.
[7] J. Hu and C. Guan, "An emotional agent model based on granular computing," Mathematical Problems in Engineering, vol. 2012, Article ID 601295, 10 pages, 2012.
[8] T. Y. Lin, "Granular computing on binary relations I: data mining and neighborhood systems," Rough Sets in Knowledge Discovery, vol. 1, pp. 107–121, 1998.
[9] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," in Proceedings of SPIE, vol. 5098, pp. 138–146, April 2003.
[10] Y. Y. Yao, "Granular computing: basic issues and possible solutions," in Proceedings of the 5th Joint Conference on Information Sciences (JCIS '00), vol. 1, pp. 186–189, March 2000.
[11] W. Zhu and F.-Y. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, pp. 217–230, 2003.
[12] W. Zhu, "Generalized rough sets based on relations," Information Sciences, vol. 177, no. 22, pp. 4997–5011, 2007.
[13] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[14] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 167–179, 2014.
[15] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," SIGIR Forum, pp. 253–260, 2002.
[16] H. Guo, "Soap: live recommendations through social agents," in Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 1–17, 1997.
[17] M. Balabanovic and Y. Shoham, "Content-based, collaborative recommendation," Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.
[18] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[19] R. Burke, "Hybrid recommender systems: survey and experiments," User Modelling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.
[20] A. H. Dong, D. Shan, Z. Ruan, L. Y. Zhou, and F. Zuo, "The design and implementation of an intelligent apparel recommend expert system," Mathematical Problems in Engineering, vol. 2013, Article ID 343171, 8 pages, 2013.
[21] M. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321, pp. 325–341, Springer, 2007.
[22] W. Zhu, "Relationship among basic concepts in covering-based rough sets," Information Sciences, vol. 179, no. 14, pp. 2478–2486, 2009.
[23] Y. Y. Yao and X. F. Deng, "A granular computing paradigm for concept learning," in Emerging Paradigms in Machine Learning, vol. 13, pp. 307–326, Springer, Berlin, Germany, 2013.
[24] J. Yao, "Recent developments in granular computing: a bibliometrics study," in Proceedings of the IEEE International Conference on Granular Computing (GRC '08), pp. 74–79, August 2008.
[25] F. Aftrati, G. Das, A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, "Mining chains of relations," in Data Mining: Foundations and Intelligent Paradigms, vol. 24, pp. 217–246, Springer, 2012.
[26] L. Dehaspe and H. Toivonen, "Discovery of frequent datalog patterns," Expert Systems with Applications, vol. 3, no. 1, pp. 7–36, 1999.
[27] B. Goethals, W. Le Page, and H. Mannila, "Mining association rules of simple conjunctive queries," in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 96–107, April 2008.
[28] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.
[29] T. J. Li and W. X. Zhang, "Rough fuzzy approximations on two universes of discourse," Information Sciences, vol. 178, no. 3, pp. 892–906, 2008.
[30] G. Liu, "Rough set theory based on two universal sets and its applications," Knowledge-Based Systems, vol. 23, no. 2, pp. 110–115, 2010.
[31] S. Wong, L. Wang, and Y. Y. Yao, "Interval structure: a framework for representing uncertain information," in Uncertainty in Artificial Intelligence, pp. 336–343, 1993.
[32] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[33] A. Skowron and J. Stepaniuk, "Approximation of relations," in Proceedings of the Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, Ed., pp. 161–166, 1994.
[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Boston, Mass, USA, 1991.
[35] S. K. M. Wong, L. S. Wang, and Y. Y. Yao, "On modelling uncertainty with interval structures," Computational Intelligence, vol. 12, no. 2, pp. 406–426, 1995.
[36] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.
[37] H. X. Li, X. Z. Zhou, J. B. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Proceedings of the Rough Sets and Knowledge Technology, vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.
[38] D. Liu, T. Li, and D. Ruan, "Probabilistic model criteria with decision-theoretic rough sets," Information Sciences, vol. 181, no. 17, pp. 3709–3722, 2011.
[39] X. Y. Jia, W. H. Liao, Z. M. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.
[40] D. Liu, Y. Yao, and T. Li, "Three-way investment decisions with decision-theoretic rough sets," International Journal of Computational Intelligence Systems, vol. 4, no. 1, pp. 66–74, 2011.
[41] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.
[42] Z. T. Gong and B. Z. Sun, "Probability rough sets model between different universes and its applications," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 561–565, July 2008.
[43] W. Ma and B. Sun, "Probabilistic rough set over two universes and rough entropy," International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 608–619, 2012.
[44] B. Sun, W. Ma, H. Zhao, and X. Wang, "Probabilistic fuzzy rough set model over two universes," in Rough Sets and Current Trends in Computing, pp. 83–93, Springer, 2012.
[45] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logic," Intelligent Automation and Soft Computing, vol. 2, pp. 103–120, 1996.
[46] X. He, F. Min, and W. Zhu, "A comparative study of discretization approaches for granular association rule mining," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 725–729, 2013.
[47] F. Min, Q. Liu, and C. Fang, "Rough sets approach to symbolic value partition," International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 689–700, 2008.
[48] F. Min and W. Zhu, "Granular association rules for multi-valued data," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 799–803, 2013.


Mathematical Problems in Engineering 3

Table 1 A many-to-many entity-relationship system

(a) Customer

CID Name Age Gender Married Country Income NumCars1198881 Ron 20ndash29 Male No USA 60 kndash69 k 0-11198882 Michelle 20ndash29 Female Yes USA 80 kndash89 k 0-11198883 Shun 20ndash29 Male No China 40 kndash49 k 0-11198884 Yamago 30ndash39 Female Yes Japan 80 kndash89 k 21198885 Wang 30ndash39 Male Yes China 90 kndash99 k 2

(b) Product

PID Name Country Category Color Price1199011 Bread Australia Staple Black 1ndash91199012 Diaper China Daily White 1ndash91199013 Pork China Meat Red 1ndash91199014 Beef Australia Meat Red 10ndash191199015 Beer France Alcohol Black 10ndash191199016 Wine France Alcohol White 10ndash19

(c) Buys

CIDPID 1199011 1199012 1199013 1199014 1199015 1199016

1198881 1 1 0 1 1 01198882 1 0 0 1 0 11198883 0 1 0 0 1 11198884 0 1 0 1 1 01198885 1 0 0 1 1 1

Definition 4 A many-to-many entity-relationship system(MMER) is a 5-tuple ES = (119880119860 119881 119861 119877) where (119880 119860) and(119881 119861) are two information systems and 119877 sube 119880times119881 is a binaryrelation from 119880 to 119881

An example of MMER is given in Tables 1(a) 1(b) and1(c)

22 Granular Association Rules with Four Measures Now wecome to the central definition of granular association rules

Definition 5 A granular association rule is an implication ofthe form

(GR) ⋀119886isin1198601015840

⟨119886 119886 (119909)⟩ 997904rArr ⋀

119887isin1198611015840

⟨119887 119887 (119910)⟩ (9)

where 1198601015840 sube 119860 and 1198611015840 sube 119861

According to (6) the set of objects meeting the left-handside of the granular association rule is

LH (GR) = 1198641198601015840 (119909) (10)

while the set of objects meeting the right-hand side of thegranular association rule is

RH (GR) = 1198641198611015840 (119910) (11)

From the MMER given in Tables 1(a) 1(b) and 1(c) wemay obtain the following rule

Rule 1 One has

⟨Gender Male⟩ 997904rArr ⟨Category Alcohol⟩ (12)

Rule 1 can be read as ldquomen like alcoholrdquo There are someissues concerning the strength of the rule For example wemay ask the following questions on Rule 1

(1) How many customers are men(2) How many products are alcohol(3) Do all men like alcohol(4) Do all kinds of alcohol favor men

An example of complete granular association rules withmeasures specified is ldquo40 men like at least 30 kindsof alcohol 45 customers are men and 6 products arealcoholrdquo Here 45 6 40 and 30 are the sourcecoverage the target coverage the source confidence and thetarget confidence respectivelyThese measures are defined asfollows

The source coverage of a granular association rule is

$$\mathrm{scov}(\mathrm{GR}) = \frac{|\mathrm{LH}(\mathrm{GR})|}{|U|}. \qquad (13)$$

The target coverage of GR is

$$\mathrm{tcov}(\mathrm{GR}) = \frac{|\mathrm{RH}(\mathrm{GR})|}{|V|}. \qquad (14)$$


There is a tradeoff between the source confidence and the target confidence of a rule. Consequently, neither value can be obtained directly from the rule. To compute any one of them, we should specify the threshold of the other. Let tc be the target confidence threshold. The source confidence of the rule is

$$\mathrm{sconf}(\mathrm{GR}, tc) = \frac{\left|\left\{ x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| / |\mathrm{RH}(\mathrm{GR})| \ge tc \right\}\right|}{|\mathrm{LH}(\mathrm{GR})|}. \qquad (15)$$

Let sc be the source confidence threshold, and let K be the integer satisfying

$$\left|\left\{ x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| \ge K+1 \right\}\right| < sc \times |\mathrm{LH}(\mathrm{GR})| \le \left|\left\{ x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| \ge K \right\}\right|. \qquad (16)$$

This inequality means that at least sc × 100% of the elements in LH(GR) have connections with at least K elements in RH(GR), while fewer than sc × 100% of the elements in LH(GR) have connections with at least K + 1 elements in RH(GR). The target confidence of the rule is

$$\mathrm{tconf}(\mathrm{GR}, sc) = \frac{K}{|\mathrm{RH}(\mathrm{GR})|}. \qquad (17)$$

In fact, the computation of K is nontrivial. First, for any x ∈ LH(GR), we compute tc(x) = |R(x) ∩ RH(GR)| and obtain an array of integers. Second, we sort the array in descending order. Third, let k = ⌊sc × |LH(GR)|⌋; K is the kth element in the sorted array.
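As an illustration, the following Python sketch (ours, not part of the original paper) computes sconf in (15) and the integer K of (16)-(17) for the toy MMER of Table 1, with LH(GR) taken as the male customers and RH(GR) as the alcohol products; the names buys, lh, and rh are our own.

# Minimal sketch (ours): sconf (15) and tconf via K (16)-(17) on Table 1.
from math import floor

buys = {  # the relation R of Table 1(c): customer -> liked products
    "c1": {"p1", "p2", "p4", "p5"},
    "c2": {"p1", "p4", "p6"},
    "c3": {"p2", "p5", "p6"},
    "c4": {"p2", "p4", "p5"},
    "c5": {"p1", "p4", "p5", "p6"},
}
lh = {"c1", "c3", "c5"}   # LH(GR): men
rh = {"p5", "p6"}         # RH(GR): alcohol

def sconf(lh, rh, tc):
    """Fraction of LH(GR) whose liked products cover at least tc*100% of RH(GR)."""
    hit = [x for x in lh if len(buys[x] & rh) / len(rh) >= tc]
    return len(hit) / len(lh)

def tconf(lh, rh, sc):
    """K / |RH(GR)|, with K determined as described above."""
    counts = sorted((len(buys[x] & rh) for x in lh), reverse=True)
    k = max(floor(sc * len(lh)), 1)   # assumes sc * |LH(GR)| >= 1
    return counts[k - 1] / len(rh)

print(sconf(lh, rh, 0.5))   # 1.0: every man likes at least half of the alcohol products
print(tconf(lh, rh, 0.6))   # 1.0: K = 2, since 2 of the 3 men like both p5 and p6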

The relationships between rules are interesting to us. As an example, let us consider the following rule.

Rule 2. One has

$$\langle \mathrm{Gender}, \mathrm{Male} \rangle \wedge \langle \mathrm{Country}, \mathrm{China} \rangle \Rightarrow \langle \mathrm{Category}, \mathrm{Alcohol} \rangle \wedge \langle \mathrm{Country}, \mathrm{France} \rangle. \qquad (18)$$

Rule 2 can be read as "Chinese men like France alcohol." One may say that we can infer Rule 2 from Rule 1, since the former has a finer granule. However, with the four measures, we know that the relationship between these two rules is not so simple. A detailed explanation of Rule 2 might be "60% Chinese men like at least 50% kinds of France alcohol; 15% customers are Chinese men and 2% products are France alcohol." Compared with Rule 1, Rule 2 is stronger in terms of source/target confidence; however, it is weaker in terms of source/target coverage. Therefore, if we need rules covering more people and products, we prefer Rule 1; if we need more confidence in the rules, we prefer Rule 2. For example, if the source confidence threshold is 55%, Rule 2 might be valid while Rule 1 is not; if the source coverage threshold is 20%, Rule 1 might be valid while Rule 2 is not.

2.3. The Granular Association Rule Mining Problem. A straightforward rule mining problem is stated as follows.

Input: an ES = (U, A, V, B, R), a minimal source coverage threshold ms, a minimal target coverage threshold mt, a minimal source confidence threshold sc, and a minimal target confidence threshold tc.

Output: all granular association rules satisfying scov(GR) ≥ ms, tcov(GR) ≥ mt, sconf(GR) ≥ sc, and tconf(GR) ≥ tc.

Since both sc and tc are specified, we can choose either (15) or (17) to decide whether or not a rule satisfies these thresholds. Equation (15) is the better choice.

3. Parametric Rough Sets on Two Universes

In this section, we first review rough approximations [32] on one universe. Then we present rough approximations on two universes. Finally, we present parametric rough approximations on two universes. Some concepts are dedicated to granular association rules; we will explain in detail the way they are defined from a semantic point of view.

3.1. Classical Rough Sets. The classical rough sets [32, 34] are built upon lower and upper approximations on one universe. We adopt the ideas and notions introduced in [35] and define these concepts as follows.

Definition 6. Let U be a universe and R ⊆ U × U an indiscernibility relation. The lower and upper approximations of X ⊆ U with respect to R are

$$\underline{R}(X) = \{ x \in U \mid R(x) \subseteq X \}, \qquad (19)$$

$$\overline{R}(X) = \{ x \in U \mid R(x) \cap X \neq \emptyset \}, \qquad (20)$$

respectively.

These concepts can be employed for set approximation or classification analysis. For set approximation, the interval $[\underline{R}(X), \overline{R}(X)]$ is called the rough set of X, which provides an approximate characterization of X by the objects that share the same description as its members [35]. For classification analysis, the lower approximation operator helps find certain rules, while the upper approximation helps find possible rules.

3.2. Existing Parametric Rough Sets. Ziarko [28] pointed out that classical rough sets cannot handle classification with a controlled degree of uncertainty or misclassification error. Consequently, he proposed variable precision rough sets [28] with a parameter to indicate the admissible classification error.

Let X and Y be nonempty subsets of a finite universe U and Y ⊆ X. The measure c(X, Y) of the relative degree of misclassification of the set X with respect to the set Y is defined as

$$c(X, Y) = \begin{cases} 1 - \dfrac{|X \cap Y|}{|X|}, & |X| > 0, \\ 0, & |X| = 0, \end{cases} \qquad (21)$$

where |·| denotes set cardinality.


The equivalence relation R corresponds to a partitioning of U into a collection of equivalence classes, or elementary sets, R* = {E_1, E_2, ..., E_n}. The α-lower and α-upper approximations of the set X ⊆ U are defined as

$$\underline{R}_\alpha(X) = \bigcup \{ E \in R^* \mid c(E, X) \le \alpha \},$$
$$\overline{R}_\alpha(X) = \bigcup \{ E \in R^* \mid c(E, X) < 1 - \alpha \}, \qquad (22)$$

where 0 ≤ α < 0.5. For the convenience of discussion, we rewrite his definition as follows.

Definition 7. Let U be a universe and R ⊆ U × U an indiscernibility relation. The lower and upper approximations of X ⊆ U with respect to R under precision β are

$$\underline{R}_\beta(X) = \left\{ x \in U \,\middle|\, \frac{|R(x) \cap X|}{|R(x)|} \ge \beta \right\},$$
$$\overline{R}_\beta(X) = \left\{ x \in U \,\middle|\, \frac{|R(x) \cap X|}{|R(x)|} > 1 - \beta \right\}, \qquad (23)$$

respectively, where R(x) is the equivalence class containing x.

Note that 0.5 < β ≤ 1 indicates the classification accuracy (precision) threshold rather than the misclassification error threshold employed in [28]. Wong et al. [31] extended the definition to an arbitrary binary relation that is at least serial.
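The following Python sketch (ours, for illustration only) applies (23) to a made-up partition; the partition and the set X are toy data, not from the paper.

# Sketch (ours): the beta-lower and beta-upper approximations of (23).
def vprs_approximations(partition, X, beta):
    """partition: list of disjoint blocks (equivalence classes) covering the universe."""
    lower, upper = set(), set()
    for block in partition:              # R(x) is the block containing x
        ratio = len(block & X) / len(block)
        if ratio >= beta:
            lower |= block
        if ratio > 1 - beta:
            upper |= block
    return lower, upper

partition = [{1, 2}, {3, 4, 5}, {6}]     # toy equivalence classes
X = {1, 2, 3, 6}
print(vprs_approximations(partition, X, 0.6))   # ({1, 2, 6}, {1, 2, 6})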

Ziarko [28] introduced variable precision rough sets with a parameter, but the model offered little explanation of how the parameter should be chosen. Yao and Wong [36] studied the conditions on the parameter and introduced the decision theoretic rough set (DTRS) model. The DTRS model is a probabilistic rough set model in the framework of Bayesian decision theory. It requires a pair of thresholds (α, β) instead of only one as in variable precision rough sets. Its main advantage is its solid foundation in Bayesian decision theory: (α, β) can be systematically computed by minimizing the overall ternary classification cost [36]. Therefore, this theory has drawn much research interest in both theory (see, e.g., [37, 38]) and application (see, e.g., [39–41]).

Gong and Sun [42] first defined the concept of the probabilistic rough set over two universes. Ma and Sun [43] presented in detail the parameter dependence, or continuity, of the lower and upper approximations with respect to the two parameters α and β for every type of probabilistic rough set model over two universes. Moreover, a new model of probabilistic fuzzy rough sets over two universes [44] has been proposed. These theories have broad application prospects.

3.3. Rough Sets on Two Universes for Granular Association Rules. Since our data model is concerned with two universes, we should consider computation models for this type of data. Rough sets on two universes have been defined in [31], and some later works adopt the same definitions (see, e.g., [29, 30]). We will present our definitions, which cope with granular association rules, and then discuss why they are different from existing ones.

Definition 8. Let U and V be two universes and R ⊆ U × V a binary relation. The lower and upper approximations of X ⊆ U with respect to R are

$$\underline{R}(X) = \{ y \in V \mid R^{-1}(y) \supseteq X \}, \qquad (24)$$

$$\overline{R}(X) = \{ y \in V \mid R^{-1}(y) \cap X \neq \emptyset \}, \qquad (25)$$

respectively.

From this definition, we know immediately that, for Y ⊆ V,

$$\underline{R}^{-1}(Y) = \{ x \in U \mid R(x) \supseteq Y \},$$
$$\overline{R}^{-1}(Y) = \{ x \in U \mid R(x) \cap Y \neq \emptyset \}. \qquad (26)$$

Now we explain these notions through our example. $\underline{R}(X)$ contains products that favor all people in X; $\underline{R}^{-1}(Y)$ contains people who like all products in Y; $\overline{R}(X)$ contains products that favor at least one person in X; and $\overline{R}^{-1}(Y)$ contains people who like at least one product in Y.
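These two-universe approximations can be computed directly from the Buys relation of Table 1(c). The following Python sketch is ours and only illustrative; the helper names buys, inverse, lower, and upper are not from the paper.

# Sketch (ours): the approximations of Definition 8 on the Table 1 data.
buys = {  # R: customer -> set of liked products (Table 1(c))
    "c1": {"p1", "p2", "p4", "p5"},
    "c2": {"p1", "p4", "p6"},
    "c3": {"p2", "p5", "p6"},
    "c4": {"p2", "p4", "p5"},
    "c5": {"p1", "p4", "p5", "p6"},
}
V = {"p1", "p2", "p3", "p4", "p5", "p6"}

def inverse(y):
    """R^{-1}(y): the customers who like product y."""
    return {x for x, prods in buys.items() if y in prods}

def lower(X):
    """(24): products liked by every person in X."""
    return {y for y in V if inverse(y) >= X}

def upper(X):
    """(25): products liked by at least one person in X."""
    return {y for y in V if inverse(y) & X}

men = {"c1", "c3", "c5"}
print(lower(men))   # {'p5'}: the only product that all three men like
print(upper(men))   # every product except 'p3'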

We have the following property concerning the monotonicity of these approximations.

Property 1. Let X_1 ⊂ X_2. Then

$$\underline{R}(X_1) \supseteq \underline{R}(X_2), \qquad (27)$$

$$\overline{R}(X_1) \subseteq \overline{R}(X_2). \qquad (28)$$

That is, as the object subset grows, the lower approximation shrinks while the upper approximation grows. It may seem odd to people in the rough set community that the lower approximation shrinks in this case. In fact, according to Wong et al. [31], Liu [30], and Sun et al. [44], (24) should be rewritten as

$$\underline{R}'(X) = \{ y \in V \mid R^{-1}(y) \subseteq X \}, \qquad (29)$$

where the prime is employed to distinguish between the two definitions. In this way, $\underline{R}'(X_1) \subseteq \underline{R}'(X_2)$. Moreover, it would coincide with (19) when U = V.

We argue that different definitions of the lower approximation are appropriate for different applications. Suppose that there is a clinic system where U is the set of all symptoms and V is the set of all diseases [31], and X ⊆ U is a set of symptoms. According to (29), $\underline{R}'(X)$ contains diseases that are induced only by symptoms in X. That is, if a patient has no symptom in X, she never has any disease in $\underline{R}'(X)$. This type of rule is natural and useful.

In our example presented in Section 2, X ⊆ U is a group of people, and $\underline{R}'(X)$ contains products that favor people only in X. That is, if a person does not belong to X, she never likes products in $\underline{R}'(X)$. Unfortunately, this type of rule is not interesting to us. This is why we employ (24) for the lower approximation.

We are looking for very strong rules through the lower approximation indicated in Definition 8, for example, "all men like all alcohol." This kind of rule is called a complete match rule [6]. However, such rules seldom exist in applications.


On the other hand, we are looking for very weak rules through the upper approximation, for example, "at least one man likes at least one kind of alcohol." Another extreme example is "all people like at least one kind of product," which holds for any data set. Therefore, this type of rule is useless. These issues will be addressed through a more general model in the next section.

3.4. Parametric Rough Sets on Two Universes for Granular Association Rules. Given a group of people, the number of products that favor all of them is often quite small. On the other hand, the set of products that favor at least one of them is not quite meaningful. Similar to probabilistic rough sets, we need to introduce one or more parameters to the model.

To cope with the source confidence measure introduced in Section 2.2, we propose the following definition.

Definition 9. Let U and V be two universes, R ⊆ U × V a binary relation, and 0 < β ≤ 1 a user-specified threshold. The lower approximation of X ⊆ U with respect to R for threshold β is

$$\underline{R}_\beta(X) = \left\{ y \in V \,\middle|\, \frac{|R^{-1}(y) \cap X|}{|X|} \ge \beta \right\}. \qquad (30)$$

We do not discuss the upper approximation in the new context due to its lack of semantics.

From this definition, we know immediately that the lower approximation of Y ⊆ V with respect to R is

$$\underline{R}^{-1}_\beta(Y) = \left\{ x \in U \,\middle|\, \frac{|R(x) \cap Y|}{|Y|} \ge \beta \right\}. \qquad (31)$$

Here, in (31), β corresponds to the target confidence instead. In our example, $\underline{R}_\beta(X)$ contains the products that favor at least β × 100% of the people in X, and $\underline{R}^{-1}_\beta(Y)$ contains the people who like at least β × 100% of the products in Y.
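The parametric lower approximation is easy to compute on the toy data. The sketch below is ours (the names buys, V, inverse, and lower_beta are not from the paper); it illustrates Definition 9 and its two limiting cases.

# Sketch (ours): the parametric lower approximation of Definition 9 on Table 1.
buys = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p5", "p6"},       "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p4", "p5", "p6"}}
V = {"p1", "p2", "p3", "p4", "p5", "p6"}

def inverse(y):
    return {x for x, prods in buys.items() if y in prods}

def lower_beta(X, beta):
    """(30): products liked by at least beta*100% of the people in X."""
    return {y for y in V if len(inverse(y) & X) / len(X) >= beta}

men = {"c1", "c3", "c5"}
print(lower_beta(men, 1.0))     # {'p5'}: coincides with the lower approximation (24)
print(lower_beta(men, 2 / 3))   # all products except 'p3': each is liked by >= 2 of the 3 men
print(lower_beta(men, 1e-9))    # coincides with the upper approximation (25)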

The following property indicates that $\underline{R}_\beta(X)$ is a generalization of both $\underline{R}(X)$ and $\overline{R}(X)$.

Property 2. Let U and V be two universes and R ⊆ U × V a binary relation. Then

$$\underline{R}_1(X) = \underline{R}(X), \qquad (32)$$

$$\underline{R}_\varepsilon(X) = \overline{R}(X), \qquad (33)$$

where ε ≈ 0 is a sufficiently small positive number.

Proof. One has

$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge 1 \iff |R^{-1}(y) \cap X| = |X| \iff R^{-1}(y) \supseteq X,$$
$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge \varepsilon \iff |R^{-1}(y) \cap X| > 0 \iff R^{-1}(y) \cap X \neq \emptyset. \qquad (34)$$

The following property shows the monotonicity of $\underline{R}_\beta(X)$.

Property 3. Let 0 < β_1 < β_2 ≤ 1. Then

$$\underline{R}_{\beta_2}(X) \subseteq \underline{R}_{\beta_1}(X). \qquad (35)$$

However, given X_1 ⊂ X_2, we obtain neither $\underline{R}_\beta(X_1) \subseteq \underline{R}_\beta(X_2)$ nor $\underline{R}_\beta(X_1) \supseteq \underline{R}_\beta(X_2)$ in general. The relationship between $\underline{R}_\beta(X_1)$ and $\underline{R}_\beta(X_2)$ depends on β. Generally, if β is big, $\underline{R}_\beta(X_1)$ tends to be bigger; otherwise, it tends to be smaller. Equation (27) indicates the extreme case for β = 1, and (28) indicates the other extreme case for β ≈ 0.

Here β is the coverage of Y (or X) by R(x) (or R^{-1}(y)); it does not denote the precision of an approximation. This is why we call this model parametric rough sets instead of variable precision rough sets [28] or probabilistic rough sets [45].

Similar to the discussion in Section 3.3, in some cases we would like to employ the following definition:

$$\underline{R}'_\beta(X) = \left\{ y \in V \,\middle|\, \frac{|R^{-1}(y) \cap X|}{|R^{-1}(y)|} \ge \beta \right\}. \qquad (36)$$

It coincides with $\underline{R}_\beta(X)$ defined in Definition 7 if U = V. Taking the clinic system again as the example, $\underline{R}'_\beta(X)$ is the set of diseases that are caused mainly (with a probability no less than β × 100%) by symptoms in X.

4. A Backward Algorithm to Granular Association Rule Mining

In our previous work [5], we proposed an algorithm based on (15). That algorithm starts from both sides of the rule and checks the validity of all candidate rules; therefore, it was named a sandwich algorithm.

To make use of the concept proposed in Section 3, we rewrite (15) as follows:

$$\begin{aligned}
\mathrm{sconf}(\mathrm{GR}, tc) &= \frac{\left|\left\{ x \in \mathrm{LH}(\mathrm{GR}) \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| / |\mathrm{RH}(\mathrm{GR})| \ge tc \right\}\right|}{|\mathrm{LH}(\mathrm{GR})|} \\
&= \frac{\left|\left\{ x \in U \mid |R(x) \cap \mathrm{RH}(\mathrm{GR})| / |\mathrm{RH}(\mathrm{GR})| \ge tc \right\} \cap \mathrm{LH}(\mathrm{GR})\right|}{|\mathrm{LH}(\mathrm{GR})|} \\
&= \frac{\left| \underline{R}^{-1}_{tc}(\mathrm{RH}(\mathrm{GR})) \cap \mathrm{LH}(\mathrm{GR}) \right|}{|\mathrm{LH}(\mathrm{GR})|}.
\end{aligned} \qquad (37)$$

With this equation, we propose an algorithm to deal with the rule mining problem of Section 2.3. The algorithm is listed as Algorithm 1. It essentially has four steps.

Step 1. Search (U, A) for all granules meeting the minimal source coverage threshold ms. This step corresponds to Line 1 of the algorithm, where SG stands for source granule.

Step 2. Search (V, B) for all granules meeting the minimal target coverage threshold mt. This step corresponds to Line 2 of the algorithm, where TG stands for target granule.


Input: ES = (U, A, V, B, R), ms, mt, tc.
Output: all complete match granular association rules satisfying the given constraints.
Method: complete-match-rules-backward.

(1) SG(ms) = {(A', x) ∈ 2^A × U | |E_{A'}(x)| / |U| ≥ ms};   // candidate source granules
(2) TG(mt) = {(B', y) ∈ 2^B × V | |E_{B'}(y)| / |V| ≥ mt};   // candidate target granules
(3) for each g' ∈ TG(mt) do
(4)     Y = e(g');
(5)     X = R^{-1}_{tc}(Y);
(6)     for each g ∈ SG(ms) do
(7)         if e(g) ⊆ X then
(8)             output rule i(g) ⇒ i(g');
(9)         end if
(10)    end for
(11) end for

Algorithm 1: The backward algorithm.
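Algorithm 1 translates almost line for line into the Python sketch below. The sketch is ours: candidate granules are restricted to single attribute-value pairs for brevity (the paper enumerates attribute subsets), and the test of Line 7 is written as the ratio test of (37) so that a general source confidence threshold sc can be used; with sc = 1 it reduces to the subset test e(g) ⊆ X.

# Sketch (ours) of Algorithm 1 on the Table 1 data; granules are limited to
# single attribute-value pairs for brevity.
customer = {"c1": {"Gender": "Male"},   "c2": {"Gender": "Female"},
            "c3": {"Gender": "Male"},   "c4": {"Gender": "Female"},
            "c5": {"Gender": "Male"}}
product  = {"p1": {"Category": "Staple"},  "p2": {"Category": "Daily"},
            "p3": {"Category": "Meat"},    "p4": {"Category": "Meat"},
            "p5": {"Category": "Alcohol"}, "p6": {"Category": "Alcohol"}}
buys = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p5", "p6"},       "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p4", "p5", "p6"}}

def granules(table, threshold):
    """(attribute, value) pairs whose block covers at least `threshold` of the table."""
    blocks = {}
    for oid, row in table.items():
        for a, v in row.items():
            blocks.setdefault((a, v), set()).add(oid)
    return {g: b for g, b in blocks.items() if len(b) / len(table) >= threshold}

def backward(ms, mt, sc, tc):
    SG = granules(customer, ms)                        # Line 1
    TG = granules(product, mt)                         # Line 2
    for gp, Y in TG.items():                           # Lines 3-4
        X = {x for x in customer                       # Line 5: R^{-1}_{tc}(Y)
             if len(buys[x] & Y) / len(Y) >= tc}
        for g, block in SG.items():                    # Line 6
            if len(block & X) / len(block) >= sc:      # Line 7 (ratio test of (37))
                print(g, "=>", gp)                     # Line 8

backward(ms=0.3, mt=0.3, sc=0.6, tc=0.5)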

Step 3. For each granule obtained in Step 2, construct a block in U according to R. This step corresponds to Lines 4 and 5 of the algorithm. The function e has been defined in (5). According to Definition 9, in our example $\underline{R}^{-1}_{tc}(Y)$ is the set of people who like at least tc × 100% of the products in Y.

Step 4. Check possible rules regarding g' and X and output all valid rules. This step corresponds to Lines 6 through 10 of the algorithm. In Line 7, since e(g) and X can be stored in sorted arrays, the complexity of checking e(g) ⊆ X is

$$O(|e(g)| + |X|) = O(|U|), \qquad (38)$$

where |·| denotes the cardinality of a set.
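The linear-time bound in (38) comes from a single merge-style scan over the two sorted arrays; the following sketch (ours) shows one way such a check can be coded.

# Sketch (ours): checking e(g) ⊆ X in O(|e(g)| + |X|) time over sorted arrays.
def sorted_subset(small, big):
    """True iff every element of the sorted list `small` occurs in the sorted list `big`."""
    i = 0
    for v in small:
        while i < len(big) and big[i] < v:
            i += 1                       # the pointer into `big` only moves forward
        if i == len(big) or big[i] != v:
            return False
    return True

print(sorted_subset([2, 5, 9], [1, 2, 3, 5, 8, 9]))   # True
print(sorted_subset([2, 4], [1, 2, 3, 5]))            # False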

Because the algorithm starts from the right-hand side of the rule and proceeds to the left-hand side, it is called a backward algorithm. It is necessary to compare the time complexities of the existing sandwich algorithm and our new backward algorithm. Both algorithms share Steps 1 and 2, which do not incur the pattern explosion problem; therefore, we focus on the remaining steps. The time complexity of the sandwich algorithm is [5]

$$O(|SG(ms)| \times |TG(mt)| \times |U| \times |V|), \qquad (39)$$

where |·| denotes the cardinality of a set. According to the loops, the time complexity of Algorithm 1 is

$$O(|SG(ms)| \times |TG(mt)| \times |U|), \qquad (40)$$

which is lower than that of the sandwich algorithm.

Intuitively, the backward algorithm avoids recomputing R(x) ∩ RH(GR) for different rules with the same right-hand side. Hence, it should be less time consuming than the sandwich algorithm. We will compare the run times of these algorithms in the next section through experimentation.

The space complexities of the two algorithms are also important. To store the relation R, a |U| × |V| Boolean matrix is needed.
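For concreteness, the sketch below (ours) stores the relation as a NumPy Boolean matrix; the 6040 × 3952 shape matches the MovieLens data set described in Section 5, and any other encoding with the same O(|U| × |V|) footprint would do.

# Sketch (ours): storing R as a |U| x |V| Boolean matrix.
import numpy as np

num_users, num_movies = 6040, 3952            # the MovieLens sizes used in Section 5
R = np.zeros((num_users, num_movies), dtype=bool)
R[0, 5] = True                                # user 0 rated movie 5
row_sizes = R.sum(axis=1)                     # |R(x)| for every user x
col_sizes = R.sum(axis=0)                     # |R^{-1}(y)| for every movie y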

5. Experiments on Two Real-World Data Sets

The main purpose of our experiments is to answer the following questions.

(1) Does the backward algorithm outperform the sandwich algorithm?
(2) How does the number of rules change for different numbers of objects?
(3) How does the algorithm run time change for different numbers of objects?
(4) How does the number of rules vary for different thresholds?
(5) How does the performance of cold-start recommendation vary on the training and testing sets?

5.1. Data Sets. We collected two real-world data sets for experimentation: one on course selection and the other on movie rating. These data sets are quite representative of practical applications.

5.1.1. A Course Selection Data Set. The course selection system often serves as an example in textbooks to explain the concept of many-to-many entity-relationship diagrams. Hence, it is appropriate for producing meaningful granular association rules and testing the performance of our algorithm. We obtained a data set from the course selection system of Minnan Normal University (the authors would like to thank Mrs. Chunmei Zhou for her help in the data collection). Specifically, we collected data during the 2011-2012 semester. There are 145 general education courses in the university, and 9654 students took part in course selection. The database schema is as follows:

(i) Student (student ID, name, gender, birth-year, politics-status, grade, department, nationality, and length of schooling);


(ii) Course (course ID, credit, class-hours, availability, and department);

(iii) Selects (student ID, course ID).

Our algorithm supports only nominal data at this time. For this data set, all attributes are viewed as nominal and used directly; in this way, no discretization approach is needed to convert numeric attributes into nominal ones. We also removed student names and course names from the original data since they are useless for generating meaningful rules.

5.1.2. A Movie Rating Data Set. The MovieLens data set assembled by the GroupLens project is widely used in recommender systems (see, e.g., [15, 16]; GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota, http://www.grouplens.org). We downloaded the data set from the MovieLens website (http://www.movielens.org). Originally, the data set contained 100,000 ratings (1–5) from 943 users on 1682 movies, with each user rating at least 20 movies [15]. Currently, the available data set contains 1,000,209 anonymous ratings of 3952 movies made by 6040 MovieLens users who joined MovieLens in 2000. In order to run our algorithm, we preprocessed the data set as follows.

(1) Remove movie names. They are not useful for generating meaningful granular association rules.

(2) Use the release year instead of the release date. In this way, the granule is more reasonable.

(3) Select a single movie genre. In the original data, the movie genre is multivalued, since one movie may fall in more than one genre; for example, a movie can be both animation and children's. Unfortunately, granular association rules do not support this type of data at this time. Since the main objective of this work is to compare the performance of algorithms, we use a simple approach to deal with this issue: we sort movie genres according to the number of users they attract and keep only the highest-priority genre for each movie. We adopt the following priority (from high to low): comedy, action, thriller, romance, adventure, children, crime, sci-fi, horror, war, mystery, musical, documentary, animation, western, film-noir, fantasy, and unknown.

Our database schema is as follows:

(i) User (user ID, age, gender, and occupation);
(ii) Movie (movie ID, release year, and genre);
(iii) Rates (user ID, movie ID).

There are 8 user age intervals, 21 occupations, and 71 release years. Similar to the course selection data set, all these data are viewed as nominal and processed directly. We employ neither discretization nor symbolic value partition [46, 47] approaches to produce coarser granules. The genre is a multivalued attribute; therefore, we scale it to 18 Boolean attributes and deal with it using the approach proposed in [48].

5.2. Results. We undertake five sets of experiments to answer the questions proposed at the beginning of this section.

5.2.1. Efficiency Comparison. We compare the efficiencies of the backward and the sandwich algorithms. We consider only the run time of Lines 3 through 11, since this code is the difference between the two algorithms.

For the course selection data set, when ms = mt = 0.06, sc = 0.18, and tc = 0.11, we obtain only 40 rules; for higher thresholds, no rule can be obtained. Therefore, we use the following settings: sc = 0.18, tc = 0.11, ms = mt, and ms ∈ {0.02, 0.03, 0.04, 0.05, 0.06}. Figure 1(a) shows the actual run time in milliseconds, and Figure 2(a) shows the number of basic operations, including additions and comparisons of numbers. Here we observe that, for the different settings, the backward algorithm is more than 2 times faster than the sandwich algorithm; it takes less than 1/3 of the operations of the sandwich algorithm.

For the MovieLens data set, we employ the version with 3800 users and 3952 movies. We use the following settings: sc = 0.15, tc = 0.14, ms = mt, and ms ∈ {0.01, 0.02, 0.03, 0.04, 0.05}. Figure 1(b) shows the actual run time in milliseconds, and Figure 2(b) shows the number of basic operations. Here we observe that, for the different settings, the backward algorithm is more than 3 times faster than the sandwich algorithm; it takes less than 1/4 of the operations of the sandwich algorithm.

5.2.2. Change of Number of Rules for Different Data Set Sizes. Now we study how the number of rules changes with the increase of the data set size. The experiments are undertaken only on the MovieLens data set. We use the following settings: sc = 0.15, tc = 0.14, ms ∈ {0.01, 0.02}, and |U| ∈ {500, 1000, 1500, 2000, 2500, 3000, 3500}. The number of movies is always 3952. When selecting k users, we always select from the first user to the kth user.

First, we look at the number of concepts satisfying the source coverage threshold ms. According to Figure 3(a), the number of source concepts decreases with the increase of the number of users. However, Figure 3(b) indicates that this trend may not always hold. In fact, from Figure 3, the most important observation is that the number of source concepts does not vary much with the change of the number of objects. When the number of users is more than 1500, this variation is no more than 3, which is less than 5% of the total number of concepts.

Second, we look at the number of granular association rules satisfying all four thresholds. Figure 4 indicates that the number of rules varies more than the number of source concepts. However, this variation is less than 20% when there are more than 1500 users, and less than 10% when there are more than 2500 users.

5.2.3. Change of Run Time for Different Data Set Sizes. We now look at how the run time changes with the increase of the number of users. The time complexity of the algorithm is given by (40). Since the number of movies does not change, |TG(mt)| is fixed.


Figure 1: Run time information: (a) course selection; (b) MovieLens (3800 users). (Run time in ms versus ms (mt), for the sandwich and backward algorithms.)

Figure 2: Basic operations information: (a) course selection; (b) MovieLens (3800 users). (Number of basic operations versus ms (mt), for the sandwich and backward algorithms.)

Moreover, according to our earlier discussion, |SG(ms)| does not vary much for different numbers of users. Therefore, the time complexity is nearly linear with respect to the number of users. Figure 5 validates this analysis.

5.2.4. Number of Rules for Different Thresholds. Figure 6 shows that the number of rules decreases dramatically with the increase of ms and mt. For the course selection data set, the number of rules would be 0 when ms = mt = 0.07. For the MovieLens data set, the number of rules would be 0 when ms = mt = 0.06.

5.2.5. Performance of Cold-Start Recommendation. Now we study how the performance of cold-start recommendation varies on the training and testing sets. The experiments are undertaken only on the MovieLens data set. Here we employ two data sets: one with 1000 users and 3952 movies and the other with 3000 users and 3952 movies.


Figure 3: Number of concepts on users for MovieLens: (a) ms = 0.01; (b) ms = 0.02. (Number of source granules versus the number of users.)

Figure 4: Number of granular association rules for MovieLens: (a) ms = 0.01; (b) ms = 0.02. (Number of rules versus the number of users.)

We divide the users into two parts, a training set and a testing set. The other settings are as follows: the training set percentage is 60%, sc = 0.15, and tc = 0.14. Each experiment is repeated 30 times with different samplings of the training and testing sets, and the average accuracy is computed.

Figure 7 shows that the recommendation performance varies with ms and mt. We observe some especially interesting phenomena from this figure, as follows.

(1) Figures 7(a) and 7(b) are similar in general. As the number of users increases, the accuracy of the recommender also increases. The reason is that we obtain more information from the training set to generate better rules for recommendation.

(2) When ms and mt are suitably set (0.03), the recommender achieves its maximal accuracy on both data sets. As these thresholds increase or decrease from this value, the performance of the recommendation changes rapidly.

(3) The performance of the recommendation does not change much between the training and the testing sets. This indicates that the recommender is stable.

5.3. Discussions. Now we can answer the questions proposed at the beginning of this section.

(1) The backward algorithm outperforms the sandwich algorithm. The backward algorithm is more than 2 times and 3 times faster than the sandwich algorithm on the course selection and MovieLens data sets, respectively. Therefore, our parametric rough sets on two universes are useful in applications.


Figure 5: Run time on MovieLens: (a) ms = 0.01; (b) ms = 0.02. (Run time in ms versus the number of users.)

Figure 6: Number of granular association rules: (a) course selection; (b) MovieLens. (Number of rules versus ms (mt).)

Figure 7: Accuracy of cold-start recommendation on MovieLens for the training and testing sets: (a) 1000 users and 3952 movies; (b) 3000 users and 3952 movies. (Accuracy versus ms (mt).)


(2) The number of rules does not change much for different numbers of objects. Therefore, it is not necessary to collect very large amounts of data to obtain meaningful granular association rules; for example, for the MovieLens data set, 3000 users are quite enough.

(3) The run time is nearly linear with respect to the number of objects. Therefore, the algorithm is scalable from the viewpoint of time complexity. However, we observe that the relation table might be rather big; this could be a bottleneck of the algorithm.

(4) The number of rules decreases dramatically with the increase of the thresholds ms and mt. It is important to specify appropriate thresholds to obtain useful rules.

(5) The performance of cold-start recommendation is stable on the training and the testing sets with the increase of the thresholds ms and mt.

6. Conclusions

In this paper, we have proposed a new type of parametric rough sets on two universes to deal with the granular association rule mining problem. The lower approximation operator has been defined, and its monotonicity has been analyzed. With the help of the new model, a backward algorithm for the granular association rule mining problem has been proposed. Experimental results on two real-world data sets indicate that the new algorithm is significantly faster than the existing sandwich algorithm, and the performance of recommender systems is stable on the training and the testing sets. To sum up, this work applies rough set theory to recommender systems and is one step toward the broader application of rough set theory and granular computing. In the future, we will improve our approach and compare its performance with other recommendation approaches.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported in part by the National Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128; the Fujian Province Foundation of Higher Education under Grant no. JK2012028; the Key Project of the Education Department of Fujian Province under Grant no. JA13192; and the Postgraduate Education Innovation Base for Computer Application Technology, Signal and Information Processing of Fujian Province (no. [2008]114, High Education of Fujian).

References

[1] L. Dehaspe, H. Toivonen, and R. D. King, "Finding frequent substructures in chemical compounds," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 30–36, 1998.
[2] B. Goethals, W. Le Page, and M. Mampaey, "Mining interesting sets and rules in relational databases," in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 997–1001, March 2010.
[3] S. Dzeroski, "Multi-relational data mining: an introduction," SIGKDD Explorations, vol. 5, pp. 1–16, 2003.
[4] V. C. Jensen and N. Soparkar, "Frequent itemset counting across multiple tables," in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications, vol. 1805 of Lecture Notes in Computer Science, pp. 49–61, 2000.
[5] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules on two universes with four measures," 2012, http://arxiv.org/abs/1209.5598.
[6] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules with four subtypes," in Proceedings of the IEEE International Conference on Granular Computing, pp. 432–437, 2012.
[7] J. Hu and C. Guan, "An emotional agent model based on granular computing," Mathematical Problems in Engineering, vol. 2012, Article ID 601295, 10 pages, 2012.
[8] T. Y. Lin, "Granular computing on binary relations I: data mining and neighborhood systems," Rough Sets in Knowledge Discovery, vol. 1, pp. 107–121, 1998.
[9] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," in Proceedings of SPIE, vol. 5098, pp. 138–146, April 2003.
[10] Y. Y. Yao, "Granular computing: basic issues and possible solutions," in Proceedings of the 5th Joint Conference on Information Sciences (JCIS '00), vol. 1, pp. 186–189, March 2000.
[11] W. Zhu and F.-Y. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, pp. 217–230, 2003.
[12] W. Zhu, "Generalized rough sets based on relations," Information Sciences, vol. 177, no. 22, pp. 4997–5011, 2007.
[13] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[14] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 167–179, 2014.
[15] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," SIGIR Forum, pp. 253–260, 2002.
[16] H. Guo, "Soap: live recommendations through social agents," in Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 1–17, 1997.
[17] M. Balabanovic and Y. Shoham, "Content-based, collaborative recommendation," Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.
[18] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[19] R. Burke, "Hybrid recommender systems: survey and experiments," User Modeling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.
[20] A. H. Dong, D. Shan, Z. Ruan, L. Y. Zhou, and F. Zuo, "The design and implementation of an intelligent apparel recommend expert system," Mathematical Problems in Engineering, vol. 2013, Article ID 343171, 8 pages, 2013.


[21] M. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321, pp. 325–341, Springer, 2007.
[22] W. Zhu, "Relationship among basic concepts in covering-based rough sets," Information Sciences, vol. 179, no. 14, pp. 2478–2486, 2009.
[23] Y. Y. Yao and X. F. Deng, "A granular computing paradigm for concept learning," in Emerging Paradigms in Machine Learning, vol. 13, pp. 307–326, Springer, Berlin, Germany, 2013.
[24] J. Yao, "Recent developments in granular computing: a bibliometrics study," in Proceedings of the IEEE International Conference on Granular Computing (GRC '08), pp. 74–79, August 2008.
[25] F. Aftrati, G. Das, A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, "Mining chains of relations," in Data Mining: Foundations and Intelligent Paradigms, vol. 24, pp. 217–246, Springer, 2012.
[26] L. Dehaspe and H. Toivonen, "Discovery of frequent datalog patterns," Expert Systems with Applications, vol. 3, no. 1, pp. 7–36, 1999.
[27] B. Goethals, W. Le Page, and H. Mannila, "Mining association rules of simple conjunctive queries," in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 96–107, April 2008.
[28] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.
[29] T. J. Li and W. X. Zhang, "Rough fuzzy approximations on two universes of discourse," Information Sciences, vol. 178, no. 3, pp. 892–906, 2008.
[30] G. Liu, "Rough set theory based on two universal sets and its applications," Knowledge-Based Systems, vol. 23, no. 2, pp. 110–115, 2010.
[31] S. Wong, L. Wang, and Y. Y. Yao, "Interval structure: a framework for representing uncertain information," in Uncertainty in Artificial Intelligence, pp. 336–343, 1993.
[32] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[33] A. Skowron and J. Stepaniuk, "Approximation of relations," in Proceedings of Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, Ed., pp. 161–166, 1994.
[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Boston, Mass, USA, 1991.
[35] S. K. M. Wong, L. S. Wang, and Y. Y. Yao, "On modelling uncertainty with interval structures," Computational Intelligence, vol. 12, no. 2, pp. 406–426, 1995.
[36] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.
[37] H. X. Li, X. Z. Zhou, J. B. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Proceedings of Rough Sets and Knowledge Technology, vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.
[38] D. Liu, T. Li, and D. Ruan, "Probabilistic model criteria with decision-theoretic rough sets," Information Sciences, vol. 181, no. 17, pp. 3709–3722, 2011.
[39] X. Y. Jia, W. H. Liao, Z. M. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.
[40] D. Liu, Y. Yao, and T. Li, "Three-way investment decisions with decision-theoretic rough sets," International Journal of Computational Intelligence Systems, vol. 4, no. 1, pp. 66–74, 2011.
[41] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.
[42] Z. T. Gong and B. Z. Sun, "Probability rough sets model between different universes and its applications," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 561–565, July 2008.
[43] W. Ma and B. Sun, "Probabilistic rough set over two universes and rough entropy," International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 608–619, 2012.
[44] B. Sun, W. Ma, H. Zhao, and X. Wang, "Probabilistic fuzzy rough set model over two universes," in Rough Sets and Current Trends in Computing, pp. 83–93, Springer, 2012.
[45] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logic," Intelligent Automation and Soft Computing, vol. 2, pp. 103–120, 1996.
[46] X. He, F. Min, and W. Zhu, "A comparative study of discretization approaches for granular association rule mining," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 725–729, 2013.
[47] F. Min, Q. Liu, and C. Fang, "Rough sets approach to symbolic value partition," International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 689–700, 2008.
[48] F. Min and W. Zhu, "Granular association rules for multi-valued data," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 799–803, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

4 Mathematical Problems in Engineering

There is a tradeoff between the source confidence and thetarget confidence of a rule Consequently neither value canbe obtained directly from the rule To compute any one ofthem we should specify the threshold of the other Let tc bethe target confidence thresholdThe source confidence of therule is

sconf (GR tc)

=|119909 isin LH (GR) | |119877 (119909) cap RH (GR)| |RH (GR)| ge tc|

|LH (GR)|

(15)

Let sc be the source confidence threshold and

|119909 isin LH (GR) | |119877 (119909) cap RH (GR)| ge 119870 + 1|

lt sc times |LH (GR)|

le |119909 isin LH (GR) | |119877 (119909) cap RH (GR)| ge 119870|

(16)

This equationmeans that sctimes100 elements in LH(GR) haveconnectionswith at least119870 elements in RH(GR) but less thansctimes100 elements in LH(GR) have connections with at least119870 + 1 elements in RH(GR) The target confidence of the ruleis

tconf (GR sc) = 119870

|RH (GR)| (17)

In fact the computation of 119870 is nontrivial First for any119909 isin LH(GR) we need to compute tc(119909) = |119877(119909) cap RH(GR)|and obtain an array of integers Second we sort the array ina descending order Third let 119896 = lfloorsc times | LH(GR)|rfloor 119870 is thekth element in the array

The relationships between rules are interesting to us Asan example let us consider the following rule

Rule 2 One has

⟨Gender Male⟩ and ⟨Country China⟩

997904rArr ⟨Category Alcohol⟩ and ⟨Country France⟩ (18)

Rule 2 can be read as ldquoChinese men like France alcoholrdquoOne may say that we can infer Rule 2 from Rule 1 sincethe former one has a finer granule However with the fourmeasures we know that the relationships between these tworules are not so simple A detailed explanation of Rule 2might be ldquo60 Chinese men like at least 50 kinds of Francealcohol 15 customers are Chinesemen and 2 products areFrance alcoholrdquo Comparing Rules 1 and 2 is stronger in termsof sourcetarget confidence however it is weaker in terms ofsourcetarget coverage Therefore if we need rules coveringmore people and products we prefer Rule 1 if we need moreconfidence on the rules we prefer Rule 2 For example if thesource confidence threshold is 55 Rule 2 might be validwhile Rule 1 is not if the source coverage is 20 Rule 1 mightbe valid while Rule 2 is not

23 The Granular Association Rule Mining Problem Astraightforward rule mining problem is as follows

Input An ES = (119880 119860 119881 119861 119877) a minimal source coveragethreshold ms a minimal target coverage threshold mt aminimal source confidence threshold sc and aminimal targetconfidence threshold tc

Output All granular association rules satisfying scov(GR) gems tcov(GR) ge mt sconf(GR) ge sc and tconf(GR) ge tc

Since both sc and tc are specified we can choose either(15) or (17) to decide whether or not a rule satisfies thesethresholds Equation (15) is a better choice

3 Parametric Rough Sets on Two Universes

In this section we first review rough approximations [32]on one universe Then we present rough approximations ontwo universes Finally we present parametric rough approx-imations on two universes Some concepts are dedicated togranular association rules We will explain in detail the waythey are defined from semantic point of view

31 Classical Rough Sets The classical rough sets [32 34] arebuilt upon lower and upper approximations on one universeWe adopt the ideas and notions introduced in [35] and definethese concepts as follows

Definition 6 Let 119880 be a universe and 119877 sube 119880 times 119880 an indis-cernibility relation The lower and upper approximations of119883 sube 119880 with respect to 119877 are

119877 (119883) = 119909 isin 119883 | 119877 (119909) sube 119883 (19)

119877 (119883) = 119909 isin 119883 | 119877 (119909) cap 119883 = 0 (20)

respectively

These concepts can be employed for set approximationor classification analysis For set approximation the interval[119877(119883) 119877(119883)] is called rough set of 119883 which provides anapproximate characterization of 119883 by the objects that sharethe same description of its members [35] For classificationanalysis the lower approximation operator helps findingcertain rules while the upper approximation helps findingpossible rules

32 Existing Parametric Rough Sets Ziarko [28] pointed outthat the classical rough sets cannot handle classification witha controlled degree of uncertainty or amisclassification errorConsequently he proposed variable precision rough sets [28]with a parameter to indicate the admissible classificationerror

Let 119883 and 119884 be nonempty subsets of a finite universe 119880and 119884 sube 119883 The measure 119888(119883 119884) of the relative degree ofmisclassification of the set 119883 with respect to set 119884 is definedas

119888 (119883 119884) =

1 minus|119883 cap 119884|

|119883| |119883| gt 0

0 |119883| = 0

(21)

where | sdot | denotes set cardinality

Mathematical Problems in Engineering 5

The equivalence relation 119877 corresponds to a partitioningof119880 into a collection of equivalence classes or elementary sets119877lowast = 119864

1 1198642 119864

119899 The 120572-lower and 120572-upper approxima-

tion of the set 119880 supe 119883 are defined as

119877120572(119883) = ⋃119864 isin 119877

lowast| 119888 (119864119883) le 120572

119877120572(119883) = ⋃119864 isin 119877

lowast| 119888 (119864119883) lt 1 minus 120572

(22)

where 0 le 120572 lt 05For the convenience of discussion we rewrite his defini-

tion as follows

Definition 7 Let 119880 be a universe and 119877 sube 119880 times 119880 an indis-cernibility relation The lower and upper approximations of119883 sube 119880 with respect to 119877 under precision 120573 are

119877120573(119883) = 119909 isin 119880 |

|119877 (119909) cap 119883|

|119877 (119909)|ge 120573

119877120573(119883) = 119909 isin 119880 |

|119877 (119909) cap 119883|

|119877 (119909)|gt 1 minus 120573

(23)

respectively 119877(119909) is the equivalence class containing 119909

Note that 05 lt 120573 le 1 indicate the classification accuracy(precision) threshold rather than the misclassification errorthreshold as employed in [28] Wong et al [31] extended thedefinition to an arbitrary binary relation which is at leastserial

Ziarko [28] introduced variable precision rough sets witha parameter The model had seldom explanation about theparameter Yao and Wong [36] studied the condition ofparameter and introduced the decision theoretic rough set(DTRS) model DTRS model is a probabilistic rough sets inthe framework of Bayesian decision theory It requires a pairof thresholds (120572 120573) instead of only one in variable precisionrough sets Its main advantage is the solid foundation basedon Bayesian decision theory (120572 120573) can be systematicallycomputed by minimizing overall ternary classification cost[36]Therefore this theory has drawnmuch research interestsin both theory (see eg [37 38]) and application (see eg[39ndash41])

Gong and Sun [42] firstly defined the concept of theprobabilistic rough set over two universes Ma and Sun [43]presented the parameter dependence or the continuous oflower and upper approximations about two parameters 120572and 120573 for every type probabilistic rough set model over twouniverses in detail Moreover a new model of probabilisticfuzzy rough set over two universes [44] is proposed Thesetheories have broad and wide application prospects

33 Rough Sets on Two Universes for Granular AssociationRules Since our datamodel is concerned with two universeswe should consider computationmodels for this type of dataRough sets on two universes have been defined in [31] Somelater works adopt the same definitions (see eg [29 30])We will present our definitions which cope with granularassociation rulesThenwe discuss why they are different fromexisting ones

Definition 8 Let 119880 and 119881 be two universes and 119877 sube 119880 times 119881 abinary relation The lower and upper approximations of 119883 sube

119880 with respect to 119877 are

119877 (119883) = 119910 isin 119881 | 119877minus1(119910) supe 119883 (24)

119877 (119883) = 119910 isin 119881 | 119877minus1(119910) cap 119883 = 0 (25)

respectivelyFrom this definition we know immediately that for 119884 sube

119881

119877minus1(119884) = 119909 isin 119880 | 119877 (119909) supe 119884

119877minus1 (119884) = 119909 isin 119880 | 119877 (119909) cap 119884 = 0

(26)

Nowwe explain these notions through our example119877(119883)contains products that favor all people in119883 119877minus1(119884) containspeoplewho like all products in119884119877(119883) contains products thatfavor at least one person in 119883 and 119877minus1(119884) contains peoplewho like at least one product in 119884

We have the following property concerning the mono-tonicity of these approximations

Property 1 Let1198831sub 1198832

119877 (1198831) supe 119877 (119883

2) (27)

119877 (1198831) sube 119877 (119883

2) (28)

That is with the increase of the object subset the lowerapproximation decreases while the upper approximationincreases It is somehow ad hoc to people in the rough setsociety that the lower approximation decreases in this caseIn fact according to Wong et al [31] Liu [30] and Sun et al[44] (24) should be rewritten as

1198771015840(119883) = 119910 isin 119881 | 119877

minus1(119910) sube 119883 (29)

where the prim is employed to distinguish between twodefinitions In this way 1198771015840(119883

1) sube 1198771015840(119883

2) Moreover it would

coincide with (19) when 119880 = 119881We argue that different definitions of the lower approx-

imation are appropriate for different applications Supposethat there is a clinic systemwhere119880 is the set of all symptomsand119881 is the set of all diseases [31]119883 sube 119880 is a set of symptomsAccording to (29) 1198771015840(119883) contains diseases that induced bysymptoms only in 119883 That is if a patient has no symptom in119883 she never has any diseases in 1198771015840(119883) This type of rules isnatural and useful

In our example presented in Section 2 119883 sube 119880 is a groupof people 1198771015840(119883) contains products that favor people only in119883 That is if a person does not belong to 119883 she never likesproducts in 1198771015840(119883) Unfortunately this type of rules is notinteresting to us This is why we employ equation (24) forlower approximation

We are looking for very strong rules through the lowerapproximation indicated in Definition 8 For example ldquoallmen like all alcoholrdquo This kind of rules are called completematch rules [6] However they seldom exist in applications

6 Mathematical Problems in Engineering

On the other hand we are looking for very weak rulesthrough the upper approximation For example ldquoat least oneman like at least one kind of alcoholrdquo Another extremeexample is ldquoall people like at least one kind of productrdquo whichhold for any data set Therefore this type of rules is uselessThese issues will be addressed through a more general modelin the next section

34 Parametric Rough Sets on Two Universes for GranularAssociation Rules Given a group of people the number ofproducts that favor all of them is often quite small On theother hand the number of products that favor at least one ofthem is not quite meaningful Similar to probabilistic roughsets we need to introduce one or more parameters to themodel

To cope with the source confidence measure introducedin Section 22 we propose the following definition

Definition 9 Let 119880 and 119881 be two universes 119877 sube 119880 times 119881 abinary relation and 0 lt 120573 le 1 a user-specified thresholdThelower approximation of119883 sube 119880with respect to119877 for threshold120573 is

119877120573(119883) = 119910 isin 119881 |

10038161003816100381610038161003816119877minus1 (119910) cap 119883

10038161003816100381610038161003816

|119883|ge 120573 (30)

We do not discuss the upper approximation in the newcontext due to lack of semantic

From this definition we know immediately that the lowerapproximation of 119884 sube 119881 with respect to 119877 is

119877minus1

120573(119884) = 119909 isin 119880 |

|119877 (119909) cap 119884|

|119884|ge 120573 (31)

Here 120573 corresponds with the target confidence instead Inour example 119877

120573(119883) are products that favor at least 120573 times 100

people in119883 and119877minus1120573(119884) are people who like at least120573times100

products in 119884The following property indicates that 119877

120573(119883) is a general-

ization of both 119877(119883) and 119877(119883)

Property 2 Let 119880 and 119881 be two universes and 119877 sube 119880 times 119881 abinary relation

1198771(119883) = 119877 (119883) (32)

119877120576(119883) = 119877 (119883) (33)

where 120576 asymp 0 is a small positive number

Proof One has10038161003816100381610038161003816119877minus1 (119910) cap 119883

10038161003816100381610038161003816

|119883|ge1lArrrArr

10038161003816100381610038161003816119877minus1(119910) cap 119883

10038161003816100381610038161003816= |119883|lArrrArr119877

minus1(119910) supe 119883

10038161003816100381610038161003816119877minus1 (119910) cap 119883

10038161003816100381610038161003816

|119883|ge120576lArrrArr

10038161003816100381610038161003816119877minus1(119910) cap 119883

10038161003816100381610038161003816gt 0lArrrArr119877

minus1(119910) cap 119883 = 0

(34)

The following property shows themonotonicity of119877120573(119883)

Property 3 Let 0 lt 1205731lt 1205732le 1

1198771205732

(119883) sube 1198771205731

(119883) (35)

However, given X₁ ⊂ X₂, we obtain neither R_β(X₁) ⊆ R_β(X₂) nor R_β(X₁) ⊇ R_β(X₂). The relationship between R_β(X₁) and R_β(X₂) depends on β. Generally, if β is big, R_β(X₁) tends to be bigger; otherwise, R_β(X₁) tends to be smaller. Equation (28) indicates the extreme case for β ≈ 0, and (27) indicates the other extreme case for β = 1.

β is the coverage of R(x) (or R^{-1}(y)) with respect to Y (or X); it does not mean the precision of an approximation. This is why we call this model parametric rough sets instead of variable precision rough sets [28] or probabilistic rough sets [45].

Similar to the discussion in Section 3.3, in some cases we would like to employ the following definition:

R'_\beta(X) = \{ y \in V \mid \frac{|R^{-1}(y) \cap X|}{|R^{-1}(y)|} \ge \beta \}.  (36)

It coincides with R_β(X) defined in Definition 7 if U = V. Take the clinic system again as the example: R'_β(X) is the set of diseases that are caused mainly (with a probability no less than β × 100%) by symptoms in X.
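For completeness, here is a minimal sketch of the alternative operator (36), which normalizes by |R^{-1}(y)| rather than |X|. It uses the same set-of-pairs representation of R as the sketch above; elements of V with an empty R^{-1}(y) are simply skipped, which is our assumption since (36) leaves that case undefined.

def lower_approx_prime(R, X, V, beta):
    # R'_beta(X) of (36): elements of V covered mainly (at least beta) by X.
    result = set()
    for y in V:
        inv = {u for (u, v) in R if v == y}   # R^{-1}(y)
        if inv and len(inv & X) / len(inv) >= beta:
            result.add(y)
    return result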

4. A Backward Algorithm to Granular Association Rule Mining

In our previous work [5], we proposed an algorithm according to (15). That algorithm starts from both sides of the rule and checks the validity of all candidate rules; therefore, it was named a sandwich algorithm.

To make use of the concept proposed in Section 3, we rewrite (15) as follows:

sconf(GR, tc)
  = \frac{|\{x \in LH(GR) : |R(x) \cap RH(GR)| / |RH(GR)| \ge tc\}|}{|LH(GR)|}
  = \frac{|\{x \in U : |R(x) \cap RH(GR)| / |RH(GR)| \ge tc\} \cap LH(GR)|}{|LH(GR)|}
  = \frac{|R^{-1}_{tc}(RH(GR)) \cap LH(GR)|}{|LH(GR)|}.  (37)
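The rewriting in (37) can be checked numerically: computing sconf directly from its definition and computing it through R^{-1}_tc(RH(GR)) give the same value. The sketch below assumes the same set-of-pairs representation of R as above; LH and RH stand for the extensions of the left-hand and right-hand side granules and are hypothetical inputs.

def r_inverse_tc(R, Y, U, tc):
    # R^{-1}_tc(Y) from (31): objects of U covering at least tc of Y.
    return {x for x in U
            if len({v for (u, v) in R if u == x} & Y) / len(Y) >= tc}

def sconf_direct(R, LH, RH, tc):
    # First line of (37): fraction of LH whose neighborhood covers at least tc of RH.
    hit = [x for x in LH
           if len({v for (u, v) in R if u == x} & RH) / len(RH) >= tc]
    return len(hit) / len(LH)

def sconf_backward(R, LH, RH, U, tc):
    # Last line of (37): the same fraction obtained via the lower approximation.
    return len(r_inverse_tc(R, RH, U, tc) & LH) / len(LH)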

With this equation, we propose an algorithm to deal with Problem 1. The algorithm is listed in Algorithm 1. It essentially has four steps.

Step 1. Search in (U, A) for all granules meeting the minimal source coverage threshold ms. This step corresponds to Line 1 of the algorithm, where SG stands for source granule.

Step 2. Search in (V, B) for all granules meeting the minimal target coverage threshold mt. This step corresponds to Line 2 of the algorithm, where TG stands for target granule.


Input: ES = (U, A, V, B, R), ms, mt, tc.
Output: All complete match granular association rules satisfying the given constraints.
Method: complete-match-rules-backward.

(1) SG(ms) = {(A', x) ∈ 2^A × U : |E_{A'}(x)| / |U| ≥ ms};   // candidate source granules
(2) TG(mt) = {(B', y) ∈ 2^B × V : |E_{B'}(y)| / |V| ≥ mt};   // candidate target granules
(3) for each g' ∈ TG(mt) do
(4)     Y = e(g');
(5)     X = R^{-1}_{tc}(Y);
(6)     for each g ∈ SG(ms) do
(7)         if (e(g) ⊆ X) then
(8)             output rule i(g) ⇒ i(g');
(9)         end if
(10)    end for
(11) end for

Algorithm 1: A backward algorithm.

Step 3. For each granule obtained in Step 2, construct a block in U according to R. This step corresponds to Lines 4 and 5 of the algorithm. The function e has been defined in (5). According to Definition 9, in our example R^{-1}_{tc}(Y) is the set of people who like at least tc × 100% of the products in Y.

Step 4. Check possible rules regarding g' and X and output all valid rules. This step corresponds to Lines 6 through 10 of the algorithm. In Line 7, since e(g) and X can be stored in sorted arrays, the complexity of checking e(g) ⊆ X is

O(|e(g)| + |X|) = O(|U|),  (38)

where |·| denotes the cardinality of a set.
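A minimal Python sketch of Algorithm 1 under the same set-of-pairs representation of R is given below. It assumes that Lines 1 and 2 have already produced SG(ms) and TG(mt) as lists of (descriptor, extension) pairs, so that e(g) is the extension and i(g) the descriptor; the merge-style subset test of Line 7 is shown explicitly. This is an illustration, not the authors' implementation.

def subset_sorted(a, b):
    # Check that sorted list a is a subset of sorted list b in O(|a| + |b|) time, cf. (38).
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif a[i] > b[j]:
            j += 1
        else:
            return False          # a[i] < b[j], so a[i] cannot occur in b
    return i == len(a)

def backward(R, U, SG, TG, tc):
    # SG, TG: lists of (descriptor, extension) pairs for candidate source/target granules.
    def r_inverse_tc(Y):
        # Line 5: X = R^{-1}_tc(Y), people who like at least tc of the products in Y.
        return {x for x in U
                if len({v for (u, v) in R if u == x} & Y) / len(Y) >= tc}
    rules = []
    for (i_t, e_t) in TG:                        # Line 3: each target granule g'
        Y = set(e_t)                             # Line 4: Y = e(g')
        X = sorted(r_inverse_tc(Y))              # Line 5
        for (i_s, e_s) in SG:                    # Line 6: each source granule g
            if subset_sorted(sorted(e_s), X):    # Line 7: e(g) is a subset of X
                rules.append((i_s, i_t))         # Line 8: output rule i(g) => i(g')
    return rules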

Because the algorithm starts from the right-hand side of the rule and proceeds to the left-hand side, it is called a backward algorithm. It is necessary to compare the time complexities of the existing sandwich algorithm and our new backward algorithm. Both algorithms share Steps 1 and 2, which do not incur the pattern explosion problem; therefore, we focus on the remaining steps. The time complexity of the sandwich algorithm is [5]

O(|SC(ms)| × |TC(mt)| × |U| × |V|),  (39)

where |·| denotes the cardinality of a set. According to the loops, the time complexity of Algorithm 1 is

O(|SG(ms)| × |TG(mt)| × |U|),  (40)

which is lower than that of the sandwich algorithm. Intuitively, the backward algorithm avoids computing R(x) ∩ RH(GR) repeatedly for different rules with the same right-hand side; hence, it should be less time consuming than the sandwich algorithm. We will compare the run time of these algorithms in the next section through experimentation.

The space complexities of these two algorithms are also important. To store the relation R, a |U| × |V| Boolean matrix is needed.
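As a rough illustration of this requirement, the sketch below allocates such a Boolean matrix for the MovieLens subset used in Section 5 (3800 users and 3952 movies); NumPy stores one byte per Boolean entry, so this comes to roughly 14 MB. The variable names are ours.

import numpy as np

n_users, n_movies = 3800, 3952                          # MovieLens subset used in Section 5
R_matrix = np.zeros((n_users, n_movies), dtype=bool)    # R_matrix[u, v] = True iff user u rated movie v
print(R_matrix.nbytes / 2**20)                          # about 14.3 MiB at one byte per entry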

5. Experiments on Two Real-World Data Sets

The main purpose of our experiments is to answer the following questions.

(1) Does the backward algorithm outperform the sandwich algorithm?

(2) How does the number of rules change for different numbers of objects?

(3) How does the algorithm run time change for different numbers of objects?

(4) How does the number of rules vary for different thresholds?

(5) How does the performance of cold-start recommendation vary on the training and testing sets?

5.1. Data Sets. We collected two real-world data sets for experimentation: one is course selection and the other is movie rating. These data sets are quite representative for applications.

5.1.1. A Course Selection Data Set. The course selection system often serves as an example in textbooks to explain the concept of many-to-many entity-relationship diagrams. Hence, it is appropriate for producing meaningful granular association rules and testing the performance of our algorithm. We obtained a data set from the course selection system of Minnan Normal University (the authors would like to thank Mrs. Chunmei Zhou for her help in the data collection). Specifically, we collected data during the 2011-2012 semester. There are 145 general education courses in the university, and 9654 students took part in course selection. The database schema is as follows:

(i) Student (student ID, name, gender, birth-year, politics-status, grade, department, nationality, and length of schooling);


(ii) Course (course ID, credit, class-hours, availability, and department);

(iii) Selects (student ID, course ID).

Our algorithm supports only nominal data at this time. For this data set, all data are viewed as nominal and used directly; no discretization approach is employed to convert numeric attributes into nominal ones. We also removed student names and course names from the original data, since they are useless for generating meaningful rules.

5.1.2. A Movie Rating Data Set. The MovieLens data set assembled by the GroupLens project is widely used in recommender systems (see, e.g., [15, 16]; GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota, http://www.grouplens.org). We downloaded the data set from http://www.movielens.org. Originally, the data set contains 100,000 ratings (1-5) from 943 users on 1682 movies, with each user rating at least 20 movies [15]. Currently, the available data set contains 1,000,209 anonymous ratings of 3952 movies made by 6040 MovieLens users who joined MovieLens in 2000. In order to run our algorithm, we preprocessed the data set as follows.

(1) Remove movie names. They are not useful for generating meaningful granular association rules.

(2) Use the release year instead of the release date. In this way, the granule is more reasonable.

(3) Select a single movie genre. In the original data, the movie genre is multivalued, since one movie may fall into more than one genre; for example, a movie can be both animation and children's. Unfortunately, granular association rules do not support this type of data at this time. Since the main objective of this work is to compare the performance of the algorithms, we use a simple approach to deal with this issue: we sort movie genres according to the number of users they attract and keep only the highest-priority genre for each movie (a sketch of this selection is given after this list). We adopt the following priority (from high to low): comedy, action, thriller, romance, adventure, children, crime, sci-fi, horror, war, mystery, musical, documentary, animation, western, film-noir, fantasy, and unknown.
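A minimal sketch of this genre selection, assuming each movie's genres are given as a list of strings; the priority order is the one listed above, and the function name is ours.

GENRE_PRIORITY = ["comedy", "action", "thriller", "romance", "adventure",
                  "children", "crime", "sci-fi", "horror", "war", "mystery",
                  "musical", "documentary", "animation", "western",
                  "film-noir", "fantasy", "unknown"]

def keep_highest_priority_genre(genres):
    # genres: the (possibly multivalued) genre list of one movie.
    # Returns the single genre with the highest priority, i.e., the lowest index.
    return min(genres, key=GENRE_PRIORITY.index)

print(keep_highest_priority_genre(["animation", "children"]))   # -> children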

Our database schema is as follows:

(i) User (user ID, age, gender, and occupation);

(ii) Movie (movie ID, release year, and genre);

(iii) Rates (user ID, movie ID).

There are 8 user age intervals, 21 occupations, and 71 release years. Similar to the course selection data set, all these data are viewed as nominal and processed directly; we employ neither discretization nor symbolic value partition [46, 47] approaches to produce coarser granules. The genre is a multivalued attribute; therefore, we scale it to 18 Boolean attributes and deal with it using the approach proposed in [48].

5.2. Results. We undertake five sets of experiments to answer the questions proposed at the beginning of this section.

5.2.1. Efficiency Comparison. We compare the efficiencies of the backward and the sandwich algorithms. We consider only the run time of Lines 3 through 11, since this code is where the two algorithms differ.

For the course selection data set, when ms = mt = 0.06, sc = 0.18, and tc = 0.11, we obtain only 40 rules; for higher thresholds, no rule can be obtained. Therefore, we use the following settings: sc = 0.18, tc = 0.11, ms = mt, and ms ∈ {0.02, 0.03, 0.04, 0.05, 0.06}. Figure 1(a) shows the actual run time in milliseconds, and Figure 2(a) shows the number of basic operations, including additions and comparisons of numbers. We observe that, for the different settings, the backward algorithm is more than 2 times faster than the sandwich algorithm and takes less than 1/3 of its basic operations.

For the MovieLens data set, we employ the subset with 3800 users and 3952 movies. We use the following settings: sc = 0.15, tc = 0.14, ms = mt, and ms ∈ {0.01, 0.02, 0.03, 0.04, 0.05}. Figure 1(b) shows the actual run time in milliseconds, and Figure 2(b) shows the number of basic operations, including additions and comparisons of numbers. We observe that, for the different settings, the backward algorithm is more than 3 times faster than the sandwich algorithm and takes less than 1/4 of its basic operations.

5.2.2. Change of the Number of Rules for Different Data Set Sizes. Now we study how the number of rules changes with the increase of the data set size. These experiments are undertaken only on the MovieLens data set. We use the following settings: sc = 0.15, tc = 0.14, ms ∈ {0.01, 0.02}, and |U| ∈ {500, 1000, 1500, 2000, 2500, 3000, 3500}. The number of movies is always 3952. While selecting k users, we always select from the first user to the k-th user.

First, we look at the number of source concepts satisfying the source coverage threshold ms. According to Figure 3(a), the number of source concepts decreases with the increase of the number of users. However, Figure 3(b) indicates that this trend may not hold in general. In fact, the most important observation from Figure 3 is that the number of source concepts does not vary much with the number of objects. When the number of users is more than 1500, this variation is no more than 3, which is less than 5% of the total number of concepts.

Second, we look at the number of granular association rules satisfying all four thresholds. Figure 4 indicates that the number of rules varies more than the number of source concepts. However, this variation is less than 20% when there are more than 1500 users, and less than 10% when there are more than 2500 users.

5.2.3. Change of Run Time for Different Data Set Sizes. We look at how the run time changes with the increase of the number of users. The time complexity of the algorithm is given by (40). Since the number of movies does not change, |TG(mt)| is fixed.


Figure 1: Run time information: (a) course selection; (b) MovieLens (3800 users).

Figure 2: Basic operations information: (a) course selection; (b) MovieLens (3800 users).

Moreover, according to our earlier discussion, |SC(ms)| does not vary much for different numbers of users. Therefore, the time complexity is nearly linear with respect to the number of users. Figure 5 validates this analysis.

5.2.4. Number of Rules for Different Thresholds. Figure 6 shows that the number of rules decreases dramatically with the increase of ms and mt. For the course selection data set, the number of rules would be 0 when ms = mt = 0.07. For the MovieLens data set, the number of rules would be 0 when ms = mt = 0.06.

5.2.5. Performance of Cold-Start Recommendation. Now we study how the performance of cold-start recommendation varies between the training and testing sets. These experiments are undertaken only on the MovieLens data set. Here we employ two data sets: one with 1000 users and 3952 movies, and the other with 3000 users and 3952 movies.


Figure 3: Number of concepts on users for MovieLens: (a) ms = 0.01; (b) ms = 0.02.

Figure 4: Number of granular association rules for MovieLens: (a) ms = 0.01; (b) ms = 0.02.

We divide the users into two parts, the training set and the testing set. The other settings are as follows: the training set percentage is 60%, sc = 0.15, and tc = 0.14. Each experiment is repeated 30 times with different samplings of the training and testing sets, and the average accuracy is computed.
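The evaluation protocol just described can be sketched as follows; train_and_test is a hypothetical placeholder for mining rules on the training users and recommending to the testing users, and is not part of the paper.

import random

def average_accuracy(users, train_and_test, ratio=0.6, repeats=30, seed=0):
    # Repeat a random 60/40 split of the users 'repeats' times and average the accuracy.
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        shuffled = list(users)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * ratio)
        accuracies.append(train_and_test(shuffled[:cut], shuffled[cut:]))
    return sum(accuracies) / repeats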

Figure 7 shows that the recommendation performance varies with ms and mt. We observe some especially interesting phenomena from this figure, as follows.

(1) Figures 7(a) and 7(b) are similar in general. As the number of users increases, the accuracy of the recommender also increases. The reason is that we obtain more information from the training sets and thus generate better rules for recommendation.

(2) When ms and mt are suitably set (0.03), the recommender achieves its maximal accuracy on both data sets. As these thresholds increase or decrease, the performance of the recommendation changes rapidly.

(3) The performance of the recommendation does not change much between the training and the testing sets. This phenomenon indicates that the recommender is stable.

5.3. Discussions. Now we can answer the questions proposed at the beginning of this section.

(1) The backward algorithm outperforms the sandwich algorithm. It is more than 2 times and 3 times faster than the sandwich algorithm on the course selection and MovieLens data sets, respectively. Therefore, our parametric rough sets on two universes are useful in applications.


Figure 5: Run time on MovieLens: (a) ms = 0.01; (b) ms = 0.02.

Figure 6: Number of granular association rules: (a) course selection; (b) MovieLens.

Figure 7: Accuracy of cold-start recommendation on MovieLens: (a) 1000 users and 3952 movies; (b) 3000 users and 3952 movies.


(2) The number of rules does not change much for different numbers of objects. Therefore, it is not necessary to collect very large amounts of data to obtain meaningful granular association rules; for the MovieLens data set, for example, 3000 users are sufficient.

(3) The run time is nearly linear with respect to the number of objects. Therefore, the algorithm is scalable from the viewpoint of time complexity. However, we observe that the relation table might be rather big, and this could become a bottleneck of the algorithm.

(4) The number of rules decreases dramatically with the increase of the thresholds ms and mt. It is important to specify appropriate thresholds to obtain useful rules.

(5) The performance of cold-start recommendation is stable on the training and the testing sets with the increase of the thresholds ms and mt.

6. Conclusions

In this paper, we have proposed a new type of parametric rough sets on two universes to deal with the granular association rule mining problem. The lower approximation operator has been defined and its monotonicity has been analyzed. With the help of the new model, a backward algorithm for the granular association rule mining problem has been proposed. Experimental results on two real-world data sets indicate that the new algorithm is significantly faster than the existing sandwich algorithm, and the performance of recommender systems is stable on the training and the testing sets. To sum up, this work applies rough set theory to recommender systems and is one step toward the application of rough set theory and granular computing. In the future, we will improve our approach and compare its performance with other recommendation approaches.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported in part by the National Science Foundation of China under Grant Nos. 61379089, 61379049, and 61170128; the Fujian Province Foundation of Higher Education under Grant No. JK2012028; the Key Project of the Education Department of Fujian Province under Grant No. JA13192; and the Postgraduate Education Innovation Base for Computer Application Technology, Signal and Information Processing of Fujian Province (No. [2008]114, High Education of Fujian).

References

[1] L. Dehaspe, H. Toivonen, and R. D. King, "Finding frequent substructures in chemical compounds," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 30–36, 1998.

[2] B. Goethals, W. Le Page, and M. Mampaey, "Mining interesting sets and rules in relational databases," in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 997–1001, March 2010.

[3] S. Dzeroski, "Multi-relational data mining: an introduction," SIGKDD Explorations, vol. 5, pp. 1–16, 2003.

[4] V. C. Jensen and N. Soparkar, "Frequent item set counting across multiple tables," in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining: Current Issues and New Applications, vol. 1805 of Lecture Notes in Computer Science, pp. 49–61, 2000.

[5] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules on two universes with four measures," 2012, http://arxiv.org/abs/1209.5598.

[6] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules with four subtypes," in Proceedings of the IEEE International Conference on Granular Computing, pp. 432–437, 2012.

[7] J. Hu and C. Guan, "An emotional agent model based on granular computing," Mathematical Problems in Engineering, vol. 2012, Article ID 601295, 10 pages, 2012.

[8] T. Y. Lin, "Granular computing on binary relations I: data mining and neighborhood systems," Rough Sets in Knowledge Discovery, vol. 1, pp. 107–121, 1998.

[9] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," in Proceedings of SPIE, vol. 5098, pp. 138–146, April 2003.

[10] Y. Y. Yao, "Granular computing: basic issues and possible solutions," in Proceedings of the 5th Joint Conference on Information Sciences (JCIS '00), vol. 1, pp. 186–189, March 2000.

[11] W. Zhu and F.-Y. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, pp. 217–230, 2003.

[12] W. Zhu, "Generalized rough sets based on relations," Information Sciences, vol. 177, no. 22, pp. 4997–5011, 2007.

[13] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.

[14] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 167–179, 2014.

[15] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," SIGIR Forum, pp. 253–260, 2002.

[16] H. Guo, "Soap: live recommendations through social agents," in Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 1–17, 1997.

[17] M. Balabanovic and Y. Shoham, "Content-based collaborative recommendation," Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.

[18] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[19] R. Burke, "Hybrid recommender systems: survey and experiments," User Modelling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.

[20] A. H. Dong, D. Shan, Z. Ruan, L. Y. Zhou, and F. Zuo, "The design and implementation of an intelligent apparel recommend expert system," Mathematical Problems in Engineering, vol. 2013, Article ID 343171, 8 pages, 2013.


[21] M. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321, pp. 325–341, Springer, 2007.

[22] W. Zhu, "Relationship among basic concepts in covering-based rough sets," Information Sciences, vol. 179, no. 14, pp. 2478–2486, 2009.

[23] Y. Y. Yao and X. F. Deng, "A granular computing paradigm for concept learning," in Emerging Paradigms in Machine Learning, vol. 13, pp. 307–326, Springer, Berlin, Germany, 2013.

[24] J. Yao, "Recent developments in granular computing: a bibliometrics study," in Proceedings of the IEEE International Conference on Granular Computing (GRC '08), pp. 74–79, August 2008.

[25] F. Aftrati, G. Das, A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, "Mining chains of relations," in Data Mining: Foundations and Intelligent Paradigms, vol. 24, pp. 217–246, Springer, 2012.

[26] L. Dehaspe and H. Toivonen, "Discovery of frequent datalog patterns," Expert Systems with Applications, vol. 3, no. 1, pp. 7–36, 1999.

[27] B. Goethals, W. Le Page, and H. Mannila, "Mining association rules of simple conjunctive queries," in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 96–107, April 2008.

[28] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.

[29] T. J. Li and W. X. Zhang, "Rough fuzzy approximations on two universes of discourse," Information Sciences, vol. 178, no. 3, pp. 892–906, 2008.

[30] G. Liu, "Rough set theory based on two universal sets and its applications," Knowledge-Based Systems, vol. 23, no. 2, pp. 110–115, 2010.

[31] S. Wong, L. Wang, and Y. Y. Yao, "Interval structure: a framework for representing uncertain information," in Uncertainty in Artificial Intelligence, pp. 336–343, 1993.

[32] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.

[33] A. Skowron and J. Stepaniuk, "Approximation of relations," in Proceedings of Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, Ed., pp. 161–166, 1994.

[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Boston, Mass, USA, 1991.

[35] S. K. M. Wong, L. S. Wang, and Y. Y. Yao, "On modelling uncertainty with interval structures," Computational Intelligence, vol. 12, no. 2, pp. 406–426, 1995.

[36] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.

[37] H. X. Li, X. Z. Zhou, J. B. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Proceedings of Rough Sets and Knowledge Technology, vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.

[38] D. Liu, T. Li, and D. Ruan, "Probabilistic model criteria with decision-theoretic rough sets," Information Sciences, vol. 181, no. 17, pp. 3709–3722, 2011.

[39] X. Y. Jia, W. H. Liao, Z. M. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.

[40] D. Liu, Y. Yao, and T. Li, "Three-way investment decisions with decision-theoretic rough sets," International Journal of Computational Intelligence Systems, vol. 4, no. 1, pp. 66–74, 2011.

[41] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.

[42] Z. T. Gong and B. Z. Sun, "Probability rough sets model between different universes and its applications," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 561–565, July 2008.

[43] W. Ma and B. Sun, "Probabilistic rough set over two universes and rough entropy," International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 608–619, 2012.

[44] B. Sun, W. Ma, H. Zhao, and X. Wang, "Probabilistic fuzzy rough set model over two universes," in Rough Sets and Current Trends in Computing, pp. 83–93, Springer, 2012.

[45] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logic," Intelligent Automation and Soft Computing, vol. 2, pp. 103–120, 1996.

[46] X. He, F. Min, and W. Zhu, "A comparative study of discretization approaches for granular association rule mining," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 725–729, 2013.

[47] F. Min, Q. Liu, and C. Fang, "Rough sets approach to symbolic value partition," International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 689–700, 2008.

[48] F. Min and W. Zhu, "Granular association rules for multi-valued data," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 799–803, 2013.




[24] J Yao ldquoRecent developments in granular computing a biblio-metrics studyrdquo in Proceedings of the IEEE International Confer-ence onGranular Computing (GRC rsquo08) pp 74ndash79 August 2008

[25] F Aftrati G Das A Gionis H Mannila T Mielikainenand P Tsaparas ldquoMining chains of relationsrdquo in Data MiningFoundations and Intelligent Paradigms vol 24 pp 217ndash246Springer 2012

[26] L Dehaspe and H Toivonen ldquoDiscovery of frequent datalogpatternsrdquo Expert Systems with Applications vol 3 no 1 pp 7ndash36 1999

[27] B Goethals W Le Page and H Mannila ldquoMining associationrules of simple conjunctive queriesrdquo in Proceedings of the 8thSIAM International Conference on Data Mining pp 96ndash107April 2008

[28] W Ziarko ldquoVariable precision rough set modelrdquo Journal ofComputer and System Sciences vol 46 no 1 pp 39ndash59 1993

[29] T J Li and W X Zhang ldquoRough fuzzy approximations on twouniverses of discourserdquo Information Sciences vol 178 no 3 pp892ndash906 2008

[30] G Liu ldquoRough set theory based on two universal sets and itsapplicationsrdquo Knowledge-Based Systems vol 23 no 2 pp 110ndash115 2010

[31] S Wong L Wang and Y Y Yao ldquoInterval structure a frame-work for representinguncertain informationrdquo in Uncertainty inArtificial Intelligence pp 336ndash343 1993

[32] Z Pawlak ldquoRough setsrdquo International Journal of Computer ampInformation Sciences vol 11 no 5 pp 341ndash356 1982

[33] A Skowron and J Stepaniuk ldquoApproximation of relationsrdquoin Proceedings of the Rough Sets Fuzzy Sets and KnowledgeDiscovery W Ziarko Ed pp 161ndash166 1994

[34] Z Pawlak Rough Sets Theoretical Aspects of Reasoning aboutData Kluwer Academic Boston Mass USA 1991

[35] S K MWong L S Wang and Y Y Yao ldquoOnmodelling uncer-tainty with interval structuresrdquo Computational Intelligence vol12 no 2 pp 406ndash426 1995

[36] Y Y Yao and S K M Wong ldquoA decision theoretic frameworkfor approximating conceptsrdquo International Journal of Man-Machine Studies vol 37 no 6 pp 793ndash809 1992

[37] H X Li X Z Zhou J B Zhao and D Liu ldquoAttribute reductionin decision-theoreticrough set model a further investigationrdquoin Proceedings of the Rough Sets and KnowledgeTechnology vol6954 of Lecture Notes in Computer Science pp 466ndash475 2011

[38] D Liu T Li and D Ruan ldquoProbabilistic model criteria withdecision-theoretic rough setsrdquo Information Sciences vol 181 no17 pp 3709ndash3722 2011

[39] X Y Jia W H Liao Z M Tang and L Shang ldquoMinimumcost attribute reductionin decision-theoretic rough set modelsrdquoInformation Sciences vol 219 pp 151ndash167 2013

[40] D Liu Y Yao and T Li ldquoThree-way investment decisionswith decision-theoretic rough setsrdquo International Journal ofComputational Intelligence Systems vol 4 no 1 pp 66ndash74 2011

[41] Y Yao and Y Zhao ldquoAttribute reduction in decision-theoreticrough set modelsrdquo Information Sciences vol 178 no 17 pp3356ndash3373 2008

[42] Z T Gong andB Z Sun ldquoProbability rough setsmodel betweendifferent universes and its applicationsrdquo in Proceedings of the 7thInternational Conference on Machine Learning and Cybernetics(ICMLC rsquo08) pp 561ndash565 July 2008

[43] W Ma and B Sun ldquoProbabilistic rough set over two universesand rough entropyrdquo International Journal of Approximate Rea-soning vol 53 no 4 pp 608ndash619 2012

[44] B SunWMaH Zhao andXWang ldquoProbabilistic fuzzy roughset model overtwo universesrdquo in Rough Sets and Current Trendsin Computing pp 83ndash93 Springer 2012

[45] Y Y Yao and T Y Lin ldquoGeneralization of rough sets usingmodal logicrdquo Intelligent Automation and Soft Computing vol 2pp 103ndash120 1996

[46] X He F Min and W Zhu ldquoA comparative study of dis-cretization approaches for granular association rule miningrdquoin Proceedings of the Canadian Conference on Electrical andComputer Engineering pp 725ndash729 2013

[47] F Min Q Liu and C Fang ldquoRough sets approach to symbolicvalue partitionrdquo International Journal of Approximate Reason-ing vol 49 no 3 pp 689ndash700 2008

[48] FMin andW Zhu ldquoGranular association rules formulti-valueddatardquo in Proceedings of the Canadian Conference on Electricaland Computer Engineering pp 799ndash803 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

6 Mathematical Problems in Engineering

On the other hand, we are looking for very weak rules through the upper approximation, for example, "at least one man likes at least one kind of alcohol." Another extreme example is "all people like at least one kind of product," which holds for any data set; therefore this type of rule is useless. These issues will be addressed through a more general model in the next section.

3.4. Parametric Rough Sets on Two Universes for Granular Association Rules. Given a group of people, the number of products that favor all of them is often quite small. On the other hand, the set of products that favor at least one of them is not quite meaningful. Similar to probabilistic rough sets, we need to introduce one or more parameters to the model.

To cope with the source confidence measure introduced in Section 2.2, we propose the following definition.

Definition 9. Let $U$ and $V$ be two universes, $R \subseteq U \times V$ a binary relation, and $0 < \beta \le 1$ a user-specified threshold. The lower approximation of $X \subseteq U$ with respect to $R$ for the threshold $\beta$ is
$$R_{\beta}(X) = \left\{ y \in V \;\Big|\; \frac{|R^{-1}(y) \cap X|}{|X|} \ge \beta \right\}. \tag{30}$$

We do not discuss the upper approximation in the new context due to the lack of semantics.

From this definition, we know immediately that the lower approximation of $Y \subseteq V$ with respect to $R$ is
$$R^{-1}_{\beta}(Y) = \left\{ x \in U \;\Big|\; \frac{|R(x) \cap Y|}{|Y|} \ge \beta \right\}. \tag{31}$$

Here $\beta$ corresponds to the target confidence instead. In our example, $R_{\beta}(X)$ is the set of products that favor at least $\beta \times 100\%$ of the people in $X$, and $R^{-1}_{\beta}(Y)$ is the set of people who like at least $\beta \times 100\%$ of the products in $Y$.

The following property indicates that $R_{\beta}(X)$ is a generalization of both $\underline{R}(X)$ and $\overline{R}(X)$.

Property 2. Let $U$ and $V$ be two universes and $R \subseteq U \times V$ a binary relation. Then
$$R_{1}(X) = \underline{R}(X), \tag{32}$$
$$R_{\epsilon}(X) = \overline{R}(X), \tag{33}$$
where $\epsilon \approx 0$ is a small positive number.

Proof. One has
$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge 1 \Longleftrightarrow |R^{-1}(y) \cap X| = |X| \Longleftrightarrow R^{-1}(y) \supseteq X,$$
$$\frac{|R^{-1}(y) \cap X|}{|X|} \ge \epsilon \Longleftrightarrow |R^{-1}(y) \cap X| > 0 \Longleftrightarrow R^{-1}(y) \cap X \neq \emptyset. \tag{34}$$

The following property shows the monotonicity of $R_{\beta}(X)$.

Property 3. Let $0 < \beta_1 < \beta_2 \le 1$. Then
$$R_{\beta_2}(X) \subseteq R_{\beta_1}(X). \tag{35}$$

However, given $X_1 \subset X_2$, we obtain neither $R_{\beta}(X_1) \subseteq R_{\beta}(X_2)$ nor $R_{\beta}(X_1) \supseteq R_{\beta}(X_2)$. The relationship between $R_{\beta}(X_1)$ and $R_{\beta}(X_2)$ depends on $\beta$. Generally, if $\beta$ is big, $R_{\beta}(X_1)$ tends to be bigger than $R_{\beta}(X_2)$; otherwise, it tends to be smaller. Equation (27) indicates the extreme case for $\beta \approx 0$, and (28) indicates the other extreme case for $\beta = 1$.

Note that $\beta$ is the coverage of $R(x)$ (or $R^{-1}(y)$) with respect to $Y$ (or $X$); it does not mean the precision of an approximation. This is why we call this model parametric rough sets instead of variable precision rough sets [28] or probabilistic rough sets [45].

Similar to the discussion in Section 3.3, in some cases we would like to employ the following definition:
$$R'_{\beta}(X) = \left\{ y \in V \;\Big|\; \frac{|R^{-1}(y) \cap X|}{|R^{-1}(y)|} \ge \beta \right\}. \tag{36}$$

It coincides with $R_{\beta}(X)$ defined in Definition 7 if $U = V$. Take the clinic system again as the example: $R'_{\beta}(X)$ is the set of diseases that are caused mainly (with a probability no less than $\beta \times 100\%$) by symptoms in $X$.
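For readers who prefer an operational view, the following minimal Python sketch computes the two lower approximation operators above from a Boolean relation matrix. The function names and the toy relation are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def lower_approx(R, X, beta):
    """R_beta(X) of Definition 9: columns y of the Boolean relation R
    (shape |U| x |V|) with |R^{-1}(y) ∩ X| / |X| >= beta."""
    X = np.asarray(X)                      # row indices forming X ⊆ U
    counts = R[X, :].sum(axis=0)           # |R^{-1}(y) ∩ X| for every y
    return np.nonzero(counts / len(X) >= beta)[0]

def lower_approx_prime(R, X, beta):
    """R'_beta(X) of (36): normalize by |R^{-1}(y)| instead of |X|."""
    X = np.asarray(X)
    counts = R[X, :].sum(axis=0)
    degrees = R.sum(axis=0)                # |R^{-1}(y)| for every y
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(degrees > 0, counts / degrees, 0.0)
    return np.nonzero(ratio >= beta)[0]

# Toy relation: 4 people (rows) x 5 products (columns)
R = np.array([[1, 1, 0, 0, 1],
              [1, 0, 1, 0, 1],
              [0, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=bool)
X = [0, 1, 2]                              # a granule of people
print(lower_approx(R, X, beta=2/3))        # products liked by >= 2/3 of X
print(lower_approx_prime(R, X, beta=0.8))
```

Setting beta = 1 or a tiny positive beta in this sketch reproduces the two extreme cases of Property 2.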

4. A Backward Algorithm to Granular Association Rule Mining

In our previous work [5], we proposed an algorithm according to (15). That algorithm starts from both sides of a rule and checks the validity of all candidate rules; therefore, it was named a sandwich algorithm.

To make use of the concept proposed in Section 3, we rewrite (15) as follows:
$$\begin{aligned}
\operatorname{sconf}(GR, tc) &= \frac{\left|\left\{ x \in \mathrm{LH}(GR) \,\middle|\, |R(x) \cap \mathrm{RH}(GR)| / |\mathrm{RH}(GR)| \ge tc \right\}\right|}{|\mathrm{LH}(GR)|} \\
&= \frac{\left|\left\{ x \in U \,\middle|\, |R(x) \cap \mathrm{RH}(GR)| / |\mathrm{RH}(GR)| \ge tc \right\} \cap \mathrm{LH}(GR)\right|}{|\mathrm{LH}(GR)|} \\
&= \frac{\left| R^{-1}_{tc}(\mathrm{RH}(GR)) \cap \mathrm{LH}(GR) \right|}{|\mathrm{LH}(GR)|}.
\end{aligned} \tag{37}$$

With this equation, we propose an algorithm to deal with Problem 1. The algorithm is listed in Algorithm 1; it essentially has four steps.

Step 1. Search in $(U, A)$ for all granules meeting the minimal source coverage threshold ms. This step corresponds to Line 1 of the algorithm, where SG stands for source granule.

Step 2. Search in $(V, B)$ for all granules meeting the minimal target coverage threshold mt. This step corresponds to Line 2 of the algorithm, where TG stands for target granule.


Input: ES = $(U, A, V, B, R)$, ms, mt.
Output: All complete match granular association rules satisfying the given constraints.
Method: complete-match-rules-backward.

(1) SG(ms) = $\{(A', x) \in 2^{A} \times U \mid |E_{A'}(x)| / |U| \ge \mathrm{ms}\}$;  // candidate source granules
(2) TG(mt) = $\{(B', y) \in 2^{B} \times V \mid |E_{B'}(y)| / |V| \ge \mathrm{mt}\}$;  // candidate target granules
(3) for each $g' \in$ TG(mt) do
(4)     $Y = e(g')$;
(5)     $X = R^{-1}_{tc}(Y)$;
(6)     for each $g \in$ SG(ms) do
(7)         if $e(g) \subseteq X$ then
(8)             output rule $i(g) \Rightarrow i(g')$;
(9)         end if
(10)    end for
(11) end for

Algorithm 1: A backward algorithm.

Step 3. For each granule obtained in Step 2, construct a block in $U$ according to $R$. This step corresponds to Lines 4 and 5 of the algorithm. The function $e$ has been defined in (5). According to Definition 9, in our example $R^{-1}_{tc}(Y)$ is the set of people who like at least $tc \times 100\%$ of the products in $Y$.

Step 4. Check possible rules regarding $g'$ and $X$, and output all valid rules. This step corresponds to Lines 6 through 10 of the algorithm. In Line 7, since $e(g)$ and $X$ can be stored in sorted arrays, the complexity of checking $e(g) \subseteq X$ is
$$O(|e(g)| + |X|) = O(|U|), \tag{38}$$
where $|\cdot|$ denotes the cardinality of a set.
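The linear-time subset test mentioned in Step 4 can be realized with a single merge-style scan over two sorted index arrays. The sketch below is an illustrative assumption about the data layout (sorted object indices), not code from the original paper.

```python
def is_subset_sorted(a, b):
    """Return True iff sorted list a is a subset of sorted list b.
    Runs in O(len(a) + len(b)), matching (38)."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:      # element of a found in b
            i += 1
            j += 1
        elif a[i] > b[j]:     # advance in b until it catches up
            j += 1
        else:                 # a[i] < b[j]: a[i] cannot appear in b
            return False
    return i == len(a)

# e(g) = {2, 5, 9} is contained in X = {1, 2, 3, 5, 8, 9}
print(is_subset_sorted([2, 5, 9], [1, 2, 3, 5, 8, 9]))  # True
```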

Because the algorithm starts from the right-hand side of the rule and proceeds to the left-hand side, it is called a backward algorithm. It is necessary to compare the time complexities of the existing sandwich algorithm and our new backward algorithm. Both algorithms share Steps 1 and 2, which do not incur the pattern explosion problem; therefore, we focus on the remaining steps. The time complexity of the sandwich algorithm is [5]
$$O(|\mathrm{SC}(\mathrm{ms})| \times |\mathrm{TC}(\mathrm{mt})| \times |U| \times |V|), \tag{39}$$

where $|\cdot|$ denotes the cardinality of a set. According to the loops, the time complexity of Algorithm 1 is
$$O(|\mathrm{SG}(\mathrm{ms})| \times |\mathrm{TG}(\mathrm{mt})| \times |U|), \tag{40}$$

which is lower than that of the sandwich algorithm.

Intuitively, the backward algorithm avoids recomputing $R(x) \cap \mathrm{RH}(GR)$ for different rules with the same right-hand side; hence it should be less time consuming than the sandwich algorithm. We compare the run time of these algorithms through experimentation in the next section.

The space complexities of these two algorithms are also important. To store the relation $R$, a $|U| \times |V|$ Boolean matrix is needed.
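To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch of the backward mining loop built on the lower approximation operator of Section 3.4. The granule representation (a descriptor paired with a list of object indices) and the helper names are assumptions for illustration; they are not the authors' implementation.

```python
import numpy as np

def backward_rules(R, source_granules, target_granules, tc):
    """Backward mining loop of Algorithm 1 (sketch).
    R: |U| x |V| Boolean relation matrix.
    source_granules: list of (descriptor, user indices) meeting ms.
    target_granules: list of (descriptor, item indices) meeting mt.
    tc: target confidence threshold."""
    rules = []
    for t_desc, Y in target_granules:            # start from the right-hand side
        # X = R^{-1}_{tc}(Y): users liking at least tc*100% of the items in Y
        counts = R[:, Y].sum(axis=1)
        X = set(np.nonzero(counts / len(Y) >= tc)[0])
        for s_desc, users in source_granules:    # check each left-hand side
            if set(users) <= X:                  # e(g) ⊆ X
                rules.append((s_desc, t_desc))
    return rules

# Toy example with hypothetical granules
R = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 0]], dtype=bool)
SG = [("gender=male", [0, 1])]
TG = [("genre=action", [0, 3])]
print(backward_rules(R, SG, TG, tc=0.5))   # [('gender=male', 'genre=action')]
```

The block $X$ is computed once per right-hand side, which is exactly the saving over the sandwich algorithm discussed above.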

5. Experiments on Two Real-World Data Sets

The main purpose of our experiments is to answer the following questions.

(1) Does the backward algorithm outperform the sandwich algorithm?

(2) How does the number of rules change for different numbers of objects?

(3) How does the algorithm run time change for different numbers of objects?

(4) How does the number of rules vary for different thresholds?

(5) How does the performance of cold-start recommendation vary between the training and the testing sets?

5.1. Data Sets. We collected two real-world data sets for experimentation: one concerns course selection and the other movie rating. These data sets are quite representative of real applications.

5.1.1. A Course Selection Data Set. The course selection system often serves as an example in textbooks to explain the concept of many-to-many entity-relationship diagrams; hence it is appropriate for producing meaningful granular association rules and testing the performance of our algorithm. We obtained a data set from the course selection system of Minnan Normal University (the authors would like to thank Mrs. Chunmei Zhou for her help in the data collection). Specifically, we collected data during the 2011-2012 semester. There are 145 general education courses in the university, and 9654 students took part in course selection. The database schema is as follows:

(i) Student (student ID, name, gender, birth-year, politics-status, grade, department, nationality, and length of schooling);


(ii) Course (course ID, credit, class-hours, availability, and department);

(iii) Selects (student ID, course ID).

Our algorithm supports only nominal data at this time. For this data set, all data are viewed nominally and processed directly; in this way, no discretization approach is employed to convert numeric attributes into nominal ones. We also removed student names and course names from the original data, since they are useless in generating meaningful rules.

5.1.2. A Movie Rating Data Set. The MovieLens data set assembled by the GroupLens project is widely used in recommender systems (see, e.g., [15, 16]; GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota, http://www.grouplens.org). We downloaded the data set from the Internet Movie Database (http://www.movielens.org). Originally, the data set contained 100,000 ratings (1–5) from 943 users on 1682 movies, with each user rating at least 20 movies [15]. Currently, the available data set contains 1,000,209 anonymous ratings of 3952 movies made by 6040 MovieLens users who joined MovieLens in 2000. In order to run our algorithm, we preprocessed the data set as follows.

(1) Remove movie names. They are not useful in generating meaningful granular association rules.

(2) Use the release year instead of the release date. In this way, the granule is more reasonable.

(3) Select the movie genre. In the original data, the movie genre is multivalued, since one movie may fall into more than one genre; for example, a movie can be both animation and children's. Unfortunately, granular association rules do not support this type of data at this time. Since the main objective of this work is to compare the performance of algorithms, we use a simple approach to deal with this issue: we sort movie genres according to the number of users they attract and keep only the highest-priority genre for each movie. We adopt the following priority (from high to low): comedy, action, thriller, romance, adventure, children, crime, sci-fi, horror, war, mystery, musical, documentary, animation, western, film-noir, fantasy, and unknown.

Our database schema is as follows:

(i) User (user ID, age, gender, and occupation);

(ii) Movie (movie ID, release year, and genre);

(iii) Rates (user ID, movie ID).

There are 8 user age intervals, 21 occupations, and 71 release years. Similar to the course selection data set, all these data are viewed nominally and processed directly; we employ neither discretization nor symbolic value partition [46, 47] approaches to produce coarser granules. The genre is a multivalued attribute; therefore, we scale it to 18 Boolean attributes and deal with it using the approach proposed in [48].
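A minimal sketch of the genre preprocessing described above, under the assumption that each movie's genres arrive as a list of strings; the priority list follows item (3), and the helper names are illustrative rather than taken from the original code.

```python
# Genre priority from item (3), highest priority first
PRIORITY = ["comedy", "action", "thriller", "romance", "adventure", "children",
            "crime", "sci-fi", "horror", "war", "mystery", "musical",
            "documentary", "animation", "western", "film-noir", "fantasy",
            "unknown"]
RANK = {g: i for i, g in enumerate(PRIORITY)}

def keep_highest_priority(genres):
    """Reduce a multivalued genre list to the single highest-priority genre."""
    return min(genres, key=lambda g: RANK.get(g, len(PRIORITY)))

def to_boolean_attributes(genres):
    """Alternative treatment: one Boolean attribute per genre value,
    in the spirit of the multi-valued handling of [48]."""
    return {f"genre={g}": (g in genres) for g in PRIORITY}

print(keep_highest_priority(["animation", "children"]))             # 'children'
print(to_boolean_attributes(["action", "sci-fi"])["genre=action"])  # True
```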

5.2. Results. We undertake five sets of experiments to answer the questions proposed at the beginning of this section.

5.2.1. Efficiency Comparison. We compare the efficiencies of the backward and the sandwich algorithms. We look only at the run time of Lines 3 through 11, since this code is where the two algorithms differ.

For the course selection data set, when ms = mt = 0.06, sc = 0.18, and tc = 0.11, we obtain only 40 rules; for higher thresholds, no rule can be obtained. Therefore we use the following settings: sc = 0.18, tc = 0.11, ms = mt, and ms ∈ {0.02, 0.03, 0.04, 0.05, 0.06}. Figure 1(a) shows the actual run time in milliseconds, and Figure 2(a) shows the number of basic operations, including additions and comparisons of numbers. We observe that, for these settings, the backward algorithm is more than 2 times faster than the sandwich algorithm and performs less than 1/3 of its basic operations.

For the MovieLens data set, we employ the data set with 3800 users and 3952 movies. We use the following settings: sc = 0.15, tc = 0.14, ms = mt, and ms ∈ {0.01, 0.02, 0.03, 0.04, 0.05}. Figure 1(b) shows the actual run time in milliseconds, and Figure 2(b) shows the number of basic operations, including additions and comparisons of numbers. We observe that, for these settings, the backward algorithm is more than 3 times faster than the sandwich algorithm and performs less than 1/4 of its basic operations.

5.2.2. Change of the Number of Rules for Different Data Set Sizes. Now we study how the number of rules changes with the increase of the data set size. The experiments are undertaken only on the MovieLens data set. We use the following settings: sc = 0.15, tc = 0.14, ms ∈ {0.01, 0.02}, and |U| ∈ {500, 1000, 1500, 2000, 2500, 3000, 3500}. The number of movies is always 3952. When selecting k users, we always select from the first user to the kth user.

First, we look at the number of concepts satisfying the source coverage threshold ms. According to Figure 3(a), the number of source concepts decreases with the increase of the number of users. However, Figure 3(b) indicates that this trend may not always hold. In fact, from Figure 3, the most important observation is that the number of source concepts does not vary much with the change of the number of objects: when the number of users is more than 1500, this variation is no more than 3, which is less than 5% of the total number of concepts.

Second, we look at the number of granular association rules satisfying all four thresholds. Figure 4 indicates that the number of rules varies more than the number of source concepts. However, this variation is less than 20% when there are more than 1500 users, and less than 10% when there are more than 2500 users.

5.2.3. Change of Run Time for Different Data Set Sizes. We look at how the run time changes with the increase of the number of users. The time complexity of the algorithm is given by (40). Since the number of movies does not change, |TG(mt)| is fixed.


[Figure 1: Run time (ms) versus ms (= mt) for the sandwich and backward algorithms. (a) Course selection (×10^4). (b) MovieLens, 3800 users.]

[Figure 2: Number of basic operations versus ms (= mt) for the sandwich and backward algorithms. (a) Course selection (×10^9). (b) MovieLens, 3800 users (×10^8).]

Moreover, according to our earlier discussion, |SG(ms)| does not vary much for different numbers of users. Therefore, the time complexity is nearly linear with respect to the number of users. Figure 5 validates this analysis.

5.2.4. Number of Rules for Different Thresholds. Figure 6 shows that the number of rules decreases dramatically with the increase of ms and mt. For the course selection data set, the number of rules would be 0 when ms = mt = 0.07; for the MovieLens data set, the number of rules would be 0 when ms = mt = 0.06.

5.2.5. Performance of Cold-Start Recommendation. Now we study how the performance of cold-start recommendation varies between the training and testing sets. The experiments are undertaken only on the MovieLens data set. Here we employ two data sets: one with 1000 users and 3952 movies, and the other with 3000 users and 3952 movies. We divide the


[Figure 3: Number of concepts on users for MovieLens (y-axis: number of source granules; x-axis: number of users). (a) ms = 0.01. (b) ms = 0.02.]

[Figure 4: Number of granular association rules for MovieLens versus the number of users. (a) ms = 0.01. (b) ms = 0.02.]

users into two parts, used as the training and the testing sets. The other settings are as follows: the training set percentage is 60%, sc = 0.15, and tc = 0.14. Each experiment is repeated 30 times with different samplings of the training and testing sets, and the average accuracy is computed.
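The evaluation protocol just described is a repeated random holdout. The sketch below shows its skeleton; train_and_evaluate is a hypothetical placeholder standing in for rule mining on the training users followed by accuracy measurement on the testing users.

```python
import random

def repeated_holdout(user_ids, train_and_evaluate,
                     train_fraction=0.6, repeats=30, seed=0):
    """Average accuracy over `repeats` random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        users = list(user_ids)
        rng.shuffle(users)
        cut = int(train_fraction * len(users))
        train, test = users[:cut], users[cut:]
        accuracies.append(train_and_evaluate(train, test))  # hypothetical callback
    return sum(accuracies) / len(accuracies)
```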

Figure 7 shows that the recommendation performance differs as ms and mt vary. We observe some especially interesting phenomena from this figure, as follows.

(1) Figures 7(a) and 7(b) are similar in general. As the number of users increases, the accuracy of the recommender also increases. The reason is that we get more information from the training sets to generate better rules for the recommendation.

(2) When ms and mt are suitably set (0.03), the recommender achieves its maximal accuracy on both data sets. As these thresholds increase or decrease from this value, the performance of the recommendation drops rapidly.

(3) The performance of the recommendation does not change much between the training and the testing sets. This indicates that the recommender is stable.

5.3. Discussions. Now we can answer the questions proposed at the beginning of this section.

(1) The backward algorithm outperforms the sandwich algorithm. It is more than 2 times and 3 times faster than the sandwich algorithm on the course selection and MovieLens data sets, respectively. Therefore, our parametric rough sets on two universes are useful in applications.


[Figure 5: Run time (ms) on MovieLens versus the number of users. (a) ms = 0.01. (b) ms = 0.02.]

[Figure 6: Number of granular association rules versus ms (= mt). (a) Course selection. (b) MovieLens.]

[Figure 7: Accuracy of cold-start recommendation on MovieLens versus ms (= mt), on the training and testing sets. (a) 1000 users and 3952 movies. (b) 3000 users and 3952 movies.]


(2) The number of rules does not change much for different numbers of objects. Therefore, it is not necessary to collect very large amounts of data to obtain meaningful granular association rules; for example, for the MovieLens data set, 3000 users are quite enough.

(3) The run time is nearly linear with respect to the number of objects. Therefore, the algorithm is scalable from the viewpoint of time complexity. However, we observe that the relation table might be rather big; this could be a bottleneck of the algorithm.

(4) The number of rules decreases dramatically with the increase of the thresholds ms and mt. It is important to specify appropriate thresholds to obtain useful rules.

(5) The performance of cold-start recommendation is stable on the training and the testing sets with the increase of the thresholds ms and mt.

6. Conclusions

In this paper, we have proposed a new type of parametric rough sets on two universes to deal with the granular association rule mining problem. The lower approximation operator has been defined, and its monotonicity has been analyzed. With the help of the new model, a backward algorithm for the granular association rule mining problem has been proposed. Experimental results on two real-world data sets indicate that the new algorithm is significantly faster than the existing sandwich algorithm, and the performance of the recommender system is stable on the training and the testing sets. To sum up, this work applies rough set theory to recommender systems and is one step toward the application of rough set theory and granular computing. In the future, we will improve our approach and compare its performance with other recommendation approaches.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported in part by the National Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128, the Fujian Province Foundation of Higher Education under Grant no. JK2012028, the Key Project of the Education Department of Fujian Province under Grant no. JA13192, and the Postgraduate Education Innovation Base for Computer Application Technology, Signal and Information Processing of Fujian Province (no. [2008]114, High Education of Fujian).

References

[1] L. Dehaspe, H. Toivonen, and R. D. King, "Finding frequent substructures in chemical compounds," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 30–36, 1998.

[2] B. Goethals, W. Le Page, and M. Mampaey, "Mining interesting sets and rules in relational databases," in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 997–1001, March 2010.

[3] S. Dzeroski, "Multi-relational data mining: an introduction," SIGKDD Explorations, vol. 5, pp. 1–16, 2003.

[4] V. C. Jensen and N. Soparkar, "Frequent item set counting across multiple tables," in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining: Current Issues and New Applications, vol. 1805 of Lecture Notes in Computer Science, pp. 49–61, 2000.

[5] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules on two universes with four measures," 2012, http://arxiv.org/abs/1209.5598.

[6] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules with four subtypes," in Proceedings of the IEEE International Conference on Granular Computing, pp. 432–437, 2012.

[7] J. Hu and C. Guan, "An emotional agent model based on granular computing," Mathematical Problems in Engineering, vol. 2012, Article ID 601295, 10 pages, 2012.

[8] T. Y. Lin, "Granular computing on binary relations I: data mining and neighborhood systems," Rough Sets in Knowledge Discovery, vol. 1, pp. 107–121, 1998.

[9] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," in Proceedings of SPIE, vol. 5098, pp. 138–146, April 2003.

[10] Y. Y. Yao, "Granular computing: basic issues and possible solutions," in Proceedings of the 5th Joint Conference on Information Sciences (JCIS '00), vol. 1, pp. 186–189, March 2000.

[11] W. Zhu and F.-Y. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, pp. 217–230, 2003.

[12] W. Zhu, "Generalized rough sets based on relations," Information Sciences, vol. 177, no. 22, pp. 4997–5011, 2007.

[13] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.

[14] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 167–179, 2014.

[15] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," SIGIR Forum, pp. 253–260, 2002.

[16] H. Guo, "Soap: live recommendations through social agents," in Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 1–17, 1997.

[17] M. Balabanovic and Y. Shoham, "Content-based collaborative recommendation," Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.

[18] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[19] R. Burke, "Hybrid recommender systems: survey and experiments," User Modelling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.

[20] A. H. Dong, D. Shan, Z. Ruan, L. Y. Zhou, and F. Zuo, "The design and implementation of an intelligent apparel recommend expert system," Mathematical Problems in Engineering, vol. 2013, Article ID 343171, 8 pages, 2013.

[21] M. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321, pp. 325–341, Springer, 2007.

[22] W. Zhu, "Relationship among basic concepts in covering-based rough sets," Information Sciences, vol. 179, no. 14, pp. 2478–2486, 2009.

[23] Y. Y. Yao and X. F. Deng, "A granular computing paradigm for concept learning," in Emerging Paradigms in Machine Learning, vol. 13, pp. 307–326, Springer, Berlin, Germany, 2013.

[24] J. Yao, "Recent developments in granular computing: a bibliometrics study," in Proceedings of the IEEE International Conference on Granular Computing (GRC '08), pp. 74–79, August 2008.

[25] F. Aftrati, G. Das, A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, "Mining chains of relations," in Data Mining: Foundations and Intelligent Paradigms, vol. 24, pp. 217–246, Springer, 2012.

[26] L. Dehaspe and H. Toivonen, "Discovery of frequent datalog patterns," Expert Systems with Applications, vol. 3, no. 1, pp. 7–36, 1999.

[27] B. Goethals, W. Le Page, and H. Mannila, "Mining association rules of simple conjunctive queries," in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 96–107, April 2008.

[28] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.

[29] T. J. Li and W. X. Zhang, "Rough fuzzy approximations on two universes of discourse," Information Sciences, vol. 178, no. 3, pp. 892–906, 2008.

[30] G. Liu, "Rough set theory based on two universal sets and its applications," Knowledge-Based Systems, vol. 23, no. 2, pp. 110–115, 2010.

[31] S. Wong, L. Wang, and Y. Y. Yao, "Interval structure: a framework for representing uncertain information," in Uncertainty in Artificial Intelligence, pp. 336–343, 1993.

[32] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.

[33] A. Skowron and J. Stepaniuk, "Approximation of relations," in Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, Ed., pp. 161–166, 1994.

[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Boston, Mass, USA, 1991.

[35] S. K. M. Wong, L. S. Wang, and Y. Y. Yao, "On modelling uncertainty with interval structures," Computational Intelligence, vol. 12, no. 2, pp. 406–426, 1995.

[36] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.

[37] H. X. Li, X. Z. Zhou, J. B. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Rough Sets and Knowledge Technology, vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.

[38] D. Liu, T. Li, and D. Ruan, "Probabilistic model criteria with decision-theoretic rough sets," Information Sciences, vol. 181, no. 17, pp. 3709–3722, 2011.

[39] X. Y. Jia, W. H. Liao, Z. M. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.

[40] D. Liu, Y. Yao, and T. Li, "Three-way investment decisions with decision-theoretic rough sets," International Journal of Computational Intelligence Systems, vol. 4, no. 1, pp. 66–74, 2011.

[41] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.

[42] Z. T. Gong and B. Z. Sun, "Probability rough sets model between different universes and its applications," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 561–565, July 2008.

[43] W. Ma and B. Sun, "Probabilistic rough set over two universes and rough entropy," International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 608–619, 2012.

[44] B. Sun, W. Ma, H. Zhao, and X. Wang, "Probabilistic fuzzy rough set model over two universes," in Rough Sets and Current Trends in Computing, pp. 83–93, Springer, 2012.

[45] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logic," Intelligent Automation and Soft Computing, vol. 2, pp. 103–120, 1996.

[46] X. He, F. Min, and W. Zhu, "A comparative study of discretization approaches for granular association rule mining," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 725–729, 2013.

[47] F. Min, Q. Liu, and C. Fang, "Rough sets approach to symbolic value partition," International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 689–700, 2008.

[48] F. Min and W. Zhu, "Granular association rules for multi-valued data," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 799–803, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Mathematical Problems in Engineering 7

Input ES = (119880 119860 119881 119861 119877) ms mtOutput All complete match granular association rules satisfying given constraintsMethod complete-match-rules-backward

(1) SG(ms) = (1198601015840 119909) isin 2119860 times 119880 | |1198641198601015840 (119909)||119880| ge ms Candidate source granules

(2) TG(mt) = (1198611015840 119910) isin 2119861 times 119881 | |1198641198611015840 (119910)||119881| ge mt Candidate target granules

(3) for each 1198921015840 isin TG(ms) do(4) 119884 = 119890(1198921015840)(5) 119883 = 119877minus1tc(119884)(6) for each 119892 isin SG(mt) do(7) if (119890(119892) sube 119883) then(8) output rule 119894(119892) rArr 119894(1198921015840)(9) end if(10) end for(11) end for

Algorithm 1 A backward algorithm

Step 3 For each granule obtained in Step 1 construct a blockin 119880 according to 119877 This step corresponds to Line 4 of thealgorithmThe function 119890 has been defined in (5) Accordingto Definition 9 in our example 119877minus1tc(119884) are people who likeat least tc times 100 products in 119884

Step 4 Check possible rules regarding 1198921015840 and 119883 and outputall rules This step corresponds to Lines 6 through 10 of thealgorithm In Line 7 since 119890(119892) and 119883 could be stored insorted arrays the complexity of checking 119890(119892) sube 119883 is

119874 (1003816100381610038161003816119890 (119892)

1003816100381610038161003816 + |119883|) = 119874 (|119880|) (38)

where | sdot | denotes the cardinality of a set

Because the algorithm starts from the right-hand sideof the rule and proceeds to the left-hand side it is calleda backward algorithm It is necessary to compare the timecomplexities of the existing sandwich algorithm and our newbackward algorithm Both algorithms share Steps 1 and 2which do not incur the pattern explosion problemThereforewe will focus on the remaining steps The time complexity ofthe sandwich algorithm is [5]

119874 (|SC (ms)| times |TC (119898119905)| times |119880| times |119881|) (39)

where | sdot | denotes the cardinality of a setAccording to the loops the time complexity of Algo-

rithm 1 is

119874 (|SG (ms)| times |TG (mt)| times |119880|) (40)

which is lower than the sandwich algorithmIntuitively the backward algorithm avoids computing

119877(119909) cap RH(GR) for different rules with the same right-hand side Hence it should be less time consuming than thesandwich algorithm We will compare the run time of thesealgorithms in the next section through experimentation

The space complexities of these two algorithms are alsoimportant To store the relation 119877 a |119880| times |119881| Boolean matrixis needed

5 Experiments on Two Real-World Data Sets

The main purpose of our experiments is to answer thefollowing questions

(1) Does the backward algorithm outperform the sand-wich algorithm

(2) How does the number of rules change for differentnumber of objects

(3) How does the algorithm run time change for differentnumber of objects

(4) How does the number of rules vary for differentthresholds

(5) How does the performance of cold-start recommen-dation vary for the training and testing sets

51 Data Sets We collected two real-world data sets forexperimentation One is course selection and the other ismovie rating These data sets are quite representative forapplications

511 A Course Selection Data Set The course selectionsystem often serves as an example in textbooks to explainthe concept of many-to-many entity-relationship diagramsHence it is appropriate to produce meaningful granularassociation rules and test the performance of our algorithmWe obtained a data set from the course selection system ofMinnan Normal University (The authors would like to thankMrs Chunmei Zhou for her help in the data collection)Specifically we collected data during the semester between2011 and 2012 There are 145 general education courses in theuniversity 9654 students took part in course selection Thedatabase schema is as follows

(i) Student (student ID name gender birth-yearpolitics-status grade department nationality andlength of schooling)

8 Mathematical Problems in Engineering

(ii) Course (course ID credit class-hours availabilityand department)

(iii) Selects (student ID course ID)

Our algorithm supports only nominal data at this timeFor this data set all data are viewed nominally and directlyIn this way no discretization approach is employed to convertnumeric ones into nominal ones Also we removed studentnames and course names from the original data since theyare useless in generating meaningful rules

512 A Movie Rating Data Set The MovieLens data setassembled by the GroupLens project is widely used in recom-mender systems (see eg [15 16] (GroupLens is a research labin the Department of Computer Science and Engineering atthe University of Minnesota (httpwwwgrouplensorg)))We downloaded the data set from the Internet MovieDatabase (httpwwwmovielensorg) Originally the dataset contains 100000 ratings (1ndash5) from 943 users on 1682movies with each user rating at least 20 movies [15] Cur-rently the available data set contains 1000209 anonymousratings of 3952 movies made by 6040 MovieLens users whojoined MovieLens in 2000 In order to run our algorithm wepreprocessed the data set as follows

(1) Removemovie namesThey are not useful in generat-ing meaningful granular association rules

(2) Use release year instead of release date In this way thegranule is more reasonable

(3) Select the movie genre In the original data themovie genre is multivalued since one movie mayfall in more than one genre For example a moviecan be both animation and childrenrsquos Unfortunatelygranular association rules do not support this typeof data at this time Since the main objective of thiswork is to compare the performances of algorithmswe use a simple approach to deal with this issue thatis to sort movie genres according to the number ofusers they attract and only keep the highest prioritygenre for the current movie We adopt the followingpriority (from high to low) comedy action thrillerromance adventure children crime sci-Fi horrorwarmysterymusical documentary animation west-ern filmnoir fantasy and unknown

Our database schema is as follows

(i) User (user ID age gender and occupation)(ii) Movie (movie ID release year and genre)(iii) Rates (user ID movie ID)

There are 8 user age intervals 21 occupations and 71release years Similar to the course selection data set allthese data are viewed nominally and processed directly Weemploy neither discretization nor symbolic value partition[46 47] approaches to produce coarser granules The genreis a multivalued attribute Therefore we scale it to 18 Booleanattributes and deal with it using the approach proposed in[48]

52 Results We undertake five sets of experiments to answerthe questions proposed at the beginning of this section

521 Efficiency Comparison We compare the efficiencies ofthe backward and the sandwich algorithms We look at onlythe run time of Lines 3 through 11 since these codes are thedifference between two algorithms

For the course selection data set when ms = mt = 006sc = 018 and tc = 011 we obtain only 40 rules For higherthresholds no rule can be obtained Therefore we use thefollowing settings sc = 018 tc = 011 ms = mt and ms isin002 003 004 005 006 Figure 1(a) shows the actual runtime in miniseconds Figure 2(a) shows the number of basicoperations including addition and comparison of numbersHere we observe that for different settings the backwardalgorithm is more than 2 times faster than the sandwichalgorithm it only takes less than 13 operations than thesandwich algorithm

For the MovieLens data set we employ the data setwith 3800 users and 3952 movies We use the followingsettings sc = 015 tc = 014 ms = mt and ms isin

001 002 003 004 005 Figure 1(b) shows the actual runtime in miniseconds Figure 2(b) shows the number of basicoperations including addition and comparison of numbersHere we observe that for different settings the backwardalgorithm is more than 3 times faster than the sandwichalgorithm it only takes less than 14 operations than thesandwich algorithm

522 Change of Number of Rules for Different Data Set SizesNow we study how the number of rules changes with theincrease of the data set size The experiments are undertakenonly on theMovieLens data setWe use the following settingssc = 015 tc = 014 ms isin 001 002 and |U| isin

500 1 000 1 500 2 000 2 500 3 000 3 500 The numberof movies is always 3952 While selecting 119896 users we alwaysselect from the first user to the 119896th user

First we look at the number of concepts satisfying thesource confidence threshold ms According to Figure 3(a)the number of source concepts decreases with the increase ofthe number of users However Figure 3(b) indicates that thistrendmay not hold In fact fromFigure 3 themost importantobservation is that the number of source concepts does notvary much with the change of the number of objects Whenthe number of users is more than 1500 this variation is nomore than 3 which is less than 5 of the total number ofconcepts

Second we look at the number of granular associationrules satisfying all four thresholds Figure 4 indicates thatthe number of rules varies more than the number of sourceconcepts However this variation is less than 20when thereare more than 1500 users or less than 10 when there aremore than 2500 users

523 Change of Run Time for Different Data Set Sizes Welook at the run time changewith the increase of the number ofusers The time complexity of the algorithm is given by (40)Since the number of movies is not changed |SG(mt)| is fixed

Mathematical Problems in Engineering 9

1

05

0

25

2

15

4

35

3

Run

time (

ms)

002 003 004 005 006

times104

SandwichBackward

ms(mt)

(a)

1000

500

0

1500

Run

time (

ms)

001 002 003 004 005

SandwichBackward

ms(mt)

(b)

Figure 1 Run time information (a) course selection (b) MovieLens (3800 users)

1

05

0

25

2

15

4

35

5

45

3

Basic

ope

ratio

ns

times109

002 003 004 005 006

SandwichBackward

ms(mt)

(a)

04

02

0

1

08

06

16

14

2

18

12

Basic

ope

ratio

ns

times108

001 002 003 004 005

SandwichBackward

ms(mt)

(b)

Figure 2 Basic operations information (a) course selection (b) MovieLens (3800 users)

Moreover according to our earlier discussion |SC(ms)| doesnot vary much for different number of users Therefore thetime complexity is nearly linear with respect to the numberof users Figure 5 validates this analysis

524 Number of Rules for Different Thresholds Figure 6(a)shows the number of rules decreases dramatically with theincrease of 119898119904 and 119898119905 For the course selection data set thenumber of rules would be 0 when ms = mt = 007 For the

MovieLens data set the number of rules would be 0 whenms = mt = 006

525 Performance of Cold-Start Recommendation Now westudy how the performance of cold-start recommendationvaries for the training and testing sets The experiments areundertaken only on the MovieLens data set Here we employtwo data sets One is with 1000 users and 3952 movies andthe other is with 3000 users and 3952 movies We divide

10 Mathematical Problems in Engineering

500 1000 1500 2000 2500 3000 3500Users

122

130

126

128

124

136

134

132

Num

ber o

f sou

rce g

ranu

les

(a)

500 1000 1500 2000 2500 3000 3500Users

72

745

74

73

735

725

76

755

75

Num

ber o

f sou

rce g

ranu

les

(b)

Figure 3 Number of concepts on users for MovieLens (a) ms = 001 (b) ms = 002

500 1000 1500 2000 2500 3000 3500Users

1100

1050

950

1000

1250

1200

1150

Num

ber o

f rul

es

(a)

500 1000 1500 2000 2500 3000 3500Users

260

250

230

220

240

290

280

270

Num

ber o

f rul

es

(b)

Figure 4 Number of granular association rules for MovieLens (a) ms = 001 (b) ms = 002

user into two parts as the training and testing set The othersettings are as follows the training set percentage is 60sc = 015 and tc = 014 Each experiment is repeated 30times with different sampling of training and testing sets andthe average accuracy is computed

Figure 7 shows that with the variation of ms and mt therecommendation performs different Now we observe someespecially interesting phenomena from this figure as follows

(1) Figures 7(a) and 7(b) are similar in general As theuser number increases the accuracy of recommenderalso increases The reason is that we get more infor-mation from the training sets to generate much betterrules for the recommendation

(2) When ms and mt are suitably set (003) the recom-mender of both data sets has the maximal accuracyWith the increase or decrease of these thresholds

the performance of the recommendation increases ordecreases rapidly

(3) The performance of the recommendation does notchange much on the training and the testing setsThis phenomenon figures out that the recommenderis stable

53 Discussions Now we can answer the questions proposedat the beginning of this section

(1) The backward algorithm outperforms the sandwichalgorithm The backward algorithm is more than 2times and 3 times faster than the sandwich algorithmon the course selection and MovieLens data setsrespectively Therefore our parametric rough sets ontwo universes are useful in applications

Mathematical Problems in Engineering 11

1200

1000

8000

600

400

200500 1000 1500 2000 2500 3000 3500

Users

Run

time (

ms)

(a)

500 1000 1500 2000 2500 3000 3500Users

300

200

100

600

500

400

900

800

700

Run

time (

ms)

(b)

Figure 5 Run time on MovieLens (a) ms = 001 (b) ms = 002

2000

1500

1000

500

0

3000

2500

Num

ber o

f rul

es

002 003 004 005 006ms(mt)

(a)

800

600

400

200

0

1200

1000N

umbe

r of r

ules

002001 003 004 005ms(mt)

(b)

Figure 6 Number of granular association rules (a) course selection (b) MovieLens

007

0065

006

0055

008

0075

Accu

racy

00150010 0020 0025 0030 0035 0040

TrainingTesting

ms(mt)

(a)

0075

007

0065

006

0085

008

Accu

racy

00150010 0020 0025 0030 0035 0040

TrainingTesting

ms(mt)

(b)

Figure 7 Accuracy of cold-start recommendation on MovieLens (a) 1000 users and 3952 movies (b) 3000 users and 3952 movies

12 Mathematical Problems in Engineering

(2) The number of rules does not change much for differ-ent number of objectsTherefore it is not necessary tocollect too many data to obtain meaningful granularassociation rules For example for the MovieLensdata set 3000 users are pretty enough

(3) The run time is nearly linear with respect to thenumber of objectsTherefore the algorithm is scalablefrom the viewpoint of time complexity However weobserve that the relation table might be rather bigtherefore this would be a bottleneck of the algorithm

(4) The number of rules decreases dramatically with theincrease of thresholds ms and mt It is important tospecify appropriate thresholds to obtain useful rules

(5) The performance of cold-start recommendation isstable on the training and the testing sets with theincrease of thresholds ms and mt

6. Conclusions

In this paper, we have proposed a new type of parametric rough sets on two universes to deal with the granular association rule mining problem. The lower approximation operator has been defined, and its monotonicity has been analyzed. With the help of the new model, a backward algorithm for the granular association rule mining problem has been proposed. Experimental results on two real-world data sets indicate that the new algorithm is significantly faster than the existing sandwich algorithm, and the performance of recommender systems is stable on the training and the testing sets. To sum up, this work applies rough set theory to recommender systems and is one step toward the application of rough set theory and granular computing. In the future, we will improve our approach and compare its performance with other recommendation approaches.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is in part supported by the National Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128, the Fujian Province Foundation of Higher Education under Grant no. JK2012028, the Key Project of the Education Department of Fujian Province under Grant no. JA13192, and the Postgraduate Education Innovation Base for Computer Application Technology, Signal and Information Processing of Fujian Province (no. [2008]114, High Education of Fujian).

References

[1] L. Dehaspe, H. Toivonen, and R. D. King, "Finding frequent substructures in chemical compounds," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 30–36, 1998.
[2] B. Goethals, W. Le Page, and M. Mampaey, "Mining interesting sets and rules in relational databases," in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 997–1001, March 2010.
[3] S. Dzeroski, "Multi-relational data mining: an introduction," SIGKDD Explorations, vol. 5, pp. 1–16, 2003.
[4] V. C. Jensen and N. Soparkar, "Frequent item set counting across multiple tables," in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining: Current Issues and New Application, vol. 1805 of Lecture Notes in Computer Science, pp. 49–61, 2000.
[5] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules on two universes with four measures," 2012, http://arxiv.org/abs/1209.5598.
[6] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules with four subtypes," in Proceedings of the IEEE International Conference on Granular Computing, pp. 432–437, 2012.
[7] J. Hu and C. Guan, "An emotional agent model based on granular computing," Mathematical Problems in Engineering, vol. 2012, Article ID 601295, 10 pages, 2012.
[8] T. Y. Lin, "Granular computing on binary relations I: data mining and neighborhood systems," Rough Sets in Knowledge Discovery, vol. 1, pp. 107–121, 1998.
[9] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," in Proceedings of SPIE, vol. 5098, pp. 138–146, April 2003.
[10] Y. Y. Yao, "Granular computing: basic issues and possible solutions," in Proceedings of the 5th Joint Conference on Information Sciences (JCIS '00), vol. 1, pp. 186–189, March 2000.
[11] W. Zhu and F.-Y. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, pp. 217–230, 2003.
[12] W. Zhu, "Generalized rough sets based on relations," Information Sciences, vol. 177, no. 22, pp. 4997–5011, 2007.
[13] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[14] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 167–179, 2014.
[15] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," SIGIR Forum, pp. 253–260, 2002.
[16] H. Guo, "Soap: live recommendations through social agents," in Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 1–17, 1997.
[17] M. Balabanovic and Y. Shoham, "Content-based, collaborative recommendation," Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.
[18] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[19] R. Burke, "Hybrid recommender systems: survey and experiments," User Modelling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.
[20] A. H. Dong, D. Shan, Z. Ruan, L. Y. Zhou, and F. Zuo, "The design and implementation of an intelligent apparel recommend expert system," Mathematical Problems in Engineering, vol. 2013, Article ID 343171, 8 pages, 2013.
[21] M. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321, pp. 325–341, Springer, 2007.
[22] W. Zhu, "Relationship among basic concepts in covering-based rough sets," Information Sciences, vol. 179, no. 14, pp. 2478–2486, 2009.
[23] Y. Y. Yao and X. F. Deng, "A granular computing paradigm for concept learning," in Emerging Paradigms in Machine Learning, vol. 13, pp. 307–326, Springer, Berlin, Germany, 2013.
[24] J. Yao, "Recent developments in granular computing: a bibliometrics study," in Proceedings of the IEEE International Conference on Granular Computing (GRC '08), pp. 74–79, August 2008.
[25] F. Aftrati, G. Das, A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, "Mining chains of relations," in Data Mining: Foundations and Intelligent Paradigms, vol. 24, pp. 217–246, Springer, 2012.
[26] L. Dehaspe and H. Toivonen, "Discovery of frequent datalog patterns," Expert Systems with Applications, vol. 3, no. 1, pp. 7–36, 1999.
[27] B. Goethals, W. Le Page, and H. Mannila, "Mining association rules of simple conjunctive queries," in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 96–107, April 2008.
[28] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.
[29] T. J. Li and W. X. Zhang, "Rough fuzzy approximations on two universes of discourse," Information Sciences, vol. 178, no. 3, pp. 892–906, 2008.
[30] G. Liu, "Rough set theory based on two universal sets and its applications," Knowledge-Based Systems, vol. 23, no. 2, pp. 110–115, 2010.
[31] S. Wong, L. Wang, and Y. Y. Yao, "Interval structure: a framework for representing uncertain information," in Uncertainty in Artificial Intelligence, pp. 336–343, 1993.
[32] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[33] A. Skowron and J. Stepaniuk, "Approximation of relations," in Proceedings of Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, Ed., pp. 161–166, 1994.
[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Boston, Mass, USA, 1991.
[35] S. K. M. Wong, L. S. Wang, and Y. Y. Yao, "On modelling uncertainty with interval structures," Computational Intelligence, vol. 12, no. 2, pp. 406–426, 1995.
[36] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.
[37] H. X. Li, X. Z. Zhou, J. B. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Proceedings of Rough Sets and Knowledge Technology, vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.
[38] D. Liu, T. Li, and D. Ruan, "Probabilistic model criteria with decision-theoretic rough sets," Information Sciences, vol. 181, no. 17, pp. 3709–3722, 2011.
[39] X. Y. Jia, W. H. Liao, Z. M. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.
[40] D. Liu, Y. Yao, and T. Li, "Three-way investment decisions with decision-theoretic rough sets," International Journal of Computational Intelligence Systems, vol. 4, no. 1, pp. 66–74, 2011.
[41] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.
[42] Z. T. Gong and B. Z. Sun, "Probability rough sets model between different universes and its applications," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 561–565, July 2008.
[43] W. Ma and B. Sun, "Probabilistic rough set over two universes and rough entropy," International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 608–619, 2012.
[44] B. Sun, W. Ma, H. Zhao, and X. Wang, "Probabilistic fuzzy rough set model over two universes," in Rough Sets and Current Trends in Computing, pp. 83–93, Springer, 2012.
[45] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logic," Intelligent Automation and Soft Computing, vol. 2, pp. 103–120, 1996.
[46] X. He, F. Min, and W. Zhu, "A comparative study of discretization approaches for granular association rule mining," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 725–729, 2013.
[47] F. Min, Q. Liu, and C. Fang, "Rough sets approach to symbolic value partition," International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 689–700, 2008.
[48] F. Min and W. Zhu, "Granular association rules for multi-valued data," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 799–803, 2013.





10 Mathematical Problems in Engineering

500 1000 1500 2000 2500 3000 3500Users

122

130

126

128

124

136

134

132

Num

ber o

f sou

rce g

ranu

les

(a)

500 1000 1500 2000 2500 3000 3500Users

72

745

74

73

735

725

76

755

75

Num

ber o

f sou

rce g

ranu

les

(b)

Figure 3 Number of concepts on users for MovieLens (a) ms = 001 (b) ms = 002

500 1000 1500 2000 2500 3000 3500Users

1100

1050

950

1000

1250

1200

1150

Num

ber o

f rul

es

(a)

500 1000 1500 2000 2500 3000 3500Users

260

250

230

220

240

290

280

270

Num

ber o

f rul

es

(b)

Figure 4 Number of granular association rules for MovieLens (a) ms = 001 (b) ms = 002

user into two parts as the training and testing set The othersettings are as follows the training set percentage is 60sc = 015 and tc = 014 Each experiment is repeated 30times with different sampling of training and testing sets andthe average accuracy is computed

Figure 7 shows that with the variation of ms and mt therecommendation performs different Now we observe someespecially interesting phenomena from this figure as follows

(1) Figures 7(a) and 7(b) are similar in general As theuser number increases the accuracy of recommenderalso increases The reason is that we get more infor-mation from the training sets to generate much betterrules for the recommendation

(2) When ms and mt are suitably set (003) the recom-mender of both data sets has the maximal accuracyWith the increase or decrease of these thresholds

the performance of the recommendation increases ordecreases rapidly

(3) The performance of the recommendation does notchange much on the training and the testing setsThis phenomenon figures out that the recommenderis stable

53 Discussions Now we can answer the questions proposedat the beginning of this section

(1) The backward algorithm outperforms the sandwichalgorithm The backward algorithm is more than 2times and 3 times faster than the sandwich algorithmon the course selection and MovieLens data setsrespectively Therefore our parametric rough sets ontwo universes are useful in applications

Mathematical Problems in Engineering 11

1200

1000

8000

600

400

200500 1000 1500 2000 2500 3000 3500

Users

Run

time (

ms)

(a)

500 1000 1500 2000 2500 3000 3500Users

300

200

100

600

500

400

900

800

700

Run

time (

ms)

(b)

Figure 5 Run time on MovieLens (a) ms = 001 (b) ms = 002

2000

1500

1000

500

0

3000

2500

Num

ber o

f rul

es

002 003 004 005 006ms(mt)

(a)

800

600

400

200

0

1200

1000N

umbe

r of r

ules

002001 003 004 005ms(mt)

(b)

Figure 6 Number of granular association rules (a) course selection (b) MovieLens

007

0065

006

0055

008

0075

Accu

racy

00150010 0020 0025 0030 0035 0040

TrainingTesting

ms(mt)

(a)

0075

007

0065

006

0085

008

Accu

racy

00150010 0020 0025 0030 0035 0040

TrainingTesting

ms(mt)

(b)

Figure 7 Accuracy of cold-start recommendation on MovieLens (a) 1000 users and 3952 movies (b) 3000 users and 3952 movies

12 Mathematical Problems in Engineering

(2) The number of rules does not change much for differ-ent number of objectsTherefore it is not necessary tocollect too many data to obtain meaningful granularassociation rules For example for the MovieLensdata set 3000 users are pretty enough

(3) The run time is nearly linear with respect to thenumber of objectsTherefore the algorithm is scalablefrom the viewpoint of time complexity However weobserve that the relation table might be rather bigtherefore this would be a bottleneck of the algorithm

(4) The number of rules decreases dramatically with theincrease of thresholds ms and mt It is important tospecify appropriate thresholds to obtain useful rules

(5) The performance of cold-start recommendation isstable on the training and the testing sets with theincrease of thresholds ms and mt

6 Conclusions

In this paper we have proposed a new type of parametricrough sets on two universes to deal with the granularassociation rule mining problem The lower approximationoperator has been defined and its monotonicity has beenanalyzed With the help of the new model a backwardalgorithm for the granular association rule mining problemhas been proposed Experimental results on two real-worlddata sets indicate that the new algorithm is significantlyfaster than the existing sandwich algorithmTheperformanceof recommender systems is stable on the training and thetesting sets To sum up this work applies rough set theory torecommender systems and is one step toward the applicationof rough set theory and granular computing In the futurewe will improve our approach and compare the performancewith other recommendation approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is in part supported by the National ScienceFoundation of China under Grant nos 61379089 61379049and 61170128 the Fujian Province Foundation of HigherEducation under Grant no JK2012028 the Key Project ofEducation Department of Fujian Province under Grant NoJA13192 and the Postgraduate Education Innovation Base forComputer Application Technology Signal and InformationProcessing of Fujian Province (no [2008]114 High Educationof Fujian)

References

[1] L Dehaspe H Toivonen and R D King ldquoFinding frequentsubstructures in chemical compoundsrdquo in Proceedings of the4th International Conference on Knowledge Discovery and DataMining pp 30ndash36 1998

[2] B Goethals W Le Page and M Mampaey ldquoMining interestingsets and rules in relational databasesrdquo in Proceedings of the 25thAnnual ACM Symposium on Applied Computing (SAC rsquo10) pp997ndash1001 March 2010

[3] S Dzeroski ldquoMulti-relational data mining an introductionrdquo inSIGKDD Explorations vol 5 pp 1ndash16 2003

[4] VC Jensen andN Soparkar ldquoFrequent item set counting acrossmultiple tablesrdquo in Proceedings of the 4th Pacific-Asia Conferenceon Knowledge Discovery and Data Mining Current Issues andNew Application vol 1805 of Lecture Notes in Computer Sciencepp 49ndash61 2000

[5] F Min Q H Hu and W Zhu ldquoGranular association rules ontwo universes with four measuresrdquo 2012 httparxivorgabs12095598

[6] F Min Q H Hu and W Zhu ldquoGranular association ruleswith four subtypesrdquo in Proceedings of the IEEE InternationalConference on Granular Computing pp 432ndash437 2012

[7] J Hu and C Guan ldquoAn emotional agent model based ongranular computingrdquo Mathematical Problems in Engineeringvol 2012 Article ID 601295 10 pages 2012

[8] T Y Lin ldquoGranular computing on binary relations I datamining and neighborhoodsystemsrdquo Rough Sets in KnowledgeDiscovery vol 1 pp 107ndash121 1998

[9] J T Yao and Y Y Yao ldquoInformation granulation for web basedinformation retrieval support systemsrdquo in Proceedings of SPIEvol 5098 pp 138ndash146 April 2003

[10] Y Y Yao ldquoGranular Computing Basic issues and possible solu-tionsrdquo in Proceedings of the 5th Joint Conference on InformationSciences (JCIS rsquo00) vol 1 pp 186ndash189 March 2000

[11] W Zhu and F-Y Wang ldquoReduction and axiomization ofcovering generalized rough setsrdquo Information Sciences vol 152pp 217ndash230 2003

[12] W Zhu ldquoGeneralized rough sets based on relationsrdquo Informa-tion Sciences vol 177 no 22 pp 4997ndash5011 2007

[13] F Min H He Y Qian and W Zhu ldquoTest-cost-sensitiveattribute reductionrdquo Information Sciences vol 181 no 22 pp4928ndash4942 2011

[14] F Min Q Hu and W Zhu ldquoFeature selection with test costconstraintrdquo International Journal of Approximate Reasoning vol55 no 1 pp 167ndash179 2014

[15] A I Schein A Popescul L H Ungar and D M PennockldquoMethods and metrics for cold-start recommendationsrdquo SIGIRForum pp 253ndash260 2002

[16] H Guo ldquoSoap live recommendations through social agentsrdquoin Proceedings of the 5th DELOS Workshop on Filtering andCollaborative Filtering pp 1ndash17 1997

[17] M Balabanovic and Y Shoham ldquoContent-based collaborativerecommendationrdquo Communications of the ACM vol 40 no 3pp 66ndash72 1997

[18] G Adomavicius and A Tuzhilin ldquoToward the next generationof recommender systems a survey of the state-of-the-art andpossible extensionsrdquo IEEE Transactions on Knowledge and DataEngineering vol 17 no 6 pp 734ndash749 2005

[19] R Burke ldquoHybrid recommender systems survey and experi-mentsrdquoUserModelling andUser-Adapted Interaction vol 12 no4 pp 331ndash370 2002

[20] A H Dong D Shan Z Ruan L Y Zhou and F Zuo ldquoThedesign and implementationof an intelligent apparel recommendexpert systemrdquoMathematical Problemsin Engineering vol 2013Article ID 343171 8 pages 2013

Mathematical Problems in Engineering 13

[21] M Pazzani and D Billsus ldquoContent-based recommendationsystemsrdquo inThe Adaptive Web vol 4321 pp 325ndash341 Springer2007

[22] W Zhu ldquoRelationship among basic concepts in covering-basedrough setsrdquo Information Sciences vol 179 no 14 pp 2478ndash24862009

[23] Y Y Yao and X F Deng ldquoA granular computing paradigm forconcept learningrdquo in Emerging Paradigms in Machine Learningvol 13 pp 307ndash326 Springer Berlin Germany 2013

[24] J Yao ldquoRecent developments in granular computing a biblio-metrics studyrdquo in Proceedings of the IEEE International Confer-ence onGranular Computing (GRC rsquo08) pp 74ndash79 August 2008

[25] F Aftrati G Das A Gionis H Mannila T Mielikainenand P Tsaparas ldquoMining chains of relationsrdquo in Data MiningFoundations and Intelligent Paradigms vol 24 pp 217ndash246Springer 2012

[26] L Dehaspe and H Toivonen ldquoDiscovery of frequent datalogpatternsrdquo Expert Systems with Applications vol 3 no 1 pp 7ndash36 1999

[27] B Goethals W Le Page and H Mannila ldquoMining associationrules of simple conjunctive queriesrdquo in Proceedings of the 8thSIAM International Conference on Data Mining pp 96ndash107April 2008

[28] W Ziarko ldquoVariable precision rough set modelrdquo Journal ofComputer and System Sciences vol 46 no 1 pp 39ndash59 1993

[29] T J Li and W X Zhang ldquoRough fuzzy approximations on twouniverses of discourserdquo Information Sciences vol 178 no 3 pp892ndash906 2008

[30] G Liu ldquoRough set theory based on two universal sets and itsapplicationsrdquo Knowledge-Based Systems vol 23 no 2 pp 110ndash115 2010

[31] S Wong L Wang and Y Y Yao ldquoInterval structure a frame-work for representinguncertain informationrdquo in Uncertainty inArtificial Intelligence pp 336ndash343 1993

[32] Z Pawlak ldquoRough setsrdquo International Journal of Computer ampInformation Sciences vol 11 no 5 pp 341ndash356 1982

[33] A Skowron and J Stepaniuk ldquoApproximation of relationsrdquoin Proceedings of the Rough Sets Fuzzy Sets and KnowledgeDiscovery W Ziarko Ed pp 161ndash166 1994

[34] Z Pawlak Rough Sets Theoretical Aspects of Reasoning aboutData Kluwer Academic Boston Mass USA 1991

[35] S K MWong L S Wang and Y Y Yao ldquoOnmodelling uncer-tainty with interval structuresrdquo Computational Intelligence vol12 no 2 pp 406ndash426 1995

[36] Y Y Yao and S K M Wong ldquoA decision theoretic frameworkfor approximating conceptsrdquo International Journal of Man-Machine Studies vol 37 no 6 pp 793ndash809 1992

[37] H X Li X Z Zhou J B Zhao and D Liu ldquoAttribute reductionin decision-theoreticrough set model a further investigationrdquoin Proceedings of the Rough Sets and KnowledgeTechnology vol6954 of Lecture Notes in Computer Science pp 466ndash475 2011

[38] D Liu T Li and D Ruan ldquoProbabilistic model criteria withdecision-theoretic rough setsrdquo Information Sciences vol 181 no17 pp 3709ndash3722 2011

[39] X Y Jia W H Liao Z M Tang and L Shang ldquoMinimumcost attribute reductionin decision-theoretic rough set modelsrdquoInformation Sciences vol 219 pp 151ndash167 2013

[40] D Liu Y Yao and T Li ldquoThree-way investment decisionswith decision-theoretic rough setsrdquo International Journal ofComputational Intelligence Systems vol 4 no 1 pp 66ndash74 2011

[41] Y Yao and Y Zhao ldquoAttribute reduction in decision-theoreticrough set modelsrdquo Information Sciences vol 178 no 17 pp3356ndash3373 2008

[42] Z T Gong andB Z Sun ldquoProbability rough setsmodel betweendifferent universes and its applicationsrdquo in Proceedings of the 7thInternational Conference on Machine Learning and Cybernetics(ICMLC rsquo08) pp 561ndash565 July 2008

[43] W Ma and B Sun ldquoProbabilistic rough set over two universesand rough entropyrdquo International Journal of Approximate Rea-soning vol 53 no 4 pp 608ndash619 2012

[44] B SunWMaH Zhao andXWang ldquoProbabilistic fuzzy roughset model overtwo universesrdquo in Rough Sets and Current Trendsin Computing pp 83ndash93 Springer 2012

[45] Y Y Yao and T Y Lin ldquoGeneralization of rough sets usingmodal logicrdquo Intelligent Automation and Soft Computing vol 2pp 103ndash120 1996

[46] X He F Min and W Zhu ldquoA comparative study of dis-cretization approaches for granular association rule miningrdquoin Proceedings of the Canadian Conference on Electrical andComputer Engineering pp 725ndash729 2013

[47] F Min Q Liu and C Fang ldquoRough sets approach to symbolicvalue partitionrdquo International Journal of Approximate Reason-ing vol 49 no 3 pp 689ndash700 2008

[48] FMin andW Zhu ldquoGranular association rules formulti-valueddatardquo in Proceedings of the Canadian Conference on Electricaland Computer Engineering pp 799ndash803 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

10 Mathematical Problems in Engineering

500 1000 1500 2000 2500 3000 3500Users

122

130

126

128

124

136

134

132

Num

ber o

f sou

rce g

ranu

les

(a)

500 1000 1500 2000 2500 3000 3500Users

72

745

74

73

735

725

76

755

75

Num

ber o

f sou

rce g

ranu

les

(b)

Figure 3 Number of concepts on users for MovieLens (a) ms = 001 (b) ms = 002

500 1000 1500 2000 2500 3000 3500Users

1100

1050

950

1000

1250

1200

1150

Num

ber o

f rul

es

(a)

500 1000 1500 2000 2500 3000 3500Users

260

250

230

220

240

290

280

270

Num

ber o

f rul

es

(b)

Figure 4 Number of granular association rules for MovieLens (a) ms = 001 (b) ms = 002

user into two parts as the training and testing set The othersettings are as follows the training set percentage is 60sc = 015 and tc = 014 Each experiment is repeated 30times with different sampling of training and testing sets andthe average accuracy is computed

Figure 7 shows that with the variation of ms and mt therecommendation performs different Now we observe someespecially interesting phenomena from this figure as follows

(1) Figures 7(a) and 7(b) are similar in general As theuser number increases the accuracy of recommenderalso increases The reason is that we get more infor-mation from the training sets to generate much betterrules for the recommendation

(2) When ms and mt are suitably set (003) the recom-mender of both data sets has the maximal accuracyWith the increase or decrease of these thresholds

the performance of the recommendation increases ordecreases rapidly

(3) The performance of the recommendation does notchange much on the training and the testing setsThis phenomenon figures out that the recommenderis stable

53 Discussions Now we can answer the questions proposedat the beginning of this section

(1) The backward algorithm outperforms the sandwichalgorithm The backward algorithm is more than 2times and 3 times faster than the sandwich algorithmon the course selection and MovieLens data setsrespectively Therefore our parametric rough sets ontwo universes are useful in applications

Mathematical Problems in Engineering 11

1200

1000

8000

600

400

200500 1000 1500 2000 2500 3000 3500

Users

Run

time (

ms)

(a)

500 1000 1500 2000 2500 3000 3500Users

300

200

100

600

500

400

900

800

700

Run

time (

ms)

(b)

Figure 5 Run time on MovieLens (a) ms = 001 (b) ms = 002

2000

1500

1000

500

0

3000

2500

Num

ber o

f rul

es

002 003 004 005 006ms(mt)

(a)

800

600

400

200

0

1200

1000N

umbe

r of r

ules

002001 003 004 005ms(mt)

(b)

Figure 6 Number of granular association rules (a) course selection (b) MovieLens

007

0065

006

0055

008

0075

Accu

racy

00150010 0020 0025 0030 0035 0040

TrainingTesting

ms(mt)

(a)

0075

007

0065

006

0085

008

Accu

racy

00150010 0020 0025 0030 0035 0040

TrainingTesting

ms(mt)

(b)

Figure 7 Accuracy of cold-start recommendation on MovieLens (a) 1000 users and 3952 movies (b) 3000 users and 3952 movies

12 Mathematical Problems in Engineering

(2) The number of rules does not change much for differ-ent number of objectsTherefore it is not necessary tocollect too many data to obtain meaningful granularassociation rules For example for the MovieLensdata set 3000 users are pretty enough

(3) The run time is nearly linear with respect to thenumber of objectsTherefore the algorithm is scalablefrom the viewpoint of time complexity However weobserve that the relation table might be rather bigtherefore this would be a bottleneck of the algorithm

(4) The number of rules decreases dramatically with theincrease of thresholds ms and mt It is important tospecify appropriate thresholds to obtain useful rules

(5) The performance of cold-start recommendation isstable on the training and the testing sets with theincrease of thresholds ms and mt

6 Conclusions

In this paper we have proposed a new type of parametricrough sets on two universes to deal with the granularassociation rule mining problem The lower approximationoperator has been defined and its monotonicity has beenanalyzed With the help of the new model a backwardalgorithm for the granular association rule mining problemhas been proposed Experimental results on two real-worlddata sets indicate that the new algorithm is significantlyfaster than the existing sandwich algorithmTheperformanceof recommender systems is stable on the training and thetesting sets To sum up this work applies rough set theory torecommender systems and is one step toward the applicationof rough set theory and granular computing In the futurewe will improve our approach and compare the performancewith other recommendation approaches

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is in part supported by the National ScienceFoundation of China under Grant nos 61379089 61379049and 61170128 the Fujian Province Foundation of HigherEducation under Grant no JK2012028 the Key Project ofEducation Department of Fujian Province under Grant NoJA13192 and the Postgraduate Education Innovation Base forComputer Application Technology Signal and InformationProcessing of Fujian Province (no [2008]114 High Educationof Fujian)

References

[1] L. Dehaspe, H. Toivonen, and R. D. King, "Finding frequent substructures in chemical compounds," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 30–36, 1998.
[2] B. Goethals, W. Le Page, and M. Mampaey, "Mining interesting sets and rules in relational databases," in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 997–1001, March 2010.
[3] S. Dzeroski, "Multi-relational data mining: an introduction," SIGKDD Explorations, vol. 5, pp. 1–16, 2003.
[4] V. C. Jensen and N. Soparkar, "Frequent item set counting across multiple tables," in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications, vol. 1805 of Lecture Notes in Computer Science, pp. 49–61, 2000.
[5] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules on two universes with four measures," 2012, http://arxiv.org/abs/1209.5598.
[6] F. Min, Q. H. Hu, and W. Zhu, "Granular association rules with four subtypes," in Proceedings of the IEEE International Conference on Granular Computing, pp. 432–437, 2012.
[7] J. Hu and C. Guan, "An emotional agent model based on granular computing," Mathematical Problems in Engineering, vol. 2012, Article ID 601295, 10 pages, 2012.
[8] T. Y. Lin, "Granular computing on binary relations I: data mining and neighborhood systems," Rough Sets in Knowledge Discovery, vol. 1, pp. 107–121, 1998.
[9] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," in Proceedings of SPIE, vol. 5098, pp. 138–146, April 2003.
[10] Y. Y. Yao, "Granular computing: basic issues and possible solutions," in Proceedings of the 5th Joint Conference on Information Sciences (JCIS '00), vol. 1, pp. 186–189, March 2000.
[11] W. Zhu and F.-Y. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, pp. 217–230, 2003.
[12] W. Zhu, "Generalized rough sets based on relations," Information Sciences, vol. 177, no. 22, pp. 4997–5011, 2007.
[13] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[14] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 167–179, 2014.
[15] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," SIGIR Forum, pp. 253–260, 2002.
[16] H. Guo, "Soap: live recommendations through social agents," in Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 1–17, 1997.
[17] M. Balabanovic and Y. Shoham, "Content-based, collaborative recommendation," Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.
[18] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[19] R. Burke, "Hybrid recommender systems: survey and experiments," User Modelling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.
[20] A. H. Dong, D. Shan, Z. Ruan, L. Y. Zhou, and F. Zuo, "The design and implementation of an intelligent apparel recommend expert system," Mathematical Problems in Engineering, vol. 2013, Article ID 343171, 8 pages, 2013.
[21] M. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321, pp. 325–341, Springer, 2007.
[22] W. Zhu, "Relationship among basic concepts in covering-based rough sets," Information Sciences, vol. 179, no. 14, pp. 2478–2486, 2009.
[23] Y. Y. Yao and X. F. Deng, "A granular computing paradigm for concept learning," in Emerging Paradigms in Machine Learning, vol. 13, pp. 307–326, Springer, Berlin, Germany, 2013.
[24] J. Yao, "Recent developments in granular computing: a bibliometrics study," in Proceedings of the IEEE International Conference on Granular Computing (GRC '08), pp. 74–79, August 2008.
[25] F. Aftrati, G. Das, A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, "Mining chains of relations," in Data Mining: Foundations and Intelligent Paradigms, vol. 24, pp. 217–246, Springer, 2012.
[26] L. Dehaspe and H. Toivonen, "Discovery of frequent datalog patterns," Expert Systems with Applications, vol. 3, no. 1, pp. 7–36, 1999.
[27] B. Goethals, W. Le Page, and H. Mannila, "Mining association rules of simple conjunctive queries," in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 96–107, April 2008.
[28] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.
[29] T. J. Li and W. X. Zhang, "Rough fuzzy approximations on two universes of discourse," Information Sciences, vol. 178, no. 3, pp. 892–906, 2008.
[30] G. Liu, "Rough set theory based on two universal sets and its applications," Knowledge-Based Systems, vol. 23, no. 2, pp. 110–115, 2010.
[31] S. Wong, L. Wang, and Y. Y. Yao, "Interval structure: a framework for representing uncertain information," in Uncertainty in Artificial Intelligence, pp. 336–343, 1993.
[32] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[33] A. Skowron and J. Stepaniuk, "Approximation of relations," in Proceedings of Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, Ed., pp. 161–166, 1994.
[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Boston, Mass, USA, 1991.
[35] S. K. M. Wong, L. S. Wang, and Y. Y. Yao, "On modelling uncertainty with interval structures," Computational Intelligence, vol. 12, no. 2, pp. 406–426, 1995.
[36] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.
[37] H. X. Li, X. Z. Zhou, J. B. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Proceedings of Rough Sets and Knowledge Technology, vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.
[38] D. Liu, T. Li, and D. Ruan, "Probabilistic model criteria with decision-theoretic rough sets," Information Sciences, vol. 181, no. 17, pp. 3709–3722, 2011.
[39] X. Y. Jia, W. H. Liao, Z. M. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.
[40] D. Liu, Y. Yao, and T. Li, "Three-way investment decisions with decision-theoretic rough sets," International Journal of Computational Intelligence Systems, vol. 4, no. 1, pp. 66–74, 2011.
[41] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.
[42] Z. T. Gong and B. Z. Sun, "Probability rough sets model between different universes and its applications," in Proceedings of the 7th International Conference on Machine Learning and Cybernetics (ICMLC '08), pp. 561–565, July 2008.
[43] W. Ma and B. Sun, "Probabilistic rough set over two universes and rough entropy," International Journal of Approximate Reasoning, vol. 53, no. 4, pp. 608–619, 2012.
[44] B. Sun, W. Ma, H. Zhao, and X. Wang, "Probabilistic fuzzy rough set model over two universes," in Rough Sets and Current Trends in Computing, pp. 83–93, Springer, 2012.
[45] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logic," Intelligent Automation and Soft Computing, vol. 2, pp. 103–120, 1996.
[46] X. He, F. Min, and W. Zhu, "A comparative study of discretization approaches for granular association rule mining," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 725–729, 2013.
[47] F. Min, Q. Liu, and C. Fang, "Rough sets approach to symbolic value partition," International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 689–700, 2008.
[48] F. Min and W. Zhu, "Granular association rules for multi-valued data," in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 799–803, 2013.


