[ieee 2012 third international conference on emerging applications of information technology (eait)...


[IEEE 2012 Third International Conference on Emerging Applications of Information Technology (EAIT), Kolkata, West Bengal, India (2012.11.30-2012.12.1)]

978-1-4673-1827-3/12/$31.00 ©2012 IEEE

Predicting Annotated HIV-1–Human PPIs using a Biclustering Approach to Association Rule Mining

Sumanta Ray∗, Anirban Mukhopadhyay∗, Ujjawal Maulik†

∗Department of Computer Science and Engineering, University of Kalyani, Kalyani-741235, West Bengal, India

Email: sumanta [email protected], [email protected]

†Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India

Email: [email protected]

Abstract: Discovering novel interactions between HIV-1 and human proteins would greatly contribute to HIV research. Identification of such interactions leads to greater insight into drug target prediction. Here we propose an association rule mining technique based on biclustering for identifying a set of rules among the human proteins as well as the HIV-1 proteins, and we use those rules to predict some novel interactions. For prediction, both the interaction types and the direction of regulation of the interactions are considered, to provide accessible insight into HIV-1 infection. We study the biclusters and analyze the significant GO terms and pathways in which the human proteins of the biclusters participate. The predicted rules are further analyzed to discover regulatory relationships between some human proteins in the course of HIV-1 infection. Experimental evidence is collected from recent literature to validate the predicted interactions.

I. INTRODUCTION

Human immunodeficiency virus-1 (HIV-1), the cause of acquired immunodeficiency syndrome (AIDS), relies on human host cell proteins in virtually every aspect of its life cycle. Computational approaches for predicting protein-protein interactions (PPIs) between different organisms ("inter-species prediction"), and more specifically between a virus and its host proteins, have therefore become very important. Such predictions help in the development of new therapeutic approaches and in the design of drugs for these viral diseases. Recently, several computational approaches have been proposed to predict and analyze novel interactions between HIV-1 and human proteins.

In [1] a random forest classifier integrated with a semi-supervised approach is used for selecting positive interactions in predicting new HIV-1-human PPIs. A structural similarity based approach for predicting HIV-1-human PPIs is proposed in [2]. Recently a biclustering [3] technique (a clustering [4] technique that is performed on both dimensions of the dataset) has been used to identify significant host-cellular subsystems in [5]. A similar biclustering approach is studied in [6] to find immunodeficiency gateway proteins and their involvement in microRNA regulation. In another study [7], an association rule mining approach is proposed for finding a set of association rules from PPI data, and these rules are used for predicting new interactions. In both studies [6], [7], the interaction types and the regulation direction are not considered in finding the bicliques.

With this observation we use an association rule mining approach for finding a set of rules considering both the interaction types and the direction of regulation. We use the Binary inclusion-Maximal (BiMax) biclustering algorithm [8] for identifying maximal biclusters from the input binary matrix.

II. METHODS

In this section we describe the proposed approach.

A. Preparation of the HIV-1-human PPI Bipartite Network

The HIV-1-human PPI dataset published in [9] consists of a total of 5127 interactions between 19 HIV-1 proteins and 1432 human proteins. Each interaction has an associated interaction type. We broadly divide the interaction types into three classes: regulating (direction is from viral to host proteins), undirected (no direction) and regulated by (direction is from host to viral proteins). For example, 'activate' belongs to class 1 (regulating), 'requires/associates with' is regarded as an undirected interaction type (class 2), and 'downregulated by' is in class 3 (regulated by). We find 69 unique interaction types (34 in class 1, 26 in class 2 and the remaining 9 in class 3). We filter the interactions by annotating each human protein with its corresponding interaction type, considering only two classes of interactions (regulating and undirected), and obtain 2564 annotated human proteins. We consider only the forward direction (from viral to host) and undirected interactions, as these are important for getting valuable information about the regulation mechanism of human proteins. We construct a binary matrix of human and viral proteins, of size 2564 × 19, in which an entry of '1' denotes the presence of a 'regulating' interaction between the corresponding pair of human and HIV-1 proteins, and an entry of '0' represents the absence of any information regarding the interaction of the corresponding human and viral proteins. An entry of 'X' represents the presence of an 'undirected' interaction between the corresponding pair of human and HIV-1 proteins.
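The matrix construction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `TYPE_CLASS` mapping covers only a few interaction types and is hypothetical (the full assignment of the 69 types to classes is not listed in the text), and the function name `build_matrix`, the label format `PROTEIN_type` and the toy triples are illustrative.

```python
# Hypothetical class assignment for a handful of interaction types.
# The text sorts 69 types into three classes; only examples are given there.
TYPE_CLASS = {
    "activates": "regulating",
    "upregulates": "regulating",
    "downregulates": "regulating",
    "associates with": "undirected",
    "downregulated by": "regulated_by",
}


def build_matrix(interactions):
    """Build the annotated human-protein x viral-protein matrix.

    interactions: iterable of (viral_protein, interaction_type, human_protein).
    Rows are annotated human proteins (e.g. "IL6_upregulates"), columns are
    viral proteins. Entries are '1' (regulating), 'X' (undirected) or '0'.
    """
    cells = {}
    for viral, itype, human in interactions:
        cls = TYPE_CLASS.get(itype)
        if cls == "regulating":
            mark = "1"
        elif cls == "undirected":
            mark = "X"
        else:
            continue  # host-to-virus (class 3) and unknown types are dropped
        label = f"{human}_{itype.replace(' ', '_')}"
        cells[(label, viral)] = mark
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    matrix = [[cells.get((r, c), "0") for c in cols] for r in rows]
    return rows, cols, matrix


# Toy input (illustrative triples, not taken from the database).
toy = [
    ("Tat", "upregulates", "IL6"),
    ("Nef", "upregulates", "IL6"),
    ("Tat", "associates with", "CD4"),
    ("Tat", "downregulated by", "XYZ"),  # class 3, filtered out
]
rows, cols, matrix = build_matrix(toy)
```

Because the interaction type is part of the row label, the same human protein can appear in several rows, which is consistent with obtaining more annotated proteins (2564) than distinct human proteins (1432).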

B. Finding Association Rules

In data mining, association rule mining (ARM) is a popular and well-researched method for discovering interesting relations between variables and showing attribute-value associations that occur frequently in large databases. The problem of association rule mining is defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ items and $X$ be an itemset where $X \subset I$. Let $T = \{(t_1, X_1), (t_2, X_2), \ldots, (t_m, X_m)\}$ be a set of $m$ transactions, where $t_i$ and $X_i$, $i = 1, 2, \ldots, m$, are the transaction identifier and the associated itemset respectively. The support of an itemset $X$ is the number of transactions in which all the items of $X$ appear. An itemset is called frequent if its support is greater than some threshold min_sup. The confidence of an association rule (AR) of the form $P \Rightarrow Q$, with $P \cap Q = \emptyset$ and $P \cup Q = X$, obtained from an itemset $X$ is defined as the ratio of the support of $X$ to the support of $P$. Formally, the ARM problem can be defined as follows: find the set of all rules $R$ of the form $P \Rightarrow Q$ such that $P \cup Q$ is a frequent itemset and the confidence of $P \Rightarrow Q$ is greater than a threshold min_conf. The concept of a frequent closed itemset [10], which gives a condensed representation of all frequent itemsets, is defined to avoid redundancy. An itemset is called a closed itemset if none of its proper supersets has the same support value. Finding the set of frequent itemsets is equivalent to finding the set of all-1 biclusters each having at least min_sup rows [7]. BiMax generates all maximal biclusters, and since the columns of a maximal bicluster represent a closed itemset, all extracted biclusters satisfying the min_sup condition provide the set of frequent closed itemsets.
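The definitions of support, confidence and closedness can be made concrete with a short Python sketch. The function names and the toy transactions are hypothetical and serve only to illustrate the definitions. Checking single-item extensions is enough for closedness, because support cannot increase when items are added to an itemset.

```python
def support(itemset, transactions):
    """Number of transactions whose itemset contains every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for tx_items in transactions.values() if items <= tx_items)


def confidence(antecedent, consequent, transactions):
    """supp(P ∪ Q) / supp(P) for the rule P => Q, with P and Q disjoint."""
    p, q = frozenset(antecedent), frozenset(consequent)
    if p & q:
        raise ValueError("antecedent and consequent must be disjoint")
    supp_p = support(p, transactions)
    if supp_p == 0:
        return 0.0
    return support(p | q, transactions) / supp_p


def is_closed(itemset, transactions):
    """True if no proper superset of `itemset` has the same support."""
    items = frozenset(itemset)
    base = support(items, transactions)
    all_items = frozenset().union(*transactions.values())
    # Adding any single further item must strictly lower the support.
    return all(
        support(items | {extra}, transactions) < base
        for extra in all_items - items
    )


# Toy transactions: identifier -> itemset (illustrative only).
transactions = {
    "t1": frozenset({"a", "b", "c"}),
    "t2": frozenset({"a", "b"}),
    "t3": frozenset({"a", "c"}),
    "t4": frozenset({"b"}),
}
supp_a = support({"a"}, transactions)            # transactions t1, t2, t3
conf_a_b = confidence({"a"}, {"b"}, transactions)  # supp({a, b}) / supp({a})
```

In this toy data, `{"a"}` is closed, whereas `{"c"}` is not, because every transaction containing `c` also contains `a`.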

Here the rows of the binary matrix represent the viral proteins and the columns represent the annotated human proteins. Each row (viral protein) is considered as a transaction and each column (human protein) represents an item. An item is purchased by a transaction if the corresponding value in the matrix is '1' or 'X', which is interpreted as follows: a viral protein is associated with some of the human proteins through a specific type of interaction. A maximal all-1 bicluster with a given min_sup value is then equivalent to a frequent closed itemset. The BiMax algorithm is utilized for finding the maximal biclusters, and these biclusters are treated as frequent closed itemsets for finding the association rules.
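To illustrate the equivalence between maximal all-1 biclusters and frequent closed itemsets, the sketch below enumerates closed itemsets on a toy example. It is not BiMax: it is a brute-force substitute that closes the transaction itemsets under intersection, which is practical only when the number of transactions is small (as with the 19 viral proteins here). The function name, the parameter `min_items` and the toy data are assumptions made for illustration.

```python
def closed_frequent_itemsets(transactions, min_sup, min_items=1):
    """Return {closed itemset: supporting transaction ids}.

    Each returned pair corresponds to a maximal all-1 bicluster whose rows
    are the supporting transactions and whose columns are the items.
    """
    # Every closed itemset is the intersection of some set of transactions,
    # so closing the transaction itemsets under intersection finds them all.
    closed = set()
    for items in transactions.values():
        items = frozenset(items)
        closed |= {items & other for other in closed} | {items}

    result = {}
    for items in closed:
        if len(items) < min_items:
            continue
        rows = frozenset(t for t, tx in transactions.items() if items <= tx)
        if len(rows) >= min_sup:
            result[items] = rows
    return result


# Toy transactions: viral protein -> annotated human proteins (illustrative).
toy = {
    "V1": {"HP1_activates", "HP2_upregulates", "HP3_downregulates"},
    "V2": {"HP1_activates", "HP2_upregulates", "HP3_downregulates"},
    "V3": {"HP1_activates", "HP2_upregulates"},
    "V4": {"HP1_activates"},
}
biclusters = closed_frequent_itemsets(toy, min_sup=2, min_items=2)
```

With `min_sup=2` and `min_items=2`, the toy data yields two biclusters: all three annotated proteins with V1 and V2, and the first two annotated proteins with V1, V2 and V3.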

Here the rules may be of type:

[HP1_upregulates, HP2_activates] ⇒ [{HP3, HP4}_activates, HP5_downregulates]

This may be interpreted as follows: if the human protein HP1 is upregulated and HP2 is activated by some set of viral proteins, then there is a high chance that the two proteins HP3 and HP4 are activated and the protein HP5 is downregulated by the same set of viral proteins.

C. Predicting New Interactions

From the extracted association rules we predict some novel interactions, together with their interaction types, between HIV-1 and human proteins. Consider a frequent closed itemset consisting of the annotated human proteins HP1_f1, ..., HP5_f5, where each fi denotes the interaction type tagged to the corresponding human protein. Suppose a rule constructed from those proteins is as follows:

[HP1_f1, HP2_f2, HP3_f3] ⇒ [HP4_f4, HP5_f5]

In this scenario we further assume that the proteins HP1_f1, ..., HP5_f5 form a biclique with 3 viral proteins V1, V2 and V3 (in other words, the support count of this frequent itemset is 3), as shown in Figure 1. Now, without loss of generality, suppose the proteins in the antecedent of the rule form another biclique with 4 viral proteins: V1, V2, V3 and V4. The confidence of this rule is therefore 3/4, or 75%. From this observation we can predict that viral protein V4 is also likely to interact with HP4_f4 and HP5_f5, and the confidence of this prediction is 75%. Figure 1 describes the whole scenario.

Fig. 1. An example of prediction process from the association rules.
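The prediction step in this example can be reproduced with a few lines of Python. This is only a sketch of the scenario described above; the function name `predict_interactions` and the dictionary layout are assumptions, and the labels HP1_f1, ..., HP5_f5 and V1, ..., V4 are the placeholders used in the text.

```python
def predict_interactions(antecedent, consequent, transactions):
    """Apply the rule P => Q and return (confidence, predicted pairs).

    A pair (viral_protein, item) is predicted when the viral protein is
    linked to every item of P but not yet to that item of Q.
    """
    p, q = frozenset(antecedent), frozenset(consequent)
    covers_p = [t for t, items in transactions.items() if p <= items]
    if not covers_p:
        return 0.0, []
    covers_all = [t for t in covers_p if q <= transactions[t]]
    conf = len(covers_all) / len(covers_p)
    predicted = [
        (t, item)
        for t in covers_p
        for item in sorted(q - transactions[t])
    ]
    return conf, predicted


# The scenario of the example: V1..V3 are linked to all five annotated
# proteins, V4 only to the three proteins of the antecedent.
all_five = {"HP1_f1", "HP2_f2", "HP3_f3", "HP4_f4", "HP5_f5"}
transactions = {
    "V1": set(all_five),
    "V2": set(all_five),
    "V3": set(all_five),
    "V4": {"HP1_f1", "HP2_f2", "HP3_f3"},
}
conf, predicted = predict_interactions(
    {"HP1_f1", "HP2_f2", "HP3_f3"}, {"HP4_f4", "HP5_f5"}, transactions
)
```

Here `conf` is 3/4, and the two predicted pairs are V4 with HP4_f4 and V4 with HP5_f5, matching the example.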

III. RESULTS

In this section we analyze the predicted biclusters (bicliques) and study the biological relevance of the human proteins constituting those bicliques. After that we show the association rules that are generated from those biclusters. We also show some novel predicted interactions and collect evidence from recent literature that strengthens our predictions.

A. Analysis of Obtained Bicliques and Predicted Rules

We find 17 biclusters in our constructed binary matrix, keeping the minimum number of viral proteins (the min_sup value) at 4 and the minimum number of human proteins (the minimum number of items) at 2. These are shown in Table I. Columns 4, 5 and 6 give the most significant GO term, the GO id and the corresponding p-value for three broadly classified GO categories: biological process, cellular component and molecular function, respectively. We also find significant KEGG pathways for the human proteins participating in each bicluster. In Table I the first biclique consists of 12 human proteins that belong to the T cell receptor signaling pathway, which plays a key role in the human immune system. Human proteins in bicliques 3 and 4 also belong to the same signaling pathway. Human proteins in biclique 5 are affected by the two glycoproteins GP120 and GP160, the transactivating protein (Tat) and the accessory protein Vpr of HIV-1, which may lead to Colorectal cancer. The human proteins in biclique 6 interact with 4 HIV-1 proteins (2 envelope proteins, Nef and Tat) and are involved in the Cytokine-cytokine receptor interaction pathway. Human proteins in some bicliques are involved in Graft-versus-host disease, a lethal complication of allogeneic hematopoietic stem cell transplantation (HSCT) in which immunocompetent donor T cells attack the genetically disparate host cells. The proteins in bicliques 11, 14 and 17 are involved in the Amyotrophic lateral sclerosis (ALS) pathway; ALS is a progressive, lethal, degenerative disorder of motor neurons. It is established that HIV causes diverse disorders of the brain, spinal cord and peripheral nerves and could be a risk factor for either amyotrophic lateral sclerosis (ALS) itself or other motor neuron diseases [11].

TABLE I
THE SIGNIFICANT GO TERMS AND ID AND KEGG PATHWAYS FOUND IN THE BICLIQUES

| Biclique | HIV protein | Human protein | GO term (bp) | GO term (cc) | GO term (mf) | KEGG pathway |
|---|---|---|---|---|---|---|
| 1 | Tat, Vpr, env gp120, matrix | BCL2, CASP3, TP53, IFNG, IFNG, IL10, IL2, IL6, MAPK1, NFKB1, PARP1, FOS, JUN, TNF | regulation of apoptosis (GO:0042981) (3.1E-11) | nucleoplasm (GO:0005654) (6.8E-5) | promoter binding (GO:0010843) (1.7E-5) | T cell receptor signaling pathway (1.2E-9) |
| 2 | Nef, Tat, env gp120, env gp160 | BCL2, ICAM1, IFNG, IL1B, IL2, IL6, MAPK1, MAPK3, FOS, JUN | positive regulation of nitrogen compound metabolic process (GO:0051173) (1.8E-8) | extracellular space (GO:0005615) (8.3E-4) | cytokine activity (GO:0005125) (2.6E-4) | Toll-like receptor signaling pathway (3.3E-7) |
| 3 | Nef, Vpr, env gp120, env gp160 | CD4, BCL2, IFNG, IL2, IL6, MAPK14, MAPK1, FOS, JUN | positive regulation of macromolecule metabolic process (GO:0010604) (2.5E-10) | nucleoplasm (GO:0005654) (1.4E-2) | protein dimerization activity (GO:0046983) (3.5E-3) | T cell receptor signaling pathway (2.2E-9) |
| 4 | Nef, Tat, Vpr, env gp120, env gp160 | BCL2, IFNG, IL2, IL6, MAPK1, FOS, JUN | positive regulation of macromolecule metabolic process (GO:0010604) (6.4E-8) | extracellular space (GO:0005615) (3.7E-2) | cytokine activity (GO:0005125) (3.2E-3) | T cell receptor signaling pathway (2.8E-6) |
| 5 | Tat, Vpr, env gp120, env gp160 | BCL2, CYCS, IFNG, IL2, IL6, MAPK1, FOS, JUN | regulation of apoptosis (GO:0042981) (2.9E-7) | protein phosphatase type 2A complex (GO:0000159) (1.1E-2) | cytokine activity (GO:0005125) (4.5E-3) | Colorectal cancer (2.3E-6) |
| 6 | Nef, Tat, env gp120, env gp41 | CCL5, IFNG, IL1B, IL10, IL2, IL2RA, IL6, TNF | leukocyte migration (GO:0050900) (2.3E-11) | extracellular space (GO:0005615) (1.6E-7) | cytokine activity (GO:0005125) (7.4E-11) | Cytokine-cytokine receptor interaction (8.9E-10) |
| 7 | Nef, Tat, Vpr, env gp120, env gp41 | IFNG, IL10, IL2, IL6, TNF | regulation of immunoglobulin production (GO:0002637) (1.1E-11) | extracellular space (GO:0005615) (8.2E-6) | cytokine activity (GO:0005125) (4.9E-8) | Allograft rejection (1.3E-6) |
| 8 | Tat, env gp120, env gp160, env gp41 | IL1A, IL1B, IL2, IL6, LCK | positive regulation of protein transport (GO:0051222) (4.6E-7) | extracellular space (GO:0005615) (5.9E-4) | cytokine activity (GO:0005125) (1.3E-5) | Graft-versus-host disease (1.7E-6) |
| 9 | Nef, Tat, env gp120, matrix | CCL3, IFNG, IL6, TNF | positive regulation of protein amino acid phosphorylation (GO:0001934) (1.8E-9) | extracellular space (GO:0005615) (8.2E-6) | cytokine activity (GO:0005125) (4.9E-8) | Allograft rejection (1.3E-6) |
| 10 | Nef, Tat, Vpr, env gp120, env gp41, matrix | IFNG, IL6, TNF | regulation of chemokine biosynthetic process (GO:0045073) (4.9E-7) | extracellular space (GO:0005615) (2.9E-3) | cytokine activity (GO:0005125) (2.2E-4) | Graft-versus-host disease (5.7E-5) |
| 11 | Tat, Vpr, env gp120, retropepsin | BCL2, CASP3, CYCS, PARP1 | B cell homeostasis (GO:0001782) (2.4E-3) | protein phosphatase type 2A complex (GO:0000159) (4.7E-3) | not found | Amyotrophic lateral sclerosis (ALS) (3.2E-4) |
| 12 | Tat, Vpr, env gp120, matrix | CCL3, IFNG, IL6, TNF | regulation of chemokine biosynthetic process (GO:0045073) (1.5E-6) | extracellular space (GO:0005615) (1.5E-4) | cytokine activity (GO:0005125) (3.3E-6) | Cytokine-cytokine receptor interaction (1.4E-4) |
| 13 | Nef, env gp120, env gp160, env gp41 | CD4, IL1B, IL2, IL6 | positive regulation of T cell activation (GO:0050870) (1.7E-7) | extracellular space (GO:0005615) (8.3E-3) | growth factor activity (GO:0008083) (4.5E-4) | Graft-versus-host disease (1.7E-4) |
| 14 | Nef, Tat, Vpr, env gp120, retropepsin | BCL2, CASP3, PARP1 | B cell homeostasis (GO:0050870) (1.7E-7) | nuclear envelope (GO:0005635) (3.2E-2) | transcription factor binding (GO:0008134) (7.7E-2) | Amyotrophic lateral sclerosis (ALS) (2.1E-2) |
| 15 | Nef, Tat, Vpr, env gp120, env gp160, env gp41 | IL2, IL6 | positive regulation of immunoglobulin secretion (GO:0051024) (3.7E-4) | extracellular space (GO:0005615) (5.4E-2) | growth factor activity (GO:0008083) (1.2E-2) | Graft-versus-host disease (7.7E-3) |
| 16 | Nef, Vpr, Vpu, env gp120 | CD4, CASP3, NFKB1 | regulation of T cell activation (GO:0050863) (1.7E-2) | intracellular organelle lumen (GO:0070013) (1.9E-2) | protein homodimerization activity (GO:0042803) (5.1E-2) | Epithelial cell signaling in Helicobacter pylori infection (2.7E-2) |
| 17 | Tat, Vpr, env gp120, env gp160, retropepsin | BCL2, CYCS | positive regulation of catalytic activity (GO:0043085) (3.8E-2) | protein phosphatase type 2A complex (GO:0000159) (1.6E-3) | not found | Amyotrophic lateral sclerosis (ALS) (1.0E-2) |

We predict a total of 46 rules from the biclusters and filter out those that have a confidence level of less than 80%. Figure 2 shows the rules. All the rules are important for getting valuable information about the regulation mechanism of human proteins. A proper analysis of these rules reveals the interdependence of the regulation mechanisms of the proteins constituting a rule.
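The confidence filter can be sketched as follows. The text does not state how the antecedent and consequent of each rule are chosen from a bicluster, so this sketch assumes, purely for illustration, that every split of a closed itemset into a non-empty antecedent P and a non-empty consequent Q is tried, and that a rule is kept when its confidence is at least `min_conf` (0.8, matching the 80% level above). The function name and the toy data are hypothetical.

```python
from itertools import combinations


def rules_from_itemset(itemset, transactions, min_conf=0.8):
    """Return rules (P, Q, confidence) from one itemset with conf >= min_conf."""
    items = sorted(itemset)
    supp_x = sum(1 for tx in transactions.values() if set(items) <= tx)
    if supp_x == 0:
        return []  # no transaction contains the itemset, so no rule applies
    rules = []
    for k in range(1, len(items)):
        for p in combinations(items, k):
            # supp(P) >= supp(X) > 0, so the division is safe.
            supp_p = sum(1 for tx in transactions.values() if set(p) <= tx)
            conf = supp_x / supp_p
            if conf >= min_conf:
                q = tuple(i for i in items if i not in p)
                rules.append((p, q, conf))
    return rules


# Toy transactions: viral protein -> annotated human proteins (illustrative).
toy = {
    "V1": {"a", "b", "c"},
    "V2": {"a", "b", "c"},
    "V3": {"a", "b"},
    "V4": {"a"},
}
rules = rules_from_itemset({"a", "b", "c"}, toy, min_conf=0.8)
```

In the toy data only the three rules whose antecedent contains `c` survive the filter, each with confidence 1.0; rules such as `a ⇒ b, c` (confidence 2/4) are discarded.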

B. Predicted Interactions

From the biclusters of the binary matrix we predict some highly confident interactions between HIV-1 and human proteins. We also analyze the biological relevance of those interactions and conduct a literature survey to establish experimental evidence supporting our predicted interactions. To find this evidence we extensively search PUBMED for recent reports describing the predicted interactions. Among the 46 predicted interactions between HIV-1 and human proteins, 26 are found to be experimentally validated, and these are shown in Table II with the corresponding PUBMED ids.

Here we predict 9 human proteins that interact with the HIV-1 protein Tat with specific interaction types. In row 2 of Table II we predict the downregulation of the human protein Interleukin 2 (IL2) by the HIV-1 protein Tat. Tat induces IL2 secretion, and this is due to Tat-enhanced IL-2 promoter activation. In [12] the cause of enhanced IL-2 secretion is investigated and it is found that HIV Tat induces this effect. We also predict that Tat activates caspase-3 (CASP3) and caspase-9 (CASP9). In [13] it has been found that Tat activated both caspase-3 and endonuclease-G, a caspase-independent effector of apoptosis. We predict upregulation of the human protein Interleukin 6 (IL6) by the HIV-1 proteins Tat and Nef. Tat induced the production of human interleukin-6 (huIL-6) and its receptor (huIL-6Ra) and activated STAT3 signaling [13].

Our prediction also includes 10 human proteins that interact with the HIV-1 protein Nef (Nef activates 5, downregulates 3 and upregulates 2 of them). We are able to find PUBMED ids of some recent articles indexed in PUBMED that also agree with these predicted interactions.

Fig. 2. Predicted rules generated from the biclusters

Our prediction also includes other HIV-1 proteins such as Vpr, matrix, Vpu, and the envelope glycoproteins 120 and 160, which interact with some human proteins with specific interaction types. We notice that some human proteins are associated, through a specific interaction type, with more than one viral protein. For example, both the viral proteins Tat and env gp120 are responsible for downregulation of the CD4 protein. BCL2 is also downregulated by Tat as well as Vpu.

TABLE II
PREDICTED INTERACTIONS FOUND FROM BICLUSTERS

| Sl. No. | HIV-1 Protein | Human Protein | Interaction Types | Pubmed Id |
|---|---|---|---|---|
| 1 | Tat | CD4 | DOWNREGULATES | 22421574, 22342181 |
| 2 | Tat | IL2 | DOWNREGULATES | 20728522, 11385624 |
| 3 | Tat | MAPK14 | ACTIVATES | 20378550 |
| 4 | Tat | CASP9 | ACTIVATES | 11509621 |
| 5 | Tat | CASP3 | ACTIVATES | 17505978 |
| 6 | Tat | IL6 | UPREGULATES | 17151125, 9169458 |
| 7 | Tat | CD4 | INTERACTS WITH | 12457987 |
| 8 | Tat | PARP1 | INDUCES CLEAVAGE OF | 15498776 |
| 9 | Tat | BCL2 | DOWNREGULATES | 11994280 |
| 10 | Nef | JUN | ACTIVATES | 12419805 |
| 11 | Nef | FOS | ACTIVATES | 20068037, 10388555 |
| 12 | Nef | MAPK1 | ACTIVATES | 21738584 |
| 13 | Nef | LCK | ACTIVATES | 16849330 |
| 14 | Nef | CASP3 | ACTIVATES | 11123279 |
| 15 | Nef | IFNG | DOWNREGULATES | 21858117 |
| 16 | Nef | BCL2 | DOWNREGULATES | 15858021 |
| 17 | Nef | CCL3 | DOWNREGULATES | 20015995 |
| 18 | Nef | IL12B | UPREGULATES | 19019824 |
| 19 | Nef | IL6 | UPREGULATES | 11519483, 8799208 |
| 20 | matrix | IL10 | UPREGULATES | 18178611 |
| 21 | matrix | IL1B | UPREGULATES | 18593760 |
| 22 | matrix | IL2 | DOWNREGULATES | 21482826 |
| 23 | env gp120 | CASP3 | ACTIVATES | 16330530 |
| 24 | env gp120 | CD4 | DOWNREGULATES | 22226668 |
| 25 | env gp160 | TNF | UPREGULATES | 8938574 |
| 26 | Vpu | BCL2 | DOWNREGULATES | 11696595 |

IV. CONCLUSIONS

Here we presented the problem of identifying novel interactions between HIV-1 and human proteins as an association rule mining problem based on the BiMax biclustering algorithm. For predicting new interactions we consider the direction of regulation and the types of the interactions as reported in the HIV-1-human interaction database. To validate the predicted interactions, evidence from recent literature is collected, which shows that many of our predicted interactions have already been observed experimentally. We also performed a gene ontology based study on the predicted bicliques and found some significant pathways. Considering the regulation direction, we predicted association rules at certain confidence levels and illustrated the general meaning of those types of rules. In this article we do not consider the direction of regulation from host to viral proteins. Considering interaction types in this direction may yield valuable information about the immune response of human proteins under HIV-1 attack. We suggest this as future work.

ACKNOWLEDGMENT

AM acknowledges the support from DST PURSE scheme. UM acknowledges the support from "Mobile Computing and Innovative Applications" under UPE - Phase II.

REFERENCES

[1] O. Tastan et al., "Semi-supervised multi-task learning for predicting interactions between HIV-1 and Human proteins," Bioinformatics, vol. 26, 2010.

[2] J. Doolittle et al., "Structural similarity-based predictions of protein interactions between HIV-1 and homo sapiens," Virology, vol. 7, 2010.

[3] U. Maulik et al., "Finding multiple coherent biclusters in microarray data using variable string length multiobjective genetic algorithm," IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 6, pp. 969–975, 2009.

[4] A. Mukhopadhyay et al., "Combining pareto-optimal clusters using supervised learning for identifying co-expressed genes," BMC Bioinformatics, vol. 10, no. 7, 2009.

[5] J. MacPherson et al., "Patterns of HIV-1 protein interaction identify perturbed host-cellular subsystems," PLoS Comput Bio, vol. 6, 2010.

[6] U. Maulik et al., "Identifying the immunodeficiency gateway proteins in humans and their involvement in microrna regulation," Mol BioSyst, vol. 7, pp. 1842–1851, 2011.

[7] A. Mukhopadhyay et al., "A novel biclustering approach to association rule mining for predicting HIV-1–Human protein interactions," PLoS ONE, vol. 7, no. 4, p. e32289, 2012.

[8] A. Prelic et al., "A systematic comparison and evaluation of biclustering methods for gene expression data," Bioinformatics, vol. 22, pp. 1122–1129, 2006.

[9] W. Fu et al., "Human immunodeficiency virus type 1, human protein interaction database at ncbi," Nucleic Acids Research (Database Issue), vol. 37, pp. D417–D422, 2009.

[10] N. Pasquier et al., "Discovering frequent closed itemsets for association rules," in Proc. 7th International Conference on Database Theory (ICDT-99), 1999, pp. 398–416.

[11] L. Rowland, "Hiv-related neuromuscular diseases: nemaline myopathy, amyotrophic lateral sclerosis and bibrachial amyotrophic diplegia," Acta Myol., pp. 29–31, 2011.

[12] A. Ehret et al., "The effect of HIV-1 regulatory proteins on cellular genes: derepression of the IL-2 promoter by Tat," Eur. J. Immunol., vol. 31, no. 6, pp. 1790–1799, Jun 2001.

[13] T. Zhao et al., "Silencing the PTEN gene is protective against neuronal death induced by human immunodeficiency virus type 1 Tat," J. Neurovirol., vol. 13, no. 2, pp. 97–106, Apr 2007.
