K. SEKAR, Ph.D.
Dr. K. Sekar
Bioinformatics Centre
Supercomputer Education and Research Centre
Indian Institute of Science
Bangalore 560 012
INDIA
E-mail: [email protected]
Voice: +91-080-3601409 or +91-080-2932469
Fax: +91-080-3600683 or +91-080-3600551
IISc
Bioinformatics Centre & Supercomputer Education and Research Centre
Approaches to developing data mining tools
APPROACHES TO DEVELOPING DATA MINING TOOLS
Abstract
Bioinformatics is one of the fastest growing interdisciplinary areas in the biological sciences and has expanded to the point where powerful tools are needed to organize and analyze the data. An overview will be presented of the general features of data mining tools and techniques and their applications.
Bioinformatics is the fashionable new name for the field previously called computational biology. The name is preferred by many because it puts the emphasis on data storage and analysis rather than on the biology, and the field really is data driven.
The term bioinformatics is used to encompass almost all computer applications in the biological sciences, but it was originally coined in the mid-1980s for the analysis of biological sequence data.
The quantity of known sequence data outweighs protein structural data and, by virtue of the genome projects, sequence databases are doubling in size every year.
A key challenge of bioinformatics is to analyze the wealth of sequence data in order to understand the amassed information in terms of protein structure, function and evolution.
Wherever possible, a range of different methods should be used, and the results should be combined with all available biological information.
The primary integrating technology that facilitates access to copious data is the World Wide Web.
Bioinformatics has provided us with a communication channel to reach and decode all this information in a comprehensive manner.
Both the large information repositories and the specialized tools to query them are held on distributed internet sites, so bioinformatics requires sound internet navigation skills.
Comprises the entire collection of information management systems, analysis tools and communication networks supporting biology
Refers to database-like activities involving persistent sets of data that are maintained in a consistent state over essentially indefinite periods of time
Encompasses the use of algorithmic tools to facilitate biological database analyses
DATA MINING
Data mining is defined as “exploration and analysis by automatic and semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules”.
The central challenge is to derive maximum results from the wealth of data. This can be achieved by establishing and maintaining databases and by providing search and analysis tools to interpret the data.
DATABASE
A database is a collection of quantitative data resulting from experimental measurements or observations in various fields of science. Recently, interest in databases has been kindled by international efforts to organize and analyze the data and update the knowledge.
A database is essentially just a store of information. It is usually held in the form of simple files (a flat file, say). You can put information into this store or retrieve it from the store.
Derived Database
One of the greatest challenges in database research is to analyze a database in depth and create derived databases that meet users' needs without compromising the sustainability and quality of the existing database. Creating such a database is expected to dramatically reduce the workload of the user community, and the result will serve as a highly focused database.
DBREF 1UNE 1 123 SWS P00593 PA2_BOVIN 23 145
SEQADV 1UNE ASN 122 SWS P00593 LYS 144 CONFLICT
SEQRES 1 123 ALA LEU TRP GLN PHE ASN GLY MET ILE LYS CYS LYS ILE
SEQRES 2 123 PRO SER SER GLU PRO LEU LEU ASP PHE ASN ASN TYR GLY
SEQRES 3 123 CYS TYR CYS GLY LEU GLY GLY SER GLY THR PRO VAL ASP
SEQRES 4 123 ASP LEU ASP ARG CYS CYS GLN THR HIS ASP ASN CYS TYR
SEQRES 5 123 LYS GLN ALA LYS LYS LEU ASP SER CYS LYS VAL LEU VAL
SEQRES 6 123 ASP ASN PRO TYR THR ASN ASN TYR SER TYR SER CYS SER
SEQRES 7 123 ASN ASN GLU ILE THR CYS SER SER GLU ASN ASN ALA CYS
SEQRES 8 123 GLU ALA PHE ILE CYS ASN CYS ASP ARG ASN ALA ALA ILE
SEQRES 9 123 CYS PHE SER LYS VAL PRO TYR ASN LYS GLU HIS LYS ASN
SEQRES 10 123 LEU ASP LYS LYS ASN CYS
HET CA 124 1
HETNAM CA CALCIUM ION
FORMUL 2 CA CA1 2+
FORMUL 3 HOH *134(H2 O1)
HELIX 1 1 LEU 2 LYS 12 1 11
HELIX 2 2 PRO 18 ASP 21 1 4
HELIX 3 3 ASP 40 LYS 57 1 18
HELIX 4 4 ASP 59 VAL 63 1 5
HELIX 5 5 ALA 90 LYS 108 1 19
HELIX 6 6 LYS 113 HIS 115 5 3
SHEET 1 A 2 TYR 75 SER 78 0
SHEET 2 A 2 GLU 81 CYS 84 -1 N THR 83 O SER 76
SSBOND 1 CYS 11 CYS 77
SSBOND 2 CYS 27 CYS 123
SSBOND 3 CYS 29 CYS 45
SSBOND 4 CYS 44 CYS 105
SSBOND 5 CYS 51 CYS 98
SSBOND 6 CYS 61 CYS 91
SSBOND 7 CYS 84 CYS 96
LINK CA CA 124 O TYR 28
LINK CA CA 124 O GLY 32
CRYST1 47.120 64.590 38.140 90.00 90.00 90.00 P 21 21 21 4
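As a rough illustration of how a derived database might be collated from records like these, the sketch below pulls only the disulfide-bond (SSBOND) pairs out of a mixed set of lines. It is a minimal sketch, not the author's tool: the function name `disulfide_pairs` is hypothetical, and it assumes whitespace-separated records with no chain identifier, as they appear in this transcript (real PDB files use fixed column positions).

```python
# A few records copied from the 1UNE listing; only SSBOND lines are of interest.
records = [
    "HELIX 1 1 LEU 2 LYS 12 1 11",
    "SSBOND 1 CYS 11 CYS 77",
    "SSBOND 2 CYS 27 CYS 123",
    "SSBOND 3 CYS 29 CYS 45",
    "LINK CA CA 124 O TYR 28",
]


def disulfide_pairs(lines):
    """Return (residue_number_1, residue_number_2) for every SSBOND record.

    Assumed field layout (whitespace separated, no chain ID):
    SSBOND serial resname1 resnum1 resname2 resnum2
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "SSBOND":
            pairs.append((int(fields[3]), int(fields[5])))
    return pairs


pairs = disulfide_pairs(records)
print(pairs)
```

The list of pairs is itself a small, focused dataset derived from the full entry, which is the idea behind a derived database.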
SUB-DERIVED DATABASE
EXAMPLE-1
XXXXXSEKAR
RADHASEKAR SHAMIASEKAR SARADASEKAR
EXAMPLE-2
XAXAXA
KAMALA SARADA YAMAHA KANAGA MANASA VANASA PANAMA
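The two examples can be read as pattern searches, and a short sketch with regular expressions shows the idea. The interpretation is an assumption: in Example-1 the run of X's is taken to mean "any non-empty prefix" before SEKAR (the listed names have prefixes of different lengths), and in Example-2 each X is taken as any single letter with A fixed at every second position. The helper `select` and the extra non-matching entries are hypothetical.

```python
import re

# Names from the two examples plus two entries that should match neither pattern.
names = [
    "RADHASEKAR", "SHAMIASEKAR", "SARADASEKAR",
    "KAMALA", "SARADA", "YAMAHA", "KANAGA", "MANASA", "VANASA", "PANAMA",
    "SEKAR", "INDIA",
]

# Example-1 (assumed meaning): one or more letters followed by SEKAR.
suffix_pattern = re.compile(r"[A-Z]+SEKAR")
# Example-2 (assumed meaning): six letters with A in positions 2, 4 and 6.
alternating_pattern = re.compile(r"[A-Z]A[A-Z]A[A-Z]A")


def select(pattern, entries):
    """Keep only the entries that match the pattern in full."""
    return [entry for entry in entries if pattern.fullmatch(entry)]


sekar_subset = select(suffix_pattern, names)
xaxaxa_subset = select(alternating_pattern, names)
print(sekar_subset)
print(xaxaxa_subset)
```

Each subset is a sub-derived collection: a smaller set selected from the parent list by a single pattern.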
Adding information to the database
Software to collate the required information from the database
Analyze the collated information
WHY A TOOL?
The amount of information in the world is growing exponentially, and it is becoming impossible to manage the data effectively. Machine assistance is clearly necessary, but the difficulty lies in designing systems and software capable of discovering “useful” information with minimal human intervention.
PROTEIN DATA BANK (PDB)
GENOME DATABASE (GDB)
STRUCTURAL CLASSIFICATION OF PROTEINS (SCOP)
CAMBRIDGE STRUCTURAL DATABASE (CSD)
Given PDB-Id : 1une
HEADER HYDROLASE 05-NOV-97 1UNE
TITLE CARBOXYLIC ESTER HYDROLASE, 1.5 ANGSTROM ORTHORHOMBIC FORM
TITLE 2 OF THE BOVINE RECOMBINANT PLA2
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: PHOSPHOLIPASE A2;
COMPND 3 CHAIN: NULL;
COMPND 4 EC: 3.1.1.4;
COMPND 5 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: BOS TAURUS;
SOURCE 3 ORGANISM_COMMON: BOVINE;
SOURCE 4 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 5 EXPRESSION_SYSTEM_STRAIN: BL21 (DE3) PLYSS;
SOURCE 6 EXPRESSION_SYSTEM_PLASMID: PTO-A2MBL21;
SOURCE 7 EXPRESSION_SYSTEM_GENE: MATURE PLA2
KEYWDS HYDROLASE, ENZYME, CARBOXYLIC ESTER HYDROLASE
EXPDTA X-RAY DIFFRACTION
AUTHOR M.SUNDARALINGAM
REVDAT 1 06-MAY-98 1UNE 0
REMARK 1 REFERENCE 1
REMARK 1 AUTH K.SEKAR,A.KUMAR,X.LIU,M.-D.TSAI,M.H.GELB,
REMARK 1 AUTH 2 M.SUNDARALINGAM
REMARK 1 TITL CRYSTAL STRUCTURE OF THE COMPLEX OF BOVINE
REMARK 1 TITL 2 PANCREATIC PHOSPHOLIPASE A2 WITH A TRANSITION STATE
REMARK 1 TITL 3 ANALOGUE
REMARK 1 REF TO BE PUBLISHED
REMARK 1 REFN 0353
REMARK 1 REFERENCE 2
REMARK 1 AUTH K.SEKAR,C.SEKARUDU,M.-D.TSAI,M.SUNDARALINGAM
REMARK 1 TITL 1.72A RESOLUTION REFINEMENT OF THE TRIGONAL FORM OF
REMARK 1 TITL 2 BOVINE PANCREATIC PHOSPHOLIPASE A2
REMARK 1 REF TO BE PUBLISHED
REMARK 1 REFN 0353
REMARK 1 REFERENCE 3
REMARK 1 AUTH K.SEKAR,S.ESWARAMOORTHY,M.K.JAIN,M.SUNDARALINGAM
REMARK 1 TITL CRYSTAL STRUCTURE OF THE COMPLEX OF BOVINE
REMARK 1 TITL 2 PANCREATIC PHOSPHOLIPASE A2 WITH THE INHIBITOR
REMARK 1 TITL 3 1-HEXADECYL-3-(TRIFLUOROETHYL)-SN-GLYCERO-2-
REMARK 1 TITL 4 PHOSPHOMETHANOL
REMARK 1 REF BIOCHEMISTRY V. 36 14186 1997
REMARK 2 RESOLUTION. 1.5 ANGSTROMS.
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : X-PLOR 3.1
REMARK 3 AUTHORS : BRUNGER
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.5
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 10.0
REMARK 3 DATA CUTOFF (SIGMA(F)) : 1.0
REMARK 3 DATA CUTOFF HIGH (ABS(F)) : 0.1
REMARK 3 DATA CUTOFF LOW (ABS(F)) : 1000000.0
REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 92.
REMARK 3 NUMBER OF REFLECTIONS : 17572
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 CROSS-VALIDATION METHOD : NULL
REMARK 3 FREE R VALUE TEST SET SELECTION : X-PLOR
REMARK 3 R VALUE (WORKING SET) : 0.184
REMARK 3 FREE R VALUE : 0.228
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 7.
REMARK 3 FREE R VALUE TEST SET COUNT : 1198
REMARK 3 ESTIMATED ERROR OF FREE R VALUE : 0.24
REMARK 3 PARAMETER FILE 1 : PARHCSDX.PRO
REMARK 3 PARAMETER FILE 2 : NULL
REMARK 3 TOPOLOGY FILE 1 : TOPHCSDX.PRO
REMARK 3 TOPOLOGY FILE 2 : NULL
REMARK 3 OTHER REFINEMENT REMARKS: NULL
REMARK 4 1UNE COMPLIES WITH FORMAT V. 2.2, 16-DEC-1996
REMARK 200
REMARK 200 EXPERIMENTAL DETAILS
REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION
REMARK 200 DATE OF DATA COLLECTION : 26-JAN-1996
REMARK 200 TEMPERATURE (KELVIN) : 291
REMARK 200 PH : 7.2
REMARK 200 NUMBER OF CRYSTALS USED : 1
REMARK 200
REMARK 200 SYNCHROTRON (Y/N) : N
REMARK 200 RADIATION SOURCE : NULL
REMARK 200 BEAMLINE : NULL
REMARK 200 X-RAY GENERATOR MODEL : R-AXIS IIC
REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M
REMARK 200 WAVELENGTH OR RANGE (A) : 1.5418
REMARK 200 MONOCHROMATOR : GRAPHITE
REMARK 200 OPTICS : NULL
REMARK 200
REMARK 200 IN THE HIGHEST RESOLUTION SHELL.
REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 1.5
REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : 1.55
REMARK 200 COMPLETENESS FOR SHELL (%) : 63.
REMARK 200 DATA REDUNDANCY IN SHELL : 3.7
REMARK 200 R MERGE FOR SHELL (I) : 0.172
REMARK 200 R SYM FOR SHELL (I) : NULL
REMARK 200 FOR SHELL : NULL
REMARK 200
REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: THE HIGH RESOLUTION
REMARK 200 ATOMIC COORDINATES OF THE WILD TYPE (PDB ENTRY 1BP2)
REMARK 200 WERE USED AS THE STARTING MODEL FOR REFINEMENT.
REMARK 200 SOFTWARE USED: X-PLOR
REMARK 200 STARTING MODEL: WILD TYPE (PDB ENTRY 1BP2)
REMARK 200
REMARK 200 REMARK: NULL
REMARK 280
REMARK 290
REMARK 290 CRYSTALLOGRAPHIC SYMMETRY
REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 21 21 21
REMARK 290
REMARK 290 SYMOP SYMMETRY
REMARK 290 NNNMMM OPERATOR
REMARK 290 1555 X,Y,Z
REMARK 290 2555 1/2-X,-Y,1/2+Z
REMARK 290 3555 -X,1/2+Y,1/2-Z
REMARK 290 4555 1/2+X,1/2-Y,-Z
DBREF 1UNE 1 123 SWS P00593 PA2_BOVIN 23 145
SEQADV 1UNE ASN 122 SWS P00593 LYS 144 CONFLICT
SEQRES 1 123 ALA LEU TRP GLN PHE ASN GLY MET ILE LYS CYS LYS ILE
SEQRES 2 123 PRO SER SER GLU PRO LEU LEU ASP PHE ASN ASN TYR GLY
SEQRES 3 123 CYS TYR CYS GLY LEU GLY GLY SER GLY THR PRO VAL ASP
SEQRES 4 123 ASP LEU ASP ARG CYS CYS GLN THR HIS ASP ASN CYS TYR
SEQRES 5 123 LYS GLN ALA LYS LYS LEU ASP SER CYS LYS VAL LEU VAL
SEQRES 6 123 ASP ASN PRO TYR THR ASN ASN TYR SER TYR SER CYS SER
SEQRES 7 123 ASN ASN GLU ILE THR CYS SER SER GLU ASN ASN ALA CYS
SEQRES 8 123 GLU ALA PHE ILE CYS ASN CYS ASP ARG ASN ALA ALA ILE
SEQRES 9 123 CYS PHE SER LYS VAL PRO TYR ASN LYS GLU HIS LYS ASN
SEQRES 10 123 LEU ASP LYS LYS ASN CYS
HET CA 124 1
HETNAM CA CALCIUM ION
FORMUL 2 CA CA1 2+
FORMUL 3 HOH *134(H2 O1)
HELIX 1 1 LEU 2 LYS 12 1 11
HELIX 2 2 PRO 18 ASP 21 1 4
HELIX 3 3 ASP 40 LYS 57 1 18
HELIX 4 4 ASP 59 VAL 63 1 5
HELIX 5 5 ALA 90 LYS 108 1 19
HELIX 6 6 LYS 113 HIS 115 5 3
SHEET 1 A 2 TYR 75 SER 78 0
SHEET 2 A 2 GLU 81 CYS 84 -1 N THR 83 O SER 76
SSBOND 1 CYS 11 CYS 77
…
REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN.
REMARK 3 TOTAL NUMBER OF BINS USED : 8
REMARK 3 BIN RESOLUTION RANGE HIGH (A) : 1.5
REMARK 3 BIN RESOLUTION RANGE LOW (A) : 1.55
REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : 63.
REMARK 3 REFLECTIONS IN BIN (WORKING SET) : 1176
REMARK 3 BIN R VALUE (WORKING SET) : 0.340
REMARK 3 BIN FREE R VALUE : 0.352
REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : 7.
REMARK 3 BIN FREE R VALUE TEST SET COUNT : 81
REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : NULL
REMARK 3
REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT.
REMARK 3 PROTEIN ATOMS : 957
REMARK 3 NUCLEIC ACID ATOMS : 0
REMARK 3 HETEROGEN ATOMS : 1
REMARK 3 SOLVENT ATOMS : 134
REMARK 3
REMARK 3 B VALUES.
REMARK 3 FROM WILSON PLOT (A**2) : NULL
REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL
REMARK 3 LOW RESOLUTION CUTOFF (A) : NULL
REMARK 3
REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR.
ATOM 1 N ALA 1 13.830 17.835 32.697 1.00 11.41
ATOM 2 CA ALA 1 12.869 16.725 32.889 1.00 11.31
ATOM 3 C ALA 1 12.106 16.547 31.592 1.00 12.00
ATOM 4 O ALA 1 12.366 17.226 30.614 1.00 11.37
ATOM 5 CB ALA 1 11.891 17.029 34.056 1.00 11.89
ATOM 6 N LEU 2 11.150 15.638 31.585 1.00 13.43
ATOM 7 CA LEU 2 10.392 15.362 30.376 1.00 14.98
ATOM 8 C LEU 2 9.556 16.543 29.879 1.00 14.65
ATOM 9 O LEU 2 9.465 16.764 28.657 1.00 13.62
ATOM 10 CB LEU 2 9.522 14.116 30.561 1.00 15.03
ATOM 11 CG LEU 2 8.919 13.539 29.291 1.00 17.13
ATOM 12 CD1 LEU 2 10.038 13.103 28.360 1.00 17.29
ATOM 13 CD2 LEU 2 8.027 12.361 29.656 1.00 17.65
ATOM 14 N TRP 3 8.960 17.305 30.796 1.00 14.18
ATOM 15 CA TRP 3 8.157 18.443 30.347 1.00 16.10
ATOM 16 C TRP 3 8.998 19.448 29.543 1.00 14.26
ATOM 17 O TRP 3 8.580 19.864 28.472 1.00 14.34
ATOM 18 CB TRP 3 7.359 19.103 31.491 1.00 19.02
ATOM 19 CG TRP 3 8.163 19.810 32.534 1.00 24.63
ATOM 20 CD1 TRP 3 8.699 19.262 33.683 1.00 25.51
ATOM 21 CD2 TRP 3 8.505 21.199 32.555 1.00 27.29
ATOM 22 NE1 TRP 3 9.348 20.230 34.403 1.00 27.56
ATOM 23 CE2 TRP 3 9.253 21.428 33.743 1.00 28.36
ATOM 24 CE3 TRP 3 8.258 22.278 31.686 1.00 27.60
ATOM 25 CZ2 TRP 3 9.754 22.695 34.083 1.00 28.94
ATOM 26 CZ3 TRP 3 8.761 23.542 32.026 1.00 28.78
ATOM 27 CH2 TRP 3 9.503 23.735 33.216 1.00 29.43
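Coordinate records such as these are the raw material for most structural analyses. The following is a minimal sketch of reading them and computing one derived quantity, the distance between the C-alpha atoms of residues 1 and 2. It assumes the whitespace-separated layout shown in this transcript (no chain identifier); real PDB files are fixed-column, so a production parser would slice by column instead. The function name `parse_atom` is hypothetical.

```python
import math

# Four ATOM records copied from the 1UNE coordinates.
atom_lines = [
    "ATOM 1 N ALA 1 13.830 17.835 32.697 1.00 11.41",
    "ATOM 2 CA ALA 1 12.869 16.725 32.889 1.00 11.31",
    "ATOM 6 N LEU 2 11.150 15.638 31.585 1.00 13.43",
    "ATOM 7 CA LEU 2 10.392 15.362 30.376 1.00 14.98",
]


def parse_atom(line):
    """Split one whitespace-separated ATOM record into typed fields."""
    _, serial, name, res_name, res_num, x, y, z, occupancy, b_factor = line.split()
    return {
        "serial": int(serial),
        "name": name,
        "res_name": res_name,
        "res_num": int(res_num),
        "xyz": (float(x), float(y), float(z)),
        "occupancy": float(occupancy),
        "b_factor": float(b_factor),
    }


atoms = [parse_atom(line) for line in atom_lines]

# Index the C-alpha coordinates by residue number.
ca_by_residue = {a["res_num"]: a["xyz"] for a in atoms if a["name"] == "CA"}

# Euclidean distance in angstroms between consecutive C-alpha atoms.
ca_distance = math.dist(ca_by_residue[1], ca_by_residue[2])
print(round(ca_distance, 2))
```

For these two residues the distance comes out close to 3.8 Å, the value expected for neighbouring C-alpha atoms, which makes it a convenient sanity check on the parsing.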
CAMBRIDGE STRUCTURAL DATABASE
• The CAMBRIDGE STRUCTURAL DATABASE
• Software for search, retrieval, display and analysis of CSD contents
The CSD records bibliographic, 2D chemical and 3D structural results from crystallographic analyses of organics, organometallics and metal complexes. Both X-ray and neutron diffraction studies are included for small and medium-sized compounds containing up to 500 atoms (including hydrogens).
THREE DBA COMPONENTS
Database Integrity
Database Security
Database Recovery
DATABASE INTEGRITY
The major issue for database management is to ensure that the data in the database is accurate, correct, valid and consistent. Any inconsistency between two or more entries that represent the same entity demonstrates a lack of integrity.
Database technology cannot do very much to protect users against data errors made in the outside world before the data has been entered in the system.
However, certain safety measures can be built into a database to ensure that errors within the system are minimized.
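One common kind of built-in safety measure is a declarative constraint that the database itself enforces. The sketch below uses SQLite through Python's standard `sqlite3` module purely as an illustration; the table layout, column names and the helper `try_insert` are assumptions, not something described in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the primary key forbids two rows for the same entry,
# and the CHECK clauses reject obviously invalid values.
conn.execute(
    """
    CREATE TABLE entry (
        pdb_id     TEXT PRIMARY KEY CHECK (length(pdb_id) = 4),
        resolution REAL NOT NULL CHECK (resolution > 0)
    )
    """
)


def try_insert(pdb_id, resolution):
    """Insert one row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO entry VALUES (?, ?)", (pdb_id, resolution))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert("1UNE", 1.5)       # valid row
duplicate = try_insert("1UNE", 1.5)   # same entity entered twice
bad_id = try_insert("1UNEX", 1.5)     # identifier has the wrong length
bad_value = try_insert("1BP2", -1.0)  # deliberately invalid resolution

row_count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
print(first, duplicate, bad_id, bad_value, row_count)
```

Constraints of this kind guard against inconsistency inside the system; they cannot detect a value that is well formed but was wrong before it was entered.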
DATA RECOVERY
The process of recovery involves restoring the database to a state which is known to be correct following some kind of failure.
The technique of redundancy is used, in the sense that it has to be possible to recover the database to its correct state from information available somewhere else in the system.
The most common way to achieve this is to dump the contents of the database at a defined frequency onto another medium, such as magnetic tape or optical disk, which is then stored in a safe place.
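The dump-and-restore cycle can be sketched with SQLite's online backup facility, exposed in Python as `Connection.backup`. This is only an illustration under stated assumptions: the in-memory `backup_copy` connection stands in for the tape or disk, and the table and its single row are hypothetical.

```python
import sqlite3

# The live database with one committed row.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE entry (pdb_id TEXT PRIMARY KEY)")
live.execute("INSERT INTO entry VALUES ('1UNE')")
live.commit()

# Periodic dump: copy the whole database to a second store.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulated failure: the live data is lost after the dump was taken.
live.execute("DELETE FROM entry")
live.commit()
lost = live.execute("SELECT COUNT(*) FROM entry").fetchone()[0]

# Recovery: rebuild a database from the redundant copy.
recovered = sqlite3.connect(":memory:")
backup_copy.backup(recovered)
restored = [row[0] for row in recovered.execute("SELECT pdb_id FROM entry")]
print(lost, restored)
```

Anything written after the last dump is not in the copy, which is why the dump frequency matters.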
DATABASE SECURITY
The DBA has to ensure that adequate measures are taken to prevent unauthorized disclosure, alteration or destruction of both the data within the database and the database software itself.
A password and a list of privileges attached to it are most commonly used to control user access rights to database information.
THREE COMPONENTS OF DATABASE
Development of a database structure that allows the storage and maintenance of the required data
Data entry, maintenance and management
Retrieval of the data by end users equipped with suitable analysis and display tools
DATABASE ADMINISTRATION
The database administrator (DBA) is a person or a group of persons responsible for overall control of the database system.
The DBA is usually answerable not only for the design of the database, but also for the choice of DBMS, its implementation, and the training of everyone involved in running and using the database.
Once the data is entered, it has to be maintained and kept up to date.
PROBLEMS WITH THE DATA
Incomplete data
Noisy data
Temporal data
An extremely large amount of data
Non-textual data
INCOMPLETE DATA
Some data may be missing (e.g., some fields may be left blank).
Sometimes, the fact that data is missing is itself a valuable piece of information.
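The 1UNE listing earlier in the talk contains several fields whose value is NULL, which makes it a handy example of incomplete data. The sketch below separates present from missing fields so that the missing ones can be reported rather than silently dropped. It is a minimal sketch: the helper `parse_fields` is hypothetical and assumes lines of the form "REMARK nnn NAME : VALUE" as they appear in this transcript.

```python
# REMARK lines copied from the 1UNE listing; two of them carry NULL.
remark_lines = [
    "REMARK 3 CROSS-VALIDATION METHOD : NULL",
    "REMARK 3 R VALUE (WORKING SET) : 0.184",
    "REMARK 200 RADIATION SOURCE : NULL",
    "REMARK 200 WAVELENGTH OR RANGE (A) : 1.5418",
    "REMARK 200",
]


def parse_fields(lines):
    """Map field name -> value, storing None where the entry says NULL."""
    fields = {}
    for line in lines:
        if " : " not in line:
            continue  # separator or free-text line, not a name/value pair
        key, value = line.split(" : ", 1)
        key = " ".join(key.split()[2:])  # drop the leading "REMARK nnn"
        value = value.strip()
        fields[key] = None if value == "NULL" else value
    return fields


fields = parse_fields(remark_lines)
missing = [name for name, value in fields.items() if value is None]
print(missing)
```

Keeping an explicit `None` preserves the information that the depositor left the field empty, instead of confusing it with a field that was never part of the record.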
NOISY DATA
A field may contain incorrectly entered information.
We do not know how this affects the certainty factor (or confidence level) of the results.
TEMPORAL DATA
Since databases grow rapidly, how can new data be incrementally added to our results?
What effect should this have on the knowledge discovery process?
AN EXTREMELY LARGE AMOUNT OF DATA
Some datasets can grow significantly over time.
How should such datasets be processed?
One option is parallel processing, in which each of n processors handles approximately 1/n of the data, ideally in approximately 1/n of the time.
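The split-process-combine idea behind parallel processing can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the data are the first three SEQRES lines of 1UNE, and a thread pool is used only to keep the example self-contained. In CPython, CPU-bound work would normally need separate processes to approach the 1/n speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Residues from the first three SEQRES records of 1UNE.
residues = (
    "ALA LEU TRP GLN PHE ASN GLY MET ILE LYS CYS LYS ILE "
    "PRO SER SER GLU PRO LEU LEU ASP PHE ASN ASN TYR GLY "
    "CYS TYR CYS GLY LEU GLY GLY SER GLY THR PRO VAL ASP"
).split()


def split_into_chunks(items, n):
    """Split items into at most n chunks of roughly equal size."""
    size = max(1, -(-len(items) // n))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_cys(chunk):
    """Work done by one worker: count cysteines in its share of the data."""
    return sum(1 for residue in chunk if residue == "CYS")


n_workers = 4
chunks = split_into_chunks(residues, n_workers)

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partial_counts = list(pool.map(count_cys, chunks))

total_cys = sum(partial_counts)  # combine the partial results
print(len(chunks), total_cys)
```

The approach works cleanly here because counting is independent per chunk; analyses that need information from neighbouring chunks require more care when combining results.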
NON-TEXTUAL DATA
There are many types of data that need to be manipulated, including image data, multimedia data (video and sound), spatial data in GIS and user-defined data types.
![Page 41: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/41.jpg)
[Figure: knowledge discovery process] Data selection → Target data → Preprocessing & transformation → “Cleaned” data → Data mining → Patterns → Interpretation, evaluation & validation → Knowledge
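The stages named in this diagram can be read as a chain of small functions, each consuming the output of the previous one. The sketch below is only an illustration under assumed data: the record layout, the organism filter, and the motif are all hypothetical.

```python
def select(records, organism):
    """Data selection: keep only the records of interest (target data)."""
    return [r for r in records if r["organism"] == organism]

def preprocess(records):
    """Preprocessing & transformation: drop empty sequences, normalise case."""
    return [
        {**r, "sequence": r["sequence"].strip().upper()}
        for r in records
        if r["sequence"].strip()
    ]

def mine(records, motif):
    """Data mining: find the records whose sequence contains the motif."""
    return [r["id"] for r in records if motif in r["sequence"]]

def evaluate(hits, total):
    """Evaluation & validation: summarise the pattern as a simple fraction."""
    return {"hits": hits, "fraction": len(hits) / total if total else 0.0}

# Hypothetical raw records.
raw = [
    {"id": "P1", "organism": "E. coli", "sequence": "mkvlGSG "},
    {"id": "P2", "organism": "E. coli", "sequence": ""},
    {"id": "P3", "organism": "H. sapiens", "sequence": "GSGAAA"},
    {"id": "P4", "organism": "E. coli", "sequence": "aaaakk"},
]
target = select(raw, "E. coli")
cleaned = preprocess(target)
patterns = mine(cleaned, "GSG")
knowledge = evaluate(patterns, len(cleaned))
```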
![Page 42: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/42.jpg)
Standalone machine application
Web Application
![Page 43: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/43.jpg)
PERL
Very powerful for string manipulation
Uses CGI as the interface
JAVA
Application programming (standalone machine)
Applet programming (web oriented)
Useful for graphics applications over the WWW
![Page 44: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/44.jpg)
WHAT IS PERL?
PERL is an interpreted language optimized for scanning arbitrary text files and extracting information from those text files
PERL uses sophisticated pattern matching techniques to scan large amounts of data very quickly. Although optimized for scanning text, PERL can also deal with binary data and can make dbm files look like associative arrays
The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant and minimal)
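The slides discuss PERL; the sketch below uses Python's `re` module purely to illustrate the same pattern-scanning idea in a few lines. The FASTA-like text and the N-x-[ST] motif (x not proline) are assumed examples, and each sequence is assumed to sit on a single line.

```python
import re

# Hypothetical FASTA-like text; in practice this would be read from a file.
text = """>seq1 sample protein
MKNVSAALNGTWQ
>seq2 another sample
AAAPPLNPSKK
"""

# N, then any residue except P, then S or T.
# The lookahead wrapper lets overlapping occurrences be reported as well.
motif = re.compile(r"(?=(N[^P][ST]))")

def scan(fasta_text):
    """Return (header, 1-based position, matched text) for every motif hit."""
    hits, header = [], None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()[0]
        elif line:
            for m in motif.finditer(line):
                hits.append((header, m.start() + 1, m.group(1)))
    return hits

hits = scan(text)
```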
![Page 45: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/45.jpg)
CGI (Common Gateway Interface)
The Common Gateway Interface (CGI), as its name implies, provides a gateway between a user (client) and a command/logic-oriented server
CGI performs a translation task: it translates the needs of clients into server requests and then translates the server's replies back for the clients
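The translation role described here can be sketched as a function that turns a client's query string into an HTML reply. This is a minimal sketch under assumptions: the `keyword` parameter name and the reply text are hypothetical, and Python is used only for illustration. It relies on the CGI convention that the server passes the query in the `QUERY_STRING` environment variable and that the script writes a header block, a blank line, and then the body.

```python
import html
import os
from urllib.parse import parse_qs

def handle_request(query_string):
    """Translate a client's query string into an HTML reply."""
    params = parse_qs(query_string)
    keyword = params.get("keyword", [""])[0]
    # Escape user input before placing it in the page.
    body = f"<p>You searched for: {html.escape(keyword)}</p>"
    return "Content-Type: text/html\r\n\r\n" + f"<html><body>{body}</body></html>"

if __name__ == "__main__":
    # A web server running this as a CGI script supplies QUERY_STRING.
    print(handle_request(os.environ.get("QUERY_STRING", "")), end="")
```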
![Page 46: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/46.jpg)
Client ↔ CGI ↔ Server
Client ↔ Java Servlet ↔ Server
![Page 47: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/47.jpg)
The RMI concept is very useful for multitier architectures
Example: www.hotmail.com, www.google.com
![Page 48: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/48.jpg)
[Figure: RMI diagram with the labels Client, RMI, Server, Remote machine, and Software (search engine)]
![Page 49: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/49.jpg)
Web page
JavaServer Pages (Sun Microsystems)
Active Server Pages (Microsoft Corporation)
Useful for dynamic web page creation
![Page 50: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/50.jpg)
GRAPHICAL USER INTERFACE (GUI)
The programmer can quickly design the user interface by drawing and arranging the screen elements rather than writing raw code
A GUI is easy for users to visualize
It is user friendly
Example: MS-Windows operating systems
![Page 51: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/51.jpg)
GUI (Graphical User Interface)
ActiveX (Microsoft Corporation)
Java Swing (Sun Microsystems)
Buttons, boxes and pull-down menus (Windows based)
![Page 52: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/52.jpg)
VB (Visual Basic)
Application development language
Supports graphics
Good for standalone applications
Web programming is not directly possible, but scripting languages (VBScript or JavaScript) can be used to make it web oriented
![Page 53: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/53.jpg)
VC++
System & application programming
Almost the same as VB
Additional advantage: system side
![Page 54: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/54.jpg)
WORLD WIDE WEB (W W W)
The World Wide Web is the best-known and fastest-growing Internet function. It is a way of accessing information already on the Internet, using the concept of hypertext to link information. As with FTP, any type of digital document, image, artwork, movie or sound on a remote computer can be hyperlinked. The protocol used for accessing such information is HTTP (Hyper Text Transfer Protocol)
The hyperlinked documents are known as HTML documents. They are written in a special language called HTML, which stands for Hyper Text Markup Language. HTML is simply ASCII text with embedded tags
![Page 55: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/55.jpg)
DBMS & RDBMS
DBMS: dBase, MS-Access, MySQL server, FoxPro (partially RDBMS)
RDBMS: Sybase, Oracle, SQL Server
![Page 56: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/56.jpg)
DATABASE
a bunch of tables
TABLES
Store numerous rows of information
FIELDS
The little boxes inside a table
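The database / table / field vocabulary can be made concrete with a small example. The sketch below uses SQLite through Python's standard `sqlite3` module, which is a substitute chosen for illustration and not one of the systems named in the slides; the table name, field names, and rows are hypothetical.

```python
import sqlite3

# Database: a throwaway in-memory one, holding a single table.
conn = sqlite3.connect(":memory:")

# Table: "protein", with three fields (columns).
conn.execute(
    "CREATE TABLE protein (entry_id TEXT PRIMARY KEY, name TEXT, n_residues INTEGER)"
)

# Rows: each row fills every field with one value.
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [("XXXX", "example lysozyme", 129), ("YYYY", "example globin", 153)],
)

# Query: pick two fields from the rows that satisfy a condition.
rows = conn.execute(
    "SELECT entry_id, n_residues FROM protein WHERE n_residues > ? ORDER BY entry_id",
    (130,),
).fetchall()
conn.close()
```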
![Page 57: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/57.jpg)
The best way to create your own Access database is by using Microsoft Access. This tool ships with the professional edition of Office 97 and enables you to graphically design your own tables and individual fields.
Yet another one is MySQL
SQL Server is an expensive whopper of a database system, used in corporations that need to store huge wads of information
Oracle is another database format
![Page 58: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/58.jpg)
Typical Web Search
Keywords → Search engine → Output
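The keyword-in, result-out flow of a search engine can be sketched with a small inverted index. This is only an illustration under assumptions: the documents are hypothetical, matching is on whole lowercase words, and a hit must contain every keyword.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, keywords):
    """Return the ids of documents containing every keyword, in sorted order."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical documents standing in for database records.
documents = {
    "d1": "hen egg white lysozyme structure",
    "d2": "lysozyme mutant structure",
    "d3": "transmembrane helices in genome sequences",
}
index = build_index(documents)
output = search(index, ["lysozyme", "structure"])
```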
![Page 59: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/59.jpg)
[Figure: data flow between the Web browser, the WWW, the CGI program and the flat file; the arrows are labelled HTML form and output (in HTML)]
![Page 60: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/60.jpg)
Mirror sites
![Page 61: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/61.jpg)
PDB
GDB
SCOP
![Page 62: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/62.jpg)
PDB
PROTEIN DATABANK
144.16.71.2
144.16.49.185
203.90.127.146 (VPN users)
![Page 63: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/63.jpg)
PDB-MIRROR MACHINE
3.40 GHz PIV machine
2 GB RD RAM
1 Tera-byte Hard Disk
32 MB Graphics Card
Powered by Intel SOLARIS
![Page 64: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/64.jpg)
The PDB server is up to date and currently contains 24,080 coordinate entries (21,788 proteins, 992 protein and nucleic acid complexes, 1,282 nucleic acids).
PDB
![Page 65: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/65.jpg)
![Page 66: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/66.jpg)
GDB
GENOME DATABASE
144.16.71.10
144.16.49.185
203.90.127.147 (VPN users)
![Page 67: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/67.jpg)
GDB-MIRROR site machine
3.40 GHz PIV machine
2 GB RD RAM
1 Tera-byte Hard Disk
32 MB Graphics Card
Powered by Intel SOLARIS
![Page 68: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/68.jpg)
![Page 69: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/69.jpg)
SCOP
Structural Classification of Proteins
144.16.71.2/scop
144.16.49.78/scop
203.90.127.146/scop (VPN users)
![Page 70: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/70.jpg)
SCOP
The SCOP mirror site at the institute has been created and is maintained with the latest copy. The mirror site (version 1.63, May 2003 release) now contains 49,497 domains from 18,946 PDB entries.
![Page 71: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/71.jpg)
![Page 72: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/72.jpg)
Packages developed at the Bioinformatics Centre
Raman Building
Indian Institute of Science
Bangalore 560 012
Dr. K. SEKAR
E-mail [email protected]
![Page 73: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/73.jpg)
![Page 74: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/74.jpg)
GENOME SEQUENCES
![Page 75: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/75.jpg)
MSGS
Motif Search in Genome Sequences - a web-based interactive display tool
P. Selvarani, B.N. Vijay, V. Shanthi, S. Saravanan and K. Sekar
(To be submitted)
http://144.16.71.10/msgs (Internet users)
http://203.90.127.147/msgs (VPN users)
![Page 76: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/76.jpg)
![Page 77: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/77.jpg)
THGS
A web-based database of Transmembrane Helices in Genome Sequences
S.A. Fernando, P. Selvarani, Soma Das, Ch. Kiran Kumar, S. Mondal, S. Ramakumar and K. Sekar
NUCL. ACIDS RES. (2004), 32, D125-D128
http://144.16.71.10/thgs (Internet users)
http://203.90.127.147/thgs (VPN users)
![Page 78: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/78.jpg)
![Page 79: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/79.jpg)
![Page 80: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/80.jpg)
PROTEIN SEQUENCES
![Page 81: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/81.jpg)
PSST
Protein Sequence Search Tool - a web-based interactive search engine
S. Saravanan, A. Ajmal Khan and K. Sekar
CURR. SCI. (2000), 550-552
http://144.16.71.10/psst (Internet users)
http://203.90.127.147/psst (VPN users)
![Page 82: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/82.jpg)
![Page 83: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/83.jpg)
PROTEIN STRUCTURES
![Page 84: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/84.jpg)
BSDD
Biomolecules Segment Display Device - a web-based interactive display tool
P. Selvarani, V. Shanthi, C.K. Rajesh, S. Saravanan and K. Sekar
J. MOL. GRA. & MODEL. (2004) (In the press)
http://144.16.71.2/bsdd (Internet users)
http://203.90.127.146/bsdd (VPN users)
![Page 85: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/85.jpg)
![Page 86: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/86.jpg)
![Page 87: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/87.jpg)
![Page 88: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/88.jpg)
![Page 89: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/89.jpg)
PDB Goodies - a web-based GUI to manipulate the Protein Data Bank file
A.S.Z. Hussain, V. Shanthi, S.S. Sheik, J. Jeyakanthan, P. Selvarani and K. Sekar
ACTA. CRYST. (2002), D58, 1385-1386
http://144.16.71.11/pdbgoodies (Internet users)
http://203.90.127.149/pdbgoodies (VPN users)
![Page 90: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/90.jpg)
![Page 91: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/91.jpg)
CAP
Conformation Angles Package - displaying the conformation angles of side chains in proteins
S.S. Sheik, P. Sundararajan, V. Shanthi and K. Sekar
BIOINFORMATICS (2003), 19, 1043-1044
http://144.16.71.146/cap (Internet users)
http://203.90.127.148/cap (VPN users)
![Page 92: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/92.jpg)
![Page 93: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/93.jpg)
WAP - a Web-based package to calculate geometrical parameters between water oxygen and protein atoms
V. Shanthi, C.K. Rajesh, J. Jayalakshmi, V.G. Vijay and K. Sekar
J. APPL. CRYST. (2003), 36, 167-168
http://144.16.71.11/wap (Internet users)
http://203.90.127.149/wap (VPN users)
![Page 94: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/94.jpg)
![Page 95: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/95.jpg)
RP
Ramachandran Plot on the web
S.S. Sheik, P. Sundararajan, A.S.Z. Hussain and K. Sekar
BIOINFORMATICS (2002), 18, 1548-1549
http://144.16.71.146/rp (Internet users)
http://203.90.127.148/rp (VPN users)
![Page 96: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/96.jpg)
![Page 97: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/97.jpg)
SSEP
Secondary Structural Elements of Proteins
V. Shanthi, P. Selvarani, Ch. Kiran Kumar, C.S. Mohire and K. Sekar
NUCL. ACIDS RES. (2003), 31, 3404-3405
http://144.16.71.148/ssep (Internet users)
http://203.90.127.150/ssep (VPN users)
![Page 98: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/98.jpg)
![Page 99: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/99.jpg)
![Page 100: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/100.jpg)
![Page 101: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/101.jpg)
![Page 102: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/102.jpg)
SEM
Symmetry Equivalent Molecules
A.S.Z. Hussain, Ch. Kiran Kumar, C.K. Rajesh, S.S. Sheik and K. Sekar
NUCL. ACIDS RES. (2003), 31, 3356-3358.
http://144.16.71.11/sem (Internet users)
http://203.90.127.149/sem (VPN users)
![Page 103: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/103.jpg)
![Page 104: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/104.jpg)
![Page 105: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/105.jpg)
![Page 106: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/106.jpg)
![Page 107: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/107.jpg)
![Page 108: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/108.jpg)
CADB
Conformational Angles DataBase of proteins
S.S. Sheik, P. Ananthalakshmi, G. Ramya Bhargavi and K. Sekar
NUCL. ACIDS RES. (2003), 31(1), 448-451
http://144.16.71.148/cadb (Internet users)
http://203.90.127.150/cadb (VPN users)
![Page 109: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/109.jpg)
![Page 110: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/110.jpg)
Non-homologous (25% identity) protein chains
Hobohm & Sander, Protein Sci. 3, 522-524
X-Ray Diffraction : 1,276 (25)
NMR : 460 (2)
Fibre Diffraction : 3 (0)
Others : 0 (5)
Total no. of chains : 1,739 (32)
Total no. of residues in
X-Ray Diffraction : 2,53,623
NMR : 37,281
Numbers within parentheses denote files having Cα coordinates.
![Page 111: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/111.jpg)
Non-homologous (90% identity) protein chains
Hobohm & Sander, Protein Sci. 3, 522-524
X-Ray Diffraction : 5,147 (26)
NMR : 993 (5)
Fibre Diffraction : 6 (0)
Others : 0 (5)
Total no. of chains : 6,146 (36)
Total no. of residues in
X-Ray Diffraction : 11,29,466
NMR : 72,145
Numbers within parentheses denote files having Cα coordinates.
![Page 112: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/112.jpg)
LySDB
Lysozyme Structural DataBase
K. S. Mohan, Soma Das, C. Chockalingham, V. Shanthi & K. Sekar
ACTA CRYST. (2004), D60, 597-600.
http://144.16.71.2/lysdb (Internet users)
http://203.90.127.146/lysdb (VPN users)
![Page 113: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/113.jpg)
![Page 114: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/114.jpg)
TAKE HOME MESSAGE
Data mining is nothing but exploiting the hidden trends in your data
Create your own derived database
No one tool or set of tools is universally applicable
Present the data in a useful format such as a graph or table
![Page 115: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/115.jpg)
Department of Biotechnology
Ministry of Science & Technology
Govt. of India
India
&
Jai Vigyan National Science Foundation
Govt. of India
India
![Page 116: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/116.jpg)
![Page 117: K. SEKAR, Ph.D.. Dr. K. Sekar Bioinformatics Centre Supercomputer Education and Research Centre Indian Institute of Science Bangalore 560 012 INDIA E-mail:](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e885503460f94b8c754/html5/thumbnails/117.jpg)