TRANSCRIPT
WORLD RANKING UNIVERSITIES ANALYSIS
G. BHADRA
• Abstract
• Introduction
• Architecture
• Explanation
• Commands
• Screen shots
• Gantt Chart
• Future work
• Conclusion
INTRODUCTION
Global university rankings have cemented the notion of a world university market arranged in a single “league table” for comparative purposes and have given a powerful impetus to national and international competitive pressures in the sector. Both the research rankings by Shanghai Jiao Tong University and the composite rankings by the Times Higher Education Supplement have been widely publicized and already appear to have generated incentives in favor of greater system stratification and the concentration of elite researchers.
ARCHITECTURE
RAW DATA (.xls) → MySQL → (SQOOP IMPORT) → Hive
COMMANDS

SQL COMMANDS
1. CREATE DATABASE database_name;
2. USE database_name;
3. CREATE TABLE table_name (col_name VARCHAR(size), ...);
4. LOAD DATA LOCAL INFILE '/home/cloudera/g9/univresity.csv' INTO TABLE universities FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
5. SELECT * FROM universities;
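The load-and-query steps above can be prototyped without a MySQL installation; the sketch below uses Python's built-in sqlite3 module to mimic steps 3–5. The column names (world_rank, institution, country) and the sample rows are hypothetical, since the actual CSV schema is not shown in the slides.

```python
import sqlite3

# In-memory database standing in for the MySQL "worldrankinguniversity" DB.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Step 3: CREATE TABLE (hypothetical columns).
cur.execute("""
    CREATE TABLE universities (
        world_rank  INTEGER,
        institution VARCHAR(100),
        country     VARCHAR(50)
    )
""")

# Step 4: load rows, as LOAD DATA LOCAL INFILE would from the CSV file.
rows = [
    (1, "Harvard University", "USA"),
    (2, "Stanford University", "USA"),
    (3, "MIT", "USA"),
]
cur.executemany("INSERT INTO universities VALUES (?, ?, ?)", rows)

# Step 5: SELECT * FROM universities;
for row in cur.execute("SELECT * FROM universities ORDER BY world_rank"):
    print(row)
```

The same queries can later be run almost unchanged in HiveQL, which is the point of staging the data through MySQL before the Sqoop import.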
SQOOP COMMANDS
1. sqoop import --connect "jdbc:mysql://localhost/worldrankinguniversity" --username root --password cloudera --table university --hive-import -m 1
ABSTRACT

• With the upcoming deluge of semantic data, the fast growth of ontology bases has brought significant challenges in performing efficient and scalable reasoning.
• Traditional centralized reasoning methods are not sufficient to process large ontologies.
• Distributed searching methods are thus required to improve the scalability and performance of inferences.
• This paper proposes an incremental and distributed inference method for large-scale ontologies using MapReduce, which realizes high-performance reasoning and runtime searching, especially for incremental knowledge bases.
• By constructing transfer inference forests and effective assertion triples, storage is largely reduced and the search process is simplified and accelerated.
• We propose an incremental and distributed inference method (IDIM) for large-scale RDF datasets via MapReduce.
• The choice of MapReduce is motivated by the fact that it can limit data exchange and alleviate load-balancing problems by dynamically scheduling jobs on computing nodes.
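The MapReduce model the abstract relies on can be illustrated with a minimal pure-Python sketch. This is not the IDIM method itself, only the map/shuffle/reduce pattern it builds on, counting predicate occurrences in a toy list of RDF-style triples (the triples are made up for illustration):

```python
from collections import defaultdict

# Toy RDF-style triples (subject, predicate, object) -- illustrative only.
triples = [
    ("A", "subClassOf", "B"),
    ("B", "subClassOf", "C"),
    ("A", "type", "Class"),
]

# Map phase: emit a (predicate, 1) pair for every triple.
mapped = [(p, 1) for (_, p, _) in triples]

# Shuffle phase: group values by key, as the framework does between phases.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: sum the counts for each predicate.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'subClassOf': 2, 'type': 1}
```

On a real cluster, the map and reduce phases run in parallel on different nodes, which is what limits data exchange and spreads the load.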
EXPLANATION
• HIVE: Hive has three main functions: data summarization, query, and analysis. It supports queries expressed in a language called HiveQL, which automatically translates SQL-like queries into MapReduce jobs executed on Hadoop. In addition, HiveQL supports custom MapReduce scripts to be plugged into queries. Hive supports text files, SequenceFiles, and RCFiles (Record Columnar Files, which store the columns of a table in a columnar fashion).
• HDFS: The Hadoop Distributed File System (HDFS) is a Java-based file system that provides scalable and reliable data storage and is designed to span large clusters of commodity servers. HDFS, MapReduce, and YARN form the core of Apache Hadoop.
• SQOOP: Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs which can be run multiple times to import updates made to a database since the last import.
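The incremental loads mentioned above boil down to remembering the highest value seen in a check column and importing only newer rows (Sqoop's --incremental append mode with --check-column and --last-value). A hedged Python sketch of that bookkeeping, with made-up table contents and an id column standing in for the check column:

```python
# Rows already in the relational database, keyed by an auto-increment id
# (hypothetical data).
source_rows = [
    {"id": 1, "university": "Harvard"},
    {"id": 2, "university": "Stanford"},
    {"id": 3, "university": "MIT"},
]

def incremental_import(rows, last_value):
    """Mimic the logic of 'sqoop import --incremental append
    --check-column id --last-value N': return only rows newer than the
    saved watermark, plus the new watermark for the next saved-job run."""
    new_rows = [r for r in rows if r["id"] > last_value]
    new_last = max((r["id"] for r in new_rows), default=last_value)
    return new_rows, new_last

# First run imports everything and advances the watermark to 3;
# a second run with no new source rows imports nothing.
imported, watermark = incremental_import(source_rows, last_value=0)
print(len(imported), watermark)  # 3 3
```

A Sqoop saved job persists the watermark between runs, so re-importing the whole table after each database update is unnecessary.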
SCREEN SHOTS
GANTT CHART
Activities Week
Installations Week-1
Collection of data sets Week-2
Hadoop tools Week-3,4,5
Importing data into SQL Week-6
Importing data into Hadoop Week-7
Development Week-8,9
Analysis Week-10
Report Week-11,12
FUTURE WORK

In order to store incremental RDF triples more efficiently, we present two novel concepts: the transfer inference forest (TIF) and effective assertion triples (EAT). Their use can largely reduce storage and simplify the reasoning process. Based on TIF/EAT, we need not compute and store the RDF closure, and reasoning time decreases so significantly that a user’s online query can be answered in a timely manner, which is more efficient than existing methods to the best of our knowledge. More importantly, updating TIF/EAT needs only minimal computation, since the relationship between new triples and existing ones is fully exploited, which is not found in the existing literature.
CONCLUSION

So far, we have collected data on universities and the factors on which their ranks are based. Retrieving and storing data with a traditional database is somewhat troublesome, so we have implemented a method of storing and retrieving data more efficiently. Next, we will analyze information about particular universities according to our own criteria.