DOCUMENT ARCHIVAL SYSTEM
-: INDEX :-
• Project Description
• Company Profile
• Existing System
• Need For The New System
• Objectives Of The New System
• Problem Definition
• Core Components
• Project Profile
• Advantages and Limitations Of Proposed System
• Proposed TimeLine Chart
• Requirement Determination
• Feasibility Analysis
• Use Case Diagram
• Class Diagram
• Interaction Diagram
• Activity Diagram
• Data Dictionary
• User Interface Design
• Report Design
• Proposed Enhancement
• Conclusion
• Bibliography
-: Project Description :-
• Document Archival System (DAS) is a web-based application with a distributed architecture.
• It is intended mainly for organizations such as law firms, government agencies and corporations.
• With data growing so rapidly, and unstructured data accounting for an estimated 90% of all data today, the time has come for enterprises to re-evaluate their approach to data storage, management and analytics.
• Processing big data takes a long time when it runs on a single machine.
• Our application uses the MapReduce framework, a batch-based, distributed computing framework that allows work to run in parallel over a large amount of data.
• Our project uses SolrCloud for indexing and searching, enabling fast retrieval of data.
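To make the MapReduce idea concrete, here is a minimal in-memory sketch of the classic word-count job in plain Java. It only simulates the map, shuffle and reduce phases on one machine; the names `WordCountSketch`, `map`, `shuffle`, `reduce` and `run` are hypothetical, and a real DAS job would instead extend Hadoop's `Mapper` and `Reducer` classes and run on the cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a MapReduce word count.
// Hadoop runs the same three phases but spreads map and reduce tasks across nodes.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input record.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : record.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (Hadoop also sorts the keys).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine all values of one key into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> records) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String record : records) {
            emitted.addAll(map(record)); // on a cluster, each record could be mapped on a different node
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }
}
```

For example, `WordCountSketch.run(Arrays.asList("the case file", "The case"))` yields `{case=2, file=1, the=2}`. Because each map call depends only on its own record, the map phase is what parallelizes across machines.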
-: Company Profile :-
Company Name: Capital Novus
Address: A-501, Mind Space, Appl-IT/ITeS SEZ, K. Raheja Road, Koba, Gandhinagar-382009
Contact No.: 079 65721500
Company Work: Provides solutions to various law firms, corporations, etc.
Capital Novus has provided high-quality technology services to the legal community since 2002.
Our clients are law firms, corporations and government agencies involved in complex litigation, regulatory matters and investigations.
We assist them with efficient, cost-effective solutions to the challenge of managing electronically stored information.
-: Existing System :-
The existing system at the law firms is not computerized but manual, which makes case study and analysis time-consuming, tedious and expensive.
The company has to maintain many registers and files in order to store information. Because the number of cases grew rapidly, the existing system could no longer meet the rising requirements, which led to computerization.
-: Need For The New System :-
Computers are now an important part of every activity in every organization because they are fast and accurate. Today every organization requires a system that is accurate, secure and affordable.
• FASTER PROCESSING OF TRANSACTIONS
• LESS STORAGE SPACE
• SECURITY
• REDUCTION IN EXPENSES
-: Objective of the New System :-
FASTER PROCESSING OF TRANSACTIONS:
The system must fit into the existing environment and be user-friendly. It speeds up the processing of all transactions.
LESS STORAGE SPACE:
The new system will store much more information than the current one in far less space.
SECURITY:
The new system provides more security options than the current system through different types of accounts and passwords.
REDUCTION IN EXPENSES:
The new system is a one-time investment and requires much less maintenance than the existing system. Its recurring cost will decrease every year.
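The "different types of accounts" can be pictured as a role-based access check. This is a minimal sketch, not the project's implementation: it assumes the two roles from the User_Role sample data (Role_Admin and Role_User) and the administrator duties listed under Core Components, and the names `AccessControl`, `Role` and `Permission` are hypothetical. Password handling and persistence are left out.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Role names mirror the sample data in the User_Role table.
enum Role { ROLE_ADMIN, ROLE_USER }

// Hypothetical permissions, taken from the administrator duties under Core Components.
enum Permission { MANAGE_USERS, INDEX_DOCUMENTS, SEARCH_DOCUMENTS }

class AccessControl {
    private final Map<String, Role> roles = new HashMap<>();
    private final Map<String, Set<Permission>> grants = new HashMap<>();

    // Bootstrap with a single administrator account.
    AccessControl(String adminName) {
        roles.put(adminName, Role.ROLE_ADMIN);
        grants.put(adminName, EnumSet.noneOf(Permission.class));
    }

    // Administrators hold every permission; other users hold only what was granted.
    boolean can(String user, Permission permission) {
        Role role = roles.get(user);
        if (role == null) {
            return false;
        }
        return role == Role.ROLE_ADMIN || grants.get(user).contains(permission);
    }

    void addUser(String actor, String user, Role role) {
        require(actor, Permission.MANAGE_USERS);
        roles.put(user, role);
        grants.put(user, EnumSet.noneOf(Permission.class));
    }

    void deleteUser(String actor, String user) {
        require(actor, Permission.MANAGE_USERS);
        roles.remove(user);
        grants.remove(user);
    }

    void grant(String actor, String user, Permission permission) {
        require(actor, Permission.MANAGE_USERS);
        Set<Permission> userGrants = grants.get(user);
        if (userGrants == null) {
            throw new IllegalArgumentException("Unknown user: " + user);
        }
        userGrants.add(permission);
    }

    private void require(String actor, Permission permission) {
        if (!can(actor, permission)) {
            throw new SecurityException(actor + " lacks " + permission);
        }
    }
}
```

A new user starts with no permissions until the administrator grants indexing or search, and only an account holding `MANAGE_USERS` (here, only administrators) can add or delete users.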
-: Problem Definition :-
In the e-discovery field, every corporation today holds a large amount of data (social media, emails, loose documents and hard copies), amounting to terabytes of data and millions of documents.
Processing, maintaining and archiving this data, and efficiently retrieving the required documents from millions of them, is a challenging job.
DAS gives users the ability to manage, retrieve and archive large numbers of documents in a structured manner. Using this application, users can easily search the entire document repository.
-: Core Components :-
DAS
Document Archival System gives users the ability to manage, retrieve and archive large numbers of documents in a structured manner.
ADMIN
The administrator has the authority to add and delete users and to grant users permission to index and search documents.
STRUTS 2
Apache Struts 2 is an open-source web application framework for developing Java EE web applications. It uses and extends the Java Servlet API to encourage developers to adopt a model–view–controller (MVC) architecture.
SPRING 4.0
The Spring Framework is an open-source application framework and inversion of control container for the Java platform. The framework's core features can be used by any Java application, but there are extensions for building web applications on top of the Java EE platform.
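Inversion of control means a class receives its collaborators instead of creating them. The sketch below shows constructor injection wired by hand, which is the wiring Spring's container automates; `DocumentSearchService`, `SearchController` and `ManualWiring` are hypothetical names, and the lambda merely stands in for a real Solr-backed service.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical service abstraction: the controller depends on this interface,
// not on a concrete Solr or database implementation.
interface DocumentSearchService {
    List<String> search(String query);
}

// Controller in the MVC sense. It never constructs its dependency itself;
// an IoC container (Spring, in this project) would pass one in.
class SearchController {
    private final DocumentSearchService searchService;

    SearchController(DocumentSearchService searchService) {
        this.searchService = searchService;
    }

    String handle(String query) {
        List<String> hits = searchService.search(query);
        return hits.size() + " document(s): " + String.join(", ", hits);
    }
}

// Hand-written wiring standing in for what the container does automatically.
class ManualWiring {
    static SearchController wire(List<String> corpus) {
        DocumentSearchService inMemory = q -> corpus.stream()
                .filter(name -> name.contains(q))
                .collect(Collectors.toList());
        return new SearchController(inMemory);
    }
}
```

For example, `ManualWiring.wire(Arrays.asList("contract.pdf", "memo.txt", "contract_v2.pdf")).handle("contract")` returns `"2 document(s): contract.pdf, contract_v2.pdf"`. Because `SearchController` only sees the interface, a test double or a Solr-backed service can be swapped in without changing it.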
-: Cont.. :-
HIBERNATE 4.0
Hibernate ORM (Hibernate for short) is an object-relational mapping library for the Java language, providing a framework for mapping an object-oriented domain model to a traditional relational database. It is free software distributed under the GNU Lesser General Public License.
APACHE HADOOP 2.2.0
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets (Big Data) on computer clusters built from commodity hardware. The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce.
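MapReduce parallelism comes from cutting each input file into splits and running one map task per split. The sketch below reproduces the split-sizing rule used by Hadoop's `FileInputFormat`: split size = max(minSize, min(maxSize, blockSize)), and full splits are cut only while more than 1.1 splits' worth of data remains. `SplitPlanner` is a hypothetical name, and the 128 MB value in the example is an assumed block size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-implementation of FileInputFormat's split planning for one non-empty file.
class SplitPlanner {
    // Allow the last split to be up to 10% larger instead of creating a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} pairs; each pair would become one map task.
    static List<long[]> plan(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining}); // final, possibly short, split
        }
        return splits;
    }
}
```

With a 128 MB split size, a 300 MB file yields three map tasks (128 MB, 128 MB and 44 MB), while a 140 MB file yields a single 140 MB split because 140/128 is below 1.1.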
-: Cont… :-
SOLR/LUCENE 4.10.3
Solr is an open-source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. It provides distributed search and index replication and is highly scalable. Solr is one of the most popular enterprise search engines.
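Full-text search in Lucene and Solr rests on an inverted index, which maps each term to the documents that contain it. The toy class below (`InvertedIndex`, a hypothetical name) shows the idea with plain Java collections and a simple AND query; real Solr adds text analysis, relevance scoring and on-disk index segments, none of which is modeled here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> sorted set of document IDs.
class InvertedIndex {
    private final Map<String, SortedSet<Long>> postings = new HashMap<>();

    // Tokenize on non-word characters and record the document under each term.
    void add(long docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // AND query: intersect the posting lists of every query term.
    SortedSet<Long> search(String query) {
        SortedSet<Long> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            SortedSet<Long> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Lookups touch only the posting lists of the query terms, which is why search stays fast as the repository grows to millions of documents.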
APACHE TIKA 1.5
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis and translation.
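Part of Tika's type detection looks at a file's leading "magic" bytes, alongside hints such as the file name. The sketch below is a much-simplified, hypothetical `MagicDetector` that recognizes three signatures relevant to DAS inputs: PDF (`%PDF-`), ZIP (`PK\3\4`, also the container format of .docx/.xlsx/.pptx) and Outlook PST (`!BDN`). It is not Tika's API; in the project, Tika's own `Tika.detect(...)` would do this job with a far larger signature database.

```java
// Hypothetical, simplified content-type detector based on leading "magic" bytes.
class MagicDetector {
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] ZIP = {'P', 'K', 3, 4};     // also Office Open XML containers
    private static final byte[] PST = {'!', 'B', 'D', 'N'}; // Outlook personal folders file

    static String detect(byte[] head) {
        if (startsWith(head, PDF)) return "application/pdf";
        if (startsWith(head, ZIP)) return "application/zip";
        if (startsWith(head, PST)) return "application/vnd.ms-outlook-pst";
        return "application/octet-stream"; // unknown: fall back to generic binary
    }

    private static boolean startsWith(byte[] data, byte[] signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if (data[i] != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Checking content rather than the extension means a renamed file (for example a PDF saved as .txt) is still routed to the right parser.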
-: Cont… :-
APACHE MAVEN
Maven is built using a plug-in-based architecture that allows it to make use of any application controllable through standard input. Maven dynamically downloads Java libraries and Maven plug-ins from one or more repositories, such as the Maven Central Repository, and stores them in a local cache.
EXT JS 5.1.0
Ext JS is a pure JavaScript application framework for building interactive web applications using techniques such as Ajax, DHTML and DOM scripting.
-: Project Profile :-
Project Title: Document Archival System
Organization: Capital Novus
Front End: IntelliJ IDEA 14, Ext JS 5.1.0, Struts 2
Middleware: J2EE (Spring 4.0, Hibernate 4.0)
Back End: MS SQL Server 2012
Tools & Technology: Solr 4.10.3, Hadoop 2.2.0
Project Duration: 5th January to 4th April
External Project Guide: Mr. Jagdish Vasani (Team Leader)
Internal Project Guide: Prof. Anuradha Mam
-: Advantages and Limitations of the Proposed System :-
Advantages
• Fully automated
• No paper wastage
• Historical data available on demand
• Role-based security
• Time saving
• Data is managed in a database, and transactions are fast
Disadvantages
• An Internet connection is required
• A continuous power supply is required
• Intruders may compromise personal data
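-: Proposed Time Line Chart :-
Week  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14
Date  | 8  | 15 | 22 | 29 | 4  | 11 | 18 | 25 | 5  | 12 | 19 | 26 | 2
Month | January (weeks 2 to 5) | February (weeks 6 to 9) | March (weeks 10 to 13) | April (week 14)
Activities (in chart order):
• Domain Understanding
• Further Analysis
• Learning Process
• Design
• Coding & Testing
• Documentation
• Final Documentation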
-: Requirement Determination :-
Computers are now an important part of every activity in every organization because they are fast and accurate. Today every organization requires a system that is accurate, secure and affordable.
To determine the requirements, the company first needs all of the documents used to process the case.
These documents are provided by the law firms that will use this product. The collected documents include .pst and .nsf files, compressed files, loose documents and office files.
Each of these documents is entered into this product, which indexes the documents and provides a search facility.
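The collected inputs fall into a few families that need different handling before indexing. As a hypothetical sketch (the `ProcessingRoute` names and the extension list are assumptions, not the project's code), each file could be routed by extension: mail stores to their own extractors, archives to compressed-file extraction, and everything else straight to text conversion.

```java
import java.util.Locale;

// Hypothetical routing of collected files to processing stages by file extension.
enum ProcessingRoute {
    PST_EXTRACTION, NSF_EXTRACTION, ARCHIVE_EXTRACTION, TEXT_CONVERSION;

    static ProcessingRoute forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pst":
                return PST_EXTRACTION;     // Outlook mail store
            case "nsf":
                return NSF_EXTRACTION;     // Lotus Notes database
            case "zip":
            case "rar":
            case "7z":
                return ARCHIVE_EXTRACTION; // compressed containers
            default:
                return TEXT_CONVERSION;    // loose documents and office files
        }
    }
}
```

Extensions can lie, so a production pipeline would confirm the type from the file content (as Tika does) before trusting this routing.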
-: Feasibility Analysis :-
To make this project a success, we first discussed the law firms' requirements with our team leader. We then planned around the administrator and end-user requirements and studied them in depth.
Next, we modeled a system design that shows the system's actual workflow. During modeling we did a lot of paper work to draw different kinds of diagrams, such as use case, class, sequence and activity diagrams.
After modeling, we started the actual work of the project, construction, in which we developed the system by coding it.
After constructing the system, we tested it, noted where problems occurred and then solved them. We also let users try the system to check its usability.
-: System Design :-
-: Process Diagram :-
Process Diagram For Inventory Process
Process Diagram For Extraction
Process Diagram For Meta-data
Front End Process
-: Use Case Diagram :-
Use Case Diagram For Search Document
Use Case Diagram For Background Process
Use Case Diagram For Manage Job
-: Class Diagram :-
-: Sequence Diagram :-
Sequence Diagram For Compressed File Extraction
Sequence Diagram For Embedded File Extraction
Sequence Diagram For Text Conversion
-: Activity Diagram :-
Activity Diagram For Inventory
Activity Diagram For Extraction Process
Activity Diagram For PST Extraction
(Diagram steps: Get PST Detail From Edocs; Retrieve Each PST's Folder Detail; for each email message, Generate All Metadata, Capture Message Body and Generate Locale; decision "Extract All?" (Yes/No) with Extract Attachments; decision "Save BCC Detail?" (Yes/No) with Store BCC Detail; Save Msg.)
Activity Diagram For Meta-data Process
Activity Diagram For Search Activity
-: Data Dictionary :-
Table Name: User_Role
Description: This table is used to define the role of a user.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | ROLE_ID | TinyInt | P.K. | User role ID | 1
2 | ROLE_NAME | varchar(15) | Not Null | User role | Role_Admin or Role_User
3 | ENABLED | Bit | Default: 1 | 1: Enabled, 0: Disabled | 1
4 | CREATED_DATE | Datetime | Not Null | Created date | 2015-03-02
5 | CREATED_BY | SmallInt | Not Null | Ref. User | 1
6 | UPDATED_DATE | Datetime | Not Null | Updated date | 2015-03-02
7 | UPDATED_BY | SmallInt | Not Null | Ref. User | 1
8 | DESCRIPTION | varchar(100) | | Description of the role | What the user can do
Table Name: User
Description: This table is used to store user details.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | USER_ID | Smallint | P.K. | User ID | 1
2 | ROLE_ID | Tinyint | Not Null | Ref. User_Role | 2
3 | COMPANY_ID | Smallint | F.K. | Ref. Company_Detail | 1
4 | USER_NAME | varchar(30) | Not Null, Unique | User name | CapitalNovus
5 | USER_PASS | varchar(30) | Not Null | User password | ******
6 | ENABLED | Bit | Default: 1 | 1: Enabled, 0: Disabled | 1
7 | CREATED_DATE | Datetime | Not Null | Created date | 2015-03-02
8 | CREATED_BY | SmallInt | | Created by | 1
9 | UPDATED_DATE | Datetime | Not Null | Updated date | 2015-03-02
10 | UPDATED_BY | SmallInt | | Updated by | 2
11 | DESCRIPTION | varchar(100) | | Description of the user | User for view-only analysis
Table Name: Company_Detail
Description: This table is used to store company details.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | COMPANY_ID | Smallint | P.K. | Company ID | 1
2 | COMPANY_NAME | varchar(50) | Not Null | Company name | Capital Novus
3 | EMAIL_ID | varchar(50) | Not Null | Email ID |
4 | PHONE_NO | varchar(12) | Not Null | Contact number | 1234567890
5 | CREATED_DATE | Datetime | Not Null | Created date | 2015-03-02
6 | CREATED_BY | Smallint | F.K. | Ref. User | 1
7 | UPDATED_DATE | Datetime | Not Null | Updated date | 2015-03-02
8 | UPDATED_BY | Smallint | F.K. | Ref. User | 2
9 | DESCRIPTION | varchar(150) | | Description of the company | Capital Novus
Table Name: Configuration
Description: This table is used to store the details of different configurations.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | CONFIG_ID | Smallint | P.K. | Configuration ID | 1
2 | COMPANY_ID | Smallint | F.K. | Ref. Company | 1
3 | SOLR_CORE_PATH | varchar(200) | Not Null | Path of Solr core | http://localhost:8083/solr/core5
4 | HADOOP_CLUSTER_PATH | varchar(200) | Not Null | Path of cluster | hdfs://10.1.12.108:9000
5 | DB_URL | varchar(150) | Not Null | Database URL | jdbc:sqlserver://TRAINEEDEV02\\SQLEXPRESS;databaseName=TEMPDAS
6 | DB_USER_NAME | Varchar(50) | Not Null | Database username | Capital
7 | DB_PASSWORD | Varchar(50) | Not Null | Database password | *****
Table Name: Case_Detail
Description: This table is used to store case details.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | CASE_ID | int(5) | P.K. | Case ID | 1
2 | COMPANY_ID | smallInt | Not Null | Company ID | 1
3 | CASE_NAME | varchar(50) | Not Null | Case name | Copyright
4 | CASE_PATH | varchar(100) | Not Null | Path of case | /CN/Case 1
5 | CREATED_DATE | Datetime | Not Null | Created date | 2015-03-02
6 | CREATED_BY | Int(5) | F.K. | Ref. User | 1
Table Name: Document_Detail
Description: This table is used to store document details.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | DOC_ID | bigint(10) | P.K. | Document ID | 1
2 | CASE_ID | int(5) | F.K. | Ref. Case_Detail | 1
3 | DOC_NAME | varchar(100) | Not Null | Document name | Hello.txt
4 | DOC_PATH | varchar(200) | Not Null | Original path | /user/doc1.txt
5 | TXT_DOC_PATH | varchar(200) | Not Null | Text file path | D:/java source/00000001/00000001_00000.txt
6 | INDEXED | tinyint(1) | Not Null | 0: not indexed, 1: indexed, -1: failed | 1
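Because INDEXED packs three states into a small integer, application code benefits from mapping it to a type-safe enum. The `IndexStatus` enum below is a hypothetical sketch of that mapping; `needsIndexing` assumes failed documents should be retried, which the table itself does not specify.

```java
// Hypothetical enum for Document_Detail.INDEXED (0: not indexed, 1: indexed, -1: failed).
enum IndexStatus {
    NOT_INDEXED(0), INDEXED(1), FAILED(-1);

    private final int code;

    IndexStatus(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    // Convert the raw column value; unknown codes are rejected rather than guessed.
    static IndexStatus fromCode(int code) {
        for (IndexStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown INDEXED code: " + code);
    }

    // Assumption: both never-indexed and failed documents go back to the indexer.
    boolean needsIndexing() {
        return this != INDEXED;
    }
}
```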
Table Name: Metadata_Field_Detail
Description: This table is used to store the metadata fields of documents.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | METADATA_FIELD_ID | int(5) | P.K. | Metadata field ID | 1
2 | FIELD_NAME | varchar(50) | Not Null | Metadata field | Author
Table Name: Metadata_Value_Detail
Description: This table is used to store the metadata values of documents.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | METADATA_FIELD_ID | int(5) | F.K. | Ref. Metadata_Field_Detail | 1
2 | DOC_ID | bigint(10) | F.K. | Ref. Document_Detail | 1
3 | FIELD_VALUE | varchar(100) | Not Null | Metadata value | Capital Novus
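The two metadata tables form an entity-attribute-value design: field names live in Metadata_Field_Detail, and each value row links one field to one document. The hypothetical `MetadataAssembler` below shows the equivalent of joining the two tables and pivoting the rows into one name-to-value map per DOC_ID; the class names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical row shape mirroring Metadata_Value_Detail.
class MetadataValueRow {
    final int fieldId;
    final long docId;
    final String value;

    MetadataValueRow(int fieldId, long docId, String value) {
        this.fieldId = fieldId;
        this.docId = docId;
        this.value = value;
    }
}

class MetadataAssembler {
    // fieldNames plays the role of Metadata_Field_Detail (METADATA_FIELD_ID -> FIELD_NAME).
    static Map<Long, Map<String, String>> assemble(Map<Integer, String> fieldNames,
                                                   List<MetadataValueRow> values) {
        Map<Long, Map<String, String>> byDoc = new TreeMap<>();
        for (MetadataValueRow row : values) {
            String name = fieldNames.get(row.fieldId); // the "join" on METADATA_FIELD_ID
            if (name == null) {
                throw new IllegalStateException("Dangling METADATA_FIELD_ID " + row.fieldId);
            }
            byDoc.computeIfAbsent(row.docId, id -> new TreeMap<>()).put(name, row.value); // the "pivot"
        }
        return byDoc;
    }
}
```

The design lets new metadata fields (for example, ones Tika discovers in a new file type) be added as rows instead of schema changes.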
Table Name: Job_Detail
Description: This table is used to store job scheduling details.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | JOB_ID | int(5) | P.K. | Job ID | 1
2 | CASE_ID | int(5) | F.K. | Ref. Case_Detail | 1
3 | JOB_STATUS | char(3) | Default: P | P = pending, R = ready, Ru = running, F = failed, C = completed | R (ready)
4 | JOB_SUBMIT_TIME | datetime | Not Null | Job submission time | 2015-03-02 12:11:01
5 | JOB_FINISHED_TIME | datetime | Not Null | Job completion time | 2015-03-02 22:11:01
6 | JOB_SUBMITED_BY | int(5) | Not Null | Ref. User | 1
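JOB_STATUS stores short codes in a char(3) column, and fixed-length char values may come back padded with trailing spaces depending on the database, so the codes are worth parsing defensively. The hypothetical `JobStatus` enum below maps the codes and, as an assumption not stated in the table, orders them into a lifecycle of pending, ready, running, then completed or failed.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical mapping of Job_Detail.JOB_STATUS codes.
enum JobStatus {
    PENDING("P"), READY("R"), RUNNING("Ru"), FAILED("F"), COMPLETED("C");

    private final String code;

    JobStatus(String code) {
        this.code = code;
    }

    String code() {
        return code;
    }

    // trim() removes char(3) padding; the table's default is written as lowercase "p".
    static JobStatus fromCode(String raw) {
        String c = raw.trim();
        for (JobStatus status : values()) {
            if (status.code.equalsIgnoreCase(c)) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown JOB_STATUS: '" + raw + "'");
    }

    // Assumed lifecycle; completed and failed are treated as terminal states.
    Set<JobStatus> nextStates() {
        switch (this) {
            case PENDING:
                return EnumSet.of(READY);
            case READY:
                return EnumSet.of(RUNNING);
            case RUNNING:
                return EnumSet.of(COMPLETED, FAILED);
            default:
                return EnumSet.noneOf(JobStatus.class);
        }
    }

    boolean canMoveTo(JobStatus next) {
        return nextStates().contains(next);
    }
}
```

Note that "R" and "Ru" stay distinct after trimming, so a padded "R  " still parses as READY rather than RUNNING.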
Table Name: Save_Search_Master
Description: This table is used to store the details of a user's saved searches.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | SSM_ID | int(5) | P.K. | Saved search master ID | 1
2 | COMPANY_ID | smallInt | F.K. | Ref. Company_Detail | 1
3 | FOLDER_NAME | varchar(30) | Not Null | Folder name | SAVE1
4 | SEARCH_QUERY | varchar(500) | Not Null | User search query | Hi !, How r u?
5 | NUM_OF_DOCUMENT | int(8) | Not Null | Total stored documents | 501
6 | SAVED_DATE | datetime | Not Null | Saved date | 2015-03-02 12:11:01
7 | SAVED_BY | int(5) | F.K. | Ref. User | 1
Table Name: Save_Search_Detail
Description: This table is used to store the documents belonging to each saved search.
SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | SSM_ID | int(5) | F.K. | Ref. Save_Search_Master | 1
2 | DOCUMENT_ID | bigint(10) | F.K. | Ref. Document_Detail | 5
-: User Interface Design :-
Hadoop Job Submission
Hadoop Job Completion
HDFS Screen Shot
Hadoop Job Output
Solr Home
Solr Core Layout
-: Test Cases :-
Once code has been generated, program testing begins. The testing process focuses on the logical internals of the software, ensuring that all statements have been tested, and on the functional externals, that is, conducting tests to uncover errors and to ensure that defined input produces actual results that agree with the required results.
Project Name: Document Archival System
Test Case Id: TST_01
Test Priority (Low/Med/High): High
Module Name: Inventory
Test Title: Verify insertion of files from a physical location into the database.
Description: Tests the inventory module.
Test Designed by: Dhaval Patel | Test Designed Date: 25-Mar-15
Test Performed by: Dhaval Patel | Test Execution Date: 26-Mar-15
Pre-Conditions: The user must specify a physical location.
Dependencies: File entries must reside in the database.
Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Monitor the physical location | Physical location is accessible | Physical location is accessible from the database | Pass
2 | Generate a case for the files | Case-wise insertion into the database | Files inserted into the database case-wise | Pass
3 | Maintain the file structure for the given location | File entries stored in the form of a compliance path | Files entered in the form of a compliance path | Pass
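The inventory steps above (monitor a physical location, record each file while keeping its folder structure) can be sketched with the JDK's `java.nio.file.Files.walk`. `InventoryScanner` is a hypothetical name; the sketch only lists files relative to the root, and the case-wise database insertion is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical inventory scan of one physical location.
class InventoryScanner {
    // Returns every regular file's path relative to the root, using '/' separators,
    // so the folder structure can be recorded alongside each file entry.
    static List<String> scan(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // the stream holds open directory handles
            return paths.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Recording relative rather than absolute paths keeps the entries valid if the case folder is later moved, for example into HDFS.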
Project Name: Document Archival System
Test Case Id: TST_02
Test Priority (Low/Med/High): Med
Module Name: Extraction
Test Title: Verify that zip extraction is performed.
Description: Verify that all inventoried files are extracted from their container (embedded) documents.
Test Designed by: Dhaval Patel | Test Designed Date: 25-Mar-15
Test Performed by: Dhaval Patel | Test Execution Date: 26-Mar-15
Pre-Conditions: The user must have a set of records in the inventory table.
Dependencies: The appropriate status must be set on a file before the application can process it.
Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Select process IDs to perform file extraction | Fetch qualified case IDs for a particular process ID | Qualified records are retrieved for the given process ID | Pass
2 | Select case IDs to perform file extraction | Fetch qualified files from the inventory table | Qualified records are retrieved for the given batch ID | Pass
3 | Start application processing | All documents will be extracted from their container files, the status of each document will be updated, and the extracted documents will be physically stored in the destination path | All documents are extracted from their container files, the status of each document is updated, and the extracted documents are physically stored in the destination path | Pass
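For the zip extraction exercised by TST_02, the JDK's `java.util.zip.ZipInputStream` is enough to read every entry of a container. The hypothetical `ZipExtractor` below reads file entries into memory and rejects entry names that would escape the destination folder (absolute paths or `..` segments), a check that matters once entries are written to the destination path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical extractor: returns each file entry of a zip stream keyed by its name.
class ZipExtractor {
    static Map<String, byte[]> extract(InputStream in) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                if (!isSafe(name)) {
                    throw new IOException("Unsafe entry name: " + name);
                }
                // read() returns -1 at the end of the current entry
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                files.put(name, out.toByteArray());
            }
        }
        return files;
    }

    // Reject absolute paths and ".." path segments ("zip slip").
    static boolean isSafe(String name) {
        if (name.startsWith("/") || name.startsWith("\\")) {
            return false;
        }
        for (String part : name.split("[/\\\\]")) {
            if (part.equals("..")) {
                return false;
            }
        }
        return true;
    }
}
```

Nested containers (a zip inside a zip, or attachments inside a PST) would be handled by feeding each extracted entry back into the same extraction step.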
Project Name: Document Archival System
Test Case Id: TST_03
Test Priority (Low/Med/High): High
Module Name: Metadata
Test Title: Verify that proper metadata is extracted from the document.
Description: Tests the metadata extraction module.
Test Designed by: Dhaval Patel | Test Designed Date: 27-Mar-15
Test Performed by: Dhaval Patel | Test Execution Date: 28-Mar-15
Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Select process IDs to perform file extraction | Fetch qualified case IDs for a particular process ID | Qualified records are retrieved for the given process ID | Pass
2 | Select case IDs to perform file extraction | Fetch qualified documents from the document table | Qualified records are retrieved for the given case ID | Pass
3 | Start application processing | Metadata of all documents will be extracted from their container documents, the status of each document will be updated, and the metadata will be inserted into the metadata table | Metadata of all documents is extracted from their container documents, the status of each document is updated, and the metadata is inserted into the metadata table | Pass
Project Name: Document Archival System
Test Case Id: TST_04
Test Priority (Low/Med/High): High
Module Name: Indexing
Test Title: Verify that a proper index is created on the Solr server.
Description: Tests the indexer module.
Test Designed by: Dhaval Patel | Test Designed Date: 27-Mar-15
Test Performed by: Dhaval Patel | Test Execution Date: 28-Mar-15
Pre-Conditions: Indexing is only possible on text data.
Dependencies: A text file has been generated for every type of physical file.
Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Select process IDs to perform file extraction | Fetch qualified case IDs for a particular process ID | Qualified records are retrieved for the given process ID | Pass
2 | Select case IDs to perform file extraction | Fetch qualified documents from the metadata and document tables | Qualified records are retrieved for the given case ID | Pass
3 | Start application processing | All metadata and document content will be added to the Solr server, and the indexed status will be updated in the document table | All metadata and document content are added to the Solr server, and the indexed status is updated in the document table | Pass
-: Proposed Enhancement :-
The application is not yet fully developed; changes are made day by day, and new functionality will be added in the future.
• Make the UI more professional.
• Apply clustering to search.
• Make images, video and audio indexable.
-: Conclusion :-
Throughout the process, I gained the experience of working in a large organization, and it was a great learning experience. I had the privilege of going through the entire software development lifecycle, starting from the requirement gathering phase. Working with a globally renowned company was a great opportunity to learn its standards and application areas.
Thank You