
DOCUMENT ARCHIVAL SYSTEM

-: INDEX :-

• Project Description
• Company Profile
• Existing System
• Need For The New System
• Objectives Of The New System
• Problem Definition
• Core Components
• Project Profile
• Advantages and Limitations of the Proposed System
• Proposed Timeline Chart
• Requirement Determination
• Feasibility Analysis
• Use Case Diagram
• Class Diagram
• Interaction Diagram
• Activity Diagram
• Data Dictionary
• User Interface Design
• Report Design
• Proposed Enhancement
• Conclusion
• Bibliography

-: Project Description :-

• Document Archival System is a web-based application with a distributed architecture.

• It is intended mainly for organizations such as law firms, government agencies and corporations.

• With data growing so rapidly, and unstructured data accounting for 90% of the data today, the time has come for enterprises to re-evaluate their approach to data storage, management and analytics.

• Big Data takes a long time to process when it runs on a single machine.

• Our application uses the MapReduce framework, a batch-based, distributed computing framework that allows work to run in parallel over a large amount of data.

• Our project uses SolrCloud for indexing and searching, for fast retrieval of data.
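The MapReduce point above (mapping work out in parallel, then combining partial results) can be illustrated on a single machine. The sketch below is not Hadoop's API, where a real job would subclass Hadoop's Mapper and Reducer classes; it is a minimal stand-in using Java parallel streams, and the class name MapReduceSketch is hypothetical. It counts how often each term appears across a set of documents.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical single-machine illustration of map -> shuffle -> reduce (not the Hadoop API). */
final class MapReduceSketch {

    private MapReduceSketch() {}

    /** Map phase: turn one document into lower-case terms; each term stands for a (term, 1) pair. */
    static List<String> map(String document) {
        return Arrays.stream(document.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(term -> !term.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * Shuffle and reduce: documents are mapped in parallel, identical terms are grouped,
     * and each group's pairs are summed into a single count.
     */
    static Map<String, Long> termCounts(List<String> documents) {
        return documents.parallelStream()
                .flatMap(doc -> map(doc).stream())
                .collect(Collectors.groupingByConcurrent(term -> term, Collectors.counting()));
    }
}
```

For example, termCounts(List.of("Contract signed by Capital Novus", "Contract draft")) maps "contract" to 2. Hadoop applies the same split, group and sum steps, but spreads them across cluster nodes and reads its input from HDFS.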

Company name: Capital Novus

Address: A-501, Mind space, Appl-IT/ITeS SEZ, K. Raheja Road, Koba, Gandhinagar-382009

Contact no.: 079 65721500

Company work: Provides solutions to various law firms, corporations, etc.

-: Company Profile :-


Capital Novus has provided high-quality technology services to the legal community since 2002.

Our clients are law firms, corporations and government agencies involved in complex litigation, regulatory matters and investigations.

We assist them with efficient, cost-effective solutions to the challenge of managing electronically stored information.

-: Existing System :-

The existing system at the law firms is not computerized but manual, which makes case study and analysis time-consuming, tedious and expensive.

The company has to maintain many registers and files in order to store information. Because the number of cases grew rapidly, the existing system was not able to meet the rising requirements, which led to computerization.

-: Need For The New System :-

Computers are now becoming an important part of every activity in every organization, as they are fast and accurate. Today every organization requires a system that is accurate, secure and affordable.

• FASTER PROCESSING OF TRANSACTIONS
• LESS STORAGE SPACE
• SECURITY
• REDUCTION IN EXPENSES

-: Objective of the New System :-

FASTER PROCESSING OF TRANSACTIONS:
The system must fit into the existing environment and should be user-friendly. It speeds up the processing of all transactions.

LESS STORAGE SPACE:
The new system will store much more information than the current one in far less space.

SECURITY:
The new system provides more security options than the current system by means of different types of accounts and passwords.

REDUCTION IN EXPENSES:
The new system is a one-time investment and requires much less maintenance than the existing system. Its recurring cost will be reduced every year.

-: Problem Definition :-

In the e-discovery field, every corporation today holds a large amount of data (social media, emails, loose documents and hard copies), amounting to terabytes of data and millions of documents.

Processing, maintaining and archiving this data, and efficiently retrieving the required documents from millions of them, is a challenging job.

DAS aims to give users the ability to manage, retrieve and archive large numbers of documents in a structured manner. Using this application, users can easily search the entire document repository.

-: Core Components :-

DAS
Document Archival System serves users with the ability to manage, retrieve and archive large numbers of documents in a structured manner.

ADMIN
The administrator has the authority to add and delete users and to grant users permission to index and search documents.

STRUTS 2
Apache Struts 2 is an open-source web application framework for developing Java EE web applications. It uses and extends the Java Servlet API to encourage developers to adopt a model–view–controller (MVC) architecture.

SPRING 4.0
The Spring Framework is an open-source application framework and inversion of control container for the Java platform. The framework's core features can be used by any Java application, but there are extensions for building web applications on top of the Java EE platform.

HIBERNATE 4.0
Hibernate ORM (Hibernate for short) is an object-relational mapping library for the Java language, providing a framework for mapping an object-oriented domain model to a traditional relational database. It is free software distributed under the GNU Lesser General Public License.

APACHE HADOOP 2.2.0
Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets (Big Data) on computer clusters built from commodity hardware. The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce.

SOLR/LUCENE 4.10.3
Solr is an open-source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. It provides distributed search and index replication and is highly scalable. Solr is one of the most popular enterprise search engines.
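Lucene, which Solr builds on, keeps an inverted index: each term maps to the documents that contain it, so a query reads only the matching posting lists instead of scanning every document. The toy class below (InvertedIndexSketch, a hypothetical name) shows that idea with AND semantics; it deliberately ignores analyzers, relevance scoring, sharding and replication, all of which Solr provides.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: term -> IDs of the documents containing that term. */
final class InvertedIndexSketch {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Naive tokenization (lower-case, split on non-word characters), then one posting per term. */
    void addDocument(long docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
            }
        }
    }

    /** Returns the IDs of documents containing every query term (AND), sorted ascending. */
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Long> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term seeds the candidate set
            } else {
                result.retainAll(docs);         // later terms narrow it down
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

After adding document 1 ("Copyright claim by Capital Novus") and document 2 ("Copyright license draft"), search("copyright claim") returns [1].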

APACHE TIKA 1.5
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis and translation.

APACHE MAVEN
Maven is built using a plug-in-based architecture that allows it to make use of any application controllable through standard input. Maven dynamically downloads Java libraries and Maven plug-ins from one or more repositories, such as the Maven Central Repository, and stores them in a local cache.

EXT JS 5.1.0
Ext JS is a pure JavaScript application framework for building interactive web applications using techniques such as Ajax, DHTML and DOM scripting.

-: Project Profile :-

Project Title: Document Archival System

Organization: Capital Novus

Front End: IntelliJ IDEA 14, Ext JS 5.1.0, Struts 2

Middleware: J2EE (Spring 4.0, Hibernate 4.0)

Back End: Microsoft SQL Server 2012

Tools & Technology: Solr 4.10.3, Hadoop 2.2.0

Project Duration: 5th January to 4th April

External Project Guide: Mr. Jagdish Vasani (Team Leader)

Internal Project Guide: Prof. Anuradha Mam

-: Advantages and Limitations of the Proposed System :-

Advantages
• Fully automated
• No paper wastage
• Historical data is available on demand
• Role-based security
• Time saving
• Data is managed in a database, so transactions are fast

Limitations
• An Internet connection is required
• A continuous power supply is required
• Intruders may compromise personal data

-: Proposed Timeline Chart :-

January: weeks 2 to 5 (dates 8, 15, 22, 29)
February: weeks 6 to 9 (dates 4, 11, 18, 25)
March: weeks 10 to 13 (dates 5, 12, 19, 26)
April: week 14 (date 2)

Activities:
• Domain Understanding
• Further Analysis
• Learning Process
• Design
• Coding & Testing
• Documentation
• Final Documentation

-: Requirement Determination :-


To determine the requirements, the company first needs all of the documents for the case being processed.

These documents are provided by the law firms that are going to use this product. They include .pst and .nsf files, compressed files, loose documents and office files.

After all of the documents are collected, they are entered into this product, which indexes them and also provides a search facility.
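The intake described above has to route each incoming file to the right processing step. A minimal sketch, assuming routing by file extension only; the category names and extension lists are hypothetical, and a real pipeline would more likely detect the type from file content (for example with Apache Tika).

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical intake categories for material received from a law firm. */
enum SourceType { PST_MAILBOX, NSF_DATABASE, COMPRESSED, OFFICE, LOOSE }

/** Routes a file to an intake category by extension only (an assumption; content sniffing is more robust). */
final class SourceClassifier {
    // Illustrative extension lists, not taken from the DAS code base.
    private static final Set<String> COMPRESSED = Set.of("zip", "rar", "7z", "gz", "tar");
    private static final Set<String> OFFICE = Set.of("doc", "docx", "xls", "xlsx", "ppt", "pptx");

    private SourceClassifier() {}

    static SourceType classify(String path) {
        // Keep only the final path component so dots in folder names are ignored.
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pst":
                return SourceType.PST_MAILBOX;   // Microsoft Outlook data file
            case "nsf":
                return SourceType.NSF_DATABASE;  // Lotus Notes database
            default:
                if (COMPRESSED.contains(ext)) {
                    return SourceType.COMPRESSED;
                }
                return OFFICE.contains(ext) ? SourceType.OFFICE : SourceType.LOOSE;
        }
    }
}
```

Anything not recognized falls through to LOOSE, so an unexpected file is still inventoried and can be converted to text later rather than being dropped.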

-: Feasibility Analysis :-

For the success of this project, we first communicated with our team leader about the law firms' requirements for this project. We then planned around the admin and end-user requirements and studied them in depth.

After that, we modeled a system design that shows the actual workflow of the system. In this modeling process we did a lot of paperwork to draw different kinds of diagrams, such as use case, class, sequence and activity diagrams.

After modeling, we started the actual work of the project, which is construction: developing the system by coding it.

After constructing the system, we tested it, noted where problems occurred, and then solved those problems. We also gave users a chance to use the system to check its usability.

-: System Design :-

-: Process Diagram :-

Process Diagram For Inventory Process

Process Diagram For Extraction

Process Diagram For Meta-data

Front End Process

-: Use Case Diagram :-

Use Case Diagram For Search Document

Use Case Diagram For Background Process

Use Case Diagram For Manage Job

-: Class Diagram :-

-: Sequence Diagram :-

Sequence Diagram For Compressed File Extraction

Sequence Diagram For Embedded File Extraction

Sequence Diagram For Text Conversion

-: Activity Diagram :-

Activity Diagram For Inventory

Activity Diagram For Extraction Process

Activity Diagram For PST Extraction (activities shown: get PST detail from Edocs of email message; retrieve each PST's folder detail; generate all metadata; capture message body and generate locale; extract attachments; store BCC detail; save message; decisions: "Extract All?" and "Save BCC Detail?")

Activity Diagram For Meta-data Process

Activity Diagram For Search Activity

-: Data Dictionary :-

Table Name : User_Role

Description : This Table is Used to Define the Role of a User

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | ROLE_ID | TinyInt | P.K. | User role ID | 1
2 | ROLE_NAME | varchar(15) | Not Null | User role | Role_Admin or Role_User
3 | ENABLED | Bit | Default: 1 | 1: Enabled, 0: Disabled | 1
4 | CREATED_DATE | Datetime | Not Null | Created date | 2015-03-02
5 | CREATED_BY | SmallInt | Not Null | Ref. User | 1
6 | UPDATED_DATE | Datetime | Not Null | Updated date | 2015-03-02
7 | UPDATED_BY | SmallInt | Not Null | Ref. User | 1
8 | DESCRIPTION | varchar(100) | | Description about role | What can user do

Table Name : User

Description : This Table is Used to Store User Details

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | USER_ID | Smallint | P.K. | User ID | 1
2 | ROLE_ID | Tinyint | Not Null | Ref. User_Role | 2
3 | COMPANY_ID | Smallint | F.K. | Ref. Company_Detail | 1
4 | USER_NAME | varchar(30) | Not Null, Unique | User name | CapitalNovus
5 | USER_PASS | varchar(30) | Not Null | User password | ******
6 | ENABLED | Bit | Default: 1 | 1: Enabled, 0: Disabled | 1
8 | CREATED_BY | SmallInt | | Created by | 1
9 | UPDATED_DATE | Datetime | Not Null | Updated date | 2015-03-02
10 | UPDATED_BY | SmallInt | | Updated by | 2
11 | DESCRIPTION | varchar(100) | | Description about user | User for view-only analysis
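The ADMIN component is described as able to add and delete users and grant indexing and search permissions, and these tables store a role (Role_Admin or Role_User) and an ENABLED flag per user. A minimal role-based check along those lines is sketched below; the Permission values, the default grants per role and the class names are assumptions for illustration, not the DAS implementation.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical permissions derived from the ADMIN description. */
enum Permission { MANAGE_USERS, GRANT_PERMISSIONS, INDEX_DOCUMENTS, SEARCH_DOCUMENTS }

/** Roles named after the ROLE_NAME sample values; the default grants are assumptions. */
enum Role {
    ROLE_ADMIN(EnumSet.allOf(Permission.class)),
    ROLE_USER(EnumSet.noneOf(Permission.class)); // ordinary users start with no permissions

    private final EnumSet<Permission> defaults;

    Role(EnumSet<Permission> defaults) {
        this.defaults = defaults;
    }

    /** Returns a fresh, mutable copy so per-user grants never alter the role itself. */
    EnumSet<Permission> defaults() {
        return EnumSet.copyOf(defaults);
    }
}

/** Minimal account mirroring USER_NAME and ENABLED; the role only seeds the initial permissions. */
final class UserAccount {
    private final String userName;
    private final Set<Permission> granted;
    private boolean enabled = true; // ENABLED defaults to 1

    UserAccount(String userName, Role role) {
        this.userName = userName;
        this.granted = role.defaults();
    }

    /** A disabled account can do nothing, whatever it has been granted. */
    boolean can(Permission permission) {
        return enabled && granted.contains(permission);
    }

    /** Only an account holding GRANT_PERMISSIONS may extend another account's permissions. */
    void grant(UserAccount target, Permission permission) {
        if (!can(Permission.GRANT_PERMISSIONS)) {
            throw new SecurityException(userName + " is not allowed to grant permissions");
        }
        target.granted.add(permission);
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }
}
```

Checking the ENABLED flag inside can() means that disabling an account revokes everything at once without deleting its grants.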

Table Name : Company_Detail

Description : This Table is Used to Store Details of a Company

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | COMPANY_ID | Smallint | P.K. | Company ID | 1
2 | COMPANY_NAME | varchar(50) | Not Null | Company name | Capital Novus
3 | EMAIL_ID | varchar(50) | Not Null | Email ID | [email protected]
4 | PHONE_NO | varchar(12) | Not Null | Contact no. | 1234567890
6 | CREATED_BY | Smallint | F.K. | Ref. User | 1
7 | UPDATED_DATE | Datetime | Not Null | Updated date | 2015-03-02
8 | UPDATED_BY | Smallint | F.K. | Ref. User | 2
9 | DESCRIPTION | varchar(150) | | Description about company | Capital Novus


Table Name : Configuration

Description : This Table is Used to Store Details of Different Configurations

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | CONFIG_ID | Smallint | P.K. | Configuration ID | 1
2 | COMPANY_ID | Smallint | F.K. | Ref. Company_Detail | 1
3 | SOLR_CORE_PATH | varchar(200) | Not Null | Path of Solr core | http://localhost:8083/solr/core5
4 | HADOOP_CLUSTER_PATH | varchar(200) | Not Null | Path of cluster | hdfs://10.1.12.108:9000
5 | DB_URL | varchar(150) | Not Null | Database URL | jdbc:sqlserver://TRAINEEDEV02\\SQLEXPRESS;databaseName=TEMPDAS
6 | DB_USER_NAME | varchar(50) | Not Null | Database username | Capital
7 | DB_PASSWORD | varchar(50) | Not Null | Database password | *****

Table Name : Case_Detail

Description : This Table is Used to Store Details of a Case

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | CASE_ID | int(5) | P.K. | Case ID | 1
2 | COMPANY_ID | smallint | Not Null | Company ID | 1
3 | CASE_NAME | varchar(50) | Not Null | Case name | Copyright
4 | CASE_PATH | varchar(100) | Not Null | Path of case | /CN/Case 1
5 | CREATED_BY | int(5) | F.K. | Ref. User | 1

Table Name : Document_Detail

Description : This Table is Used to Store Details of Documents

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | DOC_ID | bigint(10) | P.K. | Document ID | 1
2 | CASE_ID | int(5) | F.K. | Ref. Case_Detail | 1
3 | DOC_NAME | varchar(100) | Not Null | Document name | Hello.txt
4 | DOC_PATH | varchar(200) | Not Null | Original path | /user/doc1.txt
5 | TXT_DOC_PATH | varchar(200) | Not Null | Text file path | D:/java source/00000001/00000001_00000.txt
6 | INDEXED | tinyint(1) | Not Null | 0: not indexed, 1: indexed, -1: failed | 1

Table Name : Metadata_Field_Detail

Description : This Table is Used to Store Details of Document Metadata Fields

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | METADATA_FIELD_ID | int(5) | P.K. | Metadata field ID | 1
2 | FIELD_NAME | varchar(50) | Not Null | Metadata field | Author

Table Name : Metadata_Value_Detail

Description : This Table is Used to Store Details of Document Metadata Values

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | METADATA_FIELD_ID | int(5) | F.K. | Ref. Metadata_Field_Detail | 1
2 | DOC_ID | bigint(10) | F.K. | Ref. Document_Detail | 1
3 | FIELD_VALUE | varchar(100) | Not Null | Metadata value | Capital Novus
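Metadata_Field_Detail and Metadata_Value_Detail together form an entity-attribute-value layout: each metadata field name is stored once, and each (document, field) pair holds one value. The in-memory sketch below mirrors that layout; MetadataStore and its methods are hypothetical names, and persistence (through Hibernate in the real system) is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** In-memory mirror of Metadata_Field_Detail and Metadata_Value_Detail (hypothetical helper). */
final class MetadataStore {
    // FIELD_NAME -> METADATA_FIELD_ID (Metadata_Field_Detail)
    private final Map<String, Integer> fieldIds = new HashMap<>();
    // DOC_ID -> (METADATA_FIELD_ID -> FIELD_VALUE) (Metadata_Value_Detail)
    private final Map<Long, Map<Integer, String>> values = new HashMap<>();

    /** Returns the ID of a field name, assigning the next number on first use (like an auto-increment key). */
    int fieldId(String fieldName) {
        Integer id = fieldIds.get(fieldName);
        if (id == null) {
            id = fieldIds.size() + 1;
            fieldIds.put(fieldName, id);
        }
        return id;
    }

    /** Stores one metadata value for a document, replacing any earlier value of the same field. */
    void put(long docId, String fieldName, String value) {
        int id = fieldId(fieldName);
        values.computeIfAbsent(docId, d -> new HashMap<>()).put(id, value);
    }

    /** Reassembles a document's metadata as field name -> value, sorted by field name. */
    Map<String, String> metadataOf(long docId) {
        Map<Integer, String> byId = values.getOrDefault(docId, Map.of());
        Map<String, String> result = new TreeMap<>();
        fieldIds.forEach((name, id) -> {
            String value = byId.get(id);
            if (value != null) {
                result.put(name, value);
            }
        });
        return result;
    }
}
```

The benefit of this layout is that a new metadata field (for example one that Tika reports for an unfamiliar file type) needs a new row, not a schema change.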

Table Name : Job_Detail

Description : This Table is Used to Store Details of Job Scheduling

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | JOB_ID | int(5) | P.K. | Job ID | 1
2 | CASE_ID | int(5) | F.K. | Ref. Case_Detail | 1
3 | JOB_STATUS | char(3) | Default: P | P = pending, R = ready, Ru = running, F = failed, C = completed | R (ready)
4 | JOB_SUBMIT_TIME | datetime | Not Null | Job submission time | 2015-03-02 12:11:01
5 | JOB_FINISHED_TIME | datetime | Not Null | Job completion time | 2015-03-02 22:11:01
6 | JOB_SUBMITED_BY | int(5) | Not Null | Ref. User | 1
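JOB_STATUS holds a short code (P, R, Ru, F or C) in a char(3) column whose default is P. A sketch of mapping those codes to a Java enum follows; the enum and method names are hypothetical, while the codes and the default come from the table above. Values read from a char(3) column may come back space-padded, so they are trimmed before matching.

```java
/** Job states stored in Job_Detail.JOB_STATUS (codes taken from the data dictionary). */
enum JobStatus {
    PENDING("P"), READY("R"), RUNNING("Ru"), FAILED("F"), COMPLETED("C");

    /** Column default for JOB_STATUS. */
    static final JobStatus DEFAULT = PENDING;

    private final String code;

    JobStatus(String code) {
        this.code = code;
    }

    /** Code to write into the char(3) column. */
    String code() {
        return code;
    }

    /** Parses a stored code, ignoring padding and letter case; unknown codes are rejected. */
    static JobStatus fromCode(String raw) {
        String trimmed = raw.trim();
        for (JobStatus status : values()) {
            if (status.code.equalsIgnoreCase(trimmed)) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown JOB_STATUS code: '" + raw + "'");
    }
}
```

Rejecting unknown codes, instead of silently mapping them to a default, makes a corrupted row visible as soon as it is read.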

Table Name : Save_Search_Master

Description : This Table is Used to Store Details of a User's Saved Searches

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | SSM_ID | int(5) | P.K. | Saved search master ID | 1
2 | COMPANY_ID | smallint | F.K. | Ref. Company_Detail | 1
3 | FOLDER_NAME | varchar(30) | Not Null | Folder name | SAVE1
4 | SEARCH_QUERY | varchar(500) | Not Null | User search query | Hi !, How r u?
5 | NUM_OF_DOCUMENT | int(8) | Not Null | Total stored documents | 501
6 | SAVED_DATE | datetime | Not Null | Saved date | 2015-03-02 12:11:01
7 | SAVED_BY | int(5) | F.K. | Ref. User | 1

Table Name : Save_Search_Detail

Description : This Table is Used to Store Details of Saved Search Documents

SR. NO. | FIELD NAME | DATA TYPE | CONSTRAINT | DESCRIPTION | SAMPLE DATA
1 | SSM_ID | int(5) | F.K. | Ref. Save_Search_Master | 1
2 | DOCUMENT_ID | bigint(10) | F.K. | Ref. Document_Detail | 5

-: User Interface Design :-

Hadoop Job Submission

Hadoop Job Completion

HDFS Screenshot

Hadoop Job Output

Solr Home

Solr Core Layout

-: Test Cases :-

Once code has been generated, program testing begins. The testing process focuses on the logical internals of the software, ensuring that all statements have been tested, and on the functional externals, that is, conducting tests to uncover errors and ensure that defined input will produce actual results that agree with the required results.

Project Name: Document Archival System
Test Case ID: TST_01
Module Name: Inventory
Test Title: Verify insertion of files from a physical location into the database.
Test Priority (Low/Med/High): High
Test Designed By: Dhaval Patel
Test Designed Date: 25-Mar-15
Test Performed By: Dhaval Patel
Test Execution Date: 26-Mar-15
Description: Test the inventory module.
Pre-Conditions: The user must specify a physical location.
Dependencies: File entries must reside in the database.

Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Monitor physical location | Physical location is accessible | Physical location is accessible from the database | Pass
2 | Generate case for file | Case-wise insertion into the database | Case-wise files inserted in the database | Pass
3 | File structure maintenance for given location | File entry in the form of a compliance path | File entered in the form of a compliance path | Pass
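The inventory test walks a physical location and records each file against a case. A minimal sketch with java.nio.file, assuming a hypothetical FileEntry row (case ID, path relative to the case folder, size in bytes); how DAS actually forms its compliance path is not described here, so the relative path only stands in for it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical inventory step: one entry per physical file under a case folder. */
final class InventorySketch {

    /** Illustrative inventory row; the field set is an assumption, not the DAS schema. */
    static final class FileEntry {
        final int caseId;
        final String relativePath; // path under the case root, always with '/' separators
        final long sizeBytes;

        FileEntry(int caseId, String relativePath, long sizeBytes) {
            this.caseId = caseId;
            this.relativePath = relativePath;
            this.sizeBytes = sizeBytes;
        }
    }

    private InventorySketch() {}

    /** Walks caseRoot recursively and lists every regular file, sorted by relative path. */
    static List<FileEntry> inventory(int caseId, Path caseRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(caseRoot)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> new FileEntry(caseId,
                            caseRoot.relativize(p).toString().replace('\\', '/'),
                            sizeOf(p)))
                    .sorted(Comparator.comparing((FileEntry e) -> e.relativePath))
                    .collect(Collectors.toList());
        }
    }

    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // lambdas in streams cannot throw checked exceptions
        }
    }
}
```

Storing paths relative to the case root keeps inventory rows valid even if the whole case folder is later moved to another drive or into HDFS.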

Project Name: Document Archival System
Test Case ID: TST_02
Module Name: Extraction
Test Title: Verify that the user has performed zip extraction.
Test Priority (Low/Med/High): Med
Test Designed By: Dhaval Patel
Test Designed Date: 25-Mar-15
Test Performed By: Dhaval Patel
Test Execution Date: 26-Mar-15
Description: Test that all inventory files have been extracted from the embedded documents.
Pre-Conditions: The user must have a set of records in the inventory table.
Dependencies: The respective status must be set for a file before the application can operate on it.

Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Select process IDs to perform file extraction | Fetch qualified case IDs for the particular process ID | Qualified records are retrieved for the given process ID | Pass
2 | Select case IDs to perform file extraction | Fetch qualified files from the inventory table | Qualified records are retrieved for the given batch ID | Pass
3 | Start application processing | All documents will be extracted from the respective container files, the status will be updated for each document, and the extracted documents will be physically stored in the destination path | All documents are extracted from the respective files, the status is updated for each document, and the extracted documents are physically stored in the destination path | Pass
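The extraction test expects every file inside a container to be unpacked into a destination path. For ZIP containers, a minimal sketch using java.util.zip follows; ZipExtractorSketch is a hypothetical name, and PST and other container formats need dedicated libraries that are not covered here. Entries whose names would resolve outside the destination folder (the "zip slip" problem) are rejected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical ZIP extractor for the extraction step. */
final class ZipExtractorSketch {

    private ZipExtractorSketch() {}

    /** Extracts every file entry under destDir and returns the written paths in archive order. */
    static List<Path> extract(InputStream zipData, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        List<Path> written = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) { // blocks "../" and absolute entry names
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Reads only until the end of the current entry; zin itself stays open.
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                    written.add(target);
                }
                zin.closeEntry();
            }
        }
        return written;
    }
}
```

Returning the written paths gives the caller what it needs to record each extracted document and update its status, as step 3 of the test expects.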

Project Name: Document Archival System
Test Case ID: TST_03
Module Name: Metadata
Test Title: Verify that the proper metadata is extracted from the document.
Test Priority (Low/Med/High): High
Test Designed By: Dhaval Patel
Test Designed Date: 27-Mar-15
Test Performed By: Dhaval Patel
Test Execution Date: 28-Mar-15
Description: Test the metadata extraction module.

Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Select process IDs to perform file extraction | Fetch qualified case IDs for the particular process ID | | Pass
2 | | Fetch qualified documents from the document table | Qualified records are retrieved for the given case ID | Pass
3 | Start application processing | The metadata of all documents will be extracted from the respective container documents, the status will be updated for each document, and the metadata will be inserted into the metadata table | The metadata of all documents is extracted from the respective container documents, the status is updated for each document, and the metadata is inserted into the metadata table | Pass

Project Name: Document Archival System
Test Case ID: TST_04
Module Name: Indexing
Test Title: Verify that a proper index is created on the Solr server.
Test Priority (Low/Med/High): High
Test Designed By: Dhaval Patel
Test Designed Date: 27-Mar-15
Test Performed By: Dhaval Patel
Test Execution Date: 28-Mar-15
Description: Test the indexer module.
Pre-Conditions: Indexing is only possible on text data.
Dependencies: A text file has been generated for every type of physical file.

Step | Test Steps | Expected Result | Actual Result | Status (Pass/Fail)
1 | Select process IDs to perform file extraction | Fetch qualified case IDs for the particular process ID | | Pass
2 | | Fetch qualified documents from the metadata and document tables | Qualified records are retrieved for the given case ID | Pass
3 | Start application processing | All metadata and document content will be added to the Solr server, and the indexed status will be updated in the document table | All metadata and document content is added to the Solr server, and the indexed status is updated in the document table | Pass
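The indexing test expects qualified documents to be sent to Solr and the INDEXED flag in Document_Detail (0: not indexed, 1: indexed, -1: failed) to be updated. The bookkeeping loop is sketched below with the Solr call hidden behind a hypothetical Indexer interface; a real implementation would call a Solr client at that point, which is not shown, and the class and field names are illustrative.

```java
import java.util.List;

/** Hypothetical indexing pass that maintains Document_Detail.INDEXED. */
final class IndexingPassSketch {
    static final int NOT_INDEXED = 0;
    static final int INDEXED = 1;
    static final int FAILED = -1;

    /** Minimal stand-in for a Document_Detail row plus its extracted text. */
    static final class Doc {
        final long docId;
        final String text;
        int indexed = NOT_INDEXED;

        Doc(long docId, String text) {
            this.docId = docId;
            this.text = text;
        }
    }

    /** Hook for sending one document to the search server; returns false on failure. */
    interface Indexer {
        boolean index(Doc doc);
    }

    private IndexingPassSketch() {}

    /** Indexes every NOT_INDEXED document, records 1 or -1, and returns how many succeeded. */
    static int run(List<Doc> docs, Indexer indexer) {
        int succeeded = 0;
        for (Doc doc : docs) {
            if (doc.indexed != NOT_INDEXED) {
                continue; // already indexed, or failed in an earlier pass
            }
            boolean ok;
            try {
                ok = indexer.index(doc);
            } catch (RuntimeException e) {
                ok = false; // one bad document must not stop the whole batch
            }
            doc.indexed = ok ? INDEXED : FAILED;
            if (ok) {
                succeeded++;
            }
        }
        return succeeded;
    }
}
```

Documents already marked 1 or -1 are skipped, so a failed document is not retried automatically; whether DAS retries failures is not stated, so that policy is an assumption of this sketch.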

-: Proposed Enhancement :-

The application is not yet fully developed; changes are being made day by day, and new functionality will be added in the future.

• Make the UI more professional.
• Apply clustering in searching.
• Make images, video and audio indexable.

-: Conclusion :-

Throughout the process, I gained the experience of working in a large organization, and it was a great learning experience. I had the privilege of going through the entire software development life cycle, starting from the requirement-gathering phase. Working with a globally renowned company also let me learn its standards and application areas.


Thank You
