Document Management at Google Scale

From Students… …to Professionals
The Capstone Experience

Project Plan
Document Management at Google Scale

Team Technology Services Group
Ali Alaali, Joe Wan, Justin Newman, Luke Kline, Rohit Sen

Department of Computer Science and Engineering
Michigan State University
Fall 2019


Functional Specifications

• Data volumes are growing rapidly, so companies need a reliable data management solution.
• TSG provides a solution to this problem:
  ▪ OpenContent Management Suite (OCMS)
  ▪ High-speed search results
  ▪ Scalable platform
• Our project goal:
  ▪ Research how TSG can utilize Google Cloud Platform (GCP)
  ▪ Surpass the AWS solution’s throughput of 20,000 documents/s
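The 20,000 documents/s target can be made concrete with a small benchmarking helper. The following is a minimal Python sketch, not the team's actual harness: the function names and the `ingest` callable are hypothetical, and only the baseline figure comes from the slide.

```python
import time

# Baseline quoted on the slide for the AWS solution (documents per second).
AWS_BASELINE_DOCS_PER_SEC = 20_000


def docs_per_second(doc_count, elapsed_seconds):
    """Return throughput in documents per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return doc_count / elapsed_seconds


def surpasses_baseline(doc_count, elapsed_seconds,
                       baseline=AWS_BASELINE_DOCS_PER_SEC):
    """Return True only if the measured throughput is strictly above the baseline."""
    return docs_per_second(doc_count, elapsed_seconds) > baseline


def time_ingestion(ingest, documents):
    """Run a hypothetical `ingest(document)` callable over `documents`.

    Returns (document count, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    count = 0
    for document in documents:
        ingest(document)
        count += 1
    return count, time.perf_counter() - start
```

Under these definitions, 1,500,000 documents ingested in 60 seconds is 25,000 documents/s and clears the baseline, while 1,200,000 documents in 60 seconds only matches it.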

The Capstone Experience Team Technology Services Group Project Plan Presentation


Design Specifications

• Integrate the existing OCMS features so that they can communicate with GCP:
  ▪ Document searching (OpenContent Search)
  ▪ Document annotation (OpenAnnotate)
• Create a simple and viable UI for:
  ▪ Speech API
  ▪ Vision API


Screen Mockup: Speech to Text Button


Screen Mockup: Speech to Text UI


Screen Mockup: Image Search Box


Screen Mockup: Image Search Results


Technical Specifications

• Storage solutions
  ▪ Google Cloud BigTable
    o NoSQL database
  ▪ Google Cloud Storage
    o Online file storage
• GCP’s APIs for enhanced searching and functionality
  ▪ Natural Language API
    o Classify documents
  ▪ Vision API
    o Classify images
  ▪ Speech API
    o Transcribe audio files
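The pairing of content types with APIs listed above can be expressed as a simple routing table. The sketch below is illustrative only and assumes files are routed by their guessed MIME type; `choose_api` and `API_BY_MEDIA_TYPE` are hypothetical names, and the function returns the API's name without calling any GCP service.

```python
import mimetypes

# Mapping taken from the slide: which GCP API handles which kind of content.
API_BY_MEDIA_TYPE = {
    "text": "Natural Language API",  # classify documents
    "image": "Vision API",           # classify images
    "audio": "Speech API",           # transcribe audio files
}


def choose_api(filename):
    """Return the name of the API suggested for `filename`.

    Returns None when the file type cannot be guessed or is not one of
    the three content types named on the slide.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        return None
    media_type = mime_type.split("/", 1)[0]  # "image/png" -> "image"
    return API_BY_MEDIA_TYPE.get(media_type)
```

Routing on the top-level media type keeps the table small; a real integration would likely also need a rule for formats such as PDF, which this sketch deliberately leaves unrouted.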


System Architecture


System Components

• Frontend
  ▪ JavaScript
  ▪ jQuery
  ▪ Bootstrap (JS and CSS)
  ▪ HTML
• Backend
  ▪ Java
  ▪ Apache Tomcat
  ▪ Apache Solr
  ▪ Google Cloud Platform


Risks

• Scalability: small testing sample size
  ▪ Description: Testing on a small sample may produce inaccurate quality assurance results
  ▪ Mitigation: Actively request a proper, larger dataset from the client, or create dummy data for benchmarking
• Efficient Google BigTable schema
  ▪ Description: An optimized schema is essential to achieving high performance from GCP’s BigTable
  ▪ Mitigation: Continue researching Google’s BigTable documentation, practice designing schemas, and test them on our own instances
• Processing overhead of GCP’s Vision AI
  ▪ Description: Vision AI processing overhead would decrease the document ingestion rate into GCP
  ▪ Mitigation: Process documents with Vision AI at night or during off-peak hours
• Limited GCP resources
  ▪ Description: The GCP development instance that TSG provides runs only during business hours
  ▪ Mitigation: Set up our own GCP instance so that we can test when the client’s instance is not running
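The schema risk above largely comes down to row-key design, because Bigtable keeps rows sorted lexicographically by row key and that key is the table's only index. The following is a minimal sketch of one candidate layout, not TSG's actual schema: the field names, the `#` delimiter, and the reversed-timestamp suffix are all assumptions made for illustration.

```python
# Hypothetical row-key layout: <repository>#<document id>#<reversed timestamp>
# 13 digits, so every reversed timestamp is zero-padded to the same width.
MAX_TIMESTAMP_MS = 10**13 - 1


def row_key(repository, document_id, timestamp_ms):
    """Build a row key that keeps one document's entries adjacent and
    sorts the newest entry first under lexicographic ordering."""
    if "#" in repository or "#" in document_id:
        raise ValueError("'#' is reserved as the field delimiter")
    if not 0 <= timestamp_ms <= MAX_TIMESTAMP_MS:
        raise ValueError("timestamp_ms is out of range")
    reversed_ts = MAX_TIMESTAMP_MS - timestamp_ms
    return f"{repository}#{document_id}#{reversed_ts:013d}"


# Sorting plain ASCII strings in Python mirrors the byte-wise order
# in which such keys would be stored.
keys = sorted([
    row_key("hr", "doc-2", 1_000),
    row_key("finance", "doc-1", 2_000),
    row_key("hr", "doc-2", 3_000),
])
```

Leading with the repository and document ID instead of a timestamp keeps consecutive writes from piling onto the same end of the key space, which is the usual reason timestamp-first or sequential keys are discouraged. Whether this particular layout suits OCMS would still need to be tested against real access patterns.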


Questions?
