Post on 22-May-2020

DOCUMENT DIGITIZATION

Rethinking it with Machine Learning

Nischal Harohalli Padmanabha QConAI SFO 2019

@nischalhp | Document Digitization | QconAI SFO 2019

“The brain sure as hell doesn’t work by somebody programming in rules.”

- Geoffrey Hinton

PROBLEM

Understanding unstructured documents and extracting semantic information to automate claims handling.

DOCUMENT CLASS

Policy

POLICY NUMBER

H 54/16 307 728

CUSTOMER

Renolate GmbH, 10115 Berlin

AGENT

pma Insurance Broker, 48149 Nurnberg

RISK DESCRIPTION / INSURED LOCATION

Private liability insurance comfort plus Dog liability
Environmental damage insurance
Employees on premises

POLICY

Liability Protection

EFFECTIVE DATE OF CHANGE

22.12.2016 12:00

TERMINATION

22.12.2019 12:00

ANNUAL CHARGE

EUR 424,63

COVERAGES

Persons & property damage flat: EUR 3.000.000
Financial losses: EUR 100.000
Environmental damage basic flat: EUR 3.000.000
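The fields on the example policy can be carried as a typed record once extracted. The following is a minimal sketch, not the speaker's schema: the `Policy` class, its field names and the two parsing helpers are assumptions made for illustration, and the pairing of coverage names with amounts follows the order in which they appear on the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


def parse_eur(text: str) -> Decimal:
    """Parse a German-formatted amount such as 'EUR 3.000.000' or 'EUR 424,63'."""
    number = text.replace("EUR", "").strip()
    # '.' is the thousands separator and ',' the decimal separator.
    return Decimal(number.replace(".", "").replace(",", "."))


def parse_timestamp(text: str) -> datetime:
    """Parse 'DD.MM.YYYY HH:MM' as printed on the policy."""
    return datetime.strptime(text, "%d.%m.%Y %H:%M")


@dataclass
class Policy:
    # Hypothetical field names, chosen to mirror the labels on the slide.
    policy_number: str
    effective: datetime
    termination: datetime
    annual_charge: Decimal
    coverages: dict[str, Decimal]


policy = Policy(
    policy_number="H 54/16 307 728",
    effective=parse_timestamp("22.12.2016 12:00"),
    termination=parse_timestamp("22.12.2019 12:00"),
    annual_charge=parse_eur("EUR 424,63"),
    coverages={
        "Persons & property damage flat": parse_eur("EUR 3.000.000"),
        "Financial losses": parse_eur("EUR 100.000"),
        "Environmental damage basic flat": parse_eur("EUR 3.000.000"),
    },
)
print(policy.annual_charge, policy.termination.date())
```

Keeping amounts as `Decimal` rather than `float` avoids rounding surprises when values are later compared against a claims system.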

REWIND

TABULAR INFORMATION EXTRACTION

Writing a lot of rules

COURSE OF ACTION - ROUND 1

Initial results gave us a lot of happiness. Evaluation was on known data.

RESULT

In production 58% accuracy

We failed, miserably.

Rules became cumbersome & brittle.
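To make the brittleness concrete, here is a small sketch of the kind of hand-written rule that does well on known data and breaks on unseen documents. The regular expression and the two layout variations are invented for illustration; they are not taken from the talk.

```python
import re
from typing import Optional

# Hypothetical rule: the policy number follows the label on the same line
# and always has the shape "H 54/16 307 728".
POLICY_NUMBER_RULE = re.compile(r"POLICY NUMBER[:\s]+([A-Z] \d{2}/\d{2} \d{3} \d{3})")


def extract_policy_number(text: str) -> Optional[str]:
    """Return the policy number if the rule matches, otherwise None."""
    match = POLICY_NUMBER_RULE.search(text)
    return match.group(1) if match else None


known_layout = "POLICY NUMBER: H 54/16 307 728"
# Invented variations of the sort a new insurer or template might introduce.
different_label = "Policy No. H 54/16 307 728"
different_format = "POLICY NUMBER: H-54/16-307728"

for sample in (known_layout, different_label, different_format):
    print(sample, "->", extract_policy_number(sample))
```

Each new variation needs another rule or another special case, which is how a rule set grows cumbersome.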

Life or death situation for the project (and us engineers)

ADAPTIVE LEARNING THOUGHT PROCESS

How does a human solve the same problem?

Identifies groupings of text to build context, e.g. tables, paragraphs, passages

Given the context, applies domain knowledge and semantic understanding of the text
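The first step a human takes, grouping text to build context, can be approximated geometrically. The sketch below groups OCR words into lines by their vertical position; the `Word` type, the coordinates and the tolerance value are assumptions, and a real system would go on to group lines into tables and paragraphs.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word box
    y: float  # top edge of the word box


def group_into_lines(words: list[Word], y_tolerance: float = 5.0) -> list[list[Word]]:
    """Group words whose top edge lies within y_tolerance of the line's first word."""
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Restore left-to-right reading order inside each line.
    return [sorted(line, key=lambda w: w.x) for line in lines]


# Made-up word boxes, deliberately out of order and slightly misaligned.
words = [
    Word("EUR", 300, 101), Word("Financial", 20, 100), Word("losses", 90, 102),
    Word("100.000", 340, 100), Word("COVERAGES", 20, 60),
]
lines = [" ".join(w.text for w in line) for line in group_into_lines(words)]
print(lines)
```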

Sounds straightforward, right?

TECH STACK CHECK

NEXT STEPS

Which algorithms to use?

What should we feed as input to the algorithm?

What to annotate?

What are our deadlines?

Human and computation resources required?

How to agile this?

Which algorithms to use?

COURSE OF ACTION - ROUND 2

Supervised Learning

Unsupervised Learning

Computer Vision

NLP

Computer Vision

NLP

Unsupervised learning was used to generate data for supervised training. We wrote implementations of deep clustering and of word / sentence / page / document embeddings.

● Object detection
● Messaging parsing networks
● Custom CNN networks

● Implementation of Deep Topic modeling
● Custom RNN + CNN networks with domain adaptation
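One way clustering can generate data for supervised training is by assigning cluster ids to unlabeled items as provisional labels, which annotators then confirm or correct. The sketch below uses plain k-means on toy 2-D vectors as a stand-in; the talk mentions deep clustering on learned embeddings, which this does not implement, and the points are invented.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy "embeddings": two visibly separate groups of pages.
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
pseudo_labels = kmeans(embeddings, k=2)
print(pseudo_labels)
```

Taking the first k points as initial centroids keeps the example reproducible; a production setup would normally use a more robust initialisation.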

EMPHASIS ON SUPERVISED LEARNING

Computer Vision

NLP

● Drawing polygon bounding boxes
● Labeling pages
● Labeling documents

Complex annotation of passages, phrases, tables, line items, and the hierarchical nature of textual information
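A polygon annotation of the kind listed above can be stored as a label plus a list of vertices. This is a sketch under assumptions: the `PolygonAnnotation` class and its fields are hypothetical, not the format of the in-house annotation system.

```python
from dataclasses import dataclass


@dataclass
class PolygonAnnotation:
    label: str                         # e.g. "table" or "passage"
    page: int
    points: list[tuple[float, float]]  # polygon vertices in page coordinates

    def bounding_box(self) -> tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return min(xs), min(ys), max(xs), max(ys)

    def area(self) -> float:
        """Polygon area via the shoelace formula."""
        total = 0.0
        closed = self.points[1:] + self.points[:1]
        for (x1, y1), (x2, y2) in zip(self.points, closed):
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


# A slightly skewed table region, as drawn on a scanned page (made-up numbers).
table = PolygonAnnotation("table", page=1, points=[(10, 20), (110, 25), (108, 80), (12, 75)])
rectangle = PolygonAnnotation("passage", page=1, points=[(0, 0), (4, 0), (4, 3), (0, 3)])
print(table.bounding_box(), rectangle.area())
```

Polygons rather than rectangles matter for scans that are rotated or skewed, where an axis-aligned box would pull in neighbouring text.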

What should we feed as input to the algorithm?

What to annotate?

Built an in-house Annotation System

COURSE OF ACTION - ROUND 2

Workflows support huge annotation jobs

Human and computation resources required?

Data Scientists

● Data Scientists from Academia
● Deep learning engineers
● Research programme with Universities
● Master Thesis sponsorship at omni:us

Engineers

● Full stack engineers
● Data Engineers
● Devops

Leadership & Mentors

● Team leads with experience in AI
● Identifying and convincing industry experts to mentor
● Devops

Cloud startup programmes

● Credits to support memory and GPU training algorithms
● Mentoring to scale operations

COURSE OF ACTION - ROUND 2

What are our Deadlines?

How to agile this?

Sprint Planning for Research

Quick turn around of POC

Engineer AI systems to run experiments in a systematic and automated way
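Running experiments in a systematic and automated way can start as small as a loop over a parameter grid that records every configuration with its score. The sketch below is illustrative only: `run_experiments`, the grid and the scoring function are assumptions, and the score formula is an arbitrary stand-in for a real training run.

```python
import itertools


def run_experiments(train_and_evaluate, grid):
    """Run every combination in grid; return results sorted by score, best first."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = train_and_evaluate(**config)
        results.append({"config": config, "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)


def fake_train_and_evaluate(learning_rate, hidden_size):
    # Stand-in for training a model and measuring validation accuracy.
    penalty = abs(learning_rate - 0.01) * 10 + abs(hidden_size - 128) / 1000
    return round(1.0 - penalty, 3)


grid = {"learning_rate": [0.1, 0.01], "hidden_size": [64, 128]}
results = run_experiments(fake_train_and_evaluate, grid)
for entry in results:
    print(entry)
```

Because every run is recorded with its configuration, a sprint review can compare experiments directly instead of relying on memory or notebooks.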

COURSE OF ACTION - ROUND 2

RESULT

In production 94% accuracy

Successful AI delivery

TECH STACK CHECK

GO LIVE OR GO HOME

Trained Models Predict

AI IN PRODUCTION

Human in the loop, fixes the errors and validates corrections

Train on the corrections, Continuous improvements
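The predict, validate, retrain loop can be sketched as follows. The function names and the sample values are assumptions for illustration; the mistyped annual charge in `predicted` is an invented model error, not a result from the talk.

```python
def collect_corrections(predicted: dict[str, str], validated: dict[str, str]) -> dict[str, str]:
    """Return the fields where the human validator changed the model's prediction."""
    return {
        field: value
        for field, value in validated.items()
        if predicted.get(field) != value
    }


def field_accuracy(predicted: dict[str, str], validated: dict[str, str]) -> float:
    """Share of validated fields that the model predicted correctly."""
    correct = sum(1 for field, value in validated.items() if predicted.get(field) == value)
    return correct / len(validated)


predicted = {
    "policy_number": "H 54/16 307 728",
    "annual_charge": "EUR 424,68",  # invented model error
    "customer": "Renolate GmbH",
}
validated = {
    "policy_number": "H 54/16 307 728",
    "annual_charge": "EUR 424,63",  # fixed by the human in the loop
    "customer": "Renolate GmbH",
}

corrections = collect_corrections(predicted, validated)
# The validated document becomes a new training example for the next training run.
training_queue = [validated] if corrections else []
print(corrections, field_accuracy(predicted, validated))
```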

DO NOT IGNORE

Domain Knowledge is essential

Educate your customers on AI

Engineer end-to-end AI systems to solve a business use case, not a dataset

PLATFORM

Training Platform

Prediction Platform with human in the loop

Management Console of Infrastructure, Applications & Users

Training Platform

COURSE OF ACTION - ROUND 3

Annotation System

System to define data models, annotate data, manage annotation jobs, audit the annotated data and version control the datasets

Ability to train and evaluate models

Mechanism and system to trigger training, retraining, evaluation and versioning of different types of models, in a managed way across various infrastructures supporting CPU and GPU

Console connecting the two together
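Versioning models together with the dataset version they were trained on can be sketched with a small in-memory registry. Everything here is hypothetical: the class names, the model name, the dataset tags and the accuracy figures are invented for illustration, and the actual platform's storage and API are not described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    dataset_version: str
    metrics: dict[str, float]


@dataclass
class ModelRegistry:
    versions: dict[str, list[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, dataset_version: str, metrics: dict[str, float]) -> ModelVersion:
        """Store a newly trained model under the next version number."""
        history = self.versions.setdefault(name, [])
        entry = ModelVersion(len(history) + 1, dataset_version, metrics)
        history.append(entry)
        return entry

    def best(self, name: str, metric: str) -> ModelVersion:
        """Return the registered version with the highest value for metric."""
        return max(self.versions[name], key=lambda v: v.metrics[metric])


registry = ModelRegistry()
# Invented evaluation numbers, only to exercise the registry.
registry.register("table-extractor", "dataset-v1", {"accuracy": 0.81})
registry.register("table-extractor", "dataset-v2", {"accuracy": 0.88})
best = registry.best("table-extractor", "accuracy")
print(best.version, best.dataset_version)
```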

COURSE OF ACTION - ROUND 3

Prediction Platform with human in the loop

Async API for Ingestion

REST API that supports asynchronous data upload

Data Pipelines

Robust data pipelines connecting the services, providing high throughput, reliability and retry mechanisms

Validation UI

User interface to fix prediction errors

AI microservices

Scaling deep learning models as microservices

Prediction console connects all.
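A retry mechanism between pipeline stages is often implemented as a bounded retry with exponential backoff. The sketch below shows that pattern under assumptions: `call_with_retry`, its parameters and the `FlakyService` stand-in are invented for illustration and say nothing about how the actual pipelines were built.

```python
import time


def call_with_retry(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(); after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class FlakyService:
    """Stand-in for a downstream microservice that fails twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def predict(self):
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("temporarily unavailable")
        return {"document_class": "Policy"}


service = FlakyService()
result = call_with_retry(service.predict)
print(result, "after", service.calls, "calls")
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can substitute a function that does not actually wait.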

Management Console of Infrastructure, Applications & Users

COURSE OF ACTION - ROUND 3

Configuration management

Central management of configuration of various systems, consoles and services

Application logs

Monitoring logs of applications and setting up dashboards for internal and external stakeholders

User management

Managing users and providing authentication and authorisation capabilities for services

Infrastructure logs

Monitoring infrastructure usage and patterns to set up alerts and notifications

Management and monitoring console
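An alert on infrastructure usage can be as simple as a threshold on a rolling average. This is a sketch with made-up readings; the `UsageAlert` class, the threshold and the window size are assumptions rather than the monitoring setup described in the talk.

```python
from collections import deque


class UsageAlert:
    """Fire when the mean of the last `window` samples exceeds `threshold`."""

    def __init__(self, threshold: float, window: int = 3):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and sum(self.samples) / len(self.samples) > self.threshold


gpu_alert = UsageAlert(threshold=0.9, window=3)
readings = [0.5, 0.95, 0.97, 0.99, 0.6]  # invented GPU utilisation samples
fired = [gpu_alert.observe(reading) for reading in readings]
print(fired)
```

Averaging over a window instead of reacting to single samples keeps short spikes from paging anyone.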

TECH STACK CHECK

omni:us platform console

Learnings

● It is very important for an entire organization to believe that AI can solve problems.
● Engineer AI products; do not believe that having just AI models is good enough.
● Agile for AI works; choose an interpretation that works for your team.
● Pay attention to details, domain knowledge and the use case to be solved.
● A combination of multiple technologies has to be used to solve a use case, not just one hammer for all.
● Do not try to “AI” everything; certain mature technologies are capable of solving certain problems well. Use them wisely.
● Believe in human in the loop; it builds trust with the business.
● Educate internal and external stakeholders about the possibilities and limitations of AI.
● Visualisation is a powerful tool to understand and explain AI to everybody. Use it.
● AI is no longer a black box; it can be fine-tuned, managed and configured appropriately.
● Automate your current processes as much as possible; this gives more room for research.
