Post on 18-Jun-2020


Page 1


SCRIBBLE ENRICH

ML Engineering Platform

Scribble Data - Accelerated ML Engineering

Page 2

Feature Engineering - 80% of the ML lifecycle

© Scribble Data 2019

Feature engineering, the work of going from a data store to a feature matrix rich with numerous derived variables (features), remains a critical bottleneck in the ML lifecycle. Data science teams routinely spend up to 80% of their time on feature engineering before they can build their ML models.

Invariably, this work is context-sensitive, closely coupled to both the data and the business use cases that the data science team is solving for. A common problem is that, without guardrails, feature engineering can be messy, can discourage collaboration, and can be hard to untangle when the models and their output need to be examined or debugged.
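To make the step from a data store to a feature matrix concrete, here is a minimal sketch in plain Python. It assumes a hypothetical table of customer orders; the record layout, the `build_feature_matrix` helper, and the feature names are illustrative only and are not part of Enrich. Note the `as_of` cut-off, one small guardrail against leaking future data into training features.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records as they might be read from a data store:
# (customer_id, order_date, amount)
RAW_ORDERS = [
    ("c1", date(2019, 1, 5), 120.0),
    ("c1", date(2019, 2, 11), 80.0),
    ("c2", date(2019, 1, 20), 15.5),
    ("c1", date(2019, 3, 2), 40.0),
    ("c2", date(2019, 3, 28), 30.0),
]


def build_feature_matrix(orders, as_of):
    """Aggregate raw orders into one row of derived features per customer."""
    grouped = defaultdict(list)
    for customer_id, order_date, amount in orders:
        if order_date <= as_of:  # ignore anything after the cut-off date
            grouped[customer_id].append((order_date, amount))

    matrix = {}
    for customer_id, rows in sorted(grouped.items()):
        amounts = [amount for _, amount in rows]
        last_order = max(order_date for order_date, _ in rows)
        matrix[customer_id] = {
            "order_count": len(rows),
            "total_spend": sum(amounts),
            "avg_order_value": sum(amounts) / len(amounts),
            "days_since_last_order": (as_of - last_order).days,
        }
    return matrix


features = build_feature_matrix(RAW_ORDERS, as_of=date(2019, 3, 31))
for customer_id, row in features.items():
    print(customer_id, row)
```

Each derived column encodes a business judgment (what counts as "recent", which orders count), which is why this work stays tied to the use case.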

Page 3

Enrich - Feature Engineering for ML

Enrich streamlines the most laborious parts of ML model training and productionization, and does so with high auditability, reproducibility, and the highest per-core compute efficiency.


Scribble Enrich is an ML engineering platform focused on feature engineering. It sits behind customers’ firewalls, takes data from a lake or other store, and turns it into features. It is built for scalability, with numerous guardrails that help data science teams raise their productivity, whether in ML model training, model deployment, or general-purpose data enrichment.
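One way to picture such a guardrail is a programmable check that runs on incoming data before any feature is computed. The sketch below assumes records arrive as Python dicts; `Check`, `run_health_checks`, and the two rules are hypothetical names chosen for illustration, not Enrich's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Check:
    """A named, programmable rule applied to every incoming record."""
    name: str
    rule: Callable[[Record], bool]


# Hypothetical checks for an orders feed; each deployment would define its own.
CHECKS = [
    Check("customer_id present", lambda r: bool(r.get("customer_id"))),
    Check(
        "amount is non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]


def run_health_checks(records: List[Record], checks: List[Check]) -> Dict[str, int]:
    """Return the number of failing records for each check."""
    failures = {check.name: 0 for check in checks}
    for record in records:
        for check in checks:
            if not check.rule(record):
                failures[check.name] += 1
    return failures


batch = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "", "amount": 80.0},
    {"customer_id": "c2", "amount": -5.0},
]
report = run_health_checks(batch, CHECKS)
if any(report.values()):
    print("batch rejected:", report)
```

Keeping rules as plain functions makes them easy to version and review alongside the feature code they protect.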

Page 4

Enrich - Architecture Schematic


Scribble is primarily a Python shop today. The Enrich platform sits atop customers’ data lakes and provides feature matrices to models or dashboards.

The Enrich stack includes:

● Storage: S3, SQL DBs, Cassandra
● Frontend: Bootstrap, jQuery
● Backend: Django, REST API
● Pipelines: Pandas, Spark
● Hardware: standard x86 compute (16 cores, 64 GB)
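The stack pairs SQL stores with Python pipelines that emit feature matrices. As a standard-library stand-in (an assumption: `sqlite3` in place of the production SQL stores, and plain Python in place of Pandas or Spark), this sketch pushes a per-customer aggregation down to the store and shapes the result into a feature matrix, a header plus one row per entity, that a model or dashboard could consume. The `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for one of the SQL stores (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c1", 120.0), ("c1", 80.0), ("c2", 15.5), ("c2", 30.0), ("c3", 9.0)],
)

# Pipeline step: aggregate inside the store, one output row per customer.
FEATURE_QUERY = """
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_spend,
           MAX(amount) AS max_order
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""
cursor = conn.execute(FEATURE_QUERY)
header = [column[0] for column in cursor.description]  # column names
rows = cursor.fetchall()
conn.close()

print(header)
for row in rows:
    print(row)
```

Aggregating in the store keeps data movement small; with Pandas the same step would typically be a `groupby(...).agg(...)` on a loaded frame.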

Page 5

Enrich - Components and Design Principles


The Enrich platform comprises a number of components, each fit for purpose and designed around the flow of the feature engineering discipline. Together they reflect the four principles that guided our design:

● Quick time-to-market for each feature and model
● Trust (correctness and dependability)
● Flexibility
● Scalability

Page 6

Enrich - Components


● Catalog: a lightweight data catalog that continuously documents what is in the data store
● Health: a programmable health-check monitor for data flowing into the data store
● Augment: extend data by linking it with third-party datasets
● Labeling: generate labeled datasets, or extend the master data for richer features
● Core: versioned, auditable feature computation pipelines
● Audit: an audit interface to understand the lineage of every dataset
● Marketplace: discover features being computed by the system (for status and reuse)
● Search: filter and export datasets
● Monitor: monitor model performance
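The Core and Audit components pair versioned feature pipelines with dataset lineage. A minimal sketch of that general technique, assuming hypothetical names (`run_versioned`, `lineage`, `AUDIT_LOG`) and an in-memory log: each run records the step version, its parameters, and content hashes of its inputs and outputs, so any output can be traced back to the run that produced it. It illustrates the idea only and is not how Enrich implements it.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory audit log; a real system would persist these records.
AUDIT_LOG = []


def fingerprint(obj) -> str:
    """Short, stable SHA-256 hash of any JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_versioned(step, version, params, inputs, compute):
    """Run one feature step and record what produced its output."""
    output = compute(inputs, **params)
    AUDIT_LOG.append({
        "step": step,
        "version": version,
        "params": params,
        "input_hash": fingerprint(inputs),
        "output_hash": fingerprint(output),
        "run_at": datetime.now(timezone.utc).isoformat(),
    })
    return output


def lineage(output_hash):
    """Return every recorded run that produced the given output."""
    return [entry for entry in AUDIT_LOG if entry["output_hash"] == output_hash]


def count_large_orders(amounts, threshold):
    """Toy feature: number of orders above a threshold."""
    return sum(1 for amount in amounts if amount > threshold)


result = run_versioned("large_orders", "1.0.0", {"threshold": 50.0},
                       [120.0, 80.0, 40.0], count_large_orders)
print(result, lineage(fingerprint(result)))
```

Because hashes are computed over canonical JSON (`sort_keys=True`), rerunning the same version on the same inputs yields the same fingerprints, which is what makes a run reproducible and checkable after the fact.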

Page 7

CONTACT US

DENVER: Littleton | BANGALORE: Indiranagar, HSR

[email protected]


Scribble Data - Accelerated ML Engineering