Parquet: columnar storage for the people

Post on 25-May-2015

DESCRIPTION

We would like to introduce Parquet, a columnar file format for Hadoop. Performance and compression benefits of using columnar storage formats for storing and processing large amounts of data are well documented in academic literature as well as several commercial analytical databases. Parquet supports deeply nested structures, efficient encoding and column compression schemes, and is designed to be compatible with a variety of higher-level type systems. It is available as a standalone library, allowing any Hadoop framework or tool to build support for it with minimal dependencies. As of this release, Parquet is supported by Apache Pig, plain Hadoop Map-Reduce, and Cloudera's Impala, and is being put into production at Twitter. We will discuss Parquet's design and share performance numbers.
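To make the columnar idea above concrete, here is a minimal Python sketch. It is a toy illustration only, not Parquet's actual on-disk layout or API: the sample records and the helper names `to_columns` and `run_length_encode` are assumptions made up for this example. It shows why grouping values by column can help: a query can read just the column it needs, and repeated values stored next to each other lend themselves to simple encodings.

```python
from itertools import groupby

# Hypothetical sample records (not taken from the talk).
rows = [
    {"user": "alice", "country": "US", "clicks": 3},
    {"user": "bob", "country": "US", "clicks": 7},
    {"user": "carol", "country": "US", "clicks": 1},
    {"user": "dave", "country": "FR", "clicks": 4},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per field."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# A query that needs only "clicks" reads one column instead of every record.
total_clicks = sum(columns["clicks"])

# Similar values sit side by side, so a simple encoding shrinks the column.
encoded_country = run_length_encode(columns["country"])
```

A real columnar format adds much more on top of this (nested schemas, per-column compression, metadata), so the sketch should be read as motivation rather than as a description of the format.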

TRANSCRIPT

Parquet

Columnar storage for the people

Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter

Nong Li nong@cloudera.com Software engineer, Cloudera Impala

http://parquet.io

Outline

• Context from various companies

• Early results

• Format deep-dive
