Slide 1
Column-Oriented Storage Techniques for MapReduce
Avrilia Floratou (University of Wisconsin-Madison), Jignesh M. Patel (University of Wisconsin-Madison), Eugene J. Shekita (while at IBM Almaden Research Center), Sandeep Tata (IBM Almaden Research Center)

Slide 2
Motivation
Parallel DBMSs and MapReduce make different trade-offs among performance, ease of use, and fault tolerance; column-oriented storage is a way to bring DBMS-style performance to MapReduce.

Slide 3
Challenges
How can columnar storage be incorporated into an existing MapReduce system (Hadoop) without changing its core parts? How can columnar storage operate efficiently on top of a distributed file system (HDFS)? Two problems are unique to MapReduce: the use of complex data types, which are common in many MapReduce jobs, and Hadoop's choice of Java as its default programming language.

Slide 4
Outline
How these problems are addressed: Column-Oriented Storage; Lazy Tuple Construction; Compression; Experimental Evaluation; Conclusions.

Slide 5
Complex Data Types
The use of complex types causes two major problems. First, deserialization costs: switching to a binary storage format can improve Hadoop's scan performance by 3x. Second, the lack of effective column-oriented compression techniques: column-oriented storage formats tend to exhibit better compression ratios.

Slide 6
Hadoop's Choice of Java
Java objects require deserialization, and the overhead of deserializing and creating the objects corresponding to a complex type can be substantial. Lazy record construction mitigates this deserialization overhead in Hadoop.

Slide 7
Outline
Column-Oriented Storage and its interaction with Hadoop's data replication and scheduling; Lazy Tuple Construction; Compression; Experimental Evaluation; Conclusions.

Slide 8
Column-Oriented Storage in Hadoop
Implementing a column-oriented storage format raises two questions: how to generate roughly equal-sized splits so that a job can be effectively parallelized over the cluster, and how to make sure that corresponding values from different columns of the dataset are co-located on the same node. HDFS does not provide any co-location guarantees.

Slide 9
Consider a dataset with three columns c1, c2, c3 stored in three different files that are randomly spread over the cluster: remote accesses will occur. A new format is introduced to avoid these problems.

Slide 10
Row-Store: Merits/Limits with MapReduce
Data loading is fast (no additional processing), and all columns of a data row are located in the same HDFS block. However, when not all columns are used, storage bandwidth is wasted, and compressing columns of different types together adds overhead. (Slide adapted from Professor Xiaodong Zhang.)

Slide 11
Column-Store: Merits/Limits with MapReduce
Unnecessary I/O costs can be avoided, since only the needed columns are loaded and each column compresses easily. However, grouping columns back into rows requires additional network transfers.
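
To make the trade-off on Slides 10 and 11 concrete, here is a minimal, self-contained Java sketch. It is not the paper's code: the file names, the field layout, and the use of the local file system instead of HDFS are all assumptions for illustration. It writes the same rows as one row-oriented file and as a separate age column file, then answers a single-column aggregate by touching only the column file:

    // Toy comparison: a row-store file interleaves every field, so even a
    // single-column query must read them all; the column file holds only
    // the 4-byte ages, so the same scan reads far fewer bytes.
    import java.io.*;

    public class ColumnScanDemo {
        public static void main(String[] args) throws IOException {
            int rows = 1_000_000;
            try (DataOutputStream rowOut = out("rows.bin");
                 DataOutputStream ageOut = out("age.col")) {
                for (int i = 0; i < rows; i++) {
                    rowOut.writeUTF("name" + i);       // row-store: all fields together
                    rowOut.writeInt(i % 90);
                    rowOut.writeUTF("hobbies: {tennis}");
                    ageOut.writeInt(i % 90);           // column-store: ages only
                }
            }
            long sum = 0;
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream("age.col")))) {
                for (int i = 0; i < rows; i++) sum += in.readInt();  // scan one column
            }
            System.out.println("avg age     = " + (double) sum / rows);
            System.out.println("row file    = " + new File("rows.bin").length() + " bytes");
            System.out.println("column file = " + new File("age.col").length() + " bytes");
        }

        private static DataOutputStream out(String name) throws IOException {
            return new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(name)));
        }
    }

On this toy data the column file is roughly an order of magnitude smaller than the row file; in Hadoop the same saving shows up as fewer bytes scanned from HDFS, at the cost of the row-construction transfers noted on Slide 11.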
Slide 12
Read Operation in Row-Store
Read local rows concurrently; discard the unneeded columns.

Slide 13
Read Operation in Column-Store
The RCFile format avoids the problems that occur in both row-store and column-store. (Slide by Professor Xiaodong Zhang.)

Slide 14
Goals of RCFile
Eliminate unnecessary I/O costs, like a column-store: read only the needed columns from disk. Eliminate the network costs of row construction, like a row-store. Keep the fast data-loading speed of a row-store. Apply efficient data compression algorithms conveniently, like a column-store. In short, eliminate all the limits of row-store and column-store.

Slide 15
RCFile Format
A fast and space-efficient placement structure. Each row group packs all the columns and consists of a special synchronization marker, a metadata section, and a data region. The metadata describes the columns in the data region and their starting offsets, along with the number of rows in the data region. The data region is laid out in a column-oriented fashion.

Slide 16
RCFile: Partitioning a Table into Row Groups

Slide 17
Inside a Row Group

Slide 18
RCFile: Inside Each Row Group

Slide 19
RCFile: Distributing Row-Group Data among Nodes

Slide 20
Optimizing RCFile
Main disadvantages: tuning the row-group size is critical; extra metadata must be written for each row group, leading to additional space overhead; and adding a column to a dataset is expensive, since the entire dataset has to be read and each block rewritten.

Slide 21
CIF Storage Format
When a dataset is loaded, it is broken into smaller partitions, each referred to as a split-directory. Each partition contains a set of files, one per column in the dataset; an additional file describing the schema is also kept in each split-directory. How is co-location of the columns of a row guaranteed? (A sketch of this layout follows Slide 24 below.)

Slide 22
Column Placement Policy (CPP)
CPP is a new HDFS block placement policy that solves the co-location problem: it guarantees that the files corresponding to the different columns of a split are always co-located across replicas. HDFS allows its placement policy to be changed by setting the configuration property dfs.block.replicator.classname to point to the appropriate class.

Slide 23
Column-Oriented Storage in CIF Format in Hadoop
The example dataset:

  Name  | Age | Info
  Joe   | 23  | hobbies: {tennis}, friends: {Ann, Nick}
  David | 32  | friends: {George}
  John  | 45  | hobbies: {tennis, golf}
  Smith | 65  | hobbies: {swimming}, friends: {Helen}

The dataset is split across two nodes: the first node holds the rows for Joe and David, the second the rows for John and Smith. Within each split, each column (Name, Age, Info) is stored in its own file.

Slide 24
Replication and Co-location
Under the default HDFS replication policy, the replicas of a split's Name, Age, and Info files can end up scattered across Nodes A, B, C, and D. Under CPP, each replica of the split keeps its Name, Age, and Info files together on the same node.
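
A minimal sketch of the split-directory layout described on Slide 21. The directory names, the rows-per-split constant, and the plain-text encoding are invented for illustration, and the sketch writes to the local file system rather than HDFS:

    // Toy CIF-style loader: break the dataset into splits and write one
    // file per column inside each split-directory, plus a schema file.
    import java.io.IOException;
    import java.nio.file.*;
    import java.util.List;

    public class CifLayoutDemo {
        static final String[] SCHEMA = {"name", "age", "info"};
        static final int ROWS_PER_SPLIT = 2;

        public static void main(String[] args) throws IOException {
            List<String[]> rows = List.of(
                new String[]{"Joe", "23", "hobbies: {tennis}, friends: {Ann, Nick}"},
                new String[]{"David", "32", "friends: {George}"},
                new String[]{"John", "45", "hobbies: {tennis, golf}"},
                new String[]{"Smith", "65", "hobbies: {swimming}, friends: {Helen}"});

            for (int split = 0; split * ROWS_PER_SPLIT < rows.size(); split++) {
                Path dir = Paths.get("dataset", "split-" + split);
                Files.createDirectories(dir);
                int lo = split * ROWS_PER_SPLIT;
                int hi = Math.min(lo + ROWS_PER_SPLIT, rows.size());
                // One file per column, holding only that column's values.
                for (int c = 0; c < SCHEMA.length; c++) {
                    StringBuilder col = new StringBuilder();
                    for (int r = lo; r < hi; r++) {
                        col.append(rows.get(r)[c]).append('\n');
                    }
                    Files.writeString(dir.resolve(SCHEMA[c] + ".col"), col.toString());
                }
                // The schema file kept in each split-directory.
                Files.writeString(dir.resolve("schema"), String.join(",", SCHEMA));
            }
        }
    }

Because each column file would occupy its own HDFS blocks, the default placement policy could scatter a split's columns across nodes; keeping them together is exactly the job of the CPP policy on Slide 22.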
Slide 25
Outline
Column-Oriented Storage; Lazy Tuple Construction, which is used to mitigate the deserialization overhead in Hadoop as well as to eliminate disk I/O; Compression; Experiments; Conclusions.

Slide 26
Implementation
The basic idea: deserialize only those columns of a record that are actually accessed in the map function. A class called LazyRecord is used, with two pointers: curPos keeps track of the current record the map function is working on, and lastPos keeps track of the last record that was actually read and deserialized for a particular column file.

Slide 27

    class MyMapper {
        void map(NullWritable key, Record rec) {
            String url = (String) rec.get("url");
            if (url.contains("ibm.com/jp"))
                output.collect(null,
                    rec.get("metadata").get("content-type"));
        }
    }

Each time the RecordReader is asked to read the next record, it increments curPos. No bytes are actually read or deserialized until one of the get() methods is called on the resulting Record object.

Slide 28
Example
The Age column holds (23, 32, 45, 30, 50) and the Name column holds (Joe, David, John, Mary, Ann). For the map method "if (age < 35) return name", names are read only for rows whose age passes the predicate (e.g., 23/Joe, 32/David). When age >= 35, no bytes are actually read: reading and deserializing the name field is avoided.

Slide 29
Skip List Format
A skip-list format can be used within each column file to efficiently skip records. A column file contains two kinds of values: regular serialized values and skip blocks. Skip blocks contain byte offsets that enable skipping the next N records. The skip() method is called by LazyRecord as skip(curPos - lastPos).

Slide 30
Example
The figure shows an Info column file in skip-list format: skip blocks covering 100 rows (Skip100 = 19400) and 10 rows (Skip10 = 2013, Skip10 = 1246) are interleaved with serialized values such as "hobbies: {tennis}, friends: {Ann, Nick}", null, "friends: {George}", and "hobbies: {tennis, golf}". For the query "if (age < 35) return hobbies" over ages (23, 39, 45, 30), the recorded offsets let the reader jump over the Info values of rows whose age fails the predicate.

Slide 31
Outline
Column-Oriented Storage; Lazy Record Construction; Compression, with two proposed schemes for compressing columns of complex types, both amenable to lazy decompression; Experiments; Conclusions.

Slide 32
Compressed Blocks
This scheme compresses a block of contiguous column values. The compressed block size is set at load time and affects both the compression ratio and the decompression overhead. A header indicates the number of records in a compressed block and the block's size. Advantage: a whole block can be skipped if none of its values are accessed. Disadvantage: if any value in the block is accessed, the entire block must be decompressed.

Slide 33
Dictionary Compressed Skip List
This scheme is tailored to map-typed columns. It builds a dictionary of keys for each block of map values and stores the compressed keys in a skip-list format. Disadvantage: a worse compression ratio than compressed blocks. Advantages: lower CPU overhead for decompression, and a value can be accessed without having to decompress an entire block of values.
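
The following is a minimal sketch of the lazy-read machinery from Slides 26 and 29. The class name, the fixed-width integer values, and the assumption of at most one get() per record are simplifications; a real skip-list column file would read its skip blocks to obtain the byte offsets instead of computing them:

    // Toy lazy column reader: next() advances curPos without any I/O;
    // bytes are read and deserialized only when get() is called, and the
    // seek jumps the file pointer over records that were never accessed.
    import java.io.*;

    public class LazyColumnReader {
        private final RandomAccessFile file;  // one column file of 4-byte ints
        private long curPos = 0;              // record the map function is on
        private long lastPos = -1;            // last record actually read (none yet)

        public LazyColumnReader(String path) throws IOException {
            file = new RandomAccessFile(path, "r");
        }

        // Called for every record by the record reader: no I/O happens here.
        public void next() { curPos++; }

        // Called only when the map function accesses this column.
        public int get() throws IOException {
            if (curPos > lastPos) {
                long n = curPos - lastPos;    // LazyRecord's skip(curPos - lastPos)
                // With fixed 4-byte values the offset is (n - 1) * 4; a
                // skip-list file would take it from the next skip block.
                file.seek(file.getFilePointer() + (n - 1) * 4);
                lastPos = curPos;
            }
            return file.readInt();            // deserialize just this value
        }
    }

For the Slide 28 query, a driver would call next() on both the Age and Name readers for every record, call get() on Age each time, and call get() on Name only when the age passes the predicate, so the name bytes of non-matching rows are never read.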
Slide 34
Outline
Column-Oriented Storage; Lazy Record Construction; Compression; Experiments; Conclusions.

Slide 35
Experimental Setup
A 42-node cluster. Each node: 8 cores, 32 GB of main memory, and five 500 GB SATA disks. Network: 1 Gbit Ethernet switch. Hadoop version: 0.21.0.

Slide 36
Overhead of Columnar Storage
Single-node scan-time experiment on a synthetic dataset (57 GB, 13 columns: 6 integers, 6 strings, 1 map) with the query "select *". Using a binary format can dramatically improve Hadoop's scan performance.

Slide 37
Benefits of Column-Oriented Storage
Single-node experiment with queries projecting different columns. CIF's speedup over SEQ comes from reading much less data, although gathering data from columns stored in different files incurs additional seeks.

Slide 38
Conclusions
We described a new column-oriented binary storage format for MapReduce, introduced the skip-list layout, described the implementation of lazy record construction, and showed that lightweight dictionary compression for complex columns can be beneficial.