advance mapreduce concepts - module 4
TRANSCRIPT
Counters• Counter provides a way to measure the progress or the number of operations
that occur within MapReduce programs.
Creating Custom Counters• First step is to create an Enum that will contain the names of all custom
counters for a particular job.public enum CustomCounters {VALID, INVALID};
• Inside the map or reduce task, the counter can be adjustedif(validRecord) context.getCounter(CustomCounters.VALID).increment(1); // increase the counter by 1else if(invalidRecord) context.getCounter(CustomCounters.INVALID).increment(1); // increase the counter by 1
• The custom counter values will be displayed alongside the built-in counter values on the summary web page for a job viewed through the JobTracker.
• The values can be accessed programmaticallylong validCounterValue = job.getCounters().findCounter(CustomCounters.VALID).getValue();
Serialization• Serialization is the process of turning structured objects into a byte
stream for transmission over a network or for writing to persistent storage.
• Deserialization is the reverse process of turning a byte stream back into a series of structured objects.
• Hadoop uses its own serialization format, Writables, which is certainly compact and fast, but not so easy to extend or use from languages other than Java.
The Writable Interface• The Writable interface defines two methods
• one for writing its state to a DataOutput binary stream• one for reading its state from a DataInput binary stream
Custom Writables Examplepublic class TextPair implements WritableComparable<TextPair>{private Text first;private Text second;public TextPair() {set(new Text(), new Text());}public TextPair(String first, String second) {set(new Text(first), new Text(second));}
public TextPair(Text first, Text second) {set(first, second);}
public void set(Text first, Text second) {this.first = first;this.second = second;}
public Text getFirst() {return first;}
public Text getSecond() {return second;}
public void write(DataOutput out) throws IOException {first.write(out); second.write(out);}
public void readFields(DataInput in) throws IOException {first.readFields(in);second.readFields(in);}
@Overridepublic int hashCode() {return first.hashCode() * 163 + second.hashCode();}
@Overridepublic boolean equals(Object o) {if (o instanceof TextPair) {TextPair tp = (TextPair) o;return first.equals(tp.first) && second.equals(tp.second);}return false;}
@Overridepublic String toString() {return first + "\t" + second;}
public int compareTo(TextPair tp) {int cmp = first.compareTo(tp.first);if (cmp != 0) {return cmp;}return second.compareTo(tp.second);}}
Error Handling• Handling non-fatal errors that need to be tracked• In the mapper:
if (some_error_condition){ context.getCounter(COUNTER_GROUP, COUNTER).increment(1);}
• In the client:boolean okay = job.waitForCompletion(true);if (okay){ Counters counters = job.getCounters(); Counter bwc = counters.findCounter(COUNTER_GROUP, COUNTER); System.out.println("Errors" + bwc.getDisplayName()+":" + bwc.getValue());}
Compression• It reduces the space needed to store files.• It speeds up data transfer across the network, or to or from disk.