

Yokozuna: Distributed Search You Don’t Think About

Ryan Zezeski, May 14th 2013

Tuesday, May 21, 13

Live Demo

Live Demos

PROBLEM?

PROBLEM?

SOLUTION!

Solution Pre-made

Piece At A Time

Goals

• Don’t screw up

• Show how Yokozuna doesn’t make you think (too hard)

• Teach you about Search

• Neat things you can do with Yokozuna

PROBLEM: SEARCH FOR COMMITS ABOUT SPECIFIC FEATURE/BUG - MAKE IT GOOGLE-LIKE

SOLUTION: INDEX COMMITS IN YOKOZUNA - “COMMIT LOG SEARCHER” (CLS)

Anatomy of a Commit Msg

Primary Key

Any Node Will Do

Term Query

Query Any Node

Boolean (1)

repo:riak_kv repo:riak_core

Boolean (2)

repo:riak_kv AND author:"Ryan Zezeski"

Boolean (3)

commit_author:"Ryan Zezeski" OR commit_author:"Joseph Blomstedt" NOT commit_repo:riak_kv

Range (1)

commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW]

Range (2)

commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW]

I RAN THIS ON 2013-05-10

sort=dt asc
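The range query and sort above are ordinary Solr request parameters. As a minimal sketch, the URL for such a request could be assembled as below. The host, port, and index name in `BASE_URL` are hypothetical and not taken from the slides; `q`, `sort`, and `wt` are standard Solr parameters, and the `q` and `sort` values are the ones shown on the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: host, port, and index name are assumptions for illustration.
BASE_URL = "http://localhost:8098/search/commits"

def build_query_url(q, sort=None):
    """Build a search URL from standard Solr parameters (q, sort, wt)."""
    params = {"q": q, "wt": "json"}
    if sort:
        params["sort"] = sort
    # urlencode escapes the colons, brackets, and spaces in the query string.
    return BASE_URL + "?" + urlencode(params)

url = build_query_url(
    "commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW]",
    sort="dt asc",
)
print(url)
```

Because `NOW` is resolved when the query runs, the same URL returns a different window of commits on a different day, which is why the slide records the date the query was run.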

Wildcard (1)

*:*

GET TOTAL COUNT FIRST

Wildcard (2)

commit_repo:riak_*

NOTICE COUNT IS LOWER

WHAT ABOUT SEARCHING SUMMARY AND BODY?

THE INVERTED INDEX

AN INDEX - BUT INVERTED

EVERYONE KNOWS WHAT IT IS

EVEN NON-TECH PEOPLE

YES...EVEN YOUR PARENTS

What’s In A Book?

• WORDS

• PARAGRAPHS

• SECTIONS

• CHAPTERS

• ETC.

AND PAGE NUMBERS

PAGE NUMBERS ARE AN IMPLICIT INDEX

PAGE NUMBER TO WORDS

WORD TO PAGE NUMBERS

INVERTED
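The book analogy can be written out directly. This is a minimal sketch with made-up page text: it starts from the implicit index (page number to words) and inverts it into word to page numbers. Splitting on whitespace stands in for real analysis.

```python
from collections import defaultdict

# Toy "book": page number -> text on that page (made-up sample data).
pages = {
    1: "riak is a distributed database",
    2: "yokozuna adds distributed search",
    3: "search uses an inverted index",
}

def invert(pages):
    """Turn page -> words into word -> sorted list of page numbers."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}

index = invert(pages)
print(index["distributed"])  # [1, 2]
print(index["search"])       # [2, 3]
```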

HOW DO YOU GET THE WORDS IN THE FIRST PLACE?

Analysis - The Iceberg That Sunk The Titanic

Phrase (1)

subject:hinted OR subject:handoff OR body:hinted OR body:handoff

Phrase (2)

subject:"hinted handoff" OR body:"hinted handoff"

Phrase (3)

subject:"partition vnode" OR body:"partition vnode"

Phrase (4)

subject:"partition vnode"~4 OR body:"partition vnode"~4

Exact Term

subject:behavior OR body:behavior

Fuzzy Term

subject:behavior~1 OR body:behavior~1
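One way to read the `~1` suffix is "terms within one edit of `behavior`". The sketch below illustrates that idea with a plain Levenshtein distance over a made-up term list. It is an approximation for intuition only, not how Solr evaluates fuzzy queries internally.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Made-up terms that might appear in commit messages.
terms = ["behavior", "behaviour", "behaviors", "behave"]
matches = [t for t in terms if edit_distance("behavior", t) <= 1]
print(matches)  # ['behavior', 'behaviour', 'behaviors']
```

The exact-term query on the previous slide would only match the first of these.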

Ranking

ADD SCORE TO FL

SCORE ADDED TO EVERY RESULT

RECALL, PRECISION, AND RELEVANCY, OH MY!

RELEVANCY - FOR A GIVEN QUERY & DOC SET THERE IS AN IDEAL ANSWER OF ONLY RELEVANT DOCS

RECALL = WHAT % OF IDEAL ANSWER SET WAS RETRIEVED

PRECISION = WHAT % OF ANSWER IS RELEVANT

RECALL vs. PRECISION - AS YOU INCREASE RECALL YOU DEGRADE PRECISION
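The two definitions and the trade-off between them can be checked on a small example. This is a minimal sketch with made-up document IDs: the narrow query returns only relevant commits but misses half of them, while the broad query finds them all at the cost of extra noise.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of the answer that is relevant; recall = share of the ideal answer retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up ideal answer set and two candidate result sets.
relevant = {"c1", "c2", "c3", "c4"}
narrow = ["c1", "c2"]
broad = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```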

SOLR DETERMINES RELEVANCY VIA THE NOTION OF SIMILARITY

SOLR USES TF-IDF: TERM FREQUENCY, INVERSE DOCUMENT FREQUENCY
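The intuition behind TF-IDF is that a term counts for more the more often it appears in a document and the rarer it is across the collection. The sketch below uses the textbook form (raw term count times the log of inverse document frequency) over made-up commit messages. Solr's actual scoring adds further factors and normalization, so treat this as an illustration of the idea rather than Solr's formula.

```python
import math

# Made-up commit messages.
docs = {
    "c1": "fix handoff bug in handoff sender",
    "c2": "fix typo in readme",
    "c3": "add handoff test",
}

def tf_idf(term, doc_id, docs):
    """Textbook TF-IDF: term count in the doc times log(N / docs containing the term)."""
    words = docs[doc_id].split()
    tf = words.count(term)
    df = sum(1 for text in docs.values() if term in text.split())
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

# "handoff" appears twice in c1 and once in c3, so c1 scores higher.
print(tf_idf("handoff", "c1", docs), tf_idf("handoff", "c3", docs))
# "readme" is rarer than "fix", so it contributes more to c2's score.
print(tf_idf("readme", "c2", docs), tf_idf("fix", "c2", docs))
```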

Dismax + Facets + Highlighting

[callouts on slide: FACETS, HIGHLIGHTING, DISMAX]

FACET - A TAXONOMY OF YOUR QUERY BASED ON A FIELD’S VALUES

FACETS ALLOW “DRILL DOWN” - THEY GUIDE THE USER
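Conceptually, a facet is a count of matching documents grouped by one field's values, and drill down is a filter on one of those values followed by another facet. This is a minimal sketch over made-up commit documents. The field names follow the queries on the earlier slides, and the helper functions are hypothetical, not part of Solr or Yokozuna.

```python
from collections import Counter

# Made-up commit documents; field names follow the queries shown earlier.
commits = [
    {"commit_repo": "riak_kv", "commit_author": "Ryan Zezeski"},
    {"commit_repo": "riak_kv", "commit_author": "Joseph Blomstedt"},
    {"commit_repo": "riak_core", "commit_author": "Joseph Blomstedt"},
    {"commit_repo": "riak_core", "commit_author": "Ryan Zezeski"},
    {"commit_repo": "riak_core", "commit_author": "Joseph Blomstedt"},
]

def facet_counts(docs, field):
    """Count documents per distinct value of one field."""
    return Counter(doc[field] for doc in docs)

def drill_down(docs, field, value):
    """Keep only documents whose field equals the chosen facet value."""
    return [doc for doc in docs if doc[field] == value]

by_repo = facet_counts(commits, "commit_repo")
core_only = drill_down(commits, "commit_repo", "riak_core")
by_author = facet_counts(core_only, "commit_author")
print(by_repo)
print(by_author)
```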

HIGHLIGHTING GIVES YOUR RESULTS CONTEXT - ALLOWS QUICKER DETERMINATION OF RELEVANCY

DISMAX - DISjunction MAX - A QUERY HANDLER MEANT FOR DIRECT USER INPUT

All Nodes Up

All Nodes Up - Query

Node 4 Down

Node 4 Down - Query

Node 3 & 4 Down

Node 3 & 4 Down - Query

REPLICATION PROVIDES HIGH AVAILABILITY

[diagram: nodes 1, 2, 3, 4]

START WITH 4 NODES

Write 3 Replicas

[diagram: nodes 1, 2, 3, 4]

Take 2 Nodes Down

[diagram: nodes 1, 2, 3, 4]

1 REPLICA STILL AVAILABLE
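The claim that one replica survives can be checked by brute force. The sketch below assumes each of the 3 replicas lands on a different one of the 4 nodes (an assumption made for this illustration) and tries every replica placement against every pair of failed nodes.

```python
from itertools import combinations

NODES = [1, 2, 3, 4]

def always_available(nodes, n_replicas, n_down):
    """True if every placement of replicas keeps at least one copy for every failure set.

    Assumes each replica lives on a distinct node (illustrative assumption).
    """
    for replicas in combinations(nodes, n_replicas):
        for down in combinations(nodes, n_down):
            if not set(replicas) - set(down):
                return False  # every replica sat on a failed node
    return True

print(always_available(NODES, 3, 2))  # True
print(always_available(NODES, 3, 3))  # False
```

With 3 replicas, any 2 failures leave a copy reachable, while a third failure can take out all of them.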

WHAT IF DATA IS WRITTEN WHILE NODES ARE DOWN?

YZ Not Stored Yet

Store YZ Log

Query YZ - Node 1 & 2

Set Xfer Limit To 0

Start Nodes 3 & 4

Query Solr Direct

WHEN MAKING THIS DEMO I WAS EXPECTING THIS TO BE 0 BUT I FORGOT ABOUT AAE, WHICH STARTED KICKING IN BEFORE HANDOFF - SELF HEALING FTW!

Set Xfer Limit To 64

Handoff Occurs

0 Pending Xfers

Solr Direct (Again)

NOTICE IT’S NOW 301, UP FROM 54, MORE PROOF THAT HANDOFF OCCURRED - NOTE THIS QUERY IS GOING DIRECT TO ONLY 1 SHARD

Query Node 4 YZ

NOW HIT YOKOZUNA ON NODE4 (NOTICE CHANGE IN PORT #) - THIS WILL RUN A DISTRIBUTED SEARCH AND THUS RETURN THE CORRECT COUNT

Data Ownership

A VNODE

THE RING

Node Down

[ring diagram with X markers]

Write Fallback

[ring diagram with X markers]

Node Up

HINTED HANDOFF WILL MOVE REPLICA TO PRIMARY
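The node down, write fallback, node up sequence can be modeled at a very coarse level. The sketch below is a toy model with hypothetical names and node-level placement: a write picks 3 primaries, a primary that is down is replaced by a spare node that remembers which primary the replica is meant for, and handoff moves the replica once that primary returns. It is not Riak's actual ring or preference-list code.

```python
NODES = ["node1", "node2", "node3", "node4"]
N_VAL = 3  # replicas per write

def write_targets(key_index, down):
    """Return (holder, hint) pairs; hint names the intended primary when a fallback holds the replica."""
    primaries = [NODES[(key_index + i) % len(NODES)] for i in range(N_VAL)]
    spares = [n for n in NODES if n not in primaries and n not in down]
    targets = []
    for primary in primaries:
        if primary not in down:
            targets.append((primary, None))
        elif spares:
            targets.append((spares.pop(0), primary))  # fallback write with a hint
    return targets

def handoff(targets, now_up):
    """Move each hinted replica to its intended primary once that node is up again."""
    return [(hint, None) if hint in now_up else (holder, hint)
            for holder, hint in targets]

during_outage = write_targets(1, down={"node4"})
after_recovery = handoff(during_outage, now_up={"node4"})
print(during_outage)   # node1 holds node4's replica
print(after_recovery)  # replica is back on node4
```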

WHAT IF YOU RM -RF THE INDEX DIR?

Kill The Data

RM -RF THE INDEX DIRECTORY

KILL THE SOLR PROC

Auto Restart

YOKOZUNA NOTICES SOLR DIED AND AUTOMATICALLY RESTARTS IT

Node 4 - 0 Results

AAE Notices Missing Data

Node 4 - 13 Results

DATA IS RE-INDEXED OVER TIME

More AAE Repair

Node 4 - 128 Results

MORE INDEXES ARE REPAIRED, THIS CONTINUES UNTIL AAE REPAIRS ALL INDEXES

WHAT EVEN IS ACTIVE ANTI-ENTROPY?

Mo Systems Mo Failure

• index update could get lost

• files can become truncated/corrupted

• accidental `rm -rf`

• segfault at right time

• etc...

MYRIAD OF FAILURE SCENARIOS - FROM OBVIOUS TO NEARLY INVISIBLE

ENTROPY IS DAMAGE

AAE IS SELF HEALING

STRIKER!!!! EHEM, I MEAN, ENTROPY!!!!

REPAIR EFFICIENTLY - NOT STUPIDLY

Learn You Some Merkle For A Great Good

BIG UPS TO @jtuple FOR THE AAE DIAGRAMS

Segments

EACH SEGMENT IS A LIST OF KEY-HASH PAIRS

Segment Hashes

HASH OF HASHES IN SEGMENT

Hash O’ Hashes

WHAT HAPPENS DURING EXCHANGE?

Start With 2 Trees

Compare Top Hashes

TOP HASHES DON’T MATCH - SOMETHING IS DIFFERENT

Compare Child Hashes

NARROW DOWN THE DIVERGENT SEGMENT

Recur

NARROW DOWN THE DIVERGENT SEGMENT CONT...

Iter Key-Hash Pairs

ITER FINAL LIST OF HASHES TO FIND DIVERGENT KEYS

Repair Divergent Keys

REPAIR (RE-INDEX) KEYS THAT ARE DIVERGENT (RED)
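The exchange steps above can be sketched with a two-level hash tree. Each segment is a dict of key-hash pairs, a segment hash is a hash over those pairs, and the top hash is a hash over the segment hashes. The real trees have more levels, which is why the slides recurse. The function names, the use of SHA-1, and the sample data are assumptions made for this illustration.

```python
import hashlib

def h(data):
    """Hash a string; SHA-1 is an arbitrary choice for this sketch."""
    return hashlib.sha1(data.encode()).hexdigest()

def segment_hash(segment):
    """Hash of the key-hash pairs in one segment."""
    return h("".join(f"{k}:{v}" for k, v in sorted(segment.items())))

def top_hash(segments):
    """Hash of the segment hashes."""
    return h("".join(segment_hash(s) for s in segments))

def divergent_keys(tree_a, tree_b):
    """Compare top hashes, then segment hashes, then key-hash pairs."""
    if top_hash(tree_a) == top_hash(tree_b):
        return set()  # nothing differs, no segment needs reading
    keys = set()
    for seg_a, seg_b in zip(tree_a, tree_b):
        if segment_hash(seg_a) == segment_hash(seg_b):
            continue  # skip segments that agree
        for key in seg_a.keys() | seg_b.keys():
            if seg_a.get(key) != seg_b.get(key):
                keys.add(key)
    return keys

# Made-up trees: the second one is missing k4, so k4 needs repair.
tree_a = [{"k1": h("v1"), "k2": h("v2")}, {"k3": h("v3"), "k4": h("v4")}]
tree_b = [{"k1": h("v1"), "k2": h("v2")}, {"k3": h("v3")}]
print(divergent_keys(tree_a, tree_b))  # {'k4'}
```

Only the segment whose hash differs is read key by key, which is the point of "repair efficiently".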

CODE FOR DETECTION AND REPAIR - NOT PREVENTION

WHAT HAPPENS IF 3 NODES GO DOWN?

Stop 3 Nodes

Query

CONSISTENCY vs. AVAILABILITY

Uptime - Story of 9s

UPTIME = (MTBF - MTTR) / MTBF
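As a quick worked example of the formula, with made-up numbers: a system that fails on average every 1000 hours and takes 1 hour to repair has "three nines" of uptime.

```python
def uptime(mtbf_hours, mttr_hours):
    """Uptime as defined on the slide: (MTBF - MTTR) / MTBF."""
    return (mtbf_hours - mttr_hours) / mtbf_hours

# Made-up figures: mean time between failures 1000 h, mean time to repair 1 h.
print(uptime(1000, 1))   # 0.999
print(uptime(1000, 10))  # 0.99
```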

Uptime is Flawed

IF THE SYSTEM IS DOWN, BUT NO ONE MAKES A REQUEST, IS IT REALLY DOWN?

Yield - Uptime of the People

YIELD = QUERIES COMPLETED / QUERIES OFFERED

Harvest vs. Yield

HARVEST = DATA AVAIL / COMPLETE DATA

IN THE FACE OF FAILURE YOU CAN’T HAVE BOTH FOR A SINGLE REQUEST
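The trade-off can be made concrete with a made-up scenario: 100 queries are offered, and 10 of them touch a partition that is down (1 of 4). A system can either answer those 10 with partial data or fail them. The sketch below computes both ratios for each option; the function names and numbers are illustrative only.

```python
def yield_ratio(completed, offered):
    """Yield = queries completed / queries offered."""
    return completed / offered

def harvest_ratio(data_available, data_complete):
    """Harvest = data available / complete data."""
    return data_available / data_complete

# Option A: answer the 10 affected queries using the 3 partitions that are up.
yield_a = yield_ratio(100, 100)   # every query completes
harvest_a = harvest_ratio(3, 4)   # affected answers reflect 3/4 of the data

# Option B: fail the 10 affected queries rather than return partial data.
yield_b = yield_ratio(90, 100)    # 10 queries are not completed
harvest_b = harvest_ratio(4, 4)   # every completed answer is complete

print(yield_a, harvest_a)  # 1.0 0.75
print(yield_b, harvest_b)  # 0.9 1.0
```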

IN TIMES OF TROUBLE - YOKOZUNA CHOOSES HARVEST FOR QUERIES

TECHNICALLY - YOKOZUNA IS ALWAYS < 100% HARVEST IN A NON-QUIESCENT CLUSTER

YOKOZUNA FAVORS YIELD FOR WRITES

ONCE RIAK 1.4 SHIPS - YOKOZUNA LANDS IN MASTER

THANK YOU

HTTP://GITHUB.COM/BASHO/YOKOZUNA
