
Page 1: Cross-Platform Data Synchronization

Cross-Platform Data Synchronization

Dan Grover
Wonder Warp Software LLC

Friday, October 16, 2009

Good morning. I’m going to talk today about how you can write your own cross-platform data synchronization as part of your iPhone apps.

Page 2: Cross-Platform Data Synchronization

Outline

1 Why Syncing Is Important

2 Syncing Through The Ages, and why you still might want to write your own

3 Algorithms & Architecture

4 Implementing Sync in Obj-C


Here’s what we’re going to talk about today.
- First, I want to persuade you why data synchronization is important, and why you might want to add it to your app.
- Next, I’ll explain the ways data syncing has been solved by apps before, the advantages and disadvantages of various approaches, and explain why you may want to write your own.
- Then we’re going to go over different algorithms that you can use to write your data synchronization code. I’m going to be very abstract and handwavy because it’s hard to talk about this kind of stuff when you’re also talking about implementation details.
- Finally, we’ll dig in and talk about how to actually implement this stuff in Objective-C using the Cocoa APIs available to you.

Page 3: Cross-Platform Data Synchronization

Who I Am

• Former Northeastern student

• Independent software developer


Page 4: Cross-Platform Data Synchronization

How I Learned About Syncing

Page 5: Cross-Platform Data Synchronization

ShoveBox for Mac


I write an app called ShoveBox for the Mac. (describe)

Page 6: Cross-Platform Data Synchronization


Last November, I get an email from a friend of mine involved in the local Mobile Monday group here in Boston. They were going to do a fancy event at the Omni Parker House on “up and coming” mobile companies. Unfortunately, they couldn’t find enough up and coming mobile companies, so they asked me to present instead.

“So do you have anything you would like to present?”

At that time, I was mostly focused on Mac software -- I had a game out, but nothing much. So I said “Oh, of course, I can demo the new iPhone version of ShoveBox”

Unfortunately, there was no iPhone version of ShoveBox. I didn’t really want to do one. It was kind of beyond the scope of the app. And syncing was HAAARRRD.

So I made up a functional prototype of the iPhone app. I added a pretend dialog to the Mac app to show it syncing. I had a script that I used to convert the example data over so it looked the same.

Page 7: Cross-Platform Data Synchronization

?

I actually *did* want to make the iPhone version for real, though. But I had no idea how it was going to work. I played around with a few half-way solutions -- storing the new entries and just propagating those on sync. But I realized that real, honest-to-god two-way syncing was doable if I just sat down and thought about it for a while. I studied all the ways that people are doing syncing and realized it wouldn’t be too hard to write my own from scratch. Sounds crazy.

Page 8: Cross-Platform Data Synchronization


A few months later, I finally ship the iPhone version. Sales quadruple, it gets two reviews in Macworld. Still some bugs with syncing, but eventually those get ironed out.

Page 9: Cross-Platform Data Synchronization

Quick Demo


Page 10: Cross-Platform Data Synchronization

Outline

1 Why Syncing Is Important

2 Syncing Through The Ages, and why you still might want to write your own

3 Algorithms & Architecture

4 Implementing Sync in Obj-C


Page 11: Cross-Platform Data Synchronization

Why Syncing is Important

1

I’m going to get on my soapbox for a moment and explain briefly why I think this is an important topic, and how it’s applicable to more apps than you’d think.

Page 12: Cross-Platform Data Synchronization

Syncing has been something people have been trying to solve for a long time. If you follow the current hype, we don’t have to worry about it because...

Page 13: Cross-Platform Data Synchronization

the

CLOUD

...you put everything on the cloud! The cloud will solve all our problems!

The popular conception of the trend of “cloud computing” is a little wrong. People think of it as a monolithic thing.

Page 14: Cross-Platform Data Synchronization


But the reality is that the huge benefit of cloud computing is that you can outsource the right things to the right people. I use one company for sending my email newsletters, because they have the best infrastructure and software for that. I use another for my regular web hosting, and yet another to host downloads. And I use a help desk app called Zendesk. So it’s not really on “the cloud” -- it’s on a lot of clouds!

So we’re back to the same problem -- data is going to be in a ton of different places, and you have to build systems that can deal with that. Sync plays a big part.

Page 15: Cross-Platform Data Synchronization

A CLOUD    A CLOUD    A CLOUD    A CLOUD

So the future’s more complicated than it seems. It’s not “the cloud”, but lots of clouds and client apps and platforms and apparently goats. And they all have to share data but operate independently.

Page 16: Cross-Platform Data Synchronization

Does your app pass the Green Line Test?


And if you don’t think data synchronization applies to your app, I’d like you all to try this while you’re in the city. I call it the Green Line Test.

Page 17: Cross-Platform Data Synchronization

I used to live near Lechmere in East Cambridge, and I’d commute in to classes at Northeastern using the Green Line. The Green Line touches a lot of areas of Boston and goes above ground and below. Some of the stations underground are dead, some have reception. Inevitably, the stations where the train stops inexplicably for 20 minutes will be the ones that don’t. You see, they’ve upgraded all the trains and haven’t quite got all the kinks worked out.

If your app is one of those “thin” or “hybrid” apps that needs to make an HTTP request to do anything, you should try running your app for the entirety of a Green Line ride. How does it handle it when you lose connectivity for a minute? Pop up an error? Or stall indefinitely? How good an experience is it? Do you cache things well, or does it always need a connection?

If you find that it’s not very good in this situation, you should consider making more of your application operate on the device itself, and then sync its state back to the cloud. It will be more responsive and usable more of the time. You’ve probably avoided something like this because, well, syncing is a pain. But what I’m going to talk about in this presentation will help.

Page 18: Cross-Platform Data Synchronization

Syncing Through The Ages
and
Why You Still Might Want To Write Your Own

2

Page 19: Cross-Platform Data Synchronization

I thought this tweet from Steven Frank was funny. It’s true. It never works.

I think that’s because there’s not a lot of knowledge about syncing out there. There are a lot of companies that have written (bad) syncing, and a few academic papers on it. But not a lot of talk about syncing as a subject. If more people didn’t have to waste all this time learning the basics for themselves, we could have better syncing as more people work out the kinks and integrate it in more systems.

Page 20: Cross-Platform Data Synchronization

Set-Reconciliation Problem


Academics call syncing the “set reconciliation problem”. You’ve got two sets, and you want to reconcile their differences. The literature on it is pretty limited though.

Page 21: Cross-Platform Data Synchronization

rsync


Page 22: Cross-Platform Data Synchronization

Subversion


Subversion is a kind of syncing a lot of us probably use every day. Like most version control systems, the idea is that your whole team can have the most current copy of the code.

Page 23: Cross-Platform Data Synchronization

Data ≠ Files


But it’s important to note that there’s a big difference between syncing *data* and syncing *files*. Syncing data is a LOT harder!

Page 24: Cross-Platform Data Synchronization

Dropbox

Dropbox is a consumer file syncing solution. But it actually ends up working a lot more like Subversion than you’d think. It keeps revisions and actually handles conflicts in a neat way.

Page 25: Cross-Platform Data Synchronization

HotSync


Palm was one of the first companies to try to make a comprehensive syncing solution for consumers.

The way HotSync works is that, once you’ve done the first sync, the Palm would set these status flags on any piece of data that you changed. That would make it really fast to sync back up with your PC, because the PC had an old copy of the data that both devices had the last time you synced.
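The status-flag idea can be sketched roughly like this. It is a minimal Python illustration, not Palm’s actual implementation: the `Record` class, its `dirty` flag, and `fast_sync` are hypothetical names, and deletions and conflicts are ignored.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record carrying a status flag that is set on every local change."""
    key: str
    value: str
    dirty: bool = False

    def edit(self, new_value: str) -> None:
        self.value = new_value
        self.dirty = True  # remember that this record changed since the last sync


def fast_sync(handheld: dict, desktop: dict) -> list:
    """Transfer only the flagged records to the desktop, then clear their flags."""
    sent = []
    for key, record in handheld.items():
        if record.dirty:
            desktop[key] = record.value
            record.dirty = False
            sent.append(key)
    return sent


handheld = {"a": Record("a", "milk"), "b": Record("b", "eggs")}
desktop = {"a": "milk", "b": "eggs"}

handheld["b"].edit("eggs and bread")
changed = fast_sync(handheld, desktop)  # only the edited record is transferred
```

The cost is that every client has to maintain the flags faithfully, which is the trade-off that comes up again when comparing history-based and ex-post-facto syncing.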

Page 26: Cross-Platform Data Synchronization

Sync Services
Mac OS X

Sync Services is Apple’s syncing framework. It’s pretty neat, and if you were like me and trying to write a Mac app that synced with an iPhone app, it would *almost* work.

Page 27: Cross-Platform Data Synchronization

Your App Truth Database

Macs

Sync Services


Sync Services has this concept of a “Truth Database” -- where you replicate all your data so that it can sync it elsewhere. It gives you lots of goodies to sync your app to the Truth database -- pushing and pulling changes. They give you tools to define the schema you want the Truth to keep for your data.

But then it gets magically put on MobileMe and synced to other Macs. You don’t have any control over that.

The iPhone supports MobileMe, but only for syncing contacts, appointments, and notes. It doesn’t read in the truth database from Sync Services, it’s totally separate. There is no Sync Services for the iPhone.

So that’s kind of a bummer.

Page 28: Cross-Platform Data Synchronization

Two Approaches:

History-Based

Ex-Post-Facto

Page 29: Cross-Platform Data Synchronization

History-Based

PROS
- Efficient and accurate

CONS
- All client software must maintain status flags/history
- Does not scale as well
- Complicated

Ex-Post-Facto

PROS
- Easy to bolt onto an existing system
- Hot swappable: arbitrary configurations of devices in any state can be synced

CONS
- Syncing can be slower
- Requires accurate date/time info

Page 30: Cross-Platform Data Synchronization

History-Based: Subversion, Dropbox, HotSync (Fast)

Ex-Post-Facto: Rsync, Sync Services, HotSync (Slow)

Page 31: Cross-Platform Data Synchronization

When To Write Your Own

• When your schema demands custom handling
  • Dependencies
  • Ordering
• When data needs to be specially converted and prepared for different clients/devices
  • iTunes and iPod Shuffles
• When it’s a core function

Page 32: Cross-Platform Data Synchronization

Algorithms and Architecture

3

Page 33: Cross-Platform Data Synchronization

A    A ∩ B    B

So in these algorithms, we’re going to be a little abstract and think of this as two sets of data.
- A is all the data that’s on your first device, B is all the data that’s on the second device.
- Here’s all the data that’s *only on A*. That needs to be put on B if it was added, deleted from A if not.
- Here’s all the data that’s *only on B*. That needs to be put on A if it was added, deleted from B if not.
- Here’s the data that’s on both. This is the trickiest part. We need to sift through this data and figure out if any of it has been modified since the last sync. We need to merge modifications when we can, and otherwise, ask the user to resolve the conflict.
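To make the three regions concrete, here is a small Python sketch using plain sets. The `partition` helper is a hypothetical name, and in a real app the members would be object identifiers rather than titles.

```python
def partition(a_ids: set, b_ids: set):
    """Split two sets of object IDs into only-on-A, only-on-B, and on-both."""
    only_a = a_ids - b_ids   # candidates to copy to B or delete from A
    only_b = b_ids - a_ids   # candidates to copy to A or delete from B
    both = a_ids & b_ids     # need a modification check
    return only_a, only_b, both


a = {"Good Will Hunting", "The Departed", "21"}
b = {"Good Will Hunting", "The Boondock Saints", "With Honors"}

only_a, only_b, both = partition(a, b)
```

Everything that follows is about deciding what to do with each of those three sets.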

Page 34: Cross-Platform Data Synchronization


Page 35: Cross-Platform Data Synchronization

Goal of a Sync Algorithm

Make Two Sets The Same (duh!)

... in a way consistent with user expectations

... as quickly as possible

So what is the goal of any sync algorithm? To make both sets of data the same.

Well, that part is pretty easy. I could just erase what’s on your server account and erase what’s on your iPhone. Done!

Turns out it’s more complicated. There are a lot of *correct* ways to make this happen, but only some of them are what the user is expecting to see.

The sync also has to be fast. This usually means a minimum of data being transferred.

Page 36: Cross-Platform Data Synchronization

Three Algorithms

Copy Sync Merge


But there are a few ways to skin a cat. Let’s look at each of these. They all meet the definition we discussed, but go about it differently.

Page 37: Cross-Platform Data Synchronization

Copy (before)

A: Good Will Hunting, The Departed, 21

B: Good Will Hunting, Spenser: For Hire, The Boondock Saints, With Honors

Page 38: Cross-Platform Data Synchronization

Copy (after)

A: Good Will Hunting, The Departed, 21

B: Good Will Hunting, The Departed, 21

Page 39: Cross-Platform Data Synchronization

Merge (before)

A: Good Will Hunting, The Departed, 21

B: Good Will Hunting, Spenser: For Hire, The Boondock Saints, With Honors

Page 40: Cross-Platform Data Synchronization

Merge (after)

A: Good Will Hunting, The Departed, 21, Spenser: For Hire, The Boondock Saints, With Honors

B: Good Will Hunting, The Departed, 21, Spenser: For Hire, The Boondock Saints, With Honors

Page 41: Cross-Platform Data Synchronization

Sync (before)

last sync = 12PM, now = 3PM

A:
- The Departed (created: 2PM, modified: 2PM)
- 21 (created: 11AM, modified: 11AM)
- Good Will… (created: 11AM, modified: 11AM)

B:
- Boondock… (created: 1PM, modified: 1PM)
- With Honors (created: 2PM, modified: 2PM)
- Good Will… (created: 11AM, modified: 11AM)

Page 42: Cross-Platform Data Synchronization

Sync (after)

last sync = 12PM, now = 3PM

A:
- The Departed (created: 2PM, modified: 2PM)
- Good Will… (created: 11AM, modified: 11AM)
- Boondock… (created: 1PM, modified: 1PM)
- With Honors (created: 2PM, modified: 2PM)

B:
- Good Will… (created: 11AM, modified: 11AM)
- The Departed (created: 2PM, modified: 2PM)
- Boondock… (created: 1PM, modified: 1PM)
- With Honors (created: 2PM, modified: 2PM)

Page 43: Cross-Platform Data Synchronization

Three Algorithms

Copy Sync Merge

So let’s go back here and talk about when to use each of these algorithms:

SYNC: This is what you’re going to want to do 95% of the time. The other two algorithms are for when you’re first setting two devices up to sync.

COPY: Some people doing sync like to offer you a choice of data on either device to become the “one true” set of data.

MERGE: What I do with ShoveBox is just do a merge the first time -- because there might be data on both devices they want to keep. It avoids any confusion over the choice, and nobody’s going to be pissed with the initial result.
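The two first-time operations are simple enough to sketch in a few lines of Python. This assumes each device is modeled as a dictionary from object ID to object; `copy_sync` and `merge_sync` are hypothetical names, and the merge leaves objects that already exist on both sides untouched.

```python
def copy_sync(a: dict, b: dict) -> None:
    """COPY: replace B's contents with A's, so A becomes the one true set."""
    b.clear()
    b.update(a)


def merge_sync(a: dict, b: dict) -> None:
    """MERGE: non-destructive union; nothing is deleted from either side."""
    for uid, obj in list(a.items()):
        b.setdefault(uid, obj)  # add what B is missing
    for uid, obj in list(b.items()):
        a.setdefault(uid, obj)  # add what A is missing


a = {1: "Good Will Hunting", 2: "The Departed", 3: "21"}
b = {1: "Good Will Hunting", 4: "Spenser: For Hire",
     5: "The Boondock Saints", 6: "With Honors"}

merged_a, merged_b = dict(a), dict(b)
merge_sync(merged_a, merged_b)   # both sides end up with all six titles

copied_a, copied_b = dict(a), dict(b)
copy_sync(copied_a, copied_b)    # B ends up with exactly A's three titles
```

Neither operation needs dates or a last-sync time, which is why they suit the very first sync between two devices.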

Page 44: Cross-Platform Data Synchronization

Needed for Sync

• On each device, each object needs:
  • Creation Date
  • Modification Date
  • UUID
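As a rough picture of that per-object metadata, here is a Python sketch. The `SyncedObject` class and its field names are assumptions made for illustration, and dates are stored in UTC so that devices in different time zones compare consistently.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class SyncedObject:
    """Hypothetical record carrying the three pieces of sync metadata."""
    title: str
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))  # stable identity across devices
    created: datetime = field(default_factory=_utc_now)
    modified: datetime = field(default_factory=_utc_now)

    def touch(self) -> None:
        """Call after any change so the next sync notices it."""
        self.modified = _utc_now()


first = SyncedObject("The Departed")
second = SyncedObject("21")
```

The identifier is generated once, on the device that creates the object, and travels with it; titles or database row numbers are not safe substitutes because two devices can produce the same one independently.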

Page 45: Cross-Platform Data Synchronization

Sync: In Depth

PREPARE

SYNC OBJECTS IN ONLY A

SYNC OBJECTS IN ONLY B

SYNC INTERSECTION

CLEAN UP

Page 46: Cross-Platform Data Synchronization

Sync: In Depth

PREPARE

SYNC OBJECTS IN ONLY A

SYNC OBJECTS IN ONLY B

SYNC INTERSECTION

CLEAN UP

Page 47: Cross-Platform Data Synchronization

PREPARE

• Establish Communication With Sources
• Grab summaries from A and B
  • UUIDs, creation, modification
• Sort into sets

Page 48: Cross-Platform Data Synchronization

Sync: In Depth

PREPARE

SYNC OBJECTS IN ONLY A

SYNC OBJECTS IN ONLY B

SYNC INTERSECTION

CLEAN UP

Page 49: Cross-Platform Data Synchronization

SYNC OBJECTS IN ONLY A

• For each object o in a:
  • if o.creation > last sync then
    • tell b to copy o over
  • else
    • tell a to delete o
  • end if
• next
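Here is one way that pass might look in Python. It is a sketch under simple assumptions: each source is a dictionary from ID to a record with a `created` date, and `sync_one_sided` is a hypothetical name. Swapping the arguments gives the matching pass for objects only in B.

```python
from datetime import datetime


def sync_one_sided(here: dict, there: dict, last_sync: datetime) -> None:
    """Handle objects that exist on `here` but not on `there`."""
    for uid in list(here.keys() - there.keys()):
        obj = here[uid]
        if obj["created"] > last_sync:
            there[uid] = obj   # created since the last sync: copy it over
        else:
            del here[uid]      # it existed at the last sync, so the other side deleted it


last_sync = datetime(2009, 10, 16, 12, 0)
a = {
    "departed": {"created": datetime(2009, 10, 16, 14, 0)},
    "21": {"created": datetime(2009, 10, 16, 11, 0)},
}
b = {"boondock": {"created": datetime(2009, 10, 16, 13, 0)}}

sync_one_sided(a, b, last_sync)  # objects only in A
sync_one_sided(b, a, last_sync)  # objects only in B
```

With the times from the earlier example, “21” predates the last sync and is missing from B, so it gets deleted from A, while the two newer titles are copied across.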

Page 50: Cross-Platform Data Synchronization

Sync: In Depth

PREPARE

SYNC OBJECTS IN ONLY A

SYNC OBJECTS IN ONLY B

SYNC INTERSECTION

CLEAN UP

Page 51: Cross-Platform Data Synchronization

SYNC OBJECTS IN ONLY B

• For each object o in b:
  • if o.creation > last sync then
    • tell a to copy o over
  • else
    • tell b to delete o
  • end if
• next

Page 52: Cross-Platform Data Synchronization

Sync: In Depth

PREPARE

SYNC OBJECTS IN ONLY A

SYNC OBJECTS IN ONLY B

SYNC INTERSECTION

CLEAN UP

Page 53: Cross-Platform Data Synchronization

SYNC INTERSECTION

• For each object o in both a and b:
  • if o.modification < last sync on both a and b then
    • skip it
  • else
    • if only a’s mod > last sync then
      • propagate a’s version to b
    • else if only b’s mod > last sync then
      • propagate b’s version to a
    • else if both a and b’s mod > last sync then
      • present conflict
    • end if
  • end if
• next
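The intersection pass can be sketched in Python along these lines. The dictionary layout and the `sync_intersection` name are assumptions, and conflicts are simply collected and returned rather than shown to a user.

```python
from datetime import datetime


def sync_intersection(a: dict, b: dict, last_sync: datetime) -> list:
    """Propagate one-sided edits; return the IDs that changed on both sides."""
    conflicts = []
    for uid in sorted(a.keys() & b.keys()):
        a_changed = a[uid]["modified"] > last_sync
        b_changed = b[uid]["modified"] > last_sync
        if a_changed and b_changed:
            conflicts.append(uid)   # both edited: someone has to choose
        elif a_changed:
            b[uid] = dict(a[uid])   # propagate A's version to B
        elif b_changed:
            a[uid] = dict(b[uid])   # propagate B's version to A
        # neither changed: skip it
    return conflicts


eleven = datetime(2009, 10, 16, 11, 0)
noon = datetime(2009, 10, 16, 12, 0)
one_pm = datetime(2009, 10, 16, 13, 0)

a = {"x": {"title": "new on A", "modified": one_pm},
     "y": {"title": "old", "modified": eleven},
     "z": {"title": "A edit", "modified": one_pm}}
b = {"x": {"title": "old", "modified": eleven},
     "y": {"title": "old", "modified": eleven},
     "z": {"title": "B edit", "modified": one_pm}}

conflicts = sync_intersection(a, b, noon)
```

Here “x” flows from A to B, “y” is skipped, and “z” comes back as a conflict because both sides touched it after noon.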

Page 54: Cross-Platform Data Synchronization

Sync: In Depth

PREPARE

SYNC OBJECTS IN ONLY A

SYNC OBJECTS IN ONLY B

SYNC INTERSECTION

CLEAN UP

Page 55: Cross-Platform Data Synchronization

CLEAN UP

• tell a and b we’re finished
• store current time as last sync

Page 56: Cross-Platform Data Synchronization

What’s wrong with this?

1. Single last-sync date can cause problems with partial syncs.

SOLUTION Sync engine keeps per-item last-sync dates

2. Single modification date makes merging hard

SOLUTION Keep per-attribute modification dates on each source


Page 57: Cross-Platform Data Synchronization

INTERSECTION REVISITED

• else if both a and b’s mod > last sync then
  • let c = new list of conflicting keys
  • let e = new entry record
  • for each key k on o
    • if a[o].k == b[o].k then
      • e.k = a[o].k
    • else
      • if only a[o].k.mod > o.last sync then
        • e.k = a[o].k
      • else if only b[o].k.mod > o.last sync then
        • e.k = b[o].k
      • else
        • c += k
      • end if
    • end if
  • next
  • if c.count > 0 then
    • present conflict to user
    • e = a | b
  • end if
  • push e to a and b
• next entry
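A per-attribute merge along these lines might look like the following Python sketch. It assumes each entry maps a key to a `(value, modified)` pair and that both sides carry the same keys; `merge_entry` is a hypothetical name, and conflicting keys are returned to the caller instead of being presented to a user.

```python
from datetime import datetime


def merge_entry(a_entry: dict, b_entry: dict, last_sync: datetime):
    """Merge one entry key by key; return (merged values, conflicting keys)."""
    merged, conflicts = {}, []
    for key, (a_value, a_mod) in a_entry.items():
        b_value, b_mod = b_entry[key]
        a_changed = a_mod > last_sync
        b_changed = b_mod > last_sync
        if a_value == b_value:
            merged[key] = a_value    # same on both sides, nothing to decide
        elif a_changed and not b_changed:
            merged[key] = a_value    # only A edited this key
        elif b_changed and not a_changed:
            merged[key] = b_value    # only B edited this key
        else:
            conflicts.append(key)    # edited on both sides: needs a decision
    return merged, conflicts


eleven = datetime(2009, 10, 16, 11, 0)
noon = datetime(2009, 10, 16, 12, 0)
one_pm = datetime(2009, 10, 16, 13, 0)

a_entry = {"title": ("Departed, The", one_pm),
           "notes": ("", eleven),
           "rating": (4, one_pm)}
b_entry = {"title": ("The Departed", eleven),
           "notes": ("Oscar winner", one_pm),
           "rating": (5, one_pm)}

merged, conflicts = merge_entry(a_entry, b_entry, noon)
```

With a single modification date this entry would be one big conflict; with per-key dates only the rating needs a human.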

Page 58: Cross-Platform Data Synchronization

Going Further

• On textual keys, if the same key on the same entry was modified on both sources, use diff to do a text merge, and only ask the user to select one version or the other if there is a text merge conflict

Page 59: Cross-Platform Data Synchronization

Architecture


Page 60: Cross-Platform Data Synchronization

Architecture

Syncer


Page 61: Cross-Platform Data Synchronization

Architecture

Syncer

Source A    Source B

Page 62: Cross-Platform Data Synchronization

Architecture

Syncer

Source A    Source B

Local SQLite DB    Web Service

Page 63: Cross-Platform Data Synchronization

Architecture

Web Service

iPhone App

Web Service


Page 64: Cross-Platform Data Synchronization

Architecture

Web Service

The Cloud

Web Service

iPhone App


Page 65: Cross-Platform Data Synchronization

Architecture

iPhone App

Web Service

Mac App

Page 66: Cross-Platform Data Synchronization

Architecture

Syncer

Source A    Source B

Page 67: Cross-Platform Data Synchronization

Sync Source Abstraction

• A sync source supports:
  • Create/Overwrite Object
  • Delete Object
  • Get Object
  • Get summary
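In Python terms, the abstraction could be sketched as an abstract base class with those four operations. The class and method names here are assumptions for illustration; the in-memory source stands in for a local database or a web service wrapper.

```python
from abc import ABC, abstractmethod


class SyncSource(ABC):
    """Hypothetical interface the syncer talks to, whatever sits behind it."""

    @abstractmethod
    def put(self, uid: str, obj: dict) -> None:
        """Create the object, or overwrite it if it already exists."""

    @abstractmethod
    def delete(self, uid: str) -> None:
        """Remove the object with this ID."""

    @abstractmethod
    def get(self, uid: str) -> dict:
        """Return the full object with this ID."""

    @abstractmethod
    def summary(self) -> dict:
        """Return {uid: (created, modified)} without transferring full objects."""


class InMemorySource(SyncSource):
    def __init__(self) -> None:
        self._objects = {}

    def put(self, uid, obj):
        self._objects[uid] = dict(obj)

    def delete(self, uid):
        self._objects.pop(uid, None)

    def get(self, uid):
        return dict(self._objects[uid])

    def summary(self):
        return {uid: (obj["created"], obj["modified"])
                for uid, obj in self._objects.items()}


source = InMemorySource()
source.put("u1", {"title": "21", "created": 1, "modified": 2})
source.put("u2", {"title": "With Honors", "created": 3, "modified": 3})
source.delete("u2")
```

Because the syncer only ever sees this interface, the same sync logic can run against a local database on one side and a network peer on the other.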

Page 68: Cross-Platform Data Synchronization

Implementing Sync in Objective-C

4

Page 69: Cross-Platform Data Synchronization

UUIDs

example: DBCE017A-AF95-11DE-98BE-228156D89593

how to generate:

CFUUIDRef uuid = CFUUIDCreate(kCFAllocatorDefault);
CFStringRef uuidString = CFUUIDCreateString(kCFAllocatorDefault, uuid);
CFRelease(uuid); // Create rule: the caller owns uuid and uuidString and must release both

Page 70: Cross-Platform Data Synchronization

Dates

• NSDate represents an absolute point in time; it carries no time zone of its own
• You can compare two NSDate objects or two timestamps
• UNIX Timestamp (1970)
• NSDate Timestamp (2001)
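The two timestamp conventions differ only by a fixed offset, which this Python sketch works out. The function names are hypothetical; the offset itself is the number of seconds between January 1, 1970 and January 1, 2001, both at midnight UTC.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
REFERENCE_DATE = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Seconds between the UNIX epoch (1970) and the NSDate reference date (2001)
EPOCH_OFFSET = int((REFERENCE_DATE - UNIX_EPOCH).total_seconds())


def unix_to_reference(unix_seconds: float) -> float:
    """Convert seconds since 1970 into seconds since 2001."""
    return unix_seconds - EPOCH_OFFSET


def reference_to_unix(reference_seconds: float) -> float:
    """Convert seconds since 2001 into seconds since 1970."""
    return reference_seconds + EPOCH_OFFSET


talk_day_unix = datetime(2009, 10, 16, tzinfo=timezone.utc).timestamp()
talk_day_reference = unix_to_reference(talk_day_unix)
```

Whichever convention goes over the wire, both sides have to agree on it; comparing a 1970-based number against a 2001-based one makes every object look about 31 years older or newer than it is.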

Page 71: Cross-Platform Data Synchronization

Syncing with Core Data

• Set modification date in -willSave
• Check -isUpdated and -changedValues
• Don’t update the mod date if it’s just the mod date that changed.
• Set creation date, mod date, and GUID in -awakeFromInsert
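The reason for the “just the mod date” check is easier to see in a toy model. The Python sketch below is not Core Data: `TrackedObject` is a hypothetical stand-in for a managed object, `will_save` plays the role of -willSave, and `_changed` plays the role of the keys in -changedValues.

```python
class TrackedObject:
    """Toy stand-in for a managed object that tracks which keys changed."""

    def __init__(self, **values):
        self._values = dict(values)
        self._changed = set()

    def get(self, key):
        return self._values[key]

    def set(self, key, value):
        self._values[key] = value
        self._changed.add(key)

    def will_save(self, now):
        if not self._changed:
            return                      # nothing was updated
        if self._changed == {"modified"}:
            return                      # only the mod date changed: leave it alone
        self.set("modified", now)       # a real edit: stamp it

    def save(self, now):
        self.will_save(now)
        self._changed.clear()


entry = TrackedObject(title="draft", modified=100)

entry.set("title", "final")
entry.save(now=200)
local_stamp = entry.get("modified")    # stamped by the save

entry.set("modified", 150)             # e.g. a date written directly by the sync code
entry.save(now=300)
remote_stamp = entry.get("modified")   # kept as written, not overwritten
```

Without the guard, a save that only wrote a modification date would restamp the object with the current time, and it would look freshly edited at the next sync.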

Page 72: Cross-Platform Data Synchronization

Networking

• Protocol choices:
  • HTTP
  • GameKit
  • BEEP/BLIP-based protocol
  • Roll your own (not recommended)
• Using Bonjour/ZeroConf

You have a few choices for your protocol.

If you’re communicating with a server, you can make yourself a web service API. Your sync source is just wrapping code that makes NSURLRequests.

I made the unfortunate choice of using HTTP locally over the network. Writing an HTTP server that just has to talk with one other device isn’t too hard, but it was a really dumb architectural decision. Routers like to screw with it, even when it’s on a non-standard port.

Page 73: Cross-Platform Data Synchronization

Some (Bad) Syncing Code from My App

Page 74: Cross-Platform Data Synchronization

ShoveBox Mac App

SBSyncEngine

SBIPhoneSyncSource

SBSyncSource

SBLocalDBSyncSource


Page 75: Cross-Platform Data Synchronization

- (id) initWithLastSyncDate:(NSDate *)lastSync
                    sourceA:(NSObject<SBSyncSource> *)a
                    sourceB:(NSObject<SBSyncSource> *)b
                  operation:(SBSyncEngineOperation)newOperation;

- (IBAction) start:(id)sender;
- (IBAction) cancel:(id)sender;

- (NSDate *) lastSyncDate;
- (NSString *) currentlySyncingObjectName;
- (SBSyncEngineOperation) operation;

- (NSObject<SBSyncSource> *) sourceA;
- (NSObject<SBSyncSource> *) sourceB;

- (NSObject<SBSyncEngineDelegate> *) delegate;
- (void) setDelegate:(NSObject<SBSyncEngineDelegate> *)theDelegate;

Page 76: Cross-Platform Data Synchronization

typedef enum SBSyncEngineOperation {
    SBSyncEngineOperationSync = 0,  // Time-based sync A and B
    SBSyncEngineOperationMerge = 1, // Non-destructive merge between A and B
    SBSyncEngineOperationCopy = 2,  // Replace B's contents with A's
} SBSyncEngineOperation;

Page 77: Cross-Platform Data Synchronization

@protocol SBSyncEngineDelegate

- (void) syncEngineFinishedSyncingSuccesfully:(SBSyncEngine *)syncEngine;
- (void) syncEngineDidCancel:(SBSyncEngine *)syncEngine;
- (void) syncEngine:(SBSyncEngine *)syncEngine abortedWithError:(NSError *)err;

// return YES to continue, NO to cancel
- (BOOL) syncEngine:(SBSyncEngine *)syncEngine pausedWithRecoverableError:(NSError *)err;

- (void) syncEngine:(SBSyncEngine *)syncEngine syncedObjects:(NSUInteger)objects ofTotal:(NSUInteger)total;

// return the index of the correct choice
- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine encounteredEntryConflictWithA:(NSDictionary *)aEntryInfo b:(NSDictionary *)bEntryInfo;
- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine encounteredFolderConflictWithA:(NSDictionary *)aFolderInfo b:(NSDictionary *)bFolderInfo;
- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine encounteredSimpleEntityConflictWithKeyPath:(NSString *)keyPath aValue:(id)aValue bValue:(id)bValue;

@end

Page 78: Cross-Platform Data Synchronization

Questions/Discussion
