Moving Apache CouchDB Data to Cloudant

The path to scalable, always-on, and managed CouchDB as a service

February 2013




Cloudant Overview

Cloudant provides a managed, cloud database as a service (DBaaS) that is based on Apache CouchDB. Cloudant is a fast, always-on, scalable database that the big data experts at Cloudant operate and grow for you so you can stay focused on new development and not on database administration.

Samsung, Microsoft, Salesforce.com, DHL, Hothead Games, Flurry, and thousands of other developers of large-scale or fast-growing Web and mobile applications use Cloudant. The Cloudant DBaaS features:

• A schema-less (NoSQL) JSON document store

    o For operational data – optimized for concurrent reads & writes, high availability, and data durability

    o Monitored, scaled, and managed by big-data experts at Cloudant

    o Accessed via an Apache CouchDB-compatible, RESTful API

• APIs for specialized data management features:

    o Data replication & sync with mobile devices or local data centers

    o Full-text indexing & search (Apache Lucene powered)

    o Geo-spatial indexing and analytics

    o Incremental MapReduce for real-time analytics

• Global data distribution

    o Across a network of data centers in North America, Europe, and Asia

    o Fault-tolerance via cross-data center data distribution

    o Multiple hosting options – AWS, Azure, Joyent, Rackspace, SoftLayer

    o Geo-load balancing connects users to the closest data source for lower data access latency

Why Transition to Cloudant?

Developers choose CouchDB for a variety of reasons: schema freedom, ease of development, and replication & sync, to name a few. But CouchDB can be difficult to scale out to handle larger workloads. The Cloudant DBaaS is based on Apache CouchDB and has been enhanced with a horizontal scaling framework, Lucene full-text search, geo-spatial indexing, fault tolerance, and other features not found in CouchDB so that you can:

Build More

Cloudant enhances the CouchDB development experience with built-in scaling, fault tolerance, Lucene-powered full-text indexing and search, and geo-spatial indexing. Having these built into your data layer makes it easier to enrich your apps with advanced data management features.

Grow More

Growing a CouchDB database to hold more data or support many more users is hard to do. Cloudant includes a horizontal scaling and fault-tolerance framework that makes this easy; it was initially developed to manage the petabytes of data that the Large Hadron Collider generates every second so that it could be accessed by physics researchers around the world.

Sleep More

Keeping CouchDB running smoothly is a 24x7 operation, and we do that for you. We monitor, grow (reconfigure, repartition/rebalance clusters), protect, and administer your data layer around the clock so you can get a good night’s sleep.


Moving Your Data to Cloudant

Migrating data from CouchDB to Cloudant is conceptually straightforward. It involves:

1. Replicating data from your CouchDB database to Cloudant

2. Optionally, adjusting your CouchDB design docs

The process generally takes a day or two depending on the scope of your application.

Taking a Phased Approach

Migrating your current data layer to Cloudant can be done in phases; it does not have to be an “all or nothing” process. You can start by migrating a single database to Cloudant while other data continues to reside on other servers.

Good candidates for migration include databases that need to be scaled out. Rather than configuring your own CouchDB cluster and partitioning your CouchDB data across it, consider moving your data to Cloudant. Your data will be scaled out by Cloudant as part of the process.

Importing Your Data into Cloudant

If Cloudant will hold your data in the same database and JSON structure as your existing CouchDB database does, you can simply replicate data from your CouchDB database to Cloudant.
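As a minimal sketch of the replication path, the request below targets CouchDB's `/_replicate` endpoint on the source server. The server address, database name, and Cloudant account URL are placeholders chosen for illustration, not values from this paper, and the helper name `build_replication_request` is hypothetical. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def build_replication_request(couch_url, source_db, target_url, create_target=True):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = {
        "source": source_db,             # database name on the source CouchDB server
        "target": target_url,            # full URL of the remote (Cloudant) database
        "create_target": create_target,  # ask CouchDB to create the target if missing
    }
    return urllib.request.Request(
        url=couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own server, database, and account.
    req = build_replication_request(
        "http://localhost:5984",
        "mydb",
        "https://USERNAME:PASSWORD@USERNAME.cloudant.com/mydb",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To start the replication for real: urllib.request.urlopen(req)
```

If the source server requires admin credentials for replication, the request would also need an authorization header, which this sketch omits.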

Otherwise, you’ll need to export your CouchDB data to a file containing an array of JSON objects and then issue one or more HTTP POST requests to bulk load the docs from your export file into Cloudant.
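The bulk-load path might look roughly like the sketch below, which splits an exported array of documents into batches and builds one POST per batch against the `_bulk_docs` endpoint. The batch size, database URL, and helper names are assumptions made for illustration. Stripping `_rev` is a design choice of this sketch: a fresh target database would otherwise reject documents that carry revisions it has never seen.

```python
import json
import urllib.request


def chunked(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def prepare_doc(doc):
    """Drop the source revision so the target database assigns a fresh one."""
    return {key: value for key, value in doc.items() if key != "_rev"}


def build_bulk_requests(db_url, docs, batch_size=500):
    """Build (but do not send) one POST to <db>/_bulk_docs per batch."""
    requests = []
    for batch in chunked(docs, batch_size):
        body = {"docs": [prepare_doc(doc) for doc in batch]}
        requests.append(urllib.request.Request(
            url=db_url.rstrip("/") + "/_bulk_docs",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests


if __name__ == "__main__":
    # Stand-in for the parsed export file (an array of JSON objects).
    exported = [{"_id": "doc-%d" % i, "_rev": "1-abc", "value": i} for i in range(5)]
    # Placeholder database URL (assumption); authentication is omitted here.
    for req in build_bulk_requests("https://USERNAME.cloudant.com/mydb", exported, batch_size=2):
        print(req.full_url, len(json.loads(req.data.decode("utf-8"))["docs"]))
        # To upload for real: urllib.request.urlopen(req)
```

Reshaping documents into a new JSON structure, where that is the reason for exporting, would fit naturally inside `prepare_doc`.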

API and Data Design Doc Changes

Cloudant is based on CouchDB, and its API is largely compatible with CouchDB. Cloudant has had to make a few changes to the CouchDB API in order to make it faster, richer, and possible to use as a hosted and managed, cluster-based service. These differences might require that you change your design documents or application code:

• View docs must be written in JavaScript, unlike CouchDB, which permits these to be written in other languages.

• Temp views are disabled. They are not a best practice for production systems in CouchDB because of performance.

• Changes feed might be unordered. In Cloudant, items in the changes feed are collected from nodes in the cluster independently, so they might not be reported in order as they are in CouchDB. The “since” parameter works as expected though.

• Sequence numbers are integers in CouchDB and are ordered. In Cloudant they are opaque JSON tokens that include cluster state information.

• Server configuration commands are disabled. We have disabled server configuration commands and server shutdown; they aren’t applicable within a hosted service.

• Authorization & Authentication differences:

    o Cloudant permits you to share database access across Cloudant user accounts.

    o Cloudant supports the same authentication methods as CouchDB, except for OAuth. Full support for OAuth is currently under development.
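Two of the differences above lend themselves to a short illustration: view code stored as JavaScript, and sequence values treated as opaque. The sketch below assumes a hypothetical design document named `stats` with a view `by_type`, and a changes response in the standard `results` / `last_seq` shape. The helper names are illustrative and are not part of any Cloudant library.

```python
# JavaScript map function, stored as a string inside the design document.
MAP_BY_TYPE = "function (doc) { if (doc.type) { emit(doc.type, 1); } }"


def build_design_doc(name, view_name, map_js, reduce_js=None):
    """Return a design document whose view code is JavaScript."""
    view = {"map": map_js}
    if reduce_js is not None:
        view["reduce"] = reduce_js
    return {
        "_id": "_design/" + name,
        "language": "javascript",
        "views": {view_name: view},
    }


def apply_changes(response, handle):
    """Process one parsed _changes response and return the new checkpoint.

    Rows are handled in the order received. Sequence values are never
    compared or sorted; `last_seq` is returned verbatim so it can be
    passed back as the `since` parameter of the next request.
    """
    for row in response["results"]:
        handle(row["id"], row.get("deleted", False))
    return response["last_seq"]


if __name__ == "__main__":
    ddoc = build_design_doc("stats", "by_type", MAP_BY_TYPE, reduce_js="_count")
    print(ddoc["_id"], sorted(ddoc["views"]))

    # Made-up response body for illustration; real sequence tokens will differ.
    sample = {
        "results": [
            {"seq": "2-g1AAAA", "id": "a", "changes": [{"rev": "1-x"}]},
            {"seq": "1-g1AAAB", "id": "b", "changes": [{"rev": "2-y"}], "deleted": True},
        ],
        "last_seq": "2-g1AAAA",
    }
    checkpoint = apply_changes(sample, lambda doc_id, deleted: print(doc_id, deleted))
    print("next since:", checkpoint)
```

Because rows may arrive out of order, the handler passed to `apply_changes` is best written to be idempotent per document ID rather than dependent on arrival order.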


Interview with Stockr.com

Stockr.com is a real-time social networking site that connects financial investors and traders to track stocks and discuss public companies, without the spam that dominates most other stock sites and message boards. We spoke with Eugene Kashpureff Jr. of Stockr.com about his experience getting started with Cloudant.

Cloudant: Why did you move Stockr.com to Cloudant?

Eugene: I’m a volunteer firefighter at home, I’m a professional firefighter at Stockr. I have better things to do than to sit and mess with databases all day long. We use CouchDB to store our site’s “big data” – stock info, user posts, statistical data and the like. We have a few tables of relational data (mostly user login info) that we keep in MySQL, but 95% of our data volume is in CouchDB. Offloading the management and worry of all that information is the core reason behind our move.

Cloudant: How much data, access, storage?

Eugene: Between our various development, test and production environments? About half a terabyte, and growing daily. But I don’t watch that number; that’s why I let you guys have our data.

Cloudant: How has the performance been on Cloudant?

Eugene: Since we moved to Cloudant we’ve had zero user complaints about our site’s speed, when it used to be a constant nag. There are so many fewer problems that we have to deal with every single day. I can’t say we’ve had NO problems since moving to Cloudant, but it’s far fewer than we used to have.

Cloudant: What type of issues do you no longer have to deal with?

Eugene: Database sharding, disk compaction, running out of storage space, managing hardware & daemons, all the stuff I blindly wandered through the CouchDB documentation for -- now that’s gone from my life.

Cloudant: What did you migrate from?

Eugene: We originally started with the Ubuntu CouchDB package running on one node. Then expanded to three nodes of BigCouch (http://bigcouch.cloudant.com). Then we tried to add two nodes to that, and we couldn’t get it to work, so we decided to just move to Cloudant. In addition, each of our developers had a separate environment, and we had numerous unpleasant surprises when a configuration or version difference was found.


Cloudant: How did the migration go?

Eugene: It took us a while to get replication set up, data imported over the weekend, and then we flipped the switch and it just worked. Only thing was we had to set up a proxy server to deal with SSL endpointing. Replication took a few days to get right because Cloudant was having an issue with the hardware SoftLayer provisioned; a new switch was installed and configured wrong. Just one of those set-up-new-hardware problems you always have. We moved over 40GB of raw data then re-generated indexes.

Cloudant: Did you have to make any code changes?

Eugene: Our CouchDB views were written in Python, but Cloudant requires those be written in JavaScript. Mike Miller did a lot of the work converting those view documents for us. I think he said it took him about an hour. The only thing we changed in our actual application was the address of the CouchDB server.

Cloudant: Is there anything we could have done to make moving to Cloudant easier?

Eugene: I’m not sure if there’s much more you folks could have delivered. I was surprised at how quick and painless it was. Outside of Stockr, I use other services like AWS. It’s painful. With Cloudant it was a couple of config problems, that was it. For an easier migration process, it would have been nice to have a tool to run against our BigCouch/CouchDB server to automatically load it all up into Cloudant, rather than having to log in as admin and assign all those relationships by hand. New customers would appreciate that.

Cloudant: Any closing thoughts?

Eugene: With Cloudant now, I can work on making the system faster rather than trying to keep the system up.

Getting More Help

If you need help getting started with Cloudant, visit the Cloudant Developer Resources Site (https://cloudant.com/for-developers/) or contact us for assistance:

• #cloudant on IRC

• @cloudant on Twitter

[email protected]

129 South Street, Boston, MA 02111 (857) 400-9900 | cloudant.com

About Cloudant

Cloudant provides developers of large-scale and fast-growing web and mobile applications with the world’s first globally distributed database as a service (DBaaS) for loading, storing, analyzing, and distributing operational application data. As a managed service, Cloudant helps developers eliminate the delays, costs, and distractions inherent in working with databases and their administrators, while providing unmatched scalability, availability, and performance. The Cloudant service is available hosted on AWS, Joyent, Rackspace, SoftLayer, and Windows Azure. Cloudant customers include Samsung, Hothead Games, Microsoft Big Park Studios, Flurry, Salesforce.com, DHL and thousands of other developers worldwide.