IBM Data and AI

The 5 main benefits of Apache Cassandra


Contents

Introduction

5 benefits of Apache Cassandra

01 Scalability

02 High availability via data replication

03 High fault tolerance

04 High performance

05 Multi-data-center and hybrid cloud support

How IBM and DataStax add value

Next steps


Introduction

For decades, organizations relied on traditional relational database management systems (RDBMS) to organize, store, and analyze their data.

But then Facebook came along, and suddenly an RDBMS was not quite enough. The social giant needed a powerful database solution for its Inbox Search feature, and Apache Cassandra—a distributed NoSQL database—was born.

Cassandra was released as an open source project in July 2008, taking its name from a mythological Greek prophet. It quickly progressed to an Apache Incubator project in March 2009 and a top-level project in February 2010.

Since becoming a top-level project in 2010, Cassandra has gone through several iterations. With the recent release of Cassandra 4.0, it’s worth a brief look at how the database has evolved over the last decade:

2011 June

Cassandra 0.8 added support for the Cassandra Query Language (CQL), support for zero-downtime upgrades, and more.

2011 October

Cassandra 1.0 added improved read performance, integrated compression, and more.

2013 September

Cassandra 2.0 added lightweight transactions, improved compactions, and more.

2015 November

Cassandra 3.0 added a refactored storage engine, materialized views, and more.

2017 June

Cassandra 3.11 added faster import/export, time-window compaction, and more.

2021 July

Cassandra 4.0 added real-time audit logging and many improvements for security, performance, and stability.

Figure 1. Cassandra version releases timeline


5 benefits of Apache Cassandra

As an open source project, Cassandra is freely available from the Apache Software Foundation.

There are, however, various distributions of Cassandra, including the one from IBM’s OEM partner, DataStax. The DataStax distribution of Apache Cassandra is maintained and supported by the same people who wrote the majority of Cassandra’s code.

IBM, in partnership with DataStax, also provides DataStax Enterprise with IBM to help enterprises build and manage modern data applications in hybrid and multicloud environments.

Cassandra adoption has significantly increased over the last few years, and for good reason: the distributed database delivers enormous value.

To see exactly how, read on and explore five major benefits of Cassandra.


01 Scalability

You can’t succeed without easy scalability. Period.

There’s no substitute for knowing you’ll be able to handle an unexpected surge of traffic—even while you’re asleep.

But when scaling is difficult to achieve or adds significant risks such as potential downtime, you can never rest easy. You never know when a large influx of traffic is headed your way. If your systems can’t scale to accommodate this traffic, your customers will go somewhere else.

Generally speaking, there are two ways to achieve scale at the database level:

1. You can scale upward by adding capacity such as memory, storage, and CPU to individual machines. If you scale this way, the main advantage is that you won’t have to run a large number of servers. However, you will likely spend significant money on implementation of expensive high-end hardware, there’s a much bigger chance your infrastructure will fail due to increased strain, and failure of a single component will have disruptive effects. Furthermore, this method of scaling can leave you completely locked in.

2. You can scale out by adding more servers. With this method, obviously, you’ll have to run more servers. Licensing fees and utility costs might go up as well. But overall your expenses will be lower compared to scaling up. You’ll also enjoy resilience and fault tolerance, both of which can be built into the foundation of the database architecture.

Scale out is quickly becoming the preferred method of scalability for leading enterprises, and Cassandra makes linear scale out easy—if you want to double the workload, just double the number of servers. You can scale out without downtime or impacting performance.
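The linear scale-out claim comes down to proportional arithmetic. The sketch below is only an illustration: it takes the 2-node, 100,000 ops/sec point from Figure 2 as a baseline and assumes throughput grows in exact proportion to node count, which real clusters only approximate. The function names are hypothetical and not part of any Cassandra tooling.

```python
import math

# Baseline taken from Figure 2: 2 nodes sustain about 100,000 ops/sec.
BASELINE_NODES = 2
BASELINE_OPS_PER_SEC = 100_000
PER_NODE_OPS = BASELINE_OPS_PER_SEC // BASELINE_NODES


def estimated_throughput(nodes: int) -> int:
    """Estimate cluster throughput, assuming perfectly linear scale-out."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * PER_NODE_OPS


def nodes_needed(target_ops_per_sec: int) -> int:
    """Smallest node count whose estimated throughput meets the target."""
    return max(1, math.ceil(target_ops_per_sec / PER_NODE_OPS))


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "nodes ->", estimated_throughput(n), "ops/sec")
```

Under that assumption, doubling the target workload doubles the value returned by `nodes_needed`, which is the behavior the paragraph above describes.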

Cassandra’s ability to scale can save you money, and you won’t have to worry about getting locked into a less-than-optimal vendor’s tech stack.

Figure 2. Linear scalability through addition of nodes (scale out): 2 nodes, 100,000 ops/sec; 4 nodes, 200,000 ops/sec; 8 nodes, 400,000 ops/sec

Business value

It’s estimated that Amazon lost up to USD 100 million due to a roughly one-hour outage in 2018, ostensibly because too many users flooded the site simultaneously. With scalable systems in place, your business won’t miss out on opportunities during heavily trafficked periods, and you’ll avoid extremely costly outages. Add opportunity. Subtract losses. That’s value.


02 High availability via data replication

It may seem contradictory, but the world is becoming increasingly connected as it becomes increasingly distributed.

This evolving reality demands a database that can handle data coming from multiple geographically distributed sources.

Traditionally, databases had primary-secondary architectures. Primary nodes could read and write; secondary nodes could only read. While this architecture helped ensure consistency, it also introduced serious problems. Database operations, for example, would grind to a halt if a primary node failed.

That might have been something an enterprise could tolerate in the 1980s. But in the 2020s, no serious organization can—or should—absorb such a significant disruption.

There is good news, however: within Cassandra’s architecture, every node can perform read and write operations. This enables data to quickly be replicated across data centers and geographies.

As a result, team members and customers spread out across the world can expect an optimal experience each time they interact with applications. Data is always available, no matter where the physical infrastructure is located. In the event a node gets knocked offline, traffic is automatically rerouted to the nearest healthy node.
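The masterless design described above can be pictured as a ring of peer nodes, each holding copies of some keys. The following is a toy model under simplifying assumptions, not Cassandra’s implementation: the class `ToyRing`, the MD5-based token, and the node names are all made up for illustration, and Cassandra’s actual partitioner and token allocation differ.

```python
import hashlib
from bisect import bisect_right


class ToyRing:
    """Toy masterless ring: any node may hold a replica and serve requests."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = min(replication_factor, len(nodes))
        # Place each node on the ring at a position derived from its name.
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.down = set()  # names of nodes currently offline

    @staticmethod
    def _token(value: str) -> int:
        # MD5 is just a convenient stable hash for this illustration.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, key: str):
        """Walk clockwise from the key's token and take the next rf nodes."""
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def live_replicas(self, key: str):
        """Replicas that can still serve the key when some nodes are down."""
        return [n for n in self.replicas(key) if n not in self.down]


if __name__ == "__main__":
    ring = ToyRing(["node1", "node2", "node3", "node4"], replication_factor=3)
    owners = ring.replicas("user:42")
    ring.down.add(owners[0])  # simulate one replica going offline
    print("still served by:", ring.live_replicas("user:42"))
```

With a replication factor of 3, taking one replica offline still leaves two nodes that can answer for the key, which is the rerouting behavior the paragraph describes.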

Figure 3. High availability architecture with geographically distributed resources from different providers: EMEA (Microsoft Azure), North America (Amazon EC2), San Francisco (Google Web Services), New York (IBM Cloud®)

Business value

IBM and Gartner research revealed that bad data collectively costs US organizations USD 3.1 trillion each year—an average of USD 15 million per organization. Thanks to Cassandra, you won’t have to worry about duplicative work, lost intellectual property, or inaccessible customer data. Automatic data replication means data is never lost, and because of this, you don’t need to invest in a separate disaster recovery data center. Money saved is money earned.


03 High fault tolerance

In a perfect world, your systems would always run as designed—even when one part fails.

With Cassandra, that perfect world is possible.

Thanks to its peer-to-peer architecture and data replication capabilities, applications never slow down or fail when nodes get knocked offline. If you use the leading distribution of Cassandra, DataStax Enterprise, you’ll have built-in repair services that fix problems immediately after they occur. Cassandra also has transparent fault detection and recovery, so nodes that fail can easily be restored or replaced.

When a node goes down, primary-secondary architectures require administrators to invest a lot of time and energy repairing the database. Cassandra has no such requirements; there’s no need for any manual intervention when a node fails.

With Cassandra, you can forget about fault tolerance altogether. It’s automatic.
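How many failed replicas a request can ride out depends on two settings the text does not spell out: the replication factor and the consistency level chosen for the request. The sketch below works through that arithmetic for the `ONE`, `QUORUM`, and `ALL` levels. The function names are hypothetical helpers, not driver or server APIs.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a request to succeed."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {consistency_level}")


def tolerated_replica_failures(consistency_level: str, replication_factor: int) -> int:
    """Replicas that can be offline while requests at this level still succeed."""
    return replication_factor - replicas_required(consistency_level, replication_factor)


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, "with RF=3 tolerates", tolerated_replica_failures(level, 3), "down replica(s)")
```

For example, with three replicas a `QUORUM` request needs two responses, so one replica can be down without the application noticing, while `ALL` tolerates none.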

Business value

According to EE Power, unplanned downtime costs the average company USD 9,000 per minute—and Gartner notes the actual figures vary considerably, from USD 100,000 per hour well into the millions. Regardless of the exact cost, downtime is expensive and, with the right technology, increasingly unnecessary.


04 High performance

Suffice it to say: speed matters. Customers expect prompt service at restaurants, quick delivery of packages, and zero lag from their applications.

And when things don’t happen as quickly as they hoped, they’re liable to switch to a better service.

The same holds true for websites and applications. Consider these statistics compiled by HubSpot:

– 47% of customers expect a website to load in two seconds or less.

– 79% of customers are unlikely to support a business that has poor website performance.

– A one-second delay in page load time translates into an 11% reduction in page views.

High performance is more than just a flashy capability. With performant systems, employees can get things done quickly and customers can enjoy positive user experiences in every interaction.

In a world that moves faster than ever, you can’t afford even the smallest delay in getting data to the right place.

Business value

Cassandra’s high performance means developers can be more productive without high latency or bottlenecks slowing them down. From the customer’s perspective, websites and applications will work as they’re expected to, translating into positive user experiences and improved customer retention.


05 Multi-data-center and hybrid cloud support

In an age where hybrid cloud is quickly becoming the go-to data management environment, support for multiple data centers is key.

Fortunately, Cassandra is designed as a distributed system for deployment of large numbers of nodes across multiple data centers. In addition, key features of Cassandra’s distributed architecture are specifically tailored for deployment to multiple data centers. These features are robust and flexible enough that you can configure the cluster for optimal geographical distribution, for redundancy for failover and disaster recovery, or even for creating a dedicated analytics center replicated from your main data storage centers.

Cassandra characteristics that are key to multi-data-center deployment include:

– Replication factor and replica placement strategy: NetworkTopologyStrategy, the placement strategy recommended for multi-data-center clusters, lets you set the number of replicas for each data center and distributes them across racks.

– Snitch: For multi-data-center deployments, it is important to make sure the snitch has complete and accurate information about the network, either by automatic detection (RackInferringSnitch) or details specified in a properties file (PropertyFileSnitch).

– Consistency level: Cassandra provides consistency levels that are specifically designed for scenarios with multiple data centers.

Your specific needs will determine how you combine these ingredients in a “recipe” for operations across multiple data centers.
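As one possible “recipe”, the sketch below combines two of the ingredients listed above: a keyspace definition using NetworkTopologyStrategy with per-data-center replica counts, and the number of acknowledgements the data-center-aware consistency levels `LOCAL_QUORUM` and `EACH_QUORUM` would need. The keyspace name `shop` and the data center names `dc_east` and `dc_west` are hypothetical; in a real cluster the data center names must match what the snitch reports. The Python helpers only build a CQL string and do arithmetic; they do not connect to a cluster.

```python
def create_keyspace_cql(keyspace: str, replicas_by_dc: dict) -> str:
    """Build a CQL CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = {"class": "NetworkTopologyStrategy", **replicas_by_dc}
    body = ", ".join(f"'{name}': {value!r}" for name, value in options.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{body}}};"


def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def acks_required(level: str, replicas_by_dc: dict, local_dc: str) -> dict:
    """Acknowledgements needed in each data center for a consistency level."""
    level = level.upper()
    if level == "LOCAL_QUORUM":
        # Only the coordinator's own data center has to reach a quorum.
        return {local_dc: quorum(replicas_by_dc[local_dc])}
    if level == "EACH_QUORUM":
        # Every data center has to reach its own quorum.
        return {dc: quorum(rf) for dc, rf in replicas_by_dc.items()}
    raise ValueError(f"level not covered by this sketch: {level}")


if __name__ == "__main__":
    layout = {"dc_east": 3, "dc_west": 2}  # hypothetical data center names
    print(create_keyspace_cql("shop", layout))
    print(acks_required("LOCAL_QUORUM", layout, local_dc="dc_east"))
    print(acks_required("EACH_QUORUM", layout, local_dc="dc_east"))
```

`LOCAL_QUORUM` keeps request latency within one data center, while `EACH_QUORUM` trades latency for agreement in every data center; which one fits depends on the needs mentioned above.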

Business value

If you want to reliably serve a distributed, global audience with powerful, always-on applications, you’re going to need multiple data centers. And to serve multiple data centers, it’s essential to have a database that’s easily scalable across geographic regions.


How IBM and DataStax add value

IBM collaborates with DataStax to offer customers a flexible, high-performance, and trusted open data stack for the data-driven enterprise.

DataStax Enterprise (DSE), based on Apache Cassandra, is the world’s most scalable database, well known for 100% uptime, unmatched low latency, and the ability to handle massive data at planetary scale. DSE delivers enterprise capabilities that are used by the best internet companies, and trusted by 40% of Fortune 100 and a fifth of Fortune 500 companies. DataStax Enterprise seamlessly plugs into IBM Cloud Pak® for Data to power your NoSQL workloads and can either run analytical workloads or feed other analytical systems in the IBM Cloud Pak ecosystem.

If you’re running open source Apache Cassandra and hit a performance or optimization issue that you can’t resolve on your own, IBM can help. With DataStax Luna, you can get subscription-based support for open source Apache Cassandra from the experts who authored the majority of the Cassandra codebase. DataStax Luna subscribers get 24x7 support with general-purpose and technical questions for their open source Cassandra deployments. Choose IBM to avoid the maintenance, support, and compliance issues that can plague enterprises who deploy the open source version of Cassandra.


Next steps

There are many options for learning more about IBM, DataStax, and Cassandra.

IBM-DataStax Fireside Chat

Watch this deep dive into how IBM Cloud Pak for Data and DataStax Enterprise enable a Cassandra-based distributed datastore, coupled with best-in-class data science, governance, and privacy.

Free consultation

Reach out to schedule a no-cost expert consultation to discuss the help that’s available as you modernize your data architecture.

Read more

See what IBM and DataStax do together to deliver the data solution for data modernization and scaling AI in the hybrid cloud.


© Copyright IBM Corporation 2021

IBM Corporation
New Orchard Road
Armonk, NY 10504

Produced in the United States of America
September 2021

IBM, the IBM logo, IBM Cloud, and IBM Cloud Pak are trademarks or registered trademarks of International Business Machines Corporation, in the United States and/or other countries. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on ibm.com/trademark.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary. It is the user’s responsibility to evaluate and verify the operation of any other products or programs with IBM products and programs. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Cassandra is not an IBM product or offering. The DataStax distribution of Apache Cassandra is sold or licensed, as the case may be, to users under DataStax’s terms and conditions, which are provided with the product or offering. Availability, and any and all warranties, services and support for Cassandra is the direct responsibility of and is provided directly to users by DataStax.

The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation.
