software development with apache cassandra

CASSANDRA DAY DALLAS 2015 SOFTWARE DEVELOPMENT WITH CASSANDRA: A WALKTHROUGH Nate McCall @zznate Co-Founder & Sr. Technical Consultant http://www.slideshare.net/zznate/soft-dev-withcassandraawalkthrough Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Upload: zznate

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Software Development with Apache Cassandra

CASSANDRA DAY DALLAS 2015

SOFTWARE DEVELOPMENT WITH CASSANDRA: A WALKTHROUGH

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant
http://www.slideshare.net/zznate/soft-dev-withcassandraawalkthrough

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Software Development with Apache Cassandra

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand, Australia & USA.

Page 3: Software Development with Apache Cassandra

OVERVIEW
DATA MODELING
WRITING CODE
TESTING
REVIEWING
MANAGING ENVIRONMENTS

Page 4: Software Development with Apache Cassandra

Overview:

What makes a software development project successful?

Page 5: Software Development with Apache Cassandra

Overview: Successful Software Development

- it ships
- maintainable
- good test coverage
- check out and build

Page 6: Software Development with Apache Cassandra

Overview:

Impedance mismatch: distributed systems development on a laptop.

Page 7: Software Development with Apache Cassandra

OVERVIEW
DATA MODELING
WRITING CODE
TESTING
REVIEWING
MANAGING ENVIRONMENTS

Page 8: Software Development with Apache Cassandra

Data Modeling:

… a topic unto itself. But quickly:

Page 9: Software Development with Apache Cassandra

Data Modeling - Quickly

• It’s Hard
• Do research
• #1 performance problem
• Don’t “port” your schema!

Page 10: Software Development with Apache Cassandra

Data Modeling - Using CQL:

• tools support
• easy tracing (and trace discovery)
• documentation*

*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile

Page 11: Software Development with Apache Cassandra

Data Modeling - DevCenter:

Tools: DataStax DevCenter

http://www.datastax.com/what-we-offer/products-services/devcenter

Page 12: Software Development with Apache Cassandra
Page 13: Software Development with Apache Cassandra

OVERVIEW
DATA MODELING
WRITING CODE
TESTING
REVIEWING
MANAGING ENVIRONMENTS

Page 14: Software Development with Apache Cassandra

Writing Code:

use CQL

Page 15: Software Development with Apache Cassandra

Writing Code - Java Driver:

Use the Java Driver

• Reference implementation
• Well written, extensive coverage
• Open source
• Dedicated development resources

https://github.com/datastax/java-driver/

Page 16: Software Development with Apache Cassandra

Writing Code - Java Driver:

Existing Spring Users: Spring Data Integration

http://projects.spring.io/spring-data-cassandra/

Page 17: Software Development with Apache Cassandra

Writing Code - Java Driver:

Four rules for Writing Code
• one Cluster per physical cluster
• one Session per app per keyspace
• use PreparedStatements
• use Batches to reduce network IO

Page 18: Software Development with Apache Cassandra

Writing Code - Java Driver:

Configuration is Similar to Other DB Drivers (with caveats**)

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html

Page 19: Software Development with Apache Cassandra

Writing Code - Java Driver - Configuration:

Major Difference: it’s a Cluster!

Page 20: Software Development with Apache Cassandra

Writing Code - Java Driver - Configuration:

Two groups of configurations

• policies
• connections

Page 21: Software Development with Apache Cassandra

Writing Code - Java Driver - Configuration:

Three Policy Types:
• load balancing
• reconnection
• retry

Page 22: Software Development with Apache Cassandra

Writing Code - Java Driver - Configuration:

Connection Options:
• protocol*
• pooling**
• socket

*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec
**https://github.com/datastax/java-driver/tree/2.1/features/pooling

Page 23: Software Development with Apache Cassandra

Writing Code - Java Driver - Configuration:

Code sample for building a Cluster

Page 24: Software Development with Apache Cassandra
Page 25: Software Development with Apache Cassandra
Page 26: Software Development with Apache Cassandra
Page 27: Software Development with Apache Cassandra
Page 28: Software Development with Apache Cassandra

https://github.com/datastax/java-driver/tree/2.1/features/compression

Page 29: Software Development with Apache Cassandra

https://github.com/datastax/java-driver/tree/2.1/features/logging

Page 30: Software Development with Apache Cassandra

Writing Code - Java Driver - Pagination:

Simple result iteration

CREATE TABLE IF NOT EXISTS transit.vehicle_data (
  vehicle_id text,
  speed double,
  time timeuuid,
  PRIMARY KEY ((vehicle_id), time)
);

Page 31: Software Development with Apache Cassandra
Page 32: Software Development with Apache Cassandra
Page 33: Software Development with Apache Cassandra
Page 34: Software Development with Apache Cassandra
Page 35: Software Development with Apache Cassandra
Page 36: Software Development with Apache Cassandra
Page 37: Software Development with Apache Cassandra
Page 38: Software Development with Apache Cassandra
Page 39: Software Development with Apache Cassandra

Writing Code - Java Driver - Pagination:

Simple result iteration: Java 8 style

Page 40: Software Development with Apache Cassandra
Page 41: Software Development with Apache Cassandra
Page 42: Software Development with Apache Cassandra
Page 43: Software Development with Apache Cassandra
Page 44: Software Development with Apache Cassandra
Page 45: Software Development with Apache Cassandra
Page 46: Software Development with Apache Cassandra

Writing Code - Java Driver - Async

Async! (not so) Simple result iteration

Page 47: Software Development with Apache Cassandra
Page 48: Software Development with Apache Cassandra
Page 49: Software Development with Apache Cassandra
Page 50: Software Development with Apache Cassandra
Page 51: Software Development with Apache Cassandra

Writing Code - Java Driver - Pagination:

Not much to it:

PreparedStatement prepStmt = session.prepare(CQL_STRING);
BoundStatement boundStmt = new BoundStatement(prepStmt);
boundStmt.setFetchSize(100);
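To show what a fetch size means for the calling code, here is a minimal stdlib-only sketch. It does not use the Java driver: `PagedIterator` and its offset-based `fetchPage` function are hypothetical stand-ins, and the offset is an assumption made for illustration (the driver tracks its position differently). The point is only that rows arrive one page at a time, and the next page is requested when iteration reaches the end of the current one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/** Hypothetical illustration of page-at-a-time iteration; not part of the Java driver. */
final class PagedIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchPage; // (offset, fetchSize) -> rows
    private final int fetchSize;
    private Iterator<T> currentPage = Collections.emptyIterator();
    private int offset = 0;
    private boolean exhausted = false;
    private int pagesFetched = 0;

    PagedIterator(BiFunction<Integer, Integer, List<T>> fetchPage, int fetchSize) {
        if (fetchSize < 1) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        this.fetchPage = fetchPage;
        this.fetchSize = fetchSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one is used up.
        while (!currentPage.hasNext() && !exhausted) {
            List<T> page = fetchPage.apply(offset, fetchSize);
            pagesFetched++;
            offset += page.size();
            exhausted = page.size() < fetchSize; // a short page means no more rows
            currentPage = page.iterator();
        }
        return currentPage.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentPage.next();
    }

    int pagesFetched() {
        return pagesFetched;
    }

    public static void main(String[] args) {
        // Stand-in data source: 250 in-memory "rows".
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rows.add(i);
        }
        PagedIterator<Integer> it = new PagedIterator<Integer>(
            (off, size) -> rows.subList(off, Math.min(off + size, rows.size())), 100);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println(seen + " rows in " + it.pagesFetched() + " pages");
    }
}
```

A smaller fetch size lowers memory per page at the cost of more round trips, which is the trade-off to measure when tuning the value.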

https://github.com/datastax/java-driver/tree/2.1/features/paging

Page 52: Software Development with Apache Cassandra

Writing Code - Java Driver - Inserts and Updates:

About Inserts (and updates)

Page 53: Software Development with Apache Cassandra

Writing Code - Java Driver - Inserts and Updates:

Batches: three types
- logged
- unlogged
- counter

Page 54: Software Development with Apache Cassandra
Page 55: Software Development with Apache Cassandra
Page 56: Software Development with Apache Cassandra
Page 57: Software Development with Apache Cassandra
Page 58: Software Development with Apache Cassandra

Writing Code - Java Driver - Inserts and Updates:

unlogged batch

Page 59: Software Development with Apache Cassandra
Page 60: Software Development with Apache Cassandra

Writing Code - Java Driver - Inserts and Updates:

LWT:
INSERT INTO vehicle (vehicle_id, make, model, vin)
VALUES ('VHE-101', 'Toyota', 'Tercel', '1234f')
IF NOT EXISTS;

Page 61: Software Development with Apache Cassandra

Writing Code - Java Driver - Inserts and Updates:

LWT:
UPDATE vehicle
SET vin = '123fa'
WHERE vehicle_id = 'VHE-101'
IF vin = '1234f';
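The two conditional statements behave like compare-and-set operations: the write is applied only if the condition holds at the time of the write. The sketch below is an in-memory analogy using `ConcurrentMap`, not Cassandra or the Java driver; the class and method names are hypothetical, and it says nothing about how lightweight transactions are implemented or what they cost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory analogy for conditional writes; not how Cassandra implements LWT. */
final class ConditionalWriteAnalogy {

    /** Like INSERT ... IF NOT EXISTS: applied only when the key is absent. */
    static boolean insertIfNotExists(ConcurrentMap<String, String> vinById,
                                     String vehicleId, String vin) {
        return vinById.putIfAbsent(vehicleId, vin) == null;
    }

    /** Like UPDATE ... IF vin = expected: applied only when the current value matches. */
    static boolean updateIfEquals(ConcurrentMap<String, String> vinById,
                                  String vehicleId, String expectedVin, String newVin) {
        return vinById.replace(vehicleId, expectedVin, newVin);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> vinById = new ConcurrentHashMap<>();
        System.out.println(insertIfNotExists(vinById, "VHE-101", "1234f"));       // first insert applies
        System.out.println(insertIfNotExists(vinById, "VHE-101", "9999z"));       // row exists, not applied
        System.out.println(updateIfEquals(vinById, "VHE-101", "1234f", "123fa")); // condition holds
        System.out.println(updateIfEquals(vinById, "VHE-101", "1234f", "555aa")); // stale condition
    }
}
```

In both cases the caller has to check whether the write was applied and decide what to do when it was not.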

Page 62: Software Development with Apache Cassandra

Writing Code:

ORM?
Great for basic CRUD operations

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html

Page 63: Software Development with Apache Cassandra

https://github.com/datastax/java-driver/blob/2.1/driver-mapping/src/test/java/com/datastax/driver/mapping/MapperTest.java


Page 67: Software Development with Apache Cassandra

Writing Code - Java Driver:

A note about User Defined Types (UDTs)

Page 68: Software Development with Apache Cassandra

Writing Code - Java Driver - Using UDTs:

Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path

* https://issues.apache.org/jira/browse/CASSANDRA-7423

Page 69: Software Development with Apache Cassandra

OVERVIEW
DATA MODELING
WRITING CODE
TESTING
REVIEWING
MANAGING ENVIRONMENTS

Page 70: Software Development with Apache Cassandra

Testing:

Use a Naming Scheme

• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
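The naming scheme above can be checked mechanically, for example by a build script or a test listener. This is a minimal sketch; `TestNaming`, the `Kind` values, and the sample class names are hypothetical. The one detail worth noticing is the order of the checks, because every name ending in "PITest" also ends in "ITest".

```java
/** Hypothetical helper that maps a test class name to its category by suffix. */
final class TestNaming {

    enum Kind { UNIT, INTEGRATION, PARALLEL_INTEGRATION, UNCLASSIFIED }

    static Kind classify(String simpleClassName) {
        // Check "PITest" first: a "*PITest" name would otherwise match "ITest".
        if (simpleClassName.endsWith("PITest")) {
            return Kind.PARALLEL_INTEGRATION;
        }
        if (simpleClassName.endsWith("ITest")) {
            return Kind.INTEGRATION;
        }
        if (simpleClassName.endsWith("UnitTest")) {
            return Kind.UNIT;
        }
        return Kind.UNCLASSIFIED;
    }

    public static void main(String[] args) {
        String[] names = {
            "VehicleDaoUnitTest", "VehicleDaoITest", "VehicleDaoPITest", "VehicleDaoTest"
        };
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Failing the build on `UNCLASSIFIED` is one way to keep the scheme from eroding over time.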

Page 71: Software Development with Apache Cassandra

Testing:

Tip: wildcards on the CLI are not a naming scheme.

Page 72: Software Development with Apache Cassandra

Testing:

Group tests into logical units (“suites”)

Page 73: Software Development with Apache Cassandra

Testing - Suites:

Benefits of Suites:
• share test data
• share Cassandra instance(s)
• build profiles

Page 74: Software Development with Apache Cassandra

<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>


Page 76: Software Development with Apache Cassandra

Testing - Suites:

Using annotations for suites in code

Page 77: Software Development with Apache Cassandra
Page 78: Software Development with Apache Cassandra

Testing - Suites:

Interesting test plumbing
• [Before|After]Suite
• [Before|After]Group
• Listeners

Page 79: Software Development with Apache Cassandra

Testing:

Use Mocks where possible

Page 80: Software Development with Apache Cassandra

Testing:

scassandra: not quite integration

http://www.scassandra.org/

Page 81: Software Development with Apache Cassandra
Page 82: Software Development with Apache Cassandra
Page 83: Software Development with Apache Cassandra
Page 84: Software Development with Apache Cassandra
Page 85: Software Development with Apache Cassandra

Testing:

Unit Integration Testing

Page 86: Software Development with Apache Cassandra

Testing:

Verify Assumptions: test failure scenarios explicitly

Page 87: Software Development with Apache Cassandra

Testing - Integration:

Runtime Integrations:
• local
• in-process
• forked-process

Page 88: Software Development with Apache Cassandra

Testing - Integration - Runtime:

EmbeddedCassandra

https://github.com/jsevellec/cassandra-unit/

Page 89: Software Development with Apache Cassandra

Testing - Integration - Runtime:

ProcessBuilder to fork Cassandra(s)
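A minimal sketch of the forked-process approach, using only `java.lang.ProcessBuilder`. The class name, the install and config paths, and the use of the `CASSANDRA_CONF` environment variable to point each node at its own configuration directory are assumptions for illustration; check them against the launch script of the Cassandra version under test. The `-f` flag keeps Cassandra in the foreground so the test harness owns the process.

```java
import java.io.File;

/** Hypothetical helper that prepares (but does not start) a forked Cassandra node. */
final class CassandraForker {

    static ProcessBuilder forNode(File cassandraHome, File confDir, File logFile) {
        File launcher = new File(new File(cassandraHome, "bin"), "cassandra");
        // -f: run in the foreground so destroying the Process stops the node.
        ProcessBuilder builder = new ProcessBuilder(launcher.getPath(), "-f");
        // Assumption: the launch script honors CASSANDRA_CONF for per-node config.
        builder.environment().put("CASSANDRA_CONF", confDir.getPath());
        // Send stdout and stderr to one log file per node.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return builder;
    }

    public static void main(String[] args) {
        // Paths are placeholders; nothing is started here.
        ProcessBuilder node1 = forNode(
            new File("/opt/cassandra"),
            new File("/tmp/node1/conf"),
            new File("/tmp/node1/system.log"));
        System.out.println(node1.command());
        // In a test fixture: Process p = node1.start(); ... p.destroy();
    }
}
```

Building one `ProcessBuilder` per node, each with its own config directory, is what makes the "(s)" possible: several nodes can be forked from the same installation.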

Page 90: Software Development with Apache Cassandra

Testing - Integration - Runtime:

CCMBridge: delegate to CCM

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java

Page 91: Software Development with Apache Cassandra

Testing - Integration:

Best Practice: Jenkins should be able to manage your cluster

Page 92: Software Development with Apache Cassandra

Testing:

Load Testing Goals
• reproducible metrics
• catch regressions
• test to breakage point

Page 93: Software Development with Apache Cassandra

Testing - Load Testing:

Stress.java (lots of changes recently)

https://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

Page 94: Software Development with Apache Cassandra

Testing - Load Testing:

Workload recording and playback coming soon

one day

https://issues.apache.org/jira/browse/CASSANDRA-8929

Page 95: Software Development with Apache Cassandra

Testing:

Primary testing goal: Don’t let cluster behavior surprise you.

Page 96: Software Development with Apache Cassandra

OVERVIEW
DATA MODELING
WRITING CODE
TESTING
REVIEWING
MANAGING ENVIRONMENTS

Page 97: Software Development with Apache Cassandra

Writing Code:

Metrics API for your own code

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/

Page 98: Software Development with Apache Cassandra

Writing Code - Instrumentation via Metrics API:

Run Riemann locally

http://riemann.io/

Page 99: Software Development with Apache Cassandra
Page 100: Software Development with Apache Cassandra

Reviewing Said Code:

Using Trace (and doing so frequently)

Page 101: Software Development with Apache Cassandra

Writing Code - Tracing:

Trace per query via DevCenter

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Page 102: Software Development with Apache Cassandra

Writing Code - Tracing:

Trace per query via cqlsh

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Page 103: Software Development with Apache Cassandra

Writing Code - Tracing:

Trace per query via Java Driver

http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Statement.html#enableTracing()

Page 104: Software Development with Apache Cassandra

cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = myapp
   ... AND document_id = foo
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Page 105: Software Development with Apache Cassandra

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…


Page 107: Software Development with Apache Cassandra

…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592


!!?!
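The jump in the trace is easier to spot programmatically than by eye. The sketch below is a stdlib-only helper (the class and method names are hypothetical) that takes trace events from a single source node, in order, and reports the largest step in the elapsed-microseconds column. The sample rows are the replica (192.168.1.204) rows from the trace above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: finds the largest elapsed-time step between consecutive trace events. */
final class TraceGapFinder {

    static final class Event {
        final String activity;
        final long sourceElapsedMicros;

        Event(String activity, long sourceElapsedMicros) {
            this.activity = activity;
            this.sourceElapsedMicros = sourceElapsedMicros;
        }
    }

    static final class Gap {
        final Event event; // the event logged at the end of the gap
        final long micros;

        Gap(Event event, long micros) {
            this.event = event;
            this.micros = micros;
        }
    }

    /** Events must all come from the same source node, in trace order. */
    static Gap largestGap(List<Event> events) {
        if (events.size() < 2) {
            throw new IllegalArgumentException("need at least two events");
        }
        Gap largest = null;
        for (int i = 1; i < events.size(); i++) {
            long delta = events.get(i).sourceElapsedMicros
                - events.get(i - 1).sourceElapsedMicros;
            if (largest == null || delta > largest.micros) {
                largest = new Gap(events.get(i), delta);
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        List<Event> replicaEvents = Arrays.asList(
            new Event("Merging data from memtables and 8 sstables", 38605L),
            new Event("Read 1 live and 2667 tombstoned cells", 9282428L),
            new Event("Enqueuing response to /192.168.1.197", 9283423L),
            new Event("Sending message to /192.168.1.197", 9284753L));
        Gap gap = largestGap(replicaEvents);
        System.out.println(gap.event.activity + ": " + gap.micros + " us");
    }
}
```

For these rows the largest step is 9,243,823 microseconds (about 9.2 seconds), ending at "Read 1 live and 2667 tombstoned cells": almost the entire request was spent reading past tombstones to return one live cell.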


Page 110: Software Development with Apache Cassandra

Writing Code - Tracing:

Enable traces in the driver

http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html

Page 111: Software Development with Apache Cassandra

Writing Code - Tracing:

`nodetool settraceprobability`

Page 112: Software Development with Apache Cassandra

Writing Code - Tracing:

…then make sure you try it again

with a node down!

Page 113: Software Development with Apache Cassandra

Writing Code - Tracing:

Final note on tracing:do it sparingly

Page 114: Software Development with Apache Cassandra

Writing Code - Tracing:

Enable query latency logging

https://github.com/datastax/java-driver/tree/2.1/features/logging

Page 115: Software Development with Apache Cassandra

Writing Code:

Logging Verbosity can be changed dynamically**

** since 0.4rc1

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html

Page 116: Software Development with Apache Cassandra

Writing Code:

nodetool for developers
• cfstats
• cfhistograms
• proxyhistograms

Page 117: Software Development with Apache Cassandra

Writing Code - nodetool - cfstats:

cfstats: per-table statistics about size and performance (single most useful command)

Page 118: Software Development with Apache Cassandra

Writing Code - nodetool - cfhistograms:

cfhistograms: column count and partition size vs. latency distribution

Page 119: Software Development with Apache Cassandra

Writing Code - nodetool - proxyhistograms:

proxyhistograms: performance of inter-node requests within the cluster, as seen by the coordinator

Page 120: Software Development with Apache Cassandra

OVERVIEW
DATA MODELING
WRITING CODE
TESTING
REVIEWING
MANAGING ENVIRONMENTS

Page 121: Software Development with Apache Cassandra

Managing Environments:

Configuration Management is Essential

Page 122: Software Development with Apache Cassandra

Managing Environments:

Laptop to Production with NO Manual Modifications!

Page 123: Software Development with Apache Cassandra

Managing Environments:

Running Cassandra during development

Page 124: Software Development with Apache Cassandra

Managing Environments - Running Cassandra:

Local Cassandra
• easy to set up
• you control it
• but then you control it!

Page 125: Software Development with Apache Cassandra

Managing Environments - Running Cassandra:

CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes

https://github.com/pcmanus/ccm

Page 126: Software Development with Apache Cassandra

Managing Environments - Running Cassandra:

Docker:
• Official image available with excellent docs*
• Docker Compose for more granular control**

*https://hub.docker.com/_/cassandra/
**https://docs.docker.com/compose/

Page 127: Software Development with Apache Cassandra

Managing Environments - Running Cassandra:

Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!

http://www.vagrantup.com/

Page 128: Software Development with Apache Cassandra

server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end
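The `initial_token` arithmetic in the loop above spaces the nodes evenly around a ring of 2^64 tokens starting at -2^63, which is the range used by the Murmur3 partitioner. The same calculation can be reproduced for any cluster size; this is a minimal Java sketch with a hypothetical class name, using `BigInteger` because 2^64 does not fit in a `long`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: evenly spaced initial tokens over [-2^63, 2^63). */
final class InitialTokens {
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(64);          // 2^64
    private static final BigInteger MIN_TOKEN = BigInteger.ONE.shiftLeft(63).negate(); // -2^63

    /** Same arithmetic as the Ruby loop: (2**64 / serverCount * i) - 2**63. */
    static List<BigInteger> forCluster(int serverCount) {
        if (serverCount < 1) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        BigInteger step = RING_SIZE.divide(BigInteger.valueOf(serverCount));
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)).add(MIN_TOKEN));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = forCluster(3);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println("node" + (i + 1) + " initial_token=" + tokens.get(i));
        }
    }
}
```

For three nodes this yields -9223372036854775808, -3074457345618258603 and 3074457345618258602.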


Page 131: Software Development with Apache Cassandra

chef.json = {
  :cassandra => {
    'cluster_name' => 'VerifyCluster',
    'version' => '2.0.8',
    'setup_jna' => false,
    'max_heap_size' => '512M',
    'heap_new_size' => '100M',
    'initial_token' => server['initial_token'],
    'seeds' => "192.168.2.10",
    'listen_address' => server['ip'],
    'broadcast_address' => server['ip'],
    'rpc_address' => server['ip'],
    'concurrent_reads' => "2",
    'concurrent_writes' => "2",
    'memtable_flush_queue_size' => "2",
    'compaction_throughput_mb_per_sec' => "8",
    'key_cache_size_in_mb' => "4",
    'key_cache_save_period' => "0",
    'native_transport_min_threads' => "2",
    'native_transport_max_threads' => "4"
  },
}

Page 132: Software Development with Apache Cassandra

Managing Environments - Running Cassandra:

Mesos?
Compelling features, but not quite there
(though it won't be long)

http://mesosphere.github.io/cassandra-mesos/docs/
http://www.datastax.com/2015/08/a-match-made-in-heaven-cassandra-and-mesos

Page 133: Software Development with Apache Cassandra

Summary:
• Cluster-level defaults, override in queries
• Follow existing patterns (it's not that different)
• Segment your tests and use build profiles
• Monitor and Instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures

Page 134: Software Development with Apache Cassandra

Thanks.

Page 135: Software Development with Apache Cassandra

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant
www.thelastpickle.com

#CassandraDays