Dominique Rondé (@talk2nerd) Alexandra Klimova (@aklimova) Real Time Business Intelligence with Cassandra, Kafka and Hadoop A real story @ Allianz Deutschland AG

Upload: datastax

Post on 16-Apr-2017


Page 1: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rondé, Allianz Deutschland AG) | C* Summit 2016

Dominique Rondé (@talk2nerd), Alexandra Klimova (@aklimova)

Real Time Business Intelligence with Cassandra, Kafka and Hadoop
A real story @ Allianz Deutschland AG


© Copyright Allianz

Dominique Rondé Big Data Pilot

Dipl. Wirt.-Inf. (FH)

128479 hrs with Java

40831 hrs with Big Data

14047 hrs Certified Datastax Cassandra Solution Architect

Twitter: @Talk2Nerd

Alexandra Klimova Big Data Pilotesse

M.Sc. Informatik

75895 hrs with Big Data

40831 hrs with Hadoop

14047 hrs Certified Datastax Cassandra Solution Architect

Twitter: @Aklimova


We don't have an agenda.

We have some checklists!

3 May 2023

Agenda


Security Instructions


Checklist

Before Engine Start

Define the destination


Real Time Reporting

• Sold items for the current day
• Open tickets during the day
• Response time on consumer requests
• Sold items grouped by type
• Current errors


Fraud Protection

• Prevent "fake accounts"
• Identify "data grabbers"
• Detect fraud patterns


Helping decision makers to understand the market

• Risk Specialists
• Product Designers
• Marketing Experts


Our destination

TTD: Reduce the Time-To-Data


Definition of TTD

Time to Data (TTD) is the time required until a requester has received the data he or she needs to do his or her job.

Time to
• find the source of the required data
• get the needed aggregation
• clean up the data
• write the statistical scripts
• execute and refine these scripts
• get a visualized result


Checklist

Before Taxi

Check if we know all we need


Define functional requirements

• Decoupled from all other development work
  Changes in analytics should not require additional work in other applications
• Allow fast deployments
  Learn from the data and bring improvements into production quickly
• Highly available
  No event should get lost after it has been fired
• Very accurate
  Make sure that every event is processed
• Horizontally scalable
  Start small and grow with the data


Define legal requirements

• Data Privacy
• Data Security
• Data Protection


Checklist

Before Take Off

Do the first steps


Picking measuring points

• Implement servlet filters to capture HTTP header information, e.g. error code, referrer
• Implement interceptors for the OR mapper to store the history of entities
• Instrument the web UI to send events about user interactions, e.g. navigation between pages
• Instrument the Java code to send events with additional data at certain points, e.g. when a document is created


Create some transfer objects

Each transfer object holds at least the
• current sessionId
• timestamp when this event occurred
• unique identifier of this event
• version identifier

In some cases
• current authenticated user
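A transfer object with the fields listed above might look roughly like the following. This is only a minimal sketch: the class name TrackingEvent, the field names and the use of UUID and Instant are assumptions made for illustration, not the classes used at Allianz.

```java
import java.io.Serializable;
import java.time.Instant;
import java.util.Objects;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical event transfer object; all names are illustrative assumptions. */
public final class TrackingEvent implements Serializable {
    private static final long serialVersionUID = 1L;

    private final UUID eventId;       // unique identifier of this event
    private final String sessionId;   // current session
    private final Instant occurredAt; // timestamp when the event occurred
    private final int version;        // version identifier of the event layout
    private final String userId;      // authenticated user, null if unknown

    public TrackingEvent(String sessionId, Instant occurredAt, int version, String userId) {
        this.eventId = UUID.randomUUID();
        this.sessionId = Objects.requireNonNull(sessionId, "sessionId");
        this.occurredAt = Objects.requireNonNull(occurredAt, "occurredAt");
        this.version = version;
        this.userId = userId;
    }

    public UUID getEventId() { return eventId; }
    public String getSessionId() { return sessionId; }
    public Instant getOccurredAt() { return occurredAt; }
    public int getVersion() { return version; }

    /** The authenticated user is only present in some cases. */
    public Optional<String> getUserId() { return Optional.ofNullable(userId); }

    public static void main(String[] args) {
        TrackingEvent event = new TrackingEvent("session-42", Instant.now(), 1, null);
        System.out.println(event.getEventId() + " user present: " + event.getUserId().isPresent());
    }
}
```

Keeping the version identifier in every event is what lets consumers handle old and new event layouts side by side.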


Find an architecture

[Architecture diagram with the components: WebApplication, Reports, Dashboards, R-Scripts]


Design your first CF

• Design conceptual model
• Specify access pattern
• Choose a logical model
• Configure physical model
• Write a CQL script


Checklist

During Take-Off

Run everything up


But mention the difference

Start small

Add nodes

Grow up


Checklist

During Climb Out

Fill your speed-layer


Monitor the Instruments


Consume

DataStream<String> messageStream = env.addSource(
    new FlinkKafkaConsumer09<>(
        parameterTool.getRequired("topicName"),
        new SimpleStringSchema(),
        properties));

Map

DataStream<Tuple3<String, Date, Double>> clickMessageStream =
    messageStream.map(new ClickEventMapper());

Aggregate

DataStream<Tuple3<Date, Double, String>> aggregatedClickMessageStream =
    clickMessageStream
        .map(new KeyStreamMapper())
        .keyBy("f1")
        .timeWindow(Time.minutes(2))
        .apply(new KeyWindowFunktion());

Store

// The tuple field order (Date, Double, String) matches the placeholders (eventtime, price, product).
CassandraSink.addSink(aggregatedClickMessageStream)
    .setQuery("INSERT INTO itemssale_by_product (eventtime, price, product) VALUES (?, ?, ?);")
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        public Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("csn-node1.development.allianz.de").build();
        }
    })
    .build();
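To make the "Aggregate" step easier to follow, here is a plain-Java sketch of what a two-minute tumbling window keyed by product computes. It deliberately does not use the Flink API; the Sale class, the key format and the choice of summing prices are assumptions for illustration only, since the slides do not show what KeyWindowFunktion does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of a tumbling window; not the Flink implementation. */
public final class TumblingWindowSketch {

    /** Hypothetical sale event: product, event time in epoch millis (non-negative), price. */
    static final class Sale {
        final String product;
        final long eventTimeMillis;
        final double price;

        Sale(String product, long eventTimeMillis, double price) {
            this.product = product;
            this.eventTimeMillis = eventTimeMillis;
            this.price = price;
        }
    }

    static final long WINDOW_MILLIS = 2 * 60 * 1000L; // two minutes, as in the slide

    /** Sums prices per window and product; keys have the form "<windowStart>|<product>". */
    static Map<String, Double> aggregate(List<Sale> sales) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale sale : sales) {
            // Every event falls into exactly one window: the one starting at the
            // largest multiple of WINDOW_MILLIS that is <= the event time.
            long windowStart = sale.eventTimeMillis - (sale.eventTimeMillis % WINDOW_MILLIS);
            totals.merge(windowStart + "|" + sale.product, sale.price, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
            new Sale("product-1", 0L, 10.0),
            new Sale("product-1", 60_000L, 5.0),
            new Sale("product-1", 120_000L, 1.0));
        aggregate(sales).forEach((key, total) -> System.out.println(key + " -> " + total));
    }
}
```

In the real job Flink assigns events to windows and triggers the window function itself; the sketch only shows the grouping arithmetic.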


Write aggregated data

Use the Cassandra connector that has shipped with Apache Flink since v1.1.0:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-cassandra_2.11</artifactId>
    <version>1.1.1</version>
</dependency>


Write aggregated data

@Table(keyspace = "allianz", name = "itemssale_by_product")
public class MyCustomSalesEvent implements Serializable {

    private static final long serialVersionUID = 1L;

    @Column(name = "product")
    private String product;

    @Column(name = "eventdate")
    private Date eventdate;

    @Column(name = "price")
    private double price;

    // Getters and setters
}


Write aggregated data

DataStream<MyCustomSalesEvent> clickMessageStream = messageStream.map(new ClickEventMapper());

CassandraSink.addSink(clickMessageStream)
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        public Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("csn-node1.development.allianz.de").build();
        }
    })
    .build();


Checklist

At 10,000 Feet

Make it safe and fancy


Privacy

[Architecture diagram with the components: WebApplication, Reports, Dashboards, R-Scripts]


Single gateway to the data

• Ad hoc queries
• Proof of thesis
• Quick lookups
• Periodic reports
• Web-based dashboard
• 3rd-party reporting
• Expert systems


Encryption

[Diagram: a cluster spanning two data centers, DC 1 and DC 2, with the nodes Node 0 to Node 5]


Encryption – Just easy to enable

server_encryption_options:
    internode_encryption: all
    keystore: nasmount/conf/keystore.node0
    keystore_password: changeme
    truststore: nasmount/conf/truststore.node0
    truststore_password: changeme
    require_client_auth: true

Possible values for internode_encryption:
all: Cassandra encrypts all traffic between the nodes.
none: Cassandra does not encrypt the traffic between the nodes.
dc: Cassandra encrypts the traffic between the data centers.
rack: Cassandra encrypts the traffic between the racks.
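The internode_encryption values can be read as a small decision rule over the location of the two nodes that talk to each other. The following sketch spells that rule out in plain Java; it is not Cassandra code, and the Node class and method names are assumptions for illustration.

```java
/** Illustrates which node-to-node connections each internode_encryption value covers. */
public final class InternodeEncryptionSketch {

    enum Mode { ALL, NONE, DC, RACK }

    /** Hypothetical node location: data center and rack name. */
    static final class Node {
        final String dc;
        final String rack;

        Node(String dc, String rack) {
            this.dc = dc;
            this.rack = rack;
        }
    }

    static boolean isEncrypted(Mode mode, Node a, Node b) {
        boolean sameDc = a.dc.equals(b.dc);
        // Two nodes only share a rack if they are also in the same data center.
        boolean sameRack = sameDc && a.rack.equals(b.rack);
        switch (mode) {
            case ALL:  return true;       // everything between nodes is encrypted
            case NONE: return false;      // nothing is encrypted
            case DC:   return !sameDc;    // only traffic that crosses data centers
            case RACK: return !sameRack;  // any traffic that leaves the rack
            default:   throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Node first = new Node("DC1", "rack1");
        Node second = new Node("DC2", "rack1");
        System.out.println("dc mode, across data centers: " + isEncrypted(Mode.DC, first, second));
    }
}
```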


Encryption – With DSE

CREATE TABLE zzz …
    WITH compression_parameters:sstable_compression = 'Encryptor'
    AND compression_parameters:cipher_algorithm = 'AES/ECB/PKCS5Padding'
    AND compression_parameters:secret_key_strength = 128;


Hard to find a visualization solution

• Zeppelin
  OK as a tool for developers or data scientists
  Not suitable for C-level reports
• MicroStrategy
  Supports only Cassandra 2.x
  Needs write permissions for the column family (?)
• Tableau
  Accesses Cassandra via Spark (?)


Hard to find a visualization solution

• D3.js
  Great for visualization, with stunning features
  Needs an AngularJS developer to create a new report
• R
  Provides simple visualization
  Requires knowledge of R


Limit read / write access

CREATE ROLE flink;
CREATE ROLE productsales;
CREATE ROLE riskanalyst;

GRANT SELECT ON allianz.solditems TO productsales;
GRANT SELECT ON allianz.riskdata TO riskanalyst;
GRANT MODIFY ON KEYSPACE allianz TO flink;


Remove outdated events

The maximum period for which detailed information may be stored is limited by law.

We have to ensure that we meet this requirement.

TTL in Cassandra does this job well:

INSERT INTO proposal (id, date, product, price) VALUES ('p-4711', '09.09.2016', 'product-1', 50.00);
-- TTL is given in seconds: 86400 s = 1 day, 172800 s = 2 days
UPDATE proposal USING TTL 86400 SET firstname = 'Joe' WHERE id = 'p-4711';
UPDATE proposal USING TTL 86400 SET lastname = 'Doe' WHERE id = 'p-4711';
UPDATE proposal USING TTL 172800 SET city = 'Berlin' WHERE id = 'p-4711';
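Because TTL values are plain seconds, it can help to derive them from a named retention period instead of hard-coding the numbers. The sketch below shows one way to do that in Java; the class and method names are hypothetical, and the generated statement uses bind markers rather than the literal values from the slide.

```java
import java.time.Duration;

/** Derives Cassandra TTL values (in seconds) from retention periods; names are illustrative. */
public final class RetentionTtlSketch {

    /** Converts a retention period into the number of seconds expected by USING TTL. */
    static int ttlSeconds(Duration retention) {
        if (retention.isZero() || retention.isNegative()) {
            throw new IllegalArgumentException("retention must be positive");
        }
        return Math.toIntExact(retention.getSeconds());
    }

    /** Builds an UPDATE statement with bind markers for the value and the id. */
    static String updateWithTtl(String table, String column, Duration retention) {
        return "UPDATE " + table + " USING TTL " + ttlSeconds(retention)
            + " SET " + column + " = ? WHERE id = ?;";
    }

    public static void main(String[] args) {
        System.out.println(updateWithTtl("proposal", "firstname", Duration.ofDays(1)));
        System.out.println(updateWithTtl("proposal", "city", Duration.ofDays(2)));
    }
}
```

The two calls in main reproduce the TTL values 86400 and 172800 used in the statements above.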


Checklist

At cruising altitude

Work with it


Circle of data

• Meet the experts
• Extract and enrich data
• Aggregate data
• Analyse the data
• Visualize
• Test hypothesis
• Discuss actions


Recalculate the Speed-Layer

WebApplication


Bring the Data to R

# Load RJDBC
library(RJDBC)

# Load the Cassandra JDBC driver
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
                list.files("/opt/cassandra/lib/", pattern = "jar$", full.names = TRUE))

# Connect to the Cassandra node and keyspace
casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9160/allianz")


Bring the Data to R

# Query time series data
res <- dbGetQuery(casscon, "select * from solditems limit 10")

# Transpose
tres <- t(res[2:10])

# Plot
boxplot(tres, names = res$KEY, col = topo.colors(length(res$KEY)))
title("BoxPlot of 10 Sold Items Price History")