Download - Divolte Collector - meetup presentation
![Page 1: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/1.jpg)
GoDataDrivenPROUDLY PART OF THE XEBIA GROUP
@asnare / @fzk [email protected]
Divolte Collector
Andrew Snare / Friso van Vollenhoven
Because life’s too short for log file parsing
![Page 2: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/2.jpg)
99% of all data in Hadoop156.68.7.63 - - [28/Jul/1995:11:53:28 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 200 669 137.244.160.140 - - [28/Jul/1995:11:53:29 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 304 0 163.205.160.5 - - [28/Jul/1995:11:53:31 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 4324 163.205.160.5 - - [28/Jul/1995:11:53:40 -0400] "GET /shuttle/countdown/count70.gif HTTP/1.0" 200 46573 140.229.50.189 - - [28/Jul/1995:11:53:54 -0400] "GET /shuttle/missions/sts-67/images/images.html HTTP/1.0" 200 4464 163.206.89.4 - - [28/Jul/1995:11:54:02 -0400] "GET /shuttle/technology/sts-newsref/sts-mps.html HTTP/1.0" 200 215409 163.206.89.4 - - [28/Jul/1995:11:54:05 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204 163.206.89.4 - - [28/Jul/1995:11:54:05 -0400] "GET /images/shuttle-patch-logo.gif HTTP/1.0" 200 891 131.110.53.48 - - [28/Jul/1995:11:54:07 -0400] "GET /shuttle/technology/sts-newsref/stsref-toc.html HTTP/1.0" 200 84905 163.205.160.5 - - [28/Jul/1995:11:54:14 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204 130.160.196.81 - - [28/Jul/1995:11:54:15 -0400] "GET /shuttle/resources/orbiters/challenger.html HTTP/1.0" 200 8089 131.110.53.48 - - [28/Jul/1995:11:54:16 -0400] "GET /images/shuttle-patch-small.gif HTTP/1.0" 200 4179 137.244.160.140 - - [28/Jul/1995:11:54:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10136 131.110.53.48 - - [28/Jul/1995:11:54:18 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204 131.110.53.48 - - [28/Jul/1995:11:54:19 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713 130.160.196.81 - - [28/Jul/1995:11:54:19 -0400] "GET /shuttle/resources/orbiters/challenger-logo.gif HTTP/1.0" 200 4179 163.205.160.5 - - [28/Jul/1995:11:54:25 -0400] "GET /shuttle/missions/sts-70/images/images.html HTTP/1.0" 200 8657 130.181.4.158 - - [28/Jul/1995:11:54:26 -0400] "GET /history/rocket-history.txt HTTP/1.0" 200 26990 137.244.160.140 - - [28/Jul/1995:11:54:30 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 304 0 137.244.160.140 - - [28/Jul/1995:11:54:31 -0400] "GET /images/launch-logo.gif HTTP/1.0" 304 0 137.244.160.140 - - [28/Jul/1995:11:54:38 -0400] "GET /history/apollo/images/apollo-logo1.gif HTTP/1.0" 304 0 168.178.17.149 - - [28/Jul/1995:11:54:48 -0400] "GET /shuttle/missions/sts-65/mission-sts-65.html HTTP/1.0" 200 131165 140.229.50.189 - - [28/Jul/1995:11:54:53 -0400] "GET /shuttle/missions/sts-67/images/KSC-95EC-0390.jpg HTTP/1.0" 200 128881 131.110.53.48 - - [28/Jul/1995:11:54:58 -0400] "GET /shuttle/missions/missions.html HTTP/1.0" 200 8677 131.110.53.48 - - [28/Jul/1995:11:55:02 -0400] "GET /images/launchmedium.gif HTTP/1.0" 200 11853 131.110.53.48 - - [28/Jul/1995:11:55:05 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786 128.159.111.141 - - [28/Jul/1995:11:55:09 -0400] "GET /procurement/procurement.html HTTP/1.0" 200 3499 128.159.111.141 - - [28/Jul/1995:11:55:10 -0400] "GET /images/op-logo-small.gif HTTP/1.0" 200 14915 128.159.111.141 - - [28/Jul/1995:11:55:11 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786 128.159.111.141 - - [28/Jul/1995:11:55:11 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204 192.213.154.220 - - [28/Jul/1995:11:55:15 -0400] "GET /shuttle/countdown/tour.html HTTP/1.0" 200 4347 192.213.154.220 - - [28/Jul/1995:11:55:15 -0400] "GET /images/KSC-94EC-412-small.gif HTTP/1.0" 200 20484
![Page 3: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/3.jpg)
GoDataDriven
![Page 4: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/4.jpg)
USER
HTTP request:/org/apache/hadoop/io/IOUtils.html
log transportservice
log event:2012-07-01T06:00:02.500Z /org/apache/hadoop/io/IOUtils.html
transport logs tocompute cluster
off line analytics /model training
batch updatemodel state
serve model result(e.g. recommendations) streaming log
processingstreaming updatemodel state
Typical web optimization architecture
![Page 5: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/5.jpg)
GoDataDriven
Parse HTTP server logs
access.log
![Page 6: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/6.jpg)
GoDataDriven
How did it get there?
Option 1: parse HTTP server logs
• Ship log files on a schedule
• Parse using MapReduce jobs
• Batch analytics jobs feed online systems
![Page 7: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/7.jpg)
GoDataDriven
HTTP server log parsing
• Inherently batch oriented
• Schema-less (URL format is the schema)
• Initial job to parse logs into structured format
• Usually multiple versions of parsers required
• Requires sessionizing
• Logs usually have more than you ask for (bots, image requests, spiders, health check, etc.)
![Page 8: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/8.jpg)
GoDataDriven
Stream HTTP server logs
access.logMessage Queue or Event Transport
(Kafka, Flume, etc.) EVENTS
tail -F
EVENTS
OTHER CONSUMERS
![Page 9: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/9.jpg)
GoDataDriven
How did it get there?
Option 2: stream HTTP server logs
• tail -F logfiles
• Use a queue for transport (e.g. Flume or Kafka)
• Parse logs on the fly
• Or write semi-schema’d logs, like JSON
• Parse again for batch work load
![Page 10: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/10.jpg)
GoDataDriven
Stream HTTP server logs
• Allows for near real-time event handling when consuming from queues
• Sessionizing? Duplicates? Bots?
• Still requires parser logic
• No schema
![Page 11: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/11.jpg)
GoDataDriven
Tagging
index.html script.
js
web server
access.log
tracking server
Message Queue or Event Transport(Kafka, Flume, etc.) EVENTS
OTHER CONSUMERS
web page traffic
tracking traffic(asynchronous)
structured events
structured events
![Page 12: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/12.jpg)
GoDataDriven
How did it get there?
Option 3: tagging
• Instrument pages with special ‘tag’, i.e. special JavaScript or image just for logging the request
• Create special endpoint that handles the tag request in a structured way
• Tag endpoint handles logging the events
![Page 13: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/13.jpg)
GoDataDriven
Tagging
• Not a new idea (Google Analytics, Omniture, etc.)
• Less garbage traffic, because a browser is required to evaluate the tag
• Event logging is asynchronous
• Easier to do inflight processing (apply a schema, add enrichments, etc.)
• Allows for custom events (other than page view)
![Page 14: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/14.jpg)
GoDataDriven
Also…
• Manage session through cookies on the client side
• Incoming data is already sessionized
• Extract additional information from clients
• Screen resolution
• Viewport size
• Timezone
![Page 15: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/15.jpg)
GoDataDriven
Looks familiar?
<script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); ! ga('create', 'UA-40578233-2', 'godatadriven.com'); ga('send', 'pageview'); !</script>
![Page 16: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/16.jpg)
GoDataDriven
Divolte Collector
Tag based click stream data collection for
Hadoop and Kafka.
![Page 17: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/17.jpg)
GoDataDriven
Divolte Collector
index.html script.
js
web server
access.log
tracking server
Message Queue or Event Transport(Kafka, Flume, etc.) EVENTS
OTHER CONSUMERS
web page traffic
tracking traffic(asynchronous)
structured events
structured events
![Page 18: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/18.jpg)
The TAG
<script src="//tr.example.com/divolte.js" defer async> </script>
![Page 19: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/19.jpg)
Schema!
{ "namespace": "com.example.record", "type": "record", "name": "ClickEventRecord", "fields": [ { "name": "productNumber", "type": ["null", "string"], "default": null }, { "name": "shop", "type": ["null", "string"], "default": null }, { "name": "category", "type": ["null", "string"], "default": null }, { "name": "advisor", "type": ["null", "string"], "default": null }, { "name": "searchPhrase", "type": ["null", "string"], "default": null }, { "name": "basketProductNumber", "type": ["null", "string"], "default": null }, { "name": "basketSizeCode", "type": ["null", "string"], "default": null }, { "name": "basketProductCount", "type": ["null", "string"], "default": null } ] }
![Page 20: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/20.jpg)
Mapping// Page type detector: // http://.../basket basket = "^https?://[^/]+/basket(?:[?#].*)?$" !// Page type detector: // http://.../search?q=fiets search = "^https?://[^/]+/search\\?.*$" !// Page type detector: // http://.../checkout checkout = "^https?://[^/]+/checkout(?:[?#].*)?$" !// Page type detector: // http://.../thankyou payment_ok = "^https://[^/]+/thankyou(?:[?#].*)?$"
![Page 21: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/21.jpg)
MappingpageType { type = regex_name regexes = [ home, category, shop, basket, search, customercare ] field = location } productNumber { type = regex_group regex = pdp field = location group = product } viewportPixelWidth = viewportPixelWidth viewportPixelHeight = viewportPixelHeight screenPixelWidth = screenPixelWidth screenPixelHeight = screenPixelHeight
![Page 22: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/22.jpg)
Configure
divolte { server { host = 0.0.0.0 use_x_forwarded_for = true landing_page = false } ! tracking { cookie_domain = .example.com include "click-schema-mapping.conf" schema_file = /etc/divolte/ClickEventRecord.avsc } !…
![Page 23: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/23.jpg)
Configure
kafka_flusher { enabled = true producer = { metadata.broker.list = [ "broker1:9092", "broker2:9092", "broker3:9092" ] } } !…
![Page 24: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/24.jpg)
Configure hdfs_flusher { hdfs { replication = 3 } ! simple_rolling_file_strategy { roll_every = 60 minutes sync_file_after_records = 1000 sync_file_after_duration = 10 seconds ! working_dir = /divolte/inflight publish_dir = /divolte/published } } }
![Page 25: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/25.jpg)
Run
./bin/divolte-collector
![Page 26: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/26.jpg)
Demo: Javadoc analytics!
javadoc -d outputdir \ -bottom '<script src="//localhost:8290/divolte.js" defer async></script>' \ -subpackages .
![Page 27: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/27.jpg)
Kafka event consumer
![Page 28: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/28.jpg)
GoDataDriven
private static class JavadocEventHandler implements EventHandler<JavadocEventRecord> { private static final String TCP_SERVER_HOST = "127.0.0.1"; private static final int TCP_SERVER_PORT = 1234;! private Socket socket = null; private OutputStream stream;! @Override public void setup() throws Exception { socket = new Socket(TCP_SERVER_HOST, TCP_SERVER_PORT); stream = socket.getOutputStream(); }! @Override public void handle(JavadocEventRecord event) throws Exception { if (!event.getDetectedDuplicate()) { // Avro's toString already produces JSON. stream.write(event.toString().getBytes(StandardCharsets.UTF_8)); stream.write("\n".getBytes(StandardCharsets.UTF_8)); } }! @Override public void shutdown() throws Exception { if (null != stream) stream.close(); if (null != socket) socket.close(); }}
![Page 29: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/29.jpg)
GoDataDriven
public static void main(String[] args) { final DivolteKafkaConsumer<JavadocEventRecord> consumer = DivolteKafkaConsumer.createConsumer( KAFKA_TOPIC, ZOOKEEPER_QUORUM, KAFKA_CONSUMER_GROUP_ID, NUM_CONSUMER_THREADS, () -> new JavadocEventHandler(), JavadocEventRecord.getClassSchema());! Runtime.getRuntime().addShutdownHook(new Thread(() -> { System.out.println("Shutting down consumer."); consumer.shutdownConsumer(); }));! System.out.println("Starting consumer."); consumer.startConsumer();}
![Page 30: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/30.jpg)
SQL FTW!
![Page 31: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/31.jpg)
GoDataDriven
CREATE EXTERNAL TABLE javadoc_analytics ( firstInSession boolean -- other fields are created automatically from schema)ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'LOCATION '/divolte/published'TBLPROPERTIES ( 'avro.schema.url'='hdfs:///JavadocEventRecord.avsc');
![Page 32: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/32.jpg)
Python & Spark
![Page 33: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/33.jpg)
GoDataDriven
export IPYTHON=1export IPYTHON_OPTS="notebook --ip=0.0.0.0"pyspark \ --jars divolte-spark-assembly-0.1.jar \ --driver-class-path divolte-spark-assembly-0.1.jar \ --num-executors 40
![Page 34: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/34.jpg)
GoDataDriven
![Page 35: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/35.jpg)
Spark&
Spark Streaming
![Page 36: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/36.jpg)
GoDataDriven
import io.divolte.spark.avro._ import org.apache.avro.generic.IndexedRecord import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ !val sc = new SparkContext() val events = sc.newAvroFile[IndexedRecord](path) !// And then… val records = events.toRecords // or val eventFields = events.fields("sessionId", "location", "timestamp")
![Page 37: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/37.jpg)
GoDataDriven
// Kafka configuration. val consumerConfig = Map( "group.id" -> "some-id-for-the-consumer-group", "zookeeper.connect" -> "zookeeper-connect-string", "auto.commit.interval.ms" -> "5000", "auto.offset.reset" -> "largest" ) val topicSettings = Map("divolte" -> Runtime.getRuntime.availableProcessors()) !val sc = new SparkContext() val ssc = new StreamingContext(sc, Seconds(15)) !// Establish the source event stream. val stream = ssc.divolteStream[GenericRecord](consumerConfig, topicSettings, StorageLevel.MEMORY_ONLY) !// And then… val eventStream = stream.toRecords // or val locationStream = stream.fields("location")
![Page 38: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/38.jpg)
Also in the box
![Page 39: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/39.jpg)
GoDataDriven
Zero config deploy
• Easy to use for local development
• Works out of the box with zero custom config
• Comes with a built in schema and mapping
• Works on local machine without Hadoop
• Flushes to /tmp on local file system
![Page 40: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/40.jpg)
GoDataDriven
Collector has no global state
• Load balancer friendly
• Horizontally scalable
• Shared nothing
• (other than HDFS and Kafka)
![Page 41: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/41.jpg)
GoDataDriven
In stream de-duplication
• The internet is a mean place; data will have noise
• In stream hash based de-duplication
• Low false negative rate
• Virtually zero false positive rate
• Requires URI based routing from load balancer
• Easy to setup on nginx
• Supported on many hardware load balancers
![Page 42: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/42.jpg)
GoDataDriven
Corrupt request detection
• The internet is still a mean place… Some URLS are truncated
• Incomplete events detected and discarded
![Page 43: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/43.jpg)
GoDataDriven
Defeat Chrome’s pre-rendering
• Chrome sometimes speculatively pre-renders pages in the background
• This triggers JS even if the page is not shown
• Unless you use the Page Visibility API to detect this
• Which we do
• We take care of many other JS caveats as well
![Page 44: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/44.jpg)
GoDataDriven
Custom events
• Divolte presents itself as a JS library
• Map custom event parameters directly onto Avro fields
<!-- client side --> <script> divolte.signal("addToBasket", { count: 2, productId: "a3bc38de" }) </script>
// server side mapping eventType = eventType !basketProductId { type = event_parameter name = productId }
![Page 45: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/45.jpg)
GoDataDriven
Bring your own IDs
• Generate page view ID on server side
• Possible to relate server side logging to page views and other client side events
<script src="//…/divolte.js#a28de3bf42a5dc98c03" defer async> </script>
![Page 46: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/46.jpg)
GoDataDriven
User agent parsing
• On the fly parsing of user agent string
• Uses: http://uadetector.sourceforge.net/
• Updates user agent database at runtime without restart
![Page 47: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/47.jpg)
GoDataDriven
IP to geo coordinates
• On the fly enrichment with geo coordinates based on IP address
• MaxMind geoIP database
• https://www.maxmind.com/en/geoip2-databases
• Updates database at runtime without restart
• Sets:
• Latitude & longitude
• Country, City, Subdivision
![Page 48: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/48.jpg)
GoDataDriven
https://github.com/divolte/divolte-collector
https://github.com/divolte/divolte-examples
https://github.com/divolte/divolte-spark
https://github.com/divolte/divolte-kafka-consumer
![Page 49: Divolte Collector - meetup presentation](https://reader030.vdocuments.us/reader030/viewer/2022013111/559967111a28ab871e8b485c/html5/thumbnails/49.jpg)
GoDataDriven
We’re hiring / Questions? / Thank you!
@asnare / @fzk [email protected]
Andrew Snare / Friso van Vollenhoven