![Page 1: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/1.jpg)
Swiss Transport in Real Time: Tribulations in the Big Data Stack
Alexandre Masselot Soft-shake, Geneva
October 2016
![Page 3: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/3.jpg)
Is it possible to build a simple scalable infrastructure, to
dispatch, store, transform and visualize “near real time” data and achieve a posteriori analysis?
This is only a POC!!!
![Page 4: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/4.jpg)
Finding a dataset
• social media
• finance
• sport
• energy
• transport
• log analysis
• meteorology
• bioinformatics
• personalized health
• monitoring
• security
• IOT
![Page 6: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/6.jpg)
www.voev.ch
![Page 10: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/10.jpg)
AAGL Autobus AG Liestal
AAGR Auto AG Rothenburg
AAGS Auto AG Schwyz
AAGU AUTO AG URI
AB Appenzeller Bahnen AG
ABl Autolinee Bleniesi SA
ABF Autobusbetrieb Freienbach
AFA Automobilverkehr Frutigen Adelboden AG
AMSA Autolinea Mendrisiense SA
AOT Autokurse Oberthurgau AG
ARAG Rottal Auto AG
ARBAG Aletsch Riederalp Bahnen AG
ARL Autolinee Regionali Luganesi
AS Autobetrieb Sernftal AG
ASGS Autotransports Sion-Grône-Sierre
ASm Aare Seeland mobil AG
AVG Autoverkehr Grindelwald AG
AVJ Autotransports de la Vallée de Joux
AWA Autobetrieb Weesen-Amden
AZZK Autobus Zürich-Zollikon-Küsnacht
BB Bürgenstock Bahnen
BBA Busbetrieb Aarau AAR bus+bahn
BBBW Bus-Betrieb Binggeli
BDWM BDWM Transport AG
BGU BGU Busbetrieb Grenchen und Umgebung AG
BLAG Busland AG
BLM Bergbahn Lauterbrunnen-Mürren AG
BLS BLS AG
BLT BLT Baselland Transport AG
BLWE Busbetrieb Lichtensteig-Wattwil-Ebnat-Kappel
BOB Berner Oberland-Bahnen AG
BOGG Busbetrieb Olten Gösgen Gäu AG
BOS BUS Ostschweiz AG
BOS-M BOS Management AG
BRB Brienz Rothorn Bahn AG
BRER Busbetrieb Rapperswil-Eschenbach-Rüti
BRSB Braunwald-Standseilbahn AG
BSU Busbetrieb Solothurn und Umgebung AG
BVB Basler Verkehrs-Betriebe
CGN CGN SA
CJ Compagnie des chemins de fer du Jura (C.J.) SA
CROS Crossrail AG
DBSCH DB Schenker Rail Schweiz GmbH
DBZ Dolderbahn Zürich
ETB Emmentalbahn, Huttwil
FART Ferrovie Autolinee Regionali Ticinesi
FB Forchbahn AG
FC FUNICAR Kursbetriebe AG
FLP Ferrovie Luganesi SA
FW Frauenfeld-Wil-Bahn AG
GGB Gornergrat Bahn AG
HBSAG Hafenbahn Schweiz AG
JB Jungfraubahn AG
LEB Chemin de fer Lausanne-Echallens-Bercher
LLB AG für Verkehrsbetriebe Leuk-Leukerbad und Umgebung
LSMS Schilthornbahn AG
MBC Transports de la région Morges-Bière-Cossonay SA
MG Ferrovia Monte Generoso SA
MGB Matterhorn Gotthard Bahn
MIB Kraftwerke Oberhasli AG Meiringen-Innertkirchen-Bahn
MOB Chemin de fer Montreux-Oberland Bernois
MVR Transports Montreux-Vevey-Riviera SA
NHB Niederhornbahn
NB Niesenbahn AG
NStCM Chemin de fer Nyon-St. Cergue-Morez
OeBB Oensingen-Balsthal-Bahn
PAG PostAuto Schweiz AG
PB PILATUS-BAHNEN AG
RA RegionAlps SA
RAILG Railgate AG
RB RIGI BAHNEN AG
RBL Regionalbus Lenzburg AG
RBS Regionalverkehr Bern-Solothurn AG
REGO Regiobus Gossau AG
RhB Rhätische Bahn AG
RNCH DB Schenker Rail Schweiz GmbH
RLC railCare
RVBW Regionale Verkehrsbetriebe Baden-Wettingen AG
RVSH SchaffhausenBus, Regionale Verkehrsbetriebe SH AG
SBB SBB AG
SBB-D SBB GmbH
SBC Stadtbus Chur AG
SBF Stadtbus Frauenfeld
SBW Stadtbus Winterthur
SMC Cie de Chemin de Fer+d'Autobus Sierre-Montana-Crans (SMC) SA
SMGN Société des Mouettes Genevoises Navigation SA
SMtS Funiculaire St-Imier - Mont-Soleil SA
SOB Schweizerische Südostbahn AG
SRTAG Swiss Rail Traffic AG
SSIF Società Subalpina di Imprese Ferroviarie S.p.A.
ST Sursee-Triengen-Bahn
STB Sensetalbahn AG
STI Verkehrsbetriebe STI AG
SVB BERNMOBIL Städt. Verkehrsbetriebe Bern
SWAG Seilbahn Weissenstein AG
SZU Sihltal Zürich Uetliberg Bahn SZU AG
THURBO Thurbo AG
TL Transports publics de la région lausannoise SA
TMR TRANSPORTS DE MARTIGNY ET REGIONS SA
TPC Transports Publics du Chablais SA
TPF Transports publics fribourgeois SA
TPG Transports publics genevois
TPL Trasporti Pubblici Luganesi SA
TPN Transports Publics de la Région Nyonnaise SA
TRN Transports Publics Neuchâtelois SA
TRAVYS TRAVYS SA Transports Vallée de Joux-Yverdon-Sainte-Croix
TSD Theytaz Excursions Sion
VB Verkehrsbetriebe Biel
VBD Verkehrsbetrieb der Landschaft Davos
VBG VBG Verkehrsbetriebe Glattal AG
VBH Verkehrsbetriebe Herisau
VBL Verkehrsbetriebe Luzern AG
VBSG Verkehrsbetriebe St.Gallen
VBSH Verkehrsbetriebe Schaffhausen
VBZ Verkehrsbetriebe Zürich
VMCV Transports publics Vevey-Montreux-Chillon-Villeneuve
VSSU Verband Schweizerischer Schifffahrtsunternehmen
VZO Verkehrsbetriebe Zürichsee und Oberland AG
WAB Wengernalpbahn AG
WB Waldenburgerbahn AG
WRS Widmer Rail Services Personal AG
WSB Wynental- und Suhrentalbahn AAR bus+bahn
ZB zb Zentralbahn AG
ZVB Zugerland Verkehrsbetriebe AG
ZVV Zürcher Verkehrsverbund ZVV
AES Ägerisee Schifffahrt AG
BLS BLS AG Schifffahrt Berner Oberland Thuner- und Brienzersee
BPG Basler Personenschifffahrt AG
BSG Bielersee-Schifffahrts-Gesellschaft AG
CGN CGN SA
FHM Zürichsee-Fähre Horgen-Meilen AG
LNM Société de Navigation Lacs de Neuchâtel et Morat SA
NLM Navigazione Lago Maggiore
SBS SBS Schifffahrt AG
SGG Schifffahrts-Genossenschaft Greifensee
SGH Schifffahrtsgesellschaft Hallwilersee AG
SGV Schifffahrtsgesellschaft des Vierwaldstättersees
SGZ Schifffahrtsgesellschaft für den Zugersee AG / Ägerisee
SNL Società Navigazione del Lago di Lugano SA
SW Schiffsbetrieb Walensee AG
URh Schweiz. Schifffahrtsgesellschaft Untersee und Rhein AG
ZSG Zürichsee-Schifffahrtsgesellschaft AG
![Page 12: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/12.jpg)
![Page 13: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/13.jpg)
What do we propose?
https://github.com/alexmasselot/swiss-transport-realtime
![Page 14: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/14.jpg)
![Page 15: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/15.jpg)
![Page 16: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/16.jpg)
Is it possible to build a simple scalable infrastructure, to
dispatch, transform and visualize “near real time” massive data
and achieve a posteriori analysis?
![Page 17: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/17.jpg)
[Diagram labels: offline, real time, users, data analysts, vehicles positions, station boards]
![Page 20: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/20.jpg)
[Architecture diagram labels: vehicles positions, station boards, dispatch, format, storage, transform, expose, analysis, visualization, real time, offline, users, data analysts]
![Page 21: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/21.jpg)
![Page 26: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/26.jpg)
[Architecture diagram, focus on: vehicles positions, station boards, dispatch]
![Page 27: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/27.jpg)
Acquire
• vehicles positions: SBB REST API
• station boards: OpenData transport API
![Page 28: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/28.jpg)
vehicles positions
{ "id": "12345xyz", "category": "IR", "name": "IR 72928", "destination": "Alpnach", "position": { "lat": 46.940582, "lon": 8.275442 } }
![Page 29: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/29.jpg)
station boards
{ station: { name: Lausanne, location: {lat, long} },
  departures: [
    { to: Domodossola, time: 20:13, delayed: 4, prognosis: { capacity2nd: 3, capacity1st: 1 } },
    {…}
  ]
}
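A vehicle position event of the shape shown on the slides could be parsed into a typed record as sketched below. This is only an illustration: the field names come from the slide, while the `VehiclePosition` class and the `parse_vehicle_position` helper are assumptions, not code from the project.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VehiclePosition:
    """One vehicle position event; field names follow the slide."""
    id: str
    category: str
    name: str
    destination: str
    lat: float
    lon: float


def parse_vehicle_position(raw: str) -> VehiclePosition:
    """Parse a JSON-encoded vehicle position event (hypothetical helper)."""
    doc = json.loads(raw)
    return VehiclePosition(
        id=doc["id"],
        category=doc["category"],
        name=doc["name"],
        destination=doc["destination"],
        lat=float(doc["position"]["lat"]),
        lon=float(doc["position"]["lon"]),
    )


# The sample values are the ones shown on the slide.
raw_event = json.dumps({
    "id": "12345xyz",
    "category": "IR",
    "name": "IR 72928",
    "destination": "Alpnach",
    "position": {"lat": 46.940582, "lon": 8.275442},
})
event = parse_vehicle_position(raw_event)
print(event.name, event.lat, event.lon)
```

Flattening `position` into `lat` and `lon` is a design choice of this sketch; it keeps the record hashable and easy to compare.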
![Page 30: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/30.jpg)
Dispatch
![Page 31: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/31.jpg)
Events are streamed to Kafka
“Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.”
kafka.apache.org
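The reason a log-based dispatcher suits both the real-time and the offline paths can be shown with a toy model. The sketch below is not the Kafka API: `TopicLog`, the topic name and the group names are assumptions, and the second train is made up. It only illustrates that each consumer group keeps its own offset on a shared append-only log.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log per topic; NOT the Kafka API, only the idea."""

    def __init__(self):
        self._events = defaultdict(list)   # topic -> ordered events
        self._offsets = defaultdict(int)   # (topic, group) -> next offset

    def publish(self, topic, event):
        self._events[topic].append(event)

    def poll(self, topic, group):
        """Return the events this consumer group has not read yet."""
        start = self._offsets[(topic, group)]
        batch = self._events[topic][start:]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


log = TopicLog()
log.publish("vehicles-positions", {"id": "12345xyz", "name": "IR 72928"})
log.publish("vehicles-positions", {"id": "train-b", "name": "made-up train"})

# The real-time consumer reads early, the offline consumer reads later.
realtime_batch = log.poll("vehicles-positions", "real-time")
log.publish("vehicles-positions", {"id": "12345xyz", "name": "IR 72928"})
offline_batch = log.poll("vehicles-positions", "offline")
realtime_next = log.poll("vehicles-positions", "real-time")
print(len(realtime_batch), len(offline_batch), len(realtime_next))
```

Neither consumer removes events from the log, so a slow offline reader never starves the real-time one.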
![Page 33: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/33.jpg)
Kafka, RabbitMQ, ZeroMQ…
TIMTOWTDI (there is more than one way to do it)
![Page 34: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/34.jpg)
Store
[Diagram labels: format, dispatch, storage, Logstash, Elasticsearch, flat files]
![Page 38: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/38.jpg)
Logstash, Flume, Filebeat…
TIMTOWTDI
![Page 39: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/39.jpg)
Elasticsearch, HBase, Cassandra…
TIMTOWTDI
![Page 40: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/40.jpg)
[Diagram labels, real-time path: dispatch, transform, expose, visualization]
![Page 43: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/43.jpg)
Stream transformation
• We have an input flow of events and want to:
  • know if a train is stopped in a station;
  • know if a train has exited the network;
  • expose an aggregated station board.
• We need to:
  • digest the input flow;
  • process with temporary state persistence;
  • be able to expose snapshots.
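The three needs listed above (digest, keep temporary state, expose snapshots) can be sketched in a few lines. This is a simplified illustration, not the project's implementation: a train counts as stopped when two successive events carry the same position (no station lookup is done), a train counts as exited after a silence longer than an assumed timeout, and all class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainState:
    lat: float
    lon: float
    last_seen: float      # timestamp (seconds) of the latest event
    stopped: bool = False


class TrainTracker:
    """Keeps the latest state per train (temporary, in-memory persistence)."""

    def __init__(self, exit_timeout=120.0):
        self.exit_timeout = exit_timeout  # assumed value, in seconds
        self.trains = {}

    def digest(self, train_id, lat, lon, timestamp):
        """Update state; a train is flagged as stopped when it has not moved."""
        previous = self.trains.get(train_id)
        stopped = (previous is not None
                   and (previous.lat, previous.lon) == (lat, lon))
        self.trains[train_id] = TrainState(lat, lon, timestamp, stopped)

    def evict_exited(self, now):
        """Drop trains that have been silent for longer than the timeout."""
        exited = [tid for tid, state in self.trains.items()
                  if now - state.last_seen > self.exit_timeout]
        for tid in exited:
            del self.trains[tid]
        return exited

    def snapshot(self):
        """Expose a copy of the current state, e.g. for a REST endpoint."""
        return {tid: (s.lat, s.lon, s.stopped) for tid, s in self.trains.items()}


tracker = TrainTracker(exit_timeout=120.0)
tracker.digest("12345xyz", 46.940582, 8.275442, timestamp=0.0)
tracker.digest("12345xyz", 46.940582, 8.275442, timestamp=30.0)
snapshot = tracker.snapshot()
exited = tracker.evict_exited(now=200.0)
print(snapshot, exited)
```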
![Page 44: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/44.jpg)
Stream transformation
• Scala is the language for Big Data (functional & OO)
• Akka (actors):
  • lightweight entities (one per train, per station);
  • easy asynchronous communications;
  • the perfect use case.
• Play framework for REST service, configuration etc.
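The "one lightweight entity per train" idea can be illustrated without Akka. The sketch below uses Python's `asyncio` as a stand-in for the actor model: each train gets its own mailbox (an `asyncio.Queue`) and a coroutine that handles its messages one at a time. The names `train_actor` and `tell`, the second train and the later position are all made up for the example.

```python
import asyncio


async def train_actor(train_id, mailbox, board):
    """One lightweight 'actor' per train: handles its messages one at a time."""
    while True:
        message = await mailbox.get()
        if message is None:          # sentinel: stop this actor
            break
        board[train_id] = message    # each actor writes only its own entry


async def main():
    board = {}       # latest position per train, as reported by the actors
    mailboxes = {}   # train id -> asyncio.Queue
    tasks = []

    async def tell(train_id, message):
        """Route a message to the train's actor, creating it on first use."""
        if train_id not in mailboxes:
            mailboxes[train_id] = asyncio.Queue()
            tasks.append(asyncio.create_task(
                train_actor(train_id, mailboxes[train_id], board)))
        await mailboxes[train_id].put(message)

    await tell("12345xyz", (46.940582, 8.275442))
    await tell("train-b", (46.5, 6.6))       # made-up second train
    await tell("12345xyz", (46.95, 8.28))    # made-up later position

    for mailbox in mailboxes.values():
        await mailbox.put(None)
    await asyncio.gather(*tasks)
    return board


board = asyncio.run(main())
print(board)
```

Messages for the same train are processed in arrival order because they share one mailbox, while different trains progress independently.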
![Page 45: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/45.jpg)
Spark Streaming, Storm, Flink…
TIMTOWTDI
![Page 47: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/47.jpg)
DevOps
![Page 48: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/48.jpg)
Docker: putting everything together
• The “simple” infrastructure is not so light;
• A developer should have everything on his/her laptop without polluting the machine;
• Docker comes to the rescue:
  • lightweight containers,
  • pre-existing images,
  • docker-compose to describe the infrastructure,
  • deploy directly to AWS or GCE.
![Page 51: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/51.jpg)
Performance: 2 numbers
15% CPU: Node.js + Kafka + Akka + Play
15x faster AJAX queries (vs. SBB REST) to gather 30 times more trains
![Page 56: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/56.jpg)
A scalable infrastructure
• Kafka: partitioning and ZooKeeper
• Logstash: ? (but naturally recovers on failure)
• Elasticsearch: partitioning
• Spark Streaming: distributed by essence & write-ahead logs
• Akka: Akka cluster, supervisors & failure strategy
• Docker: Kubernetes, AWS, GCE, Exoscale
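Partitioning scales because events with the same key always land in the same partition, so partitions can be spread over several machines while per-train ordering is kept. A minimal sketch of key-based assignment follows; `zlib.crc32` is a stand-in hash chosen for the example (Kafka uses its own hash function), and the partition count and the last two train ids are assumptions.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; crc32 is a stand-in for Kafka's own hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4  # assumed value
train_ids = ["12345xyz", "train-b", "train-c"]  # the last two are made up
assignment = {tid: partition_for(tid, NUM_PARTITIONS) for tid in train_ids}
print(assignment)
```

Using the train id as the key means all position events of one train are consumed in order by a single consumer.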
![Page 60: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/60.jpg)
![Page 61: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/61.jpg)
![Page 62: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/62.jpg)
React: JS for large data sets
• Only a rendering library (but fast);
• Use a flux architecture;
• Built by Facebook.
[Flux diagram labels: Action, Dispatcher, Store, View]
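The flux pattern is a one-way data flow: an action goes to the dispatcher, the dispatcher forwards it to the stores, and the stores notify the views. The sketch below shows that flow in Python rather than JavaScript, purely as an illustration; the class names, the `POSITION_RECEIVED` action type and the `rendered` list standing in for a view are all hypothetical.

```python
class Dispatcher:
    """Central hub: every action goes through here, then to all stores."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def dispatch(self, action):
        for callback in self._callbacks:
            callback(action)


class TrainStore:
    """Holds the application state; it changes only in response to actions."""

    def __init__(self, dispatcher):
        self.positions = {}
        self._listeners = []
        dispatcher.register(self._on_action)

    def subscribe(self, listener):
        self._listeners.append(listener)

    def _on_action(self, action):
        if action["type"] == "POSITION_RECEIVED":   # hypothetical action type
            self.positions[action["id"]] = action["position"]
            for listener in self._listeners:
                listener(self.positions)


rendered = []  # stands in for the view: records what would be drawn
dispatcher = Dispatcher()
store = TrainStore(dispatcher)
store.subscribe(lambda positions: rendered.append(dict(positions)))

dispatcher.dispatch({"type": "POSITION_RECEIVED", "id": "12345xyz",
                     "position": (46.940582, 8.275442)})
print(rendered)
```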
![Page 64: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/64.jpg)
JavaScript for big data viz
• React can handle viz with >100k elements (don’t show them individually!);
• Beware of performance issues;
• Testing is not an option.
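One common way to avoid drawing every element individually is to aggregate points into grid cells and render one marker per cell. The sketch below shows such a binning step; it is an assumption about how the advice could be applied, not the project's code, and the cell size and sample coordinates are made up.

```python
from collections import Counter


def bin_positions(positions, cell_size=0.1):
    """Count positions per grid cell of `cell_size` degrees (assumed value)."""
    counts = Counter()
    for lat, lon in positions:
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell] += 1
    return counts


positions = [(46.94, 8.27), (46.95, 8.28), (46.52, 6.63)]  # made-up sample
cells = bin_positions(positions)
print(len(positions), "points ->", len(cells), "cells")
```

The rendering layer then receives one entry per occupied cell, so its cost depends on the zoom level rather than on the number of raw events.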
![Page 69: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/69.jpg)
4.5 months of data
A. What is the train occupancy during weekdays, between Lausanne and Geneva?
B. When are the trains most delayed?
C. Where are the trains most delayed?
![Page 70: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/70.jpg)
A. Lausanne-Genève: when to have a seat?
![Page 75: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/75.jpg)
Lausanne-Genève: when to have a seat?
Good luck finding a spot!
Wake up earlier!
or pay…
![Page 77: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/77.jpg)
B. When are the trains most delayed?
![Page 78: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/78.jpg)
![Page 79: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/79.jpg)
C. Where are the trains most delayed?
![Page 80: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/80.jpg)
![Page 81: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/81.jpg)
Trains Expected
![Page 82: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/82.jpg)
Trains Delayed
![Page 83: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/83.jpg)
Data analysis tooling…
![Page 84: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/84.jpg)
…or “reproducible science”
![Page 85: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/85.jpg)
A data science notebook
• Web application
• Interactively edit and run pieces of code (analysis steps)
• Inclined towards Python (although other languages are available)
• Beware of performance with large datasets (sample the data or use Spark mode)
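A typical analysis step in such a notebook is a small function run on a sample of the stored events. The sketch below averages delays per hour of day; the `time` and `delayed` fields follow the station board structure shown earlier in the talk, while the function name and the sample records are made up and are not results from the 4.5 months of data.

```python
from collections import defaultdict
from statistics import mean


def mean_delay_by_hour(departures):
    """Average the `delayed` minutes of departures, grouped by hour of day."""
    by_hour = defaultdict(list)
    for departure in departures:
        hour = int(departure["time"].split(":")[0])
        by_hour[hour].append(departure["delayed"])
    return {hour: mean(delays) for hour, delays in sorted(by_hour.items())}


# Made-up sample records, not data from the talk.
sample = [
    {"to": "Domodossola", "time": "20:13", "delayed": 4},
    {"to": "Genève", "time": "20:45", "delayed": 0},
    {"to": "Genève", "time": "07:12", "delayed": 3},
]
delays = mean_delay_by_hour(sample)
print(delays)
```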
![Page 87: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/87.jpg)
Jupyter, Zeppelin, RStudio…
TIMTOWTDI
![Page 89: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/89.jpg)
This is only a POC!!!
https://github.com/alexmasselot/swiss-transport-realtime
![Page 90: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/90.jpg)
users
data analysts
![Page 91: Swiss Transport in Real Time: Tribulations in the Big Data Stack](https://reader031.vdocuments.us/reader031/viewer/2022030305/587134d31a28abf0568b56b7/html5/thumbnails/91.jpg)
Nov 8th, 7 pm, Genève: “Banknote Recognition System” (Machine Learning)
Nov 10th, 6 pm, Genève: “Data Science & Machine Learning: Explorer, Comprendre Et Prédire”
Demo on OCTO stand