big data for labour market information · 2019. 11. 18. · big data for labour market information...
Big Data for Labour Market Information
Session 7
Architecture: solutions for real-time LMI
(based on KDD)
Alessandro Vaccarino – Fabio Mercorio
Big Data for Labour Market Information – focus on data from online job vacancies – training workshop
Milan, 21-22 November 2019
Topics
1. Goal & context
2. Challenges
   1. The functional architecture
   2. Why use micro-services
   3. The Team and the pipeline design
   4. How to handle infrastructure costs
Challenges
• Handle a huge amount of near-real-time data
• Data comes from the web: noise must be detected and reduced
• Multi-language environment
• Need to relate data to classification standards (such as ISCO and NUTS)
• Find a way to summarize and present a wide and complex scenario
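One concrete source of web noise is the same vacancy re-posted on several portals; the pre-processing stage lists a Deduplication component and a Merge Vacancy step without describing how they work. A minimal sketch of one common technique, near-duplicate detection by Jaccard similarity over 3-word shingles, assuming a hypothetical similarity threshold of 0.8; it is not necessarily the method used in the project:

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole word sequence as a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(ads: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first ad of each near-duplicate group (O(n^2) comparisons, sketch only)."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for ad in ads:
        s = shingles(ad)
        # Keep the ad only if it is not too similar to anything already kept.
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(ad)
            kept_shingles.append(s)
    return kept
```

For example, "Java developer wanted in Milan, full time contract" and the same text ending in "!" produce identical shingles, so only the first is kept. At the scale of millions of ads, pairwise comparison is too slow; techniques such as MinHash with locality-sensitive hashing are the usual next step.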
Topics
1. Goal & context
2. Challenges
   1. Stakeholders
   2. The functional architecture
   3. Why use micro-services
   4. The Team and the pipeline design
Stakeholders
• Project Leader
• Key Users
• Domain Experts
• End Users
Project leader
• Lead the project with the steering committee
• Define the scope of the project
• Define key organizations
• Maintain relations with stakeholders
• Provide advice
Key Users
• Define requirements
• Monitor the quality of the project
• Provide input to the development of the project
• Manage the source landscaping
• Validate the overall data flow and methodology
Domain Experts
• International Country Experts
• Provide knowledge and expertise
• Execute the landscaping
• Understand the language/terms of their context
• Evaluate the accuracy of the results
• Test the product
• Provide feedback
End Users
• Decision Makers and Business Users
  o Explore datasets, analyses and aggregate data, including visually
  o Define new analysis processes
  o Produce data storytelling
  o Make decisions by exploring data
• Data Scientists
  o Apply new machine learning models and AI techniques
  o Extract new insights from the data
  o Apply advanced data modelling to the dataset
• Data Analysts
  o Interpret data and turn it into information
  o Identify patterns and trends
  o Extract and analyze aggregate data
  o Publish and share their analyses
Topics
1. Goal & context
2. Challenges
   1. Stakeholders
   2. The functional architecture
   3. Why use micro-services
   4. The Team and the pipeline design
Overall Data Flow
Data Ingestion → Pre-Processing → Information Extraction → ETL → Presentation Area
Stage groups: Ingestion, Processing, Front end
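The overall data flow is a linear chain in which each stage consumes the previous stage's output. A minimal sketch of that chaining, assuming every stage is a plain function over a list of records; the record fields and the bodies of the five stage functions are hypothetical placeholders, not the project's actual components:

```python
from functools import reduce
from typing import Callable

Record = dict[str, object]
Stage = Callable[[list[Record]], list[Record]]


def ingest(_: list[Record]) -> list[Record]:
    # Hypothetical: real records would come from scrapers, crawlers or direct access.
    return [{"source": "portal-a", "text": "  Data Analyst, Milan  "},
            {"source": "portal-b", "text": ""}]


def pre_process(records: list[Record]) -> list[Record]:
    # Strip whitespace and drop empty postings (stand-in for cleaning and filtering).
    cleaned = [{**r, "text": str(r["text"]).strip()} for r in records]
    return [r for r in cleaned if r["text"]]


def extract_information(records: list[Record]) -> list[Record]:
    # Placeholder: real extraction would assign ISCO, NUTS, contract, etc.
    return [{**r, "occupation": "unknown"} for r in records]


def etl(records: list[Record]) -> list[Record]:
    # Reshape into flat rows suitable for a data-warehouse table.
    return [{"source": r["source"], "occupation": r["occupation"]} for r in records]


def present(records: list[Record]) -> list[Record]:
    # Aggregate vacancy counts per occupation, as a dashboard would show them.
    counts: dict[str, int] = {}
    for r in records:
        key = str(r["occupation"])
        counts[key] = counts.get(key, 0) + 1
    return [{"occupation": k, "vacancies": v} for k, v in counts.items()]


PIPELINE: list[Stage] = [ingest, pre_process, extract_information, etl, present]


def run(stages: list[Stage]) -> list[Record]:
    """Feed the output of each stage into the next one, starting from an empty list."""
    return reduce(lambda data, stage: stage(data), stages, [])
```

Keeping every stage behind the same signature is what lets stages be replaced, tested or scaled independently, which is the motivation for the micro-service design discussed later.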
Conceptual architecture (diagram)
• Layers: Data ingestion, Data processing, Data analysis
• Components: Data Supply, Crawler, Scraper, Direct access, Monitor and scheduler, Data quality, Data processing and classification, ETL, Visual interface, Data lab, Backup
Logical view (diagram)
• Sources: Employment Agencies and Public Employment Services; Job Portals; Newspapers, Companies; University Job Placement; Classified Ads Sites
• Collection: Web Scraper, Web Crawler, Direct Access
• Processing: Pre-Processing; Information Extraction and Classification; Data Management and Presentation
• Storage: Document store, data warehouse (DW)
• Output: job vacancies classified on ISCO, recognised NUTS regions, other dimensions (contract, sector, education, …)
• Users: Labour Market Analysts, through Interactive Data Analytics
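The output side of the logical view is a vacancy classified on ISCO, located in a recognised NUTS region and described by further dimensions. A sketch of such a record, assuming ISCO-08 occupation codes (one digit per level, from 1-digit major groups to 4-digit unit groups) and NUTS codes (a two-letter country code followed by one character per NUTS level); the class and field names are hypothetical:

```python
import re
from dataclasses import dataclass, field

ISCO_RE = re.compile(r"\d{1,4}")                # ISCO-08: one digit per hierarchy level
NUTS_RE = re.compile(r"[A-Z]{2}[0-9A-Z]{0,3}")  # country code + up to 3 level characters


@dataclass
class ClassifiedVacancy:
    """Hypothetical output record of the information-extraction stage."""
    vacancy_id: str
    isco_code: str
    nuts_code: str
    other: dict[str, str] = field(default_factory=dict)  # contract, sector, education, ...

    def __post_init__(self) -> None:
        # Format checks only: membership in the official code lists is not verified.
        if not ISCO_RE.fullmatch(self.isco_code):
            raise ValueError(f"not an ISCO-08 code: {self.isco_code!r}")
        if not NUTS_RE.fullmatch(self.nuts_code):
            raise ValueError(f"not a NUTS code: {self.nuts_code!r}")

    @property
    def isco_level(self) -> int:
        """1 = major group, 2 = sub-major, 3 = minor, 4 = unit group."""
        return len(self.isco_code)

    @property
    def nuts_level(self) -> int:
        """0 = country, 1..3 = NUTS 1..3."""
        return len(self.nuts_code) - 2
```

For example, `ClassifiedVacancy("v1", "2512", "ITC4C", {"contract": "permanent"})` is accepted with `isco_level == 4` and `nuts_level == 3`, while a malformed code such as `"25X"` raises `ValueError`. Storing the level explicitly makes it easy to aggregate vacancies at coarser ISCO or NUTS levels in the data warehouse.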
Physical view (diagram)
• Input: Unstructured Data
• Data Ingestion, Data Processing, Modelling/Machine Learning/AI, Data visualization
• Data storage & archiving, System and process monitoring, Automation & management
• Output: Dashboard and interactive report, Machine to machine, Web App
Technology view (diagram)
• Input: Unstructured Data
• Data Ingestion, Data Processing, Modelling/Machine Learning/AI, Data visualization
• Data storage & archiving, System and process monitoring, Automation & management
• Output: Dashboard and interactive report, Machine to machine, Web App
Key design principles
- Micro-services
- Componentization
- Component specialization
- Small applications
- Portability
- Reuse
- Maintenance
- Scale-out
- Performance
Key components
• Data ingestion: collect raw data from online job vacancies (OJV) in both structured and unstructured (raw text) formats
• Data processing: classify data using machine learning techniques
• Data analysis: extract information from data and make it available through visualization
• Backup: store data in a safe environment to allow warm and cold restores
Infrastructure challenges at a glance
• Parallel ingestion
• High performance
• High memory
• Storage
• Scalability
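Parallel ingestion is mostly I/O-bound (waiting on many web sources), so a thread pool is one simple way to overlap downloads. A minimal sketch with the standard library's `concurrent.futures`; `fetch_source` is a hypothetical stand-in for a scraper, crawler or direct-access call and only simulates latency here:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_source(source: str) -> list[str]:
    """Hypothetical fetcher: a real one would scrape, crawl or call a portal API."""
    time.sleep(0.1)  # simulate network latency
    return [f"{source}: vacancy {i}" for i in range(3)]


def ingest_all(sources: list[str], max_workers: int = 8) -> dict[str, list[str]]:
    """Fetch all sources concurrently; a failing source does not stop the others."""
    results: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_source, s): s for s in sources}
        for future in as_completed(futures):
            source = futures[future]
            try:
                results[source] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the remaining sources.
                print(f"ingestion failed for {source}: {exc}")
                results[source] = []
    return results
```

Isolating each source's failure keeps one broken portal from blocking the whole run; a production ingester would also add retries, rate limiting per site and proper logging.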
Big Data Flow (diagram)
• Infrastructure challenges
• Components by definition
• Quality requirements
• Micro-services design
Topics
1. Goal & context
2. Challenges
   1. Stakeholders
   2. The functional architecture
   3. Why use micro-services
   4. The Team and the pipeline design
Microservices
Context
• Maintainability
• Monitoring
• Scalability
• Updates
• Onboarding
Pre-Processing Microservices
• Language Detector
• Spam Filter
• Deduplication component
• N-gram component
• Tokenizer
• Stemmer
• No-Vacancy Filter
• Text Cleaner
• Merge Vacancy
• TF-IDF Transformer
• Document2Vec
• StopWords Removers
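Several of these components (Text Cleaner, Tokenizer, StopWords Removers, N-gram component, TF-IDF Transformer) form a classic text-vectorisation chain. A dependency-free sketch under simplifying assumptions: a tiny hypothetical English stop-word list, no stemming (the Stemmer is omitted), and the smoothed IDF variant idf(t) = ln((1 + N) / (1 + df(t))) + 1, which is scikit-learn's default; the slides do not say which variant the project uses:

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "for", "of", "the", "to", "with"}  # hypothetical list


def clean(text: str) -> str:
    """Text Cleaner: lower-case, drop HTML tags, collapse non-word characters."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return re.sub(r"\W+", " ", text).strip()  # \W is Unicode-aware, so accents survive


def tokenize(text: str) -> list[str]:
    """Tokenizer: whitespace split of already-cleaned text."""
    return text.split()


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens: list[str], n_max: int = 2) -> list[str]:
    """N-gram component: all 1..n_max-grams, joined with spaces."""
    out: list[str] = []
    for n in range(1, n_max + 1):
        out += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return out


def preprocess(text: str) -> list[str]:
    return ngrams(remove_stop_words(tokenize(clean(text))))


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF Transformer: raw term count times smoothed IDF (no normalisation)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
```

With the two ads "<p>Senior Java developer</p>" and "Java developer for the bank", terms shared by both ("java", "java developer") get weight 1.0, while "senior" gets the higher weight ln(1.5) + 1. In a multi-language setting, the stop-word list and any stemmer would be chosen per language after the Language Detector has run.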
Classification Microservices
• Skills Classifier
• Occupation Classifier
• Education Requirements Classifier
• Industry Classifier
• WorkingHours Detector
• Contract Detector
• Locations Detector
• Dates Extractor
• Salary Extractor
• Experience Extractor
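The slides do not describe how these services work internally; the classifiers presumably rely on the machine learning techniques mentioned for data processing. For the simpler extractors, a rule-based baseline is a common starting point. A hypothetical, English-only sketch of an Experience Extractor and a Salary Extractor using regular expressions (the patterns are illustrative assumptions and cover only a few phrasings):

```python
import re
from typing import Optional

# "3 years of experience", "5+ years experience", "2 yrs experience"
EXPERIENCE_RE = re.compile(
    r"(\d+)\s*\+?\s*(?:years?|yrs?)\s+(?:of\s+)?experience", re.IGNORECASE)

# An amount with thousands separators ("28.000", "28,000") or plain digits ("1500").
AMOUNT = r"\d{1,3}(?:[.,]\d{3})+|\d+"
# "€ 28.000 - 32.000", "EUR 1500", "€30,000 to €35,000"
SALARY_RE = re.compile(
    rf"(?:€|EUR)\s*({AMOUNT})(?:\s*(?:-|to)\s*(?:€|EUR)?\s*({AMOUNT}))?",
    re.IGNORECASE)


def _to_int(amount: str) -> int:
    """Drop thousands separators; decimal amounts are not handled in this sketch."""
    return int(re.sub(r"[.,]", "", amount))


def extract_experience(text: str) -> Optional[int]:
    """Return the required years of experience, or None if not stated."""
    m = EXPERIENCE_RE.search(text)
    return int(m.group(1)) if m else None


def extract_salary(text: str) -> Optional[tuple[int, int]]:
    """Return (low, high) in euros; a single amount gives low == high."""
    m = SALARY_RE.search(text)
    if m is None:
        return None
    low = _to_int(m.group(1))
    high = _to_int(m.group(2)) if m.group(2) else low
    return (low, high)
```

Rules like these are cheap and transparent, but each language and each portal's phrasing needs its own patterns, which is one reason to keep every extractor as a separate, independently updatable service.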
Technology requirements
1. On-demand services
2. Network access
3. Resource pooling
   1. Governance
4. Rapid elasticity
5. Measured services
   1. Data quality
   2. Performance
6. Portability (on-premises and across different cloud services)
7. Polyglot
   1. Programming languages
   2. Technologies
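Elasticity means the number of running instances of each micro-service follows the load. The slides do not name an orchestrator; assuming a Kubernetes-style Horizontal Pod Autoscaler, the core rule is desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default in Kubernetes) and clamped to configured bounds. A sketch of that rule with a hypothetical per-instance metric such as queued vacancies:

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """HPA-style rule: scale so the average per-instance metric approaches the target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the configured lower and upper bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 instances each holding 300 queued vacancies against a target of 100 scale to 6, while 3 instances at 10 scale down to the minimum of 1. The upper bound is also where infrastructure costs are kept under control.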
Topics
1. Goal & context
2. Challenges
   1. Stakeholders
   2. The functional architecture
   3. Why use micro-services
   4. The Team and the pipeline design
The team
1. Cloud Architects
2. Software Architects and Developers
3. Big Data Engineers
4. Data Scientists
5. Domain & Ontology Experts
Organizations (diagram)
• Labels: Team, Components, Micro-service, Service, Cloud Infrastructure Service, Execution
• Phases: Define, Design, Develop, Deploy
Organize around business services
• Language Detector
• Occupation Classifier
• Salary Extractor
• Skills Classifier
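Organizing around business services means each capability, for instance the Language Detector, is deployed and scaled on its own behind a small, stable interface. A minimal sketch of such a service contract, assuming JSON in and JSON out; the stop-word scoring is a toy heuristic standing in for a real language-identification model, and the request and response field names are hypothetical:

```python
import json
import re

# Toy profiles: a few frequent function words per language (hypothetical, far from complete).
PROFILES = {
    "en": {"the", "and", "of", "to", "with", "for", "we", "are"},
    "it": {"il", "la", "e", "di", "per", "con", "che", "cerchiamo"},
    "fr": {"le", "la", "et", "de", "pour", "avec", "nous", "des"},
}


def detect_language(text: str) -> str:
    """Return the language whose profile shares most words with the text, else 'und'."""
    words = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(words & profile) for lang, profile in PROFILES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "und"  # 'und' = undetermined (ISO 639-2)


def handle(request_body: str) -> str:
    """Service entry point: JSON {"id", "text"} in, JSON {"id", "language"} out."""
    request = json.loads(request_body)
    response = {"id": request["id"], "language": detect_language(request["text"])}
    return json.dumps(response)
```

Because callers depend only on the JSON contract, the toy heuristic could be swapped for a trained model, or the function wrapped in an HTTP endpoint or a message-queue consumer, without changing the other services.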