![Page 1: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/1.jpg)
Crawling, Parsing and Semantic Matching of Vacancies and CV’s
Semantic Recruitment Technology
Jakub Zavrel, Textkernel
InGRID Workshop 11-2-2014
![Page 2: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/2.jpg)
Textkernel:
• Spin-off from R&D in machine learning and language technology
• Founded 2001, offices in Amsterdam (HQ), Frankfurt, Paris, 45 employees; strong R&D focus
• Deloitte Fast 50 2007, 2010, 30% YoY growth
• Core technology: understanding unstructured text data, multi-lingual
Market:
• Job boards, Recruitment Software, Staffing and recruitment, Mobility, Large Employers
• Products:
• Multi-lingual tools (15 languages) to extract CVs and jobs
• Jobfeed: largest real time DB for job market analysis
• Search! & Match! to connect people and jobs
• Customers: UWV, Pole Emploi, Adecco, Randstad, USG, Monster, Stepstone, XING, SAP, Unisys, Bosch, Axa, Philips, etc. (350 direct, 2000+ indirect)
• Large partner network (HR & recruitment software)
![Page 3: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/3.jpg)
I like programming, but I’m interested in taking on more project management responsibility
Is there a job in our organisation that better fits my degree?
I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.
I’d like to do more with my organisational talent.
We are looking to hire: an experienced tech team lead
Language gap
The ideal candidate has:
- min. 5 yr of experience
- Certified Scrum Master
- Exp. w/ iOS, Android
Completed academic studies Computer Science or related
30% travel for customer presentations
![Page 4: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/4.jpg)
The job ad searches directly in a database and identifies relevant candidates (or vice versa) …
![Page 5: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/5.jpg)
![Page 6: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/6.jpg)
Automatically convert each document into a complete record
Extract! CV/Job Parsing
![Page 7: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/7.jpg)
Extract!
![Page 8: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/8.jpg)
Extract!
![Page 9: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/9.jpg)
Extract!
![Page 10: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/10.jpg)
Extract!
![Page 11: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/11.jpg)
Extract! – Zero data entry job application
![Page 12: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/12.jpg)
Extract!
![Page 13: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/13.jpg)
• Time savings coding CVs and jobs
• If you accept noise, 100% time savings
• Structured data allows better search: Semantic Searching and Matching
• Coding enables reporting and statistics
Extract!
![Page 14: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/14.jpg)
• Coding follows extraction
• Customer-specific or standard taxonomies
• String-similarity-based normalization
• Lots of synonyms per language
• Distance = confidence
• Problem cases: ambiguity, context, long tail
• More complex models can help (classifiers, multi-variate models)
• Semantic matching works better (occupation coding errors are counterbalanced by other variables)
Occupation coding!
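The string-similarity normalization described on this slide can be sketched roughly as follows. This is a minimal illustration, not Textkernel's implementation: the tiny `TAXONOMY`, the `code_occupation` helper, and the `min_confidence` threshold are all hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure is actually used. The similarity of the best synonym is reported directly as the confidence.

```python
from difflib import SequenceMatcher

# Hypothetical mini-taxonomy: occupation code -> synonyms (illustrative only).
TAXONOMY = {
    "administrative assistant": [
        "administrative employee", "assistant clerk", "office support",
    ],
    "java developer": ["java programmer", "java software engineer"],
}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; 1.0 means identical after normalizing case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def code_occupation(title: str, min_confidence: float = 0.6):
    """Return (code, confidence) for the closest label or synonym.

    The code is None when the best similarity falls below min_confidence,
    which is one simple way to flag long-tail or ambiguous titles for review.
    """
    best_code, best_score = None, 0.0
    for code, synonyms in TAXONOMY.items():
        for name in [code, *synonyms]:
            score = similarity(title, name)
            if score > best_score:
                best_code, best_score = code, score
    if best_score < min_confidence:
        return None, best_score
    return best_code, best_score


code, confidence = code_occupation("Java Programer")  # misspelled on purpose
print(code, round(confidence, 2))
```

A purely string-based coder like this cannot resolve ambiguity or use context, which is where the classifiers and multi-variate models mentioned on the slide would come in.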
![Page 15: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/15.jpg)
• Semantic search:
“Lets you find what you mean, not what you type”
Impression...
Search!
![Page 16: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/16.jpg)
Match!
![Page 17: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/17.jpg)
Semantic Matching Technology:
• Natural Language Processing
• Machine Learning
• Semantic Analysis
• Probabilistic Language Model
• Search Engine
• Multi-lingual taxonomies
• Recruitment knowledge-bases
![Page 18: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/18.jpg)
Demo
![Page 19: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/19.jpg)
Search and analyse real-time online job ads as well as historical data
Jobfeed
![Page 20: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/20.jpg)
Jobfeed
![Page 21: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/21.jpg)
Jobfeed!
Knowledge of all demand for labour in European job market
– Sales leads for recruitment and staffing companies
– Real time labour market analytics tools
– Largest database of jobs for matching unemployed
– Perfect data source for text mining
![Page 22: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/22.jpg)
Jobfeed!
• Real time collection of online job ads from any (unstructured) source
• Available in NL, DE, FR, IT
• Gradually rolling out in rest of Europe
• Richly semantically structured data
![Page 23: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/23.jpg)
Jobfeed!
![Page 24: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/24.jpg)
Jobfeed: Multilingual Occupation Taxonomy
Occupations: >4000 codes, 4 languages, 3-layer hierarchy, >50K synonyms
Link to other concepts:
- Skills
- Education level
- Sector
- O*NET
- UWV (Dutch Employment Agency)
- ROME
Based on millions of jobs, years of customer feedback and experience!
Example: NL: administratief medewerker, EN: administrative assistant, FR: employé administratif, DE: Verwaltungsassistent (m/w).
Group: administrative personnel
Class: Administration and Customer Service
Synonyms: administrative employee, assistant clerk, office support
Skills: ms office, excel, english language, etc.
O*NET: 43-9199.00: Office and Administrative Support Workers, All Other
UWV: 1000402563: Administratief medewerker secretariaat
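One way to picture such a taxonomy entry is as a small record with per-language labels, its place in the hierarchy, and links to external classifications. The sketch below is an assumption about shape only: the `Occupation` class, its field names, and `build_index` are hypothetical and not the Jobfeed schema. The values are the ones shown in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Occupation:
    """Hypothetical record for one occupation code (field names are assumptions)."""
    labels: dict                # language code -> preferred label
    group: str                  # middle layer of the hierarchy
    occupation_class: str       # top layer of the hierarchy
    synonyms: list = field(default_factory=list)
    skills: list = field(default_factory=list)
    links: dict = field(default_factory=dict)  # external scheme -> (code, label)


ENTRY = Occupation(
    labels={
        "NL": "administratief medewerker",
        "EN": "administrative assistant",
        "FR": "employé administratif",
        "DE": "Verwaltungsassistent (m/w)",
    },
    group="administrative personnel",
    occupation_class="Administration and Customer Service",
    synonyms=["administrative employee", "assistant clerk", "office support"],
    skills=["ms office", "excel", "english language"],
    links={
        "O*NET": ("43-9199.00",
                  "Office and Administrative Support Workers, All Other"),
        "UWV": ("1000402563", "Administratief medewerker secretariaat"),
    },
)


def build_index(entries):
    """Map every lowercased label and synonym to its entry for exact lookup."""
    index = {}
    for entry in entries:
        for name in [*entry.labels.values(), *entry.synonyms]:
            index[name.lower()] = entry
    return index


index = build_index([ENTRY])
print(index["office support"].links["O*NET"][0])
```

Indexing labels from all languages together means a Dutch or French job title resolves to the same record, and therefore to the same skills and external codes, as its English synonym.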
![Page 25: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/25.jpg)
Demo
![Page 26: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/26.jpg)
Jobfeed as material for Research
![Page 27: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/27.jpg)
![Page 28: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/28.jpg)
![Page 29: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/29.jpg)
![Page 30: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/30.jpg)
![Page 31: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/31.jpg)
Frequent words for "Java developer"
en, van, de, een, je, met, in, het, Java, of, Je, op, is, voor, te, ervaring, aan, als, and, software, om, team, zijn, kennis, bij, Ervaring, die, the, naar, a, jaar, jij, bent, Developer, HBO, hebt, to, werken, werk
![Page 32: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/32.jpg)
Frequent words for all professions
en, van, de, een, in, het, je, met, op, Je, voor, te, is, of, zijn, aan, bent, naar, bij, om, als, ervaring, die, Het, hebt, deze, werken, zoek, De, wij, functie, onze, ben, tot, over, werk, opleiding, uit, and, werkzaamheden, dat, binnen, u, Als, Voor, zelfstandig, kennis, ook, s, verantwoordelijk
![Page 33: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/33.jpg)
Solution: contrast frequencies
• Observed frequency of w: $O(w) = A$
• Expected frequency of w: $E(w) = C \cdot B / D$
• Pick words with highest score: $\mathrm{score}(w) = (O(w) - E(w))^2 / E(w)$

| | Java developer jobs | All jobs |
|---|---|---|
| # jobs where w occurs | A | B |
| Total # jobs | C | D |
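The contrast score above can be sketched in a few lines of Python. The function names, the whitespace tokenization, and the four toy job ads are assumptions made for illustration. The sketch also assumes that "all jobs" includes the target occupation's jobs, and it keeps only over-represented words (O > E), a filter the slide does not state but which seems needed because the squared difference would otherwise also reward under-represented words.

```python
from collections import Counter


def document_frequencies(docs):
    """For each word, count the number of job ads it occurs in."""
    counts = Counter()
    for text in docs:
        counts.update(set(text.lower().split()))  # naive tokenization (assumption)
    return counts


def contrast_scores(target_docs, all_docs):
    """score(w) = (O - E)^2 / E with O = A and E = C * B / D."""
    A = document_frequencies(target_docs)   # jobs in the target set containing w
    B = document_frequencies(all_docs)      # jobs overall containing w
    C, D = len(target_docs), len(all_docs)  # total jobs in each set
    scores = {}
    for word, observed in A.items():
        if B[word] == 0:
            continue  # only possible if all_docs does not include target_docs
        expected = C * B[word] / D
        if observed > expected:  # added filter: over-represented words only
            scores[word] = (observed - expected) ** 2 / expected
    return scores


# Toy corpus (invented for this example).
all_jobs = [
    "java developer met ervaring in spring",
    "java developer met kennis van hibernate",
    "verpleegkundige met ervaring in de zorg",
    "administratief medewerker met kennis van excel",
]
java_jobs = all_jobs[:2]

scores = contrast_scores(java_jobs, all_jobs)
for word, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, score)
```

In this toy corpus, "met" occurs in every ad, so its observed and expected frequencies coincide and it drops out, while "java" and "developer" rank highest.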
![Page 34: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/34.jpg)
Top words for "Java developer"
java, developer, software, spring, scrum, agile, hibernate, ontwikkelaar, u, j2ee, development, maven, applicaties, ervaring, web, de, frameworks, jboss, mbo, senior, wij, xml, jee, o, javascript, you, kennis, ontwikkelen, oracle, ontwikkeling, architectuur, webservices, informatica, werkzaamheden, technologie, developers, eclipse, bezit, het, team, wo, rijbewijs, technieken, tomcat, the, vca, zelfstandig, architect, werklocatie, html
Building rich skills profiles for thousands of occupations from millions of real time jobs…
… new trends and occupations…
![Page 35: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/35.jpg)
Supply & Demand
• Have: lots of data, technology, ideas
• Want: labor market expertise, students, research
![Page 36: Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014](https://reader036.vdocuments.us/reader036/viewer/2022062303/551a6d8055034643688b4f89/html5/thumbnails/36.jpg)
Semantic Recruitment Technology
Thanks!