South Big Data Hub: Text Data Analysis Panel
Text Data Analysis Panel: South Big Data Hub
Trey Grainger
SVP of Engineering, Lucidworks
About Me
Trey Grainger, SVP of Engineering
• Previously Director of Engineering @ CareerBuilder
• MBA, Management of Technology – Georgia Tech
• BA, Computer Science, Business, & Philosophy – Furman University
• Information Retrieval & Web Search – Stanford University
Other fun projects:
• Co-author of Solr in Action, plus numerous research papers
• Frequent conference speaker
• Founder of Celiaccess.com, the gluten-free search engine
• Lucene/Solr contributor
what do you do?
Search-Driven Everything
• Customer Service
• Customer Insights
• Fraud Surveillance
• Research Portal
• Online Retail
• Digital Content
Lucidworks enables Search-Driven Everything
Data Acquisition, Indexing & Streaming, Smart Access API
Recommendations & Alerts, Analytics & Insights, Extreme Relevancy
Customer Service, Research Portal, Digital Content, Customer Insights, Fraud Surveillance, Online Retail
• Access all your data in a number of ways from one place.
• Secure storage and processing with Solr and Spark.
• Acquire data from any source with pre-built connectors and adapters.
Machine learning and advanced analytics turn all of your apps into intelligent, data-driven applications.
how do you do it?
Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.
Key Solr Features:
● Multilingual Keyword search
● Relevancy Ranking of results
● Faceting & Analytics (nested / relational)
● Highlighting
● Spelling Correction
● Autocomplete/Type-ahead Prediction
● Sorting, Grouping, Deduplication
● Distributed, Fault-tolerant, Scalable
● Geospatial search
● Complex Function queries
● Recommendations (More Like This)
● Graph Queries and Traversals
● SQL Query Support
● Streaming Aggregations
● Batch and Streaming processing
● Highly Configurable / Plugins
● Learning to Rank
● Building machine-learning models
● … many more
*Source: Solr in Action, chapter 2
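To make a few of these features concrete, here is a minimal Python sketch that only builds (it does not send) a Solr `/select` request combining keyword search, relevancy ranking over weighted fields, faceting, and highlighting. The host, the collection name `jobs`, and the field names `title`, `skills`, and `city` are hypothetical placeholders; the parameters themselves (`q`, `defType=edismax`, `qf`, `rows`, `facet`, `facet.field`, `hl`, `hl.fl`, `wt`) are standard Solr request parameters.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical deployment details: host, collection, and field names are placeholders.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "jobs"

def build_select_url(keywords: str, facet_fields: list, rows: int = 10) -> str:
    """Build a /select URL that combines keyword search, faceting, and highlighting."""
    params = [
        ("q", keywords),            # the user's keyword query
        ("defType", "edismax"),     # query parser that searches across several fields
        ("qf", "title^2 skills"),   # fields to search; title matches weigh twice as much
        ("rows", str(rows)),        # number of ranked results to return
        ("facet", "true"),          # turn on faceting
        ("hl", "true"),             # turn on highlighting
        ("hl.fl", "title"),         # field(s) to highlight
        ("wt", "json"),             # response format
    ]
    # facet.field may be repeated, once per field to facet on
    params += [("facet.field", f) for f in facet_fields]
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"

url = build_select_url("java developer", ["city", "skills"])
print(url)
```

Sending this URL to a running Solr instance with a matching schema would return ranked documents along with facet counts and highlighted snippets; the exact response shape depends on the Solr version and configuration.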
The standard for enterprise search.
90% of the Fortune 500 use Solr.
Reference Architecture (Lucidworks Fusion)
Building an Intent Engine
• Search Box
• Type-ahead Prediction
• Intent Engine
• Spelling Correction
• Semantic Query Parsing
• Entity / Entity Type Resolution
• Query Augmentation
• Knowledge Graph
• Contextual Disambiguation
• Relevancy Engine (“re-expressing intent”)
• Query Re-writing
• Machine-learned Ranking
• Search Results
• User Feedback (Clarifying Intent)
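As one illustration of the spelling-correction step, the sketch below swaps in Python's standard-library `difflib` as a simple string-similarity corrector. It is not how Solr's spellcheck component works; the vocabulary and the `cutoff` value are hypothetical, and a production intent engine would more likely learn its vocabulary from indexed content and query logs.

```python
import difflib

# Hypothetical vocabulary; in practice it would be derived from content and query logs.
VOCABULARY = ["java", "developer", "engineer", "software", "nurse", "registered"]

def correct_query(query: str, vocabulary: list, cutoff: float = 0.8) -> str:
    """Replace each unknown token with its closest vocabulary term, if one is close enough."""
    corrected = []
    for token in query.lower().split():
        if token in vocabulary:
            corrected.append(token)  # already a known term: keep it
            continue
        # get_close_matches ranks candidates by SequenceMatcher similarity ratio
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)  # leave unknowns untouched
    return " ".join(corrected)

print(correct_query("sofware enginer", VOCABULARY))
```

A higher `cutoff` makes the corrector more conservative, which matters because an aggressive correction silently changes what the user asked for.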
Additional References:
what’s next?
Basic Keyword Search (inverted index, tf-idf, BM25, query formulation, etc.)
Taxonomies / Entity Extraction (entity recognition, ontologies, synonyms, etc.)
Query Intent (query classification, semantic query parsing, concept expansion, rules, clustering, classification)
Relevancy Tuning (signals, A/B testing / genetic algorithms, Learning to Rank, Neural Networks)
Self-learning
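For the first stage, the sketch below builds a small in-memory inverted index and ranks documents with the classic Okapi BM25 formula. It is a teaching sketch, not Lucene's implementation: tokenization is just lowercasing and whitespace splitting, `k1=1.2` and `b=0.75` are common defaults, and the IDF uses the Lucene-style `log(1 + (N - df + 0.5) / (df + 0.5))` form so that very common terms never score negatively. The sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Build an inverted index (term -> {doc_id: term frequency}) plus document lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive tokenizer: lowercase + whitespace split
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths

def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Rank documents for `query` with Okapi BM25; returns (doc_id, score) pairs, best first."""
    n_docs = len(lengths)
    avgdl = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue  # a term missing from the index contributes nothing
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # term-frequency saturation (k1) and document-length normalization (b)
            denom = tf + k1 * (1 - b + b * lengths[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / denom
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "d1": "java developer with spring experience",
    "d2": "senior java java engineer",
    "d3": "registered nurse",
}
index, lengths = build_index(docs)
results = bm25_search("java engineer", index, lengths)
print(results)
```

Only documents containing at least one query term are scored, which is exactly what the inverted index buys: the search touches posting lists rather than every document.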
The Three C’s
Content: Keywords and other features in your documents
Collaboration: How others have chosen to interact with your system
Context: Available information about your users and their intent
Reflected Intelligence: “Leveraging previous data and interactions to improve how new data and interactions should be interpreted”
Feedback Loops
User searches → User sees results → User takes an action
Users’ actions inform system improvements
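One simple way to close this loop is to aggregate logged user actions into per-query document boosts. The sketch assumes a hypothetical log format of `(query, doc_id, action)` tuples and hypothetical action weights; a real system may also want time decay and corrections for position bias, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each action counts as evidence of relevance.
ACTION_WEIGHTS = {"click": 1.0, "add_to_cart": 5.0, "purchase": 10.0}

def aggregate_signals(events):
    """Sum weighted user actions into per-query document boosts.

    events: iterable of (query, doc_id, action) tuples from search logs.
    Returns {query: [(doc_id, weight), ...]} sorted by weight, highest first.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for query, doc_id, action in events:
        # normalize the query so "iPad", "ipad ", and "ipad" share their signals
        totals[query.strip().lower()][doc_id] += ACTION_WEIGHTS.get(action, 0.0)
    return {
        q: sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)
        for q, doc_weights in totals.items()
    }

log = [
    ("ipad", "doc1", "click"),
    ("iPad", "doc2", "click"),
    ("ipad", "doc2", "purchase"),
    ("ipad ", "doc3", "click"),
]
boosts = aggregate_signals(log)
print(boosts["ipad"])
```

The resulting per-query weights could then be applied as boosts the next time the same query arrives, so each round of user behavior shapes the following round of results.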
Examples of Reflected Intelligence
● Recommendation Algorithms
● Building user profiles from past searches, clicks, and other actions
● Identifying correlations between keywords/phrases
● Building out automatically-generated ontologies from content and queries
● Determining relevancy judgements (precision, recall, nDCG, etc.) from click logs
● Learning to Rank: using relevancy judgements and machine learning to train a relevance model
● Discovering misspellings, synonyms, acronyms, and related keywords
● Disambiguation of keyword phrases with multiple meanings
● Learning what’s important in your content
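Relevancy judgements inferred from click logs are often summarized with nDCG. The sketch below computes nDCG (optionally nDCG@k) for a single ranked result list using the common `(2^rel - 1) / log2(rank + 1)` gain; the graded values (2 = purchased, 1 = clicked, 0 = skipped) are a hypothetical mapping from logged actions, not one prescribed by the talk.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: sum of (2^rel - 1) / log2(rank + 1), ranks starting at 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """DCG of the ranking as returned, divided by the DCG of the ideal (sorted) ranking."""
    ideal = sorted(relevances, reverse=True)
    if k is not None:
        relevances, ideal = relevances[:k], ideal[:k]
    best = dcg(ideal)
    return dcg(relevances) / best if best > 0 else 0.0  # no relevant results: define as 0

# Hypothetical graded judgements for one query's top results, in ranked order
judgements = [2, 0, 1, 0]
print(round(ndcg(judgements), 4))
print(round(ndcg(judgements, k=2), 4))
```

Averaging this score over many logged queries gives a single number for comparing two ranking models offline, which is also the kind of judgement data Learning to Rank trains on.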
Key Technologies
• Keyword Search
  - Lucene/Solr
• Taxonomies / Entity Extraction
  - Solr Text Tagger
  - Word2Vec / Dice Conceptual Search
  - SolrRDF
• Query Intent
  - Probabilistic Query Parser (SOLR-9418)
  - Semantic Knowledge Graph (SOLR-9480)
• Relevancy Tuning
  - Solr Learning to Rank Plugin (SOLR-8542)
• General Needs: a solid log processing framework (Apache Spark, Lucidworks Fusion, or Solr Daemon Expression)
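The Solr Text Tagger tags dictionary entries, such as entities from a taxonomy, inside free text. The pure-Python sketch below only illustrates the general idea of greedy longest-match dictionary tagging; it does not use the Solr Text Tagger's API, the entity dictionary is hypothetical, and the real tagger's matching and overlap-handling behavior may differ.

```python
# Hypothetical entity dictionary (tokenized surface form -> entity type);
# in practice this would come from a taxonomy or ontology, not be hard-coded.
ENTITIES = {
    ("java",): "skill",
    ("java", "developer"): "job_title",
    ("registered", "nurse"): "job_title",
    ("atlanta",): "city",
}
MAX_LEN = max(len(phrase) for phrase in ENTITIES)

def tag(text: str):
    """Greedy longest-match tagging: return (phrase, entity_type) pairs, left to right."""
    tokens = text.lower().split()
    tags, i = [], 0
    while i < len(tokens):
        # try the longest candidate phrase first, shrinking until a dictionary hit
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in ENTITIES:
                tags.append((" ".join(phrase), ENTITIES[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no entity starts at this token
    return tags

print(tag("Java developer jobs in Atlanta"))
```

Preferring the longest match is what lets "java developer" resolve to a job title instead of the skill "java" plus a stray word, which is the kind of disambiguation the entity-extraction stage is meant to provide.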
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge Graph
Semantic Knowledge Graph Traversal
Figure: a traversal from the starting node "software engineer*" (* = materialized node) along has_related_skill edges to skill nodes (Java, C#, .NET, Hibernate, Scala, VB.NET) and along has_related_job_title edges to job title nodes (.NET Developer, Java Developer, Software Engineer, Data Scientist). Each edge is labeled with a relatedness score; the scores shown range from 0.34 to 0.93.
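Edge weights like these come from a relatedness measure that compares how often a term appears in a foreground document set (here, documents matching the starting node) with how often it appears in a background set (the whole corpus). The sketch below computes only the underlying z-score in that spirit; the paper's normalization into a bounded score and Solr's own implementation are not reproduced, and all counts are hypothetical.

```python
import math

def relatedness_z(fg_count, fg_total, bg_count, bg_total):
    """z-score of a term's foreground count against what its background rate predicts.

    fg_count: foreground docs (those matching the starting node) containing the term
    fg_total: number of foreground docs
    bg_count: docs in the whole corpus containing the term
    bg_total: number of docs in the whole corpus
    """
    p = bg_count / bg_total          # background probability of the term
    expected = fg_total * p          # foreground count expected by chance alone
    std_dev = math.sqrt(fg_total * p * (1 - p))
    if std_dev == 0:
        return 0.0                   # term in no docs or in every doc: no signal
    return (fg_count - expected) / std_dev

# Hypothetical counts: 1,000,000 postings, 10,000 of them matching "software engineer"
FG_TOTAL, BG_TOTAL = 10_000, 1_000_000
candidates = {"java": (4_000, 50_000), "scala": (600, 3_000), "nursing": (10, 20_000)}
scores = {term: relatedness_z(fg, FG_TOTAL, bg, BG_TOTAL)
          for term, (fg, bg) in candidates.items()}
for term, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {z:.1f}")
```

Positive scores mark terms that are over-represented in the foreground (candidate edges worth following); negative scores mark under-represented ones. Repeating the calculation with a selected node's documents as the new foreground set is one way to extend this to multi-hop traversals like the one on this slide.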
Traditional Keyword Search
Recommendations
Semantic Search
User Intent
Personalized Search
Augmented Search
Domain-aware Matching
Contact Info
Trey Grainger
@treygrainger
http://solrinaction.com
Other presentations: http://www.treygrainger.com