Understanding DSE Search by Matt Stump
![Page 1: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/1.jpg)
DSE 4.7 Search
Matt Stump, Chief Architect/Manager for SWAT, DataStax
Thank you for joining. We will begin shortly.
![Page 2: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/2.jpg)
![Page 3: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/3.jpg)
Webinar Housekeeping
• All attendees placed on mute
• Input questions at any time using the online interface
![Page 4: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/4.jpg)
Agenda
1 Data Locality
2 Bitmap Indexing
3 IO Path
4 Demo
5 Performance
6 Why DSE?
![Page 5: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/5.jpg)
Hash(“some bytes”) => A Number
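The idea on this slide, that hashing a key's bytes yields a number which then decides where the data lives, can be sketched as follows. This is only an illustration under stated assumptions: MD5 stands in for the hash because it ships with Python (Cassandra's default partitioner uses Murmur3), and the ring size, tokens, and node names are made up for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64  # hypothetical token space, chosen for illustration


def token_for(key: bytes) -> int:
    """Hash arbitrary bytes to a number in [0, RING_SIZE)."""
    digest = hashlib.md5(key).digest()  # stand-in hash, not Murmur3
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def owner_of(token: int, ring) -> str:
    """Return the node holding the first ring token >= token, wrapping around."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


# Hypothetical ring of (token, node name) pairs, sorted by token.
RING = [
    (RING_SIZE // 3, "node-a"),
    (2 * RING_SIZE // 3, "node-b"),
    (RING_SIZE - 1, "node-c"),
]

token = token_for(b"some bytes")
print(token, owner_of(token, RING))
```

The same bytes always hash to the same number, so any node can work out which node owns a key without asking a coordinator service.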
![Page 6: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/6.jpg)
![Page 7: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/7.jpg)
![Page 8: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/8.jpg)
![Page 9: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/9.jpg)
![Page 10: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/10.jpg)
![Page 11: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/11.jpg)
![Page 12: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/12.jpg)
![Page 13: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/13.jpg)
??
![Page 14: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/14.jpg)
![Page 15: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/15.jpg)
![Page 16: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/16.jpg)
![Page 17: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/17.jpg)
V1 OR V2
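A query such as "V1 OR V2" against a bitmap index reduces to a bitwise OR of two bitmaps. The sketch below illustrates the concept only; it uses plain Python integers as bitmaps and a made-up column, and it is not meant to describe the on-disk format used by DSE Search or Lucene.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return index


def rows_of(bitmap):
    """Expand a bitmap back into the list of row ids whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical column with one value per row.
column = ["V1", "V3", "V2", "V1", "V3"]
index = build_bitmap_index(column)

# "V1 OR V2" is a single bitwise OR over the two bitmaps.
matches = index["V1"] | index["V2"]
print(rows_of(matches))  # [0, 2, 3]
```

Reading is cheap because the answer is one machine-level operation per word of bitmap; the cost shows up on writes, since changing a row means touching the bitmap of every value involved.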
![Page 18: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/18.jpg)
![Page 19: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/19.jpg)
![Page 20: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/20.jpg)
![Page 21: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/21.jpg)
Quick to Read
Expensive to Update
![Page 22: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/22.jpg)
![Page 23: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/23.jpg)
![Page 24: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/24.jpg)
![Page 25: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/25.jpg)
![Page 26: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/26.jpg)
![Page 27: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/27.jpg)
Near Real Time is Expensive
![Page 28: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/28.jpg)
![Page 29: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/29.jpg)
![Page 30: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/30.jpg)
![Page 31: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/31.jpg)
![Page 32: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/32.jpg)
Use 32 vnodes in DSE 4.7.1
![Page 33: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/33.jpg)
{
  'asin': '0007148089',
  'title': "Blood and Roses: The Tumultuous Wars of the Roses",
  'price': 5.98,
  'imUrl': 'http://ecx.images-amazon.com/images/I/518p8d64F8L.jpg',
  'related': {
    'also_bought': ['0061430765', '0061430773', 'B00A4E8E78'],
    'buy_after_viewing': ['0061430773', '0345404335', 'B00A4E8E78', '0975126407']
  },
  'salesRank': {'Books': 326205},
  'categories': [['Books']]
}
![Page 34: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/34.jpg)
CREATE TABLE IF NOT EXISTS amazon.metadata (
  asin text,
  title text,
  imurl text,
  price double,
  categories set<text>,
  also_bought set<text>,
  buy_after_viewing set<text>,
  PRIMARY KEY (asin)
);

CREATE TABLE IF NOT EXISTS amazon.rank (
  asin text,
  category text,
  rank int,
  PRIMARY KEY (asin, category)
);
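The slides do not show how the sample record is loaded into these two tables, so the following is only one plausible mapping, written as a sketch. The function name `to_rows` is hypothetical; it assumes `related` lists become the `set<text>` columns, the nested `categories` lists are flattened into one set, and each `salesRank` entry becomes one row in `amazon.rank`.

```python
def to_rows(product):
    """Split one product record into a metadata row and a list of rank rows."""
    related = product.get("related", {})
    metadata = {
        "asin": product["asin"],
        "title": product.get("title"),
        "imurl": product.get("imUrl"),
        "price": product.get("price"),
        # categories is a list of lists in the source record; flatten into one set
        "categories": {c for path in product.get("categories", []) for c in path},
        "also_bought": set(related.get("also_bought", [])),
        "buy_after_viewing": set(related.get("buy_after_viewing", [])),
    }
    # One rank row per (asin, category), matching the compound primary key.
    ranks = [
        {"asin": product["asin"], "category": category, "rank": rank}
        for category, rank in product.get("salesRank", {}).items()
    ]
    return metadata, ranks


record = {
    "asin": "0007148089",
    "title": "Blood and Roses: The Tumultuous Wars of the Roses",
    "price": 5.98,
    "imUrl": "http://ecx.images-amazon.com/images/I/518p8d64F8L.jpg",
    "related": {
        "also_bought": ["0061430765", "0061430773", "B00A4E8E78"],
        "buy_after_viewing": ["0061430773", "0345404335", "B00A4E8E78", "0975126407"],
    },
    "salesRank": {"Books": 326205},
    "categories": [["Books"]],
}

metadata, ranks = to_rows(record)
print(ranks)  # [{'asin': '0007148089', 'category': 'Books', 'rank': 326205}]
```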
![Page 35: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/35.jpg)
dsetool create_core amazon.metadata generateResources=true
dsetool create_core amazon.rank generateResources=true
http://localhost:8983/solr/#/amazon.metadata
http://localhost:8983/solr/#/amazon.rank
![Page 36: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/36.jpg)
Index Size
![Page 37: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/37.jpg)
Index Size
• Core index size
• Fields, term frequency, count, and settings
• Number of dynamic fields and frequency using Luke
• termVectors="false"
• termPositions="false"
• termOffsets="false"
• omitNorms="true"
• Only index fields you intend to search
![Page 39: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/39.jpg)
Indexing throughput
• Set autoSoftCommit as high as possible
• Disable all caches except filterCache
• Increase RAM buffer to 512-1024MB
• Enable realtime indexing
• Large heap (20GB) with G1 or CASSANDRA-8150 tuning
• Increase back_pressure_threshold_per_core to 2000-5000
• Set max_solr_concurrency_per_core to number of cores
• Recommend more cores (32)
![Page 40: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/40.jpg)
Live Indexing Throughput
![Page 41: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/41.jpg)
Live Indexing Throughput
![Page 42: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/42.jpg)
Live Indexing Throughput
![Page 43: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/43.jpg)
Query Latency and Throughput
• Set autoSoftCommit as high as possible
• Disable all caches except filterCache
• Use docValues for faceted or sorted fields
• Large heap (20GB) with G1 or CASSANDRA-8150 tuning
• Move query parameters to filters
• Use single pass queries where possible
• Recommend more cores (32)
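"Move query parameters to filters" can be illustrated with Solr's standard `q` and `fq` request parameters: the scored text query stays in `q`, while exact constraints move into `fq` clauses, which Solr can serve from the filterCache. The sketch below only builds the request URL; the helper name `build_select_url` and the field values in the example are assumptions for illustration, and the core name is taken from the demo above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, text_query, filters):
    """Build a Solr select URL with the scored query in q and constraints in fq."""
    params = [("q", text_query), ("wt", "json")]
    # Each fq clause is a separate filter query and is cached independently.
    params += [("fq", clause) for clause in filters]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983",
    "amazon.metadata",
    "title:roses",
    ["categories:Books", "price:[0 TO 10]"],  # hypothetical constraints
)
print(url)
```

Keeping each constraint in its own `fq` clause lets commonly repeated filters be reused across queries, which fits the advice to keep filterCache enabled while disabling the other caches.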
![Page 44: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/44.jpg)
Query Latency and Throughput
• DSETool Performance objects
• Solr slow query log
• Tracing
• Use JMX bean com.datastax.bdp.search DSP-2792
– EXECUTE
– RETRIEVE
– COORDINATE
![Page 45: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/45.jpg)
CASSANDRA-8150 Tuning
MAX_HEAP_SIZE="20G"
HEAP_NEWSIZE="6G"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=10"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=10000"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
![Page 46: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/46.jpg)
CASSANDRA-7486 (G1) Tuning
MAX_HEAP_SIZE="20G"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
# set these to the number of cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=8"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=8"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:G1ReservePercent=15"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32"
![Page 47: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/47.jpg)
DSE 4.7 Improvements
DSP-4477 - Pivot faceting
DSP-4476 - Pagination
DSP-3740 - Live indexing
DSP-4091 - Remove support for stored copy fields
DSP-4703 - Query Solr from Spark
DSP-4518 - Improved memory usage for faceting
DSP-3931 - Filter cache sizing is now global across all segments
DSP-4475 - Verify/Integrate single pass distributed queries (SOLR-5768)
DSP-4072 - Fault-tolerant distributed queries
DSP-3958 - Improve shard routing by taking into account node health factors
DSP-3935 - Implement faceting inside CQL Solr queries
![Page 48: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/48.jpg)
DSE vs ElasticSearch
| Feature | DSE | ElasticSearch |
| --- | --- | --- |
| Replication and multiple datacenters | Based on Cassandra, multi-DC support for free, real-time replication, high availability | Master slave, long replication delay, doesn't do multi-DC well |
| Scalability | Hundreds of nodes, hundreds of terabytes | Tens of nodes, a couple of terabytes |
| Data loss possible | No | Yes |
| Primary Data Store | Yes | No |
| Operational Complexity | Single system | Multiple systems |
| Analytics | Yes | No |
| Dynamic Schema | Sorta | Sorta, slightly easier |
![Page 49: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/49.jpg)
Increased performance by 700% while growing data by 500%
![Page 50: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/50.jpg)
Reduced operational costs by 40%
![Page 51: Understanding DSE Search by Matt Stump](https://reader035.vdocuments.us/reader035/viewer/2022062711/55c04437bb61ebc4708b465c/html5/thumbnails/51.jpg)
Deleted 15,000 lines of code