Solr for indexing and searching logs
DESCRIPTION
How to index logs from Logstash, Rsyslog, Flume (via Morphlines), Fluentd, etc. into Solr and make them searchable.
TRANSCRIPT
Using Solr to Search and
Analyze Logs
Radu Gheorghe
@radu0gheorghe @sematext
[Diagram: syslogd / Logstash -> syslog receiver / Elasticsearch API -> Logsene -> Kibana]
What about…?
defining and handling logs in general
4 sets of tools to send logs to Solr
performance tuning and SolrCloud
Defining and Handling Logs (story time!)
[Diagram: syslog repeated across machines — and a question mark]
Requirements
1) What’s wrong?
(for debugging)
Problem
looooots of messages coming in
Solved with no indexing
BUT
Elasticsearch
Requirements
1) What’s wrong? ✓
2) What will go wrong?
(stats)
Parsing Raw Logs
BUT
mickey mouse 10
user item time
still slow; format changes
Parsing Raw Logs
BUT
mickey mouse 0 10
add error code
still slow; format changes
Facets. Logging in JSON
2013-11-06… mickey mouse
{ "date": "2013-11-06", "message": "mickey mouse"}
Facets. Logging in JSON
2013-11-06… @cee:{"user": "mickey"}
{ "date": "2013-11-06", "user": "mickey"}
2013-11-06… mickey mouse
{ "date": "2013-11-06", "message": "mickey mouse"}
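The idea behind CEE-style JSON logging can be sketched in a few lines of Python — `to_cee` and `parse_cee` are illustrative helpers, not part of any library:

```python
import json

def to_cee(fields):
    """Serialize structured fields as a CEE-style syslog payload."""
    return "@cee:" + json.dumps(fields)

def parse_cee(line):
    """Parse a CEE payload back into a dict; fall back to a plain message."""
    marker = "@cee:"
    if line.startswith(marker):
        return json.loads(line[len(marker):])
    return {"message": line}

line = to_cee({"user": "mickey"})
print(line)                        # @cee:{"user": "mickey"}
print(parse_cee(line))             # {'user': 'mickey'}
print(parse_cee("mickey mouse"))   # {'message': 'mickey mouse'}
```

Logs that carry the marker come out as structured fields ready for faceting; everything else lands in a catch-all `message` field.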
Requirements
1) What’s wrong? ✓
2) What will go wrong? ✓
3) Handle logs like production data ✓
What is a log?
How to handle logs?
4 Ways of Sending Logs to Solr
logger
Logstash
files
Schemaless
% cd solr-4.5.1/example/
% mv solr solr.bak
% cp -R example-schemaless/solr/ .
Automatic ID generation
solrconfig.xml
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  ...
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
logger
/dev/log
mmjsonparse + omprog + script
/dev/log -> parse -> format -> send to Solr
% logger '@cee: {"hello": "world"}'
rsyslog.conf
module(load="imuxsock") # version 7+
/dev/log -> parse -> format -> send to Solr
...
module(load="mmjsonparse")
action(type="mmjsonparse")
/dev/log -> parse -> format -> send to Solr
...
template(name="CEE" type="list") {
property(name="$!all-json")
constant(value="\n")
}
/dev/log -> parse -> format -> send to Solr
...
action(type="mmjsonparse")
template(name="CEE" …)
module(load="omprog")
if $parsesuccess == "OK" then action(type="omprog"
binary="/opt/json-to-solr.py"
template="CEE")
/dev/log -> parse -> format -> send to Solr
import json
import sys

import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/')
for line in sys.stdin:       # one JSON document per line from rsyslog
    doc = json.loads(line)
    solr.add([doc])
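Indexing one document per `add()` means one HTTP request per log line. A hedged variant that batches before sending — the batch size and the `send` callback are illustrative, not from the talk; with pysolr, `send` would be `solr.add`:

```python
import json

def index_batched(lines, send, batch_size=100):
    """Parse JSON lines (e.g. from sys.stdin) and hand them to `send`
    in batches, so Solr gets one update request per batch_size docs."""
    batch = []
    for line in lines:
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            send(batch)
            batch = []
    if batch:  # flush whatever is left at the end
        send(batch)
```

Fewer, larger update requests is the same lever the performance section below turns with docs-per-update.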
Avro
MorphlineSolr Sink
Avro -> buffer -> parse -> send to Solr
https://github.com/mpercy/flume-log4j-example
flume.conf
agent.sources = avroSrc
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 41414
Avro -> buffer -> parse -> send to Solr
flume.conf
agent.channels = solrMemoryChannel
agent.channels.solrMemoryChannel.type = memory
agent.sources.avroSrc.channels = solrMemoryChannel
Avro -> buffer -> parse -> send to Solr
flume.conf
agent.sinks = solrSink
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.morphlineFile = conf/morphline.conf
agent.sinks.solrSink.channel = solrMemoryChannel
Avro -> buffer -> parse -> send to Solr
morphline.conf
...
commands : [
{ readLine { charset : UTF-8 }}
{ grok {
dictionaryFiles : [conf/grok-patterns]
expressions : {
message : """%{INT:pid} %{DATA:message}"""
...
https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries
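The grok expression above, `%{INT:pid} %{DATA:message}`, boils down to a regex with named capture groups; a rough Python equivalent (grok's DATA is a lazy match, approximated here with `.*`):

```python
import re

# %{INT:pid} %{DATA:message} expressed as named groups
GROK_LINE = re.compile(r"(?P<pid>[+-]?\d+) (?P<message>.*)")

def parse(line):
    """Return the captured fields as a dict, or None on no match."""
    match = GROK_LINE.match(line)
    return match.groupdict() if match else None

print(parse("42 out of memory"))  # {'pid': '42', 'message': 'out of memory'}
```

Grok dictionaries are just named, reusable fragments of patterns like these.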
Avro -> buffer -> parse -> send to Solr
morphline.conf
SOLR_LOCATOR : {
  collection : collection1
  # zkHost : "127.0.0.1:2181"
  solrUrl : "http://localhost:8983/solr/"
}
...
commands : [
...
{ loadSolr {
solrLocator : ${SOLR_LOCATOR}
...
fluent-logger fluent-plugin-solr
fluent-logger -> fluentd -> fluent-plugin-solr
% pip install fluent-logger
from fluent import sender, event
sender.setup('solr.test')
event.Event('forward', {'hello': 'world'})
fluent-logger -> fluentd -> fluent-plugin-solr
<source>
type forward
</source>
<match solr.**>
type solr
host localhost
port 8983
core collection1
</match>
fluent-logger -> fluentd -> fluent-plugin-solr
% gem install fluent-plugin-solr
doc = Solr::Document.new(:hello => record["hello"])
https://github.com/btigit/fluent-plugin-solr
out_solr.rb
Logstash
file input -> grok filter -> solr_http output
logstash.conf:
input { file { path => "/tmp/testlog" }}
file input -> grok filter -> solr_http output
% echo '2 world' >> /tmp/testlog
logstash.conf:
filter { grok { match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"] }}
file input -> grok filter -> solr_http output
{"pid": "2", "hello":"world"}
logstash.conf:
output {
  solr_http {
    # master or v1.2.3+
    solr_url => "http://localhost:8983/solr"
  }
}
file input -> grok filter -> solr_http output
Fast and Cloud
“It Depends”
load test monitor: SPM
20% off: LR2013SPM20
Single Core: # of docs/update
Single Core: Commits
<autoSoftCommit>
  <maxTime>...</maxTime>
</autoSoftCommit>
<autoCommit>
  <openSearcher>false</openSearcher>
  <maxTime>???</maxTime>
</autoCommit>
<ramBufferSizeMB>???</ramBufferSizeMB>
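Filled in with placeholder values — the numbers below are illustrative, not recommendations — the settings might look like this in solrconfig.xml. Soft commits control how soon new logs become searchable; hard commits with openSearcher=false control durability without the cost of reopening searchers:

```xml
<!-- make new docs searchable every few seconds (visibility) -->
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
<!-- flush the transaction log to disk less often (durability) -->
<autoCommit>
  <openSearcher>false</openSearcher>
  <maxTime>60000</maxTime>
</autoCommit>
<!-- RAM to buffer before flushing a segment -->
<ramBufferSizeMB>256</ramBufferSizeMB>
```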
Single Core: Size and Merges
omitNorms="true"
omitTermFreqAndPositions="true"
<mergeFactor>??
Single Core: Caches
<fieldValueCache ... size="???" autowarmCount="0"
docValues="true"
facets
changing data to sort & facet
SolrCloud: ZooKeeper
bin/zkServer.sh start
OR
java -DzkRun … -jar start.jar
SolrCloud: ZooKeeper
zkcli.sh -cmd upconfig \
  -zkhost SERVER:2181 \
  -confdir solr/collection1/conf/ \
  -confname start

OR

-Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=start
SolrCloud: Start Nodes
java -DzkHost=SERVER:2181 -jar start.jar
Timed Collections
04Nov
05Nov
06Nov
07Nov
search latest
search all
index
optimize
Collections API
05Nov
06Nov
07Nov
08Nov
action=CREATE&name=08Nov&numShards=4
action=DELETE&name=05Nov
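Rolling collections over each day is easy to script against the Collections API; a sketch that just builds the request URLs (the base URL assumes a local Solr, and sending the requests is left to curl or urllib):

```python
from urllib.parse import urlencode

# Assumed local SolrCloud endpoint for the Collections API
BASE = "http://localhost:8983/solr/admin/collections"

def create_url(name, num_shards=4):
    """URL for action=CREATE on a new timed collection."""
    return BASE + "?" + urlencode(
        {"action": "CREATE", "name": name, "numShards": num_shards})

def delete_url(name):
    """URL for action=DELETE on an expired collection."""
    return BASE + "?" + urlencode({"action": "DELETE", "name": name})

print(create_url("08Nov"))
print(delete_url("05Nov"))
```

A nightly cron job calling these two endpoints gives a fixed-size retention window with no per-document deletes.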
Aliases. Optimize
05Nov
06Nov
07Nov
08Nov
action=CREATEALIAS&name=ALL&collection=06Nov,07Nov,08Nov
action=CREATEALIAS&name=LATEST&collection=08Nov
07Nov/update?optimize=true
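Keeping the ALL and LATEST aliases in step with the rolling window is just list bookkeeping; a sketch (`alias_params` is an illustrative helper, and the collection naming follows the slides):

```python
def alias_params(collections, window=3):
    """Given timed collections ordered oldest to newest, return the
    CREATEALIAS collection lists for ALL (last `window` collections)
    and LATEST (newest collection only)."""
    return {
        "ALL": ",".join(collections[-window:]),
        "LATEST": collections[-1],
    }

print(alias_params(["05Nov", "06Nov", "07Nov", "08Nov"]))
# {'ALL': '06Nov,07Nov,08Nov', 'LATEST': '08Nov'}
```

Searches always hit the aliases, so clients never need to know which dated collections currently exist.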
logs = production data
Logstash
docs/update, commits
mergeFactor
omit*, docValues
caches
time
Collections API, aliases
optimize
We’re hiring!
sematext.com/about/jobs