AusLUG2012 - Client, Server and Application Monitoring and Optimization done right!
TRANSCRIPT
AusLUG2012
Meet.Share.Learn
29th & 30th March, Melbourne, Victoria, Australia
Florian Vogler | CEO & CTO | panagenda www.panagenda.com
Client, Server and Application Monitoring and Optimization done right
Efficiency describes the extent to which time or effort is well used for an intended task or purpose.
Agenda
Who am I? … and about panagenda
Laying the basics of what is actually possible – or:
• What Admins and IT departments have to cope with
Deep Diving …
• The 30 most important server statistics (out of ~2,000)
• … and Clients?
• … and Groups?
• … and Databases?
Coming up next …
About Florian Vogler
CEO & CTO – (hopefully) representative of the great work of my colleagues at panagenda
Born in Hamburg (DE), lived in London (UK), Vienna (AT), Frankfurt (DE), Alicante (ES); currently back in Frankfurt (DE)
Lotus Notes / Domino since 1992
Started to work with Notes at Raiffeisen Austria
• Administration and Development
• 35,000 users worldwide (today > 100,000)
Since 2002 core competency Client Management, Notes / Domino infrastructure analysis and optimization
I enjoy working with many great companies in many different countries (I travel *a lot*)
About panagenda
We build symbiotic relationships with our customers and partners for ongoing mutual benefit
HQ: Vienna/AT, offices in Heppenheim near Frankfurt/DE, Boston/USA
Development of standard products
> 4 million licenses in over 70 countries
IBM Lotus Notes Client Management
MarvelClient :: „99%“ manageability
(not „just“ IBM Lotus Domino) Server Analytics, Monitoring & Reporting
GreenLight :: realtime, longterm, smart
Analyze Groups, Certifiers and ACLs
GroupExplorer :: better transparency, security & automation
plus: NameChanger (Name changes), DatabaseExplorer (Design Analysis), Notes2Web (Web transformation)
What Admins and IT departments have to cope with
• Above all: Lack of knowledge (apologies)
  • Mostly because of overload
    – No time (anymore) for the inner workings of clients, servers, and systems
    – Growing complexity of single systems
    – Growing number of systems
    – „Laying the egg“ = yes; (proactive) „nurturing“ = no
  • Unknown sources of knowledge
• Lack of time
• If you don't take the time to do things right you’ll need the time to do them over
• „Wrong“ and/or missing tooling
Grown environments: large servers are fundamentally different from small ones; new ones (8) from old ones (< 8)!
[Figure: Development stages of teddy bears: newborn bear; 3-month-old bear, without fur; full-grown teddy with thick fur]
What Admins and IT departments have to cope with
Systemic interactions / dependencies in Lotus Notes / Domino
• Servers: hardware (CPU, memory), data storage, network connection, configuration, databases, tasks, mail traffic …
• Databases: ODS, size, reader fields, design, number & size of documents …
• Clients: hardware, data storage, network connection, configuration, databases …
• Across all: geographies, network (bandwidth, structure), online/offline, clustering/load balancing …
Lotus Domino „out of the box“ tooling
Public NAB
– Servers
– Clusters
– People/Groups
– Directory
– Messaging
– Replication
– Policies
– Web Configuration
Lotus Domino „out of the box“ tooling
Public NAB (8)
Log.nsf
– Miscellaneous (!!)
– Replication
– (Database) Usage
– Passthru Connections
Lotus Domino „out of the box“ tooling
Public NAB (8), Log.nsf (5)
Admin Client
– Monitoring
Tip 1: Enable Health Monitoring in Admin Preferences
Tip 2: Disable „Refresh server bookmarks“
Lotus Domino „out of the box“ tooling
Public NAB (8), Log.nsf (5)
Admin Client
– Monitoring (1)
– Analysis (~15) (ACL, Catalog, AdminP, ...)
– Statistics (1 or ~1,200)
– Activity Trends („1“)
– Messaging („1“)
– Replication („1“)
Lotus Domino „out of the box“ tooling
Public NAB (8), Log.nsf (5), Admin Client (20)
Events (1) & DDM …
– Probes
– Filters
– Collection Hierarchy
– Event Handlers
– Event Generators
Lotus Domino „out of the box“ tooling
Public NAB (8), Log.nsf (5), Admin Client (20), Events & DDM (6)
Monitoring Results (statrep)
– Alarms
– Events
– Statistic Reports
Lotus Domino „out of the box“ tooling
Public NAB (8), Log.nsf (5), Admin Client (20), Events & DDM (6), Monitoring Results (statrep) (3)
8 + 5 + 20 + 6 + 3 = 42
That's at least 42 views / areas one should monitor ...
Although 42 is „the answer to life, the universe and everything“ (according to The Hitchhiker's Guide to the Galaxy), that doesn't help much for LN/D Monitoring & Analysis
Tip 3: In case you don't know The Hitchhiker's Guide to the Galaxy by Douglas Adams: must read
Making more of what you already have
• Many companies don't even use what's in the box already …
• (As said earlier): Realtime Server Monitoring with Health Monitoring
• DDM – Domino Domain Monitoring (sometimes a bit too much, but then again much better than nothing!)
• Frequent reviews of Groups
• Frequent checking of the most important server stats (more on that later)
• Look through Lotusphere presentations
• …
• Investigate Usage views in log.nsf; for example …
A sample analysis of usage information from log.nsf (that you can do yourself easily)
Copy/Paste into Excel, then Data > Sort by, e.g., Transactions
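The same sort can be scripted instead of done by hand in Excel. This is a minimal sketch, assuming the usage view has been exported to CSV; the column names (database, user, transactions) and the sample rows are hypothetical and do not reflect the actual log.nsf view layout.

```python
import csv
import io

# Hypothetical export of a log.nsf usage view; column names are assumptions.
SAMPLE_CSV = """database,user,transactions
mail/user1.nsf,User One,1520
apps/crm.nsf,User Two,48210
mail/user3.nsf,User Three,310
apps/hr.nsf,User One,9004
"""

def top_by_transactions(csv_text, limit=3):
    """Return the rows with the most transactions, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["transactions"] = int(row["transactions"])
    rows.sort(key=lambda row: row["transactions"], reverse=True)
    return rows[:limit]

for row in top_by_transactions(SAMPLE_CSV):
    print(f'{row["transactions"]:>8}  {row["database"]}  ({row["user"]})')
```

Sorting by another column (for example bytes read or written, if exported) would only need a different sort key.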
Possibilities are endless (unfortunately, time is not)
• In almost all of the aforementioned areas one can (and should) „dig deeper“
• Unfortunately, digging deeper requires (time-consuming) correlation of data, e.g. …
  • Connection documents and log.nsf (db usage): How much mail and/or replication traffic is there between which servers?
  • Clients and log.nsf (db usage): Which users cause what load from where?
  • Database details from clients and servers: Who has replicas of databases s/he no longer has access to? Who has (unencrypted) replicas of critical databases?
• Network compression between servers and clients
• A lot of the data is either already there or (relatively ;-)) easy to get a hold of
• Correlation pays back (repeatedly) …
A picture says a thousand words …
Topological visualization of mail and replication traffic between servers
A picture says a thousand words …
One way to look at network compression
75% of your IBM Notes Clients use port compression (35,409 of 47,212 clients) [pictogram scale: 1 symbol = 1,000 clients]
87% of your IBM Lotus Domino servers use port compression (33 of 38 servers) [pictogram scale: 1 symbol = 1 server]
A picture says a thousand words …
Another way to look at network compression
[Chart: transferred vs. saved volume per day (GByte); transferred: current setup 2.30, no port compression 3.30, full port compression 1.65]
● Network transfer volume per day without port compression: 3.3 GByte
● Current settings: 60% configured „correctly“, saving ~1 GByte / 30%
● Applying port compression to all your servers and clients could save an additional ~0.65 GByte every day: a further 28% reduction, or 50% in total
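The percentages on this slide follow directly from the three chart values. A short worked calculation, using only the volumes shown in the chart (the variable names are illustrative):

```python
# Daily transfer volumes in GByte, as read from the chart.
no_compression = 3.30
current_setup = 2.30
full_compression = 1.65

saved_now = no_compression - current_setup        # already saved today
saved_extra = current_setup - full_compression    # additional potential

print(f"saved today: {saved_now:.2f} GByte ({saved_now / no_compression:.0%})")
print(f"additional:  {saved_extra:.2f} GByte ({saved_extra / current_setup:.0%} of current traffic)")
print(f"total reduction: {(no_compression - full_compression) / no_compression:.0%}")
```

Note that the 28% is relative to the current setup (2.30 GByte), while the 50% is relative to the uncompressed volume (3.30 GByte).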
Before we look at the 30 most important server statistics …
• Difficult – if not impossible – to test in the lab
• Start with the obvious / easy things
• Note down current settings before changing them
• Think about possible interdependencies
• „Too much good“ can actually harm performance (or lead to „Out of Memory“)
• Don't change (too) many things at once
  • Unless it's absolutely necessary / so „documented“
• Watch your servers for some (sensible) time after making changes
  • Check whether/that your servers are doing better
• „Google“
• Think along/ahead
• Have the heart to try
• This is just the beginning – stay curious!
And another preliminary note (last one(s), promised ;-))
• Many of the following statistics cannot be grasped with a ‚single‘ „sh sta“, but require analysis „over time“
  • Otherwise you won't know whether you're looking at a permanent / recurring / one-time / occasional problem
  • Otherwise you won't know whether changes actually improved things (or made things worse)
  • A picture says a thousand words …
• Admin Client can be used as a starting point … (unfortunately, it is very limited)
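To illustrate what analysis „over time“ can mean in its simplest form, here is a minimal sketch that summarizes repeated samples of one statistic. The sample values, the hourly interval and the threshold are all hypothetical; how the samples are collected (for example by scheduling „sh sta“) is left open.

```python
from statistics import mean

def summarize(samples, threshold):
    """Summarize (timestamp, value) samples of a single statistic."""
    values = [value for _, value in samples]
    over = [value for value in values if value > threshold]
    return {
        "avg": mean(values),
        "peak": max(values),
        "share_over": len(over) / len(values),  # fraction of samples above threshold
    }

# Hypothetical hourly samples of one statistic.
samples = [("08:00", 0), ("09:00", 12), ("10:00", 0),
           ("11:00", 15), ("12:00", 0), ("13:00", 1)]
print(summarize(samples, threshold=10))
```

A high share of samples above the threshold points to a permanent or recurring problem; a single peak with a low average looks more like a one-time event.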
ViewRebuildDir & Disk optimization(s)
Most important of all: free disk space & disk performance (keep „30%“ free to prevent fragmentation)
Separate, dedicated disks for …
– Translog
– Data
– If possible, own disk for page file/OS
– „ViewRebuildDir“=… (view indexing on its own disk)
– From 8.5.3 on, where necessary/wanted: .ft directories on own disk
– DAOS („cheap“)
Server.Availability
Shows how available = „ready to respond“ a server is (in %)
< 30% means trouble (or load balancing), IF the Availability Index is correct in the first place …
(Only!) if the server is really busy: „sh ai“ on the server console results in a recommendation on how to tune notes.ini: SERVER_TRANSINFO_RANGE
From Notes 8.5 and up, you are advised to set:
– notes.ini: Server_MinPossibleTransTime=1500
– notes.ini: Server_MaxPossibleTransTime=20000000
Important: Delete loadmon.ncf after server shutdown in order to discard old values
Keep an eye on Monitor.* Warnings; Examples
Monitor.Last.ADMIN PROCESS.Warning(High)Text = Disk space statistics could not be found on Servername/Cert.
Monitor.Last.EVENT MONITOR.Warning(High)Text = Event: Error adding event document to Domino Domain Monitoring: Event correlation cache is full. You can increase its size via the NOTES.INI setting EVENT_CORRELATION_POOL_SIZE.
Monitor.Last.INDEX ALL.Warning(High)Text = Error updating view '#4538' in mail\nameabc.nsf: The single copy template associated with this database cannot be located.
Monitor.Last.SMTP SERVER.FailureText = SMTP Server: Initialization failure: Message Queue name already in use.
Monitor.Last.STATISTICS.Warning(High)Text = Unable to update activity document in log database for mail\namexyz.nsf: Cannot write to database because the database would exceed the permitted size.
Server.Sessions.Dropped
Tells you how many sessions have been ‚dropped‘ since last server restart
Happens when
• issuing a serverside „Drop all“
• Pressing Ctrl+Break on clients („frustration-meter“)
Server.Sessions.Dropped = 25407
18/6 – 18/10 = 4*30 = 120 days
25407 / 120 = 211 sessions dropped per day
Should be further correlated with peak # of users
[Chart annotations: „different“ problem; „Drop all“]
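The per-day figure above uses a rough 4 x 30 day estimate. A small sketch comparing that estimate with the exact number of calendar days; the year is an assumption, since the slide only gives day and month:

```python
from datetime import date

dropped = 25407  # Server.Sessions.Dropped from the slide

# Dates from the slide (18/6 to 18/10); the year 2011 is an assumption.
start, end = date(2011, 6, 18), date(2011, 10, 18)

approx_days = 4 * 30              # the slide's estimate
exact_days = (end - start).days   # calendar days between the two dates

print(dropped // approx_days)            # whole sessions per day, slide estimate
print(round(dropped / exact_days, 1))    # sessions per day, exact day count
```

Either way the order of magnitude is the same; as the slide notes, the number only becomes meaningful when correlated with the peak number of users.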
Platform.LogicalDisk.*
Platform.LogicalDisk.1.AssignedName = D
Platform.LogicalDisk.1.AvgQueueLen = 0
Platform.LogicalDisk.1.AvgQueueLen.Avg = 0,01
Platform.LogicalDisk.1.AvgQueueLen.Peak = 1,01
Platform.LogicalDisk.1.BytesReadPerSec = 0
Platform.LogicalDisk.1.BytesWrittenPerSec = 10.172,49
Platform.LogicalDisk.1.PctUtil = 0,22
Platform.LogicalDisk.1.PctUtil.Avg = 0,86
Platform.LogicalDisk.1.PctUtil.Peak = 101,07
Platform.LogicalDisk.1.ReadsPerSec = 0
Platform.LogicalDisk.1.WritesPerSec = 2,07

Platform.LogicalDisk.2.AssignedName = C
Platform.LogicalDisk.2.AvgQueueLen = 0,01
Platform.LogicalDisk.2.AvgQueueLen.Avg = 0,73
Platform.LogicalDisk.2.AvgQueueLen.Peak = 34,74
Platform.LogicalDisk.2.BytesReadPerSec = 17.272,75
Platform.LogicalDisk.2.BytesWrittenPerSec = 63.697,52
Platform.LogicalDisk.2.PctUtil = 1,11
Platform.LogicalDisk.2.PctUtil.Avg = 72,8
Platform.LogicalDisk.2.PctUtil.Peak = 3.473,81
Platform.LogicalDisk.2.ReadsPerSec = 2,58
Platform.LogicalDisk.2.WritesPerSec = 7,3

Interpretation
GOOD: AvgQueueLen < 2%; BAD: AvgQueueLen > 5% (1-100% = 0,01 – 1,0!)
GOOD: PctUtil < 80% (1-100% = 1-100)
NOTE: may need to divide by # of spindles (SAN/NAS)
Solution: Various parameters (bufferpool, cache, namelookup) and OS / Disk Tuning
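The interpretation rules can be applied automatically once the statistics are parsed. This is a minimal sketch, assuming one "Name = value" statistic per line and the number format shown on this slide ('.' for thousands, ',' for decimals); applying the thresholds to the .Avg values is an assumption, as is every function name used here.

```python
def parse_stats(text):
    """Parse 'Name = value' lines; numbers use '.' for thousands and ',' for decimals."""
    stats = {}
    for line in text.splitlines():
        name, sep, raw = line.partition("=")
        if not sep:
            continue
        raw = raw.strip()
        try:
            stats[name.strip()] = float(raw.replace(".", "").replace(",", "."))
        except ValueError:
            stats[name.strip()] = raw  # non-numeric value, e.g. AssignedName
    return stats

def disk_findings(stats, disk):
    """Check one logical disk against the thresholds from the slide."""
    prefix = f"Platform.LogicalDisk.{disk}."
    findings = []
    if stats[prefix + "AvgQueueLen.Avg"] > 0.05:   # 5% = 0,05
        findings.append("AvgQueueLen.Avg above 5%")
    if stats[prefix + "PctUtil.Avg"] >= 80:
        findings.append("PctUtil.Avg at or above 80%")
    return findings

SAMPLE = """Platform.LogicalDisk.2.AssignedName = C
Platform.LogicalDisk.2.AvgQueueLen.Avg = 0,73
Platform.LogicalDisk.2.PctUtil.Avg = 72,8
Platform.LogicalDisk.2.PctUtil.Peak = 3.473,81"""

stats = parse_stats(SAMPLE)
print(disk_findings(stats, 2))
```

On a server that reports numbers with a decimal point instead of a comma, the number conversion in `parse_stats` would have to be adapted.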
Platform.LogicalDisk.#.PctUtil
Mail.Mailbox.*
(Mail.Mailbox.AccessConflicts / Mail.Mailbox.Accesses) x 100
Must be < 2, otherwise: add another mailbox (the benefit diminishes above 4-5 mailboxes)
Example:
Mail.Mailbox.AccessConflicts = 1636
Mail.Mailbox.Accesses = 189864
(1636 / 189864) x 100 = 0,86 = ok
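The same check as a small function, using the example values from the slide (the function name is illustrative):

```python
def mailbox_conflict_pct(access_conflicts, accesses):
    """Access conflicts as a percentage of all mailbox accesses."""
    return access_conflicts / accesses * 100

pct = mailbox_conflict_pct(1636, 189864)
print(f"{pct:.2f}%", "ok" if pct < 2 else "consider adding another mailbox")
```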
Update.PendingList
Update.PendingList = number of views waiting to be updated
If Update.PendingList „is often“ > 0, then …
notes.ini:
Update_Fulltext_Thread=1
FTUPDATE_IDLE_TIME=4
Background:
• If you have many databases/apps …
• … and a busy update task
  – Full text indexing could be the reason for slowing down / “blocking” view indexing
• Separate FTI and view updates
  – FTI then runs in its own thread
• Improves performance
• Update_Fulltext_Thread=1
Speaking of full-text indexing: You can isolate the FTI thread from the limited Domino update pool:
ftg_use_sys_memory=1
FTI thread then gets memory from the OS pool; relieves Domino system memory
Database.Database.BufferPool.*
Database.Database.BufferPool.PerCentReadsInBuffer = 78,96
BAD: PerCentReadsInBuffer < 90%; PERFECT: PerCentReadsInBuffer > 98% (99.9% is bad, too!)
Below 90%:
– Typically leads to too many requests being written to disk
– Server needs a larger BufferPool
Solution: notes.ini NSF_Buffer_Pool_Size_MB=n (in MB)
─ Default: 512 MB
Database.DbCache.*
Database.DbCache.CurrentEntries = 1647
Database.DbCache.HighWaterMark = 1691
Database.DbCache.MaxEntries = 1536
Database.DbCache.OvercrowdingRejections = 0
GOOD = HighWaterMark < MaxEntries
GOOD = 0 OvercrowdingRejections
Solution:
– notes.ini NSF_DbCache_MaxEntries = n
  • Default: NSF_Buffer_Pool_Size x 3
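Applied to the sample values on this slide, the two rules give a mixed picture: no rejections, but the high water mark is above MaxEntries. A minimal sketch of that check (function name and wording of the findings are illustrative):

```python
def dbcache_findings(high_water_mark, max_entries, overcrowding_rejections):
    """Apply the two GOOD rules from the slide and report violations."""
    findings = []
    if high_water_mark >= max_entries:
        findings.append("HighWaterMark has reached or exceeded MaxEntries")
    if overcrowding_rejections > 0:
        findings.append("OvercrowdingRejections is above 0")
    return findings

# Values from the slide: HighWaterMark, MaxEntries, OvercrowdingRejections
print(dbcache_findings(1691, 1536, 0))
```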
Replica.Cluster.*
Replica.Cluster.Failed
Replica.Cluster.SecondsOnQueue
Replica.Cluster.WorkQueueDepth
PERFECT: SecondsOnQueue < 10; BAD: SecondsOnQueue > 15
PERFECT: WorkQueueDepth < 10; BAD: WorkQueueDepth > 15
Solution:
– Add more cluster replicators
– Optimize cluster load (e.g. “manually” balance users across the cluster if not load-balanced)
Server.Trans.PerMinute
Server.Trans.PerMinute = 956
Server.Users = 26
956 / 26 = 36,7
HEAVY: Trans.PerMinute (per user) > 30; LIGHT: Trans.PerMinute (per user) < 10
Solution:
– Identify users causing load (db usage view!)
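The per-user calculation and the two thresholds as a small sketch; the label for the range between 10 and 30 is an assumption, since the slide only names the two extremes:

```python
def load_per_user(trans_per_minute, users):
    """Return transactions per minute per user and a rough load label."""
    per_user = trans_per_minute / users
    if per_user > 30:
        label = "HEAVY"
    elif per_user < 10:
        label = "LIGHT"
    else:
        label = "MODERATE"  # assumed name for the in-between range
    return per_user, label

per_user, label = load_per_user(956, 26)
print(f"{per_user:.1f} transactions per minute per user: {label}")
```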
Database.NAMELookupCache*
Database.NAMELookupCacheCacheSize = 2.513.328
Database.NAMELookupCacheHits = 24.628.339
Database.NAMELookupCacheMisses = 48.160.502
IMPORTANT: NoHitHits!
Cache too small or too large(!)
Misses > Hits: „double-check“ notes.ini: NLCache_Size=16000000
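For the sample values on this slide, the hit ratio can be worked out as follows. This sketch uses only the Hits and Misses values shown; it does not take NoHitHits into account.

```python
hits = 24_628_339     # Database.NAMELookupCacheHits
misses = 48_160_502   # Database.NAMELookupCacheMisses

hit_ratio = hits / (hits + misses)
print(f"hit ratio: {hit_ratio:.1%}")
if misses > hits:
    print("more misses than hits: double-check NLCache_Size")
```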
Server.ConcurrentTasks*
Server.ConcurrentTasks
Server.ConcurrentTasks.Waiting
Waiting should be ZERO (0)
Solution:
─ Server_Pool_Tasks = n (e.g. 80)
─ Server_Max_Concurrent_Trans = m (e.g. Server_Pool_Tasks * # Ports)
Platform.PagingFile.Total.*
Platform.PagingFile.Total.PctUtil = 0,28
Platform.PagingFile.Total.PctUtil.Avg = 0,14
Platform.PagingFile.Total.PctUtil.Peak = 0,8
OK: PctUtil.Avg close to 0%; BAD: PctUtil.Avg > 10%
Solution: OS level tuning, check memory
Note: If “sh sta” doesn't show Platform.* stats, see Admin Help
Sponsor Break – Sneak Peek during Social Evening
http://panagenda.com/giftoftransparency
• Efficient Client-Analysis is impossible without additional tooling
• FREE 4 weeks license of panagenda GreenLight – our server monitoring and reporting solution – includes Database Analyzer for 1 year for one of your servers
• FREE one year license of panagenda MarvelClient Analyze
  • The results speak for themselves on „just“ the client side
  • The results can also be used together with GreenLight
• For groups and databases, we also have GroupExplorer and DatabaseExplorer
• Whether we may help you is up to you
Timeout
Spending 60 minutes on Performance Improvements
can be compared to a walk on the tip of the iceberg –
we have worked on MANY more business cases
and solved MANY more problems than those mentioned just now.
If your problem was not mentioned in this session –
be it a Client, Server, Design, Admin or other challenge:
we would love to hear from you.
Thank you for listening – Questions? Answers!
Q&A
Contact me – I look forward to hearing from you!
panagenda GmbH
Doblhoffgasse 7 / 6a :: 1010 Vienna :: Austria Web: http://www.panagenda.com
Email: [email protected] Fax: +43 1 89 012 89 – 15
Resources / Links
• Daniel Nashed, Nash!Com
• LS08: BP112
• LS11: BP102, BP110, BP118
• LS12: BP110, BP121, ID112, ID114
• Windows Indexing: http://bit.ly/ACzO6Z
• „The internet“ – google „Domino performance ibm“; great IBM whitepapers and articles, some very good sites out there