Sun Microsystems, Inc.
UBRM05-104
500 Eldorado Blvd.
Broomfield, CO 80021
U.S.A.

Revision A

Student Guide

Dynamic Performance Tuning and Troubleshooting With DTrace

SA-327-S10

March 18, 2005 11:30 am

Copyright 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.

This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.

Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Sun, Sun Microsystems, the Sun logo, Solaris, and OpenBoot are trademarks or registered trademarks of Sun Microsystems, Inc., in the U.S. and other countries.

All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc., in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.

Federal Acquisitions: Commercial Software – Government Users Subject to Standard License Terms and Conditions

Export Laws. Products, Services, and technical data delivered by Sun may be subject to U.S. export controls or the trade laws of other countries. You will comply with all such laws and obtain all licenses to export, re-export, or import as may be required after delivery to You. You will not export or re-export to entities on the most current U.S. export exclusions lists or to any country subject to U.S. embargo or terrorist controls as specified in the U.S. export laws. You will not use or provide Products, Services, or technical data for nuclear, missile, or chemical biological weaponry end uses.

DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

THIS MANUAL IS DESIGNED TO SUPPORT AN INSTRUCTOR-LED TRAINING (ILT) COURSE AND IS INTENDED TO BE USED FOR REFERENCE PURPOSES IN CONJUNCTION WITH THE ILT COURSE. THE MANUAL IS NOT A STANDALONE TRAINING TOOL. USE OF THE MANUAL FOR SELF-STUDY WITHOUT CLASS ATTENDANCE IS NOT RECOMMENDED.

Export Control Classification Number EAR99 assigned: 10 September 2004

Copyright 2005 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A

Table of Contents

About This Course
    Course Goals
    Topics Not Covered
    How Prepared Are You?
    Introductions
    How to Use Course Materials
    Conventions
        Typographical Conventions

DTrace Fundamentals
    Objectives
    Relevance
    Additional Resources
    DTrace Features
        Transient Failures
        Debugging Transient Failures
        DTrace Capabilities
    DTrace Architecture
        Probes and Probe Providers
        DTrace Components
    DTrace Tour
        Listing Probes
        Writing D Scripts

Using DTrace
    Objectives
    Relevance
    Additional Resources
    DTrace Performance Monitoring Capabilities
        Features of the DTrace Performance Monitoring Capabilities
        Aggregations
    Examining Performance Problems Using the vminfo Provider
        The vminfo Probes
        Finding the Source of Page Faults Using vminfo Probes
    Examining Performance Problems Using the sysinfo Provider
        The sysinfo Probes
        Using the quantize Aggregation Function With the sysinfo Probes
        Finding the Source of Cross-Calls
    Examining Performance Problems Using the io Provider
        The io Probes
        Information Available When io Probes Fire
        Finding I/O Problems
    Obtaining System Call Information
        The syscall Provider
        D Language Variables
        Associative Arrays
        Thread-Local Variables
        Timing a System Call
        Following a System Call
    Creating D Scripts That Use Arguments
        Built-in Macro Variables
        PID Argument Example
        Executable Name Argument Example
        Custom Monitoring Tools

Debugging Applications With DTrace
    Objectives
    Relevance
    Additional Resources
    Application Profiling
        The pid Provider
        The profile Provider
    Application Variables
        Displaying Process Global Variables
        Displaying Library Global Variables
    The plockstat Provider
    Transient System Call Errors
        User Stack Traces on System Call Failures
        Processes Using a Lot of System Time
    Open Files
        Accessing System Call Pointer Arguments
        Displaying Names of Files Being Opened

Finding System Problems With DTrace
    Objectives
    Relevance
    Additional Resources
    Accessing Kernel Variables
        Using the D Language to Access Kernel Symbols
        Monitoring Kernel Variables
        Accessing Kernel Data Structures
        Accessing Lock Contention Information
        The proc Provider and the system() Function
    Displaying Read Call Information
        Tracing Read Calls System-Wide
        Tracing Read Calls Using the iosnoop.d D Script
        Aggregating Read Data
    Using the Anonymous Tracing Facility
        Creating an Anonymous Enabling
        Performing Anonymous Tracing
    Using the Speculative Tracing Facility
        Speculative Tracing Functions
        Speculative Tracing Example
        Application Debugging With Speculative Tracing
    DTrace Privileges
        Using the Least Privilege Facility
        Kernel-Destructive Actions
        Setting DTrace User Privileges
        Setting DTrace Process Privileges
        Summarizing the DTrace Privilege Levels

Troubleshooting DTrace Problems
    Objectives
    Relevance
    Additional Resources
    Minimizing DTrace Performance Impact
        Limiting Enabled Probes
        Using Aggregations
        Using Cacheable Predicates
    Using and Tuning DTrace Buffers
        Principal Buffers
        Principal Buffer Policies
        DTrace Option Settings
        The switch Buffer Policy
        The fill Buffer Policy
        The ring Buffer Policy
        Other Buffers
        Buffer Resizing Policy
    Debugging DTrace Scripts
        Avoiding Syntax Errors in D Scripts
        Avoiding Run-Time Errors in D Scripts

Actions and Subroutines
    Default Action
    Data Recording Actions
        The void trace(expression) Action
        The void tracemem(address, size_t nbytes) Action
        The void printf(string format, ...) Action
        The printa Action
        The stack() Action
        The ustack() Action
    Destructive Actions
        Process Destructive Actions
        Kernel Destructive Actions
    Special Actions
        Actions Associated With Speculative Tracing
        The void exit(int status) Action
    Subroutines
        The void *alloca(size_t size) Subroutine
        The string basename(char *str) Subroutine
        The void bcopy(void *src, void *dest, size_t size) Subroutine
        The string cleanpath(char *str) Subroutine
        The void *copyin(uintptr_t addr, size_t size) Subroutine
        The string copyinstr(uintptr_t addr) Subroutine
        The string dirname(char *str) Subroutine
        The size_t msgdsize(mblk_t *mp) Subroutine
        The size_t msgsize(mblk_t *mp) Subroutine
        The int mutex_owned(kmutex_t *mutex) Subroutine
        The kthread_t *mutex_owner(kmutex_t *mutex) Subroutine
        The int mutex_type_adaptive(kmutex_t *mutex) Subroutine
        The int progenyof(pid_t pid) Subroutine
        The int rand(void) Subroutine
        The int rw_iswriter(krwlock_t *rwlock) Subroutine
        The int rw_write_held(krwlock_t *rwlock) Subroutine
        The int speculation(void) Subroutine
        The string strjoin(char *str1, char *str2) Subroutine
        The size_t strlen(string str) Subroutine

D Built-in and Macro Variables
    Built-in Variables
    Macro Variables

D Operators
    Arithmetic Operators
    Relational Operators
    Logical Operators
    Bitwise Operators
    Assignment Operators
    Increment and Decrement Operators
    Conditional Expressions

Preface

About This Course

Course Goals

Upon completion of this course, you should be able to:

● Describe the features and architecture of the Solaris™ Dynamic Tracing (DTrace) facility

● Use the DTrace facility to find the source of intermittent problems

● Use DTrace to help debug applications

● Use DTrace to look at the cause of performance problems

● Troubleshoot DTrace script problems

Course Map

The following course map enables you to see what you have accomplished and where you are going in reference to the course goals.

● Understanding and Using the DTrace Facility: DTrace Fundamentals; Using DTrace

● Using DTrace to Debug Applications and Find System Problems: Debugging Applications With DTrace; Finding System Problems With DTrace

● Troubleshooting DTrace: Troubleshooting DTrace Problems

Topics Not Covered

This course does not cover the following topic, which is covered in other courses offered by Sun Educational Services:

● Performance management

Refer to the Sun Educational Services catalog for specific information and registration.

How Prepared Are You?

To be sure you are prepared to take this course, can you answer yes to the following questions?

● Do you have some previous programming experience?

● Can you use the truss command to diagnose application problems?

● Do you know the basics of the kernel structure?

● Are you familiar with basic troubleshooting concepts?

Introductions

Now that you have been introduced to the course, introduce yourself to the other students and the instructor, addressing the following items:

● Name

● Company affiliation

● Title, function, and job responsibility

● Experience related to topics presented in this course

● Reasons for enrolling in this course

● Expectations for this course

How to Use Course Materials

To enable you to succeed in this course, these course materials contain a learning module that is composed of the following components:

● Goals – You should be able to accomplish the goals after finishing this course and meeting all of its objectives.

● Objectives – You should be able to accomplish the objectives after completing a portion of instructional content. Objectives support goals and can support other higher-level objectives.

● Lecture – The instructor presents information specific to the objective of the module. This information helps you learn the knowledge and skills necessary to succeed with the activities.

● Activities – The activities take various forms, such as review questions, labs, discussion, and demonstration. Activities help facilitate the mastery of an objective.

● Visual aids – The instructor might use several visual aids to convey a concept, such as a process, in a visual form. Visual aids commonly contain graphics, animation, and video.

Conventions

The following conventions are used in this course to represent various training elements and alternative learning resources.

Icons

Additional resources – Indicates other references that provide additional information on the topics described in the module.

Discussion – Indicates a small-group or class discussion on the current topic is recommended at this time.

Note – Indicates additional information that can help students but is not crucial to their understanding of the concept being described. Students should be able to understand the concept or complete the task without this information. Examples of notational information include keyword shortcuts and minor system adjustments.

Caution – Indicates that there is a risk of personal injury from a nonelectrical hazard, or risk of irreversible damage to data, software, or the operating system. A caution indicates the possibility of a hazard (as opposed to certainty), depending on the action of the user.

Warning – Indicates that either personal injury or irreversible damage of data, software, or the operating system will occur if the user performs this action. A warning does not indicate potential events; if the action is performed, catastrophic events will occur.

Typographical Conventions

Courier is used for the names of commands, files, directories, programming code, and on-screen computer output; for example:

    Use ls -al to list all files.
    system% You have mail.

Courier is also used to indicate programming constructs, such as class names, methods, and keywords; for example:

    The getServletInfo method is used to get author information.
    The java.awt.Dialog class contains Dialog constructor.

Courier bold is used for characters and numbers that you type; for example:

    To list the files in this directory, type:
    # ls

Courier bold is also used for each line of programming code that is referenced in a textual description; for example:

    1 import java.io.*;
    2 import javax.servlet.*;
    3 import javax.servlet.http.*;

Notice the javax.servlet interface is imported to allow access to its life cycle methods (Line 2).

Courier italics is used for variables and command-line placeholders that are replaced with a real name or value; for example:

    To delete a file, use the rm filename command.

Courier italic bold is used to represent variables whose values are to be entered by the student as part of an activity; for example:

    Type chmod a+rwx filename to grant read, write, and execute rights for filename to world, group, and users.

Palatino italics is used for book titles, new words or terms, or words that you want to emphasize; for example:

    Read Chapter 6 in the User’s Guide.
    These are called class options.

Module 1

DTrace Fundamentals

Objectives

Upon completion of this module, you should be able to:

● Describe the features of the Solaris™ Dynamic Tracing (DTrace) facility

● Describe the DTrace architecture

● List and enable probes, and create action statements and D scripts

Relevance

Discussion – The following questions are relevant to understanding DTrace:

● Would the ability to turn on trace points for any one of the majority of functions in the kernel be beneficial?

● Would it be useful to know who is issuing kill(2) system calls?
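
The second question can be explored with a short D script. The following is a minimal sketch, not taken from the course materials: it assumes a Solaris 10 system, sufficient DTrace privileges, and the standard syscall provider. It enables the entry probe for kill(2) and prints the sender and target of each signal; the output format is only illustrative.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Fires each time any process on the system enters kill(2). */
syscall::kill:entry
{
	/* For kill(2), arg0 is the target process ID and arg1 is the signal number. */
	printf("%s (pid %d) sent signal %d to pid %d\n",
	    execname, pid, (int)arg1, (int)arg0);
}
```

Saved to a file such as kill.d (a hypothetical name), the script could be run with dtrace -s kill.d and stopped with Control-C.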

Additional Resources

Additional resources – The following references provide additional information on the topics described in this module:

● Sun Microsystems, Inc. Solaris Dynamic Tracing Guide, part number 817-6223-10.

● The /usr/demo/dtrace directory contains all of the sample scripts from the Solaris Dynamic Tracing Guide.

● Cantrill, Bryan M., Michael W. Shapiro, and Adam H. Leventhal. “Dynamic Instrumentation of Production Systems.” Paper presented at 2004 USENIX Conference.

● BigAdmin System Administration Portal [http://www.sun.com/bigadmin/content/dtrace].

● The dtrace(1M) manual page.

DTrace Features

DTrace is a comprehensive dynamic tracing facility that is bundled into the Solaris™ 10 Operating System (Solaris 10 OS). It is intended for use by system administrators, service support personnel, kernel developers, application program developers, and users who are given explicit access permission to the DTrace facility.

DTrace has the following features:

● Enables dynamic modification of the system to record arbitrary data

● Promotes tracing on live systems

● Is completely safe—its use cannot induce fatal failure

● Allows tracing of both the kernel program and user-level programs

● Functions with low overhead when tracing is enabled and zero overhead when tracing is not being performed

Transient Failures

DTrace provides answers to the causes of transient failures. A transient failure is any unacceptable behavior that does not result in fatal failure of the system. You might have a clear, specific failure, such as:

● read(2) is returning EIO errno values on a device that is not reporting any errors.

● An application occasionally does not receive its expected timer signal.

● A thread is missing a condition variable wakeup.
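
A failure such as the first item in this list could be narrowed down with a D script along the following lines. This is a sketch under stated assumptions rather than a course example: it assumes the syscall provider and the built-in errno variable, and it compares against the numeric value 5, which is the value of EIO. It reports each process whose read(2) call returns EIO and prints the user stack at that point.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Fires when read(2) returns. The predicate keeps only calls
 * that failed with errno 5 (EIO).
 */
syscall::read:return
/errno == 5/
{
	printf("%s (pid %d): read(2) returned EIO\n", execname, pid);

	/* Show the user-level call path that issued the failing read. */
	ustack();
}
```

Because the probe is enabled system-wide, the script does not need to know in advance which process will see the error.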

The transient failure can be based on your own definition of “unacceptable” system operation:

● “We were expecting to accommodate 100 users per CPU, but we cannot support more than 60 users per CPU.”

● “Why does system time go way up when I run application ‘X’?”

● “Every morning between 9:30 a.m. and 10:00 a.m. the system performs poorly.”

In these situations, you must understand the problem and either eliminate the performance inhibitors or reset your expectations. Eliminating the performance inhibitors could involve:

● Adding more resources, such as memory or central processing units (CPUs)

● Reconfiguring existing resources, for example, tuning parameters or rewriting software

● Lessening the load

Debugging Transient Failures

DTrace was developed to provide a more efficient and cost-effective method of diagnosing transient failures. Historically, users have debugged transient failures using process-centric tools such as truss(1), pstack(1), or prstat(1M). These tools were not designed to debug systemic problems. The tools that were intended for debugging systemic problems, such as mdb(1) and Solaris Crash Analysis Tool (Solaris CAT), are designed for postmortem analysis.

Debugging Using Postmortem Analysis

You can use postmortem analysis to debug transient problems by inducing fatal failure during the period of transient failure. This technique has the following disadvantages:

● It requires inducing fatal failure, which nearly always results in more downtime than the transient failure

● It requires solving a dynamic problem from a static snapshot of the system’s state

Debugging Using Invasive Techniques

If existing tools cannot find the root cause of a transient failure, then you must use more invasive techniques. Typically this means developing custom instrumentation for the failing user program, the kernel, or both. This can involve using the Trace Normal Form (TNF) facility. You then reproduce the problem using the instrumented binaries. This technique requires:

● Running the instrumented binaries in production

● Reproducing a transient problem in a development environment

Such invasive techniques are undesirable because they are slow, error-prone, and often ineffective.

Relying on the existing static TNF trace points found in the kernel, which you can enable with the prex(1) command, is also unsatisfactory. The number of TNF trace points in the kernel is limited and the overhead is substantial.

DTrace Capabilities

The DTrace framework allows you to enable tens of thousands of tracing points called probes. When these instrumentation points are hit, you can display arbitrary data in the kernel (or user process).

An example of a probe provided by the DTrace framework is entry into any kernel function. Information that you can display when this probe fires includes:

● Any argument to the function

● Any global variable in the kernel

● A nanosecond timestamp of when the function was called

● A stack trace to indicate what code called this function

● The process that was running when the function was called

● The thread that made the call to this function

Using DTrace, you can explore all aspects of the Solaris 10 OS to:

● Understand how the software works

● Determine the root cause of performance problems

● Examine all layers of software sequentially from the user level to the kernel

● Track down the source of aberrant behavior

DTrace comes with powerful data management primitives to eliminate the need for postprocessing of gathered data. Unwanted data is pruned as close to the source as possible to avoid the overhead of generating and later filtering unwanted data.

DTrace also provides a mechanism to trace during boot and to retrieve all traced data from a kernel crash dump.

DTrace Architecture

DTrace helps you understand a software system by enabling you to dynamically modify the operating system kernel and user processes to record additional data that you specify at locations of interest called probes.

Probes and Probe Providers

A probe is a program location or activity (for example, every system clock tick) to which DTrace can bind a request to perform a set of actions, such as recording a stack trace, a timestamp, or the argument to a function.

How Probes Work

Probes are like programmable sensors inserted at strategic points of your Solaris 10 OS. You use DTrace to program the appropriate sensors to record the information that you want. As each probe fires, DTrace gathers the data from your probes and reports it back to you. If you do not specify any actions for a probe, DTrace simply records each time the probe fires and on what CPU.

DTrace provides tens of thousands of probes of various types. Probes are implemented by probe providers. A provider is a kernel module that enables a requested probe to fire when it is hit. An example of a provider is the “function boundary tracing” or fbt provider. It provides entry and return probes for almost every function in every kernel module.

How Probes Are Enabled

You define probes and actions using a programming language called D, which is based on the C programming language. Usually D programs are placed in script files ending in a .d suffix. The D programs are passed to a DTrace consumer. The primary, generic DTrace consumer is the dtrace(1M) command.

The user-specified D program is compiled by the DTrace consumer into a form referred to as D Intermediate Format (DIF), which is then sent to the DTrace framework within the kernel for execution. There, the probes that are named within the D program are enabled, and the corresponding provider performs the instrumentation required to activate them.

DTrace Components

DTrace has the following components: probes, providers, consumers, and the D programming language. The entire DTrace framework resides in the kernel. Consumer programs access the DTrace framework through a well-defined application programming interface (API).

Probes

A probe has the following attributes:

● It is made available by a provider.

● It identifies the module and function that it instruments.

● It has a name.

These four attributes define a 4-tuple that uniquely identifies each probe:

provider:module:function:name

In addition, DTrace assigns a unique integer identifier to each probe.

Providers

A provider represents a methodology for instrumenting the system. Providers make probes available to the DTrace framework. A provider receives information from DTrace regarding when a probe is to be enabled and transfers control to DTrace when an enabled probe is hit.

DTrace offers the following providers:

● The function boundary tracing (fbt) provider can dynamically trace the entry and return of almost every function in the kernel.

● The syscall provider can dynamically trace the entry and return of every Solaris system call.

● The lockstat provider can dynamically trace the kernel synchronization primitives to observe lock contention and hold times.

● The plockstat provider makes probes available for user-level synchronization primitives including lock contention and hold times.

● The sched provider can dynamically trace key scheduling events.

● The profile provider enables you to add a configurable-rate timer interrupt to the system.

● The dtrace provider enables pre-processing and post-processing (as well as D program error-processing) capabilities.

● The pid provider enables function boundary tracing within a process as well as tracing of any instruction in the virtual address space of the process.

● The statically defined tracing (sdt) provider creates probes at sites a programmer has explicitly designated in their own application.

● The vminfo provider makes available probes that correspond to the kernel’s virtual memory statistics.

● The sysinfo provider makes available probes that correspond to the kernel’s “sys” statistics.

● The proc provider makes available probes that pertain to process and thread creation and termination as well as signals.

● The mib provider makes available probes that correspond to counters in the Solaris management information bases (MIBs), which are used by the simple network management protocol (SNMP).

● The io provider makes available probes giving details related to disk input and output (I/O).

● The fpuinfo provider makes available probes that correspond to the simulation of floating point instructions on SPARC®-based microprocessors.

Note – You should check the Solaris Dynamic Tracing Guide, part number 817-6223, regularly for the addition of any new DTrace providers.

Consumers

A DTrace consumer is a process that interacts with DTrace. There is one main DTrace consumer called dtrace(1M). It acts as a generic front-end to the DTrace facility. Most other consumers are rewrites of previously existing utilities such as lockstat(1M).

There is no limit on the number of concurrent consumers. That is, many users can simultaneously run the dtrace(1M) command. DTrace handles the multiplexing.

D Programming Language

The D programming language enables you to specify probes of interest and bind actions to those probes. To do this, you construct scripts called D scripts. The nature of D scripts is similar to awk(1)’s “pattern action” pairs. The D programming language also borrows heavily from the C programming language.

Even if you have no experience with the C programming language or with awk(1), D programs are fairly easy to write and understand.

Features of the D language include the following:

● Enables complete access to kernel C types, such as vnode_t

● Provides complete access to kernel static and global variables

● Provides complete support for American National Standards Institute (ANSI)-C operators

● Supports strings as a built-in type (unlike C, which uses the ambiguous char * or char[] types).

Architecture Summary

To summarize, the DTrace facility consists of user-level consumer programs such as dtrace(1M), providers packaged as kernel modules, and a library interface for the consumer programs to access the DTrace facility through the dtrace(7D) kernel driver.

Figure 1-1 shows the overall DTrace architecture.

Figure 1-1 DTrace Architecture

[Figure: in userland, D program source files (a.d, b.d) are passed to the DTrace consumers (dtrace(1M), lockstat(1M), plockstat(1M), intrstat(1M)), which reach the kernel through libdtrace(3LIB) and the dtrace(7D) driver. In the kernel, DTrace sits above the DTrace providers (sysinfo, vminfo, io, syscall, profile, fbt, sched).]

DTrace Tour

In this section you tour the DTrace facility and learn to perform the following tasks:

● List the available probes using various criteria:

    ● Probes associated with a particular function

    ● Probes associated with a particular module

    ● Probes with a specific name

    ● All probes from a specific provider

● Explain how to enable probes

● Explain default probe output

● Describe action statements

● Create a simple D script

Listing Probes

You can list all DTrace probes with the -l option of the dtrace(1M) command:

# dtrace -l
   ID  PROVIDER  MODULE   FUNCTION        NAME
    1  dtrace                             BEGIN
    2  dtrace                             END
    3  dtrace                             ERROR
    4  syscall            nosys           entry
    5  syscall            nosys           return
    6  syscall            rexit           entry
    7  syscall            rexit           return
    8  syscall            forkall         entry
    9  syscall            forkall         return
   10  syscall            read            entry
   11  syscall            read            return
   12  syscall            write           entry
   13  syscall            write           return
   14  syscall            open            entry
   15  syscall            open            return
...

You can use an additional option to list specific probes, as follows:

● In a specific function: -f function

# dtrace -l -f cv_wait
   ID  PROVIDER  MODULE   FUNCTION        NAME
12921  fbt       genunix  cv_wait         entry
12922  fbt       genunix  cv_wait         return

● In a specific module: -m module

# dtrace -l -m sd
   ID  PROVIDER  MODULE   FUNCTION        NAME
17147  fbt       sd       sdopen          entry
17148  fbt       sd       sdopen          return
17149  fbt       sd       sdclose         entry
17150  fbt       sd       sdclose         return
17151  fbt       sd       sdstrategy      entry
17152  fbt       sd       sdstrategy      return
...

● With a specific name: -n name

# dtrace -l -n BEGIN
   ID  PROVIDER  MODULE   FUNCTION        NAME
    1  dtrace                             BEGIN

● From a specific provider: -P provider

# dtrace -l -P lockstat
   ID  PROVIDER  MODULE   FUNCTION        NAME
  469  lockstat  genunix  mutex_enter     adaptive-acquire
  470  lockstat  genunix  mutex_enter     adaptive-block
  471  lockstat  genunix  mutex_enter     adaptive-spin
  472  lockstat  genunix  mutex_exit      adaptive-release
  473  lockstat  genunix  mutex_destroy   adaptive-release
  474  lockstat  genunix  mutex_tryenter  adaptive-acquire
...

● Realize that a specific function or module can be supported by many providers:

# dtrace -l -f read
   ID  PROVIDER  MODULE   FUNCTION        NAME
   10  syscall            read            entry
   11  syscall            read            return
 4036  sysinfo   genunix  read            readch
 4040  sysinfo   genunix  read            sysread
 7885  fbt       genunix  read            entry
 7886  fbt       genunix  read            return

The previous output shows that for each probe, the following is displayed:

● The probe’s uniquely assigned probe ID (the probe ID is unique only within a given release or patch level of Solaris).

● The provider name.

● The module name (if applicable).

● The function name (if applicable).

● The probe name.

Specifying Probes in DTrace

Probes are fully specified by separating each component of the 4-tuple with a colon:

provider:module:function:name

Empty components match anything. For example, fbt::alloc:entry specifies a probe with the following attributes:

● From the fbt provider

● In any module

● In the alloc function

● Named entry

Elements of the 4-tuple can be left off from the left-hand side. For example, open:entry matches probes from all providers and kernel modules that have a function name of open and a probe name of entry:

# dtrace -l -n open:entry
   ID  PROVIDER  MODULE   FUNCTION        NAME
   14  syscall            open            entry
 7386  fbt       genunix  open            entry

Probe descriptions also support a pattern matching syntax similar to the shell File Name Generation syntax described in sh(1). The special characters *, ?, and [ ] are all supported. For example, the syscall::open*:entry probe description matches both the open and open64 system calls. The ? character represents any single character in the name and the [ ] characters let you specify a choice of specific characters in the name.
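The matching rules described in this section can be modeled outside of DTrace. The following Python sketch is an illustration only, not how dtrace(1M) is implemented: it assumes that Python's fnmatch module is a close enough stand-in for the shell-style patterns, and the helper names parse_description and matches are invented for this example. The probe tuples are taken from the listings in this section, plus the open64 system call mentioned in the text.

```python
from fnmatch import fnmatchcase


def parse_description(description):
    """Split a probe description into a 4-tuple, right-aligning the fields.

    'open:entry' becomes ('', '', 'open', 'entry'): components that are
    left off from the left-hand side are treated as empty.
    """
    parts = description.split(":")
    if len(parts) > 4:
        raise ValueError("too many components: " + description)
    return tuple([""] * (4 - len(parts)) + parts)


def matches(description, probe):
    """Return True if the 4-tuple `probe` is selected by `description`."""
    patterns = parse_description(description)
    # An empty component matches anything; otherwise apply shell-style
    # globbing (*, ?, [ ]) as approximated by fnmatchcase.
    return all(pattern == "" or fnmatchcase(value, pattern)
               for pattern, value in zip(patterns, probe))


# (provider, module, function, name) tuples modeled on this section.
PROBES = [
    ("syscall", "", "open", "entry"),
    ("syscall", "", "open", "return"),
    ("syscall", "", "open64", "entry"),
    ("fbt", "genunix", "open", "entry"),
    ("fbt", "genunix", "cv_wait", "entry"),
]

for desc in ("open:entry", "syscall::open*:entry", "fbt:::"):
    selected = [":".join(probe) for probe in PROBES if matches(desc, probe)]
    print(desc, "->", selected)
```

The sketch corresponds to descriptions given with the -n option, where a single component is interpreted as the probe name.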

Enabling Probes

Probes are enabled with the dtrace(1M) command by specifying them without the -l option. When enabled in this way, DTrace performs the default action when the probe fires. The default action indicates only that the probe fired. No other data is recorded. For example, the following code example enables every probe in the sd module:

# dtrace -m sd
CPU     ID  FUNCTION:NAME
  0  17329  sd_media_watch_cb:entry
  0  17330  sd_media_watch_cb:return
  0  17167  sdinfo:entry
  0  17168  sdinfo:return
  0  17151  sdstrategy:entry
  0  17152  sdstrategy:return
  0  17661  ddi_xbuf_qstrategy:entry
  0  17662  ddi_xbuf_qstrategy:return
  0  17649  xbuf_iostart:entry
  0  17341  sd_xbuf_strategy:entry
  0  17385  sd_xbuf_init:entry
  0  17386  sd_xbuf_init:return
  0  17342  sd_xbuf_strategy:return
  0  17177  sd_mapblockaddr_iostart:entry
  0  17178  sd_mapblockaddr_iostart:return
  0  17179  sd_pm_iostart:entry
  0  17365  sd_pm_entry:entry
  0  17366  sd_pm_entry:return
  0  17180  sd_pm_iostart:return
  0  17181  sd_core_iostart:entry
  0  17407  sd_add_buf_to_waitq:entry
...

As you can see from the output, the default action displays the CPU where the probe fired, the DTrace assigned probe ID, the function where the probe fired, and the probe name.

To enable probes provided by the syscall provider:

# dtrace -P syscall
dtrace: description 'syscall' matched 452 probes
CPU     ID  FUNCTION:NAME
  0     99  ioctl:return
  0     98  ioctl:entry
  0     99  ioctl:return
  0     98  ioctl:entry
  0     99  ioctl:return
  0    234  sysconfig:entry
  0    235  sysconfig:return
  0    234  sysconfig:entry
  0    235  sysconfig:return
  0    168  sigaction:entry
  0    169  sigaction:return
  0    168  sigaction:entry
  0    169  sigaction:return
  0     98  ioctl:entry
  0     99  ioctl:return
  0    234  sysconfig:entry
  0    235  sysconfig:return
  0     38  brk:entry
  0     39  brk:return
...

To enable probes named zfod:

# dtrace -n zfod
dtrace: description 'zfod' matched 3 probes
CPU     ID  FUNCTION:NAME
  0   4080  anon_zero:zfod
  0   4080  anon_zero:zfod
^C

To enable probes provided by the syscall provider in the open function, use the -n option with the fully specified 4-tuple syntax:

# dtrace -n syscall::open:
dtrace: description 'syscall::open:' matched 2 probes
CPU     ID  FUNCTION:NAME
  0     14  open:entry
  0     15  open:return
  0     14  open:entry
  0     15  open:return
  0     14  open:entry
^C

To enable the entry probe in the clock function (which should fire every 1/100th second):

# dtrace -n clock:entry
dtrace: description 'clock:entry' matched 1 probe
CPU     ID  FUNCTION:NAME
  0   4198  clock:entry
  0   4198  clock:entry
  0   4198  clock:entry
  0   4198  clock:entry
  0   4198  clock:entry
  0   4198  clock:entry
  0   4198  clock:entry
^C

DTrace Actions

Actions are user-programmable statements that are executed within the kernel by the DTrace virtual machine. The following are properties of actions:

● Actions are taken when a probe fires.

● Actions are completely programmable (in the D language).

● Most actions record some specified state in the system.

● Some actions can change the state of the system in a well-defined way.

    ● These are called destructive actions.

    ● Destructive actions are not allowed by default.

● Many actions use expressions in the D language.

For now, you will use D expressions that consist only of built-in D variables. The following are some of the most useful built-in D variables. See Appendix B for a complete list of the D built-in variables.

● pid – The current process ID

● execname – The current executable name

● timestamp – The time since boot in nanoseconds

● curthread – A pointer to the kthread_t structure that represents the current thread

● probemod – The current probe’s module name

● probefunc – The current probe’s function name

● probename – The current probe’s name

There are also many built-in functions that perform actions. Appendix A, “Actions and Subroutines,” provides the complete list of D built-in functions. Start with the trace() function, which records the result of a D expression to the trace buffer. For example:

● trace(pid) traces the current process ID.

● trace(execname) traces the name of the current executable.

● trace(curthread->t_pri) traces the t_pri field of the current thread.

● trace(probefunc) traces the function name of the probe.

Actions are indicated by following a probe specification with “{ action }”. For example:

# dtrace -n 'readch {trace(pid)}'
dtrace: description 'readch ' matched 4 probes
CPU     ID  FUNCTION:NAME
  0   4036  read:readch  2040
  0   4036  read:readch  2177
  0   4036  read:readch  2177
  0   4036  read:readch  2040
  0   4036  read:readch  2181
  0   4036  read:readch  2181
  0   4036  read:readch  7
...

In the preceding example, the process identification number (PID) appears in the last column of output.

The following example traces the executable name:

# dtrace -m 'ufs {trace(execname)}'
dtrace: description 'ufs ' matched 889 probes
CPU     ID  FUNCTION:NAME
  0  14977  ufs_lookup:entry     ls
  0  15748  ufs_iaccess:entry    ls
  0  15749  ufs_iaccess:return   ls
  0  14978  ufs_lookup:return    ls
  0  14977  ufs_lookup:entry     ls
  0  15748  ufs_iaccess:entry    ls
  0  15749  ufs_iaccess:return   ls
  0  14978  ufs_lookup:return    ls
  0  14977  ufs_lookup:entry     ls
  0  15748  ufs_iaccess:entry    ls
  0  15749  ufs_iaccess:return   ls
  0  14978  ufs_lookup:return    ls
  0  14977  ufs_lookup:entry     ls
...
  0  15005  ufs_rwunlock:entry   utmpd
  0  15006  ufs_rwunlock:return  utmpd
  0  14963  ufs_close:entry      utmpd
  0  14964  ufs_close:return     utmpd
  0  15007  ufs_seek:entry       utmpd
  0  15008  ufs_seek:return      utmpd
  0  14963  ufs_close:entry      utmpd
^C

The next action example traces the time of entry to each system call:

# dtrace -n 'syscall:::entry {trace(timestamp)}'
dtrace: description 'syscall:::entry ' matched 226 probes
CPU     ID  FUNCTION:NAME
  0    312  portfs:entry     157088479572713
  0     98  ioctl:entry      157088479637542
  0     98  ioctl:entry      157088479674339
  0    234  sysconfig:entry  157088479767243
  0    234  sysconfig:entry  157088479774432
  0    168  sigaction:entry  157088479993155
  0    168  sigaction:entry  157088480229390
  0     98  ioctl:entry      157088480318855
  0    234  sysconfig:entry  157088480398692
  0     38  brk:entry        157088480422525
  0     38  brk:entry        157088480438097
  0     98  ioctl:entry      157088480794819
  0     98  ioctl:entry      157088480959666
  0     98  ioctl:entry      157088480986498
  0     98  ioctl:entry      157088481033225
  0     60  fstat:entry      157088481050686
  0     60  fstat:entry      157088481074680
...
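Because the traced values are in nanoseconds, subtracting consecutive timestamps gives the time between system call entries. The following Python sketch is a minimal illustration that post-processes the first four rows of the output above; the helper name deltas_ns is invented for this example, and the subtraction is meaningful here only because all of the rows come from the same CPU in order.

```python
# (function, timestamp in nanoseconds) pairs copied from the first
# four rows of the syscall:::entry output above.
records = [
    ("portfs", 157088479572713),
    ("ioctl", 157088479637542),
    ("ioctl", 157088479674339),
    ("sysconfig", 157088479767243),
]


def deltas_ns(rows):
    """Return (function, nanoseconds since the previous record) pairs."""
    return [(func, ts - prev_ts)
            for (_, prev_ts), (func, ts) in zip(rows, rows[1:])]


for func, delta in deltas_ns(records):
    # Convert nanoseconds to microseconds for readability.
    print(f"{func:<10} +{delta / 1000:.1f} us")
```

In this sample, the first ioctl entry occurs 64829 nanoseconds (about 65 microseconds) after the portfs entry.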

Multiple actions can be specified; they must be separated by semicolons:

# dtrace -n 'zfod {trace(pid);trace(execname)}'
dtrace: description 'zfod ' matched 3 probes
CPU     ID  FUNCTION:NAME
  0   4080  anon_zero:zfod  2195  dtrace
  0   4080  anon_zero:zfod  2195  dtrace
  0   4080  anon_zero:zfod  2195  dtrace
  0   4080  anon_zero:zfod  2195  dtrace
  0   4080  anon_zero:zfod  2195  dtrace
  0   4080  anon_zero:zfod  2197  bash
  0   4080  anon_zero:zfod  2207  vi
  0   4080  anon_zero:zfod  2207  vi
...

The following example traces the executable name in every entry to the pagefault function:

# dtrace -n 'fbt::pagefault:entry {trace(execname)}'
dtrace: description 'fbt::pagefault:entry ' matched 1 probe
CPU     ID  FUNCTION:NAME
  0   2407  pagefault:entry  dtrace
  0   2407  pagefault:entry  dtrace
  0   2407  pagefault:entry  dtrace
  0   2407  pagefault:entry  sh
  0   2407  pagefault:entry  sh
  0   2407  pagefault:entry  sh
  0   2407  pagefault:entry  sh
  0   2407  pagefault:entry  sh
...

Writing D Scripts

Complicated DTrace enablings become difficult to manage on the command line. The dtrace(1M) command supports scripts, specified with the -s option. Alternatively, you can create executable DTrace interpreter files. Interpreter files always begin with:

#!/usr/sbin/dtrace -s

Executable D Scripts

For example, you can write a script to trace the executable name upon entry to each system call as follows:

# cat syscall.d
syscall:::entry
{
    trace(execname);
}

By convention, D scripts end with a .d suffix. You can run this D script as follows:

# dtrace -s syscall.d
dtrace: script 'syscall.d' matched 226 probes
CPU     ID  FUNCTION:NAME
  0    312  pollsys:entry    java
  0     98  ioctl:entry      dtrace
  0     98  ioctl:entry      dtrace
  0    234  sysconfig:entry  dtrace
  0    234  sysconfig:entry  dtrace
  0    168  sigaction:entry  dtrace
  0    168  sigaction:entry  dtrace
  0     98  ioctl:entry      dtrace
  0    234  sysconfig:entry  dtrace
  0     38  brk:entry        dtrace
^C

If you give the syscall.d file execute permission and add a first line to invoke the interpreter, you can run the script by entering its name on the command line as follows:

# cat syscall.d
#!/usr/sbin/dtrace -s

syscall:::entry
{
    trace(execname);
}
# chmod +x syscall.d
# ls -l syscall.d
-rwxr-xr-x 1 root other 62 May 12 11:30 syscall.d
# ./syscall.d
dtrace: script './syscall.d' matched 226 probes
CPU     ID  FUNCTION:NAME
  0     98  ioctl:entry      java
  0     98  ioctl:entry      java
  0    312  pollsys:entry    java
  0    312  pollsys:entry    java
  0    312  pollsys:entry    java
  0     98  ioctl:entry      dtrace
  0     98  ioctl:entry      dtrace
  0    234  sysconfig:entry  dtrace
  0    234  sysconfig:entry  dtrace

D Literal Strings

The D language supports literal strings that you can use with the trace function as follows:

# cat string.d
#!/usr/sbin/dtrace -s
fbt::bdev_strategy:entry
{
    trace(execname);
    trace(" is initiating a disk I/O\n");
}

The \n at the end of the literal string produces a new line. To run this script, enter the following:

# dtrace -s string.d
dtrace: script 'string.d' matched 1 probe
CPU     ID  FUNCTION:NAME
  0   9215  bdev_strategy:entry  bash is initiating a disk I/O

  0   9215  bdev_strategy:entry  vi is initiating a disk I/O

  0   9215  bdev_strategy:entry  vi is initiating a disk I/O

  0   9215  bdev_strategy:entry  vi is initiating a disk I/O

  0   9215  bdev_strategy:entry  sched is initiating a disk I/O

The quiet mode option, -q, in dtrace(1M) tells DTrace to record only the actions explicitly stated. This option suppresses the default output normally produced by the dtrace command. The following example shows the use of the -q option on the string.d script:

# dtrace -q -s string.d
ls is initiating a disk I/O
cat is initiating a disk I/O
fsflush is initiating a disk I/O
vi is initiating a disk I/O
vi is initiating a disk I/O

The BEGIN and END Probes

The simple dtrace provider has only three probes. They are BEGIN, END, and ERROR. The BEGIN probe fires before all others and performs pre-processing steps. For example, it enables you to initialize variables, as well as to display headings for output that is displayed by other actions that occur later. The END probe fires after all other probes have fired and enables you to perform post-processing. The ERROR probe fires when there are any runtime errors in your D programs. The following example shows a simple use of the BEGIN and END probes of the dtrace provider:

# cat beginEnd.d
#!/usr/sbin/dtrace -s
BEGIN
{
    trace("This is a heading\n");
}

END
{
    trace("This should appear at the END\n");
}

# ./beginEnd.d
dtrace: script './beginEnd.d' matched 2 probes
CPU     ID  FUNCTION:NAME
  0      1  :BEGIN  This is a heading

^C
  0      2  :END  This should appear at the END

# dtrace -qs beginEnd.d
This is a heading
^C
This should appear at the END

Note – The END probe does not fire until you interrupt (^C) the dtrace command.

Module 2

Using DTrace

Objectives

Upon completion of this module, you should be able to:

● Describe the DTrace performance monitoring capabilities

● Examine performance problems using the vminfo provider

● Examine performance problems using the sysinfo provider

● Examine performance problems using the io provider

● Use DTrace to obtain information about system calls

● Create D scripts that use arguments

Relevance

Discussion – The following questions are relevant to understanding how to use DTrace:

● What performance monitoring tools exist in the Solaris 10 OS?

● Would it be useful to know which process is making which system calls?

● What advantage does the ability to pass arguments to a D script provide?

Additional Resources

Additional resources – The following references provide additional information on the topics described in this module:

● Cantrill, Bryan M., Michael W. Shapiro, and Adam H. Leventhal. “Dynamic Instrumentation of Production Systems.” Paper presented at the 2004 USENIX Conference.

● BigAdmin System Administration Portal [http://www.sun.com/bigadmin/content/dtrace].

● Sun Microsystems, Inc. Solaris Dynamic Tracing Guide (Beta), part number 817-6223-10.

● The /usr/demo/dtrace directory contains all of the sample scripts from the Solaris Dynamic Tracing Guide.

● dtrace(1M) manual page in the Solaris 10 OS manual pages, Solaris 10 Reference Manual Collection.

DTrace Performance Monitoring Capabilities

A number of the DTrace providers implement probes that correspond toexisting Solaris OS performance monitoring tools:

● The vminfo provider – Implements probes that correspond to thevmstat (1M) tool

● The sysinfo provider – Implements probes that correspond to thempstat (1M) tool

● The io provider – Implements probes that correspond to theiostat (1M) tool

In addition, the syscall provider implements probes that correspond tothe truss (1) command.

Features of the DTrace Performance MonitoringCapabilities

Using the DTrace facility, you can extract the same information that thebundled tools provide, with significant added flexibility. DTrace enablesyou to gather only the specific information you need to diagnose theaberrant behavior. It also provides additional related information such asprocess and thread identification, stack traces, and other arbitrary kernelinformation available at the time the probes fire.

Aggregations

Aggregated data is more useful than individual data points in answering performance-related questions. For example, if you want to know the number of page faults by process, you do not necessarily care about each individual page fault. Rather, you want a table that lists the process names and the total number of page faults.

DTrace provides several built-in aggregating functions. An aggregating function has this property: if it is applied to subsets of a collection of gathered data and then applied again to the results, it returns the same result as it does when applied to the whole collection. Examples of aggregating functions are count(), sum(), min(), and max(). For instance, the maximum of the per-subset maxima is the maximum of the whole collection. A median function would not be considered an aggregating function because it lacks this property: the median of the per-subset medians is generally not the median of the whole collection.


DTrace is not required to store the entire set of data items for aggregations; it keeps a running count, needing only the current intermediate result and the new element. Intermediate results are kept per central processing unit (CPU), which enables a scalable implementation because no locks are required.

DTrace Aggregation Syntax

The general form of a DTrace aggregation is:

@name[ keys ] = aggfunc( args );

These variables are defined as follows:

● name – The name of the aggregation that is preceded by the @ character

● keys – A comma-separated list of D expressions

● aggfunc – One of the DTrace aggregating functions

● args – A comma-separated list of arguments appropriate to the aggregating function
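As an illustration of the general form, the following sketch fills in each of the four parts. The aggregation name bytes and the choice of keys are arbitrary and exist only for this example; arg2 is the third argument of the write(2) system call, that is, the number of bytes requested.

```d
#!/usr/sbin/dtrace -s

/*
 * Illustrative sketch only: the aggregation name (bytes) and the
 * keys are arbitrary choices made for this example.
 *
 *   name    = bytes
 *   keys    = execname, pid
 *   aggfunc = sum
 *   args    = arg2 (requested byte count of the write(2) call)
 */
syscall::write:entry
{
	@bytes[execname, pid] = sum(arg2);
}
```

Each distinct (execname, pid) pair gets its own running total, which dtrace prints when the script ends.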

DTrace Aggregating Functions

Table 2-1 lists the DTrace aggregating functions.

Table 2-1 DTrace Aggregating Functions

Function Name   Arguments           Result
count           none                The number of times called.
sum             scalar expression   The total value of the specified expressions.
avg             scalar expression   The arithmetic average (mean) of the specified expressions.
min             scalar expression   The smallest value of the specified expressions.
max             scalar expression   The largest value of the specified expressions.


Example Use of Aggregating Function

In the following example, the count aggregating function is used to count the number of write(2) system calls per process:

# cat writes.d
#!/usr/sbin/dtrace -s
syscall::write:entry
{
	@numWrites[execname] = count();
}

# ./writes.d
dtrace: script 'writes.d' matched 1 probe
^C
  dtrace            1
  date              1
  bash              3
  grep             20
  file            197
  ls              201

Note – No data is output from the aggregation until dtrace(1M) is terminated. The output data is a summary up to that point.
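If waiting for termination is inconvenient, the aggregation can also be printed at intervals. The following is a minimal sketch of that idea, not part of the original example: it assumes the profile provider's tick-5sec probe and the printa() and clear() actions, which are documented in the Solaris Dynamic Tracing Guide rather than at this point in the module. The 5-second interval is an arbitrary choice.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::write:entry
{
	@numWrites[execname] = count();
}

/*
 * Every 5 seconds (an arbitrary interval), print the counts gathered
 * so far, then reset the values to zero. The keys are retained.
 */
tick-5sec
{
	printa(@numWrites);
	clear(@numWrites);
}
```

Each block of output then covers one interval instead of the whole run.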

Table 2-1 DTrace Aggregating Functions (Continued)

Function Name   Arguments                                                   Result
lquantize       scalar expression, lower bound, upper bound, step value     A linear frequency distribution, sized by the specified range, of the values of the specified expression. Increments the value in the highest bucket that is less than or equal to the specified expression.
quantize        scalar expression                                           A power-of-two frequency distribution of the values of the specified expression. Increments the value in the highest power-of-two bucket that is less than or equal to the specified expression.


Arguments Supplied by Providers

The syscall provider gives you access to a system call’s arguments, using the syntax arg0, arg1, arg2, for the function’s first, second, third, and so on, arguments. These argument values are of type int64_t. You can also refer to the correctly typed arguments through the args[] array: args[0], args[1], and so on. The following example displays the average write size per process:

# cat writes2.d
#!/usr/sbin/dtrace -s
syscall::write:entry
{
	@avgSize[execname] = avg(arg2);
}

# ./writes2.d
dtrace: script 'writes2.d' matched 1 probe
^C
  dtrace            1
  bash             27
  date             29
  file             37
  grep             60
  ls               68
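An average can hide how the write sizes are spread. As a hedged variation on writes2.d, the following sketch replaces avg() with the lquantize() function from Table 2-1. The bounds (0 to 1024 bytes) and the step value (128) are arbitrary example values, and the aggregation name sizes is chosen only for illustration.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch: linear distribution of requested write sizes per process.
 * The bounds (0 to 1024) and the step (128) are example values only.
 */
syscall::write:entry
{
	@sizes[execname] = lquantize(arg2, 0, 1024, 128);
}
```

Writes larger than the upper bound are collected in the final bucket, so the range should be adjusted to suit the workload being examined.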


Examining Performance Problems Using the vminfo Provider

The vminfo provider makes available probes from the virtual memory (vm) kernel statistics (kstat) kept by the kernel kstat facility. You can examine any unexplainable behavior observed from the vm-specific output of the vmstat(1M) command using this DTrace provider. A probe provided by the vminfo provider fires immediately before the corresponding vm kstat value is incremented. To display both the names and the current values (counts) of the vm named kstat, you can use the kstat(1M) command as shown in the following command example.

# kstat -n vm
module: cpu                             instance: 0
name:   vm                              class:    misc
        anonfree                        0
        anonpgin                        4
        anonpgout                       0
        as_fault                        157771
        cow_fault                       34207
        crtime                          0.178610697
        dfree                           56
        execfree                        0
        execpgin                        3646
        execpgout                       0
        fsfree                          56
        fspgin                          16257
        fspgout                         57
        hat_fault                       0
        kernel_asflt                    0
        maj_fault                       6743
        pgfrec                          34215
        pgin                            9188
        pgout                           36
        pgpgin                          19907
        pgpgout                         57
        pgrec                           34216
        pgrrun                          4
        pgswapin                        0
        pgswapout                       0
        prot_fault                      39794
        rev                             0
        scan                            28668

        snaptime                        349429.087071013
        softlock                        165
        swapin                          0
        swapout                         0
        zfod                            12835

The vminfo Probes

Table 2-2 describes the vminfo probes.

Table 2-2 The vminfo Probes

Probe Name      Description

anonfree        Probe that fires when an unmodified anonymous page is freed as part of paging activity. Anonymous pages are those that are not associated with a file; memory containing such pages includes heap memory, stack memory, or memory obtained by explicitly mapping zero(7D).

anonpgin        Probe that fires when an anonymous page is paged in from a swap device.

anonpgout       Probe that fires when a modified anonymous page is paged out to a swap device.

as_fault        Probe that fires when a fault is taken on a page and the fault is neither a protection fault nor a copy-on-write fault.

cow_fault       Probe that fires when a copy-on-write fault is taken on a page. The arg0 argument contains the number of pages that are created as a result of the copy-on-write.

dfree           Probe that fires when a page is freed as a result of paging activity. When dfree fires, exactly one of the anonfree, execfree, or fsfree probes also subsequently fires.

execfree        Probe that fires when an unmodified executable page is freed as a result of paging activity.

execpgin        Probe that fires when an executable page is paged in from the backing store.

execpgout       Probe that fires when a modified executable page is paged out to the backing store. Most paging of executable pages occurs in terms of the execfree probe; the execpgout probe can only fire if an executable page is modified in memory, an uncommon occurrence in most systems.

fsfree          Probe that fires when an unmodified file system data page is freed as part of paging activity.


Table 2-2 The vminfo Probes (Continued)

Probe Name      Description

fspgin          Probe that fires when a file system page is paged in from the backing store.

fspgout         Probe that fires when a file system page is paged out to the backing store.

kernel_asflt    Probe that fires when a page fault is taken by the kernel on a page in its own address space. When the kernel_asflt probe fires, it is immediately preceded by a firing of the as_fault probe.

maj_fault       Probe that fires when a page fault is taken that results in input/output (I/O) from a backing store or swap device. Whenever maj_fault fires, it is immediately preceded by a firing of the pgin probe.

pgfrec          Probe that fires when a page is reclaimed from the free page list.

pgin            Probe that fires when a page is paged in from the backing store or from a swap device. This differs from the maj_fault probe in that the maj_fault probe only fires when a page is paged in as a result of a page fault; the pgin probe fires when a page is paged in, regardless of the reason.

pgout           Probe that fires when a page is paged out to the backing store or to a swap device.

pgpgin          Probe that fires when a page is paged in from the backing store or from a swap device. The only difference between the pgpgin probe and the pgin probe is that the pgpgin probe contains the number of pages paged in as the arg0 argument. (The pgin probe always contains 1 in the arg0 argument.)

pgpgout         Probe that fires when a page is paged out to the backing store or to a swap device. The only difference between the pgpgout probe and the pgout probe is that the pgpgout probe contains the number of pages paged out as the arg0 argument. (The pgout probe always contains 1 in the arg0 argument.)

pgrec           Probe that fires when a page is reclaimed.

pgrrun          Probe that fires when the pager is scheduled.

pgswapin        Probe that fires when a process is swapped in.

pgswapout       Probe that fires when a process is swapped out.

prot_fault      Probe that fires when a page fault is taken due to a protection violation.

rev             Probe that fires when the page daemon begins a new revolution through all pages.

scan            Probe that fires when the page daemon examines a page.

Table 2-2 The vminfo Probes (Continued)

Probe Name      Description

softlock        Probe that fires when a page is faulted as a part of placing a software lock on the page.

swapin          Probe that fires when a swapped-out process is swapped back in.

swapout         Probe that fires when a process is swapped out.

zfod            Probe that fires when a zero-filled page is created on demand.

Finding the Source of Page Faults Using vminfo Probes

Consider the following example output, obtained by running the vmstat command.

# vmstat 5
 kthr      memory             page              disk          faults       cpu
 r b w   swap   free  re  mf   pi po fr de sr s0  s2 s1 --   in   sy   cs us sy  id
 0 0 0 648560 437016   3  11   13  0  0  0  8  0   1  0  0  406   42   50  0  0 100
 0 0 0 598912 396136   0  11   27  0  0  0  0  0   0  0  0  615  113   67  0  0 100
 0 0 0 598888 396112   0   1    0  0  0  0  0  0   0  0  0  604   69   47  0  0 100
 0 0 0 598864 396088   0   1    0  0  0  0  0  0   0  0  0  616   69   72  0  0 100
 0 0 0 598864 396088   0   0    0  0  0  0  0  0   0  0  0  619   73   89  0  0 100
 0 1 0 598104 393456   4  45 3588  0  0  0  0  0 474  0  0 2014 5138 1013  3 17  79
 0 0 0 595224 381544   0   2 5273  0  0  0  0  0 698  0  0 2593 7545 1448  3 31  66
 0 0 0 592024 368832   0   1 5509  0  0  0  0  0 725  0  0 2674 7840 1503  3 26  71
 0 0 0 588792 362640   1   3 3679  0  0  0  0  0 485  0  0 2009 5259 1027  3 20  77
 0 0 0 587984 361848   0   3    4  0  0  0  0  0   0  0  0  605   80   70  0  0 100
 0 0 0 587960 361800   0   4   20  0  0  0  0  0   2  0  0  624   74   91  0  0 100
 0 0 0 587944 361768   0   1    0  0  0  0  0  0   0  0  0  614   76   78  0  0 100
 0 0 0 587920 361744   0   1    0  0  0  0  0  0   0  0  0  616   69   80  0  0 100
 0 0 0 587848 361672   0   1    0  0  0  0  0  0  18  0  0  689   69   69  0  0 100
 0 0 0 587832 361656   0   1    0  0  0  0  0  0   0  0  0  611   74   67  0  0 100
 0 0 0 587808 361632   0   5    0  0  0  0  0  0   0  0  0  611   71   66  0  0 100
 0 0 0 587784 361608  40 193  844  0  0  0  0  0 107  0  0  953  905  260  3  5  92
 0 0 0 588184 362576   0   1    0  0  0  0  0  0   0  0  0  611   69   71  0  0 100

Here the pi column denotes the number of kilobytes paged in per second.

Executable Causing Page Faults

The vminfo provider makes it easy to discover more about the source of these page-ins. The following example uses an anonymous aggregation:

# dtrace -n 'pgin {@[execname] = count()}'
dtrace: description 'pgin ' matched 1 probe
^C
  utmpd             2
  in.routed         2
  init              2
  snmpd             5
  automountd        5
  vi                5
  vmstat           17
  sh               23
  grep             33
  dtrace           35
  bash             62
  file            198
  find           4551

This output shows that the find command is responsible for most of the page-ins. For a more complete picture of the find command in terms of vm behavior, you can enable all vminfo probes. Before doing this, however, you need to become familiar with a filtering capability of DTrace called a predicate.
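Counting pgin firings counts page-in events, not pages. As a hedged variation on the pgin one-liner above, the following sketch sums the arg0 argument of the pgpgin probe, which Table 2-2 describes as the number of pages paged in. The aggregation name pages is an arbitrary choice for this example.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch: total number of pages paged in per executable.
 * For the pgpgin probe, arg0 holds the number of pages paged in.
 */
vminfo:::pgpgin
{
	@pages[execname] = sum(arg0);
}
```

Comparing these totals with the event counts shows whether an executable's page-ins tend to be single pages or larger clusters.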

Predicates

A D program consists of a set of probe clauses. A probe clause has the following general form:

probe descriptions
/ predicate /
{
	action statements
}

Predicates are D expressions enclosed in slashes / / that are evaluated at probe firing time to determine whether the associated actions should be executed. If the D expression evaluates to zero it is false; if it evaluates to non-zero it is true. Predicates are optional, but you must place them between the probe description and the action statements.
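Because a predicate is an ordinary D expression, conditions can be combined with the usual logical operators. The following minimal sketch records only write(2) calls made by one executable that request at least a given number of bytes. The executable name "bash", the 1024-byte threshold, and the aggregation name largeWrites are example values, not part of the course material.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch: count only large writes issued by bash, keyed by process ID.
 * The executable name and the 1024-byte threshold are example values.
 */
syscall::write:entry
/execname == "bash" && arg2 >= 1024/
{
	@largeWrites[pid] = count();
}
```

When the predicate evaluates to zero, the action is skipped and nothing is recorded for that firing.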


Details About the Executable Causing Page Faults

The following example examines the system’s detailed vm behavior while the find command runs:

# cat find.d
#!/usr/sbin/dtrace -s
vminfo:::
/execname == "find"/
{
	@[probename] = count();
}

Before running this D program, run a find command in the background while another utility uses up a substantial portion of the system’s memory, as shown in the following example.

# (sleep 10 ; find / -name fubar & mkfile 300m /tmp/junk)&
[1] 840
# ps
   PID TTY      TIME CMD
   615 pts/2    0:00 sh
   841 pts/2    0:00 sleep
   625 pts/2    0:00 bash
   840 pts/2    0:00 bash
   842 pts/2    0:00 ps
# ps
   PID TTY      TIME CMD
   615 pts/2    0:00 sh
   843 pts/2    0:02 find
   625 pts/2    0:00 bash
   840 pts/2    0:00 bash
   845 pts/2    0:00 ps
   844 pts/2    0:02 mkfile
# ps
   PID TTY      TIME CMD
   615 pts/2    0:00 sh
   843 pts/2    0:08 find
   625 pts/2    0:00 bash
   846 pts/2    0:00 ps
[1]+  Done    ( sleep 10 ; find / -name fubar & mkfile 300m /tmp/junk )
# ps
   PID TTY      TIME CMD
   615 pts/2    0:00 sh
   847 pts/2    0:00 ps
   625 pts/2    0:00 bash


The following dtrace command was started in another terminal window immediately after the above command group was started in the background.

# dtrace -s find.d
dtrace: script 'find.d' matched 44 probes
^C
  prot_fault            2
  cow_fault             8
  softlock             11
  execpgin             15
  kernel_asflt         40
  zfod                 52
  as_fault            170
  pgrec              5417
  pgfrec             5417
  maj_fault         18068
  fspgin            18103
  pgpgin            18118
  pgin              18118

You might wonder why, with such a large memory load, scans do not show up in the output of the dtrace command. This is because the pageout daemon, not the find user process, is running during scans, so the predicate on execname filters out those probe firings. The following example shows this behavior.

# cat mem.d
#!/usr/sbin/dtrace -s
vminfo:::
{
	@vm[execname,probename] = count();
}

END
{
	printa("%16s\t%16s\t%@d\n", @vm);
}

# dtrace -qs mem.d
^C
           sleep        prot_fault  1
              rm        prot_fault  1
         pageout               rev  1
          dtrace            pgfrec  1
            bash      kernel_asflt  1
       in.routed          anonpgin  1

          mkfile        prot_fault  1
            find        prot_fault  1
          dtrace             pgrec  1
          mkfile          execpgin  2
          mkfile      kernel_asflt  2
          vmstat        prot_fault  2
              rm              zfod  3
            find          execpgin  3
           sleep              zfod  3
          mkfile              zfod  3
        sendmail          anonpgin  3
          mkfile         cow_fault  4
              rm         cow_fault  4
            bash          anonpgin  4
              rm         maj_fault  4
        sendmail            pgfrec  4
           sleep         cow_fault  4
            find         cow_fault  4
        sendmail             pgrec  4
...
            bash             pgrec  205
         pageout           fspgout  293
         pageout         anonpgout  293
         pageout           pgpgout  293
         pageout             pgout  293
         pageout         execpgout  293
         pageout             pgrec  293
         pageout          anonfree  360
         pageout          execfree  510
            bash          as_fault  519
         pageout            fsfree  519
           sched             dfree  523
           sched             pgrec  523
           sched             pgout  523
           sched           pgpgout  523
           sched         anonpgout  523
           sched          anonfree  523
           sched         execpgout  523
           sched          execfree  523
         pageout             dfree  803
              rm             pgrec  1388
              rm            pgfrec  1388
            find         maj_fault  5067
            find            fspgin  5085
            find              pgin  5088
            find            pgpgin  5088
         pageout              scan  78852

The printa() built-in formatting function gives you increased control over the output of an aggregation. For example, consider the following code line:

{printa("%16s\t%16s\t%@d\n", @vm);}

It provides these formatting instructions:

● Each %16s prints one element of the aggregation key (first the first element, then the second) in a 16-character-wide column (right justified).

● \t outputs a <Tab>.

● %@d prints the aggregation value as a decimal number.

Note – Appendix A provides more details on the format letters available to the printa() function and the more general printf() function (which resembles the printf(3C) function from the Standard C Library).
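As a further illustration of printa() formatting, the following sketch is a variation on mem.d that left-justifies both keys and prints the count in a fixed-width column under a heading line. It assumes that the flag and width syntax of printf(3C), such as %-16s and a width between the @ and the d, is accepted by printa(); the column widths and heading text are arbitrary choices for this example.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

vminfo:::
{
	@vm[execname, probename] = count();
}

/*
 * Sketch: left-justify both keys in 16-character columns and
 * right-justify the count in an 8-character column.
 * Widths and heading text are example values only.
 */
END
{
	printf("%-16s %-16s %8s\n", "EXECNAME", "PROBE", "COUNT");
	printa("%-16s %-16s %@8d\n", @vm);
}
```

The quiet option suppresses the default per-probe output, which is the same effect that the -q flag has on the dtrace command line shown earlier.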


Examining Performance Problems Using the sysinfo Provider

The sysinfo provider makes available probes that correspond to the “sys” kernel statistics. Because these statistics provide the input for system monitoring utilities such as mpstat(1M), the sysinfo provider enables quick exploration of observed aberrant behavior.

The sysinfo provider probes fire immediately before the sys named kstat is incremented. The following example displays the sys named kstat.

# kstat -n sys
module: cpu                             instance: 0
name:   sys                             class:    misc
        bawrite                         112
        bread                           6359
        bwrite                          1401
        canch                           374
        cpu_ticks_idle                  2782331
        cpu_ticks_kernel                46571
        cpu_ticks_user                  12187
        cpu_ticks_wait                  30197
        cpumigrate                      0
...
        syscall                         3991217
        sysexec                         1088
        sysfork                         1043
        sysread                         131334
        sysvfork                        47
        syswrite                        676775
        trap                            266286
        ufsdirblk                       1027383
        ufsiget                         1086164
        ufsinopage                      873613
        ufsipage                        2
        wait_ticks_io                   30197
        writech                         5144172931
        xcalls                          0
        xmtint                          0


The sysinfo Probes

Table 2-3 describes the sysinfo probes.

Table 2-3 The sysinfo Probes

Probe Name          Description

bawrite             Probe that fires when a buffer is about to be asynchronously written out to a device.

bread               Probe that fires when a buffer is physically read from a device. The bread probe fires after the buffer has been requested from the device, but before blocking pending its completion.

bwrite              Probe that fires when a buffer is about to be written out to a device synchronously or asynchronously.

cpu_ticks_idle      Probe that fires when the periodic system clock has determined that a CPU is idle. Note that this probe fires in the context of the system clock and therefore fires on the CPU running the system clock; one must examine the cpu_t argument (arg2) to determine the CPU that has been deemed idle.

cpu_ticks_kernel    Probe that fires when the periodic system clock has determined that a CPU is executing in the kernel. Note that this probe fires in the context of the system clock and therefore fires on the CPU running the system clock; one must examine the cpu_t argument (arg2) to determine the CPU that has been deemed to be executing in the kernel.

cpu_ticks_user      Probe that fires when the periodic system clock has determined that a CPU is executing in user mode. Note that this probe fires in the context of the system clock and therefore fires on the CPU running the system clock; one must examine the cpu_t argument (arg2) to determine the CPU that has been deemed to be running in user mode.

cpu_ticks_wait      Probe that fires when the periodic system clock has determined that a CPU is otherwise idle, but on which some threads are waiting for I/O. Note that this probe fires in the context of the system clock and therefore fires on the CPU running the system clock; one must examine the cpu_t argument (arg2) to determine the CPU that has been deemed waiting on I/O.

idlethread          Probe that fires when a CPU enters the idle loop.

intrblk             Probe that fires when an interrupt thread blocks.


Table 2-3 The sysinfo Probes (Continued)

Probe Name          Description

inv_swtch           Probe that fires when a running thread is forced to involuntarily give up the CPU.

lread               Probe that fires when a buffer is logically read from a device.

lwrite              Probe that fires when a buffer is logically written to a device.

modload             Probe that fires when a kernel module is loaded.

modunload           Probe that fires when a kernel module is unloaded.

msg                 Probe that fires when a msgsnd(2) or msgrcv(2) system call is made, but before the message queue operations have been performed.

mutex_adenters      Probe that fires when an attempt is made to acquire an owned adaptive lock. If this probe fires, one of the lockstat provider probes (adaptive-block or adaptive-spin) also fires.

namei               Probe that fires when a name lookup is attempted in the file system.

nthreads            Probe that fires when a thread is created.

phread              Probe that fires when a raw I/O read is about to be performed.

phwrite             Probe that fires when a raw I/O write is about to be performed.

procovf             Probe that fires when a new process cannot be created because the system is out of process table entries.

pswitch             Probe that fires when a CPU switches from executing one thread to executing another.

readch              Probe that fires after each successful read, but before control is returned to the thread performing the read. A read can occur through the read(2), readv(2), or pread(2) system calls. The arg0 argument contains the number of bytes that were successfully read.

rw_rdfails          Probe that fires when an attempt is made to read-lock a readers/writer lock when the lock is either held by a writer, or desired by a writer. If this probe fires, the lockstat provider's rw-block probe also fires.

rw_wrfails          Probe that fires when an attempt is made to write-lock a readers/writer lock when the lock is held either by some number of readers or by another writer. If this probe fires, the lockstat provider's rw-block probe also fires.

sema                Probe that fires when a semop(2) system call is made, but before any semaphore operations have been performed.


Table 2-3 The sysinfo Probes (Continued)

Probe Name          Description

sysexec             Probe that fires when an exec(2) system call is made.

sysfork             Probe that fires when a fork(2) system call is made.

sysread             Probe that fires when a read(2), readv(2), or pread(2) system call is made.

sysvfork            Probe that fires when a vfork(2) system call is made.

syswrite            Probe that fires when a write(2), writev(2), or pwrite(2) system call is made.

trap                Probe that fires when a processor trap occurs. Note that some processors (in particular, UltraSPARC® variants) handle some lightweight traps through a mechanism that does not cause this probe to fire.

ufsdirblk           Probe that fires when a directory block is read from the UFS file system. See the ufs(7FS) man page for details on UFS.

ufsiget             Probe that fires when an inode is retrieved. See the ufs(7FS) man page for details on UFS.

ufsinopage          Probe that fires after an in-core inode without any associated data pages has been made available for reuse. See the ufs(7FS) man page for details on UFS.

ufsipage            Probe that fires after an in-core inode with associated data pages has been made available for reuse and therefore after the associated data pages have been flushed to disk. See the ufs(7FS) man page for details on UFS.

wait_ticks_io       Probe that fires when the periodic system clock has determined that a CPU is otherwise idle, but on which some threads are waiting for I/O. Note that this probe fires in the context of the system clock and therefore fires on the CPU running the system clock; one must examine the cpu_t argument (arg2) to determine the CPU that has been deemed waiting on I/O. Note that there is no semantic difference between wait_ticks_io and cpu_ticks_wait; wait_ticks_io exists purely for historical reasons.

writech             Probe that fires after each successful write, but before control is returned to the thread performing the write. A write can occur through the write(2), writev(2), or pwrite(2) system calls. The arg0 argument contains the number of bytes that were successfully written.

Table 2-3 The sysinfo Probes (Continued)

Probe Name          Description

xcalls              Probe that fires when a cross-call is about to be made. A cross-call is the operating system's mechanism for one CPU to request immediate work from another.

Using the quantize Aggregation Function With the sysinfo Probes

The quantize aggregation function displays a power-of-two frequency distribution bar graph of its argument. The following example shows how you can determine the size of reads being performed by all processes over a 10-second period. The arg0 argument for the sysinfo probes states the amount to increment the statistic; it is 1 for most sysinfo probes. Two exceptions are the readch and writech probes, for which the arg0 argument is set to the actual number of bytes read or written respectively.

# cat read.d
#!/usr/sbin/dtrace -s
sysinfo:::readch
{
	@[execname] = quantize(arg0);
}

tick-10sec
{
	exit(0);
}

# dtrace -s read.d
dtrace: script 'read.d' matched 5 probes
CPU     ID                    FUNCTION:NAME
  0  36754                      :tick-10sec

  bash
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13
               2 |                                         0

  file
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |                                         2
               1 |                                         0
               2 |                                         0
               4 |                                         6
               8 |                                         0
              16 |                                         0
              32 |                                         6
              64 |                                         6
             128 |@@                                       16
             256 |@@@@                                     30
             512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            199
            1024 |                                         0
            2048 |                                         0
            4096 |                                         1
            8192 |                                         1
           16384 |                                         0

  grep
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@                      99
               1 |                                         0
               2 |                                         0
               4 |                                         0
               8 |                                         0
              16 |                                         0
              32 |                                         0
              64 |                                         0
             128 |                                         1
             256 |@@@@                                     25
             512 |@@@@                                     23
            1024 |@@@@                                     24
            2048 |@@@@                                     22
            4096 |                                         4
            8192 |                                         3
           16384 |                                         0
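Because arg0 states the amount by which each statistic is incremented, summing it per probe gives totals that are comparable to the sys kstat values, restricted to one program. The following is a minimal sketch of that idea; the executable name "find" and the aggregation name totals are example values chosen for illustration.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch: per-statistic totals attributable to one executable.
 * "find" is an example name; substitute the program of interest.
 * Summing arg0 mirrors the amount added to each sys kstat.
 */
sysinfo:::
/execname == "find"/
{
	@totals[probename] = sum(arg0);
}

/* Stop after 10 seconds, as read.d does. */
tick-10sec
{
	exit(0);
}
```

The clock-driven probes such as cpu_ticks_idle fire in the context of the system clock, so their firings are attributed to whatever is running on that CPU rather than to the CPU being measured.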

Finding the Source of Cross-Calls

Consider the following output from the mpstat(1M) command:

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 2189   0 1302   14    1 215   12   54   28   0 12995  13  14  0  73
  1 3385   0 1137  218  104 195   13   58   33   0 14486  19  15  0  66
  2 1918   0 1039   12    1 226   15   49   22   0 13251  13  12  0  75
  3 2430   0 1284  220  113 201   10   50   26   0 13926  10  15  0  75

The xcal and syscl columns display relatively high numbers, which might be affecting the system’s performance. Yet the system is relatively idle, and is not spending time waiting on input/output (I/O). The xcal numbers are per-second and are read from the xcalls field of the sys kstat. To see which executables are responsible for the xcalls, enter the following dtrace(1M) command:

# dtrace -n 'xcalls {@[execname] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C
  find              2
  cut               2
  snmpd             2
  mpstat           22
  sendmail        101
  grep            123
  bash            175
  dtrace          435
  sched           784
  xargs         22308
  file          89889
#

This output indicates the source of the cross-calls: some number of file(1) and xargs(1) processes are inducing the majority of them. You can find these processes using the pgrep(1) and ptree(1) commands:

# pgrep xargs
15973
# ptree 15973
204   /usr/sbin/inetd -s
  5650  in.telnetd
    5653  -sh
      5657  bash
        15970 /bin/sh ./findtxt configuration
          15971 cut -f1 -d:
            15973 xargs file
              16686 file /usr/bin/tbl /usr/bin/troff /usr/bin/ul /usr/bin/vgrind /usr/bin/catman
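DTrace itself can also attribute the cross-calls to a parent process, which may help when the offending processes are too short-lived for pgrep(1) to catch. The following sketch assumes the built-in ppid variable and uses the two executable names from the output above; the aggregation name xc is arbitrary.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch: attribute cross-calls to the parents of the processes
 * that induce them. The two executable names are taken from the
 * preceding output; xc is an arbitrary aggregation name.
 */
sysinfo:::xcalls
/execname == "xargs" || execname == "file"/
{
	@xc[execname, ppid] = count();
}
```

The parent process IDs reported can then be passed to ptree(1) in the same way as shown above.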


The xargs and file commands appear to be part of a custom user shell script. You can locate this script as follows:

# find / -name findtxt
/users1/james/findtxt
# cat /users1/james/findtxt
#!/bin/sh
find / -type f | xargs file | grep text | cut -f1 -d: >/tmp/findtxt$$
cat /tmp/findtxt$$ | xargs grep $1
rm /tmp/findtxt$$
#

The script is running many processes concurrently with much inter-process communication occurring through pipes. This script appears to be quite resource intensive: it is trying to find every text file in the system and is then searching each one for some specific text. You expect these processes to run concurrently on this system’s four processors while they send data to each other.

Stack Trace xcall Details

You can gather more details on which kernel code is involved in all of the cross-calls while the file and xargs commands are running. The following example uses the stack() built-in DTrace function as the aggregation key to show which kernel code is requesting the cross-call. The aggregation counts how many times each unique kernel stack trace occurs.

# dtrace -n 'xcalls {@[stack()] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C

              SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
              unix`xt_some+0xc4
              unix`xt_sync+0x3c
              unix`hat_unload_callback+0x6ec
              unix`memscrub_scan+0x298
              unix`memscrubber+0x308
              unix`thread_start+0x4
                2

              SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
              unix`xt_some+0xc4
              unix`sfmmu_tlb_demap+0x118
              unix`sfmmu_hblk_unload+0x368
              unix`hat_unload_callback+0x534
              unix`memscrub_scan+0x298
              unix`memscrubber+0x308
              unix`thread_start+0x4
                2
...
              SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
              unix`xt_some+0xc4
              unix`xt_sync+0x3c
              unix`hat_unload_callback+0x6ec
              genunix`anon_private+0x204
              genunix`segvn_faultpage+0x778
              genunix`segvn_fault+0x920
              genunix`as_fault+0x4a0
              unix`pagefault+0xac
              unix`trap+0xc14
              unix`utl0+0x4c
             2303

              SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
              unix`xt_some+0xc4
              unix`sfmmu_tlb_range_demap+0x190
              unix`sfmmu_chgattr+0x2e8
              genunix`segvn_dup+0x3d0
              genunix`as_dup+0xd0
              genunix`cfork+0x120
              unix`syscall_trap32+0xa8
             7175

              SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
              unix`xt_some+0xc4
              unix`xt_sync+0x3c
              unix`sfmmu_chgattr+0x2f0
              genunix`segvn_dup+0x3d0
              genunix`as_dup+0xd0
              genunix`cfork+0x120
              unix`syscall_trap32+0xa8
            11492

As this output shows, the majority of the cross-calls are the result of a significant number of fork(2) system calls. (Shell scripts are notorious for abusing their fork(2) privileges.) Page faults of anonymous memory are also involved, which probably accounts for the large number of minor page faults seen in the mpstat output.

Examining Performance Problems Using the io Provider

The io provider makes available probes related to disk input and output (I/O). The io provider is designed to enable quick exploration of behavior observed through I/O monitoring tools such as iostat(1M). The io provider describes the nature of the system’s I/O by providing data such as the following:

● Device

● I/O type

● Process ID

● Application name

● File name

● File offset

The io Probes

Table 2-4 describes the io probes.

Table 2-4 The io Probes

Probe Name    Description

start         Probe that fires when an I/O request is about to be made to a disk device or to an NFS server. The buf(9S) structure corresponding to the I/O request is pointed to by the args[0] argument. The devinfo_t structure of the device to which the I/O is being issued is pointed to by the args[1] argument. The fileinfo_t structure of the file that corresponds to the I/O request is pointed to by the args[2] argument. Note that file information availability depends on the file system making the I/O request.

done          Probe that fires after an I/O request has been fulfilled. The buf(9S) structure corresponding to the I/O request is pointed to by the args[0] argument. The devinfo_t structure of the device to which the I/O was issued is pointed to by the args[1] argument. The fileinfo_t structure of the file that corresponds to the I/O request is pointed to by the args[2] argument.

wait-start    Probe that fires immediately before a thread begins to wait pending completion of a given I/O request. The buf(9S) structure corresponding to the I/O request for which the thread will wait is pointed to by the args[0] argument. The devinfo_t structure of the device to which the I/O was issued is pointed to by the args[1] argument. The fileinfo_t structure of the file that corresponds to the I/O request is pointed to by the args[2] argument. Some time after the wait-start probe fires, the wait-done probe fires in the same thread.

wait-done     Probe that fires immediately after a thread wakes up from waiting for a pending completion of a given I/O request. The buf(9S) structure corresponding to the I/O request for which the thread was waiting is pointed to by the args[0] argument. The devinfo_t structure of the device to which the I/O was issued is pointed to by the args[1] argument. The fileinfo_t structure of the file that corresponds to the I/O request is pointed to by the args[2] argument. The wait-done probe fires in the same thread, some time after the corresponding wait-start probe.

Information Available When io Probes Fire

The io probes fire for all I/O requests to disk devices, and for all file read and file write requests to an NFS server (except for metadata requests, such as readdir(3C)).

The io provider uses three I/O structures: the buf(9S) structure, the devinfo_t structure, and the fileinfo_t structure.

When the io probes fire, the following arguments are made available:

● args[0] – Set to point to the buf(9S) structure corresponding to the I/O request.

● args[1] – Set to point to the devinfo_t structure of the device to which the I/O was issued.

● args[2] – Set to point to the fileinfo_t structure containing file system related information regarding the issued I/O request.

The buf (9S) Structure

The buf(9S) structure is the abstraction that describes an I/O request. The address of this structure is made available to your D programs through the args[0] argument. Here is its definition:

struct buf {
        int b_flags;            /* flags */
        size_t b_bcount;        /* number of bytes */
        caddr_t b_addr;         /* buffer address */
        uint64_t b_blkno;       /* expanded block # on device */
        uint64_t b_lblkno;      /* block # on device */
        size_t b_resid;         /* # of bytes not transferred */
        size_t b_bufsize;       /* size of allocated buffer */
        caddr_t b_iodone;       /* I/O completion routine */
        int b_error;            /* expanded error field */
        dev_t b_edev;           /* extended device */
};

The b_flags member indicates the state of the I/O buffer and consists of a bitwise OR of different state values.

Table 2-5 shows the valid state values for the b_flags field.

Table 2-5 The b_flags Field Values

Flag Value    Description

B_DONE        Indicates the data transfer has completed.

B_ERROR       Indicates an I/O transfer error. It is set in conjunction with the b_error field.

B_PAGEIO      Indicates the buffer is being used in a paged I/O request. See the description of the b_addr field (Table 2-6) for more information.

B_PHYS        Indicates the buffer is being used for physical (direct) I/O to a user data area.

B_READ        Indicates that data is to be read from the peripheral device into main memory.

B_WRITE       Indicates that the data is to be transferred from main memory to the peripheral device.
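As a minimal sketch of how a D program can test these flag bits, the following script sums the bytes requested in each direction by checking the B_READ bit, the same test that the iosnoop.d script in this module uses. The aggregation name @bytes and the "read" and "write" labels are illustrative choices, not part of the io provider.

```dtrace
#!/usr/sbin/dtrace -qs

/*
 * Sum the requested bytes by direction. B_READ is tested with a
 * bitwise AND because b_flags is a bitwise OR of several state values.
 */
io:::start
{
        @bytes[(args[0]->b_flags & B_READ) ? "read" : "write"] =
            sum(args[0]->b_bcount);
}
```

Press Control-C to stop the script; the aggregation is then printed with one total per direction.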

Table 2-6 shows the field descriptions for the buf(9S) structure.

Table 2-6 The buf(9S) Structure Field Descriptions

Field         Description

b_bcount      Indicates the number of bytes to be transferred as part of the I/O request.

b_addr        Indicates the virtual address of the I/O request, unless B_PAGEIO is set. The address is a kernel virtual address unless B_PHYS is set, in which case it is a user virtual address. If B_PAGEIO is set, the b_addr field contains kernel private data. Note that either B_PHYS or B_PAGEIO or neither can be set, but not both.

b_lblkno      Identifies which logical block on the device is to be accessed. The mapping from a logical block to a physical block (cylinder, track, and so on) is defined by the device.

b_resid       Indicates the number of bytes not transferred because of an error.

b_bufsize     Contains the size of the allocated buffer.

b_iodone      Identifies a specific routine in the kernel that is called when the I/O is complete.

b_error       Holds an error code returned from the driver in the event of an I/O error. b_error is set in conjunction with the B_ERROR bit set in the b_flags member.

b_edev        Contains the major and minor device numbers of the device accessed. Consumers can use the D built-in functions getmajor() and getminor() to extract the major and minor device numbers from the b_edev field.
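The b_bcount and b_edev fields described above can be combined in a short script. The following is a sketch only: it keys a power-of-two distribution of request sizes by the major and minor numbers that getmajor() and getminor() extract from b_edev. The aggregation name @sizes is an arbitrary label.

```dtrace
#!/usr/sbin/dtrace -qs

/*
 * Distribution of I/O request sizes (b_bcount, in bytes), keyed by the
 * major and minor device numbers taken from b_edev.
 */
io:::start
{
        @sizes[getmajor(args[0]->b_edev), getminor(args[0]->b_edev)] =
            quantize(args[0]->b_bcount);
}
```

Each major and minor pair gets its own histogram when the script is interrupted, which makes it easy to see whether a device is receiving mostly small or mostly large requests.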

The devinfo_t Structure

The devinfo_t structure provides information about a device. A pointer to this structure is available to D programs through the args[1] argument. Its members are as follows:

typedef struct devinfo {
        int dev_major;          /* major number */
        int dev_minor;          /* minor number */
        int dev_instance;       /* instance number */
        string dev_name;        /* name of device */
        string dev_statname;    /* name of device + instance/minor */
        string dev_pathname;    /* pathname of device */
} devinfo_t;

Table 2-7 shows the field descriptions for the devinfo_t structure.

Table 2-7 The devinfo_t Structure Field Descriptions

Field           Description

dev_major       Indicates the major number of the device; see getmajor(9F).

dev_minor       Indicates the minor number of the device; see getminor(9F).

dev_instance    Indicates the instance number of the device. The instance of a device is different from the minor number: where the minor number is an abstraction managed by the device driver, the instance number is a property of the device node. Device node instance numbers can be displayed with the prtconf(1M) command.

dev_name        Indicates the name of the device driver that manages the device. (Device driver names can be viewed with the -D option to prtconf(1M).)

dev_statname    Indicates the name of the device as reported by the iostat(1M) command. This name also corresponds to the name of the device as reported by the kstat(1M) command. This field is provided to enable aberrant iostat or kstat output to be correlated to actual I/O activity.

dev_pathname    Indicates the complete path of the device.
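To illustrate how the devinfo_t members relate to one another, the following sketch counts I/O requests per device and prints the iostat-compatible name next to the driver name and instance number. The aggregation name @io is an arbitrary label.

```dtrace
#!/usr/sbin/dtrace -qs

/*
 * Count I/O requests per device. dev_statname matches the name that
 * iostat(1M) reports; dev_name and dev_instance identify the driver
 * and the device node instance behind that name.
 */
io:::start
{
        @io[args[1]->dev_statname, args[1]->dev_name,
            args[1]->dev_instance] = count();
}
```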

The fileinfo_t Structure

The fileinfo_t structure provides information about a file. The file to which an I/O corresponds is pointed to by the args[2] argument in the start, done, wait-start, and wait-done probes. Note that file information is contingent upon the file system providing this information when dispatching I/O requests; some file systems, especially third-party file systems, do not provide the information. Moreover, I/O requests for which there is no file information can emanate from the file system. For example, I/O to file system metadata is not associated with a specific file. Following is the definition of the fileinfo_t structure:

typedef struct fileinfo {
        string fi_name;         /* name (basename of fi_pathname) */
        string fi_dirname;      /* directory (dirname of fi_pathname) */
        string fi_pathname;     /* full pathname */
        offset_t fi_offset;     /* offset within file */
        string fi_fs;           /* filesystem */
        string fi_mount;        /* mount point of file system */
} fileinfo_t;

Table 2-8 shows the field descriptions for the fileinfo_t structure.

Table 2-8 The fileinfo_t Structure Field Descriptions

Field          Description

fi_name        Contains the name of the file without any directory components. If there is no file information associated with an I/O, the fi_name field is set to the string “<none>”. In rare cases, the pathname associated with a file is unknown; in this case, the fi_name field is set to the string “<unknown>”.

fi_dirname     Contains only the directory component of the file name. As with fi_name, this can be set to “<none>” if there is no file information present, or to “<unknown>” if the pathname associated with the file is not known.

fi_pathname    Contains the complete pathname to the file. As with fi_name, this can be set to “<none>” if there is no file information present, or to “<unknown>” if the pathname associated with the file is not known.

fi_offset      Contains the offset within the file, or -1 if file information is not present or if the offset is otherwise unspecified by the file system.
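The “<none>” convention described above can be used directly in a predicate. The following sketch sums the bytes requested per file and skips requests that carry no file information; the aggregation name @bytes is an arbitrary label, and requests reported as “<unknown>” are deliberately kept so that they remain visible.

```dtrace
#!/usr/sbin/dtrace -qs

/*
 * Sum requested bytes per file. Requests with no associated file
 * (for example, file system metadata I/O) are filtered out.
 */
io:::start
/args[2]->fi_pathname != "<none>"/
{
        @bytes[args[2]->fi_pathname] = sum(args[0]->b_bcount);
}
```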

Finding I/O Problems

Consider the following output from the iostat(1M) command.

                  extended device statistics
device    r/s    w/s    kr/s    kw/s wait actv  svc_t  %w  %b
fd0       0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
sd0       2.5  168.7    20.0 10937.7  0.0  3.7   21.7   0  75
sd2     106.6    0.0  4319.9     0.0  0.0  0.7    6.5   0  54
sd15      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
nfs1      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
                  extended device statistics
device    r/s    w/s    kr/s    kw/s wait actv  svc_t  %w  %b
fd0       0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
sd0       0.5  168.7     4.0 16162.5  0.0  9.6   56.9   0  72
sd2      80.9    0.0  7570.5     0.0  0.0  1.1   13.2   0  68
sd15      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
nfs1      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
                  extended device statistics
device    r/s    w/s    kr/s    kw/s wait actv  svc_t  %w  %b
fd0       0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
sd0       1.0  166.3     8.1 18973.0  0.0 24.5  146.5   1  88
sd2      43.8    0.0 10949.6     0.0  0.0  0.9   20.4   0  62
sd15      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
nfs1      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
                  extended device statistics
device    r/s    w/s    kr/s    kw/s wait actv  svc_t  %w  %b
fd0       0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
sd0       1.0  189.5     8.0 11047.6  0.0  2.7   14.4   0  67
sd2     129.5    0.5  2836.3    14.5  0.0  0.7    5.6   0  59
sd15      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
nfs1      0.0    0.0     0.0     0.0  0.0  0.0    0.0   0   0
^C

This output indicates that a large amount of data is being read from disk drive sd2 and written to disk drive sd0. Someone appears to be transferring many megabytes of data between these two drives. Both disks are consistently over 50% busy. Is someone running a file transfer command such as tar(1), cpio(1), cp(1), or dd(1M)? The iosnoop.d D script enables you to determine who is performing this I/O.

The iosnoop.d D Script

The following D script displays data that enables you to determine which commands are running, what type of I/O those commands are performing, and which disk devices are involved.

# cat iosnoop.d
#!/usr/sbin/dtrace -qs
BEGIN
{
        printf("%16s %5s %40s %10s %2s %7s\n", "COMMAND", "PID", "FILE",
            "DEVICE", "RW", "MS");
}

io:::start
{
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
        command[args[0]->b_edev, args[0]->b_blkno] = execname;
        mypid[args[0]->b_edev, args[0]->b_blkno] = pid;
}

io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
        elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
        printf("%16s %5d %40s %10s %2s %3d.%03d\n", command[args[0]->b_edev,
            args[0]->b_blkno], mypid[args[0]->b_edev, args[0]->b_blkno],
            args[2]->fi_pathname, args[1]->dev_statname,
            args[0]->b_flags&B_READ? "R": "W", elapsed/1000000,
            (elapsed/1000)%1000);
        start[args[0]->b_edev, args[0]->b_blkno] = 0;   /* free memory */
        command[args[0]->b_edev, args[0]->b_blkno] = 0; /* free memory */
        mypid[args[0]->b_edev, args[0]->b_blkno] = 0;   /* free memory */
}

You can decipher this D script as follows:

● You use the BEGIN probe to print out column headings.

● You use an associative array to store the nanosecond timestamp of when a particular I/O starts from a specific device. You must also store the executable name and PID of the command issuing the I/O request; this information is not available at I/O completion time because you are running in the context of an interrupt handler.

● When the I/O is done you determine the elapsed time and then print out the relevant information.

● You retrieve the file undergoing the I/O from the fileinfo_t structure; the args[2] argument is set up to point to the fileinfo_t structure when the done probe fires.

● You retrieve the iostat-compatible device name from the devinfo_t structure, which is pointed to by the args[1] argument.

● You use a D conditional expression to display “R” or “W” based on testing the B_READ bit in the b_flags field of the buf structure, which is pointed to by the args[0] argument.

● You use the D modulo operator (%) to determine the fractional portion of the time in milliseconds.

● Finally, you set the associative array elements to zero. Setting an associative array element to zero de-allocates the underlying dynamic memory that was being used. This avoids potential dynamic variable drops.

The following output results from running the previous iosnoop.d script. It clearly shows who is performing the I/O operations. Someone is copying the shared object files from /usr/lib on drive sd2 to a backup directory on drive sd0.

# ./iosnoop.d
         COMMAND   PID                                     FILE     DEVICE RW      MS
            bash   725                            /usr/bin/bash        sd2  R   9.471
            bash   725                                 /usr/lib        sd2  R   7.128
            bash   725                                 /usr/lib        sd2  R   3.193
            bash   725                                 /usr/lib        sd2  R  11.283
            bash   725                           /lib/libc.so.1        sd2  R   7.696
            bash   725                         /lib/libnsl.so.1        sd2  R  10.293
            bash   768                         /lib/libnsl.so.1        sd2  R   0.582
              cp   768                           /lib/libc.so.1        sd2  R  10.154
              cp   768                           /lib/libc.so.1        sd2  R   7.262
              cp   768                           /lib/libc.so.1        sd2  R   9.914
              cp   768               /usr/lib/[email protected]        sd2  R   9.270
              cp   768               /usr/lib/[email protected]        sd2  R  13.654
              cp   768        /mnt/lib.backup/[email protected]        sd0  W   2.431
              cp   768                           /usr/lib/ld.so        sd2  R   6.890
              cp   768                           /usr/lib/ld.so        sd2  R   7.085
              cp   768                           /usr/lib/ld.so        sd2  R   0.376
              cp   768                    /mnt/lib.backup/ld.so        sd0  W   6.698
              cp   768                    /mnt/lib.backup/ld.so        sd0  W   6.437
              cp   768                  /mnt/lib.backup/ld.so.1        sd0  W   4.394
              cp   768                                <unknown>        sd2  R   2.206
              cp   768                  /mnt/lib.backup/ld.so.1        sd0  W   8.479
              cp   768                  /mnt/lib.backup/ld.so.1        sd0  W   8.440
              cp   768                     /usr/lib/lib300.so.1        sd2  R   5.771
              cp   768                     /usr/lib/lib300.so.1        sd2  R   6.003
              cp   768                     /usr/lib/lib300.so.1        sd2  R   0.530
              cp   768                     /usr/lib/lib300.so.1        sd2  R   7.912
              cp   768                                <unknown>        sd2  R   3.014
              cp   768                /mnt/lib.backup/lib300.so        sd0  W   7.861
              cp   768              /mnt/lib.backup/lib300.so.1        sd0  W   6.794
              cp   768                    /usr/lib/lib300s.so.1        sd2  R   3.326
              cp   768                    /usr/lib/lib300s.so.1        sd2  R   3.525
              cp   768                    /usr/lib/lib300s.so.1        sd2  R   0.553
              cp   768                    /usr/lib/lib300s.so.1        sd2  R   7.397
              cp   768               /mnt/lib.backup/lib300s.so        sd0  W   2.996
              cp   768             /mnt/lib.backup/lib300s.so.1        sd0  W   1.970
...
              cp   768                   /usr/dt/lib/libXm.so.3        sd2  R  32.020
              cp   768                   /usr/dt/lib/libXm.so.3        sd2  R   6.471
              cp   768                   /usr/dt/lib/libXm.so.3        sd2  R  14.494
              cp   768                                   <none>        sd0  R  10.184
              cp   768                   /usr/dt/lib/libXm.so.3        sd2  R  22.211
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W   9.777
              cp   768                   /usr/dt/lib/libXm.so.3        sd2  R  28.813
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  26.279
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  24.141
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  22.075
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  19.989
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  21.710
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  39.809
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  37.459
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  32.631
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  30.378
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  28.308
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  29.701
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  28.327
              cp   768                                <unknown>        sd2  R  24.986
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  28.021
              cp   768             /mnt/lib.backup/libXm.so.1.2        sd0  W  26.601
              cp   768               /mnt/lib.backup/libXm.so.3        sd0  W   5.353
              cp   768               /mnt/lib.backup/libXm.so.3        sd0  W   4.603
              cp   768               /mnt/lib.backup/libXm.so.3        sd0  W  13.232
              cp   768               /mnt/lib.backup/libXm.so.3        sd0  W  11.242
              cp   768               /mnt/lib.backup/libXm.so.3        sd0  W  12.412
...
              cp   768       /usr/lib/libgtk-x11-2.0.so.0.100.0        sd2  R   2.374
              cp   768      /mnt/lib.backup/libgthread-2.0.so.0        sd0  W   7.732
              cp   768  /mnt/lib.backup/libgthread-2.0.so.0.7.0        sd0  W   7.605
              cp   768                                   <none>        sd2  R  10.678
              cp   768       /usr/lib/libgtk-x11-2.0.so.0.100.0        sd2  R   5.677
              cp   768       /usr/lib/libgtk-x11-2.0.so.0.100.0        sd2  R  39.864
              cp   768       /usr/lib/libgtk-x11-2.0.so.0.100.0        sd2  R  61.555
              cp   768       /usr/lib/libgtk-x11-2.0.so.0.100.0        sd2  R  17.175
              cp   768        /mnt/lib.backup/libgtk-x11-2.0.so        sd0  W  44.225
              cp   768        /mnt/lib.backup/libgtk-x11-2.0.so        sd0  W  42.075
^C
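When per-request lines are too verbose, the same start and done pairing can feed an aggregation instead of printf(). The following variation is a sketch, not part of the course material: it reuses the device and block number key from iosnoop.d and records a distribution of I/O times in microseconds for each device. The aggregation name @latency is an arbitrary label.

```dtrace
#!/usr/sbin/dtrace -qs

io:::start
{
        /* remember when this request was issued, keyed as in iosnoop.d */
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
        /* nanoseconds divided by 1000 gives microseconds */
        @latency[args[1]->dev_statname] = quantize(
            (timestamp - start[args[0]->b_edev, args[0]->b_blkno]) / 1000);
        start[args[0]->b_edev, args[0]->b_blkno] = 0;   /* free memory */
}
```

The trade-off is that the histogram no longer identifies the command or the file, so it complements iosnoop.d rather than replacing it.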

Obtaining System Call Information

System calls serve as the main interface between user-level applications and the kernel. You can learn much about the system by knowing the system calls that are being issued by the set of running applications.

Note – System calls are documented in Section 2 of the Solaris 10 OS manual pages.

Traditionally, system calls of an application were determined using the truss(1) command. The DTrace syscall provider, however, enables you to quickly gather more detailed data with which to analyze aberrant behavior related to system calls. For example, not only can DTrace show you the system calls being issued by a given application, but it can also indicate which applications are issuing a given system call. In addition, you can time (in nanoseconds) how long a particular system call takes, such as a read(2). These operations cannot be performed with the truss(1) command.

The syscall Provider

The syscall provider makes available a probe at the entry and return of every system call in the system. An example of a fully-specified probe description for the entry probe of the read(2) system call is:

syscall::read:entry

The probe for return from the read(2) system call is:

syscall::read:return

Note that the module name is undefined for the syscall provider probes.

System Call Names

The system call names are usually, but not always, the same as those documented in Section 2 of the Solaris 10 OS manual pages. The actual names are listed in the /etc/name_to_sysnum system file. Examples of system call names that do not match the manual pages are:

● rexit for exit(2)

● gtime for time(2)

● semsys for semctl(2), semget(2), semids(2), and semtimedop(2)

● signotify, which has no manual page, and is used for POSIX.4 message queues

● Large file system calls such as:

● creat64 for creat(2)

● lstat64 for lstat(2)

● open64 for open(2)

● mmap64 for mmap(2)
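Because probe descriptions accept shell-style pattern matching, the large file variants listed above can be observed together. The following sketch assumes only that the names of interest end in 64; it therefore matches every system call whose name ends that way, not just the four listed. The aggregation name @calls is an arbitrary label.

```dtrace
#!/usr/sbin/dtrace -qs

/*
 * Count large file system calls by system call name and application.
 * probefunc holds the actual name, for example lstat64.
 */
syscall::*64:entry
{
        @calls[probefunc, execname] = count();
}
```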

Arguments for entry and return Probes

For the entry probes, the arguments (arg0, arg1, ... argn) are the arguments to the system call. For return probes, both arg0 and arg1 contain the same value: the return value from the system call. You can check for system call failure in the return probe by referencing the errno D variable. The following example shows which system calls are failing for which applications and with what errno value.

# cat errno.d
#!/usr/sbin/dtrace -qs
syscall:::return
/arg0 == -1 && execname != "dtrace"/
{
        printf("%-20s %-10s %d\n", execname, probefunc, errno);
}

# ./errno.d
sac                  read       4
ttymon               pause      4
ttymon               read       11
nscd                 lwp_kill   3
in.routed            ioctl      12
in.routed            ioctl      12
tty                  open       2
tty                  stat       2
bash                 setpgrp    13
bash                 waitsys    10
bash                 stat64     2
snmpd                ioctl      12
^C

The errno.d D program has a predicate that uses the “AND” operator: &&. The predicate states that the return from the system call must be -1, which is how all system calls indicate failure, and that the process executable name cannot be dtrace. The printf built-in D function uses the %-20s and %-10s format specifications to left-justify the strings in the given minimum column width.
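On a busy system the line-per-failure output of errno.d can scroll quickly. One possible variation, shown here as a sketch, keeps the same predicate but counts failures in an aggregation keyed by application, system call, and errno value. The aggregation name @failures is an arbitrary label.

```dtrace
#!/usr/sbin/dtrace -qs

/* Count failing system calls per application, system call, and errno. */
syscall:::return
/arg0 == -1 && execname != "dtrace"/
{
        @failures[execname, probefunc, errno] = count();
}
```

The summary is printed when you press Control-C, sorted so that the most frequent failures appear last.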

D Script Example Using the syscall Provider

The following simple D script counts the number of system calls being issued system wide.

# cat syscall.d
#!/usr/sbin/dtrace -qs
syscall:::entry
{
        @[probefunc] = count();
}
# ./syscall.d
^C

  mmap64                  1
  mkdir                   1
  umask                   1
  getloadavg              1
  getdents64              2
...
  stat                 1754
  ioctl                1956
  close                2708
  write                2733
  mmap                 3006
  read                 3880
  sigaction            7886
  brk                 12695

The output indicates that the majority of the system calls are setting up signal handling (sigaction(2)) or growing the heap (brk(2)). The following D script enables you to discover who is making the brk(2) system calls.

# cat brk.d
#!/usr/sbin/dtrace -qs
syscall::brk:entry
{
        @[execname] = count();
}
# ./brk.d
^C

  dtrace                  6
  prstat                 22
  nroff                  48
  cat                    48
  tbl                   142
  eqn                   144
  rm                    166
  ln                    166
  col                   222
  expr                  332
  head                  492
  fgrep                 492
  dirname               581
  grep                  722
  instant               738
  sh                    917
  nawk                  984
  sgml2roff            1259
  nsgmls              13296
# ps -ef | grep nsgmls
    root   591   590  2 07:56:32 pts/2    0:00 /usr/lib/sgml/nsgmls -gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
# man nsgmls
No manual entry for nsgmls.
# man -k sgml
sgml            sgml (5)        - Standard Generalized Markup Language
solbook         sgml (5)        - Standard Generalized Markup Language

Apparently some process is working with the Standard Generalized Markup Language (SGML). Use the ptree command to see who is creating this process:

# ptree 591
#

The ptree command returns no results because the nsgmls process is too short-lived for the command to be run on it. You have learned, however, that the problem is not a long-lived process causing a memory leak. Now write a quick D script to print out the ancestry. The script itself must walk back through each successive parent, because many of the other processes involved are also short-lived.

Note – This particular D script fails if an ancestor does not exist, because the top ancestor, the sched process, has no parent. You cannot harm the kernel even if a D script uses a bad pointer. The intent of this example is to show how you can quickly create custom D scripts to answer questions about system behavior. Many of your D scripts will be throw-away scripts that you will not re-use. You can fix the script by testing each parent pointer with a predicate before printing. You will see this fix later with the ancestors3.d D script.

# cat ancestors.d
#!/usr/sbin/dtrace -qs
syscall::brk:entry
/execname == "nsgmls"/
{
        printf("process: %s\n",
            curthread->t_procp->p_user.u_psargs);
        printf("parent: %s\n",
            curthread->t_procp->p_parent->p_user.u_psargs);
        printf("grandparent: %s\n",
            curthread->t_procp->p_parent->p_parent->p_user.u_psargs);
        printf("greatgrandparent: %s\n",
            curthread->t_procp->p_parent->p_parent->p_parent->p_user.u_psargs);
        printf("greatgreatgrandparent: %s\n",
            curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
        printf("greatgreatgreatgrandparent: %s\n",
            curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
}

# ./ancestors.d
process: /usr/lib/sgml/nsgmls -gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
parent: /usr/lib/sgml/instant -d -c/usr/share/lib/sgml/locale/C/transpec/roff.cmap -s/u
grandparent: /bin/sh /usr/lib/sgml/sgml2roff /usr/share/man/sman4/rt_dptbl.4
greatgrandparent: sh -c cd /usr/share/man; /usr/lib/sgml/sgml2roff /usr/share/man/sman4/rt_dptbl.
greatgreatgrandparent: catman
greatgreatgreatgrandparent: bash

# ps -ef | grep catman
    root  2333  2332  1 08:26:05 pts/1    0:03 catman
    root 16984  2880  0 08:41:10 pts/2    0:00 grep catman
# ptree 2333
299   /usr/sbin/inetd -s
  2324  in.rlogind
    2326  -sh
      2332  bash
        2333  catman
          17232 sh -c cd /usr/share/man; rm -f /usr/share/man/cat4/variables.4;ln -s ../cat4/e
            17235 sh -c cd /usr/share/man; rm -f /usr/share/man/cat4/variables.4;ln -s ../cat4/e

The previous output indicates that all of the brk(2) system calls resulted from the catman(1M) command, creating many short-lived children that issued this system call.

The curthread built-in D variable gives access to the address of the running kernel thread. Like the C language, the D language accesses members of a structure with the -> symbol when you have a pointer to that structure. Through this pointer to the kernel kthread_t structure, you can access the process name and arguments (kept in the proc_t structure’s p_user structure) as well as any parent, grandparent, great-grandparent, and so on. To do this you follow the parent pointers back. Refer to the <sys/thread.h>, <sys/proc.h>, and <sys/user.h> header files for details of these fields.
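The note before ancestors.d suggests testing each parent pointer with a predicate before printing. The following is a minimal sketch of that idea for a single level of ancestry; it is not the course's ancestors3.d script, and it simply repeats the probe with a second clause whose predicate checks that p_parent is not NULL.

```dtrace
#!/usr/sbin/dtrace -qs

syscall::brk:entry
/execname == "nsgmls"/
{
        printf("process: %s\n", curthread->t_procp->p_user.u_psargs);
}

/* Dereference the parent only when the parent pointer is non-NULL. */
syscall::brk:entry
/execname == "nsgmls" && curthread->t_procp->p_parent != NULL/
{
        printf("parent: %s\n",
            curthread->t_procp->p_parent->p_user.u_psargs);
}
```

Further ancestors could be handled the same way, with one clause per level and one additional pointer test in each predicate.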

Figure 2-1 shows a diagram of the kernel data structures being accessed by this example.

[Figure: curthread points to a kthread_t structure (t_state, t_pri, t_lwp, t_procp; see /usr/include/sys/thread.h). The t_procp member points to a proc_t structure (p_exec, p_as, p_cred, p_parent, p_tlist, p_user; see /usr/include/sys/proc.h), and each p_parent member points to the proc_t structure of the parent process. The p_user member of each proc_t is a user_t structure (u_start, u_ticks, u_psargs[], u_cdir; see /usr/include/sys/user.h).]

Figure 2-1 Thread and Process Data Structures

New Approach to Analyzing Transient Failures

As the previous example demonstrates, each result obtained from using the DTrace facility can lead to further questions, which are answered with available commands or with new D programs that you can write quickly. In this way, the DTrace facility significantly shortens the diagnostic loop:

hypothesis->instrumentation->data gathering->analysis->hypothesis

This tightened loop introduces a new paradigm for diagnosing transient failures. It enables the emphasis to shift from instrumentation to hypothesis, which is less labor intensive.

D Language Variables

The D language has five basic variable types:

● Scalar variables – Have fixed-size values such as integers, structures, and pointers

● Associative arrays – Store values indexed by one or more keys, similar to aggregations

● Thread-local variables – Have one name, but storage is local to each separate kernel thread. These variables are prefixed with the self-> keyword.

● Clause-local variables – Appear when an action block is entered; storage is reclaimed after leaving the probe clause. These variables are prefixed with the this-> keyword.

● Kernel external variables – DTrace has access to all kernel global and static variables. These variables are prefixed with a backquote (`).

Associative arrays (start, command, and mypid) were used in the iosnoop.d script. Clause-local variables are similar to automatic or local variables in the C language. The elapsed variable in the iosnoop.d script was a global scalar variable, but it could have been made a clause-local variable, which is slightly more efficient. Clause-local variables come into existence when an action block (tied to a specific probe) is entered, and their storage is reclaimed when the action block is left. They help save storage and are quicker to access than associative arrays.

Note – For more information on D variables, refer to the Solaris DynamicTracing Guide, part number 817-6223-10.

You can access kernel global and static variables within your D programs.To access these external variables, you prefix the global kernel variablewith the ‘ (back quote or grave accent) character. For example, toreference the freemem kernel global variable use: ‘freemem . If thevariable is part of a kernel module that conflicts with other modulevariable names, use the ‘ character between the module name and thevariable name. For example, sd‘sd_state references the sd_statevariable within the sd kernel module.
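The variable types described above can be combined in one short script. The following is a minimal sketch rather than one of the course scripts: the variable names writes, ts, and elapsed are chosen here for illustration, and the choice of the write(2) system call is arbitrary.

```d
#!/usr/sbin/dtrace -qs

/* Global scalar: a single copy shared by every probe firing. */
int writes;

syscall::write:entry
/execname != "dtrace"/
{
	/* Thread-local: each kernel thread has its own self->ts. */
	self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
	/* Clause-local: storage exists only while this clause runs. */
	this->elapsed = timestamp - self->ts;
	writes++;

	/* Kernel external variable: `freemem is read from the kernel. */
	printf("%s: write #%d took %d ns (freemem: %d pages)\n",
	    execname, writes, this->elapsed, `freemem);

	self->ts = 0;	/* assigning 0 releases the thread-local storage */
}
```

The predicate on the entry clause skips the dtrace command itself, which otherwise would trace the writes that print its own output.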

Associative Arrays

Associative arrays enable the storing of scalar values in elements of an array (or table) that are identified by one or more sequences of comma-separated key fields (an n-tuple). The keys can be any combination of strings or integers. The following code example shows the use of an associative array to track how often any command issues more than a given number of any single system call:

# cat assoc2.d
#!/usr/sbin/dtrace -qs
syscall:::entry
{
    ++namesys[pid,probefunc];
    x = namesys[pid,probefunc] > 5000 ? 1 : 0;
}
syscall:::entry
/x && execname != "dtrace"/
{
    printf("Process: %d %s has just made more than 5000 %s calls\n",
        pid, execname, probefunc);
    namesys[pid,probefunc] = 0; /* reset the count */
}

# ./assoc2.d
Process: 14837 find has just made more than 5000 lstat64 calls
Process: 14837 find has just made more than 5000 lstat64 calls
Process: 14854 ls has just made more than 5000 lstat64 calls
Process: 14854 ls has just made more than 5000 acl calls
Process: 14854 ls has just made more than 5000 lstat64 calls
^C

The assoc2.d D program uses an associative array indexed by the unique combination of process ID (PID) and system call name. The ++ operator increments the array element by one each time a process with that PID makes that system call. The array element, like all variables (except clause-local variables), is initialized to 0. The second statement in the action block uses a conditional expression that has three parts:

expression ? value1 : value2

A conditional expression has the value of value1 when the D expression is nonzero (true), and has the value of value2 when the expression is zero (false). Therefore, in the assoc2.d D program, the global scalar variable x is 1 when that element of the associative array is greater than 5000, and 0 when it is not greater than 5000. The next action block is only executed if x is not 0 and the executable name is not "dtrace". After printing the command that made more than 5000 of a given system call, you reset the array element to 0 to begin counting again. Note that a comment is used in this D program. Like comments in the C language, a comment in the D language is text that is enclosed between /* and */.
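A conditional expression can also yield strings and can be nested. The following sketch, which is not part of the course scripts, labels the file descriptor passed to write(2) by the ls command; it assumes the usual convention that descriptor 1 is standard output and descriptor 2 is standard error.

```d
#!/usr/sbin/dtrace -qs

syscall::write:entry
/execname == "ls"/
{
	/* arg0 is the file descriptor; arg2 is the requested byte count. */
	printf("ls is writing %d bytes to %s\n", arg2,
	    arg0 == 1 ? "standard output" :
	    arg0 == 2 ? "standard error" : "another descriptor");
}
```

Because the D language has no if statement, conditional expressions and predicates are the usual ways to express a choice inside a D program.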

Thread-Local Variables

Thread-local variables are useful when you wish to enable a probe and mark with a tag every thread that fires the probe. Thread-local variables share a common name but refer to separate data storage associated with each thread. Thread-local variables are referenced with the special keyword self followed by the two characters ->, as shown in the following example:

syscall::read:entry
{
    self->read = 1;
}
syscall::read:return
/self->read/
{
    printf("Same thread is returning from read\n");
}

Timing a System Call

Thread-local variables enable you to determine the amount of time a thread spends in any particular system call. The following example times how long the grep(1) command takes in each read(2) system call. It also displays the number of bytes read (arg0 is the return value of read).

# cat timegrep.d
#!/usr/sbin/dtrace -qs
BEGIN
{
    printf("size\ttime\n");
}
syscall::read:entry
/execname == "grep"/
{
    self->start = timestamp;
}
syscall::read:return
/self->start/
{
    printf("%d\t%d\n", arg0, timestamp - self->start);
    self->start = 0;
}

# ./timegrep.d
size    time
8192    7108972
319     1526616
0       12112
3293    5663329
0       18816
^C

The first read took 7,108,972 nanoseconds or 7.1 milliseconds, which is reasonable for an 8-Kbyte disk read. As you might expect, the first read of 0 bytes took only 12 microseconds.
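When many reads are involved, a distribution is often easier to interpret than one line per call. The following sketch is a variation on the timegrep.d script that stores the same elapsed times in a quantize() aggregation; the aggregation name @times and its key string are chosen here for illustration.

```d
#!/usr/sbin/dtrace -qs

syscall::read:entry
/execname == "grep"/
{
	self->start = timestamp;
}

syscall::read:return
/self->start/
{
	/* Add this read's elapsed time to a power-of-two histogram. */
	@times["grep read(2) latency (ns)"] =
	    quantize(timestamp - self->start);
	self->start = 0;
}
```

The histogram is printed when you terminate the script with Control-C.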

The next example uses an associative array to time every system call performed by the grep command.

# cat timesys.d
#!/usr/sbin/dtrace -qs
BEGIN
{
    printf("System Call Times for grep:\n\n");
    printf("%20s\t%10s\n", "Syscall", "Microseconds");
}
syscall:::entry
/execname == "grep"/
{
    self->name[probefunc] = timestamp;
}
syscall:::return
/self->name[probefunc]/
{
    printf("%20s\t%10d\n", probefunc,
        (timestamp - self->name[probefunc])/1000);
    self->name[probefunc] = 0; /* free memory */
}

# ./timesys.d
System Call Times for grep:

             Syscall    Microseconds
                mmap            50
         resolvepath            47
         resolvepath            67
                stat            37
                open            46
                stat            34
                open            32
...
                 brk            25
              open64            43
                read          8126
                 brk            20
                 brk            28
                read            24
               close            26
^C

Predictably, the system call that took the most time was read, because of the disk I/O wait time (the second read was of 0 bytes).

Following a System Call

You can follow a system call from entry into the kernel through all subsequent internal kernel function calls and returns back to the original point of entry of the system call function. You do this by using the syscall and fbt providers together with a thread-local variable. The following example traces all of the functions involved in the read(2) system call as issued by the grep(1) command:

# cat follow.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname == "grep"/
{
    self->start = 1;
}

syscall::read:return
/self->start/
{
    exit(0);
}

fbt:::
/self->start/
{
}

The fbt provider probe clause has an empty action. The default action for DTrace tracks every time you enter and return from all kernel functions involved in a read(2) system call until it terminates. Option -F of the dtrace(1M) command indents the output of each nested function call and shows this with the -> symbol; it un-indents the output when that function returns back up the call tree and shows this with the <- symbol.

# dtrace -F -s follow.d
dtrace: script './follow.d' matched 38108 probes
CPU FUNCTION
  0  -> read32
  0  <- read32
  0  -> read
  0  -> getf
  0  -> set_active_fd
  0  <- set_active_fd
  0  <- getf
...
  0  <- ufs_rwlock
  0  -> fop_read
  0  <- fop_read
  0  -> ufs_read
  0  -> ufs_lockfs_begin
...
  0  -> rdip
  0  -> rw_write_held
  0  <- rw_write_held
  0  -> segmap_getmapflt
  0  -> get_free_smp
  0  -> grab_smp
  0  -> segmap_hashout
...
  0  <- sfmmu_kpme_lookup
  0  -> sfmmu_kpme_sub
...
  0  <- page_unlock
  0  <- grab_smp
  0  -> segmap_pagefree
  0  -> page_lookup_nowait
  0  -> page_trylock
...
  0  <- segmap_hashin
  0  -> segkpm_create_va
  0  <- segkpm_create_va
  0  -> fop_getpage
  0  -> ufs_getpage
  0  -> ufs_lockfs_begin_getpage
  0  -> tsd_get
...
  0  <- page_exists
  0  -> page_lookup
  0  <- page_lookup
  0  -> page_lookup_create
  0  <- page_lookup_create
  0  -> ufs_getpage_miss
  0  -> bmap_read
  0  -> findextent
  0  <- findextent
  0  <- bmap_read
  0  -> pvn_read_kluster
  0  -> page_create_va
  0  -> lgrp_mem_hand...
  0  <- page_add
  0  <- page_create_va
  0  <- pvn_read_kluster
  0  -> pagezero
  0  -> ppmapin
  0  -> sfmmu_get_ppvcolor
  0  <- sfmmu_get_ppvcolor
  0  -> hat_memload
  0  -> sfmmu_memtte
  0  <- sfmmu_memtte
...
  0  -> xt_some
  0  <- xt_some
  0  <- xt_sync
...
  0  <- sema_init
  0  <- pageio_setup
  0  -> lufs_read_strategy
  0  -> logmap_list_get
  0  <- logmap_list_get
  0  -> bdev_strategy
  0  -> bdev_strategy_tnf_probe
  0  <- bdev_strategy_tnf_probe
  0  <- bdev_strategy
  0  -> sdstrategy
  0  -> getminor
...
  0  <- drv_usectohz
  0  -> timeout
  0  <- timeout
  0  -> timeout_common
...
  0  <- getminor
  0  -> scsi_transport
  0  <- scsi_transport
  0  -> glm_scsi_start
  0  -> ddi_get_devstate
...
  0  <- ddi_get_soft_state
  0  -> pci_pbm_dma_sync
  0  <- pci_pbm_dma_sync
  0  <- pci_dma_sync
  0  <- glm_start_cmd
  0  <- glm_accept_pkt
  0  <- glm_scsi_start
  0  <- sd_start_cmds
  0  <- sd_core_iostart
  0  <- xbuf_iostart
  0  <- lufs_read_strategy
  0  -> biowait
  0  -> sema_p
  0  -> disp_lock_enter
  0  <- disp_lock_enter
  0  -> thread_lock_high
  0  <- thread_lock_high
  0  -> ts_sleep
  0  <- ts_sleep
  0  -> disp_lock_exit_high
  0  <- disp_lock_exit_high
  0  -> disp_lock_exit_nopreempt
  0  <- disp_lock_exit_nopreempt
  0  -> swtch
  0  -> disp
  0  -> disp_lock_enter
  0  <- disp_lock_enter
  0  -> disp_lock_exit
  0  <- disp_lock_exit
  0  -> disp_getwork
  0  <- disp_getwork
  0  <- disp
  0  <- swtch
  0  -> resume
  0  <- resume
  0  -> disp_lock_enter
...
  0  <- hat_page_getattr
  0  <- segmap_getmapflt
  0  -> uiomove
  0  -> xcopyout
  0  <- xcopyout
  0  <- uiomove
  0  -> segmap_release
  0  -> get_smap_kpm
...
  0  <- ufs_imark
  0  <- ufs_itimes_nolock
  0  <- rdip
...
  0  <- cv_broadcast
  0  <- releasef
  0  <- read
  0  -> read

Although more than half of the functions were removed from the previous output, the example shows that a great many functions are required to perform a disk file read. Some of the key functions are described below:

● read – read(2) system call entered

● ufs_read – UFS file being read

● segmap_getmapflt – Find segmap page for the I/O

● segmap_pagefree – Free underlying previous physical page tied to this segmap virtual page onto the cachelist (this policy replaced the old priority paging)

● ufs_getpage – Ask UFS to retrieve the page

● page_lookup – First check to see if the page is in memory (it is not)

● page_create_va – Get new physical page for the I/O

● hat_memload – Map the virtual page to the physical page

● xt_some – Issue cross-trap call to some CPUs

● sdstrategy – Issue Small Computer System Interface (SCSI) command to read page from disk into segmap page

● timeout – Prepare for SCSI timeout of disk read request

● glm_scsi_start – In glm host bus adapter driver

● biowait – Wait for block I/O

● sema_p – Use semaphore to wait for I/O

● ts_sleep – Put timesharing (TS) thread on sleep queue

● swtch – Do a context switch (have thread give up the CPU while it waits for the I/O)

● disp_getwork – Find another thread to run while this thread waits for its I/O

● resume – I/O has completed and CPU is returned to resume running

● uiomove – Move data from kernel buffer (page) to user-land buffer

● segmap_release – Release segmap page for use by another I/O later

● read – Read operation ends
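A full trace like the one above is long. When you only want to know which kernel functions take part in the read and how often each is called, the same thread-local flag can drive an aggregation instead. The following is a sketch under that assumption; the variable name in_read and the aggregation name @calls are chosen here for illustration.

```d
#!/usr/sbin/dtrace -qs

syscall::read:entry
/execname == "grep"/
{
	self->in_read = 1;
}

/* Count each kernel function entered while the thread is inside read(2). */
fbt:::entry
/self->in_read/
{
	@calls[probemod, probefunc] = count();
}

syscall::read:return
/self->in_read/
{
	self->in_read = 0;
}
```

Unlike follow.d, this script keeps running across every read issued by grep and prints the per-function counts, keyed by kernel module and function name, when you press Control-C.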

Creating D Scripts That Use Arguments

As with shell and other interpretive programming language commands such as the perl(1) command, you can use the dtrace(1M) command to create executable interpreter files. The file must start with the following line and must have execute permission:

#!/usr/sbin/dtrace -s

You can specify other options to the dtrace(1M) command on this line; be sure, however, to use only one dash (-) followed by the options, with s being last:

#!/usr/sbin/dtrace -qvs

You can also specify all options to the dtrace(1M) command by using #pragma lines inside the D script:

# cat mem2.d
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option verbose

vminfo:::
{
    @[execname,probename] = count();
}

END
{
    printa("%-20s %-15s %@d\n", @);
}

Note – For the list of option names used in #pragma lines, see the Solaris Dynamic Tracing Guide, part number 817-6223-10.

Built-in Macro Variables

The D compiler defines a set of built-in macro variables that you can refer to inside a D script. These macro variables include:

● $pid – Process ID of dtrace interpreter running script

● $ppid – Parent process ID of dtrace interpreter running script

● $uid – Real user ID of user running script

● $gid – Real group ID of user running script

● $0 – Name of script

● $1, $2, $3, and so on – First, second, third command-line arguments passed to script

● $$1, $$2, $$3, and so on – First, second, third command-line arguments converted to double-quoted (" ") strings

The complete list of D macro variables is given in Appendix B. The following D script uses some of these D macro variables:

# cat -n params.d
     1  #!/usr/sbin/dtrace -s
     2  #pragma D option quiet
     3
     4  tick-2sec
     5  /$1 == $11 && $$3 == "fubar"/
     6  {
     7      printf("name of script: %s\n", $0);
     8      printf("pid of script: %d\n", $pid);
     9      printf("9th arg passed to script: %s\n", $$9);
    10      exit(0);
    11  }

# ./params.d 1 2 fubar 4 5 6 7 8 9 10 1
name of script: ./params.d
pid of script: 5363
9th arg passed to script: 9

# ./params.d 1 2 3 4 5 6 7 8 9 10 11
^C

The last invocation of the script did not output anything because the value of the first argument did not match the value of the eleventh argument. The following invocations show that the type and number of arguments must match those referenced inside the D script. This is an example of the error-checking capability of the DTrace facility:

# ./params.d 1 2 3 4 5 6 7 8 9
dtrace: failed to compile script ./params.d: line 5: macro argument $11 is not defined
# ./params.d 1 2 3 4 5 6 7 8 9 10 11 12 13
dtrace: failed to compile script ./params.d: line 12: extraneous argument '13' ($13 is not referenced)
# ./params.d a b c d e f g h i j k
dtrace: failed to compile script ./params.d: line 5: failed to resolve a: Unknown variable name

The defaultargs option to the dtrace(1M) command allows you to default the values of $1, $2, and so on to zero if the user does not type any arguments when invoking the dtrace(1M) command. The $$1, $$2, and so on references become NULL strings when the user does not type any arguments. Options can be specified on the dtrace(1M) command line as an argument to the -x option. The following examples show these features:

# cat -n args.d
     1  #!/usr/sbin/dtrace -qs
     2  BEGIN
     3  {
     4      x = 5;
     5  }
     6
     7  tick-2sec
     8  {
     9      x = x + $1;
    10      name = $$2;
    11  }
    12
    13  tick-11sec
    14  {
    15      printf("x: %d\n", x);
    16      printf("name: %s\n", name);
    17      exit(0);
    18  }
# ./args.d 2 foo
x: 15
name: foo
# ./args.d
dtrace: failed to compile script args.d: line 10: macro argument $1 is not defined
# dtrace -x defaultargs -qs args.d
x: 5
name:
# dtrace -x defaultargs -qs args.d 2 3 4
dtrace: failed to compile script args.d: line 20: extraneous argument '4' ($3 is not referenced)
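The defaultargs option can also be set inside the script with a #pragma line, so that the user does not have to type -x defaultargs. The following sketch assumes that an omitted $2 evaluates to 0 and an omitted $$1 to an empty string, as described above; the variable names limit and secs, and the fallback of 10 seconds, are choices made here for illustration.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option defaultargs

BEGIN
{
	/* An omitted second argument is 0; fall back to 10 seconds. */
	limit = $2 != 0 ? $2 : 10;
}

/* With no first argument, count system calls for every command. */
syscall:::entry
/$$1 == "" || execname == $$1/
{
	@calls[execname] = count();
}

tick-1sec
{
	++secs;
}

tick-1sec
/secs == limit/
{
	exit(0);
}
```

Run with no arguments, the script would count system calls for all commands for 10 seconds; run with arguments such as vi 30, it would count only those made by vi for 30 seconds.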

PID Argument Example

The following example passes the PID of a running vi process to the syscalls2.d D script. You use the pgrep command to determine the PID of the vi process. The D script terminates when the vi command exits.

# cat syscalls2.d
#!/usr/sbin/dtrace -qs

syscall:::entry
/pid == $1/
{
    @[probefunc] = count();
}
syscall::rexit:entry
/pid == $1/   /* exit only when the traced process itself exits */
{
    exit(0);
}

# pgrep vi
2208
# ./syscalls2.d 2208

  rexit                1
  setpgrp              1
  creat                1
  getpid               1
  open                 1
  lstat64              1
  stat64               1
  fdsync               1
  unlink               2
  close                2
  alarm                2
  lseek                3
  sigaction            5
  ioctl               45
  read               143
  write              178

Executable Name Argument Example

In the following example, the ancestors.d D script is modified to make it more general. Remember that this script was created because the processes involved were too short-lived for a ptree command to be executed on them. The modified script can retrieve the ancestry back to the great-great-great-grandparent of any process you catch making any specified system call. The $$1 references the first command-line argument as a quoted string.

# cat ancestors2.d
#!/usr/sbin/dtrace -qs
syscall::$2:entry
/execname == $$1/
{
    printf("process: %s\n", curthread->t_procp->p_user.u_psargs);
    printf("parent: %s\n", curthread->t_procp->p_parent->p_user.u_psargs);
    printf("grandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_user.u_psargs);
    printf("greatgrandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_parent->p_user.u_psargs);
    printf("greatgreatgrandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_parent->p_parent->
        p_user.u_psargs);
    printf("greatgreatgreatgrandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_parent->p_parent->
        p_parent->p_user.u_psargs);
    exit(0);
}

# ./ancestors2.d nsgmls brk
process: /usr/lib/sgml/nsgmls -gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
parent: /bin/sh /usr/lib/sgml/sgml2roff /usr/share/man/sman2/fork.2
grandparent: /bin/sh /usr/lib/sgml/sgml2roff /usr/share/man/sman2/fork.2
greatgrandparent: sh -c cd /usr/share/man; /usr/lib/sgml/sgml2roff /usr/share/man/sman2/fork.2
greatgreatgrandparent: catman
greatgreatgreatgrandparent: bash

You can run the same script with a different process name and system call, which shows the power of being able to pass in arguments to a D script:

# ./ancestors2.d vi sigaction
process: vi /etc/system
parent: bash
grandparent: -sh
greatgrandparent: /usr/sbin/in.telnetd
greatgreatgrandparent: /usr/lib/inet/inetd start
greatgreatgreatgrandparent: /sbin/init

The ancestors2.d script fails when the traced process has fewer ancestors than the script tries to print, because it dereferences a parent pointer that is 0:

# ./ancestors2.d cron read
dtrace: error on enabled probe ID 1 (ID 10: syscall::read:entry): invalid address (0x0) in action #4
dtrace: error on enabled probe ID 1 (ID 10: syscall::read:entry): invalid address (0x0) in action #4
dtrace: error on enabled probe ID 1 (ID 10: syscall::read:entry): invalid address (0x0) in action #4

The ancestors3.d D script fixes the problem with trying to print nonexistent ancestry:

# cat ancestors3.d
#!/usr/sbin/dtrace -qs

syscall::$2:entry
/execname == $$1/
{
    printf("process: %s\n", curthread->t_procp->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("parent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("grandparent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("greatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("greatgreatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent->p_parent->
        p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("greatgreatgreatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
    exit(0);
}

# ./ancestors3.d cron read
process: /usr/sbin/cron
parent: /sbin/init
grandparent: sched
process: /usr/sbin/cron
parent: /sbin/init
grandparent: sched
process: /usr/sbin/cron
parent: /sbin/init
grandparent: sched
process: /usr/sbin/cron
parent: /sbin/init
grandparent: sched
^C

Custom Monitoring Tools

The intended use of the vminfo, sysinfo, and io providers is to further investigate potential problems shown by the output of the existing Solaris monitoring tools such as vmstat(1M), sar(1), mpstat(1M), and iostat(1M). The following two examples show that you can also use these providers to create custom versions of the existing monitoring tools. They also show the arithmetic capabilities of the D language.

Example of a Custom Tool Resembling the sar -c Command

The following D script uses the sysinfo provider to implement a tool similar to the sar -c command.

# cat sar-c.d
#!/usr/sbin/dtrace -qs
/*
 * Usage: ./sar-c.d interval count
 */

BEGIN
{
    printf("%10s %10s %10s %10s %10s %10s %10s\n", "scall/s",
        "sread/s", "swrit/s", "fork/s", "exec/s", "rchar/s", "wchar/s");
    rchar = 0;
    wchar = 0;
}

syscall:::entry
{
    ++scall;
}

sysinfo:::sysread
{
    ++sread;
}

sysinfo:::syswrite
{
    ++swrit;
}

sysinfo:::sysfork
{
    ++fork;
}

sysinfo:::sysexec
{
    ++exec;
}

sysinfo:::readch
{
    rchar = rchar + arg0;
}

sysinfo:::writech
{
    wchar = wchar + arg0;
}

tick-1sec
{
    ++i;
}

tick-1sec
/i == $1/
{
    ++n;
    printf("%10d %10d %10d %10d %10d %10d %10d\n", scall/i,
        sread/i, swrit/i, fork/i, exec/i, rchar/i, wchar/i);
    i = 0;
    scall = 0;
    sread = 0;
    swrit = 0;
    fork = 0;
    exec = 0;
    rchar = 0;
    wchar = 0;
}

tick-1sec
/n == $2/
{
    exit(0);
}

# ./sar-c.d 5 6
   scall/s    sread/s    swrit/s     fork/s     exec/s    rchar/s    wchar/s
        43          0          0          0          0          0         15
        70          1          2          0          0          1         32
        42          2          2          0          0          2         17
        75          0          1          0          0        351         39
       436         26         34          3          3       3329        317
        38          0          0          0          0          0         15

Example of a Custom Tool Resembling the vmstat(1M) Command

The following D script uses the vminfo provider to implement a tool similar to the vmstat(1M) command. It displays three fields from the vmstat(1M) command:

● free field – Displays the system’s average value of freemem in kilobytes

● re field – Displays the average page reclaims per second

● sr field – Displays the average page scans per second performed by the page daemon

# cat vm.d
#!/usr/sbin/dtrace -qs
/*
 * Usage: vm.d interval count
 */

BEGIN
{
    printf("%8s %8s %8s\n", "free", "re", "sr");
}

tick-1sec
{
    ++i;
    @free["freemem"] = sum(8 * `freemem);   /* pages to Kbytes, 8-Kbyte pages */
}

vminfo:::pgrec
{
    ++re;
}

vminfo:::scan
{
    ++sr;
}

tick-1sec
/i == $1/
{
    normalize(@free, $1);
    printa("%8@d ", @free);
    printf("%8d %8d\n", re/i, sr/i);
    ++n;
    i = 0;
    re = 0;
    sr = 0;
    clear(@free);
}

tick-1sec
/n == $2/
{
    exit(0);
}

# ./vm.d 5 12
    free       re       sr
  385296        0        0
  385296        0        0
  385296        0        0
  385296        0        0
  316180        2        0
   22297        1    19040
    1976        2    31727
    1964        3    31727
    1971        2    31727
    1968        3    31727
    1964        3    31727
    1955        4    31728

Like the vmstat(1M) command, the vm.d script expects two arguments: the interval value and a count value. The i, re, sr, and n variables are D global scalar variables used for counting. Note the special reference to the kernel’s freemem variable: `freemem. The script multiplies `freemem by 8 because it sums in units of kilobytes, not pages, and the assumption is that a page is 8 Kbytes in size. The script uses the sum() aggregation with the normalize() built-in function, which divides the current sum by the interval value to get per-second averages. The script also clears the running sum of `freemem every interval with the clear() built-in function. The printa() built-in function, which is covered in detail in Appendix A, prints the value of the sum() aggregation.

Because you are using integer-truncated arithmetic, you can lose some data. This is also true when using the vmstat(1M) command. For example, if there are only four page reclaims in the five-second interval, then the average per second shows as 0. The output shows that the system is experiencing sustained scanning of memory by the page daemon, as indicated by the consistently high number of scans per second. It also shows that someone has used most of the free memory within a short period of time, which explains the high scan rates.
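The truncation described above can be reproduced in isolation. The following sketch is not part of vm.d; the variable names reclaims and interval are chosen here for illustration. It divides four events by a five-second interval, first directly and then after scaling the count by 100, which is one possible way to keep two more digits of the average.

```d
#!/usr/sbin/dtrace -qs

BEGIN
{
	reclaims = 4;	/* events counted during the interval */
	interval = 5;	/* interval length in seconds */

	/* Integer division discards the fraction: 4 / 5 is 0. */
	printf("per-second average: %d\n", reclaims / interval);

	/* Scaling first preserves it: 400 / 5 is 80, that is, 0.80. */
	printf("per-second average x 100: %d\n",
	    (reclaims * 100) / interval);

	exit(0);
}
```

The first line reports 0 and the second reports 80, so the scaled value represents an average of 0.80 reclaims per second.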

Module 3

Debugging Applications With DTrace

Objectives

Upon completion of this module, you should be able to:

● Use DTrace to profile an application

● Use DTrace to access application variables

● Use DTrace to find transient system call errors in an application

● Use DTrace to determine the names of files being opened

Relevance

Discussion – The following questions are relevant to understanding how to use DTrace for application debugging:

● Would it be useful to follow the software stack sequentially from the application into the kernel?

● Would it be useful to display path names being passed to system calls while an application is running?

● Would it be useful to know where an application is spending the majority of its time?

Additional Resources

Additional resources – The following references provide additional information on the topics described in this module:

● Sun Microsystems, Inc. Solaris Dynamic Tracing Guide, part number 817-6223-10.

● Cantrill, Bryan M., Michael W. Shapiro, and Adam H. Leventhal. “Dynamic Instrumentation of Production Systems.” Paper presented at 2004 USENIX Conference.

● BigAdmin System Administration Portal [http://www.sun.com/bigadmin/content/dtrace].

● dtrace(1M) manual page in the Solaris 10 OS manual pages, Solaris 10 Reference Manual Collection.

● The /usr/demo/dtrace directory contains all of the sample scripts from the Solaris Dynamic Tracing Guide.

Application Profiling

DTrace provides tools for understanding the behavior of user processes. It can help you to:

● Debug applications

● Analyze application performance problems

● Understand the behavior of a complex application

These tools can be used alone to determine the cause of problems with application program behavior, or as an adjunct to traditional debugging tools such as the mdb(1) debugger.

This module describes the DTrace facilities used to trace user process activity. It also provides examples of how to use those facilities.

The pid Provider

The pid provider can trace the entry and return of any function in a user application. It can also trace any instruction of the running application as specified by its virtual address, which can be given numerically or as a function name plus offset. The pid provider has no probe effect overhead when probes are not enabled.

The pid provider defines a class of providers; any process can have its own associated pid provider. You trace a process with process identification number (PID) 1234, for example, by using the pid1234 provider.

Unlike most other providers, the pid provider creates probes on demand based on the probe descriptions found in your D programs. As a result, you do not see any pid probes listed in the output of the dtrace -l command until you have enabled them. This is shown in the following example:

# dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
proc
profile
sched
sdt
syscall
sysinfo
vminfo
#

Enabling pid Probes

In the following example, you enable all of the function entry probes for the shell:

# echo $$
8586
# dtrace -n 'pid8586:::entry'
dtrace: description 'pid8586:::entry' matched 6653 probes
^C

# dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
pid8586
proc
profile
sched
sdt
syscall
sysinfo
vminfo

Naming pid Probes

The module portion of the probe description refers to an object loaded in the address space of the corresponding process. You can list the objects using the mdb(1) debugger, as shown in the following example:

# mdb -p 8586
Loading modules: [ ld.so.1 libc.so.1 ]
> ::objects
    BASE    LIMIT     SIZE NAME
   10000    a4000    94000 /usr/bin/bash
ff3b0000 ff3da000    2a000 /lib/ld.so.1
ff350000 ff37a000    2a000 /lib/libcurses.so.1
ff320000 ff32c000     c000 /lib/libsocket.so.1
ff200000 ff290000    90000 /lib/libnsl.so.1
ff3a0000 ff3a2000     2000 /lib/libdl.so.1
ff100000 ff1d2000    d2000 /lib/libc.so.1
ff2d0000 ff2d4000     4000 /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
> $q
#

You name the object using only the file name portion, not the complete path name. You can also omit the suffixes. The following names describe the same probe:

pid8586:libc.so.1:strcmp:entry
pid8586:libc.so:strcmp:entry
pid8586:libc:strcmp:entry

For the executable load object, use either the file name of the executable or a.out. The following two probe descriptions name the same probe:

pid8586:bash:main:return
pid8586:a.out:main:return

Tracing Library Functions

The following example shows that executing a simple date(1) command in the bash shell results in 14 strcmp function calls:

# ps -ef | grep bash
    root  8567  8561  0 07:36:26 pts/1    0:00 bash
    root  8577  8571  0 07:37:03 pts/2    0:00 bash
    root  8586  8580  0 07:37:31 pts/3    0:01 bash
    root  8888  8577  0 14:14:25 pts/2    0:00 grep bash
# echo $$
8577
# dtrace -n 'pid8567:libc:strcmp:entry'
dtrace: description 'pid8567:libc:strcmp:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry
  0  45136                     strcmp:entry

Tracing User Functions

The simplest mode of operation for the pid provider is as the user-level analogue to the fbt provider. The following example traces all function entries and returns made from a given function. The tracecalls.d D script takes two command-line arguments: $1 for the PID of the process being traced, and $2 for the function name from which you want to trace all function calls. The simple C program that the script is going to trace is shown below. This C program calls one function after another, performing simple arithmetic operations:

# cat calls.c
#include <stdio.h>
#include <unistd.h>

int f5(int a, int b)
{
    return (a+b);
}

int f4(int a, int b)
{
    int r;

    r = f5(a,b)+13;
    return(r);
}

int f3(int a)
{
    int r;

    usleep(650);
    r = f4(a-3, a+3);
    return(r);
}

int f2(int a)
{
    return(f3(5*a));
}

int f1(int a, int b)
{
    int r;

    usleep(90);
    r = f2(a-b);
    return(r);
}

int main(void)
{
    int x;

    x = f1(13,6);
    printf("%d\n", x);
    x = f1(17,5);
    printf("%d\n", x);
    return(0);
}
# gcc calls.c -o calls
# calls
83
133
# cat tracecalls.d
#!/usr/sbin/dtrace -s

pid$1:calls:$2:entry
{
    self->trace = 1;
}

pid$1:calls:$2:return
/self->trace/
{
    self->trace = 0;
}

pid$1:calls::entry,
pid$1:calls::return
/self->trace/
{
}

You start the calls application in a second window through the mdb(1) debugger. This enables you to stop it as soon as possible in the start-up function that calls the main() function. The _start:b command sets a breakpoint in the _start function where the application starts running. The :r command starts the process running; it immediately hits the breakpoint and stops. You then escape from the debugger by using the !ps command to find the PID of the calls process:

# mdb calls
> _start:b
> :r
mdb: stop at _start
mdb: target stopped at:
_start:         clr     %fp
> !ps
   PID TTY      TIME CMD
  8916 pts/3    0:00 ps
  8914 pts/3    0:00 calls
  8586 pts/3    0:01 bash
  8915 pts/3    0:00 sh
  8580 pts/3    0:00 sh
  8913 pts/3    0:00 mdb

You can now run the dtrace command in the first terminal window to trace the function calls, starting with the f1 function. You must also continue the process with the :c mdb command after starting the dtrace command:

# dtrace -F -s tracecalls.d 8914 f1
dtrace: script 'tracecalls.d' matched 16 probes

In the second terminal window you continue the process:

> :c
83
133
mdb: target has terminated
> $q

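The call sequence is shown in the first (dtrace) terminal window. The return from f1 does not appear because the clause that clears self->trace runs before the empty tracing clause for the same probe:

CPU FUNCTION
  0  -> f1
  0    -> f2
  0      -> f3
  0        -> f4
  0          -> f5
  0          <- f5
  0        <- f4
  0      <- f3
  0    <- f2
  0  -> f1
  0    -> f2
  0      -> f3
  0        -> f4
  0          -> f5
  0          <- f5
  0        <- f4
  0      <- f3
  0    <- f2
^C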

Tracing Function Arguments

By adding a line to the tracecalls.d script, you can print the arguments to the functions as well as return value information. At an entry probe, the arguments to the function are represented with arg0, arg1, arg2, and so on. At a return probe, the function return value is placed in the arg1 argument, and the arg0 argument contains the offset within the function where the return occurred. The following D script example prints the arguments to functions:

# cat tracecalls2.d
#!/usr/sbin/dtrace -s

pid$1:calls:$2:entry
{
    self->trace = 1;
}

pid$1:calls:$2:return
/self->trace/
{
    self->trace = 0;
}

pid$1:calls::entry,
pid$1:calls::return
/self->trace/
{
    printf("%d %d", arg0, arg1);
}

# dtrace -F -s tracecalls2.d 8944 f1
dtrace: script 'tracecalls2.d' matched 16 probes
CPU FUNCTION
  0  -> f1                 13 6
  0    -> f2               7 7
  0      -> f3             35 35
  0        -> f4           32 38
  0          -> f5         32 38
  0          <- f5         40 70
  0        <- f4           56 83
  0      <- f3             68 83
  0    <- f2               52 83
  0  -> f1                 17 5
  0    -> f2               12 12
  0      -> f3             60 60
  0        -> f4           57 63
  0          -> f5         57 63
  0          <- f5         40 120
  0        <- f4           56 133
  0      <- f3             68 133
  0    <- f2               52 133
^C

The following commands are entered in the mdb(1) window which started the calls program. On return from a function, the arg0 argument is the offset within the function where the restore instruction executed to leave the function, and the arg1 argument is the return value, as follows:

> f5+0t40/i
f5+0x28:
f5+0x28:        restore
> f5+0x24,2/i
f5+0x24:
f5+0x24:        ret
f5+0x28:        restore
> f2+0t48,2/i
f2+0x30:
f2+0x30:        ret
f2+0x34:        restore
>

The f5+0t40 address represents 40 decimal bytes into the f5 function, which the trace output shows was placed in the arg0 argument when the f5 function returned. For arg1, the return value from the f5 function on the first return was 70; on the second return it was 120. The f5+0x24,2/i command in the mdb(1) debugger displays two instructions starting at address f5+0x24. Functions typically return by using these two SPARC® instructions. All SPARC instructions are four bytes in length. At address f2+0x34 is another restore instruction.
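The offset arithmetic in this section can be checked with a few lines of C. The following is a minimal sketch rather than part of the course software: the helper names return_site and sparc_insn_index are hypothetical, and the only facts assumed are the ones stated above (at a return probe, arg0 is a decimal byte offset into the function, and every SPARC instruction is four bytes long).

```c
#include <stdio.h>
#include <stddef.h>

/* Size in bytes of one SPARC instruction, as stated in the text. */
#define SPARC_INSN_BYTES 4

/*
 * Format a return-probe arg0 value (a byte offset) the way mdb prints
 * an address: function name plus hexadecimal offset, such as "f5+0x28".
 * Hypothetical helper; returns whatever snprintf returns.
 */
int
return_site(char *buf, size_t len, const char *func, unsigned long offset)
{
    return (snprintf(buf, len, "%s+0x%lx", func, offset));
}

/* Zero-based index of the instruction that starts at the given offset. */
unsigned long
sparc_insn_index(unsigned long offset)
{
    return (offset / SPARC_INSN_BYTES);
}
```

Calling return_site with the name f5 and the offset 40 produces f5+0x28, the address of the restore instruction that mdb displayed. The offset 52 reported for f2 becomes f2+0x34 in the same way.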

Tracing Calls Into the Kernel

In the following example you trace a simpler version of the calls program into the kernel:

# cat calls2.c
#include <stdio.h>

int f5(int a, int b)
{
    return (a+b);
}

int f4(int a, int b)
{
    int r;

    r = f5(a,b)+13;
    return(r);
}

int f3(int a)
{
    int r;

    r = f4(a-3, a+3);
    return(r);
}

int f2(int a)
{
    return(f3(5*a));
}

int f1(int a, int b)
{
    int r;

    r = f2(a-b);
    return(r);
}

int main(void)
{
    int x;

    x = f1(13,6);
    printf("%d\n", x);
    return(0);
}
# cat traceall.d
#!/usr/sbin/dtrace -qs
#pragma D option flowindent

pid$1::$2:entry
{
    self->trace = 1;
}

pid$1:::entry, pid$1:::return, fbt:::
/self->trace/
{
    printf("%s\n", curlwpsinfo->pr_syscall ? "K" : "U");
}

pid$1::$2:return
/self->trace/
{
    self->trace = 0;
}

The traceall.d D script uses a #pragma statement to set the equivalent of the -F option of the dtrace(1M) command to indent the function calls. The pr_syscall field of the lwp information data structure to which the curlwpsinfo built-in variable points is 0 when the thread is not in the kernel; otherwise it is the number of the system call that the thread is executing. You use this to indicate whether you are tracing user code or kernel code.

The traced calls follow. Many of the function calls are for setting up the dynamic binding to the library functions on first call. The following example shows a portion of the output of this script:

# traceall.d 12861 main
CPU FUNCTION
  0  -> main  U
  0    -> f1  U
  0      -> f2  U
  0        -> f3  U
  0          -> f4  U
  0            -> f5  U
  0            <- f5  U
  0          <- f4  U
  0        <- f3  U
  0      <- f2  U
  0    <- f1  U
  0    -> elf_rtbndr  U
  0      -> elf_bndr  U
  0        -> enter  U
  0          -> rt_bind_guard  U
  0          <- rt_bind_guard  U
  0          -> _ti_bind_guard  U
  0          <- _ti_bind_guard  U
  0          -> rt_mutex_lock  U
  0          <- rt_mutex_lock  U
  0          -> _lwp_mutex_lock  U
  0          <- _lwp_mutex_lock  U
  0        <- enter  U
  0        -> lookup_sym  U
  0          -> elf_hash  U
  0          <- elf_hash  U
  0          -> callable  U
  0          <- callable  U
  0          -> elf_find_sym  U
  0            -> strcmp  U
...
  0      <- elf_bndr  U
  0    <- elf_rtbndr  U
  0    -> printf  U
  0      -> _flockget  U
  0        -> mutex_lock  U
  0        <- mutex_lock  U
  0        -> mutex_lock_impl  U
  0        <- mutex_lock_impl  U
  0      <- _flockget  U
  0      -> _setorientation  U
  0      <- _setorientation  U
  0      -> _ndoprnt  U
  0        -> elf_rtbndr  U
  0          -> elf_bndr  U
  0            -> enter  U
  0              -> rt_bind_guard  U
...
  0          -> _write  U
  0            -> pre_syscall  K
  0              -> syscall_mstate  K
  0              <- syscall_mstate  K
  0            <- pre_syscall  K
  0            -> write32  K
  0            <- write32  K
  0            -> write  K
  0              -> getf  K
  0                -> set_active_fd  K
...
  0                <- clear_active_fd  K
  0                -> cv_broadcast  K
  0                <- cv_broadcast  K
  0              <- releasef  K
  0            <- write  K
  0            -> post_syscall  K
  0              -> clear_stale_fd  U
  0              <- clear_stale_fd  U
  0              -> syscall_mstate  U
  0              <- syscall_mstate  U
  0            <- post_syscall  U
  0        <- _xflsbuf  U
  0        -> ferror_unlocked  U
  0        <- ferror_unlocked  U
  0      <- _ndoprnt  U
  0      -> ferror_unlocked  U
  0      <- ferror_unlocked  U
  0      -> mutex_unlock  U
  0      <- mutex_unlock  U
  0    <- printf  U
  0  <- main  U
^C

Tracing Arbitrary Instructions

You can use the pid provider to trace any instruction in any user function. Upon demand, the pid provider creates a probe for every instruction in a function. The name of each probe is the offset in hexadecimal of the corresponding instruction in the function. The following example traces the instruction 10 (hexadecimal) bytes into the strcmp function while the bash shell runs the date(1) command:

# dtrace -n 'pid28845:libc:strcmp:10'
dtrace: description 'pid28845:libc:strcmp:10' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
  0  39492                        strcmp:10
^C

You see this instruction near the beginning of the strcmp C library function, where it is called 14 times when the bash shell runs the date(1) command. You can see which instructions within the strcmp C library function are executed by tracing all of the function's instructions, as follows:

# dtrace -n 'pid28845:libc:strcmp:'
dtrace: description 'pid28845:libc:strcmp:' matched 128 probes
CPU     ID                    FUNCTION:NAME
  0  39494                     strcmp:entry
  0  39495                         strcmp:0
  0  39496                         strcmp:4
  0  39497                         strcmp:8
  0  39498                         strcmp:c
  0  39492                        strcmp:10
  0  39499                        strcmp:14
  0  39500                        strcmp:18
  0  39511                        strcmp:44
  0  39512                        strcmp:48
  0  39513                        strcmp:4c
  0  39582                       strcmp:160
  0  39583                       strcmp:164
  0  39584                       strcmp:168
  0  39585                       strcmp:16c
  0  39586                       strcmp:170
  0  39587                       strcmp:174
  0  39588                       strcmp:178
  0  39589                       strcmp:17c
  0  39597                       strcmp:19c
  0  39598                       strcmp:1a0
  0  39599                       strcmp:1a4
  0  39600                       strcmp:1a8
  0  39601                       strcmp:1ac
  0  39602                       strcmp:1b0
  0  39603                       strcmp:1b4
  0  39604                       strcmp:1b8
  0  39605                       strcmp:1bc
  0  39606                       strcmp:1c0
  0  39607                       strcmp:1c4
  0  39618                       strcmp:1f0
  0  39619                       strcmp:1f4
  0  39493                    strcmp:return
  0  39494                     strcmp:entry
  0  39495                         strcmp:0
  0  39496                         strcmp:4

The previous output shows the strcmp function executing each instruction sequentially through strcmp+0x18, after which control transfers to strcmp+0x44. You can display some of the assembly instructions using the mdb(1) debugger:

# mdb -p 8567
Loading modules: [ ld.so.1 libc.so.1 ]
> libc`strcmp,14/ai
libc.so.1`strcmp:
libc.so.1`strcmp:       subcc   %o0, %o1, %o2
libc.so.1`strcmp+4:     be      +0xac   <libc.so.1`strcmp+0xb0>
libc.so.1`strcmp+8:     sethi   %hi(0x1010000), %o5
libc.so.1`strcmp+0xc:   andcc   %o0, 3, %o3
libc.so.1`strcmp+0x10:  or      %o5, 0x101, %o5
libc.so.1`strcmp+0x14:  be      +0x30   <libc.so.1`strcmp+0x44>
libc.so.1`strcmp+0x18:  sll     %o5, 7, %o4
libc.so.1`strcmp+0x1c:  sub     %o3, 4, %o3
libc.so.1`strcmp+0x20:  ldub    [%o1 + %o2], %o0
libc.so.1`strcmp+0x24:  ldub    [%o1], %g1
libc.so.1`strcmp+0x28:  subcc   %o0, %g1, %o0
libc.so.1`strcmp+0x2c:  bne     +0x1c4  <libc.so.1`strcmp+0x1f0>
libc.so.1`strcmp+0x30:  addcc   %o0, %g1, %g0
libc.so.1`strcmp+0x34:  be      +0x1bc  <libc.so.1`strcmp+0x1f0>
libc.so.1`strcmp+0x38:  addcc   %o3, 1, %o3
libc.so.1`strcmp+0x3c:  bne     -0x1c   <libc.so.1`strcmp+0x20>
libc.so.1`strcmp+0x40:  add     %o1, 1, %o1
libc.so.1`strcmp+0x44:  andcc   %o1, 3, %o3
libc.so.1`strcmp+0x48:  be      +0x118  <libc.so.1`strcmp+0x160>
libc.so.1`strcmp+0x4c:  cmp     %o3, 2

The instruction at the strcmp+0x18 address is a shift left logical (sll), which is in the delay slot after the conditional branch instruction (be) at strcmp+0x14. The delay-slot instruction executes before the one at address strcmp+0x44 even when the branch is taken, which in this execution it was. Another conditional branch was taken at address strcmp+0x48.

DTrace enables you to trace, instruction by instruction, the actual execution flow through the logic of a program. This is an improvement over the traditional debugging techniques of inserting print statements in your application or of running the application under a debugger and setting breakpoints where appropriate.
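The probe naming rule described in this section, and the count of 128 matched probes, can be reproduced with a short C sketch. The helper names insn_probe_name and pid_probe_count are hypothetical. The sketch assumes that strcmp+0x1f4, the last offset traced before the return probe, is the final instruction of the function, and that every instruction is four bytes long.

```c
#include <stdio.h>
#include <stddef.h>

/* Size in bytes of one SPARC instruction. */
#define SPARC_INSN_BYTES 4

/*
 * Write the probe name for the instruction at the given byte offset:
 * the offset in hexadecimal with no "0x" prefix, so 16 becomes "10".
 */
int
insn_probe_name(char *buf, size_t len, unsigned long offset)
{
    return (snprintf(buf, len, "%lx", offset));
}

/*
 * Number of pid probes for a function whose last instruction starts at
 * last_offset: one probe per instruction, plus entry and return.
 */
unsigned long
pid_probe_count(unsigned long last_offset)
{
    unsigned long instructions = last_offset / SPARC_INSN_BYTES + 1;

    return (instructions + 2);
}
```

Under those assumptions, pid_probe_count(0x1f4) gives 126 instruction probes plus the entry and return probes, which agrees with the 128 probes that dtrace reported for the pid28845:libc:strcmp: description.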

Determining Time Spent in Functions

Using an associative array and the quantize aggregation built-in function, you can determine the amount of time spent in every function of an application. The following D script displays a power-of-two distribution of how much time (in nanoseconds) is spent in every function of the calls application. A clause-local variable is used to calculate the elapsed time:

# cat timespent.d
#!/usr/sbin/dtrace -qs

pid$1:::entry
{
    self->t[probefunc] = timestamp;
}

pid$1:::return
/self->t[probefunc]/
{
    this->elapsed = timestamp - self->t[probefunc];
    @[probefunc] = quantize(this->elapsed);
    self->t[probefunc] = 0; /* frees memory */
}

# ./timespent.d 8950
^C
...
  usleep
           value  ------------- Distribution ------------- count
         1048576 |                                         0
         2097152 |@@@@@@@@@@                               1
         4194304 |@@@@@@@@@@                               1
         8388608 |@@@@@@@@@@@@@@@@@@@@                     2
        16777216 |                                         0
...
  f4
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@                     1
           65536 |@@@@@@@@@@@@@@@@@@@@                     1
          131072 |                                         0
...
  f1
           value  ------------- Distribution ------------- count
         4194304 |                                         0
         8388608 |@@@@@@@@@@@@@@@@@@@@                     1
        16777216 |@@@@@@@@@@@@@@@@@@@@                     1
        33554432 |                                         0
...
  main
           value  ------------- Distribution ------------- count
        16777216 |                                         0
        33554432 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
        67108864 |                                         0
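To read a quantize distribution, it helps to know which row a given value lands in. Each row is labeled with a power of two and counts the values that are at least that large but smaller than the next row. The following C function is a minimal sketch of that rule for positive values; the name quantize_bucket is hypothetical, and the function is not DTrace's own implementation.

```c
#include <stdint.h>

/*
 * Lower bound of the power-of-two bucket that holds v: the largest
 * power of two that is less than or equal to v. Returns 0 for v == 0.
 * Negative values are outside the scope of this sketch.
 */
uint64_t
quantize_bucket(uint64_t v)
{
    uint64_t bucket = 1;

    if (v == 0)
        return (0);
    /* Comparing against v / 2 avoids overflow when doubling. */
    while (bucket <= v / 2)
        bucket *= 2;
    return (bucket);
}
```

For example, a call to f4 that takes 40000 nanoseconds is counted in the 32768 row, and a usleep call that takes 2.5 milliseconds (2500000 nanoseconds) is counted in the 2097152 row.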

The profile Provider

The profile provider provides unanchored probes: probes that are not associated with any particular point of execution. When you specify these probes, you leave off both the module and the function portion of the probe description. Instead of being tied to a specific program location, the profile probes are associated with an asynchronous, time-based interrupt that fires at a fixed, specified time interval. You can use these probes to sample an aspect of system state at the specified interval. For example, you can sample the state of the current thread, the state of a central processing unit (CPU), or the current machine instruction. You can then use the samples to infer system behavior.

Using the profile-n Probes

A profile-n probe fires every fixed interval on every CPU at high interrupt level. These probes are used to profile the execution of an application because you do not know what CPU it may be running on at any instant in time. By default, the profile-n probes fire n times per second. You can add the following suffixes to give n as a time interval instead: ns for nanoseconds, us for microseconds, ms for milliseconds, s for seconds, m for minutes, h for hours, or d for days. For example, the following probes fire at the same rate:

● profile-200 – Fires 200 times per second on every CPU

● profile-5ms – Fires every 5 milliseconds on every CPU

● profile-5000us – Fires every 5000 microseconds on every CPU

The following probes fire once per day:

● profile-1d

● profile-24h

The following command should output numbers that increase by approximately one million (nanoseconds):

# dtrace -q -n 'profile-1ms {printf("%d\n", timestamp)}'
274817618640560
274817619628282
274817620626998
274817621624780
274817622624686
^C

Currently you cannot specify a time interval less than 200 microseconds with the profile provider, as the following example shows:

# dtrace -q -n 'profile-199us {printf("%d\n", timestamp)}'
dtrace: invalid probe specifier profile-199us {printf("%d\n",timestamp)}: probe description :::profile-199us does not match any probes
# dtrace -q -n 'profile-200us {printf("%d\n", timestamp)}'
275328143837997
275328144030602
275328144229696
275328144431022
^C
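The relationship between the n part of a probe name and the firing interval can be worked through in C. The following is a minimal sketch, not DTrace's own parser: the function names probe_interval_ns and interval_allowed are hypothetical, and only the unit suffixes used in this module are handled (including the sec form that appears in the tick-10sec probe).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NANOSEC 1000000000ULL

/*
 * Convert the n part of a profile-n or tick-n probe name (for example
 * "200", "5ms", or "24h") to the firing interval in nanoseconds.
 * Returns 0 if the text is not understood.
 */
uint64_t
probe_interval_ns(const char *n)
{
    static const struct {
        const char *suffix;
        uint64_t ns;
    } units[] = {
        { "ns", 1ULL },
        { "us", 1000ULL },
        { "ms", 1000000ULL },
        { "s", NANOSEC },
        { "sec", NANOSEC },
        { "m", 60ULL * NANOSEC },
        { "h", 3600ULL * NANOSEC },
        { "d", 86400ULL * NANOSEC },
    };
    char *end;
    unsigned long long value;
    size_t i;

    if (!isdigit((unsigned char)n[0]))
        return (0);
    value = strtoull(n, &end, 10);
    if (value == 0)
        return (0);
    if (*end == '\0')
        return (NANOSEC / value);   /* bare n: n firings per second */
    for (i = 0; i < sizeof (units) / sizeof (units[0]); i++) {
        if (strcmp(end, units[i].suffix) == 0)
            return (value * units[i].ns);
    }
    return (0);
}

/* The 200-microsecond lower limit described in the text. */
int
interval_allowed(uint64_t ns)
{
    return (ns >= 200000ULL);
}
```

With this sketch, "200", "5ms", and "5000us" all yield 5000000 nanoseconds, "1d" and "24h" yield the same value, and the interval for "199us" falls below the limit that the dtrace command enforced in the example above.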

Sampling Process Activity

The following D script samples 109 times per second to see which processes are running. The count indicates which processes have run the most often during the interval that the script runs:

# cat running.d
#!/usr/sbin/dtrace -qs

profile-109
/pid != 0/
{
    @[pid, execname] = count();
}

END
{
    printf("%-8s %-40s %s\n", "PID", "CMD", "COUNT");
    printa("%-8d %-40s %@d\n", @);
}
# ./running.d
^C
PID      CMD                                      COUNT
9190     grep                                     1
9191     bash                                     1
9190     bash                                     1
9189     bash                                     1
9188     uptime                                   2
8586     bash                                     2
9191     vi                                       12
3        fsflush                                  24
9192     find                                     80

You can use the profile-n probes to sample information about a specific process. The following script samples, slightly more often than every 5 milliseconds, the priority of the shell thread while it is running in an infinite loop:

# echo $$
8586
# while : ; do : ; done

In another window, run the following D script:

# cat profilepri.d
#!/usr/sbin/dtrace -qs
profile-211
/pid == $1/
{
    @[execname] = lquantize(curlwpsinfo->pr_pri, 0, 100, 10);
}

# ./profilepri.d 8586
^C

  bash
           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@                 271
              10 |@@@@@@                                   63
              20 |@@@@                                     48
              30 |@@@                                      32
              40 |@                                        15
              50 |@@                                       24
              60 |                                         0

In the previous example, the curlwpsinfo built-in variable points to a structure containing lwp information. This structure is described in the proc(4) manual page. The output shows the Solaris timesharing scheduler's bias towards priority zero for compute-bound threads. The high counts indicate that this thread is running more frequently than other threads on the system.

In the following example, you see the results of running the next invocation of the script when the shell is running in its more normal mode of executing a few interactive commands:

# ./profilepri.d 8586
^C

  bash
           value  ------------- Distribution ------------- count
              30 |                                         0
              40 |@@@@@@@@@@@@@                            1
              50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              2
              60 |                                         0

This shows that the shell's priority is higher when run interactively, where it spends most of its time waiting on input; the small counts indicate that it was not running frequently.
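The rows in these distributions come from the lquantize(curlwpsinfo->pr_pri, 0, 100, 10) call, which groups priorities into linear buckets ten wide. The following C function is a minimal sketch of how a value maps to a row. The name lquantize_bucket and the sentinel return values for the two overflow rows are assumptions made for this illustration; they are not part of DTrace.

```c
/*
 * Lower bound of the linear bucket for value v, given the base, limit,
 * and step arguments passed to lquantize(). Two sentinel results stand
 * for the overflow rows: base - 1 for "< base" and limit for ">= limit".
 * The step argument must be greater than zero.
 */
long
lquantize_bucket(long v, long base, long limit, long step)
{
    if (v < base)
        return (base - 1);
    if (v >= limit)
        return (limit);
    return (base + ((v - base) / step) * step);
}
```

With the arguments used in profilepri.d, a priority of 59 is counted in the 50 row and priorities 0 through 9 are counted in the 0 row.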

Using the tick-n Probes

Like profile-n probes, tick-n probes fire every fixed interval at high interrupt level. However, the tick-n probes fire only on one CPU per interval, rather than on every CPU like the profile-n probes. These probes should not be used to profile an application because it may run on any CPU at any instant in time. You specify the n suffix just as you do for the profile-n probes. For example, tick-20ms fires every 20 milliseconds, but only on one CPU. One use of the tick-n probes is to provide periodic output or to take periodic action. You saw this usage in Module 2 with the custom monitoring tools.

Using Arguments to the profile Provider

You can use the arguments to the profile probes to determine if the executing thread is currently in kernel mode and, if it is not, where within its process address space it is executing when the probe fires. The program counter (PC) register's value is made available when the profile probes fire. The arguments are set as follows:

● The arg0 argument – The PC register value in the kernel at the time the probe fired, or 0 if the current thread was not executing in the kernel at the time that the probe fired

● The arg1 argument – The PC register value in the user-level process at the time the probe fired, or 0 if the current thread was executing in the kernel at the time the probe fired
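The way the two arguments separate kernel samples from user samples can be sketched in C. The struct and function names below are hypothetical and exist only for this illustration. The one rule taken from the list above is that a nonzero arg0 means the thread was executing in the kernel when the probe fired.

```c
#include <stddef.h>
#include <stdint.h>

/* One profile sample: the two probe arguments described above. */
struct sample {
    uint64_t arg0;  /* kernel PC, or 0 if the thread was in user mode */
    uint64_t arg1;  /* user PC, or 0 if the thread was in the kernel */
};

struct mode_counts {
    unsigned long kernel;
    unsigned long user;
};

/* Tally samples by mode: a nonzero arg0 marks a kernel-mode sample. */
struct mode_counts
tally_modes(const struct sample *s, size_t n)
{
    struct mode_counts c = { 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        if (s[i].arg0 != 0)
            c.kernel++;
        else
            c.user++;
    }
    return (c);
}
```

A D script does the same classification with a conditional expression on arg0, typically as the key of a count() aggregation.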

Profiling an Application Using the profile Provider

You can learn whether your application is executing within its own process address space or within the kernel space by using the arg0 and arg1 arguments, which are set when the profile probes fire. The following D script samples the PC slightly faster than every millisecond. The script runs for 10 seconds on a compute-bound application. It also shows how many time intervals, out of the total that occurred in 10 seconds, the application used:

# cat profile.d
#!/usr/sbin/dtrace -qs

profile-1009
{
    ++t;
}

profile-1009
/pid == $1/
{
    @pc[arg1] = count();
    @mode[arg0 ? "kernel" : "user"] = count();
    ++n;
}

tick-10sec
/n/
{
    printa("%-10x\t%@u\n", @pc);
    printf("Total: %u out of %u\n", n, t);
    exit(0);
}
# ./profile.d 9240
ff3163ac        1
0               5
107f8           60
10810           60
10710           64
1084c           64
10754           65
10734           66
10824           69
1083c           69
10738           69
1081c           71
10820           73
106f4           75
10730           75
10728           76
10744           77
10814           77
1074c           79
1074c           79
106e4           79
106d8           79
1075c           80
10770           80
10828           80
1072c           82
10760           83
106f0           86
10758           86
106dc           87
106d4           88
106d0           92
ff2a11e8        132
10764           134
20ac8           137
20acc           141
ff2a11ec        142
10840           144
20ac4           147
10834           172
106cc           306
106e0           562
10714           611
107fc           623
ff2a11e4        716
ff2a11e0        3723
Total: 9887 out of 10002

  kernel                5
  user               9882

In the previous example, the high count in user mode versus kernel mode indicates that this process is compute-bound. By using the mdb(1) debugger as shown in the following example, you can tell where the process is spending most of its time:

> ff2a11e0/i
libc.so.1`.umul:
libc.so.1`.umul:        umul    %o0, %o1, %o0
> ff2a11e4/i
libc.so.1`.umul+4:      rd      %y, %o1
> 107fc/i
mod+0x34:       cmp     %o0, %o1
> 10714/i
prod+0x1c:      cmp     %o0, %o1
> 106e0/i
sum+0x14:       add     %o0, %o1, %o0

This output shows that this process spent most of its time in the C library multiply function, .umul. It spent most of the remaining time in its own mod, prod, and sum functions. The programmer should investigate compiler options to have the multiplication occur with hardware instructions instead of in software. This program was compiled with the gcc compiler with no optimizations.

Determining Time Spent in Functions

You can use the timespent2.d D script to obtain a graph of the time spent in each function of this process. A special macro, $target, is set to the process ID of the application that is started for you with the -c option to the dtrace(1M) command. The command after the -c must be quoted if it contains arguments:

# cat timespent2.d
#!/usr/sbin/dtrace -qs

pid$target:::entry
{
    self->t[probefunc] = timestamp;
}

pid$target:::return
/self->t[probefunc]/
{
    this->elapsed = timestamp - self->t[probefunc];
    @[probefunc] = quantize(this->elapsed);
    self->t[probefunc] = 0; /* frees memory */
}
# dtrace -s timespent2.d -c ./pgm
dtrace: script 'timespent2.d' matched 5836 probes
^C
...
  .rem
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15
           16384 |                                         0

  memchr
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@                            5
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             11
           32768 |                                         0

  .div
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@                                  5
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         24
           16384 |@                                        1
           32768 |                                         0

  mutex_lock
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
           32768 |                                         0
...
  sum
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13986319
           16384 |                                         15890
           32768 |                                         419
           65536 |                                         14174
          131072 |                                         426
          262144 |                                         282
          524288 |                                         59
         1048576 |                                         57
         2097152 |                                         24
...
  prod
           value  ------------- Distribution ------------- count
     17179869184 |                                         0
     34359738368 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    14
     68719476736 |@@@                                      1
    137438953472 |                                         0
...
  .umul
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 27699230
           16384 |                                         37144
           32768 |                                         943
           65536 |                                         30290
          131072 |                                         864
          262144 |                                         579
          524288 |                                         111
         1048576 |                                         157
         2097152 |                                         74
         4194304 |                                         3

This output shows that most calls to the sum and .umul functions take only 8 to 16 microseconds each, but these functions are called significantly more often than the other functions. Of the 15 calls to the prod function, 14 took between 34 and 68 seconds each and the remaining call took between 68 and 137 seconds.
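The microsecond and second figures above come from converting the nanosecond row labels of the quantize output. The following C helpers show the arithmetic; their names are hypothetical, and they truncate rather than round.

```c
#include <stdint.h>

/* Convert nanoseconds to whole microseconds (truncating). */
uint64_t
ns_to_us(uint64_t ns)
{
    return (ns / 1000ULL);
}

/* Convert nanoseconds to whole seconds (truncating). */
uint64_t
ns_to_sec(uint64_t ns)
{
    return (ns / 1000000000ULL);
}
```

The 8192 and 16384 rows correspond to 8 and 16 microseconds, and the 34359738368, 68719476736, and 137438953472 rows correspond to 34, 68, and 137 seconds.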

Finally, the following command builds a table of which functions of an application are called the most frequently:

# dtrace -n 'pid$target:::entry {@[probefunc] = count()}' -c ./pgm
dtrace: description 'pid$target:::entry ' matched 2931 probes
^C
...
  main                       1
  hdl_create                 1
  elf_entry_pt               1
  unused                     1
  rtld_db_postinit           1
  call_init                  1
  munmap                     1
...
  printf                     3
  .rem                       3
  mod                        3
  free                       4
  prod                       4
  defrag                     4
  strncpy                    5
  plt_full_range             5
  strlen                     5
...
  strcmp                    39
  rt_bind_clear             42
  sum                  3549598
  .umul                6249598

Application Variables

Accessing process address space information is more difficult than accessing kernel information because DTrace actions run in the kernel. Therefore, to access process data such as application variables or system call argument strings (for example, path names), you must copy the information from the process address space to the kernel. DTrace provides two built-in functions to accomplish this:

● void *copyin(uintptr_t addr, size_t size)

The copyin function copies the specified size in bytes from the specified user address into a DTrace scratch buffer and returns the address of this buffer. The user address is interpreted as being within the address space of the process associated with the currently running thread when the probe fires.

● string copyinstr(uintptr_t addr)

The copyinstr function copies a null-terminated C string from the specified user address into a scratch buffer and returns its address.

Displaying Process Global Variables

The following example shows how to display global variables from an application when a probe fires. Two global variables have been added to the calls.c C program you saw previously:

# cat calls3.c
int y = 15;
int z = 8;

int f5(int a, int b)
{
    ++z;
    return (a+b);
}

int f4(int a, int b)
{
    int r;

    r = f5(a,b)+13;
    y = z+r;
    return(r);
}

int f3(int a)
{
    int r;

    usleep(650);
    r = f4(a-3, a+3);
    z = r*y;
    return(r);
}

int f2(int a)
{
    return(f3(5*a));
}

int f1(int a, int b)
{
    int r;

    usleep(90);
    r = f2(a-b);
    y = z*r;
    return(r);
}

main()
{
    int x;

    x = f1(13,6);
    printf("x=%d y=%d z=%d\n", x, y, z);
    x = f1(17,5);
    printf("x=%d y=%d z=%d\n", x, y, z);
}
# calls3
x=83 y=633788 z=7636
x=133 y=137443530 z=1033410

The following D script is passed three arguments:

● $1 – The virtual address of a global variable

● $2 – The global variable’s size

● $$3 – The name of the variable

You have dtrace(1M) start the process by using the -c option. dtrace(1M) sets the $target macro to the process PID. The script displays the value of a global variable on entry to and return from every function in the program, starting when the main function is entered.

# cat uservariables.d
#!/usr/sbin/dtrace -qs

pid$target:a.out:main:entry
{
    started = 1;
}

pid$target:a.out::entry
/started/
{
    v = (int *)copyin($1, $2);
    printf("On entry to %s: %s=%d\n", probefunc, $$3, *v);
}

pid$target:a.out::return
/started/
{
    v = (int *)copyin($1, $2);
    printf("On return from %s: %s=%d\n", probefunc, $$3, *v);
}

pid$target:a.out:main:return
{
    exit(0);
}

The (int *) in front of the copyin function is called a cast, which is a feature taken from the C language. A cast converts one data type into another data type. In this case, the data type is converted from void *, which is the type of the buffer address into which the variable is copied, to an integer pointer, because you are copying in an integer. You use a * in front of the v variable in the printf statements to dereference the pointer and obtain the integer it points to.
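The copy-then-cast idea can be modeled outside DTrace. The following Python sketch treats process memory as a byte string, copies four bytes from a given address, and interprets them as a signed integer, which is roughly what *(int *)copyin(addr, 4) does. The memory snapshot, the base address handling, and the big-endian byte order are assumptions made for illustration; the function names only mirror the D built-ins and are not the DTrace implementation.

```python
def copyin(memory, base, addr, size):
    """Model of copyin(): return size raw bytes found at addr.

    memory is a bytes snapshot whose first byte lives at address base.
    """
    start = addr - base
    return bytes(memory[start:start + size])


def deref_int(buf, byteorder="big"):
    """Model of *(int *)buf: read the first four bytes as a signed int."""
    return int.from_bytes(buf[:4], byteorder=byteorder, signed=True)


# Hypothetical snapshot: y (15) followed by z (8), four bytes each.
BASE = 133948
memory = (15).to_bytes(4, "big") + (8).to_bytes(4, "big")

raw = copyin(memory, BASE, 133952, 4)   # untyped bytes, like void *
print(deref_int(raw))                   # typed view of the same bytes
```

Until the bytes are given a type, they are only an untyped buffer; the cast supplies the interpretation.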

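The nm(1) command is used to display the symbol table entry for the z variable in the calls3 executable file. The second and third fields of the entry are the address and size of the variable, which are the values you pass to the script.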

# /usr/ccs/bin/nm calls3 | grep '|z$'
[70]    |    133952|       4|OBJT |GLOB |0    |16     |z
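If you prefer not to read the address and size off the nm(1) output by hand, a small helper can extract them. The following Python sketch assumes the pipe-separated layout shown above, with the value and size in decimal as the second and third fields; the parse_nm_line name is hypothetical.

```python
def parse_nm_line(line):
    """Extract address, size, and name from one pipe-separated nm line."""
    fields = [field.strip() for field in line.split("|")]
    # Assumed field order: [index], value, size, type, bind, other, shndx, name
    return {
        "address": int(fields[1]),
        "size": int(fields[2]),
        "name": fields[7],
    }


line = "[70]    |    133952|       4|OBJT |GLOB |0    |16     |z"
sym = parse_nm_line(line)
print(f"dtrace -qs uservariables.d -c calls3 "
      f"{sym['address']} {sym['size']} {sym['name']}")
```

The printed line is the command used in the next step of this example.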

# dtrace -qs uservariables.d -c calls3 133952 4 z
x=83 y=633788 z=7636
x=133 y=137443530 z=1033410
On entry to main: z=8
On entry to f1: z=8
On entry to f2: z=8
On entry to f3: z=8
On entry to f4: z=8
On entry to f5: z=8
On return from f5: z=9
On return from f4: z=9
On return from f3: z=7636
On return from f2: z=7636
On return from f1: z=7636
On entry to f1: z=7636
On entry to f2: z=7636
On entry to f3: z=7636
On entry to f4: z=7636
On entry to f5: z=7636
On return from f5: z=7637
On return from f4: z=7637
On return from f3: z=1033410
On return from f2: z=1033410
On return from f1: z=1033410
On return from main: z=1033410

You can easily display the y variable, as follows:

# /usr/ccs/bin/nm calls3 | grep '|y$'
[67]    |    133948|       4|OBJT |GLOB |0    |16     |y
# dtrace -qs uservariables.d -c calls3 133948 4 y
x=83 y=633788 z=7636
x=133 y=137443530 z=1033410
On entry to main: y=15
On entry to f1: y=15
On entry to f2: y=15
On entry to f3: y=15
On entry to f4: y=15
On entry to f5: y=15
On return from f5: y=15
On return from f4: y=92
On return from f3: y=92
On return from f2: y=92
On return from f1: y=633788
On entry to f1: y=633788
On entry to f2: y=633788
On entry to f3: y=633788
On entry to f4: y=633788
On entry to f5: y=633788
On return from f5: y=633788
On return from f4: y=7770
On return from f3: y=7770
On return from f2: y=7770
On return from f1: y=137443530
On return from main: y=137443530

Displaying Library Global Variables

The following example displays various errno variables from libraries linked with the bash shell every 211 milliseconds. Run an infinite loop of cd commands that fail in the bash shell. The assumption is that errno should be set to 2 (No such file or directory) by the bash shell when the cd commands fail:

# cat libvars.d
#!/usr/sbin/dtrace -qs

tick-211ms
/pid == $1/
{
    v = (int *)copyin($2, $3);
    printf("The value of %s=%d\n", $$4, *v);
}
# ps -ef | grep bash
    root  9593  9587  0 15:35:27 pts/2    0:00 bash
    root  9583  9577  0 15:35:04 pts/1    0:00 bash
# echo $$
9593
# mdb -p 9583
Loading modules: [ ld.so.1 libc.so.1 ]
> ::objects
    BASE    LIMIT     SIZE NAME
   10000    b2000    a2000 /usr/bin/bash
ff3b0000 ff3dc000    2c000 /lib/ld.so.1
ff350000 ff37a000    2a000 /lib/libcurses.so.1
ff320000 ff32c000     c000 /lib/libsocket.so.1
ff200000 ff292000    92000 /lib/libnsl.so.1
ff3a0000 ff3a2000     2000 /lib/libdl.so.1
ff100000 ff1d4000    d4000 /lib/libc.so.1
ff2d0000 ff2d4000     4000 /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
> ::nm ! grep '|errno$'
0xff3ee670|0x00000004|OBJT |LOCL |0x2 |21 |errno
0xff1ec03c|0x00000004|OBJT |GLOB |0x0 |21 |errno
> $q
# ./libvars.d 9583 0xff3ee670 4 errno
The value of errno=2
The value of errno=2
The value of errno=2
The value of errno=2
The value of errno=2
^C
# ./libvars.d 9583 0xff1ec03c 4 errno
The value of errno=0
The value of errno=0
The value of errno=0
The value of errno=0
The value of errno=0
^C

The libvars.d D script was run while the bash shell performed the following loop:

# while :; do cd /fubar; done
bash: cd: /fubar: No such file or directory
bash: cd: /fubar: No such file or directory
bash: cd: /fubar: No such file or directory
bash: cd: /fubar: No such file or directory

This shows that the first errno at address 0xff3ee670 is the one set as a result of the cd command failing in the bash shell. The “No such file or directory” error message corresponds to an errno value of 2.

The plockstat Provider

The plockstat provider gives you details about user-level locking events. It is used similarly to the pid provider when identifying the process to be traced. For example, plockstat1234 would trace user-level lock events for the process with PID 1234. The three types of lock events are hold events, contention events, and error events. Hold events occur when a lock is acquired or released; contention events occur when the application thread must wait for a lock; error events are any errors detected when using the locks. The following example shows how to monitor all lock events for a particular process:

# pgrep sendmail
1196
# dtrace -n 'plockstat1196::: {trace(timestamp)}'
dtrace: description 'plockstat1196::: ' matched 39 probes
CPU     ID                    FUNCTION:NAME
  0  51440        lmutex_lock:mutex-acquire  1523449860253331
  0  51460      lmutex_unlock:mutex-release  1523449860271845
  0  51440        lmutex_lock:mutex-acquire  1523449860283483
  0  51460      lmutex_unlock:mutex-release  1523449860290833
  0  51440        lmutex_lock:mutex-acquire  1523449860325499
  0  51460      lmutex_unlock:mutex-release  1523449860332171
  0  51440        lmutex_lock:mutex-acquire  1523449860341438
  0  51460      lmutex_unlock:mutex-release  1523449860347632
  0  51440        lmutex_lock:mutex-acquire  1523449860378587
  0  51460      lmutex_unlock:mutex-release  1523449860385554
  0  51440        lmutex_lock:mutex-acquire  1523449860394887
  0  51460      lmutex_unlock:mutex-release  1523449860401081
  0  51440        lmutex_lock:mutex-acquire  1523449860447728
  0  51460      lmutex_unlock:mutex-release  1523449860455464
  0  51440        lmutex_lock:mutex-acquire  1523449860465297
  0  51460      lmutex_unlock:mutex-release  1523449860471519
^C
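Because the timestamp variable is in nanoseconds, subtracting each mutex-acquire time from the following mutex-release time gives the lock hold time. The following Python sketch does this for the first two pairs in the output above. It assumes a single lock and a single thread, so that each release belongs to the preceding acquire; the hold_times name is hypothetical.

```python
def hold_times(events):
    """Pair each mutex-acquire with the next mutex-release.

    events is a list of (probe_name, timestamp_ns) tuples in time order.
    Returns the list of hold times in nanoseconds.
    """
    held_since = None
    durations = []
    for name, timestamp in events:
        if name == "mutex-acquire":
            held_since = timestamp
        elif name == "mutex-release" and held_since is not None:
            durations.append(timestamp - held_since)
            held_since = None
    return durations


# First four events from the output above.
events = [
    ("mutex-acquire", 1523449860253331),
    ("mutex-release", 1523449860271845),
    ("mutex-acquire", 1523449860283483),
    ("mutex-release", 1523449860290833),
]
print(hold_times(events))  # [18514, 7350]
```

In a real script you would more likely record the acquire time in a thread-local variable and compute the difference inside DTrace; this sketch only shows the arithmetic.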

The next example monitors readers/writer lock activity for the vold process. The -p option to dtrace(1M) attaches to a running process and sets the $target macro to its PID:

# pgrep vold
1098
# dtrace -n 'plockstat$target:::rw* {trace(timestamp)}' -p 1098
dtrace: description 'plockstat$target:::rw* ' matched 11 probes
CPU     ID                    FUNCTION:NAME
  0  51474             rwlock_lock:rw-block  1529287107214473
  0  51494           rwlock_lock:rw-acquire  1529287107231728
  0  51496           __rw_unlock:rw-release  1529287107252733
  0  51474             rwlock_lock:rw-block  1529287107403403
  0  51494           rwlock_lock:rw-acquire  1529287107412819
  0  51496           __rw_unlock:rw-release  1529287107423097
  0  51474             rwlock_lock:rw-block  1529287107575211
  0  51494           rwlock_lock:rw-acquire  1529287107583872
  0  51496           __rw_unlock:rw-release  1529287107593238
  0  51474             rwlock_lock:rw-block  1529287107816907
  0  51494           rwlock_lock:rw-acquire  1529287107826079
  0  51496           __rw_unlock:rw-release  1529287107836362
  0  51474             rwlock_lock:rw-block  1529287107928393
  0  51494           rwlock_lock:rw-acquire  1529287107936277
  0  51496           __rw_unlock:rw-release  1529287107945832
  0  51474             rwlock_lock:rw-block  1529287108042880
  0  51494           rwlock_lock:rw-acquire  1529287108051591
  0  51496           __rw_unlock:rw-release  1529287108060852
  0  51474             rwlock_lock:rw-block  1529287108261326
  0  51494           rwlock_lock:rw-acquire  1529287108270476
  0  51496           __rw_unlock:rw-release  1529287108280748

The plockstat(1M) command is a DTrace consumer that uses the plockstat provider to show detailed application lock usage information. The plockstat(1M) command is comparable to the lockstat(1M) command, which shows lock contention details for kernel locks.

Transient System Call Errors

The following D program displays pertinent information any time any process’s system call fails. System call failures return a value of -1, which is placed in the arg0 argument when a syscall return probe fires. You exclude system call errors from the dtrace command itself by comparing the PID of the process whose system call failed with that of the dtrace command.

When a system call returns -1, the C library interface sets a global user variable named errno to a positive error code, as shown in the following example. These errno values are documented in the Intro(2) manual page and in the /usr/include/sys/errno.h header file.

# cat errno.d
#!/usr/sbin/dtrace -qs
syscall:::return
/arg0 == -1 && pid != $pid/
{
    printf("%-20s %-10s %d\n",execname,probefunc,errno);
}

# ./errno.d
svc.startd           portfs     62
nscd                 lwp_park   62
fmd                  lwp_park   62
svc.startd           portfs     62
svc.startd           portfs     62
bash                 stat64     2
bash                 chdir      2
bash                 chdir      2
bash                 stat64     2
nscd                 lwp_kill   3
find                 open       2
find                 stat       2
bash                 setpgrp    13
bash                 waitsys    10
date                 open       2
date                 stat       2
ls                   open       2
ls                   stat       2
bash                 setpgrp    13
bash                 waitsys    10
nscd                 lwp_kill   3
^C
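The numbers in the last column are easier to read once they are mapped to symbolic names. The following Python sketch looks up a few of the codes seen above by using the standard errno module. Note the assumption: the lookup uses the error table of whatever system runs the sketch. The low-numbered codes shown here are the same on most UNIX-like systems, but higher-numbered codes can differ from the Solaris values, so treat the Solaris header file as the authority. The describe name is hypothetical.

```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for an errno value on this system."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# A few of the codes that appear in the errno.d output.
for code in (2, 3, 10, 13):
    name, message = describe(code)
    print(f"{code:>3} {name:<8} {message}")
```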

User Stack Traces on System Call Failures

By using the ustack() built-in DTrace function, you can also display a stack trace of the application code that issued the failed system call:

# cat errno2.d
#!/usr/sbin/dtrace -qs

syscall:::return
/arg0 == -1 && pid != $pid/
{
    printf("\n%-20s %-10s %d", execname, probefunc, errno);
    ustack();
}

# ./errno2.d

bash                 setpgrp    13
              libc.so.1`_syscall6+0x1c
              35c6c
              34fa8
              bash`execute_command_internal+0x414
              bash`execute_command+0x50
              bash`reader_loop+0x220
              bash`main+0x90c
              bash`_start+0x108

svc.startd           portfs     62
              libc.so.1`_portfs+0x4
              svc.startd`wait_thread+0x30
              libc.so.1`_lwp_start

svc.startd           portfs     62
              libc.so.1`_portfs+0x4
              svc.startd`wait_thread+0x30
              libc.so.1`_lwp_start

bash                 waitsys    10
              libc.so.1`_waitid+0x8
              libc.so.1`waitpid+0x60
              410a0
              41004
              libc.so.1`__sighndlr+0xc
              libc.so.1`call_user_handler+0x3b8
              libc.so.1`__lwp_sigmask+0x30
              libc.so.1`pthread_sigmask+0x1b4
              libc.so.1`sigprocmask+0x20
              bash`make_child+0x254
              35c6c
              34fa8
              bash`execute_command_internal+0x414
              bash`execute_command+0x50
              bash`reader_loop+0x220
              bash`main+0x90c
              bash`_start+0x108

bash                 stat64     2
              libc.so.1`stat64+0x4
              bash`sh_canonpath+0x258
              63638
              bash`cd_builtin+0x364
              352a0
              35a8c
              34fc8
              bash`execute_command_internal+0x414
              bash`execute_command+0x50
              bash`reader_loop+0x220
              bash`main+0x90c
              bash`_start+0x108

find                 open       2
              ld.so.1`__open+0x4
              ld.so.1`elf_config+0x120
              ld.so.1`setup+0xc20
              ld.so.1`_setup+0x37c
              ld.so.1`_rt_boot+0x88

Hexadecimal addresses are shown on the stack trace output when the dtrace command cannot resolve the PC value to a symbol. To find what transient system call errors are occurring in a specific application and where, change the errno2.d script so that you pass in the PID of the application and the predicate tests for that PID.
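The general idea of resolving a PC value to a symbol can be sketched as a lookup in a sorted symbol table: find the last symbol that starts at or below the address, and check that the address falls within that symbol's size. The following Python sketch shows this, including the fall-back to a bare hexadecimal address. The symbol table and the symbolize name are hypothetical, and this is only an illustration of the concept, not a description of how dtrace(1M) performs the lookup.

```python
import bisect


def symbolize(pc, symbols):
    """Resolve pc to 'name+0xoffset', or return bare hex if no symbol fits.

    symbols is a list of (start_address, size, name), sorted by start.
    """
    starts = [start for start, _, _ in symbols]
    index = bisect.bisect_right(starts, pc) - 1
    if index >= 0:
        start, size, name = symbols[index]
        if pc < start + size:
            return f"{name}+0x{pc - start:x}"
    return format(pc, "x")


# Hypothetical symbol table: (start, size, name).
symbols = [
    (0x1000, 0x40, "main"),
    (0x1040, 0x100, "execute_command"),
]

print(symbolize(0x1010, symbols))   # inside main
print(symbolize(0x35c6c, symbols))  # outside every known symbol
```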

Processes Using a Lot of System Time

Suppose you saw the following prstat(1M) command output:

   PID USERNAME  SIZE   RSS STATE PRI NICE      TIME  CPU PROCESS/NLWP
 12663 root     1104K  672K run     0    0   0:00:13  47% unknown/1
 12662 root     4736K 4392K cpu0   59    0   0:00:00 0.2% prstat/1
   278 root     2976K 1832K sleep  59    0   0:00:15 0.0% nscd/23
  9593 root     2840K 2096K sleep  59    0   0:00:01 0.0% bash/1
 12577 root     2808K 2056K sleep  59    0   0:00:00 0.0% bash/1
   478 root     4696K 1312K sleep  59    0   0:00:21 0.0% sendmail/1
   451 root       10M 5016K sleep  59    0   0:00:09 0.0% snmpd/1
   517 root     2016K  472K sleep  59    0   0:00:00 0.0% ttymon/1
   434 root     3624K 1464K sleep  59    0   0:00:00 0.0% snmpXdmid/2
  9584 root     4520K 2200K sleep  59    0   0:00:00 0.0% in.telnetd/1
   422 root     2280K  824K sleep  59    0   0:00:00 0.0% snmpdx/1
   426 root     4920K 1168K sleep  59    0   0:00:00 0.0% dtlogin/1
   439 root     2968K 1584K sleep  59    0   0:00:00 0.0% vold/3
   476 root     2032K  720K sleep  59    0   0:00:00 0.0% ttymon/1
   433 root     3048K 1032K sleep  59    0   0:00:00 0.0% dmispd/1
   353 root     1872K  136K sleep  59    0   0:00:00 0.0% smcboot/1
   351 root     1880K  168K sleep  59    0   0:00:00 0.0% smcboot/1
   352 root     1872K  152K sleep  59    0   0:00:00 0.0% smcboot/1
   339 root     1200K  472K sleep  59    0   0:00:01 0.0% utmpd/1
   329 root     1560K  488K sleep  59    0   0:00:00 0.0% powerd/2
   281 root     2616K 1200K sleep  59    0   0:00:00 0.0% inetd/1
   265 root     2520K  792K sleep  59    0   0:00:00 0.0% cron/1
   251 root     3800K 1432K sleep  59    0   0:00:01 0.0% automountd/3
   260 root     3784K 1568K sleep  59    0   0:00:00 0.0% syslogd/13
   171 root     2096K 1016K sleep  59    0   0:00:16 0.0% in.routed/1
   185 daemon   2424K  584K sleep  59    0   0:00:00 0.0% rpcbind/1
   189 root     2384K  352K sleep  59    0   0:00:00 0.0% keyserv/2
    68 root     3128K   56K sleep  59    0   0:00:00 0.0% picld/4
    65 daemon   3544K 1208K sleep  59    0   0:00:00 0.0% kcfd/3
    59 root     2368K  152K sleep  59    0   0:00:00 0.0% syseventd/14
Total: 38 processes, 109 lwps, load averages: 0.14, 0.11, 0.09

You can obtain more details on the unknown process by using the following command:

# prstat -m -p 12663
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
 12663 root      43  57 0.0 0.0 0.0 0.0 0.0 0.3   0 129 .12   0 unknown/1

The unknown process is using a lot of system time. The following D program can determine what system calls are being made:

# dtrace -n 'syscall:::entry /pid == 12663/ { @syscalls[probefunc] = count();}'
dtrace: description 'syscall:::entry ' matched 226 probes
^C

  read                   940592

This process appears to be stuck in an endless loop of read(2) system calls. The following truss(1) command confirms this, and shows that the reads are failing:

# truss -p 12663
read(3, 0xFFBFFD0B, 1)                          Err#89 ENOSYS
read(3, 0xFFBFFD0B, 1)                          Err#89 ENOSYS
read(3, 0xFFBFFD0B, 1)                          Err#89 ENOSYS
read(3, 0xFFBFFD0B, 1)                          Err#89 ENOSYS
...

The errno2.d D script shows further evidence of a runaway loop of failing read(2) system calls:

# ./errno2.d

unknown              read       89
              libc.so.1`_read+0x8
              unknown`main+0x134
              unknown`_start+0x5c

unknown              read       89
              libc.so.1`_read+0x8
              unknown`main+0x134
              unknown`_start+0x5c

unknown              read       89
              libc.so.1`_read+0x8
              unknown`main+0x134
              unknown`_start+0x5c
^C
# grep 89 /usr/include/sys/errno.h
/* Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T */
 * (c) 1983,1984,1985,1986,1987,1988,1989 AT&T.
#define ENOSYS 89 /* Unsupported file system operation */
# pkill unknown

Suppose you saw the following similar prstat(1M) command output:

# prstat -m -p 12745
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
 12745 root      17  81 0.0 0.0 0.0 0.0 0.0 1.5   0 132 .5M   0 readchar/1

You can again get details on what system calls are being made, as follows:

# dtrace -n 'syscall:::entry /pid == 12745/ { @syscalls[probefunc] = count();}'
dtrace: description 'syscall:::entry ' matched 225 probes
^C

  stat                        6
  open                        6
  write                       6
  close                       6
  read                   760747
# truss -p 12745
read(3, "\b", 1)                                = 1
read(3, "92", 1)                                = 1
read(3, "10", 1)                                = 1
read(3, "\0", 1)                                = 1
read(3, "14", 1)                                = 1
read(3, " @", 1)                                = 1
read(3, "\0", 1)                                = 1
read(3, "82", 1)                                = 1
read(3, " #", 1)                                = 1
read(3, "90", 1)                                = 1
^C

As its name implies, this readchar process is reading a single character at a time. Now run the iosnoop.d D script from Module 2 to get details on the disk input/output (I/O):

# ./iosnoop.d
COMMAND      PID FILE                        DEVICE RW       MS
readchar   12745 /usr/lib/nss_ldap.so.1      sd2     R    6.492
readchar   12745 /usr/lib/nss_ldap.so.1      sd2     R    6.492
readchar   12745 /usr/lib/nss_ldap.so.1      sd2     R    6.492
readchar   12745 /usr/lib/nss_ldap.so.1      sd2     R    6.638
readchar   12745 /usr/lib/nss_ldap.so.1      sd2     R    2.264
readchar   12745 <none>                      sd2     R    6.398
readchar   12745 /usr/lib/nss_ldap.so.1      sd2     R    0.696
readchar   12745 /usr/lib/passwdutil.so.1    sd2     R    0.729
readchar   12745 /usr/lib/passwdutil.so.1    sd2     R    1.133
readchar   12745 <none>                      sd2     R    6.646
readchar   12745 /usr/lib/passwdutil.so.1    sd2     R    5.656
readchar   12745 /usr/lib/watchmalloc.so.1   sd2     R    6.622
readchar   12745 /usr/lib/watchmalloc.so.1   sd2     R    6.842
readchar   12745 /usr/lib/watchmalloc.so.1   sd2     R    0.368
readchar   12745 /usr/lib/watchmalloc.so.1   sd2     R    6.488
readchar   12745 <none>                      sd2     R    6.315
readchar   12745 /usr/lib/cpp                sd2     R    7.896
readchar   12745 /usr/lib/cpp                sd2     R    8.128
readchar   12745 /usr/lib/cpp                sd2     R    1.637
readchar   12745 /usr/lib/cpp                sd2     R    1.744
readchar   12745 <unknown>                   sd2     R    5.968
readchar   12745 <unknown>                   sd2     R    0.309
readchar   12745 /usr/lib/libz.so.1          sd2     R    2.075
readchar   12745 /usr/lib/libz.so.1          sd2     R    5.438
readchar   12745 /usr/lib/llib-lz            sd2     R    7.249
readchar   12745 /usr/lib/llib-lz.ln         sd2     R    0.586
readchar   12745 /usr/lib/llib-lz.ln         sd2     R    0.796
readchar   12745 /usr/lib/llib-lz.ln         sd2     R    0.409
readchar   12745 <unknown>                   sd2     R    0.303
readchar   12745 /lib/libm.so.2              sd2     R    4.507
readchar   12745 /lib/libm.so.2              sd2     R    0.484
readchar   12745 /lib/libm.so.2              sd2     R    0.500
readchar   12745 <none>                      sd2     R    5.174
readchar   12745 /lib/libm.so.2              sd2     R   18.945
readchar   12745 /lib/libm.so.2              sd2     R    0.506
readchar   12745 /lib/libm.so.2              sd2     R    2.169
readchar   12745 <unknown>                   sd2     R    0.416
readchar   12745 /lib/libm.so.1              sd2     R    6.297
^C

This application appears to be reading all of the files under the /usr/lib directory one byte at a time. The programmer might not realize that using the standard I/O library functions to buffer reads is more efficient than issuing system call reads of one character at a time. The OS is reading the disk in blocks, as the iosnoop.d D script output indicates, but the application is extracting the information from the kernel buffers only one byte at a time.
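The cost difference comes down to the number of system calls issued for the same amount of data. The following Python sketch counts the read calls needed to consume a file with a one-byte buffer and with a larger buffer. It uses os.read, which issues one read system call per invocation; the count_reads name, the file size, and the buffer sizes are hypothetical choices for illustration.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read a file to the end; return (number of read calls, bytes read)."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    total = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # one read system call
            calls += 1
            if not data:                    # empty result means end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return calls, total


# Build a small hypothetical test file and compare the two strategies.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
    path = tmp.name

try:
    print("1-byte buffer:   ", count_reads(path, 1))
    print("8192-byte buffer:", count_reads(path, 8192))
finally:
    os.unlink(path)
```

The one-byte strategy issues one call per byte plus a final call that detects end of file, which is the pattern the system call counts above reveal for the readchar process.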

Open Files

In this section you learn how to display the path names of files being opened. Note that in DTrace it is more difficult to display pointer arguments passed to system calls than those passed as integer arguments. Examples of system calls that take pointer arguments are open(2), stat(2), unlink(2), and chmod(2), which each take path name string arguments. There are also system calls that pass the address of structures, for example, sigaction(2). You must use the appropriate copyinstr() and copyin() built-in functions to display the actual strings or structures being passed to the kernel.

Accessing System Call Pointer Arguments

Suppose you knew an application was writing out literal strings using the write(2) system call, as follows:

# cat writemsg.c
main()
{
    write(1, "This is some text being", 23);
    write(1, " written to standard output", 29);
    write(1, " to prove a point\n", 18);
}

# gcc writemsg.c -o writemsg
# writemsg
This is some text being written to standard output to prove a point
#

You might try to display these strings using the following D script:

# cat write.d
#!/usr/sbin/dtrace -s

syscall::write:entry
/pid == $target/
{
    printf("%s\n", stringof(arg1));
}
# dtrace -s write.d -c writemsg
dtrace: script 'write.d' matched 1 probe
This is some text being written to standard output to prove a point
dtrace: pid 1532 exited with status 1
dtrace: error on enabled probe ID 1 (ID 12: syscall::write:entry): invalid address (0x10000) in action #1
dtrace: error on enabled probe ID 1 (ID 12: syscall::write:entry): invalid address (0x10000) in action #1
dtrace: error on enabled probe ID 1 (ID 12: syscall::write:entry): invalid address (0x10000) in action #1
^C

The arg1 argument used in the write.d D script is the second argument to the write(2) system call, which in this case is the address of the string you want to display. It is a process address, however, and DTrace runs the action statements in the kernel’s address space. The stringof() built-in function only converts the write(2) system call argument to the string type; it does not copy the data into the kernel, so the address is not valid there. For the script to work, you must use the copyinstr() or copyin() built-in DTrace functions shown previously. The following example shows the correct way to access the process’s string arguments:

# cat write2.d
#!/usr/sbin/dtrace -s

syscall::write:entry
/pid == $target/
{
    printf("%s\n", copyinstr(arg1));
}
# dtrace -s write2.d -c writemsg
dtrace: script 'write2.d' matched 1 probe
This is some text being written to standard output to prove a point
dtrace: pid 1537 exited with status 1
CPU     ID                    FUNCTION:NAME
  0     12                      write:entry This is some text being

  0     12                      write:entry  written to standard output

  0     12                      write:entry  to prove a point

The following changes to the D script enable it to work on all system-wide write(2) system calls (except those issued by the dtrace(1M) command):

# cat write3.d
#!/usr/sbin/dtrace -s

syscall::write:entry
/pid != $pid/
{
    printf("%s\n", copyinstr(arg1));
}
# ./write3.d
dtrace: script './write3.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
ore--ion, name)iption specifiers (provider, module, func-e describes how to use4maction]]

  0    914                      write:entry sys61#./write2.ddwrite2.dted token newline'ctory_______________________________________________________________________________________________________________________________________________________________________________________________________

  0    914                      write:entry pys61#./write2.ddwrite2.dted token newline'ctory_______________________________________________________________________________________________________________________________________________________________________________________________________

You received garbage output because the write(2) system call does not necessarily write out null-terminated strings, so copyinstr() keeps copying past the end of the data that was actually written. The copyin() built-in function is more appropriate because it lets you specify the size of the write:

# cat write4.d
#!/usr/sbin/dtrace -s

syscall::write:entry
/pid != $pid/
{
    printf("%s\n", stringof(copyin(arg1, arg2)));
}

# ./write4.d
dtrace: script './write4.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    914                      write:entry p

  0    914                      write:entry w

  0    914                      write:entry d

  0    914                      write:entry

  0    914                      write:entry /var/dtrace/mod3

  0    914                      write:entry sys61#

  0    914                      write:entry d

  0    914                      write:entry a

  0    914                      write:entry t

  0    914                      write:entry e

  0    914                      write:entry

  0    914                      write:entry Sun Jun 13 16:55:28 MDT 2004

  0    914                      write:entry sys61#

^C
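The difference between the garbage from write3.d and the clean output from write4.d can be modeled with a plain byte buffer. In the following Python sketch, a terminator-driven copy runs on until it finds a null byte, while a size-driven copy returns exactly the bytes that were written. The buffer contents are hypothetical, and the two functions only imitate the behavior of the D built-ins of the same names.

```python
def copyinstr(memory, addr):
    """Model of copyinstr(): copy from addr up to the next null byte."""
    end = memory.index(b"\0", addr)
    return memory[addr:end]


def copyin(memory, addr, size):
    """Model of copyin(): copy exactly size bytes starting at addr."""
    return memory[addr:addr + size]


# Hypothetical user buffer: five bytes that were passed to write(2),
# followed by stale bytes left over from earlier use of the buffer.
memory = b"date\n" + b"leftover bytes" + b"\0"

print(copyinstr(memory, 0))  # b'date\nleftover bytes'
print(copyin(memory, 0, 5))  # b'date\n'
```

The size in the second call plays the role of arg2, the byte count that the application passed to write(2).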

Displaying Names of Files Being Opened

The following example shows how to display the names of files being opened systemwide:

# cat open.d
#!/usr/sbin/dtrace -qs

syscall::open*:entry
{
    printf("%s opening %s\n", execname, copyinstr(arg0));
}
# ./open.d
init opening /etc/inittab
init opening /etc/svc/volatile/init-next.state
init opening /etc/svc/volatile/init-next.state
init opening /etc/inittab
man opening /var/ld/ld.config
man opening /lib/libc.so.1
man opening /usr/share/man/man.cf
man opening /usr/share/man/windex
man opening /usr/share/man/sman1m/dtrace.1m
sh opening /var/ld/ld.config
sh opening /lib/libc.so.1
more opening /var/ld/ld.config
more opening /lib/libcurses.so.1
more opening /lib/libc.so.1
more opening /usr/share/lib/terminfo//x/xterm
utmpd opening /var/adm/utmpx
utmpd opening /var/adm/utmpx
utmpd opening /proc/12571/psinfo
utmpd opening /proc/9587/psinfo
date opening /var/ld/ld.config
date opening /lib/libc.so.1
date opening /usr/share/lib/zoneinfo/US/Mountain
vi opening /var/ld/ld.config
vi opening /usr/lib/libmapmalloc.so.1
vi opening /lib/libcurses.so.1
vi opening /lib/libc.so.1
vi opening /lib/libgen.so.1
vi opening /usr/share/lib/terminfo//x/xterm
vi opening //.exrc
vi opening /var/tmp/ExTcaqBz
vi opening /var/tmp/ExUcaqBz
vi opening /etc/system
^C

Displaying Path Names When open System Calls Fail

The following example shows how to know when an open(2) system call fails and how to display the pertinent information to determine the problem:

# cat failedopen.d
#!/usr/sbin/dtrace -qs

syscall::open*:entry
/pid == $1/
{
    self->path = copyinstr(arg0);
    self->entry = 1;
}

syscall::open*:return
/self->entry && arg0 == -1/
{
    printf("open for '%s' failed, errno=%d", self->path, errno);
    ustack();
    self->entry = 0;
}

# failedopen.d 13026
open for '/usr/openwin/lib/X11/XtErrorDB' failed, errno=2
              febbcf78
              febb05a0
              fec97b38
              fec97a78
              fedbbffc
              fedbbeac
              fedbbe40
              fedc0220
              fedc037c
              fed8fb6c
              fed8f2f8
              fed8f290
              cf3f8
              3f648
              d1c98
              5c658
^C
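The failedopen.d script relies on thread-local variables to carry the path name from the entry probe to the matching return probe of the same thread. The following Python sketch models that correlation with a dictionary keyed by thread ID. The event tuples and the failed_opens name are hypothetical, and the model is simplified: it forgets the saved path on every return, not only on failures.

```python
def failed_opens(events):
    """Match open entry and return events per thread; report failures.

    events is a list of (thread_id, probe, value) tuples, where value is
    the path name for an "entry" probe and the return code for "return".
    """
    pending = {}   # plays the role of self->path, keyed by thread
    failures = []
    for thread_id, probe, value in events:
        if probe == "entry":
            pending[thread_id] = value
        elif probe == "return" and thread_id in pending:
            path = pending.pop(thread_id)
            if value == -1:
                failures.append((thread_id, path))
    return failures


# Hypothetical interleaving of two threads calling open(2).
events = [
    (1, "entry", "/etc/inittab"),
    (2, "entry", "/nothing"),
    (1, "return", 3),
    (2, "return", -1),
]
print(failed_opens(events))  # [(2, '/nothing')]
```

Keying the saved path by thread is what keeps interleaved calls from different threads from being confused with one another.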

Displaying a Symbolic Stack Trace

The failedopen.d D script was run on the dtmail graphical user interface (GUI) utility as it was started up over a telnet session. The dtrace(1M) command could not determine the symbols at the place the functions were called. This may be due to the application exiting before the dtrace(1M) consumer has a chance to read its symbol table. You can use the mdb(1) debugger to display the PC locations symbolically:

# mdb /usr/dt/bin/dtmail
> _start:b
> :r
mdb: stop at _start
mdb: target stopped at:
_start:         clr     %fp
> !ps
   PID TTY      TIME CMD
 13025 pts/1    0:00 mdb
 13027 pts/1    0:00 sh
 12571 pts/1    0:00 sh
 13026 pts/1    0:00 dtmail
 13028 pts/1    0:00 ps
 12577 pts/1    0:06 bash
> :c
libSDtMail: Error: Xt Error: Can't open display: 129.150.33.103:0.0
mdb: target has terminated
> 5c658/i
_start+0x108:
_start+0x108:   call    +0x75618 <main>
> d1c98/i
main+0x28:
main+0x28:      jmpl    %i1, %o7
> 3f648/i
__0fHRoamAppKinitializePiPPc+0x310:
__0fHRoamAppKinitializePiPPc+0x310:     call +0x8fd24 <__0fLApplicationKinitializePiPPc>
> cf3f8/i
__0fLApplicationKinitializePiPPc+0x8c:
__0fLApplicationKinitializePiPPc+0x8c:  call +0x52718 <PLT:XtAppInitialize>
> fed8f290/i
libXt.so.4`XtAppInitialize+0x54:
libXt.so.4`XtAppInitialize+0x54:        call +0x56800 <PLT:XtOpenApplication>
> fed8f2f8/i
libXt.so.4`XtOpenApplication+0x48:      call +0x56774 <PLT:_XtAppInit>
> fed8fb6c/i
libXt.so.4`_XtAppInit+0x138:    call +0x553cc <PLT:XtErrorMsg>
> febbcf78/i
libc.so.1`__open+4:     ta 8
> febbcf78:b
> :c
mdb: stop at libc.so.1`__open+4
mdb: target stopped at:
libc.so.1`__open+4:     ta 8
> $c
libc.so.1`__open+4(ff2893ec, 2000, 1b6, 38e70, ff3b3508, febe2264)
libc.so.1`open+0x64(ff2893ec, 2000, 1b6, ff3ea0f8, ff3ec46c, 0)
libnsl.so.1`__nsl_fopen+0x8c(ff2893ec, ff2893fc, ff24fbb4, ff3ea0f8, ff3ec46c, ff2893fc)
libnsl.so.1`getnetlist+0x20(0, 69bcc, ff292690, 0, 0, ff290f30)
libnsl.so.1`setnetconfig+0x38(0, ff294a58, ff292690, 0, 763a8, febea4c0)
libnsl.so.1`__rpc_getconfip+0xd8(ff296ea8, 0, 0, 0, 4144c, 0)
libnsl.so.1`getipnodebyname+0x1c(ffbfef50, 1a, 3, ffbfef3c, 1010101, 57f74)
libsocket.so.1`get_addr+0x158(0, fe8920a0, ffbff0c0, 17700000, 0, 0)
libsocket.so.1`_getaddrinfo+0x710(fe8920a0, 1770, ffbff168, 15950c, 0, 2)
libX11.so.4`_X11TransSocketINETConnect+0x178(1594d0, fe8920a0, ffbff188, ffbff32c, fed13100, 0)
libX11.so.4`_X11TransConnect+0x58(1594d0, ffbff3e8, 7ffffc00, fe892090, fed13104, fe982078)
libX11.so.4`_X11TransConnectDisplay+0x6e0(e, 1594d0, 1, ffbff3e8, 0, 0)
libX11.so.4`XOpenDisplay+0xe8(0, fed20bc4, 158f88, ffbffdec, 9ebc4, 0)
libXt.so.4`XtOpenDisplay+0xe4(158190, 0, ffbffdcc, fe982010, 0, 0)
libXt.so.4`_XtAppInit+0xfc(ffbff71c, fe982010, 0, 0, ffbffdcc, ffbff778)
libXt.so.4`XtOpenApplication+0x48(12bc78, fe982010, 0, 0, ffbffdcc, ffbffdec)
libXt.so.4`XtAppInitialize+0x54(1346f4, fede7638, fede4000, 120008, 54da8, 14e634)
__0fLApplicationKinitializePiPPc+0x8c(12bc68, ffbffdcc, ffbffdec, 0, 1346f4, 134400)
__0fHRoamAppKinitializePiPPc+0x310(12bc68, ffbffdcc, ffbffdec, 14d400, 0, 136000)
main+0x28(136000, 3f338, 12bc68, 12d0cc, 0, 136000)
_start+0x108(0, 0, 0, 0, 0, 0)
>

A breakpoint was set on the C library open function and the dtmail utility was continued in the debugger to hit the breakpoint. The $c mdb command was used to display the stack trace symbolically after the breakpoint was hit.

Examining Another Failed open Example

The next example shows the failedopen2.d D script run on the cat(1) command while it opens a non-existent file. This script assumes that dtrace(1M) will start the command.

# cat failedopen2.d
#!/usr/sbin/dtrace -qs

syscall::open*:entry
/pid == $target/
{
        self->path = copyinstr(arg0);
        self->entry = 1;
}

syscall::open*:return
/self->entry && arg0 == -1/
{
        printf("open for '%s' failed, errno=%d", self->path, errno);
        ustack();
        self->entry = 0;
}
# dtrace -s failedopen2.d -c "cat /nothing"
dtrace: script 'failedopen2.d' matched 4 probes
cat: cannot open /nothing
dtrace: pid 1612 exited with status 2
CPU     ID                    FUNCTION:NAME
  0    397                    open64:return open for '/nothing' failed, errno=2
              libc.so.1`__open64+0x4
              libc.so.1`_endopen+0x88
              libc.so.1`fopen64+0x1c
              cat`main+0x318
              cat`_start+0x108


Accessing Structure Members in the sigaction(2) System Call

The sigaction(2) system call passes the kernel the address of a sigaction structure. To access its members, you must first copy in the structure using the copyin() DTrace function. The following script shows how to do this. It uses a clause-local variable to point to the copied-in sigaction structure.

# cat sigaction.d
#!/usr/sbin/dtrace -qs

syscall::sigaction:entry
{
        this->sa_struct = (struct sigaction *)copyin(arg1,
            sizeof(struct sigaction));
        printf("%s called sigaction on signal %d with flags: %x\n",
            execname, arg0, this->sa_struct->sa_flags);
}
# ./sigaction.d
...
tcsh called sigaction on signal 2 with flags: 0
tcsh called sigaction on signal 15 with flags: 12
tcsh called sigaction on signal 15 with flags: 0
tcsh called sigaction on signal 3 with flags: 12
...
vi called sigaction on signal 8 with flags: 2
vi called sigaction on signal 10 with flags: 2
vi called sigaction on signal 11 with flags: 2
vi called sigaction on signal 13 with flags: 2
vi called sigaction on signal 24 with flags: 16
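One practical wrinkle with the sigaction.d script: sigaction(2) accepts a null second argument when the caller only wants to query the current disposition, and a copyin() from address 0 raises a DTrace error. The following is a minimal sketch of one way to guard against that, assuming the same probe and arguments as sigaction.d. The predicate and the variable name this->sa are additions for illustration and are not part of the course script.

```dtrace
#!/usr/sbin/dtrace -qs

/* Sketch: skip calls that pass a null sigaction pointer (arg1 == 0). */
syscall::sigaction:entry
/arg1 != 0/
{
        /* Copy the user-space structure into DTrace scratch memory. */
        this->sa = (struct sigaction *)copyin(arg1, sizeof (struct sigaction));
        printf("%s: signal %d, flags 0x%x\n",
            execname, arg0, this->sa->sa_flags);
}
```

Calls that only query the old action are silently ignored by this version, which is usually what you want when the interest is in the flags being installed.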


Module 4

Finding System Problems With DTrace

Objectives

Upon completion of this module, you should be able to:

● Use DTrace to access kernel variables

● Use DTrace to obtain information about read calls

● Use DTrace to perform anonymous tracing

● Use DTrace to perform speculative tracing

● Explain the privileges necessary to run DTrace operations


Relevance


Discussion – The following questions are relevant to understanding how to use DTrace for finding system problems:

● Would the ability to access any kernel variable when a probe fires be beneficial?

● Would it be useful to know who is issuing which type of read calls?

● Would it be advantageous to trace device driver code during system boot?

● Would it be beneficial to give regular user accounts access to the DTrace facility that is limited to user-owned processes?


Additional Resources

Additional resources – The following references provide additional information on the topics described in this module:

● Sun Microsystems, Inc. Solaris Dynamic Tracing Guide, part number 817-6223-10.

● Cantrill, Bryan M., Michael W. Shapiro, and Adam H. Leventhal. “Dynamic Instrumentation of Production Systems.” Paper presented at 2004 USENIX Conference.

● BigAdmin System Administration Portal [http://www.sun.com/bigadmin/content/dtrace].

● dtrace(1M) manual page in the Solaris 10 OS manual pages, Solaris 10 Reference Manual Collection.

● The /usr/demo/dtrace directory contains all of the sample scripts from the Solaris Dynamic Tracing Guide.


Accessing Kernel Variables

The DTrace instrumentation executes inside the Solaris™ Operating System (Solaris OS) kernel. This means that, in addition to accessing DTrace variables and probe arguments such as pid and arg1, you can also access kernel data structures, symbols, and types. These capabilities allow advanced DTrace users, experienced system administrators, support service personnel, and driver developers to examine low-level behavior of the operating system kernel and the device drivers.

Using the D Language to Access Kernel Symbols

The D language uses the backquote character (`) as a scoping operator for accessing symbols that are defined in the operating system and not in your D programs. For example, the Solaris kernel contains a C language declaration of a system tunable named kmem_flags for enabling memory allocator debugging features. This tunable is declared in C in the kernel source code as follows:

int kmem_flags;

To display the value of this variable, you can write the D statement:

printf("%x\n", `kmem_flags);
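A D statement only runs inside a probe clause, so the statement above needs a little scaffolding before it can be tried. The following is a minimal sketch, assuming a Solaris 10 system and sufficient privileges to read kernel data. The BEGIN probe and the exit() call are additions made only to form a complete script.

```dtrace
#!/usr/sbin/dtrace -qs

/* Sketch: print the kmem_flags kernel tunable once, then exit. */
BEGIN
{
        /* The backquote scopes kmem_flags to the kernel, not to D. */
        printf("kmem_flags = 0x%x\n", `kmem_flags);
        exit(0);
}
```

Because the value is read when the BEGIN probe fires, the script reports the tunable as it is set on the running system at that moment.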

Examining Naming Conflicts

DTrace associates each kernel symbol with the type used for it in the operating system C code, providing source-based access to the native operating system data structures. Because kernel symbol names are kept in a separate namespace from D variables and function identifiers, naming conflicts are not an issue.

When you prefix a variable with a backquote, the D compiler searches the known kernel symbols in order, using the list of loaded modules to find a matching variable definition. Because the Solaris OS kernel supports dynamically loaded modules with separate symbol namespaces, the same variable name or function name can be used more than once in the kernel. You resolve this conflict by preceding the variable or function name with the kernel module name and the backquote character as a separator. For example, you refer to the _init(9E) function in the sd module as follows:

sd`_init


You can apply any of the D operators to external kernel variables, except those that modify values. When you launch DTrace, the D compiler loads the set of variable names corresponding to active kernel modules, so declarations of these variables are not required.

Monitoring Kernel Variables

The following D script displays, every five seconds, the value of three global kernel variables:

● The nproc variable – Holds the current number of Solaris OS processes

● The nthread variable – Holds the current number of Solaris OS threads

● The freemem variable – Holds the current amount of system free memory, counted in pages, not owned by the memory allocator

You must precede each reference to these kernel variables with a backquote character (`), as shown in the following example:

# cat monitor.d
#!/usr/sbin/dtrace -qs

BEGIN
{
        printf("%-14s %-10s %10s\n", "Processes",
            "Threads", "Free Memory");
}

tick-5sec
{
        printf("%-14d %-10d %9dmb\n", `nproc,
            `nthread, (`freemem*8)/1024);
}

# ./monitor.d
Processes      Threads    Free Memory
41             232              322mb
42             232              306mb
41             232              322mb
53             242              320mb
47             249              251mb
41             232              252mb
41             232              252mb
41             232              232mb
41             232              111mb
47             235              110mb
47             241              110mb
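The expression that monitor.d uses for free memory multiplies freemem by 8 and divides by 1024. That converts a page count to megabytes only if each page is 8 Kbytes, as on the SPARC systems in these examples. For instance, 41216 free pages give (41216*8)/1024 = 322 Mbytes. The sketch below avoids the constant by reading the page size from the kernel. It assumes that the kernel exports a variable named _pagesize holding the page size in bytes; treat that name as an assumption to verify on your own system.

```dtrace
#!/usr/sbin/dtrace -qs

/*
 * Sketch: report free memory without hard-coding the page size.
 * Assumption: the kernel exports _pagesize (bytes per page).
 */
tick-5sec
{
        printf("%d mb free\n", (`freemem * `_pagesize) / (1024 * 1024));
}
```

On a system with 8-Kbyte pages this should print the same numbers as the third column of the monitor.d output.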

Accessing Kernel Data Structures

When a probe fires, DTrace sets many useful built-in variables. Three of these variables and their associated data structures are:

● The curpsinfo variable – Points to a process information structure

● The curlwpsinfo variable – Points to a lightweight process (LWP) information structure

● The curcpu variable – Points to a central processing unit (CPU) information structure

The first two structures are part of the proc(4) interface and are used by commands like ps(1) and prstat(1M). These variables provide access to kernel state information at the time any probe fires. The following examples define the data structures.

The psinfo Data Structure

The following shows the psinfo data structure:

typedef struct psinfo {
        int         pr_nlwp;            /* number of active lwps in the process */
        pid_t       pr_pid;             /* unique process id */
        pid_t       pr_ppid;            /* process id of parent */
        pid_t       pr_pgid;            /* pid of process group leader */
        pid_t       pr_sid;             /* session id */
        uid_t       pr_uid;             /* real user id */
        uid_t       pr_euid;            /* effective user id */
        gid_t       pr_gid;             /* real group id */
        gid_t       pr_egid;            /* effective group id */
        uintptr_t   pr_addr;            /* address of process */
        dev_t       pr_ttydev;          /* controlling tty device (or PRNODEV) */
        timestruc_t pr_start;           /* process start time, from the epoch */
        char        pr_fname[PRFNSZ];   /* name of execed file */
        char        pr_psargs[PRARGSZ]; /* initial characters of arg list */
        int         pr_argc;            /* initial argument count */
        uintptr_t   pr_argv;            /* address of initial argument vector */
        uintptr_t   pr_envp;            /* address of initial environment vector */
        char        pr_dmodel;          /* data model of the process */
        taskid_t    pr_taskid;          /* task id */
        projid_t    pr_projid;          /* project id */
        poolid_t    pr_poolid;          /* pool id */
        zoneid_t    pr_zoneid;          /* zone id */
} psinfo_t;


The lwpsinfo Data Structure

The following shows the lwpsinfo data structure:

typedef struct lwpsinfo {
        int           pr_flag;           /* lwp flags */
        id_t          pr_lwpid;          /* lwp id */
        uintptr_t     pr_addr;           /* internal address of lwp */
        uintptr_t     pr_wchan;          /* wait addr for sleeping lwp */
        char          pr_stype;          /* synchronization event type */
        char          pr_state;          /* numeric lwp state */
        char          pr_sname;          /* printable character for pr_state */
        char          pr_nice;           /* nice for cpu usage */
        short         pr_syscall;        /* system call number (if in syscall) */
        int           pr_pri;            /* priority, high value is high priority */
        char          pr_clname[PRCLSZ]; /* scheduling class name */
        processorid_t pr_onpro;          /* processor which last ran this lwp */
        processorid_t pr_bindpro;        /* processor to which lwp is bound */
        psetid_t      pr_bindpset;       /* processor set to which lwp is bound */
} lwpsinfo_t;


The cpuinfo Data Structure

The following shows the cpuinfo data structure:

typedef struct cpuinfo {
        processorid_t    cpu_id;   /* CPU identifier */
        psetid_t         cpu_pset; /* processor set identifier */
        chipid_t         cpu_chip; /* chip identifier */
        lgrp_id_t        cpu_lgrp; /* locality group identifier */
        processor_info_t cpu_info; /* CPU information */
} cpuinfo_t;

The curthread Variable

Another built-in D variable that is set when a probe fires is the curthread variable, which you used in the ancestry.d D script in Module 2. The curthread variable points to the kthread_t kernel structure of the currently running thread. Using the curthread pointer to access information in the kthread_t structure (or most other kernel data structures) provides a less stable interface than using the lwpsinfo_t and psinfo_t structures. The reason for this is that the psinfo_t and lwpsinfo_t structures are abstractions of process and thread information as advertised by the proc(4) interface. In contrast, curthread gets at the actual kernel implementation of this information, which may change. For more details on the stability of DTrace interfaces, see the Solaris Dynamic Tracing Guide, part number 817-6223-10. The dtrace(1M) command has a -v option that will tell you the stability of a D program.
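The difference between the two interfaces is easiest to see by reading the same facts both ways. The following is a minimal sketch. The kthread_t and proc_t member names it uses (t_procp, p_pidp, pid_id, and t_pri) reflect the Solaris 10 kernel as we understand it, and they are exactly the kind of implementation detail that may change between releases. The choice of the read system call as the trigger is arbitrary.

```dtrace
#!/usr/sbin/dtrace -qs

/* Sketch: print the PID and priority through both interfaces, once. */
syscall::read:entry
{
        /* More stable: the proc(4) abstractions. */
        printf("psinfo/lwpsinfo: pid=%d pri=%d\n",
            curpsinfo->pr_pid, curlwpsinfo->pr_pri);

        /* Less stable: private kernel structures reached through curthread. */
        printf("curthread:       pid=%d pri=%d\n",
            curthread->t_procp->p_pidp->pid_id, curthread->t_pri);
        exit(0);
}
```

Both lines should report the same values. Running the script with the -v option of dtrace(1M) is a quick way to compare the reported stability with that of a script that uses only curpsinfo and curlwpsinfo.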

Example D Script Using Data Structures

The following D script uses the psinfo_t and lwpsinfo_t structures to display thread and process information for any thread that calls a specific kernel function:

# cat ps.d
#!/usr/sbin/dtrace -qs

BEGIN
{
        printf("TID\tPID\tPPID\tUID\tPRI\tCOMMAND\n");
}

fbt::$1:entry
/pid != $pid && pid != 0/
{
        ++nlines;
        printf("%d\t%d\t%d\t%d\t%d\t%s\n", curlwpsinfo->pr_lwpid,
            curpsinfo->pr_pid, curpsinfo->pr_ppid, curpsinfo->pr_uid,
            curlwpsinfo->pr_pri, curpsinfo->pr_psargs);
}

fbt::$1:entry
/nlines > 20/
{
        printf("TID\tPID\tPPID\tUID\tPRI\tCOMMAND\n");
        nlines = 0;
}

# ./ps.d bdev_strategy
TID     PID     PPID    UID     PRI     COMMAND
1       4640    4639    0       55      find / -type f
1       4640    4639    0       55      find / -type f
1       4698    4641    0       51      file /var/sadm/pkg/SUNWfontconfig-root/save/pspool/SUNWfontconfig-root/install/
1       4640    4639    0       55      find / -type f
1       4698    4641    0       51      file /var/sadm/pkg/SUNWfontconfig-root/save/pspool/SUNWfontconfig-root/install/
^C
# ps.d nanosleep
TID     PID     PPID    UID     PRI     COMMAND
11      279     1       0       59      /usr/sbin/nscd
12      279     1       0       59      /usr/sbin/nscd
21      279     1       0       59      /usr/sbin/nscd
18      279     1       0       59      /usr/sbin/nscd
17      279     1       0       59      /usr/sbin/nscd
16      279     1       0       59      /usr/sbin/nscd
13      279     1       0       59      /usr/sbin/nscd
1       2120    2119    0       59      sleep 5
12      279     1       0       59      /usr/sbin/nscd
11      279     1       0       59      /usr/sbin/nscd
13      279     1       0       59      /usr/sbin/nscd
14      279     1       0       59      /usr/sbin/nscd
15      279     1       0       59      /usr/sbin/nscd
16      279     1       0       59      /usr/sbin/nscd
17      279     1       0       59      /usr/sbin/nscd
18      279     1       0       59      /usr/sbin/nscd
21      279     1       0       59      /usr/sbin/nscd
18      279     1       0       59      /usr/sbin/nscd
TID     PID     PPID    UID     PRI     COMMAND
17      279     1       0       59      /usr/sbin/nscd
16      279     1       0       59      /usr/sbin/nscd
^C

The sched Provider

The sched DTrace provider enables probes related to thread scheduling. For example, the on-cpu probe fires when a CPU begins to execute a thread, and the off-cpu probe fires when a thread is about to be taken off of a CPU.

Note – Refer to the Solaris Dynamic Tracing Guide for details on the probes provided by the sched provider.

To list the sched probes, use the following command:

# dtrace -l -P sched | awk '{print $NF}' | sort -u
NAME
change-pri
dequeue
enqueue
off-cpu
on-cpu
preempt
remain-cpu
schedctl-nopreempt
schedctl-preempt
schedctl-yield
sleep
surrender
tick
wakeup

The following D script uses the on-cpu sched probe to display the name of the executable process starting to run on a CPU and the priority of its thread:

# cat start2run.d
#!/usr/sbin/dtrace -qs

sched:::on-cpu
/pid != $pid && pid != 0/
{
        printf("Thread %d from: %s starting on CPU %d at priority %d\n",
            curlwpsinfo->pr_lwpid, curpsinfo->pr_psargs, curcpu->cpu_id,
            curlwpsinfo->pr_pri);
}


# ./start2run.d
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 1 from: bash starting on CPU 0 at priority 59
Thread 1 from: bash starting on CPU 2 at priority 49
Thread 1 from: pgm starting on CPU 1 at priority 49
Thread 1 from: pgm starting on CPU 1 at priority 29
Thread 1 from: pgm starting on CPU 1 at priority 29
Thread 1 from: pgm starting on CPU 1 at priority 19
Thread 1 from: pgm starting on CPU 1 at priority 9
Thread 1 from: pgm starting on CPU 1 at priority 9
Thread 1 from: pgm starting on CPU 1 at priority 0
Thread 6 from: /lib/svc/bin/svc.startd starting on CPU 0 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 1 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 1 at priority 59
Thread 4 from: /usr/lib/picl/picld starting on CPU 2 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 18 from: /usr/sbin/nscd starting on CPU 0 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 1 at priority 59
Thread 4 from: /usr/lib/picl/picld starting on CPU 2 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 2 at priority 59
Thread 2 from: /usr/lib/autofs/automountd starting on CPU 2 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 18 from: /usr/sbin/nscd starting on CPU 0 at priority 59
Thread 1 from: /usr/lib/sendmail -bd -q15m starting on CPU 0 at priority 59
Thread 1 from: bash starting on CPU 0 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 0 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60

The following D script uses the on-cpu sched probe with an aggregation to display a summary of who has recently been running on what CPU:

# cat whorun.d
#!/usr/sbin/dtrace -qs

sched:::on-cpu
/pid != $pid && pid != 0/
{
        @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
}

END
{
        printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
        printa("%-30s %4d %@6d\n", @);
}

# ./whorun.d
^C
Command                         CPU  Count
/usr/lib/fm/fmd/fmd               1      1
uptime                            2      1
find / -name fubar                3      2
/usr/lib/autofs/automountd        2      3
-sh                               2      3
-sh                               1      3
/usr/lib/picl/picld               1      4
/usr/lib/fm/fmd/fmd               3      6
/usr/sbin/nscd                    3      8
/usr/lib/fm/fmd/fmd               2     11
/usr/sbin/nscd                    2     14
/usr/sbin/nscd                    0     15
/usr/lib/sendmail -bd -q15m       0     16
ls -lR /                          1     18
/usr/sfw/sbin/snmpd               0     18
-sh                               3     20
/usr/lib/utmpd                    0     20
/usr/lib/sendmail -bd -q15m       2     20
/usr/lib/picl/picld               0     32
/usr/sfw/sbin/snmpd               3     44
/usr/sfw/sbin/snmpd               2     55
fsflush                           0     72
/usr/sbin/nscd                    1     77
find / -name fubar                1    152
/usr/sbin/vold                    2    152
/usr/sfw/sbin/snmpd               1    237
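Counting on-cpu events shows how often a command was scheduled, not how long it ran. Pairing the on-cpu probe with the off-cpu probe gives the time. The following is a minimal sketch of that variation on whorun.d. The thread-local variable name self->ts and the column layout are our own choices, and the sketch ignores threads that were already on a CPU when tracing started.

```dtrace
#!/usr/sbin/dtrace -qs

/* Sketch: total nanoseconds spent on CPU, by command and CPU. */
sched:::on-cpu
/pid != $pid && pid != 0/
{
        /* Remember when this thread started running. */
        self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
        @[curpsinfo->pr_psargs, curcpu->cpu_id] = sum(timestamp - self->ts);
        self->ts = 0;
}

END
{
        printf("%-30s %4s %12s\n", "Command", "CPU", "Nanoseconds");
        printa("%-30s %4d %@12d\n", @);
}
```

A command with a high count in whorun.d but a small total here is being scheduled often for very short bursts.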

Accessing Lock Contention Information

The lockstat provider makes available probes that give information regarding locking behavior on the system. For example, when the adaptive-block probe fires, you know that a kernel thread had to wait for an adaptive mutex, and the arg1 argument tells you how long it slept waiting for the lock’s release. This gives you a sense of how much contention there is for the data (or code) that the mutex is protecting.

Note – See the Solaris Dynamic Tracing Guide for details on other lockstat provider probes.


The lockstat Provider Probes

To list the lockstat provider probes, use the following command:

# dtrace -l -P lockstat
ID    PROVIDER  MODULE   FUNCTION          NAME
467   lockstat  genunix  mutex_enter       adaptive-acquire
468   lockstat  genunix  mutex_enter       adaptive-block
469   lockstat  genunix  mutex_enter       adaptive-spin
470   lockstat  genunix  mutex_exit        adaptive-release
471   lockstat  genunix  mutex_destroy     adaptive-release
472   lockstat  genunix  mutex_tryenter    adaptive-acquire
473   lockstat  genunix  lock_set          spin-acquire
474   lockstat  genunix  lock_set          spin-spin
475   lockstat  genunix  lock_set_spl      spin-acquire
476   lockstat  genunix  lock_set_spl      spin-spin
477   lockstat  genunix  lock_try          spin-acquire
478   lockstat  genunix  lock_clear        spin-release
479   lockstat  genunix  lock_clear_splx   spin-release
480   lockstat  genunix  CLOCK_UNLOCK      spin-release
481   lockstat  genunix  rw_enter          rw-acquire
482   lockstat  genunix  rw_enter          rw-block
483   lockstat  genunix  rw_exit           rw-release
484   lockstat  genunix  rw_tryenter       rw-acquire
485   lockstat  genunix  rw_tryupgrade     rw-upgrade
486   lockstat  genunix  rw_downgrade      rw-downgrade
487   lockstat  genunix  thread_lock       thread-spin
488   lockstat  genunix  thread_lock_high  thread-spin

The following D script displays CPU, thread, process, wait time, and stack trace information related to a thread blocking on an adaptive mutex:

# cat mutex.d
#!/usr/sbin/dtrace -qs

lockstat:::adaptive-block
{
        printf("\nCPU\tTID\tPID\tUID\tWAIT TIME\tCOMMAND\n");
        printf("%d\t%d\t%d\t%d\t%d\t\t%s\n", curcpu->cpu_id,
            curlwpsinfo->pr_lwpid, curpsinfo->pr_pid,
            curpsinfo->pr_uid, arg1, curpsinfo->pr_psargs);
        stack();
}

Test the mutex.d D script by starting four instances of the readchar user application, which reads every file in the current directory one byte at a time using the read(2) system call:

# (cd /usr/lib; /var/dtrace/readchar)& (cd /usr/lib; /var/dtrace/readchar)&
[1] 2323
[2] 2325
# (cd /usr/lib; /var/dtrace/readchar)& (cd /usr/lib; /var/dtrace/readchar)&
[3] 2327
[4] 2329
# ./mutex.d
^C

# mpstat 2
CPU minf mjf xcal intr ithr csw icsw migr smtx srw  syscl usr sys wt idl
  0    2   0    0  409  307  45    8    0    0   0  65534  13  14  0  73
  0    3   0    0  401  301  54   31    0    0   0 103605  21  79  0   0
  0    0   0    0  406  305  50   30    0    0   0 100905  20  80  0   0
  0    1   0    0  402  302  55   32    0    0   0 104497  21  79  0   0
^C

Lock Contention on a Single Processor System

The mpstat(1M) command output indicates that you are on a single processor system which is CPU-bound primarily in system mode. The system call counts are high, which correlates with the high percentage of system time. You expect such numbers from running four instances of the readchar process. There is no mutex contention on a single processor system until you add more file system-intensive commands, as shown in the following example:

# find / -name fubar & ls -lR / >/ll&
[1] 2357
[2] 2358
# find / -name fubar & ls -lR / >/ll&
[3] 2359
[4] 2360
# ./mutex.d

CPU     TID     PID     UID     WAIT TIME       COMMAND
0       0       0       0       56917           sched

              genunix`clock+0x3f0
              genunix`cyclic_softint+0xa4
              unix`cbe_level10+0x8
              unix`intr_thread+0x144

CPU     TID     PID     UID     WAIT TIME       COMMAND
0       0       0       0       41076           sched

              genunix`clock+0x3f0
              genunix`cyclic_softint+0xa4
              unix`cbe_level10+0x8
              unix`intr_thread+0x144

CPU     TID     PID     UID     WAIT TIME       COMMAND
0       0       0       0       50424           sched

              sd`sdintr+0x14
              glm`glm_doneq_empty+0x144
              glm`glm_intr+0xf4
              pcipsy`pci_intr_wrapper+0x9c
              unix`intr_thread+0x144

CPU     TID     PID     UID     WAIT TIME       COMMAND
0       0       0       0       45321           sched

              genunix`clock+0x3f0
              genunix`cyclic_softint+0xa4
              unix`cbe_level10+0x8
              unix`intr_thread+0x144

CPU     TID     PID     UID     WAIT TIME       COMMAND
0       0       0       0       41184           sched

              genunix`clock+0x3f0
              genunix`cyclic_softint+0xa4
              unix`cbe_level10+0x8
              unix`intr_thread+0x144
^C

CPU     TID     PID     UID     WAIT TIME       COMMAND
0       0       0       0       43214           sched

              genunix`kmem_cache_free+0x4c
              uata`atapi_tran_destroy_pkt+0x58
              scsi`scsi_destroy_pkt+0x14
              sd`sd_return_command+0x16c
              sd`sdintr+0x224
              uata`ghd_doneq_process+0x64
              unix`intr_thread+0x144

Lock Contention on a Multiprocessor Server

The following output results from running four instances of the readchar process on a four-processor server. In this case you do not run the extra find and ls -lR commands, as you did on the uniprocessor system. There is significantly more mutex contention, as indicated by the smtx column (you should always ignore the first set of numbers output by the mpstat(1M) command). There is also significantly more frequent output from the mutex.d D script:

# mpstat 2
CPU minf mjf xcal intr ithr csw icsw migr   smtx srw  syscl usr sys wt idl
  0    1   0    3    4    1  65    0    1      8   0     27   0   1  0  99
  1    1   0    3    7    4  30    0    1      8   0     29   0   0  0 100
  2    1   0    3    4    1  28    0    1      8   0     28   0   0  0 100
  3    1   0    3  214  111  15    0    0      9   0     28   0   0  0 100
CPU minf mjf xcal intr ithr csw icsw migr   smtx srw  syscl usr sys wt idl
  0    2   0    5   21    1  56   17    8  74478   0 225870  14  81  0   4
  1    0   0    5   29    4  67   22    8  76857   0 228291  11  83  0   6
  2    0   0    1   53    1 150   49    5  83973   0 224372  16  74  0   9
  3    3   0   90  216  113  12   12    1  86392   0 227446  13  87  0   0
CPU minf mjf xcal intr ithr csw icsw migr   smtx srw  syscl usr sys wt idl
  0    0   0    4   24    1  64   19    8 108269   0 189929  12  86  0   2
  1    0   0    2   39    2  99   34    9 107282   0 189818  13  82  0   4
  2    0   0    1   43    1 104   39    5 120189   0 173753  11  79  0  10
  3    0   0   95  216  112   7   10    0  96010   0 229465  17  83  0   0
^C

# ./mutex.d

CPU     TID     PID     UID     WAIT TIME       COMMAND
0       1       12523   0       23500           /var/dtrace/readchar

              ufs`rdip+0x150
              ufs`ufs_read+0x208
              genunix`read+0x274
              genunix`read32+0x1c
              unix`syscall_trap32+0xa8

CPU     TID     PID     UID     WAIT TIME       COMMAND
1       1       12527   0       22200           /var/dtrace/readchar

              ufs`rdip+0x488
              ufs`ufs_read+0x208
              genunix`read+0x274
              genunix`read32+0x1c
              unix`syscall_trap32+0xa8

CPU     TID     PID     UID     WAIT TIME       COMMAND
1       1       12527   0       20700           /var/dtrace/readchar

              ufs`rdip+0x150
              ufs`ufs_read+0x208
              genunix`read+0x274
              genunix`read32+0x1c
              unix`syscall_trap32+0xa8

CPU     TID     PID     UID     WAIT TIME       COMMAND
2       1       12528   0       24400           /var/dtrace/readchar

              ufs`rdip+0x488
              ufs`ufs_read+0x208
              genunix`read+0x274
              genunix`read32+0x1c
              unix`syscall_trap32+0xa8

CPU     TID     PID     UID     WAIT TIME       COMMAND
2       1       12552   0       28900           /var/dtrace/readchar

              ufs`ufs_lockfs_end+0x70
              ufs`ufs_read+0x25c
              genunix`read+0x274
              genunix`read32+0x1c
              unix`syscall_trap32+0xa8

CPU     TID     PID     UID     WAIT TIME       COMMAND
3       1       12556   0       24800           /var/dtrace/readchar

              ufs`rdip+0x488
              ufs`ufs_read+0x208
              genunix`read+0x274
              genunix`read32+0x1c
              unix`syscall_trap32+0xa8

The previous output shows that the mutex contention is in the UNIX® File System (UFS) code. The sleep times are short: the arg1 value is reported in nanoseconds, so the waits shown range from about 21 to 29 microseconds.
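At these event rates, printing one record per block quickly becomes hard to read. The following is a minimal sketch that summarizes the same adaptive-block probe with the quantize() aggregating function, keyed on the top few kernel stack frames. The frame depth of 3 is an arbitrary choice made for this illustration.

```dtrace
#!/usr/sbin/dtrace -qs

/* Sketch: distribution of adaptive mutex block times (arg1, in nanoseconds). */
lockstat:::adaptive-block
{
        /* One power-of-two histogram per distinct blocking code path. */
        @[stack(3)] = quantize(arg1);
}
```

The histograms are printed when you press Control-C. A wait of 23500 nanoseconds, for example, is counted in the bucket labeled 16384, which covers values from 16384 up to 32767.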


The proc Provider and the system() Function

The proc provider makes available probes related to process creation and termination as well as signal delivery. The signal-send probe fires when a signal is being sent to a process or thread. The args[2] argument is set to the signal number, which can be compared with the symbolic names such as SIGINT used in the signal(3head) manual page. The args[1] argument is set to point to the psinfo_t structure of the receiving process.

The system() built-in function allows you to run shell commands anytime a probe fires. This general capability provides great power in that any probe event can trigger the execution of any command. You can use format specifications similar to the printf() built-in function to parameterize the shell command you wish to invoke. The system() function requires destructive actions to be enabled with either the -w option to the dtrace(1M) command or with the #pragma statement used inside the script with the destructive option.

The following script uses the signal-send probe as well as the built-in system() function to display what user account is sending the SIGKILL signal and to which process:

# cat whosend.d
#!/usr/sbin/dtrace -s

#pragma D option destructive
#pragma D option quiet

proc:::signal-send
/args[2] == SIGKILL/
{
        printf("SIGKILL was sent to %s by ", args[1]->pr_fname);
        system("getent passwd %d | cut -d: -f5", uid);
}

# ./whosend.d
SIGKILL was sent to vi by Super-User
SIGKILL was sent to bash by Mary Smith
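In whosend.d, the system() call is needed only to translate the user ID into a name. If numeric IDs are enough, the same probe can be used without enabling destructive actions. The following is a minimal sketch that logs every signal, not just SIGKILL. It relies only on the args[1] and args[2] arguments described above and on the built-in execname, pid, and uid variables, which describe the sender because the probe fires in the sender's context.

```dtrace
#!/usr/sbin/dtrace -qs

/* Sketch: log every signal with its sender and recipient. */
proc:::signal-send
{
        printf("%s (pid %d, uid %d) sent signal %d to %s (pid %d)\n",
            execname, pid, uid, args[2],
            args[1]->pr_fname, args[1]->pr_pid);
}
```

Avoiding system() also keeps the output in order, because everything is written by DTrace itself rather than by a separately scheduled shell command.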


Displaying Read Call Information

DTrace provides several ways to display read information:

● You can trace system-wide activity or application-specific activity.

● You can show information about each individual read call or summarize the data with an aggregation function.

● You can monitor read activity at the driver level with the io provider or at the application level with the pid provider, the syscall provider, or the sysinfo provider.

This section demonstrates some of these methods.

Tracing Read Calls System-Wide

The first example traces, system-wide, each individual read(2) and pread(2) system call. There is a difference between the read size requested in the read(2) and pread(2) system calls and the number of bytes actually read, which is given in the return value from these system calls. A 0 return value indicates an end-of-file condition; a return of -1 indicates that the read(2) system call failed.

# cat reads.d
#!/usr/sbin/dtrace -qs

BEGIN
{
        printf("FD\tREQUEST\tACTUAL\tCOMMAND\n");
}

syscall::read:entry, syscall::pread*:entry
/execname != "dtrace"/
{
        self->started = 1;
        self->arg0 = arg0;
        self->arg2 = arg2;
}

syscall::read:return, syscall::pread*:return
/self->started/
{
        printf("%d\t%d\t%d\t%s\n", self->arg0, self->arg2, arg0,
            execname);
        self->started = 0;
        ++nlines;
}

syscall::read:return, syscall::pread*:return
/nlines > 20/
{
        printf("FD\tREQUEST\tACTUAL\tCOMMAND\n");
        nlines = 0;
}

# ./reads.d
FD      REQUEST ACTUAL  COMMAND
0       1       1       bash
0       1       1       bash
0       1       1       bash
0       1       1       bash
3       877     877     date
0       1       1       bash
0       1       1       bash
...
0       1       1       bash
3       152     152     uptime
4       8192    4092    uptime
4       8192    0       uptime
3       877     877     uptime
0       1       1       bash
0       1       1       bash
...
0       1       1       bash
0       1       1       bash
0       1       1       bash
3       8192    8192    grep
3       8192    200     grep
3       8192    0       grep
1       8192    1006    init
1       8192    0       init
1       8192    1006    init
1       8192    0       init
...
FD      REQUEST ACTUAL  COMMAND
5       1024    61      nscd
5       8192    4464    utmpd
6       336     336     utmpd
6       336     336     utmpd
6       336     336     utmpd
5       8192    0       utmpd
1       24      -1      sac
2       8       8       ttymon
2       8       -1      ttymon
1       24      24      sac
...
FD      REQUEST ACTUAL  COMMAND
0       1       1       bash
0       1       1       bash
0       128     4       sh
0       128     3       sh
0       128     4       sh
...
4       416     416     ps
4       416     416     ps
11      336     336     svc.startd
4       416     416     ps
4       416     416     ps
4       416     416     ps
4       416     416     ps
4       416     416     ps
4       416     416     ps
^C

Using the previous output (and help from the truss(1) command), you can determine the following:

● The date(1) command reads a time zone (US/Mountain) configuration file of size 877 bytes when it starts.

● The ps(1) command reads the psinfo_t structure of size 416 bytes many times.

● The init(1M) command re-reads the /etc/inittab file periodically.

● The grep(1) command reads its file one page (8192 bytes) at a time.

● The sh(1) command reads a whole line from standard input into a 128-byte buffer.

● The bash(1) command reads standard input one byte at a time (probably to implement command line editing).

● The uptime(1) command reads the same time zone configuration file as the date(1) command.

● The sac(1M) and ttymon(1M) commands issued reads that failed.
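
The relationship between the REQUEST and ACTUAL columns can be summarized in a few lines of code. The following Python sketch is not one of the course scripts; classify_read is a hypothetical helper that applies the rules stated for the reads.d example (0 means end-of-file, -1 means failure) and, as an added assumption, labels a positive return value smaller than the request a short read. The rows are copied from the sample reads.d output.

```python
def classify_read(requested, actual):
    """Classify one read(2)/pread(2) result from its REQUEST and ACTUAL columns."""
    if actual == -1:
        return "failed"          # the system call returned an error
    if actual == 0:
        return "eof"             # end-of-file condition
    if actual < requested:
        return "short"           # fewer bytes than requested were returned
    return "full"                # the whole request was satisfied


# (fd, requested, actual, command) rows copied from the sample reads.d output
rows = [
    (0, 1, 1, "bash"),
    (4, 8192, 4092, "uptime"),
    (4, 8192, 0, "uptime"),
    (1, 24, -1, "sac"),
]

for fd, requested, actual, command in rows:
    print(fd, command, classify_read(requested, actual))
```

A classification like this makes it easier to pick out the failed sac and ttymon reads from the many one-byte bash reads.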


Tracing Read Calls Using the iosnoop.d D Script

The following output results from running the iosnoop.d D script at the same time as the previous reads.d D script. It shows that only the grep(1) command performed actual disk reads. The other reads found the data cached in memory.

# ./iosnoop.d
COMMAND   PID   FILE                      DEVICE  RW  MS
sched     0     <none>                    sd2     W   3.733
sched     0     <none>                    sd2     W   4.796
sched     0     <none>                    sd2     W   4.003
sched     0     <none>                    sd2     W   10.259
sched     0     <none>                    sd2     W   12.698
sched     0     <none>                    sd2     W   15.843
sched     0     <none>                    sd2     W   21.331
sched     0     <none>                    sd2     W   28.134
sched     0     <none>                    sd2     W   33.668
sched     0     <none>                    sd2     W   39.575
sched     0     <none>                    sd2     W   4.004
grep      2691  /usr/include/sys/zone.h   sd2     R   4.817
fsflush   3     <none>                    sd2     W   13.120
^C

Aggregating Read Data

The following D script uses the avg() aggregation function to display the average number of bytes read by file descriptor and process name:

# cat readsummary.d
#!/usr/sbin/dtrace -qs

syscall::read:entry, syscall::pread*:entry
{
    self->started = 1;
    self->fd = arg0;
}

syscall::read:return, syscall::pread*:return
/self->started && execname != "dtrace" && arg0 != -1/
{
    @[self->fd, execname] = avg(arg0);
    self->started = 0;
}

END
{
    printa("%d\t%-24s\t%@d\n", @);
}

# ./readsummary.d
^C
4       instant         0
2       more            1
0       vi              1
4       readchar        1
0       bash            1
2       ttymon          8
4       rup             23
1       sac             24
5       nscd            59
19      sgml2roff       119
3       rup             413
3       rpc.rstatd      413
4       ps              416
3       uptime          514
1       init            540
3       man             550
3       ps              687
5       rup             787
4       vi              803
3       grep            845
3       date            877
3       vi              1492
4       nroff           2221
4       uptime          2232
0       nroff           3479
0       tbl             3861
3       cat             3861
6       nsgmls          3894
0       eqn             3914
0       col             3979
0       instant         4072
3       instant         4442
3       nsgmls          4459
6       rpc.rstatd      4464
3       more            4842
5       man             5802
4       nsgmls          6606
0       grep            6815


By changing the aggregation function from avg() to sum(), you can obtain the total number of bytes read by file descriptor and process name:

# ./totalread.d
^C
4       instant         0
0       vi              6
2       more            8
5       nscd            61
0       bash            121
11      svc.startd      336
10      svc.startd      336
3       man             550
6       readchar        671
3       date            877
4       ls              877
3       vi              2984
19      sgml2roff       3214
1       init            4324
23      readchar        4572
19      readchar        10276
6       nsgmls          11684
5       man             17408
4       nroff           17771
3       more            17876
7       readchar        18064
20      readchar        18116
3       cat             18435
0       tbl             18435
4       vi              18880
10      readchar        19500
14      readchar        19500
0       eqn             20356
0       nroff           20356
0       col             22496
8       readchar        28252
0       grep            30095
3       nsgmls          33799
3       instant         53314
11      readchar        56616
0       instant         160360
4       nsgmls          171763
21      readchar        192636
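
To make the difference between the two aggregating functions concrete, the following Python sketch models what avg() and sum() accumulate for each (file descriptor, process name) key. It is an illustration only, not DTrace code: the aggregate helper is hypothetical, the integer division reflects an assumption that avg() reports whole bytes, and the input rows are taken from the uptime, ps, and sac lines of the reads.d sample output.

```python
from collections import defaultdict


def aggregate(reads):
    """Group successful reads by (fd, execname); return avg and sum per key."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for fd, execname, nbytes in reads:
        if nbytes == -1:          # failed reads are excluded, as in the predicate
            continue
        totals[(fd, execname)] += nbytes
        counts[(fd, execname)] += 1
    # Integer division: the model assumes avg() reports a whole number of bytes.
    averages = {key: totals[key] // counts[key] for key in totals}
    return averages, dict(totals)


# (fd, execname, bytes returned) rows based on the reads.d sample output
reads = [
    (4, "uptime", 4092),
    (4, "uptime", 0),
    (4, "ps", 416),
    (4, "ps", 416),
    (1, "sac", -1),
]

averages, totals = aggregate(reads)
# Print in ascending order of value, the order used in the sample output.
for key, value in sorted(averages.items(), key=lambda item: item[1]):
    print(key[0], key[1], value)
```

The same keys appear in both results; only the value stored per key changes, which is why switching from avg() to sum() is a one-word edit in the D script.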


Using the Anonymous Tracing Facility

Probes are usually enabled through a DTrace consumer process such as dtrace(1M). A DTrace consumer process cannot run, however, until you boot the system. Anonymous tracing allows you to enable tracing during boot.

Anonymous tracing is not associated with any DTrace consumer. Any tracing that you can do interactively with the dtrace(1M) process you can also do anonymously. Only the super-user can create an anonymous enabling, and there can be only one anonymous enabling at any time.

Most DTrace users do not need this feature, but because boot problems are particularly difficult to debug, anonymous tracing can prove valuable for kernel and device driver developers.

Creating an Anonymous Enabling

To create an anonymous enabling, use the -A option to a dtrace(1M) invocation that specifies the desired probes, predicates, actions, and options. The dtrace(1M) process modifies your /etc/system file to force the loading of the kernel modules that implement the needed DTrace providers. The dtrace process then adds a series of driver properties representing your request to the dtrace(7D) driver’s configuration file: /kernel/drv/dtrace.conf. These properties are read by the dtrace(7D) driver when it is loaded. The driver then enables the specified probes with the specified actions, creating an anonymous state to associate with the new enabling.

Reboot your system. While the system is booting, messages appear on the console describing the anonymous enabling.

After the machine boots, claim the anonymous state by specifying the -a option to the dtrace(1M) command. By default the -a option claims the anonymous state, processes the existing data, and continues to run. To process the anonymous state data and exit, add the -e option to the dtrace(1M) command.

Performing Anonymous Tracing

The following dtrace(1M) command performs anonymous tracing on the conskbd module, the console keyboard multiplexer driver:

# dtrace -A -m conskbd
dtrace: cleaned up old anonymous enabling in /kernel/drv/dtrace.conf
dtrace: cleaned up forceload directives in /etc/system
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
# tail /etc/system
* chapter of the Solaris Dynamic Tracing Guide for details.
*
forceload: drv/systrace
forceload: drv/sdt
forceload: drv/profile
forceload: drv/lockstat
forceload: drv/fbt
forceload: drv/fasttrap
forceload: drv/dtrace
* ^^ Added by DTrace
# reboot
...
# grep enabling /var/adm/messages
Feb 27 07:34:22 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 0 (:kmdb::)
Feb 27 07:34:22 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 1 (dtrace:::ERROR)
Feb 27 07:45:27 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 0 (:conskbd::)
Feb 27 07:45:27 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 1 (dtrace:::ERROR)
# dtrace -ae
CPU  ID     FUNCTION:NAME
  0  25339  conskbd_attach:entry
  0  25340  conskbd_attach:return
  0  25327  conskbdopen:entry
  0  25328  conskbdopen:return
  0  25331  conskbduwput:entry
  0  25332  conskbduwput:return
  0  25345  conskbdioctl:entry
  0  25346  conskbdioctl:return
  0  25327  conskbdopen:entry
  0  25328  conskbdopen:return
  0  25331  conskbduwput:entry
  0  25332  conskbduwput:return
  0  25345  conskbdioctl:entry
  0  25346  conskbdioctl:return
  0  25329  conskbdclose:entry
  0  25330  conskbdclose:return
  0  25327  conskbdopen:entry
  0  25328  conskbdopen:return
  0  25329  conskbdclose:entry
  0  25330  conskbdclose:return

The forceload entries in the /etc/system file are not automatically removed after the reboot. Run the dtrace(1M) command with just the -A option to clean up these forceload entries:

# tail -18 /etc/system
* vvvv Added by DTrace
*
* The following forceload directives were added by dtrace(1M) to allow for
* tracing during boot. If these directives are removed, the system will
* continue to function, but tracing will not occur during boot as desired.
* To remove these directives (and this block comment) automatically, run
* "dtrace -A" without additional arguments. See the "Anonymous Tracing"
* chapter of the Solaris Dynamic Tracing Guide for details.
*
forceload: drv/systrace
forceload: drv/sdt
forceload: drv/profile
forceload: drv/lockstat
forceload: drv/fbt
forceload: drv/fasttrap
forceload: drv/dtrace
* ^^ Added by DTrace
# dtrace -A
dtrace: cleaned up old anonymous enabling in /kernel/drv/dtrace.conf
dtrace: cleaned up forceload directives in /etc/system
# tail /etc/system
*
* To set variables in 'unix':
*
* set nautopush=32
* set maxusers=40
*
* To set a variable named 'debug' in the module named 'test_module'
*
* set test_module:debug = 0x13


The next example focuses only on those functions called from the conskbd_attach() function in the conskbd module:

# cat cons.d
#!/usr/sbin/dtrace -s

fbt::conskbd_attach:entry
{
    self->trace = 1;
}

fbt:::
/self->trace/
{
}

fbt::conskbd_attach:return
{
    self->trace = 0;
}

# dtrace -AFs cons.d
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
# reboot
...
# grep enabling /var/adm/messages
Feb 27 07:45:27 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 0 (:conskbd::)
Feb 27 07:45:27 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 1 (dtrace:::ERROR)
Feb 27 08:07:05 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 0 (fbt::conskbd_attach:entry)
Feb 27 08:07:05 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 1 (fbt:::)
Feb 27 08:07:05 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 2 (fbt::conskbd_attach:return)
Feb 27 08:07:05 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling probe 3 (dtrace:::ERROR)
# dtrace -ae
CPU FUNCTION
  0  -> conskbd_attach
  0    -> ddi_create_minor_node
  0      -> ddi_create_minor_common
  0        -> ddi_driver_major
  0        <- ddi_driver_major
  0        -> strcmp
  0        <- strcmp
  0        -> derive_devi_class
  0          -> i_ddi_devi_class
  0          <- i_ddi_devi_class
  0          -> strncmp
  0          <- strncmp
...
  0            <- kstat_compare_bykid
  0            -> kstat_zone_compare
  0            <- kstat_zone_compare
  0          <- avl_find
  0        <- kstat_hold
  0      <- kstat_hold_bykid
  0    <- kstat_install
  0    -> kstat_rele
  0      -> cv_broadcast
  0      <- cv_broadcast
  0    <- kstat_rele
  0  <- conskbd_attach


Using the Speculative Tracing Facility

Because of the comprehensive tracing coverage that DTrace provides, the challenge for the user can be deciding what not to trace. The primary mechanism for filtering out uninteresting events is the predicate mechanism. Predicates are useful when you know at the time a probe fires whether the probe event is interesting. For example, you might want to know when the read(2) system call is entered only if a particular process issued the call. There are situations, however, in which you can determine that a given probe event is interesting only some time after the probe has fired.

For example, if a read(2) system call is failing sporadically with an EIO errno code value, you might want to see the total code path leading to the error (not just the current stack trace). Tracing every code path is possible with the fbt provider, but doing this while waiting for the failure to reappear results in too much recorded data. This causes one of two problems:

● Unwanted data that must be filtered afterwards

● Data loss caused by running out of buffer space in DTrace

To address this problem, DTrace provides a facility called speculative tracing. Speculative tracing allows you to tentatively trace data. Later, you can decide that the traced data is interesting and commit it, or you can decide that the traced data is uninteresting and discard it.


Speculative Tracing Functions

The D functions shown in Table 4-1 compose the DTrace speculative tracing facility:

Table 4-1 DTrace Speculative Tracing Functions

Function Name   Args   Description
speculation     None   Returns an identifier (ID) for a new speculative buffer
speculate       ID     Denotes that the remainder of the probe clause should be traced to the speculative buffer specified by the ID
commit          ID     Commits the speculative buffer associated with the ID
discard         ID     Discards the speculative buffer associated with the ID

The speculation() function allocates a speculative buffer and returns a speculation identifier (ID). You use this ID in subsequent calls to the speculate() function. You must place the speculate() call before any data recording action statement in the same clause. All such data recording action statements are then speculatively traced. Probe clauses can contain speculative tracing or regular tracing, but not both. Aggregating actions, destructive actions, and exit actions can never be speculative.

By default (without tuning), there is only one speculative buffer. Therefore you must be careful not to start a new speculation before committing or discarding an existing one. You use the commit() function to commit a speculation. When you commit a speculative buffer, its data is copied into the one (per CPU) principal buffer of DTrace. You cannot have any data recording actions in a clause containing a commit() function. You use the discard() function to discard a speculation. When a speculative buffer is discarded, its contents are thrown away.
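
The life cycle of a speculation can be sketched as a small state machine. The following Python class is a toy model written for this explanation, not DTrace code, and it makes two labeled assumptions: speculation() returns 0 when no speculative buffer is free, and a buffer becomes reusable immediately after commit() or discard() (the real facility may release buffers later). The class name SpeculativeTracer and its attributes are hypothetical.

```python
class SpeculativeTracer:
    """Toy model of speculation(), speculate(), commit(), and discard()."""

    def __init__(self, nspec=1):
        self.free_ids = list(range(1, nspec + 1))  # 0 is never a valid ID here
        self.buffers = {}                          # active ID -> recorded data
        self.principal = []                        # committed (visible) data

    def speculation(self):
        """Return a new speculation ID, or 0 if no buffer is available (assumed)."""
        if not self.free_ids:
            return 0
        spec_id = self.free_ids.pop(0)
        self.buffers[spec_id] = []
        return spec_id

    def speculate(self, spec_id, record):
        """Record data tentatively; an inactive ID records nothing."""
        if spec_id in self.buffers:
            self.buffers[spec_id].append(record)

    def commit(self, spec_id):
        """Copy the speculative data to the principal buffer and free the ID."""
        if spec_id in self.buffers:
            self.principal.extend(self.buffers.pop(spec_id))
            self.free_ids.append(spec_id)

    def discard(self, spec_id):
        """Throw the speculative data away and free the ID."""
        if spec_id in self.buffers:
            del self.buffers[spec_id]
            self.free_ids.append(spec_id)


tracer = SpeculativeTracer()           # one speculative buffer, the default
first = tracer.speculation()
second = tracer.speculation()          # no buffer left in this model
tracer.speculate(first, "-> open64")
tracer.speculate(second, "lost")       # ignored: the second ID is not active
tracer.commit(first)
print(first, second, tracer.principal)
```

The model shows why starting a second speculation before the first is committed or discarded loses data when only one speculative buffer exists.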


Speculative Tracing Example

You can use speculations to highlight a particular code path. The following example displays the entire code path for the open(2) system call only when it fails. The script explicitly ignores failed opens of the /var/ld/ld.config file, which are common on this system:

# cat spec.d
#!/usr/sbin/dtrace -s

#pragma D option flowindent

syscall::open*:entry
/stringof(copyinstr(arg0)) != "/var/ld/ld.config"/
{
    self->spec = speculation();
    speculate(self->spec);

    /* The following will only appear if later committed */
    printf("%s was opening: %s\n", execname, copyinstr(arg0));
}

fbt:::
/self->spec/
{
    speculate(self->spec);  /* default action */
}

syscall::open*:return
/self->spec && arg0 == -1/
{
    printf("Open failed with errno: %d\n", errno);
}

syscall::open*:return
/self->spec && arg0 == -1/
{
    /*
     * Move data recorded in speculative buffer
     * to principal buffer, freeing speculative buffer
     * for a new speculation()
     */
    commit(self->spec);
    self->spec = 0;
}

syscall::open*:return
/self->spec && arg0 != -1/
{
    /* Throw away data recorded in speculative buffer */
    discard(self->spec);
    self->spec = 0;
}
# ./spec.d
dtrace: script './spec.d' matched 40768 probes
CPU FUNCTION
  0  <= open64                  Open failed with errno: 2

  0  => open64                  grep was opening: /etc/sytem

  0    -> open64
  0    <- open64
  0    -> copen
  0      -> falloc
  0        -> ufalloc
  0        <- ufalloc
  0        -> ufalloc_file
  0          -> fd_find
...
  0        <- cv_broadcast
  0      <- setf
  0      -> unfalloc
  0        -> crfree
  0        <- crfree
  0      <- unfalloc
  0      -> kmem_cache_free
  0      <- kmem_cache_free
  0      -> set_errno
  0      <- set_errno
  0    <- copen
^C

It appears that the spec.d D script never starts a new open speculation until the current open returns and the current speculation is either committed or discarded. This is not the case, however, if an open blocks and does not return before another open is started. You learn in a lab exercise how to tune the number of speculative buffers.


Application Debugging With Speculative Tracing

The next example shows how to use speculative tracing for application debugging. Infrequent errors can be difficult to debug because they can be difficult to reproduce. Often you can identify a problem after a failure occurs, but at that point it is too late to reconstruct the code path that led to the failure condition. You can use the pid provider with speculative tracing to solve this common problem. The following script shows how to trace every instruction in a function only when it fails.

# cat appspec.d
#!/usr/sbin/dtrace -s

pid$target::malloc:entry
{
    self->spec = speculation();
    speculate(self->spec);
    printf("( %d )", arg0);
}

pid$target::malloc:     /* trace all instructions */
/self->spec/
{
    speculate(self->spec);
}

pid$target::malloc:return
/self->spec && arg1 == 0/
{
    commit(self->spec);
    self->spec = 0;
}

pid$target::malloc:return
/self->spec && arg1 != 0/
{
    discard(self->spec);
    self->spec = 0;
}
# dtrace -s appspec.d -c myapp
dtrace: script 'appspec.d' matched 106 probes
...
CPU  ID     FUNCTION:NAME
  0  42239  malloc:entry ( 1000000000 )
  0  42239  malloc:entry
  0  42311  malloc:4
  0  42312  malloc:8
  0  42313  malloc:c
  0  42314  malloc:10
  0  42315  malloc:14
  0  42316  malloc:18
  0  42317  malloc:1c
  0  42318  malloc:20
  0  42319  malloc:24
  0  42320  malloc:28
  0  42321  malloc:2c
  0  42327  malloc:44
  0  42328  malloc:48
  0  42329  malloc:4c
  0  42330  malloc:50
  0  42331  malloc:54
  0  42332  malloc:58
  0  42333  malloc:5c
  0  42334  malloc:60
  0  42335  malloc:64
  0  42336  malloc:68
  0  42337  malloc:6c
  0  42309  malloc:return
...
# mdb myapp
> _start:b
> :r
mdb: stop at _start
mdb: target stopped at:
_start:         clr     %fp
> malloc::nm
Value      Size       Type  Bind  Other Shndx Name
0xff2d1cf0|0x00000070|FUNC |GLOB |0x0  |9    |libc.so.1`malloc
> 70%4=x
                1c
> malloc,1c/ai
libc.so.1`malloc:
libc.so.1`malloc:       save    %sp, -0x60, %sp
libc.so.1`malloc+4:     mov     %o7, %i3
libc.so.1`malloc+8:     call    +8 <libc.so.1`malloc+0x10>
libc.so.1`malloc+0xc:   sethi   %hi(0x92400), %i2
libc.so.1`malloc+0x10:  add     %i2, 0x180, %i2
libc.so.1`malloc+0x14:  add     %i2, %o7, %i4
libc.so.1`malloc+0x18:  mov     %i3, %o7
libc.so.1`malloc+0x1c:  ld      [%i4 + 0xec8], %i5
libc.so.1`malloc+0x20:  ld      [%i5], %i1
libc.so.1`malloc+0x24:  cmp     %i1, 0
libc.so.1`malloc+0x28:  bne     +0x1c <libc.so.1`malloc+0x44>
libc.so.1`malloc+0x2c:  nop
libc.so.1`malloc+0x30:  call    +0x93624 <PLT:___errno>
libc.so.1`malloc+0x34:  mov     0x30, %l7
libc.so.1`malloc+0x38:  st      %l7, [%o0]
libc.so.1`malloc+0x3c:  ret
libc.so.1`malloc+0x40:  restore %g0, 0, %o0
libc.so.1`malloc+0x44:  call    +0x657d4 <libc.so.1`assert_no_libc_locks_held>
libc.so.1`malloc+0x48:  nop
libc.so.1`malloc+0x4c:  call    +0x6437c <libc.so.1`lmutex_lock>
libc.so.1`malloc+0x50:  ld      [%i4 + 0xec0], %o0
libc.so.1`malloc+0x54:  call    +0x1c <libc.so.1`_malloc_unlocked>
libc.so.1`malloc+0x58:  mov     %i0, %o0
libc.so.1`malloc+0x5c:  mov     %o0, %i0
libc.so.1`malloc+0x60:  call    +0x6446c <libc.so.1`lmutex_unlock>
libc.so.1`malloc+0x64:  ld      [%i4 + 0xec0], %o0
libc.so.1`malloc+0x68:  ret
libc.so.1`malloc+0x6c:  restore
>
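
The arithmetic behind the mdb expression 70%4=x, and the comparison between the disassembly and the committed trace, can be checked with a short calculation. The following Python sketch is an illustration added here, not part of the course material. It relies on SPARC instructions being 4 bytes long and treats the malloc:entry probe as offset 0; the traced offsets are copied from the sample dtrace output.

```python
INSTRUCTION_SIZE = 4          # SPARC instructions are 4 bytes long
FUNCTION_SIZE = 0x70          # size of malloc reported by ::nm

# Every instruction offset in the function: 0x0, 0x4, ..., 0x6c
all_offsets = list(range(0, FUNCTION_SIZE, INSTRUCTION_SIZE))

# Offsets seen in the committed trace (malloc:entry is treated as offset 0)
traced = [0x0, 0x4, 0x8, 0xc, 0x10, 0x14, 0x18, 0x1c, 0x20, 0x24, 0x28, 0x2c,
          0x44, 0x48, 0x4c, 0x50, 0x54, 0x58, 0x5c, 0x60, 0x64, 0x68, 0x6c]

# Instructions that exist in the function but never fired a probe
skipped = [offset for offset in all_offsets if offset not in traced]

print(hex(len(all_offsets)))              # instruction count, as in 70%4=x
print([hex(offset) for offset in skipped])
```

The skipped offsets are the instructions between the delay slot at malloc+0x2c and the branch target at malloc+0x44, which is consistent with the bne at malloc+0x28 having been taken on this call.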


DTrace Privileges

By default, only the super-user can use DTrace. This is because DTrace enables visibility into all aspects of the system, including:

● User-level functions

● System calls

● Kernel functions

● Kernel data

In addition, some DTrace actions can modify a program’s state by stopping a process or even inducing a breakpoint in the kernel. Just as it is inappropriate to allow one user to stop another user’s process or access another user’s files, so it is inappropriate to grant a user full access to all of the DTrace facilities. The traditional UNIX “all or none” approach to user privileges is not suitable for managing the use of the DTrace capabilities.

Using the Least Privilege Facility

The Least Privilege facility in the Solaris operating system enables a Solaris system administrator to grant particular users or processes specific privileges that permit access to individual DTrace capabilities.

Three specific privileges control access by a user or process to the DTrace features:

● The dtrace_proc privilege – Permits use of only the pid and plockstat providers for process-level tracing of processes owned by the user.

● The dtrace_user privilege – Permits use of only the profile and syscall providers on processes owned by the user.

● The dtrace_kernel privilege – Permits the use of every provider except the pid and plockstat providers, unless the dtrace_proc privilege is also granted. Does not allow kernel-destructive actions.

In addition to these DTrace-specific privileges, a user who has both the dtrace_proc and proc_owner privileges can trace processes owned by other users.
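
The provider rules in the list above can be written as a small lookup. The following Python function is a hypothetical model that encodes only what the three privilege descriptions state; it is not how DTrace itself performs the check. It ignores process ownership and the proc_owner privilege, and it does not model the BEGIN, END, and ERROR probes of the dtrace provider.

```python
# Providers named in the privilege descriptions
PROC_PROVIDERS = {"pid", "plockstat"}
USER_PROVIDERS = {"profile", "syscall"}


def can_use_provider(privileges, provider):
    """Return True if the privilege set permits the provider (simplified model)."""
    if provider in PROC_PROVIDERS:
        return "dtrace_proc" in privileges
    if "dtrace_kernel" in privileges:
        return True               # every provider except pid and plockstat
    if provider in USER_PROVIDERS:
        return "dtrace_user" in privileges
    return False


for privileges, provider in [
    ({"basic", "dtrace_proc"}, "pid"),
    ({"basic", "dtrace_user"}, "pid"),
    ({"basic", "dtrace_kernel"}, "sched"),
    ({"basic", "dtrace_kernel", "dtrace_proc"}, "pid"),
]:
    print(sorted(privileges), provider, can_use_provider(privileges, provider))
```

The combinations tried here correspond to the user2 through user5 accounts used in the sessions in this section.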


Kernel-Destructive Actions

Only the super-user can perform kernel-destructive actions. You enable such actions by running the dtrace(1M) command with the -w option. Three built-in DTrace functions cause kernel-destructive actions:

● The breakpoint() function – Action that induces a kernel breakpoint, causing the system to stop, with control passing to OpenBoot™ PROM or kmdb(1), depending on how the system was booted.

● The panic() function – Action that induces a kernel panic with crash files normally being created for postmortem analysis.

● The chill() function – Action that causes DTrace to spin for the specified number of nanoseconds. Intended for dealing with race condition situations.

Setting DTrace User Privileges

The Solaris Least Privilege facility enables system administrators to grant specific privileges to specific Solaris users. To give a user a privilege at login, insert a line into the /etc/user_attr file, as follows:

username::::defaultpriv=basic,privilege,...
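
To check which privileges an entry of this form grants, you can split the line on its colon-separated fields. The following Python sketch is an illustration only; default_privileges is a hypothetical helper, and it assumes the user_attr(4) layout of five colon-separated fields whose last field holds semicolon-separated key=value attributes. The sample lines are taken from the /etc/user_attr listing in this section.

```python
def default_privileges(line):
    """Return the defaultpriv list from one user_attr(4) entry, or [] if absent."""
    fields = line.strip().split(":", 4)
    if len(fields) < 5:
        return []                 # comment or malformed line
    # The fifth field holds semicolon-separated key=value attributes.
    for attribute in fields[4].split(";"):
        key, _, value = attribute.partition("=")
        if key.strip() == "defaultpriv":
            return [item.strip() for item in value.split(",")]
    return []


samples = [
    "user5::::defaultpriv=basic,dtrace_kernel,dtrace_proc",
    "lp::::profiles=Printer Management",
    "# /etc/user_attr",
]

for entry in samples:
    print(entry.split(":")[0], default_privileges(entry))
```

An empty list means the entry sets no default privileges, which is the situation of user1 in the first example of this section.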

The following examples show the effect of setting the three DTrace-specific privileges.

No Specified DTrace Privileges

The following example shows a user with no DTrace privileges specified:

$ cat /etc/user_attr
#
# Copyright (c) 2003 by Sun Microsystems, Inc. All rights reserved.
#
# /etc/user_attr
#
# user attributes. see user_attr(4)
#
#pragma ident "@(#)user_attr 1.1 03/07/09 SMI"
#
adm::::profiles=Log Management
lp::::profiles=Printer Management
root::::auths=solaris.*,solaris.grant;profiles=Web Console Management,All;lock_after_retries=no
user2::::defaultpriv=basic,dtrace_proc
user3::::defaultpriv=basic,dtrace_user
user4::::defaultpriv=basic,dtrace_kernel
user5::::defaultpriv=basic,dtrace_kernel,dtrace_proc
user6::::defaultpriv=basic,dtrace_proc,proc_owner
$ id
uid=1001(user1) gid=101(users)
$ /usr/sbin/dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges
$ echo $$
919
$ /usr/sbin/dtrace -n pid919:::
dtrace: failed to initialize dtrace: DTrace requires additional privileges
$

The dtrace_proc Privilege

This example shows the DTrace features available to a user with the dtrace_proc privilege:

$ id
uid=1002(user2) gid=101(users)
$ dtrace -l
   ID   PROVIDER   MODULE   FUNCTION   NAME
    1     dtrace                       BEGIN
    2     dtrace                       END
    3     dtrace                       ERROR
$ echo $$
9447
$ dtrace -n pid9447:::entry
dtrace: description 'pid9447:::entry' matched 3179 probes
^C

$ dtrace -qn 'pid$target:libc:memcpy:entry {printf("size: %d\n", arg2)}' -c date
Sun Feb 27 10:02:01 MST 2005
size: 16
size: 15
size: 1
size: 15
size: 5
size: 521
size: 44
size: 28
size: 28
size: 48
size: 48
size: 308
size: 56
size: 36
size: 29

$ ps -ef | grep vi
   user2  1534  1528  0 09:48:20 pts/1  0:00 grep vi
   user5  1531  1452  0 09:47:55 pts/2  0:00 vi resume
$ dtrace -n pid1531:::
dtrace: invalid probe specifier pid1531:::: failed to grab pid 1531: permission denied
$ dtrace -n syscall::read:
dtrace: invalid probe specifier syscall::read:: probe description syscall::read: does not match any probes
$

The dtrace_proc and proc_owner Privileges

$ id
uid=1006(user6) gid=101(users)
$ grep user6 /etc/user_attr
user6::::defaultpriv=basic,dtrace_proc,proc_owner
$ ps -ef | grep vi
   user6   650   637  0 09:41:30 pts/1  0:00 grep vi
   user5   649   630  0 09:41:16 pts/2  0:00 vi resume
$ /usr/sbin/dtrace -n pid649:::entry
dtrace: description 'pid649:::entry' matched 3951 probes
CPU  ID     FUNCTION:NAME
  0  42548  peekkey:entry
  0  42544  getkey:entry
  0  42546  getbr:entry
  0  42548  peekkey:entry
  0  42544  getkey:entry
  0  42546  getbr:entry
...

The dtrace_user Privilege

This example shows the DTrace features available to a user with the dtrace_user privilege:

$ id
uid=1003(user3) gid=101(users)
$ grep user3 /etc/user_attr
user3::::defaultpriv=basic,dtrace_user
$ echo $$
1171
$ dtrace -n pid1171:::entry
dtrace: invalid probe specifier pid1171::: probe description pid1171::: does not match any probes
$ pgm
f: 13 p: 0 q: -1952257862 m: -10
f: 640001883 p: -2056615 q: -929109794 m: -7
f: -1660723204 p: -1529159 q: 94444073 m: 25
f: 2041630813 p: 749994 q: -42775360 m: -23
f: -1255556994 p: 1065403 q: 309691762 m: 14
^C
$ dtrace -qn 'syscall::write:entry /arg0 == 1/ {printf("T: %d\n", timestamp)}' -c pgm
f: 13 p: 0 q: -1952257862 m: -10
f: 640001883 p: -2056615 q: -929109794 m: -7
f: -1660723204 p: -1529159 q: 94444073 m: 25
f: 2041630813 p: 749994 q: -42775360 m: -23
f: -1255556994 p: 1065403 q: 309691762 m: 14
f: -1207459745 p: 1769677 q: -8640714 m: -35
T: 150116053418082
T: 150116222152140
T: 150116388881669
T: 150116558431666
T: 150116728255203
...
$ dtrace -n 'pid$target:::entry' -c pgm
dtrace: invalid probe specifier pid$target:::entry: probe description pid1208:::entry does not match any probes

$ dtrace -qn 'profile-109 {@[arg1] = count()}' -c pgm
f: 13 p: 0 q: -1952257862 m: -10
f: 640001883 p: -2056615 q: -929109794 m: -7
f: -1660723204 p: -1529159 q: 94444073 m: 25
...
^C
     133476       49
 4280947012      226
 4280947008     1094
$ mdb pgm
> _start:b
> :r
mdb: stop at pgm`_start
mdb: target stopped at:
mypgm`_start:   clr     %fp
> 0t4280947008/ai
libc.so.1`.umul:
libc.so.1`.umul:        umul    %o0, %o1, %o0
> $q
$ (sleep 33; pwd)&
1680
$ dtrace -n 'syscall:::entry /pid != $pid/ {}'
dtrace: description 'syscall:::entry ' matched 225 probes
/export/home/user3
CPU  ID     FUNCTION:NAME
  0  18832  rexit:entry
  0  18922  ioctl:entry
  0  18908  setpgrp:entry
  0  18922  ioctl:entry
  0  19004  waitsys:entry
  0  19214  getcwd:entry
  0  18838  write:entry
  0  18832  rexit:entry
^C
$


The dtrace_user privilege allows the use of only the syscall and profile providers, and only on processes owned by the user. Even though many system calls are occurring in the system, the preceding output shows only the system calls of the sh, sleep, and pwd commands.

The dtrace_kernel Privilege

This example shows the DTrace features available to a user with the dtrace_kernel privilege:

$ id
uid=1004(user4) gid=101(users)
$ grep user4 /etc/user_attr
user4::::defaultpriv=basic,dtrace_kernel
$ dtrace -qn 'sched:::on-cpu {printf("Starting to run: %s\n", execname)}'
Starting to run: sched
Starting to run: sched
Starting to run: fsflush
Starting to run: svc.configd
Starting to run: inetd
Starting to run: svc.startd
Starting to run: fmd
Starting to run: dtrace
Starting to run: sched
Starting to run: sched
^C
$ dtrace -qn 'io:::start {printf("Starting an I/O: %s\n", execname)}'
Starting an I/O: bash
Starting an I/O: bash
Starting an I/O: bash
Starting an I/O: fsflush
Starting an I/O: find
Starting an I/O: find
Starting an I/O: find
Starting an I/O: find
^C
$ echo $$
6711
$ dtrace -n pid6711:a.out::entry
dtrace: invalid probe specifier pid6711:a.out::entry: probe description pid6711:bash::entry does not match any probes

The preceding example demonstrates that you must have the dtrace_proc privilege to trace your own processes. The dtrace_kernel privilege by itself is not sufficient.

$ id
uid=1005(user5) gid=101(users)
$ grep user5 /etc/user_attr
user5::::defaultpriv=basic,dtrace_kernel,dtrace_proc
$ echo $$
6736
$ dtrace -n 'pid6736:a.out::entry'
dtrace: description 'pid6736:a.out::entry' matched 211 probes
^C
$ dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
pid6736
proc
profile
sched
sdt
syscall
sysinfo
vminfo
$


Privilege Needed for Kernel-Destructive Actions

Only the super-user can invoke kernel-destructive actions:

$ dtrace -wn 'syscall::fork1:entry {chill(2000); printf("OK, lets start: %s\n", execname);}'
dtrace: description 'syscall::fork1:entry ' matched 1 probe
dtrace: allowing destructive actions
dtrace: error on enabled probe ID 2 (ID 18246: syscall::fork1:entry): invalid kernel access in action #1
dtrace: error on enabled probe ID 2 (ID 18246: syscall::fork1:entry): invalid kernel access in action #1
^C
$ su
Password:
# dtrace -wn 'syscall::fork1:entry {chill(2000); printf("OK, lets start: %s\n", execname);}'
dtrace: description 'syscall::fork1:entry ' matched 1 probe
dtrace: allowing destructive actions
CPU  ID     FUNCTION:NAME
  0  18246  fork1:entry OK, lets start: bash

  0  18246  fork1:entry OK, lets start: bash

^C

Setting DTrace Process Privileges

The Least Privilege facility also enables a Solaris system administrator to grant privileges to specific processes. To give a running process an additional privilege, use the ppriv(1) command:

# ppriv -s A+privilege process-ID

The following interactive session shows the use of the ppriv(1) command to give a shell specific DTrace privileges. Look at privileges(5) for details:

$ id
uid=1001(user1) gid=101(users)
$ /usr/sbin/dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges
$ echo $$
1774
$ ppriv -s A+dtrace_proc 1774
1774: ppriv: Not owner
$ su
Password:
# ppriv -s A+dtrace_proc 1774
# exit
$ /usr/sbin/dtrace -l
   ID   PROVIDER            MODULE                          FUNCTION NAME
    1     dtrace                                                     BEGIN
    2     dtrace                                                     END
    3     dtrace                                                     ERROR
$ /usr/sbin/dtrace -n 'pid$target:calls::entry' -c calls
dtrace: description 'pid$target:calls::entry' matched 7 probes
83133
dtrace: pid 1787 exited with status 1
CPU     ID                    FUNCTION:NAME
  0  28355                     _start:entry
  0  28362                      _init:entry
  0  28361                       main:entry
  0  28360                         f1:entry
  0  28359                         f2:entry
  0  28358                         f3:entry
  0  28357                         f4:entry
  0  28356                         f5:entry
  0  28360                         f1:entry
  0  28359                         f2:entry
  0  28358                         f3:entry
  0  28357                         f4:entry
  0  28356                         f5:entry
  0  28363                      _fini:entry
$ ppriv $$
1774:   -sh
flags = <none>
        E: basic,dtrace_proc
        I: basic,dtrace_proc
        P: basic,dtrace_proc
        L: all
$ bash
bash-2.05b$ ppriv $$
1789:   bash
flags = <none>
        E: basic,dtrace_proc
        I: basic,dtrace_proc
        P: basic,dtrace_proc
        L: all
bash-2.05b$ /usr/sbin/dtrace -n 'pid$target:calls::entry' -c calls
dtrace: description 'pid$target:calls::entry' matched 7 probes
83133
dtrace: pid 1850 exited with status 1
CPU     ID                    FUNCTION:NAME
  0  28355                     _start:entry
  0  28362                      _init:entry
  0  28361                       main:entry
  0  28360                         f1:entry
...
bash-2.05b$ echo $$
1789
bash-2.05b$ su
Password:
# ppriv -s A+dtrace_kernel 1789
# ppriv $$
1854:   sh
flags = <none>
        E: all
        I: basic
        P: all
        L: all
# exit
bash-2.05b$ ppriv $$
1789:   bash
flags = <none>
        E: basic,dtrace_kernel,dtrace_proc
        I: basic,dtrace_kernel,dtrace_proc
        P: basic,dtrace_kernel,dtrace_proc
        L: all
bash-2.05b$ /usr/sbin/dtrace -qn 'fbt::cv_wait_sig:entry
> {trace(execname);ustack();stack();exit(0);}'
more
              ff2bcb58
              15684
              149a4
              13ad8
              12780
              1201c
              115cc

              genunix`str_cv_wait+0x28
              genunix`strwaitq+0x238
              genunix`strread+0x174
              genunix`read+0x274
              unix`syscall_trap32+0xcc


Summarizing the DTrace Privilege Levels

Table 4-2 describes the DTrace privilege levels.

Table 4-2 DTrace Privilege Levels

| Privilege Level | Providers | Actions | Variables | Address Spaces |
| Any DTrace privilege | dtrace | exit, printf, tracemem, discard, speculate, printa, trace | args, probemod, this, epid, probename, timestamp, id, probeprov, vtimestamp, probefunc, self | None |
| dtrace_proc privilege | pid, plockstat | copyin, copyout, stop, copyinstr, raise, ustack | execname, pid, uregs | User |
| dtrace_user privilege | profile, syscall | copyin, copyout, stop, copyinstr, raise, ustack | execname, pid, uregs | User |
| dtrace_kernel privilege | All except the pid and plockstat providers | All but destructive actions | All | User, Kernel |


Module 5

Troubleshooting DTrace Problems

Objectives

Upon completion of this module, you should be able to:

● Describe how to lessen the performance impact of DTrace

● Describe how to use and tune DTrace buffers

● Debug DTrace scripts


Relevance


Discussion – The following questions are relevant to understanding how to troubleshoot DTrace problems:

● Would the ability to write your D scripts with minimal performance impact be beneficial?

● Would it be useful to have control over buffer management policies when DTrace buffer space is exhausted?

● Would it be useful to detect common mistakes made in D scripts?


Additional Resources

Additional resources – The following references provide additional information on the topics described in this module:

● Sun Microsystems, Inc. Solaris Dynamic Tracing Guide, part number 817-6223-10.

● Cantrill, Bryan M., Michael W. Shapiro, and Adam H. Leventhal. "Dynamic Instrumentation of Production Systems." Paper presented at 2004 USENIX Conference.

● BigAdmin System Administration Portal [http://www.sun.com/bigadmin/content/dtrace].

● dtrace(1M) manual page in the Solaris 10 OS manual pages, Solaris 10 Reference Manual Collection.

● The /usr/demo/dtrace directory contains all of the sample scripts from the Solaris Dynamic Tracing Guide.


Minimizing DTrace Performance Impact

Enabling DTrace in any manner affects system performance in some way. Often, this effect is negligible, but it can be substantial if many probes are enabled with costly enablings. You can minimize the performance impact of DTrace by:

● Limiting enabled probes

● Using aggregations

● Using cacheable predicates

Limiting Enabled Probes

DTrace provides comprehensive tracing coverage of both kernel and user processes. This coverage allows for a major probe effect if tens of thousands of probes are enabled. In general, you should only enable as many probes as needed to solve your problem. Do not, for example, enable all fbt probes if a more concise enabling can answer your question. When possible, limit enabled probes to a specific module or function of interest. The more concisely you can formulate the problem statement, the better you will be at limiting your probe effect.

You should also be careful when using the pid provider, because it can instrument every instruction of an application. This can result in millions of probes being enabled in the application, slowing the target process to a crawl.

Nevertheless, there are many conditions in which you must enable a large number of probes to answer a question. DTrace has been designed with this in mind. Enabling a large number of probes can slow down the system substantially, but it can never induce fatal failure of the machine. You should therefore not hesitate to enable many probes if necessary.
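As a minimal sketch of limiting an enabling, the following clause counts kernel function entries in a single module instead of enabling every fbt probe. The module name ufs and the aggregation name @calls are assumptions chosen for illustration; substitute the module you are investigating.

```d
/*
 * Broad enabling, shown only for contrast (every fbt entry probe):
 *
 *	fbt:::entry { @calls[probefunc] = count(); }
 *
 * Narrower enabling: function entries in one module only.
 * The module name "ufs" is an assumption for illustration.
 */
fbt:ufs::entry
{
	@calls[probefunc] = count();
}
```

Narrowing the function field in the same way (for example, to one function name) reduces the number of enabled probes further.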


Using Aggregations

DTrace aggregations provide a scalable method of aggregating data. Although associative arrays appear to offer similar functionality, they are global, general-purpose variables that cannot provide the linear scalability of aggregations. Aggregating functions allow for intermediate results to be kept per-CPU instead of in a shared global data structure. When a system-wide result is required, the aggregating function may then be applied to the set consisting of the per-CPU intermediate results. You should therefore use aggregations rather than associative arrays whenever possible. For example, you should avoid performing the action shown in the following script:

syscall:::entry
{
	++totals[execname];
}

syscall::rexit:entry
{
	printf("%40s %d\n", execname, totals[execname]);
	totals[execname] = 0;
}

You should instead perform the following:

syscall:::entry
{
	@totals[execname] = count();
}

END
{
	printa("%40s %@d\n", @totals);
}
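If you want periodic totals rather than a single report at the end, one possible variation prints and resets the aggregation on a timer. This is a sketch only; the 10-second interval is an arbitrary assumption.

```d
syscall:::entry
{
	@totals[execname] = count();
}

/* Every 10 seconds (interval chosen for illustration), report and reset. */
tick-10sec
{
	printa("%40s %@d\n", @totals);
	trunc(@totals);
}
```

The trunc() action discards the aggregation's keys and values after each report, so every interval starts from an empty aggregation.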

Using Cacheable Predicates

A tracing framework that offers comprehensive coverage must provide a mechanism that enables you not to trace events, otherwise you are flooded with unwanted data. DTrace does this with predicates, which enable you to trace data only when a specified condition is found to be true.


When enabling many probes, you tend to use predicates of a form that identifies a specific thread or threads of interest, such as /self->traceme/ or /pid == 12345/. Many of these predicates evaluate to the same (false) value for most threads in most probes, but the evaluation itself can become costly when done for every function entry and return point in the kernel.

To reduce this cost, DTrace caches the evaluation of a predicate if it includes only thread-local variables (as in the first example), only immutable variables (as in the second), or both. The cost of evaluating a cached predicate is much smaller than the cost of evaluating a non-cached predicate, especially if the predicate involves thread-local variables, string comparisons, or other relatively costly operations.

Examining Cacheable and Uncacheable Predicates

Predicate caching is transparent to the user (cache coherency is maintained by DTrace). It does, however, require you to follow some guidelines to construct optimal predicates. Table 5-1 shows some examples of cacheable as opposed to uncacheable predicate expressions.

Table 5-1 Cacheable and Uncacheable Predicates

| Cacheable | Uncacheable |
| self->mumble | mumble[curthread] or mumble[pid, tid] |
| execname == "pgm" | curpsinfo->pr_fname or curthread->t_procp->p_user.u_comm |
| pid == 1234 | curpsinfo->pr_pid or curthread->t_procp->p_pidp->pid_id |
| tid == 17 | curlwpsinfo->pr_lwpid or curthread->t_tid |

Constructing Optimal Predicates

You should avoid constructing uncacheable predicates, such as that shown in the following example:

syscall::read:entry
{
	follow[pid, tid] = 1;
}

fbt:::
/follow[pid, tid]/
{}

syscall::read:return
/follow[pid, tid]/
{
	follow[pid, tid] = 0;
}

You should instead use thread-local variables, as in the following example:

syscall::read:entry
{
	self->follow = 1;
}

fbt:::
/self->follow/
{}

syscall::read:return
/self->follow/
{
	self->follow = 0;
}

To be cacheable, a predicate must consist exclusively of cacheable expressions. The following predicates are all cacheable:

/execname == "myprogram"/
/execname == $$1/
/pid == 12345/
/pid == $1/
/self->traceme == 1/

Because they use global variables, none of the following predicates is cacheable:

/execname == one_to_watch/
/traceme[execname]/
/pid == pid_i_care_about/
/self->traceme == my_global/
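The difference can be sketched with a script that takes the process ID of interest as its first argument. The aggregation name @calls and the choice of syscall:::entry are assumptions for illustration; the point is only that the predicate compares pid with the macro argument $1 rather than with a global variable.

```d
#!/usr/sbin/dtrace -s

/*
 * Uncacheable form, shown only for contrast:
 *
 *	BEGIN { pid_i_care_about = $1; }
 *	syscall:::entry /pid == pid_i_care_about/ { ... }
 *
 * Cacheable form: the predicate uses only pid and the macro
 * argument $1, which is the /pid == $1/ form listed above.
 */
syscall:::entry
/pid == $1/
{
	@calls[probefunc] = count();
}
```

If the script were saved under a hypothetical name such as watchpid.d, it would be run as ./watchpid.d 12345.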


Using and Tuning DTrace Buffers

Data buffering and management is an essential service provided by the DTrace framework for its clients. In previous modules you used DTrace without examining how traced data is transported from the DTrace framework to clients such as the dtrace(1M) utility. In this section, you explore data buffering in detail and learn about options you can tune to change the DTrace buffer management policies.

Principal Buffers

The buffer most fundamental to DTrace operation is the principal buffer. The principal buffer is present in every DTrace invocation, and is the buffer to which tracing actions record their data by default. These actions include:

● exit()

● printf()

● trace()

● ustack()

● printa()

● stack()

The principal buffers are always allocated on a per-CPU basis, although tracing (and thus buffer allocation) can be restricted to a single CPU by using the cpu option.

Principal Buffer Policies

DTrace enables tracing in highly constrained contexts in the kernel. In particular, DTrace enables tracing in contexts in which you cannot reliably allocate memory. The consequence of this flexibility of context is that there always exists a possibility that you want to trace data when there is no space available. DTrace must have policies to deal with such situations when they arise. Which policy you choose is dictated by the specifics of how you are using DTrace: sometimes it is best to discard the new data, while at other times it is desirable to reuse the space containing the oldest recorded data to trace the new data. Usually, however, the best policy is the one that minimizes the likelihood of running out of available space in the first place.

To accommodate these varying demands, DTrace supports the following buffer policies:

● The switch policy

● The fill policy

● The ring policy

This support is implemented with the bufpolicy option, and can be set on a per-consumer basis.

DTrace Option Settings

You can set options in a D script by using the #pragma D option statement and the option name. If the option takes a value, the option name should be followed by an equals sign (=) and the option value. For example, all of the following are valid option settings:

#pragma D option nspec=4
#pragma D option grabanon
#pragma D option bufsize=2g
#pragma D option switchrate=64
#pragma D option aggrate=100
#pragma D option bufresize=manual

The dtrace(1M) command also accepts option settings on the command line as an argument to the -x option. For example:

# dtrace -x nspec=4 -x bufsize=2g -x switchrate=60 \
-x aggrate=10ms -x bufpolicy=switch -n zfod

You can also specify the bufsize option with the -b flag to the dtrace(1M) command:

# dtrace -b 2g -n zfod

Note – This section describes only those options relevant to buffer management. For details on the other DTrace options, see the Solaris Dynamic Tracing Guide.


The switch Buffer Policy

By default, the principal buffer has a switch buffer policy. Under this policy, per-CPU buffers are allocated in pairs: one buffer is active, the other is inactive. When a DTrace consumer asks to read its buffer out of the kernel, the kernel first switches the inactive and active buffers. Buffer switching is done in such a manner that there is no window in which tracing data can be lost. When the buffers are switched, the newly inactive buffer is copied out to the DTrace consumer. This policy ensures that the consumer always sees a self-consistent buffer (that is, a buffer is never simultaneously traced to and copied out), and that no window is introduced in which tracing is paused or otherwise prevented.

The consumer controls the rate at which the buffer is read out (and thus switched) by using the switchrate option. As with any rate option, switchrate can be specified with any time suffix, but defaults to rate-per-second.

Dropped Data

Under the switch policy, if a given enabled probe would trace more data than there is space available in the active principal buffer, the data is dropped and a per-CPU drop count is incremented. In the event of one or more drops, the dtrace(1M) command displays this message or a similar one:

dtrace: 11 drops on CPU 0

You can reduce or eliminate drops by:

● increasing the size of the principal buffer with the bufsize option, or

● increasing the switching rate with the switchrate option

The switch policy allocates scratch space for the copyin(), copyinstr(), and alloca() subroutines out of the active buffer.

Example of Tuning Buffers to Alleviate Drops

The following D script causes significant drops:

# cat stress.d
#!/usr/sbin/dtrace -s

fbt:::
{
	trace(timestamp);
}

tick-5sec
{
	exit(0);
}

# ./stress.d >/var/tmp/stress.d.out
dtrace: script './stress.d' matched 38665 probes
dtrace: 451660 drops on CPU 0
dtrace: 1100596 drops on CPU 0
dtrace: 1028767 drops on CPU 0
dtrace: 1103521 drops on CPU 0

# ls -l /var/tmp/stress.d.out
-rw-r--r--   1 root     root     86004878 Mar 13 14:58 /var/tmp/stress.d.out

The drops result from the limited buffer space, the low switchrate value, or both. The default buffer size for the principal buffer is 4 Mbytes and the default switchrate is once per second. In the next invocation of the script you increase the buffer size significantly:

# dtrace -x bufsize=300m -s stress.d >/var/tmp/stress.d.out
dtrace: script 'stress.d' matched 38665 probes
dtrace: buffer size lowered to 150m

# ls -l /var/tmp/stress.d.out
-rw-r--r--   1 root     root     18177752 Mar 13 15:03 /var/tmp/stress.d.out

Note that DTrace lowers the setting for buffer size because there is not enough memory. By increasing the buffer size, you eliminated all drops and created 18 Mbytes of trace data. In the next example you use a smaller buffer size, but with an increased switchrate value:

# dtrace -x bufsize=64m -x switchrate=16 -s stress.d >>/var/tmp/stress.d.out
dtrace: script 'stress.d' matched 38665 probes
^C
# ls -l /var/tmp/stress.d.out
-rw-r--r--   1 root     root     33052791 Mar 13 15:06 /var/tmp/stress.d.out
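The same tuning can also be carried inside the script with #pragma D option statements, so that the settings travel with the script. The following is a sketch of stress.d with the values from the last invocation; those values suited the example machine and are not general recommendations.

```d
#!/usr/sbin/dtrace -s

/* Buffer tuning taken from the last command-line example. */
#pragma D option bufsize=64m
#pragma D option switchrate=16

fbt:::
{
	trace(timestamp);
}

tick-5sec
{
	exit(0);
}
```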


The fill Buffer Policy

For some problems it is useful to have a single, large in-kernel buffer and to continue tracing until one or more of the per-CPU buffers has filled. You can implement this solution using the fill buffer policy. The fill buffer policy is beneficial in helping to avoid drops that result in the loss of trace data. Kernel buffer space is also saved since there is only one buffer per CPU.

Under the fill buffer policy, tracing continues until an enabled probe is about to trace more data than there is space in the principal buffer. At this time, the buffer is marked as filled and the consumer is notified that at least one of its per-CPU buffers has filled. When the dtrace(1M) utility detects a single filled buffer, tracing is stopped, all buffers are processed, and dtrace exits. Note that no further data is traced to a filled buffer, even if the data would fit in the buffer.

To use the fill policy, set the bufpolicy option to fill. For example, the following invocation of DTrace traces every system call entry into a per-CPU 2-Kbyte buffer with the buffer policy set to fill:

# dtrace -n syscall:::entry -b 2k -x bufpolicy=fill

To allow for END tracing in fill buffers, DTrace calculates beforehand the amount of space potentially consumed by END probes and subtracts this from the size of the principal buffer. If the net size is negative, DTrace refuses to start, and the dtrace(1M) utility outputs a corresponding error message:

dtrace: END enablings exceed size of principal buffer

Reserving space beforehand ensures that a full buffer always has sufficient space for any and all END probes.
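A script-level sketch of the fill policy follows. It sets the same 2-Kbyte buffer and fill policy as the command line shown earlier, and adds an END clause, whose space DTrace reserves in advance as described above. The traced data and the END message are assumptions for illustration.

```d
#!/usr/sbin/dtrace -s

#pragma D option bufpolicy=fill
#pragma D option bufsize=2k

syscall:::entry
{
	trace(execname);
}

/* Space for this clause's output is reserved before tracing starts. */
END
{
	printf("tracing ended at %Y\n", walltimestamp);
}
```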


The ring Buffer Policy

When using DTrace to help diagnose failure (as opposed to understanding non-failing behavior), you often want to track the events leading to failure. Moreover, in cases where reproducing failure can take hours or days, you might want to keep only the most recent data. To support such situations, DTrace provides the ring buffer policy. Under this policy, when a principal buffer has filled, tracing wraps around to the first entry, thereby overwriting older tracing data. You establish a ring buffer by setting the bufpolicy option to ring:

# dtrace -s stress.d -x bufpolicy=ring -b 16k
dtrace: script 'stress.d' matched 38665 probes

CPU     ID                    FUNCTION:NAME
  0   9808       disp_lock_enter_high:entry   810424080584641
  0   9809      disp_lock_enter_high:return   810424080586093
  0   2288                setfrontdq:return   810424080588595
  0    668         generic_enq_thread:entry   810424080590727
  0    669        generic_enq_thread:return   810424080592504
  0  14298                ts_preempt:return   810424080594241
...

With the ring buffer policy, the dtrace(1M) utility does not display any output until the process terminates; at that time the ring buffer is consumed and processed.

Note that if a given record cannot fit in the buffer (that is, if the record is larger than the buffer size), the record is dropped regardless of buffer policy. By adding the following two lines to a D script, you can enable ring buffering with a specific buffer size:

#pragma D option bufpolicy=ring
#pragma D option bufsize=16k
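As a sketch of how these two lines fit into a complete script, the following keeps only the most recent system call entries of one process and stops when that process exits. It assumes the script is started with the -c or -p option so that the $target macro variable is defined; the probes chosen are illustrative.

```d
#!/usr/sbin/dtrace -s

#pragma D option bufpolicy=ring
#pragma D option bufsize=16k

/* Record each system call entry made by the target process. */
syscall:::entry
/pid == $target/
{
	trace(timestamp);
}

/* Stop tracing when the target exits; the ring buffer is then processed. */
proc:::exit
/pid == $target/
{
	exit(0);
}
```

If the script were saved under a hypothetical name such as ring.d, it could be run as ./ring.d -c ./myprog, where myprog stands for the program under investigation.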


Other Buffers

Principal buffers exist in every DTrace enabling. In addition to principal buffers, some DTrace consumers have additional in-kernel data buffers: an aggregation buffer, a number of speculative buffers, or both. You tune the aggregation buffer size with the aggsize option, and you tune the speculative buffer size with the specsize option. You can tune the size of each buffer on a per-consumer basis. Note that setting the buffer sizes denotes the sizes of the buffers on each CPU. Moreover, for the switch buffer policy, bufsize denotes the individual sizes of the active and inactive buffers on each CPU.

Buffer Resizing Policy

In some cases there is not adequate free kernel memory to allocate a buffer of the desired size. There might be insufficient memory available, or the DTrace consumer might have exceeded a tunable limit. DTrace provides a configurable policy when a buffer cannot be allocated.

The policy is set with the bufresize option, and defaults to auto. Under the auto buffer resize policy, the size of a buffer is halved until a successful allocation occurs. The dtrace(1M) utility emits a message if a buffer as allocated is smaller than the requested size:

# dtrace -P syscall -b 4g
dtrace: description 'syscall' matched 450 probes
dtrace: buffer size lowered to 128m
# dtrace -n 'fbt:::entry {@a[probefunc] = count()}' -x aggsize=1g
dtrace: description 'fbt:::entry ' matched 16250 probes
dtrace: aggregation size lowered to 128m

Alternatively, you can set the buffer resize policy to be manual by setting bufresize to manual. Under this policy, a failure to allocate causes DTrace to fail to start:

# dtrace -P syscall -x bufsize=500m -x bufresize=manual
dtrace: description 'syscall' matched 450 probes
dtrace: could not enable tracing: Not enough space

The bufresize option dictates the buffer resizing policy of all buffers: principal, speculative, and aggregation.
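The sizing options and the resize policy can also be fixed in the script itself. The following sketch requests specific principal and aggregation buffer sizes and asks DTrace to fail rather than silently shrink them; the sizes are arbitrary assumptions for illustration.

```d
#!/usr/sbin/dtrace -s

/* Sizes below are illustrative only. */
#pragma D option bufsize=16m
#pragma D option aggsize=1m
#pragma D option bufresize=manual

syscall:::entry
{
	@a[probefunc] = count();
}
```

With bufresize set to manual, a script like this either runs with exactly the requested buffer sizes or does not start at all, which can be preferable when a smaller buffer would make the results misleading.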


Debugging DTrace Scripts

As with any programming language, you can experience a multitude of errors in the D language. As you write more D scripts, you find it easier to diagnose errors, whether they be syntax errors or run-time errors. This section provides requirements and recommendations for writing correct D scripts.

Avoiding Syntax Errors in D Scripts

This section describes requirements that help you to avoid common D script syntax errors.

Start your scripts with the following first line: #!/usr/sbin/dtrace -s

# ./badstart.d
./badstart.d: line 1: BEGIN: command not found
./badstart.d: line 8: tick-1sec: command not found
./badstart.d: line 10: syntax error near unexpected token `0'
./badstart.d: line 10: `exit(0);'

# cat comments.d
/* This D script counts the number of read system calls */
#!/usr/sbin/dtrace -s
syscall::read:entry
{
    @["Number of reads:"] = count();
}
# ./comments.d
./comments.d: line 1: /bin: is a directory
./comments.d: line 3: syscall::read:entry: command not found
./comments.d: line 5: syntax error near unexpected token `('
./comments.d: line 5: `    @["Number of reads:"] = count();'

You must match up /* with an ending */ for comments in D scripts:

# cat comments2.d
#!/usr/sbin/dtrace -s
/* This D script counts the number of read system calls
syscall::read:entry
{
    @["Number of reads:"] = count();
}

# ./comments2.d
dtrace: failed to compile script ./comments2.d: line 7: end-of-file encountered before matching */

If you have more than one statement in a probe clause, make sure you end each one with a semicolon:

...
BEGIN
{
a=$1
b=$2
c=$3
}
...

# ./badstart2.d 1 2 3
dtrace: failed to compile script ./badstart2.d: line 6: syntax error near "b"
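For comparison, the same clause compiles once each statement ends with a semicolon. Only the BEGIN clause is shown, because the rest of badstart2.d is elided in the listing above.

```d
BEGIN
{
	a = $1;
	b = $2;
	c = $3;
}
```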

When comparing values, make sure that you use the == relational operator and not =:

# cat test5.d
#!/usr/sbin/dtrace -s
fbt::sema_init:entry
/arg1 = 1/
{
    trace(timestamp);
}
# ./test5.d
dtrace: failed to compile script ./test5.d: line 4: operator = can only be applied to a writable variable
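A corrected version of the test5.d predicate, using == so that arg1 is compared rather than assigned, might look like this:

```d
#!/usr/sbin/dtrace -s

fbt::sema_init:entry
/arg1 == 1/
{
	trace(timestamp);
}
```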

The first assignment to a variable determines its type. As in the C language, you cannot mix types in the D language:

# cat test8.d
#!/usr/sbin/dtrace -s

BEGIN
{
    vp = rootdir;
    i = 5;
}

tick-1sec
{
    i = *vp;
}
# ./test8.d
dtrace: failed to compile script ./test8.d: line 11: operands have incompatible types: "int" = "vnode_t"

Remember that even with the -w dtrace(1M) option, which enables destructive actions, you cannot modify kernel variables:

# cat test6.d
#!/usr/sbin/dtrace -ws

tick-5sec
/ freemem < lotsfree/
{
    lotsfree = lotsfree*2;
}

# ./test6.d
dtrace: failed to compile script ./test6.d: line 6: operator = can only be applied to a writable variable


Avoiding Run-Time Errors in D Scripts

This section describes requirements that help you to avoid common D script run-time errors.

Make sure your D script file has execute permission:

# ./badstart.d
./badstart.d: : Permission denied
# chmod +x badstart.d

If you specify other options on the first line of a D script, be sure the -s option is last:

# head badstart3.d
#!/usr/sbin/dtrace -sq

BEGIN
{
a=$1
b=$2
c=$3
}

tick-1sec
# ./badstart3.d
dtrace: failed to open q: No such file or directory
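The error arises because dtrace takes the text after -s as the name of the script file. A minimal sketch of an interpreter line that combines -q with -s correctly is shown below; the clause body is an assumption for illustration and is not taken from badstart3.d.

```d
#!/usr/sbin/dtrace -qs

/* With -q placed before -s, dtrace reads this file as the script. */
tick-1sec
{
	printf("%Y\n", walltimestamp);
}
```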

Make sure that you pass the correct number of arguments expected by the script (unless you explicitly set the defaultargs option). For example, the badstart4.d script expects three command-line arguments:

# ./badstart4.d
dtrace: failed to compile script ./badstart4.d: line 5: macro argument $1 is not defined
# dtrace -x defaultargs -s badstart4.d
dtrace: script 'badstart4.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  36401                       :tick-1sec

If an argument is a string, make sure that you either reference the argument in the script with $$3 (if it is the third argument) or type it on the command line as '"string"':

# head badstart5.d
#!/usr/sbin/dtrace -qs

BEGIN
{
a=$1;
b=$2;
}

tick-1sec
/execname == $3/

# ./badstart5.d 1 2 init
dtrace: failed to compile script ./badstart5.d: line 10: failed to resolve init: Unknown variable name
# ./badstart5.d 1 2 '"init"'
^C
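The other remedy is to change the script rather than the command line. The following sketch references the third argument as $$3, so that it is treated as a string; the body of the tick-1sec clause is an assumption, because the original listing is cut off by head.

```d
#!/usr/sbin/dtrace -qs

BEGIN
{
	a = $1;
	b = $2;
}

/* $$3 expands to the third argument as a quoted string. */
tick-1sec
/execname == $$3/
{
	printf("%s\n", execname);
}
```

With this change, the script can be invoked as ./badstart5.d 1 2 init without the extra quoting.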

Avoid misspelled words, which are a common problem in writing D scripts:

# ./test1.d
dtrace: failed to compile script ./test1.d: line 3: probe description syscall::opn:entry does not match any probes

The following script uses an improper probe description:

# cat test2.d
#!/usr/sbin/dtrace -s

syscall
{
trace(timestamp);
}
# ./test2.d
dtrace: failed to compile script ./test2.d: line 3: probe description :::syscall does not match any probes
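As the error message shows, a lone word in a probe description is taken as the probe name, not the provider. One way to correct test2.d, assuming the intent was to trace system call entries, is to write out the remaining fields of the description:

```d
#!/usr/sbin/dtrace -s

/* provider:module:function:name, with module and function left blank */
syscall:::entry
{
	trace(timestamp);
}
```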

When using the printf() and printa() built-in functions, make sure that the arguments match the format specifiers in type and number:

# cat -n test3.d
     1  #!/usr/sbin/dtrace -qs
     2
     3  sched:::on-cpu
     4  /pid != $pid && pid != 0/
     5  {
     6          @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
     7  }
     8
     9  END
    10  {
    11          printf("%-30s %4s %6s\n", "Command", "CPU");
    12          printa("%-30s %4d %@6d\n", @);
    13  }
# ./test3.d
dtrace: failed to compile script ./test3.d: line 11: printf( ) prototype mismatch: conversion #3 (%s) is missing a corresponding value argument
# cat -n test3a.d
     1  #!/usr/sbin/dtrace -qs
     2
     3  sched:::on-cpu
     4  /pid != $pid && pid != 0/
     5  {
     6          @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
     7  }
     8
     9  END
    10  {
    11          printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
    12          printa("%-30s %4s %@6d\n", @);
    13  }

# ./test3a.d
dtrace: failed to compile script ./test3a.d: line 12: printa( ) argument #3 is incompatible with conversion #2 prototype:
        conversion: %s
         prototype: char [] or string (or use stringof)
          argument: processorid_t
# cat test4.d
#!/usr/sbin/dtrace -s

syscall::open:entry
{
    printf("%s was opening: %s\n", execname, arg0);
}
# ./test4.d
dtrace: failed to compile script ./test4.d: line 5: printf( ) argument #3 is incompatible with conversion #2 prototype:
        conversion: %s
         prototype: char [] or string (or use stringof)
          argument: int64_t
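For the sched:::on-cpu script shown in test3.d and test3a.d, a version in which both format strings agree with their arguments might look like the following sketch. It combines the three-argument printf() from test3a.d with the %4d conversion from test3.d, because curcpu->cpu_id is an integer type (processorid_t).

```d
#!/usr/sbin/dtrace -qs

sched:::on-cpu
/pid != $pid && pid != 0/
{
	@[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
}

END
{
	/* Three %s conversions, three string arguments. */
	printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
	/* %s for the command string, %d for the CPU ID, %@ for the count. */
	printa("%-30s %4d %@6d\n", @);
}
```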

Remember that pointer arguments to system calls are user addresses, notkernel addresses. You must use the copyinstr () built-in function toretrieve the strings:

# cat test4a.d
#!/usr/sbin/dtrace -s

syscall::open:entry
{
        printf("%s was opening: %s\n", execname, stringof(arg0));
}
# ./test4a.d
dtrace: script './test4a.d' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid address (0xff3d79d3) in action #2
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid address (0xff3ed570) in action #2
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid address (0xff3ef6d0) in action #2
^C
# cat test4b.d
#!/usr/sbin/dtrace -s

syscall::open:entry
{
        printf("%s was opening: %s\n", execname, copyinstr(arg0));
}
# ./test4b.d
dtrace: script './test4b.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0     37                       open:entry ls was opening: /var/ld/ld.config

  0     37                       open:entry ls was opening: /lib/libc.so.1

  0     37                       open:entry ls was opening: /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3

  0     37                       open:entry cat was opening: /var/ld/ld.config

  0     37                       open:entry cat was opening: /lib/libc.so.1

  0     37                       open:entry cat was opening: /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
^C
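Returning to the printa() mismatch in test3a.d, a version whose conversions agree with the aggregation tuple (a string followed by a processorid_t) might look like the following sketch. The file name test3b.d is hypothetical; the probe, predicate, and aggregation are taken unchanged from the test3.d listings.

```d
#!/usr/sbin/dtrace -qs

/* test3b.d (hypothetical name): test3a.d with matching conversions */
sched:::on-cpu
/pid != $pid && pid != 0/
{
        @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
}

END
{
        /* three %s conversions, three string arguments */
        printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
        /* %s for pr_psargs, %d for cpu_id, %@d for the count() result */
        printa("%-30s %4d %@6d\n", @);
}
```

The header line and the data lines use the same field widths, so the columns should line up.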


Numbering of Action Statements

The run-time error shown for test4a.d references action #2 although there is only one action statement. Action statements are numbered as follows: there is one action for each non-printf() expression, and one for each data argument to printf(). Therefore the stringof(arg0) data argument to printf() is action #2.

Avoid enabling probes that generate too much data, causing drops:

# cat drop.d
#!/usr/sbin/dtrace -s

entry
{
        printf("%s %s %s\n", probeprov, probemod, probefunc);
}
# ./drop.d > /tmp/drop.out
dtrace: script './drop.d' matched 19579 probes
dtrace: 29569 drops on CPU 0
dtrace: 903839 drops on CPU 0
^C
dtrace: 448991 drops on CPU 0
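One common way to reduce the volume of traced data is to summarize in an aggregation instead of printing a record for every probe firing. The following is a minimal sketch of that idea applied to drop.d. The aggregation name @calls and the column widths are illustrative choices, not part of the original script, and enabling this many probes still adds noticeable overhead.

```d
#!/usr/sbin/dtrace -qs

/*
 * Count firings per provider and module rather than tracing
 * one printf() record per firing. @calls is an illustrative name.
 */
entry
{
        @calls[probeprov, probemod] = count();
}

END
{
        printa("%-12s %-24s %@10d\n", @calls);
}
```

Press Control-C to end tracing and print the summary.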

If you cause any run-time exceptions in your D scripts, such as divide-by-zero, DTrace gives you run-time errors, but continues to run:

# cat -n test9.d
     1  #!/usr/sbin/dtrace -s
     2
     3  BEGIN
     4  {
     5          x = 5*1024*1024;
     6  }
     7
     8  tick-3sec
     9  {
    10          x = x/(pagesize-8192);
    11  }

# ./test9.d
dtrace: script './test9.d' matched 2 probes
dtrace: error on enabled probe ID 2 (ID 36402: profile:::tick-3sec): divide-by-zero in action #1 at DIF offset 20
dtrace: error on enabled probe ID 2 (ID 36402: profile:::tick-3sec): divide-by-zero in action #1 at DIF offset 20
^C
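A predicate is one way to keep a clause from running when it would raise such an exception. The sketch below guards the division from test9.d; it reuses the pagesize variable exactly as test9.d does and assumes nothing else about the script.

```d
#!/usr/sbin/dtrace -s

BEGIN
{
        x = 5 * 1024 * 1024;
}

/* The clause body runs only when the divisor is non-zero. */
tick-3sec
/pagesize != 8192/
{
        x = x / (pagesize - 8192);
}
```

On a system where the divisor is zero, the clause simply does not fire, so no run-time error is reported.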


Appendix A

Actions and Subroutines

You have seen function calls used in D program examples. D function calls allow you to invoke two kinds of services provided by DTrace:

● Actions that trace data or modify state external to DTrace

● Subroutines that only affect internal DTrace state

This appendix formally defines the set of actions and subroutines available in DTrace, along with their syntax and semantics. This appendix enables you to:

● Describe the default action

● Describe and use data recording actions

● Describe and use destructive actions

● Describe and use special actions

● Describe and use subroutines


Default Action

A clause need not contain an action; it may instead consist simply of manipulation of variable state, or of any combination of actions and manipulations of variable state. If a clause contains no actions and no D manipulation (that is, if a clause is empty), the default action is taken. The default action is to trace the enabled probe identifier (EPID) to the principal buffer.

The EPID identifies a particular enabling of a particular probe with a particular predicate and actions. From the EPID, DTrace consumers can determine which probe induced the action. Indeed, whenever data is traced, it must be accompanied by the EPID to allow the consumer to make sense of the data; hence the default action is to trace the EPID and nothing else.

Using the default action allows for simple use of the dtrace(1M) command. For example, you can enable all probes in the TS module with the default action by using:

# dtrace -m TS

(The TS module implements the timesharing scheduling class; see dispadmin(1M) for more information.) The above command results in output similar to the following:

# dtrace -m TS
dtrace: description 'TS' matched 93 probes
CPU     ID                    FUNCTION:NAME
  0  14297                 ts_preempt:entry
  0  14298                ts_preempt:return
  0  14301                   ts_sleep:entry
  0  14302                  ts_sleep:return
  0  14301                   ts_sleep:entry
  0  14302                  ts_sleep:return
  0  14301                   ts_sleep:entry
  0  14302                  ts_sleep:return
  0  14329                  ts_update:entry
  0  14331             ts_update_list:entry
  0  14327         ts_change_priority:entry
  0  14328        ts_change_priority:return
  0  14332            ts_update_list:return
  0  14331             ts_update_list:entry
  0  14332            ts_update_list:return
  0  14331             ts_update_list:entry
...


Data Recording Actions

Data recording actions compose the core DTrace actions. Each of these actions records data to the principal buffer by default, but each can also record data to speculative buffers. The descriptions below refer to the buffer where actions are being recorded as the directed buffer.

The void trace( expression ) Action

The most basic action is the trace() action, which takes a D expression as its argument and traces the result to the directed buffer. All of the following are valid trace() actions:

trace(execname);
trace(curlwpsinfo->pr_pri);
trace(timestamp / 1000);
trace(`lbolt);
trace("somehow managed to get here");
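To show trace() in the context of a complete clause, here is a minimal sketch that traces two values each time a process enters write(2). The choice of probe is only an illustration.

```d
#!/usr/sbin/dtrace -s

syscall::write:entry
{
        trace(execname);        /* name of the calling process */
        trace(arg2);            /* third write(2) argument: the byte count */
}
```

Each firing produces one output line containing both traced values after the usual CPU, ID, and FUNCTION:NAME columns.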

The void tracemem( address , size_t nbytes ) Action

A cousin to trace() is the tracemem() action, which takes a D expression as its first argument, address, and a constant as its second argument, nbytes. The tracemem() action copies the memory from the address specified by address into the directed buffer for the length specified by nbytes.

The void printf(string format , ...) Action

Like trace(), the printf() action traces D expressions, but printf() allows for elaborate printf(3C)-style formatting. Like printf(3C), the parameters consist of a format string followed by a variable number of arguments. The following action traces a string and an integer argument with appropriate labels:

printf("execname is %s; priority is %d", execname, curlwpsinfo->pr_pri);


The printf() action tells DTrace to trace the data associated with each argument after the first argument, and then to format the results using the rules described by the first printf() argument, known as a format string. The format string is a regular string that contains any number of format conversions, each beginning with the % character, which describe how to format the corresponding argument. The first conversion in the format string corresponds to the second printf() argument, the second conversion to the third argument, and so on. All of the text between conversions is printed verbatim. The character following the % conversion character describes the format to use for the corresponding argument.

Unlike printf(3C), DTrace printf() is implemented as a built-in function that is recognized by the D compiler. The D compiler provides several useful services for the DTrace printf() action that are not found in the C library printf():

● The D compiler compares the arguments to the conversions in the format string. If an argument's type is incompatible with the format conversion, the D compiler produces an error message explaining the problem.

● The D compiler does not require the use of size prefixes with printf() format conversions. The C printf() routine requires that you indicate the size of arguments by adding prefixes, such as %ld for long or %lld for long long. The D compiler knows the size and type of your arguments, so these prefixes are not required in your D printf() statements.

● DTrace provides additional format characters that are useful for debugging and observability; for example, the %a format conversion can be used to print a pointer as a symbol name and offset.

In order to implement these features, the format string in the DTrace printf() function must be specified as a string constant in your D program; format strings cannot be dynamic variables of type string.
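The following sketch illustrates two of these services together: an integer argument formatted with plain %d (no size prefix) and a code address formatted with %a. The choice of the kmem_alloc entry probe is an assumption made for illustration, and this probe can fire very frequently on a busy system.

```d
#!/usr/sbin/dtrace -qs

fbt::kmem_alloc:entry
{
        /*
         * arg0 is the requested size; no %ld or %lld prefix is needed.
         * %a prints caller as module`symbol+offset when it resolves.
         */
        printf("%d bytes requested by %a\n", arg0, caller);
}
```

Replacing %d with %s in this format string would be rejected at compile time, in the same way as the test4.d example in the debugging module.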

Conversion Specifications

Each conversion specification in the format string is introduced by the % character, after which the following appear in sequence:

● Zero or more flags (in any order), which modify the meaning of the conversion specification as described in the following subsection.

● An optional minimum field width. If the converted value has fewer bytes than the field width, it is padded with spaces on the left by default, or on the right if the left-adjustment flag (-) is specified. The field width can also be specified as an asterisk (*), in which case the field width is set dynamically based on the value of an additional argument of type int.

● An optional precision that provides one of the following:

    ● The minimum number of digits to appear for the d, i, o, u, x, and X conversions (the field is padded with leading zeroes)

    ● The number of digits to appear after the radix character for the e, E, and f conversions

    ● The maximum number of significant digits for the g and G conversions

    ● The maximum number of bytes to be printed from a string by the s conversion

  The precision takes the form of a period (.) followed by either an asterisk (*), as described in the "Width and Precision Specifiers" subsection, or by a decimal digit string.

● An optional sequence of size prefixes that indicate the size of the corresponding argument (described in the "Size Prefixes" subsection). The size prefixes are not necessary in D and are provided solely for compatibility with the C printf() function.

● A conversion specifier (described in the "Conversion Formats" subsection) that indicates the type of conversion to be applied to the argument.

The printf(3C) function also supports conversion specifications of the form %n$ where n is a decimal integer; DTrace printf() does not support this type of conversion specification.

Flag Specifiers

You enable the printf() conversion flags by specifying one or more of the following characters, which can appear in any order:

● (') – The integer portion of the result of a decimal conversion (%i, %d, %u, %f, %g, or %G) is formatted with thousands grouping characters using the non-monetary grouping character. Not all locales, including the POSIX C locale, provide non-monetary grouping characters for use with this flag.

● (-) – The result of the conversion is left-justified within the field. The conversion will be right-justified if this flag is not specified.

● (+) – The result of signed conversion always begins with a sign (+ or -). If this flag is not specified, the conversion begins with a sign only when a negative value is converted.


● (space) – If the first character of a signed conversion is not a sign or if a signed conversion results in no characters, a space is placed before the result. If the space and + flags both appear, the space flag is ignored.

● (#) – The value is converted to an alternate form if one is defined for the selected conversion. The alternate formats for conversions are described below in the text corresponding to each conversion.

● (0) – For d, i, o, u, x, X, e, E, f, g, and G conversions, leading zeroes (following any indication of sign or base) are used to pad to the field width; no space padding is performed. If the 0 and - flags both appear, the 0 flag is ignored. For d, i, o, u, x, and X conversions, if a precision is specified, the 0 flag is ignored. If the 0 and ' flags both appear, the grouping characters are inserted before the zero padding.

Width and Precision Specifiers

You can specify the minimum field width as a decimal digit string following any flag specifier, as described previously, in which case the field width is set to the specified number of columns. You can also specify the field width as an asterisk (*), in which case an additional argument of type int is accessed to determine the field width. For example, to print an integer x in a field width determined by the value of the int variable w, you write this D statement:

printf("%*d", w, x);

Additionally, you can specify the field width using a ? character to indicate that the field width should be set based on the number of characters required to format an address in hexadecimal in the data model of the operating system kernel. The width is set to 8 if the kernel is using the 32-bit data model, or to 16 if the kernel is using the 64-bit data model.

The precision for the conversion can be specified as a decimal digit string following a period (.) or by an asterisk (*) following a period. If an asterisk is used to specify the precision, an additional argument of type int prior to the conversion argument is accessed to determine the precision. If both width and precision are specified as asterisks, the arguments to printf() for the conversion should appear in the order: width, precision, value.
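The width and precision forms described above can be tried out in a BEGIN clause. This is a small sketch; the variable w and the bracket characters around each field are illustrative only.

```d
#!/usr/sbin/dtrace -qs

BEGIN
{
        w = 8;                                  /* illustrative field width */
        printf("[%*d]\n", w, pid);              /* width taken from w */
        printf("[%-16.4s]\n", execname);        /* at most 4 bytes, left-justified in 16 columns */
        printf("[%?p]\n", curthread);           /* width sized for a kernel address */
        exit(0);
}
```

The brackets make the padding visible in the output, which is convenient when checking column alignment.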


Size Prefixes

Size prefixes are required in ANSI-C programs that use printf(3C) in order to indicate the size and type of the conversion argument. The D compiler performs this processing for your printf() calls automatically, so size prefixes are not required.

Although size prefixes are provided for C compatibility, their use is explicitly discouraged in D programs because they also tend to bind your code to a particular data model when using derived types. For example, if a typedef is redefined to different integer base types depending on the data model, it is not possible to use a single C conversion that works in both data models without explicitly knowing the two underlying types and including a cast expression, or defining multiple format strings. The D compiler solves this problem by allowing you to omit size prefixes and automatically determining the argument size.

The size prefixes can be placed just before the format conversion name and after any flags, widths, and precision specifiers. The size prefixes are:

● Optional h – specifies that a following d, i, o, u, x, or X conversion applies to a short or unsigned short

● Optional l – specifies that a following d, i, o, u, x, or X conversion applies to a long or unsigned long

● Optional ll – specifies that a following d, i, o, u, x, or X conversion applies to a long long or unsigned long long

● Optional L – specifies that a following e, E, f, g, or G conversion applies to a long double

● Optional l – specifies that a following c conversion applies to a wint_t argument; an optional l specifies that a following s conversion character applies to a pointer to a wchar_t argument

Conversion Formats

Each conversion character sequence results in fetching zero or more arguments. If you do not provide sufficient arguments for the format string, or if the format string is exhausted and arguments remain, the D compiler issues an error message. If you specify an undefined conversion format, the D compiler issues an error message. The conversion character sequences and their meanings are:


● a – The pointer or uintptr_t argument is printed as a kernel symbol name in the form module`symbol-name plus an optional hexadecimal byte offset. If the value does not fall within the range defined by a known kernel symbol, the value is printed as a hexadecimal integer.

● c – The char, short, or int argument is printed as an ASCII character.

● d – The char, short, int, long, or long long argument is printed as a decimal (base 10) integer. If the argument is signed, it is printed as a signed value. If the argument is unsigned, it is printed as an unsigned value. This conversion has the same meaning as i.

● e, E – The float, double, or long double argument is converted to the style [-]d.ddde±dd, where there is one digit before the radix character (which is non-zero if the argument is non-zero) and the number of digits after it is equal to the precision. If you do not specify the precision, the default precision value is 6. If the precision is 0 and the # flag is not specified, no radix character appears. The E conversion format produces a number with E instead of e introducing the exponent. The exponent always contains at least two digits. The value is rounded up to the appropriate number of digits.

● f – The float, double, or long double argument is converted to the style [-]ddd.ddd, where the number of digits after the radix character is equal to the precision specification. If you do not specify the precision, the default precision value is 6. If the precision is 0 and the # flag is not specified, no radix character appears. If a radix character appears, at least one digit appears before it. The value is rounded up to the appropriate number of digits.

● g, G – The float, double, or long double argument is printed in the style f or e (or in style E in the case of a G conversion character), with the precision specifying the number of significant digits. If an explicit precision is 0, it is taken as 1. The style used depends on the value converted: style e (or E) is used only if the exponent resulting from the conversion is less than -4 or greater than or equal to the precision. Trailing zeroes are removed from the fractional part of the result. A radix character appears only if it is followed by a digit. If the # flag is specified, trailing zeroes are not removed from the result as they normally are.

● i – The char, short, int, long, or long long argument is printed as a decimal (base 10) integer. If the argument is signed, it is printed as a signed value. If the argument is unsigned, it is printed as an unsigned value. This conversion has the same meaning as d.


● o – The char, short, int, long, or long long argument is printed as an unsigned octal (base 8) integer. Arguments that are signed or unsigned can be used with this conversion. If the # flag is specified, the precision of the result is increased if necessary to force the first digit of the result to be a zero.

● p – The pointer or uintptr_t argument is printed as a hexadecimal (base 16) integer. D accepts pointer arguments of any type. If the # flag is specified, a non-zero result has 0x prepended to it.

● s – The argument must be an array of char or a string. Bytes from the array or string are read up to a terminating null character or to the end of the data and are interpreted and printed as ASCII characters. If the precision is not specified, it is taken to be infinite, so all characters up to the first null character are printed. If the precision is specified, only that portion of the character array that displays in the corresponding number of screen columns is printed. If an argument of type char * is to be formatted, it should be cast to string or prefixed with the D stringof operator to indicate that DTrace should trace the bytes of the string and format them.

● u – The char, short, int, long, or long long argument is printed as an unsigned decimal (base 10) integer. Arguments that are signed or unsigned can be used with this conversion, and the result is always formatted as unsigned.

● wc – The int argument is converted to a wide character (wchar_t) and the resulting wide character is printed.

● ws – The argument must be an array of wchar_t. Bytes from the array are read up to a terminating null character or to the end of the data and are interpreted and printed as wide characters. If the precision is not specified, it is taken to be infinite, so all wide characters up to the first null character are printed. If the precision is specified, only that portion of the wide character array that displays in the corresponding number of screen columns is printed.

● x, X – The char, short, int, long, or long long argument is printed as an unsigned hexadecimal (base 16) integer. Arguments that are signed or unsigned can be used with this conversion. If the x form of the conversion is used, the letter digits abcdef are used. If the X form of the conversion is used, the letter digits ABCDEF are used. If the # flag is specified, a non-zero result has 0x (for %x) or 0X (for %X) prepended to it.

● % – Print a literal % character; no argument is converted. The entire conversion specification must be %%.
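Several of the integer, character, and string conversions listed above can be exercised with constant arguments, which makes it easy to compare their output side by side. The values below are arbitrary and chosen only for illustration.

```d
#!/usr/sbin/dtrace -qs

BEGIN
{
        /* the same value in decimal, unsigned, octal, and hexadecimal */
        printf("%d %u %o %x %X\n", 255, 255, 255, 255, 255);

        /* alternate forms selected by the # flag */
        printf("%#o %#x\n", 255, 255);

        /* a character, a string, and a literal percent sign */
        printf("%c %s %%\n", 65, "text");

        exit(0);
}
```

Because the D compiler checks each argument against its conversion, changing one of the integer arguments to a string constant should produce a compile-time error rather than garbled output.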


The printa Action

There are two forms of the printa action:

● void printa( aggregation )

● void printa(string format , aggregation )

The printa() action is used to format the results of aggregations in a D program. If the first form of the action is used, the dtrace(1M) command takes a consistent snapshot of the aggregation data and produces output equivalent to the default output format used for aggregations. If the second form of the action is used, the dtrace(1M) command takes a consistent snapshot of the aggregation data and produces output based on the conversions specified in the format string, according to the rules described in the following subsection.

Rules for Specifying Conversions in the format String

The rules for specifying conversions in the format string are as follows:

● The format conversions must match the tuple signature used to create the aggregation. Each tuple element can only appear once. For example, suppose you aggregate a count using the following D statements:

        @a["hello", 123] = count();
        @a["goodbye", 456] = count();

  If you then add the D statement printa(format-string, @a) to a probe clause, the dtrace utility snapshots the aggregation data and produces output as if you had entered the statements for each tuple defined in the aggregation, such as:

        printf(format-string, "hello", 123);
        printf(format-string, "goodbye", 456);

● Unlike printf(), the format string you use for printa() need not include all elements of the tuple (that is, you can have a tuple of length 3 and only one format conversion). Therefore you can omit any tuple keys from your printa() output by changing your aggregation declaration to move the ones you want to omit to the end of the tuple and then omitting corresponding conversion specifiers for them from the printa() format string.

● The aggregation result itself can be included in the output by using the additional @ format flag character, which is only valid when used with printa(). The @ flag can be combined with any appropriate format conversion specifier, and can appear more than once in a format string. This means that your tuple result can appear anywhere in the output and can appear more than once. The set of conversion specifiers that can be used with each aggregating function is implied by the aggregating function's result type, listed below:

● uint64_t avg()

● uint64_t count()

● int64_t lquantize()

● uint64_t max()

● uint64_t min()

● int64_t quantize()

● uint64_t sum()

For example, to format the results of avg(), you can apply the %d, %i, %o, %u, or %x format conversions. The quantize() and lquantize() functions format their results as an ASCII table rather than as a single value.
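The rules above can be seen in a short script with a two-element tuple. This is a sketch only; the aggregation name @counts and the column widths are illustrative.

```d
#!/usr/sbin/dtrace -qs

syscall:::entry
{
        /* tuple signature: (string, string) */
        @counts[execname, probefunc] = count();
}

END
{
        /* two %s conversions consume the tuple keys; %@d formats the count() result */
        printa("%-20s %-20s %@8d\n", @counts);
}
```

Moving %@8d to the front of the format string would print the count first on each line, because the @ conversion can appear anywhere in the format.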

Example of the printa() Action

The following D program shows a complete example of the printa() action, using the profile provider to sample the value of caller and then formatting the results as a simple table:

profile:::profile-997
{
        @a[caller] = count();
}

END
{
        printa("%@8u %a\n", @a);
}

If you use the dtrace command to execute this program, then wait a few seconds and type Control-C, you see output similar to the following:

# dtrace -s printa.d
^C
CPU     ID                    FUNCTION:NAME
  1      2                             :END        1 0x1
       1 ohci`ohci_handle_root_hub_status_change+0x148
       1 specfs`spec_write+0xe0
       1 0xff14f950
       1 genunix`cyclic_softint+0x588
       1 0xfef2280c
       1 genunix`getf+0xdc
       1 ufs`ufs_icheck+0x50
       1 genunix`infpollinfo+0x80
       1 genunix`kmem_log_enter+0x1e8
...

The stack() Action

There are two forms of the stack() action:

● void stack(int nframes)

● void stack(void)

The stack() action records a kernel stack trace to the directed buffer. The kernel stack is nframes in depth. If you do not provide nframes, the number of stack frames recorded is the number specified by the stackframes option. For example:

# dtrace -n 'uiomove:entry{stack()}'
CPU     ID                    FUNCTION:NAME
  0  12200                    uiomove:entry
              ufs`rdip+0x338
              ufs`ufs_read+0x208
              genunix`vn_rdwr+0x1c0
              elfexec`getelfphdr+0xa4
              elfexec`elf32exec+0x7a0
              genunix`gexec+0x324
              genunix`exec_common+0x278
              genunix`exece+0xc
              unix`syscall_trap32+0xcc

  0  12200                    uiomove:entry
              ufs`ufs_readlink+0x11c
              genunix`pn_getsymlink+0x40
              genunix`lookuppnvp+0x414
              genunix`lookuppnat+0x120
              genunix`resolvepath+0x50
              unix`syscall_trap32+0xcc
...

The stack() action differs from other actions in that it can also be used as a key to an aggregation:


# dtrace -n 'kmem_alloc:entry {@[stack()] = count()}'
dtrace: description 'kmem_alloc:entry ' matched 1 probe
^C

              genunix`installctx+0xc
              genunix`schedctl+0x5c
              unix`syscall_trap+0xac
                1

              genunix`schedctl_shared_alloc+0xc0
              genunix`schedctl+0x18
              unix`syscall_trap+0xac
                1

              unix`lgrp_shm_policy_set+0x168
              genunix`segvn_create+0x82c
              genunix`as_map+0xf0
              genunix`schedctl_map+0x98
              genunix`schedctl_shared_alloc+0x8c
              genunix`schedctl+0x18
              unix`syscall_trap+0xac
                1
...
              sd`xbuf_iostart+0x7c
              ufs`log_roll_write_bufs+0x100
              ufs`log_roll_write+0xe4
              ufs`trans_roll+0x2f8
              unix`thread_start+0x4
               16

The ustack() Action

There are two forms of the ustack() action:

● void ustack(int nframes)

● void ustack(void)


The ustack() action records a user stack trace to the directed buffer. The user stack is nframes in depth. If you do not specify nframes, the number of stack frames recorded is the number specified by the ustackframes option. Although ustack() can determine the address of the calling frames when the probe fires, the stack frames are not translated into symbols until the ustack() action is processed at user level by the DTrace consumer. Note that some functions are static and therefore do not have entries in the symbol table; call sites in these functions are displayed with their hexadecimal address. Also, because ustack() symbol translation does not occur until after the data is recorded, there exists a possibility that the process in question has exited, making stack frame translation impossible. In this case, the dtrace utility emits a warning, followed by the hexadecimal stack frames. For example:

dtrace: failed to grab process 100941: no such process
              c7b834d4
              c7bca95d
              c7bca1a4
              c7bd4374
              c7bc2528
              8047efc

Finally, because the postmortem DTrace debugger commands cannot perform the frame translation, using ustack() with a ring buffer policy always results in raw ustack() data.

Example of the ustack() Action

The following D program shows an example of the ustack() action:

syscall::brk:entry
/execname == $1/
{
        @a[ustack(40)] = count();
}

# dtrace -s brk.d '"vi"'
dtrace: script 'brk.d' matched 1 probe
^C

              libc.so.1`_brk_unlocked+0x4
              libc.so.1`sbrk+0x24
              vi`morelines+0x4
              vi`append+0xc4
              vi`vdoappend+0x2c
              vi`fixzero+0x28
              vi`ovbeg+0x30
              vi`vop+0x158
              vi`commands+0x13d0
              vi`main+0xf24
              vi`_start+0x108
                1
...
              libc.so.1`_brk_unlocked+0x4
              libc.so.1`sbrk+0x24
              vi`morelines+0x4
              vi`append+0xc4
              vi`put+0xe4
              vi`vremote+0x64
              vi`vmain+0x1670
              vi`vop+0x25c
              vi`commands+0x13d0
              vi`main+0xf24
              vi`_start+0x108
               35


Destructive Actions

Some actions are destructive in that they change the state of the system. Although they change the system in a well-defined way, they change it nonetheless. You cannot use destructive actions unless you have explicitly enabled them. In the dtrace(1M) command, you enable destructive actions with the -w option. If you attempt to use destructive actions in the dtrace(1M) command without explicitly enabling them, dtrace fails, returning an error message similar to:

dtrace: could not enable tracing: Destructive actions not allowed

Process Destructive Actions

Some destructive actions are destructive only to a process; the system itself remains intact. These actions are available to those with the dtrace_proc or dtrace_user privileges.

The void stop(void) Action

The stop() action forces the process that hit the enabled probe to stop when it next leaves the kernel, as if stopped by a proc(4) action. You can use the prun(1) utility to resume a process that has been stopped by the stop() action. You can use the stop() action to stop a process at any DTrace probe point; this allows you to capture a program in a very particular state (which is difficult to achieve with a simple breakpoint). You can then attach a traditional debugger (such as mdb(1)) to examine the program's state, or use the gcore(1) utility to capture that state in a core file for later analysis.
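As a sketch of how stop() might be used, the following script stops a process when it enters unlink(2). The process name "myapp" is hypothetical, and the choice of probe is only an example.

```d
#!/usr/sbin/dtrace -s

#pragma D option destructive

/* "myapp" is a hypothetical process name used only for illustration. */
syscall::unlink:entry
/execname == "myapp"/
{
        printf("stopping pid %d at unlink(2)\n", pid);
        stop();
}
```

After inspecting the stopped process with mdb(1) or capturing it with gcore(1), you can resume it by passing the reported process ID to prun(1).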

The void raise(int signal) Action

The raise() action sends the specified signal to the currently running process. This is similar to using the kill(1) command to send a process a signal; however, you can use the raise() action to send a signal at a precise point in a process's execution.

The void copyout(void *buf, uintptr_t addr, size_t nbytes) Action

The copyout() action copies nbytes from the buffer specified by buf to the address specified by addr in the address space of the process associated with the current thread. If the user-space address does not correspond to a valid, faulted-in page in the current address space, an error is generated.

The void copyoutstr(string str, uintptr_t addr, size_t maxlen) Action

The copyoutstr() action copies the string specified by str to the address specified by addr in the address space of the process associated with the current thread. If the user-space address does not correspond to a valid, faulted-in page in the current address space, an error is generated. The string length is limited to the value set by the strsize option.

The void system(string program, ...) Action

The system() action causes the program to be executed as if it were given to a shell as input. The program string can contain any of the printf() format conversions with corresponding arguments that follow.

Example of the system() Action

#pragma D option destructive
#pragma D option quiet

proc:::signal-send
/args[2] == SIGINT/
{
	printf("SIGINT sent to %s by ", args[1]->pr_fname);
	system("getent passwd %d | cut -d: -f5", uid);
}

# ./whosend.d
SIGINT sent to run-mozilla.sh by Mary Smith
^C

Kernel Destructive Actions

Some destructive actions are destructive to the entire system. These must be used with extreme care, as they can affect any process on the system (and any other systems dependent upon your network services).

The void breakpoint(void) Action

The breakpoint() action induces a kernel breakpoint, causing the system to stop and control to transfer to the kernel debugger. The kernel debugger then emits a string denoting the DTrace probe that triggered the action. For example, suppose you performed the following action:

# dtrace -w -n 'clock:entry {breakpoint()}'
dtrace: description 'clock:entry' matched 1 probe
dtrace: allowing destructive actions

On the Solaris™ Operating System running on SPARC®, you might see the following on the console:

dtrace: breakpoint action at probe fbt:genunix:clock:entry (ecb 30002765700)
Type 'go' to resume
ok

On Solaris running on x86, you might see the following on the console:

dtrace: breakpoint action at probe fbt:genunix:clock:entry (ecb d2b97060)
stopped at int20+0xb: ret
kadb[0]:

The address following the probe description is the address of the enabling control block (ECB) within DTrace. You can use it to learn more details about the probe enabling that induced the breakpoint action.

Note that a mistake with the breakpoint() action can cause it to be called far more often than intended. This can in turn prevent you from even terminating the DTrace consumer that is inducing the breakpoint actions. If you find yourself in this situation, set the kernel integer variable dtrace_destructive_disallow to 1. This disallows all destructive actions on the machine. This setting should be used only if you find yourself in this particular situation.

The exact method for setting dtrace_destructive_disallow depends on the kernel debugger that you are using. If you are using OpenBoot™ PROM on SPARC, follow these steps:

1. Use w! as follows:

ok 1 dtrace_destructive_disallow w!
ok

2. Confirm that this has been set using w?:

ok dtrace_destructive_disallow w?
1
ok

3. Continue by using go:

ok go

If you are using the kadb(1M) debugger on x86, follow these steps:

1. Use the 4-byte write modifier (W) with the / formatting dcmd:

kadb[0]: dtrace_destructive_disallow/W 1
dtrace_destructive_disallow: 0x0 = 0x1
kadb[0]:

2. Continue by entering :c :

kadb[0]: :c

If you wish to re-enable destructive actions after continuing, you must explicitly reset dtrace_destructive_disallow back to 0. You do this using the mdb(1) debugger:

# echo "dtrace_destructive_disallow/W 0" | mdb -kw
dtrace_destructive_disallow: 0x1 = 0x0
#

The void panic(void) Action

The panic() action induces a kernel panic when triggered. Use this action to force a system crash dump at a time of interest. The panic() action can be used together with ring buffering and postmortem analysis to understand a problem. When you use the panic() action, you see a panic message that denotes the probe inducing the panic. For example:

panic[cpu0]/thread=30001830b80: dtrace: panic action at probe syscall::mmap:entry (ecb 300000acfc8)

000002a10050b840 dtrace:dtrace_probe+518 (fffe, 0, 1830f88, 1830f88, 30002fb8040, 300000acfc8)
  %l0-3: 0000000000000000 00000300030e4d80 0000030003418000 00000300018c0800
  %l4-7: 000002a10050b980 0000000000000500 0000000000000000 0000000000000502
000002a10050ba30 genunix:dtrace_systrace_syscall32+44 (0, 2000, 5, 80000002, 3, 1898400)
  %l0-3: 00000300030de730 0000000002200008 00000000000000e0 000000000184d928
  %l4-7: 00000300030de000 0000000000000730 0000000000000073 0000000000000010

syncing file systems... 2 done
dumping to /dev/dsk/c0t0d0s1, offset 214827008, content: kernel
100% done: 11837 pages dumped, compression ratio 4.66, dump succeeded
rebooting...

In addition, syslogd(1M) emits a message upon reboot:

Jun 10 16:56:31 machine1 savecore: [ID 570001 auth.error] reboot after panic: dtrace: panic action at probe syscall::mmap:entry (ecb 300000acfc8)

The message buffer of the crash dump will also contain the probe and ECB responsible for the panic() action.

The void chill(int nanoseconds) Action

The chill() action causes DTrace to spin for the specified number of nanoseconds. This action is primarily useful for exploring problems that might be timing related. For example, you can use it to open race condition windows, or to bring periodic events into or out of phase with one another.

Because interrupts are disabled while in DTrace probe context, any use of the chill() action induces interrupt latency, scheduling latency, dispatch latency, and so on. The chill() action can, therefore, cause strange systemic effects, and should not be used indiscriminately. Moreover, because the liveness of the system relies on being able to periodically handle interrupts, DTrace refuses to implement the chill() action for longer than 500 milliseconds within any given one-second interval, and instead reports an illegal operation error:

# dtrace -w -n 'syscall::open:entry {chill(500000001)}'
dtrace: description 'syscall::open:entry ' matched 1 probe
dtrace: allowing destructive actions
dtrace: error on enabled probe ID 2 (ID 18022: syscall::open:entry): illegal operation in action #1

The cap is enforced even if the time is spread across multiple calls to chill(), or if the time is spread across multiple DTrace consumers for a single probe.
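As a contrast to the failing command above, the sketch below stays well inside the 500-millisecond cap by delaying each open(2) call of a single process by one millisecond. The choice of probe, the delay, and the use of the first macro argument ($1) as the target process ID are assumptions made for illustration.

```d
#!/usr/sbin/dtrace -s

#pragma D option destructive

/* $1 is assumed to be the process ID whose open(2) calls are delayed. */
syscall::open:entry
/pid == $1/
{
	chill(1000000);		/* spin for 1,000,000 ns (1 ms) */
}
```

If the target process opens files often enough that the accumulated delay exceeds the cap within a one-second interval, DTrace is expected to report the same illegal operation error shown above.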

Special Actions

Some actions do not fall into either the data recording action or the destructive action category. These other special actions fall into one of two sets. The first set contains those actions associated with speculative tracing. The second set contains the exit() action.

Actions Associated With Speculative Tracing

Three actions are associated with speculative tracing:

● speculate(int id)

The speculate() action denotes that the remainder of the probe clause should be traced to the speculative buffer specified by id.

● commit(int id)

The commit() action commits the speculative buffer associated with id.

● discard(int id)

The discard() action discards the speculative buffer associated with id.
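The three actions above are typically used together with the speculation() subroutine, which supplies the buffer identifier. The sketch below is one possible arrangement, assuming that the goal is to report only open(2) calls that fail; the probe choice and the thread-local variable name self->spec are illustrative.

```d
#!/usr/sbin/dtrace -s

syscall::open:entry
{
	/* Reserve a speculative buffer and trace the path name into it. */
	self->spec = speculation();
	speculate(self->spec);
	printf("%s tried to open %s", execname, copyinstr(arg0));
}

syscall::open:return
/self->spec && errno != 0/
{
	/* The call failed: make the speculative data visible. */
	commit(self->spec);
	self->spec = 0;
}

syscall::open:return
/self->spec && errno == 0/
{
	/* The call succeeded: throw the speculative data away. */
	discard(self->spec);
	self->spec = 0;
}
```

Keeping commit() and discard() in their own clauses, separate from any data recording actions, keeps the sketch simple and avoids mixing speculative and non-speculative tracing in one clause.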

The void exit(int status) Action

You use the exit() action to immediately stop tracing, and to inform the DTrace consumer that it should cease tracing, perform any final processing, and call exit(3C) with the status specified. Because exit() does return a status to user-level, it is a data-storing action. Unlike other data-storing actions, however, it cannot be speculatively traced. The exit() action causes the DTrace consumer to exit regardless of buffer policy. Note that the data-storing nature of the exit() action means that it can be dropped.

When the exit() action is called, only DTrace actions already underway on other CPUs are taken; no subsequent actions are taken on any CPU. The only exception to this is the END probe, which is called after the DTrace consumer has processed the exit() action and has indicated that tracing should stop.
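The relationship between exit() and the END probe can be sketched as follows. The five-second interval, the aggregation name @calls, and the output format are assumptions chosen for illustration.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall:::entry
{
	@calls[probefunc] = count();	/* count system calls by name */
}

tick-5sec
{
	exit(0);	/* stop tracing; the consumer exits with status 0 */
}

END
{
	/* Runs after the consumer has processed the exit() action. */
	printa("%-20s %@d\n", @calls);
}
```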

Subroutines

Subroutines differ from actions in that they generally only affect internal DTrace state. There is therefore no such thing as a destructive subroutine, and subroutines never trace data into buffers. Many subroutines have analogs in Section 9F or Section 3C of the manual pages; see Intro(9F) and Intro(3), respectively.

The void *alloca(size_t size) Subroutine

The alloca() subroutine allocates size bytes out of scratch space, and returns a pointer to the allocated memory. The returned pointer is guaranteed to have 8-byte alignment. Scratch space is only valid for the duration of a clause; memory allocated with alloca() is deallocated when the clause completes. If insufficient scratch space is available, no memory is allocated and an error is generated.

The string basename(char *str) Subroutine

The basename() subroutine is a D analogue for basename(1); it creates a string that consists of a copy of the specified string, but without any prefix that ends in /. The returned string is allocated out of scratch memory, and is therefore valid only for the duration of the clause. If insufficient scratch space is available, basename() aborts and an error is generated.

The void bcopy(void *src, void *dest, size_t size) Subroutine

The bcopy() subroutine copies the bytes specified by the size variable from the memory pointed to by the src variable to the memory pointed to by the dest variable. All of the source memory must lie outside of scratch memory and all of the destination memory must lie within it; if this is not the case, no copying takes place and an error is generated.

The string cleanpath(char *str) Subroutine

The cleanpath() subroutine creates a string that consists of a copy of the path indicated by the str variable, but with certain redundant elements eliminated. In particular, "/./" elements in the path are removed, and "/../" elements are collapsed.

Note that the collapsing of /../ elements is naïve in that the parent component is collapsed without regard to symbolic links. As a result, the cleanpath() subroutine might take a valid path and return a shorter, invalid one. For example, if the path specified by str were "/foo/../bar", and /foo were a symbolic link to /net/foo/export, then cleanpath() would return the string "/bar" even though bar might only be in /net/foo, not in /. This limitation exists because cleanpath() is called in the context of a firing probe, where full resolution of symbolic links or arbitrary names is not possible. The returned string is allocated out of scratch memory, and is therefore valid only for the duration of the clause. If insufficient scratch space is available, cleanpath() aborts and an error is generated.
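A short sketch of the two path subroutines described above follows. The first path is the one discussed in the text; the second path, /usr/lib/libc.so.1, is an arbitrary illustration.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	/* The "/../" element is collapsed without consulting the file system. */
	printf("%s\n", cleanpath("/foo/../bar"));

	/* Everything up to and including the last "/" is dropped. */
	printf("%s\n", basename("/usr/lib/libc.so.1"));

	exit(0);
}
```

Both results live in scratch memory, so they are printed inside the same clause that creates them.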

The void *copyin(uintptr_t addr, size_t size) Subroutine

The copyin() subroutine copies the specified size in bytes from the specified user address into a DTrace scratch buffer, and returns the address of this buffer. The user address is interpreted as an address in the space of the process associated with the current thread. The resulting buffer pointer is guaranteed to have 8-byte alignment. The address in question must correspond to a faulted-in page in the current process. If the address does not correspond to a faulted-in page, or if insufficient scratch space is available, NULL is returned, and an error is generated.

The string copyinstr(uintptr_t addr) Subroutine

The copyinstr() subroutine copies a null-terminated C string from the specified user address into a DTrace scratch buffer, and returns the address of this buffer. The user address is interpreted as an address in the space of the process associated with the current thread. The string length is limited to the value set by the strsize option. As with the copyin() subroutine, the specified address must correspond to a faulted-in page in the current process. If the address does not correspond to a faulted-in page, or if insufficient scratch space is available, NULL is returned, and an error is generated.

The void copyinto(uintptr_t addr, size_t size, void *dest) Subroutine

The copyinto() subroutine copies the specified size in bytes from the specified user address into the DTrace scratch buffer specified by the dest variable. The user address is interpreted as an address in the space of the process associated with the current thread. The address in question must correspond to a faulted-in page in the current process. If the address does not correspond to a faulted-in page, or if any of the destination memory lies outside scratch space, no copying takes place, and an error is generated.
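The sketch below shows one way copyinstr() and copyin() might be used on system call arguments, which are user addresses. The probes, the 16-byte dump size, and the use of the first macro argument ($1) as the target process ID are assumptions made for illustration.

```d
#!/usr/sbin/dtrace -s

/* $1 is assumed to be the process ID to observe. */
syscall::open:entry
/pid == $1/
{
	/* arg0 is the user address of the path name string. */
	printf("%s opened %s", execname, copyinstr(arg0));
}

syscall::write:entry
/pid == $1 && arg2 >= 16/
{
	/* arg1 is the user buffer; copy its first 16 bytes into scratch. */
	this->buf = copyin(arg1, 16);
	tracemem(this->buf, 16);
}
```

The predicate on arg2 keeps the copy from reading past the end of a short write buffer.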

The string dirname(char *str) Subroutine

The dirname() subroutine is a D analogue for dirname(1); it creates a string that consists of all but the last level of the path name specified by str. The returned string is allocated out of scratch memory, and is therefore valid only for the duration of the clause. If insufficient scratch space is available, dirname() aborts and an error is generated.

The size_t msgdsize(mblk_t *mp) Subroutine

The msgdsize() subroutine returns the number of bytes in the data message pointed to by the mp variable. See msgdsize(9F) for details. Note that msgdsize() only includes data blocks of type M_DATA in the count.

The size_t msgsize(mblk_t *mp) Subroutine

The msgsize() subroutine returns the number of bytes in the message pointed to by the mp variable. Unlike the msgdsize() subroutine, which returns only the number of data bytes, msgsize() returns the total number of bytes in the message.

The int mutex_owned(kmutex_t *mutex) Subroutine

The mutex_owned() subroutine is an implementation of the mutex_owned(9F) function. The mutex_owned() subroutine returns non-zero if the calling thread currently holds the specified kernel mutex, or zero if the specified adaptive mutex is currently unowned.

The kthread_t *mutex_owner(kmutex_t *mutex) Subroutine

The mutex_owner() subroutine returns the thread pointer of the current owner of the specified adaptive kernel mutex. The mutex_owner() subroutine returns NULL if the specified adaptive mutex is currently unowned, or if the specified mutex is a spin mutex. See mutex_owned(9F).

The int mutex_type_adaptive(kmutex_t *mutex) Subroutine

The mutex_type_adaptive() subroutine returns non-zero if the specified kernel mutex is of type MUTEX_ADAPTIVE, or zero if it is not. Mutexes are adaptive if they are:

● Declared statically,

● Created with an interrupt block cookie of NULL, or

● Created with an interrupt block cookie that does not correspond to a high-level interrupt.

See mutex_init(9F) for more details on mutexes. The great majority of mutexes in the Solaris kernel are adaptive.

The int progenyof(pid_t pid) Subroutine

The progenyof() subroutine returns non-zero if the calling process (the process associated with the thread that is currently triggering the matched probe) is among the progeny of the specified process ID.
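A common use of progenyof() is to restrict tracing to a process tree. The sketch below assumes that the process ID at the root of the tree (for example, a login shell) is passed as the first macro argument ($1); the aggregation name @calls is illustrative.

```d
#!/usr/sbin/dtrace -s

/* $1 is assumed to be the process ID at the root of the tree of interest. */
syscall:::entry
/progenyof($1)/
{
	@calls[execname, probefunc] = count();
}
```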

The int rand(void) Subroutine

The rand() subroutine returns a pseudo-random integer. The number returned is a weak pseudo-random number, and should not be used for any cryptographic application.

The int rw_iswriter(krwlock_t *rwlock) Subroutine

The rw_iswriter() subroutine returns non-zero if the specified reader-writer lock is either held or desired by a writer. If the lock is neither held nor desired by any writers (that is, it is held only by readers and no writer is blocked, or it is not held at all), rw_iswriter() returns zero. Refer to rw_init(9F).

The int rw_write_held(krwlock_t *rwlock) Subroutine

The rw_write_held() subroutine returns non-zero if the specified reader-writer lock is currently held by a writer. If the lock is held only by readers or not held at all, rw_write_held() returns zero. See rw_init(9F).

The int speculation(void) Subroutine

The speculation() subroutine reserves a speculative trace buffer for use with the speculate() action, and returns an identifier for this buffer.

The string strjoin(char *str1, char *str2) Subroutine

The strjoin() subroutine creates a string that consists of the str1 variable concatenated with the str2 variable. The returned string is allocated out of scratch memory, and is therefore valid only for the duration of the clause. If insufficient scratch space is available, strjoin() aborts and an error is generated.

The size_t strlen(string str) Subroutine

The strlen() subroutine returns the length of the specified string in bytes, excluding the terminating null byte.
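The two string subroutines above can be combined as in the following sketch. The literal strings and the clause-local variable name this->s are illustrative choices.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	/* The joined string lives in scratch memory for this clause only. */
	this->s = strjoin("/usr", "/bin");

	/* strlen() counts bytes and excludes the terminating null byte. */
	printf("%s is %d bytes long\n", this->s, strlen(this->s));

	exit(0);
}
```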

Appendix B

D Built-in and Macro Variables

This appendix describes and lists:

● Built-in variables provided by the D language

● Macro variables provided by the D language

Built-in Variables

You have seen a number of special built-in D variables in the example programs, including timestamp, pid, and others. All of these variables are scalar global variables; currently D does not define thread-local variables, clause-local variables, or built-in associative arrays. Table B-1 shows the complete list of D built-in variables.

Table B-1 DTrace Built-in Variables

Type and Name               Description

int64_t arg0, ..., arg9     The first ten input arguments to a probe represented as raw 64-bit integers. If fewer than ten arguments are passed to the current probe, the remaining variables return zero.

args[]                      The typed arguments to the current probe, if any. The args[] array is accessed using an integer index, but each element is defined to be the type corresponding to the given probe argument. For example, if args[] is referenced by a read(2) system call probe, args[0] is of type int, args[1] is of type void *, and args[2] is of type size_t.

uintptr_t caller            The program counter location of the current thread just before entering the current probe.

lwpsinfo_t *curlwpsinfo     The lightweight process (LWP) state of the LWP associated with the current thread. This structure is described in further detail in proc(4).

psinfo_t *curpsinfo         The process state of the process associated with the current thread. This structure is described in further detail in proc(4).

kthread_t *curthread        The address of the operating system kernel's internal data structure for the current thread, the kthread_t structure. The kthread_t is defined in <sys/thread.h>.

string cwd                  The name of the current working directory of the process associated with the current thread.

epid                        The enabled probe ID (EPID) for the current probe. This integer uniquely identifies a particular probe that is enabled with a specific predicate and set of actions.

int errno                   The error value returned by the last system call executed by this thread.

string execname             The name that was passed to exec(2) to execute the current process.

uint_t id                   The probe ID for the current probe. This is the system-wide unique identifier for the probe as published by DTrace and listed in the output of dtrace -l.

uint_t ipl                  The interrupt priority level (IPL) on the current CPU at probe firing time.

pid_t pid                   The process ID of the current process.

string probefunc            The function name portion of the current probe's description.

string probemod             The module name portion of the current probe's description.

string probename            The name portion of the current probe's description.

string probeprov            The provider name portion of the current probe's description.

string root                 The name of the root directory of the process associated with the current thread.

uint_t stackdepth           The current thread's stack frame depth at probe firing time.

id_t tid                    The thread ID of the current thread. For threads associated with user processes, this value is equal to the result of a call to pthread_self(3C).

uint64_t timestamp          The current value of a nanosecond timestamp counter. This counter increments from an arbitrary point in the past and should only be used for relative computations.

uint64_t uregs[]            The current thread's saved user-mode register values at probe firing time.

uint64_t vtimestamp         The current value of a nanosecond timestamp counter that is virtualized to the amount of time that the current thread has been running on a CPU, minus the time spent in DTrace predicates and actions. This counter increments from an arbitrary point in the past and should only be used for relative time computations.
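Because timestamp counts from an arbitrary point in the past, it is useful only as a difference between two readings. The sketch below illustrates this with the read(2) system call, using the built-in variables timestamp and execname; the probe choice, the thread-local variable self->ts, and the aggregation name @latency are assumptions made for illustration.

```d
#!/usr/sbin/dtrace -s

syscall::read:entry
{
	self->ts = timestamp;		/* remember when this thread entered read(2) */
}

syscall::read:return
/self->ts/
{
	/* Elapsed nanoseconds, grouped by the name of the executing program. */
	@latency[execname] = quantize(timestamp - self->ts);
	self->ts = 0;
}
```

Substituting vtimestamp for timestamp in both clauses would measure on-CPU time rather than elapsed time.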

Macro Variables

The D compiler defines a set of built-in macro variables that you can use when writing D programs or interpreter files. Macro variables are identifiers that are prefixed with a dollar sign ($) and are expanded once by the D compiler when processing your input file. Table B-2 shows the complete list of D macro variables.

Table B-2 D Macro Variables

Name       Description         Reference

$[0-9]+    Macro arguments     See Module 2, "Built-in Macro Variables"
$egid      Effective group ID  getegid(2)
$euid      Effective user ID   geteuid(2)
$gid       Real group ID       getgid(2)
$pid       Process ID          getpid(2)
$pgid      Process group ID    getpgid(2)
$ppid      Parent process ID   getppid(2)
$projid    Project ID          getprojid(2)
$sid       Session ID          getsid(2)
$taskid    Task ID             gettaskid(2)
$uid       Real user ID        getuid(2)
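Macro variables are expanded when the script is compiled, so they describe the dtrace process itself rather than the process that fires a probe. The sketch below uses that distinction to count system calls made by the invoking user's processes while excluding dtrace itself; the aggregation key is an illustrative choice.

```d
#!/usr/sbin/dtrace -s

/*
 * pid and uid are built-in variables for the process that fired the probe;
 * $pid and $uid are macro variables for the dtrace process itself.
 */
syscall:::entry
/pid != $pid && uid == $uid/
{
	@[execname] = count();
}
```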

Appendix C

D Operators

This appendix defines and describes the following D operators:

● Arithmetic operators

● Relational operators

● Logical operators

● Bitwise operators

● Assignment operators

● Increment and decrement operators

This appendix also describes conditional expressions.

Arithmetic Operators

D provides the standard arithmetic operators for use in your programs. These operators all have the same meaning as they do in ANSI-C for integer operands. Table C-1 shows the D binary arithmetic operators.

Arithmetic in D can only be performed on integer operands or on pointers. Arithmetic cannot be performed on floating-point operands in D programs. The DTrace execution environment does not take any action on integer overflow or underflow; you must check for these conditions yourself in situations where they are applicable.

The DTrace execution environment does automatically check for and report division by zero errors resulting from improper use of the / and % operators. If a D program executes an invalid division operation, DTrace automatically disables the affected instrumentation and reports the error to you. Errors detected by DTrace have no effect on other DTrace users or on the operating system kernel, so you do not need to worry about causing any damage if your D program inadvertently contains one of these errors.

In addition to these binary operators, the + and - operators can also be used as unary operators; these have higher precedence than any of the binary arithmetic operators. The order of precedence and associativity properties for all the D operators is summarized at the end of this appendix. You can control precedence by grouping expressions in parentheses ( ).

Table C-1 D Binary Arithmetic Operators

Operator Meaning

+ Integer addition

- Integer subtraction

* Integer multiplication

/ Integer division

% Integer modulus
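The operators in Table C-1 behave as they do for C integers, as the following sketch illustrates. The variable names n and d and the values assigned to them are arbitrary.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	n = 7;
	d = 2;

	/* Integer division truncates: 7 / 2 is 3, and 7 % 2 is 1. */
	printf("%d %d\n", n / d, n % d);

	/* Unary minus binds tighter than the binary operators: (-7) + (2 * 3) is -1. */
	printf("%d\n", -n + d * 3);

	exit(0);
}
```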

Relational Operators

D provides binary relational operators for use in your programs. These operators all have the same meaning as they do in ANSI-C. Table C-2 shows the D relational operators.

Relational operators are most frequently used to write D predicates. Each operator evaluates to a value of type int, which is equal to 1 if the condition is true, and 0 if it is false.

Relational operators can be applied to pairs of integers, pointers, or strings. If pointers are compared, the result is equivalent to an integer comparison of the two pointers interpreted as unsigned integers. If strings are compared, the result is determined as if by performing a strcmp(3C) on the two operands. Here are some example D string comparisons and their results:

"coffee" < "espresso" ... returns 1 (true)

"coffee" == "coffee" ... returns 1 (true)

"coffee" >= "mocha" ... returns 0 (false)

Relational operators can also be used to compare a data object associated with an enumeration type with any of the enumerator tags defined by the enumeration. Enumerations are a facility for creating named integer constants.

Table C-2 D Relational Operators

Operator Meaning

< Left-hand operand is less than right-hand operand

<= Left-hand operand is less than or equal to right-hand operand

> Left-hand operand is greater than right-hand operand

>= Left-hand operand is greater than or equal to right-hand operand

== Left-hand operand is equal to right-hand operand

!= Left-hand operand is not equal to right-hand operand

Logical Operators

D provides binary logical operators for use in your programs. Table C-3 shows the D logical operators. The first two are equivalent to the corresponding ANSI-C operators.

Logical operators are most frequently used in writing D predicates. The logical AND operator performs short-circuit evaluation: if the left-hand operand is false, the right-hand expression is not evaluated. The logical OR operator also performs short-circuit evaluation: if the left-hand operand is true, the right-hand expression is not evaluated. The logical XOR operator does not short-circuit: both expression operands are always evaluated.

In addition to the binary logical operators, the unary ! operator can be used to perform a logical negation of a single operand: it converts a zero operand into a 1 and a non-zero operand into a 0. By convention, D programmers use ! when working with integers that are meant to represent Boolean values and == 0 when working with non-Boolean integers, although both expressions are equivalent in meaning.

The logical operators can be applied to operands of integer type or pointer type. The logical operators interpret pointer operands as unsigned integer values. As with all logical and relational operators in D, operands are true if they have a non-zero integer value and false if they have a zero integer value.

Table C-3 D Logical Operators

Operator Meaning

&& Logical AND: true if both operands are true

|| Logical OR: true if one or both operands are true

^^ Logical XOR: true if exactly one operand is true
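The following sketch evaluates each operator in Table C-3, together with the unary ! operator, on one true and one false operand. The variable names a and b are arbitrary.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	a = 1;	/* true */
	b = 0;	/* false */

	/* AND is 0, OR is 1, XOR is 1, and !a is 0 for these operands. */
	printf("%d %d %d %d\n", a && b, a || b, a ^^ b, !a);

	exit(0);
}
```

In practice these operators appear most often in predicates, for example to combine a test on pid with a test on a probe argument.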

Bitwise Operators

D provides binary operators for manipulating individual bits inside ofinteger operands. These operators all have the same meaning as they doin ANSI-C. Table C-4 shows the D bitwise operators.

You use the binary & operator to clear bits from an integer operand. Youuse the binary | operator to set bits in an integer operand. The binary ^operator returns 1 in each bit position where exactly one of thecorresponding operand bits is set.

You use the shift operators to move bits left or right in a given integeroperand. Shifting left fills empty bit positions on the right-hand side ofthe result with zeroes. Shifting right using an unsigned integer operandfills empty bit positions on the left-hand side of the result with zeroes.Shifting right using a signed integer operand (an action known as anarithmetic shift operation) fills empty bit positions on the left-hand sidewith the value of the sign bit.

Shifting an integer value by a negative number of bits or by a number of bits larger than the number of bits in the left-hand operand itself produces an undefined result. The D compiler produces an error message if it detects this condition when you compile your D program.
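The difference between the two kinds of right shift can be sketched in C, on the stated basis that the D shift operators have the same meaning as in ANSI C. The function names below are hypothetical. ANSI C itself leaves the right shift of a negative signed value implementation-defined; the sign-filling behavior shown is what common compilers do and is assumed here. The validity check is deliberately conservative and also rejects a count equal to the operand width, which C leaves undefined.

```c
#include <stdint.h>

/* Illustration only: hypothetical helpers, not part of DTrace. */

/* Unsigned operand: vacated high-order bits are filled with zeroes. */
uint32_t shift_right_unsigned(uint32_t v, unsigned n)
{
    return v >> n;
}

/* Signed operand: vacated high-order bits are assumed to be filled
 * with the sign bit (an arithmetic shift). ANSI C leaves this
 * implementation-defined for negative values. */
int32_t shift_right_signed(int32_t v, unsigned n)
{
    return v >> n;
}

/* Shifting left always fills the vacated low-order bits with zeroes. */
uint32_t shift_left(uint32_t v, unsigned n)
{
    return v << n;
}

/* Returns 1 if n is a safe shift count for an operand that is
 * width_bits wide: not negative and smaller than the width. */
int shift_count_is_valid(int n, unsigned width_bits)
{
    return n >= 0 && (unsigned)n < width_bits;
}
```

Checking the count before shifting avoids relying on an undefined result when the count comes from data rather than from a constant.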

In addition to the binary bitwise operators, you can use the unary ~ operator to perform a bitwise negation of a single operand: it converts each 0 bit in the operand into a 1 bit, and each 1 bit in the operand into a 0 bit.
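A common use of these operators is maintaining a set of flag bits. The following C sketch (C is used because the text states that the operators behave as in ANSI C) shows one way to set, clear, toggle, and invert bits. The helper names and the idea of a separate mask argument are assumptions made for illustration.

```c
#include <stdint.h>

/* Illustration only: hypothetical helpers, not part of DTrace. */

/* Set every bit of v that is set in mask. */
uint32_t set_bits(uint32_t v, uint32_t mask)
{
    return v | mask;
}

/* Clear every bit of v that is set in mask: AND with the inverted mask. */
uint32_t clear_bits(uint32_t v, uint32_t mask)
{
    return v & ~mask;
}

/* Flip every bit of v that is set in mask. */
uint32_t toggle_bits(uint32_t v, uint32_t mask)
{
    return v ^ mask;
}

/* Bitwise negation: every 0 bit becomes 1 and every 1 bit becomes 0. */
uint32_t invert_bits(uint32_t v)
{
    return ~v;
}
```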

Table C-4 D Bitwise Operators

Operator    Meaning
&           Bitwise AND
|           Bitwise OR
^           Bitwise XOR
<<          Shift the left-hand operand left by the number of bits specified by the right-hand operand
>>          Shift the left-hand operand right by the number of bits specified by the right-hand operand

Assignment Operators

D provides the following binary assignment operators for modifying D variables. Remember that you can only modify D variables and arrays: kernel data objects and constants cannot be modified using the D assignment operators. The assignment operators have the same meaning as they do in ANSI-C. Table C-5 shows the D assignment operators.

Table C-5 D Assignment Operators

Operator    Meaning
=           Set the left-hand operand equal to the right-hand expression value
+=          Increment the left-hand operand by the right-hand expression value
-=          Decrement the left-hand operand by the right-hand expression value
*=          Multiply the left-hand operand by the right-hand expression value
/=          Divide the left-hand operand by the right-hand expression value
%=          Modulo the left-hand operand by the right-hand expression value
|=          Bitwise OR the left-hand operand with the right-hand expression value
&=          Bitwise AND the left-hand operand with the right-hand expression value
^=          Bitwise XOR the left-hand operand with the right-hand expression value
<<=         Shift the left-hand operand left by the number of bits specified by the right-hand expression value
>>=         Shift the left-hand operand right by the number of bits specified by the right-hand expression value

With the exception of the assignment operator =, the assignment operators are provided as shorthand for using the = operator with one of the other operators described previously. For example, the expression x = x + 1 is equivalent to the expression x += 1, except that the expression x is evaluated once. These assignment operators obey the same rules for operand types as the binary forms described previously.

The result of any assignment operator is an expression equal to the new value of the left-hand expression. You can use the assignment operators, or any of the operators described so far, in combination to form expressions of arbitrary complexity. You can use parentheses ( ) to group terms in complex expressions.
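The equivalence between the shorthand and expanded forms, and the fact that an assignment is itself an expression with a value, can be sketched in C (used here because the text states that these operators behave as in ANSI C). The function names are hypothetical, and unsigned values are used so that the left shift is well defined.

```c
/* Illustration only: hypothetical helpers, not part of DTrace. */

/* Applies a sequence of shorthand assignment operators. */
unsigned long combine(unsigned long x)
{
    x += 3;
    x <<= 2;
    x |= 1;
    x %= 10;
    return x;
}

/* The same sequence written with = and the underlying binary operator. */
unsigned long combine_expanded(unsigned long x)
{
    x = x + 3;
    x = x << 2;
    x = x | 1;
    x = x % 10;
    return x;
}

/* An assignment is an expression whose value is the new value of the
 * left-hand operand, so it can be returned or used in a larger
 * expression. Parentheses group the term. */
long add_assign_result(long x, long step)
{
    return (x += step);
}
```

For an input of 2, both combine functions compute 5, then 20, then 21, and finally 1.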

Increment and Decrement Operators

D provides the special unary ++ and -- operators for incrementing and decrementing pointers and integers. These operators have the same meaning as they do in ANSI-C. They can only be applied to variables, and can be applied either before or after the variable name. If the operator appears before the variable name, the variable is first modified and the resulting expression is equal to the new value of the variable. For example, the following two expressions produce identical results:

x += 1; y = x;

y = ++x;

If the operator appears after the variable name, the variable is modified after its current value is returned for use in the expression. For example, the following two expressions produce identical results:

y = x; x -= 1;

y = x--;

You can use the increment and decrement operators to create new variables without declaring them. If you omit a variable declaration and apply the increment or decrement operator to a variable, the variable is implicitly declared to be of type int64_t.

You can apply the increment and decrement operators to integer or pointer variables. When applied to integer variables, the operators increment or decrement the corresponding value by one. When applied to pointer variables, the operators increment or decrement the pointer address by the size of the data type referenced by the pointer.
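The prefix and postfix forms, and the way a pointer advances by the size of its referenced type, can be sketched in C (used here because the text states that these operators behave as in ANSI C). The function names are hypothetical, and the choice of int32_t as the pointed-to type is only an example.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only: hypothetical helpers, not part of DTrace. */

/* Prefix form: x is modified first, so y receives the new value. */
int pre_increment_value(int x)
{
    int y = ++x;
    return y;
}

/* Postfix form: y receives the current value, then x is modified. */
int post_decrement_value(int x)
{
    int y = x--;
    return y;
}

/* Returns how many bytes an int32_t pointer moves when incremented.
 * The address is compared through char pointers, which count bytes. */
size_t int32_ptr_step(void)
{
    int32_t values[2] = { 0, 0 };
    int32_t *p = values;
    char *before = (char *)p;

    p++; /* now points at values[1] */
    return (size_t)((char *)p - before);
}
```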

Conditional Expressions

Although D does not provide support for if-then-else constructs, it does provide support for simple conditional expressions using the ? and : operators. These operators permit a triplet of expressions to be associated where the first expression is used to conditionally evaluate one of the other two. For example, the following D statement can be used to set a variable x to one of two strings, depending on the value of i:

x = i == 0 ? "zero" : "non-zero";

In this example, the expression i == 0 is first evaluated to determine if it is true or false. If the first expression is true, the second expression is evaluated and the ?: expression returns its value. If the first expression is false, the third expression is evaluated and the ?: expression returns its value.

As with any D operator, you can use multiple ?: operators in a single expression to create more complex expressions. For example, the following expression takes a char variable c containing one of the characters 0-9, a-f, or A-F and returns the value of this character when interpreted as a digit in a hexadecimal (base 16) integer:

hexval = (c >= '0' && c <= '9') ? c - '0' :
    (c >= 'a' && c <= 'f') ? c + 10 - 'a' : c + 10 - 'A';

The first expression used with ?: must be a pointer or integer in order to be evaluated for its truth value. The second and third expressions can be of any compatible types. You cannot construct a conditional expression in which, for example, one path returns a string and another an integer. The second and third expressions also cannot invoke a tracing function, such as trace() or printf(). If you want to trace data conditionally, you should use a predicate instead.
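The chained conditional expression for hexadecimal digits can be tried outside of DTrace. The following C sketch wraps it in a hypothetical function named hexval(). As an assumption of this sketch, the letter test is limited to a-f, the valid hexadecimal letters, and any character that is neither a decimal digit nor a lowercase letter in that range is treated as an uppercase letter without further checking. Both result branches produce an integer, which satisfies the rule that the second and third expressions must have compatible types.

```c
/* Illustration only: a hypothetical C wrapper around the chained
 * conditional expression. The input is assumed to be a valid
 * hexadecimal digit; no error checking is performed. */
int hexval(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' :
           (c >= 'a' && c <= 'f') ? c + 10 - 'a' : c + 10 - 'A';
}
```

The conditions are tested from left to right, so the final expression is reached only when both earlier tests are false.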
