The Answer to Free Memory, Swap, Oracle and everything
A presentation about using memory where it’s needed most
Christo Kutrovsky
The Pythian Group
2007 April
The 45 minutes version
Who Am I?
Joined Pythian in 2003
Became team lead for one of Pythian's service delivery teams in 2006
Notable clients: Palm Coast Data, Freshdirect.com
Presented at Collaborate '06, '07, RMOUG
Special interest in 11g, RAC, Disk IO performance, and memory
Pythian's delegate to the 11g beta, participated at the camp level (two visits)
Who is Pythian?
Provides turnkey global data architecture and operations teams on a linear-cost-to-effort basis
Founded in 1997, headquartered in Ottawa, Canada, with offices in India and Australia
Supporting almost 100 clients worldwide and more than 600 production databases
Almost 50 production engineers engaged in client service delivery
Broad data infrastructure expertise primarily focused on Oracle, Microsoft SQL Server, and MySQL on enterprise hardware
Agenda
Types of memory
Virtual Memory areas
How do we monitor memory usage
  And make sense of it
Oracle examples
Case studies
Questions
How many developers
How many managing Linux
How many managing Unix (AIX, Solaris)
How many have root access
How many have control of database memory consumption
Terminology
What is memory
The ability of a computer system to store data
Types of Memory
Short term: RAM (memory)
Long term (“permanent”): disk, tape (storage)
Types of Memory - physical
CPU Registers: fastest, very limited
CPU Cache (L1/L2/L3): some latency, LRU maintained
RAM: major latency (relatively), partially LRU
Disk: do something else while you wait
What is RAM
Faster, temporary storage
A work area
A place where you put your data while you process it
The Many caches
CPU Registers: 2 ns
CPU Cache: 8 ns (1:4)
Main Memory (RAM): 100 ns (1:12)
Disk – long term memory: 3’000’000 ns (1:30’000)
TAPE – even longer
CPU Cache & CPU Registers
CPU Registers – your two hands (or more)
  You use them to hold the items while you work on them
CPU Cache – your desk
  You use it as a quickly accessible location to store your most used items
  Represents your current tasks
Main Memory - RAM
RAM – Random Access Memory
It’s like your office
  Need to get up from your desk to grab items to work on
  You usually grab multiple at a time to save roundtrips
Our office
CPU Registers – “Your hands”: 2 seconds
CPU Cache – “Desk”: 4 sec.
Main Memory (RAM) – “Your office”: 12 seconds
Disk – “Flying to Australia”: 8 hours
TAPE – use a cargo ship to go
Growing your office
You always need more
Your “office” needs to handle all your active clients, or they will be unhappy
Running out of space in your office is not acceptable
The Disk – extending the memory
The Solution?
Ship some of your least needed binders to Australia
Relatively complex process
  need to find the least needed binders
  need to know how to return them, when they are needed
Introduction to virtual memory
Each process “sees” memory independently, as if it were alone on the system
Each process has freedom to use addresses in the whole “user address space”
Typically – 3 Gb user space, 1 Gb system space (on 32 bit)
Virtual memory mapping
[Diagram: P1 and P2 each in a 32 bit addressing space (0 gb to 4 gb), mapped onto RAM; RAM split into 4 kb chunks; reserved virtual region for the system (kernel)]
VM Management
Implemented via per process page table
Indicates:
  page location (disk/memory)
  page permissions (read/write/execute)
  page attributes (ex. copy on write)
Virtual memory PTE table
PTE Table for P1
  rw – in RAM – 0xFFA
  rw – in RAM – 0xFFB
  in RAM – 0xFFC – copy on write
  w – unallocated
  rw – on disk – SWAP
  rx – on disk – FILE
  unallocated
[Diagram: entries point into RAM, FILE and SWAP]
Additional benefits from VM
Protection
Features
  memory mapped files
  in memory file system
  shared memory
  shared memory – copy on write
Use more than what you have
Concept types of memory
Shared
  initially exists on disk
    file cache (linux), buffers, system cache
  initially does not exist on disk
    anonymous (linux), computed (aix)
Private
  does not exist on disk
  special case: copy on write
Linux VM Components
direct “user” dependent types of memory
  Buffers (shared)
  Cached (shared)
  Anonymous (private or shared)
  Hugepages
indirect (system) managed areas
  Slab – kernel structures
  PageTables
VM areas with Oracle
[Diagram: memory areas split between System and User; labels: Cached, SLAB, Pagetables, Buffers, Mapped, IPC Memory (SGA), Anonymous (PGA, PLSQL arrays)]
Monitoring
Monitoring Memory
with Oracle in mind
TOP
top
  most commonly used tool
  most commonly misinterpreted
top – sample output
top - 22:03:11 up 3:19, 2 users, load average: 2.98, 1.22, 0.52
Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.7% us, 0.8% sy, 0.0% ni, 0.3% id, 98.0% wa, 0.2% hi, 0.0% si
Cpu1 : 0.0% us, 0.8% sy, 0.0% ni, 97.6% id, 1.4% wa, 0.2% hi, 0.0% si
Cpu2 : 0.0% us, 0.2% sy, 0.0% ni, 99.7% id, 0.2% wa, 0.0% hi, 0.0% si
Cpu3 : 0.2% us, 0.2% sy, 0.0% ni, 33.6% id, 66.1% wa, 0.0% hi, 0.0% si
Mem: 8310308k total, 8049068k used, 261240k free, 36620k buffers
Swap: 7823644k total, 572k used, 7823072k free, 3395900k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8494 oracle 16 0 1662m 1.6g 1.5g D 2.0 19.8 0:03.15 oracletest (LOCAL=YES)
4796 oracle 16 0 1626m 1.5g 1.5g S 1.0 19.5 0:03.91 ora_dbw1_test
4794 oracle 15 0 1626m 1.5g 1.5g S 0.7 19.5 0:12.23 ora_dbw0_test
4798 oracle 16 0 1626m 1.5g 1.5g S 0.7 19.5 0:03.97 ora_dbw2_test
4800 oracle 16 0 1626m 1.5g 1.5g S 0.7 19.5 0:04.09 ora_dbw3_test
1 root 16 0 2384 600 512 S 0.0 0.0 0:00.86 init [3]
2 root RT 0 0 0 0 S 0.0 0.0 0:00.00 [migration/0]
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 [ksoftirqd/0]
Top – data comes from
/proc/<pid>/status
cat /proc/10450/status
Name: oracle
State: S (sleeping)
SleepAVG: 98%
Tgid: 10450
Pid: 10450
PPid: 1
TracerPid: 0
Uid: 503 503 503 503
Gid: 503 503 503 503
FDSize: 256
Groups: 503 603
VmSize: 83424 kB
VmLck: 0 kB
VmRSS: 1484204 kB
VmData: 1612 kB
VmStk: 124 kB
VmExe: 52720 kB
VmLib: 8420 kB
…
top – additional columns
top can have additional columns
  swap file usage
  computed
  code
  data
THEY ARE ALL WRONG
vmstat
vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 0 3631424 11096 120204 0 0 35 31 255 20 0 0 99 0
 0 0 0 3631488 11096 120204 0 0 0 0 1014 18 0 0 100 0
 0 0 0 3631488 11096 120204 0 0 0 0 1012 16 0 0 100 0
r – run queue – how many processes currently waiting for or running on the CPU
b – how many processes waiting, usually waiting on IO
swpd – swap memory usage
free – free memory
cache – file system cache
vmstat cont
si/so – swap in / out – in Kb/sec
bi/bo – blocks in / out (from and to block devices) – in Kb/sec
cs – context switches
us/sy/id/wa – user/system/idle/wait time for CPUs
/proc/meminfo
cat /proc/meminfo
MemTotal: 8310308 kB
MemFree: 93448 kB
Buffers: 132036 kB
Cached: 3413324 kB
SwapCached: 0 kB
Active: 1658252 kB
Inactive: 1942032 kB
HighTotal: 7470528 kB
HighFree: 8768 kB
LowTotal: 839780 kB
LowFree: 84680 kB
SwapTotal: 7823644 kB
SwapFree: 7823072 kB
Dirty: 100 kB
Writeback: 0 kB
Mapped: 82500 kB
Slab: 92028 kB
Committed_AS: 490700 kB
PageTables: 3952 kB
VmallocTotal: 106488 kB
VmallocUsed: 5964 kB
VmallocChunk: 99900 kB
HugePages_Total: 2200
HugePages_Free: 1088
Hugepagesize: 2048 kB
/proc/meminfo – 64 bit
cat /proc/meminfo
MemTotal: 8165032 kB
MemFree: 106428 kB
Buffers: 219484 kB
Cached: 2864760 kB
SwapCached: 69256 kB
Active: 1508428 kB
Inactive: 1915392 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 8165032 kB
LowFree: 106428 kB
SwapTotal: 4816888 kB
SwapFree: 4192148 kB
Dirty: 252 kB
Writeback: 0 kB
Mapped: 1350480 kB
Slab: 461584 kB
CommitLimit: 6851404 kB
Committed_AS: 4959776 kB
PageTables: 46668 kB
VmallocTotal: 536870911 kB
VmallocUsed: 2992 kB
VmallocChunk: 536867847 kB
HugePages_Total: 2000
HugePages_Free: 128
Hugepagesize: 2048 kB
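The individual fields of this output are discussed one by one on the following slides. As a convenience, a single field can be pulled out with a small helper. This is only a sketch, assuming a POSIX shell with awk; the function name meminfo_kb is made up for illustration, and it reads meminfo-style text on stdin so it works on saved output as well as on the live file.

```shell
# meminfo_kb FIELD
# Print the value of FIELD (for example MemFree) from /proc/meminfo-style
# text supplied on stdin. Hypothetical helper, not part of any tool.
meminfo_kb() {
    awk -v name="$1:" '$1 == name { print $2 }'
}

# Usage on a live Linux system:
#   meminfo_kb MemFree < /proc/meminfo
```

Fed the 64 bit sample above, meminfo_kb PageTables prints 46668.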
MemTotal
Total memory visible by the OS
If it’s not what you’ve put in the machine, you probably have a bad SIMM/DIMM
MemFree
Memory that is currently un-occupied and available to use immediately
Not the maximum amount of memory available at the moment
Controlled by (Linux RH4) /proc/sys/vm/min_free_kbytes
MemFree – example
grep MemFree /proc/meminfo
MemFree: 26568 kB
echo 900000 > /proc/sys/vm/min_free_kbytes
grep MemFree /proc/meminfo
MemFree: 210056 kB
Buffers
Cache of raw disk blocks
Usually occupied with ext3 metadata
  Mostly ext3 pointers (extent management)
  Not the cache of actual user data
In older kernels, was controllable
Cached
File system cache
If direct IO is not used for datafiles – will have your datafiles cached
Binary (for execution) memory
  includes the “oracle” binary caching
  all the libraries caching
Does not mean “occupied” – usually can be released immediately
The Oracle SGA – when not using hugepages
Cached – example part 1
[root@ ~]# cat /proc/meminfo
…
MemFree: 8232512 kB
Buffers: 9328 kB
Cached: 28372 kB
…
du -smc indx01_*
1714 indx01_01.dbf
1761 indx01_02.dbf
1722 indx01_03.dbf
5197 total
…
cat indx01_* > /dev/null
Cached – example part 2
[root@ ~]# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 0 8093888 10808 163392 0 0 0 0 1012 17 0 0 100 0
 0 0 0 8093952 10808 163392 0 0 0 0 1012 16 0 0 100 0
 0 1 0 7956736 10948 300272 0 0 68602 0 1567 1126 0 2 76 22
 0 1 0 7808576 11092 448068 0 0 73992 80 1623 1210 0 2 75 23
…
 0 1 0 2847616 16104 5397616 0 0 65792 0 1542 1076 0 2 75 23
 0 0 0 2766272 16180 5479180 0 0 40698 0 1341 675 0 1 85 14
 0 0 0 2766208 16192 5479168 0 0 0 114 1033 22 0 0 100 0
cat /proc/meminfo
…
MemFree: 2766464 kB
Buffers: 16192 kB
Cached: 5479168 kB
…
Cached – example #2 part 1
cat indx01_* >newfile
vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 0 2765312 17044 5479356 0 0 0 0 1012 17 0 0 100 0
 0 3 0 2405376 17428 5833612 0 0 16 36866 1324 144 1 18 76 6
 0 2 0 2143616 17688 6091532 0 0 4 111748 2000 213 0 16 50 34
…
 0 1 0 16832 6784 8198556 0 0 8556 26684 1942 1267 0 2 74 24
 1 1 0 16832 6856 8198744 0 0 12518 20720 2130 1767 0 3 74 23
…
cat /proc/meminfo
…
MemFree: 16768 kB
Buffers: 2192 kB
Cached: 8196908 kB
…
Dirty: 277468 kB
Writeback: 0 kB
…
Cached – example #2 part 2
cat /proc/meminfo
…
MemFree: 20672 kB
Buffers: 3300 kB
Cached: 8191900 kB
…
Dirty: 0 kB
Writeback: 0 kB
…
rm newfile
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 0 23296 3380 8189480 0 0 0 28 1015 18 0 0 100 0
 0 1 0 3257472 3948 4996372 0 0 284 0 1084 160 0 14 78 8
 0 1 0 3255552 5828 4996572 0 0 940 0 1247 485 0 1 75 24
 0 1 0 3253696 7616 4996344 0 0 884 96 1237 470 0 2 75 23
 0 0 0 3253440 7988 4996492 0 0 186 0 1061 112 0 0 95 4
 0 0 0 3253440 7988 4996492 0 0 0 0 1012 14 0 0 100 0
Swap
SwapTotal
SwapFree
SwapCached
  written to swap, but still in memory
  applies only to anonymous memory
  OS will anticipate memory needs, and pre-swap inactive data, but keep it in memory
Actual swapping (memory that will need to be read from disk) = SwapTotal - SwapFree - SwapCached
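The formula above can be evaluated directly from /proc/meminfo. A minimal sketch, assuming a POSIX shell with awk; the function name real_swap_kb is hypothetical, and the text is read from stdin so the slide samples can be used as input.

```shell
# real_swap_kb
# Read /proc/meminfo-style text on stdin and print
# SwapTotal - SwapFree - SwapCached, in kB.
real_swap_kb() {
    # awk only extracts the three values; the subtraction is done by the shell
    set -- $(awk '$1 == "SwapTotal:"  { t = $2 }
                  $1 == "SwapFree:"   { f = $2 }
                  $1 == "SwapCached:" { c = $2 }
                  END { print t, f, c }')
    echo $(( $1 - $2 - $3 ))
}

# Usage on a live Linux system:
#   real_swap_kb < /proc/meminfo
```

With the 64 bit sample shown earlier (SwapTotal 4816888, SwapFree 4192148, SwapCached 69256) this prints 555484, so roughly 542 Mb would really have to be read back from disk, not the 610 Mb that SwapTotal - SwapFree alone suggests.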
Active/Inactive
Active – recently used memory
  Includes all types of memory (cached, buffers, anonymous)
  OS will try to keep it in RAM
Inactive – memory that will be first reused
  “free” memory
Can be used to gauge the “working set”
High/Low Total/Free
32 bit limitations, no high memory on 64 bit
Some kernel structures cannot be allocated in “high memory”
Used to be a problem in older kernels, newer kernels protect low memory
Dirty & Writeback
Dirty – cache/buffers memory that needs to be written to disk
  thresholds can be adjusted
Writeback – memory actively being written to disk
  Can reach high values with async writes with a large queue
Committed_AS & Mapped
Committed_AS
  Total memory requested on the system
  Not used, just requested
  If every process in the system is to touch and use the memory it has requested, this is how much would be used
Mapped
  memory used for in-memory mapped files
  all anonymous memory
  includes committed & touched memory
Committed_AS - example
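cat grab.c
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    void *p;
    /* request 1 Gb, but never touch it */
    p = malloc(1073741824);
    sleep(60);
    return 0;
}

cat /proc/meminfo
...
MemFree: 3230592 kB
...
Committed_AS: 49972 kB

./grab
cat /proc/meminfo
...
MemFree: 3230464 kB
...
Committed_AS: 1098808 kB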
Slab
Slab – “in-kernel data structures cache”
  similar to Oracle’s “shared_pool”
  designed to prevent memory fragmentation
  detailed monitoring:
    /proc/slabinfo
    slabtop
Basically “system space”
slabtop – ordered by cache size
Active / Total Objects (% used) : 88874 / 139343 (63.8%)
Active / Total Slabs (% used) : 5839 / 5846 (99.9%)
Active / Total Caches (% used) : 90 / 132 (68.2%)
Active / Total Size (% used) : 17286.03K / 23311.27K (74.2%)
Minimum / Average / Maximum Object : 0.01K / 0.17K / 128.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
32382 24900 76% 0.27K 2313 14 9252K radix_tree_node
56925 40013 70% 0.05K 759 75 3036K buffer_head
364 363 99% 4.00K 364 1 1456K size-4096
2485 2471 99% 0.54K 355 7 1420K ext3_inode_cache
2376 413 17% 0.50K 297 8 1188K size-512
256 256 100% 3.00K 128 2 1024K biovec-(256)
4576 4481 97% 0.15K 176 26 704K dentry_cache
10248 4548 44% 0.06K 168 61 672K size-64
4340 1215 27% 0.12K 140 31 560K size-128
1980 316 15% 0.25K 132 15 528K size-256
…
HugePages
2Mb pages
organized in a separate memory pool
locked in memory
available only to shared memory requests
pre-allocated via kernel parameter
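Because the pool is pre-allocated, it is worth checking how much of it is actually handed out. A rough sketch, assuming a POSIX shell with awk and the three HugePages fields shown in the /proc/meminfo samples; the function name hugepages_used_kb is made up, and the calculation deliberately ignores any reservation accounting that newer kernels may report separately.

```shell
# hugepages_used_kb
# Read /proc/meminfo-style text on stdin and print
# (HugePages_Total - HugePages_Free) * Hugepagesize, in kB.
hugepages_used_kb() {
    set -- $(awk '$1 == "HugePages_Total:" { t = $2 }
                  $1 == "HugePages_Free:"  { f = $2 }
                  $1 == "Hugepagesize:"    { s = $2 }
                  END { print t, f, s }')
    echo $(( ($1 - $2) * $3 ))
}

# Usage on a live Linux system:
#   hugepages_used_kb < /proc/meminfo
```

For the 64 bit sample (2000 total, 128 free, 2048 kB pages) this gives 3833856 kB in use, with 128 pages (256 Mb) pre-allocated but idle and unavailable to anything other than shared memory requests.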
Shared memory mapping
[Diagram: P1 and P2 each in a 32 bit addressing space (0 gb to 4 gb), sharing a region of RAM; reserved virtual region for the system (kernel)]
Shared memory mapping (huge)
[Diagram: P1 and P2 each in a 32 bit addressing space (0 gb to 4 gb); labels: RAM, HugePages pre-allocated memory pool, locked in RAM]
VLM – 32 bit workaround
32 bit address space is 4 Gb
32 bit systems with PAE (Intel)
  up to 64 Gb of ram
Memory filesystem
  opens a file in /dev/shm for buffer cache
  shared pool still in Shared Memory
Beware of small Oracle block size
VLM – using 3gb+ on 32 bit
[Diagram: P1 and P2 each in a 32 bit addressing space (0 gb to 4 gb); labels: RAM, /dev/shm/ora_, ramfs, mmap region]
USE_INDIRECT_DATA_BUFFERS
RedHat/SUSE
  shmfs – needs size
  tmpfs – does not need size
  ramfs – does not need size + Locked
none can use HugePages
shared pool can still use HugePages
double-memory access due to mapping
DirectIO
Direct IO (O_DIRECT) – bypasses the file system cache and accesses the files directly
DB activity does not pollute OS cache
DB activity does not compete with PGA/PLSQL memory
PageTables
Memory for per-process page tables
B-Tree like structure – this number shows leaf blocks space
Memory to manage memory
One entry of ~8 bytes per process, per used 4kb of memory
In Oracle’s case, assuming an SGA of 2gb
  524’288 pages * 8 bytes = 4 Mb per process
  1000 sessions = 4 Gb of memory, to manage 2gb of SGA
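The arithmetic on this slide can be reproduced for other SGA sizes and session counts. A back-of-the-envelope sketch in POSIX shell arithmetic; the function name pagetable_bytes is invented for illustration, and it assumes every session touches the whole SGA and that each page table entry costs a fixed number of bytes.

```shell
# pagetable_bytes SGA_BYTES PAGE_BYTES PTE_BYTES SESSIONS
# Rough page table cost, in bytes, of mapping a fully touched SGA
# in every session. Worst-case estimate, not a measurement.
pagetable_bytes() {
    pages=$(( $1 / $2 ))             # pages needed to map the SGA
    per_process=$(( pages * $3 ))    # leaf page table bytes per process
    echo $(( per_process * $4 ))
}

# 2 Gb SGA, 4 kb pages, 8 byte entries, 1000 sessions:
#   pagetable_bytes 2147483648 4096 8 1000
```

The example call prints 4194304000, the "4 Gb of memory, to manage 2gb of SGA" from the slide. Passing 2097152 as the page size instead shows what 2 Mb pages would cost under the same assumptions: 8192000 bytes.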
Case studies
PageTables using a lot of ram
PageTables – bad example
Config:
  1.7 Gb sga (max on 32 bit without VLM)
  1400 Mb in db_cache_size
  table sized to fit exactly in cache
Start 100 sessions, that full scan the table (cached) in order to touch the memory and allocate the PTEs
Sessions will wait via dbms_lock.allocate to be released
Show before and after PageTables usage
PageTables – bad example cont.
Before starting the sessions (db is UP)
cat /proc/meminfo
…
MemFree: 1070472 kB
…
Committed_AS: 1881544 kB
PageTables: 4932 kB
…
After sessions have finished touching the memory
cat /proc/meminfo
…
MemFree: 473496 kB
…
Committed_AS: 4708552 kB
PageTables: 295068 kB
HugePages & Oracle
Locks SGA in memory
  no part of the SGA will ever be swapped out, or even considered for swapping
Reduces the number of PTE entries
Assuming 2 Gb SGA
  1’024 PTEs * 8 bytes = 8 Kb per process
  1000 sessions = 8 Mb of memory, a 512 fold reduction
HugePages & Performance
Releases more memory for PGA or more db_cache
Guarantees that SGA will always be in memory
Improves TLB hit ratio
  TLB is a CPU level cache of virtual to physical memory mappings, improving performance
8% improvement in a memory only TPC test
  not including the fact there is more memory available
HugePages & 100 sessions
The test from a few slides before
Before starting 100 sessions (db up)
cat /proc/meminfo
…
Committed_AS: 332264 kB
PageTables: 3056 kB
…
After
…
Committed_AS: 3124640 kB
PageTables: 23100 kB
…
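The two runs are easier to compare as page table growth per session. A small sketch in POSIX shell arithmetic; the function name is hypothetical and the inputs are simply the PageTables values from the before and after snapshots of each test.

```shell
# pagetables_per_session_kb BEFORE_KB AFTER_KB SESSIONS
# Average growth of PageTables per session, in whole kB.
pagetables_per_session_kb() {
    echo $(( ($2 - $1) / $3 ))
}

# 4 kb pages (earlier test):  pagetables_per_session_kb 4932 295068 100
# HugePages (this test):      pagetables_per_session_kb 3056 23100 100
```

The first call prints 2901 and the second 200, so in these two tests each session cost about 2.8 Mb of page tables with 4 kb pages and about 0.2 Mb with HugePages. The remainder in the HugePages case is presumably the ordinary non-SGA memory each session still maps with small pages.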
No hugepages
[Diagram: memory areas split between System and User; labels: Cached, SLAB, Pagetables, Buffers, Mapped, IPC Memory (SGA), Anonymous (PGA, PLSQL arrays)]
With HugePages
[Diagram: memory areas split between System and User; labels: Cached, SLAB, Pagetables, Buffers, Mapped, IPC Memory (SGA), Hugepages, Anonymous (PGA, PLSQL arrays)]
HugePages – on Red Hat
what you need to setup RH4
  /proc/sys/vm/nr_hugepages
  /proc/sys/vm/hugetlb_shm_group
  /etc/security/limits.conf
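The value for nr_hugepages follows from the SGA size and the Hugepagesize reported in /proc/meminfo. A sizing sketch in POSIX shell arithmetic; the function name hugepages_needed is invented, and it assumes the whole SGA should fit in the pool, rounding up to whole pages.

```shell
# hugepages_needed SGA_KB HUGEPAGE_KB
# Number of huge pages required to hold an SGA of SGA_KB, rounded up.
hugepages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 2 Gb SGA with 2048 kB pages:
#   hugepages_needed 2097152 2048
```

The example prints 1024. The result is a lower bound under the stated assumption; since the SGA may be allocated as more than one shared memory segment, it may be prudent to allow a few extra pages before writing the number to /proc/sys/vm/nr_hugepages.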
Case studies
Where is my free memory going?
Freshly booted
vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 0 7725800 13036 497864 0 0 0 16 1013 25 0 0 100 0
 0 1 0 7663272 13144 556516 0 0 14806 70 1164 393 2 2 80 16
 0 1 0 7513128 13252 706168 0 0 37494 0 1318 631 2 3 75 20
 0 1 0 7310824 13408 908032 0 0 50502 64 1429 862 3 4 75 18
...
 0 1 0 5503208 14724 2709556 0 0 59144 16 1493 987 3 4 75 18
 1 0 0 5263080 14856 2948884 0 0 59838 128 1518 995 3 5 75 18
 0 0 0 5111272 14944 3106096 0 0 39344 6 1332 663 2 5 82 11
 0 0 0 5111272 14944 3106096 0 0 0 16 1013 26 0 0 100 0
 0 0 0 5111144 14960 3106080 0 0 0 30 1016 36 0 0 100 0
Reading ~1.2 Gb of data, free memory drops twice as much
  file system cache
  oracle SGA being touched
Freshly booted – hugepages
vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 0 3436280 14236 286324 0 0 0 56 1027 54 0 0 100 0
 0 1 0 3395192 14336 323924 0 0 18902 24 1193 447 2 1 81 16
 0 1 0 3303928 14480 415040 0 0 45572 48 1387 775 3 1 75 21
…
 0 1 0 2566776 15560 1150020 0 0 49228 6 1416 828 3 1 75 20
 0 1 0 2452152 15720 1264260 0 0 57230 18 1492 977 3 1 75 20
Reading 1.2 Gb of data, free memory drops by the same amount
  file system cache consuming memory
Freshly booted – with directIO
vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 0 3436280 14236 286324 0 0 0 56 1027 54 0 0 100 0
 0 1 0 3395192 14336 323924 0 0 18902 24 1193 447 2 1 81 16
 0 1 0 3303928 14480 415040 0 0 45572 48 1387 775 3 1 75 21
…
 0 1 0 2566776 15560 1150020 0 0 49228 6 1416 828 3 1 75 20
 0 1 0 2452152 15720 1264260 0 0 57230 18 1492 977 3 1 75 20
1.2 Gb of data – 1.2 Gb drop in free memory
NO CHANGE
DIRECT_IO Bugs
bug 3186847
  filesystemio_options=directio is ignored on linux
  fixed in 9.2.0.6
Note: 297521.1
  bug 2448994 introduced - O_DIRECT flag was not passed to the open() system call
  fixed in 9.2.0.7
Basically you need 9.2.0.7
Shared memory monitoring
How to see shared memory?
  ipcs – shows the “IPC” shared memory
If you kill Oracle without freeing up shared memory
  ipcrm – to remove
ipcs
ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 4915200 oracle 600 2097152 14
0x00000000 4947969 oracle 600 1342177280 14
0x7157be04 4980738 oracle 600 278921216 14
------ Semaphore Arrays --------
key semid owner perms nsems
0xb1adfd8c 622592 oracle 640 354
------ Message Queues --------
key msqid owner perms used-bytes messages
To remove orphan segments
Identified via “sysresv” or number attached from ipcs or pmap of an oracle pid
Use ipcrm to remove
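The "number attached" check can be scripted by filtering on the nattch column. A cautious sketch, assuming a POSIX shell with awk and the column order shown in the ipcs sample above (key, shmid, owner, perms, bytes, nattch); the function name unattached_shmids is made up, and it only lists candidates without removing anything.

```shell
# unattached_shmids
# Read `ipcs -m` style output on stdin and print the shmid of every
# shared memory segment whose nattch column is 0.
unattached_shmids() {
    awk 'NF >= 6 && $1 ~ /^0x/ && $6 == 0 { print $2 }'
}

# Usage on a live system (shared memory section only):
#   ipcs -m | unattached_shmids
```

A segment with nothing attached is only a candidate: confirm with sysresv or pmap that it really belongs to a dead instance before passing its id to ipcrm.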
The End
Thank you,
Questions?
[email protected]
my blog at http://www.pythian.com/blogs/kutrovsky/
Christo Kutrovsky
The Pythian Group
2007 April
http://www.pythian.com/