draft: python for system administrators
DESCRIPTION
Draft of the EP14 Training
TRANSCRIPT
Python for System Administrators
EuroPython 2014, 24th July - Berlin
Roberto Polli - [email protected]
Babel Srl, P.zza S. Benedetto da Norcia, 33 - 00040 Pomezia (RM) - www.babel.it
24 July 2014
Agenda
• Intro
  ipython
• Path management: 10’
• Encoding: 10’
• Data Gathering: 20’
  module: psutil
  module: subprocess
  The /proc filesystem
• Parsing: 60’
  Regular Expressions
• Nosetest Intermezzo: 15’
• Processing: 45’
  Distributions
  Deviation
  Correlation
  Plotting Time
• End
Who? What? Why?
• Use Python to replace Grep, Awk, Sed and Perl. Speed up your daily job.
• Roberto Polli - Community Manager @ Babel.it. Loves writing in C, Java and Python. Red Hat Certified Engineer and Virtualization Administrator.
• Babel: proud sponsor of this talk ;) Delivers large mail infrastructures based on Open Source software for Italian ISPs and public administrations (PA). Contributes to various FLOSS projects.
Requirements
• python 2.7+, ipython
• course code from github
# git clone https://github.com/ioggstream/python-course
• test your environment (eg. psutil, numpy, scipy, matplotlib)
# nosetests -vs test_prerequisites.py
• first part: nose, psutil
• second part: scipy, numpy, matplotlib
• slides marked with ♦ contain optional/advanced content
How
• Get ready before starting: the code is on github!
• Type everything but #comments and try/except
• Type fast with tab-completion and copy-paste
• Be curious: inspect and print returned variables
• Never* close your iPython session: you’ll lose your precious variables
* (ok, sometimes you can).
References
• irc.freenode.net #python - The Python Community :D
• Python Cookbook, 3rd ed., O’Reilly - David Beazley and Brian K. Jones
• Programming Python, 4th ed., O’Reilly - Mark Lutz
• Dive into Python 3, 2nd ed., Apress - Mark Pilgrim
• nose.readthedocs.org
• github.com/ioggstream/python-course
iPython I
• Interactive interpreter with tons of features, and the main tool of our training.
• The most fun way to learn and use python!
• Supports tab-completion, readline, inline help
• Allows pasting from the clipboard with %paste, and multi-line editing with %edit
• Run it with plotting support enabled:
# ipython --pylab
iPython II
# iPython supports inline help: append ? to an object
str?
# We can run commands and capture their output in a variable
# with the ! magic, without quoting (on unix)
ret = !cat /etc/hosts
# windows has etc\hosts too ;)
ret = !type c:\windows\system32\drivers\etc\hosts
iPython III
# returned objects can be filtered with
ret.grep('localhost')
# Now get the first space-separated column of the output
ret.fields(0)
ret.grep('localhost').fields(0)
# And the last returned value is stored in _
localip = _
# We can type long commands in an editor like ‘vi’ using
%edit mytmp.py   # type print(ret[0]), then exit (eg. :wq!)
> Editing... done. Executing edited code...
Path management: Goal
• Normalize paths on different platforms
• Create, copy and remove folders
• Handle errors
modules: os, os.path, shutil, errno
see also: pathlib on Python 3.4+
Path management: os.path, sys
basedir, hosts = "/", "etc/hosts"
# Check the hosting platform with the sys module
from sys import platform
if platform.startswith('win'):
    basedir = 'c:/windows/system32/drivers'
# Always use the os.path module!
from os.path import join, normpath
hosts = join(basedir, hosts)
hosts = normpath(hosts)
print("Normalized path is", hosts)
Path management: os.path, sys
• os.path is the best way to manage paths!
  • multiplatform
  • safe
• join adds a separator between components only where one is missing (no doubled "/")
• normpath collapses redundant separators and ".." components, and on Windows turns "/" into "\"
• realpath resolves symlinks
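A minimal sketch of these three behaviours. It uses the `posixpath` and `ntpath` modules (the implementations that `os.path` points to on POSIX and Windows respectively) so the results do not depend on the host; the sample paths are made up, and the symlink part assumes a system where you are allowed to create symlinks:

```python
import ntpath
import os
import posixpath
import tempfile

# join adds a separator only where one is missing
print(posixpath.join("/", "etc/hosts"))           # /etc/hosts, not //etc/hosts

# normpath collapses duplicate separators and resolves ".." lexically
print(posixpath.normpath("/etc//ssh/../hosts"))   # /etc/hosts

# the Windows flavour also turns "/" into "\"
print(ntpath.normpath("c:/windows/system32/drivers/etc/hosts"))

# realpath follows symlinks (needs permission to create one)
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "real")
link = os.path.join(tmp, "link")
os.mkdir(target)
os.symlink(target, link)
print(os.path.realpath(link) == os.path.realpath(target))   # True
```

Note that normpath works on strings only: it does not look at the filesystem, which is why realpath is needed when symlinks are involved.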
And now, a quick glance at other tools.
Move trees: shutil, os, os.path
from os import makedirs               # ...tree creation...
from os.path import isdir             # ...checking...
from shutil import copytree, rmtree   # ...copying and removal
makedirs("/tmp/py/foo/bar")
# We can copy a whole tree and test it
copytree("/tmp/py/foo", "/tmp/py/foo2")
assert isdir("/tmp/py/foo2/bar")
rmtree("/tmp/py/foo")   # ... and finally delete it
assert not isdir("/tmp/py/foo/bar")
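The same steps as a self-contained sketch that runs in a throw-away directory created by `tempfile` instead of the hard-coded /tmp/py, so it can be re-run and also works on Windows:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                    # fresh, empty directory
src = os.path.join(base, "foo")
dst = os.path.join(base, "foo2")

os.makedirs(os.path.join(src, "bar"))        # create foo/bar in one call
shutil.copytree(src, dst)                    # dst must not exist yet
copied = os.path.isdir(os.path.join(dst, "bar"))

shutil.rmtree(src)                           # delete the original tree
removed = not os.path.isdir(src)

shutil.rmtree(base)                          # clean up everything
print(copied, removed)                       # True True
```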
Move trees: errno
# We can use exception handlers to investigate errors
try:
    # python2 does not allow ignoring existing directories...
    makedirs("/tmp/py/foo2/bar")   # foo2/bar was created by copytree
    # ...and raises an OSError
except OSError as e:
    # Just use the errno module to check the error value
    import errno
    assert e.errno == errno.EEXIST
help(makedirs)
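On Python 3.2+ `os.makedirs(path, exist_ok=True)` covers the common case directly. For code that must also run on Python 2, a sketch of a helper that checks `errno` as above; the name `ensure_dir` is made up for this example:

```python
import errno
import os
import tempfile

def ensure_dir(path):
    """Create path (and its parents) unless it already is a directory."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Re-raise anything other than "already exists as a directory",
        # eg. EACCES, or EEXIST caused by a regular file with that name
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path

base = tempfile.mkdtemp()
target = os.path.join(base, "foo", "bar")
ensure_dir(target)
ensure_dir(target)   # the second call is a no-op instead of an error
```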
Encoding: Goal
• A string is more than a sequence of bytes
• A string is a couple (bytes, encoding)
• Use unicode literals in python2
• Manage differently encoded filenames
• A string is not a sequence of bytes
modules: os, os.path, glob
Song of Childhood
When the child was a child,
it walked with its arms hanging,
wanted the brook to be a river,
the river to be a torrent,
and this puddle to be the sea.
When the child was a child,
it didn't know that it was a child,
everything had a soul for it,
and all souls were one.
When the child was a child,
it had no opinion about anything,
had no habits,
often sat cross-legged,
started running from a standstill,
had a cowlick in its hair,
and made no faces when photographed.
“When the child was a child,
characters were bytes, and
strings were lists of bytes”
When the child was a child,
berries fell into its hand as only berries do,
and they still do,
fresh walnuts made its tongue raw,
and they still do,
on every mountain it longed for the ever higher mountain,
and in every city it longed for the even bigger city,
and that is still so,
in the top of a tree it reached for the cherries
with an elation it still has today,
it was shy in front of every stranger,
and it still is,
it waited for the first snow,
and it still waits that way.
Encoding is a map
# Py3 doesn't need the u
the_string = u"S\u00fcd"   # Süd
# can be encoded in different ways
in_utf8 = the_string.encode('utf-8')
in_win = the_string.encode('cp1252')
type(in_utf8) == bytes   # byte-sequences
# Decoding bytes using the wrong map...
# ...gives sad results ;)
in_utf8.decode('cp1252')   # SÃ¼d
• Encoding is a one-to-one map between a typographical character and a byte-sequence
• Decoding is its reverse map
char | ascii | utf-8      | cp1252
a    | [97]  | [97]       | [97]
ü    | -     | [195, 188] | [252]
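The table above can be regenerated in a few lines. A sketch assuming Python 3, where iterating over `bytes` yields integers; `byte_table` is a name made up for the example:

```python
def byte_table(chars, encodings=("ascii", "utf-8", "cp1252")):
    """Map each character to its byte values in each encoding (None if unencodable)."""
    table = {}
    for ch in chars:
        row = {}
        for enc in encodings:
            try:
                row[enc] = list(ch.encode(enc))
            except UnicodeEncodeError:
                row[enc] = None   # eg. 'ü' does not exist in ascii
        table[ch] = row
    return table

for ch, row in byte_table("aü").items():
    print(ch, row)

# The wrong map produces garbage instead of an error
print("Süd".encode("utf-8").decode("cp1252"))   # SÃ¼d
```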
Enter Encoding
# Filenames are binary data! Be careful when reading from
# a (eg. vfat) filesystem!
# To make python2 encoding-aware we should
from __future__ import unicode_literals
# Create 3 windows-encoded filenames in
basedir = "/tmp/py"
# using the provided function
from course import create_wuerstelstrasse
create_wuerstelstrasse(basedir)
Encoded filenames: glob
from glob import glob as ls   # expands wildcards like a shell
files = ls("/tmp/py/*.txt")   # with unicode_literals on python2 this may fail with:
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc ...
# 0xFC == 252: remember the ü in the cp1252 map?
# To avoid encoding issues...
files = ls(b"/tmp/py/*.txt")  # ...we explicitly use bytes
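A Python 3 sketch of the same idea. It assumes a Linux filesystem that accepts arbitrary bytes in names (eg. ext4 or tmpfs; macOS and Windows may refuse or rewrite such names). The three file names and the temporary directory are made up, standing in for the course's `create_wuerstelstrasse()`:

```python
import glob
import os
import tempfile

base = os.fsencode(tempfile.mkdtemp())   # a bytes path to a fresh directory
# 'ü' written as the single cp1252 byte 0xFC, which is not valid UTF-8
for name in (b"w\xfcrstel.txt", b"s\xfcd.txt", b"plain.txt"):
    open(os.path.join(base, name), "wb").close()

# A bytes pattern makes glob return bytes, so nothing is decoded implicitly
raw_names = sorted(os.path.basename(p)
                   for p in glob.glob(os.path.join(base, b"*.txt")))

# Decode explicitly, with the encoding the names were written in
names = [n.decode("cp1252") for n in raw_names]
print(names)   # ['plain.txt', 'süd.txt', 'würstel.txt']
```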
Data Gathering: Goal
Gathering system data with multiplatform and platform-dependent tools.
• Get info from files, /proc and /sys
• Capture command output
• Use psutil to get I/O, CPU and memory data
• Parse files with a strategy
modules: psutil, subprocess, os
Data Gathering: grep
def grep(needle, fpath):
    """A minimal grep implementation.

    goal: open() is iterable and doesn't need splitlines()
    goal: a comprehension can filter iterables
    """
    return [x for x in open(fpath) if needle in x]

# Do we have "localhost" in our "/etc/hosts"?
grep("localhost", "/etc/hosts")
Data Gathering: psutil
# The psutil module is very nice!
import psutil
# Works on Windows, Linux and MacOS
psutil.cpu_percent()
# And its output is easy to manage
psutil.disk_io_counters()
Exercise: Which other information does psutil provide?
Data Gathering: Exercises
Write a vmstat-like function printing every second:
• cpu usage %;
• bytes read and written in the given interval;
• Hint: use psutil, time.sleep(1)
• Hint: try on ipython and then write the function using %edit vmstat.py
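One possible sketch of the exercise. Instead of `time.sleep(1)` it relies on `psutil.cpu_percent(interval=1)`, which blocks for one second while measuring; the helper names `io_delta` and `vmstat` are made up, and psutil must be installed for `vmstat()` itself:

```python
def io_delta(before, after):
    """Bytes read and written between two psutil.disk_io_counters() samples."""
    return (after.read_bytes - before.read_bytes,
            after.write_bytes - before.write_bytes)

def vmstat(samples=5):
    """Print CPU % and disk I/O once per second, `samples` times."""
    import psutil   # third-party: pip install psutil
    prev = psutil.disk_io_counters()
    for _ in range(samples):
        # cpu_percent(interval=1) blocks ~1s, so it also paces the loop
        cpu = psutil.cpu_percent(interval=1)
        cur = psutil.disk_io_counters()
        read_b, written_b = io_delta(prev, cur)
        print("cpu {:5.1f}%  read {:>10} B  written {:>10} B".format(
            cpu, read_b, written_b))
        prev = cur
```

Keeping the delta computation in its own function makes it testable without real disks: any object with `read_bytes` and `write_bytes` attributes will do.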
Data Gathering: subprocess
# The check_output function returns the command stdout
from subprocess import check_output
# It takes a list as an argument!
out = check_output("ping -w1 -c1 www.google.com".split())
# and returns a string (bytes on python3)
print(out)
Data Gathering: subprocess, sys
def sh(cmd, shell=False, timeout=None):
    """Returns an iterable output of a command string, checking ... """
    from subprocess import check_output
    from sys import version_info as python_version
    # with shell=True the shell parses the string itself
    args = cmd if shell else cmd.split()
    if python_version < (3, 3):   # ..before using...
        if timeout:
            raise ValueError("Timeout not supported")
        output = check_output(args, shell=shell)
    else:
        output = check_output(args, shell=shell, timeout=timeout)
    return output.splitlines()
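A sketch of a Python 3-only variant (3.3+ for `timeout`): it returns text lines instead of bytes and splits the command with `shlex`, so quoted arguments survive. It deliberately drops the `shell` option:

```python
import shlex
import subprocess

def sh(cmd, timeout=None):
    """Run `cmd` without a shell; return its stdout as a list of text lines.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if `timeout` seconds elapse.
    """
    args = shlex.split(cmd)   # 'grep "a b" f' -> ['grep', 'a b', 'f']
    out = subprocess.check_output(args, timeout=timeout,
                                  universal_newlines=True)   # str, not bytes
    return out.splitlines()
```

Usage: `sh("ping -c1 -w1 www.google.com", timeout=5)`.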
Data Gathering: Exercises
Write a simple pgrep-like function for your OS which:
• has the following signature
def ppgrep(program):
    """@param program - eg. firefox, explorer.exe"""
    raise NotImplementedError
• prints a list of processes executing `program`;
• Hint: use subprocess, os, and list comprehension
items = [x for x in a_list if 'firefox' in x]
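A Linux-only sketch of the exercise. Instead of calling `ps` through subprocess, it reads the process name from /proc/<pid>/comm (which the kernel truncates to 15 characters), and it returns the matches rather than printing them:

```python
import os

def ppgrep(program):
    """Linux only: return sorted (pid, name) pairs whose name contains `program`."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():          # only numeric entries are processes
            continue
        try:
            with open(os.path.join("/proc", entry, "comm")) as fp:
                name = fp.read().strip()
        except (IOError, OSError):       # process exited meanwhile, or no permission
            continue
        if program in name:
            found.append((int(entry), name))
    return sorted(found)
```

Usage: `print(ppgrep("firefox"))`.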
♦Data Gathering: Parsing /proc I♦
def linux_threads(pid):
    """The Linux /proc filesystem is a cool place to get info."""
    from glob import glob   # expands * and ?
    path = "/proc/{}/task/*/status".format(pid)
    # Pick a set of fields to gather...
    t_info = ('Pid', 'Tgid', 'voluntary')   # a tuple
    for t_path in glob(path):
        # ...and use a comprehension to get interesting data.
        print([x for x in open(t_path)
               if x.startswith(t_info)])   # startswith accepts tuples!
Data Gathering: Parsing /proc II
# On Linux, /proc/diskstats is the source of I/O info
disk_l = grep("sda", "/proc/diskstats")
# To gather that data we put the headers in a multi-line string
from course import diskstats_headers as headers
disk_info = disk_l[0].split()   # Take the 1st entry, split the data...
zip(headers, disk_info)          # ...and tie them with the headers
list(_)                          # On py3 you need to iterate the zip object!
Data Gathering: Parsing /proc III
# Or create a reusable commodity class with
from collections import namedtuple
# using headers as attributes,
# like the ones provided by psutil
DiskStats = namedtuple('DiskStat', headers)
# ... and disk_info as values
dstat = DiskStats(*disk_info)
dstat.device, dstat.writes_ms
# Homework: check further features with
import collections
help(collections)
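A self-contained sketch of the same namedtuple approach. The course's `diskstats_headers` is not shown, so the field names below are our own choice; their order follows the first 14 columns described in the Linux kernel's iostats documentation. Newer kernels append more columns, which the slice ignores; older kernels with fewer columns are not handled. The sample line is made up:

```python
from collections import namedtuple

# Hypothetical names for the first 14 columns of /proc/diskstats
DiskStats = namedtuple("DiskStats",
                       "major minor device reads reads_merged sectors_read "
                       "reads_ms writes writes_merged sectors_written "
                       "writes_ms io_in_progress io_ms weighted_io_ms")

def parse_diskstats_line(line):
    """Turn one /proc/diskstats line into a DiskStats with int counters."""
    values = line.split()[:len(DiskStats._fields)]
    return DiskStats(*[v if name == "device" else int(v)
                       for name, v in zip(DiskStats._fields, values)])

def read_diskstats(path="/proc/diskstats"):
    """Return {device: DiskStats} for every line of the file."""
    with open(path) as fp:
        return {d.device: d for d in map(parse_diskstats_line, fp)}

sample = "   8       0 sda 1000 20 30000 400 500 60 7000 800 0 900 1200"
dstat = parse_diskstats_line(sample)
print(dstat.device, dstat.writes_ms)   # sda 800
```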
Parsing: Goal
• Plan a parsing strategy
• Use basic regular expressions: match, search, sub
• Benchmark a parser
• Run nosetests
• Write a simple parser
modules: re, nose, %timeit
Parsing is hard...
“System Administrators spend 24.3% of their work-life parsing files.”*
*Independent analysis by The GASP¹ Society ;)
¹ Grep Awk Sed Perl
...use a strategy!
1. Collect parsing samples
2. Play in ipython and collect %history
3. Write tests, then the parser
4. Benchmark, if needed
Parsing postfix logs
# Before writing the parser, collect samples of
# the interesting lines. For now just
from course import mail_sent, mail_delivered
# and %edit a simple test
def test_sent():
    hour, host, to = parse_line(mail_sent)
    assert hour == '08:00:00'
    assert to == '[email protected]'
Parsing lines: split, zip
May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: to=<[email protected]>, relay=mx2.foo.it[10.0.4.5]:25, ...
mail_sent.split()   # Start using basic strings in ipython
# Then number the fields with zip()
fields, counting = _, list(zip(range(20), _))
fields = fields[:7]   # We just care about the first 7 values
# and pick fields one by one
hour, host, dest = fields[2], fields[3], fields[6]
Parse: Exercise I
In another window
• edit 03_parsing_test.py
• complete the parse_line(line) function
def parse_line(line):
    """Write your function and test it
    with test_sent()"""
    raise NotImplementedError
%paste your solution’s code in iPython and run the test functions manually
Python Regexp
# Python supports regular expressions via
import re
# We start showing a grep-reloaded function
def grep(expr, fpath):
    one = re.compile(expr)   # ...has two lookup methods...
    assert (one.match        # match, which only looks at ^ the beginning
            and one.search)  # and search, which looks anywhere
    with open(fpath) as fp:
        return [x for x in fp if one.search(x)]
Splitting with re.split
from re import split   # is a very nice function
import sys
# Let’s gather some ping stats
if sys.platform.startswith('win'):
    cmd = "ping -n 10 www.google.it"
else:
    cmd = "ping -c10 -w10 www.google.it"
# Split on both space and =
# (on python3 sh() returns bytes: decode each line first, eg. x.decode())
ping_output = [split("[ =]", x) for x in sh(cmd)]
Splitting with re.findall
from re import findall   # can be misused too ;)
# eg. for adding the ":" to a mac address
mac = "00""24""e8""b4""33""20"   # adjacent literals concatenate to "0024e8b43320"
# ...using this
re_hex = '[0-9A-Fa-f]{2}'
mac_address = ':'.join(findall(re_hex, mac))
print("The mac address is ", mac_address)
Unlike "..", this pattern only accepts hex digits (0-9, a-f, A-F); note, however, that findall silently skips invalid characters instead of rejecting the input.
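If the input must really be validated, a sketch that checks the whole string first and only then splits it; `format_mac` is a name made up for the example, and `\Z` anchors the match at the end of the string so it also works on Python 2:

```python
import re

HEX_PAIR = re.compile("[0-9A-Fa-f]{2}")
MAC_RE = re.compile(r"[0-9A-Fa-f]{12}\Z")   # exactly 12 hex digits

def format_mac(mac):
    """Return 'aa:bb:cc:dd:ee:ff', or raise ValueError for malformed input."""
    if not MAC_RE.match(mac):
        raise ValueError("not a 12-digit hex MAC: {!r}".format(mac))
    return ":".join(HEX_PAIR.findall(mac))

print(format_mac("0024e8b43320"))             # 00:24:e8:b4:33:20
print(":".join(HEX_PAIR.findall("00zz24")))   # 00:24, "zz" silently dropped
```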
Benchmarking in iPython I
• Parsing big files needs benchmarks. The iPython %timeit magic is a good starting point.
test_regexps = ("..", "[a-fA-F0-9]{2}")
for re_s in test_regexps:
    %timeit ':'.join(findall(re_s, mac))
• We can even compare compiled and inline regexps
import re
for re_s in test_regexps:
    re_c = re.compile(re_s)
    %timeit ':'.join(re_c.findall(mac))
Benchmarking in iPython II
Or find other methods:
• complex...
from re import sub as sed
%timeit sed(r'(..)', r'\1:', mac)   # note: leaves a trailing ':'
• ...or simple
%timeit ':'.join([mac[i:i+2] for i in range(0, 12, 2)])
• Outside iPython, check the timeit module
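A sketch of the same comparison with the standard `timeit` module, usable in plain scripts; the three wrapper function names are made up, and the timings depend on your machine, so no winner is claimed here:

```python
import re
import timeit

MAC = "0024e8b43320"

def with_findall():
    return ":".join(re.findall("[0-9A-Fa-f]{2}", MAC))

def with_slices():
    return ":".join(MAC[i:i + 2] for i in range(0, 12, 2))

def with_sub():
    return re.sub(r"(..)", r"\1:", MAC)[:-1]   # drop the trailing ':'

for func in (with_findall, with_slices, with_sub):
    # best of 3 runs of 10000 calls each, in seconds
    best = min(timeit.repeat(func, repeat=3, number=10000))
    print("{:<14} {:.4f}s".format(func.__name__, best))
```

Taking the minimum of several runs reduces the noise from other processes; from a shell, `python -m timeit` offers the same measurement.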
♦Parsing: a real world Example♦
# You don’t need to type this VSAN configuration script,
# which uses linux FC information from the /sys filesystem
import re
from glob import glob
fc_id_path = "/sys/class/fc_host/host*/port_name"
for x in glob(fc_id_path):
    # ...we boldly skip an explicit close()
    pwwn = open(x).read()   # 0x500143802427e66c
    pwwn = pwwn[2:]
    # ...and even use the slower but readable
    pwwn = re.findall(r'..', pwwn)
    print("member pwwn ", ':'.join(pwwn))
Parsing logs: a simple solution
def parse_line(line):
    import re
    # using _ we improve readability
    _, _, hour, host, _, _, dest = line.split()[:7]
    try:
        # and if dest isn’t what we expect...
        dest = re.split(r'[<>]', dest)[1]
    except IndexError:
        # ...we set it to None
        dest = None
    return (hour, host, dest)
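An alternative sketch that swaps the split-based approach for a single regular expression with named groups. The two sample lines are made up in the format shown earlier, and the layout "month day hh:mm:ss host process[pid]: queue_id: rest" is an assumption about the postfix log lines:

```python
import re

MAIL_SENT = ("May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: "
             "to=<user@example.com>, relay=mx2.foo.it[10.0.4.5]:25, status=sent")
MAIL_REMOVED = "May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed"

LINE_RE = re.compile(r"^\w{3}\s+\d+\s+(?P<hour>\d\d:\d\d:\d\d)\s+(?P<host>\S+)"
                     r"\s+\S+:\s+\S+:\s+(?P<rest>.*)$")
TO_RE = re.compile(r"to=<([^>]*)>")

def parse_line(line):
    """Return (hour, host, dest); dest is None when there is no to=<...>."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("unexpected line format: {!r}".format(line))
    to = TO_RE.search(m.group("rest"))
    return m.group("hour"), m.group("host"), to.group(1) if to else None

def test_sent():
    assert parse_line(MAIL_SENT) == ("08:00:00", "test-1", "user@example.com")

def test_delivered():
    hour, host, dest = parse_line(MAIL_REMOVED)
    assert hour == "16:00:04" and dest is None
```

Compared with the split version, unexpected lines fail loudly with a ValueError instead of an unpacking error.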
Parsing logs: II
# Now another test for the delivered messages
# %edit 03_parsing_test.py
def test_delivered():
    hour, host, destination = parse_line(mail_delivered)
    assert hour == '08:00:00'
    # Delivery logs should have destination == None
    assert destination is None
# Exercise: fix parse_line to work with both tests
# and save the test file
Running nosetest
• Now run the following command from a shell
# nosetests -vs 03_parsing_test.py
03_parsing_test.test_sent ... ok
03_parsing_test.test_delivered ... ok
Ran 2 tests in 0.001s
• Nose is a test framework.
• Nose collects every file whose name looks like a test (eg. test_*.py, *_test.py)
• Nose runs every function whose name starts with test (eg. test_sent)
Simple Test Script
• Open the 02_nosetests_simple.py file
def setup():
    print("is run before the testsuite, while")

def teardown():
    print("after all tests")

def test_one():
    # name a function like test_* to run it!
    assert 1 == 1

def test_two():
    # and use assert to test for success
    assert 1 == 0, "I was expecting 0"
♦Complete Test Script: I♦
• A more flexible script is 02_nosetests_full.py, which uses a Test class
import os

class Test(object):
    @classmethod
    def setup_class(cls):   # is run once at startup,
        # ..eg. to create database structure
        print("setup testsuite environment")
        with open("/tmp/test2.out", "w") as fp:
            fp.write("0")

    @classmethod
    def teardown_class(cls):   # is run once after all tests to...
        print("cleanup testsuite environment")
        os.unlink("/tmp/test2.out")
♦Complete Test Script: II♦
• allowing pre/post testsuite and pre/post test fixtures
class Test(object):
    ...   # Using a Test class...
    def setup(self):
        print("is run before every test")   #..and..

    def teardown(self):
        print("after every test")   # eg. truncate a table

    # each test can use the prepared environment
    def test_a(self):
        assert os.path.isfile("/tmp/test2.out")
Simple processing: Goal
• Handle gathered data with dict() and zip()
• Find data relations with scipy
• Get essential information like standard deviation σ and distributions δ
• Linear correlation: what it is, and when it can help
• Plotting
modules: numpy, scipy, scipy.stats.stats, collections, random, time
The Chicken Paradox
“According to the latest statistics, it appears that you eat one chicken per year: and, if that doesn’t fit your budget, you’ll fit into the statistics anyway, because someone will eat two.” C. A. Salustri
Simple processing: Exercise
How to dismantle the chicken paradox? Gather data!
• Write the following function using our parsing strategy
def ping_rtt(seconds=10):
    """@return: a list of ping RTT"""
    from course import sh
    # get sample output
    # find a solution in ipython
    # test and paste the code
    raise NotImplementedError
• Gather 10 seconds of ping output
• Hint: reuse the sh() function
• Hint: slice and filter lists using comprehension
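A possible sketch of the exercise for Linux (iputils) ping. It swaps the split-and-filter hint for a small regular expression, because the final statistics line also contains the word "time" ("time 1001ms"). The `host` parameter and the sample output are made up for illustration:

```python
import re
import subprocess

RTT_RE = re.compile(r"time=([\d.]+) ms")   # Linux/iputils reply format

def parse_rtt(lines):
    """Return the round-trip times (ms, as floats) found in ping output lines."""
    return [float(m.group(1)) for m in map(RTT_RE.search, lines) if m]

def ping_rtt(seconds=10, host="www.google.it"):
    """Ping `host` for `seconds` seconds and return the RTT list."""
    # check_output raises CalledProcessError if ping exits non-zero
    out = subprocess.check_output(
        ["ping", "-c", str(seconds), "-w", str(seconds), host],
        universal_newlines=True)
    return parse_rtt(out.splitlines())

sample = """PING example (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=56 time=12.3 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=56 time=11.9 ms

--- example ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 11.9/12.1/12.3/0.2 ms"""

print(parse_rtt(sample.splitlines()))   # [12.3, 11.9]
```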
Distributions: set, defaultdict
A distribution or δ shows the frequency of events, like how many people ate x chickens ;)
# Create a simple δ with set and dict
d = {x: rtt.count(x) for x in set(rtt)}
# We can even use
from collections import defaultdict
d = defaultdict(int)
for x in rtt:
    d[x] += 1
Distributions and Mean are both important!
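The standard library also has `collections.Counter`, which builds the same δ in one pass (the set/count version scans the list once per distinct value). A sketch with made-up RTT values; real float RTTs rarely repeat exactly, so round them first:

```python
from collections import Counter

rtt = [12.3, 11.2, 12.4, 13.1, 11.9, 10.8]   # hypothetical RTTs in ms
rounded = [int(round(x)) for x in rtt]       # 12, 11, 12, 13, 12, 11
d = Counter(rounded)
print(sorted(d.items()))   # [(11, 2), (12, 3), (13, 1)]
print(d.most_common(1))    # [(12, 3)]
```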
Standard Deviation: scipy
• The standard deviation σ is defined by
$$\sigma^2(X) := \frac{\sum (x - \bar{x})^2}{n}$$
• σ tells whether δ is fair or not, and how representative mean(x) is
• matplotlib.mlab.normpdf is a smooth function approximating the histogram (it has been removed from recent matplotlib releases; scipy.stats.norm.pdf is the usual replacement)
from numpy import std, mean   # older scipy releases re-exported these
fair = [1, 1]     # chickens
unfair = [0, 2]   # chickens
assert mean(fair) == mean(unfair)
# Use standard deviation!
std(fair)     # 0.0
std(unfair)   # 1.0
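The formula translates directly into plain Python. A sketch (the name `sigma` is ours) that divides by n, like `numpy.std` with its default `ddof=0` and `statistics.pstdev`; `statistics.stdev` divides by n - 1 instead:

```python
from math import sqrt

def sigma(xs):
    """Population standard deviation: sqrt(sum((x - mean)**2) / n)."""
    n = len(xs)
    m = sum(xs) / float(n)
    return sqrt(sum((x - m) ** 2 for x in xs) / n)

print(sigma([1, 1]), sigma([0, 2]))   # 0.0 1.0
```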
Simple processing: scipy
Check your computed values against the σ returned by ping (didn't you notice that ping returned it?)
"""
goal: remember to convert to numeric / float
goal: use scipy
goal: check stdev
"""
from numpy import std, mean   # max, min are builtins; older scipy re-exported std, mean
rtt = ping_rtt()
print(max(rtt), min(rtt), mean(rtt), std(rtt))
Time Distributions: Exercise
• Parse the provided maillog in ipython using its ! magic and get an hourly email δ
• Expected output:
time_d = {    # mail delivered (removed) between
    0: xxx,   # 00:00 - 00:59
    1: xxx,   # 01:00 - 01:59
    ...
}
Time Distributions: Exercise Solution
# delivered emails look like the following
# May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed
ret = !grep removed maillog   # get the interesting lines
ts = ret.fields(2)            # find the timestamp (3rd column)
hours = [int(x.split(':')[0]) for x in ts]
time_d = {x: hours.count(x) for x in set(hours)}
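Outside iPython the same δ can be computed from any iterable of lines, for example an open file. A sketch with made-up log lines; `hourly_distribution` and its `marker` parameter are our own names:

```python
from collections import Counter

def hourly_distribution(lines, marker="removed"):
    """Count lines containing `marker` per hour of their syslog timestamp."""
    hours = [int(line.split()[2].split(":")[0])   # 3rd column is hh:mm:ss
             for line in lines if marker in line]
    return dict(Counter(hours))

maillog = [
    "May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed",
    "May 14 16:30:10 rpolli postfix/qmgr[122]: 5EF1AB: removed",
    "May 14 17:02:00 rpolli postfix/qmgr[122]: 6A0C2D: removed",
    "May 14 17:02:01 rpolli postfix/smtp[130]: 6A0C2D: to=<user@example.com>, status=sent",
]
print(hourly_distribution(maillog))   # {16: 2, 17: 1}
```

Usage with a real file: `with open("maillog") as fp: time_d = hourly_distribution(fp)`.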
Plotting distributions
# To plot data...
from matplotlib import pyplot as plt
# ...and set the interactive mode
plt.ion()
# Plotting a histogram with one bin per hour...
frequency, bins, _ = plt.hist(hours, bins=range(25))
# ...returns a distribution
distribution = dict(zip(bins, frequency))
This server works mostly at night...
Size Distributions: Exercise
• Create a size δ using hist(..., bins=...)
• Hint: help(hist)
size_d = {    # mail size between
    0: xxx,   # 0 - 10k
    1: xxx,   # 10k - 20k
    ...
}
• Homework: use the size δ to find the size mean and size σ, and compare them with the mean and σ evaluated from the original data series
♦Simulating data with σ and x̄♦
Mean and standard deviation are a useful starting point to simulate data using the Gaussian distribution.
# A mail load generator creating attachments of a given size...
from random import gauss
mail_size = gauss(mean_s, sigma_s)   # a random number; mean_s, sigma_s: size mean and σ
# and use time_d to simulate the load during the day
from time import localtime
hour = localtime().tm_hour
mail_per_minute = time_d.get(hour, 0) / 60.0   # 60 minutes in an hour
Linear Correlation
# Let’s plot the following datasets
# taken from a 4-hour distribution
mail_sent = [1, 5, 500, 250, 100, 7]
kB_s = [70, 300, 29000, 12500, 450, 500]
# A scatter plot can suggest relations
# between data
plt.scatter(mail_sent, kB_s)
[Figure: "Correlating mail and thruput", scatter plot of Mail sent (x axis) against Thruput kB/s (y axis)]
Linear Correlation
The Pearson coefficient ρ is a relation indicator:
• 0: no linear relation
• 1: direct relation (both datasets increase together)
• -1: inverse relation (one increases as the other decreases)
$$\rho(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \tag{1}$$
from scipy.stats import pearsonr
ret = pearsonr(mail_sent, kB_s)
print(ret)
> (0.9823, 0.0004)
correlation, probability = ret   # probability is the p-value
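Equation (1) written out in plain Python, as a sketch for checking what `pearsonr` returns as its first value (the name `pearson` is ours; it does not compute the p-value):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient, straight from equation (1)."""
    n = len(xs)
    mx = sum(xs) / float(n)
    my = sum(ys) / float(n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3], [2, 4, 6]))                  # 1.0 (direct)
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))    # 0.0, although y = x**2
```

The second call shows why the next slide insists on scatter plots: y depends exactly on x, yet the linear coefficient is zero.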
You must (scatter) plot!
ρ only measures linear correlation: it does not detect non-linear relations.
Combinations
# Given a table with many data series
from course import table
# table looks like:
# {'cpu_usr': [10, 23, 55, ...],
#  'byte_in': [2132, 3212, 3942, ...], ...}
# We can combine all their names with
from itertools import combinations
list(combinations(table, 2))
> [('swap_in', 'cpu_sys'), ('swap_in', 'csw'), ('cpu_sys', 'csw'), ...]
Combining 4 suits, 2 at a time:
♥♠ ♥♣ ♥♦ ♠♣ ♠♦ ♣♦
Netfishing correlation
We can try every combination of data series and check whether there is some ρ.
for k1, k2 in combinations(table, 2):
    corr, probability = pearsonr(table[k1], table[k2])
    if abs(corr) < 0.5:   # abs(): inverse relations count too
        # I’m *still* not interested in data under this threshold
        continue
    print("linear correlation between {} and {} is {}".format(k1, k2, corr))
Correlating I/O and Context Switch
Now we’ll generate some correlation plots from table data, like this one.
Netfishing correlation II
# create all combined plots
for k1, k2 in combinations(table, 2):
    corr, probability = pearsonr(table[k1], table[k2])
    plt.scatter(table[k1], table[k2])
    # 3 digit precision on title
    plt.title("R={:0.3f}".format(corr))
    plt.xlabel(k1); plt.ylabel(k2)
    # save and close the plot
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()
Mark time with colors
# Use 3 colors to mark time-slots
from itertools import cycle
colors = cycle('rgb')   # Red Green Blue
my_list = range(10)
# then import a function to chunk datasets
from course import in_chunks
in_chunks(my_list, size=4)   # returns a <generator object ...>
list(_)                      # ... which iterates to...
> [[0, 1, 2, 3],   # Plotted in Red
   [4, 5, 6, 7],   # ..Green
   [8, 9]]         # ..Blue
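The course's `in_chunks` is not shown. A hypothetical stand-in with the same call shape, built on `itertools.islice` so it works with any iterable, including generators:

```python
from itertools import islice

def in_chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))   # take the next `size` items
        if not chunk:                    # iterator exhausted
            return
        yield chunk

print(list(in_chunks(range(10), size=4)))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```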
Mark time with colors
# Get combined data directly via items
for (k1, v1), (k2, v2) in combinations(table.items(), 2):
    corr, probability = pearsonr(v1, v2)
    colors = cycle('rgb')   # restart from red on every plot
    # Two nice generators: 8*3600 samples is 8 hours, assuming one sample per second
    time_chunked = zip(in_chunks(v1, size=8*3600),
                       in_chunks(v2, size=8*3600))
    for t1, t2 in time_chunked:
        plt.scatter(t1, t2, color=next(colors))   # iterate colors!
    # save and close the plot
    plt.savefig("timed_{}_{}.png".format(k1, k2)); plt.close()
That’s all folks!
Thank you for your attention!