BP 301: What’s your second most valuable asset and nearly doubles every year?
Henning Kunz, panagenda Consulting; Florian Vogler, panagenda

Upload: panagenda

Post on 18-Jul-2015


TRANSCRIPT

Page 1: BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year?


Page 2

Introduction

Henning Kunz
– Services and consulting in the collaboration space for about 20 years
– More infrastructure than development
– At panagenda, increasingly focused on analytics as a basis for agile transformation projects

Florian Vogler
– Client management guru for almost all his life
– Development and infrastructure
– panagenda’s visionary figurehead

Page 3

Agenda

Speaking of the 2nd most valuable asset, and introduction
Why are we doing this?
Where in the world are files?
Collecting BIG data – Basics
Statistics – Basics
Collecting from the file system
Collecting from IBM Notes & Domino
Sample reports
Possibilities are endless (this session is not)

Page 4

Before we start with the introduction

Answer to 2nd most valuable asset

1st most valuable asset?

Page 5

What can you expect from this session?

Thoughts on a company’s file inventory

Some code snippets to gain inventory information

The demo is based on inventory information collected from our personal production notebooks (and a demo back-end system) using the code snippets

– Visualization is prepared using a Visual Analytics Tool

Some ideas on how to use the outcome

Page 6

FILES ARE EVERYWHERE

Page 7

A file – from easy …

In the simplest sense, a file has
– a potentially mind-boggling number of attributes, e.g.
• folder structure
• filename
• size
– content (which may result in attributes, too)

Page 8

A file – … to complex

Content is king!
– Zip files: header vs. files vs. file
• Zipping the same files twice produces two zip files with different hashes (timestamps and other header data end up in the archive) …
– Office files (pptx, xlsx, …)
• Contain a lot of information “inside”
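The zip effect can be reproduced with a short sketch. It uses Python's standard `zipfile` and `hashlib` modules (not anything shown in the session) and pins two different timestamps to simulate zipping the same file at two different moments. Hashing the archive gives two different values, while hashing the members' content gives identical ones, which is why the "files inside" matter more than the container.

```python
import hashlib
import io
import zipfile


def zip_single_file(name: str, data: bytes, date_time: tuple) -> bytes:
    """Return the bytes of an in-memory zip holding one file with a fixed timestamp."""
    buffer = io.BytesIO()
    info = zipfile.ZipInfo(name, date_time=date_time)
    info.compress_type = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(info, data)
    return buffer.getvalue()


def member_md5s(archive: bytes) -> dict:
    """Hash each member's content instead of the archive container."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {n: hashlib.md5(zf.read(n)).hexdigest() for n in zf.namelist()}


payload = b"Quarterly figures\n" * 100

# Same file, same content, zipped "twice": only the stored timestamp differs.
zip_a = zip_single_file("report.txt", payload, (2015, 1, 27, 10, 0, 0))
zip_b = zip_single_file("report.txt", payload, (2015, 1, 27, 10, 5, 0))

archive_md5_a = hashlib.md5(zip_a).hexdigest()
archive_md5_b = hashlib.md5(zip_b).hexdigest()

print(archive_md5_a == archive_md5_b)            # False: container bytes differ
print(member_md5s(zip_a) == member_md5s(zip_b))  # True: the content is identical
```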

Page 9

Why are we doing this (i.e., why are files so important / interesting)?

Storage amount = storage (and backup!) cost
– Increase free disk space, reduce cost
– Beware of DAOS, Centera, … before you get too excited (such systems already store identical files only once, so removing copies frees less space than expected)

Understand which (types of) files are created (rather: originated), updated, … and by whom
– Identify knowledge / working-together clusters
– Social Business

Going further (not covered in this session)
– Security & Compliance
– Content
– Beyond Windows (Linux, Mac, Mobile, …)

Page 10

Mostly for French and German attendees

Some of the use cases and examples covered could be a problem with regard to works council regulations

Rethink the use case without end-user information
– E.g. instead of “who has (created) PowerPoint files?” ask “how many PowerPoint files do we have across how many users?” (min/avg/max, without information about actual end users)
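The rephrased question can be answered with aggregates only. This is a minimal sketch, assuming the scan results have already been reduced to hypothetical `(owner, extension)` pairs: owner names are used for grouping inside the function, but only counts (users, files, min/avg/max per user) are returned.

```python
from collections import Counter


def per_user_stats(records, extension=".pptx"):
    """Aggregate per-user file counts without returning who the users are.

    records: iterable of (owner, extension) pairs, e.g. from a scan result.
    Only counts leave this function; owner names are used for grouping only.
    """
    per_user = Counter(owner for owner, ext in records if ext.lower() == extension)
    counts = list(per_user.values())
    if not counts:
        return {"files": 0, "users": 0, "min": 0, "avg": 0.0, "max": 0}
    return {
        "files": sum(counts),
        "users": len(counts),
        "min": min(counts),
        "avg": round(sum(counts) / len(counts), 2),
        "max": max(counts),
    }


records = [
    ("user-a", ".pptx"), ("user-a", ".pptx"), ("user-a", ".docx"),
    ("user-b", ".PPTX"),
    ("user-c", ".pptx"), ("user-c", ".pptx"), ("user-c", ".pptx"),
]
print(per_user_stats(records))
# {'files': 6, 'users': 3, 'min': 1, 'avg': 2.0, 'max': 3}
```

With very small groups, min and max can still point to individuals, so a minimum group size may be worth agreeing on as well.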

Page 11

For everyone: Things to be aware of

The name of a file (or folder) can be a big problem on its own
– 2015-01-27_money_transfers_to_carribean_account_789XA3_PW_richmaker.xls
– Layoff_in_german_office_Q2_2015.docx
– Increase_salary_of_mr_jones_to_200000.txt

The mere existence of a file (or folder) can create (at least an ethical) problem on its own
– On someone’s laptop you find confidential, unauthorized, or inappropriate information
• e.g. internal DWG (CAD) files, a copy of the minutes of the last board of management meeting, customer data, performance figures, …
– And now?

Page 12

Where files are stored

“Local” file system
– “Fixed” disks (C:, D:, …)
– Local removable disks: A:, B:, USB sticks, CD-ROM, …

Network file system
– Mounted / mapped / UNC / synced (offline files)
– File server

NSFs (email / applications)
– Local (with or without consistent ACL, with or without DB-level encryption)
– Server
– Beware of reader fields, author fields, …

Connections Files, FileNet, Documentum, SharePoint, Dropbox, Teamdrive, …

Page 13

How to collect: WYSIWYG (what you see is what you get) or AYCE (all you can eat)

“WYSIWYG”
– Local execution = in the context of the current OS user
• Other users have to log in, too (may never happen)
– Network scanning in the context of the current OS user
• Shared network drives across departments / the company

“AYCE”
– Local execution as admin (e.g. with SuRunAs)
• Includes the Windows profiles of all users
– Batch network scanning
– Root mount scanning

Page 14

What to collect

Simple file attributes
– Name, “extension”, size, created, last modified, … (dates and time zones!)

Complex (but much more useful) file attributes
– Office properties like author, subject, last printed, last whatever, …
– Zip / Rar / 7z / gzip / …
– Hash, e.g. MD5 (identical vs. similar)

Very complex file attributes
– Security (R/W/…): NSF & file system
– Fingerprints (“Linux magic numbers”)

Hilariously complex: content (also: similar instead of just identical)
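File fingerprints can be approximated by reading the first bytes of a file and comparing them with well-known signatures. The sketch below checks only a handful of common signatures (a small subset, not the full `file`/libmagic database) and flags files whose extension promises something else. The `EXPECTED` mapping is a hypothetical choice for illustration.

```python
import tempfile
from pathlib import Path

# A few widely documented leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),  # also docx/xlsx/pptx, which are zip containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy doc/xls/ppt, msg
]

# Hypothetical mapping from extension to the signature we expect to see.
EXPECTED = {
    ".pdf": "pdf", ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg", ".gif": "gif",
    ".zip": "zip", ".docx": "zip", ".xlsx": "zip", ".pptx": "zip",
    ".doc": "ole2", ".xls": "ole2", ".ppt": "ole2",
}


def sniff(path: Path) -> str:
    """Classify a file by its first bytes; 'unknown' if no signature matches."""
    with path.open("rb") as f:
        head = f.read(16)
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return "unknown"


def extension_mismatch(path: Path) -> bool:
    """True if the extension promises a type the content does not have."""
    expected = EXPECTED.get(path.suffix.lower())
    return expected is not None and sniff(path) != expected


with tempfile.TemporaryDirectory() as tmp:
    disguised = Path(tmp) / "report.pdf"  # actually a PNG
    disguised.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
    genuine = Path(tmp) / "minutes.pdf"
    genuine.write_bytes(b"%PDF-1.4\n")
    results = {p.name: (sniff(p), extension_mismatch(p)) for p in (disguised, genuine)}

print(results)  # {'report.pdf': ('png', True), 'minutes.pdf': ('pdf', False)}
```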

Page 15

Mission impossible

“Impossible” file attributes
– Not accessible
– Not visible from the viewpoint of the scanner
– Not used (e.g. multi-user PCs where a user doesn’t log on again)
– Encrypted (e.g. zip with password)

Page 16

Examples of what not to do

Do not harm human beings, animals, plants or goods with your findings – Be good, do good, be a hero!

Do not search for files with the same filename – approx. 60-70% of all files on a single machine

Do not just delete duplicates

Also: do not do nothing
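Instead of comparing filenames, duplicates can be grouped by content. A common pattern, sketched here and not taken from the session's scripts, is to group by size first and hash only the size collisions. The result is a report for a human to review, in line with "do not just delete duplicates". For decisions that matter, a byte-by-byte comparison or a stronger hash than MD5 is the safer choice.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large files never have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups(root: Path) -> list:
    """Return sorted lists of paths with identical content. Nothing is deleted."""
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a content duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[md5_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    samples = [("a/report.txt", "v1"), ("b/copy_of_report.txt", "v1"), ("c/report.txt", "v2")]
    for rel, text in samples:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
    found = [[p.relative_to(root).as_posix() for p in g] for g in duplicate_groups(root)]

print(found)  # [['a/report.txt', 'b/copy_of_report.txt']]
```

Note that `c/report.txt` shares a name with `a/report.txt` but is not reported, while the differently named copy is.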

Page 17

A VERY SHORT STATISTICS PITCH

Page 18

Frequency distribution

In statistics, a frequency distribution is a table that displays the frequency of various outcomes in a sample.

e.g. session survey feedback from 100 session participants

Answer                              Count
Speaker skill was brilliant            15
Speaker skill was good                 60
Speaker skill was ok                   12
Speaker skill was somewhat poor         8
Speaker skill was very poor             5
Total                                 100

Page 19

Grouped data

A raw dataset can be organized by constructing a table showing the frequency distribution of the variable (whose values are given in the raw dataset). Such a frequency table is often referred to as grouped data.

e.g. the time taken by 15 participants to answer a survey, sorted into equal-width intervals (bins) or qualitative categories

Time taken [s] 10 11 9 10 14 20 11 9 14 10 9 13 12 21 24

Interval                    Count
t < 5 s                         0
5 s ≤ t < 10 s                  3
10 s ≤ t < 15 s                 9
15 s ≤ t < 20 s                 0
20 s ≤ t < 25 s                 3

Category                    Count
Fast (t < 10 s)                 3
Normal (10 s ≤ t < 20 s)        9
Slow (t ≥ 20 s)                 3
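Both tables can be rebuilt from the 15 raw times. This sketch uses the standard `bisect` and `collections` modules; the bin edges and labels follow the tables above, and each bin includes its lower edge and excludes its upper edge.

```python
from bisect import bisect_right
from collections import Counter

times = [10, 11, 9, 10, 14, 20, 11, 9, 14, 10, 9, 13, 12, 21, 24]

# Equal-width 5 s bins; each bin includes its lower edge and excludes its upper edge.
# Values outside [0, 25) are not handled by this sketch.
edges = [0, 5, 10, 15, 20, 25]
labels = ["t < 5 s", "5 s <= t < 10 s", "10 s <= t < 15 s", "15 s <= t < 20 s", "20 s <= t < 25 s"]


def bin_label(t):
    """Find i with edges[i] <= t < edges[i + 1] and return that bin's label."""
    return labels[bisect_right(edges, t) - 1]


def speed_category(t):
    """Qualitative grouping used in the second table."""
    if t < 10:
        return "Fast"
    return "Normal" if t < 20 else "Slow"


by_interval = Counter(bin_label(t) for t in times)
interval_table = [(label, by_interval.get(label, 0)) for label in labels]
by_category = Counter(speed_category(t) for t in times)

for label, count in interval_table:
    print(f"{label:<18}{count}")
print(dict(by_category))
```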

Page 20

Histogram

A histogram is a graphical representation of the distribution of data. To construct a histogram, the first step is to "bin" the range of values and then count how many values fall into each interval. E.g. the time in [s] needed to rush from Dolphin Southern Hemisphere 1 to Swan Mockingbird 1-2 (sample of 50 participants)

Rush time [s]   Count
140                 1
150                 2
160                 5
170                10
180                13
190                11
200                 6
210                 1
220                 0
230                 1

[Figure: histogram of Count (0 to 14) over Rush time [s] (140 to 230)]

Raw data, rush time [s]: 197 187 186 179 156 179 181 173 188 188 163 202 174 178 193 169 192 170 185 172 192 169 179 174 164 181 161 137 204 167 198 185 186 148 148 185 197 231 175 184 176 175 176 187 210 180 174 180 204 158

Process: Collect / Measure → Bin and Count
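The "bin and count" step can be sketched in a few lines. The slide does not state its bin convention, so this assumes bins of 10 s centred on the labels, lower edge inclusive. With that assumption, the 50 raw values give 9 and 12 for the 170 s and 190 s bins instead of the table's 10 and 11; all other bins match, so the table may have used a different convention or a slightly different sample.

```python
rush_times = [
    197, 187, 186, 179, 156, 179, 181, 173, 188, 188, 163, 202, 174, 178, 193, 169, 192,
    170, 185, 172, 192, 169, 179, 174, 164, 181, 161, 137, 204, 167, 198, 185, 186, 148,
    148, 185, 197, 231, 175, 184, 176, 175, 176, 187, 210, 180, 174, 180, 204, 158,
]
centers = list(range(140, 240, 10))  # bin labels 140, 150, ..., 230


def histogram(values, centers, width=10):
    """Count values per bin [center - width/2, center + width/2); out-of-range values are ignored."""
    counts = {c: 0 for c in centers}
    for v in values:
        for c in centers:
            if c - width / 2 <= v < c + width / 2:
                counts[c] += 1
                break
    return counts


counts = histogram(rush_times, centers)
for center, n in counts.items():
    print(f"{center:>4} | {'#' * n} {n}")  # simple text histogram
```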

Page 21

SCAN FILESYSTEMS

Page 22

Local

Scan local Windows-based drives (locally mounted hard disks, portable drives or mounted)

Using PowerShell
– Script 1. Collect file system information with MD5 and SHA1 hashes
– Needs PowerShell V4
– Uses: Scripting.FileSystemObject, Get-Acl cmdlet, Get-FileHash cmdlet
– Run locally with ‘super user’ rights

3 result files
– Folders (Folder Path, LastWriteTime, Size, FileCount, Depth, FolderName)
– ACLs (Folder Path, IdentityReference, AccessControlType)
– Files (Folder Path, FileName, CreationTime, LastWriteTime, Size, Extension, MD5, SHA1)
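The session's Script 1 is PowerShell and its code is not part of this transcript. As a rough, cross-platform illustration of the "Files" result only, the Python sketch below walks a folder tree and writes one CSV row per file with the same column names. It does not produce the Folders or ACLs outputs, normalizes timestamps to UTC (see "dates and time zones" above), and skips files it cannot read.

```python
import csv
import hashlib
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Column names follow the "Files" result file described above.
FILE_COLUMNS = ["Folder Path", "FileName", "CreationTime", "LastWriteTime",
                "Size", "Extension", "MD5", "SHA1"]


def iso_utc(timestamp: float) -> str:
    """Normalize an epoch timestamp to ISO 8601 in UTC to sidestep time-zone mix-ups."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def hash_file(path: Path, chunk_size: int = 1 << 20) -> tuple:
    """Compute MD5 and SHA1 in a single streaming pass."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def scan(root: Path, files_csv: Path) -> int:
    """Walk root and write one CSV row per readable file; return the row count."""
    rows = 0
    with files_csv.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(FILE_COLUMNS)
        for folder, _subdirs, names in os.walk(root):
            for name in names:
                path = Path(folder) / name
                try:
                    st = path.stat()
                    md5, sha1 = hash_file(path)
                except OSError:
                    continue  # locked or inaccessible: one of the "impossible" cases
                # st_ctime is the creation time on Windows, but the metadata-change time on Unix.
                writer.writerow([folder, name, iso_utc(st.st_ctime), iso_utc(st.st_mtime),
                                 st.st_size, path.suffix.lower(), md5, sha1])
                rows += 1
    return rows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    (root / "sub").mkdir(parents=True)
    (root / "sub" / "hello.txt").write_bytes(b"hello")
    out_csv = Path(tmp) / "files.csv"  # kept outside root so the scan does not see it
    count = scan(root, out_csv)
    with out_csv.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

print(count, rows[0]["MD5"], rows[0]["SHA1"])
```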

Page 23

A short note on PowerShell Execution Policy

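There is a form of execution security in PowerShell: the execution policy
– By default it is Undefined in every scope, which on Windows client systems results in the effective policy Restricted
– Restricted permits individual commands from the console, but will not run scripts

Policy types
– Restricted, AllSigned, RemoteSigned, Unrestricted, Bypass, Undefined

Scopes
– LocalMachine (local workstation), CurrentUser, Process (plus MachinePolicy and UserPolicy when set via Group Policy)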

Page 24

A short note on PowerShell Execution Policy

To see the current settings: Get-ExecutionPolicy -List

To set: Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
RemoteSigned allows execution of “own” unsigned scripts

– “Own” means scripts written/edited/saved on the local machine, e.g. in PowerShell ISE (scripts downloaded from the Internet still have to be signed)

– We will not talk about signing PowerShell scripts in this session; it’s not like “sign using the current user’s ID”

http://technet.microsoft.com/en-us/library/hh847748.aspx

Page 25

PowerShell Snippet

Page 26

Enhancement: Collecting Office attributes for .doc* files

Scan local Windows-based drives (locally mounted hard disks, portable drives or mounted)

Using PowerShell
– Script 2. Collect file system information with MD5 and SHA1 hashes and .doc* attributes
– Uses: -ComObject Word.Application (BuiltInDocumentProperties)

3 result files
– Folders (Folder Path, LastWriteTime, Size, FileCount, Depth, FolderName)
– ACLs (Folder Path, IdentityReference, AccessControlType)
– Files (Folder Path, FileName, CreationTime, LastWriteTime, Size, Extension, MD5, SHA1, Created, Author, Title, Last print date)
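For .docx (and other OOXML) files, some of the same metadata can also be read without starting Word, because these files are zip containers with a core properties part. This is a different technique from the session's COM automation: it does not work for legacy binary .doc files, it assumes the part sits at the conventional path `docProps/core.xml`, and it covers only core properties (counts such as pages and words live in `docProps/app.xml`). The sample document and author name are made up.

```python
import tempfile
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# XML namespaces used by docProps/core.xml in OOXML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Selected fields, roughly matching the columns Script 2 adds.
FIELDS = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Created": "dcterms:created",
    "LastPrinted": "cp:lastPrinted",
}


def core_properties(path: Path) -> dict:
    """Read selected core properties from an OOXML file (.docx, .xlsx, .pptx).

    Assumes the core part sits at the conventional path docProps/core.xml.
    Missing properties come back as None.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        result[label] = element.text if element is not None else None
    return result


# Build a tiny stand-in "docx" that only contains a core properties part.
sample_core = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" '
    f'xmlns:dcterms="{NS["dcterms"]}">'
    "<dc:title>Inventory notes</dc:title>"
    "<dc:creator>Jane Doe</dc:creator>"
    "<dcterms:created>2015-01-27T09:00:00Z</dcterms:created>"
    "</cp:coreProperties>"
)

with tempfile.TemporaryDirectory() as tmp:
    docx = Path(tmp) / "sample.docx"
    with zipfile.ZipFile(docx, "w") as zf:
        zf.writestr("docProps/core.xml", sample_core)
    props = core_properties(docx)

print(props)
```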

Page 27

Snippet 2: BuiltInDocumentProperties

1 Title
2 Subject
3 Author
4 Keywords
5 Comments
6 Template
7 Last author
8 Revision number
9 Application name
10 Last print date
11 Creation date
12 Last save time
13 Total editing time
14 Number of pages
15 Number of words
16 Number of characters
17 Security
18 Category
19 Format
20 Manager
21 Company
22 Number of bytes
23 Number of lines
24 Number of paragraphs
25 Number of slides
26 Number of notes
27 Number of hidden slides
28 Number of multimedia clips
29 Hyperlink base
30 Number of characters (with spaces)

Page 28

Collecting inventory from “Fileserver 2.0”

Scan SharePoint inventory using PowerShell
– Script 3. Collect item information from SharePoint Server
– Uses: SharePoint cmdlets
– Result: Web Application, Site, Web, List, Item ID, Item URL, Item Title, Item Created, Item Modified, File Size, Author, Versions, Filename

Page 29

Snippet 3

Page 30

SCAN FILES IN NSF CONTAINERS

Page 31

IBM Notes & Domino

NSFs (email / applications)
– Local (with or without consistent ACL, with or without DB-level encryption)
– Server
– ACL, reader fields, author fields, document / field encryption, …
– Zip file content
– Fields in general (Subject, From, To, cc, bcc, created, modified, Body, …)
• The subject of a Notes document can be just as problematic as the name of a file (attachment)
• Actually this may apply to pretty much any field
• Note: Message Tracking ID
– ATTNQ# (today’s *00#.*)

Page 32

Fs_free_main.exe ConnectED 2015 Edition

Special stand-alone version to scan the local file system and NSF files
– Inspects zip file content (deliberately limited to the file system)
– Runs from the command line with parameters
– Uses the local notes.ini and user.id / server.id
– Therefore runs in the security context of the ID file used (ACLs, reader fields, DB/document encryption)
– Lists (unprotected) zip file content
– Based on the C API

Result: Path, Size, Modified, MD5, SHA-1

Page 33

CHART TIME ….EXAMPLE RESULTS DEMO…

Script 1: 16,728 folders, 127,000 files
Script 2: 1,150 doc files
Script 3: 1,316 SharePoint files
Fs_free_main: 1,200,000 records (250 MB)

Page 34

POSSIBILITIES ARE ENDLESS….

Page 35

Beyond the shown

Until now we just analyzed what's out there

How could we use that information?

Let’s think about some interesting use cases

Page 36

File Server Migrations – File Consolidations

Use the analysis to understand your file inventory with respect to
– File types: which files fit into the target system (e.g. Office files, PDF, JPG, PNG, WAV versus XML, properties, files from non-Office applications)
– And their
• volume distribution
• count distribution
– Uniqueness of local files
– Time stamps (retention, usage hint)

And act/size based on that information
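Volume and count distributions per file type can be computed directly from the scan output. The sketch below assumes records of `(extension, size_in_bytes)`, for example taken from the Files CSV, and a hypothetical whitelist `TARGET_TYPES` standing in for "fits into the target system"; the real list depends on the migration target.

```python
from collections import defaultdict

# Hypothetical whitelist: extensions the target system is meant to receive.
TARGET_TYPES = {".docx", ".xlsx", ".pptx", ".pdf", ".jpg", ".png", ".wav"}


def distribution(files):
    """Summarize (extension, size) records by extension and by target fit.

    Returns two dicts whose values are [count, total_bytes].
    """
    by_ext = defaultdict(lambda: [0, 0])
    for ext, size in files:
        entry = by_ext[ext.lower()]
        entry[0] += 1
        entry[1] += size

    by_group = {"fits target": [0, 0], "stays behind": [0, 0]}
    for ext, (count, size) in by_ext.items():
        group = "fits target" if ext in TARGET_TYPES else "stays behind"
        by_group[group][0] += count
        by_group[group][1] += size
    return dict(by_ext), by_group


inventory_sample = [(".docx", 20_000), (".PDF", 500_000), (".xml", 1_000),
                    (".xml", 3_000), (".properties", 200)]
by_ext, by_group = distribution(inventory_sample)
print(by_group)  # {'fits target': [2, 520000], 'stays behind': [3, 4200]}
```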

Page 37

Suggest Community Clusters

Based on analysis outcomes
– Inventory overlap
– Same authors, editors
– Same access rights
– Metadata

Think of it as a one-time function to rearrange your file world as a first step

As a second step, it could be used in the context of attachments, similar to SwiftFile*
– May require content analysis

*http://www-01.ibm.com/support/docview.wss?uid=swg24034409
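"Inventory overlap" needs some similarity measure. One possible choice, assumed here and not stated in the session, is the Jaccard index of the MD5 sets of two locations or owners: shared hashes divided by all distinct hashes. Pairs above a hypothetical threshold become cluster suggestions; the inventory names and shortened hashes are made up.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of distinct hashes two inventories have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_clusters(inventories: dict, threshold: float = 0.3) -> list:
    """Return (name_a, name_b, score) for inventory pairs whose overlap reaches threshold."""
    pairs = []
    for (name_a, hashes_a), (name_b, hashes_b) in combinations(sorted(inventories.items()), 2):
        score = jaccard(hashes_a, hashes_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical inventories: location or owner -> set of MD5 hashes (shortened here).
inventories = {
    "team-share": {"h1", "h2", "h3", "h4"},
    "laptop-1": {"h1", "h2", "h3", "h9"},
    "laptop-2": {"h7", "h8"},
}
print(suggest_clusters(inventories))  # [('laptop-1', 'team-share', 0.6)]
```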

Page 38

Company File Locations

“You do not have to store this file again…”, or a hint to a so far unknown collaboration cluster / community

Used in the context of an attachment inside Notes
– Shows all MD5-identical files found at previously scanned locations inside the company

Biggest challenges
– Real-time performance (needs ongoing periodic scanning of all sources)
– Security trimming (the accounts & groups of all scanned sources have to be resolved/mapped)
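The lookup itself, including security trimming, can be sketched as a hash index. This assumes the hard parts named above are already solved: the index is kept current by periodic scans, and the accounts and groups of all sources have been mapped to plain group names. The class name, locations, and groups are hypothetical.

```python
from collections import defaultdict


class FileLocationIndex:
    """Hypothetical index: MD5 -> [(location, groups allowed to read it), ...]."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def add(self, md5: str, location: str, readers: set) -> None:
        """Register one scanned location of a file and who may read it there."""
        self._by_hash[md5.lower()].append((location, frozenset(readers)))

    def lookup(self, md5: str, user_groups: set) -> list:
        """Locations of identical files that the requesting user may see.

        Security trimming: a location is returned only if the user is in at
        least one of its reader groups.
        """
        return sorted(location for location, readers in self._by_hash.get(md5.lower(), [])
                      if readers & user_groups)


index = FileLocationIndex()
md5 = "5d41402abc4b2a76b9719d911017c592"
index.add(md5.upper(), r"\\fileserver\sales\offer.pptx", {"Sales"})
index.add(md5, "mail/jdoe.nsf (attachment)", {"jdoe"})
index.add(md5, "Connections Files: offer templates", {"Employees"})

print(index.lookup(md5, {"Employees", "Marketing"}))  # ['Connections Files: offer templates']
```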

Page 39

Page 40

THANK YOU NOTE: POSSIBILITIES ARE ENDLESS – MORE SO BEYOND FILES

Come and visit us in the TechnOasis #PED G3 A-C! Download the latest slide deck and code snippets at www.panagenda.com/connected2015files