analysis of dependencies among personal productivity tools


Libera Università di Bolzano - Freie Universität Bozen - Free University of Bolzano/Bozen

Faculty of Computer Science - Bachelor of Science in Applied Computer Science

Analysis of dependencies among personal productivity tools: a case study

Thesis Supervisor: PROF. GIANCARLO SUCCI
BRUNO ROSSI CS 2141

Academic Year 2003/2004 - Anno accademico 2003/2004 - Akademisches Jahr 2003/2004
2nd Graduation Session - October 29th, 2004


Table of Contents

Abstract
Chapter 1 - Introduction
1.1 The study
1.2 The Tools
Chapter 2 - Dependency Analyzer
2.1 Analysis of Requirements
2.2 Design
2.3 Coding
2.4 Problems
2.5 Solution – The Detours Library
2.6 Testing and installation
Chapter 3 - MacrosLister
3.1 Analysis of Requirements
3.2 Design
3.3 Coding
3.4 Testing and installation
3.5 Future improvements
Chapter 4 - Final results of the analysis of dependencies
4.1 Incoming dependencies
4.2 Outgoing dependencies
4.3 Macros collection
4.4 Conclusions
Acknowledgments
Bibliography


Abstract

Dependencies in a software environment are particular relations among software modules that are necessary to offer a set of features that would otherwise be impossible to offer. In many cases dependencies constrain users to a given solution. Existing dependencies, among other factors, have an impact on the convenience of a migration to a new software platform and are often not adequately considered. The reasons for their importance in the pre-migration phase are briefly outlined.

A project regarding the migration to Open Source Software (OSS) in the Office Automation field has provided the occasion to investigate dependencies in a real experiment, in order to foresee possible problems once the new solution is adopted. The different types of existing relations that have been identified as critical have been collected with the help of two applications programmed for this purpose. The phases of development of both programs, together with all the problems encountered, are explained in detail.

The tools have been applied to the project during the first weeks of experimentation. The results obtained are summarized and presented, together with suggestions for a smoother transition in the case study.


Chapter 1 - Introduction

Italian Public Administrations' expenditure for software in 2001 was over €675M[1]. 61% of this amount was used to develop internal solutions, while the remaining 39% (€263M) was devoted to software licenses. Open Source Software (OSS) represents a great opportunity due to its free licenses and the availability of source code. Some Public Administrations are already using Open Source solutions successfully, reducing costs and dependencies on software vendors in the long term.

The introduction of OSS can be seen as an interesting opportunity to reduce costs, also bringing benefits to the local economy. OSS does not automatically reduce costs in the short term; on the contrary, they are going to increase. Fig. 1 shows the projected costs of proprietary solutions versus OSS solutions in the short, medium and long term. As can be seen from the figure, the costs of a new solution increase up to the top of the curve, due to training costs and reduced productivity. The high switching expenditures then slowly decrease, reaching a point where OSS and proprietary software are in perfect equilibrium. OSS running costs then converge on a line at a lower level than proprietary software. This last claim is often a point of disagreement between supporters of OSS and supporters of proprietary software.

Fig. 1 - Introduction of OSS in an environment with proprietary software (Source: Giancarlo Succi's presentation for SVP-JG - COSPA[2])

From the graph above it can also be noted that OSS never reaches zero cost: even if the software is free there are always hidden costs, such as the maintenance of the installed software, which may be higher than that of proprietary software.

For a successful transition to OSS, several aspects have to be taken into consideration:

a) Cost of transition from previous solutions
b) Interoperability and integration with existing solutions
c) Cost of training personnel for the new tools and hostility to change
d) Reduced productivity of the personnel

Point a) covers all the costs of transition that do not fit in the other categories, such as the resources necessary to convert existing key documents. Point b) covers all the expenditures needed for interoperability reasons. The cost of training personnel for the new tools and hostility to change are key issues; the latter in particular is a factor that cannot be measured and can often determine the failure of a migration. Finally, the reduced productivity during the first phases of a transition should be considered before the transition begins.

[1] Ministero Innovazione e Tecnologie
[2] COSPA - Consortium for Open Source in the Public Administration, www.cospa-project.org

In particular, the Office Automation software sector represents a great opportunity to study such issues, for the following reasons:

• Only a small part of the functionality implemented in modern personal productivity tools is really needed by users.

• The sector is virtually monopolized by one player: MS Office. It is present on almost 95% of desktop systems.[3]

• The strong presence in this sector of the OpenOffice[4] suite, software that offers functionality similar to that of Microsoft Office.

At the same time, OSS solutions have yet to prove well suited to the desktop environment, where functionality and user-friendliness are the key features for the success of an application.

One of the most important issues to consider when analyzing interoperability and integration with existing solutions is the collection and analysis of the existing dependencies in the system. A dependency occurs when an application requires the presence of another to perform a requested operation. Dependencies make a transition from one software product to another difficult, if not impossible under certain conditions, and are often a way for proprietary software vendors to constrain users to a single solution for a long time.

Fig. 2 - Dependencies between applications: application A needs B to complete a requested operation. If B is removed from the system the operation cannot be completed. (Source: my own elaboration)

[3] IDA (Interchange of Data between Administrations) European Programme
[4] OpenOffice.org - www.openoffice.org


Referring back to Fig. 1, dependencies can shift the OSS cost curve upwards, increasing costs and making the migration less convenient.

For this thesis, two tools have been created to discover possible dependencies in an existing software environment that could make a transition to OpenOffice more difficult. Once such dependencies are discovered, the risks associated with the introduction of a new software solution can be drastically reduced. In Chapter 1 the project is illustrated, together with the methodologies adopted. Chapters 2 and 3 show the different phases of creation of the tools needed to collect dependencies. In Chapter 4 a summary of all the data collected in the case study is presented and analyzed in detail.

1.1 The study

The software has been applied to the preliminary analysis of a study set up by the Faculty of Computer Science of the Free University of Bolzano-Bozen in collaboration with the Province of Bolzano-Bozen[5]. The study's main goal is to test and analyze the problems of a migration to OpenOffice.

The core part of the project is a test phase that includes 22 workplaces, of which 13 migrated to OpenOffice, while the others were used as a control group. Four phases of the project have been identified:

1. Selection of the test workplaces
2. Analysis phase with only MS Office installed
3. Selection of part of the test workplaces to migrate to OpenOffice
4. Analysis phase with OpenOffice and Microsoft Office installed

During phases 2 and 4, the usage of MS Office and OpenOffice has been monitored with PROM[6], a tool for collecting software metrics. In particular, PROM records the time spent working on a given document. The tools to collect dependencies have been applied to phase 2 of the case study, to help analyze the existing situation among the personal productivity software installed, in particular Microsoft Word and Excel.

1.2 The Tools

The following dependencies were thought to be crucial for a preliminary analysis of the situation:

• Incoming dependencies (programs that call Word/Excel).
• Outgoing dependencies (programs that are called by Word/Excel).
• Number of macros present in the documents.

Two tools have been identified as needed to collect the data above, one for the dependencies and one for the enumeration of macros.

The phases of creation for both followed the common schema:

• Requirements elicitation
• Analysis of requirements
• Design
• Coding

The agile methodology was not strictly followed due to the nature of the tools being programmed; however, some techniques of the methodology were applied, in particular:

• test first
• use of spikes (prototypes) from the early stages of development.

[5] Provincia Autonoma di Bolzano - Autonome Provinz Bozen, www.provinz.bz.it
[6] PROM (PROMetrics) - CASE (Centre for Applied Software Engineering), Free University of Bolzano-Bozen, http://zuse.case.unibz.it:8080/prom


Chapter 2 - Dependency Analyzer

The first tool developed, Dependency Analyzer, identifies the dependencies of applications; in the case study it is specifically targeted at Microsoft Office. The dependencies to record are all the applications that call, and are called by, the examined program.

2.1 Analysis of Requirements

The first and most important requirement of the application is transparency to the user, in order to avoid interfering with his/her work. A consequence of this requirement is that the application has to be fast, or at least its performance impact has to be kept very small (less than 5% CPU load).

The second requirement regards the platform on which it runs: any version of the Windows OS.

Nearly 90% of the desktop PCs of the project had Windows XP installed, but there are still old machines with Windows 98/NT/2000 installed. The application should therefore be compatible with all those operating systems.

The application has to save a log file listing every call in and out of the applications of interest, specifying:

• name of the target program
• date and time of the call

A simple text file has been considered adequate for the first requirements, with XML output as desirable but not mandatory.
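As an illustration of the log requirement above, one text line per call could be produced as in the following sketch. The record layout, the field names, the tab-separated format and the inclusion of the calling program are assumptions made for this example; the requirements only prescribe the name of the target program and the date and time of the call.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical record of one observed call (names are illustrative only).
struct CallRecord {
    std::string caller;        // program that issued the call (assumed field)
    std::string target;        // name of the target program
    int year, month, day;      // date of the call
    int hour, minute, second;  // time of the call
};

// Formats one record as "YYYY-MM-DD hh:mm:ss<TAB>caller<TAB>target".
std::string formatLogLine(const CallRecord& r) {
    assert(r.month >= 1 && r.month <= 12);
    char stamp[64];
    std::snprintf(stamp, sizeof(stamp), "%04d-%02d-%02d %02d:%02d:%02d",
                  r.year, r.month, r.day, r.hour, r.minute, r.second);
    return std::string(stamp) + "\t" + r.caller + "\t" + r.target;
}
```

A line-oriented format of this kind keeps the file readable in a text editor and can later be converted to XML if that output becomes necessary.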

To adhere to the requirements, the application was written in C++, to permit the fastest execution at runtime and to require only the program itself to be installed, without the overhead of a virtual machine to run a small application. The drawbacks of this decision were a longer development time and the greater effort required by direct access to the Windows API, without relying on any framework. These problems were taken into consideration from the start of development. As the Windows API is written in C, small parts of the application also had to be written in C.

2.2 Design

Two different software architectures could be applied to the application:

• Polling
• Event notification

The best way to structure the program would be to register the application to receive a notification when a new process starts and when an old one ceases to exist. The alternative is to use polling: the list of processes is refreshed at a fixed interval and compared with the previous list to evaluate the differences. Regarding implementation details under Windows, polling requires very little effort, apart from setting a timer and an interval at which to update the list of processes.


The current version of Dependency Analyzer is based on polling, but future releases might adopt the event notification technique. This decision is justified by the fact that setting up the notification process for a small application like Dependency Analyzer introduces unneeded overhead into the program. The second reason for this choice is the assumption that, once a well-balanced interval has been found, polling is adequate for the purpose.
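The comparison step of the polling approach can be sketched as follows. Snapshots are reduced here to sets of process ids, and the names Snapshot, SnapshotDiff and diffSnapshots are hypothetical; the sketch only illustrates how two consecutive lists can be compared and is not the code of Dependency Analyzer.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Process ids seen in one polling pass (a stand-in for a real snapshot).
using Snapshot = std::set<unsigned long>;

struct SnapshotDiff {
    std::vector<unsigned long> started;  // present now, absent before
    std::vector<unsigned long> ended;    // present before, absent now
};

// Compares the previous and the current snapshot.
SnapshotDiff diffSnapshots(const Snapshot& previous, const Snapshot& current) {
    SnapshotDiff diff;
    for (unsigned long pid : current)
        if (previous.count(pid) == 0) diff.started.push_back(pid);
    for (unsigned long pid : previous)
        if (current.count(pid) == 0) diff.ended.push_back(pid);
    return diff;
}
```

A process that starts and ends between two polls never appears in any snapshot, which is why the choice of the interval matters.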

2.3 Coding

To get the available processes under Windows, it is possible to use the Microsoft Tool Help library. Under Windows each running process holds information about its parent process. In the case represented in Fig. 3 it is easy to find the application that started Microsoft Word, since the parent id stored when the process is started is that of Explorer.exe. By looking at all the process ids present in the system, a match for Explorer.exe is found.

Fig. 3 - Incoming dependencies – Microsoft Word and Excel are started by explorer.exe (Source: my own elaboration)

Fig. 4 shows an outgoing dependency, where an external application is launched from inside Microsoft Word. In this case too, it is a matter of finding the processes that have Microsoft Word's id as their parent id.


Fig. 4 - Outgoing dependency – An external application is called from Microsoft Word (Source: my own elaboration)
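The matching of process ids and parent ids described for Fig. 3 and Fig. 4 can be sketched as below. ProcessRecord, findParentName and findChildNames are hypothetical names for this illustration, and the process list is assumed to have been collected already.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the data gathered about one process.
struct ProcessRecord {
    unsigned long id;
    unsigned long parentId;
    std::string exeName;
};

// Incoming dependency: name of the process whose id equals the target's
// parent id. Returns an empty string if that process is not in the list.
std::string findParentName(const std::vector<ProcessRecord>& procs,
                           const ProcessRecord& target) {
    for (const ProcessRecord& p : procs)
        if (p.id == target.parentId) return p.exeName;
    return "";
}

// Outgoing dependencies: names of all processes whose parent id equals
// the target's id.
std::vector<std::string> findChildNames(const std::vector<ProcessRecord>& procs,
                                        const ProcessRecord& target) {
    std::vector<std::string> names;
    for (const ProcessRecord& p : procs)
        if (p.parentId == target.id) names.push_back(p.exeName);
    return names;
}
```

In this sketch the same list serves both directions: one lookup by parent id for the incoming dependency, one scan for children for the outgoing ones.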

Getting the list of all the processes running in a system by using the Windows API directly gives the programmer more control, but also more responsibility, than using managed code.

The Tool Help Library, part of the Windows Platform SDK, takes care of obtaining information about running applications.

The procedure is the following:

• Create a snapshot of the running processes
• Iterate through the list using Process32First() and Process32Next()
• For each process call a callback function to process the collected information.

The collected information about each process is accessed through a PROCESSENTRY32 structure containing:

| Field | Description |
|---|---|
| dwSize | size of the structure |
| cntUsage | no longer used |
| th32ProcessID | process id |
| th32DefaultHeapID | no longer used |
| th32ModuleID | no longer used |
| cntThreads | number of process threads |
| th32ParentProcessID | parent process identifier |
| pcPriClassBase | base priority of any threads created by this process |
| dwFlags | no longer used |
| szExeFile | null-terminated string that specifies the name of the executable file |

Dependency Analyzer retrieves and stores th32ProcessID and th32ParentProcessID for each process. The following code samples show this procedure in more detail: the GetProcesses() function calls a callback function for every process it enumerates, and the callback function EnumProcess() stores the information about a process in a Standard Template Library (STL) vector for later retrieval.


    #include <windows.h>
    #include <tlhelp32.h>
    #include <vector>

    // ProcessInfo and PROCENUMPROC are defined elsewhere in the application.

    // Stores id, parent id and executable name of one process in the vector.
    BOOL CALLBACK EnumProcess(DWORD dwId, DWORD dwParentId, LPCSTR lpcstr,
                              std::vector<ProcessInfo*> *ptrList)
    {
        ProcessInfo *piProcInfo = new ProcessInfo(dwId, dwParentId, lpcstr);
        ptrList->push_back(piProcInfo);
        return TRUE;
    }

Callback function to store the information about each process in an STL vector

    BOOL WINAPI GetProcesses(PROCENUMPROC lpProc,
                             std::vector<ProcessInfo*> *ptrlista)
    {
        HANDLE hSnapShot;
        PROCESSENTRY32 procEntry;
        BOOL bFlag;

        // The second argument (process id) is ignored for TH32CS_SNAPPROCESS.
        hSnapShot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
        if (hSnapShot == INVALID_HANDLE_VALUE)
            return FALSE;

        procEntry.dwSize = sizeof(PROCESSENTRY32);
        // get first process
        bFlag = Process32First(hSnapShot, &procEntry);
        while (bFlag)
        {
            if (lpProc(procEntry.th32ProcessID,
                       procEntry.th32ParentProcessID,
                       procEntry.szExeFile, ptrlista))
            {
                // next process
                procEntry.dwSize = sizeof(PROCESSENTRY32);
                bFlag = Process32Next(hSnapShot, &procEntry);
            }
            else
            {
                // callback function failed
                bFlag = FALSE;
            }
        }
        CloseHandle(hSnapShot);
        return TRUE;
    }

Function to get all the running processes in the system; a pointer to a callback function is passed, to be used for every process.


2.4 Problems

After being developed, the tool presented two kinds of problems:

1. When a new instance of Word/Excel is launched, a new window is created but the underlying process remains the same. Its parent process id is also the same, so even monitoring the windows opened by the application does not solve the problem.

2. Under Windows XP/2000/NT there is another issue: Word and Excel are so integrated in the OS that when an external application is called the parent id of the new process is not Word or Excel, but Svchost, a service running on all Windows XP machines.

The only solution that seemed possible was to intercept API calls directly; the Detours library from Microsoft was chosen for this purpose. Two versions of the application are necessary because this library is only compatible with NT-based systems: the first version (which can also run on Windows NT/2000/XP) is faster but not as accurate as the second one.

2.5 Solution – The Detours Library

The Detours library from Microsoft was created for profiling and debugging, in particular for research purposes.

The main features offered by the library are:

• the ability to intercept arbitrary Win32 binary functions
• the ability to edit the import tables of binary files
• the ability to attach arbitrary data segments to binary files.

A solution to the problem of Dependency Analyzer was found by using the Detours library, in particular by intercepting the CreateProcess API call, which is performed every time a process is created. This approach works because the CreateProcess and OpenProcess API calls are always issued, even when, as in the case of Microsoft Word, the application uses the same process for multiple windows.

To see how the Detours library works, Fig. 5 compares the flow of a normal function call with the flow when the Detours approach is used. As can be seen in Fig. 5, the normal flow is to call the target function from the source function. What the Detours library permits is to call a detour function and a trampoline function before the call to the real target. Control is then returned to the detour function for some post-processing, if needed, and finally back to the calling function. The call to the trampoline function may also be omitted, thus excluding the original function from the call and substituting the implementation of a function dynamically at runtime.

The process to rewrite the target function is the following:

• Create a dll that contains the new function's implementation
• Inject the dll into the target process
• The dll modifies the target function to point to the new one
• When the target function is called, the new one is executed
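The control flow described above (caller, detour, trampoline, target, and back) can be mimicked in portable C++ with function pointers. This is only an analogy: it does not patch machine code as Detours does, and all names in it (launchTarget, launchDetour, trampoline, launchEntry, installDetour, callLog) are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records what the detour observes.
std::vector<std::string> callLog;

// Target: the original function that callers want to reach.
int launchTarget(const std::string& exe) {
    return static_cast<int>(exe.size());
}

using LaunchFn = int (*)(const std::string&);

// Trampoline: always leads to the original target.
LaunchFn trampoline = launchTarget;

// Entry point used by callers; initially it goes straight to the target.
LaunchFn launchEntry = launchTarget;

// Detour: pre-processing, original call via the trampoline, post-processing.
int launchDetour(const std::string& exe) {
    callLog.push_back("enter " + exe);
    int rv = trampoline(exe);
    callLog.push_back("exit " + exe);
    return rv;
}

// Redirects the entry point to the detour (stands in for inserting the jmp).
void installDetour() {
    assert(trampoline != nullptr);
    launchEntry = launchDetour;
}
```

After installDetour() every call through launchEntry is logged, while callers still receive the result of the original function.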


Fig. 5 - Invocation with and without interception (Source: “Detours: Binary Interception of Win32 Functions” - Galen Hunt and Doug Brubacher)

As can be seen in Fig. 6, a jmp instruction is inserted at runtime at the start address of the target function, pointing to the detour function. The trampoline function acts as a bridge between the detour and the real target function.

Fig. 6 - Trampoline and target functions, before and after insertion of the detour (left and right). (Source: “Detours: Binary Interception of Win32 Functions” - Galen Hunt and Doug Brubacher)

An example of a detour function can be seen in the following code samples, where the CreateProcess function is intercepted.


    BOOL __stdcall Mine_CreateProcessA(LPCSTR a0,
                                       LPSTR a1,
                                       LPSECURITY_ATTRIBUTES a2,
                                       LPSECURITY_ATTRIBUTES a3,
                                       BOOL a4,
                                       DWORD a5,
                                       LPVOID a6,
                                       LPCSTR a7,
                                       struct _STARTUPINFOA* a8,
                                       LPPROCESS_INFORMATION a9)
    {
        _PrintEnter("CreateProcessA(%hs,%hs,%lx,%lx,%lx,%lx,%lx,%hs,%lx,%lx)\n",
                    a0, a1, a2, a3, a4, a5, a6, a7, a8, a9);

        BOOL rv = 0;
        __try {
            // call the original function through the trampoline
            rv = Real_CreateProcessA(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9);
        } __finally {
            _PrintExit("CreateProcessA(,,,,,,,,,) -> %lx\n", rv);
        };
        return rv;
    }

Example of a detour function for the CreateProcess API call; for debugging purposes a message is printed when entering and when exiting the function.

    BOOL WINAPI DetourFunctionWithTrampoline(PBYTE pbTrampoline, PBYTE pbDetour);

    DetourFunctionWithTrampoline((PBYTE)Real_CreateProcessA,
                                 (PBYTE)Mine_CreateProcessA);

Detour function with associated trampoline, declaration and invocation

The only drawback of this solution is that, to intercept all the calls from the applications present on a given system, a dll has to be injected into every process. A registry key present on every NT-based Windows system makes it possible to perform this operation; however, there are performance issues to be taken into consideration.

2.6 Testing and installation

To facilitate testing, three configurations were chosen: Win98, Win2000 and WinXP. Since the target machines for the study were all based on WinXP, we mostly tested our software on this platform.

Installation presented no difficulties, partly because the application does not depend on any runtime environment. The program has been installed on all 22 target machines without any kind of problem.


Chapter 3 - MacrosLister

The second tool developed to collect dependencies retrieves information about the macros present in Microsoft Office documents. Macros are lines of code inserted into documents to perform particular operations. In our case this dependency is a key issue, because the macros contained in Microsoft Office documents are not compatible with OpenOffice. This extra functionality is lost if the macros are not rewritten in the StarBasic language recognized by OpenOffice.

3.1 Analysis of Requirements

The main requirements for MacrosLister were the following:

• Process all the Word/Excel files in a specified disk unit selected by the user.
• Save a log file for each specified unit. The log file must contain:
    • Total files found
    • Number of files with macros
    • Total number of lines of code inside each document
    • Files that cannot be opened (e.g. password protected)
• Show the progress of the job being performed.
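The figures required in the log can be derived from per-file results as in the following sketch. It is written in C++ only for illustration and is independent of the actual implementation; FileResult, ScanSummary and summarize are hypothetical names, and an overall total of macro lines is assumed in addition to the per-document counts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical outcome of scanning one document.
struct FileResult {
    std::string path;
    bool opened;     // false if the file could not be opened
    int macroLines;  // lines of macro code found (0 means no macros)
};

// Hypothetical summary written to the log of one disk unit.
struct ScanSummary {
    int totalFiles = 0;
    int filesWithMacros = 0;
    int totalMacroLines = 0;
    std::vector<std::string> unreadable;  // files that could not be opened
};

// Aggregates the per-file results into the summary figures.
ScanSummary summarize(const std::vector<FileResult>& results) {
    ScanSummary s;
    for (const FileResult& r : results) {
        assert(r.macroLines >= 0);
        ++s.totalFiles;
        if (!r.opened) {
            s.unreadable.push_back(r.path);
            continue;
        }
        if (r.macroLines > 0) {
            ++s.filesWithMacros;
            s.totalMacroLines += r.macroLines;
        }
    }
    return s;
}
```

Files that cannot be opened are counted among the total but kept in a separate list, so that they can be reported for manual inspection.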

The easiest way to access macros in Word/Excel files is to use the Microsoft PIA (Primary Interop Assemblies), a library used to interface third-party applications with Microsoft Office. Using the PIA makes the .NET framework mandatory, so the possible implementation languages are VB.NET, C# and Managed C++. C# was chosen since it offers the best integration with the .NET platform.

3.2 Design

The classes extracted from the designed user stories were the following:

• Disk
• Word
• Excel
• File
• Log
• Registry

Class name: Disk
Responsibilities
• collect recursively all files of a given directory
• distinguish between Word/Excel files
Collaborations
• Word
• Excel
• Log

Class name: Word
Responsibilities
• open Word files
• determine number of macros in documents
Collaborations
• Log
• Registry

Class name: Excel
Responsibilities
• open Excel files
• determine number of macros in documents
Collaborations
• Log
• Registry

Class name: Log (abstract)
Responsibilities
• save information to disk
Collaborations
• (none listed)

Class name: Registry
Responsibilities
• check existing registry situation
• modify registry to permit VBA access
• restore registry to previous state
Collaborations
• (none listed)

Class name: File
Responsibilities
• process files collected
Collaborations
• (none listed)

The UML diagram of the relations between classes is available in Fig. 7. The Log class is designed as abstract to provide services to all the classes that need to log information. The ScanInfo class provides common behavior for both the Word and Excel classes.

Fig. 7 - UML diagram of MacrosLister (without GUI classes)

3.3 Coding

Using the PIA to connect to Office programs is quite straightforward: all the properties of a document can be accessed and modified, if needed. A problem occurs, however, when accessing macros. For security reasons, Microsoft decided to disallow by default external access to the code contained in documents, as this was used by viruses to copy parts of code from one file to another. This important configuration detail is stored in a registry key that even a user with limited rights can modify. Before accessing the code, MacrosLister modifies this registry key to gain access to macros and afterwards restores the previous setting. This kind of operation implies that the PC used to perform a scan may become infected by a virus contained in a macro that auto-starts when a document is opened, even if the security settings are set to high levels.

As can be seen in the following code sample, each document collected is passed to the PIA's main Application object for opening.

    // Main Application object
    Microsoft.Office.Interop.Word.Application wdApp =
        new Microsoft.Office.Interop.Word.Application();
    if (wdApp == null)
        throw new ApplicationException("Could not create Word Application object.");

    // Main Word object
    Microsoft.Office.Interop.Word.Document objDoc = null;
    [..]
    // For each document
    // Open it
    objDoc = wdApp.Documents.Open(ref objfilename, [..]);
    [..]

Pseudo-code to access Word object properties

After the documents are opened in the background, the VBProject property is accessed and the number of lines of code stored in each component is counted. It is impossible to access this property if the security setting has not been lowered.

3.4 Testing and installation

Testing has been performed using some test documents containing:

• Macros
• Macros with errors in code
• Macros auto-starting when a document was opened

These tests were also included in test-first unit tests, which check that the correct number of macros is computed.

The program requirements on the target machine are the following:

• .NET framework
• Microsoft Office must be installed on the target machine

The second requirement is due to the fact that the PIA needs Microsoft Office to access the documents: every time the system accesses some properties of a document, that file is opened in the background.


3.5 Future improvements

Some improvements useful for the next releases of MacrosLister:

• Scan of the file types on a system
• Scan of the machines on a network
• Releasing the program under the GPL license


Chapter 4 - Final results of the analysis of dependencies

During phase 2 of the project, the analysis phase with only Microsoft Office installed, Dependency Analyzer and MacrosLister were installed on the 22 target machines.

• Dependency Analyzer ran for the first 7 weeks of the project to monitor the use of Microsoft Office.

• MacrosLister was used to get a snapshot of the situation regarding macros.

The results have been collected and analyzed; a summary can be found in this chapter.

4.1 Incoming dependencies

Applications that call MS Word

By looking at Fig. 8, almost 80% of the times Microsoft Word was called either by clicking ona .doc file or by starting the program and then opening a document: in this case the parentapplication is Explorer.exe, the Windows desktop environment. Next, 15% of the times Wordwas called via the Outlook mail client, for example when opening a .doc attachment. The otherpercentages are minimal, interesting the case of Oracle Forms which could be a compatibilityissue, having to depend on OpenOffice instead of MS Office in case of migration.

Fig. 8 - Application that call MS Word

Applications that call MS Excel

Fig. 9 shows that Microsoft Excel is less used than Microsoft Word (by comparing Fig.8 and 972% of usage respect to the latter), and is almost always started normally by clicking on its iconor on a .xls file. Also in this case Outlook is the second way to open an Excel file, but thepercentage is smaller than in the case of Microsoft Word.

Page 19

MS WORDApplication Description Week1 % Week2 % Week3 % Week4 % Week5 % Week6 % Week7 % Total %

Explorer.exe Normal start 358 78,17% 336 81,16% 380 83,33% 173 75,88% 233 78,45% 323 80,75% 347 84,63% 2150 80,74%Outlook.exe MS mail client 89 19,43% 42 10,14% 64 14,04% 36 15,79% 50 16,84% 56 14,00% 52 12,68% 389 14,61%DW.exe MS error reporting tool 5 1,09% 6 1,45% 1 0,22% 1 0,44% 2 0,67% 1 0,25% 1 0,24% 17 0,64%IFRUN60.EXE Oracle Forms 0 0,00% 3 0,72% 0 0,00% 0 0,00% 0 0,00% 0 0,00% 0 0,00% 3 0,11%Iexplorer.exe MS browser 0 0,00% 0 0,00% 5 1,10% 0 0,00% 0 0,00% 0 0,00% 0 0,00% 5 0,19%Excel.exe MS Excel 0 0,00% 0 0,00% 1 0,22% 0 0,00% 0 0,00% 0 0,00% 0 0,00% 1 0,04%Unknown - 6 1,31% 27 6,52% 5 1,10% 18 7,89% 12 4,04% 20 5,00% 10 2,44% 98 3,68%Total 458 100% 414 100% 456 100% 228 100% 297 100% 400 100% 410 100% 2663 100%

[Chart: "MS WORD started by", number of launches per week (Week1 to Week7); series: Explorer.exe, Outlook.exe, DW.exe, IFRUN60.EXE, Iexplorer.exe, Excel.exe, Unknown]


Fig. 9 - Applications that call MS Excel

4.2 Outgoing dependencies

Among the processes started by Word or Excel (Fig. 10), the main role is played by the drivers used to print documents. Oracle Forms appears in this case as well, so this is a dependency that may be worth investigating further. The calls to Microsoft Access are also a possible problem, as the OpenOffice suite currently has no substitute for this product.

Fig. 10 – Applications called by Microsoft Word/Excel
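To make the weight of each kind of outgoing dependency explicit, the totals of Fig. 10 can be grouped by role. This is a minimal sketch: the two groupings (`printer_drivers` and `needs_review`) are assumptions made for this illustration, based on the descriptions in the figure and on the two dependencies singled out in the text.

```python
# Seven-week totals of processes started by MS Word/Excel, copied from Fig. 10.
children = {
    "CPCQM.EXE": 192, "DW.EXE": 31, "E_L19111.EXE": 23, "MSOHELP.EXE": 17,
    "EXPLORER.EXE": 11, "MSTORE.EXE": 5, "OUTLOOK.EXE": 3, "WINHELP32.EXE": 2,
    "IEXPLORER.EXE": 2, "MSACCESS.EXE": 1, "IFRUN60.EXE": 1, "EXCEL.EXE": 1,
}

# Hypothetical groupings for this sketch.
printer_drivers = {"CPCQM.EXE", "E_L19111.EXE"}   # Canon and HP printing drivers
needs_review = {"MSACCESS.EXE", "IFRUN60.EXE"}    # MS Access and Oracle Forms

total = sum(children.values())
printing = sum(children[name] for name in printer_drivers)
review = sum(children[name] for name in needs_review)

print(f"Printing drivers:      {printing:3d} calls ({100.0 * printing / total:.1f}%)")
print(f"Access / Oracle Forms: {review:3d} calls ({100.0 * review / total:.1f}%)")
```

Grouping the calls this way separates frequent but harmless dependencies, such as printing, from rare ones that may still block a migration.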


EXCEL (data for Fig. 9)

Application    Description              Week1  %        Week2  %        Week3  %        Week4  %        Week5  %        Week6  %        Week7  %        Total  %
Explorer.exe   Normal start             308    96,86%   265    92,98%   245    96,84%   101    96,19%   128    90,78%   251    90,94%   247    95,37%   1545   94,38%
Outlook.exe    MS mail client           4      1,26%    10     3,51%    7      2,77%    3      2,86%    8      5,67%    13     4,71%    11     4,25%    56     3,42%
DW.exe         MS error reporting tool  0      0,00%    1      0,35%    0      0,00%    1      0,95%    3      2,13%    9      3,26%    1      0,39%    15     0,92%
IFRUN60.EXE    Oracle Forms             0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%
Iexplorer.exe  MS browser               0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%
Excel.exe      MS Excel                 0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%
Unknown        -                        6      1,89%    9      3,16%    1      0,40%    0      0,00%    2      1,42%    3      1,09%    0      0,00%    21     1,28%
Total                                   318    100%     285    100%     253    100%     105    100%     141    100%     276    100%     259    100%     1637   100%

[Chart: "MS EXCEL", number of launches per week (Week1 to Week7); series: Explorer.exe, Outlook.exe, DW.exe, IFRUN60.EXE, Iexplorer.exe, Excel.exe, Unknown]

Applications started by MS Word/Excel (data for Fig. 10)

Application    WEEK1  %        WEEK2  %        WEEK3  %        WEEK4  %        WEEK5  %        WEEK6  %        WEEK7  %        TOTAL  Description
CPCQM.EXE      37     69,81%   22     55,00%   34     82,93%   23     76,67%   9      40,91%   40     62,50%   27     69,23%   192    Printing driver Canon
DW.EXE         5      9,43%    7      17,50%   0      0,00%    2      6,67%    5      22,73%   10     15,63%   2      5,13%    31     MS Error Reporting tool
E_L19111.EXE   3      5,66%    9      22,50%   0      0,00%    0      0,00%    3      13,64%   8      12,50%   0      0,00%    23     Printing driver HP
MSOHELP.EXE    2      3,77%    2      5,00%    2      4,88%    1      3,33%    4      18,18%   1      1,56%    5      12,82%   17     MS Help Menu
EXPLORER.EXE   2      3,77%    0      0,00%    1      2,44%    2      6,67%    0      0,00%    3      4,69%    3      7,69%    11     A folder viewer
MSTORE.EXE     2      3,77%    0      0,00%    2      4,88%    1      3,33%    0      0,00%    0      0,00%    0      0,00%    5      Microsoft Clip Organizer
OUTLOOK.EXE    0      0,00%    0      0,00%    1      2,44%    1      3,33%    1      4,55%    0      0,00%    0      0,00%    3      MS Mail client
WINHELP32.EXE  0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    2      3,13%    0      0,00%    2      MS Help guide
IEXPLORER.EXE  0      0,00%    0      0,00%    1      2,44%    0      0,00%    0      0,00%    0      0,00%    1      2,56%    2      MS web browser
MSACCESS.EXE   1      1,89%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    1      MS Access
IFRUN60.EXE    1      1,89%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    1      Oracle Forms (Runforms)
EXCEL.EXE      0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    0      0,00%    1      2,56%    1      MS Excel
Total          53     100%     40     100%     41     100%     30     100%     22     100%     64     100%     39     100%     289

[Chart: "Applications started by MS Word/Excel", number of calls per week (WEEK1 to WEEK7); series: CPCQM.EXE, MSOHELP.EXE, DW.EXE, MSACCESS.EXE, EXPLORER.EXE, E_L19111.EXE, MSTORE.EXE, IFRUN60.EXE, IEXPLORER.EXE, OUTLOOK.EXE, WINHELP32.EXE, EXCEL.EXE]


4.3 Macros collection

The first scan performed by MacrosLister analyzed the users' local drives; the results are summarized in Fig. 11. Two facts can be inferred:

• Word documents predominate over Excel ones (93% vs 7%).
• Almost all macro code is found in Excel documents (1070 lines of code, against 12 in Word documents).

Fig. 11 - Macros in documents on users' local drives (without templates)
LOC indicates lines of code; CNO indicates files that could not be opened.

The second analysis of macros covered the users' network drives, where they usually store the vast majority of their documents. These drives are shared within each office. In Fig. 12 the offices involved in the test have been named A, B, C and D for anonymity. Three facts can be inferred from the data collected:

• Word documents predominate over Excel ones.
• Macros are heavily present in one of the offices examined (office D).
• A large number of documents could not be opened for analysis, possibly because of password protection (40% of all the Excel files). It therefore cannot be excluded a priori that these documents contain macros.
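The concentration of macros and of unopened files in a single office can be read off the Excel columns of Fig. 12. The following sketch is only an illustration of that arithmetic; the variable names are hypothetical and the numbers are copied from the figure.

```python
# Excel files per office on the network drives, copied from Fig. 12:
# (files, files with macros, files that could not be opened, macro LOC)
excel_by_office = {
    "A": (206, 5, 5, 60),
    "B": (61, 7, 1, 2302),
    "C": (23, 3, 0, 20),
    "D": (526, 28, 325, 19100),
}

total_files = sum(row[0] for row in excel_by_office.values())
total_cno = sum(row[2] for row in excel_by_office.values())
total_loc = sum(row[3] for row in excel_by_office.values())

# Per office: share of all macro code, and share of its own files not opened.
for office, (files, with_macro, cno, loc) in excel_by_office.items():
    print(f"Office {office}: {100.0 * loc / total_loc:5.1f}% of macro LOC, "
          f"{100.0 * cno / files:5.1f}% of its Excel files not opened")

# Overall share of Excel files that could not be opened.
cno_share = 100.0 * total_cno / total_files
print(f"Excel files not opened overall: {cno_share:.2f}%")
```

Read this way, both the macro code and the unopened files are concentrated in office D, which is where a closer manual inspection would pay off first.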

Fig. 12 - Macros on each office network drive (without templates)
LOC indicates lines of code; CNO indicates files that could not be opened.


Macros on users' local drives (data for Fig. 11)

User   Word files  Macro files  LOC   Excel files  Macro files  LOC   CNO
1      1081        0            0     2            0            0     0
2      541         2            12    12           0            0     1
3      1231        0            0     87           0            0     12
4      1137        0            0     79           0            0     0
5      5           0            0     0            0            0     0
6      55          0            0     5            1            329   0
7      62          0            0     0            0            0     0
8      1128        0            0     92           0            0     0
9      1839        0            0     104          0            0     0
10     6166        0            0     600          0            0     0
11     1189        0            0     27           0            0     0
12     688         0            0     144          0            0     0
13     483         0            0     57           2            741   0
14     470         0            0     31           0            0     1
15     2285        0            0     40           0            0     0
16     567         0            0     82           0            0     0
17     217         0            0     5            0            0     0
Total  19144       2            12    1367         3            1070  14
%      93,34%                         6,66%

Macros on each office network drive (data for Fig. 12)

        WORD                               EXCEL
Office  N. Files  With Macro  CNO    LOC   N. Files  With Macro  CNO     LOC
A       0         0           0      0     206       5           5       60
B       2341      0           0      0     61        7           1       2302
C       586       0           0      0     23        3           0       20
D       1557      0           0      0     526       28          325     19100
Total   4484      0           0      0     816       43          331     21482
%       100%      0,00%       0,00%  -     100%      5,27%       40,56%  -


The last analysis again scanned the users' network drives, this time taking into consideration only Microsoft Word and Excel templates (Fig. 13). Users often create new documents from templates, and it is very rare for a document not to be created in this way; this was the reason for this scan.

The results of this scan can be summarized as follows:

• The majority of templates are Word ones.
• No macros are present in Word templates.
• In the few Excel templates found, macros are heavily present (over 10000 lines of code).

Fig. 13 - Macros on each office network drive (only templates)
LOC indicates lines of code; CNO indicates files that could not be opened.

In conclusion, the results of the macro analysis may be incomplete, because some password-protected documents could not be analyzed. The real number of macros contained in documents may therefore be larger than the one reported by the tool, a fact that has to be taken into consideration when analyzing an existing situation.
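One way to quantify this uncertainty is to bracket the real share of files with macros between a lower and an upper bound. The sketch below is an illustration under a deliberately pessimistic assumption (every file that could not be opened contains macros); the `macro_file_bounds` function is hypothetical and is applied here to the Excel totals of the network-drive scan (Fig. 12).

```python
def macro_file_bounds(files, with_macro, not_opened):
    """Return (lower, upper) share of files containing macros, in percent.

    lower: counts only the files in which macros were actually found.
    upper: additionally assumes that every unopened file contains macros.
    """
    lower = 100.0 * with_macro / files
    upper = 100.0 * (with_macro + not_opened) / files
    return lower, upper


# Excel files on the office network drives, totals copied from Fig. 12.
low, high = macro_file_bounds(files=816, with_macro=43, not_opened=331)
print(f"Excel files with macros: between {low:.2f}% and {high:.2f}%")
```

The width of the interval shows how much of the result depends on the files that could not be opened.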

4.4 Conclusions

From the data collected, the migration in the case study was judged not to be problematic, for two reasons:

• Dependencies were not critical.
• Documents containing macros were too few to represent a problem.

However, some attention has to be given to Excel documents, as they are potentially an issue because of the macros they contain and the files that could not be opened for analysis.

Dependency Analyzer ran for 7 weeks on the test machines without problems, and MacrosLister collected the required information from the documents analyzed.

The analysis of dependencies is only one part of the study on the transition to Open Source Software for Desktop Office Automation. Other factors have to be taken into consideration when migrating to a new software platform, such as the cost of training, people's natural resistance to change and the reduced productivity during the first phases of the migration. Analyzing the dependencies can prove extremely useful in a study before a transition to OSS, but it cannot be used as the only discriminating factor in favor of one solution. However, understanding in advance the key dependencies present in a given configuration can only bring benefits, as a strategy to deal with the constraints can be set up.

Macros on each office network drive, templates only (data for Fig. 13)

        WORD (.DOT)                        EXCEL (.XLT)
Office  N. Files  With Macro  CNO    LOC   N. Files  With Macro  CNO     LOC
A       0         0           0      0     0         0           0       0
B       99        0           0      0     0         0           0       0
C       526       0           0      0     0         0           0       0
D       1557      0           0      0     9         3           0       10197
Total   2182      0           0      0     9         3           0       10197
%       100%      0,00%       0,00%  -     100%      33,33%      0,00%   -


Acknowledgments

Thanks go to Prof. Giancarlo Succi for his competence in the field of OSS and the support given during the writing of the thesis. This work has been possible thanks to the support given by the Autonome Provinz Bozen. In particular, the author would like to thank Dr. Hellmuth Ladurner and Erwin Pfeifer. Thanks also go to Dr. Hugo Leiter, Director of the EDP Unit of the Consortium of the Townships of the Province of Bolzano-Bozen: his experience, gained during the last years of transition to OpenOffice, has been of invaluable help. Finally, warm thanks to all the participants in the project for the patience shown during the test phase.


Bibliography

[1] “Indagine conoscitiva della Commissione per il software a codice sorgente aperto nella P.A.”, Ministero Innovazione e Tecnologie

[2] “On the Transition to an Open Source Solution for Desktop Office Automation”, Rossi, Russo, Zuliani, 2004 (paper submitted to TCGOV 2005), http://www.inf.unibz.it/tcgov2005/cfp.html

[3] “Comparative assessment of Open Documents Formats Market Overview”, IDA (Interchange of Data between Administrations) European Programme, http://europa.eu.int/ida/

[4] Microsoft ToolHelp Library, http://msdn.microsoft.com/library/en-us/perfmon/base/tool_help_library.asp

[5] Microsoft “Taking a Snapshot and Viewing Processes”, http://msdn.microsoft.com/library/en-us/perfmon/base/taking_a_snapshot_and_viewing_processes.asp

[6] “Detours: Binary Interception of Win32 Functions”, Galen Hunt and Doug Brubacher (Microsoft Research), http://research.microsoft.com/sn/detours

[7] Microsoft Primary Interop Assemblies, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnoxpta/html/odc_oxppias.asp
