
Design and Implementation of

Clinical Data Management Applications on A PC Network

by

Mark Crawford

Boehringer Ingelheim (UK) Ltd.

ABSTRACT

In March 1988 we started to assess both the potential and impact of SAS/AF software on our Clinical Data Management systems. The presentation covers aspects of the whole process from installation of the central SAS version on a file server to the implementation of multi-user applications on that network.

Some simple 'benchmark' performance figures will be given to highlight the efficiency differences when using this type of system configuration.


Mark L. Crawford

During this presentation I will be talking about five general topics.

1) The Chronology of SAS usage in B.I.L.

2) The Scope and Size of the Current Network.

3) Current SAS Service to Users.

4) The General Philosophy.

5) Future Directions.

1) The Chronology of SAS Usage at Boehringer Ingelheim Limited.

Pre JAN 86   Mainframe SAS via modem link (sometimes!).
JAN 86       Took delivery of BASE module only.
JUNE 86      The STAT module arrived.
OCT 86       Beta test version of FSP arrived.
LONG GAP     Something to do with Clinical Trials.
MAR 88       The long awaited 6.03 including AF/SCL.
JULY 88      Networked the CDD, Data entry use as data server.
OCT 88       Networked SAS, machines use a "Software Server".
DEC 88       SAS gets first FRONT-END for data entry.
1989/1990    Many Menu driven applications being designed. OS/2 and SQL/SAS research.
1991/1992    Research a gateway to SAS on the AS400 for heavy number crunching and communications.


You can see from the overhead that PC SAS usage started some three years ago and its use for statistics started only two and one half years ago. Data were still being entered via Lotus Symphony up until two years ago and until July 1988 we shared our offices with mountains of diskettes.

Currently, we have a number of systems revisions and new projects in hand which I shall discuss in more detail later.

2) The Scope and Size of The Current Network.

The P.C. Network which I am about to describe constitutes the computing resource for the Medical Division of B.I.L. As such, it has a considerable workload to get through.

HARDWARE:

The Network file servers (shared resource machines) are Compaq 386 (16 megahertz) machines. One machine, dedicated to the administrative Relational Database, has four megabytes of extended memory whilst the SAS server uses two megabytes. Each machine has one megabyte of its extended memory configured for disk caching. Each is installed with a 130 megabyte hard disk which achieves an average disk access speed of 25 milliseconds and is currently around 75% full. The Network cabling and boards are thin Torus Ethernet.

We currently support the following makes/models of P.C on this Network.

IBM PC (8088 chip)
IBM XT
IBM AT
IBM ATX
ITT XTRA (8086)
TANDON 286
DELL 386
WANG P.C.

These machines all run well on the Network but the 8086 machines do seem very slow these days.


PERFORMANCE:

Rather than bore you with performance claims taken directly from manufacturers' promotional literature, I shall give you the results of some simple benchmark tests run in February this year.

Sample Network Performance Tests

Task                                            From Local Disk     From Network Disk (6-8 concurrent users)
Loading SAS into local memory                   15 secs             15 secs (16 using floppy control)
Data step (1.2mb) to local SASWORK directory    56 secs             51 secs
DOS Copy                                        17 secs to net.     23 secs to local disk
PROC PRINT of 1.2mb SAS dataset (446*146)       19 mins 35 secs     19 mins 38 secs

All tests were run under realistic conditions, those being with 6 to 8 concurrent users. In addition to this, the Network benchmarks comparison machine was a Dell System 310. This is also an 80386 machine which, on first appearances, should outperform the server on a one to one user basis. The Dell has a 20 megahertz processor (faster than the Compaq). The reasons for some tasks being faster, even with concurrent users, lie mainly in disk handling and performance. More precisely, Novell actually manages the way in which users' requests are serviced by the disk to achieve disk optimisation. In addition to this, the Compaq not only has a faster disk but uses 'fast' extended memory whereas the Dell uses 8 megahertz expanded memory.

It is perhaps surprising that P.C. SAS can now be run on a diskless workstation, although it would not be advisable as the server disk would have an increased workload and cabling traffic would vastly increase. However, it is possible, and some users running very simple and controlled applications might have their machines configured in this way. All that is required is a high density disk with Network shell files, DOS, and only 82k of SAS files. The SAS files needed on the local station are SAS.EXE and a modified CONFIG.SAS. Most PCs with 640k of RAM would be sufficient, but having Network Shell files loaded does reduce their capabilities. In addition, being in a SAS menu system further reduces memory.

Networked machines should have extended memory, especially if they are to run menu-driven systems. We currently require extended memory in the order of 1.5mb (on slow memory boards - Intel 8mhz). The use of machines with 80386 chips does enhance performance but not as much as one might expect. Some quite interesting examples of the expected performance enhancements may be seen in the paper given by John Dalton (SAS Software Ltd, SUBI88).

SOFTWARE:

Lotus Symphony.

Lotus Manuscript (2.0)

Lotus Freelance (3.0)

PC SAS (Version 6.03). Installed on the Network nearly one year after the system had been demonstrated as being reliable and fully operational. Installing SAS on a server is of equivalent difficulty to installing it on a PC. Converting individual workstations took around 20 seconds each, having only to change 3 lines of the CONFIG.SAS. It takes much longer to delete SAS from the local hard disk! You can see therefore that converting our whole department (10 SAS users) to using a central SAS version took only one day.

Advanced Revelation.

Novell Netware (2.11)

KENS (Knowledge Enhanced Non-parametric Statistics)

GLIM


USERS:

In total, the Network supports 27 users, of which I can reasonably describe 17 as HEAVY users, namely those who use machines for several hours in every working day.

10 of these are Clinical Data Processing and Statistics Dept.
3 are document tracking, administration and C.R.F design.
1 archive machine.
1 safety officer machine.
1 finance.
7 Clinical Research Associate users.
4 Scientific Research Associates.

3) The Current SAS Service to Users

The SAS system provided on the Network can be considered to be giving three levels of service.

Firstly there is a group of personnel who use SAS solely within the confines of the SAS/AF system known as Entrysys. This system, as the name implies, provides a front end to the data entry group for all their SAS tasks.

Secondly, our data co-ordinators have access to Entrysys, but use it as required, and not as a SAS front end. This is because a number of study related reports may be of use to them, and this system provides the automated coding facility.

The last group is constituted of the Statisticians (2), the Clinical Data Manager, the Clinical Database Manager, and programming staff (1). The main benefits to these users are that SAS software is centrally maintained, zaps are applied consistently, and disk space is freed.

Disk space considerations and the advent of OS/2 pose some very interesting problems for SAS users. Our more recently purchased PCs are equipped with 40mb hard disks, but this was soon exhausted by 27.5mb of SAS (for users with the full product), 3.9mb of Lotus Manuscript and 3.5mb of Lotus Freelance. For an above average machine, only 4.25mb of working space would be left after taking DOS into account. The implication is therefore that, without being networked, our SAS users could not use OS/2 because they would not have the disk space, unless we purchase many new machines. It seems fairly clear therefore that people in a similar situation who wish to "upgrade" to OS/2 will have to address Networking and the new OS at the same time.
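The disk-space figures quoted above can be checked with a few lines of arithmetic. The sketch below is in Python purely for illustration (it is not part of the original system); the size of DOS is not stated in the text, so the value printed for it is only what the other figures imply.

```python
# Figures quoted in the text, in megabytes.
DISK_MB = 40.0
SAS_MB = 27.5            # full SAS product
MANUSCRIPT_MB = 3.9      # Lotus Manuscript
FREELANCE_MB = 3.5       # Lotus Freelance
FREE_AFTER_DOS_MB = 4.25 # working space the text says remains


def free_space(disk_mb, *installed_mb):
    """Return the disk space left once the listed packages are installed."""
    return disk_mb - sum(installed_mb)


free_before_dos = free_space(DISK_MB, SAS_MB, MANUSCRIPT_MB, FREELANCE_MB)

# Not stated in the text: inferred from the two figures above.
implied_dos_mb = free_before_dos - FREE_AFTER_DOS_MB

print(f"Free before DOS: {free_before_dos:.2f} mb")
print(f"Implied DOS footprint: {implied_dos_mb:.2f} mb")
```

On these figures the three packages leave about 5.1mb, so the quoted 4.25mb of working space implies roughly 0.85mb taken by DOS.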

Earlier in the presentation, I mentioned a piece of SAS/AF software called Entrysys. This module was developed exclusively in SAS by Boehringer as a multi-user, networked system for the Clinical Data Department. It marks the start of an evolving data management system to encompass all areas of our work.

Entrysys is a single SAS/AF catalogue which has been designed to have multiple access points for different classes of user. One version is maintained on the central server and is accessed by staff at different levels as defined by their AUTOEXEC.SAS. A development version, which is constantly undergoing modernisation, is maintained on the other server. At intervals, proposed to be six months, the development version is copied across and the upgrade is complete. Our next upgrade is expected to take around two minutes.

What Does ENTRYSYS do:

1) Reduce Training Costs
   Difficulty recruiting DE staff. Much time wasted training and re-training. Needed to remove SAS code writing.

2) Provide Networked/Multiuser Coding system.
   Coding dictionaries are only read to access codes, therefore Network access may be set to allow multi-user reading.

3) Automate tasks of Data Entry Supervisor.
   Tasks including:
   Sorting and Listing Datasets
   Datafile Compare (SCL compare - DVLP)
   Database entry figures
   Datafile access - browse only.
   Data Resolution reports.
   Dual entry status reports.
   Monthly monitoring reports.

4) Menu-Driven Labelling
   Simple to use label creation. Label database created in SAS.

5) Regular Reports for Major Clinical Trials
   Safety reports generated monthly.

Each of these reports requires little more from the user than to select the appropriate datafile from selection lists. I have taken the opportunity to process some of the reports mentioned above whilst the server was servicing eight concurrent users. Three examples are given, all written in SCL, and I think you'll agree that they are fairly efficient. All figures were generated on one server with eight concurrent users to give a more realistic picture of report times.

Report Type:     Data Resolution Reports
Extent of Task:  Check every data item for presence of a missing value code and, where found, write relevant information to a SAS dataset.
Time:            5 mins for a dataset of 247*195 vars.
In Other Words:  This represents a checking speed of 160 variables per second including external datafile writing. For this example, 0.1% of variables searched contained missing value codes. When found, relevant information is passed to an external SAS dataset.


Report Type:     Database Entry Figures
Extent of Task:  Access all trials on the database, assess the number of variables, group together the entry figures for each study's datafiles and compare to previous month's figures.
Time:            Under 2 mins to finish printout.
In Other Words:  60 files from around 15 different studies are opened and assessed for number of variables. Their numbers are compared to those held on report datafile for the previous month. 6.25mb of files checked. A full run (around 45mb live data) would therefore take approx. 15 mins.

Report Type:     Dual Entry Status Report
Extent of Task:  Compare the observational composition of 2 datafiles and print a report of discrepancies.
Time:            Sub one min. for printing to finish.
In Other Words:  This is for 2*200kb files with several dozen discrepancies being written.
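The arithmetic behind the first two timings can be reproduced with a short Python sketch (Python is used here only for illustration; the reports themselves were written in SCL). It assumes that 247*195 denotes the number of data items checked and that run time scales linearly with the volume of data, neither of which is stated explicitly in the text.

```python
def items_per_second(rows, columns, seconds):
    """Checking speed for a rows*columns dataset scanned in the given time."""
    return rows * columns / seconds


def scaled_minutes(sample_mb, sample_minutes, full_mb):
    """Extrapolate a run time, assuming it grows linearly with data volume."""
    return sample_minutes * full_mb / sample_mb


# Data Resolution report: 247*195 items in 5 minutes.
rate = items_per_second(247, 195, 5 * 60)

# Database Entry Figures report: 6.25mb took (just under) 2 minutes,
# extrapolated to around 45mb of live data.
full_run = scaled_minutes(6.25, 2, 45)

print(f"Checking speed: about {int(rate)} items per second")
print(f"Full run: about {full_run:.1f} minutes")
```

Both results agree with the quoted figures: roughly 160 items per second, and a full run of a little under 15 minutes.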

4) The General Philosophy

The first release of Entrysys (Dec. 88) used a fairly rigid and pre-programmed approach to allow users access to reports and database files. This has now been abandoned for a more flexible approach using datafile definitions.

With the recent development of a more sophisticated front end to our operations, it has become clear that users were repeatedly defining database characteristics which remain constant throughout the life of the project. In itself, this leads to an inefficient use of the system and there is potential to make mistakes which are wholly avoidable.


A problem arises here, due to the fact that in the common usage of the term, the SAS system is not a database. However, we wish to use SAS in a more sophisticated way than a well organised group of flat files.

In trying to do this we have adopted a central Clinical Database definition datafile which is itself a SAS dataset. At B.I.L. this is known as the CJDB, and currently holds the following information about each trial.

Current Clinical Trial Numbers.

Individual Clinical Trial Library names (Libnames).

The DOS names of each SAS dataset which forms part of that trial.

For each of these SAS datasets, the observational identifier variables (the BY variables).

The investigational product name.

The location of the product specific dictionary to be used for coding AE's.

Monthly monitoring report program names.

There is clearly a huge potential for this approach and we intend to use it wherever possible to act as a central definition. The file is accessed whenever a standard report or task is requested. Using a dataset approach rather than a programmed approach for this particular part of the system means that re-definition may occur whilst the system is live.
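As an illustration of the definition-file idea, the sketch below models one trial entry as a record holding the items listed above. It is written in Python rather than SAS, and every field name and example value is hypothetical; the real file is a SAS dataset whose layout is not given here.

```python
from dataclasses import dataclass, field


@dataclass
class TrialDefinition:
    """One trial entry in a central definition file (field names are hypothetical)."""
    trial_number: str
    libname: str
    # DOS name of each SAS dataset in the trial -> its BY (identifier) variables
    datasets: dict = field(default_factory=dict)
    product_name: str = ""
    ae_dictionary_path: str = ""
    monitoring_programs: list = field(default_factory=list)


def by_variables(definitions, trial_number, dataset_name):
    """Look up the BY variables for one dataset of one trial."""
    for entry in definitions:
        if entry.trial_number == trial_number:
            return entry.datasets[dataset_name]
    raise KeyError(trial_number)


# Invented example values; none come from the original system.
definitions = [
    TrialDefinition(
        trial_number="T001",
        libname="TRIAL1",
        datasets={"DEMOG": ["PATIENT"], "VISITS": ["PATIENT", "VISIT"]},
        product_name="Product A",
        ae_dictionary_path="F:\\DICT\\PRODA",
        monitoring_programs=["MONTHLY1"],
    )
]

print(by_variables(definitions, "T001", "VISITS"))
```

Because the definitions are data rather than program code, an entry can be added or corrected without changing or restarting the reporting programs, which is the point made above about re-definition while the system is live.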

An example of the type of access made is for the Monthly Data Entry Figures Report. This report is selected from the Supervisor Menu; thereafter the system opens the central definition file (this is multi-user, as are coding dictionaries) and reads the first observation (Clinical Trial entry). The system collects all the names of the SAS datasets and then finds them. Upon finding files, they are opened, the number of items is assessed and this figure is added to that of all other files under the same Clinical Trial entry. When the total has been calculated it is compared to the previous month's totals and the values are written to a permanent SAS dataset with the cumulative entry figures. This apparently lengthy process occurs on average in around 6 seconds for studies containing around 0.5 Mb of data.
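The steps of this report can be sketched as follows. This is a Python illustration under stated assumptions, not the SCL used in Entrysys: datasets are held in memory as lists of rows, an "item" is taken to be one value in one row, and the function and key names are invented for the example.

```python
def monthly_entry_figures(definitions, datasets, previous_totals):
    """Total the entered items per trial and compare with last month's totals.

    definitions:     trial number -> list of dataset names (the definition file)
    datasets:        dataset name -> list of rows, each row a dict of values
    previous_totals: trial number -> cumulative total at the previous report
    """
    report = []
    for trial, dataset_names in definitions.items():
        total = 0
        for name in dataset_names:
            rows = datasets.get(name)
            if rows is None:
                continue  # named in the definition file but not found
            total += sum(len(row) for row in rows)
        previous = previous_totals.get(trial, 0)
        report.append({
            "trial": trial,
            "total": total,
            "previous": previous,
            "entered_this_month": total - previous,
        })
    return report


# Invented example data.
definitions = {"T001": ["DEMOG", "VISITS", "LABS"]}
datasets = {
    "DEMOG": [{"PATIENT": 1, "AGE": 40}, {"PATIENT": 2, "AGE": 51}],
    "VISITS": [{"PATIENT": 1, "VISIT": 1, "SBP": 120}],
}
figures = monthly_entry_figures(definitions, datasets, {"T001": 5})
print(figures)
```

In this example the LABS dataset has not yet been created, so only DEMOG and VISITS contribute to the total.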


Other examples are that 'BY' variables for status reports and compares are extracted from the central definition and not requested from the user.
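A minimal sketch of how BY variables might drive a dual-entry comparison is given below. It is a Python illustration only, not the SCL compare used in Entrysys; it assumes each entry is a list of rows, that the BY variables uniquely identify a row, and that their values can be sorted.

```python
def compare_entries(first, second, by_vars):
    """List discrepancies between two entries of the same data, matched on by_vars."""
    def index(rows):
        # Key each row by the values of its BY variables.
        return {tuple(row[v] for v in by_vars): row for row in rows}

    a, b = index(first), index(second)
    discrepancies = []
    for key in sorted(set(a) | set(b)):
        if key not in a:
            discrepancies.append((key, None, "missing from first entry"))
        elif key not in b:
            discrepancies.append((key, None, "missing from second entry"))
        else:
            for var in sorted(set(a[key]) | set(b[key])):
                if a[key].get(var) != b[key].get(var):
                    detail = f"{a[key].get(var)!r} != {b[key].get(var)!r}"
                    discrepancies.append((key, var, detail))
    return discrepancies


# Invented example data; the BY variables would come from the definition file.
first = [{"PATIENT": 1, "VISIT": 1, "SBP": 120}, {"PATIENT": 2, "VISIT": 1, "SBP": 135}]
second = [{"PATIENT": 1, "VISIT": 1, "SBP": 130}, {"PATIENT": 3, "VISIT": 1, "SBP": 110}]

for line in compare_entries(first, second, ["PATIENT", "VISIT"]):
    print(line)
```

Supplying the BY variables from the central definition, rather than asking the user for them, removes one opportunity for the avoidable mistakes mentioned earlier.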

5) Future Directions

In-House Software:

1) New Datafile Compare
   PROC COMPARE is an excellent tool for assessing differences between the entries in two SAS datasets which have been entered in batch mode. Our operation does not involve this kind of entry and so we require a more detailed report on the entry status of our studies. This has just been completely re-written using SCL and came on-line at the end of April.

2) Data Dictionary
   It seems quite clear that, with increasing standardisation of file structure and naming conventions, a system based on selection lists and automatic datafile creation should be designed. Parts of new project databases will be created at the touch of a button using screen control language and selection lists.

3) Safety Reporting
   A need to centrally co-ordinate, code and report Adverse Events is evident. The generation of regular drug interaction and global event reports should be automatic. SCL will be used to monitor and co-ordinate a fully integrated safety database.

4) Drug Accountability
   An SCL system could be used to check drug allocation against randomisations (automatically). At the end of the trial, the allocations and returned drug supplies could be automatically checked against patient records on the Clinical Trial Database for discrepancies. Not only could such a system monitor allocations during the trial, but it could also automatically generate reports of the drug disposition and allocation.


These are but a few of the projects to be undertaken at B.I.L. in enhancing the way that we manage our Clinical Trials.

Hardware:

Upgrades will be made to the file servers, probably to Compaq 386 machines with 25mhz chips and 300mb hard disks. External disks will be used for duplexing.

Other Research:

For the future, we see three major impacts on the way our systems are organised. During 1989 we will be researching direct P.C. Network linkage to our German head office via a Server Partition and X.25 card. In the latter part of this year we will also run a test on SQL/SAS under OS/2, should it be available. The last line of testing and research will look at running concurrent SAS sessions on an AS400 and a Network Server with a gateway linking the two. Large jobs could be remotely submitted. This is some way off yet.

References:

1) "The Use of SAS/~ In Clinical Data Applications." Mark Crawford and Andrew Lawton - Boehringer Ingelheim (UK) Ltd. SUBI88 Conference Proceedings (p. 90-100).

2) "Medical Dictionary Based Coding Using SAS®." Andrew Lawton and Mark Crawford - Boehringer Ingelheim (UK) Ltd. SUBI88 Conference Proceedings (p. 83-89).
