Adrian Ball CV 2015-07 v0.7




ADRIAN BALL
Senior Linux/Solaris System Administrator

[email protected] 07539 623647

http://uk.linkedin.com/pub/adrian-ball/15/214/842/

SUMMARY

Available for contract work from September 2015.

Enterprise systems design, administration, troubleshooting and support

Scripting and automation

Sun-certified

Security-cleared (SC)

Conversant in the ITIL framework.

KEY SKILLS

Solaris (2.x to 10 and SunOS 3/4) | Red Hat Linux | System Administration | Systems Engineering
Unix Shell Scripting (bash / ksh) | Sun/Oracle/Dell hardware | Systems and Infrastructure Design | DNS
Disaster Recovery | Backup and Recovery | High Availability | ZFS
Sun Cluster (HA) | NFS | Solaris Volume Manager (DiskSuite) | CentOS
Perl, Tcl/Tk | SSH | Veritas VxVM/VxFS (Storage Foundation) | RHN/Kickstart
Ubuntu/Debian/Mint Linux | HPC cluster | Oracle Enterprise Linux | Remedy/ITIL
Jumpstart/JET | SAN / EMC Clariion | NetBackup & Legato Networker | Team Leadership
Solaris LDOMs (OVM) & zones | NTP | Technical Documentation | Production Support
Sendmail | Apache | Troubleshooting/problem solving | Vendor Relationships
Puppet | DRBD/Corosync | LDAP integration | TCP/IP

ABOUT ME

I have over twenty years of experience designing, testing, building, documenting, automating and supporting enterprise Unix systems. With a background in programming and scripting, I look for ways to make systems work together better, and I design and implement improvements alongside my core activities.

My technical skills are hands-on. I am happy working across a range of disciplines and making them work together cohesively. I have direct, and in most cases in-depth, experience with storage, databases, applications, monitoring/alerting, packaging, data transfer, networking, web servers and other network services, mail, performance tuning, and backup & recovery. Where dedicated teams look after these areas, I have often taken on a project technical coordination role.

I'm comfortable dealing with difficult and unfamiliar technical issues, often troubleshooting live production problems. In pressured situations, I deliberately take a careful and precise approach: gathering evidence to correctly identify the root cause of the problem, and communicating with stakeholders as necessary, until a resolution is reached.

I am an advocate of sharing knowledge effectively, communicating and documenting ideas and processes clearly. I take the time to record useful information for others, and correct out-of-date instructions as I encounter them.

Adrian Ball – Curriculum Vitae – 07/2015 - page 1/8


EXPERIENCE

Ux1 Ltd / UK Met Office (contract, two renewals) 09/2014 – 08/2015

Linux systems support and administration

OVERVIEW

The UK Met Office is a world-renowned weather forecasting centre, with data and science at its core. Its multi-million-pound computing facilities include IBM & Cray HPC supercomputers, IBM System z mainframes (running Linux instances), massive data storage and retrieval facilities, large-scale VMware ESX clusters, a large operational Linux server estate, a mixture of Windows and Linux desktop clients, and the necessary support systems. I provided system administration services focusing on the operational Red Hat Linux estate, which comprised several hundred VMware instances, S390 Linux instances and physical servers.

KEY ACHIEVEMENTS

Improved and expanded the Munin performance trend monitoring service:

The existing service was under-used, primarily because the server was installed on a general-purpose operational system that could not adequately handle the volume of data. I designed and configured two dedicated servers in different network environments, then planned and carried out a seamless migration of the service. The service now monitors three times the original number of clients and has been instrumental in resolving performance issues.

Investigated and resolved several long-standing performance issues:

In one example, a satellite data collection/processing system was frequently over-running and missing scheduled targets. Using performance trend monitoring (Munin), more detailed information from sar, /proc/* etc., and research into kernel tuning options, I determined that the system, whilst appearing to have plentiful RAM available to allocate from cache, was not allocating it quickly enough on demand. The root cause was that the VFS (virtual file system) cache was being cleared first, and because it held entries for thousands of small files, this was taking too long. Changing the vfs_cache_pressure tuning parameter resolved the problem, and the systems have consistently performed within SLAs since the change was made.

Designed, documented and deployed a number of project-related intelligent system builds with Puppet:

Using the existing Puppet system, wrote new manifests with embedded logic to deal correctly with development, test, production, physical and virtual systems, enabling new instances to be added quickly on demand without requiring bespoke system-specific manifests.

Worked with the backup and storage team to manage the deployment of hundreds of EMC Networker backup clients to the operational Linux estate:

This required an information-gathering exercise, scripting to deploy to RHN-enabled servers, further scripting to deploy to various unique systems that could not take advantage of any automation options, and the development and testing of a Puppet manifest tailored to the Met Office environment, which is now used to deploy and configure the agent on all new systems.

TECHNOLOGY AREAS

Red Hat Enterprise Linux (RHEL) versions 4 to 6

Puppet

Remedy/ITIL

VMware ESX/vCenter, Dell servers & disk arrays, IBM S390 Linux instances

DRBD/Corosync HA cluster

Performance monitoring, troubleshooting and tuning.

Bash / ksh scripting


Ux1 Ltd / University of Bristol - Department of Theoretical Chemistry (contract) 07/2014 – 09/2014

HPC and general Linux systems administration

OVERVIEW

The University of Bristol Department of Theoretical Chemistry provides dedicated HPC cluster facilities to academic staff, along with Linux workstations, some equipped with high-end GPUs. I was contracted for a specific period to provide support and management for these dedicated, specialised systems.

General support included managing and maintaining the hardware, job schedulers, queue management and software of the two Rocks HPC cluster systems, as well as troubleshooting.

KEY ACHIEVEMENTS

Upgraded the Curie HPC cluster, with minimal downtime:

One of the HPC clusters required upgrading for security and maintainability, but the system was large and complex, with no equivalent hardware on which to test the process. I therefore recreated a virtual copy (with a representative but reduced set of compute nodes) on my own equipment, and worked through documenting and testing the process of upgrading from Rocks Cluster version 5 to version 6. This was a reasonably complex process, involving the reconfiguration of several external systems (including a virtual server running on the cluster itself) and the upgrade and rebuild of 40+ compute nodes, along with the head node. The outage was expected to take about a week, but I was able to return the cluster to service within the first day.

Instigated compute node monitoring and alerting:

The HPC cluster compute nodes were configured as a trade-off between high performance and resilience. As they are hidden from the main infrastructure, they were monitored manually for issues such as mirror devices going offline, filesystems filling and CPU/disk failures, all of which could cause compute jobs to fail or hang silently. I designed and developed a simple monitoring system to report on a number of these issues via the head nodes. This immediately picked up several previously unreported problems, which could then be resolved quickly, improving the overall service.

TECHNOLOGY AREAS

Rocks Cluster

Linux: CentOS, Ubuntu, Scientific Linux

FlexLM, MPI compilers

KVM virtual machines

Bash

LDAP integration


HP Enterprise Services 12/2006 – 05/2014

Solaris/Linux systems engineer

OVERVIEW

Solaris, Red Hat Linux, shell scripting, design/build, support and troubleshooting for several clients, notably the Department for Work and Pensions, Aegon, Rolls-Royce Aerospace, Rolls-Royce Marine Power and the UK Ministry of Justice.

KEY ACHIEVEMENTS

Instigated and implemented the redesign of Solaris & Linux build systems for the UK Ministry of Justice:

Scripted & documented modular, repeatable new builds for all hardware, with the ability to deploy updates (e.g. security fixes) retrospectively to existing servers in a controlled manner. Amongst other things, this resulted in much improved (and predictable) upgrade and patching processes, saving many hours of valuable administrator time.

Reverse-engineered key elements (NIS/Jumpstart) of the Rolls-Royce Unix/Solaris build system and re-engineered them to enable DHCP-based Jumpstart:

This removed a block in the critical path of the ongoing network infrastructure upgrade programme, enabling it to continue to its original timescales, saving time and money.

Identified £140,000 in Oracle RDBMS licensing savings by specifying alternative (T-series) hardware for the Rolls-Royce Strategic Sourcing project.

Designed and implemented a unique ZFS-based backup and disaster-recovery system for a bespoke Ministry of Justice application:

Combining Solaris zone restarts, ZFS snapshots and rsync cloning enabled the application to be securely copied and restarted within 3 minutes, well within the allowed downtime of one hour per day. This saved the project embarrassment, time and money, as without this solution the service level agreements would have had to be renegotiated with the customer.

Redesigned and implemented the cross-site MoJ Unix NTP infrastructure to eliminate single points of failure and adhere to security policy. The previous implementation had caused service outages as database, application and NFS servers drifted out of sync. The new design performed flawlessly, with no further outages since implementation.

TECHNOLOGY AREAS

Enterprise systems design/build/support: Sun/Oracle M8000/M3000, E10K, E25K, V-series, T-series, X-series (Intel).

System design/build: OVM (LDOMs), zones, ZFS, Veritas (Symantec) Volume Manager/VxFS, Solaris Volume Manager (DiskSuite), Jumpstart/JET, LVM, NFS.

SAN (multi-pathing with EMC PowerPath, MPxIO, Veritas DMP & Linux native multi-pathing).

Solaris Cluster.

NetApp, HP EVA and EMC Clariion connectivity.

Red Hat Enterprise Linux, Oracle Enterprise Linux, Kickstart.

Patching/upgrades planning/deployment.

Monitoring client deployment (Tivoli, SiteScope & Xymon).

Scripting/packaging: ksh, bash, Bourne shell, sed/awk, Perl, Tcl/Tk, Solaris packaging, RPM, application/OS integration (e.g. for automated application zone shutdown/snapshot/restart).

Other: Sendmail configuration, DNS, LDAP proxy, SSH, rack layout/cabling, performance troubleshooting, Data Protector, data migration, security scanning/remediation, firmware updates, cross-site NTP design/implementation, Solaris resource management, technical disaster recovery (bare-metal restore process planning/testing, offline boot), server consolidation planning/implementation.

Siemens Energy Services 11/2001 – 12/2006

Senior Unix systems manager and team leader

OVERVIEW

Responsible for the design, implementation, maintenance and operation of the mainly Sun/Solaris-based Unix systems, storage and backup infrastructure.

KEY ACHIEVEMENTS

Disaster recovery: Wrote the scripts and processes to recover the entire Unix-based service off-site from bare metal onto non-matching hardware, significantly reducing contract costs.

Designed an optimal, standardised system disk configuration and implemented it on all existing (hitherto individually configured) servers. This provided a building block for a number of subsequent system improvements, including a better and more predictable performance profile and improved backup/recovery times.

Designed and implemented a new backup network using existing equipment, improving throughput eightfold. This enabled backups to be completed within the required window each night, improving the service to users online during normal working hours, with no capital outlay.

Worked directly with vendors to obtain best value, particularly for older test servers, where the original manufacturer would charge inflated prices for both the initial purchase and the subsequent maintenance of equipment nearing end-of-life. Purchasing legitimate second-user equipment and keeping spares on-site saved Siemens tens of thousands of pounds (probably a six-figure sum in total), all without compromising service quality.

TECHNOLOGY AREAS

System performance optimisation: Worked with the DBA team to optimise VxVM disk volume layouts for improved performance and resilience. Maintained the systems at up-to-date patch and firmware revisions.

Backup and recovery: Ensured that the daily backup procedures functioned correctly, amending them as required.

Wrote scripts to automate several tasks, including generating and web-enabling tape picking lists (for off-site storage). This enabled the operations staff to generate the lists on demand, saving valuable administrator time.

Developed, tested and amended semi-automated disaster recovery procedures to ensure smooth operation during the annual disaster recovery test and, if required, in a real disaster recovery.

System design, configuration, procurement and installation.

System monitoring: Ensured that the monitoring system captured all important system issues. As new, previously manually reported issues arose, created scripts to check for these unusual events: runaway processes, filesystem & volume size discrepancies, filesystems missing from backup schedules, network interfaces running at incorrect speeds/modes, etc.

Sun Microsystems (EDS/Rolls-Royce contract) 11/2000 – 09/2001

Unix consultancy – NFS server consolidation project

OVERVIEW


Contracted by Sun Microsystems to the EDS/Rolls-Royce Aero account, I provided technical consultancy, design and documentation for a large-scale NFS server migration project. The work ranged from low-level design and implementation up to presentation at board level.

Audited the Sun server estate (400+ systems, 90 of which were NFS servers). This involved gathering information from a variety of disparate databases, spreadsheets and reports, and from the servers themselves (system and performance data).

Using the information gathered, designed and documented the new system enabling the NFS server population to be reduced from 90 to 23, whilst increasing capacity, performance, resilience and manageability.

KEY ACHIEVEMENTS

Developed a database and a suite of Perl scripts to analyse existing server data, and made the results available by setting up a project web server on my own equipment. This enabled others on the project to extract relevant information as required, speeding up project progress.

Produced the detailed design & test documents, with cost justifications, for submission to the technical review board, allowing the board to make an informed decision to proceed.

Wrote performance monitoring & data-collection scripts (as no suitable performance monitoring system was in place at the time), together with a web-based interface for displaying graphs of the collected data over specified periods. This was used to compare existing system performance against the pilot and initial implementations, proving that the new system was viable and working correctly.

Wrote a set of quota-management/reporting scripts so that the pilot could begin before the delayed arrival of the commercially provided quota-management interface, removing a blockage in the critical path of the project.

Barron McCann Ltd 07/1997 – 11/2000

Unix/technical disaster recovery consultant

OVERVIEW

Unix consultancy, with a specific focus on technical disaster recovery of Sun/Solaris systems.

Major clients included IBM Business Continuity and Recovery Services, Powergen, Nortel Networks, M&G, Del Monte International, Toshiba UK, AstraZeneca and numerous local and county authorities.

KEY ACHIEVEMENTS

Devised the techniques and processes required to recover bare-metal Solaris systems onto hardware different from the original. This enabled us to offer lower-priced contracts, which brought in a lot of new business, and also saved money through smaller capital outlay.

As part of the disaster-recovery service, delivered, installed and configured a replacement for a customer's database & application server that had been stolen one evening. The customer had no technical recovery procedure in place, so I worked through the night to create and implement one, and fully recovered the system by lunchtime the following day. This saved the customer, whose business was shipping perishable goods, tens of thousands of pounds in lost revenue. The customer had previously given notice that the existing contract would not be renewed, not seeing the value in a disaster-recovery service; after these events, a new three-year contract was soon agreed.

Devised and presented training materials to teach our hardware engineers the basics of Unix system administration, particularly how to identify and troubleshoot attached storage systems. This enabled them to provide a more effective service, with fewer calls back to support staff.


Adept Scientific PLC 01/1996 – 07/1997

Systems administrator

OVERVIEW

Reporting directly to the MD, I was responsible for the IT and communications infrastructure of the company.

Responsible for the technical management and development of the company's internal systems and connectivity.

KEY ACHIEVEMENTS

Redesigned and implemented a new structure for the company website, including a backend database (PostgreSQL) and many automated features pulling in external data sources, writing the interfaces where required. For example:

Automatically updated the company public training calendar from the internal Apple Mac calendar application, giving customers easy access to up-to-date information.

Automatically generated an online telephone database from the SDX phone system. This ensured that all phone information was consistent throughout the company, reducing communication errors.

Automatically mirrored selected areas of US business partners' websites locally (transatlantic transfer rates being noticeably poor at the time). This allowed customers to research product information that had previously been difficult to access.

Implemented Lotus Notes to replace the existing cc:Mail system.

Designed/implemented the company IT infrastructure database to aid management and reporting.

Managed the implementation of the WAN link to the newly opened US office.

Rothamsted Research - Biotechnology & Biological Sciences Research Council (BBSRC) 09/1987 – 01/1996

Analyst/programmer, systems administration and team leader

OVERVIEW

As part of the IT services team, I was responsible for the design and implementation of systems providing services to around 1000 desktops over three sites. Before taking on the systems manager/team-leader role, I was a scientific analyst/programmer for four years, writing bespoke data visualisation programs in Fortran & C on DEC VAX and Sun hardware.

KEY ACHIEVEMENTS

Designed and implemented the corporate networked desktop infrastructure, based on Windows 3.11, PC/TCP, Hummingbird Exceed and various GNU tools (MS-DOS versions of make, sed, grep and awk) on the client side, along with NFS and NetWare servers. Clients used an NFS-shared distribution of Exceed (a PC X server) and NFS-shared configuration files, enabling central upgrades and management. Services were provided by Unix and VAX/VMS servers via the X Window System. This configuration enabled the scientific staff to access all available systems from a single desktop, making their work more efficient and reducing support costs.

Provided a bespoke graphics/data visualisation programming service (in Fortran and C) for scientific staff, enabling them to develop insights into their data that were hitherto difficult, if not impossible, to reach.

Designed and implemented the corporate applications access solution, first written in C with the XView libraries and later as an improved, more flexible version in Tcl/Tk. Once again, this improved access and efficiency, and reduced support costs.

Set up one of the first 100 web servers in the UK, and collaborated with a plant pathology researcher to make some of the very first scientific reference images available to researchers and students on the Internet.

AFRC (Agriculture and Food Research Council) 1986

Student Placement

Plant pathology & entomology research projects. Interfaced microcomputers (Apple II / BBC Micro) to lab equipment and wrote bespoke programs for data capture and interpretation.

Unilever 1985

Student Placement

Microbiology research project. Computing work included data preparation and modifying proprietary application code used to control transmission electron microscopes.

EDUCATION

University of Hertfordshire 1992

BSc Computer Science (with distinction)

OTHER COURSES

Sun Microsystems, SA-285, Solaris 2 System Administration - Exam 210-004 (Solaris System Administration II) 85%, leading to Sun accreditation (EL1000)

Sun Microsystems, Solstice DiskSuite

Sun Microsystems, SA-380, Solaris Network Administration - Exam 210-005 (Solaris TCP/IP Network Administration) 87%, leading to Sun accreditation (EL1500)

Sun Microsystems, SA-340, SunNet Manager

Veritas Volume Manager

Veritas NetBackup

Oracle DBA part 1

Sun Microsystems, SA-400, Solaris system performance management

Sun Microsystems, ES-400, Sun E10000 system management

Sun Microsystems, SA-225-S10, Solaris 10 new features for experienced system administrators

Sun Microsystems, Sun Cluster 3.2

HP-UX System Administration for Experienced UNIX System Administrators

Others: Java programming, advanced VAX/VMS Fortran programming, VAX/VMS DCL, OSF/1 administration, image analysis, Uniras graphics library programming, System 1032 database, Datatrieve, technical writing.
