too many websites v2
TRANSCRIPT
e-Delivery Team
Page Count             <50   <1000  <2000  <3000  <4000  <5000  <10000  <20000  <100000
Site Count             1891    738    251    120     63    113       4      28       25
% of total              58%    81%    89%    93%    95%    98%     98%     99%     100%
Cumulative Site Count  1891   2629   2880   3000   3063   3176    3180    3208     3233
[Chart: Gov.UK - Web Site Page Counts. Site count (axis 0 to 2000) and %age of total site count (axis 0% to 100%) by page-count band, e.g. 0<x<50, 50<x<1000 etc]
High count with fewer than 50 pages – many are redirects (where the domain has changed or is not active, e.g. direct.gov.uk), plus sites that use ASP or frames, making it impossible for Google to spider behind the first page
But, notwithstanding the inability to spider some sites, it looks clear that the vast bulk of .gov sites have fewer than 2,000 pages
And only a few sites have huge page counts (between 20,000 and 100,000) … including ir.gov.uk, dh, scotland, ons, hmso
Pages per site
Using Google to spider and count the pages of all 3233 .gov.uk sites … around 90% of sites have fewer than 2,000 pages
Fewer than 1% of sites have more than 20,000 pages
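The cumulative figures quoted on this slide can be re-derived from the per-band site counts. The following is a minimal sketch in Python that uses only the numbers from the page-count table; the variable names are illustrative and not part of the original analysis.

```python
from itertools import accumulate

# Page-count bands and the number of sites in each band,
# copied from the page-count table on this slide.
bands = ["<50", "<1000", "<2000", "<3000", "<4000", "<5000",
         "<10000", "<20000", "<100000"]
site_counts = [1891, 738, 251, 120, 63, 113, 4, 28, 25]

# Running total of sites, band by band.
cumulative = list(accumulate(site_counts))
total = cumulative[-1]

# Round to whole percentages, as on the slide.
cumulative_pct = [round(100 * c / total) for c in cumulative]

for band, count, cum, pct in zip(bands, site_counts, cumulative, cumulative_pct):
    print(f"{band:>8}  {count:>5}  {cum:>5}  {pct:>3}%")

# Share of sites in the largest band (20,000 to 100,000 pages).
share_largest = 100 * site_counts[-1] / total
print(f"Sites with more than 20,000 pages: {share_largest:.1f}%")
```

On these figures the band below 2,000 pages comes out at 89% of sites, which the slide rounds to 90%, and the largest band holds under 1% of sites.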
The Google Data - Raw
The Google data shows: more than 80% of the content (in pages) is found in around 10% of the total count of sites
There are huge numbers of very small sites (per Google), although that may be because Google is unable to spider, or does not cover, all sites through the entire hierarchy
Still, errors in Google indexing are likely to be consistent across the entire population of .gov sites, so the shape of the graph is likely ok
[Chart: Google's site sizes. Site size (axis 0 to 100000) against a percentage axis (0% to 100%)]
Counting Servers
Checking on the servers operating
behind the websites in .gov.uk
Over 1,200 running Apache
And more than 1,500 running Microsoft IIS
These figures don’t include servers that
may be configured but not active for, e.g.
resilience. They also don’t include
servers further down the infrastructure
stack, e.g. running content applications
or other code
Naturally, each of these servers is likely
to be accompanied by firewall and
storage configurations
At a conservative cost of £10,000 per
server, the total cost of this infrastructure
alone is over £29,000,000
Server family         Version               Count
Apache                (total)                1209
                      Apache                  186
                      Apache/1.3.26           274
                      Apache/1.3.27           282
                      Apache/1.3.28            62
                      Apache/1.3.29            99
                      Apache/2.0.40            25
                      Apache/2.0.45             1
                      Apache/2.0.46            32
                      other Apache            248
Microsoft-IIS         (total)                1547
                      IIS/4.0                 377
                      IIS/5.0                1103
                      IIS/6.0                  65
                      other IIS                 2
Lotus-Domino          Lotus-Domino            109
Netscape-Enterprise   Netscape-Enterprise      74
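The infrastructure cost quoted on this slide follows from the server counts and the assumed per-server cost. A minimal sketch in Python, using only the family totals from the table and the slide's £10,000 figure (the names are illustrative):

```python
# Server counts per family, copied from the table above.
servers_by_family = {
    "Apache": 1209,
    "Microsoft-IIS": 1547,
    "Lotus-Domino": 109,
    "Netscape-Enterprise": 74,
}

# The slide's "conservative" cost per server, in pounds.
COST_PER_SERVER_GBP = 10_000

total_servers = sum(servers_by_family.values())
total_cost_gbp = total_servers * COST_PER_SERVER_GBP

print(f"Servers counted: {total_servers:,}")
print(f"Estimated cost:  £{total_cost_gbp:,}")
```

The four families sum to 2,939 servers, giving £29,390,000, consistent with the "over £29,000,000" stated on the slide.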
Cost of Websites (Benchmarking)
[Chart: sites placed along a cost scale: 250k 500k 750k 1.0m 1.25m 1.5m 1.75m 2.0m 2.25m 2.5m 3.0m]
Sites shown: HMT, OFT, TheRegister.com, DWP The Pension Service, JC+, DH, ONS, Worktrain, Business.gov?, JC+ (development), Worktrain (development), DfES, Large Quasi-Public Sector (fully loaded), DfT
Not on record: dti, IR, HMCE, Home Office, DEFRA, ODPM
Figures drawn from recent PQ (and, unless stated, include only hosting charges and not development or development support)
Characteristics of .gov.uk sites
More than 3200 sites
More than 2.5 million documents
Huge - up to 100,000 pages
Complex - nine levels deep
More than 200 URLs per dept
More than 300 authors
Some parts of the site not linked to others - ‘orphan content’
100s of broken links
Slow - download time more than one minute
Unreliable - poor uptime
Inconsistent - five different look and feels
More than three navigation designs
Looking For The Right Thing?
30/03/2004
Disability Living Allowance 14,700
Child Tax Credit 5,790
Carers Allowance 915
Working Family Tax Credit 546
Attendance Allowance 13,000
Council Tax Benefit 42,000
Housing Benefit 77,800
Statutory Sick Pay 6,200
Self Assessment 14,000
Using Internet search engines in an
effort to find “the right thing” can be
challenging. The search terms above
were entered, with the results restricted
to the “.gov.uk” domain only
There is a huge amount of duplication
in government online:
Many local authority sites repeat the
description of the rules for claiming
certain benefits, where to claim, what to
claim for and so on … and doubtless,
every year or so, each of these
mentions must be updated with the
correct rules (but what if they’re not?)
Even “self assessment” only has 4,950
mentions on the Inland Revenue’s own
site, but a further 9,000 across the rest
of government
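The split for “self assessment” can be checked with simple arithmetic. A minimal sketch in Python, using the result count from the list of search terms and the Inland Revenue figure quoted above (the variable names are illustrative):

```python
# Figures from the slide for the search term "self assessment".
total_gov_uk_results = 14_000   # results across the .gov.uk domain
inland_revenue_results = 4_950  # mentions on the Inland Revenue's own site

# Mentions that sit somewhere other than the owning department's site.
elsewhere = total_gov_uk_results - inland_revenue_results
share_elsewhere = 100 * elsewhere / total_gov_uk_results

print(f"Mentions outside the Inland Revenue site: {elsewhere:,}")
print(f"Share of all .gov.uk mentions: {share_elsewhere:.0f}%")
```

This gives 9,050 mentions elsewhere, matching the “further 9,000” on the slide, or roughly two thirds of all mentions sitting outside the owning department's site.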
And how does .gov look to the consumer?
The variety of sites shows little in
the way of consistency
Navigation varies from site to
site, sometimes on the left,
sometimes tabbed, sometimes
graphic, sometimes text
“Search” is called different
things, is often not on the home
page and often returns poor
results – despite research
showing that consumers who
can’t see what they want
instantly will use search
Accessibility is poor, with many
sites not attempting to clear even
the lowest hurdles
Even sites owned by the same
parent are confusing, e.g.
pensionservice, pensionguide,
agepositive, over50 …
The Missing Data
To complete the picture and allow the proposed plan of action to be fine
tuned, the following data is needed:
Visitor counts (Hitwise may offer an approximation)
Approximate costs to operate (at an infrastructure level including all servers,
network equipment, firewalls, software licences etc) – both price bought at and the
price for continued operations projected forwards (to allow for annual licence
premiums, renewals etc that may be due in the future)
Contractual agreements around exit arrangements, renewal dates etc along with
whether the contract for web hosting is part of a wider technology outsource
agreement (that might, therefore, make it harder to exit)
Proposal For What Next
Principles
Government is in the business of helping citizens by making information easy to find. The total number of websites needs to be rationalised dramatically – from over 3,000 to under 600 in the first stage (including Local Authorities).
Government is in the business of presenting information in a way that citizens will understand; it is not in the user interface design business. The range of navigational and interface styles needs to be harmonised to a single core style.
Government has already spent significant sums on its online presence, yet government is not a technology leader. The cost of the programme outlined must be absorbed through savings generated in the first year of the programme, making it self-funding.
Government buys in cycles and these are likely to be maintained. This cycle will allow work to be completed at a constant pace as contracts come to their natural end, thus incurring no exit penalties.
A programme of rationalisation this large will require multiple parallel streams of work – the cost of the overlap slightly reduces the savings available but increases the odds of success by eliminating bottlenecks and delays.
DotP versus Everything Else
Condensing 3,000+ sites to a few hundred is no simple task. It will likely
require a variety of approaches and software solutions to ensure that there
are no bottlenecks.
DotP’s primary characteristics are:
A managed service model (i.e. hardware, software, network included)
A high end content management engine allowing customised workflow, complex
information architectures and large numbers of geographically dispersed authors
Highly resilient, scalable and secure infrastructure reducing the risk of failure
A model to allow changes to sites through configuration, not code customisation
A range of features tailored to solve government’s main content problems
Other content engines usually:
Come as a software licence with extensive customisation required
Have a range of features that DotP doesn’t have and that have been developed
over several product cycles, primarily for commercial customers. Some of these
features will be useful for government
Will develop competitively no matter what government does
But they rarely come as managed services, so hosting and
management must be added separately
Setting Up The Programme
Select a core of important websites based on:
Total size (aiming to isolate 50% of the content in government)
Visitor count (capturing a large chunk of the audience, say 50%)
Transaction generation (targeting the bulk of online transactions for both business
and citizen)
Content management status (looking first for unmanaged systems still based on
HTML or those that are not well advanced in terms of a content engine)
Outline the information architecture as it is, coupled with the target
architecture for how it should be – taking each site and fitting it into an overall
architecture and design that is consistent across all of them
It is assumed that these sites – ranking as the most popular and largest in
government – will need rearchitecting to make the most of them (including a new
layout, new navigation and so on)
This rework will give a good chance to eliminate duplication and inconsistency, as
well as remove as much as 30-50% of content as redundant (based on experience
with Department of Health).
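The "total size" criterion above (isolating 50% of the content) can be sketched as a simple largest-first selection. This is a minimal illustration in Python, not the team's method: the function name `select_core_sites` and the page counts are hypothetical, invented for the example, and the other criteria on the slide (visitor count, transactions, content management status) are not modelled.

```python
def select_core_sites(page_counts, target_share=0.5):
    """Pick the largest sites until they hold at least target_share of all pages.

    page_counts maps a site name to its page count. Returns the selected
    site names (largest first) and the share of pages they cover.
    """
    total_pages = sum(page_counts.values())
    if total_pages == 0:
        return [], 0.0

    selected, covered = [], 0
    # Walk the sites from largest to smallest, stopping once the target is met.
    for site, pages in sorted(page_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target_share * total_pages:
            break
        selected.append(site)
        covered += pages
    return selected, covered / total_pages


# Hypothetical page counts, invented purely for illustration.
example_sites = {
    "site-a": 90_000,
    "site-b": 40_000,
    "site-c": 15_000,
    "site-d": 3_000,
    "site-e": 1_500,
    "site-f": 500,
}

core, share = select_core_sites(example_sites)
print(core, f"{share:.0%}")
```

With a size distribution as skewed as the one Google reports, a selection like this would be expected to reach the 50% target with a small number of sites; the same function could be run on visitor counts instead of page counts for the audience criterion.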
Establishing The Target Platforms
To identify the target platforms, the following is proposed:
A “bake off” competition is kicked off where a variety of content management
vendors are given an environment (with workspace, hardware and network
connectivity).
Each vendor is given the same brief – to take an existing, static website – the
“challenge site” - with a known information architecture and transfer it to a new
target architecture (also provided).
The vendors then set up their systems, using templates and guidelines provided
by government, to deliver the challenge site under strict timescales – including
defining the architecture, implementing the style guidelines, integrating the search
engine and migrating the content
At the end of the competition, a subset of the vendors who have met previously
agreed and published criteria is passed through to the next stage
Commercial agreements are then built – using standard templates – with the
vendors, allowing for volume discounts on licences to be obtained.
Websites in the core population are then allocated across vendors and the
implementation task kicked off. Vendors that perform are given more, vendors
that don’t perform are gradually eliminated and their work shared across other,
more successful vendors
Why a Bake Off?
Migrating some 3,000 websites is a fearsome task. Here is why there should
be more than one solution in play:
The problem is not one of only technology – the changes required to government
editorial processes are enormous. The greater the range of experience thrown at
this, the better the result
One single system (or even two or three) would result in bottlenecks that would
delay rationalisation. Having several “similar” but independent systems will
resolve the bottleneck
One large system would be high risk – a single outage could take down
government’s online presence – spreading the systems will, in the end, reduce risk
versus cost.
Competition is healthy – a few players working both together (to complete the
goal) and against each other (to complete the goal first and therefore win
business) will work well
But we need only a few (5, 6, 7?) – too many will bring too high an overhead and
put quality standards at risk
Estimating the Costs
The costs of migration will include:
The initial work to identify candidates
The evaluation of target platforms
The setting up of migration environments
The cost of redesign of some sites to make them consistent with the target
standard (e.g. search engine on home page, navigation through tabs, reducing the
depth of the site etc)
The cost of redesigning pages to fit the new system – e.g. where the site uses
custom techniques that are not easily replicable
The actual migration of data from one format to another (there are tools that claim
to do this, with varying success, or manual methods – these too will need to be
assessed)
Integrate … Marriott.com
One URL
13 brands
Five major redesigns
2,600 locations
142,000 people
Rationalise … IRS.gov
235 sites … to one
47% e-filing
25 million regular users
AOL cache data at peaks
80% of e-filers do it again
Accountants starting to charge $35 for
those who want to do it on paper