
HOW TO IDENTIFY CREDIBLE SOURCES ON THE WEB

by

Dax R. Norman

National Security Agency

PGIP Class 0001

Unclassified thesis submitted to the Faculty

of the Joint Military Intelligence College

in partial fulfillment of the requirements for the degree of

Master of Science of Strategic Intelligence.

19 December 2001

The views expressed in this paper are those of the author and

do not reflect the official policy or position of the

Department of Defense or the U.S. Government.



ACKNOWLEDGEMENTS

Foremost, I am thankful for the endless patience of my wife and

daughter, who for two years worked and played one man short of a full team,

and often carried the ball when I should have.

I am grateful to Professor Jerry P. Miller, Director of the Competitive

Intelligence Center at Simmons College in Boston, for his patient and

persistent help in constructing the thesis survey.

I would also like to thank LTC (ret) Karl Prinslow, at the time, a

contractor employed by the U.S. Army Foreign Military Studies Office, for his

practical assistance, and encouragement.

Thank you must also go to my Thesis Chairman, Dr. Alex Cummins and

Thesis Reader Robyn Winder for their conscientious support of the Joint

Military Intelligence College Masters program by volunteering to serve as

Thesis Chairman and Reader.


Appendices

A. Web Site Evaluation Worksheets

B. Survey to Industry and Academia

C. Survey to Intelligence Community

D. Criteria Analysts Currently Use to Judge Credibility

Bibliography

Annex 1. Survey Results (not included in original thesis)


LIST OF GRAPHICS

Tables

1. Question 8a to 8r, Recommended Criteria and Relative Values (Mean)

2. Questions 9a-f, Required Level of Source Credibility for Intelligence Products

3. Question 5, Part 1, Official Criteria for Unclassified Sources

4. Question 5, Part 2, Official Criteria for Classified Sources

5. Questions 7a, b, c, j, k, l, m, Credibility of Well-Known Titles

6. Questions 7d, e, f, g, h, i, Credibility of Obscure Titles, and Foreign Web Sites

7. Questions 7n to 7s, Credibility of All Classified Sources

8. Credibility of Open Sources Compared to Classified Sources

9. Question 7q, Credibility of IMINT Without Annotations

10. Benchmark Web Site Evaluation Work Sheet, Spot

11. Benchmark Web Site Evaluation Work Sheet, ITU

12. Benchmark Web Site Evaluation Work Sheet, NY Times

13. Benchmark Web Site Evaluation Work Sheet, Korea

14. Blank Web Site Evaluation Work Sheet

15. Survey Question 6: Credibility Criteria Analysts Currently Use

Graph

1. Question 7q, Credibility of IMINT Without Annotations


ABSTRACT

TITLE OF THESIS: How to Identify Credible Sources on the Web.

STUDENT: Dax R. Norman

CLASS NO. PGIP 0001 DATE: 19 December 2001

THESIS COMMITTEE CHAIR: Dr. Alex Cummins

SECOND COMMITTEE MEMBER: Robyn Winder

There is little argument today that open sources and the World-Wide-

Web have a role to play in intelligence, but little has been written about

evaluating the credibility of Web sites and communicating that evaluation to

analysts. Such a capability is needed because of the increased opportunity to

collect open source intelligence from the Web; the ever increasing cost of

classified collection; and the ever-present demand on analysts to analyze and

report at the edge of their knowledge. With so many intelligence sources

available, including the Web, analysts must be able to identify credible

sources. The alternative is to evaluate every piece of information collected

from every Web site of intelligence interest. Due to the enormous size of the

Web, evaluating data validity is not practical.

That is why the Intelligence Community (IC) needs a generally agreed

upon set of criteria for evaluating Web sites of potential intelligence value.

Credible Web sites can be identified. However, without these criteria, and a

method to share the results, hundreds of analysts will repeatedly find the

same Web sites of dubious credibility as other analysts; they will attempt to


evaluate the site's usefulness and credibility by many widely different

standards, and share their results with only a few close coworkers. The

quality of these Web site evaluations will vary widely based on the subject of

the Web site and the subject expertise of the evaluator.

This thesis collected criteria recommended by professional Web

searchers and surveyed industry, academia, and the Intelligence Community

for their opinions of those criteria. From this survey the author developed a

weighted list of credibility criteria and a methodology that both the subject-

matter expert and the subject-matter novice will find useful. With these

criteria and the relative credibility scale, subject-matter experts throughout

the IC can evaluate Web sites within their area of expertise and share that

source evaluation with the entire IC.

This thesis identifies valid criteria for evaluating the credibility of open

source Web sites; presents a relative credibility scale based on benchmarked

Web sites; identifies the target level of credibility for all intelligence sources;

offers a Web site evaluation worksheet; and compares the credibility of open

sources to classified sources. Credible information can be located on the Web,

and although subject-matter experts are the best evaluators, any analyst can

evaluate a Web site when he does not have a subject-matter expert to assist

him.



CHAPTER 1

INTRODUCTION TO OPEN SOURCE EVALUATION

Along with the information technology revolution has come an equally

important increase in information access and information sources via the

World-Wide-Web. However, such abundance is a double-edged sword because

the Web contains every type of print, audio, and visual data from every type

of source, including children, students, professors, conspiracy theorists,

researchers, advertisers, government data, and government misinformation.

Information analysts must sort the useful information from the junk.

However, what is useless for one person may be just right for someone else.

This thesis will establish Intelligence Community criteria for distinguishing

credible Web sites from untrustworthy, or non-credible, Web sites. This thesis

used a survey structured to answer several key issues and the research

question: how to identify credible sources on the Web. The hypothesis was

that credible Web sites can be confidently identified by evaluating the Web

sites based on criteria recommended by professional Web searchers and

agreed to by intelligence analysts. Most analysts today apparently evaluate

the data rather than the source.


VALIDITY MATTERS

This thesis will also show that most analysts do not attempt to identify

credible sources, but evaluate the validity of the data in the sources. There

is a common misunderstanding about validity and credibility. Validity is an

attribute of information. Validity also describes information as

simultaneously relevant and meaningful. Validity can also refer to the proper

use of logic to reach a conclusion.1 In psychometrics, validity can have

several meanings, including the proper use, or function of a measurement

tool.2 This thesis uses validity as an attribute of data that is verifiably correct.

Validity is what the analyst means when he asks, "Is this data correct?"

Although validity is important to intelligence, it always describes the

information rather than the source, and alone does not measure believability,

which this thesis calls credibility. Because discrete elements of information

can be examined and compared, the validity of information is of most

concern to analysts because analysts know how to check validity. They

examine the data for consistency, verify it with other sources, or verify that it

functions as expected. Although consistently valid data can lead to credible

sources, the goal should be to identify sources as credible so that every

document from the source does not have to be validated. Establishing

source credibility should be of greater interest to analysts because they

cannot become expert in every subject on which they may be expected to

report, because organization focus changes, analysts change jobs, and there

just is not enough time to learn it all and still report.

1 G. & C. Merriam Co., Webster's New Collegiate Dictionary (Springfield, MA: G. & C. Merriam Co., 1975), under Valid. Cited hereafter as Webster's.

2 Jum C. Nunnally, Psychometric Theory (New York: McGraw-Hill Book Company, 1967), 75.

This thesis will provide a tool for the general analyst to evaluate Web

sites as potential intelligence sources. Although Web site evaluations are

best done by subject-matter experts, analysts are often expected to report on

unfamiliar topics, and must discern for themselves if a source is credible.

Experts will also be able to use the recommended criteria and credibility

scale to evaluate Web sites in a consistent manner that other people will

understand, and can repeat.3

CREDIBILITY COUNTS MORE

To judge validity, an analyst must understand the issue, or technology,

or strategy, or politics very well for every data element included in his

reporting. Because every analyst cannot possibly be an expert on every

subject, they rely on sources that they trust to provide valid data. This trust

in a person or group is a measure of credibility. A credible source offers

reasonable grounds for being believed.4 This is the meaning intended in

this thesis for credibility.

3 See Appendix A, Web Site Evaluation Worksheet, for the relative credibility scale, benchmark Web site evaluation worksheets, and a blank evaluation worksheet.

4 Webster's, under Credible.

These credible sources are an essential element of intelligence

analyses because analysts are often expected to report on topics in which

they are not expert, or that are too complex for any one person to

understand. Because it is impractical for analysts to validate every data

element from every source, the focus should be on identifying credible

sources. In the area of Open Source Intelligence (OSINT), this is even more

important because of the widespread use of OSINT by the other intelligence

disciplines, and the multitude of unclassified open sources.5 The source must

be judged credible before the data can be judged valid. Of course this can

become a circular argument, but in the end it is more useful to have a

credible source than a valid data element. For example, it would be better to

know where to find a foreign leader's official travel schedule, than to know

where the leader will travel next. This is true because this credible source

can tell one where the next trip will be, any changes to his next trip, and the

details of subsequent trips. If a source provides valid data consistently, it

will soon be judged a credible source. However, once judged credible, it is

less important that every data element the source provides is validated.

Note that open source information (OSINF) is public or proprietary

information available to anyone for a fee or for free. OSINF becomes open

source intelligence (OSINT) when it is used by the Intelligence Community to

answer an intelligence question.

5 Joint Chiefs of Staff, Joint Pub 1-02, Department of Defense Dictionary of Military and Associated Terms, URL:, accessed 13 February 2000. Cited hereafter as Joint Pub 1-02. This thesis uses intelligence disciplines, such as OSINT, as defined in Joint Pub 1-02.

THE CHALLENGE OF CREDIBLE SOURCES

Regardless of the credibility of a source, or the validity of the data,

analysts are more likely to use the sources most accessible to them. The

Web has the potential to put a worldwide library on the desk of every analyst.

With today's search engines and Web-directories an analyst can conduct a

single search of the Web in seconds that would take a librarian a career to

complete. This is because the librarians know which sources are credible

based on their own use of the sources or recommendations from other

librarians and subject-matter experts. Therefore, it stands to reason that

intelligence analysts, who do not have access to a subject-matter expert on

every reportable issue, should have access to credible information sources on

the Web. How to identify credible sources on the Web is the challenge of this

thesis.

In an ideal world, subject-matter experts in every field would identify

credible sources, and index them for everyone to use. However, even in such

a world there would be disagreement on what is credible. Therefore, the

research question that this thesis will answer is how to identify credible

sources on the Web. The focus is on Web sites because library science and

publishers have already established acceptable standards in the print media

for credibility. Such standards include peer-review in scientific journals,

editorial review in newspapers, independent verification of facts, and the

proper labeling of commentary and advertisements in magazines. In the

absence of such standard practices on the Web, it is up to the reader to

judge. With the help of expert Web searchers from industry, defense, and

intelligence, this thesis establishes a set of common credibility evaluation

criteria, which can be used by subject-matter experts as well as analysts

reporting on an unfamiliar issue. Some subjectivity remains, but the criteria


are established which provide analysts with the tools and vocabulary to

measure credibility of sources and describe a source's relative

trustworthiness, known as credibility.

ASSUMPTIONS

This thesis does make some assumptions. The first two are that open

source intelligence is less costly than classified intelligence, and therefore is

the preferred source if it can be trusted. The third assumption is that

credibility is relative to its intended use and user. For example, a CNN

broadcast might be sufficiently credible for indications and warning (I&W),

but not sufficiently credible for basic intelligence for which the analyst has

some time to conduct research, or when the product will become the

background for future reporting. Likewise, a second-hand report of the

humanitarian conditions in a country may be credible enough for a person

planning an overseas visit; however, only a first-hand report from an

authoritative, unbiased source may be considered for the subject of an

intelligence report. Therefore, a relative credibility scale is necessary rather

than an absolute determination of credible or non-credible.
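The idea of a relative scale can be illustrated with a short Python sketch. It is only an illustration of the third assumption: the 0-100 scale, the target levels, the score given to the broadcast, and the name credible_enough are all hypothetical and are not taken from the survey results.

```python
# Hypothetical target levels on a 0-100 relative credibility scale.
# The numbers are invented for illustration and are not survey results.
REQUIRED_LEVEL = {
    "indications and warning": 50,
    "basic intelligence": 80,
}


def credible_enough(source_score, product_type):
    """Return True if a source's score meets the target level for the product type."""
    return source_score >= REQUIRED_LEVEL[product_type]


broadcast_score = 65  # assumed score for a news broadcast

for product_type in REQUIRED_LEVEL:
    verdict = "sufficient" if credible_enough(broadcast_score, product_type) else "not sufficient"
    print(f"{product_type}: {verdict}")
```

Under these assumed numbers the same source passes for one product and fails for the other, which is the sense in which credibility is relative to its intended use.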

A UNIQUE STUDY

Although other studies establish criteria for evaluating Web sites, such

as Alison Cooke's Authoritative Guide to Evaluating Information on the

Internet, I have not found a study that focuses on establishing the credibility


of Web sites.6 Cooke's work is an excellent guide to evaluating the overall

quality of many types of Web sites. The closest Joint Military Intelligence

College study found is MAJ Robert M. Simmons's unclassified thesis, Open

Source Intelligence: An Examination of Its Exploitation, 1995.7 Simmons

focuses on the accessibility and use of open source, not the credibility of

sources. Although Reva Basch's Secrets of the Super Net Searchers includes

the question of credibility, it is less formal than this study and asks the

credibility question differently of each expert interviewed.8 Secrets of the

Super Net Searchers does not focus on any one issue, but asks many

questions of the industry experts. However, many criteria from Basch's book

were included in the thesis survey used for this study. This thesis surveyed

analysts from defense, intelligence, and academia, as well as industry, to

establish common criteria for evaluating the credibility of Web sites.9 The

broad survey population, which included industry, academia, and

intelligence, and the focus on credibility, make this study unique.

6 Alison Cooke, Authoritative Guide to Evaluating Information on the Internet (New York: Neal-Schuman Publishers, Inc., 1999).

7 Major Robert M. Simmons, USA, Open Source Intelligence: An Examination of Its Exploitation in the Defense Intelligence Community, MSSI Thesis (Washington, DC: Joint Military Intelligence College, August 1995).

8 Reva Basch, Secrets of the Super Net Searchers (Wilton, CT: Pemberton Press, 1996).

9 E-mail Survey, Joint Military Intelligence College Thesis Survey: Credibility Criteria for Web Sites, conducted by the author, July-August 2001. Hereafter cited as Survey.

REVIEW OF THESIS

The research for this thesis began with a literature review, found in

Chapter two. From the literature several authors were selected who either

represent a significant point of view or are in a position to influence other

analysts. The objective of the literature review was to identify what is

already known, or thought, about identifying credible sources on the Web.

However, the literature also revealed tangent issues that influence how or

when unclassified open sources are used in intelligence products. Most

significantly, the literature review identified the criteria recommended by

expert Web searchers for judging the credibility of Web sites. Those criteria

were included in the thesis survey, which was the primary research tool used

by the author.

Chapter three describes the research methodology employed. That

methodology included gathering expert criteria from the literature review;

developing and administering the survey to industry, academic, and

intelligence analysts; coding the survey results and entering the data into the

SPSS statistical program; and performing the calculations which answered

the research questions and the key issues. The recommended credibility

criteria were determined by identifying the criteria that analysts most often

rated as contributing 50 percent or more to the credibility of a Web site; then

determining the relative weights for each criterion and a relative credibility

scale. Finally, four Web sites of known credibility were evaluated as

benchmark sites. Chapter three describes this process in detail as well as

how the target source-credibility level was determined for most intelligence

products.
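The selection and weighting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used for the thesis. The criterion names, the sample ratings, and the helper names select_criteria and relative_weights are hypothetical. The sketch also assumes that "most often rated" means a simple majority of respondents, and that a criterion's relative weight is its mean rating divided by the sum of the means of the selected criteria.

```python
from statistics import mean

# Hypothetical survey data: each respondent rates, from 0 to 100, how much
# a criterion contributes to the credibility of a Web site.
ratings = {
    "author is identified": [80, 60, 50, 90],
    "sources are cited": [70, 50, 40, 60],
    "attractive page design": [10, 20, 50, 30],
}

THRESHOLD = 50  # "50 percent or more", as described in the text


def select_criteria(ratings, threshold=THRESHOLD):
    """Keep criteria that a majority of respondents rated at or above the threshold."""
    selected = {}
    for name, scores in ratings.items():
        share = sum(score >= threshold for score in scores) / len(scores)
        if share > 0.5:  # assumption: "most often" means a simple majority
            selected[name] = scores
    return selected


def relative_weights(selected):
    """Weight each kept criterion by its mean rating, normalized to sum to 1."""
    means = {name: mean(scores) for name, scores in selected.items()}
    total = sum(means.values())
    return {name: value / total for name, value in means.items()}


weights = relative_weights(select_criteria(ratings))
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
```

With this invented data, the page-design criterion is dropped because only one of four respondents rated it at 50 or higher, and the two remaining criteria share the total weight in proportion to their mean ratings.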


The results of the survey calculations are shown in the findings

chapter, Chapter four. The findings chapter, like the methodology chapter, is

organized to answer the research question and each key issue. In short,

the key issues are open source relevance to intelligence,

knowledge of existing official criteria, analysts' objectivity, credibility of

foreign Web sites in English, and credibility of classified versus unclassified

sources; the research questions concern evaluation criteria and the needed level

of credibility.

The conclusions are in Chapter five, and include analysis of the survey

results. The thesis concludes that credible Web sites can be identified,

evaluated, and shared with other analysts. Known weaknesses in the survey

are mentioned in the findings and conclusions chapters. Chapter six also

includes a recommendation for implementing this evaluation procedure in

the Intelligence Community. The appendices include a copy of the surveys

used; the completed evaluation worksheets for the benchmarked Web sites;

and a blank evaluation worksheet.


CHAPTER 2

LITERATURE REVIEW

RANGE OF THOUGHT

Open source information (OSINF) has been widely accepted as a

necessary element of all-source intelligence reporting, as demonstrated by

Director of Central Intelligence Directive 2/12, which established the

Community Open Source Program Office.10 Most experts agree that OSINF

should support classified intelligence collection. However, I think there has

not been significant attention paid to the issue of identifying credible Web

sites, a significant source of unclassified information. The Web makes foreign

newspapers and gray literature (documents with limited distribution such

as company brochures, or equipment manuals), more accessible, as well as

expert opinions, and research projects from universities, just to name some

valuable sources.11 The issue of identifying credible Web sites affects

everyone who uses the Internet, including defense, intelligence, academia,

and industry. Therefore, the literature reviewed for this study included

documents from all of these communities of interest. The authors presented

in this study include: Robert David Steele of Open Source Solutions Inc.; Dr.

Wyn Bowen of King's College, London, writing for Jane's Intelligence Review;

A. Denis Clift, President of the Joint Military Intelligence College (JMIC),

Washington, D.C.; Reva Basch, author of Secrets of the Super Net Searchers;

and Alison Cooke, author of Authoritative Guide to Evaluating Information on

the Internet. These authors are all in a position to influence information

analysts, either inside or outside of government, and represent a range of

opinions on the proper use of open source information.

10 Director of Central Intelligence, Director of Central Intelligence Directive 2/12 (Washington, D.C.: n.p., 1 March 1994). Hereafter cited as DCID 2/12.

11 Basch, 110.

All these points of view agree that there is more data available now

than an analyst can manage unaided. Their approach is what differs. Steele

and Bowen would expand the Intelligence Community, which is not going to

happen without a long, and gradual culture change. Clift sees a need for

better automated tools for data retrieval, including an on-line index of open

sources.12 Cooke and Basch offer solutions for today: evaluate sources based

on criteria similar to those used for traditional print media. This thesis will

demonstrate that the ideas of each of these authors, combined with the

recommended evaluation criteria in this thesis, represent a practical solution to

the information fog of the Web.

12 A. Denis Clift, Clift Notes: Intelligence and the Nation's Security (Washington, D.C.: Joint Military Intelligence College, 1999), 51-57.

Robert David Steele, Open Source Solutions, Inc.

Steele is the most vocal advocate for expanded use of OSINF to

support the other intelligence disciplines, and recommends expanding the

Intelligence Community to include business people and academics, who have

unique knowledge and access. Steele would have analysts consult open

sources first, including subject experts in industry and academia, and then

classified sources. He is President of Open Source Solutions Inc. His

company is in the private open source intelligence (OSINT) business, and he

has proposed his own plan for intelligence in the 21st Century, called

Intelligence and Counterintelligence: Proposed Program for the 21st

Century.13 Steele sees a great need to expand the access that analysts have

to OSINF.14 His view of the future Intelligence Community (IC) includes

several new groups, including scholars and business people, which constitute

the Virtual IC.15 It is these sources that Steele sees as the gold mine of

information. However, he does acknowledge that the Internet will greatly

expand access to OSINF, primarily secondary sources, which are derived from

an original source. He also suggests that OSINF may be used as a source of

tip-offs to serious issues that warrant classified collection.16 However, his

stand that classified intelligence is only useful in the context of what is

already known from open sources borders on accepted practice.

13 Robert D. Steele, Intelligence and Counterintelligence: Proposed Program for the 21st Century, URL: , accessed 5 January 2000. Cited hereafter as Steele, Intelligence.

14 Steele, Intelligence, under Introduction.

15 Steele, Intelligence, under Part III Figure 18.

16 Steele, Intelligence, under Part III.

Dr. Wyn Bowen, Open-source Intelligence.

Bowen is an academic concerned about information overload, and

would add non-government subject-matter experts to the intelligence

collection process, as Steele suggests. Bowen thinks that subject-matter

experts should be the people to evaluate Web sites, which is unique in this

literature review. However, he sees open sources as an adjunct to classified

sources, not the source of first resort as Steele suggests. Bowen, who is a

professor at King's College, London, and writes for Jane's Intelligence Review,

demonstrates the invaluable resources available through open sources in his

article Open-source Intelligence: A Valuable National Security Resource.17 He

uses weapons proliferation as a demonstration case. This case is very

effective because it reduces the issue to tangible products of intelligence

value found in the public domain. Bowen thinks that the role of OSINF is to

provide the context of classified information.18 He also dwells on the issue of

information overload, which concerns Clift. However, he would add non-

government subject-matter experts to the collection process, as Steele also

suggests. Bowen thinks the experts' role should be to identify the useful

sources to keep and collect (not specific data), and the worthless sources to

ignore. In his view, experts would also serve to evaluate sources for

inaccuracy, bias, irrelevance and disinformation, which non-experts would

find difficult to do.19

17 Dr. Wyn Bowen, Open-source Intelligence: A Valuable National Security Resource, Jane's Intelligence Review, 1 November 1999, Dow Jones Interactive, Publications Library, All Publications, Search Terms Open Source Intelligence, URL: , accessed on 4 March 2000.

18 Bowen, under Technical Sources.

19 Bowen, under Conclusion.


A. Denis Clift, President of the Joint Military Intelligence College

Clift is also concerned about information overload, and sees a need for

better automated selection tools to solve the analyst's selection problems.

Clift is President of the Joint Military Intelligence College (JMIC) in Washington,

D.C. His views are his own and do not represent those of the U.S.

Government; however, as President of the JMIC, Clift is in a position to

influence the opinions of analysts graduating and going on to work in

intelligence. He also served as Editor for the United States Naval Institute

Proceedings, early in his career, from 1963 to 1966.

In Chapter five of Clift Notes: Intelligence and the Nation's Security,

Clift gives a short explanation of the open source programs available today to

support the intelligence analyst.20 He defends the Intelligence Community's

record on making open source information (OSINF) available to intelligence

analysts. He gives an overview of the OSINF programs available to the

analysts, but does not indicate how accessible the information is. I observed

lines of analysts waiting to use Internet terminals in the JMIC library in 1999

and 2000. This is an example of why it should be clear to the Intelligence

Community (IC) that OSINF will only be used to its highest potential when it is

on the analyst's desk. The work lost walking to a terminal down the hall or in

the next building is not worth the effort to analysts unfamiliar with the

sources, or inundated with other sources at their fingertips. Clift writes that

OSINF plays an important role in intelligence, and states that the IC already

has a good collection of OSINF in Central Information Reference and Control

(CIRC) of the National Air Intelligence Center and the Defense Scientific and

Technical Intelligence Centers.21 He notes the serious difficulties analysts

have with information overload and the need for better automated selection

tools.22 However, the technology Clift wants is not yet intelligent enough to

discern credible sources from non-credible sources. As will be demonstrated

in the findings chapter, determination of credibility requires research, and

corroboration, and has a measure of subjectivity.

20 Clift, 51-57.

21 Clift, 54.

22 Clift, 56.

Reva Basch's Secrets of the Super Net Searchers

Basch does not address the Intelligence Community, but does address

the issue of how to select trustworthy Web sites. Basch, as well as Cooke,

takes the most practical approach to finding credible information in the flood

of electronic data. Both recommend using evaluation criteria similar to those

used for print media, with some variations.

Basch published Secrets of the Super Net Searchers in 1996, after

interviewing 35 of the best Internet searchers. In 1996, she was the news

editor for ONLINE, DATABASE, and ONLINE USER magazines and had been an

online researcher for about 21 years. Since then, she has published a series

of Super Searchers books. For Secrets of the Super Net Searchers she

conducted informal interviews with expert researchers, each of which

represents a chapter in Super Searchers. Her questions covered many issues

affecting online researchers and included the following, which relate to Web

site credibility: 23

What is the quality and reliability of information on the Web?

Are some types of sites more reliable than others?

How are biased sources treated?

How are the quality and reliability of unfamiliar Web sites judged?

Is there a relationship between credibility and longevity?

Many of the experts Basch interviewed had something useful to say

about source credibility, and their comments were consolidated into several survey

questions for this thesis.

There is disagreement whether information from personal Web sites is

credible. Susan Feldman stated in Super Net Searchers that a Web site

written by Joe Schmo might be way ahead of McGraw-Hill. So you're left to

your own devices to analyze and evaluate.24 However, Mary Ellen Bates, also

interviewed by Basch for Super Net Searchers, stated at a WebSearch

conference in Virginia on 10 May 2001 that she does not rely on personal

Web sites unless they are well known.25

23 Basch, 3.

24 Basch, 31.

25 Mary Ellen Bates, Presentation to WebSearch University Conference in Reston, VA, 10 September 2001.

Alison Cooke, Authoritative Guide to Evaluating Information on the Internet

Cooke also does not address the Intelligence Community, but does

address the issue of how to select trustworthy Web sites. Cooke also

recommends using evaluation criteria similar to those used for print media,

with some variations.

Alison Cooke, who is a professional Internet searcher, wrote in 1999

the Authoritative Guide to Evaluating Information on the Internet. The

author's implicit thesis is that although there is much useless, outdated, and

difficult to authenticate information on the Internet, high quality information

can be found and the quality can be assessed.26 Like Clift and Bowen, Cooke

sees information overload as a serious challenge facing researchers, but

believes accuracy is of most concern to researchers. Her solution is to

carefully evaluate Web sites using criteria similar to criteria used to evaluate

print media.

EVERY MAN'S PRINTING PRESS

There are widely accepted criteria for evaluating traditional print

media. These criteria include the reputation of the publisher and author,

peer-review of scientific articles, and editorial review of periodicals.27 Such

criteria work well when the number of publishers in a particular field is

quantifiable and their past work can be located and reviewed. However,

desktop publishing programs, personal computers, and the Web have

enabled hundreds of thousands of people to produce professional-looking

articles and distribute them to millions of potential readers without the

26 Alison Cooke, Authoritative.

27 Jan Alexander and Marsha Tate, The Web as a Research Tool: Evaluation Techniques, Wolfgram Memorial Library, Widener University, Chester, PA, URL: accessed 13 March 2001.


benefit of peer or editorial review, or regard for brand name reputation.

Among the millions of Web pages available to the public today are many of

potential intelligence value produced by proud inventors, boisterous

government agencies, self-promoting corporations, community-minded

colleges, naive public servants, happy vacationers, and zealous

revolutionaries. The issue at hand today is how to identify credible

information among the millions of personal, organizational, industry,

academic, and government sources. There are as many opinions on this

topic as there are open source researchers and intelligence analysts.

INFORMATION GAPS

Even after a Web site is evaluated based on the criteria presented in

Basch, Cooke or Alexander, the issue of credibility still remains. How does a

subject-matter novice know which sources he can believe? The other issue is

that of relativity. Is a Web site that is credible enough for a high school term

paper also credible enough for a basic intelligence report, or for an

intelligence warning report? This study answers both of these questions.


CHAPTER 3

METHODOLOGY

This chapter on methodology and the following chapter on findings are

organized by key issues and research questions. The key issues are

obstacles that must be overcome before the research question can be

answered. The key issues include: how is open source information relevant

to intelligence; do analysts know of existing official credibility criteria; are

analysts biased toward popular source titles; are foreign sites in English less

credible; and how does the credibility of classified sources compare to

unclassified sources? To answer the research question of, how to identify

credible sources on the Web, it was necessary to separate the question into

two parts. The first part of the research question was what criteria can be

used to identify credible Web sites. The second part of the research question

was how credible should any intelligence source be. The methodology relies

on logic and statistics, and is somewhat complex due to the many steps

necessary to arrive at useful criteria that are accurately weighted. The

methodology begins with the development of the thesis survey.

KEY ISSUE: OSINF RELEVANCE TO INTELLIGENCE

Even before the survey could be developed, the basic question needed

to be answered: why is open source information relevant to intelligence? The


literature review provided several views on the role of open sources in

intelligence. The opinions of Steele and Clift offered convincing reasons that

intelligence must include open source information. The reasons for using

OSINF in intelligence products are included in the findings chapter.

SURVEY DEVELOPMENT

Although the primary research question was, how to identify credible

sources on the Web, this thesis needed to answer several key issues

regarding source credibility on the way to answering the primary research

question. Two research methods were used to answer the key issues and

research question. First, published literature was reviewed from Intelink,

online DIA course material, Lexis-Nexis, Dow Jones Interactive, the NSA

Library, and academic Web pages. This literature review uncovered some

answers to the key issues and provided the majority of the concepts tested

by the thesis survey.

Once the thesis survey was developed, it was given to a test

population of 15 intelligence analysts for a validity check. The 15 analysts

completed the survey, suggested adding questions and clarifying ambiguous

wording, and questioned the relevance of some questions. Those changes

were made and the second draft was given to Professor Jerry P. Miller,

Director of the Competitive Intelligence Center at Simmons College in Boston.

Miller offered numerous suggestions that improved the reliability of the

survey. He identified government lingo that would not likely be

understood in industry and academia, and recommended changes to the


survey questions to maintain Likert-type scales for the responses. Likert

scales are a recognized method in social sciences to format survey response

options that are understood by most populations and can be used to measure

evenly a population's opinions.

The second draft was also sent to LTC (ret) Karl Prinslow, project

manager and operations officer of a virtual organization that employs over

150 military reservists who work via telecommuting to collect and acquire

open source information in support of the Intelligence Community's

requirements. Prinslow suggested several format changes that ensured all

recipients were able to display the survey on their computers, and would be

comfortable replying with anonymity. Prinslow and Miller suggested adding

the personal information disclosure statement. Prinslow also recommended

E-mailing the survey as an ASCII text message rather than an MS-Word

document, and simplified some questions. The text message enabled

anyone who was able to receive the E-mailed survey to respond to it without

special software.

After making the changes suggested by Miller and Prinslow, two

separate surveys were distributed by E-mail. In the coding and analysis, the

two surveys were treated as one survey, with some questions not applicable

to the whole population. The Intelligence Community (IC) Survey included

several questions at the end, which would not apply to industry or academia,

and it was distributed by internal communications. The Industry Survey

included the same questions as the IC Survey without the IC-unique

questions. The IC Survey was E-mailed to a group of about 100 IC analysts

who have an interest in open source intelligence (OSINT). The exact number


of IC analysts cannot be determined because it was sent to a mail-list, which

often changes. This method had the effect of randomizing the population

selection. One of these 100 analysts E-mailed the survey to 18 other IC

analysts. Four of these 18 E-mailed the survey to 238 others, for a total of

356 IC analysts. This chain of events was evident from the E-mail headings

and some respondents informed the author who forwarded the survey to

them. About 50 participants from a Society of Competitive Intelligence

Professionals (SCIP) conference were then contacted by telephone and agreed

to participate in the E-mail Industry Survey. The Industry Survey was then

E-mailed to those 50 and 9 Defense Department analysts. One of the 9

Defense analysts E-mailed the survey to about 120 other defense analysts. A

total of about 179 analysts are known to have received the Industry Survey.

Together, the two surveys reached about 535 analysts who have an interest

in Internet research. With 66 responses, this equates to a 12.3 percent

response rate from a randomly selected population.28
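
The distribution counts above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the approximate recipient counts reported in the text; the variable names are illustrative and are not part of the thesis.

```python
# Recipient counts as reported in the text (all are approximate).
ic_recipients = 100 + 18 + 238        # mail-list, first forward, second forward
industry_recipients = 50 + 9 + 120    # SCIP contacts, Defense analysts, forward
total_recipients = ic_recipients + industry_recipients
responses = 66

# Response rate as a percentage of everyone known to have received a survey.
response_rate = 100 * responses / total_recipients

print(ic_recipients, industry_recipients, total_recipients)
print(f"{response_rate:.1f} percent")
```

Because the mail-list membership was not fixed, the denominator is an estimate, and the resulting rate should be read as approximate.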

RESEARCH QUESTION AND SURVEY STRUCTURE

The survey was structured to answer several key issues and the

research question: how to identify credible sources on the Web. The

hypothesis was that credible Web sites can be confidently identified by

evaluating the Web sites based on criteria recommended by professional Web

searchers and agreed to by intelligence analysts. The thesis survey asked

this question directly in survey question 6, and indirectly in survey questions

28 Appendices B and C include a copy of the E-mailed surveys.


8a through 8r. Question 8 listed the criteria most often mentioned by

published experts. Here is how the survey asked these questions.29

6. List up to five criteria that you use to determine the credibility of any information source.

a.
b.
c.
d.
e.

8. How much credibility does each of the following factors add to the total credibility of a Web site? Use the following scale:

___6) 100 percent Credibility
___5) 75 percent Credibility
___4) 50 percent Credibility
___3) 25 percent Credibility
___2) 10 percent Credibility
___1) 0 percent Credibility

a. Recommended by a subject-matter expert.
b. Recommended by a generalist.
c. Listed by an Internet subject guide that evaluates Web sites.
d. Listed in a search engine such as Alta Vista.
e. Listed in a Web-directory organized by people, such as yahoo.
f. Content is perceived current.
g. Content is perceived accurate.
h. A peer or editor reviewed the content.
i. Content's bias is obvious.
j. Author is reputable.
k. Author is associated with a reputable organization.
l. Publisher, or Web-host is reputable.
m. Content can be corroborated with other sources.
n. Other Web sites link to or give credit to the evaluated site.
o. The server or Internet domain is a recognized copyrighted or trademark name such as IBM.com.
p. There is a statement of attribution.
q. Professional appearance of the Web site.
r. Professional writing style of the Web site.

To avoid influencing the responses to survey question 6, analysts were

first asked to list the criteria they currently use; they were later asked to

29 Survey, questions 6 and 8.


evaluate the list of criteria in questions 8a through 8r. If the survey

population had been asked about specific criteria (question 8) before being

asked what criteria they actually use (question 6), they may have been

influenced to include the listed criteria from question 8 as criteria that they

use. This arrangement was necessary because earlier discussions with

analysts revealed that there were criteria that analysts would use only after

they were told of them. Discussions with analysts prior to the survey

development had also revealed that many analysts do not know how they

determine what is a credible source, and that many analysts may only

evaluate the data, and not the source.

As is shown in the findings chapter, many analysts were confused

about the difference between data validity and source credibility. The

categorized results of question 6 were then compared to the specific criteria

analysts approved of in question 8.

RESEARCH QUESTION: CREDIBILITY CRITERIA

The results of questions 6, and 8a through 8r were used to develop the

recommended credibility criteria and credibility scale in the findings chapter.

The recommended criteria were determined by computing the mode (score

most often chosen) for each criterion in survey questions 8a through 8r, and

by examining the variance of the responses to each criterion. An unusual amount

of variance would indicate little agreement among the analysts. Only criteria

from question 8 that scored a mode of 50-percent credibility or greater were

included in the recommended criteria list. This means analysts most often


believe (mode) that the satisfaction of any one of these recommended

criteria made the source at least 50-percent credible.

Then the arithmetic mean (average) credibility was calculated for each

recommended criterion from question 8 and became that criterion's relative

value. The relative value is how much more important, on average, analysts

think one criterion is than another criterion. The assumption here is that

such attributes are cumulative, and the more recommended criteria a site

satisfies, the more credible is the site.
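
The selection and weighting described above can be sketched in code. This is only an illustration under stated assumptions: the responses are invented, the names `SCALE_PERCENT` and `recommend` are hypothetical, and the mean is taken over the scale codes (1 to 6) rather than over the percentages, which this section of the thesis does not state explicitly.

```python
from statistics import mean, mode

# Scale codes from survey question 8 mapped to percent credibility.
SCALE_PERCENT = {1: 0, 2: 10, 3: 25, 4: 50, 5: 75, 6: 100}

# Hypothetical responses (scale codes 1-6); illustrative only, not survey data.
responses = {
    "8a subject-matter expert": [5, 5, 6, 4, 5],
    "8d search engine listing": [2, 3, 2, 1, 2],
    "8m corroborated content": [6, 5, 6, 6, 4],
}

def recommend(responses, threshold_percent=50):
    """Keep criteria whose mode meets the threshold; weight each by its mean."""
    relative_values = {}
    for criterion, codes in responses.items():
        # A criterion is recommended when its most-often-chosen score
        # corresponds to at least 50 percent credibility.
        if SCALE_PERCENT[mode(codes)] >= threshold_percent:
            # Assumption: the relative value is the mean of the scale codes.
            relative_values[criterion] = mean(codes)
    return relative_values

relative_values = recommend(responses)
for criterion, value in relative_values.items():
    print(criterion, value)
```

Using the mode for selection and the mean for weighting keeps the two decisions separate: the mode answers whether analysts most often accept a criterion, while the mean expresses how strongly they value it relative to the others.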

The results of question 6 were categorized into a list of criteria that

analysts think they use to evaluate source credibility. The frequencies of

these criteria were calculated, and those criteria that were suggested by 50

percent of the analysts were added to the recommended criteria list.

Because the recommended criteria from question 6 were not evaluated on a

scale in the survey, they were arbitrarily assigned the average relative value

of those recommended criteria from question 8. This allowed the inclusion of

any criteria not included in question 8, but also did not significantly affect the

relative values of those criteria.
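
The handling of the question 6 results can be sketched the same way. The example below is a rough illustration with invented data; it assumes "suggested by 50 percent of the analysts" means at least 50 percent, and the names `question6_answers` and `question8_values` are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical categorized answers to question 6, one set per analyst.
question6_answers = [
    {"corroboration", "author reputation"},
    {"corroboration", "timeliness"},
    {"corroboration", "author reputation"},
    {"past reliability"},
]

# Hypothetical relative values already computed for question 8 criteria.
question8_values = {"8a": 5.0, "8h": 4.5, "8m": 5.5}

# Count how many analysts suggested each criterion.
counts = Counter(c for answers in question6_answers for c in answers)
analysts = len(question6_answers)

# Criteria from question 6 have no scale score, so each one that passes the
# cutoff receives the average relative value of the question 8 criteria.
default_value = mean(list(question8_values.values()))
added = {c: default_value for c, k in counts.items() if k / analysts >= 0.5}

print(sorted(added.items()))
```

Assigning the average keeps the added criteria from shifting the balance among the criteria that analysts actually scored.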

The following is a summary of the selection process for the

recommended criteria, and relative value calculation:

Step 1. Calculated the mode (most-often chosen) credibility (0-100 percent) of each criterion from survey question 8.

Step 2. Listed as recommended the criteria from question 8 that had a mode credibility of 50 percent or greater.

Step 3. Calculated the mean credibility (average analyst-chosen score) for each recommended criterion from question 8.


Scale:

___7) No Opinion
___6) 100 percent Credible
___5) 75 percent Credible
___4) 50 percent Credible
___3) 25 percent Credible
___2) 10 percent Credible
___1) 0 percent Credible

Analysts were then asked to choose the required level of credibility for:

9a. Research, or topic summaries.
9b. Current, day-to-day developments.
9c. Estimative, identifies trends or forecasts opportunities or threats.
9d. Operational, tailored, focused to support an activity.
9e. Scientific, and technical, in-depth, focused assessments.
9f. Warning, an alert to take action.

The mode response for each of these types of intelligence products

was calculated and is the product-credibility levels, which are shown in Table

2 in the findings chapter. The product-credibility level percentages were

converted into a score so that analysts can simply add the results of an

evaluation and compare the sum to the table of product-credibility levels.

The product-credibility level is also the credibility level that is needed

for sources that analysts use for a particular intelligence product. When a

potential Web site is evaluated, the analyst calculates the credibility score of

the evaluated site, and then compares it to the table of product-credibility

levels in Table 2. The sum of the evaluated Web site should be at least equal

to the product-credibility level of that type of intelligence product shown in

the table. The source-credibility level of each intelligence product type was

determined by calculating the percentage of a benchmarked very credible

Web site's score which would equal the product-credibility level that was

recommended by the surveyed analysts. For example, here is a theoretical


Web site evaluation, which also demonstrates how the product-credibility

level was determined.

Example:

Benchmark site credibility score = 46.75 points (100 percent Credible)
Product-credibility level of intelligence product: 35.06 (75 percent of 46.75).

Theoretical results of a Web site evaluation:

Meets Criteria 1 = 5 points
Meets Criteria 3 = 6 points
Meets Criteria 4 = 3 points
Meets Criteria 5 = 3.5 points
Meets Criteria 6 = 5 points
Meets Criteria 7 = 4.5 points
Meets Criteria 10 = 2 points
Meets Criteria 11 = 3 points
Meets Criteria 12 = 3.5 points
Meets Criteria 13 = 1.5 points
Meets Criteria 14 = 4.5 points

Sum of Evaluated Site = 41.5 points
Result: Exceeds the product-credibility level of 35.06
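
The same evaluation can be expressed as a short calculation. This is a minimal sketch that uses the benchmark score and the per-criterion points listed in the example; the function name `evaluate` and its parameters are hypothetical and are not part of the thesis.

```python
# Score of the benchmarked, very credible site, taken from the example.
BENCHMARK_SCORE = 46.75

# Points for each criterion the evaluated site meets, as listed in the example.
points_met = {
    1: 5, 3: 6, 4: 3, 5: 3.5, 6: 5, 7: 4.5,
    10: 2, 11: 3, 12: 3.5, 13: 1.5, 14: 4.5,
}

def evaluate(points_met, required_fraction, benchmark=BENCHMARK_SCORE):
    """Return (site score, required score, whether the site meets the level)."""
    site_score = sum(points_met.values())
    required = required_fraction * benchmark
    return site_score, required, site_score >= required

# A product that requires 75 percent credibility.
site_score, required, meets_level = evaluate(points_met, 0.75)
print(f"site score = {site_score}, required = {required:.2f}, meets level = {meets_level}")
```

Changing `required_fraction` to the level chosen for another product type (for example 0.5 or 1.0) shows how the same site can qualify for one kind of intelligence product and fall short for another.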

This summarizes the process recommended in this thesis to evaluate

the credibility of a Web site. This process is based on the theory that the

criteria recommended by expert Web searchers and approved by most

analysts are the best criteria for evaluating Web sites. The weight or relative

value of each criterion is based on the average score given the criterion by

analysts. The final evaluation is based on a comparison of the total values of

the evaluated site to the total values of the benchmark sites.


Discussions with analysts and the literature review indicated that well-

known publication titles are perceived as more credible than obscure titles,

even though the analysts may have never seen the well-known titles.

Therefore, to determine how objective analysts are, questions 7a through 7m

asked analysts to evaluate the credibility of 13 sources based only on their

titles. This key issue was answered by comparing the well-known titles in

survey questions 7a, b, c, j, k, l, and m, with obscure titles in survey

questions 7d, e, f, g, h, and i. Question 7 asked:34

7. How credible are the following information sources given only their titles? Choose one from the following scale:

___7) = Certainly True
___6) = Strongly Credible
___5) = Credible
___4) = Undecided
___3) = Non-credible
___2) = Strongly Non-credible
___1) = Certainly False

Well-known Titles:

a. NY Times
b. Washington Post
c. Harvard.edu Web site
j. NationalGeographic.com Web site
k. JanesDefenseWeekly.com Web site
l. InformationWeek.com Web site
m. DowJonesInteractive.com Web site

Obscure Titles:

d. RussianArmy.ru Web site in Russian
e. RussianArmy.ru Web site in English
f. IsraelIndependentNews.is Web site in Hebrew
g. IsraelIndependentNews.is Web site in English
h. FrenchIndependentNews.fr Web site in French
i. FrenchIndependentNews.fr Web site in English

34 Survey, questions 7a-7l.


However, there was a problem with how this question was structured

and the findings may not be valid. Judging from the comments in the

surveys, it was evident that analysts were not able to make credibility

judgments for many sources based on titles alone either because they had

personal experience with the sources, which influenced their judgments, or

because they were unwilling to make an uninformed judgment based on titles

alone.35

KEY ISSUE: FOREIGN LANGUAGE SOURCES

An issue related to source titles was, do analysts perceive foreign

sources published in their native language to be more credible than the

English language version of the same publications? This question was

answered by comparing survey questions 7d to 7e, and comparing 7f to 7g,

and comparing 7h to 7i. The validity of these questions was preserved by not

including any real publications or Web site titles, which the analysts may be

familiar with.36


___7) = Certainly True
___6) = Strongly Credible
___5) = Credible
___4) = Undecided
___3) = Non-credible
___2) = Strongly Non-credible
___1) = Certainly False

d. RussianArmy.ru Web site in Russian
e. RussianArmy.ru Web site in English
f. IsraelIndependentNews.is Web site in Hebrew
g. IsraelIndependentNews.is Web site in English
h. FrenchIndependentNews.fr Web site in French
i. FrenchIndependentNews.fr Web site in English

35 Survey, questions 7a-7l.

36 Survey, questions 7d-7i.

KEY ISSUE: CLASSIFIED VS. UNCLASSIFIED SOURCES

Discussions with IC managers and consultants often included

questions such as, how do classified sources compare in credibility to

unclassified sources and less often, how do classified sources compare to one

another. This is a comparison that is likely to change over time. One JMIC

professor explained that different intelligence sources seem to go in and out

of favor as access success improves for one source or another. These issues

were only included in the IC Survey and most analysts answered as though

they had an opinion. Therefore, questions 7n, o, p, q, r, and s asked:37


___7) = Certainly True
___6) = Strongly Credible
___5) = Credible
___4) = Undecided
___3) = Non-credible
___2) = Strongly Non-credible
___1) = Certainly False

The intelligence sources in question included:

7n. HUMINT sources with no reporting record
7o. HUMINT sources with a proven reporting record
7p. IMINT, with National analysts' annotations or comments
7q. IMINT, without National analysts' annotations or comments
7r. SIGINT reporting
7s. MASINT

37 Survey, questions 7n-7s.

Analysis of these questions included a calculation of the mode and

range for all sources included in question 7, and compared them to each

other. This provides an interesting comparison of classified and unclassified

sources. 38
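
A comparison of this kind can be sketched as follows. The ratings are invented for illustration and are not survey data; the helper `summarize` is hypothetical, and "range" is assumed here to mean the highest rating minus the lowest.

```python
from statistics import mode

# Hypothetical ratings on the 7-point scale of question 7; not survey data.
ratings = {
    "7a NY Times": [5, 6, 5, 5, 4],
    "7o HUMINT, proven record": [6, 6, 5, 7, 6],
    "7n HUMINT, no record": [4, 3, 4, 5, 2],
}

def summarize(scores):
    """Return the mode and the range (max minus min) of one source's ratings."""
    return mode(scores), max(scores) - min(scores)

summary = {source: summarize(scores) for source, scores in ratings.items()}

# List sources from the highest mode to the lowest for side-by-side comparison.
for source, (m, r) in sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{source}: mode = {m}, range = {r}")
```

The mode shows where opinion concentrates, while the range hints at how far individual analysts depart from it; a wide range alongside a high mode suggests the apparent consensus is weaker than it looks.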

ETHICS

The thesis survey relied on the truthful response from analysts

currently working in areas included in this survey. Such responses could be

critical of an analyst's employer or profession; therefore, the thesis included

the following statement intended to protect the respondents' anonymity.

PRIVACY: You do not need to include your name; however, if you choose to include your name, it will only be used by me to contact you if I need more information regarding your comments. I will not quote you directly unless you indicate in Questions 3 and 4 that I may do so. Otherwise, only me and my Thesis Chairman, Professor Alex Cummins will have access to respondent names. Any record of the names in association with the responses will be destroyed after the research is completed, except those names included in the thesis with permission.39

38 See Table 7, and Table 8 in the findings chapter.

39 Survey, Privacy.


CHAPTER 4

FINDINGS

This chapter first describes what was discovered in the literature

review that could answer the research question and the key issues. Then the

results of the survey are described, followed by how these results answered

the research question and the key issues. The survey determined what

criteria analysts use today to judge the credibility of an intelligence source,

which can be found in Appendix D. Even after consolidation, 148 separate

criteria were suggested by analysts, indicating little consistency in criteria, or

little understanding of the differences between data validity and source

credibility. Many of the suggested criteria appear to be measures of valid

data, or lists of known credible sources.40

The most significant result of the survey is the list of recommended

credibility criteria determined by surveying analysts' opinions of criteria

suggested by experts in the literature review. Only two expert

recommendations were rejected by the surveyed analysts. The survey also

showed that analysts see only a small difference in the credibility of open

sources and classified sources.41, 42

40 Survey, question 6.

41 Survey, questions 7a through 7s.

42 See Table 8 in findings chapter for comparison of classified and unclassified source credibility.


Just as useful as the credibility criteria is the credibility scale

developed by benchmarking known credible and known non-credible Web

sites. The benchmarked sites determined the expected score of a credible

Web site. The survey results also determined a target level of credibility for

intelligence sources, which was converted to a percent of the credible

benchmark score on the credibility scale. The benchmarking of known

credible and non-credible Web sites validated the criteria and demonstrated

that credible sources can be identified on the Web.43

KEY ISSUE: OSINF RELEVANCE TO INTELLIGENCE

Although all experts agree that open source information (OSINF)

contributes to intelligence, how OSINF should contribute is still an open

debate. Steele suggests that analysts should reference OSINF first, and then

classified sources, and presumably only then request further classified

collection to fill the intelligence gaps.44 This approach would acquire data

from the least expensive sources first. Steele calls for 5 percent of the

intelligence budget to be moved to support OSINF acquisition.45 He claims

this would increase timely intelligence by an order of magnitude. His comments

suggest an answer to the key issue of how relevant OSINF is to intelligence.

Open sources include what is already publicly known about a subject, and

therefore should represent the background and context of any intelligence

43 See Appendix A, Benchmarked Web Site Evaluation Worksheet.

44 Steele, under Part III.

45 Steele, under Part III.


report, and should be considered before any classified collection is

attempted. Not to do so would potentially waste funds and possibly put

people at risk for information that may have been found in a foreign Web

site, foreign newspaper, or company brochure. These open sources can also

be used to corroborate classified intelligence, thus contributing to the

credibility of a classified source. Because classified resources are so much

more expensive than open sources, open sources should always be the first

choice, followed by classified sources if the information is not available through open sources,

or if the open sources' credibility cannot be determined or is determined to be

too low. Therefore, OSINF affects the cost of intelligence, the timely access

to information, the context of intelligence, the credibility of intelligence, as

well as the content.

Bowen's recommendations to include subject-matter experts in the

intelligence collection cycle may be a practical way to implement the

evaluation process proposed by this thesis.46 Implemented community wide,

Bowen's cadre of OS subject experts could produce a significant savings in

time and money spent by countless analysts attempting to sort the useful

credible information from the useless and non-credible information. I have

observed that every analyst who makes use of Web sites for open source

intelligence must rediscover which sites are useful and credible, even though

an expert at another agency or just down the hall may have already

evaluated the site. Also, when a Web site is recommended by one analyst to

another analyst, there is no consistent way to evaluate the Web site and

express that evaluation to other analysts. This research produced a

46 Bowen, under Collection Strategy.


Bias. The researcher must understand the source's bias.51

Objectivity. Are the author's statements supported with reasoning

or facts? 52 Even a biased author can compensate for his bias by including

competing reasoning and facts.

Accuracy. Online sources are generally quicker than print media at

correcting errors.53 Even print sources include inaccurate information or

disinformation.54 I believe that this is significant because accuracy affects

credibility; therefore, Web sources should be more accurate and timely

than print media because the technology enables quicker revisions.

Expert opinion.

Rely on second party expert evaluation whenever possible, e.g.,

recommendations from professional associations, academic organizations,

subject experts.55

Informal networks of colleagues with different areas of expertise

inform one another of credible sources.56

Use second opinions to evaluate the accuracy of an author, which

can be done by posting related questions to appropriate news groups. 57

51 Basch, 9, 15.

52 Basch, 31.

53 Basch, 48.

54 Basch, 9.

55 Basch, 31.

56 Basch, 31.

57 Basch, 31.


Subject area Web pages created by subject librarians are a good

source of links to evaluated Web sites.58 I recommend evaluation sites

that explain their evaluation process.

Gray literature (documents with limited distribution such as

company brochures, or equipment manuals), best located on the Web, is

often published by very credible sources, including governments, and

corporations, which can be good sources for factual data. Interpretation

of the data may require an expert.59 I suggest asking a subject-matter

expert to distinguish facts from advertising in corporate literature.

Origin.

How close is the source to the origin of the data? 60

Discover the original source to avoid circular and false

corroboration.61

Corroboration. Can the information be corroborated?62

Corroboration is only effective if it is from diverse sources. This is another

reason it is important to know the origin of the data.

Current. Is the information current? 63

58 Basch, 139.

59 Basch, 40, 110.

60 Basch, 9.

61 Basch, 16.

62 Basch, 9, 96.

63 Basch, 132.


Know which publishers, universities, or companies are well

respected in your topic area.69 These are likely to be credible sources, or

able to identify credible sources.

Reputable publishers, well-known authors, and (peer) reviewed

publications are more credible than other sources.70

Attribution.

Does the source clearly identify itself and its purpose? 71

Indications of the source include the text of the Web site, the name

of the Web server in the URL, and the directory name in the URL, which

may include the author's name.72

Attribution should include the institution and a person,

with information on how to contact the author.73

I would also recommend viewing the Web site's HTML source code

for revision dates, and statements of attribution not shown in the Web

site's body.

Motivation.

Information has value; therefore, know why a source provides

information for free.74

69 Basch, 110, 137.

70 Basch, 32.

71 Basch, 16.

72 Basch, 140.

73 Basch, 140.

74 Basch, 77.


The presence of a counter on a Web site indicates the author cares

that people know that other people like his site enough to visit it.75

However, I am aware that counters have also been used to falsely

indicate that a site is popular when it is not. Therefore, counters are

probably not a reliable indicator of anything. A more relevant indicator of

popularity is how many and which other Web sites include links to the

evaluated site. I suggest using AltaVista's link: command in the

Advanced Search area to determine this. A search of relevant news groups

will also indicate what other people think of a Web site.

Relativity. What is a good source for one purpose may be

insufficient for another purpose.76 This is another reason that I think

that Web sites are best evaluated by subject-matter experts. A

novice or generalist who evaluates a Web site for someone else

should indicate his own level of knowledge in the topic area. This

also relates to thesis survey question 9, which asked analysts to

evaluate how credible a source must be to use it for different

intelligence products.

All of the statements listed above from respected Internet searchers

contributed to the thesis survey question 8, which asked how much

specific criteria contribute to the credibility of Web sites.

Alison Cooke's Authoritative Guide to Evaluating Information on the

Internet included three areas: what is high quality information, how to find it,

and how to evaluate it. Each of these areas contributed to the development

75 Basch, 132.

76 Basch, 133.


of relevant questions in the thesis survey. On the topic of high-quality

information, Cooke explains that some of the most common problems with

the Internet include:77

information overload

too much useless information

potentially inaccurate material

outdated material

Publishing has become so easy that researchers must comb through

thousands of supposedly related Web pages returned by search tools, which

do not even include databases, news services, and FTP sites. The citation

search engines are of no help in determining quality, or relevance. Most

search engines are only an index of Web pages found.

Cooke explains that without the filtering provided by commercial and

academic publishers, people publish because they can, not because they

have something useful to share.78 I have observed that this is a serious

problem because it camouflages the useful information and requires a great

amount of time to sort through. A useless site can have all the gloss, format,

and authoritative lingo of a useful site, yet have no useful content.

Cooke contends that accuracy is perhaps of most concern to

researchers and professionals. As an example of the accuracy issue, Cooke

explains that of forty WWW medical sites evaluated, only four included the

advice close to the authoritative published recommendations.79 I believe that

this level of inaccuracy is possible because Web authors are their own editor

77 Cooke, 89.

78 Cooke, 12.

79 Cooke, 62.


and publisher, allowing no opportunity for critical review which most scholars

and professionals welcome.

Methods for finding data on the Web are unique to the Web and online

sources. Cooke explains in great detail the advantages and disadvantages

of:

search engines

review and rating services

subject catalogs and directories

subject-based gateway services and virtual libraries

Cooke explains that search engines such as Excite and Lycos (or

AltaVista, which is still solvent) are comprehensive, unfocused, have poor

relevance ranking, and are not useful for finding or evaluating sources for

quality. They are also generally limited to Web sites and index every page on

every site, further multiplying the number of results per query.80 I have

observed that some search engines such as Google have resolved this

multiple indexing of a single site by displaying only the first indexed page,

unless one requests more.

Cooke also writes that subject catalogs and directories such as Yahoo

and Galaxy are more useful because site authors write the site descriptions;

catalog experts choose the hierarchy category to place the site; and only

sites are indexed, not every page. However, these sites are still very large,

and because the indexing is done by people rather than machines, as is the

80 Cooke, Chapter 2.


case with search engines, Web site directories are not revisited as often and

may become outdated.81

Cooke also wrote that rating and reviewing services use different,

usually unpublished criteria for rating the best sites. These include

Encyclopaedia Britannica's Internet Guide and Lycos Top 5 percent.82 These

are better still for finding high-quality sources because a person other

than the author has reviewed the site based on some criteria. However,

these criteria are targeted to a general audience, not the academic or

professional. Higher weight may be given to organization and graphics than

to content or accuracy, and the evaluators are not subject-matter experts.83

Cooke believes that the best place to find high-quality sources is from

subject-based gateway services and virtual libraries. These facilities are

designed by librarians or subject-matter experts, and use common indexing

methods used in libraries. They are often subject-matter specific and site

descriptions are evaluated and described by subject-matter experts.84

The last section of Cooke's book gives checklists of evaluation criteria

for several internet source types. The criteria can be used for overall

evaluation of Web sites, not specifically for credibility as this thesis does.

Cooke's criteria are based on surveys of hundreds of internet users, and were

81 Cooke, Chapter 2.

82 Cooke, Chapter 2.

83 Cooke, Chapter 2.

84 Cooke, 92.


validated by professional librarians. The unique evaluation criteria for each

type of Web site are fully described.

The source types described in this book, with general evaluation

criteria, included:

organizational WWW sites

personal home pages

subject-based WWW sites

electronic journals and magazines

image-based and multimedia sources

USENET newsgroups and discussion groups

databases

FTP archives

current awareness services

FAQs

Criteria for assessing an organizational Web site should include the

authority and reputation of the institution within its field, as well as the date

the page was last updated.85 Criteria for a subject-based Web site include

the purpose of the site, comprehensiveness, and whether the page includes

pointers to other sources for more information.86 Evaluation criteria for

electronic journals and magazines include the site's authority and reputation

as well as whether the site has been referenced by a known reputable journal

85 Cooke, 90.

86 Cooke, 97.


that filters its own articles for accuracy.87 These criteria were included in the

survey questions for this thesis.

SURVEY FINDINGS, CREDIBILITY CRITERIA

The primary purpose of the thesis survey was to identify criteria for

assessing the credibility of a Web site. The recommended credibility criteria

were determined by a multi-step process. First, all credibility criteria

recommended by experts in the literature review were listed, and then

consolidated. Then the consolidated list of expert criteria was included in

the thesis survey to industry and intelligence analysts as questions 8a

through 8r. Those criteria to which analysts most often gave a credibility value

of 50 percent or higher, were then listed as recommendations. Note that

only three criteria were rejected as credible by 50 percent or more

respondents. The first two were not recommended by experts, but were

added to assess the basic knowledge of respondents and as control

questions, which were not expected to be accepted by respondents.

Rejected criteria included:

8d. Listed in a search engine such as AltaVista.

8e. Listed in a Web directory organized by people, such as Yahoo.

8r. Professional writing style of Web page.

Then the mean credibility (average analyst-chosen score) was

calculated for each recommended criterion from question 8. The mean then

became the relative value or weight for each criterion.
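
The mean and mode calculation described here can be illustrated with a minimal Python sketch. The scale codes and the two rules applied (recommend only modes of 50 percent credible or higher; show the smallest value when multiple modes exist) come from the explanatory notes to Table 1. The function name summarize and the sample responses are hypothetical and are not data from the thesis survey; treating None as a missing response is likewise an assumption.

```python
from statistics import mean, multimode

# Scale codes from the Table 1 notes:
# 1=0%, 2=10%, 3=25%, 4=50%, 5=75%, 6=100% credible.
SCALE_PERCENT = {1: 0, 2: 10, 3: 25, 4: 50, 5: 75, 6: 100}


def summarize(responses):
    """Return (mean, mode, recommended) for one criterion's scale codes."""
    # Assumption: anything that is not a scale code counts as a missing case.
    valid = [r for r in responses if r in SCALE_PERCENT]
    avg = round(mean(valid), 2)
    # When several modes exist, report the smallest, as the table notes do.
    mode = min(multimode(valid))
    recommended = SCALE_PERCENT[mode] >= 50
    return avg, mode, recommended


# Hypothetical responses for a single criterion (None = missing response).
sample = [5, 5, 4, 6, 5, 3, 4, None, 5, 4]
print(summarize(sample))
```

Under these assumptions, the mean returned for a recommended criterion is the value that would serve as its relative weight.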

87 Cooke, 98.


The criteria recommended in survey question 6 were then listed, and

consolidated. The methodology planned to add to the list of recommended

criteria from question 8, those criteria from question 6 that were not already

on the recommended list, and that had a mode occurrence of 50 percent or

greater (at least half the analysts listed the criterion). Surprisingly, there

were no criteria recommended by half or more of the respondents in the

open survey question number 6. The criteria that were mentioned most

often were: corroboration (28 occurrences), bias (14 occurrences), reputation

of the source (10 occurrences), source's authority or credentials (8

occurrences), and presentation (7 occurrences).88 However, each of these

most-often suggested criteria, except source authority, was also suggested

by published experts discussed in the literature review, and was recommended

by 50 percent or more of respondents when asked about that specific criterion

in survey questions 8a-8r. Therefore, no additional criteria were added from

question 6.

Table 1 below includes the results of the criteria surveyed,

the relative values of each criterion, and which criteria were chosen for

recommendation.89

Table 1. Question 8a to 8r, Recommended Criteria and Relative Values (Mean).(a)

                                                                        Number of Cases
Criteria                                                                Valid  Missing   Mean  Mode     Recommended
8a. Recommended by subject-matter expert in the topic of the Web page.     66        0   4.94     5     Yes
8b. Recommended by a generalist.                                           65        1   3.65     4     Yes
8c. Listed by an Internet subject guide that evaluates Web sites.          63        3   3.56     4     Yes
8d. Listed in a search engine such as AltaVista.                           64        2   2.39     1     No
8e. Listed in a Web directory organized by people, such as Yahoo.          62        4   2.65     2     No
8f. Content is perceived current.                                          64        2   3.78     5     Yes
8g. Content is perceived accurate.                                         63        3   4.56     5     Yes
8h. A peer or editor reviewed the content.                                 65        1   4.52     5     Yes
8i. Content's bias is obvious.                                             65        1   3.06     4     Yes
8j. Author is reputable.                                                   64        2   4.64     5     Yes
8k. Author is associated with a reputable organization.                    65        1   4.42     5     Yes
8l. Publisher or Web host is reputable.                                    65        1   4.02     5     Yes
8m. Content can be corroborated with other sources.                        65        1   5.17     5     Yes
8n. Other Web sites link to, or give credit to the evaluated site.         65        1   3.68     5(b)  Yes
8o. Server or domain is copyrighted or trademark name, like IBM.com.       65        1   3.45     4     Yes
8p. Statement of attribution.                                              64        2   3.78     5     Yes
8q. Professional appearance of Web site.                                   65        1   2.86     4     Yes
8r. Professional writing style of Web page.                                64        2   3.16     3     No

(a) Table Explanatory Notes. Mode values: 1=0 percent, 2=10 percent, 3=25 percent, 4=50 percent, 5=75 percent, 6=100 percent credible. Mode is the most-often chosen score respondents gave each criterion. Only modes of 50 percent credible and higher are recommended. The Mean is the average score respondents gave each criterion. The Mean is assigned to each recommended criterion as its relative value; these values are later summed when evaluating a Web site.

(b) Multiple modes exist. The smallest value is shown.

88 See Table 15. Survey Question 6: Personal Criteria Analysts Currently Use to Determine Credibility.

89 Survey, questions 8a-8r.

The last step of the process to identify commonly agreed-upon

credibility criteria and to assign relative weights involved applying the


recommended criteria to known credible, and known non-credible Web sites,

to establish benchmarks and a relative credibility scale. Three credible sites

known to the author or recommended by a subject expert were evaluated to

establish the high-end of the relative credibility scale. The relative values of

each criterion that the site satisfied were then summed for the site's relative

credibility score. Then the average of the three credible Web sites was

calculated as the benchmark credible score. See Appendix A for the

evaluation worksheets, and detailed evaluation for these Web sites.
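
As a minimal sketch of the scoring step described above, the following Python fragment sums the Table 1 means of the recommended criteria that a site satisfies. The weights are the means reported in Table 1. The function name relative_credibility and the example set of satisfied criteria are hypothetical, and ignoring the non-recommended criteria (8d, 8e, and 8r) is an assumption; the actual evaluations are recorded in the Appendix A worksheets.

```python
# Relative weights: the Table 1 means of the fifteen recommended criteria.
WEIGHTS = {
    "8a": 4.94, "8b": 3.65, "8c": 3.56, "8f": 3.78, "8g": 4.56,
    "8h": 4.52, "8i": 3.06, "8j": 4.64, "8k": 4.42, "8l": 4.02,
    "8m": 5.17, "8n": 3.68, "8o": 3.45, "8p": 3.78, "8q": 2.86,
}


def relative_credibility(satisfied):
    """Sum the weights of the recommended criteria that a site satisfies."""
    # Assumption: criteria outside WEIGHTS (8d, 8e, 8r) contribute nothing.
    return round(sum(WEIGHTS[c] for c in satisfied if c in WEIGHTS), 2)


# Hypothetical evaluation: a site judged to satisfy four criteria.
example_site = {"8a", "8g", "8j", "8m"}
print(relative_credibility(example_site))
```

The benchmark credible score would then be the average of such sums over the three benchmark sites.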

It was surprisingly easier to find known credible Web sites to evaluate

than it was to find known non-credible Web sites to evaluate. This was

because it did not seem useful to benchmark a Web site so obviously non-credible

that no analyst would consider using it, negating the need for an

evaluation at all. Due to this difficulty, only one non-credible Web site was

evaluated. Due to concerns about potential libel claims, this non-credible

Web site will be referenced here by the pseudonym KoreanNewsSite. The

KoreanNewsSite was selected because the author had evaluated this site for

a previous research paper and had found it non-credible, and yet a challenge

to evaluate. The challenge to evaluating it came from its mix of very credible

links, unknown contributing authors, and non-credible articles by the

publisher. The key points that made the publisher's articles non-credible

included a general lack of authoritative citations to source documents, lack of

dates on the articles, a distinct bias camouflaged by corroborative facts, and

inaccuracies. Related newsgroup discussions indicated that the publishing

author had a poor reputation for these same reasons.


The figures below represent the relative credibility scale and how these

benchmarks were determined. Based on these evaluations, a very credible

Web site should rate a relative credibility score of about 46.75, and a non-

credible site should rate a relative credibility score of about 7.46.


Benchmark Credible Web Sites Evaluated                      Score
Spot Image Corporation, www.spot.com                        43.19
International Telecommunications Union, www.itu.int         48.24
NY Times On the Web, nytimes.com                            48.82
Average Score                                               46.75

Benchmark Non-credible Web Site Evaluated                   Score
KoreanNewsSite                                               7.46

Relative Credibility Scale:
46.75 = Very-Credible
7.46 = Non-credible
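
The benchmark arithmetic can be checked with a short Python sketch. The three scores are those reported above for the benchmark credible sites; the variable names are hypothetical, and rounding to two decimal places is an assumption made to match the precision of the reported figures.

```python
from statistics import mean

# Relative credibility scores of the three benchmark credible sites.
credible_scores = {
    "Spot Image Corporation": 43.19,
    "International Telecommunications Union": 48.24,
    "NY Times On the Web": 48.82,
}

# High end of the scale: the average of the benchmark credible scores.
VERY_CREDIBLE = round(mean(list(credible_scores.values())), 2)

# Low end of the scale: the single non-credible benchmark site.
NON_CREDIBLE = 7.46

print(VERY_CREDIBLE, NON_CREDIBLE)
```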

SURVEY FINDINGS, CREDIBLE ENOUGH FOR INTELLIGENCE USE

As discussed in the methodology chapter, having a relative scale is

useful from an academic perspective; however, to be of practical use, the

analyst must also know what the target or required level of credibility is for

a source he would like to use in an intelligence product. The required level of

credibility for intelligence sources was determined by survey questions

9a-9f, which asked:90

How credible must an intelligence source be to use its data in the following intelligence products?

7) No Opinion
6) 100 percent Credible
5) 75 percent Credible
4) 50 percent Credible
3) 25 percent Credible
2) 10 percent Credible
1) 0 percent Credible

9a. Research, or topic summaries
9b. Current, day-to-day developments
9c. Estimative, identifies trends or forecasts opportunities or threats
9d. Operational, tailored, focused to support an activity
9e. Scientific, or technical, in-depth, focused assessments
9f. Warning, an alert to take action

90 Survey, questions 9a-9f.

The following calculations were used to determine the product-

credibility level for six types of intelligence products. The mode was

calculated for survey questions 9a-9f. The mode is the most-often chosen

required level of source credibility. The statistics indicate that most analysts

believe that all types of intelligence products require that sources be 75

percent credible.91 This was a surprise because the author expected to see a

greater variance in the required levels of source credibility, with warning

intelligence requiring the least credibility and in-depth focused assessments

requiring the greatest level of credibility. This presumption was based on the

belief that analysts require less information about an imminent threat than

they do about a future scientific or political condition, because the potential

impact of ignoring the least threat is so much greater than ignoring the most

significant emerging scientific or political condition. Apparently, most

analysts do not understand the relationship of intelligence products to

outcomes, or the survey question was flawed.

However, using the survey results, the sources of all intelligence

products should be 75 percent credible. If the most credible Web sites have a

relative-credibility score of 46.75 as demonstrated above, then the sources of

intelligence products should score 75 percent of that, which is 35.06. Therefore,

the target-credibility level of any intelligence source is 35.06, as evaluated by the

recommended credibility criteria. The following table shows the most-often

chosen (mode) required credibility level for intelligence products.

91 See Table 2.


Table 2. Questions 9a-f. Required Level of Source Credibility for Intelligence Products.92

                                                    Number of Cases    Required Credibility
Intelligence Product                                Valid  Missing(b)  Mode           Range
9a. Research, special topic summaries                  35       31     50 percent(a)  0-100 percent
9b. Current, day-to-day developments                   35       31     75 percent     0-100 percent
9c. Estimative, identifies trends or forecasts         35       31     75 percent     0-100 percent
    opportunities or threats
9d. Operational, tailored, focused, to support a       35       31     75 percent     0-100 percent
    military, intelligence, or diplomatic activity
9e. Scientific or technical, in-depth, focused         35       31     75 percent     0-100 percent
    assessments of trends or capabilities
9f. Warning, an alert to take action                   35       31     75 percent     0-100 percent
Required-credibility level for all                                     75 percent
    Intelligence Product Sources

(a) Multiple modes exist. The smallest value is shown. Just as many respondents chose 75 percent.

(b) Missing responses are primarily because non-Intelligence Community personnel were not asked these questions in the survey. Mode is based on valid responses.
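
The target-credibility arithmetic discussed before Table 2 can be sketched in Python as follows. The benchmark score of 46.75 and the 75 percent requirement are taken from the text; the names REQUIRED_FRACTION, TARGET, and credible_enough are hypothetical, and treating a score equal to the target as acceptable is an assumption rather than a rule stated in the survey results.

```python
BENCHMARK_CREDIBLE = 46.75  # average score of the benchmark credible sites
REQUIRED_FRACTION = 0.75    # modal required credibility, questions 9a-9f

# Target-credibility level: 75 percent of the benchmark credible score.
TARGET = round(BENCHMARK_CREDIBLE * REQUIRED_FRACTION, 2)


def credible_enough(score, target=TARGET):
    """Return True when a site's relative credibility score meets the target."""
    # Assumption: a score exactly equal to the target is acceptable.
    return score >= target


print(TARGET)
print(credible_enough(43.19))  # lowest-scoring benchmark credible site
print(credible_enough(7.46))   # benchmark non-credible site
```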

SURVEY FINDINGS, OFFICIAL CREDIBILITY CRITERIA

Question 5 asked, "Does your organization have official criteria that

you are told to use for determining the credibility of any source? 'Any source'

means published, proprietary, and classified sources."93 The purpose of this

question was to determine if analysts are aware of credibility criteria that

they can use to ensure a consistent quality of reporting. The assumption

92 Survey, questions 9a-9f.

93 Survey, question 5.


here is that only criteria formally sanctioned by the organization are likely to

be consistently followed. As the table below indicates, 86.2 percent of

analysts are eith
