
  • 8/14/2019 How to Identify Credible Sources on The

    1/123

    HOW TO IDENTIFY CREDIBLE SOURCES ON THE WEB

    by

    Dax R. Norman

    National Security Agency

    PGIP Class 0001

    Unclassified thesis submitted to the Faculty

    of the Joint Military Intelligence College in partial fulfillment of the requirements for the degree of Master of Science of Strategic Intelligence.

    19 December 2001

    The views expressed in this paper are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.


    ACKNOWLEDGEMENTS

    Foremost, I am thankful for the endless patience of my wife and

    daughter, who for two years worked and played one man short of a full team,

    and often carried the ball when I should have.

    I am grateful to Professor Jerry P. Miller, Director of the Competitive

    Intelligence Center at Simmons College in Boston, for his patient and

    persistent help in constructing the thesis survey.

    I would also like to thank LTC (ret) Karl Prinslow, at the time, a

    contractor employed by the U.S. Army Foreign Military Studies Office, for his

    practical assistance, and encouragement.

    Thank you must also go to my Thesis Chairman, Dr. Alex Cummins and

    Thesis Reader Robyn Winder for their conscientious support of the Joint

    Military Intelligence College Masters program by volunteering to serve as

    Thesis Chairman and Reader.


    Appendices

    A. Web Site Evaluation Worksheets

    B. Survey to Industry and Academia

    C. Survey to Intelligence Community

    D. Criteria Analysts Currently Use to Judge Credibility

    Bibliography

    Annex 1. Survey Results (not included in original thesis)


    LIST OF GRAPHICS

    Tables

    1. Question 8a to 8r, Recommended Criteria and Relative Values (Mean)

    2. Questions 9a-f. Required Level of Source Credibility for Intelligence Products

    3. Question 5. Part 1, Official Criteria for Unclassified Sources

    4. Question 5. Part 2, Official Criteria for Classified Sources

    5. Questions 7a, b, c, j, k, l, m, Credibility of Well-Known Titles

    6. Questions 7d, e, f, g, h, i, Credibility of Obscure Titles, and Foreign Web Sites

    7. Questions 7n to 7s, Credibility of All Classified Sources

    8. Credibility of Open Sources Compared to Classified Sources

    9. Question 7q, Credibility of IMINT Without Annotations

    10. Benchmark Web Site Evaluation Work Sheet, Spot

    11. Benchmark Web Site Evaluation Work Sheet, ITU

    12. Benchmark Web Site Evaluation Work Sheet, NY Times

    13. Benchmark Web Site Evaluation Work Sheet, Korea

    14. Blank Web Site Evaluation Work Sheet

    15. Survey Question 6: Credibility Criteria Analysts Currently Use

    Graph

    1. Question 7q, Credibility of IMINT Without Annotations

    ABSTRACT

    TITLE OF THESIS: How to Identify Credible Sources on the Web.

    STUDENT: Dax R. Norman

    CLASS NO. PGIP 0001 DATE: 19 December 2001

    THESIS COMMITTEE CHAIR: Dr. Alex Cummins

    SECOND COMMITTEE MEMBER: Robyn Winder

    There is little argument today that open sources and the World-Wide-

    Web have a role to play in intelligence, but little has been written about

    evaluating the credibility of Web sites and communicating that evaluation to

    analysts. Such a capability is needed because of the increased opportunity to

    collect open source intelligence from the Web; the ever increasing cost of

    classified collection; and the ever-present demand on analysts to analyze and

    report at the edge of their knowledge. With so many intelligence sources

    available, including the Web, analysts must be able to identify credible

    sources. The alternative is to evaluate every piece of information collected

    from every Web site of intelligence interest. Due to the enormous size of the

    Web, evaluating data validity is not practical.

    That is why the Intelligence Community (IC) needs a generally agreed

    upon set of criteria for evaluating Web sites of potential intelligence value.

    Credible Web sites can be identified. However, without these criteria, and a

    method to share the results, hundreds of analysts will repeatedly find the

    same Web sites of dubious credibility as other analysts; they will attempt to


    evaluate the site's usefulness and credibility by many widely different

    standards, and share their results with only a few close coworkers. The

    quality of these Web site evaluations will vary widely based on the subject of

    the Web site and the subject expertise of the evaluator.

    This thesis collected criteria recommended by professional Web

    searchers and surveyed industry, academia, and the Intelligence Community

    for their opinions of those criteria. From this survey the author developed a

    weighted list of credibility criteria and a methodology that both the subject-

    matter expert and the subject-matter novice will find useful. With these

    criteria and the relative credibility scale, subject-matter experts throughout

    the IC can evaluate Web sites within their area of expertise and share that

    source evaluation with the entire IC.

    This thesis identifies valid criteria for evaluating the credibility of open

    source Web sites; presents a relative credibility scale based on benchmarked

    Web sites; identifies the target level of credibility for all intelligence sources;

    offers a Web site evaluation worksheet; and compares the credibility of open

    sources to classified sources. Credible information can be located on the Web,

    and although subject-matter experts are the best evaluators, any analyst can

    evaluate a Web site when he does not have a subject-matter expert to assist

    him.


    CHAPTER 1

    INTRODUCTION TO OPEN SOURCE EVALUATION

    Along with the information technology revolution has come an equally

    important increase in information access and information sources via the

    World-Wide-Web. However, such abundance is a double-edged sword because

    the Web contains every type of print, audio, and visual data from every type

    of source, including children, students, professors, conspiracy theorists,

    researchers, advertisers, government data, and government misinformation.

    Information analysts must sort the useful information from the junk.

    However, what is useless for one person may be just right for someone else.

    This thesis will establish Intelligence Community criteria for distinguishing

    credible Web sites from untrustworthy, or non-credible, Web sites. This thesis

    used a survey structured to answer several key issues and the research

    question: how to identify credible sources on the Web. The hypothesis was

    that credible Web sites can be confidently identified by evaluating the Web

    sites based on criteria recommended by professional Web searchers and

    agreed to by intelligence analysts. Most analysts today apparently evaluate

    the data rather than the source.


    VALIDITY MATTERS

    This thesis will also show that most analysts do not attempt to identify

    credible sources, but evaluate the validity of the data in the sources. There

    is a common misunderstanding about validity and credibility. Validity is an

    attribute of information. Validity also describes information as

    simultaneously relevant and meaningful. Validity can also refer to the proper

    use of logic to reach a conclusion.1 In psychometrics, validity can have

    several meanings, including the proper use, or function of a measurement

    tool.2 This thesis uses validity as an attribute of data that is verifiably correct.

    Validity is what the analyst means when he asks, "Is this data correct?"

    Although validity is important to intelligence, it always describes the

    information rather than the source, and alone does not measure believability,

    which this thesis calls credibility. Because discrete elements of information

    can be examined and compared, the validity of information is of most

    concern to analysts because analysts know how to check validity. They

    examine the data for consistency, verify it with other sources, or verify that it

    functions as expected. Although consistently valid data can lead to credible

    sources, the goal should be to identify sources as credible so that every

    document from the source does not have to be validated. Establishing

    source credibility should be of greater interest to analysts because they

    cannot become expert in every subject on which they may be expected to

    1 G. & C. Merriam Co., Webster's New Collegiate Dictionary (Springfield, MA: G. & C. Merriam Co., 1975), under "Valid." Cited hereafter as Webster's.

    2 Jum C. Nunnally, Psychometric Theory (New York: McGraw-Hill Book Company, 1967), 75.


    report, because organization focus changes, analysts change jobs, and there

    just is not enough time to learn it all and still report.

    This thesis will provide a tool for the general analyst to evaluate Web

    sites as potential intelligence sources. Although Web site evaluations are

    best done by subject-matter experts, analysts are often expected to report on

    unfamiliar topics, and must discern for themselves if a source is credible.

    Experts will also be able to use the recommended criteria and credibility

    scale to evaluate Web sites in a consistent manner that other people will

    understand, and can repeat.3

    CREDIBILITY COUNTS MORE

    To judge validity, an analyst must understand the issue, or technology,

    or strategy, or politics very well for every data element included in his

    reporting. Because every analyst cannot possibly be an expert on every

    subject, they rely on sources that they trust to provide valid data. This trust

    in a person or group is a measure of credibility. A credible source offers

    reasonable grounds for being believed.4 This is the meaning intended in

    this thesis for credibility.

    These credible sources are an essential element of intelligence

    analyses because analysts are often expected to report on topics, in which

    they are not expert, or that are too complex for any one person to

    3 See Appendix A, Web Site Evaluation Worksheet, for the relative credibility scale, benchmark Web site evaluation worksheets, and a blank evaluation worksheet.

    4 Webster's, under "Credible."


    understand. Because it is impractical for analysts to validate every data

    element from every source, the focus should be on identifying credible

    sources. In the area of Open Source Intelligence (OSINT), this is even more

    important because of the widespread use of OSINT by the other intelligence

    disciplines, and the multitude of unclassified open sources.5 The source must

    be judged credible before the data can be judged valid. Of course this can

    become a circular argument, but in the end it is more useful to have a

    credible source than a valid data element. For example, it would be better to

    know where to find a foreign leader's official travel schedule, than to know

    where the leader will travel next. This is true because this credible source

    can tell one where the next trip will be, any changes to his next trip, and the

    details of subsequent trips. If a source provides valid data consistently, it

    will soon be judged a credible source. However, once judged credible, it is

    less important that every data element the source provides is validated.

    Note that open source information (OSINF) is public or proprietary

    information available to anyone for a fee or for free. OSINF becomes open

    source intelligence (OSINT) when it is used by the Intelligence Community to

    answer an intelligence question.

    THE CHALLENGE OF CREDIBLE SOURCES

    Regardless of the credibility of a source, or the validity of the data,

    analysts are more likely to use the sources most accessible to them. The

    5 Joint Chiefs of Staff, Joint Pub 1-02, Department of Defense Dictionary of Military and Associated Terms, URL:, accessed 13 February 2000. Cited hereafter as Joint Pub 1-02. This thesis uses intelligence disciplines, such as OSINT, as defined in Joint Pub 1-02.


    Web has the potential to put a worldwide library on the desk of every analyst.

    With today's search engines and Web-directories an analyst can conduct a

    single search of the Web in seconds that would take a librarian a career to

    complete. This is because the librarians know which sources are credible

    based on their own use of the sources or recommendations from other

    librarians and subject-matter experts. Therefore, it stands to reason that

    intelligence analysts, who do not have access to a subject-matter expert on

    every reportable issue, should have access to credible information sources on

    the Web. How to identify credible sources on the Web is the challenge of this

    thesis.

    In an ideal world, subject-matter experts in every field would identify

    credible sources, and index them for everyone to use. However, even in such

    a world there would be disagreement on what is credible. Therefore, the

    research question that this thesis will answer is how to identify credible

    sources on the Web. The focus is on Web sites because library science and

    publishers have already established acceptable standards in the print media

    for credibility. Such standards include peer-review in scientific journals,

    editorial review in newspapers, independent verification of facts, and the

    proper labeling of commentary and advertisements in magazines. In the

    absence of such standard practices on the Web, it is up to the reader to

    judge. With the help of expert Web searchers from industry, defense, and

    intelligence, this thesis establishes a set of common credibility evaluation

    criteria, which can be used by subject-matter experts as well as analysts

    reporting on an unfamiliar issue. Some subjectivity remains, but the criteria


    are established which provide analysts with the tools and vocabulary to

    measure credibility of sources and describe a source's relative

    trustworthiness, known as credibility.

    ASSUMPTIONS

    This thesis does make some assumptions. The first two are that open

    source intelligence is less costly than classified intelligence, and therefore is

    the preferred source if it can be trusted. The third assumption is that

    credibility is relative to its intended use and user. For example, a CNN

    broadcast might be sufficiently credible for indications and warning (I&W),

    but not sufficiently credible for basic intelligence for which the analyst has

    some time to conduct research, or when the product will become the

    background for future reporting. Likewise, a second-hand report of the

    humanitarian conditions in a country may be credible enough for a person

    planning an overseas visit; however, only a first-hand report from an

    authoritative, unbiased source may be considered for the subject of an

    intelligence report. Therefore, a relative credibility scale is necessary rather

    than an absolute determination of credible or non-credible.

    A UNIQUE STUDY

    Although other studies establish criteria for evaluating Web sites, such

    as Alison Cooke's Authoritative Guide to Evaluating Information on the

    Internet, I have not found a study that focuses on establishing the credibility


    of Web sites.6 Cooke's work is an excellent guide to evaluating the overall

    quality of many types of Web sites. The closest Joint Military Intelligence

    College study found is MAJ Robert M. Simmons's unclassified thesis, Open

    Source Intelligence: An Examination of Its Exploitation, 1995.7 Simmons

    focuses on the accessibility and use of open source, not the credibility of

    sources. Although Reva Basch's Secrets of the Super Net Searchers includes

    the question of credibility, it is less formal than this study and asks the

    credibility question differently of each expert interviewed.8 Secrets of the

    Super Net Searchers does not focus on any one issue, but asks many

    questions of the industry experts. However, many criteria from Basch's book

    were included in the thesis survey used for this study. This thesis surveyed

    analysts from defense, intelligence, and academia, as well as industry, to

    establish common criteria for evaluating the credibility of Web sites.9 The

    broad survey population, which included industry, academia, and

    intelligence, and the focus on credibility, make this study unique.

    REVIEW OF THESIS

    6 Alison Cooke, Authoritative Guide to Evaluating Information on the Internet (New York: Neal-Schuman Publishers, Inc., 1999).

    7 Major Robert M. Simmons, USA, Open Source Intelligence: An Examination of Its Exploitation in the Defense Intelligence Community, MSSI Thesis (Washington, DC: Joint Military Intelligence College, August 1995).

    8 Reva Basch, Secrets of the Super Net Searchers (Wilton, CT: Pemberton Press, 1996).

    9 E-mail Survey, Joint Military Intelligence College Thesis Survey: Credibility Criteria for Web Sites, conducted by the author, July-August 2001. Hereafter cited as Survey.


    The research for this thesis began with a literature review, found in

    Chapter two. From the literature several authors were selected who either

    represent a significant point of view or are in a position to influence other

    analysts. The objective of the literature review was to identify what is

    already known, or thought, about identifying credible sources on the Web.

    However, the literature also revealed tangent issues that influence how or

    when unclassified open sources are used in intelligence products. Most

    significantly, the literature review identified the criteria recommended by

    expert Web searchers for judging the credibility of Web sites. Those criteria

    were included in the thesis survey, which was the primary research tool used

    by the author.

    Chapter three describes the research methodology employed. That

    methodology included gathering expert criteria from the literature review;

    developing and administering the survey to industry, academic, and

    intelligence analysts; coding the survey results and entering the data into the

    SPSS statistical program; and performing the calculations which answered

    the research questions and the key issues. The recommended credibility

    criteria were determined by identifying the criteria that analysts most often

    rated as contributing 50 percent or more to the credibility of a Web site; then

    determining the relative weights for each criterion and a relative credibility

    scale. Finally, four Web sites of known credibility were evaluated as

    benchmark sites. Chapter three describes this process in detail as well as

    how the target source-credibility level was determined for most intelligence

    products.
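The selection and weighting steps described above can be sketched in code. This is a minimal illustration only, not the SPSS procedure the author used. The criterion names, the sample ratings, the reading of "most often" as a strict majority of respondents, and the normalization of mean ratings into weights are all assumptions introduced for the example.

```python
from statistics import mean

# Hypothetical survey data: each criterion maps to the percentage (0-100)
# that each respondent said it contributes to a Web site's credibility.
# Names and numbers are invented for illustration only.
ratings = {
    "author_identified": [80, 60, 50, 90],
    "recently_updated": [50, 40, 70, 60],
    "attractive_design": [10, 30, 50, 20],
}

THRESHOLD = 50  # "contributing 50 percent or more", per the text


def select_criteria(ratings, threshold=THRESHOLD):
    """Keep criteria that more than half of respondents rated at or above threshold.

    Treating "most often" as a strict majority is an assumption.
    """
    selected = {}
    for name, scores in ratings.items():
        share = sum(1 for s in scores if s >= threshold) / len(scores)
        if share > 0.5:
            selected[name] = scores
    return selected


def relative_weights(selected):
    """Weight each retained criterion by its mean rating, normalized to sum to 1.

    Normalizing the means is an assumed weighting scheme.
    """
    means = {name: mean(scores) for name, scores in selected.items()}
    total = sum(means.values())
    return {name: m / total for name, m in means.items()}


def score_site(weights, satisfied):
    """Score a site from 0 to 100 as the weighted share of criteria it satisfies."""
    return 100 * sum(w for name, w in weights.items() if name in satisfied)


weights = relative_weights(select_criteria(ratings))
site_score = score_site(weights, satisfied={"author_identified"})
```

Under these assumptions, benchmark sites of known credibility would be scored the same way, and their scores would anchor the points of a relative credibility scale against which other sites are compared.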


    The results of the survey calculations are shown in the findings chapter,

    Chapter four. The findings chapter, like the methodology chapter, is

    organized to answer the research question and each key issue. In short, the

    key issues are: open source relevance to intelligence,

    knowledge of existing official criteria, analysts' objectivity, credibility of

    foreign Web sites in English, and credibility of classified versus unclassified

    sources. The research questions concern evaluation criteria and the needed level

    of credibility.

    The conclusions are in Chapter five, and include analysis of the survey

    results. The thesis concludes that credible Web sites can be identified,

    evaluated, and shared with other analysts. Known weaknesses in the survey

    are mentioned in the findings and conclusions chapters. Chapter six also

    includes a recommendation for implementing this evaluation procedure in

    the Intelligence Community. The appendices include a copy of the surveys

    used; the completed evaluation worksheets for the benchmarked Web sites;

    and a blank evaluation worksheet.


    CHAPTER 2

    LITERATURE REVIEW

    RANGE OF THOUGHT

    Open source information (OSINF) has been widely accepted as a

    necessary element of all-source intelligence reporting, as demonstrated by

    Director of Central Intelligence Directive 2/12, which established the

    Community Open Source Program Office.10 Most experts agree that OSINF

    should support classified intelligence collection. However, I think there has

    not been significant attention paid to the issue of identifying credible Web

    sites, a significant source of unclassified information. The Web makes foreign

    newspapers and gray literature (documents with limited distribution such

    as company brochures, or equipment manuals), more accessible, as well as

    expert opinions, and research projects from universities, just to name some

    valuable sources.11 The issue of identifying credible Web sites affects

    everyone who uses the Internet, including defense, intelligence, academia,

    and industry. Therefore, the literature reviewed for this study included

    documents from all of these communities of interest. The authors presented

    in this study include: Robert David Steele of Open Source Solutions Inc.; Dr.

    Wyn Bowen of King's College, London, writing for Jane's Intelligence Review;

    A. Denis Clift, President of the Joint Military Intelligence College (JMIC),

    Washington, D.C.; Reva Basch, author of Secrets of the Super Net Searchers;

    10 Director of Central Intelligence, Director of Central Intelligence Directive 2/12 (Washington, D.C.: n.p., 1 March 1994). Hereafter cited as DCID 2/12.

    11 Basch, 110.


    and Alison Cooke, author of Authoritative Guide to Evaluating Information on

    the Internet. These authors are all in a position to influence information

    analysts, either inside or outside of government, and represent a range of

    opinions on the proper use of open source information.

    All these points of view agree that there is more data available now

    than an analyst can manage unaided. Their approach is what differs. Steele

    and Bowen would expand the Intelligence Community, which is not going to

    happen without a long, and gradual culture change. Clift sees a need for

    better automated tools for data retrieval, including an on-line index of open

    sources.12 Cooke and Basch offer solutions for today: evaluate sources based

    on criteria similar to those used for traditional print media. This thesis will

    demonstrate that the ideas of each of these authors, combined with the

    recommended evaluation criteria in this thesis, represent a practical solution to

    the information fog of the Web.

    Robert David Steele, Open Source Solutions, Inc.

    Steele is the most vocal advocate for expanded use of OSINF to

    support the other intelligence disciplines, and recommends expanding the

    Intelligence Community to include business people and academics, who have

    unique knowledge and access. Steele would have analysts consult open

    sources first, including subject experts in industry and academia, and then

    classified sources. He is President of Open Source Solutions Inc. His

    company is in the private open source intelligence (OSINT) business, and he

    12 A. Denis Clift, Clift Notes: Intelligence and the Nation's Security (Washington, D.C.: Joint Military Intelligence College, 1999), 51-57.


    has proposed his own plan for intelligence in the 21st Century, called

    Intelligence and Counterintelligence: Proposed Program for the 21st

    Century.13 Steele sees a great need to expand the access that analysts have

    to OSINF.14 His view of the future Intelligence Community (IC) includes

    several new groups, including scholars and business people, which constitute

    the Virtual IC.15 It is these sources that Steele sees as the gold mine of

    information. However, he does acknowledge that the Internet will greatly

    expand access to OSINF, primarily secondary sources, which are derived from

    an original source. He also suggests that OSINF may be used as a source of

    tip-offs to serious issues that warrant classified collection.16 However, his

    stand that classified intelligence is only useful in the context of what is

    already known from open sources borders on accepted practice.

    Dr. Wyn Bowen, Open-source Intelligence.

    Bowen is an academic concerned about information overload, and

    would add non-government subject-matter experts to the intelligence

    collection process, as Steele suggests. Bowen thinks that subject-matter

    experts should be the people to evaluate Web sites, which is unique in this

    literature review. However, he sees open sources as an adjunct to classified

    sources, not the source of first resort as Steele suggests. Bowen, who is a

    13 Robert D. Steele, Intelligence and Counterintelligence: Proposed Program for the 21st Century, URL: , accessed 5 January 2000. Cited hereafter as Steele, Intelligence.

    14 Steele, Intelligence, under Introduction.

    15 Steele, Intelligence, under Part III Figure 18.

    16 Steele, Intelligence, under Part III.


    professor at King's College, London, and writes for Jane's Intelligence Review,

    demonstrates the invaluable resources available through open sources in his

    article Open-source Intelligence: A Valuable National Security Resource.17 He

    uses weapons proliferation as a demonstration case. This case is very

    effective because it reduces the issue to tangible products of intelligence

    value found in the public domain. Bowen thinks that the role of OSINF is to

    provide the context of classified information.18 He also dwells on the issue of

    information overload, which concerns Clift. However, he would add non-

    government subject-matter experts to the collection process, as Steele also

    suggests. Bowen thinks the experts' role should be to identify the useful

    sources to keep and collect (not specific data), and the worthless sources to

    ignore. In his view, experts would also serve to evaluate sources for

    inaccuracy, bias, irrelevance and disinformation, which non-experts would

    find difficult to do.19

    17 Dr. Wyn Bowen, Open-source Intelligence: A Valuable National Security Resource, Jane's Intelligence Review, 1 November 1999, Dow Jones Interactive, Publications Library, All Publications, Search Terms Open Source Intelligence, URL: , accessed on 4 March 2000.

    18 Bowen, under Technical Sources.

    19 Bowen, under Conclusion.


    A. Denis Clift, President of the Joint Military Intelligence College

    Clift is also concerned about information overload, and sees a need for

    better automated selection tools to solve the analysts' selection problems.

    Clift is President of the Joint Military Intelligence College (JMIC) in Washington,

    D.C. His views are his own and do not represent those of the U.S.

    Government; however, as President of the JMIC, Clift is in a position to

    influence the opinions of analysts graduating and going on to work in

    intelligence. He also served as Editor for the United States Naval Institute

    Proceedings, early in his career, from 1963 to 1966.

    In Chapter five of Clift Notes: Intelligence and the Nation's Security,

    Clift gives a short explanation of the open source programs available today to

    support the intelligence analyst.20 He defends the Intelligence Community's

    record on making open source information (OSINF) available to intelligence

    analysts. He gives an overview of the OSINF programs available to the

    analysts, but does not indicate how accessible the information is. I observed

    lines of analysts waiting to use Internet terminals in the JMIC library in 1999

    and 2000. This is an example of why it should be clear to the Intelligence

    Community (IC) that OSINF will only be used to its highest potential when it is

    on the analyst's desk. The work lost walking to a terminal down the hall or in

    the next building is not worth the effort to analysts unfamiliar with the

    sources, or inundated with other sources at their fingertips. Clift writes that

    OSINF plays an important role in intelligence, and states that the IC already

    has a good collection of OSINF in Central Information Reference and Control

    20 Clift, 51-57.


    (CIRC) of the National Air Intelligence Center and the Defense Scientific and

    Technical Intelligence Centers.21 He notes the serious difficulties analysts

    have with information overload and the need for better-automated selection

    tools.22 However, the technology Clift wants is not yet intelligent enough to

    discern credible sources from non-credible sources. As will be demonstrated

    in the findings chapter, determination of credibility requires research, and

    corroboration, and has a measure of subjectivity.

    Reva Basch's Secrets of the Super Net Searchers

    Basch does not address the Intelligence Community, but does address

    the issue of how to select trustworthy Web sites. Basch, as well as Cooke,

    takes the most practical approach to finding credible information in the flood

    of electronic data. Both recommend using evaluation criteria similar to those

    used for print media, with some variations.

    Basch published Secrets of the Super Net Searchers in 1996, after

    interviewing 35 of the best Internet searchers. In 1996, she was the news

    editor for ONLINE, DATABASE, and ONLINE USER magazines and had been an

    online researcher for about 21 years. Since then, she has published a series

    of Super Searchers books. For Secrets of the Super Net Searchers she

    conducted informal interviews with expert researchers, each of which

    represents a chapter in Super Searchers. Her questions covered many issues

    21 Clift, 54.

    22 Clift, 56.


    affecting online researchers and included the following, which relate to Web

    site credibility: 23

    What is the quality and reliability of information on the Web?

    Are some types of sites more reliable than others?

    How are biased sources treated?

    How are the quality and reliability of unfamiliar Web sites judged?

    Is there a relationship between credibility and longevity?

    Many of the experts Basch interviewed had something useful to say

    about source credibility, and their comments were consolidated into several survey

    questions for this thesis.

    There is disagreement whether information from personal Web sites is

    credible. Susan Feldman stated in Super Net Searchers that a Web site

    written by Joe Schmo might be way ahead of McGraw-Hill. So you're left to

    your own devices to analyze and evaluate.24 However, Mary Ellen Bates, also

    interviewed by Basch for Super Net Searchers, stated at a WebSearch

    conference in Virginia on 10 May 2001 that she does not rely on personal

    Web sites unless they are well known.25

    Alison Cooke, Authoritative Guide to Evaluating Information on the Internet

    Cooke also does not address the Intelligence Community, but does

    address the issue of how to select trustworthy Web sites. Cooke also

    23 Basch, 3.

    24 Basch, 31.

    25 Mary Ellen Bates, Presentation to WebSearch University Conference in Reston, VA, 10 September 2001.


    recommends using evaluation criteria similar to that used for print media,

    with some variations.

    Alison Cooke, who is a professional Internet searcher, wrote in 1999

    the Authoritative Guide to Evaluating Information on the Internet. The

    author's implicit thesis is that although there is much useless, outdated, and

    difficult to authenticate information on the Internet, high quality information

    can be found and the quality can be assessed.26 Like Clift and Bowen, Cooke

    sees information overload as a serious challenge facing researchers, but

    believes accuracy is of most concern to researchers. Her solution is to

    carefully evaluate Web sites using criteria similar to criteria used to evaluate

    print media.

    EVERY MAN'S PRINTING PRESS

    There are widely accepted criteria for evaluating traditional print

    media. These criteria include the reputation of the publisher and author,

    peer-review of scientific articles, and editorial review of periodicals.27 Such

    criteria work well when the number of publishers in a particular field is

    quantifiable and their past work can be located and reviewed. However,

    desktop publishing programs, personal computers, and the Web have

    enabled hundreds of thousands of people to produce professional-looking

    articles and distribute them to millions of potential readers without the

    26 Alison Cooke, Authoritative.

    27 Jan Alexander and Marsha Tate, The Web as a Research Tool: Evaluation Techniques, Wolfgram Memorial Library, Widener University, Chester, PA, URL: accessed 13 March 2001.


    benefit of peer or editorial review, or regard for brand name reputation.

    Among the millions of Web pages available to the public today are many of

    potential intelligence value produced by proud inventors, boisterous

    government agencies, self-promoting corporations, community-minded

    colleges, naive public servants, happy vacationers, and zealous

    revolutionaries. The issue at hand today is how to identify credible

    information among the millions of personal, organizational, industry,

    academic, and government sources. There are as many opinions on this

    topic as there are open source researchers and intelligence analysts.

    INFORMATION GAPS

    Even after a Web site is evaluated based on the criteria presented in

    Basch, Cooke or Alexander, the issue of credibility still remains. How does a

    subject-matter novice know which sources he can believe? The other issue is

    that of relativity. Is a Web site that is credible enough for a high school term

    paper also credible enough for a basic intelligence report, or for an

    intelligence warning report? This study answers both of these questions.


    CHAPTER 3

    METHODOLOGY

    This chapter on methodology and the following chapter on findings are

    organized by key issues and research questions. The key issues are

    obstacles that must be overcome before the research question can be

    answered. The key issues include: how is open source information relevant

    to intelligence; do analysts know of existing official credibility criteria; are

    analysts biased toward popular source titles; are foreign sites in English less

    credible; and how does the credibility of classified sources compare to

    unclassified sources? To answer the research question of, how to identify

    credible sources on the Web, it was necessary to separate the question into

    two parts. The first part of the research question was what criteria can be

    used to identify credible Web sites. The second part of the research question

    was how credible should any intelligence source be. The methodology relies

    on logic, and statistics, and is somewhat complex due to the many steps

    necessary to arrive at useful criteria that are accurately weighted. The

    methodology begins with the development of the thesis survey.

    KEY ISSUE: OSINF RELEVANCE TO INTELLIGENCE

    Even before the survey could be developed, the basic question needed

    to be answered: why is open source information relevant to intelligence? The


    literature review provided several views on the role of open sources in

    intelligence. The opinions of Steele and Clift offered convincing reasons that

    intelligence must include open source information. The reasons for using

    OSINF in intelligence products are included in the findings chapter.

    SURVEY DEVELOPMENT

    Although the primary research question was, how to identify credible

    sources on the Web, this thesis needed to answer several key issues

    regarding source credibility on the way to answering the primary research

    question. Two research methods were used to answer the key issues and

    research question. First, published literature was reviewed from Intelink,

    online DIA course material, Lexis-Nexis, Dow Jones Interactive, the NSA

    Library, and academic Web pages. This literature review uncovered some

    answers to the key issues and provided the majority of the concepts tested

    by the thesis survey.

    Once the thesis survey was developed, it was given to a test

    population of 15 intelligence analysts for a validity check. The 15 analysts

    completed the survey, suggested adding questions and clarifying ambiguous

    wording, and questioned the relevance of some questions. Those changes

    were made and the second draft was given to Professor Jerry P. Miller,

    Director of the Competitive Intelligence Center at Simmons College in Boston.

    Miller offered numerous suggestions that improved the reliability of the

    survey. He identified government lingo that would not likely be

    understood in industry and academia, and recommended changes to the


    survey questions to maintain Likert-type scales for the responses. Likert

    scales are a recognized method in social sciences to format survey response

    options that are understood by most populations and can be used to measure

    evenly a population's opinions.

    The second draft was also sent to LTC (ret) Karl Prinslow, project

    manager and operations officer of a virtual organization that employs over

    150 military reservists who work via telecommuting to collect and acquire

    open source information in support of the Intelligence Community's

    requirements. Prinslow suggested several format changes that ensured all

    recipients were able to display the survey on their computers, and would be

    comfortable replying with anonymity. Prinslow and Miller suggested adding

    the personal information disclosure statement. Prinslow also recommended

    E-mailing the survey as an ASCII text message rather than a MS-Word

    document, and simplified some questions. The text message enabled

    anyone who was able to receive the E-mailed survey to respond to it without

    special software.

    After making the changes suggested by Miller and Prinslow, two

    separate surveys were distributed by E-mail. In the coding and analysis, the

    two surveys were treated as one survey, with some questions not applicable

    to the whole population. The Intelligence Community (IC) Survey included

    several questions at the end, which would not apply to industry or academia,

    and it was distributed by internal communications. The Industry Survey

    included the same questions as the IC Survey without the IC-unique

    questions. The IC Survey was E-mailed to a group of about 100 IC analysts

    who have an interest in open source intelligence (OSINT). The exact number


    of IC analysts cannot be determined because it was sent to a mail-list, which

    often changes. This method had the effect of randomizing the population

    selection. One of these 100 analysts E-mailed the survey to 18 other IC

    analysts. Four of these 18 E-mailed the survey to 238 others, for a total of

    356 IC analysts. This chain of events was evident from the E-mail headings

    and some respondents informed the author who forwarded the survey to

    them. About 50 participants from a Society of Competitive Intelligence

    Professionals (SCIP) conference were then contacted by telephone and agreed

    to participate in the E-mail Industry Survey. The Industry Survey was then E-

    mailed to those 50 and 9 Defense Department analysts. One of the 9

    Defense analysts E-mailed the survey to about 120 other defense analysts. A

    total of about 179 analysts are known to have received the Industry Survey.

    Together, the two surveys reached about 535 analysts who have an interest

    in Internet research. With 66 responses, this equates to a 12.3 percent

    response rate from a randomly selected population.28
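
    The distribution totals and the response rate reported above can be checked with a short calculation. The sketch below uses only the approximate counts given in this section; the variable names are illustrative, and because the size of the mail-list was not known exactly, the result is an estimate.

```python
# Approximate recipient counts reported in the text (the mail-list size is an estimate).
ic_survey = {"mail_list": 100, "first_forward": 18, "second_forward": 238}
industry_survey = {"scip_participants": 50, "defense_analysts": 9, "forwarded": 120}

ic_total = sum(ic_survey.values())
industry_total = sum(industry_survey.values())
recipients = ic_total + industry_total

responses = 66
response_rate = 100 * responses / recipients

print(f"IC Survey recipients: {ic_total}")
print(f"Industry Survey recipients: {industry_total}")
print(f"Total recipients: {recipients}")
print(f"Response rate: {response_rate:.1f} percent")
```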

    RESEARCH QUESTION AND SURVEY STRUCTURE

    The survey was structured to answer several key issues and the

    research question: how to identify credible sources on the Web. The

    hypothesis was that credible Web sites can be confidently identified by

    evaluating the Web sites based on criteria recommended by professional Web

    searchers and agreed to by intelligence analysts. The thesis survey asked

    this question directly in survey question 6, and indirectly in survey questions

    28 Appendices B and C include a copy of the E-mailed surveys.


    8a through 8r. Question 8 listed the criteria most often mentioned by

    published experts. Here is how the survey asked these questions.29

    6. List up to five criteria that you use to determine the credibility of any information source.
    a.
    b.
    c.
    d.
    e.

    8. How much credibility does each of the following factors add to the total credibility of a Web site? Use the following scale:

    ___6) 100 percent Credibility
    ___5) 75 percent Credibility
    ___4) 50 percent Credibility
    ___3) 25 percent Credibility
    ___2) 10 percent Credibility
    ___1) 0 percent Credibility

    a. Recommended by a subject-matter expert.
    b. Recommended by a generalist.
    c. Listed by an Internet subject guide that evaluates Web sites.
    d. Listed in a search engine such as Alta Vista.
    e. Listed in a Web-directory organized by people, such as yahoo.
    f. Content is perceived current.
    g. Content is perceived accurate.
    h. A peer or editor reviewed the content.
    i. Content's bias is obvious.
    j. Author is reputable.
    k. Author is associated with a reputable organization.
    l. Publisher, or Web-host is reputable.
    m. Content can be corroborated with other sources.
    n. Other Web sites link to or give credit to the evaluated site.
    o. The server or Internet domain is a recognized copyrighted or trademark name such as IBM.com.
    p. There is a statement of attribution.
    q. Professional appearance of the Web site.
    r. Professional writing style of the Web site.

    To avoid influencing the responses to survey question 6, analysts were

    first asked to list the criteria they currently use; they were later asked to

    29 Survey, questions 6 and 8.


    evaluate the list of criteria in questions 8a through 8r. If the survey

    population had been asked about specific criteria (question 8) before being

    asked what criteria they actually use (question 6), they may have been

    influenced to include the listed criteria from question 8 as criteria that they

    use. This arrangement was necessary because earlier discussions with

    analysts revealed that there were criteria that analysts would use only after

    they were told of them. Discussions with analysts prior to the survey

    development had also revealed that many analysts do not know how they

    determine what is a credible source, and that many analysts may only

    evaluate the data, and not the source.

    As is shown in the findings chapter, many analysts were confused

    about the difference between data validity and source credibility. The

    categorized results of question 6 were then compared to the specific criteria

    analysts approved of in question 8.

    RESEARCH QUESTION: CREDIBILITY CRITERIA

    The results of questions 6, and 8a through 8r were used to develop the

    recommended credibility criteria and credibility scale in the findings chapter.

    The recommended criteria were determined by computing the mode (score

    most often chosen) for each criterion in survey questions 8a through 8r. An unusual amount

    of variance would indicate little agreement among the analysts. Only criteria

    from question 8 that scored a mode of 50-percent credibility or greater were

    included in the recommended criteria list. This means analysts most often


    believe (mode) that the satisfaction of any one of these recommended

    criteria made the source at least 50-percent credible.

    Then the arithmetic mean (average) credibility was calculated for each

    recommended criterion from question 8 and became that criterion's relative

    value. The relative value is how much more important, on average, analysts

    think one criterion is than another criterion. The assumption here is that

    such attributes are cumulative, and the more recommended criteria a site

    satisfies, the more credible is the site.

    The results of question 6 were categorized into a list of criteria that

    analysts think they use to evaluate source credibility. The frequencies of

    these criteria were calculated, and those criteria that were suggested by 50

    percent of the analysts were added to the recommended criteria list.

    Because the recommended criteria from question 6 were not evaluated on a

    scale in the survey, they were arbitrarily assigned the average relative value

    of those recommended criteria from question 8. This allowed the inclusion of

    any criteria not included in question 8, but also did not significantly affect the

    relative values of those criteria.
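
    The selection and weighting procedure described above can be sketched in a few lines. This is a minimal illustration, not the tool used for the thesis: the sample answers are invented, the function and variable names are hypothetical, and taking the mean over the answer numbers (1 through 6) rather than over the percentages is an assumption.

```python
from statistics import mean, mode

# Scale used in survey question 8: answer number -> percent credibility.
SCALE_PERCENT = {6: 100, 5: 75, 4: 50, 3: 25, 2: 10, 1: 0}


def select_criteria(responses, threshold_percent=50):
    """Return {criterion: relative value} for criteria whose mode meets the threshold.

    responses maps a criterion label to the list of scale answers (1-6)
    given by the analysts. The relative value is the mean scale answer,
    which is an assumption about how the averages were taken.
    """
    recommended = {}
    for criterion, answers in responses.items():
        most_common = mode(answers)  # score most often chosen
        if SCALE_PERCENT[most_common] >= threshold_percent:
            recommended[criterion] = mean(answers)
    return recommended


# Hypothetical answers, invented for illustration only (not survey data).
sample = {
    "8a expert recommendation": [6, 5, 5, 5, 4],
    "8d listed in search engine": [2, 2, 3, 1, 2],
    "8m corroborated": [5, 6, 6, 6, 4],
}

result = select_criteria(sample)
for criterion, value in result.items():
    print(f"{criterion}: relative value {value:.2f}")
```

    In this invented sample the search-engine criterion is dropped because its mode falls below 50 percent credibility, while the other two are kept and weighted by their means.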

    The following is a summary of the selection process for the

    recommended criteria, and relative value calculation:

    Step 1. Calculated the mode (most-often chosen) credibility (0-100 percent) of each criterion from survey question 8.

    Step 2. Listed as recommended the criteria from question 8 that had a mode credibility of 50 percent or greater.

    Step 3. Calculated the mean credibility (average analyst chosen score) for each recommended criterion from question 8.


    Scale:
    ___7) No Opinion
    ___6) 100 percent Credible
    ___5) 75 percent Credible
    ___4) 50 percent Credible
    ___3) 25 percent Credible
    ___2) 10 percent Credible
    ___1) 0 percent Credible

    Analysts were then asked to choose the required level of credibility for:

    9a. Research, or topic summaries.
    9b. Current, day-to-day developments.
    9c. Estimative, identifies trends or forecasts opportunities or threats.
    9d. Operational, tailored, focused to support an activity.
    9e. Scientific, and technical, in-depth, focused assessments.
    9f. Warning, an alert to take action.

    The mode response for each of these types of intelligence products

    was calculated and became that product's product-credibility level; these levels are shown in Table

    2 in the findings chapter. The product-credibility level percentages were

    converted into scores so that analysts can simply add the results of an

    evaluation and compare the sum to the table of product-credibility levels.

    The product-credibility level is also the credibility level that is needed

    for sources that analysts use for a particular intelligence product. When a

    potential Web site is evaluated, the analyst calculates the credibility score of

    the evaluated site, and then compares it to the table of product-credibility

    levels in Table 2. The sum of the evaluated Web site should be at least equal

    to the product-credibility level of that type of intelligence product shown in

    the table. The source-credibility level of each intelligence product type was

    determined by calculating the percentage of a benchmarked, very credible

    Web site's score that would equal the product-credibility level

    recommended by the surveyed analysts. For example, here is a theoretical


    Web site evaluation, which also demonstrates how the product-credibility

    level was determined.

    Example:

    Benchmark site credibility score = 46.75 points (100 percent Credible)
    Product-credibility level of intelligence product: 35.06 (75 percent of 46.75).

    Theoretical results of a Web site evaluation:

    Meets Criteria 1 = 5 points
    Meets Criteria 3 = 6 points
    Meets Criteria 4 = 3 points
    Meets Criteria 5 = 3.5 points
    Meets Criteria 6 = 5 points
    Meets Criteria 7 = 4.5 points
    Meets Criteria 10 = 2 points
    Meets Criteria 11 = 3 points
    Meets Criteria 12 = 3.5 points
    Meets Criteria 13 = 1.5 points
    Meets Criteria 14 = 4.5 points

    Sum of Evaluated Site = 41.5 points
    Result: Exceeds the product-credibility level of 35.06

    This summarizes the process recommended in this thesis to evaluate

    the credibility of a Web site. This process is based on the theory that the

    criteria recommended by expert Web searchers and approved by most

    analysts are the best criteria for evaluating Web sites. The weight or relative

    value of each criterion is based on the average score given the criterion by

    analysts. The final evaluation is based on a comparison of the total values of

    the evaluated site to the total values of the benchmark sites.
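
    The evaluation process summarized above amounts to a sum and a comparison, which the following sketch makes explicit. It is an illustration under stated assumptions: the function names are hypothetical, the benchmark score and the per-criterion points are the ones given in the theoretical example, and a site is treated as acceptable when its sum is at least equal to the required score.

```python
BENCHMARK_SCORE = 46.75  # score of the benchmark site treated as 100 percent credible


def required_score(product_level_percent, benchmark=BENCHMARK_SCORE):
    """Convert a product-credibility level (percent) into points."""
    return benchmark * product_level_percent / 100


def evaluate_site(points_by_criterion, product_level_percent):
    """Sum the points of the satisfied criteria and compare to the required score."""
    total = sum(points_by_criterion.values())
    needed = required_score(product_level_percent)
    return total, needed, total >= needed


# Criterion number -> points, taken from the theoretical evaluation in the example.
site = {1: 5, 3: 6, 4: 3, 5: 3.5, 6: 5, 7: 4.5, 10: 2, 11: 3, 12: 3.5, 13: 1.5, 14: 4.5}

total, needed, credible_enough = evaluate_site(site, product_level_percent=75)
print(f"site score {total}, required {needed:.2f}, meets level: {credible_enough}")
```

    Keeping the benchmark score as a parameter means the same comparison can be reused if a different benchmark site is chosen.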


    Discussions with analysts and the literature review indicated that well-

    known publication titles are perceived as more credible than obscure titles,

    even though the analysts may have never seen the well-known titles.

    Therefore, to determine how objective analysts are, questions 7a through 7m

    asked analysts to evaluate the credibility of 13 sources based only on their

    titles. This key issue was answered by comparing the well-known titles in

    survey questions 7a, b, c, j, k, l, and m, with obscure titles in survey

    questions 7d, e, f, g, h, and i. Question 7 asked:34

    7. How credible are the following information sources given only their titles? Choose one from the following scale:

    ___7) = Certainly True
    ___6) = Strongly Credible
    ___5) = Credible
    ___4) = Undecided
    ___3) = Non-credible
    ___2) = Strongly Non-credible
    ___1) = Certainly False

    Well-known Titles:
    a. NY Times
    b. Washington Post
    c. Harvard.edu Web site
    j. NationalGeographic.com Web site
    k. JanesDefenseWeekly.com Web site
    l. InformationWeek.com Web site
    m. DowJonesInteractive.com Web site

    Obscure Titles:
    d. RussianArmy.ru, Web site in Russian
    e. RussianArmy.ru Web site in English
    f. IsraelIndependentNews.is Web site in Hebrew
    g. IsraelIndependentNews.is Web site in English
    h. FrenchIndependentNews.fr Web site in French
    i. FrenchIndependentNews.fr Web site in English

    34 Survey, questions 7a-7l.


    However, there was a problem with how this question was structured

    and the findings may not be valid. Judging from the comments in the

    surveys, it was evident that analysts were not able to make credibility

    judgments for many sources based on titles alone either because they had

    personal experience with the sources, which influenced their judgments, or

    because they were unwilling to make an uninformed judgment based on titles

    alone.35

    KEY ISSUE: FOREIGN LANGUAGE SOURCES

    An issue related to source titles was, do analysts perceive foreign

    sources published in their native language to be more credible than the

    English language version of the same publications? This question was

    answered by comparing survey questions 7d to 7e, and comparing 7f to 7g,

    and comparing 7h to 7i. The validity of these questions was preserved by not

    including any real publications or Web site titles, which the analysts may be

    familiar with.36

    7. How credible are the following information sources given only their titles? Choose one from the following scale:

    ___7) = Certainly True
    ___6) = Strongly Credible
    ___5) = Credible
    ___4) = Undecided
    ___3) = Non-credible
    ___2) = Strongly Non-credible
    ___1) = Certainly False

    d. RussianArmy.ru, Web site in Russian

    35 Survey, questions 7a-7l.

    36 Survey, questions 7d-7i.


    e. RussianArmy.ru Web site in English

    f. IsraelIndependentNews.is Web site in Hebrew

    g. IsraelIndependentNews.is Web site in English

    h. FrenchIndependentNews.fr Web site in French

    i. FrenchIndependentNews.fr Web site in English

    KEY ISSUE: CLASSIFIED VS. UNCLASSIFIED SOURCES

    Discussions with IC managers and consultants often included

    questions such as, how do classified sources compare in credibility to

    unclassified sources and less often, how do classified sources compare to one

    another. This is a comparison that is likely to change over time. One JMIC

    professor explained that different intelligence sources seem to go in and out

    of favor as access success improves for one source or another. These issues

    were only included in the IC Survey and most analysts answered as though

    they had an opinion. Therefore, questions 7n, o, p, q, r, and s asked:37

    7. How credible are the following information sources given only their titles? Choose one from the following scale:

    ___7) = Certainly True
    ___6) = Strongly Credible
    ___5) = Credible
    ___4) = Undecided
    ___3) = Non-credible
    ___2) = Strongly Non-credible
    ___1) = Certainly False

    The intelligence sources in question included:

    7n. HUMINT sources with no reporting record
    7o. HUMINT sources with a proven reporting record
    7p. IMINT, with National analysts' annotations or comments
    7q. IMINT, without National analysts' annotations or comments

    37 Survey, questions 7n-7s.


    7r. SIGINT reporting
    7s. MASINT

    Analysis of these questions included a calculation of the mode and

    range for all sources included in question 7, and compared them to each

    other. This provides an interesting comparison of classified and unclassified

    sources. 38
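
    The mode and range used in this comparison are simple to compute. The sketch below shows one way to do so for answers on the seven-point scale of question 7; the ratings are invented for illustration and are not survey data, and the function name is hypothetical.

```python
from statistics import mode


def summarize(answers):
    """Return (mode, range) for a list of answers on the 1-7 scale of question 7."""
    return mode(answers), max(answers) - min(answers)


# Hypothetical answers, invented for illustration only (not survey data).
ratings = {
    "7a NY Times": [5, 5, 6, 5, 4],
    "7o HUMINT, proven record": [6, 6, 5, 6, 7],
    "7n HUMINT, no record": [4, 3, 4, 5, 2],
}

for source, answers in ratings.items():
    most_common, spread = summarize(answers)
    print(f"{source}: mode {most_common}, range {spread}")
```

    A wide range alongside a given mode signals that analysts disagreed about a source even when one answer was chosen most often.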

    ETHICS

    The thesis survey relied on the truthful response from analysts

    currently working in areas included in this survey. Such responses could be

    critical of an analyst's employer or profession; therefore, the thesis included

    the following statement intended to protect the respondents' anonymity.

    PRIVACY: You do not need to include your name; however, if you choose to include your name, it will only be used by me to contact you if I need more information regarding your comments. I will not quote you directly unless you indicate in Questions 3 and 4 that I may do so. Otherwise, only me and my Thesis Chairman, Professor Alex Cummins will have access to respondent names. Any record of the names in association with the responses will be destroyed after the research is completed, except those names included in the thesis with permission.39

    38 See Table 7, and Table 8 in the findings chapter.

    39 Survey, Privacy.


    CHAPTER 4

    FINDINGS

    This chapter first describes what was discovered in the literature

    review that could answer the research question and the key issues. Then the

    results of the survey are described, followed by how these results answered

    the research question and the key issues. The survey determined what

    criteria analysts use today to judge the credibility of an intelligence source,

    which can be found in Appendix D. Even after consolidation, 148 separate

    criteria were suggested by analysts, indicating little consistency in criteria, or

    little understanding of the differences between data validity and source

    credibility. Many of the suggested criteria appear to be measures of valid

    data, or lists of known credible sources.40

    The most significant result of the survey is the list of recommended

    credibility criteria determined by surveying analysts' opinions of criteria

    suggested by experts in the literature review. Only two expert

    recommendations were rejected by the surveyed analysts. The survey also

    showed that analysts see only a small difference in the credibility of open

    sources and classified sources.41, 42

    40 Survey, question 6.

    41 Survey, questions 7a through 7s.

    42 See Table 8 in findings chapter for comparison of classified and unclassified source credibility.


    Just as useful as the credibility criteria is the credibility scale

    developed by benchmarking known credible and known non-credible Web

    sites. The benchmarked sites determined the expected score of a credible

    Web site. The survey results also determined a target level of credibility for

    intelligence sources, which was converted to a percent of the credible

    benchmark score on the credibility scale. The benchmarking of known

    credible and non-credible Web sites validated the criteria and demonstrated

    that credible sources can be identified on the Web.43

    KEY ISSUE: OSINF RELEVANCE TO INTELLIGENCE

    Although all experts agree that open source information (OSINF)

    contributes to intelligence, how OSINF should contribute is still an open

    debate. Steele suggests that analysts should reference OSINF first, and then

    classified sources, and presumably only then request further classified

    collection to fill the intelligence gaps.44 This approach would acquire data

    from the least expensive sources first. Steele calls for 5 percent of the

    intelligence budget to be moved to support OSINF acquisition.45 He claims

    this would increase timely intelligence by a magnitude. His comments

    suggest an answer to the key issue of how relevant OSINF is to intelligence.

    Open sources include what is already publicly known about a subject, and

    therefore should represent the background and context of any intelligence

    43 See Appendix A, Benchmarked Web Site Evaluation Worksheet.

    44 Steele, under Part III.

    45 Steele, under Part III.


    report, and should be considered before any classified collection is

    attempted. Not to do so would potentially waste funds and possibly put

    people at risk for information that may have been found in a foreign Web

    site, foreign newspaper, or company brochure. These open sources can also

    be used to corroborate classified intelligence, thus contributing to the

    credibility of a classified source. Because classified resources are so much

    more expensive than open sources, open sources should always be the first

    choice, followed by classified sources if the information is not available through open sources,

    or if the open source's credibility cannot be determined or is determined to be

    too low. Therefore, OSINF affects the cost of intelligence, the timely access

    to information, the context of intelligence, the credibility of intelligence, as

    well as the content.

    Bowen's recommendations to include subject-matter experts in the

    intelligence collection cycle may be a practical way to implement the

    evaluation process proposed by this thesis.46 Implemented community wide,

    Bowen's cadre of OS subject experts could produce a significant savings in

    time and money spent by countless analysts attempting to sort the useful

    credible information from the useless and non-credible information. I have

    observed that every analyst who makes use of Web sites for open source

    intelligence must rediscover which sites are useful and credible, even though

    an expert at another agency or just down the hall may have already

    evaluated the site. Also, when a Web site is recommended by one analyst to

    another analyst, there is no consistent way to evaluate the Web site and

    express that evaluation to other analysts. This research produced a

    46 Bowen, under Collection Strategy.


    Bias. The researcher must understand the source's bias.51

    Objectivity. Are the author's statements supported with reasoning

    or facts? 52 Even a biased author can compensate for his bias by including

    competing reasoning and facts.

    Accuracy. Online sources are generally quicker than print media at

    correcting errors.53 Even print sources include inaccurate information or

    disinformation.54 I believe that this is significant because accuracy affects

    credibility; therefore, Web sources should be more accurate and timely

    than print media because the technology enables quicker revisions.

    Expert opinion.

    Rely on second party expert evaluation whenever possible, e.g.,

    recommendations from professional associations, academic organizations,

    subject experts.55

    Informal networks of colleagues with different areas of expertise

    inform one another of credible sources.56

    Use second opinions to evaluate the accuracy of an author, which

    can be done by posting related questions to appropriate newsgroups.57

    51 Basch, 9, 15.

    52 Basch, 31.

    53 Basch, 48.

    54 Basch, 9.

    55 Basch, 31.

    56 Basch, 31.

    57 Basch, 31.


    Subject area Web pages created by subject librarians are a good

    source of links to evaluated Web sites.58 I recommend evaluation sites

    that explain their evaluation process.

    Gray literature (documents with limited distribution such as

    company brochures, or equipment manuals), best located on the Web, is

    often published by very credible sources, including governments, and

    corporations, which can be good sources for factual data. Interpretation

    of the data may require an expert.59 I suggest asking a subject-matter

    expert to distinguish facts from advertising in corporate literature.

    Origin.

    How close is the source to the origin of the data? 60

    Discover the original source to avoid circular and false

    corroboration.61

    Corroboration. Can the information be corroborated?62

    Corroboration is only effective if it is from diverse sources. This is another

    reason it is important to know the origin of the data.

    Current. Is the information current? 63

    58 Basch, 139.

    59 Basch, 40, 110.

    60 Basch, 9.

    61 Basch, 16.

    62 Basch, 9, 96.

    63 Basch, 132.


    Know which publishers, universities, or companies are well

    respected in your topic area.69 These are likely to be credible sources, or

    able to identify credible sources.

    Reputable publishers, well-known authors, and (peer) reviewed

    publications are more credible than other sources.70

    Attribution.

    Does the source clearly identify itself and its purpose?71

    Indications of the source include the text of the Web site, the name

    of the Web server in the URL, and the directory name in the URL, which

    may include the author's name.72

    Attribution should include the institution and a person,

    with information on how to contact the author.73

    I would also recommend viewing the Web site's HTML source code

    for revision dates, and statements of attribution not shown in the Web

    site's body.
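
    As a minimal sketch of the attribution checks above, assuming the page
    source has already been saved as text, the Python below pulls the server
    name and directory names from a URL and collects meta tags and HTML
    comments, where revision dates and author statements sometimes appear.
    The URL, the sample HTML, and the names AttributionParser and
    attribution_clues are hypothetical, and many pages carry none of these
    clues.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AttributionParser(HTMLParser):
    """Collects <meta> name/content pairs and HTML comments from page source."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # Record tags such as <meta name="author" content="...">.
        if tag == "meta":
            attr = dict(attrs)
            name = attr.get("name") or attr.get("http-equiv")
            if name and attr.get("content"):
                self.meta[name.lower()] = attr["content"]

    def handle_comment(self, data):
        # Comments are hidden in the rendered page but visible in the source.
        self.comments.append(data.strip())


def attribution_clues(url, html):
    """Return the URL parts and hidden source text an evaluator might inspect."""
    parts = urlparse(url)
    parser = AttributionParser()
    parser.feed(html)
    return {
        "server": parts.netloc,
        # Directory names only; the final path segment is the file name.
        "directories": [p for p in parts.path.split("/")[:-1] if p],
        "meta": parser.meta,
        "comments": parser.comments,
    }


# Hypothetical page source, invented for illustration.
SAMPLE_HTML = """<html><head>
<meta name="author" content="J. Doe">
<meta name="date" content="2001-11-05">
<!-- Last revised 5 Nov 2001 by J. Doe -->
</head><body>Report text.</body></html>"""

clues = attribution_clues(
    "http://www.example.org/~jdoe/reports/index.html", SAMPLE_HTML
)
for key, value in clues.items():
    print(key, "=", value)
```

    The output is only a list of clues. Whether a directory name such as
    "~jdoe" really identifies the author still requires the evaluator's
    judgment.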

    Motivation.

    Information has value; therefore, know why a source provides

    information for free.74

    69 Basch, 110, 137.

    70 Basch, 32.

    71 Basch, 16.

    72 Basch, 140.

    73 Basch, 140.

    74 Basch, 77.


    The presence of a counter on a Web site indicates the author cares

    that people know that other people like his site enough to visit it.75

    However, I am aware that counters have also been used to falsely

    indicate that a site is popular when it is not. Therefore, counters are

    probably not a reliable indicator of anything. A more relevant indicator of

    popularity is how many and which other Web sites include links to the

    evaluated site. I suggest using AltaVista's Link: command in the

    Advanced Search area to determine this. A search of relevant newsgroups

    will also indicate what other people think of a Web site.

    Relativity. What is a good source for one purpose may be

    insufficient for another purpose.76 This is another reason that I think

    that Web sites are best evaluated by subject-matter experts. A

    novice or generalist who evaluates a Web site for someone else

    should indicate his own level of knowledge in the topic area. This

    also relates to thesis survey question 9, which asked analysts to

    evaluate how credible a source must be to use it for different

    intelligence products.

    All of the statements listed above from respected Internet searchers

    contributed to the thesis survey question 8, which asked how much

    specific criteria contribute to the credibility of Web sites.

    Alison Cooke's Authoritative Guide to Evaluating Information on the

    Internet included three areas: what is high quality information, how to find it,

    and how to evaluate it. Each of these areas contributed to the development

    75 Basch, 132.

    76 Basch, 133.


    of relevant questions in the thesis survey. On the topic of high-quality

    information, Cooke explains that some of the most common problems with

    the Internet include:77

    information overload

    too much useless information

    potentially inaccurate material

    outdated material

    Publishing has become so easy that researchers must comb through

    thousands of supposedly related Web pages returned by search tools, which

    do not even include databases, news services, and FTP sites. The citation

    search engines are of no help in determining quality or relevance. Most

    search engines are only an index of Web pages found.

    Cooke explains that without the filtering provided by commercial and

    academic publishers, people publish because they can, not because they

    have something useful to share.78 I have observed that this is a serious

    problem because it camouflages the useful information and requires a great

    amount of time to sort through. A useless site can have all the gloss, format,

    and authoritative lingo of a useful site, yet have no useful content.

    Cooke contends that accuracy is perhaps of most concern to

    researchers and professionals. As an example of the accuracy issue, Cooke

    explains that of forty WWW medical sites evaluated, only four included

    advice close to the authoritative published recommendations.79 I believe that

    this level of inaccuracy is possible because Web authors are their own editor

    77 Cooke, 89.

    78 Cooke, 12.

    79 Cooke, 62.


    and publisher, allowing no opportunity for the critical review that most scholars

    and professionals welcome.

    Methods for finding data on the Web are unique to the Web and online

    sources. Cooke explains in great detail the advantages and disadvantages

    of:

    search engines

    review and rating services

    subject catalogs and directories

    subject-based gateway services and virtual libraries

    Cooke explains that search engines such as Excite and Lycos (or

    AltaVista, which is still solvent) are comprehensive, unfocused, have poor

    relevance ranking, and are not useful for finding or evaluating sources for

    quality. They are also generally limited to Web sites and index every page on

    every site, further multiplying the number of results per query.80 I have

    observed that some search engines such as Google have resolved this

    multiple indexing of a single site by displaying only the first indexed page,

    unless one requests more.

    Cooke also writes that subject catalogs and directories such as Yahoo

    and Galaxy are more useful because site authors write the site descriptions;

    catalog experts choose the hierarchy category in which to place the site; and only

    sites are indexed, not every page. However, these sites are still very large,

    and because the indexing is done by people rather than machines, as is the

    80 Cooke, Chapter 2.


    case with search engines, Web site directories are not revisited as often and

    may become outdated.81

    Cooke also wrote that rating and reviewing services use different,

    usually unpublished criteria for rating the best sites. These include

    Encyclopaedia Britannica's Internet Guide and Lycos Top 5 percent.82 These

    are better still for finding high-quality sources because a person other

    than the author has reviewed the site based on some criteria. However,

    these criteria are targeted to a general audience, not the academic or

    professional. Higher weight may be given to organization and graphics than

    to content or accuracy, and the evaluators are not subject-matter experts.83

    Cooke believes that the best places to find high-quality sources are

    subject-based gateway services and virtual libraries. These facilities are

    designed by librarians or subject-matter experts, and use common indexing

    methods used in libraries. They are often subject-matter specific, and sites

    are evaluated and described by subject-matter experts.84

    The last section of Cooke's book gives checklists of evaluation criteria

    for several Internet source types. The criteria can be used for overall

    evaluation of Web sites, not specifically for credibility as this thesis does.

    Cooke's criteria are based on surveys of hundreds of Internet users, and were

    81 Cooke, Chapter 2.

    82 Cooke, Chapter 2.

    83 Cooke, Chapter 2.

    84 Cooke, 92.


    validated by professional librarians. The unique evaluation criteria for each

    type of Web site are fully described.

    The source types described in this book, with general evaluation

    criteria, included:

    organizational WWW sites

    personal home pages

    subject-based WWW sites

    electronic journals and magazines

    image-based and multimedia sources

    USENET newsgroups and discussion groups

    databases

    FTP archives

    current awareness services

    FAQs

    Criteria for assessing an organizational Web site should include the

    authority and reputation of the institution within its field, as well as the date

    the page was last updated.85 Criteria for a subject-based Web site include

    the purpose of the site, comprehensiveness, and whether the page includes

    pointers to other sources for more information.86 Evaluation criteria for

    electronic journals and magazines include the site's authority and reputation

    as well as whether the site has been referenced by a known reputable journal

    85 Cooke, 90.

    86 Cooke, 97.


    that filters its own articles for accuracy.87 These criteria were included in the

    survey questions for this thesis.

    SURVEY FINDINGS, CREDIBILITY CRITERIA

    The primary purpose of the thesis survey was to identify criteria for

    assessing the credibility of a Web site. The recommended credibility criteria

    were determined by a multi-step process. First, all credibility criteria

    recommended by experts in the literature review were listed, and then

    consolidated. Then the consolidated list of expert criteria was included in

    the thesis survey to industry and intelligence analysts as questions 8a

    through 8r. Those criteria to which analysts most often gave a credibility value

    of 50 percent or higher were then listed as recommendations. Note that

    only three criteria were rejected as credible by 50 percent or more

    respondents. The first two were not recommended by experts, but were

    added to assess the basic knowledge of respondents and as control

    questions, which were not expected to be accepted by respondents.

    Rejected criteria included:

    8d. Listed in a search engine such as AltaVista.
    8e. Listed in a Web directory organized by people, such as Yahoo.
    8r. Professional writing style of Web page.

    Then the mean credibility (average analyst chosen score) was

    calculated for each recommended criterion from question 8. The mean then

    became the relative value or weight for each criterion.

    87 Cooke, 98.


    The criteria recommended in survey question 6 were then listed, and

    consolidated. The methodology planned to add to the list of recommended

    criteria from question 8, those criteria from question 6 that were not already

    on the recommended list, and that had a mode occurrence of 50 percent or

    greater (at least half the analysts listed the criterion). Surprisingly, there

    were no criteria recommended by half or more of the respondents in the

    open survey question number 6. The criteria that were mentioned most

    often were: corroboration (28 occurrences), bias (14 occurrences), reputation

    of the source (10 occurrences), source's authority or credentials (8

    occurrences), and presentation (7 occurrences).88 However, each of these

    most-often suggested criteria, except source authority, was also suggested

    by published experts discussed in the literature review, and was recommended

    by 50 percent or more of respondents when asked about those specific criteria

    in survey questions 8a-8r. Therefore, no additional criteria were added from

    question 6.

    Therefore, Table 1 below includes the results of the criteria surveyed,

    the relative values of each criterion, and which criteria were chosen for

    recommendation.89

    88 See Table 15. Survey Question 6: Personal Criteria Analysts Currently Use to Determine Credibility.

    89 Survey, questions 8a-8r.

    Table 1. Question 8a to 8r, Recommended Criteria and Relative Values (Mean).(a)

                                                                            Number of Cases
    Criteria                                                                Valid  Missing   Mean  Mode  Recommended
    8a. Recommended by subject-matter expert in the topic of the Web page.     66        0   4.94  5     Yes
    8b. Recommended by a generalist.                                           65        1   3.65  4     Yes
    8c. Listed by an Internet subject guide that evaluates Web sites.          63        3   3.56  4     Yes
    8d. Listed in a search engine such as AltaVista.                           64        2   2.39  1     No
    8e. Listed in a Web directory organized by people, such as Yahoo.          62        4   2.65  2     No
    8f. Content is perceived current.                                          64        2   3.78  5     Yes
    8g. Content is perceived accurate.                                         63        3   4.56  5     Yes
    8h. A peer or editor reviewed the content.                                 65        1   4.52  5     Yes
    8i. Content's bias is obvious.                                             65        1   3.06  4     Yes
    8j. Author is reputable.                                                   64        2   4.64  5     Yes
    8k. Author is associated with a reputable organization.                    65        1   4.42  5     Yes
    8l. Publisher or Web host is reputable.                                    65        1   4.02  5     Yes
    8m. Content can be corroborated with other sources.                        65        1   5.17  5     Yes
    8n. Other Web sites link to, or give credit to the evaluated site.         65        1   3.68  5(b)  Yes
    8o. Server or domain is copyrighted or trademark name, like IBM.com.       65        1   3.45  4     Yes
    8p. Statement of attribution.                                              64        2   3.78  5     Yes
    8q. Professional appearance of Web site.                                   65        1   2.86  4     Yes
    8r. Professional writing style of Web page.                                64        2   3.16  3     No

    (a) Table Explanatory Notes. Mode Values: 1=0 percent, 2=10 percent, 3=25
    percent, 4=50 percent, 5=75 percent, 6=100 percent credible. Mode is the
    most-often chosen score respondents gave each criterion. Only modes of 50
    percent credible and higher are recommended. The Mean is the average score
    respondents gave each criterion. The Mean is assigned to each recommended
    criterion as its relative value; these values are later summed when
    evaluating a Web site.

    (b) Multiple modes exist. The smallest value is shown.
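
    The recommendation rule in the notes to Table 1 can be sketched in Python
    as follows. This is only an illustration: the response lists are invented,
    the function names are hypothetical, and recording a missing answer as
    None is an assumption rather than the survey's actual coding.

```python
from collections import Counter

# Score codes from the notes to Table 1 (score -> percent credible).
SCORE_TO_PERCENT = {1: 0, 2: 10, 3: 25, 4: 50, 5: 75, 6: 100}


def smallest_mode(scores):
    """Most-often chosen score; ties resolve to the smallest value."""
    counts = Counter(scores)
    top = max(counts.values())
    return min(score for score, n in counts.items() if n == top)


def summarize(responses):
    """Return valid/missing counts, mean, mode, and the recommendation flag."""
    valid = [r for r in responses if r in SCORE_TO_PERCENT]
    mode = smallest_mode(valid)
    return {
        "valid": len(valid),
        "missing": len(responses) - len(valid),
        "mean": round(sum(valid) / len(valid), 2),
        "mode": mode,
        # Recommended only when the modal score means 50 percent or higher.
        "recommended": SCORE_TO_PERCENT[mode] >= 50,
    }


# Invented responses for two hypothetical criteria (None = no answer).
strong = summarize([5, 5, 6, 4, 5, None, 3])
weak = summarize([1, 1, 2, 3, 4, 1])
print(strong)
print(weak)
```

    Under these assumptions the first criterion would be recommended and its
    mean would become its relative value, while the second would be rejected
    regardless of its mean.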

    The last step of the process to identify commonly agreed-upon

    credibility criteria and to assign relative weights involved applying the


    recommended criteria to known credible, and known non-credible Web sites,

    to establish benchmarks and a relative credibility scale. Three credible sites

    known to the author or recommended by a subject expert were evaluated to

    establish the high-end of the relative credibility scale. The relative values of

    each criterion that the site satisfied were then summed for the site's relative

    credibility score. Then the average of the three credible Web sites was

    calculated as the benchmark credible score. See Appendix A for the

    evaluation worksheets, and detailed evaluation for these Web sites.

    It was, surprisingly, easier to find known credible Web sites to evaluate

    than it was to find known non-credible Web sites to evaluate. This was

    because it did not seem useful to benchmark a Web site so obviously non-

    credible that no analyst would consider using it, negating the need for an

    evaluation at all. Due to this difficulty, only one non-credible Web site was

    evaluated. Due to concerns about potential libel claims, this non-credible

    Web site will be referenced here by the pseudonym KoreanNewsSite. The

    KoreanNewsSite was selected because the author had evaluated this site for

    a previous research paper and had found it non-credible, and yet a challenge

    to evaluate. The challenge in evaluating it came from its mix of very credible

    links, unknown contributing authors, and non-credible articles by the

    publisher. The key points that made the publisher's articles non-credible

    included a general lack of authoritative citations to source documents, lack of

    dates on the articles, a distinct bias camouflaged by corroborative facts, and

    inaccuracies. Relevant newsgroup discussions indicated that the publishing

    author had a poor reputation for these same reasons.


    The figures below represent the relative credibility scale and how these

    benchmarks were determined. Based on these evaluations, a very credible

    Web site should rate a relative credibility score of about 46.75, and a non-

    credible site should rate a relative credibility score of about 7.46.


    Benchmark Credible Web sites Evaluated                    Score
    Spot Image Corporation, www.spot.com                      43.19
    International Telecommunications Union, www.itu.int       48.24
    NY Times On the Web, nytimes.com                          48.82
    Average Score                                             46.75

    Benchmark Non-credible Web site Evaluated                 Score
    KoreanNewsSite                                             7.46

    Relative Credibility Scale:
    46.75 = Very-Credible
     7.46 = Non-credible
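
    The scoring behind these benchmarks can be sketched in Python as shown
    here. The weights are the means of the recommended criteria in Table 1
    and the three benchmark scores are those listed above; the function names
    and the example site that satisfies only three criteria are hypothetical.

```python
# Relative values (means) of the recommended criteria, copied from Table 1.
WEIGHTS = {
    "8a": 4.94, "8b": 3.65, "8c": 3.56, "8f": 3.78, "8g": 4.56,
    "8h": 4.52, "8i": 3.06, "8j": 4.64, "8k": 4.42, "8l": 4.02,
    "8m": 5.17, "8n": 3.68, "8o": 3.45, "8p": 3.78, "8q": 2.86,
}


def relative_credibility(satisfied):
    """Sum the weights of the recommended criteria that a site satisfies."""
    return round(sum(WEIGHTS[c] for c in satisfied), 2)


def benchmark(scores):
    """Average the relative-credibility scores of the benchmark sites."""
    return round(sum(scores) / len(scores), 2)


# Scores of the three credible benchmark sites reported in the thesis.
credible_benchmark = benchmark([43.19, 48.24, 48.82])

# Hypothetical site: expert-recommended, perceived accurate, corroborated.
example_score = relative_credibility(["8a", "8g", "8m"])

print("benchmark:", credible_benchmark)
print("example site:", example_score)
```

    A site that satisfied every recommended criterion would score the sum of
    all fifteen weights, so the benchmark sites evidently satisfied most but
    not all of them.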

    SURVEY FINDINGS, CREDIBLE ENOUGH FOR INTELLIGENCE USE

    As discussed in the methodology chapter, having a relative scale is

    useful from an academic perspective; however, to be of practical use, the

    analyst must also know what the target or required level of credibility is for

    a source he would like to use in an intelligence product. The required level of

    credibility for intelligence sources was determined by survey questions

    9a-9f, which asked:90

    How credible must an intelligence source be to use its data in the
    following intelligence products?

    7) No Opinion
    6) 100 percent Credible
    5) 75 percent Credible
    4) 50 percent Credible
    3) 25 percent Credible
    2) 10 percent Credible
    1) 0 percent Credible

    9a. Research, or topic summaries
    9b. Current, day-to-day developments
    9c. Estimative, identifies trends or forecasts opportunities or threats
    9d. Operational, tailored, focused to support an activity
    9e. Scientific, or technical, in-depth, focused assessments
    9f. Warning, an alert to take action

    90 Survey, questions 9a-9f.

    The following calculations were used to determine the product-

    credibility level for six types of intelligence products. The mode was

    calculated for survey questions 9a-9f. The mode is the most-often chosen

    required level of source credibility. The statistics indicate that most analysts

    believe that all types of intelligence products require that sources be 75

    percent credible.91 This was a surprise because the author expected to see a

    greater variance in the required levels of source credibility, with warning

    intelligence requiring the least credibility and in-depth focused assessments

    requiring the greatest level of credibility. This presumption was based on the

    belief that analysts require less information about an imminent threat than

    they do about a future scientific or political condition, because the potential

    impact of ignoring the least threat is so much greater than ignoring the most

    significant emerging scientific or political condition. Apparently, most

    analysts do not understand the relationship of intelligence products to

    outcomes, or the survey question was flawed.

    However, using the survey results, the sources of all intelligence

    products should be 75 percent credible. If the most credible Web sites have a

    relative-credibility score of 46.75 as demonstrated above, then sources for

    intelligence products should score 75 percent of that, which is 35.06

    (0.75 x 46.75 = 35.0625, rounded). Therefore, the target-credibility level of any intelligence source is 35.06, as evaluated by the

    recommended credibility criteria. The following table shows the most-often

    chosen (mode) required credibility level for intelligence products.

    91 See Table 2.


    Table 2. Questions 9a-f. Required Level of Source Credibility for Intelligence Products.92

                                                                            Number of Cases    Required Credibility
    Intelligence Product                                                    Valid  Missing(b)  Mode           Range
    9a. Research, special topic summaries                                      35          31  50 percent(a)  0-100 percent
    9b. Current, day-to-day developments                                       35          31  75 percent     0-100 percent
    9c. Estimative, identifies trends or forecasts opportunities or threats    35          31  75 percent     0-100 percent
    9d. Operational, tailored, focused, to support a military,
        intelligence, or diplomatic activity                                   35          31  75 percent     0-100 percent
    9e. Scientific or technical, in-depth, focused assessments of
        trends or capabilities                                                 35          31  75 percent     0-100 percent
    9f. Warning, an alert to take action                                       35          31  75 percent     0-100 percent
    Required-credibility level for all Intelligence Product Sources                            75 percent

    (a) Multiple modes exist. The smallest value is shown. Just as many
    respondents chose 75 percent.

    (b) Missing responses are primarily because non-Intelligence Community
    personnel were not asked these questions in the survey. Mode is based on
    valid responses.
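
    The target-credibility calculation described before Table 2 can be
    sketched in Python as follows. The benchmark of 46.75 and the 75 percent
    requirement come from the thesis; the function names are hypothetical,
    and comparing a site's score against the target with a simple threshold
    is an assumption about how an analyst would apply the figure.

```python
VERY_CREDIBLE_BENCHMARK = 46.75  # average score of the three benchmark sites
REQUIRED_FRACTION = 0.75         # modal answer to survey questions 9a-9f


def target_score(benchmark=VERY_CREDIBLE_BENCHMARK, fraction=REQUIRED_FRACTION):
    """Scale the very-credible benchmark by the required credibility level."""
    return round(benchmark * fraction, 2)


def credible_enough(site_score, target=None):
    """True when a site's relative-credibility score meets the target."""
    if target is None:
        target = target_score()
    return site_score >= target


print("target:", target_score())
print("Spot Image (43.19):", credible_enough(43.19))
print("KoreanNewsSite (7.46):", credible_enough(7.46))
```

    Keeping the benchmark and the required fraction as parameters means the
    target can be recomputed if different benchmark sites are evaluated or if
    a product type is judged to need a different level of credibility.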

    SURVEY FINDINGS, OFFICIAL CREDIBILITY CRITERIA

    Question 5 asked, "Does your organization have official criteria that

    you are told to use for determining the credibility of any source? 'Any source'

    means published, proprietary, and classified sources."93 The purpose of this

    question was to determine if analysts are aware of credibility criteria that

    they can use to ensure a consistent quality of reporting. The assumption

    92 Survey, questions 9a-9f.

    93 Survey, question 5.


    here is that only criteria formally sanctioned by the organization are likely to

    be consistently followed. As the table below indicates, 86.2 percent of

    analysts are eith