
A Study of Argumentation in a Causal Probabilistic Humanistic Domain: Genetic Counseling

Nancy Green*

Department of Mathematical Sciences, University of North Carolina at Greensboro, Greensboro, North Carolina 27402-6170, USA

We present the results of an in-depth qualitative analysis of argumentation in two genetic counseling patient letters. In addition to argumentation techniques designed for medical experts, we found other types of causal argumentation designed for lay readers, reflecting the educational and supportive counseling functions of these letters. Analysis was facilitated by use of a coding scheme for representing causal probabilistic biomedical content of the letters as Bayesian networks. We define the argument techniques used in the letters in terms of Bayesian network, semantic network, argumentation theory, and user model concepts rather than in terms of genetics concepts. © 2007 Wiley Periodicals, Inc.

1. INTRODUCTION

Genetic counselors in the United States meet with clients to discuss genetics-related health care issues such as newborn screening for inherited metabolic disorders and genetic testing for an inherited predisposition to breast cancer. Some information may be difficult for a lay person to understand because clinical genetics, like other medical domains, involves probabilities and causal reasoning. In addition, a client’s emotional state, attitudes, or mistaken assumptions may impede communication. One of the concerns of genetic counseling is to address these communication challenges. The genetic counseling patient letter is a standard document written by a genetic counselor that summarizes information discussed at the meeting. We are designing a system to generate the first draft of the patient letter from general information on clinical genetics and specific information about a client’s case. Note that it is not the system’s goal to automate problem-solving tasks such as diagnosis and risk assessment, but rather to support health care communication tasks. Automatic generation of the first draft could save a counselor a significant amount of time. Our overall goal is to investigate how intelligent systems can help communicate medical and other scientific information effectively to audiences with different educational backgrounds, needs, and attitudes.

*E-mail: [email protected].

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 22, 71–93 (2007). © 2007 Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/int.20190

Currently, we are analyzing a corpus of patient letters written by genetic counselors. Analyzing a corpus is a standard knowledge acquisition technique in the field of natural language generation (NLG) [1]. One goal of our corpus analysis is to inform design of the system’s knowledge base and to ensure that letters drafted by the system are consistent with current practice. Another, broader, goal is to identify argumentation techniques that may be reusable in other domains. We have found a variety of types of causal argumentation in the corpus. Some types, designed primarily for the expert audience, are used in service of the letters’ medical documentation function, for example, an argument for why one candidate diagnosis was ruled out. Other uses of causal argumentation are intended to fulfill an educational function for the lay audience, for example, an argument for how a patient could have acquired a genetic mutation. Furthermore, we have found uses of dialectical and affective causal argumentation consistent with the letters’ supportive counseling function, for example, addressing potential counterarguments to causal claims that may be negatively valued by the client. In short, analysis of the corpus suggests that the repertoire of argumentation techniques employed by writers in this genre is broader than the set of techniques typically used for communication with domain experts by intelligent systems in medicine.

As part of the corpus analysis, we developed a coding scheme for patient letters that models their biomedical content in a Bayesian network (BN) formalism. The BN representation of a letter enables argumentation techniques to be defined at a higher level of abstraction than in terms of domain-specific knowledge, that is, clinical genetics. In this article, we present an in-depth qualitative analysis of argumentation in two patient letters. We provide definitions of argument techniques in terms of general BN, argumentation, semantic network, and user model concepts. The article is organized as follows. In Section 2, we present background information on the genre represented by the corpus; in Section 3, we summarize the BN coding scheme used to annotate the corpus; in Section 4, we present the argument techniques found in our analysis of the two letters; in Section 5, we describe the architecture of the proposed NLG system; and in Section 6, we outline related work.

2. GENETIC COUNSELING PATIENT LETTERS

2.1. General Purpose and Points of View

A genetic counselor writes a patient letter to a client after meeting with the client “for the purpose of providing a permanent, easily understood record of the relevant information discussed during the genetics clinic visit” and to provide information “that may have become available since the patient’s visit (e.g., outside records or laboratory results). . . . It is generally composed on behalf of the entire genetics team, including the geneticist, genetic counselor, and other who were involved in the clinic visit. A secondary audience for the letter, should the patient wish to share it, may include other health care providers, family members, teachers, school administrators, or child care workers” (Ref. 2, p. 402).

2.2. Medical Documentation

One of the functions of the patient letter is to provide medical documentation. According to Ref. 3 (p. 235), “good documentation serves to improve communication between all the health and social service providers working with a patient” and “may contribute, now or in the future, to establishing or confirming a diagnosis, and to determining an accurate risk assessment.” Documentation may be important for legal reasons as well, since Baker et al. [3] recommend that the documentation demonstrate “that services were performed within the accepted standards of care” and that it “should be complete, because there is oftentimes a perception that anything not included in the medical documentation didn’t happen.” Their writing guidelines for medical documentation include that it should be “objective and factual,” “as brief as possible,” and record events in chronological order.

In the patient letters that we have examined, medical documentation of the diagnostic process is provided in chronological order and may include topics such as the reason for the referral, relevant patient and family history, test results, and findings of physical exams. As described in Ref. 4, diagnosis is an iterative process in which, in each cycle, the physician considers a set of hypotheses, gathers evidence from the patient’s background or test results, and determines the overall best explanation (one or more hypotheses) with respect to the currently available evidence. “After the physician formulates a hypothesis, he generates questions based on this hypothesis and runs additional tests to obtain answers to these questions. He then formulates a (possibly) new most probable explanation. This process continues until a satisfactory diagnosis is reached” (p. 318).

2.3. Educational Function

An important educational function of genetic counseling (and by extension the patient letter) is to provide background information on genetics relevant to a client’s case. This information is not addressed to an expert audience because it consists of fundamental knowledge that would be shared by experts in this domain, for example, Mendelian inheritance patterns. Under the heading of educational function, we include medical or informative counseling [5] on treatment options or recurrence risks, respectively. In providing this type of information, the counselor is expected to take a nondirective stance to enable a family to “choose the course of actions which seems to them appropriate in view of their risk, their family goals, and their ethical or religious standards.”^a For example, to avoid emphasis of one possible outcome over the other, a risk is presented as a 25% chance that an event (such as inheriting a mutated gene that can cause a health problem) will occur and, equivalently, as a 75% chance that it will not occur.

^a American Society of Human Genetics, as quoted in Ref. 5, p. 430.


2.4. Counseling Function

A third important function of genetic counseling (and by extension the patient letter) is supportive counseling [5] to help the client to deal with the emotional impact of genetic risk or genetic disease, which may include “denial, anger, and/or grief” (Ref. 5, p. 5). Baker et al. [2] recommend a number of writing strategies for mitigating the negative impact of information such as use of “value-free” and “nonstigmatizing” language, for example, use of “alteration” in place of “mutation.”

3. BAYESIAN NETWORK CODING SCHEME

In Refs. 6 and 7, we present a coding scheme for annotating biomedical information in patient letters. The scheme is designed for encoding this information as a BN, a directed graph whose nodes represent discrete-valued random variables and whose arcs represent dependencies of conditional probability between variables.^b BN models have been used in a number of medical decision support systems, for example, Refs. 9 and 10.

Our coding scheme includes a small set of BN variable types relevant to clinical genetics and domain constraints on their interrelationships. When encoding a patient letter, a coder may assign the following tags to a phrase: history, genotype, mutation-event, biochemistry, physiology, symptom, result, test, and probability.^c The first seven tags in this list identify BN variable types. A numbering scheme is used to uniquely identify each instance of a tag, for example, genotype-1. Each instance is a name of a BN variable. Two or more phrases in a letter may be interpreted as co-referring to the same BN node; for example, genotype-1 and genotype-2 may tag phrases referring to the same node in the BN model.

To help an analyst keep track of which variables are relevant to which individuals discussed in a letter, the coder also assigns suffixes identifying a person described by a tagged phrase in terms of his or her position in a “family tree” (called the pedigree in genetics) relative to a proband, where proband is a term of genetics referring to the person of primary interest in a genetic study. For instance, genotype-1/mother could be used to tag a phrase describing a proband’s mother’s genotype. In cases where a phrase refers to a generic family member, population is appended to the suffix; for example, genotype-2/mother/population could be used to tag a phrase about the genotype of mothers in an epidemiological study.

For illustration, the following shows a sentence before and after annotation by a coder.

Original sentence: [DOCTOR] asked us to evaluate [PROBAND] to determine if [HIS/HER] delays in development and [SPECIFIC TYPE OF BIRTH DEFECT] were due to a recognizable genetic condition.

^b For more information on Bayesian networks, see, for example, Refs. 4 and 8.

^c Genotype refers to the two copies of a gene in an individual’s genome. For definitions of other tags, see Refs. 6 and 7. However, for the purposes of this article, their meaning should be sufficiently clear from name and context.


Annotated sentence: (2) [DOCTOR] asked us to evaluate [PROBAND] to determine if <symptom-2.1/proband [HIS/HER] delays> in development and <symptom-2.2/proband [SPECIFIC TYPE OF BIRTH DEFECT]> were due to <genotype-2/proband a recognizable genetic condition>.

For ease of reference, the sentences of each letter in the corpus have been numbered in sequence, for example, (2) in the above example. Words in uppercase enclosed in square brackets have been substituted for confidential information in the original text. Each tagged phrase is enclosed in angle brackets. Each tag immediately follows a left angle bracket. A right angle bracket is placed immediately following the syntactic head of a tagged phrase; posthead modifiers are understood as belonging to the tagged concept although they are outside of the brackets. For example in (2), the BN variable symptom-2.1 represents the symptom described as “delays in development.”
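The notation just described is regular enough to be read mechanically. The following is a minimal sketch, not the coding tool used in the study: it assumes that angle brackets are rendered as plain `<` and `>`, and the names `TAG_RE`, `TaggedPhrase`, and `parse_tags` are hypothetical. It does not handle phrases tagged with two variables at once.

```python
import re
from dataclasses import dataclass

# Assumed plain-text form of one tagged phrase: <type-index/suffix/... phrase>
TAG_RE = re.compile(
    r"<(?P<type>[a-z]+(?:-[a-z]+)*)-(?P<index>\d+(?:\.\d+)?[a-z]?)"
    r"(?P<suffixes>(?:/[a-z]+)*)\s+(?P<phrase>[^<>]+)>"
)


@dataclass(frozen=True)
class TaggedPhrase:
    variable: str   # BN variable name, e.g. "symptom-2.1"
    var_type: str   # tag type, e.g. "symptom"
    persons: tuple  # pedigree suffixes, e.g. ("proband",)
    phrase: str     # bracketed text, up to the syntactic head


def parse_tags(sentence):
    """Return the tagged phrases of a sentence in order of appearance."""
    found = []
    for m in TAG_RE.finditer(sentence):
        persons = tuple(s for s in m.group("suffixes").split("/") if s)
        found.append(
            TaggedPhrase(
                variable=f"{m.group('type')}-{m.group('index')}",
                var_type=m.group("type"),
                persons=persons,
                phrase=m.group("phrase").strip(),
            )
        )
    return found


SENTENCE_2 = (
    "[DOCTOR] asked us to evaluate [PROBAND] to determine if "
    "<symptom-2.1/proband [HIS/HER] delays> in development and "
    "<symptom-2.2/proband [SPECIFIC TYPE OF BIRTH DEFECT]> were due to "
    "<genotype-2/proband a recognizable genetic condition>."
)

for tag in parse_tags(SENTENCE_2):
    print(tag.variable, tag.persons, repr(tag.phrase))
```

Posthead modifiers such as “in development” fall outside the brackets, so a fuller tool would need syntactic information to recover the complete tagged concept.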

As part of the corpus analysis, a coder draws a BN diagram representing interrelationships among variables identified in an encoded letter. The coding scheme stipulates which types of node may be linked by an arc to another node. Figure 1 illustrates some generic BN relationships among node types in the coding scheme. Arcs between nodes in the BN are interpreted as probabilistic dependencies between variables. In addition, links between certain node types represent causal relations. For example, a genotype node may be causally linked to a symptom node, whereas a history node (representing a demographic or other risk factor) has only a probabilistic relationship to genotype. In addition to BN relations, a coder annotates the semantic relations, member-group and type-supertype, holding between any pair of nodes.

Figure 1. Generic layout of Bayesian networks used to model biomedical content of letters in corpus. (Nodes refer to proband unless another family member is specified. Note that models of particular letters may vary from this considerably. For example, a proband could have multiple symptoms and test results, one of several genotypes may be suspected as responsible for the proband’s symptoms, and nodes about other members of the family tree may be included.)


As part of our corpus analysis, probability values in a letter are annotated with the probability tag. The probability tag is assigned to numeric probabilities such as “25%,” as well as to qualitative indicators of probability (e.g., “likely”), possibility (e.g., “may”), and frequency (e.g., “often,” “many”), and phrases contextually implying belief (e.g., “associated with”). After annotating a letter, the analyst encodes probability statements in a letter in terms of the probability tags and the BN variables assigned to the text. To illustrate, the coded sentence

(5) Individuals with <genotype-5/population/proband VCF> <probability-5 often> have <symptom-5.1/population/proband [SPECIFIC TYPE OF BIRTH DEFECT]> and <symptom-5.2/population/proband learning problems>.

can be analyzed as expressing the probability statement

$$P(\text{symptom-5.1}, \text{symptom-5.2} \mid \text{genotype-5}) = \text{probability-5}$$

(For readability, the values of the BN variables are not shown, as they can be understood from the associated text.) In a fully specified BN model, a conditional probability table (CPT) associated with each node specifies the node’s conditional probability given its predecessors. For example, in a BN where $X_1, \ldots, X_n$ are the nodes linked directly to $Y$, a CPT would be associated with $Y$ specifying the numeric value of $P(Y \mid X_1, \ldots, X_n)$ for all values of $X_1, \ldots, X_n$ and $Y$. However, our goal is not to acquire conditional probabilities from the corpus needed to implement a BN model, but to analyze how probabilistic information is used in human–human communication.
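To make the notion of a CPT concrete, here is a minimal sketch in Python. The helper `validate_cpt`, the two-node fragment, and every probability value are hypothetical illustrations; the corpus analysis does not supply such numbers.

```python
from itertools import product


def validate_cpt(parent_domains, child_domain, cpt):
    """Check that `cpt` has one normalized row per combination of parent values."""
    for combo in product(*parent_domains):
        row = cpt.get(combo)
        if row is None or set(row) != set(child_domain):
            raise ValueError(f"missing or malformed row for parents {combo}")
        if abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"row for parents {combo} does not sum to 1")
    return cpt


# Hypothetical fragment: genotype (X1) -> symptom (Y).
# All numbers are invented for illustration only.
genotype_values = ("altered", "normal")
symptom_values = ("present", "absent")
symptom_cpt = validate_cpt(
    [genotype_values],
    symptom_values,
    {
        ("altered",): {"present": 0.7, "absent": 0.3},
        ("normal",): {"present": 0.05, "absent": 0.95},
    },
)

# Look up P(Y = present | X1 = altered)
print(symptom_cpt[("altered",)]["present"])
```

With $n$ parents, the table needs one row per combination of parent values, which is why a fully specified model requires many more numbers than a letter typically states.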

In Ref. 11, we report a pilot study analyzing three letters in the corpus using the coding scheme. That study suggests that the model is of practical significance in terms of two relevance metrics: the ratio of the number of unique BN nodes to the number of sentences and the ratio of probability statements to sentences. For example, in one letter, every two sentences contribute a unique node to a BN and out of a total of 24 sentences, 19 probability statements are given. In Ref. 7, we present the results of an evaluation that shows that the tags can be applied with good intercoder reliability. Currently, we are applying the coding scheme to the corpus to inform the design of a knowledge base for use by the NLG system and to provide an abstraction of the content of patient letters. As shown in the next section, this level of abstraction enables us to specify argumentation strategies occurring in the letters in terms of BN properties instead of in terms of genetics.
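The two relevance metrics are simple per-sentence ratios. A minimal sketch follows, with a hypothetical helper `relevance_metrics`. The sentence and probability-statement counts are those quoted above; the node count of 12 is an assumption inferred from “every two sentences contribute a unique node,” not a figure reported in the text.

```python
def relevance_metrics(n_sentences, n_unique_nodes, n_probability_statements):
    """Return the two per-sentence ratios used as relevance metrics."""
    if n_sentences <= 0:
        raise ValueError("a letter must contain at least one sentence")
    return {
        "nodes_per_sentence": n_unique_nodes / n_sentences,
        "probability_statements_per_sentence": n_probability_statements / n_sentences,
    }


# 24 sentences and 19 probability statements come from the text;
# 12 unique nodes is an assumption (one node per two sentences).
metrics = relevance_metrics(
    n_sentences=24, n_unique_nodes=12, n_probability_statements=19
)
print(metrics)
```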

4. ARGUMENTATION TECHNIQUES

The argumentation techniques presented in this section are based on qualitative analyses of a letter to a client whose child may have sensorineural hearing loss and a letter to a client whose child was tested for Velocardiofacial syndrome (respectively identified as letter HL and letter VCF). We provide definitions of the argumentation techniques^d in terms of BN, semantic network, argumentation theory, and user model concepts. Although we did not limit our investigation to causal argumentation, all of the instances found involve causation or probability. The discussion is organized according to the three functions of the genetic counseling patient letter.

4.1. Medical Documentation Function

4.1.1. Letter VCF

The iterative process of diagnosis described in Section 2.2 can be seen in letter VCF. During the first cycle of diagnosis, the referring doctor has hypothesized that some member of a set of candidate genotypes may account for the symptoms of the patient. At the end of the first cycle, all of the initial candidate diagnoses have been disconfirmed by testing. A new diagnosis is made in the second cycle.

Documenting the doctor’s candidate diagnosis of genotype-2 in the first cycle of diagnosis, (2) implies that symptom-2.1 and symptom-2.2 were considered as evidence for genotype-2.

(2) [DOCTOR] asked us to evaluate [PROBAND] to determine if <symptom-2.1/proband [HIS/HER] delays> in development and <symptom-2.2/proband [SPECIFIC TYPE OF BIRTH DEFECT]> were due to <genotype-2/proband a recognizable genetic condition>.

We analyze (2) as an instance of observed-effect-to-hypothesized-cause (E2C): given a BN variable $X$ and its descendants $Y_1, \ldots, Y_n$, the observed values of $Y_1, \ldots, Y_n$ are given as evidence for belief that the value of $X$ is $x$, that is, that $X = x$ could be responsible for the observed values of $Y_1, \ldots, Y_n$. (Figure 2 shows part of a BN model for letter VCF, where symptom-2.1 and symptom-2.2 are descendants of genotype-2.) Similarly, (4) can be analyzed as another instance of E2C, documenting that the referring doctor hypothesized genotype-4 (VCF) as a possible cause of the symptoms previously given in (2). (Figure 2 shows the relationship of genotype-4 to those symptoms.)
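The structural condition in the E2C definition can be checked on a graph. The sketch below is illustrative only: the functions `descendants` and `is_e2c` are hypothetical names, the observed values are placeholders, and the fragment assumes direct arcs from genotype-2 to the two symptoms, whereas the text states only that the symptoms are descendants of genotype-2.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following arcs (iterative DFS)."""
    seen, stack = set(), list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def is_e2c(children, observed, hypothesis, evidence):
    """True if every evidence variable is an observed descendant of `hypothesis`."""
    reachable = descendants(children, hypothesis)
    return bool(evidence) and all(
        y in reachable and y in observed for y in evidence
    )


# Assumed fragment of the letter VCF model (direct arcs are an assumption).
children = {"genotype-2": ["symptom-2.1", "symptom-2.2"]}
observed = {"symptom-2.1": "present", "symptom-2.2": "present"}

print(is_e2c(children, observed, "genotype-2", ["symptom-2.1", "symptom-2.2"]))
```

The check covers only the graph-structural part of the definition; whether a sentence actually presents the observations as evidence is a matter for the human coder.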

(4) In addition to <test-4.1/proband the routine chromosome study> . . . , <test-4.2/proband a special analysis> . . . was done to test for <genotype-4/proband Velocardiofacial syndrome> (VCF).

Next, (5) documents why the referring doctor hypothesized VCF as a possible cause of the proband’s symptoms.

^d We have invented descriptive names for argumentation techniques in this article for the sake of consistency, although some have been described under various names in the Bayesian network literature.


(5) Individuals with <genotype-5/population/proband VCF> <probability-5 often> have <symptom-5.1/population/proband [SPECIFIC TYPE OF BIRTH DEFECT]> and <symptom-5.2/population/proband learning problems>.

The support given in (5) can be represented by the conditional probability statement,

$$P(\text{symptom-5.1}, \text{symptom-5.2} \mid \text{genotype-5}) = \text{probability-5}$$

describing the frequency of occurrence of symptoms like the symptoms of the proband given in (2), for a population that is known to have VCF, the hypothesized genotype of the proband given in (4). We analyze (5) as an instance of evidence-from-retrospective-frequency-in-population (RETRO): given BN variables $X$ and $Y_1, \ldots, Y_n$ where $Y_1 = y_1, \ldots, Y_n = y_n$ are believed to be true of some individual $I$, the conditional probability $P(Y_1 = y_1, \ldots, Y_n = y_n \mid X = x) > 0$ describing frequency in a population is given as evidence that $X = x$ for $I$.

Note that understanding the RETRO argument in (5) requires some understanding of the mathematics of conditional probability statements, because in general $P(Y \mid X)$ is not equal to $P(X \mid Y)$. Following Eddy’s [12] distinction, predictive statements express the probability of a particular diagnosis given knowledge of observables, whereas retrospective statements express the reported frequency of observables in a population whose true diagnosis is known. For example, (5) should not be interpreted as making the predictive statement

$$P(\text{genotype-5} \mid \text{symptom-5.1}, \text{symptom-5.2}) = \text{probability-5}$$

Even medical experts confuse retrospective and predictive statements [12]. Thus, although (5) is sufficient for purposes of medical documentation, it may not be sufficient for a lay audience. For example, a reader may draw the unlicensed conclusion from (5) that individuals with those symptoms often have VCF.
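A small worked example shows how far apart the two kinds of statement can be. The sketch below uses counts from an invented population; the helper `conditional_frequencies` and all numbers are hypothetical and do not describe VCF or any real condition.

```python
def conditional_frequencies(both, genotype_only, symptoms_only):
    """Return (retrospective, predictive) frequencies from population counts.

    both          -- individuals with the genotype and the symptoms
    genotype_only -- individuals with the genotype but without the symptoms
    symptoms_only -- individuals with the symptoms but without the genotype
    """
    retrospective = both / (both + genotype_only)  # P(symptoms | genotype)
    predictive = both / (both + symptoms_only)     # P(genotype | symptoms)
    return retrospective, predictive


# Invented counts: the symptoms are common among carriers of the genotype,
# but most people with the symptoms do not carry it.
retro, predic = conditional_frequencies(both=80, genotype_only=20, symptoms_only=920)
print(f"P(symptoms | genotype) = {retro:.2f}")
print(f"P(genotype | symptoms) = {predic:.2f}")
```

In this invented population, “individuals with the genotype often have the symptoms” holds, while the converse reading that a lay reader might draw does not.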

Figure 2. Part of BN model of letter VCF, representing discussion about proband in (2), (4), (7), (8). Open arrow denotes that genotype-2 subsumes genotype-4. Diamond signifies that symptom-8 is a group whose members are symptom-2.1 and symptom-2.2.


In the description of the second and final cycle of diagnosis in letter VCF, the negative (“normal”) result for all of the tested disorders (result-7) is reported as evidence against the candidate diagnoses of the first cycle (genotype-2, genotype-4).

(7) <test-7.1/proband [PROBAND’S] chromosome study>, including <test-7.2/proband the FISH study> and <test-7.3/proband the telomere study>, showed <result-7/proband a normal result>.

(The relationship between result-7 and these genotypes is shown in Figure 2.) We analyze (7) as two instances (one for each of these candidate diagnoses) of no-predicted-effect-to-no-cause (NE2NC): given a BN variable $X$ and its descendant $Y$ where if $X = x$, then it is expected that $Y = y_{\text{expected}}$, the observed value of $Y = y_{\text{observed}}$, $y_{\text{observed}} \neq y_{\text{expected}}$, is given as evidence for belief that the value of $X$ is not $x$.
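NE2NC is defined qualitatively above. One way to see why an unexpected normal result counts against a candidate diagnosis is to read it as a Bayesian update. The sketch below is such a reading under stated assumptions, not the analysis used in the article: the function `posterior_given_result`, the prior, and both likelihoods are hypothetical.

```python
def posterior_given_result(prior, p_result_given_x, p_result_given_not_x):
    """Bayes' rule for a binary hypothesis X = x after observing one result."""
    joint_x = p_result_given_x * prior
    joint_not_x = p_result_given_not_x * (1.0 - prior)
    return joint_x / (joint_x + joint_not_x)


# Invented numbers: if X = x held, a normal result would be unexpected
# (probability 0.05); otherwise a normal result is very likely (0.99).
prior = 0.30
posterior = posterior_given_result(
    prior, p_result_given_x=0.05, p_result_given_not_x=0.99
)
print(f"prior = {prior:.2f}, posterior after normal result = {posterior:.3f}")
```

Belief in $X = x$ drops sharply but does not reach zero, which is consistent with the caution expressed later in the letter that a genetic cause remains possible after negative tests.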

Next, we analyze (8) as an instance of E2C, because the current diagnosis (genotype-8) must have been made in view of the proband’s symptoms (symptom-8, collectively co-referring with the previously described symptoms). (The relationship of genotype-8 to the symptoms is shown in Figure 2.)

(8) It is important to remember that, even though <test-8/proband these tests> were <result-8/proband negative>, <symptom-8/proband [PROBAND’S] problems> <probability-9 could> still be caused by <genotype-8/proband a genetic alteration>.

4.1.2. Letter HL

As in letter VCF, letter HL describes an iterative diagnostic process. During the first cycle, the referring doctor has hypothesized that a mutation of the GJB2 gene is responsible for the proband’s hearing loss. In the second cycle, the original diagnosis has been confirmed by a positive test result for a GJB2 mutation; in addition, the specific type of mutation of GJB2 has been identified.

Documenting the initial candidate diagnosis of a GJB2 mutation, (3) is analyzed as an instance of E2C where the symptom of hearing loss (symptom-3.1) and risk factor of childhood onset (history-3.2) are provided as support for that diagnosis (genotype-3.2).^e (See Figure 3.)

(3) [PROBAND]’s blood was obtained to determine if <symptom-3.1/proband [HIS/HER] hearing loss> was due to <genotype-3.1/population/proband//-3.2/proband a change> in a gene called GJB2 . . . <probability-3 associated with> <history-3.1/population/proband//-3.2/proband childhood> <symptom-3.2/population/proband hearing loss>.

^e The phrases a change and childhood are each tagged with two BN variables, one describing the proband, and one the proband’s counterpart in the general population.

In addition, (3) is analyzed as an instance of RETRO, because it conveys a retrospective probability statement in support of genotype-3.2:

$$P(\text{symptom-3.2}, \text{history-3.1} \mid \text{genotype-3.1}) = \text{probability-3}$$

In contrast, (5) conveys a predictive probability statement in support of the less specific diagnosis that the cause of the proband’s hearing loss is genetic.

(5) <probability-5 Approximately 60%> of <history-5/population/proband childhood> <symptom-5/population/proband hearing loss> is <genotype-5.1/population/proband genetic>; . . .

The predictive probability statement conveyed in (5) can be written as

$$P(\text{genotype-5.1} \mid \text{symptom-5}, \text{history-5}) = \text{probability-5}$$

We analyze (5) as an instance of evidence-from-predictive-frequency-in-population (PREDIC): given BN variables $X$ and $Y_1, \ldots, Y_n$ where $Y_1 = y_1, \ldots, Y_n = y_n$ are believed to be true of some individual $I$, the conditional probability $P(X = x \mid Y_1 = y_1, \ldots, Y_n = y_n) > 0$ describing frequency in a population is given as evidence that $X = x$ for $I$. In other words, (5) is given to support the candidate diagnosis given in (6) that the proband’s hearing loss is due to a genetic problem (genotype-6.2).

(6) As with <probability-6 most> <history-6.1/population/proband//6.2/proband children> with <genotype-6.1/population/proband//6.2/proband a genetic> <symptom-6.1/population/proband//6.2/proband hearing loss>, [PROBAND] has <symptom-6.3/proband//6.4/population/proband no unusual features> . . .

Figure 3. Part of BN model of letter HL, representing discussion about proband in (3) and (6). Open arrow denotes that genotype-6.2 subsumes genotype-3.2.

We analyze (6) as an instance of (a more general version of) RETRO, providing the retrospective probability statement,

$$P(\text{symptom-6.4} \mid \text{history-6.1}, \text{genotype-6.1}, \text{symptom-6.1}) = \text{probability-6}$$

as further support for a genetic cause of the proband’s hearing loss (genotype-6.2).

Note that arguments in (5) and (6) described so far for a nonspecific genetic diagnosis can be interpreted as support for the diagnosis of a GJB2 mutation as follows. We analyze (6) as an instance of super-cause-to-hypothesized-cause (SUPC2C): given BN variables $\mathit{SUPER}$ and $X$, where $\mathit{SUPER}$ is a property subsuming $X$ such that $X = x$ implies that $\mathit{SUPER} = v_{\text{super}}$, and given BN variable $Y$ that is a descendant of both $\mathit{SUPER}$ and $X$, belief that the value of $\mathit{SUPER}$ is $v_{\text{super}}$, that is, that $\mathit{SUPER} = v_{\text{super}}$ is responsible for the observed value of $Y$, is given as evidence for belief that the value of $X$ is $x$, that is, that $X = x$ is responsible for the observed value of $Y$. (See Figure 3 showing both the BN and semantic relationships involved in this instance of SUPC2C.)
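SUPC2C combines a semantic relation (subsumption) with a BN relation (common descendant), so its structural precondition can be sketched over two small maps. The names `descendants` and `is_supc2c` are hypothetical, and the arcs from both genotype nodes to symptom-3.1 are an assumption made for illustration; the text states only that genotype-6.2 subsumes genotype-3.2.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following arcs (iterative DFS)."""
    seen, stack = set(), list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def is_supc2c(children, subsumes, super_var, x_var, y_var):
    """True if `super_var` subsumes `x_var` and `y_var` descends from both."""
    return (
        x_var in subsumes.get(super_var, ())
        and y_var in descendants(children, super_var)
        and y_var in descendants(children, x_var)
    )


# Semantic relation stated in the text: genotype-6.2 subsumes genotype-3.2.
subsumes = {"genotype-6.2": {"genotype-3.2"}}
# BN arcs below are assumed for illustration.
children = {
    "genotype-6.2": ["symptom-3.1"],
    "genotype-3.2": ["symptom-3.1"],
}

print(is_supc2c(children, subsumes, "genotype-6.2", "genotype-3.2", "symptom-3.1"))
```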

Finally, we analyze (7)

(7) Although <genotype-7.1/population/proband changes> in many different genes <probability-7.1 can> cause <symptom-7.1/population/proband deafness>, <probability-7.2 about 50%> of <history-7/population/proband children> with <symptom-7.2/population/proband severe to profound> <genotype-7.2/population/proband recessively inherited> <symptom-7.3/population/proband nonsyndromic> <genotype-7.2c/population/proband genetic> <symptom-7.2c hearing loss> have <genotype-7.3/population/proband a change> in one particular gene called GJB2.

as an instance of PREDIC, where the statement

$$P(\text{genotype-7.3} \mid \text{genotype-7.2}, \text{symptom-7.2}, \text{symptom-7.3}, \text{history-7}) = \text{probability-7.2}$$

is given as support for belief in a mutation of the proband’s GJB2 gene (genotype-3.2).^f

In the account of the second cycle of diagnosis in letter HL, an instance of E2C is used in (9), in which the positive test result (result-9) is given as support for the original diagnosis of a mutation of GJB2.

^f The pairs of tags (genotype-7.2, genotype-7.2c) and (symptom-7.2, symptom-7.2c) are used to encode discontinuous phrases; for example, the full phrase tagged by the first pair is recessively inherited genetic.


(9) <genotype-9.1/proband The change> <result-9/proband found> in [PROBAND]’s GJB2 gene was a <probability-9 rarer> <genotype-9.2/proband change> and is called [CHANGE].

4.2. Educational Function

4.2.1. Letter VCF

An important educational function of the genetic counseling patient letter is to provide background information for the lay reader about human genetic inheritance and its relationship to health. Whereas the arguments used for medical documentation record the reasoning and beliefs of the health providers during the diagnostic process, argumentation used in the letters for educational purposes has a different goal, namely, to make the conclusions of the medical care providers plausible to the lay audience.

In (10), (11), (12), and (14), the letter provides support for the current diagnosis, that is, that the proband has an unspecified genetic disorder (genotype-14.3,[g] co-referring with genotype-8) by providing a causal explanation for its origin. (See Figure 4, which shows part of a BN model of the proband’s family and another BN model of their counterparts in the general population. Note that the node labeled genotype-14.3 is the same node as, that is, co-refers with, the node labeled genotype-8 in Figure 2.)

[g] In (14), the phrase an altered form is tagged with three BN variables, one describing the mother, one the father, and one the proband.

Figure 4. Part of BN models of letter VCF, representing discussion about proband’s family in (14) on right-hand side and discussion about their counterparts in the population in (10) and (11) on the left-hand side. Open arrows denote that genotype-10.1 and genotype-10.3 subsume genotype-11.2 and genotype-11.3, respectively.


(10) We <probability-10.1 all> carry <genotype-10.1/population/mother//10.2/population/father//10.3/population/proband some altered genes> that <probability-10.2 can> cause <symptom-10.1/population/proband//10.2/population/mother//10.3/population/father health problems> in ourselves or our children.

(11) In <probability-11 some> instances, <history-11/population/proband children> with <symptom-11.1/population/proband birth defects> and <symptom-11.2/population/proband learning problems> inherit <genotype-11.1/population/proband two copies> of an altered gene; <genotype-11.2/population/mother//11.3/population/father one> from each parent.

(12) Because they have <genotype-12/population/proband no normal copy> of that gene, they <probability-12 are> affected by <symptom-12/population/proband the disorder>.

(14) [PROBAND] <probability-14.1 could> have inherited <genotype-14.1/mother//14.2/father//14.3/proband an altered form> of a gene from both you and [HIS/HER] father that <probability-14.2 caused> <symptom-14.1/proband [HIS/HER] birth defects> and <symptom-14.2/proband learning problems>.

We analyze (11), (12), and (14) as an instance of hypothesized-cause-to-hypothesized-effect-with-warrant (HC2HEW): given BN variables $X_1 \ldots X_n$ and their descendant Y, the hypothesis that the values of $X_1 \ldots X_n$ are $x_{1\text{-hypothesis}} \ldots x_{n\text{-hypothesis}}$, respectively, and a description of the underlying causal mechanism (the warrant[h]) relating $X_1 \ldots X_n$ to Y are given to support the belief that Y has a value $y_{hypothesis}$. In this example, the general causal mechanism is autosomal recessive inheritance, described in (11) and (12), and $X_1 \ldots X_n$ are the hypothesized genotypes of the mother and father (genotype-14.1 and genotype-14.2, respectively), and Y is the genotype of the proband (genotype-14.3). (The warrant corresponds to the part of the BN model shown on the left-hand side of Figure 4.)
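The warrant in this example, autosomal recessive inheritance, can be written as a small conditional probability table. The sketch below is an illustration under standard Mendelian assumptions (each parent passes on one of its two gene copies with equal probability); the function names are hypothetical and are not part of the coding scheme.

```python
from fractions import Fraction


def transmit_prob(altered_copies):
    """Probability that a parent with 0, 1, or 2 altered copies passes on an altered copy."""
    if altered_copies not in (0, 1, 2):
        raise ValueError("a parent carries 0, 1, or 2 altered copies")
    return Fraction(altered_copies, 2)


def child_distribution(mother_altered, father_altered):
    """Distribution over the child's number of altered copies, given both parents."""
    pm = transmit_prob(mother_altered)
    pf = transmit_prob(father_altered)
    return {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }


# Both parents hypothesized to be carriers, as in (14).
print(child_distribution(1, 1)[2])  # 1/4
```

Under these assumptions, the hypothesis that both parents carry one altered copy makes the proband's hypothesized genotype (two altered copies) a possible outcome, which is the sense in which the warrant links the hypothesized causes to the hypothesized effect.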

[h] We have adopted the term warrant from Toulmin’s [13] theory of argumentation as summarized in Ref. 14.

Preceding this argument, (10) can be viewed as an attempt to increase the plausibility of the hypothesized genotypes of the parents in (14). (10) conveys the probability statements

$$P(\text{genotype-10.1}) = \text{probability-10.1}$$
$$P(\text{genotype-10.2}) = \text{probability-10.1}$$

We analyze (10) as two instances of evidence-from-superset-predictive-frequency-in-population (SPREDIC): given BN variables SUPER and X, where SUPER is a property subsuming X such that $X = x_{hypothesis}$ implies that $\mathit{SUPER} = v_{super}$, the statement that $P(\mathit{SUPER} = v_{super}) > 0$ for a population is given to support the statement that $P(X = x_{hypothesis}) > 0$ is true for a particular individual.

4.2.2. Letter HL

As in letter VCF, background information in the form of a causal explanation is provided in letter HL. In particular, (9) through (11) provide additional[i] support for the current diagnosis, a certain type of mutation of GJB2 (genotype-9.1), by providing a causal explanation of how this mutation could be responsible for the proband’s hearing loss. (See Figure 5.)

(9) <genotype-9.1/proband The change> <result-9/proband found> in [PROBAND]’s GJB2 gene was a <probability-9 rarer> <genotype-9.2/proband change> and is called [CHANGE].

(10) <genotype-10/population/proband This change> is <probability-10 known> to result in a <biochemistry-10/population/proband shortened or truncated protein>.

(11) <biochemistry-11/population/proband The protein> . . . is important in maintaining <physiology-11/population/proband the chemical equilibrium> of the inner ear.

[i] Recall that (9) was analyzed in Section 4.1 as making an E2C argument for genotype-9.1 from result-9.

Figure 5. Part of BN models of letter HL, representing discussion about proband in (6) and (9) on right-hand side and discussion about proband’s counterpart in population in (10) and (11) on left-hand side.


We analyze this as an instance of observed-effect-to-hypothesized-cause-with-warrant (E2CW): given a BN variable X and its descendant Y, the hypothesis that the value of X is $x_{hypothesis}$, the observed value of Y, and a description of the underlying causal mechanism (the warrant) relating $X = x_{hypothesis}$ to the observed value are given to support the belief that X has the value $x_{hypothesis}$. In this case, X is the proband’s genotype (genotype-9.1), Y is the proband’s symptom of hearing loss (symptom-6.2), and the general causal mechanism is a causal chain from the same type of mutation of GJB2 (genotype-10) to shortened protein (biochemistry-10), to abnormal inner ear equilibrium (physiology-11), to hearing loss. (Although it is not stated, it is implied that the abnormal equilibrium (physiology-11) may result in hearing loss.)
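The role of the warrant in E2CW can be illustrated as a reachability check over the population-level causal chain. This is a sketch only: the node name `hearing-loss` for the implied final effect, the `COUNTERPART` mapping from the proband's variables to their population counterparts, and the function names are assumptions made for illustration.

```python
def reachable(edges, start, goal):
    """Return True if `goal` can be reached from `start` along directed edges."""
    frontier, seen = [start], {start}
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for cause, effect in edges:
            if cause == node and effect not in seen:
                seen.add(effect)
                frontier.append(effect)
    return False


# Population-level causal chain described in (10) and (11).
# The last edge is the implied (unstated) link to hearing loss.
WARRANT = [
    ("genotype-10", "biochemistry-10"),
    ("biochemistry-10", "physiology-11"),
    ("physiology-11", "hearing-loss"),
]

# Hypothetical mapping from the proband's variables to population counterparts.
COUNTERPART = {"genotype-9.1": "genotype-10", "symptom-6.2": "hearing-loss"}


def e2cw_warranted(cause, effect):
    """True if the warrant connects the hypothesized cause to the observed effect."""
    return reachable(WARRANT, COUNTERPART[cause], COUNTERPART[effect])


print(e2cw_warranted("genotype-9.1", "symptom-6.2"))  # True
```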

Next we analyze (20) and (23) as conveying the warrant of another argument for the diagnosis. (See the left-hand side of Figure 6.)

(20) <genotype-20.1/population/mother One allele> for the GJB2 gene is <genotype-20.2/population/proband inherited> from our mother whereas <genotype-20.3/population/father the other allele> is <genotype-20.2c/population/proband inherited> from our father.

(23) Children who inherit <genotype-23.1/population/proband two altered alleles> (<genotype-23.2/population/mother//23.3/population/father one> from each parent) <probability-23 will> have <symptom-23/population/proband hearing loss>.

This argument is similar to the HC2HEW argument in (11), (12), and (14) of letter VCF. However, although it is implied that the proband’s parents are carriers of the mutation of GJB2 believed to be responsible for the proband’s hearing loss, it is not stated anywhere in letter HL. Thus we call this an implied HC2HEW argument, IHC2HEW.

Figure 6. Part of BN models of letter HL, representing discussion about proband and his future offspring on right-hand side and discussion about parent–offspring relationships in population on left-hand side.


An educational function in letter HL (not found in letter VCF) is explanation of inheritance risks. In (33), it is predicted that each of the proband’s future children will inherit one altered allele from the proband, based on the previously supported hypothesis in (9) that both of the proband’s GJB2 alleles are mutated.

(33) Since [HE/SHE] has <genotype-33.1/proband two altered GJB2 alleles>, each of [HIS/HER] children <probability-33 will> inherit <genotype-33.2/offspring one altered allele> from him.

The warrant for this prediction is the causal mechanism described in both (20) and (23), that is, that one allele is inherited from each parent. (See Figure 6. Note that to apply this warrant, the reader must realize that the proband corresponds to one of the parents in the warrant and that the proband’s offspring corresponds to the proband in the warrant.) We analyze (33) as an instance of hypothesized-cause-to-predicted-effect-with-warrant (HC2PEW): given BN variables X and Y, and a description of the underlying causal mechanism (the warrant) relating X to Y, including the conditional probability $p = P(Y = y_{predicted} \mid X = x)$ for a population, the claim that $X = x$ for an individual I is given to support the belief that the probability that $Y = y_{predicted}$ will occur is p for I.
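As a rough illustration of HC2PEW, a population-level conditional probability can be looked up for the value claimed for an individual. The table below assumes standard Mendelian transmission (a parent with k altered alleles passes on an altered allele with probability k/2); the names `TRANSMISSION` and `hc2pew_prediction` are hypothetical.

```python
from fractions import Fraction

# Population-level warrant: P(child receives an altered allele from this parent
# | number of altered alleles the parent has). Mendelian assumption.
TRANSMISSION = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}


def hc2pew_prediction(warrant_cpt, claimed_value):
    """Apply P(Y = y_predicted | X = x) for a population to an individual with X = x."""
    if claimed_value not in warrant_cpt:
        raise KeyError(f"warrant does not cover X = {claimed_value!r}")
    return warrant_cpt[claimed_value]


# Claim from (33): the proband has two altered GJB2 alleles.
print(hc2pew_prediction(TRANSMISSION, 2))  # 1
```

A probability of 1 corresponds to the wording <probability-33 will> in (33).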

Finally, (35) illustrates an instance of PREDIC, in which the low chance of being a carrier (i.e., someone having only one mutated copy of the gene) of a GJB2 mutation in the general Caucasian population is used to support the belief that it is unlikely that a future mate of the proband would be a carrier (assuming that the proband would marry only within that ethnic population).

(35) Since the chance of being <genotype-35.1/population/mate a carrier> in <history-35/population/mate the general Caucasian population> is <probability-35.1 probably around 3 in 100 (3%)>, it is <probability-35.2 unlikely> that [PROBAND] would have children with <genotype-35.2 a carrier>.

The predictive probability statement expressed in (35) can be represented as

$$P(\text{genotype-35.1} \mid \text{history-35}) = \text{probability-35.1}$$

4.3. Supportive Counseling Function

4.3.1. Letter VCF

In addition to the types of causal argument described in Sections 4.1 and 4.2, the letters we have examined provide examples of dialectical argumentation.[j] The writer may use a dialectical argument to address misconceptions of a lay audience, or a reader’s emotional barriers to acceptance of certain information such as the parents’ role in their child’s inheritance of a genetic disorder. (We cannot determine from the corpus whether the writer is addressing typical issues or actual issues that were discussed during the meeting with the client.)

[j] We are using dialectical in a broad sense to reflect the writer’s intention to address actual or potential objections from the audience. The term has been widely used in the history of argumentation theory.[14]

In letter VCF, (9) addresses a possible objection to the claim in (8) that a genetic mutation could be responsible for the proband’s symptoms.

(8) It is important to remember that, even though <test-8/proband these tests> were <result-8/proband negative>, <symptom-8/proband [PROBAND’S] problems> <probability-8 could> still be caused by <genotype-8/proband a genetic alteration>.

(9) There are currently <test-9/population/proband no specific tests> for many of <genotype-9/population/proband the genetic changes> that <probability-9 can> cause <symptom-9.1/population/proband birth defects> and <symptom-9.2/population/proband learning problems>.

We analyze (8) and (9) as an instance of dialectical-attack-causal-backing (DACB): an attack is made on the backing for a causal warrant of a counterargument.[k] A possible counterargument (not explicitly stated in the letter) to the current diagnosis (genotype-8) given in (8) is as follows:

• If (a) the diagnosis of genotype-8 is correct,
• then (b) at least one of the test results would have been positive,
• because (c) the tests that were performed could have detected any genotypes that could cause symptom-8.
• But, (d) the test results were not positive.
• Thus, from if a, then b and d, a is not plausible.

(In this counterargument, if a, then b is the warrant and c is the backing for the warrant.) Using DACB, the writer attacks the above counterargument with (9), which is inconsistent with c.

[k] In Toulmin’s [13] theory of argumentation as summarized in Ref. 14, backing is the term for data supporting a warrant.

Next, (13) addresses a possible objection to the claim in (14) that the proband’s parents are the origin of the proband’s genetic disorder.

(13) The parents are <probability-13 not> <symptom-13.1/population/mother//13.2/population/father affected> because they have <genotype-13.1/population/mother//13.2/population/father a normal copy> in addition to the altered copy of the gene.

(14) [PROBAND] <probability-14.1 could> have inherited <genotype-14.1/mother//14.2/father//14.3/proband an altered form> of a gene from both you and [HIS/HER] father that <probability-14.2 caused> <symptom-14.1/proband [HIS/HER] birth defects> and <symptom-14.2/proband learning problems>.

We analyze (13)–(14) as an instance of dialectical-attack-causal-warrant (DACW): an attack is made on the causal warrant of a counterargument. A possible counterargument to the claim in (14) that the parents’ genotypes are causally linked to the proband’s genetic disorder is as follows:

• If (a) the child having genotype-14.3 is a result of the parents having genotype-14.1 and genotype-14.2,
• then (b) the parents would have symptoms similar to the child’s symptoms (symptom-14.1 and symptom-14.2).
• But [it is assumed that] (c) the parents do not have symptoms similar to the child’s.
• Thus, from if a, then b and c, a is not plausible.

In this counterargument, if a, then b is the warrant. Using DACW, the writer attacks the counterargument with (13), which provides a generalized causal explanation for why the warrant is incorrect.
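The difference between DACB and DACW lies in which component of the counterargument the writer denies. A minimal sketch of that distinction follows; the `Counterargument` class, the `classify_attack` function, and the paraphrased proposition strings are hypothetical and serve only to illustrate the two definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Counterargument:
    """An objection of the form: if a then b; but not b; thus a is not plausible."""
    warrant: str                   # the conditional "if a, then b"
    backing: Optional[str] = None  # data supporting the warrant, when given


def classify_attack(counter, denied):
    """Name the dialectical technique from the component that the writer denies."""
    if counter.backing is not None and denied == counter.backing:
        return "DACB"
    if denied == counter.warrant:
        return "DACW"
    return None


# Paraphrases of the two counterarguments reconstructed for letter VCF.
vcf_tests = Counterargument(
    warrant="if genotype-8 is correct, a test result would be positive",
    backing="the tests could detect any genotype that causes symptom-8",
)
vcf_parents = Counterargument(
    warrant="if the parents passed on the genotype, they would have symptoms",
)
print(classify_attack(vcf_tests, vcf_tests.backing))      # DACB
print(classify_attack(vcf_parents, vcf_parents.warrant))  # DACW
```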

In service of the counseling function, the writers also use affective argumentation techniques to mitigate negative emotional reactions to information presented in the letter. For example, we analyze (10) in letter VCF as an instance of universal-causal-trigger (UCT): a claim that X caused Y, where Y may be viewed negatively, and which might result in the attribution of blame to certain persons associated with X, such as the claim in (14) that the proband’s parents are carriers of the mutation responsible for their child’s problems, is mitigated by another claim that situations like X are universal.

(10) We all carry some altered genes that can cause health problems in ourselves or our children.

4.3.2. Letter HL

Affective argumentation also is used in letter HL, which contains an instance of UCT in (13).

(13) It is believed that we all carry some altered genes that may cause health problems in ourselves or our children.

We analyze (22) and (31) as instances of nonintentional-cause (NIC): a claim that X caused Y, where Y may be viewed negatively, and which might result in the attribution of blame to certain persons associated with X, for example, that the proband’s parents are to be blamed for the proband’s inheriting two recessive alleles for hearing loss, is mitigated by another claim that in general such persons do not intend to cause Y.

(22) In fact, most individuals do not know they carry an altered gene until they have a deaf child.

(31) It is important to remember that we have no control over which genes our children inherit.

We analyze (29) as an instance of not-uncommon-causal-chain (NUCC): a claim that may be valued negatively, such as the claim that the proband’s hearing loss is due to an autosomal recessively inherited mutation, is mitigated by another claim that the same situation is not uncommon.

(29) Many genetic conditions, such as cystic fibrosis and sickle cell anemia, are inherited this way.

4.4. Discussion and Summary

These two relatively short letters (letter VCF and HL are 24 and 40 sentences in length, respectively) illustrate numerous (in this case 15) types of argumentation and numerous instances (9 in letter VCF, 14 in letter HL) of these types.

The strategies used for medical documentation purposes, E2C, NE2NC, RETRO, PREDIC, and SUPC2C, may not be effective for a lay audience. For example, a lay audience unfamiliar with diagnostic reasoning may not understand the probabilistic nature of the conclusions, may confuse retrospective and predictive probabilities, or may not be familiar with the presumed causal model warranting the conclusions. Many of the arguments in the letters used for educational purposes include a description of the causal model warranting a claim, that is, they used the strategies HC2HEW, E2CW, IHC2HEW, and HC2PEW. On the other hand, a medical expert would not find some of the arguments given for educational purposes appropriate for medical purposes, for example, the SPREDIC argument in (10) of letter VCF.

A variety of strategies seem to have been employed for the sake of those readers who, because of their personal relationship to the proband, may be negatively affected by information to be conveyed by the counselor. For example, the instance of IHC2HEW in letter HL might have been used instead of HC2HEW to avoid negative effects on the emotional states of the parents that could result from being directly informed that their genotypes are believed to be responsible for their child’s health problems. Also, the dialectical strategies, DACW and DACB, are used in letter VCF to address potential counterarguments to claims that may be negatively valued by the parents, that is, that their child may have a genetic disorder and that their child may have inherited it from them. Although the definitions of these dialectical strategies do not limit them to use for supportive counseling, it will be interesting to see whether they are used in other letters in the corpus for medical documentation or educational purposes. Finally, several affective techniques were identified, that is, UCT, NIC, and NUCC, that would be appropriate only for supportive counseling purposes.

5. SYSTEM ARCHITECTURE

The architecture of the proposed NLG system for drafting patient letters is shown in Figure 7. A health care professional would provide the system with information about a client’s case, for example, through a forms-style graphical user interface as shown on the far left of the figure. Generation also would draw on general, nonlinguistically represented information about clinical genetics stored in a knowledge base, as shown at the bottom of the figure. Many NLG systems synthesize texts in a target language, such as English, from a nonlinguistic representation of information in a knowledge base.[15] Currently, we are investigating use of a hybrid Bayesian network/qualitative probabilistic network-based approach for implementation of the knowledge base. A qualitative probabilistic network (QPN) is an abstraction of a BN in which conditional probability tables are replaced by qualitative constraints.[16] Use of a QPN would avoid the requirement to acquire the many probability values needed to implement a BN. (For example, only probability values from Mendelian inheritance theory or from selected biomedical studies would be supplied to our system.)
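To make the idea of qualitative constraints concrete, the sketch below implements the sign algebra commonly used in QPNs, in which each influence is labeled "+", "-", "0", or "?" (ambiguous). Signs are multiplied along a causal chain and added across parallel paths. The positive signs assigned to the GJB2 chain from letter HL are assumptions made for illustration, as are the function names.

```python
from functools import reduce


def sign_product(a, b):
    """Combine the signs of two influences in sequence (along a chain)."""
    if a == "0" or b == "0":
        return "0"
    if a == "?" or b == "?":
        return "?"
    return "+" if a == b else "-"


def sign_sum(a, b):
    """Combine the signs of two parallel influences on the same node."""
    if a == "0":
        return b
    if b == "0":
        return a
    if a == "?" or b == "?":
        return "?"
    return a if a == b else "?"


def chain_sign(signs):
    """Net qualitative influence of a causal chain."""
    return reduce(sign_product, signs, "+")


# Assumed signs: mutation -> shortened protein -> abnormal equilibrium -> hearing loss.
print(chain_sign(["+", "+", "+"]))  # +
print(sign_sum("+", "-"))           # ?
```

No numeric conditional probabilities are needed for this kind of inference, which is the motivation given above for considering a QPN.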

The generation of the first draft of a patient letter would begin with a process of developing a text plan using genre-specific discourse moves (see the component in Figure 7 labeled move generator). The plan would provide a nonlinguistic representation of the principal claims of the letter and the high-level organization of the letter as required for this genre. For example, an initial step of the move to document the diagnosis is to describe the candidate diagnosis before genetic testing has been performed. The next stage of text planning, labeled argument generator, would add support from the knowledge base for the claims using argument techniques such as those identified in this article. For example, an E2C argument for the initial candidate diagnosis based upon the patient’s symptoms could be added to the text plan.

The final process, labeled sentence generator in Figure 7, would transform the nonlinguistic representation of the content into text. A document editor (shown at the right of the figure) would present the draft to a genetic counselor. The counselor would be able to review and edit a letter before sending it to a client. A pipelined generation architecture distinguishing text planning from sentence generation has been employed in other NLG systems.[15] The contribution of this article to NLG research is the identification of argument techniques that can be implemented by an argument generation component, that is, independently of the genre-specific move generator. This conceptual separation should make it easier for others to apply argument techniques that we identify in the corpus to other domains.

Figure 7. Planned system to create first draft of genetic counseling patient letters.
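The three-stage pipeline described above might be organized along the following lines. This is a toy sketch, not the planned implementation: the `Claim` class, the shape of the `case` dictionary, and the string templates are all hypothetical, and a real sentence generator would do far more than concatenate strings.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One principal claim in the text plan, with optional supporting arguments."""
    move: str
    content: str
    support: list = field(default_factory=list)


def move_generator(case):
    """Build the genre-specific skeleton of the letter (here, a single move)."""
    return [Claim("document-diagnosis",
                  f"candidate diagnosis: {case['candidate_diagnosis']}")]


def argument_generator(plan, case):
    """Attach support to claims; here, an E2C argument from observed symptoms."""
    for claim in plan:
        if claim.move == "document-diagnosis":
            claim.support.extend(f"E2C: observed {s}" for s in case["symptoms"])
    return plan


def sentence_generator(plan):
    """Turn the nonlinguistic plan into text (placeholder realization)."""
    sentences = []
    for claim in plan:
        text = claim.content
        if claim.support:
            text += " (" + "; ".join(claim.support) + ")"
        sentences.append(text + ".")
    return " ".join(sentences)


def draft_letter(case):
    """Run the pipeline: moves, then arguments, then sentences."""
    return sentence_generator(argument_generator(move_generator(case), case))


case = {"candidate_diagnosis": "GJB2 mutation", "symptoms": ["hearing loss"]}
print(draft_letter(case))
# candidate diagnosis: GJB2 mutation (E2C: observed hearing loss).
```

Keeping `argument_generator` separate from `move_generator` mirrors the conceptual separation noted above: the argument techniques do not depend on the genre-specific moves.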

6. RELATED WORK

Several health-care-related NLG systems have been developed that generate arguments. OPADE generates user-tailored explanations about drug prescriptions to ensure that a prescription is performed correctly by readers with different roles in health care: doctor, nurse, or patient.[17] STOP generates user-tailored letters intended to convince the reader to stop smoking.[18] However, the argument strategies used in these systems are not described by application-independent models. A nutrition counseling system [19] that participates in dialogue with users to convince them to adopt a healthy diet generates arguments based on schemas of the New Rhetoric (NR) theory of argumentation.[20] The NR appears to be most relevant to the strategies that we classified as affective in this article; although the NR defines a schema for causal reasoning, it does not distinguish the types we found in our corpus and defined in terms of BN notions.[l]

There has been other NLG research on argument generation from domain knowledge represented in Bayesian networks. BANTER [21] provides a natural language dialogue interface to a Bayesian reasoning system used for medical training. NAG [22] uses argument strategies such as Premise to Goal, Reasoning by Cases, and Reductio ad absurdum to select propositions from a BN to include in an argument. However, the BN models and argument strategies implemented in these two systems are not based on empirical studies of corpora. Bayesian networks also have been used for user modeling in NLG.[22,23] Carofiglio and de Rosis [23] propose using a BN to represent both logical and emotional reasoning in a unified framework for naturalistic argument generation. Currently, the BN models derived from our corpus analysis are not intended to represent the presumed beliefs or possible affective states of the audience.

As for research on medical reasoning systems related to clinical genetics, GENINFER uses BNs to calculate risk of inheriting a genotype.[24] However, only the genotypes of the pedigree are represented in the BN and no natural language generation is performed. The RAGs project developed a decision support tool for doctors (not based upon a BN model) that assesses the patient’s risk and explains its reasoning by listing reasons for and against the risk assessment.[25]

[l] Unlike the arguments employed in these systems, the arguments to be generated by our system are not intended to convince a reader to change his behavior. As mentioned in Section 2, one of the guidelines for genetic counselors is to provide information in a nondirective manner.

Recently, two coding schemes for performing empirical studies of argumentation have been developed. Teufel and Moens developed a scheme for annotating scientific research articles for information retrieval purposes.[26] However, it does not characterize different argumentation techniques. Grasso [27] presents a dialogue coding scheme that is based on the NR theory of argumentation.

7. CONCLUSIONS

In this article, we presented the results of an in-depth qualitative analysis of argumentation in two genetic counseling patient letters. In addition to techniques appropriate for an expert audience, the analysis revealed a number of techniques intended for a lay audience, including readers who may be negatively affected by information to be conveyed in the letter. We provided definitions of argumentation techniques used in the letters in terms of general BN, semantic network, argumentation, and user model concepts so that the techniques could be reused by NLG systems in other domains.

In addition, this article illustrates a new approach to the empirical study of argumentation in corpora from scientific domains. We developed a BN-based coding scheme for representing the causal and probabilistic biomedical content of text in the corpus. The coding scheme enables abstract BN models of a text to be created. It also enables the representation of probability statements given in the text. We know of no previous empirical work on argumentation in which BN models of text in a corpus were created for the purpose of facilitating analysis and definition of argument techniques in terms of general causal and probabilistic relationships.

Acknowledgments

This material is based on work supported by the National Science Foundation under CAREER Award No. 0132821. An unpublished earlier version of part of this material was presented by the author at Panel on Argument Cultures, International Pragmatics Association 8th International Conference, July 13–18, 2003, Toronto, Canada.

References

1. Reiter E, Sripada S, Robertson R. Acquiring correct knowledge for natural language generation. J Artif Intell Res 2003;18:491–516.

2. Baker DL, Eash T, Schuette JL, Uhlmann WR. Guidelines for writing letters to patients. J Genet Counsel 2002;11:399–418.

3. Baker DL, Schuette JL, Uhlmann WR, editors. A guide to genetic counseling. New York: Wiley-Liss; 1998.

4. Neapolitan RE. Probabilistic reasoning in expert systems. New York: John Wiley & Sons, Inc.; 1990.

5. Wilson GN. Clinical genetics: A short course. New York: Wiley-Liss, Inc.; 2000.

6. Green N. Coding manual. Available at: http://www.uncg.edu/~nlgreen/GenIE-web-page.html.

7. Green N. A Bayesian network coding scheme for annotating biomedical information presented to genetic counseling clients. J Biomed Informat 2005;38:130–144.

8. Korb K, Nicholson AE. Bayesian artificial intelligence. Boca Raton, FL: Chapman & Hall/CRC; 2004.

9. Shwe MA, Middleton B, Heckerman DE, Henrion M, Horvitz EJ, Lehmann HP, Cooper GF. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Method Inform Med 1991;30:241–255.

10. Leibovici L, Fishman M, Schonheyder HC, Riekehr C, Kristensen B, Shraga I, Andreassen S. A causal probabilistic network for optimal treatment of bacterial infections. IEEE Trans Knowl Data Eng 2000;12:517–528.

11. Green N. Towards an empirical model of argumentation in medical genetics. In: Workshop Proc CMNA 2003: IJCAI 2003 Workshop on Computational Models of Natural Argument; 2003. pp 39–44.

12. Eddy D. Probabilistic reasoning in clinical medicine: Problems and opportunities. In: Kahneman D, Slovic P, Tversky A, editors. Judgment under uncertainty: Heuristics and biases. Cambridge, England: Cambridge University Press; 1982. pp 249–267.

13. Toulmin SE. The uses of argument, 9th ed. Cambridge, UK: Cambridge University Press; 1998.

14. van Eemeren FH, Grootendorst R, Henkemans FS. Fundamentals of argumentation theory: A handbook of historical backgrounds and contemporary developments. Mahwah, NJ: Lawrence Erlbaum Associates; 1996.

15. Reiter E, Dale R. Building natural language generation systems. Cambridge, UK: Cambridge University Press; 2000.

16. Druzdzel MJ, Henrion M. Efficient reasoning in qualitative probabilistic networks. Proc 11th Nat Conf on AI (AAAI-93); 1993. pp 548–553.

17. de Rosis F, Grasso F, Berry DC. Refining instructional text generation after evaluation. Artif Intell Med 1999;17:1–36.

18. Reiter E, Robertson R. The architecture of the STOP system. In: Proc Workshop on Reference Architectures for Natural Language Generation; 1999.

19. Grasso F, Cawsey A, Jones R. Dialectical argumentation to solve conflicts in advice giving: A case study in the promotion of healthy nutrition. Int J Hum Comput Stud 2000;53:1077–1115.

20. Perelman C, Olbrechts-Tyteca L. The new rhetoric: A treatise on argumentation. Notre Dame, Indiana: University of Notre Dame Press; 1969.

21. McRoy S, Liu-Perez A, Ali S. Interactive computerized health care education. J Am Med Informat Assoc 1998;5:76–104.

22. Zukerman I, McConachy R, Korb K, Pickett DA. Bayesian reasoning in an abductive mechanism for argument generation and analysis. In: Proc 15th Nat Conf on Artificial Intelligence; 1998. pp 833–838.

23. Carofiglio V, de Rosis F. Combining logical with emotional reasoning in natural argumentation. In: Conati C, Hudlicka E, Lisetti C, editors. Ninth Int Conf on User Modeling. Workshop Proc Assessing and Adapting to User Attitudes and Affect; 2003. pp 9–15.

24. Szolovits P, Pauker SP. Pedigree analysis for genetic counseling. In: Lun KC et al., editors. MEDINFO 92: Proc Seventh Conf on Medical Informatics; 1992. pp 679–683.

25. Emery J, Walton R, Coulson A, Glasspool D, Ziebland S, Fox J. Computer support for recording and interpreting family histories of breast and ovarian cancer in primary care (RAGS): Qualitative evaluation with simulated patients. Brit Med J 1999;319:32–36.

26. Teufel S, Moens M. Summarizing scientific articles: Experiments with relevance and rhetorical status. Comput Ling 2002;28:409–445.

27. Grasso F. Rhetorical coding of health promotion dialogues. In: Dojat M, Keravnou E, Barahona P, editors. Artificial Intelligence in Medicine, AIME 2003. Berlin: Springer-Verlag; 2003. pp 179–188.
