

Aviation, Space, and Environmental Medicine • Vol. 79, No. 11 • November 2008

RESEARCH ARTICLE

KABER DB, ALEXANDER AL, STELZER EM, KIM S-H, KAUFMANN K, HSIANG S. Perceived clutter in advanced cockpit displays: measurement and modeling with experienced pilots. Aviat Space Environ Med 2008; 79:1-12.

Introduction: Synthetic and enhanced vision systems (SVS and EVS) are being introduced into the cockpit to promote safety under workload conditions. Integration of existing iconic imagery with SVS and EVS displays may lead to perceptions of clutter. This research evaluated head-up display (HUD) features, including SVS, EVS, traffic collision avoidance system symbology, flight pathway (TUNNEL) guidance, and different primary flight display symbol sets, on pilot perceptions of clutter. A perceptual decomposition of the construct of clutter was also conducted. Method: During a simulated landing, 4 expert pilots viewed images of 16 HUD configurations. Pilots rated clutter for each image and the utility of pairs of terms for describing clutter. Results: Results revealed all HUD features and two-way interactions to be significant in perceived clutter. Ratings increased with additional features. The presence of EVS, TUNNEL, and an expanded symbol set contributed the most. Regression models were developed to predict the likelihood of clutter ratings based on pilot perceptions of display characteristics. Pairs of terms found to have the greatest use for describing clutter included “redundant/orthogonal,” “monochromatic/colorful,” “salient/not salient,” “safe/unsafe,” and “dense/sparse” (in that order). A factor analysis revealed underlying display qualities explaining approximately 78% of variability in perceived clutter, including global density, feature similarity, feature clarity, and the dynamic nature of displays. These qualities corresponded with the display descriptor terms plus the terms “static/dynamic.” Discussion: The study provided information on the relationship of display features and pilot perceptions of clutter. We identified terminology pilots use to describe clutter and latent display variables that drive perceived clutter.

Keywords: display clutter, psychophysical modeling, head-up displays, and intelligent flight deck technologies.

ADVANCED INFORMATION display technologies are being designed and developed for the automated aircraft flight deck, including synthetic and enhanced vision systems, with the objective of supporting pilot situation awareness and managing cognitive workload under critical flight conditions. Synthetic vision system (SVS) and enhanced vision system (EVS) displays present pilots with out-of-cockpit view information, including terrain models and other environment features, by using a global positioning system based terrain model and forward looking infrared (FLIR) camera, respectively. These systems support displays that allow for flight in instrument meteorological conditions (IMC) with safety and operational flexibility equivalent to clear day or visual meteorological condition flight (1,17,21). The overall goal of these technologies is to reduce the occurrence of low visibility induced accidents, including controlled flight into terrain and loss of aircraft control.

While advantages of the use of SVS and EVS display technologies for supporting pilot situation awareness have been demonstrated (17), the design features of these displays may produce visual clutter when integrated with existing cockpit display symbology (i.e., unclear or indiscernible features). Thus, the clutter produced by these displays may inhibit the processes and constructs they have been designed to support. While these design features might be identified as contributors to visual clutter, the aviation and human factors research communities have yet to establish a commonly accepted definition of clutter (13). Several definitions and contributory factors have been posed in the literature (2); however, none are comprehensive or sufficiently detailed to provide design guidance.

In general, established definitions of clutter can be grouped as those that identify clutter with respect to the content or format of the display, degraded performance of the operator using the display, or irrelevance of information to a task at hand. The vast majority of definitions center on the representation of display content, defining clutter in relation to the size of the display region (5), the size of the target of interest (5), the number of objects (9), the density of objects (16,19,22), and the complexity (6) or similarity of objects (24) within the display. Focus has also been placed on the location of objects relative to foveal vision (9), the salience of objects (14,23,26) or their contrast to the background (2), and their relative motion (3).

While approaches that define clutter in terms of display content provide an objective and quantitative foundation for representing this construct, recent empirical research suggests pilot perceptions of clutter may be strongly influenced by experience, task characteristics, demands, and performance. For example, Ahlstrom (1) noted that clutter can be attributed to the presence of redundant information, where redundancy is shaped by an operator’s knowledge of the task domain. The presence of irrelevant information that does not support task performance can serve as an additional mechanism for generating clutter (9,22). However, task demands and information requirements can shift and change over time, dynamically influencing the relevance of information at specific points within a task and changing the level of perceived clutter within the display (29). Depending upon pilot experience and flight task requirements, pilots may judge certain display features to represent “clutter” when in fact they may be directly relevant to performance. This potential disassociation of perceptual judgments with performance needs to be assessed to clearly identify display content that may drive clutter.

From the Edward P. Fitts Department of Industrial & Systems Engineering, North Carolina State University, Raleigh, NC, and Aptima, Inc., Woburn, MA.

This manuscript was received for review in March 2008. It was accepted for publication in September 2008.

Address reprint requests to: David B. Kaber, Edward P. Fitts Department of Industrial & Systems Engineering, North Carolina State University, 400 Daniels Hall, 111 Lampe Dr., Raleigh, NC 27695-7906; [email protected].

Reprint & Copyright © by the Aerospace Medical Association, Alexandria, VA.

DOI: 10.3357/ASEM.2319.2008

Perceived Clutter in Advanced Cockpit Displays: Measurement and Modeling with Experienced Pilots

David B. Kaber, Amy L. Alexander, Emily M. Stelzer, Sang-Hwan Kim, Karl Kaufmann, and Simon Hsiang

Within the aviation domain, guidelines have been formulated in an attempt to prevent clutter in developed displays [e.g., SAE ARP 5288 (20); AC-25-11]. Understanding the importance of the role of relevance in shaping display clutter, the AC-25-11 guideline suggests that designers determine pilot information requirements for the task at hand and limit the amount of irrelevant information presented at any given time. However, dynamics in pilot goal states during flight would dictate the development of adaptive interfaces to achieve this guideline. Collectively, these findings suggest that a comprehensive definition of clutter must consider display content and representation within the framework of relevant contextual factors. A qualitative definition that aggregates these relevant factors can then provide the foundation for the development of a measurement of clutter in EVS and SVS displays.

Historically, aviation display design features have been evaluated using common human factors measures, such as subjective ratings of pilot preference, workload, or situation awareness (8). The utility of these measures has been constrained by the limited validity of human behavior measures for assessing overall system performance potential. To propose metrics that overcome these limitations, several contemporary studies have developed objective measures of display clutter and mapped these measures to human performance (2,5,19). These approaches have focused on quantitative measures of display density (5,18,19) and feature size (16). Despite the demonstrated utility of these measures for quantitatively defining display clutter, only one or two physical display characteristics are used as a basis for the measures. Given the multidimensional nature of display clutter revealed by the empirical work described previously, these measures do not provide for a comprehensive assessment of display clutter for complex display technologies such as the SVS and EVS.

Thus, the review of empirical work on the qualitative definition and quantitative measure of display clutter reveals two clear limitations for describing clutter within SVS and EVS technologies for the cockpit (1). Definitions of clutter traditionally relate clutter to the physical features of a display, failing to capture the effects of task characteristics and demands on subjective perceptions of clutter. Thus, there exists a need for further understanding of the relationship between the physical features of a display and the dimensions of pilot perceptions of clutter (2). Quantitative measures of clutter define this construct with a limited set of physical display characteristics, providing a simplistic representation of the diverse parameters influencing perceptions of clutter. Thus, there is also a need for multidimensional measures of clutter that can be applied to evaluate new aviation displays designed to reduce workload and support information integration for situation awareness. Such measures may involve a vector mapping of display attributes into a single scalar of “clutter.” Related to these needs, it is important to determine whether pilots consider clutter to be an actual quality of displays that may lead to performance problems or if clutter is simply a “catch-all” term for a set of display attributes. If there exists a set of display features that drives the construct of clutter, then it is necessary to determine those features accounting for the majority of variability in perceived clutter. If we consider “clutter” to be an unintended effect of imagery display that obscures or confuses other information, or that may not be relevant to the task at hand, the challenge is to propose a clear formulation that specifies the display features and the strength of their contributions to this state. In this paper we present methods toward developing a succinct and comprehensive definition and measure of clutter to address the current research limitations.

Our procedure was to expose expert commercial air transport pilots to advanced prototype SVS and EVS head-up displays (HUDs) in a simulated approach and landing procedure conducive to the use of these new display technologies. The prototypes were developed by NASA Langley Research Center and the quality of the display features was comparable to current “glass cockpit” technology. Specifically, we presented display images along a standard instrument landing system (ILS) approach path under IMC (nighttime and inclement weather). We used this scenario to identify a set of display descriptor terms that pilots associate with/use to characterize HUD clutter from an exhaustive list of factors previously considered in the ontology of clutter. We related pilot perceptions of the perceived utility of various terms for describing SVS and EVS HUD images to overall ratings of display clutter as a basis for determining pilot psychophysical transfer functions. Functions were developed to predict the likelihood of perceived clutter given the presence of certain display characteristics. Determination of transfer functions involved developing a statistically based voting mechanism to identify display characteristics with the greatest importance to pilots in perceptions of clutter.

In general, we expected that as the number of physical features in HUDs increased, so would pilot perceptions of clutter (Hypothesis 1). Given differences in the graphical (or noniconic) information presented by the SVS and EVS displays, we also expected different effects of the terrain model vs. FLIR imagery on pilot perceptions of clutter. The visual density of the FLIR imagery was expected to yield higher perceptions of clutter than the terrain model (Hypothesis 2). Regarding the need to determine the underlying dimensionality of perceived clutter, we expected that a set of latent display variables or qualifiers might drive clutter (Hypothesis 3) and that pilots would use a common language to describe these characteristics (Hypothesis 4). We expected the latent display characteristics to be related to those identified in the previous research, including density, similarity of features, salience, and relative motion. We also expected some display descriptor terms to have greater relevance than others for pilots in evaluating advanced HUDs (Hypothesis 5). This was to be revealed through the psychophysical function determination.

METHODS

Subjects

Four expert test pilots (three men, one woman), with prior experience flying commercial transport vehicles with advanced HUDs, participated in the study. The subject group ranged in age from 27 to 59 yr (M = 47.5 yr). While all subjects were expert aviators, their total flight hours ranged considerably from 1300 h to 9000 h (M = 5325 h). Two of the four subjects had flown aircraft equipped with SVS displays and one had flight experience with EVS capabilities. We expected this subject sample to provide insight into the roles that experience (with technology) and knowledge-driven processing may play in the perception of clutter and our formalization of such mechanisms within a quantitative model.

Independent Variables

The independent variables manipulated in the experiment were HUD features, including SVS, EVS, traffic collision avoidance system (TCAS), tunnel-in-the-sky (TUNNEL), and HUD symbology. Synthetic vision was manipulated by turning a wire-frame representation of a terrain model on or off. There was no use of shading or texturing of polygons in the terrain model. This mode of presentation was used primarily to distinguish the SVS information from the EVS imagery as part of the HUD.

Enhanced vision was manipulated by turning on or off imagery collected by a thermal imaging system on actual approaches previously flown into Reno/Tahoe International Airport. When the EVS imagery was turned “on,” sensor returns for terrain were revealed as a texture map in the HUD. TCAS was manipulated by turning on or off the presentation of traffic icons in the HUD. When TCAS was turned “on,” aircraft icon displays included range, bearing, and altitude of an “intruder” relative to the subject’s ship. Up to two intruders appeared on the HUD at any time. TUNNEL was manipulated by turning on or off a “highway-in-the-sky” or pathway guidance display. When TUNNEL was turned “on,” four sets of crow’s feet were presented to outline boxes in the HUD images with a maximum physical space width of 600 ft (182.88 m) and maximum and minimum heights of 350 ft (106.68 m) and 50 ft (15.24 m), respectively.

HUD symbology was manipulated by turning primary instrumentation on or off. When primary instrumentation was turned “on” (Primary mode), the following display features were presented, in addition to the standard IMC mode symbology: pitch ladder, ground speed, selected speed, airspeed tape, speed bug, altitude tape, selected altitude, barometer setting, and runway outline (depending on phase of flight). Fig. 1 presents a sample image with all five HUD information elements turned “on.”

Scenario

In order to assess the impact of various advanced HUD information elements on pilot perceptions of display clutter and the utility of select semantic pairs of terms for characterizing clutter within the context of relevant phases of flight, we presented HUD images at specific points in the flight scenario. We selected the standard ILS approach to Runway 16R at Reno/Tahoe International Airport. In general, the scenario provided a context in which to present display images to pilots that would be seen as they flew an aircraft on approach to the airport. The approach involved a low ceiling and reduced visibility conditions and guided pilots to seek information from the HUD images pertinent to specific flight tasks. The scenario script also provided pilots with information on aircraft state that was not available from the HUD. The sequence of events and information needs of pilots performing the approach, which were captured as critical elements in the scenario, were based on the cognitive task analysis of ILS approaches, using both conventional and SVS displays, by Keller and colleagues (10). Details of approach speeds and aircraft configuration changes were also drawn from a major airline’s B-757 flight manual provided by NASA. Also included in the scenario were appropriate air traffic control communications between controllers and the flight crew. These consisted of an automatic terminal information system (ATIS) broadcast at the beginning of the scenario, approach and landing clearances, and a frequency change from the approach control frequency to tower frequency. Finally, the scenario for each trial included slight variations in wind direction and airfield advisories in the ATIS information to ensure the pilots paid attention to the scenario content. Wind direction varied about the runway heading, while the headings portrayed in the HUD images did not, which created the impression of a possibility of wind shear during the approach (and pilots proved sensitive to this).

Dependent Measures

In order to identify the perceptual qualities of displays that pilots might consider in internally defining clutter, or for characterizing clutter in flight deck displays, we conducted a review of literature on the development of measures of clutter. This survey revealed physical display factors (e.g., target size, display complexity, target and background contrast; 2,5,29) and perceptual qualities (e.g., amount of task-irrelevant information, attentional distribution to tasks; 1,23) which were considered to play a role in clutter (through analytical or empirical research). These factors in clutter were used as bases for identifying semantic pairs of terms that might represent pilot concepts of clutter in making subjective assessments. We generated an initial list of semantic pairs based on the literature review and then conducted a semantic analysis of these terms in order to identify pairs with the greatest power and least overlap with other terms for characterizing clutter in aviation displays. The steps included developing a database of concept (clutter)-relevant terminology, specifically 100 words from Webster’s dictionary. We then identified which terms were synonymous with the clutter descriptor terms pulled from the literature and counted the number of synonymous concept terms for each clutter descriptor term. This provided a measure of conceptual breadth of the descriptors. We also counted the number of times descriptor terms loaded on a concept term that was related to other descriptor terms. This allowed us to establish the degree of conceptual overlap or redundancy of terms. From this analysis, we found the following pairs of terms to have no or low conceptual redundancy (i.e., loading on a concept term held in common by only one to three other pairs of descriptors):

1. Sparse / Dense
2. Monotonous / Variable
3. Indiscernible / Discernible
4. Not Salient / Salient
5. Dull / Sharp
6. Monochromatic / Colorful
7. Low Workload / High Workload
8. Static / Dynamic
9. Unsafe / Safe
10. Redundant / Orthogonal
11. Low Attention / High Attention
12. Empty / Crowded
13. Similar / Dissimilar (among display features)
14. Ungrouped / Grouped
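The breadth and overlap counts described above can be illustrated with a minimal sketch. The synonym sets below are hypothetical toy data, not the 100-word database used in the study, and the overlap rule (a concept term counts as shared when it is also linked to at least one other descriptor) is our assumed reading of the procedure rather than the exact rule that was applied.

```python
# Hypothetical toy data: each descriptor maps to the concept terms judged
# synonymous with it. These sets are illustrative only, not the study data.
SYNONYMS = {
    "dense": {"crowded", "packed", "thick"},
    "crowded": {"packed", "full"},
    "salient": {"prominent", "noticeable"},
    "colorful": {"vivid"},
}


def breadth(synonyms):
    """Conceptual breadth: number of synonymous concept terms per descriptor."""
    return {descriptor: len(terms) for descriptor, terms in synonyms.items()}


def overlap(synonyms):
    """Conceptual overlap (assumed rule): number of a descriptor's concept
    terms that are also linked to at least one other descriptor."""
    result = {}
    for descriptor, terms in synonyms.items():
        others = set().union(
            *(t for name, t in synonyms.items() if name != descriptor)
        )
        result[descriptor] = len(terms & others)
    return result


breadth_counts = breadth(SYNONYMS)
overlap_counts = overlap(SYNONYMS)
print(breadth_counts)
print(overlap_counts)
```

In this toy example only "dense" and "crowded" share a concept term ("packed"), so they are the only descriptors with nonzero overlap; descriptors with low overlap would be the ones retained.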

We speculated that certain display qualities would be readily perceptible by pilots and might drive their reported sense of clutter. Consequently, the 14 semantic pairs of descriptor terms were included in survey forms for pilot evaluation of how to characterize the occurrence of clutter in HUDs, incorporating the information elements identified above.

Following the presentation of each display image in a scenario, subjects were asked to rate the utility of each semantic pair for describing clutter on a 20-point scale from “low” to “high.” We did not instruct pilots on which term in a pair might be associated with “low” or “high” perceived clutter, but rather we elicited the perceived relevance of the display attribute to overall clutter. Subjects were also asked to provide a single rating of overall clutter on a 20-point scale from “low clutter” to “high clutter,” indicating the overall amount of perceived clutter associated with each display configuration.

Procedure

The research protocol that we followed was reviewed and approved by the North Carolina State University Institutional Review Board for the protection of human subjects. Pilots were initially provided with a briefing and introduction to the display technology under study. This was followed by a set of trials depicting the instrument approach into the airport and presenting the HUD image. A laptop computer and LCD projector were used to display the images on a projection screen; therefore, the visual properties of the image (luminance, contrast, etc.) were not an exact replication of the actual HUD. Two pilots were tested on each of 2 d, one in the morning and one in the afternoon, with each subject taking 4 h to complete the experiment.

Fig. 1. Example image with all five HUD information elements active or “on.”

Each pilot was briefed on the aims of the study and the pairs of display descriptor terms to be evaluated, and informed consent was obtained. Next, pilots completed a survey of overall flight, HUD, and SVS/EVS experience. They were then shown an example HUD image including all display elements and the features were described.

Each experimental trial included a series of 16 static HUD images which were obtained for different points within the described scenario. Each image included a specific combination of display elements. Prior research has suggested the context dependence of perceived clutter (9) and that perceptions may be driven by the amount of task relevant or irrelevant information in a display. Although each test trial involved the same approach scenario and phases of flight, the flight tasks that pilots mentally considered varied from phase to phase. Therefore, it was likely that the relevance of specific display elements also varied across phases. We recognized the need to balance the appearance of the various display elements among the 16 images in each trial and at the same time to make realistic use of the elements. For this reason, we randomized the order of presentation of HUD conditions within and across trials with some constraints. For example, the EVS (FLIR sensor) delivers a view of the actual out-of-cockpit environment, but does not provide a useful picture beyond a certain distance from the airport or above a certain altitude. As a result, HUD images including EVS were more concentrated in the final stages of the approach. Similarly, the SVS terrain model is most useful in stages of an approach where there are substantial terrain features, but it has limited use near the runway and level terrain. Consequently, HUD images including SVS were more concentrated in the beginning stages of the approach. General altitude guidelines for use of these features were identified by a pilot subject matter expert working on the project and used as a basis for identifying HUD feature combinations for each of the 16 images in each trial.

Each subject was presented with the full set of display feature combinations specified by a fractional factorial experimental design ($2^{5-1}$ Resolution V design) in the course of each of the four planned trials. A fractional design was used in this study due to the limited sample size and experimental resources. The design allowed for investigation of all main effects and two-way interactions of the various HUD features (among 32 display configurations) on perceptions of clutter through a total of 16 experimental runs (15, p. 627). At the beginning of each trial, a pilot was provided with a copy of the scenario script and a copy of the instrument approach plate for the ILS runway 16R at Reno/Tahoe International Airport. Two researchers read information to the pilot prior to each image presentation: one acted as air traffic control (ATC), giving appropriate weather information and clearances, and the other (a former U.S. Air Force pilot) played a confederate first officer (FO). The FO responded to the simulated ATC communications and provided the subject with flight condition and aircraft configuration information needed for the approach.
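The half-fraction described above can be enumerated in a few lines. The following is a minimal sketch, not the authors' procedure: it assumes the conventional generator E = ABCD (defining relation I = ABCDE) with factors coded -1 (off) and +1 (on), and it assigns the five HUD factors to the letters in an arbitrary order. The article states only that a $2^{5-1}$ Resolution V design was used; the generator and the names `half_fraction` and `active_feature_counts` are assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Factor names follow the text; their order (A..E) is an arbitrary assumption.
FACTORS = ("SVS", "EVS", "TCAS", "TUNNEL", "PRIMARY")


def half_fraction():
    """Enumerate a 2^(5-1) design using the assumed generator E = A*B*C*D.

    Levels are coded -1 (feature off) and +1 (feature on).
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        e = a * b * c * d  # fifth factor is aliased with the four-way interaction
        runs.append(dict(zip(FACTORS, (a, b, c, d, e))))
    return runs


def active_feature_counts(runs):
    """Tally runs by the number of features switched on."""
    return Counter(sum(1 for level in run.values() if level == 1) for run in runs)


if __name__ == "__main__":
    design = half_fraction()
    print(len(design), "runs")
    print(sorted(active_feature_counts(design).items()))
```

Under this assumed generator the product of all five coded levels is fixed at +1, so every run has an odd number of features switched on: 5 runs with one feature, 10 with three, and 1 with five.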

Once a HUD image appeared, the subject was given whatever time they needed to interpret the display in terms of the flight information provided by the FO. The subject then rated the utility of each of the pairs of descriptor terms for describing clutter, as well as the overall clutter for that particular image. (It is important to note that pilots were not instructed to rate the actual perceived levels of, for example, “redundancy” or information “similarity” of the display content, but the usefulness of these terms for describing the HUD.) After the pilot completed the ratings, the next image was presented, and this process was repeated until all 16 images had been viewed. After the last image of an approach was viewed (the HUD showing the runway just prior to touchdown), the next trial depicting the same approach began. The first HUD image presented aircraft and terrain information for the initial approach fix position. To promote pilot engagement in the flight scenario, subjects were instructed that some displays might show variations from the desired flight path on the approach and that they should detect and identify these. Each trial included one image depicting a clearly apparent deviation from the localizer or glideslope, all of which were noticed by the subjects.

RESULTS

Pilot ratings of clutter for the 16 images ranged from 0 (“low”) to 20 (“high”) and were submitted to an ANOVA with the HUD features as independent variables. The fractional factorial design of the experiment dictated that the statistical model include only main effects and two-way interactions. We also analyzed the potential for a trial order effect and individual differences in the ratings by including trial and subject terms in the ANOVA model. Beyond this, residual diagnostics were conducted for each statistical model. A normal probability plot of the overall clutter ratings revealed a linear trend, and a Shapiro-Wilk test was not significant (P = 0.94), indicating no evidence of departure from a normal distribution. Plots of model residuals against the levels of the fixed effects also did not reveal any violation of homogeneity of variance.

The ANOVA revealed no systematic variation in responses across trials (learning, fatigue effects, etc.); however, the subject term was significant [F(3,136) = 4.795, P = 0.003] and interacted with all display factors except SVS (P > 0.20) in influencing ratings of clutter. Because of the individual differences among the pilots in the small sample, the following results on HUD information features may be limited in generalizability to a larger pilot population. The ANOVA also revealed all display features and two-way interactions to be significant in overall perceived clutter, except those involving SVS, EVS, and TUNNEL crossed with primary mode symbology as well as EVS by TUNNEL. In general, the ANOVA model was significant [F(15,144) = 22.74, P < 0.0001] with an R² of 0.7031.

In fractional factorial analysis, it is common practice to reduce a model to only the significant terms and reanalyze the fit with the response. A regression analysis on a reduced model of overall clutter was conducted with all 5 main effects and 6 of the 10 possible two-way interactions (included in the original ANOVA). This model also proved to be highly significant for explaining clutter ratings [F(11,148) = 30.61, P < 0.0001] and yielded an R² of 0.69; t-tests on the parameter estimates revealed all main effects, including the symbology set (Primary), path guidance (TUNNEL), TCAS, SVS, and EVS, as well as all two-way interactions involving TCAS or the symbology set, to be significant (P < 0.025) in predicting overall perceived clutter.

The resulting model included 11 terms, allowing for estimation of clutter rating differences among expert pilots when specific HUD features are toggled “on” or “off” (the terms in the model have been organized according to positive and negative associations with overall perceived clutter and based on the strength of each parameter in the model).

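$$
\begin{aligned}
\text{Overall clutter} = {} & 12.15 + 2.20\,(\text{EVS}) + 1.22\,(\text{PRIMARY}) + 1.112\,(\text{TCAS} \times \text{PRIMARY}) + 1.07\,(\text{TUNNEL}) \\
& + 0.77\,(\text{SVS} \times \text{TUNNEL}) + 0.466\,(\text{SVS}) + 0.45\,(\text{SVS} \times \text{TCAS}) \\
& - 1.04\,(\text{TCAS}) - 0.85\,(\text{SVS} \times \text{EVS}) - 0.60\,(\text{EVS} \times \text{TCAS}) - 0.50\,(\text{TCAS} \times \text{TUNNEL})
\end{aligned}
$$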
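To make the model concrete, the sketch below evaluates the fitted equation for a given HUD configuration. The coefficients are copied as printed in the equation above; everything else is an assumption for illustration. In particular, the article does not state how the on/off factors were coded, so the caller supplies the numeric levels (for example -1/+1, a common coding in two-level factorial analysis, or 0/1), and the names `COEFFICIENTS` and `predict_clutter` are hypothetical.

```python
# Coefficients as printed in the reduced regression model of overall clutter.
INTERCEPT = 12.15
COEFFICIENTS = {
    ("EVS",): 2.20,
    ("PRIMARY",): 1.22,
    ("TCAS", "PRIMARY"): 1.112,
    ("TUNNEL",): 1.07,
    ("SVS", "TUNNEL"): 0.77,
    ("SVS",): 0.466,
    ("SVS", "TCAS"): 0.45,
    ("TCAS",): -1.04,
    ("SVS", "EVS"): -0.85,
    ("EVS", "TCAS"): -0.60,
    ("TCAS", "TUNNEL"): -0.50,
}


def predict_clutter(levels):
    """Return the predicted overall clutter rating for one configuration.

    `levels` maps each factor name to its coded level. The coding scheme
    (-1/+1 or 0/1) is NOT given in the source and must be chosen by the caller.
    """
    total = INTERCEPT
    for factors, coefficient in COEFFICIENTS.items():
        term = coefficient
        for name in factors:
            term *= levels[name]  # interaction terms multiply the coded levels
        total += term
    return total


if __name__ == "__main__":
    all_on = {"SVS": 1, "EVS": 1, "TCAS": 1, "TUNNEL": 1, "PRIMARY": 1}
    print(round(predict_clutter(all_on), 3))
```

With every feature at level 1 the two coding schemes coincide, and the sketch simply returns the intercept plus the sum of all eleven coefficients. Predictions for other configurations depend on the coding the caller assumes.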

The signs for the majority of parameter estimates for the HUD feature manipulations were in line with our expectations. For example, clutter ratings were predicted to increase with the addition of SVS and EVS features, possibly due to increased visual density of the display with overlapping iconic and non-iconic information. The same was true for the use of the TUNNEL and Primary mode symbology (vs. IMC mode). Surprisingly, the regression analysis revealed the presence of TCAS symbology to reduce ratings of clutter, as a main effect, and as interactions with EVS and TUNNEL. It is possible that this was due to the pilot’s attention being drawn to less visually dense areas of the display so that they perceived the TCAS icons. In addition, the combination of SVS and EVS in the HUD served to reduce the perception of clutter. It is possible that the integration of the wireframe terrain model with the thermal imagery prevented pilot confusion of the SVS with other display graphics (e.g., the tunnel or runway outline) and made terrain features revealed by the FLIR clearer for pilots.

The sizes of the parameter coefficients in the regression model also provide an indication of the importance of each HUD information feature to the perception of clutter. The use of the EVS, Primary mode symbology, and the TUNNEL appeared to make the greatest contributions. It can also be inferred that the greater the coefficient for a particular feature, the greater the consensus among pilots as to the importance of the feature in clutter.

In general, clutter ratings were found to increase as the number of display features increased. The mean ratings for 1, 3, and 5 display features were 9.79 (N = 50), 12.94 (N = 100), and 16 (N = 10), respectively. An additional correlation analysis indicated that as the number of major active HUD information elements changed from one to three to five, overall perceived clutter significantly increased (r = 0.42, P < 0.0001). Because of the fractional factorial design, all images tested in the experimental trials included only odd numbers of active features.

To determine the psychophysical transfer function that pilots use to assess display clutter, pilots rated the applicability of the various pairs of descriptor terms for describing the perceived display clutter within a given display configuration. These ratings assumed that all pairs were well calibrated psychophysically; that is, if a pilot considered a denser display to represent a worse clutter situation, such a judgment had to be consistent across trials and monotonic (i.e., perceived clutter always increased as density increased, but possibly at different rates depending upon HUD features). To specify transfer functions relating the perception of specific display quantities to the likelihood of perceived clutter ratings, we used maximum likelihood estimation (25), where the number of consistent judgments on clutter given a physical signal intensity $a_i$ is the sum of random samples (clutter judgments) from a Bernoulli process with a probability of success $p_i$:

$$
p_i = \Psi(a_i; \alpha, \beta, \gamma, \lambda) = \gamma + (1 - \gamma - \lambda)\, F(a_i; \alpha, \beta)
$$

where the shape of the function $\Psi(\cdot): \mathbb{R} \to \mathbb{R}$ is determined by the parameters {α, β, γ, λ}. F is typically a sigmoid function, such as the Weibull, logistic, cumulative Gaussian, or Gumbel distribution, with two parameters {α, β}. An ideal C(z) should be used in judgments on all the possible HUD configurations, and the usage of the given semantic pair (e.g., sparse/dense) should be represented symmetrically, with roughly 50% positive (e.g., dense) and 50% negative (e.g., sparse) across trials. Many of the cumulative density functions (CDF) on overall clutter ratings for test trials (16 display images) revealed asymmetry (Fig. 2A); that is, there were deviations of the trend of the CDF (discrete step pattern) from a linear trend. This indicated that certain clutter ratings were more likely than others for the HUD displays and that certain display characteristics (e.g., density) were systematically and consistently considered in clutter judgments. When the CDF is symmetric and centered, normal and logistic distributions are the ideal models of psychophysical transfer function (and they can be directly related to the principal components discussed in the next section). Since our final goal was to compare the ranking of each of the semantic pairs for describing clutter when the transfer functions for all pilots were scaled proportionally, any scoring system needed to be robust to individual differences or underlying clutter sensitivities.
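As a rough illustration of the transfer function defined above, the sketch below evaluates Ψ with a Weibull form for F and computes the Bernoulli log-likelihood that maximum likelihood estimation would maximize. It is a sketch under stated assumptions rather than the authors' implementation: the Weibull parameterization (α as scale, β as shape), the reading of γ and λ as offsets of the lower and upper asymptotes, the function names, and the numeric values are all choices made here for illustration.

```python
import math


def weibull_cdf(x, alpha, beta):
    """Weibull CDF with scale alpha and shape beta (assumed parameterization)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / alpha) ** beta))


def psi(x, alpha, beta, gamma, lam):
    """Transfer function: gamma + (1 - gamma - lam) * F(x; alpha, beta)."""
    return gamma + (1.0 - gamma - lam) * weibull_cdf(x, alpha, beta)


def log_likelihood(intensities, successes, trials, alpha, beta, gamma, lam):
    """Bernoulli (binomial) log-likelihood, up to an additive constant.

    successes[i] of trials[i] judgments were consistent at intensities[i].
    """
    total = 0.0
    for x, k, n in zip(intensities, successes, trials):
        p = psi(x, alpha, beta, gamma, lam)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    print(psi(0.0, alpha=2.0, beta=3.0, gamma=0.05, lam=0.02))
    print(log_likelihood([1.0, 2.0, 3.0], [2, 6, 9], [10, 10, 10],
                         alpha=2.0, beta=3.0, gamma=0.05, lam=0.02))
```

In practice the four parameters would be chosen by a numerical optimizer that maximizes `log_likelihood`; that step is omitted here.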


To accommodate asymmetry in clutter judgments, several distributions were used in comparisons with the sigmoid function assumed in the Bernoulli process: exponential, Weibull, log-normal, gamma, log-logistic, and extreme value distributions. A log-log transformation was used to linearize the CDF of pilot clutter ratings in all trials for distribution fitting (Fig. 2B).

Based on the log-likelihood ratios for each possible distribution fitting of F, a gamma distribution model proved to be the best for 90% of all test trials. Intuitively, the gamma transfer function family can account for clutter judgments developed over time or based on the sudden detection of display characteristics. Subsequently, the contribution of each pair of display descriptor terms to the likelihood of the modeled clutter judgment was determined based on Chi-square values (Table I).

The resulting general model of the log-likelihood of clutter ratings included a parameter estimate for those pairs of display descriptor terms that proved significant on the basis of the Chi-square tests. As with the above regression model, the terms in the log-likelihood model have been organized according to positive and negative associations with the occurrence of clutter and based on the strength of each parameter in the model.

$$
\begin{aligned}
\text{log-rating} = {} & 19.1608 + 0.2844\,(\text{Not Salient/Salient}) + 0.1212\,(\text{Indiscernible/Discernible}) \\
& - 0.4254\,(\text{Empty/Crowded}) - 0.3347\,(\text{Monochromatic/Colorful}) - 0.3030\,(\text{Low Attention/High Attention}) \\
& - 0.1874\,(\text{Similar/Dissimilar}) - 0.1789\,(\text{Static/Dynamic}) - 0.1508\,(\text{Sparse/Dense}) \\
& - 0.1369\,(\text{Low Workload/High Workload}) - 0.1317\,(\text{Ungrouped/Grouped})
\end{aligned}
$$

Due to individual differences in perception of clutter, the 10 coefficients varied among pilots. The challenge here was to identify a parsimonious set of semantic pairs based on the majority preferences (e.g., three out of five pilots) or by using a voting mechanism (i.e., a rank aggregation procedure). There is a long history of research on such choice or voting procedures for nonparametric analysis. Unfortunately, there are few “consensus methods” that prescribe how rankings of display alternatives or qualities can be combined across subjects and trials (4). To establish a common frame of reference for ratings across pilots, the median voter theorem suggests that when the clutter judgments for each pilot have a single peak (i.e., a modal/individual gamma distribution), we can identify a “median clutter vote” across pilots. If the function is multimodal, a median vote cannot be identified. Relative to the median vote, half of the pilot judgments will be considered “high clutter” and the other half will be considered “low clutter.” Under the least sensitive condition, the majority of pilots will prefer the median vote; therefore, qualities of the display cannot be clearly associated with high or low clutter.

TABLE I. GAMMA MODEL PARAMETER ESTIMATES AND CHI-SQUARE TESTS.

| Parameter | DF | Estimate | Standard Error | 95% Confidence Limit (Lower) | 95% Confidence Limit (Upper) | Chi-Square |
|---|---|---|---|---|---|---|
| Intercept | 1 | 19.1608 | 0.0092 | 19.1428 | 19.1788 | 4,355,277 |
| SP | 1 | −0.1508 | 0.0005 | −0.1518 | −0.1497 | 78,445.3 |
| IN | 1 | 0.1212 | 0.0001 | 0.121 | 0.1214 | 1,542,569 |
| NO | 1 | 0.2844 | 0.0004 | 0.2836 | 0.2852 | 523,396 |
| MOC | 1 | −0.3347 | 0.0004 | −0.3355 | −0.3339 | 660,095 |
| LOW | 1 | −0.1369 | 0.0004 | −0.1377 | −0.1361 | 109,582 |
| ST | 1 | −0.1789 | 0.0002 | −0.1792 | −0.1786 | 1,228,467 |
| LOA | 1 | −0.303 | 0.0002 | −0.3033 | −0.3026 | 3,735,572 |
| EM | 1 | −0.4254 | 0.0004 | −0.4262 | −0.4247 | 1,176,144 |
| DI | 1 | −0.1874 | 0.0002 | −0.1877 | −0.187 | 1,227,554 |
| UNG | 1 | −0.1317 | 0.0003 | −0.1322 | −0.1311 | 194,892 |
| Scale | 1 | 0.0004 | 0.0041 | 0 | 2,705,191 | – |
| Shape | 1 | 367.972 | 4271.86 | −8004.7 | 8740.66 | – |

Note: All listed parameters were significant with P < 0.0001. SP = sparse/dense; IN = indiscernible/discernible; NO = not salient/salient; MOC = monochromatic/colorful; LOW = low workload/high workload; ST = static/dynamic; LOA = low attention/high attention; EM = empty/crowded; DI = similar/dissimilar; UNG = ungrouped/grouped. The table excludes those model parameters that were not found to be significant based on Chi-square tests.

Fig. 2. A) An example of asymmetric usage of a pair of descriptor terms. B) A log-negative log transformation of the cumulative density function (CDF) in A.

One possible solution is to use the Kemeny-Snell median (11). The idea is to create an overall ranking of the semantic pairs that has as few disagreements as possible among the five pilots. Given rankings $r_1, r_2, \ldots, r_5$, the method involves finding a ranking $x$ so that $d(r_1, x) + d(r_2, x) + \cdots + d(r_5, x)$ is minimized, where $d(r_p, x)$ is the Kendall tau distance between the ranking of pilot $p$ and the candidate ranking $x$. To find the Kemeny-Snell median (7), the parameter coefficients for all pairs of terms in the log-likelihood regression models of perceived clutter for each pilot were normalized for each trial and pilot by conversion to squared Z-scores. This was done to further reduce the effect of individual differences in internal scaling of rankings of terms. The mean squared Z-score for each pair of terms for each pilot across trials was then calculated. The scores ranged from 0.0 to 5.83 and represented the average importance of any pair of terms to a particular pilot. The median squared Z-score for each pair of terms was then calculated across all pilots to determine the perceptual display qualities the group considered most relevant to perceived clutter. The median scores ranged from 0.02 to 0.75. Table II shows the correlations among the mean squared Z-scores for each pilot along with the median scores across pilots. The table provides a sense of the degree of consistency in pilot perceptions regarding the terms. Positive correlations ranged from r = 0.26 to 0.91, with all being significant at the P < 0.05 level. Only the scores for Pilot 2 were negatively correlated with those of Pilots 3 and 4, with the latter being nonsignificant (P > 0.05). The “top 5” terms (i.e., the most highly ranked), listed in order of frequency of use for describing clutter and strength of prediction of the likelihood of perceived clutter, included: 1) “redundant/orthogonal,” 2) “monochromatic/colorful,” 3) “not salient/salient,” 4) “unsafe/safe,” and 5) “sparse/dense.” These pairs differ slightly from those identified based on the above modeling of the log-likelihood of clutter ratings.
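The rank aggregation idea described above can be sketched with a brute-force search. This is an illustrative sketch, not the procedure used in the study (which worked from squared Z-scores of model coefficients): it assumes each pilot supplies a strict ranking of the same small set of terms, and the function names and the toy rankings are hypothetical. Exhaustive search over permutations is only practical for a handful of items.

```python
from itertools import combinations, permutations


def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_median(rankings):
    """Return (ranking, cost) minimizing the summed Kendall tau distance."""
    items = rankings[0]
    best, best_cost = None, None
    for candidate in permutations(items):
        cost = sum(kendall_tau_distance(r, candidate) for r in rankings)
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    # Toy rankings (most to least useful term); not data from the study.
    pilots = [
        ("dense", "salient", "colorful", "safe"),
        ("dense", "colorful", "salient", "safe"),
        ("salient", "dense", "colorful", "safe"),
    ]
    print(kemeny_median(pilots))
```

For the toy input, the consensus ranking is the one that agrees with the majority on every pair of terms; the returned cost is the total number of pairwise disagreements between that ranking and the individual rankings.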

A factor analysis using principal components was conducted to further investigate the relationships among the 14 pairs of display clutter descriptors (principal components analysis is a specific form of factor analysis that does not make assumptions about the distribution of response measure variance). This analysis was intended to 1) examine the underlying factors or latent variables in pilot perceptions of display clutter; and 2) determine how many different factors are necessary to explain the pattern of variance in perceptions of clutter. Based on the psychophysical transfer function modeling and selection of relevant pairs of terms for characterizing clutter (nonparametric analysis approach), we expected that a finite set of latent display variables or qualities was driving perceived clutter. The principal components analysis was also conducted to provide parametric confirmation of the nonparametric approach.

The ratings for the pairs of descriptor terms were submitted to a principal components analysis with an orthogonal varimax rotation. Factors with eigenvalues greater than 1 were retained. The analysis revealed four factors that explained 78.45% of the total variance in perceptions of HUD clutter. Each factor was constructed using semantic pairs with no loading less than 0.36 on the rotated component matrix (Table III). The loadings in Table III reflect the importance of each display quality to perceived clutter.

TABLE II. CORRELATIONS AMONG PILOT PERCEPTIONS OF IMPORTANCE OF PAIRS OF DISPLAY DESCRIPTOR TERMS.

| | Pilot 1 | Pilot 2 | Pilot 3 | Pilot 4 | Median |
|---|---|---|---|---|---|
| Pilot 1 | 1 | – | – | – | – |
| Pilot 2 | 0.47 | 1 | – | – | – |
| Pilot 3 | 0.29 | −0.26 | 1 | – | – |
| Pilot 4 | 0.91 | −0.05 | 0.26 | 1 | – |
| Median | 0.60 | 0.46 | 0.33 | 0.39 | 1 |

TABLE III. ROTATED COMPONENT MATRIX.

| Term | Component 1 | Component 2 | Component 3 | Component 4 |
|---|---|---|---|---|
| SP | 0.83 | 0.13 | 0.08 | 0.01 |
| MOV | −0.46 | 0.38 | 0.19 | 0.6 |
| IN | 0.54 | 0.3 | 0.59 | −0.01 |
| NO | 0.89 | 0.07 | 0.06 | −0.08 |
| DU | 0 | 0.6 | 0.73 | −0.02 |
| MOC | −0.55 | 0.25 | 0.55 | 0.4 |
| LOW | 0.75 | −0.46 | 0.03 | 0.11 |
| ST | 0.37 | −0.22 | −0.08 | 0.78 |
| UNS | 0.11 | 0.08 | 0.9 | −0.02 |
| RE | −0.05 | 0.92 | 0.26 | 0.06 |
| LOA | 0.7 | −0.27 | 0.04 | 0.27 |
| EM | 0.79 | 0.07 | 0.04 | −0.06 |
| DI | 0.08 | 0.9 | 0.22 | 0.06 |
| UNG | −0.09 | 0.57 | 0 | 0.67 |

SP = sparse/dense; MOV = monotonous/variable; IN = indiscernible/discernible; NO = not salient/salient; DU = dull/sharp; MOC = monochromatic/colorful; LOW = low workload/high workload; ST = static/dynamic; UNS = unsafe/safe; RE = redundant/orthogonal; LOA = low attention/high attention; EM = empty/crowded; DI = similar/dissimilar; UNG = ungrouped/grouped.

The first factor, explaining 29.53% of the variance, was defined by the semantic pairs “not salient/salient” (0.89), “sparse/dense” (0.83), “empty/crowded” (0.79), “low workload/high workload” (0.75), and “low attention/high attention” (0.70). We labeled this factor “global density,” which has been previously defined as “the total amount of marks on a display, both relevant and irrelevant” (28). The salience of display features appeared to be the most important quality to this factor. The second factor, explaining 21.41% of the variance, was defined by the semantic pairs “redundant/orthogonal” (0.92) and “similar/dissimilar” (0.90). We labeled it “feature similarity,” primarily referring to the extent to which the various HUD features provided similar information. The two pairs of terms were nearly equal in importance to the factor. The semantic pairs that loaded most on the third factor, explaining 15.48% of the variance, were “unsafe/safe” (0.90) and “dull/sharp” (0.74). Two additional semantic pairs, “indiscernible/discernible” (0.59) and “monochromatic/colorful” (0.55), loaded moderately on the factor. Because the majority of pairs of terms loading on this factor were related to discriminating among various display entities, we labeled it “feature clarity.” The fourth factor, explaining 12.03% of the variance, was defined by the semantic pairs “static/dynamic” (0.78), “ungrouped/grouped” (0.67), and “monotonous/variable” (0.60). These terms indicated the degree of movement or change expected within the display upon image presentation; therefore, we labeled the fourth factor “dynamic nature.” With respect to this factor loading, the perceived importance of the “static/dynamic” pair was generally associated with more clutter when there were more static elements in the display. The reason for this is that pilots perceive static displays to be less useful for trend information and more likely to represent clutter. The semantic pair “ungrouped/grouped” was considered to be related to the dynamic nature of the display, as grouped features in the HUD would move together and contribute less to the sense of dynamics than ungrouped features that might move independently about the display.

DISCUSSION

Display Features and Overall Perceived Clutter

The ANOVAs on the pilot clutter ratings for the HUD images revealed specific display features and interactions of features to drive perceptions of clutter. During the flight scenarios, all features appeared important to pilot ratings. The HUD configurations involving SVS, EVS, or TCAS features appeared to dictate clutter ratings. In line with our second hypothesis (Hypothesis 2) for the experiment, those displays including the FLIR imagery generally led to higher perceptions of clutter than those with the terrain model. With respect to the TCAS, small “diamond”-shaped symbols were used in the HUD to represent other aircraft along with relative altitude and direction information. These symbols always appeared in either the lower-left or upper-right corners of the HUD image (Fig. 1) between the altitude and speed “tapes” (symbology). Few if any other features were presented in these areas of the HUD, thus giving pilots the sense of lower local density when attending to the TCAS symbology. This appeared to translate to lower clutter ratings when varying all other display features, save the primary flight display (PFD) symbology manipulation, in which Primary mode caused perceived clutter to significantly increase. During the experiment, some pilots said the SVS (wireframe terrain model) lines became confused with the TUNNEL crows’ feet, runway outline, and PFD symbology. They also said that the presence of the EVS imagery rendered some of the PFD symbology indiscernible. However, it is possible that the combination of the SVS and EVS features made clearer for pilots the association of SVS lines with terrain features illuminated by the FLIR and served to facilitate perceptual separation of the FLIR pixels from active symbol pixels. That is, pilots may have been able to better discriminate the EVS feature from the PFD symbology as a result of the presence of the SVS. These effects may have translated to lower clutter ratings.

Based on the ANOVA results, when Primary mode symbology was active, the presence of SVS, EVS, or TUNNEL did not appear to influence pilot ratings. Pilots appeared to be less affected by the addition of these features to the HUD when using primary symbology because of their focus on the additional aircraft status information as compared to the IMC mode symbology. This interpretation is supported by previous research examining the extent to which tunnel guidance and instrumentation overlays affected traffic detection in a SVS-hosted primary flight display. Wickens and colleagues (27) expected traffic detection to be mediated by display clutter associated with the presence of the tunnel and instrumentation overlay. They found that the tunnel supported somewhat better traffic detection performance compared to conventional instrumentation; in other words, the tunnel did not increase clutter to the point of performance degradation. The conformal nature of the tunnel symbology with the out-of-cockpit view may have served to better join the display and view in one attentional field, lessening the sense of clutter. This is related to the psychological mechanism of scene linking in HUD symbology design, as identified by Levy, Foyle, and McCain (12). However, the instrumentation overlay substantially inhibited traffic detection, regardless of tunnel presentation. Instrument symbology, then, appears to be a primary factor in driving display clutter and associated performance effects.

Finally, it was not surprising that the manipulations of the EVS, PFD symbology, and TUNNEL features were most important to perceived clutter. These features also led to the greatest increase in active pixel counts when turned “on.” Table IV shows the active pixel percentages for each of the HUD configurations. In general, the table reveals that as information features are added to the HUD, pixel counts increase. It also shows that when the EVS and SVS features are turned “on,” the percentages are the highest, and this likely translated to increased pilot perceptions of density and decreased discernibility of features. Regarding the EVS, during the experiment, all pilots commented that in certain phases of the approach, the thermal imagery caused several other features to be indiscernible. Some pilots also observed the tunnel lines to occlude or confuse the glideslope deviation indicator, which was a key feature they attended to in the later phases of the approach (i.e., after intercepting the glideslope). In general, it is possible that because of variations in the relevance of specific display features to pilot activities from phase to phase (as noted in the Procedures section), clutter ratings for the various HUD configurations may have been influenced by the different phases of flight.

TABLE IV. PERCENTAGE OF ACTIVE PIXELS FOR HUD CONFIGURATIONS.

| Display Configuration | Active Pixels | Total Pixels | % Active Pixels |
|---|---|---|---|
| Primary/TUNNEL | 13,696 | 354,047 | 3.94 |
| IMC | 6,687 | 353,632 | 1.89 |
| IMC/TUNNEL/SVS | 54,020 | 338,388 | 15.96 |
| IMC/SVS | 45,903 | 338,388 | 13.57 |
| Primary/TUNNEL/SVS | 56,569 | 319,012 | 17.73 |
| Primary/SVS | 52,136 | 319,704 | 16.31 |
| IMC/TUNNEL | 9,703 | 344,616 | 2.82 |
| Primary/SVS/EVS | 91,558 | 347,384 | 26.36 |
| IMC/SVS/EVS | 41,668 | 330,084 | 12.62 |
| Primary/EVS | 126,869 | 347,384 | 36.52 |
| Primary/TUNNEL/EVS | 135,600 | 348,076 | 38.96 |
| IMC/TUNNEL/EVS | 81,893 | 338,793 | 24.17 |
| IMC/TUNNEL/SVS/EVS | 155,262 | 334,928 | 46.36 |
| Primary/TUNNEL/SVS/EVS | 178,309 | 343,232 | 51.95 |
| IMC/EVS | 153,233 | 337,004 | 45.47 |
| Primary | 33,124 | 323,288 | 10.25 |

All pixel counts are for images presented under instrument flight rules (IFR) conditions, save the Primary-only configuration. This image was presented after “breakout” under visual flight rules (VFR) conditions and included a visual image of the runway environment.

The correlation analysis confirmed that as the number of active display features increased, so did pilot ratings of clutter. This was in agreement with our first hypothesis (Hypothesis 1). Based on the results shown in Table IV, we inferred that the visual density of the displays, in terms of iconic and non-iconic imagery (alone), was likely a key predictor of pilot clutter assessments. We conducted an additional correlation analysis on the perceived overall clutter ratings across HUDs with the percent active pixel counts and found a highly significant linear relation (r = 0.76798, P < 0.0001). Although non-iconic imagery, such as the EVS FLIR returns, may have provided pilots with additional flight information, if the imposed visual density exceeded the information density, pilots tended to consider the HUDs to be cluttered.

Modeling Perceptions of Display Clutter

We hypothesized that pilots would use a common language to characterize perceived clutter in the advanced HUD displays (Hypothesis 4). We also expected some display descriptor terms to have greater relevance than others, given the presence of various HUD features (Hypothesis 5). If all terms have equal frequency of use and strength for pilots, then no one display quality has greater power than any other for describing clutter. Although pilots may have had different internal models of display clutter and definitions of pairs of display descriptor terms, the psychophysical transfer function analysis revealed convergence of pilot ratings on five semantic pairs: “redundant/orthogonal,” “monochromatic/colorful,” “salient/not salient,” “safe/unsafe,” and “dense/sparse.” These pairs were considered to represent the perceptual qualities of HUDs that expert pilots consider to be highly relevant for characterizing display clutter. The fact that there were differences in the perceived relevance of terms indicates that specific display qualities contribute more or less to clutter, dependent upon HUD content. This is meaningful from a display design perspective as it supports designer identification of the nature and origin of perceived clutter. The results of the psychophysical transfer function analysis confirmed the identified hypotheses.

Underlying Factors in Perceptions of Display Clutter

We also hypothesized that a set of latent display variables or qualities related to those identified in the previous research was driving pilot perceptions of clutter (Hypothesis 3). The principal components analysis revealed a set of four factors to be sufficient for explaining the vast majority of variance (~78%) in overall clutter ratings. Furthermore, the ratings of the utility of the semantic pairs of terms for describing clutter loaded on these factors in such a way that they could be described as abstracted display qualities. The strongest factor we identified appeared to represent global display density. Secondary factors concerned: the similarity of features in the HUD, or redundancy; the clarity of features, or extent to which relevant information is obscured by irrelevant information; and the dynamic nature of the display, or the degree to which the display was expected to change from moment to moment with changing aircraft states.

These latent variables corresponded well with the pairs of terms selected for characterizing display clutter based on the psychophysical transfer function analysis. The “redundant/orthogonal” pair is strongly related to the “feature similarity” factor we defined. The “monochromatic/colorful” and “unsafe/safe” pairs were related to the “feature clarity” factor. The “sparse/dense” and “not salient/salient” pairs were related to the “global density” factor. However, there was no selected pair of terms that was strongly related to the “dynamic nature” factor. This needs to be carefully considered in the use of latent variables identified by the principal components analysis as bases for developing future subjective measures of clutter, such as rating scales; however, these factors do correspond to methods often cited in the literature as potential means for reducing display clutter. For example, the factor “clarity of features” is related to intensity-coding display manipulations such as backgrounding, highlighting, and lowlighting (26).

Limitations and Implications for Future Research

Three aspects of this study that affect the interpretation of results include the pilot sample, the use of static images as stimuli, and the lack of performance data. The sample was limited because of resource constraints, and this might have affected the sensitivity and reliability of the statistical analyses, particularly the ANOVA, on the overall clutter ratings. However, we consider the results on the pilot assessment of the utility of pairs of HUD descriptors to be reliable. There was strong convergence of pilot opinion on the most important and frequently used terms. We only selected those pairs found to be significant terms (based on chi-square tests) in significant psychophysical transfer functions (log-likelihood) across all pilots and all test trials. Our aggregation of pilot ranking of terms was also conservative; that is, we selected the top five pairs of terms. The results may also be generalizable to the larger pilot population as the sample of test pilots recruited for the study represented a demographically diverse group with various types of expertise (i.e., one female and three male pilots; two pilots with military service and two with civilian training; two with 757/767 type rating, one with 737 and A319/320 type ratings, and one with DC-9 type rating). As reported, two of the test pilots did have prior SVS and EVS HUD experience in simulators and/or active flight, and this is likely not representative of the general pilot population. In general, the findings on our sample of pilots regarding the underlying qualities of display clutter may capture average pilot behavior.


The use of static imagery in the experiment introduced a degree of separation from actual piloting using a HUD. This approach allowed pilots to study and give detailed feedback on each HUD image from a clutter perspective, and to assess the utility of the various display descriptor terms. On the other hand, any effect on perception of clutter arising from display dynamics of an actual HUD, and the interaction of display elements, could not be assessed. For example, features like the TUNNEL normally move in the HUD during flight and may occlude or confuse certain other symbology depending upon conditions. This experiment only provided pilot ratings of the perceived relevance of the “static/dynamic” pair of terms for describing HUDs.

Related to this, the fact that we did not present the HUD images in, for example, a flight simulation and record pilot performance in the approach scenario may have further limited our ability to establish what display features represent clutter under specific conditions and what display qualities are truly predictive of clutter. By collecting performance data, it would be possible to determine whether increased perceptions of clutter due to certain HUD features actually led to degradations in performance. As observed in the introduction, it is possible that HUD features pilots considered to represent “clutter” may actually support flight control under specific conditions.

Future research should focus on developing a multidimensional subjective measure of clutter based on critical HUD qualities, such as those identified through the principal components analysis. The pairs of display descriptor terms selected by test pilots in this study could be used as anchors to a set of scales to assess display density, redundancy, feature salience, and dynamics. The measurement approach might also involve pilots ranking display qualities in advance as a basis for calculating a rank-weighted sum of ratings, or overall clutter score. Such a measure could be used to assess the influence of HUD visual properties on pilot perceptions of clutter. These properties might include contrast or luminance differences between various display elements. There is also a need to relate pilot performance to perceived clutter through empirical testing. A model of perceived clutter in terms of HUD visual properties might allow avionics manufacturers to predict clutter and performance potentials for various display designs. Furthermore, a multidimensional subjective measure of clutter could be used to validate predictions through experiments as part of the design process.
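
One way the rank-weighted sum of ratings mentioned above might be computed is sketched below. This is only an illustration and not a measure developed or validated in this study. The function names, the rank-sum weighting rule (weights proportional to n - rank + 1), the 0 to 100 rating range, and the example ranks and ratings are all assumptions introduced here.

```python
def rank_sum_weights(ranks):
    """Convert importance ranks (1 = most important) into weights that sum to 1.

    Assumed weighting rule: weight is proportional to (n - rank + 1).
    """
    n = len(ranks)
    raw = {quality: n - rank + 1 for quality, rank in ranks.items()}
    total = sum(raw.values())
    return {quality: value / total for quality, value in raw.items()}


def clutter_score(ranks, ratings):
    """Return the rank-weighted sum of per-quality ratings."""
    if set(ranks) != set(ratings):
        raise ValueError("ranks and ratings must cover the same display qualities")
    weights = rank_sum_weights(ranks)
    return sum(weights[quality] * ratings[quality] for quality in ranks)


# Hypothetical pilot input: advance ranking of four display qualities,
# followed by ratings of one HUD image on an assumed 0-100 scale.
ranks = {"density": 1, "redundancy": 2, "salience": 3, "dynamics": 4}
ratings = {"density": 80, "redundancy": 60, "salience": 40, "dynamics": 20}

score = clutter_score(ranks, ratings)
print(round(score, 1))  # 60.0
```

With these hypothetical inputs the weights are 0.4, 0.3, 0.2, and 0.1, so the quality the pilot ranked first contributes most to the overall score. Other weighting rules could be substituted without changing the structure of the calculation.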

Conclusion

In general, this research provided further insight into the relationship of physical features of aviation displays to pilot perceptions of display clutter. The study identified various advanced HUD information elements (SVS, EVS, TUNNEL, TCAS, symbology set) that appear to be influential in perceived clutter ratings. In addition, the experiment identified a concise set of display descriptor terms that expert pilots consider to have the greatest relevance when assessing aviation display clutter, particularly perceptions of non-iconic display imagery (prototype SVS/EVS features) combined with standard iconic imagery (primary flight display-like symbology). The research also revealed a concise set of latent HUD variables that drive perceptions of clutter. The semantic pairs of descriptor terms closely corresponded with the abstract display qualities driving perceived clutter. These outcomes all provide a basis for a comprehensive definition of clutter in aviation displays and the development of multidimensional measures of the phenomenon in future research.

ACKNOWLEDGMENT

This research was supported by a NASA Research Announcement Grant (No. NNL06AA21A) through the Aeronautics Research Mission Directorate (ARMD), and the Integrated Intelligent Flight Deck (IIFDT) program. Lance Prinzel and Randy Bailey were the technical monitors. The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of NASA. We would like to thank Jennifer Cowley for acting as the air traffic controller during the experiment and recording pilot comments. We would also like to thank Lance Prinzel for preparation of the HUD images used in the experiment. We would like to thank Nathan Bailey for his input on the experiment design and development of the study survey forms. Finally, we would like to thank the anonymous reviewers for their insightful comments that served to strengthen the contribution of this paper.

Authors and affiliations: David B. Kaber, Ph.D., Sang-Hwan Kim, Karl Kaufmann, and Simon M. Hsiang, Edward P. Fitts Department of Industrial & Systems Engineering, Raleigh, NC; Amy L. Alexander, Aptima, Inc., Woburn, MA; and Emily M. Stelzer, Aptima, Inc., Washington, DC.

REFERENCES

1. Ahlstrom U. Work domain analysis for air traffic controller weather displays. J Safety Res 2005; 36:159–69.

2. Aviram G, Rotman SR. Evaluating human detection performance of targets and false alarms, using a statistical texture image metric. Optical Engineering 2000; 39:2285–95.

3. Beijer D, Smiley A, Eizenman M. Driver and vehicle simulation, human performance, and information systems for highways; railroad safety; and visualization in transportation. Transportation Research Record 2004; 1899:96–103.

4. Campbell DE, Kelly JS. A simple characterization of majority rule. Economic Theory 2000; 15:689–700.

5. Ewing GJ, Woodruff CJ, Vickers D. Effects of ‘local’ clutter on human target detection. Spat Vis 2006; 19:37–60.

6. Freedman D, Brandstein MS. Contour tracking in clutter: a subset approach. International Journal of Computer Vision 2000; 38:173–86.

7. Gulyanitskii LF, Volkovich OV, Malyshko SA. An approach to formalization and analysis of group choice problems. Cybernetics and Systems Analysis 1994; 30:413–8.

8. Haworth LA, Newman RL. Test techniques for evaluating flight displays. Washington, DC: NASA; 1993. Tech. Memo. No: 103947.

9. Horrey WJ, Wickens CD. Driving and side task performance: the effects of display clutter, separation, and modality. Hum Factors 2004; 46:611–24.

10. Keller J, Leiden K, Small R. Cognitive task analysis of commercial jet aircraft pilots during instrument approaches for baseline and synthetic vision displays. In: Foyle DC, Goodman A, Hooey BL, eds. Proceedings of the 2003 NASA Aviation Safety Program Conference on Human Performance Modeling of Approach and Landing with Augmented Displays. Moffett Field, CA: NASA; 2003. NASA Conference Proceedings No: NASA/CP-2003-212267.

11. Kemeny J, Snell JL. Mathematical models in the social sciences. Boston: Ginn and Co.; 1960.

12. Levy JL, Foyle DC, McCann RS. Performance benefits with scene-linked HUD symbology: an attentional phenomenon? Proceedings of the 42nd Annual Meeting of the Human Factors and Ergonomic Society; October 5-9, 1998; Chicago, IL. Santa Monica, CA: HFES; 1998:11–15.

13. Meitzler T, Gerhart G, Singh H. A relative clutter metric. IEEE Transactions on Aerospace and Electronic Systems 1998; 34:968–76.

14. Moberly NJ, Langham MP. Pedestrian conspicuity at night: failure to observe a biological motion advantage in a high-clutter environment. Applied Cognitive Psychology 2002; 16:477–85.

15. Montgomery DC. Design and analysis of experiments, 3rd ed. New York: John Wiley & Sons; 1991.

16. Muthard EK, Wickens CD. Display size contamination of attentional and spatial tasks: an evaluation of display minification and axis compression. Savoy, IL: University of Illinois, Aviation Human Factors Division; 2005. Tech. Report No: AHFD-05-12/NASA-05-3.

17. Prinzel LJ, Comstock JR Jr, Glaab LJ, Kramer LJ, Arthur JJ, Barry JS. The efficacy of head-down and head-up synthetic vision display concepts for retro- and forward-fit of commercial aircraft. International Journal of Aviation Psychology 2004; 14:53–77.

18. Rosenholtz R, Li Y, Mansfield J, Jin Z. Feature congestion: a measure of display clutter. In: Proceedings of CHI 2005. Portland, OR: ACM; 2005.

19. Rotman SR, Tidhar G, Kowalczyk ML. Clutter metrics for target detection systems. IEEE Transactions on Aerospace and Electronic Systems 1994; 30:81–91.

20. Society of Automotive Engineers, Inc. Aerospace recommended practices (ARP)-5288, transport category airplane head up display (HUD) systems. Warrendale, PA: SAE; 1999.

21. Schnell T, Kwon Y, Merchant S, Etherington T. Improved flight technical performance in flight decks equipped with synthetic vision information system displays. International Journal of Aviation Psychology 2004; 14:79–102.

22. Tullis T. Screen design. In: Helander M, Landauer TK, Prabhu P, eds. Handbook of human-computer interaction. Amsterdam: Elsevier Science; 1997:503–31.

23. Ververs PM, Wickens CD. Head-up displays: effects of clutter, display intensity, and display location on pilot performance. Int J Aviat Psychol 1998; 8:377–403.

24. Wang C, Griebel S, Brandstein M, Hsu B-J. Real-time automated video and audio capture with multiple cameras and microphones. Journal of VLSI Signal Processing Systems 2001; 29:81–99.

25. Wichmann FA, Hill NJ. The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 2001; 63:1314–29.

26. Wickens CD, Alexander AL, Ambinder MS, Martens M. The role of highlighting in visual search through maps. Spat Vis 2004; 17:373–88.

27. Wickens CD, McCarley JS, Alexander AL, Thomas LC, Ambinder MS, Zheng XS. Attention-situation awareness (A-SA) model of pilot error. In: Foyle DC, Hooey BL, eds. Human performance modeling in aviation. Mahwah, NJ: Lawrence Erlbaum; 2007.

28. Wickens CD, Vincow M, Schopper R, Lincoln J. Computational models of human performance in the design and layout of controls and displays. Wright Patterson Air Force Base, OH: SCERIAC; 1997. SCERIAC SOAR Report No: 97-22.

29. Xing J. Measure of information complexity and the implications for automation design. Washington, DC: U.S. DOT FAA; 2004. Tech. Report No: DOT/FAA/AM-04/17.