visual tools for databade queries and analysis

37
Visual Tools for Queries and Display of Quantitative Information in a Cancer Research Database JESSE STEWART and JERZY W. JAROMCZYK Department of Computer Science University of Kentucky, Lexington KY

Upload: moochm

Post on 05-Jul-2015

391 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Visual tools for databade queries and analysis

Visual Tools for Queries and Display of Quantitative Information

in a Cancer Research Database

JESSE STEWART and JERZY W. JAROMCZYKDepartment of Computer Science

University of Kentucky, Lexington KY

Page 2: Visual tools for databade queries and analysis

The Kentucky Cancer Registry

• The Markey Cancer has the singular mission to eliminate the morbidity and mortality of cancer

• Since its founding, the Markey Cancer Center and the UK Chandler hospital have served 2000-2200 new patients a year and is one of the few institutions nationwide that address both clinical care as well as cancer research.

• The KCR’s case count exceeded 30,000 annually as of 2009

• The KCR houses a wealth of historical data for hundreds of cancer variants, associated treatments, and their relative success across the state of Kentucky.

Page 3: Visual tools for databade queries and analysis

Patient

Events

Abstracting

CPDMS.NET

Internet

HTTPS

Registry DB

MySQL

Data Collection

Page 4: Visual tools for databade queries and analysis

Cancer Abstracts

• A cancer abstract contains up to 240 different elements ranging from patient demographics to staging information to therapy history

• KCR alone stores tens of thousands of unique abstracts

• Each abstract is created by a registrar, a professional trained to understand cancer data standards, formats and coding rules

Page 5: Visual tools for databade queries and analysis

Accelerating Cancer Research

Develop Queries

Visualize Data Sets

Discover Important

Correlations

Page 6: Visual tools for databade queries and analysis

Registry Databases and Research

Challenges in Research•Coded Data•SQL•Complex DB Schemas•Access Control•Visualization

Valuable Information•Survival Trends•Incidence Rates•Behavioral and Geographical Correlation

Page 7: Visual tools for databade queries and analysis

Software Solutions

• Define Queries (Data Sets)– Intuitive: no programming required

– Flexible: allow any data set to be explored

– Accessible: Visual cross-browser application

– Re-use: Save, modify and combine Data Sets

• Data Analysis and Visualization:– Context-specific diagrams

– Compare data sets singularly or side-by-side

– Customizable appearance

Page 8: Visual tools for databade queries and analysis

The Query Builder

• Presents a high-level abstraction of the Registry Database

• Patient, Case, Therapy data variables are easily recognizable and categorized

• Separates the user from the actual database structure and coded information

– Example: Treatment is encoded as:

• No Treatment=0, Treatment=1, Surveillance=2

Page 9: Visual tools for databade queries and analysis

The Query Builder

• Translates a question about cancer data into SQL (Structured Query Language) which can be understood by the computer system

• Parses and stores the query for modification and reuse later

Page 10: Visual tools for databade queries and analysis

Example Query

• Patients diagnosed between Jan 1, 2005 and Dec 31, 2008

• Patients diagnosed in Kentucky• Patients treated with immunotherapy

• SQL may be complex

case_data.diagdate >= 20050101 and case_data.diagdate <= 20081231 and case_tx.txtype = ‘I’ and case_data.diagstate = ‘KY’ from case_data, case_txwhere case_tx.hospkey = case_data.hospkey and case_tx.patkey = case_data.patkey and case_data.incomplete = 0;

Page 11: Visual tools for databade queries and analysis

Interface Design

• To make writing a query like the previous example simple, the Query Builder must provide intuitive controls permitting a user to define each query component

• Variable names and coded values should be descriptive and easy to locate

• Conditions should be combined in a natural way with Boolean operators

• Tree-like layout chosen to represent queries

Page 12: Visual tools for databade queries and analysis

Query Builder in Action

Page 13: Visual tools for databade queries and analysis

Custom UI Controls

• For each variable, DB schema information is used to display a customized UI control, eg:

– Dates: date fields or ranges

– Discrete variables: drop-down list or multiselect

– Variable with many values: autofill field

Page 14: Visual tools for databade queries and analysis

Syntax Tree

Page 15: Visual tools for databade queries and analysis

Internal Representation

• Program maintains an abstract syntax tree for the query as it is created

• Captures the essential structure of the query but omits SQL-specific syntax

• This data structure serves as an intermediary between the interface and the database system

• Permits two code-generation targets: JSON and SQL

Page 16: Visual tools for databade queries and analysis

Serialization and Storage

• Each query once created by the user may be saved for future analysis or manipulation

• The program stores the AST for the query as a JavaScript object, which can then be serialized into JSON (JavaScript Object Notation) and then stored.

• Deserialization and conversion to SQL is performed later for analysis

Page 17: Visual tools for databade queries and analysis

Query Management

Page 18: Visual tools for databade queries and analysis

Query Storage

• Queries are often referred to as ‘study groups’ by researchers

• The serialized queries and associated metadata is stored in a database table

study_groups: id | Name | Query| User | LastModified | LastUsed

• MySQL database was chosen for convenience since registry data is stored using this system

Page 19: Visual tools for databade queries and analysis

Visualization Tools

– Scaled Venn Diagrams• User can quickly ascertain relative size of data sets and

their relationship to one another

– Bar and Histogram Charts• Flexible view of variable distribution for different sets

– Survival Trends• View and compare survival rates over time

– Statistics• Common descriptive statistics

• Comparison with Chi-square, Log rank, T-, Z-tests

Page 20: Visual tools for databade queries and analysis

Venn Diagrams

• Venn diagrams show logical relationships between a number of sets

• Subset of Euler diagrams – all possible subsets must be displayed

• Can quickly convey how data sets overlap and relate to one another

Page 21: Visual tools for databade queries and analysis

Area Proportionality

• Area-proportional venn diagrams show the relative size of datasets and their intersections

• Very useful for rapid exploration of data sets such as cancer data

• Although typical venn digrams often display 3 sets, area-proportional diagrams cannot always be drawn with circles for more than 2 sets [1]

• The vast majority of research needs involve comparison of two data sets

Page 22: Visual tools for databade queries and analysis

Drawing To Scale

Circle-intersection problemTriangle(C1,C2,A) = Triangle(C1,C2,B)Triangle(C1,C2,A) + Triangle(C1,C2,B) + Lens = Sector(C1,A,B) + Sector(C2,A,B)Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A)

Page 23: Visual tools for databade queries and analysis

Drawing to Scale

Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A)

• By applying formulas for the area of a circular sector and triangle, we arrive at this result for the distance between the circles’ centers:

• The value must be approximated, to do so the Root-bisection method was used in implementation.

Page 24: Visual tools for databade queries and analysis

Visualization: Venn Diagrams

Page 25: Visual tools for databade queries and analysis

Visualization: Venn Diagrams

Page 26: Visual tools for databade queries and analysis

Reports

• Several customizable reports were implemented to further leverage the query builder’s utility.

• Each is implemented in PHP, and produces an SQL query using the saved criteria and the settings selected by the user for the report

Page 27: Visual tools for databade queries and analysis

Data List Tool

Page 28: Visual tools for databade queries and analysis

Cross-Tab Analysis

Page 29: Visual tools for databade queries and analysis

Graph Settings Interface

Page 30: Visual tools for databade queries and analysis

Visualization: Histogram

Page 31: Visual tools for databade queries and analysis

Visualization: Survival Trends

Page 32: Visual tools for databade queries and analysis

Chi-square Analysis

Page 33: Visual tools for databade queries and analysis

Censored Life Table

Page 34: Visual tools for databade queries and analysis

Success

• The Visual Query Builder and Data Analysis tools have become an integral part of CPDMS.NET – the online abstracting system developed at the KCR.

• Over 5000 study groups have been created by users of the system.

• Features have been added and improved resulting from feedback given by researchers and registrars (cancer data professionals).

• Future developments may include:– Wider array of statistical tests

– Functions to analyze more than two data sets at once

Page 35: Visual tools for databade queries and analysis

References

• The Kentucky Cancer Registry – A History

• http://www.kcr.uky.edu/about.php

• F. Ruskey and M. Weston – A Survey of Venn Diagrams

• http://www.combinatorics.org/Surveys/ds5/VennEJC.html

• S. Chow and F. Ruskey, Drawing Area-Proportional Venn and Euler Diagrams

• Circle-Circle Intersection Problem

• http://mathworld.wolfram.com/Circle-CircleIntersection.html

Page 36: Visual tools for databade queries and analysis

Acknowledgements

Eric Durbin, Kentucky Cancer Registry

Dr. Jerzy Jaromczyk, UK Computer Science

Page 37: Visual tools for databade queries and analysis

Software