
    Paper DV03

    Using Sankey Diagram to Analyze Drug Pipeline

    Tanmay Khole, Bristol-Myers Squibb, Berkeley Heights NJ, USA

    ABSTRACT

    Sankey diagrams are a specific type of flow diagram in which the width of each arrow is proportional to the flow quantity. They put a visual emphasis on the major transfers or flows within a system and help locate the dominant contributions to an overall flow. This paper focuses on the drug pipeline of a sponsor and uses data from clinicaltrials.gov to analyze the number of clinical trials a sponsor has with respect to conditions, interventions, and phases. The results are visualized as Sankey diagrams that show the weight a sponsor has given to a drug or a condition across the phases of clinical trials. A drug pipeline gives an idea of the future of a company, and this paper takes a deep dive into some of its aspects by means of Sankey diagrams.

    INTRODUCTION

    This paper analyzes data from clinicaltrials.gov for a few selected clinical trial sponsors and uses that information to create Sankey diagrams. A Sankey diagram is a visualization that depicts a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankey diagrams are best used to show a many-to-many mapping between two domains or multiple paths through a set of stages. Data from clinicaltrials.gov is a good fit: it lets us analyze a sponsor’s drug pipeline and see which clinical conditions or interventions the sponsor focuses on at each stage of clinical trials.

    Data mapping, data analysis, and data visualization are used to create the Sankey diagrams displayed in this paper. Phase 1 clinical trials are excluded from data analysis and data visualization so that the flow of clinical trials in Phases 2-4 is easier to follow. Data is obtained in CSV format from clinicaltrials.gov using the advanced search option, searching only on the sponsor field. Analysis is performed on trials with status "Active, not recruiting", "Available", "Enrolling by invitation", "Not yet recruiting", or "Recruiting".

    SANKEY DIAGRAM FOR CLINICALTRIALS.GOV DATA

    Data obtained from clinicaltrials.gov in CSV format has one record per trial (see figure 1). Before it can be used for a Sankey diagram, it needs to be processed in the following steps:

    • Data Mapping

    • Data Analysis

    • Data Visualization

    Figure 1: Data obtained from clinicaltrials.gov and imported into SAS® dataset.

    The sponsors listed in table 1 are considered in this paper for data analysis and for creating Sankey diagrams of each sponsor’s on-going clinical trials. Clinical trials with status "Active, not recruiting", "Available", "Enrolling by invitation", "Not yet recruiting", or "Recruiting" are considered on-going. Only clinical trials in which the sponsor is the lead sponsor are selected.

    Sponsor                            Distinct On-going Clinical Trials Count   Data Extraction Date
    Sponsor 1   Bristol-Myers Squibb   250                                       22NOV2019
    Sponsor 2   Janssen                126                                       22JAN2020
    Sponsor 3   Merck & Co.            173                                       22JAN2020
    Sponsor 4   Amgen                  56                                        22JAN2020
    Sponsor 5   Bayer                  56                                        22JAN2020

    Table 1: List of Sponsors
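The selection criteria above (on-going status, lead sponsor only) can be sketched in JavaScript, the language the appendix already uses for the D3.js part. This is a minimal sketch, not the author's code: the field names `status` and `leadSponsor` and the sample records are hypothetical, and the real clinicaltrials.gov CSV column headers may differ.

```javascript
// Statuses the paper treats as "on-going".
const ONGOING_STATUSES = new Set([
  "Active, not recruiting",
  "Available",
  "Enrolling by invitation",
  "Not yet recruiting",
  "Recruiting",
]);

// Keep a trial only if it is on-going and led by the given sponsor.
// The property names are hypothetical, not actual CSV headers.
function selectOngoingTrials(trials, sponsor) {
  return trials.filter(
    (t) => ONGOING_STATUSES.has(t.status) && t.leadSponsor === sponsor
  );
}

// Made-up records, for illustration only.
const sampleTrials = [
  { id: "T1", status: "Recruiting", leadSponsor: "Sponsor A" },
  { id: "T2", status: "Completed", leadSponsor: "Sponsor A" },
  { id: "T3", status: "Recruiting", leadSponsor: "Sponsor B" },
  { id: "T4", status: "Available", leadSponsor: "Sponsor A" },
];

const ongoing = selectOngoingTrials(sampleTrials, "Sponsor A");
console.log(ongoing.map((t) => t.id));
```

With the sample records, only T1 and T4 are kept: T2 is not on-going and T3 has a different lead sponsor.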

    DATA MAPPING

    Data mapping is essential for connecting links and nodes in Sankey diagrams. Clinical trials data obtained from clinicaltrials.gov contains multiple names for the same condition (e.g., “NSCLC”, “Non-Small Cell Lung Cancer”, or “Carcinoma, Non-Small-Cell Lung”; see figure 2) and multiple names for the same drug/biologic compound (e.g., "Nivolumab", "Opdivo", "BMS-936558", "ONO-4538"; see figure 3). Hence it is important to assign each condition and intervention to the correct category. As there are numerous conditions, they are mapped into high-level categories such as Solid Tumors, Cardiovascular, and Leukemia & Lymphoma. See figure 4 for an example of mapping different conditions to a high-level category.

    Figure 2: Mapping different names of same condition into single category.

    Figure 3: Mapping different names of same compound/intervention into single category.

    Figure 4: Mapping different conditions to high-level category.
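The synonym mapping described above can be sketched as a pair of lookup tables. This is only an illustration in JavaScript: the synonyms are the ones quoted in the text, while the function names, the "Unmapped" fallback, and the assignment of Non-Small Cell Lung Cancer to "Solid Tumors" are assumptions, since the contents of figures 2-4 are not reproduced here.

```javascript
// Synonym -> canonical name; keys are lower-cased for case-insensitive lookup.
const CONDITION_SYNONYMS = new Map([
  ["nsclc", "Non-Small Cell Lung Cancer"],
  ["non-small cell lung cancer", "Non-Small Cell Lung Cancer"],
  ["carcinoma, non-small-cell lung", "Non-Small Cell Lung Cancer"],
]);

const COMPOUND_SYNONYMS = new Map([
  ["nivolumab", "Nivolumab"],
  ["opdivo", "Nivolumab"],
  ["bms-936558", "Nivolumab"],
  ["ono-4538", "Nivolumab"],
]);

// Canonical condition -> high-level category (assumed assignment).
const CONDITION_CATEGORY = new Map([
  ["Non-Small Cell Lung Cancer", "Solid Tumors"],
]);

// Return the canonical name, or the trimmed input if no synonym is known.
function canonical(name, synonyms) {
  const key = name.trim().toLowerCase();
  return synonyms.has(key) ? synonyms.get(key) : name.trim();
}

// Map a raw condition name to its high-level category.
function conditionCategory(rawName) {
  const name = canonical(rawName, CONDITION_SYNONYMS);
  return CONDITION_CATEGORY.has(name) ? CONDITION_CATEGORY.get(name) : "Unmapped";
}

console.log(canonical("Opdivo", COMPOUND_SYNONYMS));
console.log(conditionCategory("Carcinoma, Non-Small-Cell Lung"));
```

Returning "Unmapped" rather than guessing makes names that still need manual review easy to spot, which fits the paper's remark that mapping requires closely observing the data.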

    The mapping rules below are applied before the data analysis step. They are designed to identify the sponsor’s focus with regard to clinical conditions and interventions.

    • Clinical trials with multiple phases are mapped to the highest phase

    • Clinical trials with multiple clinical conditions are mapped to each condition

    • Clinical trials with multiple interventions are mapped to each intervention of the respective sponsor

    Example 1: Clinical trial NCT03331198, titled “Study Evaluating Safety and Efficacy of JCAR017 in Subjects With Relapsed or Refractory Chronic Lymphocytic Leukemia (CLL) or Small Lymphocytic Lymphoma (SLL)”, has a trial design covering Phase 1 and Phase 2. As per the mapping rules, it is mapped to Phase 2 only. This trial also lists multiple clinical conditions, Chronic Lymphocytic Leukemia and Small Lymphocytic Lymphoma, and is mapped to each of them.

    Example 2: Clinical trial NCT04088500, titled “A Study of Combination Nivolumab and Ipilimumab Retreatment in Patients With Advanced Renal Cell Carcinoma”, has multiple interventions: Nivolumab and Ipilimumab. As per the mapping rules, this trial is mapped to each intervention listed.

    Example 3: Clinical trial NCT03036098, titled “Study of Nivolumab in Combination With Ipilimumab or Standard of Care Chemotherapy Compared to the Standard of Care Chemotherapy Alone in Treatment of Patients With Untreated Inoperable or Metastatic Urothelial Cancer”, has multiple interventions: nivolumab, ipilimumab, gemcitabine, cisplatin, and carboplatin. Only the first two are the sponsor’s compounds, hence this trial is mapped to two interventions: nivolumab and ipilimumab.
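The three mapping rules and the examples above can be sketched as small helper functions. This is a hedged JavaScript illustration, not the paper's SAS implementation: it assumes that multi-valued fields arrive as pipe-delimited strings (such as "Phase 1|Phase 2") and that intervention names have already been cleaned. All function names are hypothetical.

```javascript
// Split a pipe-delimited field into trimmed, non-empty values.
function splitList(value) {
  return value
    .split("|")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Rule 1: a trial listing several phases counts toward the highest one.
function highestPhase(phases) {
  const numbers = splitList(phases)
    .map((p) => parseInt(p.replace(/\D+/g, ""), 10))
    .filter((n) => !Number.isNaN(n));
  return numbers.length > 0 ? "Phase " + Math.max(...numbers) : null;
}

// Rule 2: a trial listing several conditions counts toward each of them.
function trialConditions(conditions) {
  return splitList(conditions);
}

// Rule 3: keep only the interventions that are the sponsor's own compounds.
function sponsorInterventions(interventions, sponsorCompounds) {
  const own = new Set(sponsorCompounds.map((c) => c.toLowerCase()));
  return splitList(interventions).filter((name) => own.has(name.toLowerCase()));
}

// Example 1: Phase 1 and Phase 2 design, two conditions.
console.log(highestPhase("Phase 1|Phase 2"));
console.log(trialConditions("Chronic Lymphocytic Leukemia|Small Lymphocytic Lymphoma"));

// Example 3: five interventions, two of which are the sponsor's compounds.
console.log(
  sponsorInterventions(
    "nivolumab|ipilimumab|gemcitabine|cisplatin|carboplatin",
    ["Nivolumab", "Ipilimumab"]
  )
);
```

The list of sponsor compounds has to be curated per sponsor, which matches the paper's point that each sponsor's compounds must be identified by inspecting the data.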

    Data mapping for this paper is performed by creating flags/identifiers for each condition and intervention listed in the respective sponsor’s clinical trials data. Each sponsor listed in table 1 has unique compounds, so each compound/intervention has to be mapped by closely inspecting the data.

    Data obtained from clinicaltrials.gov has one record per trial (horizontal data format). It needs to be transformed into a vertical data format, as shown in figure 5, using the flags created for each condition category and intervention.


    Figure 5: Horizontal data mapped and transformed into vertical data format
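One way to picture the horizontal-to-vertical transformation is as an expansion of each trial record into one row per condition and intervention combination. The JavaScript sketch below assumes exactly that cross-product layout, which may differ from the layout in figure 5. The record shape, property names, and sample values are hypothetical.

```javascript
// Expand one trial record (horizontal) into one row per
// condition/intervention pair (vertical).
function toVertical(trial) {
  const rows = [];
  for (const condition of trial.conditions) {
    for (const intervention of trial.interventions) {
      rows.push({
        trialId: trial.trialId,
        sponsor: trial.sponsor,
        condition: condition,
        intervention: intervention,
        phase: trial.phase,
      });
    }
  }
  return rows;
}

// Made-up trial with two mapped conditions and two mapped interventions.
const horizontalRecord = {
  trialId: "TRIAL-001",
  sponsor: "Sponsor A",
  conditions: ["Condition X", "Condition Y"],
  interventions: ["Compound 1", "Compound 2"],
  phase: "Phase 2",
};

const verticalRows = toVertical(horizontalRecord);
console.log(verticalRows.length); // 4
```

Under this assumption a trial with two conditions and two interventions contributes four rows, so later counts are counts of rows rather than of distinct trials.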

    DATA ANALYSIS

    Data analysis is performed by counting the objects in each of the categories to be displayed in the Sankey diagram. The categories are used as nodes, and the object counts determine the width of the links between the selected categories.

    In this paper, data analysis is performed by counting the number of clinical trials with respect to sponsor, conditions, interventions, and phases. This step is performed after data mapping to ensure that links and nodes are connected correctly. The SAS® macro %sankey_nodes is used for data analysis; reference code can be found in the appendix.

    %sankey_nodes(inds  = ct_gov
                 ,outds = sankey_out
                 ,nodes = %str(sponsor|conditions|interventions|phases)
                 ,cond  =
                 );

    %sankey_nodes counts the number of objects, in this case clinical trials. The mapped data is passed in the “inds” macro parameter. The nodes (categories) to be displayed in the Sankey diagram are listed in the “nodes” macro parameter, separated by “|”, and any subsetting condition can be given in the “cond” macro parameter. The macro creates an output dataset (named by “outds”) and a global macro variable &sankeydata., both of which hold the link data for the Sankey diagram. The macro variable is used in the data visualization step to create the diagram.
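The counting step can be sketched outside SAS as follows. This JavaScript sketch is a simplification, not a port of %sankey_nodes: it groups rows by each adjacent pair of node columns only, whereas the appendix macro also groups by all preceding node columns. The function name and sample rows are hypothetical.

```javascript
// For each adjacent pair of node columns, count the rows that share
// the same (source, target) values and emit one link per pair.
function countLinks(rows, nodeColumns) {
  const links = [];
  for (let i = 0; i < nodeColumns.length - 1; i++) {
    const counts = new Map();
    for (const row of rows) {
      const key = JSON.stringify([row[nodeColumns[i]], row[nodeColumns[i + 1]]]);
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    for (const [key, value] of counts) {
      const [source, target] = JSON.parse(key);
      links.push({ source: source, target: target, value: value });
    }
  }
  return links;
}

// Made-up vertical rows, one per mapped condition/intervention pair.
const mappedRows = [
  { sponsor: "Sponsor A", conditions: "C1", interventions: "D1", phases: "Phase 2" },
  { sponsor: "Sponsor A", conditions: "C1", interventions: "D2", phases: "Phase 3" },
  { sponsor: "Sponsor A", conditions: "C2", interventions: "D1", phases: "Phase 3" },
];

const sankeyLinks = countLinks(mappedRows, [
  "sponsor",
  "conditions",
  "interventions",
  "phases",
]);
console.log(sankeyLinks);
```

Each resulting object has the same `source`, `target`, and `value` fields that the macro writes into &sankeydata., so it could feed the same kind of D3.js layout.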

    DATA VISUALIZATION

    The data visualization step is performed using the SAS® macro %sankey2html and D3.js, a JavaScript library. The output is created in HTML format.

    %sankey2html(indata   = %nrbquote(&sankeydata.)
                ,outfl    = %sysfunc(pathname(outg,f))/sankey.html
                ,width    = 2100
                ,height   = 700
                ,flow_num =
                );

    The %sankey2html macro reads the macro variable &sankeydata. created by %sankey_nodes and writes it into an HTML file. The output file location and HTML filename are specified in the “outfl” macro parameter. The “width” and “height” parameters set the width and height of the Sankey diagram. The “flow_num” parameter is used to display link labels above a specified number.

    The Sankey diagrams display the flow of the number of clinical trials from

    SPONSOR → CONDITIONS → INTERVENTIONS → PHASES

    which are also the nodes of the Sankey diagrams displayed in this paper. The thickness of a link signifies the number of clinical trials connecting its two nodes.
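How link records become nodes and node totals can be sketched without D3.js. The JavaScript below follows the same idea as the appendix listing (unique node names, links rewritten as node indices, node value taken as the larger of the incoming and outgoing sums), but it is a standalone illustration with hypothetical names and made-up records, not the code the macro emits.

```javascript
// Build { nodes, links } from link records of the form
// { source: name, target: name, value: number }.
function buildGraph(linkRecords) {
  const names = [];
  const indexByName = new Map();

  // Return the index of a node name, registering it on first use.
  function indexOf(name) {
    if (!indexByName.has(name)) {
      indexByName.set(name, names.length);
      names.push(name);
    }
    return indexByName.get(name);
  }

  const links = linkRecords.map((d) => ({
    source: indexOf(d.source),
    target: indexOf(d.target),
    value: Number(d.value),
  }));

  // Node value = max(sum of outgoing links, sum of incoming links).
  const outgoing = new Array(names.length).fill(0);
  const incoming = new Array(names.length).fill(0);
  for (const link of links) {
    outgoing[link.source] += link.value;
    incoming[link.target] += link.value;
  }
  const nodes = names.map((name, i) => ({
    name: name,
    value: Math.max(outgoing[i], incoming[i]),
  }));

  return { nodes: nodes, links: links };
}

// Made-up link records.
const graph = buildGraph([
  { source: "Sponsor A", target: "C1", value: 2 },
  { source: "Sponsor A", target: "C2", value: 1 },
  { source: "C1", target: "D1", value: 2 },
  { source: "C2", target: "D1", value: 1 },
]);
console.log(graph.nodes);
```

Because nodes are keyed by name, a condition and an intervention that share the same label would collapse into one node, so distinct labels per stage are worth checking during data mapping.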

    The following notes apply to Sankey diagrams 1-5:

    Node 1: Sponsor; Node 2: Clinical Conditions; Node 3: Interventions; Node 4: Clinical Trial Phases

    The number of on-going clinical trials for each node is displayed in parentheses.

    Clinical trials with multiple phases are counted toward the highest phase; clinical trials with multiple clinical conditions are counted toward each condition; clinical trials with multiple interventions are counted toward each intervention.

    Note: The data analysis and data visualization performed in this paper are not an official representation of any sponsor’s pipeline; they are based on data acquired from clinicaltrials.gov.

    SANKEY DIAGRAM 1

    Sponsor: Bristol-Myers Squibb

    https://clinicaltrials.gov/ct2/about-site/background


    SANKEY DIAGRAM 2

    Sponsor: Janssen


    SANKEY DIAGRAM 3

    Sponsor: Merck & Co.


    SANKEY DIAGRAM 4

    Sponsor: Amgen


    SANKEY DIAGRAM 5

    Sponsor: Bayer


    CONCLUSION

    The Sankey diagram is an effective data visualization tool for understanding the flow of clinical trials. It helps track many clinical trials in a single view. It also helps show the weight of a clinical condition or intervention with respect to the phases of clinical trials, and it represents flow in a manner that can be understood at a glance. The Sankey diagrams in this paper allow the user to see the complex pipeline of a sponsor in a single image, with a focus on the clinical conditions and interventions/compounds of that sponsor. Sankey diagrams make dominant clinical conditions or interventions stand out, and they help users see relative magnitudes and the areas with the largest opportunities.

    With the provided macros, the Sankey diagram can be adjusted to the user’s needs. Sankey diagrams offer the added benefit of supporting multiple viewing levels: users can get a high-level view, see specific details, or generate custom diagrams.

    NOTE FROM THE AUTHOR

    The data analysis and data visualization performed in this paper are not an official representation of any sponsor’s pipeline; they are based on public data acquired from clinicaltrials.gov. This presentation reflects the views of the author and should not be construed to represent any of the clinical trial sponsors’ pipelines. ClinicalTrials.gov is a Web-based resource that provides patients, their family members, health care professionals, researchers, and the public with easy access to information on publicly and privately supported clinical studies on a wide range of diseases and conditions.

    ACKNOWLEDGMENTS

    I would like to thank Vineet Mathur and Simon Xue for their guidance and support for this paper. You supported me greatly and were always willing to help me.

    I would like to thank the dedicated people who manage and maintain clinicaltrials.gov and d3js.org. Without the resources from these sources, this paper wouldn’t be possible.

    https://clinicaltrials.gov/

    https://d3js.org/


    CONTACT INFORMATION

    Your comments and questions are valued and encouraged. Contact the author at:

    Author Name: Tanmay Khole

    Company: Bristol-Myers Squibb

    Address: 300 Connell Drive, Berkeley Heights

    City / Postcode: NJ 07922

    Email: [email protected]

    Brand and product names are trademarks of their respective companies.

    APPENDIX

    %macro sankey_nodes(inds=, outds=, nodes=, cond=);

      /* Number of node variables = number of "|" delimiters + 1.            */
      /* In macro code the quotation marks are counted as characters too,    */
      /* which is harmless as long as the node names contain no quotes.      */
      %let cnt = %eval(%sysfunc(countc(&nodes.,"|")) +1);
      %put &cnt.;

      data _inds;
        set &inds.;
      run;

      /* Store each node variable name in &single1., &single2., ... */
      %do i = 1 %to &cnt;
        %let single&i. = %scan(&nodes, &i , '|');
        %put single&i. = &&single&i;
      %end;

      /* One link table per adjacent pair of node variables */
      proc sql;
        %do i = 1 %to %eval(&cnt. -1);
          create table &&single&i.._wt_chk as
            select distinct
              %do k=1 %to &i. ;
                &&single&k.,
              %end;
              %superq(single%eval(&i. +1)),
              &&single&i. as SOURCE length=100,
              %superq(single%eval(&i. +1)) as TARGET length=100,
              count(&&single&i.) as VALUE,
              "{'source':'"||strip(&&single&i.)||"','target':'"||strip(%superq(single%eval(&i. +1)))||"','value':"||strip(put(count(&&single&i.), 5.0))||"}," as final length=1000
            from _inds
            %if &cond. ne %then %do;
              where &cond.
            %end;
            group by
              %do k=1 %to &i. ;
                &&single&k.,
              %end;
              %superq(single%eval(&i. +1))
          ;
        %end;
      quit;

      /* Stack the link tables into the output dataset */
      data &outds.;
        set
          %do i = 1 %to %eval(&cnt. -1);
            &&single&i.._wt_chk
          %end;
        ;
      run;

      options linesize=max;

      /* Concatenate all link records into the global macro variable */
      %global sankeydata;
      proc sql noprint;
        select final into: sankeydata separated by " "
          from &outds.
        ;
      quit;

      %put &sankeydata. ;

    %mend sankey_nodes;

    %macro sankey2html(indata=, outfl=, width=, height=, flow_num=);

    data _null_;

    file "&outfl.";

    put '';

    put '';

    put '';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' .node rect {';

    put ' cursor: move;';

    put '}';

    put '.link {';

    put ' fill: none;';

    put ' stroke: #000;';

    put ' stroke-opacity: .2;';

    put '}';

    put '.link:hover {';

    put ' stroke-opacity: .5;';

    put '} ';

    put ' * {';

    put ' font: 11px sans-serif;';

    put '}';

    put '.linkLabel {';

    put ' z-index:10;';

    put '}';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' //


    put ' nodes = [],';

    put ' links = [];';

    put ' ';

    put ' sankey.nodeWidth = function (_) {';

    put ' if (!arguments.length) return nodeWidth;';

    put ' nodeWidth = +_;';

    put ' return sankey;';

    put ' };';

    put ' ';

    put ' sankey.nodePadding = function (_) {';

    put ' if (!arguments.length) return nodePadding;';

    put ' nodePadding = +_;';

    put ' return sankey;';

    put ' };';

    put ' ';

    put ' sankey.nodes = function (_) {';

    put ' if (!arguments.length) return nodes;';

    put ' nodes = _;';

    put ' return sankey;';

    put ' };';

    put ' ';

    put ' sankey.links = function (_) {';

    put ' if (!arguments.length) return links;';

    put ' links = _;';

    put ' return sankey;';

    put ' };';

    put ' ';

    put ' sankey.size = function (_) {';

    put ' if (!arguments.length) return size;';

    put ' size = _;';

    put ' return sankey;';

    put ' };';

    put ' ';

    put ' sankey.layout = function (iterations) {';

    put ' computeNodeLinks();';

    put ' computeNodeValues();';

    put ' computeNodeBreadths();';

    put ' computeNodeDepths(iterations);';

    put ' computeLinkDepths();';

    put ' return sankey;';

    put ' };';

    put ' ';

    put ' sankey.relayout = function () {';

    put ' computeLinkDepths();';

    put ' return sankey;';

    put ' };';

    put ' ';

    put ' sankey.link = function () {';

    put ' var curvature = .5;';

    put ' ';

    put ' function link(d) {';

    put ' var x0 = d.source.x + d.source.dx,';

    put ' x1 = d.target.x,';

    put ' xi = d3.interpolateNumber(x0, x1),';

    put ' x2 = xi(curvature),';

    put ' x3 = xi(1 - curvature),';

    put ' y0 = d.source.y + d.sy + d.dy / 2,';

    put ' y1 = d.target.y + d.ty + d.dy / 2;';

    put ' return "M" + x0 + "," + y0 + "C" + x2 + "," + y0 + " " + x3 + "," + y1 + " " + x1 + "," + y1;';

    put ' }';

    put ' ';

    put ' link.curvature = function (_) {';

    put ' if (!arguments.length) return curvature;';

    put ' curvature = +_;';

    put ' return link;';

    put ' };';

    put ' ';

    put ' return link;';

    put ' };';

    put ' ';

    put ' // Populate the sourceLinks and targetLinks for each node.';

    put ' // Also, if the source and target are not objects, assume they are indices.';

    put ' function computeNodeLinks() {';

    put ' nodes.forEach(function (node) {';

    put ' node.sourceLinks = [];';


    put ' node.targetLinks = [];';

    put ' });';

    put ' links.forEach(function (link) {';

    put ' var source = link.source,';

    put ' target = link.target;';

    put ' if (typeof source === "number") source = link.source = nodes[link.source];';

    put ' if (typeof target === "number") target = link.target = nodes[link.target];';

    put ' source.sourceLinks.push(link);';

    put ' target.targetLinks.push(link);';

    put ' });';

    put ' }';

    put ' ';

    put ' // Compute the value (size) of each node by summing the associated links.';

    put ' function computeNodeValues() {';

    put ' nodes.forEach(function (node) {';

    put ' node.value = Math.max(';

    put ' d3.sum(node.sourceLinks, value),';

    put ' d3.sum(node.targetLinks, value));';

    put ' });';

    put ' }';

    put ' ';

    put ' // Iteratively assign the breadth (x-position) for each node.';

    put ' // Nodes are assigned the maximum breadth of incoming neighbors plus one;';

    put ' // nodes with no incoming links are assigned breadth zero, while';

    put ' // nodes with no outgoing links are assigned the maximum breadth.';

    put ' function computeNodeBreadths() {';

    put ' var remainingNodes = nodes,';

    put ' nextNodes,';

    put ' x = 0;';

    put ' ';

    put ' while (remainingNodes.length) {';

    put ' nextNodes = [];';

    put ' remainingNodes.forEach(function (node) {';

    put ' node.x = x;';

    put ' node.dx = nodeWidth;';

    put ' node.sourceLinks.forEach(function (link) {';

    put ' nextNodes.push(link.target);';

    put ' });';

    put ' });';

    put ' remainingNodes = nextNodes;';

    put ' ++x;';

    put ' }';

    put ' ';

    put ' //';

    put ' moveSinksRight(x);';

    put ' scaleNodeBreadths((width - nodeWidth) / (x - 1));';

    put ' }';

    put ' ';

    put ' function moveSourcesRight() {';

    put ' nodes.forEach(function (node) {';

    put ' if (!node.targetLinks.length) {';

    put ' node.x = d3.min(node.sourceLinks, function (d) {';

    put ' return d.target.x;';

    put ' }) - 1;';

    put ' }';

    put ' });';

    put ' }';

    put ' ';

    put ' function moveSinksRight(x) {';

    put ' nodes.forEach(function (node) {';

    put ' if (!node.sourceLinks.length) {';

    put ' node.x = x - 1;';

    put ' }';

    put ' });';

    put ' }';

    put ' ';

    put ' function scaleNodeBreadths(kx) {';

    put ' nodes.forEach(function (node) {';

    put ' node.x *= kx;';

    put ' });';

    put ' }';

    put ' ';

    put ' function computeNodeDepths(iterations) {';

    put ' var nodesByBreadth = d3.nest()';


    put ' .key(function (d) {';

    put ' return d.x;';

    put ' })';

    put ' .sortKeys(d3.ascending)';

    put ' .entries(nodes)';

    put ' .map(function (d) {';

    put ' return d.values;';

    put ' });';

    put ' ';

    put ' //';

    put ' initializeNodeDepth();';

    put ' resolveCollisions();';

    put ' for (var alpha = 1; iterations > 0; --iterations) {';

    put ' relaxRightToLeft(alpha *= .99);';

    put ' resolveCollisions();';

    put ' relaxLeftToRight(alpha);';

    put ' resolveCollisions();';

    put ' }';

    put ' ';

    put ' function initializeNodeDepth() {';

    put ' var ky = d3.min(nodesByBreadth, function (nodes) {';

    put ' return (size[1] - (nodes.length - 1) * nodePadding) / d3.sum(nodes, value);';

    put ' });';

    put ' ';

    put ' nodesByBreadth.forEach(function (nodes) {';

    put ' nodes.forEach(function (node, i) {';

    put ' node.y = i;';

    put ' node.dy = node.value * ky;';

    put ' });';

    put ' });';

    put ' ';

    put ' links.forEach(function (link) {';

    put ' link.dy = link.value * ky;';

    put ' });';

    put ' }';

    put ' ';

    put ' function relaxLeftToRight(alpha) {';

    put ' nodesByBreadth.forEach(function (nodes, breadth) {';

    put ' nodes.forEach(function (node) {';

    put ' if (node.targetLinks.length) {';

    put ' var y = d3.sum(node.targetLinks, weightedSource) / d3.sum(node.targetLinks, value);';

    put ' node.y += (y - center(node)) * alpha;';

    put ' }';

    put ' });';

    put ' });';

    put ' ';

    put ' function weightedSource(link) {';

    put ' return center(link.source) * link.value;';

    put ' }';

    put ' }';

    put ' ';

    put ' function relaxRightToLeft(alpha) {';

    put ' nodesByBreadth.slice().reverse().forEach(function (nodes) {';

    put ' nodes.forEach(function (node) {';

    put ' if (node.sourceLinks.length) {';

    put ' var y = d3.sum(node.sourceLinks, weightedTarget) / d3.sum(node.sourceLinks, value);';

    put ' node.y += (y - center(node)) * alpha;';

    put ' }';

    put ' });';

    put ' });';

    put ' ';

    put ' function weightedTarget(link) {';

    put ' return center(link.target) * link.value;';

    put ' }';

    put ' }';

    put ' ';

    put ' function resolveCollisions() {';

    put ' nodesByBreadth.forEach(function (nodes) {';

    put ' var node,';

    put ' dy,';

    put ' y0 = 0,';

    put ' n = nodes.length,';

    put ' i;';


    put ' ';

    put ' // Push any overlapping nodes down.';

    put ' nodes.sort(ascendingDepth);';

    put ' for (i = 0; i < n; ++i) {';

    put ' node = nodes[i];';

    put ' dy = y0 - node.y;';

    put ' if (dy > 0) node.y += dy;';

    put ' y0 = node.y + node.dy + nodePadding;';

    put ' }';

    put ' ';

    put ' // If the bottommost node goes outside the bounds, push it back up.';

    put ' dy = y0 - nodePadding - size[1];';

    put ' if (dy > 0) {';

    put ' y0 = node.y -= dy;';

    put ' ';

    put ' // Push any overlapping nodes back up.';

    put ' for (i = n - 2; i >= 0; --i) {';

    put ' node = nodes[i];';

    put ' dy = node.y + node.dy + nodePadding - y0;';

    put ' if (dy > 0) node.y -= dy;';

    put ' y0 = node.y;';

    put ' }';

    put ' }';

    put ' });';

    put ' }';

    put ' ';

    put ' function ascendingDepth(a, b) {';

    put ' return a.y - b.y;';

    put ' }';

    put ' }';

    put ' ';

    put ' function computeLinkDepths() {';

    put ' nodes.forEach(function (node) {';

    put ' node.sourceLinks.sort(ascendingTargetDepth);';

    put ' node.targetLinks.sort(ascendingSourceDepth);';

    put ' });';

    put ' nodes.forEach(function (node) {';

    put ' var sy = 0,';

    put ' ty = 0;';

    put ' node.sourceLinks.forEach(function (link) {';

    put ' link.sy = sy;';

    put ' sy += link.dy;';

    put ' });';

    put ' node.targetLinks.forEach(function (link) {';

    put ' link.ty = ty;';

    put ' ty += link.dy;';

    put ' });';

    put ' });';

    put ' ';

    put ' function ascendingSourceDepth(a, b) {';

    put ' return a.source.y - b.source.y;';

    put ' }';

    put ' ';

    put ' function ascendingTargetDepth(a, b) {';

    put ' return a.target.y - b.target.y;';

    put ' }';

    put ' }';

    put ' ';

    put ' function center(node) {';

    put ' return node.y + node.dy / 2;';

    put ' }';

    put ' ';

    put ' function value(link) {';

    put ' return link.value;';

    put ' }';

    put ' ';

    put ' return sankey;';

    put '};';

    put ' ';

    put ' ';

    put '/* ------------------- our code ------------------------ */';

    put '//var canvas = document.getElementById("chart");';

    put ' ';

    put 'var units = "Widgets";';

    put ' ';

    put 'var margin = {';


    put ' top: 10,';

    put ' right: 10,';

    put ' bottom: 10,';

    put ' left: 10';

    put '},';

    /****ADJUST WIDTH AND HEIGHT****/

    put "width = &width - margin.left - margin.right,";

    /*******************************/

    put " height = &height - margin.top - margin.bottom;";

    put ' ';

    put 'var formatNumber = d3.format(",.0f"), // zero decimal places';

    put ' format = function (d) {';

    put ' return formatNumber(d) + " " + units;';

    put ' },';

    put ' color = d3.scale.category20();';

    put ' ';

    put '// append the svg canvas to the page';

    put 'var svg = d3.select("#chart").append("svg")';

    put ' .attr("width", width + margin.left + margin.right)';

    put ' .attr("height", height + margin.top + margin.bottom)';

    put ' .append("g")';

    put ' .attr("transform",';

    put ' "translate(" + margin.left + "," + margin.top + ")");';

    put ' ';

    put '// Set the sankey diagram properties';

    put 'var sankey = d3.sankey()';

    put ' .nodeWidth(10)';

    put ' .nodePadding(20)';

    put ' .size([width, height]);';

    put ' ';

    put 'var path = sankey.link();';

    put ' ';

    put ' ';

    put 'var data = [';

    put "&indata.";

    put ']; ';

    put ' ';

    put '//set up graph in same style as original example but empty';

    put 'graph = {';

    put ' "nodes": [],';

    put ' "links": []';

    put '};';

    put ' ';

    put 'data.forEach(function (d) {';

    put ' graph.nodes.push({';

    put ' "name": d.source';

    put ' });';

    put ' graph.nodes.push({';

    put ' "name": d.target';

    put ' });';

    put ' graph.links.push({';

    put ' "source": d.source,';

    put ' "target": d.target,';

    put ' "value": +d.value';

    put ' });';

    put '});';

    put ' ';

    put '// return only the distinct / unique nodes';

    put 'graph.nodes = d3.keys(d3.nest()';

    put ' .key(function (d) {';

    put ' return d.name;';

    put '})';

    put ' .map(graph.nodes));';

    put ' ';

    put '// loop through each link replacing the text with its index from node';

    put 'graph.links.forEach(function (d, i) {';

    put ' graph.links[i].source = graph.nodes.indexOf(graph.links[i].source);';

    put ' graph.links[i].target = graph.nodes.indexOf(graph.links[i].target);';

    put '});';

    put ' ';

    put '//now loop through each nodes to make nodes an array of objects';

    put '// rather than an array of strings';

    put 'graph.nodes.forEach(function (d, i) {';

    put ' graph.nodes[i] = {';

    put ' "name": d';

    put ' };';

    put '});';

    put ' ';

    put 'sankey.nodes(graph.nodes)';

    put ' .links(graph.links)';

    put ' .layout(32);';

    put ' ';

    put '// add in the links';

    put 'var link = svg.append("g").selectAll(".link")';

    put ' .data(graph.links)';

    put ' .enter()';

    put ' .append("path")';

    put ' .attr("class", "link")';

    put ' .attr("id",function(d,i) { return "linkLabel" + i; })';

    put ' .attr("d", path)';

    put ' .style("stroke-width", function (d) {';

    put ' return Math.max(1, d.dy);';

    put ' })';

    put ' .sort(function (a, b) {';

    put ' return b.dy - a.dy;';

    put ' });';

    put ' ';

    put ' ';

    put ' ';

    put '// add in the nodes';

    put 'var node = svg.append("g").selectAll(".node")';

    put ' .data(graph.nodes)';

    put ' .enter().append("g")';

    put ' .attr("class", "node")';

    put ' .attr("transform", function (d) {';

    put ' return "translate(" + d.x + "," + d.y + ")";';

    put '})';

    put ' .call(d3.behavior.drag()';

    put ' .origin(function (d) {';

    put ' return d;';

    put '})';

    put ' .on("dragstart", function () {';

    put ' this.parentNode.appendChild(this);';

    put '})';

    put ' .on("drag", dragmove));';

    put ' ';

    put '// add the rectangles for the nodes';

    put 'node.append("rect")';

    put ' .attr("height", function (d) {';

    put ' return d.dy;';

    put '})';

    put ' .attr("width", sankey.nodeWidth())';

    put ' .style("fill", function (d) {';

    put ' return d.color = color(d.name.replace(/ .*/, ""));';

    put '})';

    put ' .style("stroke", function (d) {';

    put ' return d3.rgb(d.color);//.darker(2);';

    put '})';

    put ' .append("title")';

    put ' .text(function (d) {';

    put ' return d.name + "\n" + format(d.value);';

    put '});';

    put ' ';

    put '// add in the title for the nodes';

    put 'node.append("text")';

    put ' .attr("x", -6)';

    put ' .attr("y", function (d) {';

    put ' return d.dy / 2;';

    put '})';

    put ' .attr("dy", ".35em")';

    put ' .attr("text-anchor", "end")';

    put ' .attr("transform", null)';

    put ' .text(function (d) {';

    put ' return d.name + " (" + d.value + ")";';

    put '})';

    put ' .filter(function (d) {';

    put ' return d.x < width / 2;';

    put '})';

    put ' .attr("x", 6 + sankey.nodeWidth())';

    put ' .attr("text-anchor", "start");';

    put ' ';

    put '/* add labels to graphs */';

    put 'var labelText = svg.selectAll(".labelText")';

    put ' .data(graph.links)';

    put ' .enter()';

    put ' .append("text")';

    put ' .attr("class","labelText")';

    put ' .attr("dx",130)';

    put ' .attr("dy",0)';

    put ' .append("textPath")';

    put ' .attr("xlink:href",function(d,i) { return "#linkLabel" + i;})';

    put ' .text(function(d,i) ';

    put ' { ';

    %if &flow_num. > 0 %then %do;

    put " if (d.value > &flow_num.) return ' -> ' + d.target.name + ' : ' + d.value;";

    %end;

    put ' }';

    put ' );';

    put '// if (d.value > 10) return " -> " + d.value + " -> ";});';

    put ' ';

    put '// the function for moving the nodes';

    put 'function dragmove(d) {';

    put ' d3.select(this).attr("transform",';

    put ' "translate(" + d.x + "," + (';

    put ' d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))) + ")");';

    put ' sankey.relayout();';

    put ' link.attr("d", path);';

    put '}';

    put ' ';

    put ' ';

    put ' ';

    put ' ';

    put ' }';

    put ' ';

    put ' //]]>';

    put '</script>';

    put '</head>';

    put '<body>';

    /* container that d3.select("#chart") above appends the svg to */
    put '<div id="chart"></div>';

    put '<script>';

    put ' // tell the embed parent frame the height of the content';

    put ' if (window.parent && window.parent.parent){';

    put ' window.parent.parent.postMessage(["resultsFrame", {';

    put ' height: document.body.getBoundingClientRect().height,';

    put ' slug: "Lsjkhzf1"';

    put ' }], "*")';

    put ' }';

    put ' ';

    put ' // always overwrite window.name, in case users try to set it manually';

    put ' window.name = "result"';

    put '</script>';

    put '</body>';

    put '</html>';

    ;;;;

    run;

    %mend sankey2html;
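
The node and link preparation that the macro writes into the page can be traced in isolation. The following is a minimal sketch in plain JavaScript, not the code the macro emits: it assumes that &indata resolves to a comma-separated list of objects with source, target, and value properties, and it uses a Set in place of the d3.nest and d3.keys calls so that it runs without D3. The sample rows and the buildGraph name are hypothetical and are not taken from clinicaltrials.gov.

```javascript
// Hypothetical rows in the shape assumed for &indata:
// one object per flow, with source, target, and value.
const data = [
  { source: "Condition A", target: "Phase 2", value: "3" },
  { source: "Condition A", target: "Phase 3", value: "1" },
  { source: "Condition B", target: "Phase 2", value: "2" },
];

// buildGraph is an illustrative helper, not part of the macro.
function buildGraph(rows) {
  // Collect every source and target name, keeping each distinct
  // name once, in first-seen order.
  const names = [];
  const seen = new Set();
  for (const row of rows) {
    for (const name of [row.source, row.target]) {
      if (!seen.has(name)) {
        seen.add(name);
        names.push(name);
      }
    }
  }

  // Replace the names on each link with positions in the node list,
  // and coerce value to a number (the macro uses a unary plus).
  const links = rows.map((row) => ({
    source: names.indexOf(row.source),
    target: names.indexOf(row.target),
    value: Number(row.value),
  }));

  // Turn the node names into objects so that a layout can attach
  // position and size properties to them later.
  const nodes = names.map((name) => ({ name }));
  return { nodes, links };
}

const graph = buildGraph(data);
console.log(JSON.stringify(graph, null, 2));
```

With these three rows the sketch yields four nodes, and the third link points from node 3 (Condition B) to node 1 (Phase 2). Index-based links of this kind are what the generated page passes to sankey.links() before the layout is computed.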