Introduction to Clinical SAS Programming

Upload: ray4hz

Post on 14-Jan-2017


TRANSCRIPT

Page 1: Introduction to clinical sas programming

Introduction to Clinical SAS Programming

Page 2: Introduction to clinical sas programming
Page 3: Introduction to clinical sas programming

Source: http://www.cdisc.org/standards-and-implementations

CDISC: the Clinical Data Interchange Standards Consortium, a non-profit organization that defines worldwide standards for representing clinical data. Regulatory authorities require these standards as part of submissions.

Page 4: Introduction to clinical sas programming

Source: www.bioforum.co.il

Page 5: Introduction to clinical sas programming

Source: http://www.cros.it/statistical-analysis.html

Page 6: Introduction to clinical sas programming

Career in Clinical Data Management
• Data Entry Operator
• Data Validation Executive
• QA Executive
• Data Manager
• QA Manager
• Statistical Programmer
• Statistician
• Data Reviewer
• Database Designer
• Medical Writer
• Head, Data Management

Page 7: Introduction to clinical sas programming

What is the job of a Clinical SAS Programmer?
• Manage hundreds of datasets
• Handle thousands of data points
• Understand all interdependencies of the data points
• Deliver accurate, timely, and reproducible analyses that determine the approval of novel therapies

Conclusion
• Clinical SAS programming is a difficult mental activity and requires uninterrupted concentration.

Source: “Motivating Clinical SAS Programmers” from Daniel Boisvert

Page 8: Introduction to clinical sas programming

Attributes That Make Programmers Competent
• SAS technical skills
• Clinical trial understanding
• Basic understanding of statistics in clinical trials
• Industry data standards and guidelines
• Analyst mindset (planning, execution, problem-solving, and decision-making skills)
• Soft skills (communication skills and basic etiquette)
• Industry collaboration (conferences, microblogging sites, etc.)

Source: “Competent statistical programmer: Need of business process outsourcing industry” from Imran Khan

Page 9: Introduction to clinical sas programming

Source: http://www.lexjansen.com/nesug/nesug96/NESUG96061.pdf

Page 10: Introduction to clinical sas programming

Documents: Protocol, SAP; Case Report Form (CRF); Table Shells

Data: SAS dataset format or others (xpt, xls, xlsx)

Submission: Tables, Figures, Listings (TFLs); Define.xml, Review Guide

Page 11: Introduction to clinical sas programming

Clinical Reporting Summary
• Input: reading in the source data
• Analysis: determining the analysis result
• Output: presenting the analysis result

Source: http://www.phusewiki.org/wiki/index.php?title=Clinical_Reporting_Summary
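The three stages above can be sketched in a few lines of SAS. This is only an illustration under assumptions: the dataset names (work.vitals, work.summary), the variables (subjid, trt, sysbp), and the inline values are all hypothetical and do not come from the slides.

```sas
/* Input: read the source data (hypothetical inline records) */
data work.vitals;
    input subjid $ trt $ sysbp;
    datalines;
101 A 120
102 A 135
103 B 128
104 B 142
;
run;

/* Analysis: summary statistics of systolic BP by treatment group */
proc means data=work.vitals noprint nway;
    class trt;
    var sysbp;
    output out=work.summary n=n mean=mean_sysbp std=sd_sysbp;
run;

/* Output: present the analysis result as a simple listing */
proc print data=work.summary noobs label;
    var trt n mean_sysbp sd_sysbp;
    label trt        = 'Treatment'
          n          = 'N'
          mean_sysbp = 'Mean systolic BP'
          sd_sysbp   = 'SD';
    format mean_sysbp sd_sysbp 8.1;
run;
```

In a real study the input would typically come from collected CRF data and the output would follow the table shells, so each stage would be considerably more elaborate than this sketch.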

Page 12: Introduction to clinical sas programming

SAS
• SAS (originally "Statistical Analysis System") is developed by SAS Institute, which was founded in 1976 by James Goodnight and several colleagues from North Carolina State University.
• Originally designed to analyze agricultural research data, SAS software was quickly adopted by corporate, government, and academic customers.
• A $3 billion business-analytics juggernaut, with 12,000 employees and an unbroken 35-year track record of revenue growth.

Source: Forbes, “Roundup Of Analytics, Big Data & Business Intelligence Forecasts And Market Estimates, 2015”

Page 13: Introduction to clinical sas programming

Source: http://cos.name/2010/12/think-sas-2/

What is SAS? It is not only statistical software but also business analytics and business intelligence software.

Page 14: Introduction to clinical sas programming

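• 1. Traditional SAS (programming-driven)
• 1-1 Base module (Base SAS): includes a fourth-generation programming language similar to PL/I, the DATA step, SQL, ODS, the XML Engine, macros, and a large number of built-in functions (with Perl regular expression support) and procedures. For a SAS programmer, the Base module is almost everything: learning SAS means learning this first, and it requires no statistics or computer science background. In the narrow sense, Base SAS refers only to the DATA step.
• 1-2 Data access module (SAS/ACCESS): supports many PC file formats (besides text files, also Excel, SPSS .sav, Stata .dta, and so on) and all mainstream relational databases and ERP systems (Oracle, SAP, SQL Server, DB2, MySQL, and so on).
• 1-3 Graphics module (SAS/GRAPH): SAS graphics are powerful but look dated. SAS 9.2 brought some exciting improvements here, such as support for ODS, TrueColor, ActiveX, and SVG (Scalable Vector Graphics), plus a new graph editor, which makes it feel much more modern.
• 1-4 Statistical analysis module (SAS/STAT): includes regression models, analysis of variance models, mixed models, Bayesian analysis, categorical data analysis, multivariate analysis (principal components, factor analysis, and so on), discriminant analysis, cluster analysis, survival analysis, nonparametric analysis, and more. Most of these are areas I am not familiar with, so I simply list them. There is also a Stat Studio that supports R.
• 1-5 Time series and econometrics module (SAS/ETS): as above, with X11, X12, ARIMA, PANEL, AUTOREG, and so on. Investment analysis such as portfolio construction also belongs to this module.
• 1-6 Matrix computation module (SAS/IML)
• 1-7 Operations research module (SAS/OR)
• 1-8 Geographic information system module (SAS/GIS)
• ...

The modules above are what most university SAS users get to work with, and they are mainly driven from the command line. The tools below are client tools with a GUI, which generally use those modules as their computation engine.

• 2. Client tools
• 2-1 Metadata management (SAS Management Console, SMC): metadata is data about data. In the simplest case, the variable attributes of a dataset are metadata. SMC is the unified center for metadata management across SAS products.
• 2-2 ETL tool (SAS Data Integration Studio): ETL stands for Extract, Transform, and Load. Other tools for enterprise data processing include SAS OLAP Cube Studio and SAS Information Map Studio.
• 2-3 Data mining module (SAS Enterprise Miner, EM): one of SAS's heavyweight products. Versions 5 and later are Java client versions, with a much improved user experience.
• 2-4 Integrated analysis toolkit (SAS Enterprise Guide, EG): has a complete GUI and covers the full range of SAS functions, from data integration and analysis to reporting. EG and JMP are the two star products of SAS.

• 3. Other
• 3-1 Statistical discovery package JMP: software independent of Base SAS, led by John Sall, the second-in-command at SAS. It has a flashy interface and powerful features, the kind of tool business users cannot put down. JMP 9 supports R.

Source: http://cos.name/2010/12/think-sas-2/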

Page 15: Introduction to clinical sas programming

Why use SAS?
• SAS is very efficient at data manipulation if you know what you're doing. It was designed to work with sequential tapes, so it is built on the assumption that data access is expensive. It works wonders on truly massive datasets.
• SAS is good at opening gigantic datasets even on computers that do not have much computing power. Datasets that would crash most programs on a given computer in a heartbeat can load in SAS.
• SAS as a company is smart and designs its products for corporate cost centers. This includes company-wide installations and a platform that makes it easy for corporate IT departments to set up a company-wide SAS infrastructure.

Source: How to start using SAS from SARBAJIT MUKHERJEE

From 2016 TIOBE

Page 16: Introduction to clinical sas programming

How to learn SAS?
• Learn SAS® in 50 minutes: http://support.sas.com/resources/papers/proceedings11/054-2011.pdf
• Book: Learning SAS by Example: A Programmer's Guide
• For more resources, check: https://web.stanford.edu/group/ssds/cgi-bin/drupal/files/Guides/Resources_for_Learning_SAS.pdf

Page 17: Introduction to clinical sas programming

Source: How to start using SAS from SARBAJIT MUKHERJEE

Page 18: Introduction to clinical sas programming

SAS Data Step
• The DATA step is a separate language for programming tasks such as data manipulation (cleaning and editing) and data restructuring.
• The Implied Loop of the DATA Step (ILDS) is an internal loop: SAS runs the statements of the step once per input record without an explicit loop. One important consequence of the ILDS is that it may make sense to place code before the line that reads data.
• Program Data Vector (PDV)
  • It is a logical area in memory where SAS builds a data set, one observation at a time
  • Contains current values for all variables
  • Maintains two automatic variables, _N_ and _ERROR_

Source: http://www2.sas.com/proceedings/sugi31/246-31.pdf
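A minimal sketch of the implied loop and the PDV follows. The dataset name (work.demo), the variables (subjid, age, agegrp), and the three records are hypothetical, chosen only to make the behavior visible in the log.

```sas
data work.demo;
    /* This statement sits before the line that reads data and is  */
    /* evaluated on every iteration; _N_ counts those iterations.  */
    if _n_ = 1 then put 'First iteration, before any record is read';

    input subjid $ age;      /* reads one raw record into the PDV */

    /* Derived variable computed from the current PDV values */
    length agegrp $ 5;
    if age >= 65 then agegrp = '65+';
    else agegrp = '<65';

    /* Write the PDV contents, including _N_ and _ERROR_, to the log */
    put _all_;

    /* The implicit OUTPUT at the end of the step writes the        */
    /* observation, then control returns to the top of the step.    */
    datalines;
101 34
102 71
103 58
;
run;
```

No DO loop appears in the code, yet all three records are processed. The PUT _ALL_ statement prints one line per iteration, which is a convenient way to watch the PDV change as each record is read.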

Page 19: Introduction to clinical sas programming

Source: http://www.mwsug.org/proceedings/2013/BB/MWSUG-2013-BB03.pdf

[Figure: raw data, SAS code, Data Step logic, and the resulting SAS dataset. SAS reads in the three lines of raw data automatically through its ILDS.]

Page 20: Introduction to clinical sas programming

Source: http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf

The macro facility is a tool for text substitution, which reduces the amount of text entered for common tasks.

Two components of the macro facility:
1. The macro processor, which is the portion of the system that does the work
2. The macro language, the syntax used to communicate with the macro processor
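As a rough illustration of text substitution, the sketch below defines a small macro and calls it twice. The macro name (summarize) and its parameters (ds, var) are hypothetical. SASHELP.CLASS is a sample dataset shipped with SAS, and the sketch assumes it is available in your installation.

```sas
options mprint;   /* show the generated SAS code in the log */

/* Hypothetical macro: generates a PROC MEANS step for a dataset and variable */
%macro summarize(ds=, var=);
    proc means data=&ds n mean std;
        var &var;
    run;
%mend summarize;

/* Each call is replaced by ordinary SAS code before that code runs */
%summarize(ds=sashelp.class, var=height)
%summarize(ds=sashelp.class, var=weight)
```

The two calls replace two nearly identical PROC MEANS steps, which is the kind of repeated text the macro facility is meant to reduce.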

Page 21: Introduction to clinical sas programming

Source: http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf

When the word scanner detects a macro trigger, ampersand (&) or percent (%), it sends information and temporarily turns processing over to the macro processor.
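A short sketch of the two triggers in use is shown here. The macro variable name (study) and its value are hypothetical.

```sas
/* Percent trigger: %LET is a macro statement handled by the macro processor */
%let study = ABC123;

/* Ampersand trigger: &study is resolved inside double quotes */
title "Listing for study &study";

/* %PUT writes the resolved text to the log */
%put Study resolves to &study;
```

One detail worth knowing is that macro triggers are not resolved inside single quotes, so '&study' would be passed through as literal text.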

Page 22: Introduction to clinical sas programming

Source: http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf

Page 23: Introduction to clinical sas programming

Source: http://www.lexjansen.com/nesug/nesug04/pm/pm07.pdf