lecture 15: course conclusion
TRANSCRIPT
1Serena Yeung BIODS 220: AI in Healthcare Lecture 15 -
Lecture 15: Course Conclusion
Announcements
● TA office hours will continue to be project advising sessions during this week
  ○ Sign up on spreadsheet (see Ed announcement)
  ○ Attendance is worth 5% of project grade
● Final Project Poster Session: Thu 12/9 12:15-3:15pm
● Final Project Report due Fri 12/10, 11:59pm
This course: foundations of AI in healthcare
Convergence of key ingredients of deep learning: algorithms, compute, and data
Different classes of neural networks
- Fully connected neural networks (linear layers, good for “feature vector” inputs)
- Convolutional neural networks (convolutional layers, good for image inputs)
- Recurrent neural networks (linear layers modeling recurrence relation across sequence, good for sequence inputs; the figure maps an input sequence to an output sequence)
Neural network parameters:
Output:
Loss function (regression loss, same as before):
Per-example:
Over M examples:
Gradient of loss w.r.t. weights: the function is more complex, so the expressions are now much harder to derive by hand. Instead, use computational graphs and backpropagation.
Two-layer fully-connected neural network
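The equations on this slide did not survive in the transcript. As a rough sketch only, the forward pass and regression loss of a two-layer fully-connected network could look like the following. The ReLU hidden activation, the parameter names (W1, b1, W2, b2), and the use of a mean over examples are assumptions made for illustration, not details taken from the slide.

```python
# Sketch of a two-layer fully-connected network with a squared-error loss.
# Assumptions: ReLU hidden activation, scalar output, mean over M examples.

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(x, W1, b1, W2, b2):
    # Hidden layer: h = relu(W1 x + b1); output: y_hat = W2 . h + b2
    h = relu([z + b for z, b in zip(matvec(W1, x), b1)])
    return sum(w * hi for w, hi in zip(W2, h)) + b2

def mean_squared_loss(xs, ys, params):
    # Per-example loss is the squared difference; average over M examples.
    per_example = [(forward(x, *params) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(per_example) / len(per_example)

params = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [1.0, 2.0], 0.5)
prediction = forward([1.0, 2.0], *params)
loss = mean_squared_loss([[1.0, 2.0]], [3.0], params)
```

Deriving the gradient of this loss with respect to W1 by hand already requires the chain rule through the ReLU, which is the motivation for backpropagation on a computational graph.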
ResNet [He et al., 2015]
Residual block (figure): input x passes through a 3x3 conv, a relu, and a second 3x3 conv to give F(x); the identity connection adds x, and a relu is applied to F(x) + x.
Architecture figure, from input to output: Input; 7x7 conv, 64, /2; Pool; 3x3 conv, 64 layers; 3x3 conv, 128 layers (first with /2); ...; 3x3 conv, 512 layers (first with /2); Pool; FC 1000; Softmax
Full ResNet architecture:
- Stack residual blocks
- Every residual block has two 3x3 conv layers
- Periodically, double # of filters and downsample spatially using stride 2 (/2 in each dimension)
- Additional conv layer at the beginning
- No FC layers at the end (only FC 1000 to output classes)
Slide credit: CS231n
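The residual computation relu(F(x) + x) can be sketched as below. Here `f` is a hypothetical stand-in for the two 3x3 conv layers (any function that returns a vector of the same length as its input); it is not an actual convolution.

```python
# Sketch of a residual block: output = relu(F(x) + x).
# `f` is a placeholder for the block's two conv layers (an assumption).

def relu(v):
    return [max(0.0, a) for a in v]

def residual_block(x, f):
    fx = f(x)                                     # residual branch F(x)
    return relu([a + b for a, b in zip(fx, x)])   # add identity, then relu

# If the residual branch outputs zeros, the block passes relu(x) through.
zero_branch = lambda x: [0.0] * len(x)
out = residual_block([1.0, -2.0, 3.0], zero_branch)
```

The usage example shows why stacking such blocks is easy to optimize: a block only has to learn a correction F(x) on top of the identity.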
Common loss functions
Regression
- Label is a continuous value.
- Minimize squared difference between prediction output and target.
Binary Cross-Entropy
- Label is binary in {0,1}. Prediction is a real number in (0,1) and is the probability of the label being 1. It is usually the output of a sigmoid operation after the final layer.
- Equivalent to the negative log of the probability of the correct ground truth class being predicted. Think about what the expression looks like when y_i = 1 vs. 0.
Softmax
- Label is 1 of K classes in {1, …, K}. Extension of binary cross-entropy loss to multiple classes. s_j corresponds to the score (e.g. output of final layer) for each class; the fraction in the log provides a normalized probability for each class.
- Negative log of the probability of the true class y_i, as with the BCE loss.
SVM
- Label is 1 of K classes in {1, …, K}. Same use case as softmax, but different way of encouraging the model to produce outputs that we “like”. In practice, softmax is more popular and provides a nice probabilistic interpretation.
- Incurs lowest loss of 0 (what we want) if the score for the true class y_i is greater than the score for each incorrect class j by a margin of 1.
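The four losses described above can be written out per example as in the sketch below. The function names are chosen for illustration, and class labels are 0-indexed in the code purely as a Python convenience.

```python
import math

def squared_error(pred, target):
    # Regression: squared difference between prediction and target.
    return (pred - target) ** 2

def binary_cross_entropy(p, y):
    # p is the predicted probability of label 1; y is 0 or 1.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def softmax_loss(scores, y):
    # Negative log of the softmax probability of the true class y.
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[y]

def svm_loss(scores, y, margin=1.0):
    # Zero once the true-class score beats every other score by the margin.
    return sum(max(0.0, s - scores[y] + margin)
               for j, s in enumerate(scores) if j != y)
```

For BCE, setting y = 1 leaves only -log(p) and setting y = 0 leaves only -log(1 - p), which is the "negative log probability of the correct class" reading given on the slide.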
Evaluation metrics
- Receiver Operating Characteristic (ROC) curve:
  - Plots sensitivity (True Positive Rate, TPR) against 1 - specificity (False Positive Rate, FPR) as prediction threshold is varied
  - Gives trade-off between sensitivity and specificity
  - Also report summary statistic AUC (area under the curve)
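A minimal sketch of how ROC points and the AUC can be computed from scores and binary labels is shown below. It assumes both classes are present and uses the trapezoidal rule for the area; the toy scores are made up for illustration.

```python
def roc_points(scores, labels):
    # One (FPR, TPR) point per distinct threshold, from strict to lenient.
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    # Trapezoidal area under the piecewise-linear ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

points = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
area = auc(points)
```

In this toy case three of the four (positive, negative) pairs are ranked correctly, which matches the area of 0.75.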
Ciompi et al. 2015
Ciompi et al. Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box. Medical Image Analysis, 2015.
- Task: classification of lung nodules in 3D CT scans as peri-fissural nodules (PFN, likely to be benign) or not
- Dataset: 568 nodules from 1729 scans at a single institution. (65 typical PFNs, 19 atypical PFNs, 484 non-PFNs).
- Data pre-processing: rescaling from CT Hounsfield units (HU) into [0,255]. Replicate 3x across R,G,B channels to match input dimensions of ImageNet-trained CNNs.
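The pre-processing step can be sketched as follows. The HU window of [-1000, 400] is a hypothetical choice for illustration; the slide does not state which range was mapped to [0,255].

```python
def hu_to_uint8(hu, hu_min=-1000.0, hu_max=400.0):
    # Clip to the (assumed) HU window, then rescale linearly to [0, 255].
    clipped = min(max(hu, hu_min), hu_max)
    return int(round(255 * (clipped - hu_min) / (hu_max - hu_min)))

def to_rgb(gray_image):
    # Replicate each grayscale value across the R, G, B channels.
    return [[[v, v, v] for v in row] for row in gray_image]

patch_hu = [[-1000.0, 400.0], [-2000.0, 1000.0]]
patch_gray = [[hu_to_uint8(v) for v in row] for row in patch_hu]
patch_rgb = to_rgb(patch_gray)
```

Replicating the channel lets a single-channel CT patch be fed to a CNN whose first layer expects three input channels.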
Gulshan et al. 2016
- Dataset:
  - 128,175 images, each graded by 3-7 ophthalmologists.
  - 54 total graders, each paid to grade between 20 and 62,508 images.
- Data preprocessing:
  - Circular mask of each image was detected and rescaled to be 299 pixels wide
- Model:
  - Inception-v3 CNN, with ImageNet pre-training
  - Multiple BCE losses corresponding to different binary prediction problems, which were then used for final determination of referable diabetic retinopathy
Graders provided finer-grained labels which were then consolidated into (easier) binary prediction problems
Gulshan, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 2016.
Richer visual recognition tasks: segmentation and detection
- Classification. Output: one category label for image (e.g., colorectal glands)
- Semantic Segmentation. Output: category label for each pixel in the image
- Detection. Output: spatial bounding box for each instance of a category object in the image
- Instance Segmentation. Output: category label and instance label for each pixel in the image. Distinguishes between different instances of an object
Figures: Chen et al. 2016. https://arxiv.org/pdf/1604.02677.pdf
Lung nodule segmentation
Liu et al. Segmentation of Lung Nodule in CT Images Based on Mask R-CNN. 2018.
- E.g. Liu et al. 2018
- Dataset: Lung Nodule Analysis (LUNA) challenge, 888 CT scans (512x512 slices) from the Lung Image Database Consortium database (LIDC-IDRI).
- Performed 2D instance segmentation on 2D CT slices
We will see other ways to handle 3D medical data types in the next lecture
Example: instance segmentation of cell nuclei
3D convolutions
Slide filter along 3 directions: x, y, and z!
x, y, z are spatial and/or temporal dimensions. Filter (e.g. 5 x 5 x 3 x 10 filter) goes all the way through the “channels” dimension (e.g. R,G,B) as before.
When might you use 3D convolutions?
- Ex: 224 x 224 x 1 x 256 3D CT scan (with 256 slices)
- Ex: 224 x 224 x 3 x 500 video data (with 500 temporal frames)
Figure credit: https://www.researchgate.net/profile/Deepak_Mishra19/publication/330912338/figure/fig1/AS:723363244810254@1549474645742/Basic-3D-CNN-architecture-the-3D-filter-is-convolved-with-the-video-in-three-dimensions.png
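As an illustration of sliding a filter along x, y, and z while summing through the full channel dimension, here is a naive 3D convolution sketch. It assumes stride 1, no padding, and a [channel][z][y][x] index order chosen for this example only.

```python
def conv3d_valid(volume, kernel):
    # volume[c][z][y][x], kernel[c][dz][dy][dx] -> output[z][y][x]
    # The filter spans all channels and slides along z, y, and x.
    C = len(volume)
    Z, Y, X = len(volume[0]), len(volume[0][0]), len(volume[0][0][0])
    KZ, KY, KX = len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    out = []
    for z in range(Z - KZ + 1):
        plane = []
        for y in range(Y - KY + 1):
            row = []
            for x in range(X - KX + 1):
                total = 0.0
                for c in range(C):
                    for dz in range(KZ):
                        for dy in range(KY):
                            for dx in range(KX):
                                total += (volume[c][z + dz][y + dy][x + dx]
                                          * kernel[c][dz][dy][dx])
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out

def ones(c, z, y, x):
    return [[[[1.0] * x for _ in range(y)] for _ in range(z)] for _ in range(c)]

# 3-channel input with 4 frames of 5x5, filtered by a 3-channel 2x3x3 kernel.
out = conv3d_valid(ones(3, 4, 5, 5), ones(3, 2, 3, 3))
```

One filter produces a single-channel 3D output; a layer with many filters stacks these outputs as new channels, just as in the 2D case.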
I3D: 3D convolutional network for video data
Inception Module (Inc.) w/ 3D convolutions: the Inception module used in the Inception network (also known as GoogLeNet), with 3D convs
Carreira and Zisserman. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. CVPR 2017.
Can pre-train from 2D datasets e.g. ImageNet by replicating and normalizing 2D weights over additional dimension!
Note: in general, can 3D-ify many 2D architectures!
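The "replicate and normalize" idea can be sketched as below: a 2D kernel is copied along the new dimension and divided by the number of copies. The function name and the plain nested-list representation are illustrative assumptions.

```python
def inflate_2d_kernel(kernel_2d, depth):
    # Copy the 2D kernel `depth` times along the new axis and divide by
    # `depth`, so the copies sum back to the original 2D kernel.
    scaled = [[w / depth for w in row] for row in kernel_2d]
    return [[row[:] for row in scaled] for _ in range(depth)]

kernel_2d = [[1.0, 2.0], [3.0, 4.0]]
kernel_3d = inflate_2d_kernel(kernel_2d, depth=4)

# Summing the slices over the new axis recovers the 2D kernel.
recovered = [[sum(kernel_3d[t][i][j] for t in range(4)) for j in range(2)]
             for i in range(2)]
```

Because the slices sum to the 2D kernel, the inflated filter applied to a video made of one repeated frame gives the same response as the 2D filter on that frame, so the pre-trained behavior carries over at initialization.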
For richer visual recognition tasks, can also extend respective CNN architectures to use 3D convolutions
Applies to classification, semantic segmentation, detection, and instance segmentation.
Figures: Chen et al. 2016. https://arxiv.org/pdf/1604.02677.pdf
E.g. 3D U-Net
Ex: 3D segmentation of Xenopus kidney in confocal microscopic data
Spatial dims: ~ 250 x 250 x 60. 3 channels: each channel corresponds to a different type of data capture
Used only 3 samples total! (with total of 77 annotated 2D slices). Leverages fact that each sample contains many instances of same repetitive structures w/ variation.
Cicek et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. MICCAI 2016.
What are electronic health records?
Figure credit: Rajkomar et al. 2018
Patient chart in digital form, containing medical and treatment history
Medical imaging and lab test results and reports
A real example of EHR data: MIMIC-III dataset
Johnson et al. MIMIC-III, a freely accessible critical care database. 2016.
CPT (Current procedural terminology): procedures and services codes
Johnson et al. MIMIC-III, a freely accessible critical care database. 2016.
Additional figure credit: https://d20ohkaloyme4g.cloudfront.net/img/document_thumbnails/e570ad571499b88c8814e7366594e9bd/thumb_1200_1553.png
(Vanilla) Recurrent Neural Network
Figure: input x -> RNN -> output y
The state consists of a single “hidden” vector h:
Fully connected layers
Slide credit: CS231n
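The update equation is missing from the transcript. A common form of the vanilla RNN update, h_t = tanh(W_hh h_{t-1} + W_xh x_t), is sketched below on the assumption that this is what the slide showed; the weight names follow that convention and biases are omitted for brevity.

```python
import math

def rnn_step(h_prev, x, W_hh, W_xh):
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t): two fully connected maps.
    h_new = []
    for row_hh, row_xh in zip(W_hh, W_xh):
        pre = (sum(w * h for w, h in zip(row_hh, h_prev))
               + sum(w * xi for w, xi in zip(row_xh, x)))
        h_new.append(math.tanh(pre))
    return h_new

def run_rnn(xs, h0, W_hh, W_xh):
    # The same weights are reused at every time step.
    h, states = h0, []
    for x in xs:
        h = rnn_step(h, x, W_hh, W_xh)
        states.append(h)
    return states

states = run_rnn([[1.0], [0.0]], [0.0], W_hh=[[0.5]], W_xh=[[1.0]])
```

In the usage example the second input is zero, so the second hidden state depends only on the first one through W_hh, which shows how the hidden vector carries information forward.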
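RNN: Computational Graph: Many to Many
Figure: the same function f_W, with shared weights W, maps (h_{t-1}, x_t) to h_t at every step (h_0 -> h_1 -> h_2 -> h_3 -> … -> h_T); each hidden state h_t produces an output y_t with its own loss L_t, and the per-step losses L_1, …, L_T are combined into the total loss L.
Slide credit: CS231n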
Harutyunyan et al.: phenotypes
- Input: Time-series data corresponding to entire ICU stay
- Output: Multilabel classification of the presence of 25 acute care conditions (merged from ICD codes) in stay record
Q: Why do we formulate this as a multi-label classification task?
A: Comorbidities (co-occurring conditions)
Q: What loss function should we use?
A: Multiple binary cross-entropy losses
Figure credit: Harutyunyan et al. Multitask learning and benchmarking with clinical time series data. 2019.
OMOP Common Data Model
Figure credit: https://ohdsi.github.io/TheBookOfOhdsi/images/CommonDataModel/cdmDiagram.png
FHIR
Figure credit: Choi et al. OHDSI on FHIR Platform Development with OMOP CDM mapping to FHIR Resources. 2016.
Data from all sources can be written in an OMOP data repository for analysis
Data representation
Raw data as FHIR resources
Rajkomar et al. Scalable and accurate deep learning with electronic health records. Npj Digital Medicine, 2018.
Token embeddings
1xN token input (one-hot selection of token): [0 0 1 0 0 0 0 … 0]
N x D embedding matrix:
0.5 0.2 0.1
0.6 0.1 0.6
0.5 0.8 0.2
0.7 0.9 0.3
0.3 0.5 0.1
0.7 0.8 0.1
...
D-dim token embedding: X = [0.5 0.8 0.2]
In general, learning an embedding matrix is a useful way to map discrete data into a semantically meaningful, continuous space! Will see frequently in natural language processing.
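The numbers on the slide can be reproduced with a short sketch: multiplying the one-hot vector by the embedding matrix selects one row. Only the six rows shown on the slide are used here, so N = 6 is an assumption of this example.

```python
# The six embedding rows visible on the slide (N = 6, D = 3 in this sketch).
embedding_matrix = [
    [0.5, 0.2, 0.1],
    [0.6, 0.1, 0.6],
    [0.5, 0.8, 0.2],
    [0.7, 0.9, 0.3],
    [0.3, 0.5, 0.1],
    [0.7, 0.8, 0.1],
]
one_hot = [0, 0, 1, 0, 0, 0]  # selects the third token

def embed_by_matmul(one_hot, matrix):
    # (1 x N) times (N x D) gives a (1 x D) embedding.
    dim = len(matrix[0])
    return [sum(o * row[d] for o, row in zip(one_hot, matrix))
            for d in range(dim)]

def embed_by_lookup(token_index, matrix):
    # Equivalent and cheaper: index the row directly.
    return matrix[token_index]

x_matmul = embed_by_matmul(one_hot, embedding_matrix)
x_lookup = embed_by_lookup(2, embedding_matrix)
```

Deep learning frameworks usually implement embedding layers as the row lookup rather than the explicit matrix product, since the two give the same result.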
Word Embeddings
Same setup as for token embeddings: a 1xN one-hot word input selects a row of the N x D embedding matrix, giving a D-dim word embedding (e.g. X = [0.5 0.8 0.2]).
Words come from a discrete vocabulary! Can learn word embeddings using a similar framework
Skip-gram model
Figure: word x_t -> embedding matrix E -> h_t -> predictions for x_{t-2}, x_{t-1}, x_{t+1}, x_{t+2}, with losses L_{t-2}, L_{t-1}, L_{t+1}, L_{t+2}
h_t: word embedding (feature vector) of the word at the t-th position
Use word embedding vector to predict the word identity at a set of neighboring positions (each is an N-way classification if the dictionary has N words)
Can train using a classification loss (e.g. softmax loss) based only on the text structure, without any external labels!
Captures notion that words occurring in similar contexts should have similar feature vectors (word embeddings)
Aside: trying to learn “good” feature representations using loss functions based on inherent structure in data, as opposed to external labels, is a currently active area of research called “self-supervised learning”
Mikolov, et al. Efficient Estimation of Word Representations in Vector Space, 2013.
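To make the "labels come from the text itself" point concrete, the sketch below builds the (center word, neighbor word) training pairs for a window of two positions on each side, matching the x_{t-2} … x_{t+2} neighbors in the figure. The example sentence is made up.

```python
def skipgram_pairs(tokens, window=2):
    # For each position t, pair the center word with every neighbor
    # within `window` positions; each pair is one classification example.
    pairs = []
    for t, center in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = t + offset
            if offset != 0 and 0 <= j < len(tokens):
                pairs.append((center, tokens[j]))
    return pairs

tokens = ["abnormal", "findings", "in", "left", "lung"]
pairs = skipgram_pairs(tokens)
pairs_for_in = [p for p in pairs if p[0] == "in"]
```

Each pair supplies an input word and a target word for the N-way classification, so no external annotation is needed.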
Transformer architecture framework
- Recent approach for sequence processing based on “self-attention” (Vaswani et al. 2017). BERT uses a stack of “encoder layers” each with self-attention (original Transformer also had decoder layers).
Figure: input tokens (e.g. “abnormal findings lung ...”) -> encoder stack of encoder layers; each encoder layer consists of encoder self-attention followed by a feed-forward layer
Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018.
Vaswani et al. Attention is All You Need, 2017.
Training BERT
Input sequences begin with a start token ([CLS]); masked words are replaced by a [MASK] token before entering the encoder stack.
1. Predict randomly masked words in sentence inputs (classification)
2. Input sentence pairs separated by a [SEP] token, predict whether the 2nd sentence follows the 1st in the text
Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018.
Vaswani et al. Attention is All You Need, 2017.
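The first training task can be sketched as a data-preparation step: prepend the start token, replace some words with [MASK], and keep the original words as classification targets. This is a simplified sketch; the 15% default masking rate and the always-replace rule are assumptions of the example, and the function name is hypothetical.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Returns (inputs, labels). labels[i] holds the original word where
    # inputs[i] was masked, and None where no prediction is required.
    rng = random.Random(seed)
    inputs, labels = ["[CLS]"], [None]
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append("[MASK]")
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

tokens = ["abnormal", "findings", "in", "left", "lung"]
inputs, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
```

The loss is computed only at positions where `labels` is not None, so the model has to use the surrounding (unmasked) context on both sides to recover each word.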
ClinicalBERT: training on clinical notes (from MIMIC)
Huang et al. ClinicalBert: Modeling Clinical Notes and Predicting Hospital Readmission, 2019.
Fine-tuning ClinicalBERT for prediction of 30-day hospital readmission:
Use hidden state corresponding to [CLS] token
When performing prediction from long sequences, obtain predictions for each sentence separately and then combine
Some biology basics: starting from DNA
Figure credit: virtualmedicalcentre.com
Figure credit: https://en.wikipedia.org/wiki/Nucleobase#/media/File:DNA_chemical_structure.svg
Transcription and translation
Figure credit: https://www.cancer.gov/images/cdr/live/CDR761782-571.jpg
Transcription: DNA -> RNA
Translation: RNA -> Protein
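As a toy illustration of the two steps, the sketch below transcribes a DNA string to mRNA and translates it codon by codon. It assumes the input is the coding strand (so transcription amounts to replacing T with U) and uses only four entries of the standard codon table; the example sequence is made up.

```python
# Partial codon table from the standard genetic code (only codons used here).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "STOP"}

def transcribe(coding_strand_dna):
    # Transcription (DNA -> RNA): on the coding strand, T becomes U.
    return coding_strand_dna.replace("T", "U")

def translate(mrna):
    # Translation (RNA -> protein): read 3 bases at a time until a stop codon.
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

mrna = transcribe("ATGTTTGGCTAA")
protein = translate(mrna)
```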
Many data types, e.g. RNA-seq
Produces readout of mRNA content in a tissue sample
Figure credit: https://cdn.technologynetworks.com/tn/images/body/dnasequencinga1529596208892.png
Map back to reference genome for analysis
Now standard approach for transcriptomics study
More recently, in the 2010s: single-cell RNA-seq!
ENCODE: identifying and analyzing all functional elements in the human genome
Figure credit: https://www.encodeproject.org/
- Launched by US National Human Genome Research Institute in 2003
- Contributions from worldwide consortium of research groups
DeepSEA
Zhou and Troyanskaya. Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 2015.
Predict chromatin effects of (non-coding) sequence alterations with single-nucleotide sensitivity (SNPs: single nucleotide polymorphisms)
Input: DNA sequence pair with SNP
Output: Predicted chromatin effects (919 total)
- 690 transcription factor profiles
- 125 DNase I hypersensitive sites (DHS) profiles (looser chromatin structure, easier protein binding)
- 104 histone-mark profiles (histone modifications)
Multi-task training! Multi-task prediction of 919 chromatin profiles, for each allele (variant)
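A sketch of how such an input pair might be prepared is given below: the reference and alternate sequences differ at one position, and each is one-hot encoded over the four bases before being passed to the network. The A, C, G, T channel order, the function names, and the short example sequence are assumptions of this illustration.

```python
BASES = "ACGT"

def one_hot(seq):
    # Each base becomes a length-4 indicator vector in A, C, G, T order.
    return [[1 if base == b else 0 for b in BASES] for base in seq]

def apply_snp(seq, position, alt_base):
    # Build the alternate-allele sequence by changing a single base.
    return seq[:position] + alt_base + seq[position + 1:]

reference = "ACGTAC"
alternate = apply_snp(reference, 2, "T")
ref_encoded = one_hot(reference)
alt_encoded = one_hot(alternate)
changed = [i for i, (r, a) in enumerate(zip(ref_encoded, alt_encoded)) if r != a]
```

Running the same multi-task model on both encodings and comparing the two sets of predicted profiles gives a per-profile estimate of the variant's effect.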
Multimodal data
Can be very similar, e.g. different image acquisition variants
Figure credit: Dong et al. MIUA, 2017.
Multimodal data
Or very different, e.g. different types of clinical data
Figure credit: Rajkomar et al. 2018.
Categorizations of multimodal models
Joint fusion: Both modality-specific components (with learnable parameters) and combined-modality components within the model, that are updated during model training
Huang et al. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines, 2020.
How can we produce good labels from noisy sources? More sophisticated approach: learn models for how to best aggregate noisy labeling functions!
Dunmon et al. Cross-Modal Data Programming Enables Rapid Medical Machine Learning, 2020.
Figure credit: Nishith Khandwala et al., 2017.
AI and COVID-19
- Detection of COVID-19 from CT images
- 2 stage process: lung segmentation followed by classification of COVID-19 or not
- Multinational dataset of 2724 scans from 2617 patients, with 1029 scans (922 patients) confirmed positive for COVID-19
Harmon et al. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets, 2020.
Other paradigms of machine learning: Unsupervised learning
Data: x
Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, representation / feature learning, density estimation, etc.
Representation learning (figure): input data -> encoder -> features, trained with an unsupervised training objective
Darabi 2019
- Autoencoder-based unsupervised representation learning for multimodal data of 200,000 records from 250 hospital sites (eICU Collaborative Research Database)
- Used feature representation to train models for downstream mortality, readmission prediction tasks
Darabi et al. Unsupervised Representation for EHR Signals and Codes as Patient Status Vector, 2019.
Autoencoder for each code-based modality (e.g. medication, treatment, diagnosis), and signal time-series (e.g. heart rate)
Variational autoencoders can also be used to sample new (synthetic) data
Use decoder network. Now sample z from prior!
Figure: sample z from the prior, pass it through the decoder network, then sample x|z from the decoder's output distribution
Data manifold for 2-d z (figure axes: vary z1, vary z2)
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
GANs: Two-player game
Generator network: try to fool the discriminator by generating real-looking images
Discriminator network: try to distinguish between real and fake images
Figure: random noise z -> generator network -> fake images (from generator); fake images and real images (from training set) -> discriminator network -> real or fake
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
Example: GAN-based medical image synthesis
Liver lesions of different types (Frid-Adar 2018)
Dermatology lesions (Ghorbani 2019)
Brain MRIs with lesions (Han 2018)
Can be used for data augmentation!
A third paradigm of learning: reinforcement learning
Problems involving an agent interacting with an environment, which provides numeric reward signals
Goal: Learn how to take actions in order to maximize reward
Atari games figure copyright Volodymyr Mnih et al., 2013. Reproduced with permission.
Q-network architecture
Q(s, a; θ): neural network with weights θ
Current state s_t: 84x84x4 stack of last 4 frames (after RGB->grayscale conversion, downsampling, and cropping)
Layers, from input to output:
- 16 8x8 conv, stride 4
- 32 4x4 conv, stride 2
- FC-256
- FC-4 (Q-values)
Output expected future reward from taking each of the 4 possible actions
[Mnih et al. NIPS Workshop 2013; Nature 2015]
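The layer sizes above can be checked with the usual output-size formula for a convolution, (input - kernel) / stride + 1. The sketch assumes no padding, which the slide does not state.

```python
def conv_output_size(size, kernel, stride):
    # Spatial output size of a convolution without padding.
    return (size - kernel) // stride + 1

side0 = 84                                # 84x84x4 input stack
side1 = conv_output_size(side0, 8, 4)     # after 16 filters of 8x8, stride 4
side2 = conv_output_size(side1, 4, 2)     # after 32 filters of 4x4, stride 2
shape_after_conv1 = (side1, side1, 16)
shape_after_conv2 = (side2, side2, 32)
flattened = side2 * side2 * 32            # input size of the FC-256 layer
```

Under this assumption the FC-256 layer sees a 2592-dimensional vector, and the final FC-4 layer produces one Q-value per action in a single forward pass.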
Example: Raghu et al. 2017
Learned a Q-learning based policy to take treatment actions for sepsis patients, using the MIMIC dataset
5x5 possible policy actions at any timestep
Raghu et al. Deep Reinforcement Learning for Sepsis Treatment, 2017.
Interpretability: a challenge in deep learning
https://www.cs.cmu.edu/~bhiksha/courses/10-601/decisiontrees/DT.png
vs.
Saliency Maps: Class Activation Maps (CAM)
- Zhou et al. 2015
- Visualizes heatmap (class activation map) indicating the importance of the activation at spatial grid (x, y) leading to the classification of an image to class c.
Zhou et al. Learning Deep Features for Discriminative Localization, 2016.
Weight (importance) of kth filter activation for predicting cth class
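The CAM formula itself is missing from the transcript. Following the description above, a class activation map can be sketched as a weighted sum of the final conv feature maps, with one weight per filter for class c. The variable names and the tiny 2x2 feature maps are illustrative assumptions.

```python
def class_activation_map(feature_maps, class_weights):
    # feature_maps[k][y][x]: activation of filter k at grid location (x, y)
    # class_weights[k]: weight of filter k for the chosen class c
    # cam[y][x] = sum over k of class_weights[k] * feature_maps[k][y][x]
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
             for x in range(width)]
            for y in range(height)]

feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],   # filter 0 fires in the top-left cell
    [[0.0, 0.0], [0.0, 2.0]],   # filter 1 fires in the bottom-right cell
]
cam = class_activation_map(feature_maps, class_weights=[0.5, 1.5])
```

In practice the low-resolution map is upsampled to the image size and overlaid as a heatmap.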
Rajpurkar et al. 2017
- Binary classification of pneumonia presence in chest X-rays
- Used ChestX-ray14 dataset with over 100,000 frontal X-ray images with 14 diseases
- 121-layer DenseNet CNN
- Compared algorithm performance with 4 radiologists
- Also applied algorithm to other diseases to surpass previous state-of-the-art on ChestX-ray14
Rajpurkar et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. 2017.
CAM visualization
Ethics: many questions around AI / human collaboration in medicine
- How to make diagnosis and/or care decisions when the algorithm disagrees with the human?
- How should AI algorithms work together with humans?
- How to handle machine error vs. human error?
- How to make sure AI algorithms don’t (perhaps inadvertently) discriminate against certain populations?
- How to handle tradeoffs between algorithmic performance on some groups vs. others?
Chen et al. 2019
- Showed discrepancies in error rates by race, gender, insurance type, etc. for models trained to make clinical predictions on MIMIC-III data
Error rate for predicting ICU mortality by gender
Chen et al. Can AI Help Reduce Disparities in General Medical and Mental Health Care? 2019.
More on fairness… there are many possible definitions of fairness!
- Group-independent predictions: predictions should be independent of group membership
- Equal metrics across groups: e.g. equal true positive rates or false positive rates across groups
- Individual fairness: individuals who are similar with respect to a prediction task should have similar outcomes
- Causal fairness: e.g. there should not be a causal pathway from a sensitive attribute to the outcome prediction
Suresh and Guttag. A Framework for Understanding Unintended Consequences of Machine Learning, 2020.
Cannot satisfy all of these simultaneously: satisfying “fairness” according to one definition generally leads to a trade-off with respect to another definition!
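The "equal metrics across groups" definition can be checked with a few lines of code: compute the true positive rate and false positive rate separately for each group and compare. The toy labels, predictions, and group names are made up, and the sketch assumes every group contains both positive and negative examples.

```python
def tpr_fpr(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

def rates_by_group(y_true, y_pred, groups):
    # Returns {group: (TPR, FPR)} so the rates can be compared across groups.
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = tpr_fpr([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B"]
rates = rates_by_group(y_true, y_pred, groups)
```

Here group A has a lower true positive rate and group B a higher false positive rate, so this toy classifier violates both versions of the equal-metrics criterion.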
Mitchell 2019: Model Cards for Model Reporting
- Documentation accompanying trained models to detail performance characteristics
Mitchell et al. Model Cards for Model Reporting, 2019.
Gebru 2020: Datasheets for Datasets
Gebru et al. Datasheets for Datasets. 2020.
Federated Learning
- Related to distributed computing, but with an important property for many medical settings: data is decentralized and never leaves local silos. Central server controls training across decentralized sources.
Figure credit: https://blogs.nvidia.com/wp-content/uploads/2019/10/federated_learning_animation_still_white.png
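One common way for the central server to combine what the sites send back is federated averaging: each site trains locally and returns only model weights, which the server averages in proportion to local dataset size. The sketch below assumes that rule for illustration; the slide does not say which aggregation rule is used, and the weights and site sizes are made up.

```python
def federated_average(site_weights, site_sizes):
    # Weighted average of per-site model weights; raw data never moves.
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(dim)]

# Two hypothetical hospitals with 100 and 300 local training examples.
site_weights = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [100, 300]
global_weights = federated_average(site_weights, site_sizes)
```

The server would then send `global_weights` back to every site for the next round of local training.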
Li et al. 2019
- NVIDIA Clara’s Federated Learning system for medical imaging data
- Used federated learning to train segmentation model on BRATS
- Achieved comparable performance to non-federated learning, training somewhat slower but data “silos” preserved
Li et al. Privacy-preserving Federated Brain Tumour Segmentation, 2019.
Differential privacy
Key idea: the output computed on a dataset is “hardly different” from the output computed on the same dataset with a single entry (e.g., one individual) changed. Mathematical guarantees formalize this idea.
Abadi et al. Deep Learning with Differential Privacy, 2016.
Differential privacy
Simple intuition behind how we can achieve differential privacy: adding noise!
Figure credit: https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md
Example of reporting a value with Laplacian noise added
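A minimal sketch of reporting a count with Laplacian noise is shown below. It assumes the standard Laplace mechanism, where the noise scale is sensitivity / epsilon; the function names and the example count are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    # Smaller epsilon means more noise and a stronger privacy guarantee.
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
reported = private_count(100.0, epsilon=0.5, rng=rng)
```

Adding or removing one individual changes a count by at most 1 (the sensitivity), and the noise is large enough to hide a change of that size.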
Training differentially private deep learning models
Abadi et al. Deep Learning with Differential Privacy, 2016.
Add noise for differential privacy
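The gradient step in differentially private SGD (Abadi et al.) can be sketched as: clip each per-example gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, and average. The function and parameter names below are illustrative, and the privacy accounting (computing epsilon) is not shown.

```python
import math
import random

def clip_gradient(grad, max_norm):
    # Rescale the gradient so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    # Clip per example, sum, add Gaussian noise, then average.
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(per_example_grads) for v in noisy]

grads = [[3.0, 4.0], [0.0, 0.5]]
noisy_grad = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1,
                             rng=random.Random(0))
```

Clipping bounds how much any single example can move the update, which is what lets the added noise mask that example's contribution.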
Can work with differential privacy within deep learning frameworks
- Implementation of DP-SGD
- Utilities for calculating epsilon
https://blog.tensorflow.org/2019/03/introducing-tensorflow-privacy-learning.html
http://www.cleverhans.io/privacy/2019/03/26/machine-learning-with-differential-privacy-in-tensorflow.html
Where to go from here?
- More deep learning courses, e.g. focusing on different domains
  - CS 221 and 229: broader AI courses
  - CS 231N: computer vision
  - CS 224N: natural language processing
  - CS 224S: spoken language processing
  - CS 236: generative models
  - Many more!: https://ai.stanford.edu/courses/
- More biomedicine focused courses
  - CS/BMI 273B: deep learning in genomics
  - CS/BMI 279: computational biology
  - BMI 217: translational bioinformatics
  - Many more!: (BMI courses) https://explorecourses.stanford.edu/search?view=catalog&filter-coursestatus-Active=on&page=0&catalog=&academicYear=&q=BIOMEDIN&collapse=
  - (BIODS courses) https://explorecourses.stanford.edu/search?q=BIODS&view=catalog&academicYear=&catalog=&page=0&filter-coursestatus-Active=on&collapse=
- Many research and internship opportunities as well
Thank you!