

SoK: Machine Learning Governance

Varun Chandrasekaran*†, Hengrui Jia*‡§, Anvith Thudi*‡§, Adelin Travers*‡§, Mohammad Yaghini*‡§, Nicolas Papernot‡§

University of Toronto‡, Vector Institute§, University of Wisconsin-Madison†

Abstract—The application of machine learning (ML) in computer systems introduces not only many benefits but also risks to society. In this paper, we develop the concept of ML governance to balance such benefits and risks, with the aim of achieving responsible applications of ML. Our approach first systematizes research towards ascertaining ownership of data and models, thus fostering a notion of identity specific to ML systems. Building on this foundation, we use identities to hold principals accountable for failures of ML systems through both attribution and auditing. To increase trust in ML systems, we then survey techniques for developing assurance, i.e., confidence that the system meets its security requirements and does not exhibit certain known failures. This leads us to highlight the need for techniques that allow a model owner to manage the life cycle of their system, e.g., to patch or retire their ML system. Put altogether, our systematization of knowledge standardizes the interactions between principals involved in the deployment of ML throughout its life cycle. We highlight opportunities for future work, e.g., to formalize the resulting game between ML principals.

I. INTRODUCTION

Like any maturing technology, machine learning (ML) is transitioning from a stage of unfettered innovation to increasing calls to mitigate the risks associated with its deployment. In the context of ML systems, these potential risks include privacy violations (e.g., leaking sensitive training data), unfair predictions (e.g., in facial recognition), misinformation (e.g., generative models facilitating the creation of deepfakes), or a lack of integrity (e.g., self-driving car accidents due to an incorrect understanding of their environment). At the societal level, there is a need to ensure that applications of ML balance these risks with the benefits brought by their improved predictive capabilities, just as practical systems security balances the cost of protection with the risk of loss [1].

Machine learning governance standardizes the interactions between principals involved in the deployment of ML with the aim of ensuring a responsible application of ML, i.e., one that is beneficial to society.

Governance in ML differs from existing research in several ways. Indeed, the attack surface of ML algorithms has been the subject of many studies; for instance, work on adversarial ML [2] evaluated the worst-case performance of ML under various threat models. However, this line of work often focuses on narrow aspects, such as the technical limitations of ML algorithms, and studies these limitations in isolation. For instance, while it is known how to train a model privately [3], [4], the impact of private learning on model robustness is less studied [5], thus affecting our ability to deploy such algorithms in settings where the model must be aligned with multiple social norms. It thus remains unclear how companies and institutions applying ML should manage the different risks associated with ML. This is exacerbated in the online setting where a model is not static but rather iteratively trained and deployed. These difficulties have also manifested themselves in inadequate regulation, as illustrated by the focus on anonymization in privacy regulation [6] despite its well-known limitations [7]. As a result, regulators are unable to formalize, implement, and enforce regulation that contains the societal risks of ML.

*All student authors contributed equally and are ordered alphabetically.

In this paper, we systematize knowledge from security and privacy research on ML to outline a practical approach to ML governance. Our approach is four-fold and can be thought of as defining the types of interactions that principals (defined in § III-B) of an ML ecosystem will require to develop the trust needed to ensure responsible applications of ML. While existing research on adversarial ML (and more broadly trustworthy ML) serves as a foundation for some of the approaches, our systematization identifies open problems which the computer security community needs to tackle.

First, we observe that trust requires that one make unforgeable and undeniable claims of ownership about an ML model and its training data. This establishes the concept of identity, which identifies a key principal in the ML application: its owner. This is a prerequisite to holding model developers accountable for the potential negative consequences of their ML algorithms: if one is unable to prove that a model belongs to a certain entity, it will be impossible to hold the entity accountable for the model's limitations.

Once ownership of the model is established, the second component of our approach to ML governance is accountability. Here, we attribute model failures to the model owner and its end users, and establish the responsibility of these different principals for failures of the model. These failures may have been identified through the use of the system, but could also result from external scrutiny by entities such as regulators. We thus also discuss how to audit an ML application to identify and attribute failures that have yet to be discovered.

To prevent failures in their ML application, and to foster trust from end users, model owners will then deploy a set of techniques to obtain assurance. Loosely defined, these techniques establish confidence that the system meets specific requirements. The model owner here wishes to make certain claims establishing guarantees of controlled risk for their ML application. While these claims are currently often made post-hoc, by inspecting the trained model, we argue that future research should focus on the training algorithm. We discuss ways that these claims can be substantiated by, for instance, proactively collecting evidence during training or providing documentation to end users.

Finally, the fourth component of our approach recognizes that the trade-off between ML risks and benefits will be regularly re-evaluated by the different principals involved. As such, we outline how there will be a need for mechanisms that allow one to dynamically edit models after they were released: this lifecycle management is analogous to common practices in software development like versioning. This presents challenges in how one can remove behavior already learned by a model (i.e., machine unlearning) and learn new behaviors, but also retire the model altogether.

These interactions between model owners, users, and regulators are at the core of our approach to ML governance and result in a formal formulation of what it means to balance the risks of applying ML (see § III-D). Concretely, we present a Stackelberg game among the principals involved in an ML governance scenario. The approach is more general, and can capture competing interests of the principals as well as the possible trade-offs, thus delineating the Pareto frontier.

To summarize, our contributions are the following:
• We define the problem of ML governance as balancing the risks of ML with its benefits. Specifically, ML governance supposes that society has defined norms that serve as the basis for characterizing trade-offs between risks and benefits. We formalize the interactions between ML principals: i.e., model owners making claims of having managed certain risks (e.g., privacy, fairness, robustness, etc.), end users who seek to trust these claims, and regulators who seek to audit such claims and verify compliance.
• We systematize approaches to establishing identity in ML through data and model ownership in § IV. This serves as the foundation for holding model owners and other ML principals accountable for failures of ML in § V, through a combination of techniques for attribution and auditing.
• We systematize research on techniques for assurance, i.e., proactively building confidence that an ML system meets specific security requirements.
• We outline the need for mechanisms that manage models throughout their lifecycle, e.g., enabling model owners to version usefully. We show how current efforts on machine unlearning provide a partial yet incomplete response to this need, and that a more comprehensive approach is needed to develop the ability to patch ML systems.

We recognize that developing solutions for ML governance is not purely a technical problem because many of the trade-offs that need to be characterized involve a societal component. Given the intended readership of this paper, we put an emphasis on how computer science research can contribute mechanisms for ML governance, but this assumes that the design of these mechanisms will be informed by societal needs. Put another way, this SoK leads us to identify opportunities for collaboration with communities beyond computer science, i.e., opportunities for interactions that inform the goals of computer science research, but also that would help increase adoption of technical concepts beyond computer science. We thus hope that our manuscript will be of interest to readers beyond the computer security and ML communities.

II. BACKGROUND

A. Machine Learning Primer

In this paper, we focus on the supervised learning problem in machine learning. We are given a dataset $D = \{(x_i, y_i)\}_{i \in [n]}$ of inputs $x_i$ and labels $y_i$ (encoded as one-hot vectors) and wish to learn a function $M$ that can map the inputs $x_i$ to $y_i$. For a given task we associate a loss function $\mathcal{L}$ which characterizes the error of a given function $M$ in predicting the labels of the dataset. A common choice (for classification tasks) for the loss function is cross-entropy, defined as:

$\mathcal{L}_{CE} = -y_i \ln(M(x_i)) \qquad (1)$

The function $M$, or model, is typically represented as a parameterized function whose parameters, the weights, ideally minimize the loss function $\mathcal{L}$ on dataset $D$. In practice, when a closed-form solution cannot be found, the main process of finding a low-error model is done by iteratively tuning the weights $w$ that parameterize the model $M_w$ via gradient descent. We typically add a further level of stochasticity to the optimization process by applying the update rule with respect to the loss computed on a batch of data $(x, y)$ which is randomly sampled from the dataset $D$. To be specific, the update rule is given by:

$w_{t+1} = w_t - \eta \left.\frac{\partial \mathcal{L}}{\partial w}\right|_{w_t, x_t}$

where $\eta$ is a hyper-parameter one tunes.

Once the model is trained, it is deployed to make predictions on points it has not seen during training. To evaluate how well a model will generalize to these new points, one typically retains a holdout set of labeled data points to measure the performance of the model upon completing its training.
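To make the notation above concrete, the following is a minimal sketch (ours, not from the paper) of mini-batch SGD with a cross-entropy loss for a linear softmax classifier in plain NumPy; the learning rate, batch size, and epoch count are illustrative placeholders, and Y is assumed to be one-hot encoded.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y_onehot):
    # Eq. (1), averaged over the batch
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))

def sgd_train(X, Y, num_classes, lr=0.1, batch_size=32, epochs=10, seed=0):
    """Mini-batch SGD on a linear softmax model M_w(x) = softmax(xW + b)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            xb, yb = X[idx], Y[idx]                 # randomly sampled batch (x, y)
            probs = softmax(xb @ W + b)             # M_w(x)
            grad_logits = (probs - yb) / len(idx)   # dL/d(logits) for cross-entropy
            W -= lr * (xb.T @ grad_logits)          # w_{t+1} = w_t - eta * dL/dw
            b -= lr * grad_logits.sum(axis=0)
    return W, b
```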

B. Operational Life-cycle

Machine learning models have a life cycle. They are trained on collected (or generated) data that is processed and optimized to perform a task, which often involves recognizing a pattern in the collected data (training). The model is then used to perform the learned task on new data, that is, to infer the said patterns in never-before-seen data. At a high level, we can distinguish 3 operational life-cycle steps which can be further subdivided into smaller steps.

Data preparation 1/2: Collection. This is the step that gathers data: complexity and governance difficulties in this process may stem from legal issues, technical issues (which data to collect is hard to define), or business issues (there is no incentive to collect that data).

Data preparation 2/2: Cleaning. Here we transform a dataset D which has been collected into a dataset D′ which is amenable to ML methods. This includes removing data that was improperly collected or performing preliminary feature engineering.

Model development 1/2: Training. This step represents work done by experts to design a model architecture, configure its training algorithm, and learn model parameters. From a governance point of view, this step is hard to investigate because it either relies on expert insight or on (deep) learning algorithms that are often black-box in nature.

Model development 2/2: Testing. The model is evaluated with a set of metrics estimated on a holdout set of data before the model is confronted with current production data. This evaluation typically assumes that a holdout set drawn from the same distribution as the training data is available.

Model deployment 1/2: Model Serving. The model is deployed at scale, often using a different software framework than the one used to train it. Much of our discussion on governance focuses on mitigating risks that manifest themselves through failures at this stage.

Model deployment 2/2: Model Retirement. This step carries important consequences for governance despite often being omitted. We will discuss how to maintain multiple versions of a model, and what the implications are of choosing to retire (or not) a model.
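One way to make these stages explicit in an ML system is to track them programmatically; the following minimal Python sketch (an illustration we add, with hypothetical stage names mirroring the list above) models the life cycle as a small state machine to which governance hooks could be attached.

```python
from enum import Enum, auto

class LifecycleStage(Enum):
    """Hypothetical operational life-cycle stages, mirroring Section II-B."""
    DATA_COLLECTION = auto()
    DATA_CLEANING = auto()
    MODEL_TRAINING = auto()
    MODEL_TESTING = auto()
    MODEL_SERVING = auto()
    MODEL_RETIREMENT = auto()

# Allowed forward transitions; governance hooks (audits, sign-offs) could be
# attached to each edge, e.g., a regulator review before MODEL_SERVING.
ALLOWED_TRANSITIONS = {
    LifecycleStage.DATA_COLLECTION: {LifecycleStage.DATA_CLEANING},
    LifecycleStage.DATA_CLEANING: {LifecycleStage.MODEL_TRAINING},
    LifecycleStage.MODEL_TRAINING: {LifecycleStage.MODEL_TESTING},
    LifecycleStage.MODEL_TESTING: {LifecycleStage.MODEL_TRAINING,   # iterate
                                   LifecycleStage.MODEL_SERVING},
    LifecycleStage.MODEL_SERVING: {LifecycleStage.MODEL_TRAINING,   # re-train/patch
                                   LifecycleStage.MODEL_RETIREMENT},
}

def advance(current: LifecycleStage, target: LifecycleStage) -> LifecycleStage:
    """Move to the next stage only if the transition is permitted."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition {current.name} -> {target.name}")
    return target
```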

III. DEFINING ML GOVERNANCE

A. Overview

Conceptually, ML governance can be thought of as a set of solutions for balancing the benefits of applying ML with the risks it creates, often at the level of society. Governance is a prerequisite to building trust between the principals involved in and affected by an ML deployment. We describe such principals in detail in § III-B, but for now it is useful to assume that there are three types of principals: the model owner (i.e., the company or institution creating and deploying the model), the end user, and a regulator. Governance imposes constraints on how principals interact throughout the lifecycle of an ML application (as covered previously in § II-B).

The benefits of an ML pipeline are often application-specific. Typically, these benefits are defined by or at least motivate the model owner, e.g., applying ML will generate profit for a company. One may think that the model's usefulness (i.e., utility) is tied to a performance metric (e.g., accuracy) and every other consideration can only decrease it. This is true if we view training as an optimization problem; having additional constraints limits the feasibility set. From an asymptotic1 statistical learning perspective, any constrained problem can produce an objective that is at most as good as the unconstrained problem. However, other considerations may be just as important as utility—in particular at steps of the life cycle before or after training. For instance, model decisions may demonstrate significant unwanted biases (e.g., misclassifying individuals from minorities) and prompt the model owner to reassess their data collection and algorithmic choices. Thus, deploying ML comes with inherent risks to the different principals interacting with the ML pipeline: users may have concerns with models analyzing their data, regulators may seek to limit potential harm to society (e.g., limit unfair predictions). We detail these risks in § III-C.

1Given the training dynamics, the constraints may direct the search accurately and improve the solution found, as done in LASSO [8].
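To spell out the asymptotic claim above (a standard observation we add for clarity, not an equation from the paper): for any constraint set $C$ over the parameter space,

$$\min_{w \in C} \mathcal{L}(w) \;\ge\; \min_{w} \mathcal{L}(w),$$

since the minimum over a subset can never be smaller than the minimum over the full parameter space. The footnote's caveat is that, away from this asymptotic view, constraints can still steer the optimizer towards a better solution in practice, as in LASSO [8].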

These varied risks, however, do not carry equal meaning and urgency for all principals involved in creating, using, and regulating these models. Because risk definition is inherently tied to a given principal, e.g., due to their own goals and incentives, and various principals' definitions may come into conflict with one another, model governance is best formalized as a game. Each principal participating in the game is optimizing for their own set of constrained objectives at any point in time, which improves the principal's trust in the ML pipeline and its other principals. The metrics underlying these objectives may differ from accuracy, and may be worst-case in nature. Thus, many established practices for evaluating ML fall short of providing the guarantees necessary for ML governance.

B. Principals, Incentives, and Capabilities

Defining governance requires understanding the various principals in the ecosystem and their capabilities and incentives. We begin enumerating them below. In certain settings these principals may not always be separate entities—a data collector could also be the model builder—in which case such a principal may assume multiple of the roles defined below.

1) End User: This principal is using a system deploying ML. It may be a customer of a company, a patient at a hospital, a citizen asking for government benefits, etc.

2) Model Owner: This principal is one with a particular task that can be solved using ML. They communicate their requirements to the model builders, and clearly specify how the trained model can be accessed (and by whom).

3) Model Builder: This principal is responsible for interpreting the model owner's requirements and designing an ML solution for them. To do so, the builder requires access to representative data from the domain (see data collector below), and an appropriate choice of training algorithm (and approach). The model builder or model owner will then be responsible for deployment.

4) Data Collector: This principal is responsible for sourcing the data as per the model owner's and builder's requirements so as to ensure that the learnt model performs well and complies with various characteristics (such as ensuring data privacy, etc.).

5) Adversaries: The adversary is any entity that wishes to disrupt any phase of the ML life cycle. They may have varying control and access over the ML system.

6) Trusted Arbitrator: This principal is required in situations where (i) auditing of services provided is required (e.g., a legal agency or a mandated organization verifying if deployed models are compliant with GDPR), or (ii) there is a requirement for arbitration between two principals (in situations related to ownership resolution). We also refer to this principal as the regulator.2

2For simplicity, we conflate the entity providing the regulation or standard and the enforcement entity when they in practice differ.

C. Risks

We highlight three classes of risks relevant to ML governance that have been studied extensively in the literature. We also outline different failure modes inducing these risks (that can occur at different stages of the ML pipeline). Awareness of the risks present provides design goals and considerations that impact the creation, deployment, and retention of the model, and these failure modes will later shape our discussion on accountability and assurance in § V and VI. Though we do not exhaustively cover all risks due to space constraints, the governance aspects which constitute this paper are generic and can be expected to generalize to other risks.

1. Integrity: Losing integrity (i.e., correctness of the system outputs) is the immediate risk that a model owner tries to minimize, for example, via training the most accurate and robust model. The integrity of a model needs to take into account the overall system this model is deployed within. Integrity risk quantifies the lack of alignment between the output of the ML system and what was expected from it.

The accidental failures that cause integrity risks stem from the reliance of ML models on data; data used to train models needs to possess specific properties and must be (pre-)processed in specific ways to enable effective training (§ IV-A). This ensures that models are resilient to outliers (data points that are outside the range of those expected) [9], [10]. Another failure mode for integrity is that of concept drift (where the distribution of data changes over time) [11], [12]. Collectively, both these issues result in the model failing to generalize.

Adversary-induced failures follow. At training time, adversaries incorporate data poisons (which are specifically perturbed data points) which result in poor generalization (i.e., indiscriminate poisoning attacks [13]) or targeted misclassification (i.e., targeted poisoning attacks [13]). Poisoning adversaries can assume that the data collector uses the labels provided with the poisons, or can also specify their own labels [14], [15]. The former is known to be ineffective for large parameterized models [16], while the latter is effective across the spectrum. Poisoning is effective across a range of learning paradigms [17], [18], and across different threat models [19]. Backdoor attacks use a different strategy: a specific trigger pattern induces a misclassification and can be added to any input, whilst generic poisoning ensures misclassification of inputs belonging to a particular class [20].
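To illustrate the backdoor strategy just described, here is a minimal, hypothetical sketch (ours, not the attack of any cited work) that stamps a small trigger patch onto a fraction of training images and relabels them to an attacker-chosen target class.

```python
import numpy as np

def add_trigger(image, patch_value=1.0, patch_size=3):
    """Stamp a small square trigger in the bottom-right corner of an HxW image."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = patch_value
    return poisoned

def poison_dataset(X, y, target_class, poison_fraction=0.05, seed=0):
    """Backdoor a fraction of the training set: trigger + attacker-chosen label."""
    rng = np.random.default_rng(seed)
    X_p, y_p = X.copy(), y.copy()
    idx = rng.choice(len(X), size=int(poison_fraction * len(X)), replace=False)
    for i in idx:
        X_p[i] = add_trigger(X_p[i])
        y_p[i] = target_class   # mislabel so the model associates trigger -> target
    return X_p, y_p, idx
```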

During inference, adversaries can strategically modify inputs, called adversarial examples (whilst ensuring that they are indistinguishable from their benign counterparts). This induces a misclassification [21], [22]. These evasion attacks are known to be effective across a wide range of hypothesis classes [21], [23]–[26], and learning frameworks [27]–[29]. Initially introduced in vision [21], [22], evasion has now been demonstrated for a wide variety of domains including audio [30]–[32] and text [33]–[36]. Attacks can also be physically realized [31], [37], [38] and constrained such that samples generated for evasion preserve input semantics [32], [34], [39]–[42].
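As a concrete instance of such an evasion attack, the following sketch implements the well-known fast gradient sign method (FGSM) against a PyTorch classifier; it is an illustrative example under the assumptions of white-box gradient access, a model returning logits, and class-index labels, not a construction taken from this paper.

```python
import torch

def fgsm_adversarial_example(model, x, y_true, epsilon=0.03):
    """Perturb input x by epsilon in the direction that increases the loss (FGSM)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x_adv), y_true)
    loss.backward()
    # One signed-gradient step, then clip back to the valid input range [0, 1].
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```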

2. Privacy: Data is fundamental to training. Research has demonstrated how models trained without privacy considerations can leak sensitive information [43], [44]. This has raised concerns about the use of data containing sensitive information and its consequences for individuals and groups alike. Thus, privacy is increasingly becoming a societal norm [45]. This is perhaps best illustrated by legislative work seeking to adapt protective regulations such as the GDPR to the context of ML applications [46]. Thus, setting aside the value of upholding privacy as a human norm, growing regulation of data usage means that it is in the interest of model owners to respect privacy-preserving practices. Indeed, not mitigating privacy risks in their ML applications could result in a poor reputation for the model owner and compliance complications.

Additionally, memorization of training data is essential for generalization of ML models [47], [48], and recent work shows encouraging the model to memorize may help its performance and convergence rate [49]. However, memorization could also lead to accidental failures: a model can memorize sensitive data, such as credit card numbers, inadvertently [44]. Such secret information may then be leaked at inference time, e.g., when the model tries to auto-complete a sentence talking about credit cards. As shown by the authors, the only effective method among those evaluated is differentially private training [4], often at the expense of model performance.
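For reference, differentially private training is typically instantiated as DP-SGD: per-example gradient clipping followed by Gaussian noise. The sketch below shows one such update step under assumed inputs (per-example gradients) and illustrative hyperparameters; it is not code from the paper or from any specific library.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1,
                rng=np.random.default_rng(0)):
    """One DP-SGD update: clip each example's gradient, average, add Gaussian noise."""
    clipped = []
    for g in per_example_grads:                       # g: gradient for one example
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped), size=avg.shape)
    return w - lr * (avg + noise)
```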

There are also known adversary-induced failures. In a membership inference attack, an adversary wishes to determine if a given data point was used to train the model. Adversaries initially assumed access to confidence information from the black-box model [43], which was shown sufficient to determine membership. Following this, research has further improved (and simplified) black-box attacks [5], [50]–[53], for instance by performing label-only attacks [54], [55]. Instead, white-box attacks further leverage information held in model parameters [56]–[59], or gradient updates [60].
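A minimal illustration of the black-box, confidence-based variant: threshold the model's confidence on the candidate point, since overfit models tend to assign higher confidence to training members. This is a simplified, assumption-laden example rather than the attack of any specific cited work; query_model and the threshold are placeholders.

```python
def membership_inference(query_model, x, y, threshold=0.9):
    """Guess whether (x, y) was in the training set of the queried model.

    query_model(x) is assumed to return a probability vector over classes.
    Overfit models tend to assign higher confidence to training members,
    so a simple confidence threshold already leaks membership information.
    """
    probs = query_model(x)
    confidence_on_true_label = probs[y]
    return confidence_on_true_label >= threshold   # True -> predicted "member"
```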

Beyond membership inference, other attacks attempt to reconstruct data. In attribute inference attacks, the adversary attempts to impute missing values in a partially populated record using access to an ML model [61]. Model inversion attacks aim to reconstruct data points that are likely from the training dataset. In the classification setting, Fredrikson et al. [62] exploit confidence information from the prediction vectors (based on the same overfitting assumption made earlier). With a similar intuition, Yang et al. [63] use an autoencoder to reconstruct the input from the confidence vectors, while Zhang et al. [64] use GANs to aid reconstruction. More recent attacks (applicable in distributed training regimes, such as federated learning [65]3) enable successful reconstruction from gradients [59], [60], [66]–[69].

3Federated learning is a paradigm where multiple clients jointly learn a model by training a local model on their private datasets and sharing gradients which are aggregated at a central entity.

3. Fairness: ML models rely on extracting patterns in data, and unwanted biases are reflected within historically-collected data [70], [71]. Unmitigated, these biases transfer into algorithmic decisions that are fed back into the decision making process, thus aggravating the situation. Furthermore, algorithmic decisions themselves can also encode new biases that were not initially present in historical data [72]; this may be due to poor data quality [73], and even the careless use of fairness constraints [74].

Like integrity and privacy, the challenges of algorithmic fairness can be considered through the lens of attacks and defenses. One can consider, for example, a prejudiced adversary that attempts to sabotage a model's performance against a particular sub-group; but this also risks the integrity of the model. But does guaranteeing integrity mean that we have ensured fairness? The answer is no.

Unlike integrity and privacy, fairness risks largely stem from data collection and processing steps. In an ideal world, (i) we would have balanced datasets to train our models on, and (ii) the predictions of models would apply equally to different members of the population, which would, in turn, ensure future balancedness of our datasets because model contributions4 to historical data would be balanced. But in most areas of critical concern in algorithmic fairness, neither is the case because there is a de-facto sampling process often (implicitly) encoded in the processes that produce the data collected [75]; an essential factor that should thus be considered under data collection. For example, in pre-trials where a judge has to assess the risk of accepting or rejecting bail, the defendants are only being subject to algorithmic predictions because they have been arrested in the first place. Mitchell et al. [75] provide a broad overview of various fairness risks associated with three broad categories: (i) data bias: both statistical, which evolves from lack of representation (i.e., selective labels [76]), and societal, which indicates a normative mismatch [73]; (ii) model bias: influence of the choice of hypothesis [77] and even interpretability; and (iii) evaluation: how today's predictions can alter decision subjects' behaviors, thus requiring new predictions in the future. We will discuss such performative qualities of models in § III-D1.

4Indeed, it is common for model predictions to be incorporated as training data in the future, as can be the case for instance with online learning.

We quickly note that alleviating fairness risks in the "ideal" way above would inherently be incompatible with controlling privacy and integrity risks, since our sample sizes are limited by the size of the smallest sub-population; this potentially limits the generalization of our models, which not only threatens integrity but can also have adverse effects on privacy [43].
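As a concrete handle on "predictions applying equally to different members of the population", one commonly used criterion is demographic parity, whose violation can be measured as the gap in positive-prediction rates across sub-groups. The following sketch (an illustration we add, not from the paper) computes that gap.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Max difference in positive-prediction rates across sub-groups.

    y_pred: array of binary predictions (0/1); group: array of group identifiers.
    A gap of 0 means every group receives positive predictions at the same rate.
    """
    rates = [np.mean(y_pred[group == g]) for g in np.unique(group)]
    return max(rates) - min(rates)
```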

D. The Governance Games

ML governance follows from the interaction of the principals (§ III-B), who are independent decision-makers with goals that may or may not align with each other. For instance, a model owner's desire of achieving robustness may exacerbate leakage of an end user's private information [5], [78] because adversarial training encourages the model to generate predictions that are especially stable for data points that are close to ones in the training set. We can thus see ML governance as a multi-agent system and analyze dynamics between the principals using standard tools from game theory [79]. Solutions to this game affect the degree of trust among principals and towards the application of ML itself. In many governance scenarios, principals act sequentially, after they perceive the effects of other principals' decisions. This naturally leads to a leader-follower, or Stackelberg, competition structure [80].

1) Stackelberg formulation of learning problems: Work using Stackelberg competitions considers the strategic behavior of a principal (the follower) whose objective is distinct from that of the leader. This improves over the adversarial assumption of works that use a zero-sum game formulation (i.e., Yao's principle [81]), which requires that the follower's objective be exactly the opposite of the leader's. The most relevant application of Stackelberg competitions to our discussion is found in strategic classification [82]. There, the subjects of algorithmic classification are strategic. After observing the decision made, these subjects can adjust their features to achieve a better classification outcome. This is also an example of classification having performative qualities [83]: participants adjusting their features causes a distribution shift, which in turn means that the future predictions of the model may be neither fair nor optimal w.r.t. the new distribution. In practice, this requires model maintainers to re-train, or employ an online learning setup, to preserve their model's integrity and fairness.

Using a similar Stackelberg competition, Perdomo et al. [83] propose an equilibrium notion called performative stability. Models that are performatively stable produce predictions that would remain optimal under the distribution shifts that result from their own predictions. Similarly, Bruckner et al. [84] consider spam filtering and analyze an adversarial prediction game between a learner (leader) who tries to classify spam and an adversary (follower) that observes the learner's decisions with the goal of producing less detectable spam.
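To make the leader-follower structure concrete, the following toy sketch (ours, not taken from [82] or [83]) models strategic classification with a one-dimensional score: the follower can boost their reported score by a small budget, and the Stackelberg leader picks the acceptance threshold that is optimal after anticipating this best response.

```python
import numpy as np

def follower_best_response(true_score, threshold, budget=0.2):
    """Follower (decision subject) games the reported score if a small boost
    is enough to cross the leader's threshold; otherwise reports truthfully."""
    if true_score < threshold <= true_score + budget:
        return threshold            # boost just enough to be accepted
    return true_score

def leader_payoff(threshold, true_scores, qualified_cutoff=0.5, budget=0.2):
    """Leader's payoff: fraction of correct accept/reject decisions, computed
    on reported scores that anticipate each follower's best response."""
    correct = 0
    for s in true_scores:
        reported = follower_best_response(s, threshold, budget)
        accepted = reported >= threshold
        qualified = s >= qualified_cutoff
        correct += (accepted == qualified)
    return correct / len(true_scores)

# Stackelberg leader: choose the threshold that is best *given* follower gaming.
rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, size=1000)
thresholds = np.linspace(0, 1, 101)
best_t = max(thresholds, key=lambda t: leader_payoff(t, scores))
print(f"leader's equilibrium threshold: {best_t:.2f}")
```

With these illustrative numbers, the leader ends up raising the threshold by roughly the gaming budget, so that only truly qualified subjects can reach it even after strategically adjusting their features.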

2) Beyond two-player competitions: As useful as Stackelberg games are, they only cover a subset of interactions that principals may have with each other as part of an ML governance scenario. In other scenarios, several principals may be deciding without observing their opponents' actions. For example, a model owner may be unaware of possible future regulatory efforts, an end user may modify its trust assumptions about using ML services, multiple model builder contractors could enter bidding competitions to train a model, etc. This requires a strategic-form game, usually captured in the form of a pay-off table [80].

However, such a table may be difficult to assemble because interactions can repeat over time in several stages: principals' expected risks and pay-offs should be adjusted with the perceived effects of their decisions in the future. In such cases, an extensive-form game, which can be represented as a decision tree, can capture multi-stage games with observed actions; it shows (i) the set of principals, (ii) the order of their decisions, (iii) their choices at their respective decision points, aka strategy sets, (iv) their pay-off (or cost) as a function of their own choice as well as the history of the choices made prior to it, (v) information sets: the information available to each principal at any decision point, and finally (vi) the probability distribution over any exogenous event [80].

Fig. 1: Regulated Model Commission Game in Extensive Form—Strategy sets may be continuous (arcs) or discrete (branches). The dashed line indicates non-singleton information sets: the two decision points of model builder 1 are indistinguishable from those of model builder 2. The game may be repeated (dots). For brevity, we do not indicate decision spaces of most principals or their pay-offs.

We believe extensive-form games are a promising avenue to formalize interactions between principals and develop ML governance in a principled way. Figure 1 illustrates this with a hypothetical ML governance scenario. The regulator would like to ensure fairness and privacy guarantees, so it requires that a (pure) DP [85] mechanism be used with a budget of ε < 10. For fairness, the regulator requires demographic parity [86] with at most γ < 10% disparity between sub-populations. The model owner can choose its own specifications for ε and γ and commission two model builders to produce the models. Model builders are now in a bidding competition to achieve the most performant model within specification. They may choose different architectures, source their data from various data collectors (not shown in the figure for brevity), etc. In particular, they get to choose how they allocate their privacy budget between these different steps of the ML life cycle [87]. We note that model builder 2 does not observe its rival's decision (hence the non-singleton information set of builder 1 is annotated with dots). Therefore, the sub-game between the two builders is a strategic-form game played simultaneously between the two, as no one goes first. Finally, the model owner accepts one model and rejects the other. Model builder pay-offs are assigned at this stage. Next, the regulator can verify achieved fairness and privacy guarantees, and issue proportional fines/incentives for the model owner if they are within specification, or even improve upon it. For instance, privacy-minded regulators could devise fines that scale with the gap between the achieved ε in practice and the regulatory specified one; or a fairness regulator may propose tax breaks for model owners who have achieved a certifiably fairer prediction model. Since the game is repeated, the regulator can learn from the history of the game and update the regulation (if necessary) for the next stage of the game.
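As a small illustration of the regulator's verification step in this hypothetical scenario, the sketch below checks a reported privacy budget and a measured demographic-parity disparity against the regulatory thresholds (ε < 10 and γ < 10%) and returns a fine proportional to any violation; the thresholds, fine schedule, and function names are our own illustrative choices, not part of the paper.

```python
def regulator_check(epsilon_achieved, disparity_achieved,
                    epsilon_max=10.0, gamma_max=0.10, fine_per_unit=1_000.0):
    """Return (compliant, fine) for one round of the regulated commission game.

    The fine scales with how far the achieved epsilon or disparity exceeds
    the regulatory specification; compliant deployments pay nothing.
    """
    epsilon_gap = max(0.0, epsilon_achieved - epsilon_max)
    gamma_gap = max(0.0, disparity_achieved - gamma_max)
    fine = fine_per_unit * (epsilon_gap + gamma_gap)
    return (fine == 0.0), fine

# Example: a model trained with epsilon = 12 and 8% disparity is fined for privacy only.
compliant, fine = regulator_check(12.0, 0.08)
```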

3) Solution Concepts: Calculating Nash equilibria is an NP-hard problem [88]. As a result, most classical solution concepts for extensive-form games make simplifying assumptions which limit their applicability to governance games. These include (i) assuming sub-game perfect equilibria, (ii) amenability to backward induction, or (iii) agents adopting behavioral strategies [80]. While computing exact strategies may be intractable in the presence of multiple principals or continuous strategy sets, or when the interactions recur across a time horizon [89], [90], approximations such as correlated equilibria (a superset of Nash equilibria) [80] are easier to find by iterative mechanisms, and have recently seen a flurry of research for their efficient calculation, even in the presence of multiple principals and across a time horizon [91].

Furthermore, in many realistic scenarios principals may have incomplete information about the game (known as Bayesian games [80]). For example, in Figure 1, model builders may not know the technologies (and therefore, the fairness/privacy guarantees) that their rival is able to achieve.

Recently, new solution concepts have been introduced [92]–[94] that tackle these challenges by combining ML with established techniques such as backward induction.

Call to action: As discussed in the introduction, research on the limitations of ML often considers a single class of risk. When the interplay between risks is considered, two classes of risk are isolated to study their interplay, and integrity is often one of them. For instance, the ML community has recently started to realize the importance of considering strategic decision makers and subjects, which has led to new and practical definitions of robustness [83]. The security community has also long depended on game-theoretical studies of potential adversaries [95]. We posit that it is time for the research community to explore the more complex interplay of a realistic governance scenario with (i) multiple principals, (ii) independent pay-offs and objective functions, and (iii) possibly imperfect information. We believe such a multi-faceted approach would alleviate issues that have plagued single-objective approaches, such as choosing hyper-parameter values for parametric definitions of privacy or fairness.


IV. ESTABLISHING IDENTITY IN ML SYSTEMS

The notion of identity is fundamental to computer security [96]. Developing the notion of identities involves first recognizing which principals are involved in a system, and second how they interact with this system as it takes actions over objects. Thus, identities are a prerequisite to holding principals accountable for failures of the system (e.g., who should be held responsible for the failures of an ML model deployed by a self-driving car?).

Similar to other systems, those that incorporate ML need to integrate access control mechanisms to ensure that principals are authorized to interact with the system. We identified a number of principals in § III-B. For instance, an end user may need to authenticate themselves before they are allowed to query an MLaaS API, just like authentication could be necessary to query a database. However, because ML systems introduce new, often implicit, information flows between data and models, they call for specific mechanisms.

In particular, we focus on three areas of interest: the training data, the test data, and the learned model. In the following subsections, we define ownership as the process of associating a principal with these key components of an ML system. Later, in § V, we focus on how we can hold principals accountable for model failures once identities relevant to data and models have been clearly established through the process of ownership resolution.

A. Data Ownership

Data is one of the foundational pillars of a functional ML model, both at training and inference (test) time.

Training Data: During learning, the training data is used to compute the objective function being optimized to find the model's parameters (recall Equation 1). Since the existence of training data is a prerequisite to obtaining a model, the entity owning the training data plays an integral role in the governance of the ML system.

Specifically, understanding who collected the data can enable answering various questions regarding (i) representativeness: if the collector paid attention to ensure that the data collected is not stagnant and captures effects such as domain/concept shifts (where applicable) [97]; (ii) data cleanliness: if the data is processed for usage, and sanitization techniques (such as the use of robust statistics) are applied [98] (where applicable, especially for detecting poisons [99]); (iii) fairness: if the data collected is representative of the downstream task and various subgroups (as specified by the model owner of § III-B) [100]; and (iv) feature engineering: if the data was pre-processed correctly for various downstream tasks (such as private/fair learning) [101].

Cryptographic primitives are essential in establishing and protecting data ownership by enforcing confidentiality. Applications of homomorphic encryption [102] to ML would provide the principal data confidentiality whilst still enabling training. Yet, the use of cryptography introduces computational bottlenecks and makes it difficult to leverage hardware acceleration, often requiring secure hardware [103], [104]. We discuss the application of homomorphic encryption to ML in more detail in § VI. In addition, there has been a shift from centralized learning to distributed learning techniques (such as federated learning [65]) to enable co-locating data and computation. Additionally, these learning techniques are assumed to provide data privacy as the data never leaves the data owner's site [60]. Federated learning proposes to have individual data owners compute model updates locally on their data rather than share this data with a central server. The central server then relies on cryptographic primitives (such as secure aggregation [105]) to aggregate updates from different data owners. Other approaches involve using techniques from secure multi-party computation (MPC) to enable training [106]–[113], but most suffer from poor scalability.
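To illustrate the federated workflow just described (local updates aggregated by a server), here is a minimal federated averaging sketch in NumPy; in a real deployment, the plain averaging step would be replaced by a secure aggregation protocol so the server never sees individual updates. This is an illustrative sketch under our own simplifying assumptions, not the protocol of any specific cited system.

```python
import numpy as np

def local_update(w_global, X, y, lr=0.1, epochs=1):
    """Each data owner refines the global weights on its private data (here, a
    linear regression trained with full-batch gradient descent for simplicity)."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """One round of federated averaging: weight client updates by dataset size.
    In practice this aggregation would run under secure aggregation [105]."""
    updates, sizes = [], []
    for X, y in clients:            # clients keep (X, y) local; only weights move
        updates.append(local_update(w_global, X, y))
        sizes.append(len(y))
    sizes = np.array(sizes, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes / sizes.sum())
```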

Defining data ownership legally is complex, if only because defining data itself is complex. Under most jurisdictions, facts, that is, the building blocks of the information recorded directly or indirectly as measurable data, are not protected by copyright law [114]. Hence, the question of data ownership shifts from the protection of individual data records to their collection, arrangement, and derived works. Collections of data and derived works from data may be unprotected by copyright law depending on a number of conditions, such as (i) under the merge doctrine [115], [116], if their exposition is inherently tied to the facts and data they represent, or (ii) if the arrangement is not considered original enough, as evaluated in court; e.g., a phone book's alphabetical order is not protected. For ML, this may entail that if a data collection pipeline being deployed is deemed not sufficiently novel, the collected dataset may not be protected by copyright law. Observe that copyright on a dataset would not prevent someone from using individual data points but rather from significantly copying the entire dataset. Depending on the jurisdiction, the status of ownership over personal information and the existence of a personal data ownership right is debated [114]. Canada leans towards only providing access but not ownership of personal information, whereas in the EU the existence of a personal data ownership right has been argued.

Test Data: At inference, the model is deployed to predict unseen data. Unlike training data, test data does not influence the model permanently (i.e., it does not impact the values of the model parameters). That said, test data can still be responsible for model behavior that may make ML governance more difficult to achieve. For instance, test data may contain adversarial examples [23] that affect the integrity of model predictions, or sensitive data that should not be revealed to the model owner directly. Thus, as for training data, both technical and legal means are required to establish and protect test data ownership.

As discussed earlier, cryptography can address certain aspects of test data ownership as well. For instance, homomorphic encryption enables a model to perform inference without having a cleartext view of the input point [117]–[119]. The same can be achieved using techniques from MPC. This is helpful in scenarios where proprietary medical records cannot be shared between institutions [120]–[122]. Here again, moving from a central to a distributed setup for ML can help foster trust for the data owners [123].

B. Model Ownership

Model ownership is often a broad term used to refer to the ownership of the model's sensitive parameters that were obtained after the (computationally intensive) training process. Defining ownership is necessitated by the existence of various threats that infringe on the confidentiality of the model, and the need to be able to hold principals that own ML models accountable for the failures of their models.

What is model extraction? Attackers may target the confidentiality of ML models with a model extraction attack for three reasons: (i) data collection and labeling is time-consuming and expensive [124], (ii) the training process is computationally expensive (especially in many deep learning applications) [125], and (iii) knowledge of the model can serve as reconnaissance to mount a follow-up attack against the model in the white-box threat model rather than black-box. Models that are proprietary could be released through insider threats [126], and models that are accessible via an interface can be reverse engineered [127]. If the latter is a query interface where principals can pose inference queries to models, then this is formalized as model extraction [128]: the objective of the adversary is to succeed with as few queries as possible to avoid detection. By repeatedly posing (inference) queries to the trained model, the adversary recovers the model's functionality. But functionality is an abstract concept; Jagielski et al. [129] provide a detailed taxonomy of what exactly an adversary can recover through this reverse engineering process, ranging from the exact model parameters to replicating output behaviors for specific input data points.

Extraction has been demonstrated in various domains, such as images [129] and text [130], and for simple hypothesis classes [131] as well as more complicated ones such as DNNs [132], [133]; assuming grey-box access [134], and stricter black-box regimes [135]. Some recent attacks do not pose inference queries, but assume access to hardware side-channels [136], [137]; while such attacks recover the model, they are not model extraction attacks in the traditional sense.

But how does the adversary generate queries? Assuming that the adversary has access to the dataset used to train a model, or the distribution from which it was sampled, is a strong assumption to make. To alleviate these concerns, Truong et al. [138] propose a refinement to the work of Kariyappa et al. [139]; they propose a data-free extraction method based on disagreement between the victim (i.e., trained model) and the model being trained by the adversary. A query synthesis active learning-based approach was used by Chandrasekaran et al. [131] for extracting simple hypothesis classes; this provides asymptotic bounds on the number of queries needed. The same authors also argue that pool-based active learning strategies can be used when the adversary does have access to a pool of unlabelled data, and Jagielski et al. [129] propose using semi-supervised learning techniques such as MixMatch [140] to practically instantiate extraction.
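A minimal sketch of the query-based extraction loop described above: the adversary labels a pool of inputs with the victim's API and fits a surrogate on those labels. Here victim_predict, query_pool, and the surrogate's hypothesis class are placeholders; this illustrates the general strategy rather than reimplementing any cited attack.

```python
from sklearn.linear_model import LogisticRegression

def extract_model(victim_predict, query_pool, query_budget=1000):
    """Train a surrogate that imitates the victim using only query access.

    victim_predict(X) is assumed to return the victim's predicted labels; the
    adversary keeps the number of queries under query_budget to avoid detection.
    """
    X_queries = query_pool[:query_budget]          # inputs the adversary crafts/collects
    y_victim = victim_predict(X_queries)           # labels obtained from the victim's API
    surrogate = LogisticRegression(max_iter=1000)  # adversary's choice of hypothesis class
    surrogate.fit(X_queries, y_victim)
    return surrogate
```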

Why is it impossible to prevent model stealing? Various defenses have been proposed to prevent the adversary from learning enough information to enable extraction [141]. These range from reducing the precision of the outputs [128], to more sophisticated approaches such as randomized model selection [142], [143], adding noise to responses [144], protecting data points near decision boundaries with differential privacy [145], or perturbing outputs to mislead gradient-based optimization algorithms (e.g., SGD) [146]. However, such approaches are rendered ineffective by an adaptive extraction strategy [131], or introduce utility degradation. Chandrasekaran et al. [131] state the inevitability of extraction. The authors argue that any utilitarian model (i.e., one that is reasonably accurate) leaks information which enables extraction.

How can one prove ownership? To prove ownership, the process of watermarking has been considered. Digital watermarking has been used to covertly embed a marker into digital media to later prove ownership infringements [147]. In the context of watermarking ML models, the model owner encodes a specific query-response pair into the training procedure (analogous to embedding a marker), and can verify if a model is indeed their own on observing such behaviours at the time of inference [148]. This process is fundamentally similar to the concept of backdooring [149], without any adversarial intent (and with the additional requirement that watermarks are non-forgeable). However, recent work [150] demonstrates that watermarking is not resilient to model extraction; Jia et al. [150] propose an approach that remediates this issue, but with a nominal performance penalty.
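The verification step of such a trigger-set watermark can be sketched as follows: the owner keeps a secret set of query-response pairs and claims ownership if a suspect model reproduces enough of the expected responses. This is a simplified illustration under our own assumptions (suspect_predict, the match threshold), not the scheme of [148] or [150].

```python
import numpy as np

def verify_watermark(suspect_predict, trigger_inputs, expected_labels, match_threshold=0.9):
    """Claim ownership if the suspect model agrees with the secret trigger set.

    trigger_inputs/expected_labels form the owner's secret watermark; a benign,
    independently trained model should match them only at chance level.
    """
    predictions = suspect_predict(trigger_inputs)
    match_rate = np.mean(np.asarray(predictions) == np.asarray(expected_labels))
    return match_rate >= match_threshold, match_rate
```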

All forms of watermarking rely on encoding specific secret information into the training algorithm. Recent research suggests that there exist various forms of secret information already available to the true model owner, obtained during the process of training. This includes the order in which data is sampled, intermediate checkpoints, hyperparameter values, the intermediate values of model weights, etc. One proposal to prove model ownership relies on the honest model owner logging such information during the training procedure. Since DNN training is a stochastic process, the intermediary states obtained during training are hard to guess without access to the aforementioned secret information. If a trusted third party can reproduce these states (within some acceptable tolerance threshold) with the help of the honest principal's secret information, then the honest principal's ownership of the model is validated. This is the premise of the proof-of-learning approach to establishing ownership [151] (a minimal replay-verification sketch follows the call to action below).

Call to action: First, notice that claiming model ownership and differentiating between two models are two sides of the same coin. It is unclear whether existing metrics (such as the distributional distance between model parameters or predictions) capture this closeness, and research is needed on this front. Second, most techniques to detect extraction rely on minor changes to the training procedure. More research is needed to understand whether intrinsic properties of the ML model can serve as its fingerprint (ergo enabling ownership resolution).
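The sketch below illustrates the checkpoint-replay idea behind proof-of-learning as summarized above: the prover logs the secret data ordering and periodic checkpoints, and a verifier re-executes one training segment to check that it lands on the claimed next checkpoint. The model, update rule, segment length, and tolerance are illustrative simplifications of [151].

```python
# Minimal sketch of checkpoint logging and replay-based verification
# (inspired by proof-of-learning; names and tolerances are illustrative).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=500)

def sgd_step(w, xb, yb, lr=0.01):
    grad = 2 * xb.T @ (xb @ w - yb) / len(yb)   # least-squares gradient
    return w - lr * grad

# Prover: trains while logging the secret data order and periodic checkpoints.
order = rng.permutation(len(X))                 # secret information
w = np.zeros(5)
checkpoints = [w.copy()]
for start in range(0, len(X), 50):
    idx = order[start:start + 50]
    w = sgd_step(w, X[idx], y[idx])
    checkpoints.append(w.copy())

# Verifier: given the data, the logged order, and checkpoint t, re-executes one
# training segment and checks that it reproduces checkpoint t+1.
def verify_segment(t, tol=1e-8):
    idx = order[t * 50:(t + 1) * 50]
    w_replay = sgd_step(checkpoints[t], X[idx], y[idx])
    return np.linalg.norm(w_replay - checkpoints[t + 1]) <= tol

print("segment 3 verified:", verify_segment(3))
```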


C. Is model ownership separate from data ownership?

Maini et al. proposed dataset inference, which validates whether a model was trained using a specific training set (or training-set distribution) [152]. This is related to model ownership because, as long as only the model owner has access to the (proprietary) training data, the model owner can prove ownership. However, in other cases (such as training on public data), this technique cannot prove ownership.
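A heavily simplified sketch of the intuition follows: a model derived from a private training set tends to behave more confidently on that set than on comparable unseen data, so the owner can test a suspect model for this gap. The actual method in [152] relies on distances to decision boundaries and a formal hypothesis test; the confidence-based proxy, the models, and the statistical test below are illustrative only.

```python
# Illustrative sketch of the dataset-inference intuition (not the method of [152]):
# does a suspect model behave differently on the owner's private data than on
# comparable unseen data?
import numpy as np
from scipy.stats import ttest_ind
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(10)
X_private = rng.normal(size=(200, 15))                  # owner's proprietary data
y_private = (X_private[:, 0] * X_private[:, 1] * X_private[:, 2] > 0).astype(int)

# Suspect model: here it was (secretly) trained on the owner's private data.
suspect = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000).fit(X_private, y_private)

# Owner: compare the suspect's confidence on private data vs. fresh samples.
X_fresh = rng.normal(size=(200, 15))
conf_private = suspect.predict_proba(X_private).max(axis=1)
conf_fresh = suspect.predict_proba(X_fresh).max(axis=1)
stat, p_value = ttest_ind(conf_private, conf_fresh)

print(f"mean confidence (private vs. fresh): {conf_private.mean():.2f} vs. {conf_fresh.mean():.2f}")
print(f"p-value of the difference: {p_value:.3g}")      # a small p-value suggests the private set was used
```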

Recall that, from a legal angle, collections and derived works from data may at times be unprotected by copyright law. Because ML is known to memorize [47], models might not be protected by copyright law unless the model builder can show that their implementation non-trivially included significant original work. Similarly, content synthesized by generative models [153] and language models such as GPT-3 [154] could, as derived work, respectively be protected by or in violation of copyright law [114]. In Canada, but not necessarily in other jurisdictions like the European Union, copyright generally requires at least one human author, thus excluding automated art generation processes from copyright protection [114]. Given that instances of sufficiently original (and/or difficult) derived work have been granted copyright [114], a DNN satisfying such a condition would allow for a distinction between data ownership and model ownership.

V. HOLDING ML PRINCIPALS ACCOUNTABLE

Identities, as established in § IV, serve multiple purposes [96]; one of them involves enabling accountability. In conjunction with traditional access control mechanisms, the notion of identity provided by data and model ownership in the context of ML allows us to establish the responsibility of principals interacting with an ML system. This is defined as accountability, and is different from the process of ascertaining accountability, which we define as attribution. Indeed, accountability is essential when ML models produce undesirable behaviors (i.e., failures) such as the ones discussed in § III-C. Some of these failure modes will be known and need to be attributed to a principal (we outline these earlier in § III-B). This is similar to how malware analysis seeks to identify whether a piece of malware originated from a state actor or not. Principals held responsible could be the model owner, but also end users (e.g., if they are attempting to steal a model, or to have it produce an incorrect prediction with an adversarial example). Other failure modes of the model may be initially unknown but uncovered by external scrutiny of an ML pipeline, e.g., by a regulator through an auditing process discussed later in this section.

A. Attribution of Known ML Failures

Recall that we define attribution as the process used to identify principals responsible for model failures. Here, we discuss detection of the failures described earlier in § III-C; this is a precursory requirement for attribution.

1. Outliers: A considerable amount of research investigates how to identify when DNNs are predicting on out-of-distribution (OOD) samples. The model's confidence for different outputs can be made more uniform on OOD samples [155]–[160], or the model can be trained explicitly to assign a confidence score which can be used to tell how likely the input was out of distribution [24], [161], [162] (a minimal confidence-thresholding sketch appears after this list). Other methods have also been introduced to distinguish the outputs of DNNs on OOD and in-distribution samples [163]–[167], including approaches based on inspecting the model's internal representations or training set, or performing statistical testing [168]–[172]. These approaches could be applied to hold any principal querying the model on OOD samples accountable.

2. Concept Drift: The model builder may attempt to detect drifts in the training data distribution to hold the data collector accountable. Lu et al. [11] summarize past research into three broad themes: (i) error rate based detection (a model's deteriorating performance serves as an indicator of shifts) [173]–[186], (ii) data distribution based detection (which tries to discern differences in the distribution of previous and new inputs) [187]–[192], and (iii) multiple hypothesis test drift detection (which differs from the previous two by utilizing more than one test to discern drift) [193]–[199]. Note that the two latter approaches could also be used by the data collector to hold potential downstream data contributors accountable.

3. Data Poisoning: Attribution of poisoning can be done either at the data collection or the model training stage. To the best of our knowledge, the only known techniques in the former category utilize insight from outlier/anomaly detection [200], [201]. For the latter, research focuses on finding relationships between poisoned data and special behaviors in the representation space (or activations of hidden layers) [202]–[204], investigating the gradients [205], or observing changes in the learned decision boundaries [206], [207].

4. Adversarial Examples: To distinguish adversarial examples from legitimate inputs, detection techniques often first pre-process the inputs. This includes replacing the input by its internal model representations [208], [209], transforming the inputs [210], [211], or applying dimensionality reduction techniques [212], [213]. After pre-processing, a classifier or statistical tests distinguish the distributions of adversarial and legitimate inputs [168], [211], [212], [214]. However, only a small fraction of the proposed detection techniques are evaluated against an adaptive attacker [215], which may prevent robust attribution of adversarial examples and jeopardize accountability.

5. Model Extraction: To attribute queries contributing to a model extraction attack, Juuti et al. [141] compare the distribution of distances between queries. However, if an adversary leverages public data (which is in-distribution), the proposed detection method fails. It is unknown whether other techniques can be devised for this in-distribution querying regime.

6. Privacy Attacks: There currently exist no mechanisms to identify whether privacy attacks (such as membership inference or data reconstruction) are occurring at inference time. This is due to the nature of the attack: adversaries query the model with (potentially) in-distribution data and observe the outputs.


Call to action: Discovering approaches that can detect andattribute (adaptive) membership inference while it is beingperformed remains an open problem.

7. Synthetic Content: Generative models have revolutionized content generation [216]–[218]; while this is useful in many scenarios (such as private data release [219]), it also introduces a myriad of problems. For instance, content produced by generative adversarial networks [220] is being actively used in disinformation campaigns [221]. A number of approaches have been proposed to detect such fake content [222]–[225] (including large-scale competitions [226]). Attributing synthetic content to a particular generative model is difficult because models are often capable of producing [153] and encoding [227] arbitrary content. A promising approach is to study model components explicitly, e.g., upsampling components which produce spectral aliasing [224].
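Returning to item 1, the sketch below shows the simplest form of confidence-based OOD flagging: thresholding the maximum predicted class probability. The model, data, probes, and threshold are illustrative; as the work cited in item 1 discusses, raw discriminative confidence is unreliable on its own and typically needs to be calibrated or explicitly trained for this purpose.

```python
# Illustrative confidence-thresholded OOD flagging (toy setup, not a cited method):
# inputs whose maximum predicted class probability is low are flagged as OOD.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# In-distribution data: two Gaussian blobs.
X_in = np.vstack([rng.normal(-2, 1, size=(500, 2)), rng.normal(2, 1, size=(500, 2))])
y_in = np.array([0] * 500 + [1] * 500)
clf = LogisticRegression().fit(X_in, y_in)

def flag_ood(x, threshold=0.9):
    """Flag inputs on which the classifier is not confident."""
    return clf.predict_proba(x).max(axis=1) < threshold

# Probes far from both training clusters (chosen near the decision boundary,
# where this simple detector assigns low confidence).
X_probe = np.array([[0.0, 0.0], [8.0, -8.0], [-7.0, 7.0]])
print("flagged as OOD:", flag_ood(X_probe))
```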

B. Auditing to Identify New Model Failures

Our discussion thus far has focused on techniques devised to detect failure modes that are well known and understood. An alternative involves auditing mechanisms. To this end, we first define ML auditing and present how ML logging can support it. We then discuss the storage, privacy, and security (open) problems that result.

What is ML auditing? Bishop defines an audit as a formal examination by a (third) party to determine a chain of events and actors [96]. In the case of ML, this definition must be extended to include the verification of the claimed properties of a model, such as its lack of bias or presence of adversarial robustness. As noted in § III-B, the different logical entities may, in practice, refer to the same principal. Hence, we call an audit internal (resp. external) when the model owner and auditor are the same entity (resp. independent entities). Observe that an internal audit is a proactive action similar to a white-box assessment, and may rely on model logs, i.e., data records collected on the model and its inputs. On the contrary, an external audit is similar to a black-box assessment, and may only rely on model querying when the model owner cannot be trusted. Prior literature tends to conflate audit with external audit. This leads to a pessimistic view of ML auditing as an infeasible task. The distinction helps us introduce alternate auditing methods, including logs, which are promising but largely unexplored. This apparently simple distinction provides deeper insight: (i) during internal audits, the regulators have white-box access to model internals, code, and logs; this enables easier verification of purported functionality as opposed to external audits, where regulators only observe the functionality with no internal knowledge, and (ii) leveraging internal logs may alleviate the technical limitations of an external audit, e.g., the current lack of guarantees with regard to model robustness when the auditor can only query the model.

For instance, the model deletion code of SISA unlearning [228] can be reviewed for correctness in an internal audit: one only needs to check that a model is being retrained without the data point to be unlearned. External verification of unlearning (through query access) remains challenging, because approaches like membership inference introduce a large number of false positives. For private learning, mechanisms like DP-SGD [4] come with a privacy accountant that trivially tracks expenditure, but the privacy of the models can also be audited post-hoc through computationally expensive means [229], [230]. In the context of algorithmic fairness, model owners are being asked to conduct fairness audits [231], [232]. Despite the well-intentioned tools that have arrived as a result [233], given poor incentives for model owners, we are now faced with the risk of fake fairness audit reports [234]. Prominently, Aivodji et al. demonstrate how using an interpretable proxy allows model owners to evade accurate fairness audits [235].

Just as system and application logs are a fundamental tool in incident investigations in non-ML settings [96], a model should never be separated from its logs.

Why is logging hard for ML? In general, knowing what and when to log is non-trivial and highly use-case dependent. Additionally, log storage space is finite, and the overhead induced by logging is challenging in many other domains as well [236]. Logging in ML differs from non-ML applications because of (i) an ML pipeline control-path explosion dependent on external events, the data, and randomness, which increases the content required to be logged, (ii) the different ML-specific steps of the model life cycle, like training and testing, which require logging very diverse and step-specific information, and (iii) the need, parallel to that in system and application logs, to focus both on identifying principals responsible for a specific high-dimensional input and, specifically for ML, on tracking certain model properties (required for accountability and attribution, respectively). Consequently, the logged ML information may greatly differ from traditional event logs used in application, network, and system security. For example, the proof (or log, although it is not referred to as such) created in the work of Jia et al. [151] has the explicit goal of proving model (parameter) ownership: it requires information about the data used to train the model and the randomness used in the training process (which is hard to capture precisely), and tracks the change in model parameters as training progresses.
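A minimal sketch of what such ML-specific logging might record is given below: a dataset fingerprint, the hyperparameters and seed, the secret batch ordering, and a periodic hash of the model weights. The schema and the toy training loop are illustrative assumptions, not a proposed standard.

```python
# Minimal sketch of ML-specific training logs (illustrative schema, not a standard):
# record data identity, batch ordering, randomness, and periodic model state.
import hashlib, json, time
import numpy as np

def sha(a):
    return hashlib.sha256(np.ascontiguousarray(a).tobytes()).hexdigest()[:16]

rng = np.random.default_rng(4)
X = rng.normal(size=(256, 8))
y = rng.integers(0, 2, size=256)
w = np.zeros(8)

log = {
    "dataset_sha256": sha(X),
    "seed": 4,
    "hyperparameters": {"lr": 0.1, "batch_size": 32, "epochs": 1},
    "events": [],
}

order = rng.permutation(len(X))                     # logged batch order
for step, start in enumerate(range(0, len(X), 32)):
    idx = order[start:start + 32]
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    w -= 0.1 * grad
    log["events"].append({
        "step": step,
        "timestamp": time.time(),
        "batch_indices_sha256": sha(idx),
        "weights_sha256": sha(w),                   # periodic model fingerprint
    })

print(json.dumps(log["events"][0], indent=2))
```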

Maintaining log confidentiality. Lastly, when designing an ML auditing system based on logs, the privacy and security of that system itself should be taken into account. Here again, ML introduces specific issues. First, since logs may be accessed by auditors, they represent a risk and must be sanitized [96]. As illustrated in other sections, this sanitization is not trivial, and in the case of logging it is further complicated by the stream nature of logs. This (important) requirement makes ML logging hard in practice. Second, adversaries may try to simultaneously pose threats to the ML model and to the log analysis system (if it is ML-based). For example, Travers et al. [237] propose evasion attacks for audio-based ML models whilst ensuring that intermediary steps (which can be considered as logs) remain within acceptable levels.


Call to action: Despite the emergence of a number of industry ML logging tools and practices (both for the model and for underlying system performance) [238], little academic work has delved into the challenges of ML logging.

VI. ASSURANCE IN ML

In addition to being transparent about failures resulting from the ML system's use, which they can be held responsible for as shown previously in § V, ML principals will need to establish trust in their application of ML techniques. Bishop distinguishes trust, which is the often ill-defined belief that a system is secure, from assurance, which is a set of techniques by which one builds confidence from evidence that a system meets its security requirements [96]. Here, by security we refer to the broader mitigation of the risks identified in § III-C. One well-known example of such techniques is formal methods for verification, which have started to emerge for ML systems [239]–[243]. We discuss these techniques and others, following the operational life cycle described in § II-B.

A. Assurance during Data Preparation

Assurance in the context of data preparation ensures that the models learnt satisfy specific properties (e.g., privacy, robustness, fairness, etc.). Assurance can be provided through data sanitization (e.g., removal of data poisons) [9], [10] and checks for concept drift [11], bias alleviation [244], [245], and subgroup representation [246], [247]. To the best of our knowledge, there has been no work that provides formal guarantees that a dataset yields the aforementioned properties, and it is also unclear how such guarantees would be provided. An alternative approach to providing assurance involves documentation showing that the processes used during data collection met certain criteria designed to provide assurance; Gebru et al. [248] proposed datasheets for datasets, a generic documentation scheme meant to be used to verify that the data collector employed practices sufficient to alleviate known failures, and other works have further expanded on this (or proposed similar measures) [249]–[252].

B. Assurance for Training

As in the case of prior components of the life cycle, assurance can either be obtained post-hoc (i.e., after training completes) or be obtained during training. Recall that assurance is required along various axes, which we describe next.

Integrity: In the context of evasion attacks, robustness is provided using two main strategies. The first is adversarial training [253] and its variants [254]–[261]. Such strategies are formalized such that assurance is encoded into the training algorithm, and an auditor would have to inspect the algorithm itself to understand what properties are provided. An alternative approach provides a certificate of robustness [240], [262] indicating that the trained model is robust to perturbations within a pre-defined ℓp-norm radius. To the best of our knowledge, Steinhardt et al. [263] proposed the first certified data poisoning defense by proving an upper bound on the model's loss in the presence of an adversary. However, as they pointed out, their method has to run on clean data, which is often impractical. Ma et al. [264] propose DP as a solution against poisoning and provide theoretical bounds on the number of poisons tolerable by such models. Weber et al. [265] and Levine et al. [266] both propose newer certified defenses against data poisoning; the former uses techniques similar to randomized smoothing [240] and the latter uses collective knowledge from an ensemble of models to prove robustness. Finally, the integrity of the computation and model parameter values can also be verified by a mechanism that provides a proof associated with the particular run of gradient descent which produced the model parameters, as suggested by Jia et al. [151].
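A minimal sketch of the adversarial training strategy mentioned above follows: each update is taken on inputs perturbed by a one-step gradient-sign (FGSM-style) attack rather than on the clean batch. The architecture, data, budget, and single-step inner attack are illustrative simplifications; the cited variants use stronger multi-step attacks and other refinements.

```python
# Minimal sketch of one adversarial-training step (FGSM-style inner attack, illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 20)
y = torch.randint(0, 2, (128,))
eps = 0.1                                     # L-infinity perturbation budget

# 1) Craft worst-case inputs with a one-step gradient-sign attack.
x_adv = x.detach().clone().requires_grad_(True)
loss_fn(model(x_adv), y).backward()
x_adv = (x_adv + eps * x_adv.grad.sign()).detach()

# 2) Update the model on the perturbed batch instead of the clean one.
opt.zero_grad()
loss = loss_fn(model(x_adv), y)
loss.backward()
opt.step()
print("adversarial training loss:", float(loss))
```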

Privacy: Several approaches can be taken to preserve data privacy. The de-facto approach is to utilize DP learning. Proposed initially by Chaudhuri et al. [3], DP learning has evolved from the objective perturbation approach proposed by the authors to more sophisticated techniques which involve adding noise during optimization [4] or when aggregating votes from disjoint learners [267]. These newer approaches have tighter forms (and analyses) of privacy accounting (i.e., calculating ε). In the decentralized setting, approaches such as CaPC [123] provide assurances about data confidentiality and privacy. Recent research [229], [230] demonstrates that the privacy accounting during training is not tight (in practical scenarios), and could benefit from external audits.

Randomness: Unsurprisingly, providing assurance that sources of randomness in the process of learning are cryptographically secure is essential. Research demonstrates that, if manipulated, the order of the batches served to a model during training can itself void the training algorithm's generalization guarantees, resulting in a lack of integrity for the model parameters it outputs [268]. Furthermore, randomness in data sampling is fundamental to the amplification results demonstrated for DP learning [4], [269], [270]. Providing assurances on this matter is an open problem.

Fairness: For individual fairness, [271] produces a formal certification by ensuring the representations of similar points are kept close in the latent space. For subgroup fairness definitions, however, the primary focus has been on designing unfairness mitigation techniques despite data bias issues. Considering a core ML task (often classification) as an example, these techniques can be categorized into (i) pre-processing, where algorithms re-weight, obfuscate, transform, and/or remove samples to satisfy a notion of fairness [244], [272]–[275]; (ii) in-processing, where additional constraints are added to the optimization procedure to result in fairer models [245], [276]–[278]; or (iii) post-processing, where the output is changed to match a desired distribution [279]–[281]. Other methods have been developed to empirically assure the fairness of a model by thoroughly documenting its performance on different sub-populations [282], [283].
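To ground the subgroup-fairness vocabulary above, the sketch below computes a demographic parity gap and applies a naive post-processing fix with group-specific thresholds. The data, the metric choice, and the thresholding rule are illustrative; they do not correspond to any specific cited method, and, as discussed next, such empirical fixes carry no formal guarantee.

```python
# Minimal sketch of a subgroup fairness check (demographic parity gap) and a
# naive post-processing fix via group-specific thresholds (illustrative only).
import numpy as np

rng = np.random.default_rng(6)
scores = rng.uniform(size=1000)                        # model scores in [0, 1]
group = rng.integers(0, 2, size=1000)                  # sensitive attribute
scores = scores + 0.1 * group                          # simulate a score bias

def parity_gap(pred, group):
    """Difference in positive-prediction rates between the two groups."""
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

pred = (scores > 0.5).astype(int)
print("parity gap before:", round(parity_gap(pred, group), 3))

# Post-processing: pick per-group thresholds that equalize acceptance rates.
target_rate = pred.mean()
thresholds = {g: np.quantile(scores[group == g], 1 - target_rate) for g in (0, 1)}
pred_pp = np.array(
    [scores[i] > thresholds[int(group[i])] for i in range(len(scores))]
).astype(int)
print("parity gap after:", round(parity_gap(pred_pp, group), 3))
```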

Despite improving fairness, an important limitation of these techniques is that they do not strictly provide formal fairness guarantees. To obtain assurance of fairness during the training of a model, algorithms that provably result in fair models must be implemented. Again, initial work in this vein provides guarantees through pre-processing [272] and in-processing [275], [277], [278]. The most well-known formal guarantees for prediction fairness are provided under a metric-based notion of individual fairness [284]: Rothblum et al. introduce a PAC-like relaxation of said notion to prove asymptotic fairness guarantees for the individual [285].

Call to action: The algorithmic fairness community has recently recognized the importance of considering the broader "algorithmic system" [286], including its social actors and decision-making context. While we agree with the message, we posit that we need to go beyond studying fairness, or any single class of risk on its own, and consider the complex but realistic interplay of the principals in the broader context of ML governance. Given the importance of data in all such systems, a concrete step is to research fairer, more accurate, and non-privacy-invasive data collection practices.

C. Assurance at Deployment

When a model is deployed as a service, there are mainly two entities interacting with the model: the service provider who publishes the model, and the end users querying it. In this section, we discuss the assurance provided by service providers to end users.

To build end user confidence, service providers may aim to demonstrate two features of their service: (i) integrity, i.e., the computation is performed correctly and the prediction is indeed made by the model being advertised (rather than, for instance, a simpler but cheaper model), and (ii) confidentiality of the queries made by users, which can contain sensitive information. For example, hospitals may query online healthcare ML models to diagnose patients' diseases, where the queries could contain the patients' (private/confidential) health condition.

Integrity: End users require assurance regarding the correctness of the prediction generated by the model. The field of verified computation [287] is designed to allow a principal to verify the correctness of outsourced computation. The main issue with using off-the-shelf techniques from the verified computation literature for DNN inference is the large overhead incurred. To solve this issue, building on Thaler et al.'s [288] work towards efficient verified computing, Ghodsi et al. [289] introduce SafetyNets, which applies verified computing techniques to inference operations efficiently. Note that verifying the integrity of the computation is fundamentally different from approaches that assure confidence in (or trust in) the prediction [170].

Confidentiality: Another concern when using MLaaS is the confidentiality of the data used to query the ML models. To this end, several approaches rely on homomorphic encryption [117], [118], [290]–[292] or use MPC techniques for inference [106], [123]. Both of these approaches rely on cryptographic techniques which are provably correct.
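As a toy illustration of the MPC flavour of these approaches, the sketch below additively secret-shares a client query between two non-colluding servers so that neither sees the raw input, and the shares of a linear layer's output recombine to the true result. Real protocols such as those cited above handle non-linear layers, malicious behaviour, and finite-field arithmetic, none of which is modelled here.

```python
# Toy illustration of additive secret sharing for confidential linear inference:
# the client splits its query into two random shares so that neither server
# sees the raw input; real MPC frameworks handle non-linear layers interactively.
import numpy as np

rng = np.random.default_rng(7)
W = rng.normal(size=(3, 8))                    # model weights (held by both servers)
x = rng.normal(size=8)                         # client's sensitive query

# Client: split x into two shares that sum to x.
share_0 = rng.normal(size=8)
share_1 = x - share_0

# Servers: each computes the linear layer on its own share only.
y_0 = W @ share_0
y_1 = W @ share_1

# Client: recombine the partial results to obtain the true prediction.
y = y_0 + y_1
assert np.allclose(y, W @ x)
print("reconstructed logits:", np.round(y, 3))
```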

VII. LIFE CYCLE MANAGEMENT OF ML SYSTEMS

The techniques presented thus far do not preclude ML models from needing modifications: training from scratch to remove certain pathological behaviors learned by the model may be expensive [228], and leveraging backups will not solve issues which are embedded in the backup model's weights. The process of patching an ML system, a form of maintenance, is thus the last component of our approach to governance. In the following, we outline what these different processes would look like for ML systems.

A. Patching ML Systems

1. Unlearning: After deployment, the model owner may lose access to part of the data used to train the model. This may be due to the detection of data poisons or outliers, or for legal reasons; for example, this may be the case when the data collector raises privacy concerns and invokes the right to be forgotten [293], [294]. The process of obtaining a model without the influence of said data point(s) is referred to as machine unlearning [295].

Various forms of unlearning have been proposed which offer different guarantees. For example, retraining a model from scratch would give a concrete guarantee to the end user that the unlearned model is now independent of the data the user revoked access to. However, this comes with large (training) costs. One could modify the model's architecture and training pipeline to reduce the retraining cost [228], but to some the cost may still be too high. Alternatively, Sekhari et al. [296] argue that a model with strong privacy guarantees (i.e., DP) does not need to be modified to forget a user, as it does not leak user-specific information from the training dataset; unlearning then requires no additional computational resources. However, private models are often not utilitarian [72], [101]. To find a middle ground between computational costs and performance, an alternative approach is to define an unlearning rule (for example, a Hessian-vector-product update) that modifies a model such that the resulting model is similar to a model that had never trained on the user's data [297]–[300].
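The sketch below shows a simplified version of the sharding idea behind SISA [228]: constituent models are trained on disjoint shards and aggregated by vote, so deleting a training point only requires retraining the shard that contained it. Slicing, checkpointing, and the aggregation details of the actual system are omitted; the dataset and models are illustrative.

```python
# Simplified sketch of shard-based unlearning in the spirit of SISA [228]
# (slicing and checkpointing omitted): deleting a point retrains only one shard.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
X = rng.normal(size=(900, 6))
y = (X[:, 0] > 0).astype(int)
n_shards = 3
shards = list(np.array_split(rng.permutation(len(X)), n_shards))

def train_shard(idx):
    return LogisticRegression(max_iter=500).fit(X[idx], y[idx])

models = [train_shard(idx) for idx in shards]

def predict(x):
    """Aggregate the shard models by majority vote."""
    votes = np.stack([m.predict(x) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

def unlearn(point_id):
    """Remove a training point: only its shard's constituent model is retrained."""
    for s, idx in enumerate(shards):
        if point_id in idx:
            shards[s] = idx[idx != point_id]
            models[s] = train_shard(shards[s])
            return s

print("retrained shard:", unlearn(42))
print("aggregate prediction for one input:", predict(X[:1]))
```

The retraining cost drops roughly in proportion to the number of shards, at some cost in accuracy, reflecting the cost-versus-guarantee trade-off discussed above.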

Unlearning highlights the importance of modeling interactions between ML principals and the interplay between different risks raised (§ III-D). For example, a user's decision to revoke access to their information can itself leak information about them having been in the training dataset if it is made adaptively [301]–[303].

Call to action: Regulators will play an important role in deciding what should be forgotten (and to what degree) in the various scenarios where parts of the training dataset must be removed. It could be that any unlearning method that mitigates privacy leakage would suffice for dealing with users revoking access to their data. In contrast, it might be that for data poisoning (where there could be grave security risks) the model owner would need to completely retrain, or give some guarantee of changing the weights sufficiently to remove the threat. Determining this is a subject for future research.


2. Fine-tuning: One may also want a model to learn new behaviors, which can be achieved by fine-tuning it. Fine-tuning is needed in two scenarios: (i) online learning [304], where more data becomes available after the model is trained and it is preferable to fine-tune the model on this data to improve its performance, and (ii) transfer learning [305], where one fine-tunes a trained model to help with a different but related task, e.g., classifying traffic signs used in another country. If the model's capacity remains fixed, or previously used training data is not repeatedly used to train the model, one must take care to avoid catastrophic forgetting: by learning a new behavior, the model may inadvertently perform poorly on data points it has previously analyzed [306].
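The sketch below illustrates one common, simple mitigation for the catastrophic forgetting issue just mentioned: mixing a small rehearsal buffer of old data into each fine-tuning update. The task shift, model, and buffer size are illustrative assumptions rather than a method from the cited work.

```python
# Minimal sketch of fine-tuning with rehearsal to limit catastrophic forgetting:
# mix a small buffer of old data into each new-data update (illustrative only).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(9)
X_old = rng.normal(loc=-1.0, size=(1000, 5))
y_old = (X_old[:, 0] > -1.0).astype(int)
X_new = rng.normal(loc=+1.0, size=(200, 5))               # related but shifted task
y_new = (X_new[:, 0] > 1.0).astype(int)

model = SGDClassifier(random_state=0)
model.partial_fit(X_old, y_old, classes=np.array([0, 1]))  # initial training

buffer = rng.choice(len(X_old), size=100, replace=False)   # rehearsal buffer
for _ in range(20):                                        # fine-tuning passes
    X_mix = np.vstack([X_new, X_old[buffer]])
    y_mix = np.concatenate([y_new, y_old[buffer]])
    model.partial_fit(X_mix, y_mix)

print("old-task accuracy:", round(model.score(X_old, y_old), 3))
print("new-task accuracy:", round(model.score(X_new, y_new), 3))
```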

B. Retiring ML Systems

Following our discussion of unlearning, it is easy to see that one may consider retiring an ML model altogether when a large portion of the training data is requested to be unlearned (and the remaining data is insufficient to obtain a performant model). Apart from unlearning, there may be other reasons for a model owner to choose to retire an ML pipeline. This includes ethical concerns, where retiring the data itself may help prevent further harm, e.g., as discussed by Peng et al. [307]. Regulators will here again play a crucial role, and may decide to develop frameworks that encourage ML principals to retire their data and models when appropriate.

VIII. DISCUSSION

Our systematization led us to develop a framework for ML governance. Through the concepts of ownership, accountability, assurance, and life cycle management, we foster trust between the principals involved in an ML application. Our game formulation proposes to explicitly search for equilibria that balance the benefits of ML with its risks. Space constraints prevented us from discussing certain aspects that remain relevant to achieving governance. For instance, deploying ML involves computer systems that support ML but are not captured in our analysis; such computer systems are implicitly assumed to form the trusted computing base. Hence, our solutions for governance largely rely on the security of these systems themselves. Beyond this aspect, we now outline avenues for future work to strengthen our understanding of governance.

Risk Interplay: Through this work, we have delved into understanding various facets of the risk associated with ML. The game formulation explains the interaction between the various principals and how it leads to a constant recalibration of risk. Yet, we pointed out that there remains a limited understanding of the interplay between the various risks themselves. Limited research has been conducted to analyze the various facets of risk in unison; research has, however, been conducted to understand this interaction in a pair-wise manner (e.g., the relationship between utility and robustness [308], or privacy and fairness [309]), but this paints an incomplete picture, as discussed in § III-D.

Inter-principal Trust: An important goal of governance is to improve trust between ML principals. Hence, clearly documenting the limitations of ML systems is of paramount importance to communicate when a satisfactory approach to governance cannot be found. For this reason, we believe that approaches analogous to the one taken by dataset cards [248] will complement our work on governance well. This also relates to research on explainability, which seeks to enable end users to understand the logic behind learning algorithms. There are two lines of work here: (i) feature-level and (ii) concept-level attribution. The former attributes every feature (e.g., pixels in an image) to a prediction [310]–[312]. The latter attributes higher-level concepts to a prediction [313]–[315].

Interpretability: The techniques discussed in § V help principals identify various issues. However, principals may not understand why these issues occur and consequently may have a difficult time resolving them. Ensuring that the learning and prediction processes of many ML algorithms are interpretable (particularly to non-experts) is paramount to enable more widespread and trustworthy usage.

Parameterization: Furthermore, the assurance provided is often parameterized. For example, the assurance provided by DP learning is the expenditure of privacy (ε). However, interpreting these parameters is non-trivial for the average user. It is also cumbersome for the user to specify their requirements (for example, privacy requirements) to model builders for the same reason. Parameterized definitions are not easily accessible, and this further impedes governance. Additionally, recent research [316] suggests that user expectations for privacy do not match what is provided. This difference between mental models and reality can also impede governance. While similar to the earlier discussion on interpretability, we wish to highlight that parameterization is more focused on providing principals with knowledge to bootstrap the training process, while interpretability provides tools to better understand its outcomes.

Trust in Regulators: We implicitly assumed that the regulator principal is trustworthy and seeks to protect society from ML risks. This may not always be the case, for instance in settings where ML is used by authoritarian regimes. Additionally, trusted regulators may not, in reality, be privy to sensitive information, as such forms of data transfer are strictly governed by laws [317].

Legal Challenges: While we have taken care to highlight certain legal challenges raised by ML governance, there is currently a lack of interdisciplinary work on law and ML. Studying the technical problem in isolation is unsatisfactory because, without a match between law and techniques, techniques will not see adoption. For instance, the technical community focuses on DP but legal frameworks focus on anonymization [46]. This gap results in a lack of adoption of advances made to privately learn from data. Sometimes the inverse problem holds: while laws against disparate impact have long been in effect, it is only fairly recently that we have seen their adoption by the ML community [272].


REFERENCES

[1] B. W. Lampson, “Computer security in the real world,” Computer, 2004.[2] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, “Adversarial

machine learning,” in Proceedings of the 4th ACM workshop on Security andartificial intelligence, 2011.

[3] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially private empiricalrisk minimization.” Journal of Machine Learning Research, 2011.

[4] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, andL. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016ACM SIGSAC conference on computer and communications security, 2016.

[5] L. Song, R. Shokri, and P. Mittal, “Privacy risks of securing machine learningmodels against adversarial examples,” in Proceedings of the 2019 ACM SIGSACConference on Computer and Communications Security, 2019.

[6] R. Cummings and D. Desai, “The role of differential privacy in gdpr compliance,”in FAT’18: Proceedings of the Conference on Fairness, Accountability, andTransparency, 2018.

[7] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparsedatasets,” in 2008 IEEE Symposium on Security and Privacy (sp 2008), 2008.

[8] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of theRoyal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288,1996.

[9] V. Gudivada, A. Apon, and J. Ding, “Data quality considerations for big data andmachine learning: Going beyond data cleaning and transformations,” InternationalJournal on Advances in Software, 2017.

[10] C. Cortes, L. D. Jackel, W.-P. Chiang et al., “Limits on learning machine accuracyimposed by data quality,” in KDD, 1995.

[11] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under conceptdrift: A review,” IEEE Transactions on Knowledge and Data Engineering, 2018.

[12] P. W. Koh, S. Sagawa, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu,M. Yasunaga, R. L. Phillips, I. Gao, T. Lee et al., “Wilds: A benchmark of in-the-wild distribution shifts,” in International Conference on Machine Learning.PMLR, 2021.

[13] L. Munoz-Gonzalez, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C.Lupu, and F. Roli, “Towards poisoning of deep learning algorithms with back-gradient optimization,” in Proceedings of the 10th ACM Workshop on ArtificialIntelligence and Security, 2017.

[14] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, andT. Goldstein, “Poison frogs! targeted clean-label poisoning attacks on neuralnetworks,” arXiv preprint arXiv:1804.00792, 2018.

[15] C. Zhu, W. R. Huang, H. Li, G. Taylor, C. Studer, and T. Goldstein, “Transferableclean-label poisoning attacks on deep neural nets,” in International Conferenceon Machine Learning, 2019.

[16] O. Suciu, R. Marginean, Y. Kaya, H. Daume III, and T. Dumitras, “When doesmachine learning {FAIL}? generalized transferability for evasion and poisoningattacks,” in 27th {USENIX} Security Symposium ({USENIX} Security 18), 2018.

[17] N. Carlini, “Poisoning the Unlabeled Dataset of Semi-Supervised Learning,” arXive-prints, 2021.

[18] Y. Ma, X. Zhang, W. Sun, and X. Zhu, “Policy poisoning in batch reinforcementlearning and control,” arXiv preprint arXiv:1910.05821, 2019.

[19] M. Fang, X. Cao, J. Jia, and N. Gong, “Local model poisoning attacks tobyzantine-robust federated learning,” in 29th {USENIX} Security Symposium({USENIX} Security 20), 2020.

[20] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaningattack on neural networks,” 2017.

[21] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow,and R. Fergus, “Intriguing properties of neural networks,” arXiv preprintarXiv:1312.6199, 2013.

[22] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto,and F. Roli, “Evasion attacks against machine learning at test time,” in JointEuropean conference on machine learning and knowledge discovery in databases,2013.

[23] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarialexamples,” in 3rd International Conference on Learning Representations, ICLR2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2014.

[24] S. Vernekar, A. Gaurav, T. Denouden, B. Phan, V. Abdelzad, R. Salay, andK. Czarnecki, “Analysis of confident-classifiers for out-of-distribution detection,”arXiv preprint arXiv:1904.12220, 2019.

[25] A. Shafahi, W. R. Huang, C. Studer, S. Feizi, and T. Goldstein, “Are adversarialexamples inevitable?” arXiv preprint arXiv:1809.02104, 2018.

[26] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami,“The limitations of deep learning in adversarial settings,” in 2016 IEEE Europeansymposium on security and privacy (EuroS&P), 2016.

[27] Y.-C. Lin, Z.-W. Hong, Y.-H. Liao, M.-L. Shih, M.-Y. Liu, and M. Sun, “Tacticsof adversarial attack on deep reinforcement learning agents,” arXiv preprintarXiv:1703.06748, 2017.

[28] V. Behzadan and A. Munir, “Vulnerability of deep reinforcement learning to policyinduction attacks,” in International Conference on Machine Learning and DataMining in Pattern Recognition, 2017.

[29] J. Kos, I. Fischer, and D. Song, “Adversarial examples for generative models,” in2018 ieee security and privacy workshops (spw), 2018.

[30] N. Carlini and D. Wagner, “Audio adversarial examples: Targeted attacks onspeech-to-text,” in 2018 IEEE Security and Privacy Workshops (SPW), 2018.

[31] H. Yakura and J. Sakuma, “Robust audio adversarial example for a physicalattack,” arXiv preprint arXiv:1810.11793, 2018.

[32] Y. Qin, N. Carlini, G. Cottrell, I. Goodfellow, and C. Raffel, “Imperceptible,robust, and targeted adversarial examples for automatic speech recognition,” inInternational conference on machine learning, 2019.

[33] J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, “Hotflip: White-box adversarialexamples for text classification,” arXiv preprint arXiv:1712.06751, 2017.

[34] M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W.Chang, “Generating natural language adversarial examples,” arXiv preprintarXiv:1804.07998, 2018.

[35] S. Garg and G. Ramakrishnan, “Bae: Bert-based adversarial examples for textclassification,” arXiv preprint arXiv:2004.01970, 2020.

[36] Z. Zhao, D. Dua, and S. Singh, “Generating natural adversarial examples,” arXivpreprint arXiv:1710.11342, 2017.

[37] A. Kurakin, I. Goodfellow, S. Bengio et al., “Adversarial examples in the physicalworld,” 2016.

[38] G. Lovisotto, H. Turner, I. Sluganovic, M. Strohmeier, and I. Martinovic, “SLAP:Improving physical adversarial examples with short-lived adversarial perturba-tions,” in 30th USENIX Security Symposium (USENIX Security 21).

[39] T. Dreossi, S. Jha, and S. A. Seshia, “Semantic adversarial deep learning,” inComputer Aided Verification, 2018.

[40] L. Jain, V. Chandrasekaran, U. Jang, W. Wu, A. Lee, A. Yan, S. Chen, S. Jha, andS. A. Seshia, “Analyzing and Improving Neural Networks by Generating SemanticCounterexamples through Differentiable Rendering,” arXiv e-prints, 2019.

[41] H. Qiu, C. Xiao, L. Yang, X. Yan, H. Lee, and B. Li, “Semanticadv: Generatingadversarial examples via attribute-conditioned image editing,” in Computer Vision– ECCV 2020, Cham, 2020.

[42] R. Sheatsley, B. Hoak, E. Pauley, Y. Beugin, M. J. Weisman, and P. McDaniel, “Onthe robustness of domain constraints,” arXiv preprint arXiv:2105.08619, 2021.

[43] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacksagainst machine learning models,” in 2017 IEEE Symposium on Security andPrivacy (SP), 2017.

[44] N. Carlini, C. Liu, U. Erlingsson, J. Kos, and D. Song, “The secret sharer:Evaluating and testing unintended memorization in neural networks,” in 28th{USENIX} Security Symposium ({USENIX} Security 19), 2019.

[45] D. Boyd, “Privacy and publicity in the context of big data,” in Keynote Talk ofThe 19th Int’l Conf. on World Wide Web, vol. 650, 2010.

[46] “Lex access to european union law.”[47] V. Feldman, “Does learning require memorization? a short tale about a long

tail,” in Proceedings of the 52nd Annual ACM SIGACT Symposium on Theoryof Computing, 2020.

[48] V. Feldman and C. Zhang, “What neural networks memorize and why: Discoveringthe long tail via influence estimation,” in Advances in Neural InformationProcessing Systems 33: Annual Conference on Neural Information ProcessingSystems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.

[49] K. Lee, D. Ippolito, A. Nystrom, C. Zhang, D. Eck, C. Callison-Burch, andN. Carlini, “Deduplicating Training Data Makes Language Models Better,” arXive-prints, 2021.

[50] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes, “Ml-leaks: Model and data independent membership inference attacks and defenseson machine learning models,” arXiv preprint arXiv:1806.01246, 2018.

[51] B. Jayaraman, L. Wang, K. Knipmeyer, Q. Gu, and D. Evans, “Revisiting mem-bership inference under realistic assumptions,” arXiv preprint arXiv:2005.10881,2020.

[52] S. Truex, L. Liu, M. E. Gursoy, L. Yu, and W. Wei, “Demystifying membershipinference attacks in machine learning as a service,” IEEE Transactions on ServicesComputing, 2019.

[53] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha, “Privacy risk in machinelearning: Analyzing the connection to overfitting,” in 2018 IEEE 31st ComputerSecurity Foundations Symposium (CSF), 2018.

[54] Z. Li and Y. Zhang, “Label-leaks: Membership inference attack with label,” arXive-prints, 2020.

[55] C. A. Choquette-Choo, F. Tramer, N. Carlini, and N. Papernot, “Label-onlymembership inference attacks,” in International Conference on Machine Learning,2021.

[56] A. Sablayrolles, M. Douze, C. Schmid, Y. Ollivier, and H. Jegou, “White-box vsblack-box: Bayes optimal strategies for membership inference,” in InternationalConference on Machine Learning, 2019.

[57] K. Leino and M. Fredrikson, “Stolen memories: Leveraging model memorizationfor calibrated white-box membership inference,” in 29th {USENIX} SecuritySymposium ({USENIX} Security 20), 2020.

[58] C. Song, T. Ristenpart, and V. Shmatikov, “Machine learning models thatremember too much,” in Proceedings of the 2017 ACM SIGSAC Conference oncomputer and communications security, 2017.

[59] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy analysis ofdeep learning: Passive and active white-box inference attacks against centralizedand federated learning,” in 2019 IEEE Symposium on Security and Privacy, SP2019, San Francisco, CA, USA, May 19-23, 2019, 2019.

[60] L. Melis, C. Song, E. D. Cristofaro, and V. Shmatikov, “Exploiting unintendedfeature leakage in collaborative learning,” in 2019 IEEE Symposium on Securityand Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, 2019.

[61] J. Jia and N. Z. Gong, “Attriguard: A practical defense against attribute inferenceattacks via adversarial machine learning,” in 27th {USENIX} Security Symposium

14

Page 15: SoK: Machine Learning Governance

({USENIX} Security 18), 2018.[62] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit

confidence information and basic countermeasures,” in Proceedings of the 22ndACM SIGSAC conference on computer and communications security, 2015.

[63] Z. Yang, J. Zhang, E.-C. Chang, and Z. Liang, “Neural network inversion inadversarial setting via background knowledge alignment,” in Proceedings of the2019 ACM SIGSAC Conference on Computer and Communications Security, NewYork, NY, USA, 2019.

[64] Y. Zhang, R. Jia, H. Pei, W. Wang, B. Li, and D. Song, “The secret revealer: Gener-ative model-inversion attacks against deep neural networks,” in 2020 IEEE/CVFConference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle,WA, USA, June 13-19, 2020, 2020.

[65] J. Konecny, B. McMahan, and D. Ramage, “Federated optimization: Distributedoptimization beyond the datacenter,” arXiv preprint arXiv:1511.03575, 2015.

[66] J. Geiping, H. Bauermeister, H. Droge, and M. Moeller, “Inverting gradients–how easy is it to break privacy in federated learning?” arXiv preprintarXiv:2003.14053, 2020.

[67] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the gan: Informationleakage from collaborative deep learning,” in Proceedings of the 2017 ACMSIGSAC Conference on Computer and Communications Security, New York, NY,USA, 2017.

[68] Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, and H. Qi, “Beyond inferringclass representatives: User-level privacy leakage from federated learning,” in 2019IEEE Conference on Computer Communications, INFOCOM 2019, Paris, France,April 29 - May 2, 2019, 2019.

[69] M. Song, Z. Wang, Z. Zhang, Y. Song, Q. Wang, J. Ren, and H. Qi, “Analyzinguser-level privacy attack against federated learning,” IEEE Journal on SelectedAreas in Communications, 2020.

[70] K. Lum and W. Isaac, “To predict and serve?” 2016, eprint:https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.1740-9713.2016.00960.x.

[71] A. Julia, L. Jeff, , M. Surya, and K. Lauren. Machine bias.[72] E. Bagdasaryan, O. Poursaeed, and V. Shmatikov, “Differential privacy has

disparate impact on model accuracy,” Advances in Neural Information ProcessingSystems, vol. 32, pp. 15 479–15 488, 2019.

[73] H. Suresh and J. V. Guttag, “A framework for understanding sources of harmthroughout the machine learning life cycle,” 2021.

[74] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq, “Algorithmicdecision making and the cost of fairness,” in Proceedings of the 23rd ACMSIGKDD International Conference on Knowledge Discovery and Data Mining,New York, NY, USA, 2017.

[75] S. Mitchell, E. Potash, S. Barocas, A. D’Amour, and K. Lum, “Algorithmicfairness: Choices, assumptions, and definitions,” 2021.

[76] H. Lakkaraju, J. Kleinberg, J. Leskovec, J. Ludwig, and S. Mullainathan, “Theselective labels problem: Evaluating algorithmic predictions in the presenceof unobservables,” in Proceedings of the 23rd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining, New York, NY, USA.

[77] A. Chouldechova and M. G’Sell, “Fairer and more accurate, but for whom?” arXivpreprint arXiv:1707.00046, 2017.

[78] L. Song, R. Shokri, and P. Mittal, “Membership inference attacks against adversar-ially robust deep learning models,” in 2019 IEEE Security and Privacy Workshops(SPW), 2019.

[79] Y. Shoham and K. Leyton-Brown, Multiagent systems: algorithmic, game-theoretic, and logical foundations, 2009, OCLC: 603027890.

[80] D. Fudenberg and J. Tirole, Game theory, 1991.[81] A. C.-C. Yao, “Probabilistic computations: Toward a unified measure of complex-

ity,” in 18th Annual Symposium on Foundations of Computer Science (sfcs 1977),1977.

[82] M. Hardt, N. Megiddo, C. Papadimitriou, and M. Wootters, “Strategic classifica-tion,” in Proceedings of the 2016 ACM Conference on Innovations in TheoreticalComputer Science, 2016.

[83] J. C. Perdomo, T. Zrnic, C. Mendler-Dunner, and M. Hardt, “Performativeprediction,” 2021.

[84] M. Bruckner and T. Scheffer, “Stackelberg games for adversarial predictionproblems,” in Proceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining, 2021.

[85] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,”vol. 9, no. 3, pp. 211–407. [Online]. Available: http://www.nowpublishers.com/articles/foundations-and-trends-in-theoretical-computer-science/TCS-042

[86] S. Barocas, M. Hardt, and A. Narayanan, Fairness and Machine Learning, 2018.[87] I. Mironov, “Renyi differential privacy,” 2017.[88] C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou, “The complexity of

computing a nash equilibrium,” SIAM Journal on Computing, vol. 39, no. 1,2009.

[89] M. Hu and M. Fukushima, “Multi-leader-follower games: Models, methods andapplications,” 2015.

[90] J.-S. Pang and M. Fukushima, “Quasi-variational inequalities, generalized Nashequilibria, and multi-leader-follower games,” 2005.

[91] L. Marris, P. Muller, M. Lanctot, K. Tuyls, and T. Graepel, “Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers,”in Proceedings of the 38th International Conference on Machine Learning,ser. Proceedings of Machine Learning Research, M. Meila and T. Zhang,Eds., vol. 139. PMLR, 18–24 Jul 2021, pp. 7480–7491. [Online]. Available:https://proceedings.mlr.press/v139/marris21a.html

[92] Z. Li, F. Jia, A. Mate, S. Jabbari, M. Chakraborty, M. Tambe, and Y. Vorobeychik.Solving Structured Hierarchical Games Using Differential Backward Induction.

[93] T. Fiez, B. Chasnov, and L. Ratliff, “Implicit learning dynamics in stackelberggames: Equilibria characterization, convergence analysis, and empirical study,” inProceedings of the 37th International Conference on Machine Learning, 2020.

[94] J. Li, J. Yu, Y. Nie, and Z. Wang, “End-to-End Learning and Intervention inGames,” 2020.

[95] M. H. Manshaei, Q. Zhu, T. Alpcan, T. Bacsar, and J.-P. Hubaux, “Game theorymeets network security and privacy,” ACM Comput. Surv., vol. 45, no. 3, Jul.2013.

[96] M. A. Bishop, “The art and science of computer security,” 2002.[97] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey

on concept drift adaptation,” ACM computing surveys (CSUR), 2014.[98] I. Diakonikolas and D. M. Kane, “Recent advances in algorithmic high-

dimensional robust statistics,” arXiv preprint arXiv:1911.05911, 2019.[99] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector

machines,” arXiv preprint arXiv:1206.6389, 2012.[100] K. Holstein, J. Wortman Vaughan, H. Daume III, M. Dudik, and H. Wallach,

“Improving fairness in machine learning systems: What do industry practitionersneed?” in Proceedings of the 2019 CHI conference on human factors in computingsystems, 2019.

[101] F. Tramer and D. Boneh, “Differentially private learning needs better features (ormuch more data),” arXiv preprint arXiv:2011.11660, 2020.

[102] C. Gentry, A fully homomorphic encryption scheme, 2009.[103] F. Tramer and D. Boneh, “Slalom: Fast, verifiable and private execution of neural

networks in trusted hardware,” arXiv preprint arXiv:1806.03287, 2018.[104] S. Volos, K. Vaswani, and R. Bruno, “Graviton: Trusted execution environments

on gpus,” in 13th {USENIX} Symposium on Operating Systems Design andImplementation ({OSDI} 18), 2018.

[105] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel,D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for federatedlearning on user-held data,” arXiv preprint arXiv:1611.04482, 2016.

[106] S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, and T. Rabin,“Falcon: Honest-majority maliciously secure framework for private deep learning,”Proceedings on Privacy Enhancing Technologies, 2021.

[107] S. Wagh, D. Gupta, and N. Chandran, “Securenn: 3-party secure computation forneural network training.” Proc. Priv. Enhancing Technol., 2019.

[108] P. Mohassel and Y. Zhang, “Secureml: A system for scalable privacy-preservingmachine learning,” in 2017 IEEE symposium on security and privacy (SP), 2017.

[109] P. Mohassel and P. Rindal, “Aby3: A mixed protocol framework for machine learning,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer andCommunications Security, 2018.

[110] A. Patra and A. Suresh, “Blaze: blazing fast privacy-preserving machine learning,”arXiv preprint arXiv:2005.09042, 2020.

[111] H. Chaudhari, A. Choudhury, A. Patra, and A. Suresh, “Astra: high throughput3pc over rings with application to secure prediction,” in Proceedings of the 2019ACM SIGSAC Conference on Cloud Computing Security Workshop, 2019.

[112] W. Zheng, R. A. Popa, J. E. Gonzalez, and I. Stoica, “Helen: Maliciously securecoopetitive learning for linear models,” in 2019 IEEE Symposium on Security andPrivacy (SP), 2019.

[113] W. Zheng, R. Deng, W. Chen, R. A. Popa, A. Panda, and I. Stoica, “Cerebro: Aplatform for multi-party cryptographic collaborative learning,” in 30th {USENIX}Security Symposium ({USENIX} Security 21), 2021.

[114] T. Scassa, Data ownership, 2018.[115] “New york mercantile ex. v. intercontinentalexcha.” 2007.[116] “Banxcorp v. costco wholesale corp.” 2013.[117] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, “Gazelle: A low latency

framework for secure neural network inference,” in Proceedings of the 27thUSENIX Conference on Security Symposium, USA, 2018.

[118] P. Mishra, R. Lehmkuhl, A. Srinivasan, W. Zheng, and R. A. Popa, “Delphi: Acryptographic inference service for neural networks,” in 29th {USENIX} SecuritySymposium ({USENIX} Security 20), 2020.

[119] R. Lehmkuhl, P. Mishra, A. Srinivasan, and R. A. Popa, “Muse: Secure in-ference resilient to malicious clients,” in 30th {USENIX} Security Symposium({USENIX} Security 21), 2021.

[120] J. Alvarez-Valle, P. Bhatu, N. Chandran, D. Gupta, A. Nori, A. Rastogi, M. Rathee,R. Sharma, and S. Ugare, “Secure medical image analysis with cryptflow,” arXivpreprint arXiv:2012.05064, 2020.

[121] G. Kaissis, A. Ziller, J. Passerat-Palmbach, T. Ryffel, D. Usynin, A. Trask,I. Lima, J. Mancuso, F. Jungmann, M.-M. Steinborn et al., “End-to-end privacypreserving deep learning on multi-institutional medical imaging,” Nature MachineIntelligence, 2021.

[122] A. Soin, P. Bhatu, R. Takhar, N. Chandran, D. Gupta, J. Alvarez-Valle, R. Sharma,V. Mahajan, and M. P. Lungren, “Production-level open source privacy preservinginference in medical imaging,” CoRR, 2021.

[123] C. A. Choquette-Choo, N. Dullerud, A. Dziedzic, Y. Zhang, S. Jha, N. Papernot,and X. Wang, “Capc learning: Confidential and private collaborative learning,”arXiv preprint arXiv:2102.05188, 2021.

[124] A. Halevy, P. Norvig, and F. Pereira, “The unreasonable effectiveness of data,”IEEE Intelligent Systems, 2009.

[125] E. Strubell, A. Ganesh, and A. McCallum, “Energy and policy considerationsfor deep learning in nlp,” in Proceedings of the 57th Annual Meeting of theAssociation for Computational Linguistics, 2019.

15

Page 16: SoK: Machine Learning Governance

[126] M. Bishop and C. Gates, “Defining the insider threat,” in Proceedings of the4th Annual Workshop on Cyber Security and Information Intelligence Research:Developing Strategies to Meet the Cyber Security and Information IntelligenceChallenges Ahead, New York, NY, USA, 2008.

[127] D. Lowd and C. Meek, “Adversarial learning,” in Proceedings of the eleventhACM SIGKDD international conference on Knowledge discovery in data mining,2005.

[128] F. Tramer, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing machinelearning models via prediction apis,” in 25th {USENIX} Security Symposium({USENIX} Security 16), 2016.

[129] M. Jagielski, N. Carlini, D. Berthelot, A. Kurakin, and N. Papernot, “Highaccuracy and high fidelity extraction of neural networks,” in 29th USENIX SecuritySymposium (USENIX Security 20), 2020.

[130] K. Krishna, G. S. Tomar, A. P. Parikh, N. Papernot, and M. Iyyer,“Thieves on sesame street! model extraction of bert-based apis,” arXiv preprintarXiv:1910.12366, 2019.

[131] V. Chandrasekaran, K. Chaudhuri, I. Giacomelli, S. Jha, and S. Yan, “Exploringconnections between active learning and model extraction,” in 29th USENIXSecurity Symposium (USENIX Security 20), 2020.

[132] N. Carlini, M. Jagielski, and I. Mironov, “Cryptanalytic extraction of neuralnetwork models,” in Annual International Cryptology Conference, 2020.

[133] D. Rolnick and K. Kording, “Reverse-engineering deep relu networks,” in International Conference on Machine Learning, 2020.

[134] S. Zanella-Beguelin, S. Tople, A. Paverd, and B. Kopf, “Grey-box extraction of natural language models,” in Proceedings of the 38th International Conference on Machine Learning, 2021.

[135] S. Milli, L. Schmidt, A. D. Dragan, and M. Hardt, “Model reconstruction from model explanations,” in Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019.

[136] L. Batina, S. Bhasin, D. Jap, and S. Picek, “CSI NN: Reverse engineering of neural network architectures through electromagnetic side channel,” in 28th USENIX Security Symposium (USENIX Security 19), 2019.

[137] Y. Zhu, Y. Cheng, H. Zhou, and Y. Lu, “Hermes attack: Steal DNN models with lossless inference accuracy,” in 30th USENIX Security Symposium (USENIX Security 21), 2021.

[138] J.-B. Truong, P. Maini, R. J. Walls, and N. Papernot, “Data-free model extraction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[139] S. Kariyappa, A. Prakash, and M. K. Qureshi, “Maze: Data-free model stealing attack using zeroth-order gradient estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.

[140] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. A. Raffel, “Mixmatch: A holistic approach to semi-supervised learning,” in Advances in Neural Information Processing Systems, 2019.

[141] M. Juuti, S. Szyller, S. Marchal, and N. Asokan, “Prada: protecting against dnn model stealing attacks,” in 2019 IEEE European Symposium on Security and Privacy (EuroS&P), 2019.

[142] S. Kariyappa, A. Prakash, and M. K. Qureshi, “Protecting dnns from theft using an ensemble of diverse models,” in International Conference on Learning Representations, 2020.

[143] I. M. Alabdulmohsin, X. Gao, and X. Zhang, “Adding robustness to support vector machines against adversarial reverse engineering,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014.

[144] S. Kariyappa and M. K. Qureshi, “Defending against model stealing attacks with adaptive misinformation,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, 2020.

[145] H. Zheng, Q. Ye, H. Hu, C. Fang, and J. Shi, “Bdpl: A boundary differentially private layer against machine learning model extraction attacks,” in Computer Security – ESORICS 2019, Cham, 2019.

[146] T. Orekondy, B. Schiele, and M. Fritz, “Prediction poisoning: Towards defenses against DNN model stealing attacks,” in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, 2020.

[147] F. A. Petitcolas, R. J. Anderson, and M. G. Kuhn, “Information hiding – a survey,” Proceedings of the IEEE, 1999.

[148] Y. Adi, C. Baum, M. Cisse, B. Pinkas, and J. Keshet, “Turning your weakness into a strength: Watermarking deep neural networks by backdooring,” in 27th USENIX Security Symposium (USENIX Security 18), 2018.

[149] T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017.

[150] H. Jia, C. A. Choquette-Choo, V. Chandrasekaran, and N. Papernot, “Entangled watermarks as a defense against model extraction,” in 30th USENIX Security Symposium (USENIX Security 21), 2021.

[151] H. Jia, M. Yaghini, C. A. Choquette-Choo, N. Dullerud, A. Thudi, V. Chandrasekaran, and N. Papernot, “Proof-of-learning: Definitions and practice,” arXiv preprint arXiv:2103.05633, 2021.

[152] P. Maini, M. Yaghini, and N. Papernot, “Dataset inference: Ownership resolution in machine learning,” in International Conference on Learning Representations, 2021.

[153] B. Zhang, J. P. Zhou, I. Shumailov, and N. Papernot, “On attribution of deepfakes,” arXiv preprint arXiv:2008.09194, 2020.

[154] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” arXiv preprint arXiv:2005.14165, 2020.

[155] A. Meinke and M. Hein, “Towards neural networks that provably know when they don’t know,” arXiv preprint arXiv:1909.12180, 2019.

[156] K. Lee, H. Lee, K. Lee, and J. Shin, “Training confidence-calibrated classifiers for detecting out-of-distribution samples,” arXiv preprint arXiv:1711.09325, 2017.

[157] M. Hein, M. Andriushchenko, and J. Bitterwolf, “Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.

[158] D. Hendrycks, M. Mazeika, and T. Dietterich, “Deep anomaly detection with outlier exposure,” arXiv preprint arXiv:1812.04606, 2018.

[159] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,” arXiv preprint arXiv:1612.01474, 2016.

[160] A. Vyas, N. Jammalamadaka, X. Zhu, D. Das, B. Kaul, and T. L. Willke, “Out-of-distribution detection using an ensemble of self supervised leave-out classifiers,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018.

[161] T. DeVries and G. W. Taylor, “Learning confidence for out-of-distribution detection in neural networks,” arXiv preprint arXiv:1802.04865, 2018.

[162] A. Malinin and M. Gales, “Predictive uncertainty estimation via prior networks,” arXiv preprint arXiv:1802.10501, 2018.

[163] S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” arXiv preprint arXiv:1706.02690, 2017.

[164] K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of-distribution samples and adversarial attacks,” Advances in neural information processing systems, 2018.

[165] J. Ren, P. J. Liu, E. Fertig, J. Snoek, R. Poplin, M. A. DePristo, J. V. Dillon, and B. Lakshminarayanan, “Likelihood ratios for out-of-distribution detection,” arXiv preprint arXiv:1906.02845, 2019.

[166] J. Serra, D. Alvarez, V. Gomez, O. Slizovskaia, J. F. Nunez, and J. Luque, “Input complexity and out-of-distribution detection with likelihood-based generative models,” arXiv preprint arXiv:1909.11480, 2019.

[167] Y.-C. Hsu, Y. Shen, H. Jin, and Z. Kira, “Generalized odin: Detecting out-of-distribution image without learning from out-of-distribution data,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.

[168] J. Raghuram, V. Chandrasekaran, S. Jha, and S. Banerjee, “A general framework for detecting anomalous inputs to dnn classifiers,” in International Conference on Machine Learning, 2021.

[169] C. S. Sastry and S. Oore, “Detecting out-of-distribution examples with in-distribution examples and gram matrices,” arXiv preprint arXiv:1912.12510, 2019.

[170] H. Jiang, B. Kim, M. Y. Guan, and M. Gupta, “To trust or not to trust a classifier,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2018.

[171] N. Papernot and P. McDaniel, “Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning,” arXiv preprint arXiv:1803.04765, 2018.

[172] S. Jha, S. Raj, S. Fernandes, S. K. Jha, S. Jha, B. Jalaian, G. Verma, and A. Swami, “Attribution-based confidence metric for deep neural networks,” 2019.

[173] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, “Learning with drift detection,” in Brazilian symposium on artificial intelligence, 2004.

[174] J. Gama and G. Castillo, “Learning with local drift detection,” in International conference on advanced data mining and applications, 2006.

[175] M. Baena-García, J. del Campo-Avila, R. Fidalgo, A. Bifet, R. Gavalda, and R. Morales-Bueno, “Early drift detection method,” in Fourth international workshop on knowledge discovery from data streams, 2006.

[176] I. Frias-Blanco, J. del Campo-Avila, G. Ramos-Jimenez, R. Morales-Bueno, A. Ortiz-Diaz, and Y. Caballero-Mota, “Online and non-parametric drift detection methods based on hoeffding’s bounds,” IEEE Transactions on Knowledge and Data Engineering, 2014.

[177] A. Liu, G. Zhang, and J. Lu, “Fuzzy time windowing for gradual concept drift adaptation,” in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017.

[178] S. Xu and J. Wang, “Dynamic extreme learning machine for data stream classification,” Neurocomputing, 2017.

[179] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, 2006.

[180] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, “Exponentially weighted moving average charts for detecting concept drift,” Pattern recognition letters, 2012.

[181] K. Nishida and K. Yamauchi, “Detecting concept drift using statistical testing,” in International conference on discovery science, 2007.

[182] A. Bifet and R. Gavalda, “Learning from time-changing data with adaptive windowing,” in Proceedings of the 2007 SIAM international conference on data mining, 2007.

[183] ——, “Adaptive learning from evolving data streams,” in International Symposium on Intelligent Data Analysis, 2009.

[184] A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, “Improving adaptive bagging methods for evolving data streams,” in Asian conference on machine learning, 2009.

[185] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavalda, “New ensemble methods for evolving data streams,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009.

[186] H. M. Gomes, A. Bifet, J. Read, J. P. Barddal, F. Enembreck, B. Pfahringer, G. Holmes, and T. Abdessalem, “Adaptive random forests for evolving data stream classification,” Machine Learning, 2017.

[187] D. Kifer, S. Ben-David, and J. Gehrke, “Detecting change in data streams,” in VLDB, 2004.

[188] T. Dasu, S. Krishnan, S. Venkatasubramanian, and K. Yi, “An information-theoretic approach to detecting changes in multi-dimensional data streams,” in Proc. Symp. on the Interface of Statistics, Computing Science, and Applications, 2006.

[189] X. Song, M. Wu, C. Jermaine, and S. Ranka, “Statistical change detection for multi-dimensional data,” in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007.

[190] A. A. Qahtan, B. Alharbi, S. Wang, and X. Zhang, “A pca-based change detection framework for multidimensional data streams: Change detection in multidimensional data streams,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.

[191] L. Bu, D. Zhao, and C. Alippi, “An incremental change detection test based on density difference estimation,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017.

[192] A. Liu, Y. Song, G. Zhang, and J. Lu, “Regional concept drift detection and density synchronized drift adaptation,” in IJCAI International Joint Conference on Artificial Intelligence, 2017.

[193] C. Alippi, G. Boracchi, and M. Roveri, “Just-in-time ensemble of classifiers,” in The 2012 international joint conference on neural networks (IJCNN), 2012.

[194] H. Wang and Z. Abraham, “Concept drift detection for streaming data,” in 2015 international joint conference on neural networks (IJCNN), 2015.

[195] Y. Zhang, G. Chu, P. Li, X. Hu, and X. Wu, “Three-layer concept drifting detection in text data streams,” Neurocomputing, 2017.

[196] L. Du, Q. Song, L. Zhu, and X. Zhu, “A selective detector ensemble for concept drift detection,” The Computer Journal, 2015.

[197] B. I. F. Maciel, S. G. T. C. Santos, and R. S. M. Barros, “A lightweight concept drift detection ensemble,” in 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), 2015.

[198] C. Alippi, G. Boracchi, and M. Roveri, “Hierarchical change-detection tests,” IEEE transactions on neural networks and learning systems, 2016.

[199] H. Raza, G. Prasad, and Y. Li, “Ewma model based shift-detection methods for detecting covariate shifts in non-stationary environments,” Pattern Recognition, 2015.

[200] Y. Liu, Y. Xie, and A. Srivastava, “Neural trojans,” in 2017 IEEE International Conference on Computer Design, ICCD 2017, Boston, MA, USA, November 5-8, 2017, 2017.

[201] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” ArXiv, 2017.

[202] B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2018.

[203] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee, I. M. Molloy, and B. Srivastava, “Detecting backdoor attacks on deep neural networks by activation clustering,” in Workshop on Artificial Intelligence Safety 2019 co-located with the Thirty-Third AAAI Conference on Artificial Intelligence 2019 (AAAI-19), Honolulu, Hawaii, January 27, 2019, ser. CEUR Workshop Proceedings, 2019.

[204] Y. Liu, W.-C. Lee, G. Tao, S. Ma, Y. Aafer, and X. Zhang, “Abs: Scanning neural networks for back-doors by artificial brain stimulation,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019.

[205] S. Hong, V. Chandrasekaran, Y. Kaya, T. Dumitras, and N. Papernot, “On the effectiveness of mitigating data poisoning attacks with gradient shaping,” CoRR, 2020.

[206] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,” in 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, 2019.

[207] H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “Deepinspect: A black-box trojan detection and mitigation framework for deep neural networks,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019.

[208] J. Lu, T. Issaranon, and D. Forsyth, “Safetynet: Detecting and rejecting adversarial examples robustly,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.

[209] E. Wong and Z. Kolter, “Provable defenses against adversarial examples via the convex outer adversarial polytope,” in Proceedings of the 35th International Conference on Machine Learning, 2018.

[210] W. Xu, D. Evans, and Y. Qi, “Feature squeezing: Detecting adversarial examples in deep neural networks,” in 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018, 2018.

[211] K. Roth, Y. Kilcher, and T. Hofmann, “The odds are odd: A statistical test for detecting adversarial examples,” in Proceedings of the 36th International Conference on Machine Learning, 2019.

[212] D. Hendrycks and K. Gimpel, “Early methods for detecting adversarial images,” in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings, 2017.

[213] A. N. Bhagoji, D. Cullina, and P. Mittal, “Dimensionality reduction as a defense against evasion attacks on machine learning classifiers,” arXiv preprint arXiv:1704.02654, 2017.

[214] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel, “On the (statistical) detection of adversarial examples,” arXiv preprint arXiv:1702.06280, 2017.

[215] F. Tramer, N. Carlini, W. Brendel, and A. Madry, “On adaptive attacks to adversarial example defenses,” arXiv preprint arXiv:2002.08347, 2020.

[216] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” 2019.

[217] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” in Proceedings of the 36th International Conference on Machine Learning, 2019.

[218] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[219] N. C. Abay, Y. Zhou, M. Kantarcioglu, B. Thuraisingham, and L. Sweeney, “Privacy preserving synthetic data release using deep learning,” in Machine Learning and Knowledge Discovery in Databases, M. Berlingerio, F. Bonchi, T. Gartner, N. Hurley, and G. Ifrim, Eds., 2019.

[220] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014.

[221] F. Carmichael, “How a fake network pushes pro-china propaganda,” 2021.

[222] S. McCloskey and M. Albright, “Detecting gan-generated imagery using color cues,” 2018.

[223] H. Li, B. Li, S. Tan, and J. Huang, “Identification of deep network generated images using disparities in color components,” Signal Processing, 2020.

[224] X. Zhang, S. Karaman, and S.-F. Chang, “Detecting and simulating artifacts in gan fake images,” 2019.

[225] F. Marra, D. Gragnaniello, L. Verdoliva, and G. Poggi, “Do gans leave artificial fingerprints?” 2018.

[226] F. AI, “Deepfake detection challenge results: An open initiative to advance ai,” 2020.

[227] R. Abdal, Y. Qin, and P. Wonka, “Image2stylegan: How to embed images into the stylegan latent space?” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

[228] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot, “Machine unlearning,” arXiv preprint arXiv:1912.03817, 2019.

[229] M. Jagielski, J. Ullman, and A. Oprea, “Auditing differentially private machine learning: How private is private sgd?” arXiv preprint arXiv:2006.07709, 2020.

[230] M. Nasr, S. Song, A. Thakurta, N. Papernot, and N. Carlini, “Adversary instantiation: Lower bounds for differentially private machine learning,” arXiv preprint arXiv:2101.04535, 2021.

[231] W. House, “Big data: seizing opportunities, preserving values (report for the president),” Washington DC, USA: Executive Office of the President. [WWW document] http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf, 2014.

[232] P. T. Kim, “Auditing algorithms for discrimination essay,” vol. 166, p. 189, 2017.

[233] P. Saleiro, B. Kuester, A. Stevens, A. Anisfeld, L. Hinkson, J. London, and R. Ghani, “Aequitas: A bias and fairness audit toolkit,” arXiv preprint arXiv:1811.05577, 2018.

[234] K. Fukuchi, S. Hara, and T. Maehara, “Faking fairness via stealthily biased sampling,” 2019.

[235] U. Aïvodji, H. Arai, O. Fortineau, S. Gambs, S. Hara, and A. Tapp, “Fairwashing: the risk of rationalization,” in International Conference on Machine Learning. PMLR, 2019, pp. 161–170.

[236] A. Bates, K. R. B. Butler, and T. Moyer, “Trustworthy whole-system provenance for the linux kernel,” 2015.

[237] A. Travers, L. Licollari, G. Wang, V. Chandrasekaran, A. Dziedzic, D. Lie, and N. Papernot, “On the exploitability of audio machine learning pipelines to surreptitious adversarial examples,” 2021.

[238] “whylogs: A data and machine learning logging standard,” Standard, 2021, original-date: 2020-08-14T23:25:32Z.

[239] A. Raghunathan, J. Steinhardt, and P. Liang, “Certified defenses against adversarial examples,” arXiv preprint arXiv:1801.09344, 2018.

[240] J. Cohen, E. Rosenfeld, and Z. Kolter, “Certified adversarial robustness via randomized smoothing,” in Proceedings of the 36th International Conference on Machine Learning, 2019. [Online]. Available: http://proceedings.mlr.press/v97/cohen19c.html

[241] D. M. Sommer, L. Song, S. Wagh, and P. Mittal, “Towards probabilistic verification of machine unlearning,” arXiv preprint arXiv:2003.04247, 2020.

[242] G. Singh, T. Gehr, M. Puschel, and M. Vechev, “An abstract domain for certifying neural networks,” Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, pp. 1–30, 2019.

[243] G. Singh, T. Gehr, M. Mirman, M. Puschel, and M. T. Vechev, “Fast and effective robustness certification,” Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018.

[244] A. Wang, A. Narayanan, and O. Russakovsky, “Revise: A tool for measuring and mitigating bias in visual datasets,” in European Conference on Computer Vision, 2020.

[245] B. H. Zhang, B. Lemoine, and M. Mitchell, “Mitigating unwanted biases with adversarial learning,” in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018.

[246] J. A. Buolamwini, “Gender shades: intersectional phenotypic and demographic evaluation of face datasets and gender classifiers,” Ph.D. dissertation, Massachusetts Institute of Technology, 2017.

[247] S. Selwood, “The politics of data collection: Gathering, analysing and using data about the subsidised cultural sector in england,” Cultural trends, 2002.

[248] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daume III, and K. Crawford, “Datasheets for datasets,” arXiv preprint arXiv:1803.09010, 2018.

[249] S. Holland, A. Hosny, S. Newman, J. Joseph, and K. Chmielinski, “The dataset nutrition label: A framework to drive higher data quality standards,” arXiv preprint arXiv:1805.03677, 2018.

[250] M. Miceli, T. Yang, L. Naudts, M. Schuessler, D. Serbanescu, and A. Hanna, “Documenting computer vision datasets: An invitation to reflexive data practices,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021.

[251] M. R. Costa-jussa, R. Creus, O. Domingo, A. Domínguez, M. Escobar, C. Lopez, M. Garcia, and M. Geleta, “Mt-adapted datasheets for datasets: Template and repository,” arXiv preprint arXiv:2005.13156, 2020.

[252] M. Hanley, A. Khandelwal, H. Averbuch-Elor, N. Snavely, and H. Nissenbaum, “An ethical highlighter for people-centric dataset creation,” arXiv preprint arXiv:2011.13583, 2020.

[253] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.

[254] A. Shafahi, M. Najibi, A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein, “Adversarial training for free!” arXiv preprint arXiv:1904.12843, 2019.

[255] C. Xie and A. Yuille, “Intriguing properties of adversarial training at scale,” arXiv preprint arXiv:1906.03787, 2019.

[256] C. Xie, M. Tan, B. Gong, A. Yuille, and Q. V. Le, “Smooth adversarial training,” arXiv preprint arXiv:2006.14536, 2020.

[257] F. Tramer and D. Boneh, “Adversarial training and robustness for multiple perturbations,” arXiv preprint arXiv:1904.13000, 2019.

[258] M. Andriushchenko and N. Flammarion, “Understanding and improving fast adversarial training,” arXiv preprint arXiv:2007.02617, 2020.

[259] A. Shafahi, M. Najibi, Z. Xu, J. Dickerson, L. S. Davis, and T. Goldstein, “Universal adversarial training,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.

[260] Y. Wang, X. Ma, J. Bailey, J. Yi, B. Zhou, and Q. Gu, “On the convergence and robustness of adversarial training,” in ICML, 2019.

[261] E. Wong, L. Rice, and J. Z. Kolter, “Fast is better than free: Revisiting adversarial training,” arXiv preprint arXiv:2001.03994, 2020.

[262] H. Salman, J. Li, I. P. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, and G. Yang, “Provably robust deep learning via adversarially trained smoothed classifiers,” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 2019.

[263] J. Steinhardt, P. W. Koh, and P. Liang, “Certified defenses for data poisoning attacks,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2017.

[264] Y. Ma, X. Zhu, and J. Hsu, “Data poisoning against differentially-private learners: Attacks and defenses,” in IJCAI, 2019.

[265] M. Weber, X. Xu, B. Karlas, C. Zhang, and B. Li, “Rab: Provable robustness against backdoor attacks,” ArXiv, 2020.

[266] A. Levine and S. Feizi, “Deep partition aggregation: Provable defenses against general poisoning attacks,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, 2021.

[267] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and U. Erlingsson, “Scalable private learning with pate,” arXiv preprint arXiv:1802.08908, 2018.

[268] I. Shumailov, Z. Shumaylov, D. Kazhdan, Y. Zhao, N. Papernot, M. A. Erdogdu, and R. Anderson, “Manipulating sgd with data ordering attacks,” arXiv preprint arXiv:2104.09667, 2021.

[269] U. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, S. Song, K. Talwar, and A. Thakurta, “Encode, shuffle, analyze privacy revisited: Formalizations and empirical evaluation,” arXiv preprint arXiv:2001.03618, 2020.

[270] P. Kairouz, B. McMahan, S. Song, O. Thakkar, A. Thakurta, and Z. Xu, “Practical and private (deep) learning without sampling or shuffling,” arXiv preprint arXiv:2103.00039, 2021.

[271] A. Ruoss, M. Balunovic, M. Fischer, and M. Vechev, “Learning certified individually fair representations,” in Advances in Neural Information Processing Systems 33, 2020.

[272] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, “Certifying and removing disparate impact,” in proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 2015.

[273] F. P. Calmon, D. Wei, B. Vinzamuri, K. N. Ramamurthy, and K. R. Varshney, “Optimized pre-processing for discrimination prevention,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.

[274] F. Kamiran and T. Calders, “Data preprocessing techniques for classification without discrimination,” Knowledge and Information Systems, 2012.

[275] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, “Learning fair representations,” in International conference on machine learning, 2013.

[276] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma, “Fairness-aware classifier with prejudice remover regularizer,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2012.

[277] L. E. Celis, L. Huang, V. Keswani, and N. K. Vishnoi, “Classification with fairness constraints: A meta-algorithm with provable guarantees,” in Proceedings of the conference on fairness, accountability, and transparency, 2019.

[278] A. Ruoss, M. Balunovic, M. Fischer, and M. Vechev, “Learning certified individually fair representations,” arXiv preprint arXiv:2002.10312, 2020.

[279] M. Hardt, E. Price, and N. Srebro, “Equality of opportunity in supervised learning,” Advances in neural information processing systems, 2016.

[280] G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, and K. Q. Weinberger, “On fairness and calibration,” arXiv preprint arXiv:1709.02012, 2017.

[281] F. Kamiran, A. Karim, and X. Zhang, “Decision theory for discrimination-aware classification,” in 2012 IEEE 12th International Conference on Data Mining, 2012.

[282] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru, “Model cards for model reporting,” in Proceedings of the conference on fairness, accountability, and transparency, 2019.

[283] K. Yang, J. Stoyanovich, A. Asudeh, B. Howe, H. Jagadish, and G. Miklau, “A nutritional label for rankings,” in Proceedings of the 2018 international conference on management of data, 2018.

[284] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness through awareness,” in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, New York, NY, USA, 2012.

[285] G. Yona and G. Rothblum, “Probably approximately metric-fair learning,” in Proceedings of the 35th International Conference on Machine Learning, 2018. [Online]. Available: http://proceedings.mlr.press/v80/yona18a.html

[286] C. Barabas, C. Doyle, J. Rubinovitz, and K. Dinakar, “Studying up: Reorienting the study of algorithmic fairness around issues of power,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020.

[287] M. Walfish and A. J. Blumberg, “Verifying computations without reexecuting them,” Commun. ACM, 2015.

[288] J. Thaler, “Time-optimal interactive proofs for circuit evaluation,” in Advances in Cryptology – CRYPTO 2013, Berlin, Heidelberg, 2013.

[289] Z. Ghodsi, T. Gu, and S. Garg, “Safetynets: Verifiable execution of deep neural networks on an untrusted cloud,” in Advances in Neural Information Processing Systems, 2017.

[290] N. Dowlin, R. Gilad-Bachrach, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing, “Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy,” in Proceedings of the 33rd International Conference on Machine Learning, 2016.

[291] J. Liu, M. Juuti, Y. Lu, and N. Asokan, “Oblivious neural network predictions via minionn transformations,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017.

[292] B. Reagen, W.-S. Choi, Y. Ko, V. T. Lee, H.-H. S. Lee, G.-Y. Wei, and D. Brooks, “Cheetah: Optimizing and accelerating homomorphic encryption for private inference,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021.

[293] T. Bertram, E. Bursztein, S. Caro, H. Chao, R. C. Feman et al., “Five years of the right to be forgotten,” in Proceedings of the Conference on Computer and Communications Security, 2019.

[294] A. Mantelero, “The EU Proposal for a General Data Protection Regulation and the roots of the ‘right to be forgotten’,” Computer Law & Security Review, 2013.

[295] Y. Cao and J. Yang, “Towards making systems forget with machine unlearning,” in 2015 IEEE Symposium on Security and Privacy, 2015.

[296] A. Sekhari, J. Acharya, G. Kamath, and A. T. Suresh, “Remember what you want to forget: Algorithms for machine unlearning,” arXiv preprint arXiv:2103.03279, 2021.

[297] C. Guo, T. Goldstein, A. Hannun, and L. Van Der Maaten, “Certified data removal from machine learning models,” arXiv preprint arXiv:1911.03030, 2019.

[298] L. Graves, V. Nagisetty, and V. Ganesh, “Amnesiac machine learning,” arXiv preprint arXiv:2010.10981, 2020.

[299] A. Golatkar, A. Achille, and S. Soatto, “Eternal sunshine of the spotless net: Selective forgetting in deep networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.

[300] T. Baumhauer, P. Schottle, and M. Zeppelzauer, “Machine unlearning: Linear filtration for logit-based classifiers,” arXiv preprint arXiv:2002.02730, 2020.

[301] S. Zanella-Beguelin, L. Wutschitz, S. Tople, V. Ruhle, A. Paverd, O. Ohrimenko, B. Kopf, and M. Brockschmidt, “Analyzing information leakage of updates to natural language models,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020.

[302] V. Gupta, C. Jung, S. Neel, A. Roth, S. Sharifi-Malvajerdi, and C. Waites, “Adaptive machine unlearning,” arXiv preprint arXiv:2106.04378, 2021.

[303] M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y. Zhang, “When machine unlearning jeopardizes privacy,” arXiv preprint arXiv:2005.02205, 2020.

[304] O. Fontenla-Romero, B. Guijarro-Berdinas, D. Martinez-Rego, B. Perez-Sanchez, and D. Peteiro-Barral, “Online machine learning,” in Efficiency and Scalability Methods for Computational Intellect. IGI Global, 2013.

[305] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in International conference on artificial neural networks. Springer, 2018.

[306] R. M. French, “Catastrophic forgetting in connectionist networks,” Trends in Cognitive Sciences, vol. 3, 1999.

[307] K. Peng, A. Mathur, and A. Narayanan, “Mitigating dataset harms requires stewardship: Lessons from 1000 papers,” arXiv e-prints, 2021.

[308] E. Dohmatob, “Generalized no free lunch theorem for adversarial robustness,” in International Conference on Machine Learning. PMLR, 2019.

[309] H. Chang and R. Shokri, “On the privacy risks of algorithmic fairness,” arXiv preprint arXiv:2011.03731, 2020.

[310] M. T. Ribeiro, S. Singh, and C. Guestrin, ““why should i trust you?”: Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2016.

[311] P. Kindermans, K. T. Schutt, M. Alber, K. Muller, D. Erhan, B. Kim, and S. Dahne, “Learning how to explain neural networks: Patternnet and patternattribution,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.

[312] A. Kapishnikov, S. Venugopalan, B. Avci, B. Wedin, M. Terry, and T. Bolukbasi, “Guided integrated gradients: An adaptive path method for removing noise,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[313] B. Kim, M. Wattenberg, J. Gilmer, C. J. Cai, J. Wexler, F. B. Viegas, and R. Sayres, “Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV),” in ICML, 2018.

[314] B. Zhou, Y. Sun, D. Bau, and A. Torralba, “Interpretable basis decomposition for visual explanation,” in Computer Vision – ECCV 2018, 2018.

[315] C. Yeh, B. Kim, S. O. Arik, C. Li, T. Pfister, and P. Ravikumar, “On completeness-aware concept-based explanations in deep neural networks,” in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.

[316] G. Kaptchuk, R. Cummings, and E. M. Redmiles, ““i need a better description”: An investigation into user expectations for differential privacy,” in 2021 Workshop on Theory and Practice of Differential Privacy, 2021.

[317] “In re Apple, Inc.,” p. 341, 2016.
