sok: oracles from the ground truth to market manipulation

15
SoK: Oracles from the Ground Truth to Market Manipulation Shayan Eskandari Concordia University Montreal, QC, Canada ConsenSys Diligence Brooklyn, NY, USA Mehdi Salehi Concordia University Montreal, QC, Canada Wanyun Catherine Gu Stanford University Stanford, CA, USA Jeremy Clark Concordia University Montreal, QC, Canada [email protected] ABSTRACT One fundamental limitation of blockchain-based smart contracts is that they execute in a closed environment. Thus, they only have access to data and functionality that is already on the blockchain, or is fed into the blockchain. Any interactions with the real world need to be mediated by a bridge service, which is called an oracle. As de- centralized applications mature, oracles are playing an increasingly prominent role. With their evolution comes more attacks, necessi- tating greater attention to their trust model. In this systemization of knowledge paper (SoK), we dissect the design alternatives for oracles, showcase attacks, and discuss attack mitigation strategies. ACM Reference Format: Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark. 2021. SoK: Oracles from the Ground Truth to Market Manipulation. In 3rd ACM Conference on Advances in Financial Technologies (AFT ’21), September 26–28, 2021, Arlington, VA, USA. ACM, New York, NY, USA, 15 pages. https: //doi.org/10.1145/3479722.3480994 1 INTRODUCTION With billions of dollars at stake, decentralized networks are prone to attacks. It is essential that the smart contracts, which govern how systems are run on these networks, are executed correctly. Public blockchains, like Ethereum, ensure the correct execution of smart contract code by taking the consensus of a large, open network of nodes operating the Ethereum software. For consensus to form, many nodes need to make decisions based on the exact same input data. Hypothetically, if a decision requires nodes to fetch data or use a service provider outside of the blockchain, there can be no guarantee that every node in a global network has the same access and view of this external source. For this reason, blockchains only execute on internal sources: data and code provided in a current transaction, or past data and code already stored on the blockchain. S. Eskandari and M. Salehi are equal first authors. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. AFT ’21, September 26–28, 2021, Arlington, VA, USA © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-9082-8/21/09. . . $15.00 https://doi.org/10.1145/3479722.3480994 Many potential decentralized applications seem very natural until the designer hits the ‘oracle problem’ and realizes an inter- face to the external world is required. An oracle is a solution to this problem. It is a service that feeds off-chain data into on-chain storage. The trust model of oracles vary—some data comes with cryptographic certification while other data is assumed to be true based on trusting the oracle, or a set of oracles. Oracle-supplied data cannot easily be changed or removed once finalized on-chain, allowing disputes over data accuracy to be based on a public record. Leveraging this immutability is one approach to incentivizing ora- cles to post truthful information. We aim to construct in this paper a systematization of knowledge (SoK) of implementation choices for oracles, facilitated by breaking down the operation of an oracle into a set of modules. For each module, we explore potential system vulnerabilities and discuss attack vectors. We also aim to categorize all the significant oracle proposals of different projects within a taxonomy we propose. The goal of this SoK is to help the reader better understand the system design for oracles across different use cases and implementations. 2 PRELIMINARIES Ethereum [111] is a prominent public blockchain with the largest de- veloper headcount. While oracles are applicable to any blockchain, we will adopt Ethereum as a concrete example of a blockchain for the purposes of explaining each concept in this paper. Ethereum is inspired by Bitcoin but adds a verbose language for programming smart contracts that execute on the Ethereum Virtual Machine (EVM). All transactions and executions are verified by a decentral- ized network of nodes. Solidity is the main high-level programming language used by developers for developing smart contracts and decentralized applications (DApps). Smart contracts are small code bases that live on a blockchain. In short, smart contracts can be seen as blackbox applications that get inputs from a user and follow the code flow to the output, which can update the state of the contract and trigger monetary transactions. The Oracle Problem. Smart contracts cannot access external re- sources (e.g., a website or an online database) to fetch data that resides outside of the blockchain (e.g., a price quote of an asset). External data needs to be relayed to smart contracts with an oracle. An oracle is a bridge or gateway that connects the off-chain real world knowledge and the on-chain blockchain network. The ‘or- acle problem’ [22] describes the limitation with which the types arXiv:2106.00667v2 [cs.CR] 2 Sep 2021

Upload: others

Post on 05-Oct-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market ManipulationShayan Eskandari∗Concordia UniversityMontreal, QC, CanadaConsenSys DiligenceBrooklyn, NY, USA

Mehdi Salehi∗Concordia UniversityMontreal, QC, Canada

Wanyun Catherine GuStanford UniversityStanford, CA, USA

Jeremy ClarkConcordia UniversityMontreal, QC, [email protected]

ABSTRACTOne fundamental limitation of blockchain-based smart contracts isthat they execute in a closed environment. Thus, they only haveaccess to data and functionality that is already on the blockchain, oris fed into the blockchain. Any interactions with the real world needto be mediated by a bridge service, which is called an oracle. As de-centralized applications mature, oracles are playing an increasinglyprominent role. With their evolution comes more attacks, necessi-tating greater attention to their trust model. In this systemizationof knowledge paper (SoK), we dissect the design alternatives fororacles, showcase attacks, and discuss attack mitigation strategies.ACM Reference Format:Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark.2021. SoK: Oracles from the Ground Truth to Market Manipulation. In 3rdACM Conference on Advances in Financial Technologies (AFT ’21), September26–28, 2021, Arlington, VA, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3479722.3480994

1 INTRODUCTIONWith billions of dollars at stake, decentralized networks are proneto attacks. It is essential that the smart contracts, which govern howsystems are run on these networks, are executed correctly. Publicblockchains, like Ethereum, ensure the correct execution of smartcontract code by taking the consensus of a large, open networkof nodes operating the Ethereum software. For consensus to form,many nodes need to make decisions based on the exact same inputdata. Hypothetically, if a decision requires nodes to fetch data oruse a service provider outside of the blockchain, there can be noguarantee that every node in a global network has the same accessand view of this external source. For this reason, blockchains onlyexecute on internal sources: data and code provided in a currenttransaction, or past data and code already stored on the blockchain.∗S. Eskandari and M. Salehi are equal first authors.

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected] ’21, September 26–28, 2021, Arlington, VA, USA© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.ACM ISBN 978-1-4503-9082-8/21/09. . . $15.00https://doi.org/10.1145/3479722.3480994

Many potential decentralized applications seem very naturaluntil the designer hits the ‘oracle problem’ and realizes an inter-face to the external world is required. An oracle is a solution tothis problem. It is a service that feeds off-chain data into on-chainstorage. The trust model of oracles vary—some data comes withcryptographic certification while other data is assumed to be truebased on trusting the oracle, or a set of oracles. Oracle-supplieddata cannot easily be changed or removed once finalized on-chain,allowing disputes over data accuracy to be based on a public record.Leveraging this immutability is one approach to incentivizing ora-cles to post truthful information.

We aim to construct in this paper a systematization of knowledge(SoK) of implementation choices for oracles, facilitated by breakingdown the operation of an oracle into a set of modules. For eachmodule, we explore potential system vulnerabilities and discussattack vectors. We also aim to categorize all the significant oracleproposals of different projects within a taxonomy we propose. Thegoal of this SoK is to help the reader better understand the systemdesign for oracles across different use cases and implementations.

2 PRELIMINARIESEthereum [111] is a prominent public blockchain with the largest de-veloper headcount. While oracles are applicable to any blockchain,we will adopt Ethereum as a concrete example of a blockchain forthe purposes of explaining each concept in this paper. Ethereum isinspired by Bitcoin but adds a verbose language for programmingsmart contracts that execute on the Ethereum Virtual Machine(EVM). All transactions and executions are verified by a decentral-ized network of nodes. Solidity is the main high-level programminglanguage used by developers for developing smart contracts anddecentralized applications (DApps). Smart contracts are small codebases that live on a blockchain. In short, smart contracts can be seenas blackbox applications that get inputs from a user and follow thecode flow to the output, which can update the state of the contractand trigger monetary transactions.

The Oracle Problem. Smart contracts cannot access external re-sources (e.g., a website or an online database) to fetch data thatresides outside of the blockchain (e.g., a price quote of an asset).External data needs to be relayed to smart contracts with an oracle.An oracle is a bridge or gateway that connects the off-chain realworld knowledge and the on-chain blockchain network. The ‘or-acle problem’ [22] describes the limitation with which the types

arX

iv:2

106.

0066

7v2

[cs

.CR

] 2

Sep

202

1

Page 2: SoK: Oracles from the Ground Truth to Market Manipulation

AFT ’21, September 26–28, 2021, Arlington, VA, USA Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark

of applications that can execute solely within a fully decentralized,adversarial environment like Ethereum. Generally speaking, a pub-lic blockchain environment is chosen to avoid dependencies ona single (or a small set) of trusted parties. One of the first oracleimplementations used a smart contract in the form of a database(i.e., mapping1) and was updated by a trusted entity known as theowner. More modern oracle updating methods use consensus pro-tocol with multiple data feeds or polling techniques based on the‘wisdom of the crowd’. The data reported by an oracle will alwaysintroduce a time lag from the data source and more complex pollingmethods generally imply longer latency.

Trusted Third Parties. A natural question for smart contract de-velopers to ask is: if you trust the oracle, why not just have itcompute everything? There are a few answers to this question: (1)there may be benefits to minimizing the trust (i.e., to just providingdata instead of full execution), (2) there are widely trusted organi-zations and institutes—convincing one to operate an oracle serviceis a much lower technical ask than convincing one to operate acomplete platform, and (3) if a data source becomes untrustworthy,it may require less effort to switch oracles than to redeploy thesystem.

Methodology. We found papers and other resources by exam-ining the proceedings of top ranked security, cryptography, andblockchain venues; attending blockchain-focused community events;and leveraging our expertise and experience. Our inputs includeacademic papers, industry whitepapers, blog and social mediaposts, and talks at industry conferences on blockchain technology,Ethereum, and decentralized finance (DeFi).

Oracle Use-Cases. Oracles have been proposed for a wide varietyof applications. Based on our reading, most of the use-cases fallinto one of the main categories below.

• Stablecoins [26, 54, 74, 77, 86] and synthetic assets [93]require the exchange rate between the asset they are price-targeting and the price of an on-chain source of collateral.

• Derivatives [11, 44, 98] and prediction markets [25, 87]require external prices or event outcomes to settle on-chaincontracts.

• Provenance systems [92, 102] require tracking informa-tion of real world assets like gold, diamonds, mechanicalparts, and shipments.

• Identity [63, 75] and other on-chain reputation systems re-quire knowledge of governmental records to establish iden-tities.

• Randomness [21] can only be produced deterministicallyon a blockchain. In order to use any non-deterministic ran-dom number, an external oracle is needed to feed the random-ness into the smart contract. Lotteries [88] and games [47]are examples. Additionally, cryptographic tools like verifi-able random functions (VRF) [52, 76] and verifiable delayfunctions (VDFs) [14, 18] can mitigate, respectively, any pre-dictability or manipulability in the randomness.

• Decentralized exchanges can use prices from an exter-nal oracles to set parameters. On-chain market makers [60]

1A Solidity mapping is simply a key-value database stored on a smart contract.

uses such prices to minimize the deviation from the exter-nal market prices and tailor the pricing function. Addition-ally, some use oracles to provide sufficient liquidity nearthe mid-market price for more efficient automated marketmaking [39, 91, 113].

• Dynamic non-fungible tokens (NFTs) [13] are crypto-collectables that can be minted, burned, or updated basedon external data. For example, sports trading cards whichdepends on the real-time performance of a player.

3 RELATEDWORKGiven this paper is a systemization of knowledge (SoK), we willreview work on oracles themselves throughout the paper. In thissection, we only discuss other works with a similar goal of provid-ing an overview of different approaches to oracle design, operation,and security. Al-Breiki et al. [4] present a trust-based categorizationof oracle systems, as well as the type of interaction that the on-chain component of the oracle has with the off-chain components.Liu and Szalachowski [70] focus on oracles in the decentralizedfinance (DeFi) ecosystem, presenting technical architectures and ameasurement study on deviations between external market pricesand on-chain data from commonly used price oracles. Lo et al. [71]propose a framework for assessing the reliability of oracles andranked them based on the failure probability rate. Angeris and Chi-tra [6] analyze the logic behind Automated Market Maker (AMM)projects (e.g., Uniswap [1] and Balancer [8]) and discuss how theseprojects could be used as price feeder oracles for other systems.Williams and Peterson [109] map oracle systems into two groups—requesters and reporters— and perform a game theoretical analysisof three defined scenarios between requesters and reporters.

By contrast, in our work, we inspect 17 different oracle systems2and breakdown their design decisions and mechanism implemen-tations (listed in Table 2). We also discuss theoretical and possibleattacks on the different building blocks of the oracle systems. Com-paratively, we look at a broader types of oracles, including priceoracles, binary outcome oracles, and oracle systems, for any typeof data such as weather condition information.

4 MODULARWORK FLOWFor our main contribution, we deconstruct how an oracle operatesinto several modules that generally operate sequentially (but insome solutions, certain steps are skipped) and then we study eachmodule one-by-one. An overview of the work flow is as follows:

4.1 Ground Truth: The goal of the oracle system is to relaythe ground truth (i.e., the real true data) to the requester ofthe data.4.2 Data Sources: Data Sources are entities that store ormeasure a representation of the ground truth. There area diverse set of data sources: databases, hardware sensors,humans, other smart contracts, etc.4.3Data Feeders: Data feeders report off-chain data sourcesto an on-chain oracle system. In order to incentivize truthfuldata reporting, an oracle system can introduce a mechanismto select data feeders from a collection of available dataproviders. The incentive mechanism can be collateral-based,

2To our knowledge, a much larger set than other research on oracle systems.

Page 3: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market Manipulation AFT ’21, September 26–28, 2021, Arlington, VA, USA

such as staking, or reputation-based to find a reliable set ofdata feeders for each round of selection.4.4 Selection of Data Feeders: The process of determiningwhich data feeders should be used in an oracle system canbe categorized into two main types: centralized and decen-tralized selection.4.5 Aggregation: When data is submitted by multiple datafeeders, the final representation of the data is an aggregationof each data feeder’s input. The aggregation method can berandom selection or algorithmic rule-based, such as usingweighted average (the mean) or majority opinion. The designof the aggregation method is one of the most importantaspects of an oracle system, as intentional manipulationor unintentional errors during the aggregation process canresult in untruthful data reporting by the oracle system.4.6 Dispute Phase: Some oracle designs allow for a disputephase as a countermeasure to oracle manipulation. The dis-pute phasemight correct submitted data or punish untruthfuldata feeders. The dispute phase might also introduce furtherlatency.

The steps above are visualized in Figure 1. Next we dive deeperinto the modular workflow by trying to further define each module.As appropriate, we also discuss feasible attacks on the modules andpossible mitigation measures.

4.1 Ground TruthWhile not a module itself, ground truth is the initial input to anoracle system. Oracle designers cannot solve basic philosophicalquestions like what is truth? However it has to be understood (i)what the data actually represents and (ii) if it is reliable. Data issometimes sensitive to small details. Consider a volatility statisticfor a financial asset: basics like which volatility measure is beingused over what precise time period are obvious, but smaller thingslike the tick size of the market generating the prices could be rele-vant [48]. When data is aggregated from multiple sources, minordifferences in what is being represented (called semantic hetero-geneity) can lead to deviations between values [55, 73, 112].

While oracle systems will attempt to solve the issue of mali-cious participants who mis-report the ground truth, it does notaddress the fundamental question of whether the ground truthitself is reliable. Some philosophers argue truth is observed, andobservations require a ‘web of beliefs’ that is subject to error (forits consequences in security, see [59]). Reliability is judged by theassumptions made about the data source, described next.

4.2 Data SourcesData Sources are defined here as passive entities that store and mea-sure the representation of the ground truth. Common types of datasources include databases, sensors, humans, smart contracts,or a combination of them. Depending on how data sources gatherand retrieve the ground truth, different attack types arrise. Using ahybrid of data sources (if possible) could reduce the reliability ona single point of input. We describe each common type and theirsecurity considerations.

4.2.1 Humans. A human may provide the requested data, eitherby direct observation or by indirectly relaying data from another

Figure 1: A visualization of our oracle workflow as described in the text.

data source. Humans are prone to errors which is the main risk ofthis data source. Human errors include how the data is retrieved,how the data collector interprets the truth, and if data is relayedfrom a reliable source. Researchers have categorized human errorsinto the following three types (from least to most probable): verysimple tasks, routine tasks, and complicated non-routine tasks [71].An example for each category is, respectably, reading Bitcoin’s

Page 4: SoK: Oracles from the Ground Truth to Market Manipulation

AFT ’21, September 26–28, 2021, Arlington, VA, USA Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark

exchange rate from an unverified source, inputting the data intothe system, and configuring the oracle system.

Humans may also act maliciously and deliberately report wrongdata when they perceive it will benefit them. As we will see infurther modules, a robust oracle system will use incentives anddisputes to promote truthful statements.

4.2.2 Sensors. Sensors are electronic devices that collect raw datafrom the outside world and make it available to other devices. Thedata source may use more than one sensor to obtain the desired data.One example from traditional finance is the weather derivative, firstintroduced by the Chicago Mercantile Exchange (CME) [79]. Theseinstruments use weather data provided by trusted institutions, suchas the National Climate Data Center,3 which collects weather datathrough a network of sensors.

Provenance is a highly cited application of blockchain, whereproducts are tagged and traced through out the supply chain, in-cluding transportation, for management and/or certification [78,102, 114]. The tags could be visual (barcodes) or electronic (RFID).A host of attacks on RFID have been proposed outside of blockchainoracles [5]. Blockchain technology does not solve some importanttrust issues: ensuring the proper tag is affixed to the proper product,each product has one tag, each tag is affixed to only one product,and tags cannot be transferred between products. This is called thestapling problem [92].

Sensors can produce noisy data or malfunction. The hardwareof a sensor can also be modified when remote or physical accessis unauthenticated (or weakly authenticated as many sensors areconstrained devices). Probably the highest profile sensor attack(outside of blockchain) is Stuxnet [67]—malware that manipulatedthe vibration sensors, the valve control sensors, and the rotor speedsensors of Iran’s nuclear centrifuges, causing the system to quietlyfail [66].

4.2.3 Databases and application programming interfaces (APIs).The most common mechanism used by software to fetch data is touse an API to obtain the data directly from a centralized database. Adatabase is a set of tables that collect system events, while the APIis an interface with the database. For example, a financial exchangekeeps track of information in a database about every trade that hasbeen executed. A data source that needs the daily traded volume ofan asset could use the appropriate API of the exchange’s databaseto extract the data from the related table in the database.

An active attacker can attack the system from two points. Mod-ifying the data at rest in the database, or modifying the data intransit before and after the API call.

4.2.4 Smart Contracts. Smart contract could be used as a datasource similar to a database. Decentralized finance (DeFi) appli-cations on Ethereum include decentralized exchange services likeUniswap [1], or other oracles that operate on-chain. For instanceAPI3 oracle [9] uses other on-chain oracles, called dAPIs, as theirdata source. These oracles are whitelisted through voting by API3token holders.

Automated Market Makers (AMMs) [107] are an on-chain alter-native to centralized exchanges. Liquidity providers collateralize

3https://www.ncdc.noaa.gov/

the contract with an equally valued volume of two types of cryp-toassets. A mathematical rule governs how many assets of the onetype are needed to purchase assets of the other. A well-knownexample of such mathematical rule is the Constant Function MarketMakers (CFMM) to calculate the exchange rates of tokens in a sin-gle trade [95]. The idea behind AMM was first raised by Hanson’slogarithmic market scoring rule (LMSR) for prediction markets [56].A class of DeFi projects (e.g., Uniswap [1, 2] and Balancer [8]) usesCFMM to automate their market-making process. One of the utiliza-tions of AMM is the ability to measure the price of an asset in a fullydecentralized way, which addresses the pricing oracle problem [6].

One potential attack vector to the auto price discovery mecha-nism in an AMM is to manipulate prices provided by an algorithm,since the algorithmic rules used by an AMM is written in the smartcontract and therefore how prices are quoted by the AMM can becalculated in advance. One real case example on bZx is describedin Section 5.3.2. In addition to market manipulation, smart contractvulnerabilities [7, 24] could possibly be used to influence the datacoming from the Oracle, which we will discuss more in section 5.3.

4.3 Data FeedersData feeders are entities who gather and report the data from a datasource (Section 4.2) to the oracle system. A common configurationconsists of an external feeder which draws from off-chain datasources and deposit the data to an on-chain module. In case thedata source is already on the blockchain, the data feeder step canbe skipped.

It is not common to assume data feeders are fully honest, how-ever a variety of threat models exist. Generally, this module willnot attempt to determine if the data has been falsified (the latersections data selection (Section 4.4), data aggregation (Section 4.5)and dispute phase (Section 4.6) modules will deal with this issue);rather it will consist of tunnelling the data through the feederwith some useful security provisions. We discuss most importantsecurity provisions to achieve data integrity, confidentiality, andnon-repudiation on any specific data.

4.3.1 Source Authentication. Data integrity can be enhanced by au-thenticating the source of the data and ensuringmessage integrity ispreserved. It is sufficient to have the source sign the data, assumingthe source’s true signature verification (i.e., public) key is knownto the recipient of the data. This is most appropriate for sourceslike humans and sensors (although sensors may use a lightweightcryptographic alternative to expensive digital signatures [94]).

Databases, websites, and APIs typically support many crypto-graphic protocols, including the popular HTTPS (HTTP over SS-L/TLS) which adds server authentication and message integrity toHTTP data [27]. However HTTPS alone is typically not sufficient,as the message integrity it provides can only be verified by a clientthat connects to the server and engages in an interactive handshakeprotocol. This client cannot, for example, produce a transcript ofwhat occurred and show it to a third party (e.g., a smart contracton Ethereum) as proof that the message was not modified. To turnHTTPS data into signed data (or something similar), a trusted thirdparty can vouch that the data is as received. TLS notary [103]and DECO [116] offer solutions that attest for the authenticity of

Page 5: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market Manipulation AFT ’21, September 26–28, 2021, Arlington, VA, USA

HTTPS data. Town Crier [115] uses Trusted Execution Environ-ments (TEE) like Intel SGX [32] to push the trust assumption ontoTEE technology and, ultimately, the chip manufacturer.

4.3.2 Confidentiality. For many smart contracts that rely on ora-cles, the final data is made transparent (e.g., prices, weather, eventoutcomes). In a few cases, oracles feed data that is private (e.g.,identities, supply chain information) and the contract enforces anaccess structure of which entities under which circumstances canaccess it [75].

Confidentiality might also be temporary. Given the fact that in-formation submitted to the mempool is public, there is a natural riskon the oracle system that a data feeder uses another data feeder’sinformation to self-report to the system. This form of collusionbetween data feeders is called mirroring attack [41] in computersecurity literature. The data feeders are willing to freeload anotherdata feeder’s response to minimize their cost of data provision.They will also be confident that their data will not be an outlier andbe penalized. To mitigate the risk of mirroring attacks, the oracledesigner should consider mechanisms that ensure the confidential-ity of the data sent by the data feeders. A popular technique toachieve confidentiality is to use a commitment scheme [15]. In acommitment scheme, each data feeder should send a commitmentof the plain data as an encrypted message to the receiver. Later, thesender can reveal the original plain data and verify its authenticityusing the commitment.

4.3.3 Non-Repudiation. A non-repudiation mechanism assuresthat a party cannot deny the sender’s proposal after being sub-mitted to the system. Oracle systems might rely on cryptographicsignature schemes to eliminate the risk of in-transit corruption andto create irrefutable evidence of the data being provided by a source,for use in the dispute phase (Section 4.6) as needed.

4.4 Selection of Data FeedersIn order to ensure correct data is fed into the system, the designmust select legitimate data feeders and weed out less qualifiedand malicious participants. In a non-adversarial environment, thedesign might aggregate all the incoming data without any selection,skipping this step.

The earliest designs for oracle systems, such as Oraclizeit [10]and PriceGeth [44], were designed using just one single data feeder;however, to improve data quality and the degree of decentralization,more complex oracle systems such as ChainLink [41] involves se-lecting qualified data feeders to aggregate an output that is expectedto be more representative of the ground truth.

This process can be categorized into two main types: centralizedand decentralized selection, with decentralized selection havingmultiple approaches through voting and staking. Centralized selec-tion and decentralized selection through voting, create an allowlistof legitimate data feeders, in contrast to selecting based on thealgorithmic criteria in decentralized selection through staking.

4.4.1 Centralized (Allowlist) Selection. A centralized selection is apermissioned approach where a centralized entity selects a numberof data feeders directly without the involvement of other partici-pants in the network. A centralized selection is analogous to havingan allowlist for authorized data feeds (e.g.,Maker Oracle V1 [74]).

Compared to a decentralized approach, centralized selection is fastand direct. However the trust footprint on the central entity islarge: it must solely select legitimate data feeders and also havehigh availability to update the allowlist as needed.

4.4.2 Decentralized (Allowlist) Selection through Voting. By decen-tralizing the selection process, the goal is to distribute the trustfrom a single entity to a collective decentralized governance. Vot-ing distributes trust and provides a degree of robustness againstentities failing to participate, however it adds latency and intro-duces the threat that an actor can accumulate voting rights to swaythe vote [72], or even to do distrust and destroy the system (e.g.,Goldfinger attack [64]).

For instance, in Maker V2 oracle [74], the selection of the datafeeders is done through a decentralized governance process [54].MKR4 token holders vote on the number of authorized data feedersand who these data feeders can be [28].

Note that sometimes voting processes can provide the illusion ofdecentralization while not being much different than a centralizedprocess in practice. To illustrate, consider a project with a gover-nance token, in which most tokens are held by a few individualswhere the project leaders advocate for their preferences and there isno established venue for dissenting opinions. If voters only informthemselves from one source of information, that source becomes ade facto centralized decision maker.

4.4.3 Decentralized Selection through Staking. Like voting, stakingattempts to utilize a token to align the incentives of the participantswith the current functioning of the system. Mechanically, it worksdifferent: data feeders post collateral against the data they provide.In the dispute phase 4.6, any malicious data feeders will be punishedby losing a portion or all of their collateral (called slashing). Evenwithout slashing, the collateral amount acts as a barrier to entryfor participants and rate-limits participant.

The stake can be both in token value and reputation of the datafeeder. As an example, in Chainlink [41] protocol has a reputa-tion contract that keeps track of the accuracy of data reporting ofdifferent feeders. The ExplicitStaking module in Chainlink 2.0defines the number of Link tokens each oracle node must stake tobecome a data feeder, while the service agreement of the Chainlinkoracle defines the circumstances in which a node’s stake will beslashed [84]. Put together, the incentives for selected data feedersto act honestly are avoiding reputational loss, avoiding loss of stakeand penalty fees, and maintaining good standing for future income.In terms of selection, the data selection module forms a leaderboard,based on collateral and reputation, to select the highest ranked datafeeders from all available feeders.

Another approach, introduced by ASTRAEA [3], uses a combina-tion of game theory and collateralization between different actorsin the system (Voters and Certifiers) to achieve equilibrium on whatthe final data should be.

A staking-based selection module avoids a central trusted thirdparty, but it can add latency for adding/remove data feeds and otheradjustments. It is also open to sybil attacks by design, while workingto ensure these attacks have a significant cost for the adversary.

4MakerDAO Governance Token

Page 6: SoK: Oracles from the Ground Truth to Market Manipulation

AFT ’21, September 26–28, 2021, Arlington, VA, USA Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark

Category Example NoTru

sted Third Party

Lowlatency

Resilient to

SybilAttacks

Resilient to

Targetted D

oSAttacks

Incentivesare

Endogen

ous

Centralized Maker V1 Oracle • •Voting Maker V2 Oracle • •Staking Chainlink, ASTRAEA • • ◦ • •

Table 1: Evaluation Framework on selection of data feeders. For details see Section 4.4.4

One challenge for designing a staking mechanism is setting ahigh enough punishment (slashing) mechanism to thwart maliciousactions. Projects like UMA [104], another smart contract oracledesign, dynamically adjust staked collateral needed for each roundto ensure that Cost of Corruption (CoC) is higher than the projectedProfit from Corruption (PfC). Profit from Corruption is defined bythe data requester, in which UMA contracts require higher collateralto finalize the data from the data feeders. It is also important thatparticipants are incentivized to file correct disputes—ones that willultimately lead to identifying misbehaviour. If disputes are filedon-chain, the disputer will have to pay gas costs that need to beultimately reimbursed by the resolution process.

Decentralized selection is done by the holders of some scarcetoken, typically a governance token specific to the oracle service.The simplest decentralizedmechanism to hold a vote amongst tokenholders, who are indirectly incentivized (we call this an exogenousincentive) cast informed votes since they hold a token tied to thesuccess of the system (e.g., TruthCoin [99]). In a staking system,token holders are directly incentivized (a endogenous incentive) tovote ‘correctly’ (this remains to be defined but assume for now itmeans they vote in a way that will not be disputed) by postingsome amount of their tokens as a fidelity bond. Stakers stand tobe rewarded with new tokens and/or penalized (collateral slashed)depending on the performance of the data feeders they vote for.

Additionally a protocol could introduce a random selectionwithinthe data feeders to decrease the chance of sybil attacks. As an ex-ample Band Protocol [89], choses a random validator from top 100staked participants for their oracle system.

Another approach used by Tellor oracle [100] is a simple Proofof Work (PoW) algorithm for each round of data. The first 5 minersto submit their desired data alongside the solution to the miningpuzzle are selected as the data feeders of the round. The selectionis based on the hash power of each data feeder and randomnessnature of proof of work consensus.

4.4.4 Evaluation Framework on the selection of data feeders. Tocompare designs for data feeder selection, we provide an evaluationframework. The definition of each evaluation criteria (i.e., columnof the table) follows, specifying what it means to receive a full dot(•), partial dot (◦) or to not receive a dot.

No Trusted Third Party. A selection process that is distributed ordecentralized among several equally-powerful entities earns a full dot(•). A process that relies on a single entity for critical functions is notawarded a dot.

The voting and staking processes are decentralized amongstmultiple token holders (•). As the name implies, the centralizedprocess uses a trusted third party (no dot).

Low Latency. A selection process that can move from proposal tofinality within a single transaction is awarded a full dot (•). A processthat requires multiple rounds of communication or communicationamong several entities is not awarded a dot.

The centralized process can make selection decisions unilaterally(•). The voting process involves a round of communication withall of the participants (no dot). The staking process draws feedersunilaterally from an established leaderboard (•).

Resilient to Sybil Attacks. A selection process that only allowsunique feeders to participate is awarded a full dot (•). The evaluationdoes not consider what specific method is used to determine entities areunique but assumes it works reasonably well (not strictly infallible).A process that is open to multiple fake feeders controlled by the sameadversary is awarded a partial dot (◦) if each additional feeder createdby the adversary has a material financial cost. If there is no materialcost to creating additional fake feeders, the process receives no dot.

The centralized process manages an allowlist based on real worldreputations. We assume this reasonably prevents sybils (•). Thestaking process admits sybils but deters them by requiring stakedtokens for each, which is costly (◦). The voting process does notdeter sybils from entering the election but relies instead on thevoting process to not select them (no dot).

Resilient to Targeted Denial of Service Attacks. A selection processthat only halts when multiple entities to go offline or fail is awardeda full dot (•). If critical functionalities cannot be performed with thefailure of a single entity, but the basic selection process can proceed,it is awarded a partial dot (◦). If the process can be fully halted bythe failure of a single entity, it is awarded no dot.

The voting and staking processes can proceed until enough hon-est participants fail that a dishonest majority remains (•). By con-trast, a failure with the central entity in a centralized process canprevent critical functionalities, like updating the allowlist (◦).

Page 7: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market Manipulation AFT ’21, September 26–28, 2021, Arlington, VA, USA

Incentives are Endogenous. Every selection process should have theability to remove untruthful feeders. Some selection processes mightgo beyond this and incentivize feeders to provide truthful information.Processes are awarded a full dot (•) if the awards/punishments can berealized by the selection process itself. If the selection process relies onlyon external incentives (e.g., damage to reputation), it is awarded nodot. The evaluation does not consider how information is determinedto be truthful or not. Endogenous means the design is simpler but doesnot imply it is more secure (cf. [50]).

The staking process requires feeders to post collateral that canbe taken (i.e., slashed) for malicious behavior (•). Centralized andvoting processes do not use internal incentives (no dot).

4.5 Aggregation of Data FeedsAggregation is the process of synthesizing the selected data feedsinto one single output. The quality of the output depends on the datafeed selection (see Section 4.4) and the aggregation process used.To highlight the importance of designing an aggregation methodcorrectly, consider the case of Synthetix, a trading platform [98] thatused the average (or mean) of two data feeders as their aggregationmethod. An attacker leveraged this to manipulate one of the twofeeders by inflating the real price by 1000x. Mean aggregation ishighly sensitive to outlier data and the attack resulted in Synthetix’sloss of several million dollars [97].

4.5.1 Statistical Measures. The three core statistics for aggregationare mean, median and mode. Many oracle systems use the medianas the aggregated output, by selecting the middle entry of a listof ordinal data inputs. Unlike the mean, the median is not skewedby outliers, although it assumes the inputs have an appropriatestatistical distribution where the median is a representative statisticfor the underlying ground-truth value. For example, if we believedata from the feeders is normally distributed with possible outliers,the median is appropriate. However if we believe it is bi-modallydistributed, then discretizing and computing the mode (most com-mon value) of the data is more appropriate. The mode is usefulfor non-numeric data (and nominal numbers). An approximationto the mode is picking a data input at random, however accessto randomness from a smart contract is a well-documented chal-lenge [18, 21, 23]. Oracle projects like Chainlink do not prescribe afixed aggregation method and let the data requesters select one.

To improve the quality of simple statistics such as the medianand the mode, weights can be applied in the calculation. For in-stance, to mitigate manipulation of price data, one can choose touse time-weighted average price (TWAP) [105], or liquidity volume,or both [2]. Typically, the liquidity and trading volume of a marketcorrelates with the quality of the price data. To illustrate, UniswapV2 uses TWAP over several blocks (e.g., mean price in the last10 blocks) to reduce the possibility of market manipulation in asingle block (e.g., via flash loans [90]). In Uniswap V3, TWAP isoptimized for more detailed queries including the liquidity volumeand allowing users to compute the geometric mean TWAP [2].

4.5.2 Stale Data. Some use cases require frequent updates to data,such as weather data and asset prices. Stale data can be seen as validdata and pass the selection criteria, but it will reduce the aggregateddata quality. Projects like Chainlink rank feeders based on historic

timeliness. A naive approach ignores this issue and always usesthe last submitted data of a data feeder even if the data feeder hasnot updated its price for some specific period. This approach isproblematic if the underlying data is expected to change frequently.An example occurred on Black Thursday 2020 [108] to MakerDaowhen Maker’s data feeders could not update their feeds becauseof very high network congestion. After a significant delay in time,feeds were updated. The price had shifted by a large amount andthe reported data jumped, leading to sudden, massive liquidationsthat were not adequately auctioned off.

4.6 Dispute PhaseThe dispute phase is used to safeguard the quality of the final outputand give the stakeholders a chance to mitigate inclusion of wrongdata. Dispute resolution can be an independent module after theaggregation phase or it can be implemented at any other oraclemodule (e.g., at the end of every aggregation 4.5 or data feeder selec-tion 4.4). Most oracle systems do dispute resolution internally, butmarket specialization has produced firms that provide outsourceddispute resolution as service (e.g., Kleros [68]). To systemize thelandscape, we first distinguish between systems that aim to detect(and remove) bad data providers and systems that vet the data itself.We then iterate how data is determined to be valid or invalid forthe purposes of a dispute. Finally, we illustrate the consequences ofa successful dispute: what happens to the disputed data and whathappens to its provider.

4.6.1 Provider-level and Data-level Vetting. Dispute resolution canbe provider-oriented or data-oriented. Under a provider-orientedregime, the focus is on selecting honest data providers and usingdisputes to remove data providers from serving as oracles in the fu-ture. In the optimistic case that providers are honest, oracle data isavailable immediately, however if an honest provider is corrupted, itwill have a window of opportunity to provide malicious data beforebeing excluded. One illustration of a provider-oriented system isoperating a centralized allowlist of data providers (e.g., MakerDAOv2) where providers can be removed. Chainlink [41] strives to de-centralize this functionality, where a reputation-based leaderboardreplaces the allowlist.

In a data-oriented regime, the focus is vetting the data itself.This can result in a slower system as oracle data is staged for adispute period before it is finalized, however it can also correctfalse data (not merely remove the corrupted data feeder from futuresubmissions). One illustration of a data-oriented system is Tellor [31,100], where data is staged for 24 hours before finalization. If itis disputed, a period of up to 7 days is implemented to resolvethe dispute. It is also possible that a system allows the resolutionitself to be further disputed with one or more additional rounds. InAugur [87] for instance, the dispute step may happen in one round(takes maximum 1 day) or may contain other rounds of disputesthat can last more than 7 days.

4.6.2 Determining the Truth. In the optimistic case, an oracle sys-tem will feed and finalize truthful data, while disputes enable re-course for incorrect data. However disputes also introduce thepossibility of two types of errors.

Page 8: SoK: Oracles from the Ground Truth to Market Manipulation

AFT ’21, September 26–28, 2021, Arlington, VA, USA Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark

No Disputes DisputedData is correct Correct False PositiveData is incorrect False Negative Correct

Dispute resolution in oracle systems focus on false positives.Incentivizing the discovery of false positives is present in somestaking-based systems, however false negatives are not otherwisedealt with. In order to resolve a false positive, correct data mustbe used as a reference but, of course, if correct data is availableas a reference, then it could replace the entire oracle system. Thatleaves two reasons for why an oracle systemmight still exist: (a) thereference for correct data is too expensive to consult on a regularbasis, or (b) there is no reference for correct data and it must beapproximated.

If feeders are placed on an allowlist by a trusted party, disputescould be filed with the trusted party and manually verified. As far aswe know, this is the only example of (a), although (a) is the basis forother blockchain-based dispute resolution protocols like optimisticroll-ups [62]. The rest of the truth discovery mechanisms are basedon (b) approximating the truth.

A statistical approach is selecting, from a set of values proposedby different feeders, the median of the values (e.g., appropriatelydistributed continuous numerical data) or the mode (e.g., non-continuous or non-numerical data). It is possible to augment thisapproach by having feeders stake collateral in some cryptocurrency(e.g., a governance token for the oracle project), and this collateralis taken (slashed) from the feeder if their data deviates from themedian by some threshold. If the amount slashed is payed, in partor in full, to the entity that filed and/or supported a dispute on thedata, this incentivizes feeders to help reduce false negative errorsin addition to false positives. One challenge is setting an accept-able threshold for slashing. A large threshold tolerates moderatelyincorrect data without punishment, while a small threshold couldpunish data feeders that are generally honest but faulty, slow, orreporting on highly volatile data.

If a governance token exists for the oracle project, a related ap-proach is to introduce voting on disputed data by any token holder,and not limit the decision to just the feeders. In Augur [87] and AS-TRAEA [3], disputers vote to change the tentative outcome becausethey believe that outcome is false. Voting occurs over a window oftime which extends the time to resolve disputes. By comparison,statistical mechanisms can be applied automatically and nearlyinstantly after the data is aggregated. However voting incorpo-rates human judgement which might produce better outcomes innuanced situations.

One final truth discovery mechanism is arbitrage which is ap-plicable in the narrow category of exchange rates between twoon-chain tokens. This can be illustrated by the NEST oracle [81]where data feeders assert the correct exchange rate between twotokens by offering a minimum amount of both tokens at this rate(e.g., 10 ETH and 39,000 USDT for a rate of ETH/USDT = 3900). Ifthe rate is incorrect, other participants will be given an arbitrageopportunity to buy/sell ETH at this rate, an action that can correctthe price. This is very similar to drawing a price from an on-chainexchange, like Uniswap, and suffers from he same issue: an adver-sary can manipulate the oracle price by spending money. It is securewhen the Cost of Corruption (CoC) is greater than the Profit fromCorruption (PfC), however PfC can never be adequately accounted

for because profit can come from extraneous (extra-Ethereum) fac-tors [50]. The UMA [65] oracle system has data feeders providetheir own PfC estimates for the data they provide.

4.6.3 Consequences for Incorrect Data. We now consider the conse-quences for disputed data that has been determined to be incorrect.In provider-oriented dispute resolution, incorrect data has conse-quences for the data feeder (see next subsection) but not the dataitself. By the time the dispute is resolved, it is too late to change thedata itself.

In data-oriented dispute resolution, data that has been deemedincorrect can either be reverted or corrected. Reversion means theoutcome result will be annulled and the system should start fromscratch to obtain new data, while corrected data will reflect a newundisputed value. The difference between the two is essentially inthe complexity of the dispute resolution system. For reversion, acollective decision is taken to accept or reject data — a binary optionthat is known in advanced. By contrast, correcting data requiresnew data to be proposed and then a collective decision to be madeon all the proposals which is more complex but does not avoidsrerunning the oracle workflow.

These differences also impact finality: when should oracle databe considered usable? Dispute periods, re-running the workflow,and allowing resolved disputes to be further disputed can all intro-duce delays. To illustrate, consider Augur [87] which implements aprediction market on binary events. Any observer with an objectionto a tentative outcome can start a dispute round by staking REP(Augur’s native token) on the opposite outcome. Dispute windowsare 24 hours and then extended to 7 days for disputes on disputes. Ifthe total staked amount exceeds 2.5% of all REP tokens, the marketenters a 60-day settlement phase called a fork window when allREP holders are obliged to stake on the final outcome.

4.6.4 Consequences for Data Feeder. If data has been deemed in-correct through disputes or rejected for being an outlier, the feederwho provided the data might face consequences like being banned,slashed, or suffering reputational loss. It is also possible that thereis no consequence for the feeder other than the data being discarded.For example, in a sensor network, results from faulty sensors couldhave their data filtered out but continue to contribute data in ex-pectation that they will be repaired in the future.

In oracle designs based on allowlists, a feeder could be bannedor temporarily suspended for providing incorrect data. For disputeresolution based on staking, a feed could suffer economic loss byhaving their stake taken from them. It is important to reiteratethat this economic loss does not necessarily outweigh the utility ofattempting to corrupt oracle data. The profit from corruption de-pends on where the data is being used, which could be within largersystem than the blockchain itself [50]. Finally, a feeder might sufferreputational loss for providing incorrect data. One can imagine thiswould be the case if, for example, the Associated Press misreportedthe outcome of the 2020 US Presidential election after announcingthat it would serve as an oracle for this event on Ethereum.

Another illustration of these options is Chainlink, which main-tains a decentralized analogue to a leaderboard where feeders areranked according to the amount of LINK (Chainlink’s token) theystake, as well as their past behavior in providing data that is timelyand found to be correct. Data feeders with the outlier data will be

Page 9: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market Manipulation AFT ’21, September 26–28, 2021, Arlington, VA, USA

punished by losing their collateralized LINK tokens and reducingtheir reputation score on the reputation registry. The lost of tokensis a direct cost, while the loss of reputation could impact their futurerevenue.

4.7 Classification of Current Oracle ProjectsIn Table 2, we present a classification of several oracle implementa-tions using the modular framework described in this section. Thistable showcases a wide variety of approaches, as well as somespecialization on specific modules (e.g., TownCrier and Deco ondata source and Kleros on dispute resolution). We caution thatblockchain projects can change how they work very quickly, newprojects will emerge, and current projects will be abandoned. Ta-ble 2 has a limited shelf-life of usefulness, however the workflowitself (modules, sub-modules, and design choices) is based on gen-eral principles and intended to have long-lasting usefulness.

5 INTERACTINGWITH THE BLOCKCHAINWhile the initial inputs to an oracle are generally off-chain (with theexception of pulling data from another smart contract) and the finaloutput is by definition on-chain, the oracle designer will chooseto implement the intermediary modules—data feeder selection, ag-gregation and dispute resolution—as either off-chain or on-chain.Generally, on-chain modules are preferred for transparency andimmutability, while off-chain modules are preferred for lower costsand greater scalability.

To illustrate, Chainlink and NEST Protocols were ranked #5and #7 respectively in gas usage among all DApps on Ethereum.5This ranking was achieved mainly because they implement allmodules fully on-chain. Later, Chainlink implemented an off-chainreporting (OCR) protocol [17] with the goal of reducing the gas costsassociated with on-chain transactions. This protocol uses digitalsignatures to authenticate feeders and a standard (e.g., Byzantinefault tolerant [20]) consensus protocol between Chainlink nodes.

At some point, an oracle system must move on-chain and startinteracting with the underlying blockchain. We assume for thepurpose of illustration that Ethereum is the blockchain being used.Data flow from an off-chain module to a smart contract involvesthe following three components which we detail in this section.

5.1 Off-chain Infrastructure: Assuming at least one moduleis off-chain, an infrastructure is required to monitor requestsfor oracle data from the blockchain, gather the data from thedata sources, implement a communication network betweendata feeders, and create a final transaction to be sent to theblockchain infrastructure.

5.2 Blockchain Infrastructure: Off-chain infrastructure willpass the data as a transaction to blockchain nodes, whichrelay transactions and use a consensus algorithm agree onnew blocks. The nodes run by miners are discussed in partic-ular as they dictate the order of transactions in every blockthey mine.

5.3 Smart Contracts: The transaction triggers a state changein a smart contract on the blockchain, typically a contract

5Based on Huobi DeFiLabs Insight on September 2020 [61]

owned by the oracle which is accessible from all other con-tracts. Alternatively, the oracle could write directly into adata consumer’s contract (called a callback).

5.1 Off-chain InfrastructureDepending on the oracle design, there can be different types ofoff-chain infrastructure. If financial data is pulled from Uniswap’soracle [105], there is no off-chain infrastructure needed becausethe oracle is already a fully on-chain oracle. For other applications,off-chain infrastructure could consist of a single server (e.g., Town-Crier [115]) or many nodes that intercommunicate through theirown consensus protocol (e.g., Chainlink OCR [17]). Availability andDOS-resistance [96] are core requirements of off-chain infrastruc-ture, specially in oracle systems working with time-sensitive dataand high update frequency. In this section we describe differentpossible components of the off-chain infrastructure.

5.1.1 Monitoring the Blockchain. For oracles that are capable ofreturning a custom data request made on-chain (called request-response oracles), every data feeder needs to monitor the oracle’ssmart contract for data requests. The common implementationconsists of a server subscribing to a blockchain node for specificevents.

5.1.2 Connection to Data Source. The data feeder requires a con-nection to the data source 4.2 to fetch the desired data. This con-nection can be an entry point for an adversary to manipulate thedata however it is possible to mitigate this issue by integrating mes-sage authentication (recall source authentication in Section 4.3.1).Examples include relaying HTTPS data (e.g., Provable [10] viaTLSNotary [103]) or from trusted hardware enclaves (e.g., Town-Crier [115] via Intel SGX [32]). Vulnerabilities with the web-serveror SGX itself [16] are still possible attack vectors.

5.1.3 Data Feeders Network. In order to increase the scalabilityof the oracle network, multiple data feeders might aggregate theirdata off-chain (e.g., Chainlink OCR [17]). In OCR, a leader is chosenfrom the participants to gather signed data points from other nodes.Once consensus is achieved on the aggregated set of data, thefinalized data, accompanied by the signatures, is transmitted to theblockchain node. This reduces the costs as only one transaction issent to the blockchain, while maintaining similar security as havingeach chainlink nodes send the data themselves.

Like any network system, availability is essential to the operationof the oracle. To illustrate, in December 2020, MakerDAO’s oracleV2 had an outage due to a bug in their peer-to-peer data feedernetwork stack [51]. We do not summarize all the literature on peer-to-peer network attacks, but denial-of-service attacks [110] andsybil-attacks [40] are critical to mitigate to ensure the availabilityof the network and the oracle.

5.1.4 Transaction Creation. In order to submit data to a blockchain,the data feeder is required to construct a valid blockchain trans-action that includes the requested data. This transaction must besigned with the data feeder’s private key to be validated and au-thenticated on-chain. The data feeders must protect the signingkeys from theft and loss [43], as this key can be used to impersonatethe oracle.

Page 10: SoK: Oracles from the Ground Truth to Market Manipulation

AFT ’21, September 26–28, 2021, Arlington, VA, USA Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark

DataSource

Selection

Mecha

nism

Staking

Aggregation

Mecha

nism

Provider/DataVettin

g

Determining theTru

th

Consequences

(Slash, Ban, Loss)

Oracle Data Feeder Dispute

ChainLink [41] API Reputation,Staking • Statistical

Measure P StatisticalMeasure S

UMA [104] Human, API FCFS† • × D Staking S

Augur [87] Human SingleSource★ • × D Voting S

Uniswap [105] Smart Contract × × TWAP × × ×

MakerDAO V1 [74] Human, API CentralizedAllowlist × Median × × ×

MakerDAO V2 [74] Human, API DecentralizedAllowlist × Median P Voting B

NEST [81] Human × • ×★★ D Arbitrage L

Band protocol [89] API RandomSelection • Statistical

Measure P Staking S

Tellor [31] Human, API PoW • Median P Staking SB

ASTRAEA [3]TruthCoin [99] Human Staking • Mode D Voting S

Provable [10]PriceGeth [44] API × × × × × ×

DIA Oracle [38] API,Smart Contract × × × D Staking B

DECO [116]TownCrier [115] HTTPS × × × × × ×

API3 [9] \w Kleros [68] Oracles DecentralizedAllowlist • Statistical

Measure P Voting SB

Table 2: A classification of the existing oracle implementations using the modular framework described in Section 4.• indicates the properties (columns) are implemented in the corresponding oracle (rows), and × indicates the property is not applicable.† First Come First Serve★The Market Creator assigns the designated reporter ★★The series of reported prices will be sent to requester without aggregation (See 4.6.1)

Transactions compete for inclusion in the next block by offeringdifferent levels of transaction fees, known as the gas fee in Ethereum.In time-sensitive oracle applications, the relay must specify anappropriate amount of gas according to market conditions. Forinstance, on ‘Black Thursday’ in March 2020 [108], the Ethereumnetwork was congested by high fee transactions and some oraclesfailed to adjust their price feed. Tomitigate this problem, themodulewhich is responsible for creating the final transaction must have adynamic gas mechanism for situations where gas prices are rapidlyclimbing. In this case, pending transactions must be canceled, andnew ones must be generated with higher gas price, which may takea few iterations to get in. Dynamic fees depend directly on thenetwork state and require a connection to the blockchain node toestimate the adequate gas price.

In addition, the data feeder’s sending address on the blockchainmust have sufficient funds to be able to pay the estimated gasprice. It is crucial for the availability of the oracle that the data

feeders monitor their account balance as spam attacks might draintheir reserves with high gas fees, as happened to nine Chainlinkoperators in September 2020 [33].

5.2 Blockchain InfrastructureIn this section, we discuss the blockchain infrastructure that isrequired by any entity interacting with the blockchain. While thisinfrastructure is not specific to oracles, we illustrate key points thatcan impact oracle availability.

5.2.1 Blockchain Node. A blockchain node relays transactions tothe other nodes in the network for inclusion in the blockchain. Thenode is responsible for storing, verifying, and syncing blockchaindata. The availability of nodes is very important for the oraclesystem, as a blocked node cannot send transactions. Extensiveresearch on network partitioning attacks apply to decentralizednetworks, with the main objective of surrounding an honest nodes

Page 11: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market Manipulation AFT ’21, September 26–28, 2021, Arlington, VA, USA

with the malicious nodes [57, 58, 82, 106, 117]. This results in thenode believing it is connected to the blockchain network when it isnot.

5.2.2 Block Creation. Transactions that have been circulated tothe blockchain network are stored in each node’s mempool. Miningnodes select transactions from their mempool according to theirpriorities (e.g., by highest gas price as in Geth [46], while respectingnonces). Front-running attacks [35, 45] try to manipulate how min-ers sequence transactions. For example, someone might observe anunconfirmed oracle transaction in the mempool, craft a transactionthat profits from knowing what the oracle data will be, and attemptto have this transaction confirmed before the oracle transactionitself (called an insertion attack [45]). This might be conducted bythe miner themself. In this case, it is called transaction reordering,and the profit miners stand to make from doing this is termedMiner Extractable Value (MEV) [35]). Other nodes or users on thenetwork who can act quickly and offer high fees can also conductfront-running attacks. Users might also attempt a bulk displacementattack [45] that fills the consecutive blocks completely to delayreported data from oracles. There could be a profit motive for thisattack if the oracle data becomes expired, or if the data feeder’scollateral is slashed and redistributed to the attacker.

Research on MEV (e.g., Flashbots [49]) has shown the possibil-ity of new type of attacks based on reordering the transactions,such that if there’s a high profit for changing the order of sometransactions in a (few) blocks, miner is incentivize to use his hashrate to perform a reorganization attack6 [69], and profit from theexecution of the newly ordered transactions. For instance, Uniswapuses the last price in a block to determine the average price (TWAP),in which a miner can add new trades while reordering the pasttrades with the goal of manipulating the price average to profit onother applications that uses Uniswap as price oracle.

5.2.3 Consensus. The goal of the consensus algorithm used in theblockchain is to verify and append the next block of transactionsto the blockchain. If the nodes do not come to agreement on a statechange, a fork in the network happens with different nodes tryingto finalize different forks of the blockchain. Given the network isdecentralized, short-lived forks happens frequently in the networkthat generally are resolved within a few blocks [83]. All valid trans-actions in the abandoned fork will eventually be mined in the mainchain, likely in a new order (called reorganization or a reorg).

A reorg opens the possibility of attacks by using known, uncon-firmed, transactions from the abandoned fork. To illustrate, considerEtheroll [47], an on-chain gambling game where users bet by send-ing a number that payouts if it is smaller than a random numberdetermined by an oracle. To prevent front-running from the mem-pool, the Etheroll oracle would only respond when a bet was ina block. Despite this mitigation, in April 2020, the Etheroll teamdetected an ongoing front-running attack on their platform [42].The attacker was betting rigorously and waiting for small forks—collected by Ethereum in uncle blocks—where the original bet andoracle’s random number response were temporarily discarded bythe reorg. The attacker would place a winning bet with a high feeto front-run the original bet and eventual inclusion of the oracle’s

6Also referred to as Time-bandit attacks [35]

transaction in the reorganized chain. A general principle of thisattack is that even if oracle data bypasses the mempool and is in-corporated directly by miners, front-running through reorgs is stillpossible.

There are two solutions to front-running through reorgs. Thefirst is to delay the settlement of the bet by a few blocks to preventissues caused by small reorganization forks. The second is to in-corporate a hash of the request (e.g., request-id) in the response toprevents the request (e.g., bet) from being swapped out once theresponse (e.g., random number) is known.

Other consensus attacks [12, 53, 58] exist but are less related tooracles. We omit discussion of them.

5.3 Smart ContractsAlthough oracles are usually designed to be the source of truth foron-chain smart contracts, some smart contracts can also be usedas oracles by others even though they were not designed with theoracle use-case in mind. To expand this idea, oracles could be a ’anend in itself’, which is to say they are designed specifically to beused as a source of truth. These oracles fetch the data from externalsources(4.2) and make it available on-chain (e.g., PriceGeth [44]).

By contrast, a means to an end oracle is a contract that producesuseful data as a byproduct of what it is otherwise doing. Exam-ples are on-chain markets and exchanges like Uniswap and otherautomated market makers (AMMs). The markets are designed forfacilitating trades but provide pricing information (price discovery)that can be used by other contracts (e.g., margin trading platforms)as their source of truth.

In this section we dive deeper in the relationship between theoracle’s smart contract and the data consumer smart contract. Westart by defining possible interaction models, and then discussspecific issues related to the oracle’s contract and the consumer’scontract.

5.3.1 Oracle Interaction Models. A distinction in the oracle designis whether the interaction between with the consumer’s contract isimplemented as a feed, a request-response, or the related subscribe-response.

A Feed is a smart contract system that publishes the data forothers to use. It does not require any requests to fetch the data andusing an interval to update the data on its smart contract (e.g.,MakerDAO Oracle [74]). From a technical aspect, in order to use a feedoracle, the data consumer smart contract only needs to query theoracle’s smart contract and no additional transactions are needed.

The Request-Response model is similar to a client-server APIrequest on traditional web development. The requester must senda request to the oracle’s smart contract, which then is picked upby the off-chain module of the oracle to fetch the requested datafrom the data source. The data is then encapsulated in a transactionand sent back to the data requester smart contract through theoracle’s smart contract. Due to the nature of this design, at leasttwo transactions are needed to complete the work flow, one fromthe requester and another for the responder.

The Subscribe-Response model is similar to Request-Responsewith one main difference, the request does not need to be in atransaction. If there is pre-arranged agreement, the oracle willwatch for emitted events from the requester smart contract and

Page 12: SoK: Oracles from the Ground Truth to Market Manipulation

AFT ’21, September 26–28, 2021, Arlington, VA, USA Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark

respond to the requests. Alternatively, the requester is allowed toread the feed through an off-chain agreement (e.g., API3 [9]).

5.3.2 Oracle’s Smart Contract. In the oracle designs that implementsome of the modules on-chain, the oracle’s smart contract couldinclude data feeder selection (Section 4.4), aggregation (Section 4.5),and dispute resolution (Section 4.6). In addition to these modules,the oracle’s smart contract can be used as the data feed storage forother smart contracts to read from, or to authenticate the oracle’sresponse on the consumer smart contract. In the feed model, theoracle’s smart contract is where the consumer fetches the oracledata from. In the Request-Response model, the data consumer smartcontract (defined below in Section 5.3.3) requires knowledge ofthe oracle’s smart contract’s address in advance, for the initialrequest and also verification of the oracle’s response. For the rest ofthis section, we discussion potential attacks on the oracle’s smartcontract.

Implementation Flaws. There are many known smart contractvulnerabilities that have been extensively discussed [24, 30] andpossibly could affect the legitimacy of the oracle system.

In many DeFi projects, a common design pattern is to use on-chain markets, such as Uniswap, for the price oracle, however, thesesystems were not designed to be used as oracles and are prone tomarket manipulation. The end result is that currently, the mostprevalent attack vector in DeFi is oracle manipulation [34]. Toillustrate this attack, consider the lending (and margin trading)platform bZx. It fetched prices from KyberSwap, a decentralizedexchange, to calculate the amount of collateral of one cryptoasset isneeded to back the loan of a different asset. In one attack on bZx [85],the attacker used a flash loan to manipulate KyberSwap’s sUSD/ETHexchange rate. The attacker then borrowed ETH with insufficientcollateral because the bZx contract believed the collateralized sUSDwas worth much more than it actually was. When the attackerabsconded with the borrowed ETH, forgoing its collateral, and thenunwound its other positions and repaid the flash loan, it profitedat bZx’s expense. Arguably bZx (the data consumer) is the flawedcontract but the ease in which KyberSwap (the oracle contract)could be manipulated was not well understood at the time either.In reaction, decentralized exchanges embraced their role as a priceoracle and hardened themselves against price manipulation byusing aggregation methods like the Time-Weighted Average Price(TWAP) (described in Section 4.5).

Governance. In order to remove the centralization of control inmany DeFi projects, a governance model is introduced that uses anative token for voting and staking. The governance model for anoracle could propose, vote, and finalize changes to system variableslike the approved data feeders on the oracle’s allowlist or variousfees.

While a decentralized governance model removes the trust ina central entity, it does not remove the possibility of a wealthyentity (a whale) taking control of the system by accumulating (orborrowing [90]) enough tokens to pass their proposals. In addition,logical issues in the governance implementation could result intricking the voters into approving a proposal that has maliciousconsequences [80].

As an example, in the MakerDAO platform, MKR token holderscan vote to change parameters related toMaker’s oraclemodule [74].An attacker in October 2020, used a flash loan to borrow enoughMKR tokens to pass a governance proposal, aimed to change the listof consumer smart contracts and obtain read access to the Maker’soracle [72]. It could be more dangerous if the attacker planed tochange the other parameters of the oracle such as Whitelisted datafeeders or bar parameter : the sufficient number of data feeders fordata feeder selection module. Potentially an attacker may pay abribe to the MKR holders to buy their votes, or use a DecentralizedAutonomous Organization (DAO) to pay for the votes without havingownership of the tokens [36].

5.3.3 Data Consumer Smart Contract. The final point in the oracleworkflow is the smart contract that needs the data for its businesslogic. Aside from any possible code vulnerabilities in this smartcontract, there are common implementation patterns concerningthe oracle workflow.

In the feed model, the data consumer smart contract relies onoracles to fetch the required data in order to function as intended.It is essential to use oracles with multiple data feeders and a properaggregation methods. To illustrate the importance, consider thelending service Compound [29] which initially only used CoinbasePro as their data feeder without any aggregation mechanisms [37].In November 2020, a faulty price feed on Coinbase Pro, resultedin undercollarization of Compound loans and a liquidation of $89million dollars of the collateral. This could have been prevented byusing an oracle with sufficient data feeders and a proper aggregationmechanism.

Due to the commonality of this issue, there has been someEthereum Improvement Proposals (EIPs) to standardize the in-terface of the oracles implementing a feed (e.g., EIP-2362 [101]).An interface would allow data consumer smart contracts to easilyswitch between feeds or use multiple oracle feeds in their logic.

In the request-response model, the data consumer smart contractsends a request for specific data to the oracle’s smart contract. Insome projects this request contains more information like the datafeeder selection method, aggregation algorithm and parametersfor dispute phase (e.g., Service Level Agreement in Chainlink). It iscrucial that the data consumer smart contract, verifies the authen-ticity of the oracle response. Failure to verify the oracle’s responsecould result in malicious data injection in the data consumer smartcontract. To illustrate, the insurance service Nexus Mutual [80]implemented an oracle’s response function (or callback) withoutany proper access control. This opened the possibility of unautho-rized entities providing data updates which would be wrongfullyassumed to have originated from the oracle’s smart contract.

6 CONCLUDING REMARKSIn this paper, we described a specialized modular framework toanalyze oracles. After our systematization, we present the followingdiscussion points and lessons learned from our work.

(1) Many oracles projects introduce their own governance to-kens that are used to secure the oracle system (e.g., throughstaking). Two conditions seem necessary: the market capital-ization of the token stays material and the token is evenly dis-tributed. More consideration should be given to leveraging

Page 13: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market Manipulation AFT ’21, September 26–28, 2021, Arlington, VA, USA

an existing token with these properties (even a non-oracletoken) instead of creating new specialized tokens [19]. Alsoa collapse in the value of the governance token threatens theentire system.

(2) Oracle systems with on-chain modules are expensive torun on public blockchains like Ethereum, which prices outcertain use-cases that consume a lot of oracle data but donot generate proportional amount of revenue (e.g., Weatherdata).

(3) Diversity in software promotes resilience in the system. Ifthe oracle market coalesces behind a single project, a failurewithin this project could cause cascading failures across DeFiand other blockchain applications.

(4) While determining the profit from corrupting the oracle is apromising approach to thwarting manipulation (by ensuringthe cost of corruption is greater), one can never capture thefull extent of the potential profit. Attackers can profit outsideof Ethereum by attacking oracles on Ethereum [50].

In summary of this paper, the framework we present facilitatesa modular approach in evaluating the security of any oracle designand its associated components that exist today or to be implementedin the future. As an example, the level of centralization can be mea-sured using choke points such as aggregation 4.5, or how the data isproceeded to the blockchain 5.1. In order to design a secure oracle,all modules must be rigorously stress tested to make sure it cannotbe gamed by participants or malicious actors. In addition, manysecurity auditors and analysis tools are specialized in detectingoracle-related attacks through code review of the smart contracts.Specially with the rise of DeFi smart contracts, the importanceof a secure oracle system remain a paramount component of thedecentralized blockchain ecosystem.

ACKNOWLEDGMENTSJ. Clark acknowledges support for this research project from (i) TheChaire Fintech: AMF – Finance Montréal, (ii) National Sciences andEngineering Research Council (NSERC) through the NSERC, Ray-mond Chabot Grant Thornton, and Catallaxy Industrial ResearchChair in Blockchain Technologies (https://www.nserc-crsng.gc.ca/Chairholders-TitulairesDeChaire/Chairholder-Titulaire_eng.asp?pid=1045), and (iii) an NSERC Discovery Grant .

REFERENCES[1] Hayden Adams. 2019. Uniswap. URl: https://uniswap.org/docs (2019).[2] Hayden Adams, Noah Zinsmeister, Moody Salem, River Keefer, and Dan Robin-

son. 2021. Uniswap v3 Core. (2021).[3] John Adler, Ryan Berryhill, Andreas Veneris, Zissis Poulos, Neil Veira, and

Anastasia Kastania. 2018. Astraea: A decentralized blockchain oracle. In 2018IEEE International Conference on Internet of Things (iThings) and IEEE GreenComputing and Communications (GreenCom) and IEEE Cyber, Physical and SocialComputing (CPSCom) and IEEE Smart Data (SmartData). IEEE, 1145–1152.

[4] Hamda Al-Breiki, Muhammad Habib Ur Rehman, Khaled Salah, and DavorSvetinovic. 2020. Trustworthy Blockchain Oracles: Review, Comparison, andOpen Research Challenges. IEEE Access 8 (2020), 85675–85685.

[5] Mojtaba Alizadeh, Mazdak Zamani, Ali Rafiei Shahemabad, Jafar Shayan, andAhmad Azarnik. 2012. A survey on attacks in RFID networks. Open InternationalJournal of Informatics (OIJI) 1, 1 (2012), 15–24.

[6] Guillermo Angeris and Tarun Chitra. 2020. Improved Price Oracles: ConstantFunction Market Makers. arXiv preprint arXiv:2003.10001 (2020).

[7] Nicola Atzei, Massimo Bartoletti, and Tiziana Cimoli. 2017. A survey of attackson ethereum smart contracts (sok). In International conference on principles ofsecurity and trust. Springer, 164–186.

[8] Balancer. 2020. A protocol for programmable liquidity. URl:https://balancer.finance/ (2020).

[9] Burak Benligiray, Saša Milic, and Heikki Vänttinen. [n.d.]. Decentralized APIsfor Web 3.0. ([n. d.]).

[10] T Bernani. 2016. Oraclize. URl: https://www.oraclize.it/ (2016).[11] Alex Biryukov, Dmitry Khovratovich, and Sergei Tikhomirov. 2017. Findel: Se-

cure derivative contracts for Ethereum. In International Conference on FinancialCryptography and Data Security. Springer, 453–467.

[12] George Bissias, Brian Neil Levine, A Pinar Ozisik, and Gavin Andresen. 2016.An analysis of attacks on blockchain consensus. arXiv preprint arXiv:1610.07985(2016).

[13] Chainlink Blog. 2021. 16 Ways to Create Dynamic Non-Fungible Tokens (NFT)Using Chainlink Oracles. URl: https://blog.chain.link/create-dynamic-nfts-using-chainlink-oracles/ (2021).

[14] Dan Boneh, Joseph Bonneau, Benedikt Bünz, and Ben Fisch. 2018. VerifiableDelay Functions. In Advances in Cryptology – CRYPTO 2018 (Lecture Notes inComputer Science, Vol. 10991). Springer, 757–788. https://doi.org/10.1007/978-3-319-96884-1_25

[15] Gilles Brassard, David Chaum, and Claude Crépeau. 1988. Minimum disclosureproofs of knowledge. Journal of computer and system sciences 37, 2 (1988),156–189.

[16] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, SrdjanCapkun, and Ahmad-Reza Sadeghi. 2017. Software grand exposure:{SGX} cacheattacks are practical. In 11th {USENIX} Workshop on Offensive Technologies({WOOT} 17).

[17] Lorenz Breidenbach, Christian Cachin, Alex Coventry, Ari Juels, andAndrew Miller. 2021. Chainlink Off-chain Reporting Protocol. URl:https://blog.chain.link/off-chain-reporting-live-on-mainnet/ (2021).

[18] Benedikt Bünz, StevenGoldfeder, and Joseph Bonneau. 2017. Proofs-of-delay andrandomness beacons in ethereum. IEEE Security and Privacy on the blockchain(IEEE S&B) (2017).

[19] Vitalik Buterin. 2021. UNI should become an oracle token. URl:https://gov.uniswap.org/t/uni-should-become-an-oracle-token/11988 (2021).

[20] Miguel Castro and Barbara Liskov. 2002. Practical Byzantine fault tolerance andproactive recovery. ACM Transactions on Computer Systems (TOCS) 20, 4 (2002),398–461.

[21] Chainlink. 2020. Chainlink VRF: On-chain Verifiable Randomness. URl:https://blog.chain.link/verifiable-random-functions-vrf-random-number-generation-rng-feature/ (2020).

[22] ChainLink. 2020. What is the Blockchain Oracle Problem. URl:https://blog.chain.link/what-is-the-blockchain-oracle-problem/ (2020).

[23] Krishnendu Chatterjee, Amir Kafshdar Goharshady, and Arash Pourdamghani.2019. Probabilistic smart contracts: Secure randomness on the blockchain. In2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC).IEEE, 403–412.

[24] Huashan Chen, Marcus Pendleton, Laurent Njilla, and Shouhuai Xu. 2020. ASurvey on Ethereum Systems Security: Vulnerabilities, Attacks, and Defenses.ACM Computing Surveys (CSUR) 53, 3 (2020), 1–43.

[25] Jeremy Clark, Joseph Bonneau, Edward W Felten, Joshua A Kroll, AndrewMiller, and Arvind Narayanan. 2014. On decentralizing prediction markets andorder books. InWorkshop on the Economics of Information Security, State College,Pennsylvania.

[26] Jeremy Clark, Didem Demirag, and Seyedehmahsa Moosavi. 2020. SoK: Demys-tifying Stablecoins. CACM (2020).

[27] Jeremy Clark and Paul C Van Oorschot. 2013. SoK: SSL and HTTPS: Revisitingpast challenges and evaluating certificate trust model enhancements. In 2013IEEE Symposium on Security and Privacy. IEEE, 511–525.

[28] Coinmonk. 2019. Vulnerabilities in Maker: Oracle-GovernanceAttacks, Attack DAOs, and (De)Centralization. URl:https://medium.com/coinmonks/vulnerabilities-in-maker-oracle-governance-attacks-attack-daos-and-de-centralization-d943685adc2f (2019).

[29] Compound. 2020. Open Price Feed. URl: https://compound.finance/docs/prices(2020).

[30] Consensys. 2020. Ethereum Smart Contract Best Practices.[31] Tellor Core. 2019. Staking, Disputes, and Voting. URl:

https://medium.com/tellor/staking-disputes-and-voting-ad09c66eb7bc/ (2019).[32] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. IACR Cryptol.

ePrint Arch. 2016, 86 (2016), 1–118.[33] The Block Crypto. 2020. Chainlink nodes were targeted in

an attack last weekend that cost them at least 700 ETH. URl:https://www.theblockcrypto.com/post/76986/chainlink-nodes-attack-eth (2020).

[34] The Block Crypto. 2021. DeFi attacks: the general picture. URl:https://www.theblockcrypto.com/research/105472/defi-attacks-the-general-picture (2021).

[35] Philip Daian, Steven Goldfeder, Tyler Kell, Yunqi Li, Xueyuan Zhao, Iddo Ben-tov, Lorenz Breidenbach, and Ari Juels. 2020. Flash boys 2.0: Frontrunning indecentralized exchanges, miner extractable value, and consensus instability. In2020 IEEE Symposium on Security and Privacy (SP). IEEE, 910–927.

Page 14: SoK: Oracles from the Ground Truth to Market Manipulation

AFT ’21, September 26–28, 2021, Arlington, VA, USA Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, and Jeremy Clark

[36] Philip Daian, Tyler Kell, Ian Miers, and Ari Juels. 2018. On-Chain Vote Buyingand the Rise of Dark DAOs. URl: https://hackingdistributed.com/2018/07/02/on-chain-vote-buying/ (2018).

[37] Decrypt. 2020. Oracle Exploit Sees $89 Million Liquidated on Com-pound. URl: https://decrypt.co/49657/oracle-exploit-sees-100-million-liquidated-on-compound (2020).

[38] DIA. 2020. DIA — Transparent Oracles for a Decentralised Financial Ecosystem.[39] DODO. 2020. The DODO Advantage. URl:

https://dodoex.github.io/docs/docs/advantages (2020).[40] John R Douceur. 2002. The sybil attack. In International workshop on peer-to-peer

systems. Springer, 251–260.[41] Steve Ellis, Ari Juels, and Sergey Nazarov. 2017. Chainlink: A decentralized

oracle network. Retrieved March 11 (2017), 2018.[42] Shayan Eskandari. 2020. EtherRoll Bug Thread. URl:

https://twitter.com/sbetamc/status/1263220679937265671 (2020).[43] S Eskandari, D Barrera, E Stobert, and J Clark. 2015. A first look at the usability

of Bitcoin key management. In USEC.[44] Shayan Eskandari, Jeremy Clark, Vignesh Sundaresan, and Moe Adham. 2017.

On the feasibility of decentralized derivatives markets. In International Confer-ence on Financial Cryptography and Data Security. Springer, 553–567.

[45] Shayan Eskandari, Seyedehmahsa Moosavi, and Jeremy Clark. 2019. Sok: Trans-parent dishonesty: front-running attacks on blockchain. In International Confer-ence on Financial Cryptography and Data Security. Springer, 170–189.

[46] Ethereum. 2015. Official Go implementation of the Ethereum protocol.[47] Etheroll. 2020. Ethereum casino. URl: https://etheroll.com/ (2020).[48] Aykut Firat, Stuart Madnick, and Benjamin Grosof. 2002. Knowledge integration

to overcome ontological heterogeneity: Challenges from financial informationsystems. (2002).

[49] Flashbots. 2021. Flashbots. URl: https://github.com/flashbots/pm (2021).[50] Bryan Ford and Rainer Böhme. 2019. Rationality is Self-Defeating in Permis-

sionless Systems. arXiv:1910.08820 [cs.CR][51] MakerDAO Forum. 2020. Scientific Governance and Risk #120. URl:

https://forum.makerdao.com/t/agenda-discussion-scientific-governance-and-risk-120-thursday-december-3-16-00-utc/5357 (2020).

[52] Sharon Goldberg, Dimitrios Papadopoulos, and Jan Včelák. 2017. Verifiable Ran-dom Functions (VRFs). Internet-Draft draft-goldbe-vrf-01. Internet EngineeringTask Force. https://datatracker.ietf.org/doc/html/draft-goldbe-vrf-01 Work inProgress.

[53] Vincent Gramoli. 2020. From blockchain consensus back to Byzantine consensus.Future Generation Computer Systems 107 (2020), 760–769.

[54] Wanyun Catherine Gu, Anika Raghuvanshi, and Dan Boneh. 2020. EmpiricalMeasurements on Pricing Oracles and Decentralized Governance for Stablecoins.Available at SSRN 3611231 (2020).

[55] Farshad Hakimpour and Andreas Geppert. 2001. Resolving semantic hetero-geneity in schema integration. In Proceedings of the international conference onFormal Ontology in Information Systems-Volume 2001. 297–308.

[56] Robin Hanson. 2003. Combinatorial information market design. InformationSystems Frontiers 5, 1 (2003), 107–119.

[57] Ethan Heilman, Alison Kendler, Aviv Zohar, and Sharon Goldberg. 2015. Eclipseattacks on bitcoin’s peer-to-peer network. In 24th {USENIX} Security Symposium({USENIX} Security 15). 129–144.

[58] Sebastian Henningsen, Daniel Teunis, Martin Florian, and Björn Scheuermann.2019. Eclipsing ethereum peers with false friends. In 2019 IEEE European Sym-posium on Security and Privacy Workshops (EuroS&PW). IEEE, 300–309.

[59] C. Herley and P. C. Van Oorschot. 2017. SoK: Science, Security and the ElusiveGoal of Security as a Scientific Pursuit. In 2017 IEEE Symposium on Security andPrivacy (SP). 99–120. https://doi.org/10.1109/SP.2017.38

[60] Eyal Hertzog, Guy Benartzi, and Galiar Benartzi. 2017. Bancor protocol: contin-uous liquidity for cryptographic tokens through their smart contracts. Whitepaper (2017).

[61] Huobi DeFiLabs Insight. 2020. Price Oracle - A Must Have Infrastructure. URl:https://www.huobidefilabs.io/en/Insight/1313163603254243330/ (2020).

[62] Harry Kalodner, Steven Goldfeder, Xiaoqi Chen, S. Matthew Weinberg, andEdward W. Felten. 2018. Arbitrum: Scalable, private smart contracts. In USENIXSecurity.

[63] Dmitry Khovratovich and Jason Law. 2017. Sovrin: digital identities in theblockchain era. Technical Report. Sovrin.

[64] Joshua A Kroll, Ian C Davey, and Edward W Felten. 2013. The economics ofBitcoin mining, or Bitcoin in the presence of adversaries. In Proceedings of WEIS,Vol. 2013. 11.

[65] Hart Lambur, Allison Lu, and Regina Cai. 2019. Uma data verificationmechanism:Adding economic guarantees to blockchain oracles. Risk Labs, Inc., Tech. Rep.,Jul (2019).

[66] Langner. 2020. The Stuxnet Story. URl: https://www.langner.com/2020/07/the-stuxnet-story/ (2020).

[67] Ralph Langner. 2011. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Security& Privacy 9, 3 (2011), 49–51.

[68] Clement Lesaege, Federico Ast, and William George. 2019. KlerosThe Justice Protocol. URl: https://kleros.io/static/whitepaper_en-8bd3a0480b45c39899787e17049ded26.pdf (2019).

[69] Iuon-Chang Lin and Tzu-Chun Liao. 2017. A survey of blockchain securityissues and challenges. IJ Network Security 19, 5 (2017), 653–659.

[70] Bowen Liu and Pawel Szalachowski. 2020. A First Look into DeFi Oracles. arXivpreprint arXiv:2005.04377 (2020).

[71] Sin Kuang Lo, Xiwei Xu, Mark Staples, and Lina Yao. 2020. Reliability analysisfor blockchain oracles. Computers & Electrical Engineering 83 (2020), 106582.

[72] LongForWisdom. 2020. Flash Loans and securing the Maker Protocol.URl: https://forum.makerdao.com/t/urgent-flash-loans-and-securing-the-maker-protocol/4901 (2020).

[73] Stuart Madnick and Hongwei Zhu. 2006. Improving data quality througheffective use of data semantics. Data & Knowledge Engineering 59, 2 (2006),460–475.

[74] MakerDAO. 2019. Introducing Oracles V2 and DeFi Feeds.[75] Deepak Maram, Harjasleen Malvai, Fan Zhang, Nerla Jean-Louis, Alexander

Frolov, Tyler Kell, Tyrone Lobban, Christine Moy, Ari Juels, and Andrew Miller.2021. Candid: can-do decentralized identity with legacy compatibility, sybil-resistance, and accountability. In 2021 IEEE Symposium on Security and Privacy(SP). IEEE, 1348–1366.

[76] Silvio Micali, Michael Rabin, and Salil Vadhan. 1999. Verifiable random functions.In 40th annual symposium on foundations of computer science (cat. No. 99CB37039).IEEE, 120–130.

[77] Amani Moin, Kevin Sekniqi, and Emin Gun Sirer. 2020. SoK: A ClassificationFramework for Stablecoin Designs. In Financial Cryptography.

[78] Saikat Mondal, Kanishka P Wijewardena, Saranraj Karuppuswami, Nitya Kriti,Deepak Kumar, and Premjeet Chahal. 2019. Blockchain inspired RFID-basedinformation architecture for food supply chain. IEEE Internet of Things Journal6, 3 (2019), 5803–5813.

[79] Andreas Müller and Marcel Grandi. 2000. Weather derivatives: a risk man-agement tool for weather-sensitive industries. The Geneva Papers on Risk andInsurance. Issues and Practice 25, 2 (2000), 273–287.

[80] Nexus Mutual. 2020. Responsible Vulnerability Disclosure. URl:https://medium.com/nexus-mutual/responsible-vulnerability-disclosure-ece3fe3bcefa (2020).

[81] NEST. 2020. NEST Protocol: A Distributed Price Oracle Network. URl:https://nestprotocol.org/doc/ennestwhitepaper.pdf (2020).

[82] Till Neudecker, Philipp Andelfinger, and Hannes Hartenstein. 2015. A simulationmodel for analysis of attacks on the bitcoin peer-to-peer network. In 2015IFIP/IEEE International Symposium on Integrated Network Management (IM).IEEE, 1327–1332.

[83] Till Neudecker and Hannes Hartenstein. 2019. Short paper: An empirical anal-ysis of blockchain forks in bitcoin. In International Conference on FinancialCryptography and Data Security. Springer, 84–92.

[84] ChainLink Oracle. 2021. Explicit Staking in Chainlink 2.0: An Overview. URl:https://blog.chain.link/explicit-staking-in-chainlink-2-0/ (2021).

[85] PeckShield. 2020. bZx Hack II Full Disclosure (With Detailed Profit Analysis).[86] Ingolf G A Pernice, Sebastian Henningsen, Roman Proskalovich, Martin Florian,

and Hermann Elendner. 2019. Monetary Stabilization in Cryptocurrencies:Design Approaches and Open Questions. In CVCBT.

[87] Jack Peterson and Joseph Krug. 2015. Augur: a decentralized, open-sourceplatform for prediction markets. arXiv preprint arXiv: 1501.01042 (2015).

[88] PoolTogether. 2020. PoolTogether is a no-loss, audited savings game poweredby blockchain technology. URl: https://www.pooltogether.com/ (2020).

[89] Band Protocol. 2020. BandChain Whitepaper. URl:https://docs.bandchain.org/whitepaper/system-overview.html (2020).

[90] Kaihua Qin, Liyi Zhou, Benjamin Livshits, and Arthur Gervais. 2020. Attack-ing the DeFi Ecosystem with Flash Loans for Fun and Profit. arXiv preprintarXiv:2003.03810 (2020).

[91] Haseeb Qureshi. 2020. Introducing CoFiX: a Next-Generation AMM.URl: https://medium.com/dragonfly-research/introducing-cofix-a-next-generation-amm-199aea686b6b (2020).

[92] Scott Ruoti, Ben Kaiser, Arkady Yerukhimovich, Jeremy Clark, and RobertCunningham. 2020. Blockchain Technology: What is It Good For? Commun.ACM 63, 1 (2020), 46–53.

[93] Mehdi Salehi, Jeremy Clark, and Mohammad Mannan. 2021. Red-Black Coins:Dai without liquidations. In Financial Cryptography: DeFi.

[94] Suzan Sallam and Babak D Beheshti. 2018. A survey on lightweight crypto-graphic algorithms. In TENCON 2018-2018 IEEE Region 10 Conference. IEEE,1784–1789.

[95] Andrey Shevchenko. 2020. Uniswap and automated market makers, ex-plained. URl: https://cointelegraph.com/explained/uniswap-and-automated-market-makers-explained (2020).

[96] Krushang Sonar and Hardik Upadhyay. 2014. A survey: DDOS attack on Internetof Things. International Journal of Engineering Research and Development 10, 11(2014), 58–63.

Page 15: SoK: Oracles from the Ground Truth to Market Manipulation

SoK: Oracles from the Ground Truth to Market Manipulation AFT ’21, September 26–28, 2021, Arlington, VA, USA

[97] Synthetix. 2019. Synthetix Response to Oracle Incident. URl:https://blog.synthetix.io/response-to-oracle-incident/ (2019).

[98] Synthetix. 2020. A derivatives liquidity protocol. URl: https://www.synthetix.io/(2020).

[99] Paul Sztorc. 2015. Truthcoin. peer-to-peer oracle system and prediction market-place. (2015).

[100] Tellor. [n.d.]. A decentralized Oracle. https://tellor.io/storage/TellorWhitepaper.pdf

[101] Tellor-io. 2020. EIP-2362: Pull Oracle Interface. URl: https://github.com/tellor-io/EIP-2362 (2020).

[102] Feng Tian. 2016. An agri-food supply chain traceability system for China basedon RFID & blockchain technology. In 2016 13th international conference on servicesystems and service management (ICSSSM). IEEE, 1–6.

[103] TLSnotary. 2014. A mechanism for independently audited https sessions. URl:https://tlsnotary.org/TLSNotary.pdf (2014).

[104] UMA. 2020. UMA Data Verification Mechanism: AddingEconomic Guarantees to Blockchain Oracles. URl:https://github.com/UMAprotocol/whitepaper/blob/master/UMA-DVM-oracle-whitepaper.pdf (2020).

[105] Uniswap. 2020. Uniswap Oracle. URl: https://uniswap.org/docs/v2/core-concepts/oracles (2020).

[106] Marie Vasek, Micah Thornton, and Tyler Moore. 2014. Empirical analysis ofdenial-of-service attacks in the Bitcoin ecosystem. In International conferenceon financial cryptography and data security. Springer, 57–71.

[107] Yongge Wang. 2020. Automated Market Makers for Decentralized Finance(DeFi). arXiv preprint arXiv:2009.01676 (2020).

[108] whiterabbit. 2020. Black Thursday for MakerDAO: $8.32 million was liquidatedfor 0 DAI.

[109] Austin K Williams and Jack Peterson. 2019. Decentralized Common KnowledgeOracles. arXiv preprint arXiv:1912.01215 (2019).

[110] Anthony D Wood and John A Stankovic. 2002. Denial of service in sensornetworks. computer 35, 10 (2002), 54–62.

[111] Gavin Wood et al. 2014. Ethereum: A secure decentralised generalised transac-tion ledger. Ethereum project yellow paper 151, 2014 (2014), 1–32.

[112] Michael F. Worboys and S. Misbah Deen. 1991. Semantic heterogeneity indistributed geographic databases. ACM Sigmod Record 20, 4 (1991), 30–34.

[113] Zaugust. 2020. CoFiX: A Computable Trading System. URl:https://cofix.io/doc/Trading_Compensation_CoFiX.pdf (2020).

[114] Pamela J Zelbst, KennethW Green, Victor E Sower, and Philip L Bond. 2019. Theimpact of RFID, IIoT, and Blockchain technologies on supply chain transparency.Journal of Manufacturing Technology Management (2019).

[115] Fan Zhang, Ethan Cecchetti, Kyle Croman, Ari Juels, and Elaine Shi. 2016. Towncrier: An authenticated data feed for smart contracts. In Proceedings of the2016 aCM sIGSAC conference on computer and communications security. ACM,270–282.

[116] Fan Zhang, Sai Krishna Deepak Maram, Harjasleen Malvai, Steven Goldfeder,and Ari Juels. 2019. DECO: Liberating Web Data Using Decentralized Oraclesfor TLS. arXiv preprint arXiv:1909.00938 (2019).

[117] Shijie Zhang and Jong-Hyouk Lee. 2019. Double-spending with a Sybil attack inthe Bitcoin decentralized network. IEEE Transactions on Industrial Informatics15, 10 (2019), 5715–5722.