DIA: The Data Hub Powering the Web3 Ecosystem and Beyond

Oct 13, 2022

According to CoinMarketCap, there were 21,247 cryptocurrencies, 240 decentralized exchanges, and 526 centralized spot exchanges as of October 2022, with a combined market cap of over $945B. DefiLlama lists 149 different chains hosting DeFi protocols, with over $54B in total value locked (TVL) across them.

Rank of blockchains based on their TVL — source: DefiLlama.

As users engage with blockchain protocols, thousands of trade-level data points are generated every second from on-chain and off-chain sources (CEXes, DEXes, DeFi, NFT, Metaverse, etc.). As blockchains mature, new platforms and protocols are introduced, and the number of users on each of them keeps growing.

It therefore becomes crucial to aggregate this data from the various chains and present it in a simple, thorough, and verifiable way. This effort is arduous because blockchain data is scattered across sources and not stored in a uniform structure, which makes it challenging to curate, organize, and validate. This is where DIA steps in.

This article provides an overview of DIA, the sources and methods it uses to obtain data, its data architecture, its data categories, and its DAO.

What is DIA?

DIA is an open-source financial data and smart contract ecosystem: a Web3 platform for cross-chain, end-to-end, open-source data and oracles. By bringing together data providers, users, and DAO community members, the DIA platform allows transparent and validated data feeds for asset pricing, metaverse data, and more to be crowdsourced and shared. Developers have access to DIA data across all relevant layer one and layer two networks. The data is sourced at the trade level from various on-chain and off-chain sources simultaneously.

The whitepaper states that DIA serves as a reliable and verifiable bridge between off-chain data from multiple sources and on-chain smart contracts, which may be used to develop a variety of financial dapps.

How does DIA make this possible?

DIA sources trade data directly from a wide range of centralized and decentralized exchanges simultaneously, processes it, and delivers it, thus covering the entire data value chain from sourcing to delivery. The granular trade data is cleaned through a completely transparent process that eliminates outliers, and it is aggregated with price determination algorithms customized to best satisfy the needs of each use case.
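As a rough illustration of the aggregation step, the sketch below computes a volume-weighted average price (VWAP) over trades pooled from several exchanges. VWAP is used here only as one common price determination method. The Trade type, the exchange names, and the numbers are made up for the example and are not taken from DIA's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """A single trade as it might arrive from an exchange (hypothetical shape)."""
    source: str    # exchange name
    price: float   # quote currency per unit of the asset
    volume: float  # units of the asset traded


def vwap(trades: list[Trade]) -> float:
    """Volume-weighted average price over a block of trades."""
    total_volume = sum(t.volume for t in trades)
    if total_volume <= 0:
        raise ValueError("need at least one trade with positive volume")
    return sum(t.price * t.volume for t in trades) / total_volume


# Illustrative trades from three made-up sources.
trades = [
    Trade("exchange_a", 100.0, 2.0),
    Trade("exchange_b", 102.0, 1.0),
    Trade("dex_c", 101.0, 1.0),
]
print(vwap(trades))
```

Weighting by volume means a large trade on a liquid exchange moves the reference price more than a small trade on a thin market, which is one reason trade-level data is more useful than pre-aggregated quotes.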

DIA is designed as a hybrid system with on-chain components delivering data to financial smart contracts and off-chain components handling massive data storage and processing volumes. DIA uses its technically flexible design to provide the web3 space with the broadest coverage of blockchains. Any publicly available data feed can be accessed regardless of its trade volume or listing on exchanges. Therefore, there is no need to rely on data providers from outside sources.

DIA uses oracles (smart contract interfaces that transfer data from an external source into a smart contract) to bring external data on-chain. In response to a user request, the external data is wrapped as the payload of a transaction, and executing that transaction delivers it to the smart contract. The smart contract can then use that information for computations and actions that depend on it, such as computing interest based on reference interest rates published by a central bank.
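The flow described above can be sketched off-chain in a few lines. The class below is only an in-memory stand-in for an oracle contract: the names KeyValueOracle, set_value, get_value, and the key REFERENCE_RATE_BPS are assumptions made for this illustration, not DIA's contract interface.

```python
import time


class KeyValueOracle:
    """In-memory stand-in for an on-chain key/value oracle (hypothetical)."""

    def __init__(self, updater: str) -> None:
        self._updater = updater
        self._store: dict[str, tuple[int, int]] = {}  # key -> (value, timestamp)

    def set_value(self, sender: str, key: str, value: int, timestamp: int) -> None:
        # Only the designated feeder may write, mimicking an access-controlled transaction.
        if sender != self._updater:
            raise PermissionError("only the oracle updater may write")
        self._store[key] = (value, timestamp)

    def get_value(self, key: str) -> tuple[int, int]:
        return self._store[key]


def accrued_interest(principal: int, rate_bps: int, days: int) -> int:
    """Simple (non-compounding) interest; rate in basis points per year, integer math."""
    return principal * rate_bps * days // (10_000 * 365)


oracle = KeyValueOracle(updater="feeder")
# The off-chain feeder pushes the external value as the payload of a write.
oracle.set_value("feeder", "REFERENCE_RATE_BPS", 125, int(time.time()))

# A consuming contract reads the value and uses it in its own computation.
rate_bps, updated_at = oracle.get_value("REFERENCE_RATE_BPS")
print(accrued_interest(1_000_000, rate_bps, 30))
```

Integer arithmetic is used on purpose, since smart contracts typically have no floating-point numbers and rates are stored as scaled integers.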

DIA’s Data Architecture

All DIA services run on a Kubernetes cluster, which allows containers to run across many machines and environments (virtual, physical, cloud-based, and on-premises). The cluster is hosted on IBM Cloud to keep these services operating without interruption. The DIA backend is divided into three main building blocks: data collection, storage, and publication.

Data Collection: The data collection mechanisms fetch data from various sources such as crypto exchanges and NFT marketplaces.
Data Storage: After cleansing and further processing, the data is stored in a highly flexible database layer that can handle various types of data, ranging from static relational data (PostgreSQL) to high-frequency time-series data (InfluxDB). DIA uses a Redis caching layer (a system that temporarily stores information in a key-value data structure) for fast access and high-performance data retrieval.
Data Publication: The processed data is distributed off-chain through REST, WebSocket, and GraphQL API endpoints. Furthermore, selected data feeds are published on-chain through DIA’s oracle system.
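To make the role of the caching layer concrete, here is a minimal sketch of a key-value cache with expiry placed in front of a slower store. It uses a plain Python dictionary instead of Redis, and TTLCache, load_from_database, and the price value are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable


class TTLCache:
    """Tiny key-value cache with per-entry expiry, standing in for a Redis-style layer."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, float]] = {}  # key -> (value, expires_at)

    def get_or_load(self, key: str, loader: Callable[[str], float]) -> float:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit
        value = loader(key)                      # cache miss: fall back to the slower store
        self._entries[key] = (value, now + self._ttl)
        return value


db_reads: list[str] = []


def load_from_database(key: str) -> float:
    """Hypothetical slow lookup against the database layer."""
    db_reads.append(key)
    return 19_250.5


cache = TTLCache(ttl_seconds=5.0)
cache.get_or_load("BTC-USD", load_from_database)
cache.get_or_load("BTC-USD", load_from_database)  # served from the cache
print(len(db_reads))
```

The expiry keeps cached prices from going stale: after the time-to-live passes, the next request goes back to the database and refreshes the entry.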

DIA architecture overview. Source: DIA docs

DIA Data Categories

DIA utilizes the broadest possible sources and ensures maximum coverage of asset price data, providing a crowd-sourcing approach to data sourcing. DIA currently runs oracles on 25+ blockchains, covering a wide range of traditional equities and cryptocurrency assets in its price feeds. Here is an overview of DIA’s data offerings:

Cryptocurrency price data: 2000+ digital assets
Traditional finance price data: 20,000+ equities
NFT floor price data: 18,000+ NFT collections
Randomness: Multichain, distributed, and verifiable randomness
Other data: FX rates, farming pools, lending rates, etc.

DIA offers development oracles for each chain it supports so that developers can test its oracle services. These development oracle contracts are smart contracts that provide a specific range of asset prices for real-time testing on the corresponding mainnet and testnet of each blockchain.

Price Feeds Oracle

Most oracle providers depend on third-party centralized data providers to deploy price oracles, so they cannot provide price feed oracles for long-tail assets. In contrast, DIA uses an open, end-to-end method of building price oracles that can produce price references for any digital asset, regardless of where and under what conditions it is listed. DIA is launching a data explorer that allows anyone to browse and search for specific feeds and sources and learn about the context of each data point.

For a wide range of DeFi, TradFi, and Metaverse use cases, including lending, staking, and stablecoin protocols, DIA deploys specialized, custom smart contract oracles tailored to the price feed requirements of the assets involved.

xStream, DIA’s self-service tool, allows users to create custom feeds and customize any API endpoint without interacting with developers. It is a core feature of the price feed oracle: users specify what they require, including source markets and methodology parameters such as time range, window size, update triggers, and frequency.
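To give a feel for what such parameters control, the sketch below models a custom feed as a small configuration object with an update rule. This is purely illustrative: FeedConfig, its field names, and should_update are hypothetical and do not reflect xStream's actual options or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedConfig:
    """Hypothetical feed settings; field names are illustrative, not xStream's."""
    asset: str
    sources: tuple[str, ...]   # markets to include in the price
    window_seconds: int        # size of the trade window used for each price
    frequency_seconds: int     # maximum time allowed between two updates
    deviation_trigger: float   # relative price move that forces an update


def should_update(cfg: FeedConfig, last_price: float, new_price: float,
                  seconds_since_update: int) -> bool:
    """Update when the regular interval elapses or the price moved past the trigger."""
    if seconds_since_update >= cfg.frequency_seconds:
        return True
    moved = abs(new_price - last_price) / last_price
    return moved >= cfg.deviation_trigger


cfg = FeedConfig(
    asset="ETH",
    sources=("uniswap", "binance"),
    window_seconds=120,
    frequency_seconds=3600,
    deviation_trigger=0.01,   # 1 %
)
print(should_update(cfg, last_price=1300.0, new_price=1301.0, seconds_since_update=60))
print(should_update(cfg, last_price=1300.0, new_price=1320.0, seconds_since_update=60))
```

Combining a time-based frequency with a deviation trigger is a common way to keep on-chain updates infrequent in calm markets while still reacting quickly to large moves.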

Multi-Randomness Oracle

Randomness is the absence of any predictable pattern. Random number generation has become essential in decentralized applications ranging from Web3 games to NFT collections and prediction markets. For instance, a random number can drive a lottery mechanism that rewards a game’s players fairly, or assign unpredictable attributes to a collection of NFT artworks.

Two key factors make it challenging to provide a source of randomness on-chain. First, the deterministic nature of any EVM system makes it very hard to produce random events on-chain. Second, if a central authority generates the random numbers, bad actors may influence the randomness at the source, causing irreparable harm to protocols and users.

DIA’s Multi-Randomness provides dApps with an auditable, tamper-proof, distributed, and publicly verifiable random number via an oracle smart contract on the multiple L1 and L2 chains integrated by DIA (see the full list of the chains here).

DIA’s Multi-Randomness leverages Drand’s distributed and decentralized beacon, generated by a network of participants, to offer distributed and verifiable randomness on-chain. Each node in Drand’s network generates a partial signature, which it broadcasts to the rest of the nodes. Once any node has collected a threshold number of partial signatures, it computes the new randomness beacon, which is the hash of the signature aggregate. To learn more about how DIA’s multi-randomness functions, see this article.
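The last step, turning the aggregated signature into the randomness value, is simple enough to sketch. The example below assumes SHA-256 as the hash, which is what drand uses for its beacon. It does not verify the threshold signature itself, and the signature bytes are a placeholder rather than real beacon output.

```python
import hashlib


def beacon_randomness(aggregate_signature: bytes) -> bytes:
    """Randomness of a round: the SHA-256 hash of the aggregated signature."""
    return hashlib.sha256(aggregate_signature).digest()


def matches_beacon(aggregate_signature: bytes, published_randomness_hex: str) -> bool:
    """Check that a published randomness value is the hash of the given signature."""
    return beacon_randomness(aggregate_signature).hex() == published_randomness_hex.lower()


# Placeholder bytes, NOT a real drand signature.
signature = bytes.fromhex("ab" * 48)
randomness = beacon_randomness(signature)

print(randomness.hex())
print(matches_beacon(signature, randomness.hex()))
```

Because anyone can recompute the hash from the published signature, a consumer does not have to trust whoever relayed the random value. Full verification would additionally check the signature against the network's public key.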

Games, lotteries (which require trustless and tamper-proof on-chain random numbers to draw winners), NFT platforms (which require the randomized assignment of traits and attributes to NFTs), and prediction markets can all benefit from DIA’s Multi-Randomness.
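As an example of how an application might consume such a value, the sketch below maps a 32-byte random value to a lottery winner. It re-hashes and rejects out-of-range candidates so that every participant has the same chance. The function name, the domain tag, and the random bytes are assumptions for illustration only.

```python
import hashlib


def uniform_index(randomness: bytes, n: int, domain: bytes = b"") -> int:
    """Map beacon randomness to an index in [0, n) without modulo bias."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Largest multiple of n that fits in 256 bits; values at or above it are rejected.
    limit = (1 << 256) - ((1 << 256) % n)
    counter = 0
    while True:
        digest = hashlib.sha256(randomness + domain + counter.to_bytes(4, "big")).digest()
        candidate = int.from_bytes(digest, "big")
        if candidate < limit:
            return candidate % n
        counter += 1  # extremely rare: derive a fresh candidate and try again


players = ["alice", "bob", "carol", "dave"]
round_randomness = bytes.fromhex("11" * 32)  # placeholder, not a real beacon value

winner = players[uniform_index(round_randomness, len(players), b"lottery")]
print(winner)
```

The domain tag lets one random value serve several independent draws, for example one for the lottery and another for NFT trait assignment, without the results being correlated.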

NFT data Oracle

DIA offers Oracles for NFT Data, which enable historical NFT sales data to be extracted from the NFT marketplace’s smart contract on its particular network. These oracles contain data on NFT price quotations derived directly from on-chain activities. DIA offers a range of floor price data streams for over 18,000 NFT collections, allowing Web3 developers to easily integrate NFT data into their protocols.

DIA deploys dedicated NFT oracles on demand, each with a unique set of collections and data points. Users can view a collection’s current floor price, current circulating supply, and the timestamp of the last update in the presently deployed sample version. Please see the docs for further information on accessing the related Oracle smart contract on the Ropsten testnet or Goerli testnet.

For NFT floor price determination, DIA employs an outlier detection and cleansing process. Its interquartile range filter examines all trades in a trades block and sorts them by their recorded price. It then divides them into four price blocks known as quartiles. Trades in the first and last quartiles are discarded, leaving only trades priced within the interquartile range.
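A minimal sketch of such a filter follows, assuming it keeps only the trades priced between the first and third quartile. The quartile method and the sample prices are choices made for this example and may differ from DIA's implementation.

```python
import statistics


def iqr_filter(prices: list[float]) -> list[float]:
    """Keep only prices inside the interquartile range [Q1, Q3], sorted ascending."""
    if len(prices) < 4:
        return sorted(prices)  # too few trades to form quartiles
    q1, _median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return [p for p in sorted(prices) if q1 <= p <= q3]


# One trades block with two obvious outliers (0.01 and 45.0).
block = [1.9, 2.0, 2.1, 2.0, 2.2, 0.01, 2.1, 45.0]
print(iqr_filter(block))
```

Both the suspiciously cheap sale and the inflated one fall outside the middle quartiles and are dropped, so neither can drag the resulting price up or down.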

DIA’s Data Sources

The data retrieval is processed through a series of asset collectors, capable of fetching all financial data from different sources. These sources can be classified into the following groups:

Cryptocurrency Data

This is data relating to the Web3 ecosystem and blockchain platforms. It includes, but is not limited to, data from CEXes, DEXes, NFT marketplaces, and cross-chain bridges. DIA also sources, aggregates, and provides financial data from traditional finance companies (i.e., non-blockchain-native companies), which is covered later in this article.

For all cryptocurrency exchanges, the raw trade data is the foundation for any derived quantities, such as prices and circulating supply figures. DIA collects and preserves the trading data with the highest possible precision and as close to the data source as is practical. Let’s examine these sources more closely.

Centralized Exchange

All trading data is retrieved through REST APIs or WebSocket APIs created and maintained by the respective trading platform. All of the data sources from centralized exchanges are listed in the docs. Trading data obtained this way is received continuously as it is generated rather than at regular intervals. Additionally, DIA uses scrapers to get information on cryptocurrencies and other assets (such as stocks, futures, and rare earth metals) from centralized exchanges such as Binance, FTX, or Kraken.

Decentralized Exchange

In contrast to centralized exchanges, DIA obtains decentralized exchange trading data directly from each blockchain by subscribing to the relevant pool events, such as the swap events emitted by the liquidity pools of a decentralized exchange. Examples of sources are Uniswap, Curve Finance, and PancakeSwap. The complete list is available here. For a few exchanges, DIA uses the REST API to retrieve the trading data to maintain a quick and consistent data flow. However, DIA is currently integrating data retrieval directly from the blockchain for these exchanges to eliminate unnecessary dependencies.
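To show how a swap event becomes a trade-level price, here is a small sketch that derives the price from the raw token amounts in an event. The SwapEvent shape and its field names are simplified assumptions and do not follow the ABI of any particular exchange.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SwapEvent:
    """Simplified swap log; field names are illustrative, not a real DEX ABI."""
    pool: str
    amount_in: int     # raw integer amount of the token sold
    amount_out: int    # raw integer amount of the token bought
    decimals_in: int   # decimals of the token sold
    decimals_out: int  # decimals of the token bought


def trade_price(event: SwapEvent) -> Decimal:
    """Price of the bought token, expressed in units of the sold token."""
    sold = Decimal(event.amount_in) / (Decimal(10) ** event.decimals_in)
    bought = Decimal(event.amount_out) / (Decimal(10) ** event.decimals_out)
    if bought == 0:
        raise ValueError("swap with zero output has no defined price")
    return sold / bought


# 3,000 units of a 6-decimal stablecoin swapped for 2 units of an 18-decimal token.
event = SwapEvent("example-pool", 3_000_000_000, 2 * 10**18, 6, 18)
print(trade_price(event))
```

On-chain amounts are integers scaled by each token's decimals, so both sides have to be rescaled before dividing. Decimal is used instead of float to avoid rounding errors on large raw amounts.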

Cross-chain Bridges

Cross-chain bridges make it possible to exchange cryptocurrency assets on one blockchain for their equivalent value on another, for example swapping USDT on Polygon for USDT on Optimism. They can be viewed as a hybrid between centralized and decentralized systems: portions of each bridging transaction are publicly visible on each blockchain, but the actual bridging is carried out by a central structure.

DIA collects these swaps, which allows it to port prices and map assets across blockchains. Examples of cross-chain bridge data sources integrated by DIA are Multichain, Anyswap, and Hop.

NFT Data Collection

DIA collects live on-chain trading data for all NFT transactions on the blockchains it has integrated. The data (such as trades, bids, and offers) is therefore sourced directly from the origin, giving users precise market data. Examples of DIA’s NFT data sources are OpenSea, LooksRare, and TofuNFT.

Establishing an accurate price for an NFT during data collection is crucial. DIA employs the following approaches for pricing NFT assets:

Floor price: DIA uses the lowest sale price recorded on the blockchain during the specified time window.
Moving average of floor price: DIA returns the moving average of a collection’s floor price over the selected moving average period, based on a specified floor window. For this category, DIA applies an interquartile range outlier detection filter, which yields more realistic market prices.
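The two pricing approaches above can be sketched as follows. The Sale type, the window handling, and the sample numbers are assumptions for illustration, and the outlier filter is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sale:
    timestamp: int  # Unix seconds
    price: float    # sale price in the collection's quote currency


def floor_price(sales: list[Sale], window_start: int, window_end: int) -> Optional[float]:
    """Lowest sale price with window_start <= timestamp < window_end, or None."""
    prices = [s.price for s in sales if window_start <= s.timestamp < window_end]
    return min(prices) if prices else None


def moving_average_floor(sales: list[Sale], now: int, floor_window: int,
                         lookback: int) -> Optional[float]:
    """Average of the per-window floor prices over the last `lookback` seconds."""
    floors = []
    start = now - lookback
    while start < now:
        floor = floor_price(sales, start, start + floor_window)
        if floor is not None:  # windows without any sale are skipped
            floors.append(floor)
        start += floor_window
    return sum(floors) / len(floors) if floors else None


sales = [Sale(10, 5.0), Sale(20, 4.0), Sale(70, 6.0), Sale(80, 8.0), Sale(130, 3.0)]
print(floor_price(sales, 0, 60))
print(moving_average_floor(sales, now=180, floor_window=60, lookback=180))
```

A single floor price reacts to every low sale, while the moving average smooths those moves over several windows, which makes it less sensitive to one unusually cheap trade.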

Third-Party Data Aggregation

High-relevance data sources that pre-aggregate market data use DIA as a communication layer.

The ecosystem of high-end data sources accessed through DIA is constantly expanding while upholding high quality standards. The third-party platforms DIA uses fall into two categories:

Digital finance asset quotation providers: DIA collects quotations from central providers such as CoinGecko and CoinMarketCap. In addition, DIA collects daily comparison data against various foreign currencies.
Traditional finance asset quotation providers: DIA compiles quotes from central sources, including the daily exchange rates against a range of foreign currencies published by the European Central Bank (ECB). DIA’s ECB foreign exchange data scraper provides accurate exchange rates between the world’s major fiat currencies. Its data source is this table. DIA uses the FX data to generate and normalize asset prices collected in pairs with different fiat currencies.
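The normalization step can be illustrated with a short sketch that converts a price quoted in one fiat currency into USD through euro reference rates. ECB reference rates are quoted as units of foreign currency per 1 EUR. The function name and the rates below are made up for the example and are not actual ECB values.

```python
def to_usd(amount: float, currency: str, eur_rates: dict[str, float]) -> float:
    """Convert an amount to USD via EUR, given ECB-style rates (units per 1 EUR)."""
    rates = {"EUR": 1.0, **eur_rates}
    amount_in_eur = amount / rates[currency]
    return amount_in_eur * rates["USD"]


# Illustrative rates only, not actual ECB values.
eur_rates = {"USD": 1.0, "JPY": 140.0, "GBP": 0.8}

# An asset traded at 2,800,000 JPY, normalized to USD.
print(to_usd(2_800_000.0, "JPY", eur_rates))
```

With every price expressed in the same currency, trades from pairs quoted in different fiat currencies can be pooled into one price feed.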

DIA DAO

One of DIA’s main goals is to decentralize the organization and empower people with various skills to self-organize and provide value to the project (i.e., a DAO). As of right now, DIA has a formal framework that enables the DIA community to support the expansion and improvement of DIA while also earning rewards.

The DAO has already taken part in some significant proposals and votes up to this point, including using treasury funds to promote savings programs and distributing an airdrop to DIA holders impacted by the 2020 KuCoin attack.

The DIA DAO’s main objective is to generate value for all the organization’s stakeholders, including token holders, data consumers, partners, and others. This objective is pursued by involving the DAO in data sourcing and validation, feature development, backend infrastructure, and community ownership and governance, which is the core of DIA’s engine. Additionally, DIA wants to use the DAO to turn straightforward marketing and business development tasks into exposure, awareness, education, and interaction.

DIA DAO currently uses Dework and Crew3 to host bounties and quests for its DAO contributors, with rewards and several associated benefits. The work and bounties covered by DIA DAO relate to the following fields: product development, business development, design and video animation, copywriting and translation, operations, growth, and human resources. All the activities of DIA DAO contributors are organized on Discord.

DIA DAO Governance

DIA DAO governance is carried out in several stages. All governance-related discussions, proposals, and requests are hosted in, and can be submitted through, the DIA DAO Forum.

The first stage of the governance process is the discussion, in which any DIA DAO community member can open a new topic and contribute to a discussion. If there is sufficient community interaction and the topic is intended to benefit both the DIA platform and the DIA DAO, it advances to the second stage, the proposal stage. Here the author must formalize the proposal so that it satisfies the requirements for submission.

If the proposal satisfies all the criteria, a DIA Forum moderator will formalize it, assign it a proposal number, create a poll, and publish it under the #proposals channel (classified accordingly) for community members to validate. This third phase is called the review process.

In the fourth phase, the validation phase, a poll is conducted in the forum (usually lasting seven days) to pre-validate the proposal. Forum members vote to decide whether the proposal should be put to a vote on DIA DAO’s Snapshot page. If successful, the proposal moves on to the fifth phase, the voting stage, where only holders of at least 1 DIA token can vote.

If the vote in the previous stage is successful, the proposal is stored on-chain and moves into the sixth phase, implementation. At this point, the DIA core team is responsible for launching the proposal, which is carried out following the strategy, resources, and schedule defined in the proposal. Once the proposal is successfully implemented, the forum admins announce it in the #updates channel.

Closing Thoughts

Since its launch, the DIA platform has gained considerable experience and has leveraged that expertise to build a data hub intended to power the entire Web3 space and beyond. It has also built a vibrant community of DAO members who contribute to its growth.