Web3 Data Retrieval Challenges Explained

The article discusses the challenges Web3 faces in data retrieval, highlighting the gap between transaction speed and data processing capabilities.

A few years ago, the industry debated how to increase blockchain throughput. Today, many networks can handle tens of thousands of transactions, with some claiming to reach hundreds of thousands. However, it turns out that writing data to the blockchain is only half the battle; the data also needs to be found, indexed, verified, and delivered to applications.

This has led to situations where the speed of data generation sometimes exceeds the infrastructure's ability to process it. ForkLog explores how blockchain is evolving in response.

The Faster, The Longer

About a decade ago, blockchain development was described through the so-called scalability trilemma. According to this concept, networks must find a compromise between security, decentralization, and performance. However, by 2026, it has become clear that even if the throughput issue is partially resolved, a new challenge arises.

Blockchains themselves lack user interfaces; various applications take on this role. These applications, in turn, must continuously receive data:

address balances;
transaction history;
smart contract states;
events and logs;
market analytics;
risk management data;
inter-network messages.

The faster a network operates, the more data it needs to process.

There is a common misconception among users: if information is recorded on the blockchain, it should be easy to access. In reality, the opposite is true. Reading "raw" data directly from the blockchain in real time is slow, costly, and technically complex. In the Web3 ecosystem, an intermediary infrastructure layer usually sits between the blockchain and the wallets and dapps that read from it.

For instance, to display a user's balance in milliseconds, a wallet application queries RPC providers, indexers, analytics platforms, cache servers, specialized databases, and so on.

The process works as follows:

Data collection: Specialized programs continuously "read" the blockchain as new blocks appear.
Indexing (structuring): They parse this data and organize it into traditional, very fast databases (e.g., PostgreSQL or ClickHouse). Here, it is structured conveniently: "Address — List of all its tokens."
Instant response: The wallet receives a pre-filtered response from the cache in milliseconds.
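
The three steps above can be sketched in a few lines of Python. This is a toy illustration only: the block format, the ToyIndexer class, and the sample data are hypothetical, and an in-memory dictionary stands in for a real database such as PostgreSQL or ClickHouse.

```python
from collections import defaultdict

# Hypothetical block format: a block number plus a list of token transfers.
# Real chains expose far richer data through RPC nodes.
SAMPLE_BLOCKS = [
    {"number": 1, "transfers": [
        {"from": None, "to": "alice", "token": "USDC", "amount": 100},  # mint
    ]},
    {"number": 2, "transfers": [
        {"from": "alice", "to": "bob", "token": "USDC", "amount": 30},
    ]},
]


class ToyIndexer:
    """Reads blocks in order and maintains an 'address -> tokens' index."""

    def __init__(self):
        self.balances = defaultdict(lambda: defaultdict(int))
        self.last_indexed_block = 0

    def ingest(self, block):
        # Data collection and indexing: fold each new block into the index.
        for transfer in block["transfers"]:
            if transfer["from"] is not None:
                self.balances[transfer["from"]][transfer["token"]] -= transfer["amount"]
            self.balances[transfer["to"]][transfer["token"]] += transfer["amount"]
        self.last_indexed_block = block["number"]

    def get_tokens(self, address):
        # Instant response: answer from the prebuilt index, not from the chain.
        return dict(self.balances.get(address, {}))


indexer = ToyIndexer()
for block in SAMPLE_BLOCKS:
    indexer.ingest(block)

print(indexer.get_tokens("alice"))   # balances after both blocks
print(indexer.last_indexed_block)    # how far the index has caught up
```

The wallet only ever calls get_tokens, so its response time depends on the index rather than on the chain. The price is that the answer is only as fresh as last_indexed_block.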

In fact, most popular Web3 applications operate through an additional layer of information processing. Imagine a blockchain processing 50,000 transactions per second while millions of wallets simultaneously send RPC requests to refresh their displays. Provider servers struggle under such load, because reading, indexing, and sorting data for users is a computationally heavy task. Indexers and data access services often lag several blocks behind the current state of the network, since processing, structuring, and delivering data takes extra time. This isn't merely due to "outdated infrastructure," although that is certainly a factor; it stems from a fundamental conflict between Web2 and Web3 architectures.

Users and applications interact with the blockchain as actively and frequently as they do with the traditional internet, which offers instant responses. When scrolling through a social media feed, an application makes thousands of requests per second to the server to update likes, comments, and images. Trading bots in Web2 can poll exchange servers millions of times per minute. Google or Amazon servers can easily handle this because they are centralized, with data stored, so to speak, in one massive database that can be instantly copied to thousands of mirror servers worldwide.

Blockchains are different; they were not built for that kind of load. Until recently, the main speed impediment was mathematics and cryptography. It was necessary to get thousands of computers worldwide to quickly reach consensus on the validity of a transaction. Developers addressed this issue by "teaching" machines to execute transactions in parallel and by separating consensus from execution. For example, Solana, Monad, and Aptos support parallel execution of independent transactions, unlike Ethereum's classic sequential model. Monad particularly emphasizes separating consensus on transaction ordering from the subsequent execution of those transactions, while Solana and Aptos implement parallelism through runtime architecture and state conflict management.
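
To make "parallel execution of independent transactions" more concrete, here is a minimal sketch of one possible scheduling rule. It is not the scheduler used by Solana, Monad, or Aptos; the schedule_batches function and the sample transactions are hypothetical, and every account access is treated as a write for simplicity.

```python
def schedule_batches(transactions):
    """Group ordered transactions into batches that can each run in parallel.

    `transactions` is a list of (tx_id, accounts) pairs in their agreed order.
    Two transactions conflict if they touch at least one common account.
    A transaction is placed in the first batch after the last batch it
    conflicts with, so conflicting transactions keep their original order.
    """
    batches = []  # list of (touched_accounts, tx_ids)
    for tx_id, accounts in transactions:
        accounts = set(accounts)
        earliest = 0
        for i, (touched, _) in enumerate(batches):
            if not touched.isdisjoint(accounts):
                earliest = i + 1
        if earliest == len(batches):
            batches.append((set(), []))
        batches[earliest][0].update(accounts)
        batches[earliest][1].append(tx_id)
    return [tx_ids for _, tx_ids in batches]


ordered_txs = [
    ("t1", {"a", "b"}),
    ("t2", {"c", "d"}),
    ("t3", {"b", "c"}),  # conflicts with both t1 and t2
    ("t4", {"e"}),
]
print(schedule_batches(ordered_txs))
```

Transactions inside one batch touch disjoint accounts and could run on separate cores, while the batches themselves run one after another.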

This allows for the approval of tens of thousands of transactions per second (TPS). However, therein lies the trap.

Historically, blockchains performed four functions simultaneously:

transaction execution;
consensus;
data storage;
data access provision.

Increased performance raises the load on all four functions simultaneously. The system generates data faster than the infrastructure can read it, creating what is known as the indexer gap.
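
A back-of-the-envelope simulation shows how such a gap accumulates. The 50,000 TPS production rate echoes the hypothetical figure used earlier in the text; the 40,000 TPS indexing rate and the indexer_backlog function are assumptions made purely for illustration.

```python
def indexer_backlog(produced_tps, indexed_tps, seconds):
    """Return how many transactions are still unindexed after `seconds`."""
    backlog = 0
    for _ in range(seconds):
        backlog += produced_tps              # the chain keeps producing
        backlog -= min(backlog, indexed_tps)  # the indexer drains what it can
    return backlog


backlog = indexer_backlog(produced_tps=50_000, indexed_tps=40_000, seconds=60)
catch_up_seconds = backlog / 40_000  # time to clear the backlog if the chain paused

print(f"Backlog after one minute: {backlog:,} transactions")
print(f"Indexer is about {catch_up_seconds:.0f} seconds of work behind")
```

As long as the production rate stays above the indexing rate, the backlog grows linearly and never drains on its own.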

In the documentation of Helius, one of the largest infrastructure providers in the Solana ecosystem, it is noted that the sequential structure of the blockchain is well-suited for ensuring data integrity and high throughput but makes historical queries slow and inefficient. Consequently, most companies are forced to build their own indexers and separate databases on top of the blockchain.

Analysts at ChainScore Labs identify the indexer gap as one of the key issues in the Solana ecosystem. In their assessment, traditional indexing approaches struggle with the network's architecture, where high block frequency and parallel transaction execution create an enormous data flow.

This results in a situation where the network can confirm operations almost instantly, but applications require significantly more time to process the consequences of those operations.

Web3 Speeds Hit Basic Physics (and More)

More precisely, they hit the throughput limits of processors, hard drives, and network cables. It turns out that blockchain scalability does not equate to the scalability of the surrounding infrastructure. This needs to be addressed as quickly as possible.

Imagine a network with 100,000 TPS. It is necessary not only to record a transaction but also to:

save the state;
update indexes;
respond to wallet requests;
serve bots;
serve analysts;
serve search engines;
serve AI agents.

Thus, high throughput creates competition for resources among consensus, transaction execution, and infrastructure services above the network.
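
The scale involved is easy to estimate. In the sketch below, the 100,000 TPS figure comes from the text, while the average transaction size of 250 bytes is an assumption chosen only to make the arithmetic concrete; real sizes vary by network and transaction type.

```python
TPS = 100_000                    # throughput figure from the text
SECONDS_PER_DAY = 24 * 60 * 60
AVG_TX_BYTES = 250               # assumption, for illustration only

tx_per_day = TPS * SECONDS_PER_DAY
bytes_per_second = TPS * AVG_TX_BYTES
terabytes_per_day = tx_per_day * AVG_TX_BYTES / 1e12

print(f"{tx_per_day:,} transactions per day")
print(f"{bytes_per_second / 1e6:.0f} MB/s of raw transaction data")
print(f"{terabytes_per_day:.2f} TB per day before any indexes are built")
```

Every consumer in the list above (wallets, bots, analysts, AI agents) adds read load on top of this write volume.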

The parallel development of related technologies makes this issue urgent. For humans, delays of seconds or even minutes may be tolerable. For AI agents, trading systems, and autonomous services, they are not. If a machine makes decisions based on on-chain data, outdated information means errors, missed opportunities, or direct financial losses.

Moreover, the Ethereum Foundation's updated documentation for 2026 states that archive nodes require between 3 and 12 TB of disk space, and initial synchronization can take up to a month, even on sufficiently powerful hardware. The limiting factors are SSD speed, memory capacity, and processor performance.

Furthermore, Geth developers separately describe the old model of archival storage, where the size of the Ethereum database could exceed 20 TB, and synchronization took months. This is why a new path-based state storage architecture had to be created.

So yes, disks, processors, and network throughput are real physical constraints in the race for information growth. But they are not the only ones. Modern servers can already handle vast amounts of data. The question is: how much should thousands of independent network participants pay for this?

For instance, if full participation in the ecosystem requires tens of terabytes of SSD, hundreds of gigabytes of RAM, and expensive communication channels, the number of infrastructure operators will inevitably decrease. This leads to a new centralization.

Formally, data can be processed, but it cannot be done cheaply and in a decentralized manner simultaneously. The cost of processing information begins to rise faster than the cost of the transactions themselves.

Market Reactions

Participants in the race already understand that the winners will be the networks that can convert transactions into accessible information faster, cheaper, and more reliably. This year, the market unexpectedly shifted its focus to modular blockchains.

If the first generation of networks tried to perform all tasks simultaneously, the new generation divides responsibilities among specialized layers. Instead of one network, there are now separate layers:

execution layer;
settlement layer;
consensus layer;
data availability layer.

Developers compare this process to the evolution of data centers. Previously, one server performed all functions at once. Today, computation, data storage, and network services scale independently of each other.

One of the fastest-growing areas of the market has become DA networks. At first glance, the idea seems strange: why create a separate blockchain for temporarily storing data from another blockchain? But that is precisely what is happening. In a modular architecture, transaction execution and data storage can exist separately. A rollup publishes data in an external DA layer rather than the main network. This significantly reduces scaling costs and increases throughput.
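
The division of labor can be sketched as follows. This is a simplified model, not the protocol of any specific DA network: the ToyDALayer and ToyMainChain classes are hypothetical, and the idea that the main network keeps only a hash commitment to the batch is an assumption used for illustration (production systems rely on more elaborate commitment schemes).

```python
import hashlib


class ToyDALayer:
    """Stores full rollup batches and hands back a commitment to each one."""

    def __init__(self):
        self._blobs = {}

    def publish(self, data: bytes) -> str:
        commitment = hashlib.sha256(data).hexdigest()
        self._blobs[commitment] = data
        return commitment

    def retrieve(self, commitment: str) -> bytes:
        return self._blobs[commitment]


class ToyMainChain:
    """Keeps only small commitments, never the batch data itself."""

    def __init__(self):
        self.commitments = []

    def record(self, commitment: str) -> None:
        self.commitments.append(commitment)


da_layer = ToyDALayer()
main_chain = ToyMainChain()

batch = b"tx1;tx2;tx3"                 # rollup transaction batch (made-up data)
commitment = da_layer.publish(batch)   # bulky data goes to the DA layer
main_chain.record(commitment)          # main network stores 64 hex characters

# Anyone can fetch the batch and check it against the recorded commitment.
fetched = da_layer.retrieve(main_chain.commitments[0])
is_valid = hashlib.sha256(fetched).hexdigest() == main_chain.commitments[0]
print(is_valid, len(batch), len(commitment))
```

With realistic batch sizes, the main network stores a fixed-size commitment while the data itself lives on a layer that can scale separately.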

A few years ago, RPC was considered a technical detail. Today, it is one of the most crucial elements of crypto infrastructure. In May 2026, Triton One, in collaboration with the Solana Foundation, published an updated announcement of RPC 2.0, a new approach to building the network's data reading architecture.

The key idea is to separate access to the current state of the network from its history. To achieve this, two independent modules are introduced: one indexes account states in real-time, while the other optimizes historical data handling. Instead of fully scanning the blockchain, the system creates adaptive indexes tailored to specific application requests, reducing latency and processing costs.
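
The split between current state and history can be illustrated with two small classes. This is not the RPC 2.0 API: StateIndex, HistoryStore, and their methods are hypothetical names, and the sketch assumes events arrive in ascending slot order.

```python
import bisect
from collections import defaultdict


class StateIndex:
    """Current account states, overwritten as new updates stream in."""

    def __init__(self):
        self._accounts = {}

    def apply(self, account, state):
        self._accounts[account] = state

    def get(self, account):
        return self._accounts.get(account)


class HistoryStore:
    """Per-account event history kept pre-sorted by slot for range queries."""

    def __init__(self):
        self._slots = defaultdict(list)
        self._events = defaultdict(list)

    def append(self, account, slot, event):
        # Assumes events arrive in ascending slot order, so lists stay sorted.
        self._slots[account].append(slot)
        self._events[account].append(event)

    def range(self, account, start_slot, end_slot):
        # Binary search over sorted slots instead of scanning the whole chain.
        slots = self._slots.get(account, [])
        lo = bisect.bisect_left(slots, start_slot)
        hi = bisect.bisect_right(slots, end_slot)
        return self._events.get(account, [])[lo:hi]


state = StateIndex()
history = HistoryStore()
for slot, balance in [(10, 5), (20, 8), (30, 3)]:
    state.apply("alice", {"balance": balance})
    history.append("alice", slot, f"balance={balance}")

print(state.get("alice"))               # latest state only
print(history.range("alice", 15, 30))   # events in slots 15..30
```

A balance lookup touches only the state index, while a history query touches only the sorted history store, so each can be scaled and optimized on its own.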

Thus, Triton and Solana aim to eliminate several systemic limitations: the costly and inefficient monolithic architecture of RPC nodes, a narrow set of standard JSON-RPC requests, and developers' dependence on their own or proprietary solutions for data handling. In the new model, reading scales separately from consensus, and access to history becomes faster through the use of columnar storage and pre-sorted data.

The project relies on tools already implemented in the ecosystem — including data streaming from validators (Geyser, Yellowstone gRPC) and solutions for historical processing. The entire infrastructure is distributed as open-source, and its development is coordinated with the participation of the Solana Foundation.

As a result, Solana is effectively attempting to transition from a "universal" RPC to a modular and specialized data infrastructure, which is expected to lower barriers for developers and make working with blockchain data as convenient as traditional databases.

Does Modularity Solve the Problem?

If Solana succeeds in standardizing the reading layer, it could strengthen its position as a network with a developed application infrastructure rather than just high throughput. However, this also intensifies competition with independent RPC providers and infrastructure platforms, which will either have to adapt to the new standard or offer additional services on top of it.

The modular architecture eliminates some infrastructure limitations but shifts them to other layers of the system. The desire to reduce costs and simplify access to data, which is essential for DeFi, NFTs, wallets, analytics, and compliance tools, is understandable. However, it seems that the very nature of Web3 contains a cascading complexity effect: solving one problem inevitably creates new challenges.

The new scheme will undoubtedly require a more complex infrastructural superstructure, with indexers, storage, caches, separate pipelines, and new points of failure. Instead of a single simple RPC layer, the ecosystem may end up with several parallel implementations, incompatible optimizations, and even greater dependence on infrastructure providers. In such a case, a formally open architecture does not necessarily mean a truly open and user-friendly access model for all.

For now, we are at a stage where the market has shifted from competing over who can extract data from the network better to a race to see who can create products based on that data first. Who will pay for it and how much — we will likely find out soon.
