Bitcoin operates on an open ledger, allowing anyone with the right skills to trace any transaction from address to address. This has led to a whole industry where traders track whales and inflows to exchanges, blockchain forensic experts help recover stolen funds, and compliance teams filter out "dirty" coins.
In this step-by-step guide, we will explore how to manually analyze transactions, automate the process with tools, and understand why even the most advanced tracing provides probabilities rather than certainties.
We will leave basic definitions and market metrics aside, as there is separate material available in card format for that.
Part 1. Analyzing Transactions Manually
Step 1. Finding a Transaction by TXID
Each transaction on the blockchain has a unique identifier, the TXID, which is the transaction hash. It is a 64-character hexadecimal string that any node can compute by serializing the transaction data (inputs, outputs, and amounts; for legacy transactions also the signatures) and hashing the result twice with the SHA-256 algorithm.
Hash generation scheme. Source: Arkham.
It is nearly impossible to forge such a hash: even the slightest alteration in the data results in a completely different string, meaning no two TXIDs are alike. Essentially, it serves as a "receipt number" that any node in the network can use to find and verify the transaction.
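The hashing step can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is a legacy serialization without SegWit witness data; the function name `txid_from_raw` and the placeholder bytes are made up for the example and do not represent a real transaction.

```python
import hashlib


def txid_from_raw(raw_tx: bytes) -> str:
    """Return the TXID for a serialized transaction.

    Assumes `raw_tx` is the legacy serialization (no SegWit witness data),
    because witness bytes are not part of the TXID.
    """
    first = hashlib.sha256(raw_tx).digest()
    second = hashlib.sha256(first).digest()
    # Explorers display the hash in reversed byte order.
    return second[::-1].hex()


# Placeholder bytes, NOT a real transaction: they only show how sensitive
# the hash is to a one-byte change.
original = bytes.fromhex("0100000001") + b"\x00" * 40
tampered = original[:-1] + b"\x01"

print(txid_from_raw(original))
print(txid_from_raw(tampered))
```

The two printed strings share no visible pattern even though the inputs differ by a single byte, which is the property the text describes.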
The method for obtaining the hash depends on what information you already have:
- If the transaction has already been completed, it can be found in the wallet or exchange history. There is usually a link next to the transaction that says "View in Explorer," which leads to the transaction page where the identifier is fully displayed;
- If you only have the sender's or recipient's address, you can enter it into the search bar of any blockchain explorer. This will open the address history with all transactions; you can identify the desired transaction by its amount and date, and find its hash on the transaction page.
Step 2. Analyzing the Transaction Structure: Inputs, Outputs, and Change
Unlike a bank account, Bitcoin does not store balance as a single number. The network operates on a UTXO (unspent transaction output) model: funds exist as separate "bills" of varying denominations, and the wallet only holds the keys to them. The available balance is the sum of all such outputs controlled by the owner.
You cannot spend a "bill" partially. When making a payment, the entire output is consumed as a transaction input, and the transaction typically creates two new outputs: one for the recipient and another that returns the change to the sender, usually at a new address.
For example, if Alice has an output of 5 BTC and sends Bob 0.01 BTC, the blockchain will show an input of 5 BTC, a payment of 0.01 BTC, and change of approximately 4.99 BTC (minus fees).
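The arithmetic of this example can be written out as a short sketch. The fee of 2,000 satoshis, the `Output` class, and the `build_outputs` helper are assumptions made for illustration; real outputs carry locking scripts rather than owner names.

```python
from dataclasses import dataclass

SATS_PER_BTC = 100_000_000


@dataclass(frozen=True)
class Output:
    owner: str       # illustrative label; real outputs carry a locking script
    value_sats: int


def build_outputs(input_sats: int, payment_sats: int, fee_sats: int,
                  recipient: str, change_owner: str) -> list[Output]:
    """Split one spent input into a payment output and a change output."""
    change = input_sats - payment_sats - fee_sats
    if change < 0:
        raise ValueError("input does not cover payment plus fee")
    outputs = [Output(recipient, payment_sats)]
    if change > 0:
        outputs.append(Output(change_owner, change))
    return outputs


outs = build_outputs(
    input_sats=5 * SATS_PER_BTC,
    payment_sats=1_000_000,      # 0.01 BTC
    fee_sats=2_000,              # hypothetical fee
    recipient="Bob",
    change_owner="Alice (new address)",
)
for o in outs:
    print(o.owner, o.value_sats / SATS_PER_BTC)
```

The fee never appears as an output: it is simply the difference between the input and the sum of the outputs.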
A transaction from Alice: the input of 5 BTC splits into a payment to Bob and change to a new address. Source: ForkLog.
Any observer can see the entire operation through the explorer: "round" payments can often be distinguished from "fractional" change. This characteristic is used to identify change addresses in blockchain forensics.
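A naive version of the round-versus-fractional rule might look like the sketch below. The function name and the rounding unit of 0.001 BTC are arbitrary assumptions; production engines combine several signals rather than relying on this one.

```python
from typing import Optional


def guess_change_index(output_values_sats: list[int],
                       round_unit: int = 100_000) -> Optional[int]:
    """Return the index of the likely change output, or None if unclear.

    Naive rule: if exactly one output is NOT a multiple of `round_unit`
    (0.001 BTC by default, an arbitrary choice), treat it as change.
    """
    non_round = [i for i, v in enumerate(output_values_sats) if v % round_unit != 0]
    if len(non_round) == 1 and len(output_values_sats) > 1:
        return non_round[0]
    return None


# 0.01 BTC payment plus fractional change: the second output is flagged.
print(guess_change_index([1_000_000, 498_998_000]))
# Two round outputs: the rule cannot decide.
print(guess_change_index([1_000_000, 2_000_000]))
```

Returning `None` when the rule cannot decide is deliberate: a wrong guess would attach an address to the wrong owner.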
Step 3. Checking Confirmations and the Mempool
A sent transaction does not enter the blockchain instantly. First, it goes into the mempool, the queue of unconfirmed transactions that nodes keep while they wait to be included in a block. At this stage, delays can occur: miners typically prioritize transactions that pay a higher fee per unit of size, so a transaction with a low fee rate may remain in the queue for a long time.
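The prioritization can be illustrated with a simplified sketch that ranks pending transactions by fee rate (satoshis per virtual byte) and fills a block greedily. The field names and the tiny block size are assumptions for the example, and real miners also account for dependencies between transactions, which this sketch ignores.

```python
def order_by_fee_rate(mempool: list[dict]) -> list[dict]:
    """Sort pending transactions by fee rate (sat/vB), highest first."""
    return sorted(mempool, key=lambda tx: tx["fee_sats"] / tx["vsize"], reverse=True)


def fill_block(mempool: list[dict], max_vsize: int) -> list[str]:
    """Greedily pick transactions by fee rate until the block is full."""
    chosen, used = [], 0
    for tx in order_by_fee_rate(mempool):
        if used + tx["vsize"] <= max_vsize:
            chosen.append(tx["txid"])
            used += tx["vsize"]
    return chosen


pending = [
    {"txid": "a", "fee_sats": 500, "vsize": 250},    # 2 sat/vB
    {"txid": "b", "fee_sats": 5_000, "vsize": 250},  # 20 sat/vB
    {"txid": "c", "fee_sats": 1_400, "vsize": 140},  # 10 sat/vB
]
print(fill_block(pending, max_vsize=400))  # ['b', 'c']
```

Transaction `a` stays in the queue even though it arrived first in the list: its fee rate is the lowest and the block has no room left for it.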
Once the transaction is included in a block, it receives its first confirmation. With each subsequent confirmation, the level of reliability increases, and the likelihood of reversal decreases. For small amounts, one or two confirmations are usually sufficient, while for larger payments, it is customary to wait for six.
You can conveniently track this in real-time using the hash: the explorer will show whether the transaction is stuck in the queue, which block it was included in, how many confirmations it has received, and what fee the sender paid. If the transaction is stuck for a long time, the same page will indicate the reason—most often, this is due to a low fee.
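The same check can be scripted. The sketch below assumes the mempool.space REST endpoints `/api/tx/{txid}/status` and `/api/blocks/tip/height` and the response fields `confirmed` and `block_height`; these reflect my understanding of the public API and should be verified against the current documentation. The counting logic is kept in a pure function so it can be tried offline.

```python
import json
from urllib.request import urlopen

API = "https://mempool.space/api"  # assumed base URL


def confirmations(status: dict, tip_height: int) -> int:
    """Count confirmations from a tx status object and the chain tip height."""
    if not status.get("confirmed"):
        return 0  # still in the mempool
    # The block that includes the transaction counts as the first confirmation.
    return tip_height - status["block_height"] + 1


def fetch_confirmations(txid: str) -> int:
    """Query the explorer for a transaction's status (performs network I/O)."""
    with urlopen(f"{API}/tx/{txid}/status", timeout=10) as resp:
        status = json.load(resp)
    with urlopen(f"{API}/blocks/tip/height", timeout=10) as resp:
        tip_height = int(resp.read().decode())
    return confirmations(status, tip_height)


# Offline examples with hand-written status objects.
print(confirmations({"confirmed": False}, tip_height=900_000))
print(confirmations({"confirmed": True, "block_height": 899_995}, tip_height=900_000))
```

A transaction included five blocks below the tip has six confirmations, the threshold the text mentions for larger payments.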
The Bitcoin mempool. Source: mempool.space.
Step 4. Tracing the Path of Coins
Each input of a new transaction refers to a specific output of a previous transaction by that transaction's hash and the output's index, and a single transaction can have multiple inputs and outputs. Therefore, transactions do not form a single chain but rather a branching network where coins converge and diverge between addresses.
This network allows you to trace the movement of funds in both directions—forward to new addresses and backward, all the way to the coinbase transaction where they first appeared as a reward for mining a block.
In practice, an analyst follows the output addresses to see where the funds went next. They then repeat this process with the new addresses, step by step, until a complete chain is established.
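This step-by-step procedure is a breadth-first walk over the transaction graph. The sketch below runs it on a toy in-memory ledger; the data, the dictionary layout, and the function names are hypothetical, and a real tool would fetch each transaction from a node or explorer instead.

```python
from collections import deque

# Toy ledger (hypothetical data): txid -> inputs and outputs.
# An input is (previous txid, output index); an output is (address, sats).
TXS = {
    "tx1": {"inputs": [], "outputs": [("addrA", 500)]},
    "tx2": {"inputs": [("tx1", 0)], "outputs": [("addrB", 100), ("addrC", 399)]},
    "tx3": {"inputs": [("tx2", 1)], "outputs": [("exchange", 398)]},
}


def build_spend_index(txs: dict) -> dict:
    """Map each spent outpoint (txid, index) to the txid that spends it."""
    return {outpoint: txid for txid, tx in txs.items() for outpoint in tx["inputs"]}


def trace_forward(txs: dict, start_txid: str) -> list[tuple[str, str]]:
    """Breadth-first walk over spending transactions.

    Returns (txid, address) pairs for every output reached from `start_txid`.
    """
    spent_by = build_spend_index(txs)
    seen, queue, reached = {start_txid}, deque([start_txid]), []
    while queue:
        txid = queue.popleft()
        for index, (address, _value) in enumerate(txs[txid]["outputs"]):
            reached.append((txid, address))
            nxt = spent_by.get((txid, index))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return reached


print(trace_forward(TXS, "tx1"))
```

The walk stops at outputs that have not been spent yet (`addrB` and `exchange` here), which is where a manual trace would stop as well.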
This way, characteristic routes emerge on the blockchain: transfers to exchanges, splitting large amounts into smaller parts, or laundering stolen funds through several intermediary wallets.
For instance, in 2025, an early investor's address became active for the first time after 13 years of inactivity, transferring 909 BTC (about $85 million) to a new wallet. These coins were acquired back in 2012-2013 when Bitcoin's price was only a few dollars.
The transaction history of a "woken up" Bitcoin wallet. Source: Arkham.
This transfer is visible in any explorer: simply open the address, follow the output chains, and see where the funds moved next.
Part 2. Streamlining Analysis
Manual analysis is effective when there are only one or two transactions. However, new transactions reach the network every second and a new block is added roughly every ten minutes, so thousands of transfers cannot be monitored by eye.
This is where automation comes in: programs can request data from the network, calculate metrics, and send notifications at the right moment. We will explore three levels of this automation—data access, analytics, and monitoring.
Step 5. Connecting to Data via API
The first level is programmatic access to the blockchain through node and explorer APIs. This interface lets a script request the same information we previously saw on the transaction page, but without human involvement and at much larger volume (subject to each service's rate limits).
The mempool.space service offers two options. The REST API responds to one-time requests: transaction status, address balance, mempool status. The WebSocket API maintains a constant connection and sends updates automatically—you can subscribe to an address and receive a signal every time a new transaction occurs.
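A one-time REST request for an address could look like the sketch below. The endpoint `/api/address/{address}` and the `chain_stats` fields `funded_txo_sum` and `spent_txo_sum` are assumptions based on my understanding of the public API; check them against the current documentation before relying on them. The sample response is hand-written so the calculation can be tried without network access.

```python
import json
from urllib.request import urlopen

API = "https://mempool.space/api"  # assumed base URL


def confirmed_balance_sats(address_info: dict) -> int:
    """Confirmed balance = everything received minus everything spent."""
    stats = address_info["chain_stats"]
    return stats["funded_txo_sum"] - stats["spent_txo_sum"]


def fetch_address(address: str) -> dict:
    """One-time REST request for an address summary (performs network I/O)."""
    with urlopen(f"{API}/address/{address}", timeout=10) as resp:
        return json.load(resp)


# Offline example with a hand-written response of the assumed shape.
sample = {"chain_stats": {"funded_txo_sum": 150_000, "spent_txo_sum": 40_000}}
print(confirmed_balance_sats(sample))  # 110000
```

The balance is derived rather than stored, which mirrors the UTXO model: the API reports totals of funded and spent outputs, and the script subtracts one from the other.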
For bulk checks, Blockchair and Bitquery are suitable: they provide data for multiple addresses at once and support webhooks.
Step 6. Automating Analytics
The second level involves platforms that allow you to write a query once and receive a ready-made dashboard instead of a one-time result.
On Dune, blockchain data is queried using SQL and displayed on graphs; reports on exchange flows, whale activity, or top holders are updated automatically without the need for repeated queries.
Flipside operates similarly—it has a free Python SDK that allows you to pull data directly into user scripts.
The key difference from the manual method is simple: the query is written once and runs continuously. Previously, an analyst would review a graph daily—now there are dashboards that update automatically.
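The "write once, run repeatedly" idea can be tried locally with Python's built-in `sqlite3` module. The table `transfers`, its columns, and the sample rows are invented for this sketch; they are not the schema of Dune or Flipside, where table and column names differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (day TEXT, to_label TEXT, btc REAL)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [
        ("2025-01-01", "exchange", 12.5),
        ("2025-01-01", "unknown", 3.0),
        ("2025-01-02", "exchange", 7.5),
        ("2025-01-02", "exchange", 2.5),
    ],
)

# The saved query: daily inflow to addresses labeled as exchanges.
DAILY_EXCHANGE_INFLOW = """
    SELECT day, SUM(btc) AS inflow_btc
    FROM transfers
    WHERE to_label = 'exchange'
    GROUP BY day
    ORDER BY day
"""


def run_dashboard_query(connection: sqlite3.Connection) -> list[tuple]:
    """Re-run the saved query; a dashboard would call this on a schedule."""
    return connection.execute(DAILY_EXCHANGE_INFLOW).fetchall()


print(run_dashboard_query(conn))
```

As new rows arrive in the table, calling `run_dashboard_query` again returns the updated totals without any change to the query itself.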
Step 7. Setting Up Monitoring and Alerts
The third level involves notifications instead of manually refreshing the page. A combination of "API plus bot" monitors the necessary addresses and sends notifications as soon as funds come in or go out. This is how public alerts for large whale transactions and private alerts for specific wallet withdrawals work.
The basic setup can be configured without programming: platforms like Arkham offer ready-made alerts for transfers and whale activity that arrive via email or in an app. For those needing custom event processing logic, webhooks are available—they automatically send notifications to a server when a specified event occurs.
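For custom logic, the core of such a bot is a filter over incoming transactions. The sketch below covers incoming funds only and assumes transactions arrive as dictionaries with `txid` and a `vout` list whose items carry `scriptpubkey_address` and `value` (the shape used by Esplora-style explorer APIs, as far as I know). The function names and the threshold are hypothetical, and delivery of the message (email, chat bot, webhook) is left out.

```python
def received_sats(tx: dict, address: str) -> int:
    """Sum the outputs of `tx` that pay `address`."""
    return sum(
        out["value"]
        for out in tx["vout"]
        if out.get("scriptpubkey_address") == address
    )


def new_alerts(txs: list[dict], address: str, seen: set[str], min_sats: int) -> list[str]:
    """Return alert messages for unseen transactions paying at least `min_sats`."""
    alerts = []
    for tx in txs:
        if tx["txid"] in seen:
            continue
        seen.add(tx["txid"])  # remember it so the next poll stays quiet
        amount = received_sats(tx, address)
        if amount >= min_sats:
            alerts.append(f"{address} received {amount} sats in {tx['txid']}")
    return alerts


# Hand-written sample data standing in for one polling round.
seen_txids: set[str] = set()
batch = [
    {"txid": "t1", "vout": [{"scriptpubkey_address": "addrX", "value": 50_000}]},
    {"txid": "t2", "vout": [{"scriptpubkey_address": "addrX", "value": 5_000_000},
                            {"scriptpubkey_address": "other", "value": 1}]},
]
print(new_alerts(batch, "addrX", seen_txids, min_sats=1_000_000))
```

Keeping the set of seen transaction IDs outside the function lets the same filter run on every polling round without repeating alerts.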
Alert setup page for various events (related to Lazarus Group wallets, transactions over $100 million, withdrawals from CEX). Source: Arkham.
Part 3. Tracing and Its Limits
Step 8. How Blockchain Forensics Works
The pinnacle of automation is tracing stolen coins and investigating transactions. Forensic engines replicate the logic of manual analysis but apply it on a network-wide scale.
This technology is based on clustering addresses using heuristics, or rule-based assumptions; two of these are particularly important:
- common input: if multiple UTXOs are spent in a single transaction, they are likely controlled by one owner;
- change address identification: using the characteristic from Step 2 (round payment versus fractional change), the engine infers which output most likely returned to the sender.
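The common input rule maps naturally onto a union-find (disjoint set) structure: every transaction merges the clusters of all its input addresses. The sketch below is a minimal illustration with invented addresses, not the implementation of any particular forensic engine.

```python
def cluster_by_common_input(transactions: list[list[str]]) -> list[set[str]]:
    """Group addresses that appear together as inputs of the same transaction.

    `transactions` is a list of input-address lists, one per transaction.
    """
    parent: dict[str, str] = {}

    def find(a: str) -> str:
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for input_addresses in transactions:
        for addr in input_addresses:
            find(addr)                      # register every address
        for other in input_addresses[1:]:
            union(input_addresses[0], other)

    clusters: dict[str, set[str]] = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


# A and B are co-spent, then B and C: all three end up in one cluster.
print(cluster_by_common_input([["A", "B"], ["B", "C"], ["D"]]))
```

Note how `A` and `C` are linked without ever appearing in the same transaction. This transitivity is what makes the heuristic powerful, and also what lets a single wrong merge contaminate a whole cluster.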
On top of clustering, typical money laundering patterns are recognized. This includes splitting amounts or "peeling"—where small portions are repeatedly taken from a large wallet.
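A peeling pattern can be approximated by following the largest output of each transaction for as long as the smaller output stays small. The data layout, the function name, and the 10% threshold below are assumptions chosen for illustration; real pattern detection is considerably more nuanced.

```python
def follow_peel_chain(txs: dict, start_txid: str,
                      max_peel_ratio: float = 0.1) -> list[str]:
    """Follow the largest output while each hop peels off a small amount.

    `txs` maps txid -> {"outputs": [sats, ...], "next": {output_index: txid}}.
    A hop counts as a peel when the transaction has exactly two outputs and the
    smaller one is at most `max_peel_ratio` of the total (arbitrary threshold).
    """
    chain, txid = [], start_txid
    while txid in txs and txid not in chain:
        outputs = txs[txid]["outputs"]
        if len(outputs) != 2 or min(outputs) > max_peel_ratio * sum(outputs):
            break
        chain.append(txid)
        largest = outputs.index(max(outputs))
        txid = txs[txid]["next"].get(largest)  # None ends the walk
    return chain


# Hypothetical data: two peels, then an even split that breaks the pattern.
PEEL = {
    "p1": {"outputs": [5, 95], "next": {1: "p2"}},
    "p2": {"outputs": [90, 5], "next": {0: "p3"}},
    "p3": {"outputs": [45, 45], "next": {}},
}
print(follow_peel_chain(PEEL, "p1"))  # ['p1', 'p2']
```

The length of the returned chain is the signal: one hop with a small output is ordinary, while a long run of them suggests deliberate peeling.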
Next comes risk assessment and attribution: linking clusters to real exchanges, services, or individuals. This is done by both commercial platforms (Chainalysis, TRM, Elliptic) and open-source engines (GraphSense, BlockSci).
Graph of address and entity clustering. Source: Arkham.
Step 9. Can We Trust Automation?
Automated analysis has a fundamental limitation. Clustering provides probabilities, not certainties. Heuristics can make mistakes: for instance, the CoinJoin technology intentionally combines UTXOs from different users in a single transaction, which can lead to errors in the common input rule.
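One common mitigation is to flag transactions that look like CoinJoins and exclude them from common input clustering. The sketch below uses a deliberately naive signal (several inputs and several outputs of identical value); the function name and the threshold of three are assumptions, and real detectors use more elaborate rules.

```python
from collections import Counter


def looks_like_coinjoin(input_count: int, output_values_sats: list[int],
                        min_equal_outputs: int = 3) -> bool:
    """Flag transactions with several inputs and many equal-valued outputs."""
    if input_count < min_equal_outputs or not output_values_sats:
        return False
    most_common_count = Counter(output_values_sats).most_common(1)[0][1]
    return most_common_count >= min_equal_outputs


# Five inputs, five equal outputs plus two change-like outputs: flagged.
print(looks_like_coinjoin(5, [100_000] * 5 + [1_234, 5_678]))
# An ordinary payment with change: not flagged.
print(looks_like_coinjoin(2, [1_000_000, 498_998_000]))
```

A flag like this only tells the engine where the common input rule should not be applied; it says nothing about who took part in the transaction.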
You can also manually reduce the risk of data leakage. The Coin control feature in wallets (like Sparrow or Trezor Suite) lets you choose which UTXOs to spend: you can select an output that closely matches the payment amount and avoid mixing coins from different sources in a single transaction. This reduces or eliminates change and keeps the common input heuristic from triggering, removing the very clues on which clustering is based.
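The selection rule described here can be sketched as "pick the single smallest UTXO that still covers the payment and fee". The function name and the sample values are hypothetical; wallets such as Sparrow or Trezor Suite leave this choice to the user rather than exposing a function like this.

```python
from typing import Optional


def pick_single_utxo(utxos_sats: list[int], payment_sats: int,
                     fee_sats: int) -> Optional[int]:
    """Pick ONE UTXO that covers payment + fee with the least leftover.

    Returns the chosen UTXO value, or None if no single UTXO is large enough
    (in which case inputs would have to be combined).
    """
    needed = payment_sats + fee_sats
    candidates = [u for u in utxos_sats if u >= needed]
    return min(candidates) if candidates else None


wallet = [500_000_000, 1_010_000, 30_000]  # hypothetical UTXO values
choice = pick_single_utxo(wallet, payment_sats=1_000_000, fee_sats=2_000)
print(choice)                               # 1010000
print(choice - 1_000_000 - 2_000)           # leftover change: 8000
```

Spending the 1,010,000-satoshi output instead of the 5 BTC one uses a single input and leaves only a small change output, so the transaction gives the common input rule nothing to work with and does not expose the larger holding.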
The Coin control interface as an example of counteracting tracing. Source: Trezor.
Bitcoin provides not anonymity but pseudonymity, a weaker property that persistent analysis can often "break through," but not always: the result remains an estimate, albeit a well-founded one. As emphasized by Chainalysis, attribution heuristics yield probabilistic results, not certainties, making risk assessment merely a basis for an analyst's decision, not a verdict.
From this, a practical conclusion follows. It is important to distinguish between automated address grouping and human-verified attribution: the former is a hypothesis, while the latter is a conclusion. The outcome of a risk assessment is a reason to scrutinize an address, not a condemnation of its owner.
***
The open ledger makes every Bitcoin transaction traceable, and this is the foundation of all on-chain analysis. For manual work, a basic understanding is sufficient: knowing what a TXID is, understanding the UTXO model and change mechanics, and being able to read inputs, outputs, and confirmations in the explorer. This knowledge is enough to navigate through transaction chains and see where the funds have moved.
Automation in the form of APIs, SQL dashboards, and bots does not eliminate the need for this foundational knowledge—it simply scales the work: what an analyst does manually with one transaction, tools replicate for the entire network.
At the top lies forensics with clustering and risk assessment. However, it relies on probabilistic heuristics, thus providing hypotheses rather than proof.
It is logical to approach the topic from the bottom up: first, learn the explorer, then APIs and dashboards, and only later specialized on-chain analytics tools. The deeper the tracing, the more crucial it becomes to distinguish between what is probable and what is proven.
