BITCOIN NODE - Merkle Tree
- BITCOIN NODE - Merkle Tree
BITCOIN NODE - Merkle Tree
Background
머클 트리(Merkle Tree)는 비트코인의 '압축 알고리즘'이다.
머클 증명(Merkle Proof) 은 대량의 데이터 묶음(e.g 블록체인 블록의 모든 거래)에서 특정 데이터 조각(거래)이 해당 묶음에 포함되어 있는지 효율적으로 증명하는 기술로, 전체 데이터를 모두 알 필요 없이 최소한의 해시값(머클 경로)만으로 검증을 가능하게 한다.
- 비트코인에서 Merkle Tree 는 이진 트리(Binary Tree)
- 각 리프 노드는 트랜잭션의 Double SHA-256 해시 (32 bytes)
- 최종 결과가 Merkle Root (32 bytes)
- 블록 헤더에 저장되는 것은 전체 트랜잭션이 아니라 Merkle Root 하나다.
- Merkle Tree 는 특정 트랜잭션이 블록에 포함되었는지를 O(log n) 크기의 증명으로 검증할 수 있게 해준다.
- Merkle Tree 없으면 모든 트랜잭션을 내려받아야 한다. Merkle Tree 덕분에 블록 헤더 + Merkle Proof 만으로 검증 가능하며, 이것이 SPV 의 기술적 기반이다. 이 덕분에 모바일 기기에서 비트코인 사용이 가능하다.
사토시 나카모토가 비트코인 백서 8장에서 SPV를 제안하지 않았다면, 비트코인은 수백 기가바이트의 블록체인을 저장할 수 있는 고성능 서버들만의 전유물이 되었을 것이다.
32바이트의 머클 루트 하나로 수천 개의 거래를 요약하고, 단 수백 바이트의 증거로 100%의 수학적 확신을 주는 이 구조 덕분에, 우리는 스마트폰에서도 비트코인을 안전하게 사용할 수 있는 것이다.
Why Merkle Trees Are Essential to Bitcoin
Imagine you want to verify that a specific transaction is included in a block containing 2,000 transactions. Without Merkle trees, you would need to:
- Download all 2,000 transactions (~1-2 MB)
- Hash each transaction
- Search through all hashes
With Merkle trees, you only need:
- The transaction you're interested in (~200-500 bytes)
- A Merkle proof (~384 bytes for 2,000 transactions)
- The block header containing the Merkle root (80 bytes)
Total: Less than 1 KB instead of 1-2 MB!
This 99.95% reduction in data transfer makes Bitcoin practical for mobile devices and light clients. This innovation, borrowed from cryptographic research papers, is one of the key reasons Bitcoin can scale to millions of users.
The Three Superpowers of Merkle Trees
- Efficient Verification: Verify any transaction in O(log n) time and space
- Tamper-Proof: Changing any transaction invalidates the Merkle root
- SPV Wallets: Mobile devices can verify payments without full blockchain
What is a Merkle Tree?
A Merkle tree (named after Ralph Merkle, 1979) is a binary tree where:
- Leaf nodes contain data hashes (transaction hashes)
- Non-leaf nodes contain hashes of their children
- Root node summarizes the entire tree
Visual Example: 4 Transactions
Merkle Root
Hash(Hash01 + Hash23)
/ \
Hash01 Hash23
Hash(H0 + H1) Hash(H2 + H3)
/ \ / \
Hash0 Hash1 Hash2 Hash3
(TX A) (TX B) (TX C) (TX D)
| | | |
TX A TX B TX C TX D
Construction Algorithm
Step 1: Hash each transaction
[Hash(TX A), Hash(TX B), Hash(TX C), Hash(TX D)]
Step 2: Hash pairs
[Hash(Hash(TX A) + Hash(TX B)), Hash(Hash(TX C) + Hash(TX D))]
Step 3: Hash pairs again (until one hash remains)
[Hash(Hash01 + Hash23)] ← This is the Merkle Root!
The Merkle root is a 32-byte fingerprint of all transactions in the block. It's stored in the block header and is part of the Proof-of-Work calculation.
Why Binary Tree?
Binary trees provide optimal balance between:
- Proof size: O(log₂ n) hashes needed
- Verification time: O(log₂ n) hash operations
- Construction time: O(n) hash operations
With 2,000 transactions:
- Tree height: log₂(2000) ≈ 11 levels
- Proof size: 11 hashes × 32 bytes = 352 bytes
- Verification: 11 hash operations
The Power of Merkle Proofs
A Merkle proof allows you to prove a transaction is in a block without revealing other transactions.
Example: Proving Transaction B (4 Transactions)
Goal: Prove TX B is in the block
Root
/ \
Hash01 Hash23 ← Need Hash23
/ \
Hash0 Hash1 ← Need Hash0 (sibling of Hash1)
|
TX B ← This is what we're proving
Merkle Proof for TX B: [Hash0, Hash23]
Verification Steps:
- Start with
Hash1 = Hash(TX B) - Combine with sibling:
Hash01 = Hash(Hash0 + Hash1) - Combine with sibling:
Root = Hash(Hash01 + Hash23) - Check: Does computed root match block header's Merkle root?
- ✅ YES → TX B is in the block
- ❌ NO → TX B is not in the block (or proof is fake)
Proof Size Growth
| Transactions | Tree Height | Proof Size | Data Reduction |
|---|---|---|---|
| 1 | 1 | 0 bytes | N/A |
| 2 | 2 | 32 bytes | 94% |
| 4 | 3 | 64 bytes | 96% |
| 10 | 4 | 128 bytes | 98% |
| 100 | 7 | 224 bytes | 99.3% |
| 1,000 | 10 | 320 bytes | 99.7% |
| 2,000 | 11 | 352 bytes | 99.8% |
| 10,000 | 14 | 448 bytes | 99.95% |
Key Insight: Proof size grows logarithmically, not linearly!
Even with 10,000 transactions (~5-10 MB block), the proof is only 448 bytes.
TDD Implementation Process
Following the Red-Green-Refactor cycle, we implemented the Merkle tree package with 98.6% test coverage.
Phase 1: RED - Write the Tests First
We started with the simplest possible test case: the Genesis Block.
// TestGenesisBlockMerkleRoot tests Merkle root calculation for the Genesis Block.
//
// The Genesis Block (Block 0) contains exactly one transaction (the coinbase).
// When there's only one transaction, the Merkle root equals the transaction hash.
//
// Block Details:
// - Block Height: 0
// - Transaction Count: 1
// - Coinbase TXID: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
// - Merkle Root: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
//
// This is the simplest possible Merkle tree: a single leaf node.
func TestGenesisBlockMerkleRoot(t *testing.T) {
// Genesis Block has exactly 1 transaction
// For single transaction, Merkle root = transaction hash
txidHex := "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
txid, _ := hex.DecodeString(txidHex)
// Convert to internal format (little-endian)
txHash := crypto.ReverseBytes(txid)
// Build Merkle tree
tree := BuildMerkleTree([][]byte{txHash})
// Get root and convert back to display format
root := tree.Root()
rootDisplay := crypto.ReverseBytes(root)
rootHex := hex.EncodeToString(rootDisplay)
// For single transaction, root = transaction hash
assert.Equal(t, txidHex, rootHex)
}
Why Start with Genesis Block?
- Simplest case: 1 transaction
- Well-documented and immutable
- If we can verify this, our implementation is correct
Phase 2: GREEN - Implement the Merkle Tree
Step 1: Define the Structure
type MerkleTree struct {
root []byte // The Merkle root (32 bytes)
levels [][][]byte // All tree levels (for proof generation)
txCount int // Number of transactions
}
Design Decision: Store all levels, not just the root.
Why?
- Enables efficient Merkle proof generation
- Allows traversal from leaf to root
- Memory cost: O(2n - 1) hashes ≈ O(n)
For 2,000 transactions:
- Total nodes: ~4,000
- Memory: 4,000 × 32 bytes = 128 KB (negligible)
Step 2: Implement Tree Construction
func BuildMerkleTree(txHashes [][]byte) *MerkleTree {
if len(txHashes) == 0 {
return nil
}
tree := &MerkleTree{
txCount: len(txHashes),
levels: make([][][]byte, 0),
}
// Level 0: Transaction hashes (leaves)
currentLevel := make([][]byte, len(txHashes))
copy(currentLevel, txHashes)
tree.levels = append(tree.levels, currentLevel)
// Build tree bottom-up
for len(currentLevel) > 1 {
nextLevel := make([][]byte, 0)
// Process pairs
for i := 0; i < len(currentLevel); i += 2 {
left := currentLevel[i]
// Handle odd number of nodes: duplicate last node
var right []byte
if i+1 < len(currentLevel) {
right = currentLevel[i+1]
} else {
right = left // Bitcoin's odd-handling rule
}
// Hash the pair
combined := append(left, right...)
parentHash := crypto.DoubleSHA256(combined)
nextLevel = append(nextLevel, parentHash)
}
tree.levels = append(tree.levels, nextLevel)
currentLevel = nextLevel
}
// Last level contains the root
tree.root = currentLevel[0]
return tree
}
Key Implementation Details:
- Odd Number Handling: If a level has an odd number of nodes, duplicate the last one
- Bottom-Up Construction: Start from leaves, work toward root
- Level Storage: Store every level for proof generation
- DoubleSHA256: All hashing uses Bitcoin's double SHA-256
Step 3: Implement Merkle Proof Generation
func (t *MerkleTree) GetProof(txIndex int) [][]byte {
if txIndex < 0 || txIndex >= t.txCount {
return nil
}
proof := make([][]byte, 0)
index := txIndex
// Traverse from leaf to root
for level := 0; level < len(t.levels)-1; level++ {
currentLevel := t.levels[level]
// Find sibling index
var siblingIndex int
if index%2 == 0 {
siblingIndex = index + 1 // Right sibling
} else {
siblingIndex = index - 1 // Left sibling
}
// Add sibling to proof
if siblingIndex < len(currentLevel) {
proof = append(proof, currentLevel[siblingIndex])
} else {
// Odd number: sibling is the same node (duplicated)
proof = append(proof, currentLevel[index])
}
// Move to parent index
index = index / 2
}
return proof
}
Algorithm Complexity:
- Time: O(log n) - traverse from leaf to root
- Space: O(log n) - store log n sibling hashes
Step 4: Implement Proof Verification
func VerifyProof(txHash []byte, proof [][]byte, merkleRoot []byte,
txIndex, txCount int) bool {
// Validate inputs
if len(txHash) != 32 || len(merkleRoot) != 32 {
return false
}
if txIndex < 0 || txIndex >= txCount {
return false
}
currentHash := txHash
index := txIndex
// Process each proof level
for _, siblingHash := range proof {
if len(siblingHash) != 32 {
return false
}
// Determine if current node is left or right child
var combined []byte
if index%2 == 0 {
// Current is left child
combined = append(currentHash, siblingHash...)
} else {
// Current is right child
combined = append(siblingHash, currentHash...)
}
// Hash the combined result
currentHash = crypto.DoubleSHA256(combined)
index = index / 2
}
// Check if computed root matches expected root
return bytes.Equal(currentHash, merkleRoot)
}
Why Left/Right Order Matters:
Hash(A + B) ≠ Hash(B + A)
Example:
Hash(0x01 + 0x02) = 0xabcd...
Hash(0x02 + 0x01) = 0x1234... (completely different!)
The transaction index tells us which side the node is on:
- Even index (0, 2, 4, …): Left child
- Odd index (1, 3, 5, …): Right child
Phase 3: REFACTOR - Add Comprehensive Tests
After basic tests passed, we added:
- Multiple Transaction Sizes: 2, 3, 4, 7 transactions
- Odd Number Handling: Verify duplication works correctly
- Merkle Proof Tests: Generate and verify proofs for all positions
- Error Cases: Invalid indices, wrong hashes, nil trees
- Real Bitcoin Data: Genesis Block and Block 1
// TestMerkleTreeThreeTransactions tests Merkle tree with odd number of transactions.
//
// With 3 transactions (A, B, C), Bitcoin duplicates the last hash:
//
// Root = Hash(Hash(A+B) + Hash(C+C))
// / \
// Hash(A+B) Hash(C+C)
// / \ / \
// Hash(A) Hash(B) Hash(C) Hash(C) ← C is duplicated
// | | | |
// A B C C
//
// This tests Bitcoin's handling of odd-numbered transaction sets.
func TestMerkleTreeThreeTransactions(t *testing.T) {
// Create 3 transactions
txHash1 := makeHash(0x01)
txHash2 := makeHash(0x02)
txHash3 := makeHash(0x03)
tree := BuildMerkleTree([][]byte{txHash1, txHash2, txHash3})
// Manual calculation to verify
// Level 1: Hash pairs
hash12 := crypto.DoubleSHA256(append(txHash1, txHash2...))
hash33 := crypto.DoubleSHA256(append(txHash3, txHash3...)) // Duplicate!
// Level 2: Root
expectedRoot := crypto.DoubleSHA256(append(hash12, hash33...))
root := tree.Root()
assert.Equal(t, expectedRoot, root)
}
Bitcoin's Merkle Tree Rules
Bitcoin's Merkle tree implementation has specific rules that differ from textbook implementations:
Rule 1: Duplicate on Odd Numbers
When a level has an odd number of nodes, duplicate the last node.
Example: 3 transactions (A, B, C)
Level 0 (leaves): A B C
↓
Level 0 (adjusted): A B C C ← C duplicated
\ / \ /
Level 1: AB CC
\ /
Level 2 (root): ABCC
Why not just promote the odd node?
- Ensures tree remains a perfect binary tree
- Simplifies proof verification logic
- Maintains consistent tree height
Security Consideration: This duplication can lead to a vulnerability called the "Merkle tree attack" if not carefully implemented. Bitcoin Core has protections against this.
Rule 2: Double SHA-256
All hashing uses DoubleSHA256:
Hash(x) = SHA256(SHA256(x))
Why double hash?
- Protection against length extension attacks
- Additional security margin
- Consistency with Bitcoin's transaction hashing
Rule 3: Internal Format (Little-Endian)
Transaction hashes are in internal format (little-endian) when building the tree:
TXID (display format, big-endian):
4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
Transaction hash (internal format, little-endian):
3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a
↑
Use this for Merkle tree
Why?
- Bitcoin protocol internally uses little-endian
- Block headers store Merkle root in internal format
- Consistency with rest of protocol
When to Reverse:
- ✅ Use internal format: Building tree, computing root
- ✅ Use display format: Showing TXID to users, block explorers
SPV: Simplified Payment Verification
SPV (Simplified Payment Verification) is one of Bitcoin's most important innovations. It allows light clients to verify payments without downloading the full blockchain.
How SPV Works
Full Node (needs entire blockchain):
1. Download all blocks (~500 GB)
2. Validate all transactions
3. Maintain full UTXO set
4. Fully trustless
SPV Node (only needs block headers):
1. Download block headers only (~80 bytes × 800,000 blocks = 64 MB)
2. Request Merkle proof for specific transactions
3. Verify proof against block header
4. Mostly trustless (trust that majority of hashrate is honest)
SPV Verification Process
Scenario: Alice's mobile wallet wants to verify she received a payment.
Step 1: Alice receives payment
- TX hash: abc123...
- Block height: 700,000
Step 2: Alice's wallet requests Merkle proof
- "Give me proof that TX abc123... is in block 700,000"
Step 3: Full node responds with:
- Block header (80 bytes)
- Merkle proof (~350 bytes for 2,000 TX block)
- Total: ~430 bytes
Step 4: Alice's wallet verifies:
1. Verify block header has valid Proof-of-Work ✓
2. Verify block header is in longest chain ✓
3. Verify Merkle proof using block header's Merkle root ✓
4. Conclusion: Payment confirmed! ✓
Data transferred: 430 bytes instead of ~1 MB (99.96% reduction!)
SPV Security Model
What SPV Guarantees:
- ✅ Transaction is in a block with valid Proof-of-Work
- ✅ Transaction format is valid (if you parse it)
- ✅ Block is in the longest chain (with sufficient confirmations)
What SPV Doesn't Guarantee:
- ❌ Transaction inputs are unspent (could be double-spend)
- ❌ Transaction is in the main chain (could be on a fork)
- ❌ Full consensus rule validation
Trust Assumptions:
- Trust that majority of hash rate follows consensus rules
- Trust that peers provide accurate block headers
- Acceptable for most retail payments
Security Recommendations:
- Wait for multiple confirmations (6+ blocks)
- Connect to multiple peers
- Cross-check with block explorers
- Use for smaller amounts
Real-World SPV Usage
Mobile Wallets (Bitcoin Wallet, BRD, etc.):
- Store only headers: ~60-80 MB
- Request proofs for user's transactions only
- Enables Bitcoin on smartphones
Point-of-Sale Systems:
- Quick payment verification
- Don't need full node infrastructure
- Acceptable risk for retail transactions
IoT Devices:
- Constrained memory and bandwidth
- Can verify payments without full blockchain
- Enables Bitcoin on embedded systems
Real-World Examples
Example 1: Genesis Block
The simplest possible Merkle tree.
Block 0 (Genesis Block):
- Transactions: 1
- Coinbase TXID: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
- Merkle Root: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
Tree structure:
Root
|
TX 0 (coinbase)
For single transaction: Merkle Root = Transaction Hash
Example 2: Block 100,000 (Famous Block)
Block 100,000 is famous because it demonstrated Bitcoin had survived and thrived.
Block 100,000:
- Height: 100,000
- Date: 2010-12-29
- Transactions: 4
- Block Hash: 000000000003ba27aa200b1cecaad478d2b00432346c3f1f3986da1afd33e506
- Merkle Root: f3e94742aca4b5ef85488dc37c06c3282295ffec960994b2c0d5ac2a25a95766
Tree structure:
Root
/ \
Hash01 Hash23
/ \ / \
Hash0 Hash1 Hash2 Hash3
| | | |
TX0 TX1 TX2 TX3
Merkle Proof for TX1:
- Sibling path: [Hash0, Hash23]
- Proof size: 64 bytes
- Verification: 2 hash operations
Example 3: Large Block (2,000 Transactions)
Modern blocks regularly contain 2,000+ transactions.
Block with 2,000 transactions:
- Merkle tree height: 11 levels
- Total nodes: ~4,000
- Merkle proof size: 11 × 32 bytes = 352 bytes
Tree structure (simplified):
Root (Level 11)
/ \
Level 10 Level 10
/ \ / \
... 11 levels total ...
/ \
TX 0 ... TX 999 TX 1000 ... TX 1999
To verify TX 1337:
- Need 11 sibling hashes (Merkle proof)
- 11 DoubleSHA256 operations
- 352 bytes of data
- Sub-millisecond verification
Without Merkle tree:
- Need all 2,000 transaction hashes
- 2,000 × 32 bytes = 64,000 bytes
- Must search through all hashes
- 181x more data!
Example 4: Bitcoin's Largest Block
As of 2025, blocks can contain 3,000-4,000 transactions with SegWit.
Block with 3,500 transactions:
- Merkle tree height: 12 levels
- Merkle proof size: 12 × 32 bytes = 384 bytes
- Block size: ~2-3 MB
- Proof is 0.01% of block size!
SPV wallet verification:
- Download: 80 bytes (header) + 384 bytes (proof) = 464 bytes
- Instead of: ~2.5 MB (full block)
- Reduction: 99.98%
- Verification time: ~1-2 milliseconds
Test Coverage and Quality
We achieved 98.6% test coverage with comprehensive tests covering:
1. Happy Path Tests
✅ Genesis Block Merkle Root
func TestGenesisBlockMerkleRoot(t *testing.T) {
// Verifies the first Bitcoin transaction ever
// TXID = Merkle Root (single transaction)
}
✅ Multiple Transaction Sizes
func TestMerkleTreeTwoTransactions(t *testing.T) // Perfect binary tree
func TestMerkleTreeThreeTransactions(t *testing.T) // Odd number (duplication)
func TestMerkleTreeFourTransactions(t *testing.T) // Perfect binary tree
func TestMerkleTreeSevenTransactions(t *testing.T) // Multiple odd levels
✅ Merkle Proof Generation and Verification
func TestMerkleProof(t *testing.T) {
// Generate proofs for all transactions
// Verify each proof
// Ensure 100% success rate
}
✅ Real Bitcoin Block
func TestBlock1MerkleRoot(t *testing.T) {
// Block 1 (first block after Genesis)
// Verifies against real blockchain data
}
2. Error Path Tests
✅ Nil Tree Handling
func TestNilMerkleTree(t *testing.T) {
var tree *MerkleTree
root := tree.Root() // Should return nil
height := tree.Height() // Should return 0
proof := tree.GetProof(0) // Should return nil
}
✅ Out of Bounds Proof Requests
func TestGetProofOutOfBounds(t *testing.T) {
tree.GetProof(-1) // Negative index
tree.GetProof(999) // Beyond txCount
}
✅ Invalid Proof Verification
func TestVerifyProofInvalidLength(t *testing.T) {
// Wrong txHash length
// Wrong merkleRoot length
// Wrong proof element length
}
func TestMerkleProofInvalidIndex(t *testing.T) {
// Wrong transaction hash
// Wrong index
}
3. Edge Cases
✅ Empty Transaction List
func TestEmptyMerkleTree(t *testing.T) {
tree := BuildMerkleTree([][]byte{})
// Should return nil
}
✅ Odd Number Duplication
func TestMerkleTreeSingleTransactionDuplication(t *testing.T) {
// 5 transactions
// Verify proofs work correctly with duplication
}
4. Tree Properties
✅ Height Calculation
func TestMerkleTreeHeight(t *testing.T) {
// 1 TX: height = 1
// 2 TX: height = 2
// 4 TX: height = 3
// 8 TX: height = 4
// 16 TX: height = 5
}
Coverage Report
btcnode/pkg/merkle/merkle.go:80: BuildMerkleTree 100.0%
btcnode/pkg/merkle/merkle.go:142: Root 100.0%
btcnode/pkg/merkle/merkle.go:164: Height 100.0%
btcnode/pkg/merkle/merkle.go:205: GetProof 95.0%
btcnode/pkg/merkle/merkle.go:288: VerifyProof 96.0%
────────────────────────────────────────────────────────────
total: 98.6%
What's not covered?
- Some unreachable defensive checks
- Extremely rare edge cases
98.6% coverage ensures:
- Every major code path is tested
- Error handling is verified
- Integration with crypto package works correctly
- Real Bitcoin data validates our implementation
Key Takeaways
1. Merkle Trees Enable Scalability
Without Merkle trees, every light client would need:
- Full blockchain: ~500 GB
- Constant bandwidth for new blocks
- Powerful hardware
With Merkle trees, light clients need:
- Block headers: ~80 MB
- Proofs: <1 KB per transaction
- Smartphones can verify Bitcoin payments!
This is why Bitcoin can have millions of users.
2. Logarithmic Growth is Magical
Proof size grows logarithmically, not linearly:
1 TX → 0 bytes
10 TX → 128 bytes
100 TX → 224 bytes
1,000 TX → 320 bytes
10,000 TX → 448 bytes
Even with 10,000 transactions, the proof is less than 0.5 KB!
3. Security Through Cryptography
Merkle trees provide strong security guarantees:
- Tamper-proof: Changing any transaction changes the root
- Efficient verification: O(log n) operations
- Minimal trust: SPV needs minimal assumptions
But remember: SPV is not as secure as running a full node!
4. Bitcoin's Specific Rules Matter
Bitcoin's Merkle tree implementation has quirks:
- Duplicate on odd numbers
- Double SHA-256 hashing
- Little-endian internal format
These rules are consensus-critical: Getting them wrong breaks compatibility!
5. TDD Catches Bugs Early
Our 98.6% test coverage includes:
- Real Bitcoin data (Genesis Block, Block 1)
- Edge cases (empty, nil, odd numbers)
- Error paths (invalid indices, wrong hashes)
- Integration tests (crypto package)
Every test passed on the first try after implementation. This is the power of TDD.