BITCOIN NODE - Merkle Tree

BITCOIN NODE - Merkle Tree

Background

머클 트리(Merkle Tree)는 비트코인의 '압축 알고리즘'이다.

머클 증명(Merkle Proof) 은 대량의 데이터 묶음(e.g 블록체인 블록의 모든 거래)에서 특정 데이터 조각(거래)이 해당 묶음에 포함되어 있는지 효율적으로 증명하는 기술로, 전체 데이터를 모두 알 필요 없이 최소한의 해시값(머클 경로)만으로 검증을 가능하게 한다.

비트코인에서 Merkle Tree 는 이진 트리(Binary Tree)
각 리프 노드는 트랜잭션의 Double SHA-256 해시 (32 bytes)
최종 결과가 Merkle Root (32 bytes)
블록 헤더에 저장되는 것은 전체 트랜잭션이 아니라 Merkle Root 하나다.
Merkle Tree 는 특정 트랜잭션이 블록에 포함되었는지를 O(log n) 크기의 증명으로 검증할 수 있게 해준다.
Merkle Tree 없으면 모든 트랜잭션을 내려받아야 한다. Merkle Tree 덕분에 블록 헤더 + Merkle Proof 만으로 검증 가능하며, 이것이 SPV 의 기술적 기반이다. 이 덕분에 모바일 기기에서 비트코인 사용이 가능하다.

사토시 나카모토가 비트코인 백서 8장에서 SPV를 제안하지 않았다면, 비트코인은 수백 기가바이트의 블록체인을 저장할 수 있는 고성능 서버들만의 전유물이 되었을 것이다.

32바이트의 머클 루트 하나로 수천 개의 거래를 요약하고, 단 수백 바이트의 증거로 100%의 수학적 확신을 주는 이 구조 덕분에, 우리는 스마트폰에서도 비트코인을 안전하게 사용할 수 있는 것이다.

Why Merkle Trees Are Essential to Bitcoin

Imagine you want to verify that a specific transaction is included in a block containing 2,000 transactions. Without Merkle trees, you would need to:

Download all 2,000 transactions (~1-2 MB)
Hash each transaction
Search through all hashes

With Merkle trees, you only need:

The transaction you're interested in (~200-500 bytes)
A Merkle proof (~384 bytes for 2,000 transactions)
The block header containing the Merkle root (80 bytes)

Total: Less than 1 KB instead of 1-2 MB!

This 99.95% reduction in data transfer makes Bitcoin practical for mobile devices and light clients. This innovation, borrowed from cryptographic research papers, is one of the key reasons Bitcoin can scale to millions of users.

The Three Superpowers of Merkle Trees

Efficient Verification: Verify any transaction in O(log n) time and space
Tamper-Proof: Changing any transaction invalidates the Merkle root
SPV Wallets: Mobile devices can verify payments without full blockchain

What is a Merkle Tree?

A Merkle tree (named after Ralph Merkle, 1979) is a binary tree where:

Leaf nodes contain data hashes (transaction hashes)
Non-leaf nodes contain hashes of their children
Root node summarizes the entire tree

Visual Example: 4 Transactions

                    Merkle Root
                 Hash(Hash01 + Hash23)
                /                      \
           Hash01                    Hash23
       Hash(H0 + H1)              Hash(H2 + H3)
        /         \                /          \
    Hash0       Hash1          Hash2        Hash3
    (TX A)      (TX B)         (TX C)       (TX D)
     |           |              |            |
   TX A        TX B           TX C         TX D

Construction Algorithm

Step 1: Hash each transaction
[Hash(TX A), Hash(TX B), Hash(TX C), Hash(TX D)]

Step 2: Hash pairs
[Hash(Hash(TX A) + Hash(TX B)), Hash(Hash(TX C) + Hash(TX D))]

Step 3: Hash pairs again (until one hash remains)
[Hash(Hash01 + Hash23)]  ← This is the Merkle Root!

The Merkle root is a 32-byte fingerprint of all transactions in the block. It's stored in the block header and is part of the Proof-of-Work calculation.

Why Binary Tree?

Binary trees provide optimal balance between:

Proof size: O(log₂ n) hashes needed
Verification time: O(log₂ n) hash operations
Construction time: O(n) hash operations

With 2,000 transactions:

Tree height: log₂(2000) ≈ 11 levels
Proof size: 11 hashes × 32 bytes = 352 bytes
Verification: 11 hash operations

The Power of Merkle Proofs

A Merkle proof allows you to prove a transaction is in a block without revealing other transactions.

Example: Proving Transaction B (4 Transactions)

Goal: Prove TX B is in the block

            Root
           /    \
      Hash01   Hash23  ← Need Hash23
       /  \
  Hash0  Hash1  ← Need Hash0 (sibling of Hash1)
           |
         TX B  ← This is what we're proving

Merkle Proof for TX B: [Hash0, Hash23]

Verification Steps:

Start with Hash1 = Hash(TX B)
Combine with sibling: Hash01 = Hash(Hash0 + Hash1)
Combine with sibling: Root = Hash(Hash01 + Hash23)
Check: Does computed root match block header's Merkle root?
- ✅ YES → TX B is in the block
- ❌ NO → TX B is not in the block (or proof is fake)

Proof Size Growth

Transactions	Tree Height	Proof Size	Data Reduction
1	1	0 bytes	N/A
2	2	32 bytes	94%
4	3	64 bytes	96%
10	4	128 bytes	98%
100	7	224 bytes	99.3%
1,000	10	320 bytes	99.7%
2,000	11	352 bytes	99.8%
10,000	14	448 bytes	99.95%

Key Insight: Proof size grows logarithmically, not linearly!

Even with 10,000 transactions (~5-10 MB block), the proof is only 448 bytes.

TDD Implementation Process

Following the Red-Green-Refactor cycle, we implemented the Merkle tree package with 98.6% test coverage.

Phase 1: RED - Write the Tests First

We started with the simplest possible test case: the Genesis Block.

// TestGenesisBlockMerkleRoot tests Merkle root calculation for the Genesis Block.
//
// The Genesis Block (Block 0) contains exactly one transaction (the coinbase).
// When there's only one transaction, the Merkle root equals the transaction hash.
//
// Block Details:
//   - Block Height: 0
//   - Transaction Count: 1
//   - Coinbase TXID: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
//   - Merkle Root:   4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
//
// This is the simplest possible Merkle tree: a single leaf node.
func TestGenesisBlockMerkleRoot(t *testing.T) {
    // Genesis Block has exactly 1 transaction
    // For single transaction, Merkle root = transaction hash

    txidHex := "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
    txid, _ := hex.DecodeString(txidHex)

    // Convert to internal format (little-endian)
    txHash := crypto.ReverseBytes(txid)

    // Build Merkle tree
    tree := BuildMerkleTree([][]byte{txHash})

    // Get root and convert back to display format
    root := tree.Root()
    rootDisplay := crypto.ReverseBytes(root)
    rootHex := hex.EncodeToString(rootDisplay)

    // For single transaction, root = transaction hash
    assert.Equal(t, txidHex, rootHex)
}

Why Start with Genesis Block?

Simplest case: 1 transaction
Well-documented and immutable
If we can verify this, our implementation is correct

Phase 2: GREEN - Implement the Merkle Tree

Step 1: Define the Structure

type MerkleTree struct {
    root    []byte      // The Merkle root (32 bytes)
    levels  [][][]byte  // All tree levels (for proof generation)
    txCount int         // Number of transactions
}

Design Decision: Store all levels, not just the root.

Why?

Enables efficient Merkle proof generation
Allows traversal from leaf to root
Memory cost: O(2n - 1) hashes ≈ O(n)

For 2,000 transactions:

Total nodes: ~4,000
Memory: 4,000 × 32 bytes = 128 KB (negligible)

Step 2: Implement Tree Construction

func BuildMerkleTree(txHashes [][]byte) *MerkleTree {
    if len(txHashes) == 0 {
        return nil
    }

    tree := &MerkleTree{
        txCount: len(txHashes),
        levels:  make([][][]byte, 0),
    }

    // Level 0: Transaction hashes (leaves)
    currentLevel := make([][]byte, len(txHashes))
    copy(currentLevel, txHashes)
    tree.levels = append(tree.levels, currentLevel)

    // Build tree bottom-up
    for len(currentLevel) > 1 {
        nextLevel := make([][]byte, 0)

        // Process pairs
        for i := 0; i < len(currentLevel); i += 2 {
            left := currentLevel[i]

            // Handle odd number of nodes: duplicate last node
            var right []byte
            if i+1 < len(currentLevel) {
                right = currentLevel[i+1]
            } else {
                right = left  // Bitcoin's odd-handling rule
            }

            // Hash the pair
            combined := append(left, right...)
            parentHash := crypto.DoubleSHA256(combined)

            nextLevel = append(nextLevel, parentHash)
        }

        tree.levels = append(tree.levels, nextLevel)
        currentLevel = nextLevel
    }

    // Last level contains the root
    tree.root = currentLevel[0]

    return tree
}

Key Implementation Details:

Odd Number Handling: If a level has an odd number of nodes, duplicate the last one
Bottom-Up Construction: Start from leaves, work toward root
Level Storage: Store every level for proof generation
DoubleSHA256: All hashing uses Bitcoin's double SHA-256

Step 3: Implement Merkle Proof Generation

func (t *MerkleTree) GetProof(txIndex int) [][]byte {
    if txIndex < 0 || txIndex >= t.txCount {
        return nil
    }

    proof := make([][]byte, 0)
    index := txIndex

    // Traverse from leaf to root
    for level := 0; level < len(t.levels)-1; level++ {
        currentLevel := t.levels[level]

        // Find sibling index
        var siblingIndex int
        if index%2 == 0 {
            siblingIndex = index + 1  // Right sibling
        } else {
            siblingIndex = index - 1  // Left sibling
        }

        // Add sibling to proof
        if siblingIndex < len(currentLevel) {
            proof = append(proof, currentLevel[siblingIndex])
        } else {
            // Odd number: sibling is the same node (duplicated)
            proof = append(proof, currentLevel[index])
        }

        // Move to parent index
        index = index / 2
    }

    return proof
}

Algorithm Complexity:

Time: O(log n) - traverse from leaf to root
Space: O(log n) - store log n sibling hashes

Step 4: Implement Proof Verification

func VerifyProof(txHash []byte, proof [][]byte, merkleRoot []byte,
                 txIndex, txCount int) bool {
    // Validate inputs
    if len(txHash) != 32 || len(merkleRoot) != 32 {
        return false
    }
    if txIndex < 0 || txIndex >= txCount {
        return false
    }

    currentHash := txHash
    index := txIndex

    // Process each proof level
    for _, siblingHash := range proof {
        if len(siblingHash) != 32 {
            return false
        }

        // Determine if current node is left or right child
        var combined []byte
        if index%2 == 0 {
            // Current is left child
            combined = append(currentHash, siblingHash...)
        } else {
            // Current is right child
            combined = append(siblingHash, currentHash...)
        }

        // Hash the combined result
        currentHash = crypto.DoubleSHA256(combined)
        index = index / 2
    }

    // Check if computed root matches expected root
    return bytes.Equal(currentHash, merkleRoot)
}

Why Left/Right Order Matters:

Hash(A + B) ≠ Hash(B + A)

Example:
Hash(0x01 + 0x02) = 0xabcd...
Hash(0x02 + 0x01) = 0x1234...  (completely different!)

The transaction index tells us which side the node is on:

Even index (0, 2, 4, …): Left child
Odd index (1, 3, 5, …): Right child

Phase 3: REFACTOR - Add Comprehensive Tests

After basic tests passed, we added:

Multiple Transaction Sizes: 2, 3, 4, 7 transactions
Odd Number Handling: Verify duplication works correctly
Merkle Proof Tests: Generate and verify proofs for all positions
Error Cases: Invalid indices, wrong hashes, nil trees
Real Bitcoin Data: Genesis Block and Block 1

// TestMerkleTreeThreeTransactions tests Merkle tree with odd number of transactions.
//
// With 3 transactions (A, B, C), Bitcoin duplicates the last hash:
//
//           Root = Hash(Hash(A+B) + Hash(C+C))
//          /                    \
//    Hash(A+B)                Hash(C+C)
//     /    \                   /    \
// Hash(A) Hash(B)         Hash(C) Hash(C)  ← C is duplicated
//    |      |                |      |
//    A      B                C      C
//
// This tests Bitcoin's handling of odd-numbered transaction sets.
func TestMerkleTreeThreeTransactions(t *testing.T) {
    // Create 3 transactions
    txHash1 := makeHash(0x01)
    txHash2 := makeHash(0x02)
    txHash3 := makeHash(0x03)

    tree := BuildMerkleTree([][]byte{txHash1, txHash2, txHash3})

    // Manual calculation to verify
    // Level 1: Hash pairs
    hash12 := crypto.DoubleSHA256(append(txHash1, txHash2...))
    hash33 := crypto.DoubleSHA256(append(txHash3, txHash3...)) // Duplicate!

    // Level 2: Root
    expectedRoot := crypto.DoubleSHA256(append(hash12, hash33...))

    root := tree.Root()
    assert.Equal(t, expectedRoot, root)
}

Bitcoin's Merkle Tree Rules

Bitcoin's Merkle tree implementation has specific rules that differ from textbook implementations:

Rule 1: Duplicate on Odd Numbers

When a level has an odd number of nodes, duplicate the last node.

Example: 3 transactions (A, B, C)

Level 0 (leaves):     A    B    C
                           ↓
Level 0 (adjusted):   A    B    C    C  ← C duplicated
                       \  /      \  /
Level 1:              AB        CC
                        \      /
Level 2 (root):          ABCC

Why not just promote the odd node?

Ensures tree remains a perfect binary tree
Simplifies proof verification logic
Maintains consistent tree height

Security Consideration: This duplication can lead to a vulnerability called the "Merkle tree attack" if not carefully implemented. Bitcoin Core has protections against this.

Rule 2: Double SHA-256

All hashing uses DoubleSHA256:

Hash(x) = SHA256(SHA256(x))

Why double hash?

Protection against length extension attacks
Additional security margin
Consistency with Bitcoin's transaction hashing

Rule 3: Internal Format (Little-Endian)

Transaction hashes are in internal format (little-endian) when building the tree:

TXID (display format, big-endian):
4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b

Transaction hash (internal format, little-endian):
3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a
                                                            ↑
                                           Use this for Merkle tree

Why?

Bitcoin protocol internally uses little-endian
Block headers store Merkle root in internal format
Consistency with rest of protocol

When to Reverse:

✅ Use internal format: Building tree, computing root
✅ Use display format: Showing TXID to users, block explorers

SPV: Simplified Payment Verification

SPV (Simplified Payment Verification) is one of Bitcoin's most important innovations. It allows light clients to verify payments without downloading the full blockchain.

How SPV Works

Full Node (needs entire blockchain):

Download all blocks (~500 GB)
Validate all transactions
Maintain full UTXO set
Fully trustless

SPV Node (only needs block headers):

Download block headers only (~80 bytes × 800,000 blocks = 64 MB)
Request Merkle proof for specific transactions
Verify proof against block header
Mostly trustless (trust that majority of hashrate is honest)

SPV Verification Process

Scenario: Alice's mobile wallet wants to verify she received a payment.

Step 1: Alice receives payment
- TX hash: abc123...
- Block height: 700,000

Step 2: Alice's wallet requests Merkle proof
- "Give me proof that TX abc123... is in block 700,000"

Step 3: Full node responds with:
- Block header (80 bytes)
- Merkle proof (~350 bytes for 2,000 TX block)
- Total: ~430 bytes

Step 4: Alice's wallet verifies:
1. Verify block header has valid Proof-of-Work ✓
2. Verify block header is in longest chain ✓
3. Verify Merkle proof using block header's Merkle root ✓
4. Conclusion: Payment confirmed! ✓

Data transferred: 430 bytes instead of ~1 MB (99.96% reduction!)

SPV Security Model

What SPV Guarantees:

✅ Transaction is in a block with valid Proof-of-Work
✅ Transaction format is valid (if you parse it)
✅ Block is in the longest chain (with sufficient confirmations)

What SPV Doesn't Guarantee:

❌ Transaction inputs are unspent (could be double-spend)
❌ Transaction is in the main chain (could be on a fork)
❌ Full consensus rule validation

Trust Assumptions:

Trust that majority of hash rate follows consensus rules
Trust that peers provide accurate block headers
Acceptable for most retail payments

Security Recommendations:

Wait for multiple confirmations (6+ blocks)
Connect to multiple peers
Cross-check with block explorers
Use for smaller amounts

Real-World SPV Usage

Mobile Wallets (Bitcoin Wallet, BRD, etc.):

Store only headers: ~60-80 MB
Request proofs for user's transactions only
Enables Bitcoin on smartphones

Point-of-Sale Systems:

Quick payment verification
Don't need full node infrastructure
Acceptable risk for retail transactions

IoT Devices:

Constrained memory and bandwidth
Can verify payments without full blockchain
Enables Bitcoin on embedded systems

Real-World Examples

Example 1: Genesis Block

The simplest possible Merkle tree.

Block 0 (Genesis Block):
- Transactions: 1
- Coinbase TXID: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
- Merkle Root: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b

Tree structure:
    Root
     |
   TX 0 (coinbase)

For single transaction: Merkle Root = Transaction Hash

Example 2: Block 100,000 (Famous Block)

Block 100,000 is famous because it demonstrated Bitcoin had survived and thrived.

Block 100,000:
- Height: 100,000
- Date: 2010-12-29
- Transactions: 4
- Block Hash: 000000000003ba27aa200b1cecaad478d2b00432346c3f1f3986da1afd33e506
- Merkle Root: f3e94742aca4b5ef85488dc37c06c3282295ffec960994b2c0d5ac2a25a95766

Tree structure:
                Root
               /    \
          Hash01      Hash23
          /   \       /   \
      Hash0  Hash1  Hash2  Hash3
       |      |      |      |
      TX0    TX1    TX2    TX3

Merkle Proof for TX1:
- Sibling path: [Hash0, Hash23]
- Proof size: 64 bytes
- Verification: 2 hash operations

Example 3: Large Block (2,000 Transactions)

Modern blocks regularly contain 2,000+ transactions.

Block with 2,000 transactions:
- Merkle tree height: 11 levels
- Total nodes: ~4,000
- Merkle proof size: 11 × 32 bytes = 352 bytes

Tree structure (simplified):
                     Root (Level 11)
                   /      \
              Level 10  Level 10
              /    \    /    \
        ... 11 levels total ...
            /                  \
    TX 0 ... TX 999      TX 1000 ... TX 1999

To verify TX 1337:
- Need 11 sibling hashes (Merkle proof)
- 11 DoubleSHA256 operations
- 352 bytes of data
- Sub-millisecond verification

Without Merkle tree:
- Need all 2,000 transaction hashes
- 2,000 × 32 bytes = 64,000 bytes
- Must search through all hashes
- 181x more data!

Example 4: Bitcoin's Largest Block

As of 2025, blocks can contain 3,000-4,000 transactions with SegWit.

Block with 3,500 transactions:
- Merkle tree height: 12 levels
- Merkle proof size: 12 × 32 bytes = 384 bytes
- Block size: ~2-3 MB
- Proof is 0.01% of block size!

SPV wallet verification:
- Download: 80 bytes (header) + 384 bytes (proof) = 464 bytes
- Instead of: ~2.5 MB (full block)
- Reduction: 99.98%
- Verification time: ~1-2 milliseconds

Test Coverage and Quality

We achieved 98.6% test coverage with comprehensive tests covering:

1. Happy Path Tests

✅ Genesis Block Merkle Root

func TestGenesisBlockMerkleRoot(t *testing.T) {
    // Verifies the first Bitcoin transaction ever
    // TXID = Merkle Root (single transaction)
}

✅ Multiple Transaction Sizes

func TestMerkleTreeTwoTransactions(t *testing.T)    // Perfect binary tree
func TestMerkleTreeThreeTransactions(t *testing.T)  // Odd number (duplication)
func TestMerkleTreeFourTransactions(t *testing.T)   // Perfect binary tree
func TestMerkleTreeSevenTransactions(t *testing.T)  // Multiple odd levels

✅ Merkle Proof Generation and Verification

func TestMerkleProof(t *testing.T) {
    // Generate proofs for all transactions
    // Verify each proof
    // Ensure 100% success rate
}

✅ Real Bitcoin Block

func TestBlock1MerkleRoot(t *testing.T) {
    // Block 1 (first block after Genesis)
    // Verifies against real blockchain data
}

2. Error Path Tests

✅ Nil Tree Handling

func TestNilMerkleTree(t *testing.T) {
    var tree *MerkleTree
    root := tree.Root()     // Should return nil
    height := tree.Height() // Should return 0
    proof := tree.GetProof(0) // Should return nil
}

✅ Out of Bounds Proof Requests

func TestGetProofOutOfBounds(t *testing.T) {
    tree.GetProof(-1)   // Negative index
    tree.GetProof(999)  // Beyond txCount
}

✅ Invalid Proof Verification

func TestVerifyProofInvalidLength(t *testing.T) {
    // Wrong txHash length
    // Wrong merkleRoot length
    // Wrong proof element length
}

func TestMerkleProofInvalidIndex(t *testing.T) {
    // Wrong transaction hash
    // Wrong index
}

3. Edge Cases

✅ Empty Transaction List

func TestEmptyMerkleTree(t *testing.T) {
    tree := BuildMerkleTree([][]byte{})
    // Should return nil
}

✅ Odd Number Duplication

func TestMerkleTreeSingleTransactionDuplication(t *testing.T) {
    // 5 transactions
    // Verify proofs work correctly with duplication
}

4. Tree Properties

✅ Height Calculation

func TestMerkleTreeHeight(t *testing.T) {
    // 1 TX:  height = 1
    // 2 TX:  height = 2
    // 4 TX:  height = 3
    // 8 TX:  height = 4
    // 16 TX: height = 5
}

Coverage Report

btcnode/pkg/merkle/merkle.go:80:  BuildMerkleTree  100.0%
btcnode/pkg/merkle/merkle.go:142: Root             100.0%
btcnode/pkg/merkle/merkle.go:164: Height           100.0%
btcnode/pkg/merkle/merkle.go:205: GetProof          95.0%
btcnode/pkg/merkle/merkle.go:288: VerifyProof       96.0%
────────────────────────────────────────────────────────────
total:                                              98.6%

What's not covered?

Some unreachable defensive checks
Extremely rare edge cases

98.6% coverage ensures:

Every major code path is tested
Error handling is verified
Integration with crypto package works correctly
Real Bitcoin data validates our implementation

Key Takeaways

1. Merkle Trees Enable Scalability

Without Merkle trees, every light client would need:

Full blockchain: ~500 GB
Constant bandwidth for new blocks
Powerful hardware

With Merkle trees, light clients need:

Block headers: ~80 MB
Proofs: <1 KB per transaction
Smartphones can verify Bitcoin payments!

This is why Bitcoin can have millions of users.

2. Logarithmic Growth is Magical

Proof size grows logarithmically, not linearly:

    1 TX →    0 bytes
   10 TX →  128 bytes
  100 TX →  224 bytes
1,000 TX →  320 bytes
10,000 TX → 448 bytes

Even with 10,000 transactions, the proof is less than 0.5 KB!

3. Security Through Cryptography

Merkle trees provide strong security guarantees:

Tamper-proof: Changing any transaction changes the root
Efficient verification: O(log n) operations
Minimal trust: SPV needs minimal assumptions

But remember: SPV is not as secure as running a full node!

4. Bitcoin's Specific Rules Matter

Bitcoin's Merkle tree implementation has quirks:

Duplicate on odd numbers
Double SHA-256 hashing
Little-endian internal format

These rules are consensus-critical: Getting them wrong breaks compatibility!

5. TDD Catches Bugs Early

Our 98.6% test coverage includes:

Real Bitcoin data (Genesis Block, Block 1)
Edge cases (empty, nil, odd numbers)
Error paths (invalid indices, wrong hashes)
Integration tests (crypto package)

Every test passed on the first try after implementation. This is the power of TDD.