Why Your Go Concurrency Patterns Are Leaking Memory (And How to Fix Them)

You deployed your Go microservice.
It handles requests beautifully.
Load tests passed.

But three weeks later, your monitoring alerts: memory usage is climbing steadily. No panic. No crash. Just a slow, insidious creep toward OOM.

This isn’t hypothetical. It’s the number-one concurrency bug I’ve debugged in production Go systems over six years of backend engineering. And the culprit? Goroutine leaks: forgotten goroutines that never terminate, holding references, blocking on channels, and silently consuming resources.

In this guide, you’ll learn:

  • The 5 concurrency patterns that cause memory leaks (with code examples)
  • How to detect leaks early using pprof and runtime metrics
  • Production-ready patterns for safe goroutine lifecycle management
  • When to use context.WithCancel, errgroup, and sync.WaitGroup correctly

Let’s dive in.

Pattern #1: Forgotten Context Cancellation

The Problem

func StartWorker(ctx context.Context, dataChan chan Data) {
    go func() {
        for {
            select {
            case data := <-dataChan:
                process(data)
            // ❌ Missing: <-ctx.Done()
            }
        }
    }()
}

What happens: When the parent context is cancelled, this goroutine keeps running forever, blocked on dataChan. If dataChan is never written to again, the goroutine is leaked.

Go’s scheduler doesn’t automatically terminate goroutines when their parent context ends. Context is a cooperative cancellation signal—your code must explicitly check ctx.Done(). The garbage collector cannot reclaim a running goroutine, even if it’s blocked indefinitely.

The Fix: Always Listen to Context

func StartWorker(ctx context.Context, dataChan chan Data) {
    go func() {
        defer log.Println("Worker shutting down") // Always defer cleanup logs
        
        for {
            select {
            case data := <-dataChan:
                process(data)
            case <-ctx.Done():
                // ✅ Graceful shutdown: drain any buffered items
                // (note: len is always 0 for unbuffered channels)
                for len(dataChan) > 0 {
                    data := <-dataChan
                    process(data) // Or queue for later
                }
                return
            }
        }
    }()
}

Pro Tip: Use context.WithTimeout for batch operations to prevent indefinite hangs:

ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)
defer cancel() // Critical: always defer cancel to free resources

Pattern #2: Unbuffered Channels Without Exit Strategy

The Problem

func FetchAll(urls []string) []Result {
    results := make(chan Result)
    
    for _, url := range urls {
        go func(u string) {
            results <- fetch(u) // ❌ Blocks forever if receiver stops reading
        }(url)
    }
    
    // ❌ What if one fetch hangs? This range blocks forever.
    var all []Result
    for r := range results {
        all = append(all, r)
    }
    return all
}

Why This Leaks

There are actually two leaks here. First, results is never closed, so the for r := range results loop never terminates: even after every fetch succeeds and is consumed, FetchAll blocks forever. Second, if the receiver stops reading early (say, the caller abandons the call after an error), every goroutine still blocked on results <- fetch(u) is leaked. With 100 URLs and a receiver that gives up after the first result, you leak 99 goroutines waiting to send.

The Fix: Bounded Concurrency + Error Handling

func FetchAll(ctx context.Context, urls []string) ([]Result, error) {
    type indexedResult struct {
        idx int
        res Result
        err error
    }
    
    results := make(chan indexedResult, len(urls)) // ✅ Buffered to prevent send-block
    sem := make(chan struct{}, 10) // ✅ Limit concurrent fetches
    
    for i, url := range urls {
        i, url := i, url // ✅ Capture loop variables (automatic in Go 1.22+)
        go func() {
            sem <- struct{}{} // Acquire semaphore
            defer func() { <-sem }() // Release
            
            res, err := fetchWithContext(ctx, url)
            results <- indexedResult{idx: i, res: res, err: err}
        }()
    }
    
    // Collect results with context awareness
    all := make([]Result, len(urls))
    var firstErr error
    
    for i := 0; i < len(urls); i++ {
        select {
        case r := <-results:
            if r.err != nil && firstErr == nil {
                firstErr = r.err
            }
            all[r.idx] = r.res // zero value on error is fine here
        case <-ctx.Done():
            return nil, ctx.Err() // ✅ Propagate cancellation
        }
    }
    
    return all, firstErr
}

Pattern #3: Tickers and Timers Without Stop()

The Problem

func StartMonitor() {
    ticker := time.NewTicker(5 * time.Minute)
    
    go func() {
        for range ticker.C { // ❌ ticker never stopped
            checkHealth()
        }
    }()
}

The Hidden Leak: there are really two problems here. The goroutine never exits, because for range ticker.C loops forever when nothing stops the ticker or cancels the loop. And the ticker holds a runtime timer that keeps firing; before Go 1.23, an unstopped ticker was never garbage-collected even after every reference to it was dropped. Go 1.23 made unreferenced tickers collectible, but calling ticker.Stop() remains the explicit, portable fix. Over months, leaked monitors accumulate.

The Fix: Defer Stop + Context Awareness

func StartMonitor(ctx context.Context) {
    ticker := time.NewTicker(5 * time.Minute)
    
    go func() {
        defer ticker.Stop() // ✅ Always defer stop
        
        for {
            select {
            case <-ticker.C:
                checkHealth()
            case <-ctx.Done():
                log.Println("Monitor shutting down")
                return
            }
        }
    }()
}

Pattern #4: Goroutines Waiting on Never-Sent Channels

The Problem

func ProcessBatch(items []Item) error {
    done := make(chan error, 1)
    
    go func() {
        for _, item := range items {
            if err := validate(item); err != nil {
                done <- err // ❌ If validate panics, this send never happens
                return
            }
        }
        done <- nil
    }()
    
    return <-done // ❌ Blocks forever if the goroutine dies without sending
}

The Fix: Guarantee Channel Communication

func ProcessBatch(items []Item) error {
    done := make(chan error, 1) // ✅ Buffered to prevent sender block
    
    go func() {
        defer close(done) // ✅ Ensure channel closes even on panic
        
        if len(items) == 0 {
            return // ✅ No explicit send needed: defer close signals the receiver
        }
        
        for _, item := range items {
            if err := validate(item); err != nil {
                done <- err
                return
            }
        }
        done <- nil
    }()
    
    err, ok := <-done
    if !ok {
        return fmt.Errorf("processor exited unexpectedly") // ✅ Handle premature close
    }
    return err
}

Pattern #5: Recursive Goroutine Spawning Without Depth Control

The Problem

func crawl(url string, depth int) {
    if depth <= 0 { return }
    
    links := fetchLinks(url)
    for _, link := range links {
        go crawl(link, depth-1) // ❌ Exponential goroutine explosion
    }
}

Why This Is Dangerous

At depth=5 with 10 links per page, the deepest level alone spawns 10⁵ = 100,000 goroutines, and every shallower level adds more. Each starts with a ~2KB stack that grows as needed, so that’s 200MB+ just for stacks, before any channel or scheduler overhead.

The Fix: Worker Pool + Depth Tracking

type job struct {
    url   string
    depth int
}

func crawlConcurrently(startURL string, maxDepth int, maxWorkers int) {
    jobs := make(chan job, 100)
    var pending sync.WaitGroup // counts jobs that are queued or in progress
    
    enqueue := func(j job) {
        pending.Add(1)
        select {
        case jobs <- j: // ✅ Non-blocking send with buffer
        default:
            log.Printf("Queue full, skipping %s", j.url)
            pending.Done()
        }
    }
    
    // Start fixed worker pool
    var wg sync.WaitGroup
    for i := 0; i < maxWorkers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := range jobs {
                if j.depth > 0 {
                    for _, link := range fetchLinks(j.url) {
                        enqueue(job{link, j.depth - 1})
                    }
                }
                pending.Done() // This job is fully processed
            }
        }()
    }
    
    enqueue(job{startURL, maxDepth})
    pending.Wait() // ✅ Every queued job processed; no worker will send again
    close(jobs)    // ✅ Safe only now: workers re-enqueue onto jobs, and
                   //    sending on a closed channel panics
    wg.Wait()      // ✅ Wait for all workers to exit
}

How to Detect Goroutine Leaks in Production

1. Use runtime.NumGoroutine() for Alerting

// metrics.go (run in its own goroutine: go MonitorGoroutines(500))
func MonitorGoroutines(threshold int) {
    ticker := time.NewTicker(1 * time.Minute)
    defer ticker.Stop()
    
    for range ticker.C {
        if n := runtime.NumGoroutine(); n > threshold {
            alert(fmt.Sprintf("Goroutine count: %d (threshold: %d)", n, threshold))
        }
    }
}

2. Profile with pprof

# Human-readable dump of every goroutine's stack
curl "http://localhost:6060/debug/pprof/goroutine?debug=2" > goroutines.txt

# Binary profile for the pprof tool (no debug parameter)
curl "http://localhost:6060/debug/pprof/goroutine" > goroutines.prof

# Analyze in Go toolchain
go tool pprof goroutines.prof
(pprof) top               # Which call sites hold the most goroutines
(pprof) list StartWorker  # See where goroutines are stuck

3. Add Leak Detection in Tests

func TestNoGoroutineLeak(t *testing.T) {
    initial := runtime.NumGoroutine()
    
    // Run your concurrent code
    runTestScenario()
    
    // Force GC and wait
    runtime.GC()
    time.Sleep(100 * time.Millisecond)
    
    final := runtime.NumGoroutine()
    if leaked := final - initial; leaked > 5 { // Allow small buffer
        t.Errorf("Potential goroutine leak: %d new goroutines", leaked)
    }
}

Before merging concurrent Go code, ask yourself this:

  • Does every go func() have a clear exit condition?
  • Are all context.Context values checked in select statements?
  • Are time.Ticker/Timer stopped with defer?
  • Are channels buffered appropriately to prevent sender block?
  • Is there a maximum concurrency limit for unbounded work?
  • Are loop variables captured correctly (i, url := i, url; automatic since Go 1.22)?
  • Does defer cancel() follow every context.WithCancel?