You deployed your Go microservice.
It handles requests beautifully.
Load tests passed.
But three weeks later, a monitoring alert fires: memory usage is climbing steadily. No panic. No crash. Just a slow, insidious creep toward an OOM kill.
This isn’t hypothetical. It’s the most common concurrency bug I’ve debugged in production Go systems over six years of backend engineering. And the culprit? Goroutine leaks: tiny, forgotten goroutines that never terminate, holding references, blocking on channels, and silently consuming resources.
In this guide, you’ll learn:
- The 5 concurrency patterns that cause memory leaks (with code examples)
- How to detect leaks early using pprof and runtime metrics
- Production-ready patterns for safe goroutine lifecycle management
- When to use context.WithCancel, errgroup, and sync.WaitGroup correctly
Let’s dive in.
Pattern #1: Forgotten Context Cancellation
The Problem
```go
func StartWorker(ctx context.Context, dataChan chan Data) {
	go func() {
		for {
			select {
			case data := <-dataChan:
				process(data)
				// ❌ Missing: a <-ctx.Done() case
			}
		}
	}()
}
```
What happens: When the parent context is cancelled, this goroutine keeps running. If dataChan is never written to again, it blocks on the receive forever, and the goroutine is leaked.

Go’s scheduler doesn’t automatically terminate goroutines when their parent context ends. Context is a cooperative cancellation signal: your code must explicitly check ctx.Done(). The garbage collector cannot reclaim a running goroutine, even one that is blocked indefinitely.
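To see the leak concretely, here is a minimal, self-contained sketch (leakyWorker and demoLeak are illustrative names, and process is reduced to a no-op): cancelling the context leaves every worker blocked, and runtime.NumGoroutine() shows the count stuck high.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// leakyWorker mirrors the buggy pattern above: the select has no ctx.Done() case.
func leakyWorker(ctx context.Context, dataChan chan int) {
	go func() {
		for {
			select {
			case d := <-dataChan:
				_ = d // stand-in for process(d)
			}
		}
	}()
}

// demoLeak starts 100 workers, cancels their context, and reports the
// goroutine count before and after: the workers never notice the cancel.
func demoLeak() (before, after int) {
	before = runtime.NumGoroutine()
	ctx, cancel := context.WithCancel(context.Background())
	for i := 0; i < 100; i++ {
		leakyWorker(ctx, make(chan int))
	}
	cancel()                           // signal shutdown...
	time.Sleep(100 * time.Millisecond) // ...and give workers time to react
	after = runtime.NumGoroutine()
	return before, after
}

func main() {
	before, after := demoLeak()
	fmt.Printf("goroutines before=%d, after cancel=%d\n", before, after)
	// after stays roughly 100 above before: every worker leaked
}
```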
The Fix: Always Listen to Context
```go
func StartWorker(ctx context.Context, dataChan chan Data) {
	go func() {
		defer log.Println("Worker shutting down") // Always defer cleanup logs
		for {
			select {
			case data := <-dataChan:
				process(data)
			case <-ctx.Done():
				// ✅ Graceful shutdown: drain the channel if needed
				for len(dataChan) > 0 {
					data := <-dataChan
					process(data) // Or queue for later
				}
				return
			}
		}
	}()
}
```
Pro Tip: Use context.WithTimeout for batch operations to prevent indefinite hangs.

```go
ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)
defer cancel() // Critical: always defer cancel to free resources
```
Pattern #2: Unbuffered Channels Without Exit Strategy
The Problem
```go
func FetchAll(urls []string) []Result {
	results := make(chan Result)
	for _, url := range urls {
		go func(u string) {
			results <- fetch(u) // ❌ Blocks forever if receiver stops reading
		}(url)
	}
	// ❌ What if one fetch hangs? This range blocks forever.
	var all []Result
	for r := range results {
		all = append(all, r)
	}
	return all
}
```
Why This Leaks
If the receiver stops reading early (for example, it returns on the first error), every goroutine still blocked on results <- fetch(u) is leaked: with 100 URLs, stopping after the first result leaks up to 99 senders. The happy path is broken too: nothing ever closes results, so the for r := range results loop never terminates, and if any single fetch hangs, the caller blocks along with it.
The Fix: Bounded Concurrency + Error Handling
```go
func FetchAll(ctx context.Context, urls []string) ([]Result, error) {
	type indexedResult struct {
		idx int
		res Result
		err error
	}
	results := make(chan indexedResult, len(urls)) // ✅ Buffered: sends never block
	sem := make(chan struct{}, 10)                 // ✅ Limit concurrent fetches
	for i, url := range urls {
		i, url := i, url // ✅ Capture loop variables (unnecessary as of Go 1.22)
		go func() {
			sem <- struct{}{}        // Acquire semaphore
			defer func() { <-sem }() // Release
			res, err := fetchWithContext(ctx, url)
			results <- indexedResult{idx: i, res: res, err: err}
		}()
	}
	// Collect results with context awareness
	all := make([]Result, len(urls))
	var firstErr error
	for i := 0; i < len(urls); i++ {
		select {
		case r := <-results:
			if r.err != nil {
				if firstErr == nil {
					firstErr = r.err
				}
				continue // Skip storing a zero-valued result
			}
			all[r.idx] = r.res
		case <-ctx.Done():
			return nil, ctx.Err() // ✅ Propagate cancellation
		}
	}
	return all, firstErr
}
```
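The semaphore half of this fix can be demonstrated in isolation. The sketch below (runBounded is an illustrative helper, with the fetch simulated by a short sleep) measures the peak number of goroutines actually running at once, which the sem channel caps at the limit.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runBounded executes n tasks with at most limit running concurrently,
// and reports the peak concurrency actually observed.
func runBounded(n, limit int) int64 {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var cur, peak int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks once limit is reached)
			defer func() { <-sem }() // release the slot

			c := atomic.AddInt64(&cur, 1)
			for { // record the high-water mark
				p := atomic.LoadInt64(&peak)
				if c <= p || atomic.CompareAndSwapInt64(&peak, p, c) {
					break
				}
			}
			time.Sleep(10 * time.Millisecond) // simulated fetch
			atomic.AddInt64(&cur, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrency:", runBounded(50, 10))
	// peak never exceeds 10
}
```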
Pattern #3: Tickers and Timers Without Stop()
The Problem
```go
func StartMonitor() {
	ticker := time.NewTicker(5 * time.Minute)
	go func() {
		for range ticker.C { // ❌ ticker never stopped
			checkHealth()
		}
	}()
}
```
The Hidden Leak

Two things go wrong here. First, ticker.C is never closed, so the for range loop never terminates and the goroutine leaks outright. Second, time.Ticker holds a runtime timer resource: if you don’t call ticker.Stop(), the timer keeps firing, and (before Go 1.23) an unstopped ticker is never garbage collected. Over months, this accumulates.
The Fix: Defer Stop + Context Awareness
```go
func StartMonitor(ctx context.Context) {
	ticker := time.NewTicker(5 * time.Minute)
	go func() {
		defer ticker.Stop() // ✅ Always defer stop
		for {
			select {
			case <-ticker.C:
				checkHealth()
			case <-ctx.Done():
				log.Println("Monitor shutting down")
				return
			}
		}
	}()
}
```
Pattern #4: Goroutines Waiting on Never-Sent Channels
The Problem
```go
func ProcessBatch(items []Item) error {
	done := make(chan error)
	go func() {
		for _, item := range items {
			if err := validate(item); err != nil {
				return // ❌ Exits without sending: the receiver below blocks forever
			}
		}
		done <- nil
	}()
	return <-done // ❌ Blocks forever if the goroutine exits without sending
}
```

What happens: the error path returns without ever sending on done, so the caller blocks on the receive forever. Any exit path that skips the send (an early return, a forgotten branch added in a refactor) leaks the caller’s goroutine.
The Fix: Guarantee Channel Communication
```go
func ProcessBatch(items []Item) error {
	done := make(chan error, 1) // ✅ Buffered: the send can never block
	go func() {
		defer close(done) // ✅ Every exit path unblocks the receiver
		for _, item := range items {
			if err := validate(item); err != nil {
				done <- err
				return
			}
		}
		done <- nil // ✅ Empty batches fall through to here too
	}()
	err, ok := <-done
	if !ok {
		return fmt.Errorf("processor exited without sending a result") // ✅ Closed with no send
	}
	return err
}
```
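Here is the fixed pattern as a runnable sketch, with Item and validate stubbed out for illustration (negative items fail validation): the buffered send plus defer close means the caller always unblocks, including for an empty batch.

```go
package main

import "fmt"

type Item int

// validate is a stand-in for the article's validator: negative items fail.
func validate(it Item) error {
	if it < 0 {
		return fmt.Errorf("invalid item: %d", it)
	}
	return nil
}

// ProcessBatch always sends exactly one result, and defer close guarantees
// the receiver is unblocked on every exit path.
func ProcessBatch(items []Item) error {
	done := make(chan error, 1)
	go func() {
		defer close(done)
		for _, it := range items {
			if err := validate(it); err != nil {
				done <- err
				return
			}
		}
		done <- nil
	}()
	err, ok := <-done
	if !ok {
		return fmt.Errorf("processor exited without sending a result")
	}
	return err
}

func main() {
	fmt.Println(ProcessBatch([]Item{1, 2, 3}))  // <nil>
	fmt.Println(ProcessBatch([]Item{1, -2, 3})) // invalid item: -2
	fmt.Println(ProcessBatch(nil))              // <nil>: empty batch still sends
}
```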
Pattern #5: Recursive Goroutine Spawning Without Depth Control
The Problem
```go
func crawl(url string, depth int) {
	if depth <= 0 {
		return
	}
	links := fetchLinks(url)
	for _, link := range links {
		go crawl(link, depth-1) // ❌ Exponential goroutine explosion
	}
}
```
Why This Is Dangerous

At depth=5 with 10 links per page, the deepest level alone spawns 10⁵ = 100,000 goroutines (about 111,000 in total across all levels). Each starts with a ~2KB stack that grows as needed. That’s 200MB+ just for stacks, plus channel and scheduler overhead.
The Fix: Worker Pool + Depth Tracking
```go
type job struct {
	url   string
	depth int
}

func crawlConcurrently(startURL string, maxDepth, maxWorkers int) {
	jobs := make(chan job, 100)
	var pending sync.WaitGroup // counts jobs enqueued but not yet finished
	var workers sync.WaitGroup

	// Start fixed worker pool
	for i := 0; i < maxWorkers; i++ {
		workers.Add(1)
		go func() {
			defer workers.Done()
			for j := range jobs {
				if j.depth > 0 {
					for _, link := range fetchLinks(j.url) {
						pending.Add(1)
						select {
						case jobs <- job{link, j.depth - 1}: // ✅ Buffered, non-blocking send
						default:
							pending.Done() // Undo the Add for the dropped job
							log.Printf("Queue full, skipping %s", link)
						}
					}
				}
				pending.Done() // ✅ This job is fully processed
			}
		}()
	}

	pending.Add(1)
	jobs <- job{startURL, maxDepth}
	go func() {
		pending.Wait() // ✅ No job outstanding, so no worker can send again
		close(jobs)    // ✅ Only now is it safe to signal "no more jobs"
	}()
	workers.Wait() // ✅ Wait for all workers
}
```

Note the ordering: closing jobs immediately after the first send would be a bug, because the workers themselves send into jobs, and a send on a closed channel panics. The pending counter tracks outstanding jobs so the channel is closed only once no worker can possibly send again.
How to Detect Goroutine Leaks in Production
1. Use runtime.NumGoroutine() for Alerting
```go
// metrics.go
func MonitorGoroutines(threshold int) {
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		if n := runtime.NumGoroutine(); n > threshold {
			alert(fmt.Sprintf("Goroutine count: %d (threshold: %d)", n, threshold))
		}
	}
}
```
2. Profile with pprof
Note: the debug=2 form returns a plain-text stack dump, which is ideal for eyeballing stuck goroutines, but go tool pprof expects the binary profile you get by omitting the debug parameter.

```bash
# Human-readable dump of every goroutine's stack
curl "http://localhost:6060/debug/pprof/goroutine?debug=2"

# Binary profile for the Go toolchain
curl -o goroutines.prof "http://localhost:6060/debug/pprof/goroutine"
go tool pprof goroutines.prof
(pprof) top              # Goroutine counts by creation site
(pprof) list StartWorker # See where goroutines are stuck
```
3. Add Leak Detection in Tests
```go
func TestNoGoroutineLeak(t *testing.T) {
	initial := runtime.NumGoroutine()

	// Run your concurrent code
	runTestScenario()

	// Force GC and wait for goroutines to wind down
	runtime.GC()
	time.Sleep(100 * time.Millisecond)

	final := runtime.NumGoroutine()
	if leaked := final - initial; leaked > 5 { // Allow a small buffer
		t.Errorf("Potential goroutine leak: %d new goroutines", leaked)
	}
}
```
Before merging concurrent Go code, ask yourself this:
- Does every go func() have a clear exit condition?
- Are all context.Context values checked in select statements?
- Are time.Ticker/time.Timer stopped with defer?
- Are channels buffered appropriately to prevent sender block?
- Is there a maximum concurrency limit for unbounded work?
- Are loop variables captured correctly (i, url := i, url)?
- Does defer cancel() follow every context.WithCancel?