Aggregations & Analytics
Bucket, metric, and pipeline aggregations — using Elasticsearch as a real-time analytics engine alongside search.
What Aggregations Are
Aggregations are Elasticsearch's analytics framework — they let you compute summaries, group data into buckets, and derive insights from your documents. Think of them as SQL's GROUP BY + aggregate functions (COUNT, AVG, SUM), but distributed across shards and running in parallel.
Key difference from search: aggregations operate on all matching documents (or the entire index), not just the top N hits. You can run aggregations alongside a search query — the query narrows the document set, and the aggregation summarizes it.
SQL GROUP BY — But Distributed
In SQL, GROUP BY runs on a single node scanning one table. In Elasticsearch, each shard computes partial aggregation results locally, then the coordinating node merges them. It's like having 5 accountants each tallying their portion of receipts, then combining totals — much faster than one accountant doing everything sequentially.
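A toy sketch of this scatter-gather merge in Python (illustrative only, not Elasticsearch internals): each shard returns partial (count, sum) pairs, and the average is derived only after the merge — averaging per-shard averages would be wrong when shards hold uneven data.

```python
from collections import defaultdict

def shard_partial_agg(docs):
    """Per-shard pass: local (count, sum) per category."""
    partial = defaultdict(lambda: [0, 0.0])  # category -> [count, sum]
    for doc in docs:
        partial[doc["category"]][0] += 1
        partial[doc["category"]][1] += doc["price"]
    return partial

def coordinate(shard_results):
    """Coordinating node: merge partials, then derive final metrics."""
    merged = defaultdict(lambda: [0, 0.0])
    for partial in shard_results:
        for cat, (cnt, total) in partial.items():
            merged[cat][0] += cnt
            merged[cat][1] += total
    return {cat: {"count": cnt, "avg_price": total / cnt}
            for cat, (cnt, total) in merged.items()}

shard_a = [{"category": "books", "price": 10.0}, {"category": "toys", "price": 30.0}]
shard_b = [{"category": "books", "price": 20.0}]
result = coordinate([shard_partial_agg(shard_a), shard_partial_agg(shard_b)])
# books: count=2, avg_price=15.0; toys: count=1, avg_price=30.0
```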
SQL:

```sql
SELECT category, COUNT(*), AVG(price)
FROM products
WHERE status = 'active'
GROUP BY category
```

Elasticsearch equivalent:

```json
POST /products/_search
{
  "size": 0,
  "query": { "term": { "status": "active" } },
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
```

Key differences:
- "size": 0 → don't return search hits, only agg results
- "query" narrows the document set (like WHERE)
- "aggs" computes analytics on that set (like GROUP BY + aggregates)
- Runs in parallel across all shards
- Results are approximate for high-cardinality terms (shard-level merging)
🔑 Aggregations Use doc_values, Not the Inverted Index
Search uses the inverted index (term → docs). Aggregations use doc_values — a columnar, on-disk data structure (doc → values). This is why keyword fields support aggregations but text fields don't: text fields have no doc_values by default. Always use keyword or numeric fields for aggregations.
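A minimal contrast of the two access patterns, sketched in Python (conceptual shapes only, not the actual on-disk formats):

```python
from collections import Counter

docs = [
    {"id": 0, "status": "active"},
    {"id": 1, "status": "inactive"},
    {"id": 2, "status": "active"},
]

# Inverted index: term -> doc ids. Answers "which docs contain X?"
inverted = {}
for d in docs:
    inverted.setdefault(d["status"], []).append(d["id"])
assert inverted["active"] == [0, 2]          # search-style lookup

# doc_values: columnar, doc id -> value. Answers "what is doc N's value?"
doc_values = [d["status"] for d in docs]

# A terms agg conceptually scans the column and counts per value
counts = Counter(doc_values)
assert counts == {"active": 2, "inactive": 1}
```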
Three Families of Aggregations
- ✅Bucket — group documents into buckets (like GROUP BY). Each bucket can contain sub-aggregations
- ✅Metric — compute numeric values from fields (avg, sum, min, max, cardinality, percentiles)
- ✅Pipeline — operate on the output of other aggregations (derivative, moving average, cumulative sum)
Bucket Aggregations
Bucket aggregations group documents into buckets based on field values, ranges, or other criteria. Each bucket is essentially a set of documents that share a common trait. You can nest metric or other bucket aggregations inside each bucket.
| Bucket Type | What It Does | Use Case |
|---|---|---|
| terms | Group by unique field values (top N) | Category breakdown, status distribution |
| date_histogram | Group by time intervals (minute, hour, day, month) | Time-series charts, daily order counts |
| histogram | Group by numeric ranges of fixed width | Price distribution (0-10, 10-20, 20-30) |
| range | Group by custom-defined ranges | Age groups (0-18, 18-35, 35-65, 65+) |
| filter | Single bucket matching a query | Count of premium users specifically |
| filters | Named buckets, each matching a different query | Breakdown by multiple conditions |
| nested | Bucket for nested object arrays | Aggregating on nested comments/reviews |
| composite | Paginate through all bucket combinations | Export all category+brand combinations |
```json
POST /orders/_search
{
  "size": 0,
  "aggs": {
    "by_status": {
      "terms": {
        "field": "status",
        "size": 10,
        "order": { "total_revenue": "desc" }
      },
      "aggs": {
        "total_revenue": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } },
        "recent_orders": {
          "top_hits": {
            "size": 3,
            "sort": [{ "created_at": "desc" }],
            "_source": ["order_id", "amount", "created_at"]
          }
        }
      }
    }
  }
}
```

Response:

```json
{
  "aggregations": {
    "by_status": {
      "buckets": [
        {
          "key": "completed",
          "doc_count": 15420,
          "total_revenue": { "value": 2847500.00 },
          "avg_order_value": { "value": 184.66 },
          "recent_orders": { "hits": { "hits": [...] } }
        },
        {
          "key": "pending",
          "doc_count": 3200,
          "total_revenue": { "value": 589000.00 },
          "avg_order_value": { "value": 184.06 },
          "recent_orders": { "hits": { "hits": [...] } }
        }
      ]
    }
  }
}
```
```json
POST /orders/_search
{
  "size": 0,
  "aggs": {
    "orders_over_time": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month",
        "format": "yyyy-MM",
        "min_doc_count": 0,
        "extended_bounds": { "min": "2024-01-01", "max": "2024-12-31" }
      },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } },
        "unique_customers": { "cardinality": { "field": "customer_id" } }
      }
    }
  }
}

# calendar_interval: "minute", "hour", "day", "week", "month", "year"
# fixed_interval: "30s", "1h", "7d" (exact durations)
# min_doc_count: 0 → include empty buckets
# extended_bounds → ensure all months appear even with no data
```
💡 terms Size vs Accuracy
The size parameter in a terms agg controls how many top buckets to return (default 10). Each shard returns its local top N, and the coordinator merges them. For high-cardinality fields, the merged result is approximate — a term ranked just below the cutoff (say #11) on every shard might actually belong in the global top 10. Increase shard_size (default size × 1.5 + 10) for better accuracy at the cost of memory.
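The failure mode is easy to reproduce with a toy merge in Python (hypothetical shard counts, not ES code): a term that is #3 on every shard misses each shard's local top-2 cutoff and vanishes from the merged result — even though it is the global #1.

```python
from collections import Counter

def local_top_n(shard_counts, n):
    """Each shard only ships its local top-n candidates to the coordinator."""
    return dict(Counter(shard_counts).most_common(n))

shard1 = {"a": 10, "b": 9, "z": 8}
shard2 = {"c": 10, "d": 9, "z": 8}

# With shard_size=2, "z" misses every local cutoff and is lost entirely,
# despite having the highest true global count (16).
merged = Counter(local_top_n(shard1, 2)) + Counter(local_top_n(shard2, 2))
assert "z" not in merged

# A larger shard_size lets "z" survive the per-shard pruning
merged_wide = Counter(local_top_n(shard1, 3)) + Counter(local_top_n(shard2, 3))
assert merged_wide["z"] == 16
```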
Metric Aggregations
Metric aggregations compute numeric values from document fields. They can be single-value (returning one number) or multi-value (returning a set of statistics). They're typically nested inside bucket aggregations to compute per-bucket metrics.
| Metric | Type | What It Computes |
|---|---|---|
| avg | Single-value | Arithmetic mean of a numeric field |
| sum | Single-value | Total sum of a numeric field |
| min / max | Single-value | Minimum or maximum value |
| value_count | Single-value | Count of values (including duplicates) |
| cardinality | Single-value | Approximate unique count (HyperLogLog) |
| stats | Multi-value | count, min, max, avg, sum in one request |
| extended_stats | Multi-value | stats + variance, std_deviation, bounds |
| percentiles | Multi-value | Value at given percentile ranks (p50, p95, p99) |
| percentile_ranks | Multi-value | Percentile rank of given values |
| top_hits | Multi-value | Top matching documents per bucket |
| weighted_avg | Single-value | Average weighted by another field |
```json
POST /pageviews/_search
{
  "size": 0,
  "aggs": {
    "unique_visitors": {
      "cardinality": {
        "field": "user_id",
        "precision_threshold": 3000
      }
    }
  }
}

# How HyperLogLog works:
# - Hashes each value, uses leading zeros in hash to estimate cardinality
# - Memory: ~(precision_threshold * 8) bytes per shard
# - precision_threshold: 3000 → ~24KB per shard, <5% error for up to 3000 unique values
# - Above threshold: error increases gradually (still typically <5%)
# - Max precision_threshold: 40000 (~320KB per shard)
#
# Trade-off: exact COUNT(DISTINCT) requires storing all values in memory
# - 100M unique user IDs = ~800MB memory
# - HyperLogLog: a few hundred KB at most, regardless of cardinality
# - Acceptable for dashboards, not for billing
```
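A bare-bones HyperLogLog sketch in Python shows the leading-zeros idea (illustrative only — ES uses the more refined HyperLogLog++; this toy version uses md5 and 256 registers):

```python
import hashlib
import math

def hll_estimate(values, p=8):
    """Toy HyperLogLog. p index bits -> m = 2**p registers; each register
    keeps the max (leading zeros + 1) seen in the non-index hash bits."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        h = int.from_bytes(hashlib.md5(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)               # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)  # remaining bits measure "rarity"
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)      # standard bias correction for m >= 128
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:     # small-range correction
        estimate = m * math.log(m / zeros)
    return estimate

true_n = 10_000
est = hll_estimate(f"user-{i}" for i in range(true_n))
assert abs(est - true_n) / true_n < 0.2   # within the expected error band
```

The register array stays 256 entries no matter how many values stream through — the same fixed-memory property that keeps the cardinality agg cheap at any scale.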
```json
POST /api_logs/_search
{
  "size": 0,
  "query": { "range": { "timestamp": { "gte": "now-1h" } } },
  "aggs": {
    "latency_percentiles": {
      "percentiles": {
        "field": "response_time_ms",
        "percents": [50, 75, 90, 95, 99, 99.9]
      }
    },
    "latency_stats": {
      "extended_stats": { "field": "response_time_ms" }
    }
  }
}
```

Response:

```json
{
  "aggregations": {
    "latency_percentiles": {
      "values": {
        "50.0": 45.2,
        "75.0": 89.7,
        "90.0": 156.3,
        "95.0": 234.8,
        "99.0": 567.1,
        "99.9": 1245.6
      }
    },
    "latency_stats": {
      "count": 48520,
      "min": 2.1,
      "max": 5420.3,
      "avg": 78.4,
      "std_deviation": 112.6
    }
  }
}

# TDigest algorithm: approximates percentiles using compression
# More accurate at extremes (p1, p99) than middle (p50)
# compression parameter (default 100): higher = more accurate, more memory
```
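For intuition, an exact nearest-rank percentile is trivial on one node — the catch is that it needs every value collected and sorted in one place, which is precisely what TDigest avoids in a distributed cluster. A minimal sketch with hypothetical latency values:

```python
import math

def percentile(sorted_vals, pct):
    """Exact nearest-rank percentile; requires ALL values sorted together."""
    k = max(0, math.ceil(pct / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

latencies = sorted([12, 15, 20, 22, 30, 41, 55, 80, 120, 950])
p50, p90, p99 = (percentile(latencies, p) for p in (50, 90, 99))
# p50=30, p90=120, p99=950 — the single outlier dominates the tail
```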
🔑 Exact vs Approximate
sum, avg, min, max are exact. cardinality (HyperLogLog) and percentiles (TDigest) are approximate algorithms designed for distributed systems where exact computation would require collecting all values on one node. The approximation error is well-bounded and configurable.
Pipeline Aggregations
Pipeline aggregations operate on the output of other aggregations rather than on documents directly. They take the computed buckets or metrics as input and derive new values — derivatives, moving averages, cumulative sums, or bucket filtering.
There are two types: parent pipelines (run inside a parent bucket agg, computing new values for its existing buckets) and sibling pipelines (sit next to an existing agg and produce a new result from its output).
| Pipeline Agg | Type | What It Does |
|---|---|---|
| derivative | Parent | Rate of change between consecutive buckets |
| moving_fn | Parent | Moving window function (avg, sum, custom) |
| cumulative_sum | Parent | Running total across buckets |
| bucket_sort | Parent | Sort or truncate buckets by a metric |
| bucket_selector | Parent | Filter out buckets based on a condition |
| bucket_script | Parent | Compute new metric from other metrics per bucket |
| avg_bucket | Sibling | Average of a metric across all buckets |
| max_bucket / min_bucket | Sibling | Bucket with highest/lowest metric value |
```json
POST /orders/_search
{
  "size": 0,
  "aggs": {
    "monthly_revenue": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } },
        "revenue_derivative": { "derivative": { "buckets_path": "revenue" } },
        "cumulative_revenue": { "cumulative_sum": { "buckets_path": "revenue" } },
        "revenue_moving_avg": {
          "moving_fn": {
            "buckets_path": "revenue",
            "window": 3,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    },
    "best_month": {
      "max_bucket": { "buckets_path": "monthly_revenue>revenue" }
    }
  }
}
```

Response (simplified):

```json
{
  "aggregations": {
    "monthly_revenue": {
      "buckets": [
        {
          "key_as_string": "2024-01",
          "revenue": { "value": 125000 },
          "cumulative_revenue": { "value": 125000 },
          "revenue_moving_avg": { "value": 125000 }
        },
        {
          "key_as_string": "2024-02",
          "revenue": { "value": 142000 },
          "revenue_derivative": { "value": 17000 },
          "cumulative_revenue": { "value": 267000 },
          "revenue_moving_avg": { "value": 133500 }
        },
        {
          "key_as_string": "2024-03",
          "revenue": { "value": 138000 },
          "revenue_derivative": { "value": -4000 },
          "cumulative_revenue": { "value": 405000 },
          "revenue_moving_avg": { "value": 135000 }
        }
      ]
    },
    "best_month": {
      "value": 142000,
      "keys": ["2024-02-01T00:00:00.000Z"]
    }
  }
}

# derivative: +17000 (Feb grew), -4000 (Mar declined)
# cumulative_sum: running total across months
# moving_fn: 3-month rolling average smooths spikes
# max_bucket: identifies the best-performing month
```
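The pipeline arithmetic itself is simple — it runs over already-computed bucket values on the coordinating node. The same math in plain Python, mirroring the revenue numbers in the example response (assuming the trailing window includes the current bucket, as that response does):

```python
revenue = [125_000, 142_000, 138_000]  # per-month "revenue" metric

# derivative: change between consecutive buckets (first bucket has none)
derivative = [None] + [b - a for a, b in zip(revenue, revenue[1:])]

# cumulative_sum: running total across buckets
cumulative, total = [], 0
for r in revenue:
    total += r
    cumulative.append(total)

# moving_fn (window=3): trailing unweighted average over up to 3 buckets
window = 3
moving_avg = [sum(revenue[max(0, i - window + 1):i + 1]) /
              len(revenue[max(0, i - window + 1):i + 1])
              for i in range(len(revenue))]

assert derivative == [None, 17_000, -4_000]
assert cumulative == [125_000, 267_000, 405_000]
assert moving_avg == [125_000.0, 133_500.0, 135_000.0]
```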
```json
POST /products/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category", "size": 50 },
      "aggs": {
        "total_sales": { "sum": { "field": "sales_count" } },
        "avg_rating": { "avg": { "field": "rating" } },
        "high_performers_only": {
          "bucket_selector": {
            "buckets_path": {
              "sales": "total_sales",
              "rating": "avg_rating"
            },
            "script": "params.sales > 1000 && params.rating > 4.0"
          }
        }
      }
    }
  }
}

# Only returns category buckets where:
#   total_sales > 1000 AND avg_rating > 4.0
# Useful for dashboards that only show significant segments
```
💡 buckets_path Syntax
Pipeline aggs reference other aggs using buckets_path. Use > to traverse nested aggs: "monthly_revenue>revenue" means "the revenue metric inside the monthly_revenue bucket agg." Use dot notation for multi-value metrics: "latency_stats.avg".
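A hypothetical resolver makes the syntax concrete (toy response shape, much flatter than a real ES response):

```python
def resolve_buckets_path(agg_result, path):
    """Walk '>' segments into nested aggs; '.' selects a key of a
    multi-value metric. Toy version for a simplified result dict."""
    node = agg_result
    for step in path.split(">"):
        name, _, metric = step.partition(".")
        node = node[name]
        if metric:
            node = node[metric]
    return node

bucket = {
    "monthly_revenue": {"revenue": {"value": 142000}},
    "latency_stats": {"avg": 78.4},
}
assert resolve_buckets_path(bucket, "monthly_revenue>revenue")["value"] == 142000
assert resolve_buckets_path(bucket, "latency_stats.avg") == 78.4
```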
Nested Aggregations
The real power of Elasticsearch aggregations comes from composing them — nesting bucket aggs inside bucket aggs, with metric aggs at the leaves. This creates a tree of analytics that answers complex multi-dimensional questions in a single request.
```json
POST /sales/_search
{
  "size": 0,
  "query": {
    "range": { "sold_at": { "gte": "2024-01-01", "lte": "2024-12-31" } }
  },
  "aggs": {
    "by_category": {
      "terms": { "field": "category", "size": 20 },
      "aggs": {
        "by_month": {
          "date_histogram": {
            "field": "sold_at",
            "calendar_interval": "month",
            "format": "yyyy-MM"
          },
          "aggs": {
            "avg_price": { "avg": { "field": "price" } },
            "total_revenue": { "sum": { "field": "price" } },
            "unique_buyers": { "cardinality": { "field": "buyer_id" } }
          }
        },
        "category_total": { "sum": { "field": "price" } }
      }
    }
  }
}

# Result structure:
# Electronics (15,000 docs)
# ├── 2024-01: avg_price=299, revenue=450K, unique_buyers=1200
# ├── 2024-02: avg_price=312, revenue=520K, unique_buyers=1450
# └── ...
# Clothing (22,000 docs)
# ├── 2024-01: avg_price=45, revenue=180K, unique_buyers=3200
# └── ...

# This single query answers:
# - What are the top categories by volume?
# - How does each category trend month-over-month?
# - What's the average price point per category per month?
# - How many unique buyers per category per month?
```
```json
POST /orders/_search
{
  "size": 0,
  "aggs": {
    "by_region": {
      "terms": { "field": "shipping_region", "size": 10 },
      "aggs": {
        "by_payment_method": {
          "terms": { "field": "payment_method" },
          "aggs": {
            "order_stats": { "stats": { "field": "total_amount" } },
            "avg_delivery_days": { "avg": { "field": "delivery_days" } }
          }
        },
        "region_revenue": { "sum": { "field": "total_amount" } },
        "top_products": {
          "terms": { "field": "product_category", "size": 5 },
          "aggs": {
            "revenue": { "sum": { "field": "total_amount" } }
          }
        }
      }
    }
  }
}

# Answers in one request:
# - Revenue breakdown by region
# - Payment method distribution per region
# - Average order value per region per payment method
# - Average delivery time per region per payment method
# - Top 5 product categories per region by revenue
```
🔑 Depth vs Performance
Each level of nesting multiplies the number of buckets. A terms agg (size 10) → date_histogram (12 months) → terms (size 5) creates 10 × 12 × 5 = 600 leaf buckets. Each leaf runs its metric aggs. Keep nesting depth to 3-4 levels max and use reasonable size values to avoid memory pressure.
A Tree of Questions
Nested aggregations form a tree. The root is your full document set. Each bucket agg creates branches (categories, time periods, regions). Metric aggs are the leaves that compute final values. You're building a pivot table — but one that runs across a distributed cluster in milliseconds.
Geo Aggregations
Geo aggregations bucket documents by geographic criteria — distance from a point, grid cells, or bounding boxes. They power store locators, delivery zone analysis, heatmaps, and location-based analytics dashboards.
| Geo Agg | What It Does | Use Case |
|---|---|---|
| geo_distance | Buckets by distance rings from a point | Store locator: 0-1km, 1-5km, 5-20km |
| geohash_grid | Buckets by geohash precision cells | Heatmaps, density visualization |
| geotile_grid | Buckets by map tile coordinates | Map-based clustering at zoom levels |
| geo_centroid | Computes center point of geo_points in a bucket | Pin placement for clustered markers |
| geo_bounds | Computes bounding box of all points in a bucket | Auto-zoom map to fit results |
```json
POST /stores/_search
{
  "size": 0,
  "aggs": {
    "distance_rings": {
      "geo_distance": {
        "field": "location",
        "origin": { "lat": 40.7128, "lon": -74.0060 },
        "unit": "km",
        "ranges": [
          { "to": 1, "key": "walking" },
          { "from": 1, "to": 5, "key": "biking" },
          { "from": 5, "to": 20, "key": "driving" },
          { "from": 20, "key": "far" }
        ]
      },
      "aggs": {
        "avg_rating": { "avg": { "field": "rating" } },
        "center": { "geo_centroid": { "field": "location" } }
      }
    }
  }
}
```

Response:

```json
{
  "aggregations": {
    "distance_rings": {
      "buckets": [
        {
          "key": "walking",
          "doc_count": 3,
          "avg_rating": { "value": 4.2 },
          "center": { "location": { "lat": 40.713, "lon": -74.005 } }
        },
        {
          "key": "biking",
          "doc_count": 12,
          "avg_rating": { "value": 3.9 },
          "center": { "location": { "lat": 40.718, "lon": -73.998 } }
        },
        {
          "key": "driving",
          "doc_count": 45,
          "avg_rating": { "value": 4.1 },
          "center": { "location": { "lat": 40.735, "lon": -73.982 } }
        }
      ]
    }
  }
}
```
```json
POST /deliveries/_search
{
  "size": 0,
  "aggs": {
    "delivery_heatmap": {
      "geohash_grid": {
        "field": "dropoff_location",
        "precision": 5
      },
      "aggs": {
        "center": { "geo_centroid": { "field": "dropoff_location" } },
        "avg_delivery_time": { "avg": { "field": "delivery_minutes" } }
      }
    }
  }
}

# precision 1 = ~5000km cells (continent level)
# precision 3 = ~156km cells (region level)
# precision 5 = ~5km cells (neighborhood level)
# precision 7 = ~150m cells (street level)
#
# Use case: identify delivery hotspots and slow zones
# Each bucket = a grid cell with count + centroid + metrics
# Feed directly into map visualization libraries
```
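The grid cells come from geohashing: alternately bisect longitude and latitude, record one bit per decision, and emit a base-32 character per 5 bits — each extra character shrinks the cell. A minimal encoder sketch (illustrative; checked against the commonly cited geohash test point):

```python
def geohash(lat, lon, precision):
    """Minimal geohash encoder: interleaves lon/lat bisection bits,
    5 bits per base-32 character."""
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, bit_count, even, out = 0, 0, True, []
    while len(out) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if val >= mid:           # upper half -> 1, narrow range upward
            bits |= 1
            rng[0] = mid
        else:                    # lower half -> 0, narrow range downward
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:       # flush 5 bits as one base-32 character
            out.append(base32[bits])
            bits, bit_count = 0, 0
    return "".join(out)

# Well-known reference point: (57.64911, 10.40744) -> "u4pruydqqvj"
assert geohash(57.64911, 10.40744, 6) == "u4pruy"
```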
💡 Geo Fields Must Be geo_point Type
Geo aggregations only work on fields mapped as geo_point. If your data has separate lat/lon fields, create a geo_point field in your mapping and index coordinates as { "lat": 40.71, "lon": -74.00 }. You cannot aggregate on raw numeric lat/lon fields.
Performance Considerations
Aggregations can be expensive — they process every matching document, not just the top N hits. A poorly designed aggregation on a large index can consume gigabytes of memory and take seconds to complete. Understanding the performance characteristics helps you design efficient analytics queries.
Optimization Strategies
- ✅Use 'size': 0 when you only need agg results — skips the expensive scoring and fetching of hits
- ✅Prefer keyword fields over text for aggregations — keyword uses doc_values (columnar, disk-based), text requires fielddata (heap memory, disabled by default)
- ✅Set reasonable 'size' on terms aggs — default 10 is fine for dashboards; avoid size: 100000 which loads all buckets into memory
- ✅Use 'execution_hint': 'map' when only a small number of documents match the query — builds buckets directly from field values instead of global ordinals
- ✅Leverage the request cache — agg-only (size: 0) requests are cached at the shard level by default; prefer rounded, absolute time ranges, since queries containing a relative now are not cacheable
- ✅Use the 'filter' / 'filters' aggs when different agg sets need different conditions in one request — a single pass over the documents instead of a separate query per set
- ✅Pre-compute with transforms — for expensive recurring aggregations, use ES transforms to materialize summary indices on a schedule
- ✅Use composite agg for pagination — instead of terms with size: 100000, paginate through all buckets with composite agg's after_key
- ✅Avoid deep nesting (>3-4 levels) — bucket count multiplies at each level; 10 × 12 × 10 × 5 = 6000 leaf buckets per request
- ✅Use 'shard_size' wisely — higher shard_size improves terms agg accuracy but increases memory; default is 1.5 × size + 10
```json
# ❌ SLOW — aggregating on text field (requires fielddata)
POST /logs/_search
{
  "aggs": { "by_message": { "terms": { "field": "message" } } }
}
# Error: "Fielddata is disabled on text fields by default"
# Even if enabled: loads ALL unique tokens into JVM heap → OOM risk

# ✅ FAST — use keyword sub-field or dedicated keyword field
POST /logs/_search
{
  "size": 0,
  "aggs": { "by_level": { "terms": { "field": "level" } } }
}
# "level" is keyword → uses doc_values (disk-based, memory-efficient)

# ❌ SLOW — fetching all 50,000 categories
{
  "aggs": {
    "all_categories": {
      "terms": { "field": "category", "size": 50000 }
    }
  }
}

# ✅ FAST — paginate with composite aggregation
{
  "size": 0,
  "aggs": {
    "all_categories": {
      "composite": {
        "size": 1000,
        "sources": [
          { "category": { "terms": { "field": "category" } } }
        ],
        "after": { "category": "last_seen_value" }
      }
    }
  }
}
# Fetches 1000 at a time, memory-bounded, deterministic ordering
```
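The client-side pagination loop for a composite agg looks roughly like this Python sketch, with a fake search function standing in for the HTTP round trip to ES (names and data are hypothetical, and the stop condition is simplified):

```python
def fake_composite_search(all_terms, size, after=None):
    """Stand-in for one composite-agg round trip: next `size` buckets
    in deterministic sorted order, starting after the `after` key."""
    keys = sorted(all_terms)
    if after is not None:
        keys = [k for k in keys if k > after]
    page = keys[:size]
    return {
        "buckets": [{"key": {"category": k}, "doc_count": all_terms[k]}
                    for k in page],
        "after_key": {"category": page[-1]} if len(page) == size else None,
    }

categories = {f"cat-{i:03d}": i + 1 for i in range(25)}  # 25 fake buckets
collected, after = [], None
while True:
    resp = fake_composite_search(categories, size=10, after=after)
    collected.extend(b["key"]["category"] for b in resp["buckets"])
    if resp["after_key"] is None:    # short final page -> done
        break
    after = resp["after_key"]["category"]

assert collected == sorted(categories)  # every bucket, bounded memory per page
```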
🔑 The Request Cache
ES caches aggregation results at the shard level. The cache key includes the query + agg definition. Cache is invalidated when the shard's data changes (refresh). For time-series data, old shards (yesterday's index) never change → their agg results stay cached indefinitely. This is why time-based index patterns (logs-2024.01.15) are so effective for analytics.
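Conceptually, the shard request cache behaves like this Python sketch (a deliberately simplified model — the real cache keys on the serialized request body and is size-bounded):

```python
class Shard:
    """Toy shard: caches agg results per request, evicts on data change."""
    def __init__(self, docs):
        self.docs = docs
        self._cache = {}

    def run_agg(self, request_json):
        if request_json in self._cache:
            return self._cache[request_json], True      # cache hit
        result = sum(d["amount"] for d in self.docs)    # stand-in "sum agg"
        self._cache[request_json] = result
        return result, False

    def refresh_with(self, new_docs):
        self.docs.extend(new_docs)
        self._cache.clear()  # data changed -> cached agg results are stale

old_shard = Shard([{"amount": 10}, {"amount": 20}])  # "yesterday's index"
req = '{"size":0,"aggs":{"total":{"sum":{"field":"amount"}}}}'

_, hit1 = old_shard.run_agg(req)
_, hit2 = old_shard.run_agg(req)
assert (hit1, hit2) == (False, True)  # immutable shard: stays cached

old_shard.refresh_with([{"amount": 5}])
_, hit3 = old_shard.run_agg(req)
assert hit3 is False                  # write + refresh evicted the entry
```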
Interview Questions
Q: How does the cardinality aggregation work internally?
A: Cardinality uses the HyperLogLog++ (HLL) algorithm. It hashes each value and uses the pattern of leading zeros in the hash to estimate the number of unique values. The key insight: if you've seen a hash with 5 leading zeros, you've probably seen ~2^5 = 32 unique values. It uses multiple 'registers' (controlled by precision_threshold) to average estimates. Memory usage is fixed (~precision_threshold × 8 bytes) regardless of actual cardinality — a few hundred KB at the maximum threshold can estimate billions of unique values with <5% error. Trade-off: not suitable for exact counts (billing), but perfect for dashboards (unique visitors, distinct IPs).
Q: Why are terms aggregation results approximate across shards?
A: Each shard computes its local top-N terms independently. The coordinating node merges these partial results. The problem: a term that's #11 on every shard (just below the cutoff) might actually be in the global top 10 when counts are summed. ES mitigates this with shard_size (fetches more candidates per shard than the final 'size' requests). Default shard_size = size × 1.5 + 10. For exact results on low-cardinality fields, set size >= total unique values. For high-cardinality fields, accept the approximation or use the composite aggregation which paginates through ALL buckets deterministically.
Q: When would you use pipeline aggregations instead of computing in application code?
A: Use pipeline aggs when: (1) you need derivatives/rates of change across time buckets — computing month-over-month growth in-cluster avoids transferring all raw bucket data to the app; (2) you need moving averages for smoothing — the moving_fn runs efficiently on already-computed bucket values; (3) you need to filter buckets by computed metrics (bucket_selector) — e.g., only return categories where avg_price > 100; (4) you need cumulative sums for running totals. The key benefit: pipeline aggs run on the coordinating node after shard-level aggs complete, so they add minimal overhead while saving network transfer and app-side computation.
Q: How do Elasticsearch aggregations differ from SQL GROUP BY?
A: Key differences: (1) Distribution — ES aggs run in parallel across shards, SQL GROUP BY runs on one node; (2) Approximation — ES terms agg is approximate for high cardinality (shard-level merging), SQL is exact; (3) Nesting — ES supports arbitrary nesting (terms → date_histogram → avg), SQL requires subqueries or window functions; (4) Real-time — ES aggs run on the live index with near-real-time data, SQL may need materialized views for performance; (5) Pipeline — ES has built-in derivative, moving average, cumulative sum as first-class pipeline aggs; (6) Trade-off — ES sacrifices exactness for speed and distribution, SQL sacrifices speed for exactness.
Q: How would you design an analytics dashboard backed by Elasticsearch?
A: Architecture: (1) Use time-based indices (metrics-2024.01.15) so old shards are immutable and cacheable; (2) Set 'size': 0 on all dashboard queries — you only need agg results, not hits; (3) Use date_histogram as the primary bucket for time-series charts; (4) Nest terms aggs for dimensional breakdowns (by region, by product); (5) Use pipeline aggs for trends (derivative for growth rate, moving_fn for smoothing); (6) Leverage request_cache — immutable old indices have 100% cache hit rate; (7) For expensive aggregations that don't need real-time data, use ES transforms to pre-compute summary indices on a schedule (e.g., hourly rollups); (8) Use composite agg for paginated exports instead of massive terms size values.
Common Mistakes
Aggregating on text fields
Running a terms aggregation on a 'text' field. Text fields are analyzed (broken into tokens), so aggregating on them returns individual tokens ('quick', 'brown', 'fox') instead of full values ('the quick brown fox'). ES disables fielddata by default to prevent this — you'll get an error.
✅Always aggregate on keyword fields. If you need both full-text search and aggregation on the same field, use the multi-field pattern: map as text with a .raw keyword sub-field. Aggregate on 'field_name.raw' instead of 'field_name'.
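The token-vs-value difference is easy to see with a stand-in analyzer in Python (a crude lowercase-and-split approximation of the standard analyzer):

```python
from collections import Counter

def analyze(text):
    """Toy stand-in for the standard analyzer: lowercase + whitespace split."""
    return text.lower().split()

titles = ["The Quick Brown Fox", "The Lazy Dog"]

# text field: a terms agg would bucket individual tokens across docs
token_buckets = Counter(t for title in titles for t in analyze(title))
assert token_buckets["the"] == 2   # "the" counted once per document

# keyword field (e.g. title.raw): one bucket per full, unanalyzed value
keyword_buckets = Counter(titles)
assert keyword_buckets["The Quick Brown Fox"] == 1
```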
High cardinality terms without size limit
Running a terms aggregation with size: 100000 on a field with millions of unique values (user IDs, session IDs). This loads all bucket data into the coordinating node's heap memory, causing GC pressure or OutOfMemoryError.
✅Use the composite aggregation to paginate through high-cardinality fields in bounded batches (size: 1000 with after_key). For approximate top-N, keep terms size reasonable (10-100) and accept that results are approximate. For exact unique counts, use cardinality agg (HyperLogLog) instead of counting terms buckets.
Not using filter context with aggregations
Running expensive aggregations on the entire index when you only need results for a subset. Or using a query that scores documents when you only need filtering — scoring adds overhead that aggregations don't benefit from.
✅Wrap your query in a bool/filter context (no scoring) when you only need to narrow the document set for aggregations. Use 'size': 0 to skip hit retrieval entirely. For multiple filtered agg sets, use the filters aggregation instead of running separate queries.
Ignoring shard-level approximation in terms aggs
Treating terms aggregation doc_counts as exact numbers for billing, compliance, or SLA reporting. The counts are approximate because each shard returns its local top-N, and a term might be undercounted if it's just below the cutoff on some shards.
✅For exact counts: use the composite aggregation (paginates through ALL values deterministically). For better accuracy with terms agg: increase shard_size (e.g., shard_size: 5000 when size: 100). Check the 'doc_count_error_upper_bound' and 'sum_other_doc_count' fields in the response to understand the error margin.
Deep nesting without considering bucket explosion
Nesting 5+ levels of bucket aggregations without considering the multiplicative effect. A terms(20) → date_histogram(365 days) → terms(10) creates 73,000 leaf buckets, each computing metrics. Response size explodes and memory usage spikes.
✅Limit nesting to 3-4 levels. Use reasonable size values at each level. Consider whether you actually need all dimensions in one query — often two simpler queries are faster than one deeply nested one. Use "collect_mode": "breadth_first" on the outer agg for deep trees, which prunes to the top buckets before computing sub-aggregations.