Operations & Tuning
Index Lifecycle Management, vector search, performance tuning, snapshots, monitoring, and knowing when Elasticsearch is NOT the right choice.
Index Lifecycle Management (ILM)
Time-series data (logs, metrics, events) grows indefinitely. You can't keep everything on fast SSDs forever. ILM automates the lifecycle of indices, moving data through phases based on age or size and optimizing cost and performance at each stage.
Phase Flow: HOT → WARM → COLD → FROZEN → DELETE

```
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   HOT    │──▶│   WARM   │──▶│   COLD   │──▶│  FROZEN  │──▶│  DELETE  │
│          │   │          │   │          │   │          │   │          │
│ Active   │   │ No new   │   │ Rarely   │   │ Cheapest │   │ Remove   │
│ writes   │   │ writes   │   │ queried  │   │ storage  │   │ entirely │
│ Fast SSD │   │ Warm SSD │   │ HDD      │   │ Snapshot │   │          │
│ Full     │   │ Shrink   │   │ Freeze   │   │ mount    │   │          │
│ replicas │   │ replicas │   │ replicas │   │          │   │          │
└──────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘

Typical timeline for logs:
  HOT:    0-3 days      (actively written, frequently searched)
  WARM:   3-30 days     (read-only, occasionally searched)
  COLD:   30-90 days    (rarely accessed, compliance retention)
  FROZEN: 90-365 days   (searchable snapshots, near-zero cost)
  DELETE: >365 days     (removed entirely)
```
```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 },
          "allocate": { "require": { "data": "warm" } }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "data": "cold" } },
          "set_priority": { "priority": 0 }
        }
      },
      "frozen": {
        "min_age": "90d",
        "actions": {
          "searchable_snapshot": { "snapshot_repository": "my-s3-repo" }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": { "delete": {} }
      }
    }
  }
}
```
Rollover & Data Streams
Rollover automatically creates a new index when the current one hits a size or age threshold. Data streams are the modern abstraction: they manage a series of backing indices behind a single name, with automatic rollover built in.
```
# Data stream = append-only time-series abstraction
# Behind the scenes: a series of backing indices

Data Stream: "logs-nginx"
├── .ds-logs-nginx-2024.01.01-000001  (oldest, cold)
├── .ds-logs-nginx-2024.01.15-000002  (warm)
├── .ds-logs-nginx-2024.01.28-000003  (warm)
└── .ds-logs-nginx-2024.02.10-000004  (current write index, hot)

# Writes always go to the latest backing index
# Reads span all backing indices transparently
# Rollover creates a new backing index automatically

# Create an index template for the data stream:
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "number_of_shards": 3,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" },
        "level": { "type": "keyword" },
        "service": { "type": "keyword" }
      }
    }
  }
}
```
🎯 Data Streams Are the Modern Way
For any time-series data (logs, metrics, traces), use data streams instead of manually managing index aliases and rollover. Data streams enforce append-only semantics, handle rollover automatically, and integrate cleanly with ILM policies.
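With the template above in place, the data stream is created implicitly on the first write. A quick sketch (log fields are illustrative):

```
# First document auto-creates the data stream "logs-nginx"
# (@timestamp is required for data streams)
POST /logs-nginx/_doc
{
  "@timestamp": "2024-02-10T12:00:00Z",
  "message": "GET /index.html 200",
  "level": "info",
  "service": "nginx"
}

# Force a rollover manually (ILM normally triggers this for you)
POST /logs-nginx/_rollover
```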
Vector & Semantic Search
Traditional search matches keywords: if the user searches "affordable laptop" but the document says "budget notebook," BM25 won't find it. Vector search encodes meaning into dense embeddings, enabling semantic similarity matching. Elasticsearch supports both and can combine them.
```
# Step 1: Define a dense_vector field in your mapping
PUT /products
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

# Step 2: Index documents with embeddings
# (embeddings generated by a model like sentence-transformers)
POST /products/_doc/1
{
  "title": "Budget Notebook Computer",
  "description": "Lightweight laptop for everyday use",
  "embedding": [0.12, -0.34, 0.56, ...]  // 768-dimensional vector
}

# Step 3: kNN query, to find semantically similar documents
POST /products/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.11, -0.33, 0.55, ...],  // query embedding
    "k": 10,
    "num_candidates": 100
  }
}

# How HNSW works (the index structure):
# - Hierarchical Navigable Small World graph
# - Multi-layer graph: top layers = long-range links, bottom = fine-grained
# - Search starts at the top layer, greedily descends to nearest neighbors
# - Approximate: trades perfect recall for speed (configurable)
# - Parameters: m (connections per node), ef_construction (build quality)
```
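Those HNSW parameters are set per field via index_options in the mapping. A sketch with the documented defaults spelled out explicitly (index name is illustrative); raising m and ef_construction improves recall at the cost of memory and index-build time, while num_candidates at query time trades latency for recall:

```
PUT /products-tuned
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,                // connections per node (default 16)
          "ef_construction": 100  // candidates tracked during graph build (default 100)
        }
      }
    }
  }
}
```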
Hybrid Search: BM25 + Vector with RRF
Pure keyword search misses semantic matches. Pure vector search misses exact keyword matches (e.g., product SKUs, error codes). Hybrid search combines both using Reciprocal Rank Fusion (RRF) to merge ranked results from each approach.
```
POST /products/_search
{
  "query": {
    "match": { "description": "affordable lightweight laptop" }
  },
  "knn": {
    "field": "embedding",
    "query_vector": [0.11, -0.33, 0.55, ...],
    "k": 10,
    "num_candidates": 100
  },
  "rank": {
    "rrf": {
      "window_size": 100,
      "rank_constant": 60
    }
  }
}

# How RRF works:
# For each document, compute: score = Σ 1/(rank_constant + rank_i)
# where rank_i is the document's rank in each result set
#
# Example:
# Doc A: rank 1 in BM25, rank 5 in kNN
# Doc B: rank 3 in BM25, rank 1 in kNN
#
# Score A = 1/(60+1) + 1/(60+5) = 0.0164 + 0.0154 = 0.0318
# Score B = 1/(60+3) + 1/(60+1) = 0.0159 + 0.0164 = 0.0323
# Doc B wins (strong in both)

# When hybrid beats pure approaches:
# - "error code NX-4012 memory leak": keyword matches the code, vector
#   matches the concept
# - "cheap macbook alternative": vector understands "cheap" = "affordable",
#   keyword catches "macbook" exactly
```
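The fusion formula is simple enough to sanity-check offline. A minimal Python sketch (doc IDs are illustrative) that reproduces the worked example above:

```
def rrf_fuse(rankings, rank_constant=60):
    # rankings: list of ranked doc-ID lists, best first
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

bm25 = ["A", "X", "B", "Y"]        # A at rank 1, B at rank 3
knn = ["B", "Z", "W", "Q", "A"]    # B at rank 1, A at rank 5
print(rrf_fuse([bm25, knn]))       # B (0.0323) edges out A (0.0318)
```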
Two Librarians Working Together
Keyword search is like a librarian who only matches exact words in the catalog. Vector search is like a librarian who understands what you mean even if you use different words. Hybrid search asks both librarians, then picks books that both recommend highly β giving you the best of exact matching and semantic understanding.
💡 Embedding Model Choice Matters
The quality of vector search depends entirely on the embedding model. General-purpose models (sentence-transformers, OpenAI embeddings) work for broad content. Domain-specific fine-tuned models dramatically outperform them for specialized content (medical, legal, e-commerce). Always evaluate retrieval quality with your actual data.
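A minimal indexing sketch using the sentence-transformers and elasticsearch Python clients (index name, model choice, and connection URL are all illustrative). One trap worth flagging: all-MiniLM-L6-v2 emits 384-dimensional vectors, so the dims in your mapping must match whichever model you actually deploy:

```
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Model choice drives retrieval quality; this is a small general-purpose
# model that outputs 384-dimensional vectors
model = SentenceTransformer("all-MiniLM-L6-v2")
es = Elasticsearch("http://localhost:9200")

docs = [
    {"title": "Budget Notebook Computer",
     "description": "Lightweight laptop for everyday use"},
]
for i, doc in enumerate(docs):
    # Encode the text and store the vector alongside the document
    doc["embedding"] = model.encode(doc["description"]).tolist()
    es.index(index="products", id=str(i), document=doc)
```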
Index Design Patterns
How you structure indices determines performance, scalability, and operational complexity. Unlike relational databases, Elasticsearch has no joins, so your index design must account for query patterns upfront.
Index Design Strategies
- ✓ Time-based indices (logs-2024.01.15): natural ILM alignment, easy to delete old data, queries target specific time ranges
- ✓ Index per tenant: strong isolation, independent scaling, simple access control; but operational overhead grows with tenant count
- ✓ Single index with tenant field: simpler ops, use filtered aliases for tenant isolation (see the sketch after this list); but noisy neighbor risk
- ✓ Denormalize aggressively: ES has no joins; embed related data into documents at index time (e.g., an order contains full product info)
- ✓ Nested objects: use when array elements must be queried independently (e.g., 'find orders where item.color=red AND item.size=large' on the same item)
- ✓ Flattened fields: use for high-cardinality dynamic keys (e.g., user-defined labels) to prevent mapping explosion
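For the single-index multi-tenant strategy, a filtered-alias sketch (tenant and index names are illustrative):

```
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "products",
        "alias": "products-tenant-acme",
        "filter": { "term": { "tenant_id": "acme" } }
      }
    }
  ]
}

# The app for tenant "acme" queries products-tenant-acme and can only
# ever see documents matching the alias filter
```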
```
# Problem: You need to change a field mapping, but mappings are immutable.
# Solution: Create a new index, reindex data, swap the alias atomically.

# Step 1: Your app always queries the alias, never the real index name
#   App → "products" (alias) → products-v1 (real index)

# Step 2: Create a new index with the updated mapping
PUT /products-v2
{
  "mappings": { ... updated mapping ... }
}

# Step 3: Reindex data from old to new
POST /_reindex
{
  "source": { "index": "products-v1" },
  "dest": { "index": "products-v2" }
}

# Step 4: Atomic alias swap (zero downtime)
POST /_aliases
{
  "actions": [
    { "remove": { "index": "products-v1", "alias": "products" } },
    { "add": { "index": "products-v2", "alias": "products" } }
  ]
}

# Step 5: Delete the old index when confident
DELETE /products-v1

# The app never knew anything changed: it always queries "products"
```
🎯 Always Use Aliases
Never let applications query index names directly. Always use aliases. This gives you the freedom to reindex, split, shrink, or restructure indices without any application changes. It's the ES equivalent of a database view.
Performance Tuning
Elasticsearch performance tuning splits into two domains: indexing throughput (how fast you can write) and search latency (how fast you can read). They often trade off against each other.
Indexing Performance
| Technique | What It Does | Impact |
|---|---|---|
| Bulk API | Batch multiple index/update/delete operations in one request | 10-100x faster than individual requests; always use bulk for batch loads |
| refresh_interval: 30s | Increase from the default 1s during bulk loads | Fewer refreshes = fewer small segments = faster indexing (data visible with a delay) |
| number_of_replicas: 0 | Disable replicas during the initial bulk load | Halves write work; re-enable after the load completes |
| bootstrap.memory_lock: true | Locks the JVM heap in RAM so the OS cannot swap it | Prevents heap pages from being swapped to disk (catastrophic for latency) |
| Translog flush threshold | Increase index.translog.flush_threshold_size | Fewer fsyncs during heavy indexing; slight durability trade-off |
| Mapping: index: false | Set index: false on fields you never search | Saves CPU and disk; the field is stored but not indexed |
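Putting the first three rows together, a typical bulk-load sequence might look like this (index name and documents are illustrative):

```
# Before the load: relax refresh, drop replicas
PUT /products/_settings
{ "index": { "refresh_interval": "30s", "number_of_replicas": 0 } }

# Load with the Bulk API (one action line + one source line per document)
POST /_bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "title": "Budget Notebook Computer" }
{ "index": { "_index": "products", "_id": "2" } }
{ "title": "Gaming Laptop" }

# After the load: restore defaults and make the data searchable now
PUT /products/_settings
{ "index": { "refresh_interval": "1s", "number_of_replicas": 1 } }
POST /products/_refresh
```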
Search Performance
| Technique | What It Does | When to Use |
|---|---|---|
| Filter context | Filters are cached as bitsets, not scored | Always use filter for yes/no conditions (status, date ranges, tenant ID) |
| Avoid leading wildcards | query: '*phone' forces scan of all terms | Use edge n-grams or reverse field instead of leading wildcard queries |
| search_after | Cursor-based pagination using sort values | Deep pagination (page 1000+) β from/size degrades linearly |
| Profile API | _search with profile: true shows time per query phase | Diagnosing slow queries β identifies which clause is expensive |
| Routing | Direct queries to specific shards | Multi-tenant: route by tenant_id so queries hit 1 shard instead of all (see the routing sketch below) |
| Shard sizing | Target 10-50 GB per shard | Too many small shards = overhead; too few large shards = slow queries |
```
# Problem: from: 10000, size: 10 requires the coordinating node to
# fetch 10010 docs from EACH shard, then discard 10000. Extremely wasteful.
# Solution: search_after uses the sort values of the last result as a cursor

# First page:
POST /logs/_search
{
  "size": 100,
  "sort": [
    { "@timestamp": "desc" },
    { "_id": "asc" }
  ],
  "query": { "match": { "level": "error" } }
}

# Next page: pass the sort values from the last hit:
POST /logs/_search
{
  "size": 100,
  "sort": [
    { "@timestamp": "desc" },
    { "_id": "asc" }
  ],
  "query": { "match": { "level": "error" } },
  "search_after": [1707523200000, "doc_abc123"]
}

# Each shard only returns docs AFTER the cursor: no wasted work
# Consistent performance regardless of page depth
```
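The routing row from the table above in practice (tenant IDs are illustrative). Note the filter is still required: routing narrows the query to one shard, but other tenants can hash to that same shard:

```
# Index with a routing key; the document lands on the shard derived
# from the routing value instead of from _id
POST /orders/_doc?routing=tenant_42
{ "tenant_id": "tenant_42", "total": 99.5 }

# Search with the same routing key; only that one shard is queried
POST /orders/_search?routing=tenant_42
{
  "query": {
    "bool": {
      "filter": [ { "term": { "tenant_id": "tenant_42" } } ]
    }
  }
}
```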
Memory & Resource Management
Elasticsearch runs on the JVM but relies heavily on the OS page cache for Lucene segment reads. Getting the memory split wrong is one of the most common causes of cluster instability.
```
Total Server RAM: 64 GB (example)

┌─────────────────────────────────────────────────────────┐
│                    64 GB Total RAM                      │
├──────────────────────┬──────────────────────────────────┤
│  JVM Heap: 31 GB     │  OS Page Cache: ~31 GB           │
│  (ES internals)      │  (Lucene segment files)          │
│                      │                                  │
│  • Field data        │  • Segment reads (search)        │
│  • Node query cache  │  • Merges                        │
│  • Indexing buffer   │  • Stored fields                 │
│  • Cluster state     │  • Doc values                    │
│  • Aggregations      │  • Term dictionaries             │
└──────────────────────┴──────────────────────────────────┘

Rules:
  1. JVM heap ≤ 50% of RAM (leave the rest for the page cache)
  2. JVM heap ≤ 31 GB (compressed oops threshold)
  3. Set Xms and Xmx to the same value (avoids resize pauses)
  4. Prevent swapping: disable swap at the OS level or set
     bootstrap.memory_lock: true
```
💡 The 32 GB Compressed Oops Boundary
The JVM uses "compressed ordinary object pointers" (oops) when the heap is below ~32 GB. This lets 4-byte pointers address 32 GB of memory. Above 32 GB, pointers expand to 8 bytes, effectively wasting ~30% of heap on pointer overhead. A 31 GB heap often outperforms a 40 GB heap. Never set the heap between 32 and 40 GB.
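In practice the heap rules are pinned in a jvm.options override. A minimal sketch (the file name under config/jvm.options.d/ is arbitrary):

```
# config/jvm.options.d/heap.options
# Equal Xms/Xmx, below the compressed-oops threshold
-Xms31g
-Xmx31g
```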
| Resource | Purpose | Tuning Guidance |
|---|---|---|
| JVM Heap | ES internal data structures, caches, aggregation buffers | Set to min(50% RAM, 31 GB). Equal Xms/Xmx. G1GC for heaps > 8 GB. |
| OS Page Cache | Caches Lucene segment files for fast reads | Leave at least 50% RAM for this. More = faster searches. |
| Field Data Cache | In-memory uninverted index for text field aggregations | AVOID: use keyword fields or doc_values instead. Set an indices.fielddata.cache.size limit. |
| Node Query Cache | Caches filter clause results as bitsets | Default 10% heap. Effective for repeated filters (tenant_id, status). |
| Indexing Buffer | Buffers new documents before creating segments | Default 10% heap. Increase for heavy indexing workloads. |
| Circuit Breakers | Prevent OOM by rejecting requests that would exceed limits | Parent breaker: 95% heap. Don't disable β they protect cluster stability. |
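A sketch of the corresponding elasticsearch.yml settings; the fielddata and indexing-buffer values here are illustrative choices, not defaults:

```
# elasticsearch.yml
indices.fielddata.cache.size: 20%        # cap field data (unbounded by default)
indices.queries.cache.size: 10%          # node query cache (10% is the default)
indices.memory.index_buffer_size: 20%    # raise from the 10% default on write-heavy nodes
```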
Monitoring & Reliability
Elasticsearch clusters degrade silently before they fail loudly. Monitoring the right metrics gives you early warning. Snapshots give you recovery when things go wrong.
| Metric | What It Means | Alert Threshold | Action |
|---|---|---|---|
| Cluster Health | green/yellow/red: shard allocation status | Yellow > 5 min, Red immediately | Yellow = unassigned replicas. Red = unassigned primaries (data loss risk). |
| JVM Heap Usage | Percentage of heap in use | > 85% sustained | Frequent GC, risk of OOM. Reduce caches, add nodes, or increase heap (up to 31 GB). |
| GC Time | Time spent in garbage collection | > 500ms per collection, or > 5% of time in GC | Long GC pauses cause node timeouts. Check for field data, large aggregations. |
| Search Latency (p99) | 99th percentile search response time | Depends on SLA (e.g., > 500ms) | Check slow log, profile queries, verify shard sizing. |
| Indexing Rate | Documents indexed per second | Sudden drop or spike | Drop = upstream issue or rejections. Spike = bulk load affecting search. |
| Thread Pool Rejections | Requests rejected due to full queue | > 0 (search or write rejections) | Cluster is overloaded. Scale out or reduce request rate. |
| Disk Watermarks | Low (85%), High (90%), Flood (95%) | Approaching the low watermark | At low: no new shards allocated to the node. At high: shards relocated away. At flood: indices set to read-only. |
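Most of these metrics come from a handful of standard endpoints, for example:

```
GET _cluster/health                      # green / yellow / red, unassigned shard counts
GET _cat/nodes?v&h=name,heap.percent,cpu,disk.used_percent
GET _cat/thread_pool/search,write?v&h=name,node_name,active,queue,rejected
GET _nodes/stats/jvm?filter_path=nodes.*.jvm.gc   # GC collection counts and times
```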
Snapshot & Restore
```
# Register a snapshot repository (S3 example)
PUT /_snapshot/my-s3-repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-es-backups",
    "region": "us-east-1",
    "base_path": "elasticsearch/snapshots"
  }
}

# Create a snapshot (incremental: only new/changed segments)
PUT /_snapshot/my-s3-repo/snapshot-2024-02-10
{
  "indices": "logs-*,products",
  "ignore_unavailable": true,
  "include_global_state": false
}

# Automate with SLM (Snapshot Lifecycle Management)
PUT /_slm/policy/nightly-snapshots
{
  "schedule": "0 30 2 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my-s3-repo",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}

# Restore a specific index from a snapshot
POST /_snapshot/my-s3-repo/snapshot-2024-02-10/_restore
{
  "indices": "products",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored-$1"
}
```
Slow Log
```
# Enable slow logs for an index
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.query.info": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s"
}

# Slow log output includes:
# - The full query that was slow
# - Which shard it ran on
# - Total time breakdown (query phase, fetch phase)
# - Number of hits

# Use the slow log to:
# 1. Identify expensive queries in production
# 2. Find queries that need optimization (wildcards, deep aggs)
# 3. Detect shard hotspots (one shard consistently slow)
```
🎯 Shard Allocation Awareness
Use shard allocation awareness to spread replicas across failure domains (availability zones, racks). This ensures that losing one zone doesn't lose both primary and replica of the same shard. Configure with cluster.routing.allocation.awareness.attributes.
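A minimal sketch, assuming an attribute named zone (any attribute name works):

```
# elasticsearch.yml on each node: tag the node with its failure domain
node.attr.zone: us-east-1a

# Tell the allocator to spread shard copies across that attribute
# (can also be set dynamically, as here)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}
```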
Security
Elasticsearch clusters exposed without security have been the source of numerous data breaches. Since version 8.0, security is enabled by default β but understanding the layers is essential.
| Layer | Mechanism | Scope |
|---|---|---|
| Encryption in transit | TLS for HTTP (client↔node) and transport (node↔node) | Prevents eavesdropping and MITM attacks on cluster traffic |
| Authentication | Native realm, LDAP, Active Directory, SAML, OIDC, PKI (mTLS) | Verifies identity of users and services connecting to the cluster |
| Authorization (RBAC) | Roles with index, cluster, and field-level privileges | Controls what authenticated users can do |
| Field-level security | Roles can restrict which fields a user can see | e.g., support team can see order status but not payment details |
| Document-level security | Roles include a query filter β users only see matching docs | e.g., tenant_id filter ensures each tenant sees only their data |
| API Keys | Scoped, time-limited keys for service-to-service auth | Preferred over username/password for applications |
```
# Create a role with index-level and field-level security
# (the "except" list must be a subset of the granted fields, so grant a
#  wildcard and carve out the sensitive fields)
POST /_security/role/support_agent
{
  "indices": [
    {
      "names": ["orders-*"],
      "privileges": ["read"],
      "field_security": {
        "grant": ["*"],
        "except": ["payment_card", "billing_address"]
      },
      "query": { "term": { "region": "us-east" } }
    }
  ]
}

# Create an API key for a microservice
POST /_security/api_key
{
  "name": "order-service-key",
  "expiration": "30d",
  "role_descriptors": {
    "order_writer": {
      "indices": [
        {
          "names": ["orders-*"],
          "privileges": ["write", "create_index"]
        }
      ]
    }
  }
}
```
Elasticsearch vs Alternatives
Elasticsearch is powerful but not always the right tool. Understanding when alternatives are better prevents over-engineering and reduces operational burden.
| Comparison | ES Strength | Alternative Strength | Choose Alternative When |
|---|---|---|---|
| ES vs PostgreSQL FTS | Distributed, custom analyzers, relevance tuning, fuzzy matching, aggregations | No extra infra, ACID transactions, simpler ops, good enough for basic search | < 1M docs, simple search needs, already using Postgres, can't justify another system |
| ES vs Solr | Better REST API, easier clustering, faster innovation, richer ecosystem (ELK) | Mature, battle-tested, better for static collections, strong XML/faceting | Existing Solr investment, static document collections, Hadoop integration needed |
| ES vs Pinecone/Weaviate | Hybrid search (keyword + vector), full-text capabilities, existing ecosystem | Purpose-built for vectors, simpler API, managed scaling, better recall at scale | Pure semantic search, no keyword needs, want managed service, billions of vectors |
| ES vs ClickHouse | Full-text search, fuzzy matching, complex query DSL | 10-100x faster for analytical queries, columnar storage, SQL interface | Analytics/OLAP workload, aggregations over structured data, no full-text needs |
| ES vs Loki (logs) | Rich query language, full-text search across logs, aggregations | 10x cheaper storage, label-based indexing, native Grafana integration, simpler ops | Cost-sensitive log storage, label-based filtering is sufficient, Grafana stack |
When ES Is Overkill
Skip Elasticsearch When
- ✗ Simple LIKE queries on < 1M rows: PostgreSQL full-text search or a trigram index handles this fine
- ✗ Pure analytics/dashboards on structured data: ClickHouse or BigQuery are 10-100x faster and cheaper
- ✗ You only need log grep with labels: Loki + Grafana costs a fraction of ELK
- ✗ Pure vector/embedding search: Pinecone, Weaviate, or pgvector are simpler and purpose-built
- ✗ You need ACID transactions: ES is eventually consistent, not a primary database
- ✗ Team lacks ES operational expertise: the learning curve and ops burden are significant
Elasticsearch Is Ideal When
- ✓ Full-text search with relevance ranking, fuzzy matching, synonyms, and custom analyzers
- ✓ Hybrid search combining keyword (BM25) and vector (kNN) approaches
- ✓ Real-time log analytics with complex queries across terabytes of data
- ✓ Autocomplete/typeahead with edge n-grams at scale
- ✓ Geo-spatial search combined with text and filters
- ✓ Multi-tenant search with per-tenant relevance tuning
Common Mistakes
Using Elasticsearch as a primary database
Treating ES as the source of truth: writing data only to ES and relying on it for transactions and consistency. When a node fails or a reindex is needed, data is lost or inconsistent.
✓ ES is a secondary index, not a primary store. Write to your primary database (PostgreSQL, DynamoDB) first, then sync to ES (see the sketch below). The primary DB is the source of truth. ES can always be rebuilt from it.
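One common shape for that sync, sketched in Python with the psycopg2 and elasticsearch clients; the table, index, and connection details are hypothetical:

```
import psycopg2
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
pg = psycopg2.connect("dbname=shop user=app")

def save_product(product_id: str, title: str, price: float) -> None:
    # 1. Commit to the primary store first (the source of truth);
    #    the connection context manager commits the transaction on exit
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO products (id, title, price) VALUES (%s, %s, %s) "
            "ON CONFLICT (id) DO UPDATE SET title = EXCLUDED.title, "
            "price = EXCLUDED.price",
            (product_id, title, price),
        )
    # 2. Then sync to ES; a failure here can be retried or repaired by a
    #    full reindex from Postgres, since ES holds no unique data
    es.index(index="products", id=product_id,
             document={"title": title, "price": price})
```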
Not setting up ILM for time-series data
Indexing months of logs into a single growing index. The index becomes enormous, searches slow down, and you can't delete old data without reindexing everything.
✓ Use data streams with ILM policies from day one. Define rollover conditions (50 GB or 1 day), warm/cold/frozen phases, and a delete phase. This keeps hot indices small and fast while aging data to cheaper storage automatically.
JVM heap greater than 32 GB
Setting -Xmx to 48 GB or 64 GB thinking more heap = better performance. Above ~32 GB, the JVM loses compressed oops: pointers double in size, wasting 30%+ of heap on overhead. A 48 GB heap may perform worse than 31 GB.
✓ Never exceed a 31 GB heap (stay below the compressed oops threshold). If you need more capacity, add nodes rather than increasing heap. The remaining RAM serves the OS page cache, which Lucene depends on for fast segment reads.
No snapshot/backup policy
Running a production cluster without automated snapshots. A bad mapping change, accidental delete, or cluster failure means permanent data loss with no recovery path.
✓ Configure SLM (Snapshot Lifecycle Management) to take daily incremental snapshots to S3/GCS. Snapshots are incremental: only new segments are uploaded. Test restore procedures regularly. A snapshot that's never been tested is not a backup.
Ignoring the slow log
Not configuring slow log thresholds. Expensive queries silently degrade cluster performance for all users. By the time you notice, the cluster is already under pressure.
✓ Enable the slow log with reasonable thresholds (warn at 5s, info at 2s). Review it weekly. Common culprits: leading wildcards, deep aggregations, script queries, and unbounded from/size pagination. Use the Profile API to diagnose specific slow queries.