Query DSL & Relevance
Full-text queries, term-level queries, bool composition, BM25 scoring, and custom relevance with function_score.
Query vs Filter Context
This is the single most important concept in Elasticsearch querying. Every clause in a query runs in one of two contexts, and choosing the wrong one is the most common performance mistake.
| Aspect | Query Context | Filter Context |
|---|---|---|
| Purpose | How WELL does this document match? | Does this document match? (yes/no) |
| Scoring | Calculates _score (relevance ranking) | No scoring; binary match |
| Caching | Not cached (score depends on full index) | Cached as a bitset; extremely fast on repeat |
| Performance | Slower; must compute TF/IDF/BM25 per doc | Faster; cached boolean check |
| Use for | Full-text search, relevance ranking | Exact filters: status, date range, category |
| Bool clause | must, should | filter, must_not |
```
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "wireless headphones" } }
      ],
      "filter": [
        { "term": { "status": "active" } },
        { "range": { "price": { "gte": 50, "lte": 200 } } },
        { "term": { "category": "electronics" } }
      ]
    }
  }
}

# What happens:
# "must" → QUERY context
#   - "wireless headphones" is analyzed and scored
#   - Documents with both terms score higher
#   - Documents with terms in the title field score higher
#   - This determines the _score (ranking)
#
# "filter" → FILTER context
#   - status = "active" → yes/no, no scoring
#   - price between 50-200 → yes/no, no scoring
#   - category = "electronics" → yes/no, no scoring
#   - These are cached as bitsets for fast repeat queries
#   - They narrow results WITHOUT affecting relevance ranking
```
The Restaurant Analogy
Filter context is like telling the waiter 'only show me vegetarian dishes under $20': it narrows the menu but doesn't rank anything. Query context is like asking 'which of these remaining dishes best matches my taste for spicy Thai food?': it ranks the filtered results by relevance. Always filter first (cheap), then score what's left (expensive).
💡 Rule of Thumb: Filter Everything You Can
If a clause doesn't need to affect ranking, put it in filter context. Status checks, date ranges, category filters, boolean flags: all of these should be filters. Only use query context for clauses where relevance ranking matters (typically full-text search on user input). This dramatically improves performance because filters are cached and skip scoring.
Full-Text Queries
Full-text queries analyze the input text using the same analyzer configured on the target field. They are designed for searching natural-language content: product descriptions, article bodies, user reviews. The three workhorses are match, match_phrase, and multi_match.
```
# Basic match → analyzes input, finds documents containing any term
GET /products/_search
{
  "query": {
    "match": {
      "description": {
        "query": "lightweight running shoes",
        "operator": "and"
      }
    }
  }
}

# operator: "or" (default) → match ANY term (lightweight OR running OR shoes)
# operator: "and" → match ALL terms (lightweight AND running AND shoes)
# minimum_should_match: "75%" → at least 75% of terms must match

# What happens internally:
# 1. "lightweight running shoes" → analyzer → ["lightweight", "run", "shoe"]
# 2. Search inverted index for each token
# 3. Score documents based on BM25 (TF, IDF, field length)
# 4. Return ranked results
```
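The OR/AND behavior above can be sketched with a toy inverted index. A minimal Python illustration, assuming a trivial lowercase/whitespace analyzer and made-up doc IDs — not the real Lucene machinery:

```python
def analyze(text):
    # Stand-in for an analyzer: lowercase + whitespace split
    return text.lower().split()

# Toy inverted index: term -> set of doc ids containing it
index = {
    "lightweight": {1, 2},
    "running": {1, 3},
    "shoes": {1, 2, 3},
}

def match(query, operator="or"):
    terms = analyze(query)
    postings = [index.get(t, set()) for t in terms]
    if operator == "and":
        return set.intersection(*postings)  # ALL terms required
    return set.union(*postings)             # ANY term suffices

print(match("lightweight running shoes", "or"))   # {1, 2, 3}
print(match("lightweight running shoes", "and"))  # {1}
```

Scoring (which of the matching docs ranks first) is a separate step; this only shows how the candidate set shrinks when the operator changes.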
```
# match_phrase → terms must appear in exact order (with optional slop)
GET /articles/_search
{
  "query": {
    "match_phrase": {
      "body": {
        "query": "distributed systems design",
        "slop": 2
      }
    }
  }
}

# slop: 0 (default) → exact phrase, no words between
# slop: 1 → one word can be between/moved
# slop: 2 → two positional moves allowed

# Examples with slop:
# "distributed systems design"           → matches (exact)
# "distributed computing systems design" → matches with slop >= 1
# "systems distributed design"           → matches with slop >= 2 (two swaps)

# How it works:
# Uses positional data stored in the inverted index
# Checks that token positions are within 'slop' distance of each other
# More expensive than match → requires position lookups
```
```
# multi_match → search the same query across multiple fields
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "noise cancelling headphones",
      "fields": ["title^3", "description", "tags^2"],
      "type": "best_fields",
      "tie_breaker": 0.3
    }
  }
}

# Types:
# best_fields (default) → score from the BEST matching field
#   Use when: fields compete (title vs description for same content)
#   tie_breaker: 0-1, how much other fields contribute (0.3 = 30%)
#
# most_fields → SUM scores from all matching fields
#   Use when: same text indexed with different analyzers
#   (e.g., english analyzer + standard analyzer on same content)
#
# cross_fields → treats all fields as ONE big field
#   Use when: a concept spans fields (first_name + last_name)
#   "John Smith" → "John" in first_name AND "Smith" in last_name
#
# Field boosting: "title^3" means title matches are 3x more important
# This is the most common way to tune relevance without function_score
```
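The difference between best_fields and most_fields combination can be sketched numerically. A minimal Python illustration with hypothetical per-field scores (real scores come from BM25 on each field):

```python
# best_fields: take the top field's score, then add tie_breaker
# times each remaining field's score (dis_max behavior).
# most_fields: simply sum all field scores.

def best_fields(scores, tie_breaker=0.0):
    top = max(scores)
    return top + tie_breaker * (sum(scores) - top)

def most_fields(scores):
    return sum(scores)

scores = [3.0, 1.0, 0.5]  # hypothetical scores for title/description/tags
print(best_fields(scores, tie_breaker=0.3))  # 3.0 + 0.3 * 1.5 = 3.45
print(most_fields(scores))                   # 4.5
```

With tie_breaker=0, best_fields ignores everything but the single strongest field; raising it toward 1 makes the behavior converge on most_fields.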
📌 Choosing the Right multi_match Type
Use best_fields for most e-commerce/content search (title vs description). Use cross_fields for person names, addresses, or any concept split across fields. Use most_fields only when the same content is analyzed multiple ways (rare). Getting this wrong produces confusing relevance results.
Term-Level Queries
Term-level queries find documents based on exact values in structured data. They do NOT analyze the input: what you provide is exactly what's looked up in the inverted index. Use them on keyword fields, numbers, dates, and booleans.
| Query | What It Does | Use Case |
|---|---|---|
| term | Exact match on a single value | status: 'active', category: 'electronics' |
| terms | Match any of multiple values (OR) | status IN ('active', 'pending') |
| range | Numeric/date range (gte, lte, gt, lt) | price 50-200, date last 7 days |
| exists | Field has a non-null value | Find docs where 'email' field exists |
| prefix | Starts with a value | Autocomplete on keyword fields |
| wildcard | Pattern with * and ? wildcards | SKU matching: 'PROD-*-2024' |
| fuzzy | Edit distance matching (typo tolerance) | Search 'headphnes' matches 'headphones' |
```
# term → exact match (use on keyword fields ONLY)
{ "term": { "status": "active" } }

# terms → match any value in a list
{ "terms": { "category": ["electronics", "accessories", "audio"] } }

# range → numeric or date range
{ "range": { "price": { "gte": 50, "lte": 200 } } }
{ "range": { "created_at": {
    "gte": "now-7d/d",
    "lte": "now/d",
    "format": "strict_date_optional_time"
} } }

# exists → field has a value
{ "exists": { "field": "discount_price" } }

# prefix → starts with (on keyword fields)
{ "prefix": { "sku": "PROD-2024" } }

# wildcard → pattern matching (* = any chars, ? = single char)
{ "wildcard": { "sku": "PROD-*-BLUE" } }

# fuzzy → edit distance (typo tolerance)
{ "fuzzy": { "name": { "value": "headphnes", "fuzziness": "AUTO" } } }
# fuzziness AUTO: 0-2 chars → exact, 3-5 chars → 1 edit, 6+ chars → 2 edits
```
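The AUTO fuzziness tiers and the typo example can be sketched with plain Levenshtein edit distance. A rough Python illustration — real Lucene fuzzy matching compiles a finite-state automaton over the term dictionary rather than computing a DP table per candidate:

```python
def edit_distance(a, b):
    # Classic Levenshtein dynamic programming, one rolling row
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,              # delete from a
                dp[j - 1] + 1,          # insert into a
                prev + (ca != cb),      # substitute (or keep)
            )
    return dp[-1]

def auto_fuzziness(term):
    # AUTO: length 0-2 -> 0 edits, 3-5 -> 1 edit, 6+ -> 2 edits
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2

term = "headphnes"                        # typo, length 9 -> 2 edits allowed
print(edit_distance(term, "headphones"))  # 1 (one missing 'o')
print(auto_fuzziness(term))               # 2 -> "headphones" matches
```

Since the typo is within the allowed edit budget, the fuzzy query above would find "headphones" for the misspelled input.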
⚠️ Never Use term on text Fields
The term query does NOT analyze its input. If you search { "term": { "title": "Running Shoes" } } on a text field, it looks for the exact token "Running Shoes" in the index. But the text field was analyzed to ["running", "shoes"] → no match. Use match for text fields, term for keyword fields. This is the #1 source of "why does my query return no results?" bugs.
Bool Query
The bool query is the composition engine of Elasticsearch. It combines multiple clauses using four operators, each with different scoring and caching behavior. Almost every production search query is a bool query.
| Clause | Context | Behavior | Affects Score? |
|---|---|---|---|
| must | Query | Document MUST match. Contributes to score. | Yes |
| filter | Filter | Document MUST match. No scoring, cached. | No |
| should | Query | Document SHOULD match. Boosts score if it does. | Yes |
| must_not | Filter | Document MUST NOT match. No scoring, cached. | No |
```
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "wireless noise cancelling headphones",
            "fields": ["title^3", "description", "brand^2"],
            "type": "best_fields"
          }
        }
      ],
      "filter": [
        { "term": { "status": "active" } },
        { "term": { "in_stock": true } },
        { "range": { "price": { "gte": 50, "lte": 300 } } },
        { "terms": { "category": ["headphones", "earbuds"] } }
      ],
      "should": [
        { "term": { "is_prime_eligible": true } },
        { "range": { "rating": { "gte": 4.0 } } },
        { "term": { "brand": "Sony" } }
      ],
      "must_not": [
        { "term": { "condition": "refurbished" } },
        { "range": { "price": { "lt": 10 } } }
      ],
      "minimum_should_match": 0
    }
  }
}

# How this works:
# 1. FILTER (fast, cached, no scoring):
#    - Only active, in-stock products
#    - Price between $50-$300
#    - Category is headphones or earbuds
#    - NOT refurbished, NOT suspiciously cheap
#
# 2. MUST (scored):
#    - Full-text search across title (3x boost), description, brand (2x)
#    - This determines the base relevance score
#
# 3. SHOULD (score boosters):
#    - Prime eligible? Bonus points
#    - Rating >= 4.0? Bonus points
#    - Brand is Sony? Bonus points
#    - minimum_should_match: 0 means none required, just bonus
#
# Result: filtered set, ranked by text relevance + business signals
```
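The four-clause semantics can be modeled as plain functions: filters and must_not gate membership without contributing score, must supplies the base score, and should adds optional bonuses. A toy Python sketch with hypothetical predicate/score callables, not ES internals:

```python
def bool_score(doc, must=(), filter_=(), should=(), must_not=()):
    if not all(f(doc) for f in filter_):       # binary, cacheable checks
        return None                            # filtered out, never scored
    if any(f(doc) for f in must_not):
        return None
    must_scores = [q(doc) for q in must]
    if any(s is None for s in must_scores):    # every must clause required
        return None
    bonus = sum(q(doc) or 0 for q in should)   # optional score boosters
    return sum(must_scores) + bonus

doc = {"status": "active", "rating": 4.5, "text_score": 2.3}
score = bool_score(
    doc,
    must=[lambda d: d["text_score"]],                   # scored text match
    filter_=[lambda d: d["status"] == "active"],        # yes/no, no score
    should=[lambda d: 1.0 if d["rating"] >= 4.0 else 0],
    must_not=[lambda d: d.get("refurbished", False)],
)
print(score)  # 2.3 (must) + 1.0 (should bonus)
```

Note the asymmetry the table above describes: changing a filter can only add or remove documents, never reorder them, while changing must or should reshuffles the ranking.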
💡 minimum_should_match Behavior
When must or filter clauses exist, should defaults to minimum_should_match: 0, so should clauses are optional score boosters. When there are ONLY should clauses (no must/filter), at least one must match by default. Override with minimum_should_match to require a specific number or percentage.
BM25 Scoring
BM25 (Best Matching 25) is Elasticsearch's default relevance scoring algorithm. It replaced TF-IDF in Elasticsearch 5.0 and produces better results thanks to term frequency saturation: a term appearing 100 times doesn't score 100x more than one appearing once.
```
BM25 score for a term in a document:

                                        TF(term, doc) * (k1 + 1)
score(term, doc) = IDF(term) * ─────────────────────────────────────────────
                               TF(term, doc) + k1 * (1 - b + b * |doc| / avgDL)

Three components:

1. IDF → Inverse Document Frequency
   "How rare is this term across ALL documents?"
   - Common words (the, is, a) → low IDF → low score contribution
   - Rare words (elasticsearch, kubernetes) → high IDF → high score
   - Formula: log(1 + (N - n + 0.5) / (n + 0.5))
     where N = total docs, n = docs containing term

2. TF → Term Frequency (with saturation)
   "How often does this term appear in THIS document?"
   - TF-IDF: score grows linearly (10 occurrences = 10x score) → BAD
   - BM25: score saturates (diminishing returns after ~5-10 occurrences)
   - k1 controls saturation speed (default 1.2)
     k1 = 0: TF is ignored entirely
     k1 = large: TF matters more (slower saturation)

3. Field Length Normalization
   "How long is this document's field?"
   - Short fields with the term score higher than long fields
   - "wireless" in a 3-word title scores higher than in a 500-word description
   - b controls normalization strength (default 0.75)
     b = 0: no length normalization
     b = 1: full normalization (short fields heavily favored)

Example:
Query: "elasticsearch"
Doc A: title = "Elasticsearch Guide" (2 words, term appears 1x)
Doc B: title = "The Complete Guide to Elasticsearch and Search" (7 words, 1x)
Doc C: description = "...elasticsearch...elasticsearch..." (500 words, 5x)

Ranking: A > B > C
- A: short field + term present = highest score
- B: longer field, same TF = lower score
- C: high TF but saturates + very long field = lowest score
```
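The A > B > C ranking falls straight out of the formula. A Python sketch plugging the three example docs in — N, n, and avg_len are made-up illustration values; Lucene derives them per field from real index statistics:

```python
import math

k1, b = 1.2, 0.75  # Elasticsearch defaults

def idf(N, n):
    # How rare is the term? N = total docs, n = docs containing it
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25(tf, doc_len, avg_len, N, n):
    # Length normalization shrinks the score for long fields;
    # the tf/(tf + norm) shape saturates as tf grows
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf(N, n) * tf * (k1 + 1) / (tf + norm)

# "elasticsearch" appears in n=3 of N=1000 docs; assumed avg field length 10
doc_a = bm25(tf=1, doc_len=2,   avg_len=10, N=1000, n=3)  # short title
doc_b = bm25(tf=1, doc_len=7,   avg_len=10, N=1000, n=3)  # longer title
doc_c = bm25(tf=5, doc_len=500, avg_len=10, N=1000, n=3)  # long description
print(doc_a > doc_b > doc_c)  # True: matches the A > B > C ranking above
```

Doc C loses despite five occurrences: TF saturates quickly while the 500-word field length drives the normalization term up.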
BM25 vs TF-IDF: The Keyword Stuffing Problem
TF-IDF is like paying someone per mention of a keyword: say 'pizza' 100 times and you rank #1. BM25 is like a smart reviewer who notices the first few mentions but stops caring after that. Saying 'pizza' 5 times is useful, but saying it 100 times doesn't make your restaurant 20x better. BM25's saturation curve solved the keyword stuffing problem that plagued early search engines.
| Parameter | Default | Effect | When to Tune |
|---|---|---|---|
| k1 | 1.2 | Controls TF saturation speed. Higher = TF matters more. | Increase for long documents where repetition signals relevance |
| b | 0.75 | Controls field length normalization. Higher = short fields favored more. | Decrease for fields with naturally variable length (product descriptions) |
📌 When to Tune BM25 Parameters
Almost never. The defaults (k1=1.2, b=0.75) work well for most use cases. Only tune when you have measurable relevance problems AND you've already tried field boosting and function_score. Changing BM25 params affects ALL queries on that field; it's a global change with hard-to-predict side effects.
Boosting & function_score
BM25 handles text relevance, but real search needs business logic: popular products should rank higher, recent articles should beat old ones, promoted items need a boost. function_score lets you inject these signals into the relevance score without replacing BM25: you augment it.
```
GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "wireless headphones",
              "fields": ["title^3", "description"]
            }
          },
          "filter": [
            { "term": { "status": "active" } }
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "sales_count",
            "factor": 1.2,
            "modifier": "log1p",
            "missing": 1
          },
          "weight": 2
        },
        {
          "gauss": {
            "created_at": {
              "origin": "now",
              "scale": "30d",
              "offset": "7d",
              "decay": 0.5
            }
          },
          "weight": 1.5
        },
        {
          "filter": { "term": { "is_sponsored": true } },
          "weight": 5
        },
        {
          "script_score": {
            "script": {
              "source": "Math.log(2 + doc['review_count'].value) * doc['avg_rating'].value"
            }
          },
          "weight": 1
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply",
      "max_boost": 50
    }
  }
}

# Functions explained:
#
# 1. field_value_factor → popularity signal
#    Uses the sales_count field. The log1p modifier prevents huge values
#    from dominating: a product with 10000 sales doesn't get a 10000x
#    boost (log1p(10000) ≈ 9.2)
#
# 2. gauss decay → recency signal
#    Documents from the last 7 days get full score (offset).
#    Score decays to 50% at 30 days past the offset (scale + decay).
#    Old documents still match but rank lower.
#
# 3. filter + weight → sponsored boost
#    Sponsored products get a flat 5x weight boost.
#    Only applies to docs matching the filter.
#
# 4. script_score → custom formula
#    Combines review_count and avg_rating into a quality signal.
#    Full flexibility but slower (runs per document).
#
# score_mode: how function scores combine (sum, multiply, avg, max, min)
# boost_mode: how the combined function score merges with the query score
#   multiply (default): final = query_score * function_score
#   sum:                final = query_score + function_score
#   replace:            final = function_score (ignores BM25 entirely)
```
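Two of these signals are easy to verify numerically. A Python sketch of the log1p field_value_factor and the gauss decay, following the documented semantics (full score within the offset, decay value reached at offset + scale); exact Lucene arithmetic may differ slightly:

```python
import math

def field_value_factor(value, factor=1.2):
    # modifier: log1p -> log(1 + factor * value)
    return math.log1p(factor * value)

def gauss_decay(age_days, origin=0.0, scale=30.0, offset=7.0, decay=0.5):
    # Full score within `offset` of origin; Gaussian falloff sized so
    # the multiplier equals `decay` when distance reaches offset + scale
    distance = max(0.0, abs(age_days - origin) - offset)
    sigma_sq = -(scale ** 2) / (2 * math.log(decay))
    return math.exp(-(distance ** 2) / (2 * sigma_sq))

print(gauss_decay(7))             # 1.0 -> inside the 7-day grace period
print(round(gauss_decay(37), 3))  # 0.5 -> decay value at offset + scale
print(round(field_value_factor(10000, factor=1.0), 1))  # ~9.2, not 10000x
```

With boost_mode: "multiply", these values scale the BM25 score, so a month-old document keeps roughly half its text-relevance score rather than vanishing.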
| Decay Function | Shape | Best For |
|---|---|---|
| gauss | Bell curve: smooth falloff in both directions | Recency, geo distance, price proximity |
| exp | Exponential: sharp initial drop, long tail | Strong recency bias (news, social feeds) |
| linear | Straight line: constant decay rate | Simple proportional decay |
💡 Start Simple, Add Complexity
Don't jump to function_score immediately. Start with field boosting in multi_match (title^3). If that's not enough, add should clauses for business signals. Only reach for function_score when you need continuous numeric signals (popularity, recency decay, custom formulas). Each layer adds complexity and makes debugging harder.
Search Features
Beyond querying and scoring, Elasticsearch provides essential search features for building production search experiences: highlighting matched terms, paginating results, sorting, and controlling which fields are returned.
```
GET /articles/_search
{
  "query": {
    "match": { "body": "distributed consensus algorithms" }
  },
  "_source": ["title", "author", "published_at", "url"],
  "highlight": {
    "fields": {
      "body": {
        "fragment_size": 150,
        "number_of_fragments": 3,
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"]
      }
    }
  },
  "sort": [
    { "_score": "desc" },
    { "published_at": "desc" }
  ],
  "from": 0,
  "size": 20
}

# _source: only return these fields (reduces network payload)
# highlight: wraps matched terms in <mark> tags for UI display
# sort: primary by score, secondary by date (tiebreaker)
# from/size: offset-based pagination (page 1 = from:0, page 2 = from:20)

# ─────────────────────────────────────────────────────────
# search_after → cursor-based pagination (for deep pages)
# ─────────────────────────────────────────────────────────
GET /articles/_search
{
  "query": { "match": { "body": "distributed systems" } },
  "sort": [
    { "_score": "desc" },
    { "_id": "asc" }
  ],
  "size": 20,
  "search_after": [0.85, "doc_id_12345"]
}

# search_after uses the sort values from the last result of the previous page
# No deep pagination cost → always O(size) regardless of page number
# Requires a unique tiebreaker field (_id) in sort

# ─────────────────────────────────────────────────────────
# explain API → debug why a document scored the way it did
# ─────────────────────────────────────────────────────────
GET /products/_explain/product_123
{
  "query": { "match": { "title": "wireless headphones" } }
}

# Returns detailed breakdown: IDF, TF, field length norm, boosts
# Essential for debugging relevance issues
```
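The search_after mechanism can be sketched as filtering a sorted result set against a cursor. A toy Python model where hits are (score, _id) pairs sorted by (_score desc, _id asc), mirroring the sort in the request above; the data is made up:

```python
# Result set already in (_score desc, _id asc) order, like ES returns it
hits = sorted(
    [(0.9, "a"), (0.9, "b"), (0.85, "c"), (0.7, "d"), (0.7, "e")],
    key=lambda h: (-h[0], h[1]),
)

def search(size, search_after=None):
    page = hits
    if search_after is not None:
        last_score, last_id = search_after
        # Keep only hits that sort strictly after the cursor; the unique
        # _id tiebreaker makes this unambiguous even for equal scores
        page = [h for h in hits if (-h[0], h[1]) > (-last_score, last_id)]
    return page[:size]

page1 = search(2)                          # first page
page2 = search(2, search_after=page1[-1])  # resume from last sort values
print(page1)  # [(0.9, 'a'), (0.9, 'b')]
print(page2)  # [(0.85, 'c'), (0.7, 'd')]
```

Each page touches only `size` results past the cursor, which is why search_after stays cheap at any depth, whereas from+size would rebuild and discard everything before the offset.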
Search Feature Best Practices
- ✅ Always use _source filtering; don't return entire documents when you only need 3 fields
- ✅ Use search_after for any pagination beyond page ~10 (from+size has a 10,000-hit limit by default)
- ✅ Add a unique tiebreaker to sort (usually _id) to ensure stable pagination
- ✅ Use the explain API to debug unexpected ranking; it shows exactly how the score was calculated
- ✅ Set track_total_hits to false for better performance when you don't need an exact total count
- ✅ Use highlighting to show users WHY a result matched; it dramatically improves search UX
- ✅ Prefer the fast vector highlighter (fvh) for large text fields, the unified highlighter for accuracy
📌 The 10,000 Hit Limit
By default, from + size cannot exceed 10,000 (index.max_result_window). This exists because deep pagination is expensive: ES must fetch and sort from+size documents on every shard, then merge. For deep pagination, use search_after (stateless cursor) or the Scroll API (stateful, for bulk export).
Interview Questions
Q: What is the difference between query context and filter context?
A: Query context calculates a relevance _score: it answers 'how well does this document match?' and is used for full-text search ranking. Filter context is a binary yes/no check: it answers 'does this document match?' without scoring. Filters are cached as bitsets and are significantly faster. In a bool query, 'must' and 'should' run in query context, while 'filter' and 'must_not' run in filter context. Best practice: use filter context for everything that doesn't need ranking (status, date ranges, categories).
Q: How does BM25 work and why is it better than TF-IDF?
A: BM25 scores documents using three signals: IDF (how rare the term is across all documents), TF with saturation (how often the term appears in this document, with diminishing returns), and field length normalization (shorter fields score higher). The key improvement over TF-IDF is term frequency saturation: in TF-IDF, a term appearing 100 times scores 100x more than one appearing once, which rewards keyword stuffing. BM25's saturation curve means that after ~5-10 occurrences, additional repetitions barely increase the score. Parameters k1 (saturation speed, default 1.2) and b (length normalization strength, default 0.75) control the behavior.
Q: What is the deep pagination problem and how do you solve it?
A: With from+size pagination, ES must fetch and rank (from+size) documents on EVERY shard, send them to the coordinating node, merge-sort all results, then discard everything before 'from'. For page 1000 with size=20, each shard processes 20,000 documents. With 5 shards, that's 100,000 documents sorted and merged just to return 20. Solutions: (1) search_after, cursor-based pagination using the sort values from the last result, always O(size) cost regardless of depth; (2) the Scroll API, a stateful server-side cursor for bulk data export (not for user-facing search); (3) point-in-time (PIT) + search_after for consistent pagination across refreshes.
Q: How would you boost recent documents in search results?
A: Use function_score with a gauss (or exp) decay function on the date field. Configure 'origin' as 'now', 'scale' as the time window where decay reaches the decay value (e.g., '30d'), 'offset' for a grace period of full score (e.g., '7d' means docs from the last week get no penalty), and 'decay' as the score multiplier at the scale distance (e.g., 0.5 means 50% score at 30 days past the offset). Set boost_mode to 'multiply' so the decay multiplies the BM25 text relevance score. This way, a highly relevant old article can still beat a barely relevant new one: recency augments relevance rather than replacing it.
Q: When would you use match_phrase vs match with operator 'and'?
A: match with operator 'and' requires all terms to be present but in ANY order and ANY position. match_phrase requires terms in the exact order (with optional slop for flexibility). Use match_phrase when word order matters: searching for 'New York' should not match a document about 'York is a new city'. Use match+and when you want all terms present regardless of order: searching 'elasticsearch distributed scaling' should match even if those words are scattered across a paragraph. match_phrase is more expensive because it checks positional data in the inverted index.
Common Mistakes
Using query context for filters (wasting scoring)
Putting status checks, date ranges, and category filters in 'must' instead of 'filter'. Every clause in 'must' calculates a relevance score: expensive computation that adds nothing when you just need a yes/no check. These clauses also can't be cached.
✅ Move all exact-match, range, and boolean checks to the 'filter' clause. Only use 'must' for full-text queries where relevance ranking matters. This enables bitset caching and skips score computation, often a 2-5x performance improvement.
Deep pagination with from+size
Using from=10000&size=20 for deep pages. ES must fetch and sort 10,020 documents on every shard, merge them on the coordinating node, then discard 10,000. With 5 shards, that's 50,100 documents processed to return 20. Default limit is 10,000 (max_result_window).
✅ Use search_after with a sort tiebreaker (_id) for user-facing pagination. Each page costs O(size) regardless of depth. For bulk data export, use the Point-in-Time (PIT) API with search_after. Never increase max_result_window: it's a guardrail, not a bug.
Not using function_score for business relevance
Relying solely on BM25 text relevance for ranking. A product with 50,000 sales and 4.8 stars ranks below an obscure product because the obscure one has slightly better keyword density. Users see irrelevant results despite good text matching.
✅ Layer business signals with function_score: field_value_factor for popularity (sales, views), decay functions for recency, script_score for custom quality formulas. Use boost_mode: 'multiply' so business signals augment text relevance rather than replacing it.
Leading wildcard queries on large indices
Using wildcard queries like '*headphones' or regex with leading wildcards. ES must scan every term in the inverted index because it can't use the sorted term dictionary for prefix lookups. On indices with millions of unique terms, this can take seconds or OOM.
✅ Restructure the data: use n-gram or edge_ngram analyzers for substring matching, a reverse token filter for suffix matching, or store the reversed string in a separate field. If wildcards are unavoidable, use them only with a non-wildcard prefix ('head*' is fine, '*phones' is not).
Ignoring the explain API when debugging relevance
Guessing why documents rank in a certain order. Tweaking boosts and function_score parameters blindly without understanding the actual score breakdown. Spending hours on trial-and-error when the answer is one API call away.
✅ Use the _explain API on specific documents to see the exact score calculation: IDF values, TF saturation, field length norms, boost multipliers. Also use 'explain: true' in search requests to see score breakdowns for all results. Debug with data, not intuition.