
Search Systems

TL;DR

Search engine: Full-text search with ranking, fuzzy matching, and faceting.
Elasticsearch: A widely used search engine built on Apache Lucene.
Inverted index: Maps words → documents for fast lookup.

Core Concepts

1. Inverted Index

Example documents:

Doc 1: "quick brown fox"
Doc 2: "lazy brown dog"

Inverted index:

Term     | Document IDs
---------|-------------
quick    | [1]
brown    | [1, 2]
fox      | [1]
lazy     | [2]
dog      | [2]

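Query "brown" → a single term-dictionary lookup returns [Doc 1, Doc 2] without scanning any document text; the cost depends on the number of matching documents, not on the total number of documents.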
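The table above can be built and queried in a few lines. This is a minimal sketch, assuming a naive lowercase-and-split tokenizer in place of a real analyzer; `tokenize`, `build_inverted_index`, and `search` are hypothetical names, and multi-term queries use AND semantics (intersecting posting lists).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    # Naive analyzer (assumption): lowercase, then split on non-alphanumerics
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict) -> dict:
    # term -> set of doc IDs that contain it
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    # Keep each posting list sorted by doc ID
    return {term: sorted(ids) for term, ids in index.items()}


def search(index: dict, query: str) -> list:
    # AND semantics: a document must contain every query term
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {1: "quick brown fox", 2: "lazy brown dog"}
index = build_inverted_index(docs)
print(index["brown"])              # [1, 2]
print(search(index, "brown dog"))  # [2]
```

Only the posting lists of the query terms are touched; document text is never re-read at query time.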

2. Elasticsearch Architecture

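Sharding: Split an index into shards spread across nodes (scales storage and indexing throughput; shards are searched in parallel)
Replication: Keep copies (replicas) of each shard on other nodes (high availability; replicas can also serve search requests)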
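Sharding can be sketched as two steps: route each document to one shard, then scatter a query to every shard and gather the partial results. Elasticsearch documents its default routing as roughly `hash(_routing) % number_of_primary_shards`, with `_routing` defaulting to the document `_id`; it uses Murmur3, for which this sketch substitutes `zlib.crc32`. The shard count, the term-frequency score, and all function names are assumptions for illustration, and replication is not modeled.

```python
import heapq
import zlib

NUM_SHARDS = 3  # hypothetical primary shard count for this sketch


def route(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Same key -> same shard every time; crc32 stands in for Murmur3 here
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_docs(docs: dict) -> list:
    # Each shard is modeled as a plain dict of doc_id -> text
    shards = [{} for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shards[route(doc_id)][doc_id] = text
    return shards


def score(text: str, term: str) -> int:
    # Toy relevance: term frequency (Elasticsearch defaults to BM25)
    return text.lower().split().count(term)


def search_shard(shard: dict, term: str, k: int) -> list:
    # Local top-k on one shard, as (score, doc_id) pairs
    hits = [(score(text, term), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, [h for h in hits if h[0] > 0])


def scatter_gather(shards: list, term: str, k: int = 2) -> list:
    # Scatter: every shard computes its own top-k (in parallel on a real cluster)
    partials = [search_shard(shard, term, k) for shard in shards]
    # Gather: merge the local lists into a global top-k
    return heapq.nlargest(k, (hit for part in partials for hit in part))


docs = {"a": "brown fox", "b": "brown brown dog", "c": "lazy cat", "d": "brown bear"}
shards = index_docs(docs)
top = scatter_gather(shards, "brown", k=2)
print(top)  # [(2, 'b'), (1, 'd')]
```

Each shard only needs to return its local top-k, because the global top-k is always contained in the union of the local ones. Ties are broken by `doc_id` here, which is an artifact of comparing tuples.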

3. Search Features

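Feature          | Purpose
-----------------|---------------------------------------------
Full-text search | Match words in documents
Fuzzy matching   | Tolerate typos ("qwick" matches "quick")
Boosting         | Rank title matches above body matches
Faceting         | Count and filter results by category, price range
Autocomplete     | Suggest terms as the user types
Highlighting     | Show matching snippets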
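Two of these features operate directly on the term dictionary and can be sketched without a search engine. The sketch below uses plain Levenshtein distance for fuzzy matching (Elasticsearch's fuzzy queries also count an adjacent transposition as one edit by default) and binary search over a sorted term list for autocomplete (production systems typically use dedicated structures such as edge n-grams or an FST). The term list, `max_edits`, and `limit` values are assumptions for illustration.

```python
import bisect


def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance: minimum single-character inserts, deletes, substitutions
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_terms(query_term: str, terms: list, max_edits: int = 1) -> list:
    # Fuzzy matching: every indexed term within max_edits of the query term
    return sorted(t for t in terms if edit_distance(query_term, t) <= max_edits)


def autocomplete(prefix: str, sorted_terms: list, limit: int = 5) -> list:
    # Binary search to the first term >= prefix, then collect terms sharing the prefix
    out = []
    i = bisect.bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and len(out) < limit and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out


terms = sorted(["quick", "brown", "fox", "lazy", "dog", "quiet", "quilt"])
print(edit_distance("qwick", "quick"))  # 1
print(fuzzy_terms("qwick", terms))      # ['quick']
print(autocomplete("qui", terms))       # ['quick', 'quiet', 'quilt']
```

The linear scan in `fuzzy_terms` is for clarity only; real engines avoid comparing against every term in the dictionary.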

4. Use Cases

  • E-commerce: Product search (Amazon)
  • Logs: Search application logs (ELK stack)
  • Analytics: Real-time dashboards (Kibana)

Common Interview Questions

Q1: "How does Elasticsearch search so fast?"

Answer:

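  • Inverted index: Maps terms to documents, so a query reads only the posting lists for its terms instead of scanning every document
  • Sharding: Shards are searched in parallel across nodes, and their results are merged
  • Caching: Results of frequently used filters and requests are cached, and hot index files stay in the OS file-system cache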
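The caching point can be illustrated with a generic least-recently-used (LRU) cache in front of a search function. This is a hypothetical sketch of the idea only; Elasticsearch's own caches (for example the node query cache and the shard request cache) are more involved. `LRUQueryCache`, `slow_search`, and the capacity of 2 are assumptions chosen to make eviction visible.

```python
from collections import OrderedDict


class LRUQueryCache:
    """Hypothetical LRU cache in front of a search function, keyed by query string."""

    def __init__(self, search_fn, capacity: int = 2):
        self.search_fn = search_fn
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> cached results, least recently used first
        self.hits = 0
        self.misses = 0

    def search(self, query: str):
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        self.misses += 1
        result = self.search_fn(query)  # the expensive path: actually run the search
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used query
        return result


calls = []


def slow_search(query):
    # Stand-in for a real index lookup; records each call so cache hits are visible
    calls.append(query)
    return [len(query)]


cache = LRUQueryCache(slow_search, capacity=2)
cache.search("brown")
cache.search("brown")  # served from the cache
cache.search("fox")
cache.search("dog")    # evicts "brown", the least recently used entry
print(calls)                     # ['brown', 'fox', 'dog']
print(cache.hits, cache.misses)  # 1 3
```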

Q2: "When should you use Elasticsearch instead of a SQL database?"

Answer:

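  • SQL: Built for exact matches; LIKE '%term%' has a leading wildcard, so it cannot use a B-tree index and scans every row, with no relevance ranking or typo tolerance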
  • Elasticsearch: Full-text, fuzzy, ranking, facets (purpose-built)
  • Use SQL for: Structured queries, transactions
  • Use Elasticsearch for: Search, text analysis
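The LIKE limitation is easy to observe with SQLite through Python's built-in `sqlite3` module, used here as a stand-in for "SQL" in general; other databases' planners behave differently, and the exact plan text varies by SQLite version. The table, index, and rows are assumptions mirroring the earlier two-document example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE INDEX idx_body ON docs (body)")  # a B-tree index on the text column
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("quick brown fox",), ("lazy brown dog",)],
)

query = "SELECT id FROM docs WHERE body LIKE '%brown%'"
matches = sorted(row[0] for row in conn.execute(query))

# EXPLAIN QUERY PLAN rows have 4 columns; the last one ('detail') names the access path
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

# LIKE is pure pattern matching: a typo finds nothing, and there is no relevance score
typo_matches = conn.execute("SELECT id FROM docs WHERE body LIKE '%qwick%'").fetchall()

print(matches)       # [1, 2]
print(plan)          # a 'SCAN ...' step (exact text varies by SQLite version)
print(typo_matches)  # []
```

The plan reports a scan rather than an index search: even with `idx_body` present, every row is examined, which is the cost an inverted index avoids.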

Quick Reference

Inverted index: Term → Document mapping
Sharding: Horizontal scaling
Use cases: Product search, log analysis


Next: Blob Storage - S3, object storage, file systems.