Search Systems
TL;DR
Search engine: Full-text search with relevance ranking, fuzzy matching, and faceting.
Elasticsearch: A widely used distributed search engine built on Apache Lucene.
Inverted index: Maps each term to the documents that contain it, so a lookup does not scan every document.
Core Concepts
1. Inverted Index
Example documents:
Doc 1: "quick brown fox"
Doc 2: "lazy brown dog"
Inverted index:
Term | Document IDs
---------|-------------
quick | [1]
brown | [1, 2]
fox | [1]
lazy | [2]
dog | [2]
Query "brown" → Returns [Doc 1, Doc 2] from a single term lookup, with no document scan (O(1) on average for a hash-based term dictionary)
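A minimal sketch of the table above, assuming a naive lowercase whitespace tokenizer (real engines run a configurable analyzer). The names `build_inverted_index` and `search` are illustrative, not part of any library.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: split on whitespace only; no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Return the posting list for one term, or an empty list if it is unknown."""
    return index.get(term.lower(), [])


docs = {1: "quick brown fox", 2: "lazy brown dog"}
index = build_inverted_index(docs)

print(search(index, "brown"))  # [1, 2]
print(search(index, "fox"))    # [1]
```

The lookup cost depends on the dictionary, not on the number of documents; the Python `dict` here gives average-case constant time.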
2. Elasticsearch Architecture
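Sharding: Split an index into shards spread across nodes (scales storage, indexing, and search throughput)
Replication: Keep copies of each shard on other nodes (high availability, extra read capacity)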
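The sketch below illustrates the idea of routing a document to a shard by hashing its ID modulo the number of primary shards, then placing replica copies on different nodes. It is a simplification: the node names, shard counts, and the use of `zlib.crc32` as the hash are all assumptions for illustration, and the real placement logic in Elasticsearch is more involved.

```python
import zlib

# Hypothetical cluster settings, chosen only for this example.
NUM_PRIMARY_SHARDS = 3
NUM_REPLICAS = 1
NODES = ["node-a", "node-b", "node-c"]


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard with a stable hash of the routing key (here, the document ID)."""
    # crc32 is a stand-in hash; it is stable across runs, unlike Python's hash().
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def placement(shard: int) -> list:
    """Return [primary_node, replica_node, ...] so that no two copies share a node."""
    return [NODES[(shard + i) % len(NODES)] for i in range(NUM_REPLICAS + 1)]


for doc_id in ["product-1", "product-2", "product-3"]:
    shard = shard_for(doc_id)
    primary, *replicas = placement(shard)
    print(f"{doc_id}: shard {shard}, primary on {primary}, replicas on {replicas}")
```

Because the shard is derived from the hash modulo the shard count, changing the number of primary shards would move most documents, which is one reason the shard count is normally fixed when an index is created.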
3. Search Features
| Feature | Purpose |
|---|---|
| Full-text search | Match words in documents |
| Fuzzy matching | Tolerate typos (query "qwick" matches "quick") |
| Boosting | Prioritize title over body |
| Faceting | Group and count results by category or price range, then filter on them |
| Autocomplete | Suggest as you type |
| Highlighting | Show matching snippets |
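Several of these features can be combined in one search request. The sketch below builds a request body as a Python dict using the Elasticsearch Query DSL (`multi_match`, `fuzziness`, `aggs`, `highlight`). The field names `title`, `body`, `category`, and `price` are hypothetical, and the body is only printed, not sent to a cluster. Autocomplete is left out because it usually needs a dedicated field mapping.

```python
import json

# Field names below are assumptions for illustration only.
search_body = {
    "query": {
        "multi_match": {
            "query": "qwick fox",            # full-text search, with a typo
            "fields": ["title^2", "body"],   # boosting: title counts twice as much
            "fuzziness": "AUTO",             # fuzzy matching tolerates the typo
        }
    },
    "aggs": {
        # Faceting: counts per category (assumes `category` is a keyword field).
        "by_category": {"terms": {"field": "category"}},
        # Faceting: counts per price bucket.
        "by_price": {
            "range": {
                "field": "price",
                "ranges": [{"to": 50}, {"from": 50, "to": 200}, {"from": 200}],
            }
        },
    },
    # Highlighting: return matching snippets from the body field.
    "highlight": {"fields": {"body": {}}},
}

print(json.dumps(search_body, indent=2))
```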
4. Use Cases
- E-commerce: Product search (Amazon)
- Logs: Search application logs (ELK stack)
- Analytics: Near-real-time dashboards (Kibana)
Common Interview Questions
Q1: "How does Elasticsearch search so fast?"
Answer:
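- Inverted index: Maps terms to documents, so a term lookup does not scan documents (O(1) on average with a hash-based dictionary)
- Sharding: Shards are searched in parallel and their results are merged
- Caching: Frequently used filters and repeated requests are cached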
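The three points can be seen together in a toy scatter-gather search: each shard returns its local top hits in parallel, a coordinator merges them, and repeated queries are served from a cache. This is a sketch under stated assumptions: the shard contents and scores are made up, and `functools.lru_cache` stands in for the engine's own caches.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical shards: each holds its own inverted index of term -> [(score, doc_id)].
SHARDS = [
    {"brown": [(0.9, 1), (0.4, 7)]},
    {"brown": [(0.7, 2)], "dog": [(0.8, 2)]},
    {"brown": [(0.5, 12)]},
]


def search_shard(shard, term, k):
    """Each shard returns only its local top-k hits for the term."""
    return heapq.nlargest(k, shard.get(term, []))


@lru_cache(maxsize=1024)  # stand-in for a query cache
def search(term, k=3):
    """Query all shards in parallel, then merge the partial results into a global top-k."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term, k), SHARDS)
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return tuple(doc_id for _score, doc_id in merged)


print(search("brown"))  # (1, 2, 12)
```

Each shard only needs to return `k` hits, so the coordinator merges at most `k` times the number of shards candidates regardless of index size.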
Q2: "SQL database vs Elasticsearch for search?"
Answer:
- SQL: Simple exact match (`LIKE '%term%'` is slow because a leading wildcard cannot use a standard index, so every row is scanned)
- Elasticsearch: Full-text, fuzzy, ranking, facets (purpose-built)
- Use SQL for: Structured queries, transactions
- Use Elasticsearch for: Search, text analysis
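To make the contrast concrete, the sketch below runs a `LIKE '%fox%'` query against an in-memory SQLite table. It shows two limits of substring matching: the query plan is a full scan, and the pattern also matches unrelated words such as "foxglove". The table, rows, and the whitespace-token comparison are assumptions for illustration; the exact plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [(1, "quick brown fox"), (2, "lazy brown dog"), (3, "foxglove seeds")],
)

# Substring match: finds "fox" anywhere, including inside "foxglove".
like_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? ORDER BY id", ("%fox%",)
    )
]
print(like_ids)  # [1, 3]

# The query plan reports a scan of the whole table (wording differs by version).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM docs WHERE body LIKE '%fox%'"
).fetchall()
print(plan[0][-1])

# Token match, as a term-based index would see it: only whole words count.
token_ids = sorted(
    doc_id
    for doc_id, body in conn.execute("SELECT id, body FROM docs")
    if "fox" in body.lower().split()
)
print(token_ids)  # [1]
```

The token comparison here still reads every row; it only illustrates the matching semantics. A search engine gets the same result from its inverted index without touching the rows.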
Quick Reference
Inverted index: Term → Document mapping
Sharding: Horizontal scaling
Use cases: Product search, log analysis