Networking Basics

TL;DR (30-second summary)

The internet runs on protocols: TCP (reliable), UDP (fast), HTTP (web), WebSockets (real-time). DNS translates names to IP addresses. CDNs cache content close to users. Load balancers distribute traffic across servers.

Key concept: The network is unreliable and slow, so design for it.

Why This Matters

In interviews: You'll design systems that communicate over networks. Understanding protocols helps you choose the right tool and discuss latency/reliability trade-offs.

At work: Every distributed system depends on networking. Poor choices cause cascading failures.

Core Concepts

1. OSI Model (Simplified)

What matters for system design:

  • Layer 4 (Transport): TCP vs UDP
  • Layer 7 (Application): HTTP, WebSockets, gRPC

2. TCP vs UDP

| Feature | TCP | UDP |
|---|---|---|
| Reliability | Reliable, ordered delivery (lost segments are retransmitted) | Best effort, no delivery or ordering guarantee |
| Connection | Connection-oriented (3-way handshake) | Connectionless |
| Speed | Slower (overhead for reliability) | Faster (no handshake) |
| Use Cases | HTTP, database connections, file transfer | Video streaming, DNS, gaming |
| Header overhead | 20-60 bytes per segment | 8 bytes per datagram |

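TCP Three-Way Handshake:

  1. Client → Server: SYN
  2. Server → Client: SYN-ACK
  3. Client → Server: ACK (connection established)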

When to use:

  • TCP: When you need reliability (web requests, APIs, databases)
  • UDP: When speed matters more than reliability (live video, VoIP, gaming)

Interview Tip

Say: "We'll use TCP for API calls since we need reliability, but UDP for real-time video streaming where occasional packet loss is acceptable."
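
As a rough illustration of the difference, the sketch below sends the same payload over loopback with each protocol. The function names are hypothetical, and loopback hides the real-world behaviour: on an actual network the UDP datagram could simply be lost, while TCP would retransmit.

```python
import socket
import threading


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram to a local UDP socket and read it back (no handshake)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # UDP may drop packets; never wait forever
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Connect (3-way handshake happens here), send bytes, read the echo back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)

        def echo_once() -> None:
            # Accept a single connection and echo whatever arrives first.
            conn, _addr = server.accept()
            with conn:
                conn.sendall(conn.recv(4096))

        worker = threading.Thread(target=echo_once, daemon=True)
        worker.start()
        with socket.create_connection(server.getsockname(), timeout=2.0) as client:
            client.sendall(payload)
            echoed = client.recv(4096)
        worker.join(timeout=2.0)
        return echoed
```

Note that the UDP path needs no listener thread and no connection setup, which is where its latency advantage comes from; the price is that the caller has to handle loss itself (here, only with a timeout).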

3. HTTP/HTTPS

HTTP (HyperText Transfer Protocol): The foundation of the web.

Request structure:

GET /api/users/123 HTTP/1.1
Host: api.example.com
Authorization: Bearer <token>
Accept: application/json

Response structure:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 56

{"id": 123, "name": "John", "email": "john@example.com"}
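
To make the wire format concrete, here is a minimal sketch that builds and parses a response shaped like the one above. The helpers `build_response` and `parse_response` are hypothetical names for illustration; a real client would use a library such as `http.client` rather than parsing by hand. Content-Length is computed from the body instead of being hard-coded.

```python
from typing import Dict, Tuple


def build_response(body: bytes, content_type: str = "application/json") -> bytes:
    """Serialize a 200 response: status line, headers, blank line, body."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def parse_response(raw: bytes) -> Tuple[int, str, Dict[str, str], bytes]:
    """Split a raw response into (status, reason, headers, body)."""
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(status), reason, headers, body
```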

HTTP Methods:

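| Method | Purpose | Idempotent | Safe |
|---|---|---|---|
| GET | Retrieve data | ✅ | ✅ |
| POST | Create resource | ❌ | ❌ |
| PUT | Update/replace | ✅ | ❌ |
| PATCH | Partial update | ❌ (not guaranteed) | ❌ |
| DELETE | Remove resource | ✅ | ❌ |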
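
Idempotency matters most when deciding whether a failed request can be retried blindly. A minimal sketch, assuming a hypothetical helper `can_retry_automatically`; HEAD and OPTIONS are included because the HTTP specification also defines them as safe, although the table above does not list them.

```python
# Safe methods do not change server state; every safe method is also idempotent.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
# Idempotent methods can be repeated without changing the outcome.
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def can_retry_automatically(method: str) -> bool:
    """Return True when repeating the request cannot cause duplicate effects."""
    return method.upper() in IDEMPOTENT_METHODS
```

POST and PATCH return False here, which is why APIs that need retryable writes often add an idempotency key on top.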

HTTP Status Codes (memorize these):

  • 2xx Success: 200 OK, 201 Created, 204 No Content
  • 3xx Redirection: 301 Moved Permanently, 302 Found, 304 Not Modified
  • 4xx Client Error: 400 Bad Request, 401 Unauthorized, 404 Not Found, 429 Too Many Requests
  • 5xx Server Error: 500 Internal Server Error, 503 Service Unavailable
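
The first digit carries the class, so client code can branch on it without memorizing every code. The sketch below is illustrative only; in particular, treating 429 and 503 as the retryable codes is an assumed policy, not a rule from the HTTP specification.

```python
def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return classes[code // 100]


def should_retry(code: int) -> bool:
    """Assumed policy: retry only when the server signals a temporary condition."""
    return code in (429, 503)
```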

HTTP/1.1 vs HTTP/2 vs HTTP/3:

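| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Transport | TCP | TCP | QUIC (over UDP) |
| Multiplexing | ❌ (one request at a time per connection) | ✅ (many streams) | ✅ (many streams) |
| Header Compression | ❌ | ✅ (HPACK) | ✅ (QPACK) |
| Latency | High (head-of-line blocking) | Better (TCP-level head-of-line blocking remains) | Best (streams are independent) |
| Adoption | Universal | Common (roughly half of websites by some measurements) | Growing |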

4. HTTPS (HTTP + TLS/SSL)

HTTPS = HTTP + Encryption

Why HTTPS:

  • Encryption: Intermediaries can't eavesdrop on traffic
  • Authentication: Verify server identity via certificates
  • Integrity: Detect tampering

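Cost: The TLS handshake adds network round trips on a new connection (one with TLS 1.3, two with TLS 1.2), which can amount to ~100ms on long-distance paths. Reused and resumed connections avoid most of this cost.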
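
On the client side, the authentication property depends on certificate and hostname verification actually being enabled. A minimal sketch using Python's standard `ssl` module; `make_client_context` is a hypothetical helper name, and the TLS 1.2 floor is an assumed policy choice.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client TLS context that verifies the server certificate."""
    # The default context enables certificate validation and hostname checking.
    context = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Disabling either check (for example to silence a certificate error in development) removes the protection against man-in-the-middle attacks while still encrypting the traffic.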

5. WebSockets

Purpose: Full-duplex (two-way) real-time communication.

Use cases:

  • Chat applications (WhatsApp, Slack)
  • Live sports scores
  • Collaborative editing (Google Docs)
  • Real-time dashboards

Alternative: Server-Sent Events (SSE) - one-way (server → client) only, simpler than WebSockets

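| Feature | WebSockets | SSE | Polling |
|---|---|---|---|
| Direction | Bi-directional | Server → Client | Client → Server (repeated) |
| Protocol | WebSocket (ws://, wss://), opened via an HTTP Upgrade request | HTTP | HTTP |
| Overhead | Low (persistent) | Low (persistent) | High (repeated requests) |
| Use Case | Chat, gaming | Live feeds, notifications | Simple updates |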
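
A WebSocket connection starts as an HTTP request with an Upgrade header; the server proves it understood the request by hashing the client's `Sec-WebSocket-Key` with a fixed GUID defined in RFC 6455. The sketch below shows only that one step of the handshake, with a hypothetical function name, not a full WebSocket implementation.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

For the sample key in RFC 6455, `websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `"s3pPLMBiTxaQ9kYGzzhZRbK+xOo="`. After this exchange the connection stops speaking HTTP and carries WebSocket frames in both directions.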

6. DNS (Domain Name System)

Purpose: Translate domain names to IP addresses.

DNS Record Types:

| Type | Purpose | Example |
|---|---|---|
| A | Domain → IPv4 | example.com → 93.184.216.34 |
| AAAA | Domain → IPv6 | example.com → 2606:2800:220:1... |
| CNAME | Alias | www.example.com → example.com |
| MX | Mail server | example.com → mail.example.com |
| TXT | Text data | SPF, DKIM for email |

DNS Caching:

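  • Browser cache: short-lived, often around a minute (varies by browser)
  • OS cache: a few minutes is typical, bounded by the record's TTL (varies by OS)
  • ISP (recursive resolver) cache: up to the record's TTL (Time To Live, often set as high as 24 hours)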

TTL trade-off:

  • Low TTL (5 min): Fast to update, higher DNS load
  • High TTL (24 hr): Less DNS load, slow to propagate changes

Red Flag

Don't forget DNS when designing! It can become a single point of failure. Use:

  • Multiple DNS providers (Route53 + Cloudflare)
  • Low TTL before migrations
  • Health checks to automatically remove failed servers
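
The TTL trade-off described earlier comes down to how long a resolver may keep answering from its cache. A minimal sketch of that behaviour, assuming a hypothetical `TtlDnsCache` class with an injectable clock so the expiry logic can be exercised without waiting:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlDnsCache:
    """Keep name → address answers until their TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        # Store the absolute expiry time alongside the answer.
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: the caller must ask the authoritative server again.
            del self._entries[name]
            return None
        return address
```

With a 24-hour TTL, a cache like this can keep handing out the old address for up to a day after a record changes, which is why lowering the TTL ahead of a migration helps.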

7. CDN (Content Delivery Network)

Purpose: Cache static content close to users globally.

What CDNs cache:

  • ✅ Static files (images, CSS, JS, videos)
  • ✅ API responses (with cache headers)
  • ❌ Dynamic, user-specific data (without special config)

Benefits:

  1. Reduced latency: Serve from the nearest edge server (e.g., 50ms → 5ms)
  2. Lower origin load: Most requests hit the cache (90%+ hit rates are achievable for static content)
  3. DDoS protection: Absorb malicious traffic at edge
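
The first two benefits can be estimated with simple arithmetic. The sketch below uses hypothetical helper names and a simplified model in which a cache miss costs exactly the origin latency (in practice a miss also pays the edge hop).

```python
def origin_requests(total_requests: int, hit_rate: float) -> int:
    """Requests that still reach the origin for a given cache hit rate."""
    return round(total_requests * (1.0 - hit_rate))


def expected_latency_ms(hit_rate: float, edge_ms: float, origin_ms: float) -> float:
    """Average latency: hits are served at the edge, misses go to the origin."""
    return hit_rate * edge_ms + (1.0 - hit_rate) * origin_ms
```

With a 90% hit rate, one million requests leave about 100,000 for the origin, and a 5ms edge with a 50ms origin averages out to roughly 9.5ms per request under this model.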

Popular CDNs:

  • Cloudflare (free tier, great DDoS protection)
  • AWS CloudFront (integrates with AWS)
  • Fastly (real-time purging)
  • Akamai (oldest, largest network)

8. Load Balancers

Purpose: Distribute incoming traffic across multiple servers.

Algorithms:

| Algorithm | How It Works | Use Case |
|---|---|---|
| Round Robin | Rotate through servers in order | Uniform server capacity |
| Least Connections | Send to server with fewest active connections | Long-lived connections |
| Weighted | More traffic to powerful servers | Heterogeneous capacity |
| IP Hash | Same client → same server (consistent) | Session persistence |
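
Each algorithm in the table fits in a few lines. The following is a sketch with hypothetical function names and server labels, not how any particular load balancer implements them.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in order, wrapping around forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Repeat each server in the rotation as many times as its weight."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server; the same IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

One caveat about the `ip_hash` sketch: plain modulo hashing remaps most clients whenever the server list changes. Keeping assignments stable across pool changes needs consistent hashing, which this example does not implement.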

Health Checks:

  • Load balancer periodically pings servers (e.g., every 10s)
  • If server fails health check (timeout or error), remove from pool
  • When server recovers, add back to pool
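
The health-check loop above can be sketched as a pool that re-probes every server and only exposes the ones that passed. `ServerPool` and its `probe` callback are hypothetical; a real probe would typically be an HTTP or TCP check with a timeout, run on a schedule.

```python
from typing import Callable, Dict, List


class ServerPool:
    """Track which servers passed their most recent health check."""

    def __init__(self, servers: List[str], probe: Callable[[str], bool]) -> None:
        self._probe = probe
        self._healthy: Dict[str, bool] = {server: True for server in servers}

    def run_health_checks(self) -> None:
        """Probe every server; an exception (e.g. a timeout) counts as a failure."""
        for server in self._healthy:
            try:
                self._healthy[server] = bool(self._probe(server))
            except Exception:
                self._healthy[server] = False

    def available(self) -> List[str]:
        """Servers currently eligible to receive traffic."""
        return [server for server, ok in self._healthy.items() if ok]
```

Because every server is probed on each pass, a recovered server rejoins the pool automatically on its next successful check.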

Layer 4 vs Layer 7 (covered in detail in Chapter 10):

  • L4 (Transport): Faster, route based on IP/port
  • L7 (Application): Slower, route based on HTTP headers, URL path
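
The difference is in what each layer is able to inspect. In this sketch the rule tables and pool names are made up for illustration: the L4 router sees only the destination port, while the L7 router can look at the URL path.

```python
from typing import Dict, List, Tuple

# Hypothetical routing rules; the pool names are placeholders.
L4_RULES: Dict[int, str] = {443: "web-pool", 5432: "db-pool"}
L7_RULES: List[Tuple[str, str]] = [("/api/", "api-pool"), ("/static/", "static-pool")]


def route_l4(dst_port: int) -> str:
    """L4: decide from transport-level information only (here, the port)."""
    return L4_RULES[dst_port]


def route_l7(path: str, default_pool: str = "web-pool") -> str:
    """L7: decide from the parsed HTTP request (here, the URL path prefix)."""
    for prefix, pool in L7_RULES:
        if path.startswith(prefix):
            return pool
    return default_pool
```

An L7 balancer has to terminate the connection and parse HTTP before it can apply rules like these, which is where its extra cost comes from.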

Common Interview Questions

Q1: "When would you use UDP instead of TCP?"

Answer:

  • Real-time applications where low latency matters more than reliability
  • Examples: Live video streaming, VoIP (Zoom, Skype), online gaming
  • Reason: Packet loss in video causes only a brief glitch, while waiting for retransmission would stall playback and cause stuttering

Q2: "How does HTTPS improve security?"

Answer:

  1. Encryption: Traffic encrypted with TLS, can't be read by intermediaries
  2. Authentication: Certificate proves server identity (prevents man-in-the-middle)
  3. Integrity: Detects if data was modified in transit

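Trade-off: The initial TLS handshake adds one to two network round trips (roughly 100ms on a long-distance path); later requests on the same connection do not pay it again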

Q3: "Explain how a CDN works and when to use it."

Answer:

  1. User requests cdn.example.com/image.jpg
  2. DNS (or anycast routing) directs the request to the nearest CDN edge server
  3. If edge has cached copy → return immediately (cache hit)
  4. If not → edge fetches from origin, caches, and returns (cache miss)
  5. Subsequent requests hit cache

Use when:

  • Serving static content globally
  • High traffic (cost-effective)
  • Need DDoS protection
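
The hit/miss flow in the steps above can be sketched as a pull-through cache. `EdgeCache` and the `fetch_from_origin` callback are hypothetical names; a real edge also honours cache headers, expiry, and eviction, which this sketch leaves out.

```python
from typing import Callable, Dict


class EdgeCache:
    """Serve from the local store on a hit; fetch from origin and store on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._store:
            self.hits += 1            # cache hit: no origin traffic
            return self._store[path]
        self.misses += 1              # cache miss: go to origin once
        content = self._fetch(path)
        self._store[path] = content   # later requests for this path hit the cache
        return content
```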

Q4: "WebSockets vs HTTP polling for real-time updates?"

Answer:

| Aspect | WebSockets | Polling |
|---|---|---|
| Latency | Instant (push) | Up to poll interval |
| Overhead | Low (persistent connection) | High (repeated requests and headers, plus new connections when keep-alive is not used) |
| Complexity | More (connection management) | Less (simple HTTP) |
| Scalability | Need to manage open connections | Stateless, easier to scale |

Use WebSockets for: Chat, collaborative editing, gaming
Use Polling for: Less frequent updates, simpler systems
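
A quick way to judge whether polling is acceptable is to work out its request load and delay. The sketch below uses a hypothetical `polling_cost` helper and assumes updates arrive at random times relative to the poll schedule, so the average delay is half the interval.

```python
from typing import Dict


def polling_cost(interval_seconds: float, clients: int) -> Dict[str, float]:
    """Estimate server load and update delay for fixed-interval polling."""
    return {
        # Every client sends one request per interval, whether or not data changed.
        "requests_per_second": clients / interval_seconds,
        # An update waits, on average, half an interval before the next poll sees it.
        "average_delay_seconds": interval_seconds / 2,
        "worst_case_delay_seconds": interval_seconds,
    }
```

For example, 10,000 clients polling every 5 seconds generate 2,000 requests per second and see updates 2.5 seconds late on average; shortening the interval lowers the delay but raises the load proportionally.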

Trade-offs

| Decision | Option A | Option B | Trade-off |
|---|---|---|---|
| Protocol | TCP (reliable) | UDP (fast) | Reliability vs latency |
| HTTP Version | HTTP/1.1 (simple) | HTTP/2 (multiplexing) | Compatibility vs performance |
| Real-time | WebSockets (push) | Polling (pull) | Complexity vs simplicity |
| DNS TTL | Low (5 min) | High (24 hr) | Agility vs DNS load |
| CDN | Use CDN (fast) | Direct (simple) | Cost vs latency |

Real-World Examples

Netflix

  • Protocol: HTTP/2 for API, adaptive bitrate streaming
  • CDN: 90%+ traffic via Open Connect (own CDN)
  • Result: Sub-second startup time globally

WhatsApp

  • Protocol: Custom protocol over TCP (not HTTP)
  • Real-time: Persistent connections (WebSocket-like)
  • Result: Instant message delivery, billions of connections

Cloudflare

  • Service: Global CDN + DDoS protection
  • Scale: Roughly 20% of websites sit behind Cloudflare (per the company's own figures)
  • Result: Protects millions of sites from attacks

Quick Reference Card

Protocols:

  • TCP: Reliable, ordered, slow(er) - use for APIs, databases
  • UDP: Fast, unreliable - use for streaming, gaming
  • HTTP: Request-response, stateless - use for web APIs
  • WebSockets: Persistent, bi-directional - use for real-time

HTTP Status Codes:

  • 2xx: Success
  • 3xx: Redirection
  • 4xx: Client error (the request is at fault)
  • 5xx: Server error (the server is at fault)

DNS TTL:

  • Low (5 min): Fast updates, high load
  • High (24 hr): Slow updates, low load

CDN Benefits:

  • Lower latency (serve from edge)
  • Reduce origin load (cache hit rate)
  • DDoS protection


Next: Storage Fundamentals - SQL vs NoSQL, ACID vs BASE.