Database Design Patterns: Choosing the Right Data Store

Day 4 of the 30-Day System Design Roadmap

"Just use MongoDB!" a junior developer once told me during a system design review for a banking application. "It's web scale!"

I smiled, knowing we were about to have an important conversation. Three months later, after countless headaches with inconsistent account balances and a near-disaster with lost transactions, that same developer came to my desk and said, "I think I understand why you were hesitant."

Here's the thing: choosing a database isn't about what's trendy or what your favorite tech influencer recommends. It's not about MongoDB vs PostgreSQL vs Redis in some abstract battle. It's about understanding your data's personality, your application's needs, and the trade-offs you're willing to make.

I've spent the last decade designing database architectures for everything from small startups to systems handling billions of transactions. And you know what? The database choice can make or break your application—not because one is inherently "better," but because each one is designed for specific problems.

Today, we're going to explore database design patterns with both theory and practical examples. By the end, you'll know how to choose the right data store for your specific situation.

The Database Selection Dilemma

Let me paint you a picture. You're building a social media platform—the next big thing. You sit down to design your data layer and realize you need to handle:

- User profiles with structured information (email, name, birthdate)
- Posts and comments that grow rapidly and have flexible content
- Friend connections and follower relationships
- User sessions that need lightning-fast access
- Analytics data that's write-heavy and time-based

    Your first instinct might be to pick one database and force everything into it. I've seen teams do this countless times. They choose PostgreSQL and struggle with graph queries for social connections. Or they choose MongoDB and fight to ensure data consistency for financial transactions.

    Here's the reality that took me years to accept: Modern applications don't use just one database. Netflix uses multiple types. So does Amazon. So does pretty much every company operating at scale.

    SQL vs NoSQL: Beyond the Hype

    The SQL versus NoSQL debate is often framed wrong. People treat it like choosing sides in a sports game. But it's more like choosing between a sedan and a pickup truck—both are vehicles, but they're designed for different jobs.

    When SQL Databases Make Perfect Sense

    SQL databases—PostgreSQL, MySQL, Oracle—are like that reliable friend who always keeps their promises. They're built around ACID properties and excel when you need absolute certainty about your data.

    I once worked on an e-commerce platform processing millions of dollars daily. When a customer places an order, we need to:

    1. Check if the product is in stock
    2. Reduce the inventory count
    3. Create an order record
    4. Charge the customer's payment method

    Imagine if step 3 succeeded but step 4 failed. You'd have an order in the system but no payment. This is where SQL databases shine—transactions ensure either everything succeeds or nothing does.

    Here's how it looks in practice:

    sql
    -- E-commerce order processing with ACID guarantees (PostgreSQL)
    BEGIN;

    DO $$
    DECLARE
        new_order_id INTEGER;
    BEGIN
        -- Step 1: Check and reduce inventory
        UPDATE inventory
        SET quantity = quantity - 1
        WHERE product_id = 'PROD-123'
          AND quantity > 0;

        -- If no rows were affected, the product is out of stock
        IF NOT FOUND THEN
            RAISE EXCEPTION 'Product out of stock';
        END IF;

        -- Step 2: Create the order
        INSERT INTO orders (user_id, product_id, amount, status)
        VALUES (42, 'PROD-123', 99.99, 'pending')
        RETURNING order_id INTO new_order_id;

        -- Step 3: Process payment
        INSERT INTO payments (order_id, amount, status)
        VALUES (new_order_id, 99.99, 'processing');
    END $$;

    -- All statements succeed together, or the error aborts the whole transaction
    COMMIT;

    SQL databases are your go-to when:

    - You're dealing with money (banking, payments, e-commerce)
    - Your data has complex relationships (orders with line items, users with addresses)
    - Data integrity is non-negotiable
    - You need complex queries across multiple entities
    - Your schema is relatively stable

    When NoSQL Databases Save the Day

    NoSQL databases trade some of SQL's guarantees for massive scalability and flexibility. Let me tell you about a real problem I faced.

    I consulted for a social media startup using PostgreSQL for their user feed. Everything worked fine until they hit 100,000 users. Then performance collapsed. The problem? They were doing complex joins across five tables to build each user's feed:

    sql
    -- This query became a nightmare at scale
    SELECT
        p.post_id,
        p.content,
        p.created_at,
        u.user_id,
        u.name,
        u.avatar_url,
        COUNT(DISTINCT l.like_id) as like_count,
        COUNT(DISTINCT c.comment_id) as comment_count,
        EXISTS(
            SELECT 1 FROM likes
            WHERE post_id = p.post_id AND user_id = @current_user_id
        ) as user_has_liked
    FROM posts p
    INNER JOIN users u ON p.author_id = u.user_id
    INNER JOIN followers f ON f.followed_id = p.author_id
    LEFT JOIN likes l ON l.post_id = p.post_id
    LEFT JOIN comments c ON c.post_id = p.post_id
    WHERE f.follower_id = @current_user_id
    GROUP BY p.post_id, u.user_id
    ORDER BY p.created_at DESC
    LIMIT 50;

    Each feed request hit five tables with multiple joins. At scale, this was a disaster.

    We migrated to MongoDB with a denormalized approach:

    javascript
    // Single document contains everything needed for display
    {
      "_id": "feed_user_12345",
      "user_id": 12345,
      "posts": [
        {
          "post_id": "post_789",
          "content": "Just deployed to production on Friday! 🚀",
          "created_at": ISODate("2025-01-15T14:30:00Z"),
          "author": {
            "user_id": "user_456",
            "name": "Jane Doe",
            "avatar_url": "https://cdn.example.com/jane.jpg"
          },
          "stats": {
            "likes": 42,
            "comments": 7
          },
          "user_has_liked": false
        }
        // ... more posts
      ],
      "last_updated": ISODate("2025-01-15T14:35:00Z")
    }
    
    // Fetching a feed becomes one simple query
    db.feeds.findOne({ user_id: 12345 })

    Feed load times dropped from 2 seconds to 50 milliseconds. Yes, we had to update more places when data changed. But the trade-off was worth it for the read performance.
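
    That write amplification looks roughly like this (a sketch against the feed documents above; `update_author_name` is a hypothetical helper): when an author changes their display name, every embedded copy of it has to be patched.

    python
    # When an author renames themselves, every embedded copy must be patched.
    # update_author_name is a hypothetical helper; "feeds" matches the document above.
    def update_author_name(mongo_db, author_user_id, new_name):
        mongo_db.feeds.update_many(
            {"posts.author.user_id": author_user_id},
            {"$set": {"posts.$[p].author.name": new_name}},
            # arrayFilters limits the $set to posts written by this author
            array_filters=[{"p.author.user_id": author_user_id}],
        )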

    NoSQL databases are your go-to when:

    - You need to scale horizontally (add more servers)
    - Your data model is flexible and evolving
    - You're dealing with documents or nested data
    - You have massive write volumes
    - Geographic distribution is critical

    The Honest Truth About Trade-offs

    Here's what most tutorials won't tell you: both approaches have painful limitations.

    SQL databases struggle with horizontal scaling. When you outgrow a single server, sharding is complex and you lose the ability to join across shards.
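
    To make the sharding pain concrete, here's a toy sketch of application-level sharding (the shard map and `shard_for` helper are illustrative, not a production router). Routing a single user is easy; the catch is that two users on different shards can no longer be joined in one SQL query.

    python
    import hashlib

    # Hypothetical shard map: user rows spread across three PostgreSQL servers
    SHARDS = {
        0: "postgresql://shard0.internal/app",
        1: "postgresql://shard1.internal/app",
        2: "postgresql://shard2.internal/app",
    }

    def shard_for(user_id: int) -> str:
        """Deterministically map a user to one shard's connection string."""
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # Users 42 and 1001 may land on different servers, so "their posts JOIN my
    # followers" has to be stitched together in application code instead of SQL.
    print(shard_for(42), shard_for(1001))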

    NoSQL databases struggle with consistency. When data is spread across servers, you might not see your own write immediately. Transactions that span multiple documents or partitions are limited, slower, or unavailable depending on the system.
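
    And here's a minimal PyMongo sketch of the consistency side (the replica-set URI and collection are made up): a write acknowledged by only the primary can be invisible to an immediate read from a secondary, and getting read-your-own-writes back costs you latency.

    python
    from pymongo import MongoClient, ReadPreference
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://node1,node2,node3/?replicaSet=rs0")
    db = client.social

    # Acknowledged by the primary only - fast, but secondaries may lag behind
    posts = db.get_collection("posts", write_concern=WriteConcern(w=1))
    posts.insert_one({"post_id": "post_789", "content": "hello"})

    # An immediate read routed to a secondary may not see the write above
    lagging = db.get_collection(
        "posts", read_preference=ReadPreference.SECONDARY_PREFERRED
    )
    maybe_stale = lagging.find_one({"post_id": "post_789"})

    # Causally consistent session + majority concerns restores read-your-own-writes,
    # at the cost of waiting on replication
    safe_posts = db.get_collection(
        "posts",
        write_concern=WriteConcern(w="majority"),
        read_concern=ReadConcern("majority"),
    )
    with client.start_session(causal_consistency=True) as session:
        safe_posts.insert_one({"post_id": "post_790", "content": "hi"}, session=session)
        definitely_there = safe_posts.find_one({"post_id": "post_790"}, session=session)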

    The key is understanding which limitations you can live with.

    ACID Properties: The Promise of Reliability

    ACID stands for Atomicity, Consistency, Isolation, and Durability. These aren't just buzzwords—they're fundamental guarantees that can save you from catastrophic failures.

    Atomicity: All or Nothing

    Atomicity means operations either all succeed or all fail. No partial success.

    python
    # Without atomicity - DISASTER waiting to happen
    def transfer_money_bad(from_account, to_account, amount):
        balance = get_balance(from_account)
        if balance >= amount:
            deduct(from_account, amount)  # What if this succeeds...
            add(to_account, amount)       # But this fails? Money disappears!
    
    # With atomicity - SAFE
    def transfer_money_good(from_account, to_account, amount):
        with database.transaction():
            balance = get_balance(from_account)
            if balance >= amount:
                deduct(from_account, amount)
                add(to_account, amount)
                # Both succeed or both fail - no middle ground

    I learned this the hard way consulting for a startup with a race condition in their wallet system. Money could be deducted but not credited due to network timeouts. They lost $50K before discovering the bug. Atomicity would have prevented this entirely.

    Consistency: Rules Are Rules

    Consistency means your database enforces rules, even when your application code has bugs.

    sql
    -- Database-level constraints ensure data integrity
    CREATE TABLE users (
        user_id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,  -- No duplicate emails, ever
        age INTEGER CHECK (age >= 18),        -- Must be adult
        balance DECIMAL(10,2) CHECK (balance >= 0),  -- Can't go negative
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE orders (
        order_id SERIAL PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),  -- Must be valid user
        amount DECIMAL(10,2) NOT NULL CHECK (amount > 0)
    );

    Think about an inventory system. You have a rule: quantity can never go below zero. Without database-level consistency, two customers could buy the last item simultaneously:

    python
    # Race condition without proper constraints
    # Customer A: Check quantity = 1 ✓
    # Customer B: Check quantity = 1 ✓
    # Customer A: Create order ✓
    # Customer B: Create order ✓
    # Result: Two orders, one item!
    
    # With proper isolation and constraints
    def purchase_item(user_id, product_id):
        with db.transaction(isolation_level='SERIALIZABLE'):
            product = db.execute(
                "SELECT * FROM inventory WHERE product_id = %s FOR UPDATE",
                (product_id,)
            )
    
            if product.quantity < 1:
                raise OutOfStockError()
    
            db.execute(
                "UPDATE inventory SET quantity = quantity - 1 "
                "WHERE product_id = %s AND quantity > 0",
                (product_id,)
            )
    
            db.execute(
                "INSERT INTO orders (user_id, product_id) VALUES (%s, %s)",
                (user_id, product_id)
            )

    Isolation: Don't Step on Each Other's Toes

    Isolation is about what happens when multiple operations occur simultaneously. Databases offer different isolation levels, and choosing the right one is a balancing act:

    Read Committed (default for most databases): You only see data that's been committed. Good for most applications.

    Repeatable Read: Same query returns same results throughout a transaction. Great for reports.

    Serializable: Complete isolation, as if transactions ran one at a time. Maximum safety, minimum concurrency.

    python
    # Example: Booking system with proper isolation
    from sqlalchemy import create_engine, text

    # Configure isolation level based on your needs
    engine = create_engine(
        'postgresql://localhost/bookings',
        isolation_level='REPEATABLE READ'  # Consistent reads within a transaction
    )

    def book_seat(user_id, seat_id):
        with engine.begin() as conn:
            # Lock the row for update
            seat = conn.execute(
                text("SELECT * FROM seats WHERE seat_id = :seat_id FOR UPDATE"),
                {"seat_id": seat_id}
            ).fetchone()

            if seat is None or not seat.available:
                raise SeatAlreadyBookedError()

            conn.execute(
                text("UPDATE seats SET available = false, booked_by = :user_id "
                     "WHERE seat_id = :seat_id"),
                {"user_id": user_id, "seat_id": seat_id}
            )

    Durability: Written Means Written

    Once a transaction commits, it survives crashes, power failures, everything. This is why database write operations are slower than pure memory operations—durability has a cost, but for critical data, it's worth it.
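
    That cost is also tunable. Here's a small psycopg2 sketch (connection string and tables are illustrative): PostgreSQL's synchronous_commit setting decides whether COMMIT waits for the write-ahead log to reach disk.

    python
    import psycopg2

    conn = psycopg2.connect("dbname=shop")  # illustrative connection string

    # Default: COMMIT returns only after the write-ahead log is flushed to disk,
    # so a committed payment survives a crash or power failure
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO payments (order_id, amount) VALUES (%s, %s)", (1, 99.99)
        )

    # For low-value data, durability can be relaxed for a single transaction:
    # commits return sooner, but a crash may lose the last few of them
    with conn, conn.cursor() as cur:
        cur.execute("SET LOCAL synchronous_commit TO OFF")
        cur.execute("INSERT INTO page_views (path) VALUES (%s)", ("/home",))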

    The CAP Theorem: Choose Your Battles

    The CAP theorem states you can only have 2 out of 3 in a distributed system:

    - Consistency: All nodes see the same data at the same time
    - Availability: Every request gets a response (not an error)
    - Partition Tolerance: System works despite network failures

    Why You Can't Have It All

    Imagine two database servers in different data centers. The network connection breaks (this happens regularly in real systems). You face a choice:

    Option A: Stop accepting writes until connection is restored. This maintains consistency but sacrifices availability.

    Option B: Continue accepting writes on both servers. This maintains availability but sacrifices consistency (servers now have different data).

    There's no third option. You must choose.

    CP Systems: Consistency + Partition Tolerance

    CP systems choose consistency over availability. MongoDB with majority write concern, HBase, and Consul are CP systems.

    python
    # MongoDB configured for strong consistency
    from pymongo import MongoClient, WriteConcern
    
    client = MongoClient('mongodb://localhost:27017/')
    db = client.banking
    
    # Write succeeds only if majority of replicas confirm
    # If replica set can't reach majority, write is rejected (less available)
    # But all replicas agree on the data (consistent)
    accounts = db.accounts.with_options(
        write_concern=WriteConcern(w='majority', j=True)
    )
    result = accounts.insert_one({
        'account_id': 'ACC-123',
        'balance': 1000.00,
        'currency': 'USD'
    })

    When to choose CP: Financial transactions, inventory systems, any scenario where wrong data is worse than slow data.

    AP Systems: Availability + Partition Tolerance

    AP systems choose availability over consistency. Cassandra, DynamoDB, and Riak are AP systems.

    python
    # Cassandra configured for high availability
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement
    
    cluster = Cluster(['node1', 'node2', 'node3'])
    session = cluster.connect('social_media')
    
    # Write succeeds even if some nodes are down (available)
    # But might be temporarily inconsistent across nodes
    statement = SimpleStatement(
        "INSERT INTO user_sessions (user_id, session_data) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE  # Only need one node to acknowledge
    )
    
    session.execute(statement, (user_id, session_data))

    When to choose AP: Social media feeds, shopping carts, user sessions—scenarios where availability trumps consistency.

    Real-World: Using Both

    At a previous company, we didn't choose one philosophy. We analyzed each data type:

    python
    # User accounts: PostgreSQL (CP)
    class UserService:
        def update_balance(self, user_id, amount):
            # Strong consistency required for money
            with postgres_db.transaction():
                return postgres_db.execute(
                    "UPDATE accounts SET balance = balance + %s "
                    "WHERE user_id = %s",
                    (amount, user_id)
                )
    
    # User sessions: Redis (AP)
    class SessionService:
        def create_session(self, user_id, data):
            # High availability, eventual consistency OK
            return redis_cluster.setex(
                f"session:{user_id}",
                3600,  # 1 hour
                json.dumps(data)
            )
    
    # Shopping cart: Hybrid approach
    class CartService:
        def add_item(self, user_id, item):
            # Optimistic update to the cache (AP) - Redis stores strings, so serialize
            raw = redis_client.get(f"cart:{user_id}")
            cart = json.loads(raw) if raw else []
            cart.append(item)
            redis_client.set(f"cart:{user_id}", json.dumps(cart))

            # Async sync to PostgreSQL for durability (CP)
            task_queue.enqueue(sync_cart_to_db, user_id, cart)

    Types of Databases: Choosing Your Weapon

    Let's explore different database types with practical examples.

    Relational Databases: The Structured Powerhouse

    Perfect for structured data with complex relationships.

    sql
    -- E-commerce schema showing the power of relationships
    CREATE TABLE users (
        user_id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE products (
        product_id SERIAL PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        price DECIMAL(10,2) NOT NULL,
        stock INTEGER NOT NULL CHECK (stock >= 0)
    );
    
    CREATE TABLE orders (
        order_id SERIAL PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        total DECIMAL(10,2) NOT NULL,
        status VARCHAR(50) NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(order_id),
        product_id INTEGER REFERENCES products(product_id),
        quantity INTEGER NOT NULL,
        price DECIMAL(10,2) NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    
    -- Complex queries are natural and efficient
    SELECT
        u.email,
        COUNT(DISTINCT o.order_id) as total_orders,
        SUM(o.total) as total_spent,
        AVG(o.total) as avg_order_value
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    WHERE o.created_at >= NOW() - INTERVAL '30 days'
    GROUP BY u.email
    HAVING COUNT(DISTINCT o.order_id) > 5
    ORDER BY total_spent DESC;

    When to use: Financial systems, CRM, ERP, any application with complex relationships.

    Document Databases: The Flexible Adapter

    Perfect for hierarchical data and flexible schemas.

    javascript
    // Blog post in MongoDB - naturally nested structure
    {
      "_id": ObjectId("507f1f77bcf86cd799439011"),
      "title": "Database Design Patterns",
      "slug": "database-design-patterns",
      "author": {
        "user_id": 12345,
        "name": "Hrithik Dhakrey",
        "avatar": "https://cdn.example.com/avatar.jpg"
      },
      "content": "Today we're learning about databases...",
      "tags": ["databases", "system-design", "backend"],
      "published_at": ISODate("2025-01-15T10:00:00Z"),
      "stats": {
        "views": 15420,
        "likes": 342,
        "shares": 67
      },
      "comments": [
        {
          "comment_id": "cmt_001",
          "user": {
            "user_id": 67890,
            "name": "Jane Developer"
          },
          "text": "Great article!",
          "created_at": ISODate("2025-01-15T11:30:00Z")
        }
      ]
    }
    
    // One query gets everything - no joins needed
    const post = await db.posts.findOne({ slug: "database-design-patterns" });
    
    // Easy to evolve - add new fields without migration
    await db.posts.updateOne(
      { _id: post._id },
      {
        $set: {
          video_url: "https://youtube.com/...",
          reading_time: 15
        }
      }
    );

    When to use: Content management, product catalogs, user profiles with varying fields.

    Key-Value Stores: The Speed Demon

    Perfect for caching and simple, fast lookups.

    python
    import redis
    import json
    
    redis_client = redis.Redis(host='localhost', port=6379)
    
    # Session management
    def create_session(user_id, session_data):
        session_id = generate_session_id()
        redis_client.setex(
            f"session:{session_id}",
            3600,  # 1 hour expiry
            json.dumps(session_data)
        )
        return session_id
    
    # Rate limiting
    def check_rate_limit(user_id, max_requests=100, window=3600):
        key = f"rate_limit:{user_id}"
        current = redis_client.incr(key)
    
        if current == 1:
            redis_client.expire(key, window)
    
        return current <= max_requests
    
    # Caching expensive queries
    def get_user_profile(user_id):
        cache_key = f"user:{user_id}"
    
        # Try cache first
        cached = redis_client.get(cache_key)
        if cached:
            return json.loads(cached)
    
        # Cache miss - fetch from database
        user = postgres_db.query("SELECT * FROM users WHERE id = %s", user_id)
    
        # Cache for 5 minutes
        redis_client.setex(cache_key, 300, json.dumps(user))
        return user
    
    # Leaderboard with sorted sets
    def update_leaderboard(user_id, score):
        redis_client.zadd("game_leaderboard", {user_id: score})
    
    def get_top_players(limit=10):
        return redis_client.zrevrange("game_leaderboard", 0, limit-1, withscores=True)

    When to use: Caching, sessions, real-time features, rate limiting.

    Graph Databases: The Relationship Expert

    Perfect when relationships are as important as the data itself.

    cypher
    // Social network in Neo4j
    // Create users and relationships
    CREATE (alice:User {user_id: 1, name: 'Alice'})
    CREATE (bob:User {user_id: 2, name: 'Bob'})
    CREATE (carol:User {user_id: 3, name: 'Carol'})
    CREATE (dave:User {user_id: 4, name: 'Dave'})
    
    CREATE (alice)-[:FOLLOWS]->(bob)
    CREATE (alice)-[:FOLLOWS]->(carol)
    CREATE (bob)-[:FOLLOWS]->(carol)
    CREATE (bob)-[:FOLLOWS]->(dave)
    CREATE (carol)-[:FOLLOWS]->(dave)
    
    // Find users who follow the same people Alice follows (try this efficiently in SQL!)
    MATCH (alice:User {name: 'Alice'})-[:FOLLOWS]->(mutual)<-[:FOLLOWS]-(friend)
    WHERE alice <> friend
    RETURN DISTINCT friend.name
    
    // Friend recommendations (friends of friends you don't follow)
    MATCH (alice:User {name: 'Alice'})-[:FOLLOWS*2..2]->(recommendation)
    WHERE NOT (alice)-[:FOLLOWS]->(recommendation)
      AND alice <> recommendation
    RETURN recommendation.name, COUNT(*) as connection_strength
    ORDER BY connection_strength DESC
    LIMIT 5

    When to use: Social networks, recommendation engines, fraud detection, knowledge graphs.

    Time-Series Databases: The Metrics Collector

    Perfect for timestamped data and analytics.

    python
    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS
    
    client = InfluxDBClient(url="http://localhost:8086", token="token", org="myorg")
    write_api = client.write_api(write_options=SYNCHRONOUS)
    
    # Recording application metrics
    def track_api_request(endpoint, response_time, status_code):
        point = Point("api_metrics") \
            .tag("endpoint", endpoint) \
            .tag("status_code", str(status_code)) \
            .field("response_time_ms", response_time) \
            .field("value", 1)
    
        write_api.write(bucket="monitoring", record=point)
    
    # Recording IoT sensor data
    def record_sensor_data(sensor_id, temperature, humidity):
        point = Point("sensor_readings") \
            .tag("sensor_id", sensor_id) \
            .tag("location", "warehouse_a") \
            .field("temperature", temperature) \
            .field("humidity", humidity)
    
        write_api.write(bucket="iot", record=point)
    
    # Querying time-series data
    query = '''
    from(bucket: "monitoring")
      |> range(start: -24h)
      |> filter(fn: (r) => r["_measurement"] == "api_metrics")
      |> filter(fn: (r) => r["endpoint"] == "/api/orders")
      |> aggregateWindow(every: 1h, fn: mean)
      |> yield(name: "mean_response_time")
    '''
    
    result = client.query_api().query(query)

    When to use: Application monitoring, IoT data, financial tick data, analytics.

    Practical Exercise: Social Media Platform Architecture

    Let's design a real system. Here's how I'd architect a social media platform using multiple databases.

    The Architecture Overview

    python
    """
    Database Strategy for Social Media Platform:
    
    1. PostgreSQL - User accounts and authentication (needs ACID)
    2. MongoDB - Posts and comments (flexible schema, high write volume)
    3. Neo4j - Social graph (optimized for relationship queries)
    4. Redis - Feed cache and sessions (speed critical)
    5. InfluxDB - Analytics and metrics (time-series data)
    """
    
    # 1. User Service (PostgreSQL)
    class UserService:
        def create_user(self, email, password, name):
            """
            User accounts need ACID guarantees.
            Email must be unique. Password must be hashed.
            All or nothing.
            """
            with postgres_db.transaction():
                user = postgres_db.execute(
                    "INSERT INTO users (email, password_hash, created_at) "
                    "VALUES (%s, %s, NOW()) RETURNING user_id",
                    (email, hash_password(password))
                )
    
                postgres_db.execute(
                    "INSERT INTO user_profiles (user_id, name) VALUES (%s, %s)",
                    (user.user_id, name)
                )
    
                return user.user_id
    
        def get_user(self, user_id):
            """
            Joining related tables - SQL's strength
            """
            return postgres_db.query(
                "SELECT u.*, p.name, p.bio, p.avatar_url "
                "FROM users u "
                "JOIN user_profiles p ON u.user_id = p.user_id "
                "WHERE u.user_id = %s",
                (user_id,)
            )
    
    # 2. Post Service (MongoDB)
    class PostService:
        def create_post(self, user_id, content, media_urls=None):
            """
            Posts have flexible schema - some have images, some videos, etc.
            MongoDB's document model fits naturally.
            """
            post = {
                'user_id': user_id,
                'content': content,
                'media_urls': media_urls or [],
                'created_at': datetime.now(),
                'stats': {'likes': 0, 'comments': 0, 'shares': 0},
                'comments': []  # Embedded for fast reads
            }
    
            result = mongo_db.posts.insert_one(post)
    
            # Async: update follower feeds
            task_queue.enqueue(update_feeds, user_id, str(result.inserted_id))
    
            return str(result.inserted_id)
    
        def add_comment(self, post_id, user_id, text):
            """
            Embedded comments - one query to get post with all comments
            """
            mongo_db.posts.update_one(
                {'_id': ObjectId(post_id)},
                {
                    '$push': {
                        'comments': {
                            'comment_id': generate_id(),
                            'user_id': user_id,
                            'text': text,
                            'created_at': datetime.now()
                        }
                    },
                    '$inc': {'stats.comments': 1}
                }
            )
    
    # 3. Social Graph Service (Neo4j)
    class SocialGraphService:
        def follow_user(self, follower_id, followee_id):
            """
            Neo4j excels at relationship operations
            """
            query = """
            MATCH (follower:User {user_id: $follower_id})
            MATCH (followee:User {user_id: $followee_id})
            CREATE (follower)-[r:FOLLOWS {created_at: datetime()}]->(followee)
            RETURN r
            """
            neo4j_db.run(query, follower_id=follower_id, followee_id=followee_id)
    
        def get_friend_suggestions(self, user_id, limit=10):
            """
            Friends of friends - natural in graph DB, nightmare in SQL
            """
            query = """
            MATCH (user:User {user_id: $user_id})-[:FOLLOWS*2..2]->(suggestion)
            WHERE NOT (user)-[:FOLLOWS]->(suggestion)
              AND user <> suggestion
            WITH suggestion, COUNT(*) as strength
            ORDER BY strength DESC
            LIMIT $limit
            RETURN suggestion.user_id as user_id, strength
            """
            return neo4j_db.run(query, user_id=user_id, limit=limit)
    
    # 4. Feed Service (Redis)
    class FeedService:
        def get_feed(self, user_id, page=1, per_page=20):
            """
            Feeds must be blazing fast - Redis cache to the rescue
            """
            cache_key = f"feed:{user_id}:page:{page}"
    
            # Try cache first
            cached = redis_client.get(cache_key)
            if cached:
                return json.loads(cached)
    
            # Cache miss - build feed from MongoDB
            following = social_graph.get_following(user_id)
    
            posts = mongo_db.posts.find(
                {'user_id': {'$in': following}}
            ).sort('created_at', -1).skip((page-1)*per_page).limit(per_page)
    
            feed = list(posts)
    
            # Cache for 5 minutes
            redis_client.setex(cache_key, 300, json.dumps(feed, default=str))
    
            return feed
    
        def invalidate_feed(self, user_id):
            """
            When a user posts, invalidate follower caches.
            Redis DEL doesn't accept wildcards, so scan for matching keys first.
            """
            followers = social_graph.get_followers(user_id)
            pipeline = redis_client.pipeline()
            for follower_id in followers:
                for key in redis_client.scan_iter(f"feed:{follower_id}:page:*"):
                    pipeline.delete(key)
            pipeline.execute()
    
    # 5. Analytics Service (InfluxDB)
    class AnalyticsService:
        def track_post_view(self, post_id, user_id):
            """
            Millions of events per day - time-series DB handles it
            """
            point = Point("post_views") \
                .tag("post_id", str(post_id)) \
                .tag("user_id", str(user_id)) \
                .field("value", 1)
    
            influx_write_api.write(bucket="analytics", record=point)
    
        def get_trending_posts(self, hours=1):
            """
            Time-based aggregation - InfluxDB's specialty
            """
            query = f'''
            from(bucket: "analytics")
              |> range(start: -{hours}h)
              |> filter(fn: (r) => r["_measurement"] == "post_views")
              |> group(columns: ["post_id"])
              |> count()
              |> sort(columns: ["_value"], desc: true)
              |> limit(n: 20)
            '''
            return influx_query_api.query(query)

    Why This Architecture Works

    Each database does what it's optimized for:

    - PostgreSQL ensures user account integrity with ACID transactions
    - MongoDB handles high-volume, flexible post data
    - Neo4j makes graph queries (friend suggestions) trivial
    - Redis provides sub-millisecond feed access
    - InfluxDB efficiently stores and queries millions of metric events

    Could we use just PostgreSQL? Technically yes. But we'd struggle with feed performance, graph queries would be complex, and analytics would slow down operational tables.

    Making the Right Choice: A Decision Framework

    Here's the framework I use:

    Step 1: Understand Your Data

    - Structured or flexible?
    - How are entities related?
    - How often does schema change?

    Step 2: Define Consistency Requirements

    - Can you tolerate eventual consistency?
    - Are there regulatory requirements?

    Step 3: Consider Scale

    - How much data?
    - Reads vs writes per second?
    - Need for global distribution?

    Step 4: Evaluate Your Team

    - What's your team familiar with?
    - Operational expertise available?

    Step 5: Start Simple, Scale Pragmatically

    - Don't optimize prematurely
    - Begin with what you know
    - Scale when you have real needs

    Common Pitfalls

    Choosing based on hype: Netflix's needs aren't your startup's needs.

    Premature optimization: PostgreSQL can handle millions of users if designed well.

    Ignoring operations: A database is only as good as your ability to back it up and recover from failures.

    Not planning for growth: Design your application with clear service boundaries so you can migrate later if needed.
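
    One way to keep that migration option open is to hide each data store behind a small interface. Here's a sketch (the names are illustrative, not a prescribed pattern): the rest of the application depends on the interface, so swapping PostgreSQL for MongoDB later stays a local change.

    python
    from abc import ABC, abstractmethod
    from typing import Any

    class PostStore(ABC):
        """Boundary the rest of the app talks to - no driver details leak out."""

        @abstractmethod
        def save_post(self, user_id: int, content: str) -> str: ...

        @abstractmethod
        def recent_posts(self, user_id: int, limit: int = 50) -> list[dict[str, Any]]: ...

    class InMemoryPostStore(PostStore):
        """Stand-in backend; a Postgres- or Mongo-backed store exposes the same shape."""

        def __init__(self) -> None:
            self._posts: list[dict[str, Any]] = []

        def save_post(self, user_id: int, content: str) -> str:
            post = {"id": str(len(self._posts) + 1), "user_id": user_id, "content": content}
            self._posts.append(post)
            return post["id"]

        def recent_posts(self, user_id: int, limit: int = 50) -> list[dict[str, Any]]:
            mine = [p for p in self._posts if p["user_id"] == user_id]
            return list(reversed(mine[-limit:]))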

    Key Takeaways

    1. No perfect database exists - every choice involves trade-offs
    2. Polyglot persistence is normal - use multiple database types
    3. Understand ACID and CAP - these have real implications
    4. Start simple, scale pragmatically - don't optimize prematurely
    5. Operations matter - consider backups, monitoring, recovery

    What's Next?

    Tomorrow we're diving into Caching Strategies (Day 5). You'll learn how to serve millions of requests without constantly hitting your database, understand cache invalidation patterns, and design multi-layer caching strategies.

    Understanding databases is crucial, but understanding how to avoid hitting them unnecessarily is just as important.

    ---

    Practice Exercise: Design database architecture for a ride-sharing app:

    - Driver and rider profiles
    - Real-time location tracking
    - Ride matching and pricing
    - Payment processing
    - Trip history and ratings

    For each data type, choose a database and justify why. What consistency level does it need? What's the read/write pattern? Share your design in the comments!
