How to Build Scalable Web Applications: A Deep How-To
This is a practical technical guide for building scalable web applications — not marketing fluff. I'll assume you already ship software, want reliability at scale, and care about measurable outcomes (latency, throughput, cost).
I'll cover architecture, core components, and tradeoffs, and show real snippets you can copy. The focus is on engineering decisions and where teams commonly screw this up. We'll walk through production patterns used by companies handling millions of requests daily.
Context & Core Constraints
Goal:
Scalability isn't just about handling traffic — it's about handling *growth gracefully*. Your system should survive spikes, scale back down, and remain cost-efficient. Many systems fail not because of load, but because they scale in unpredictable, fragile ways.
Common constraints you'll face:
- Budget limits — cloud costs matter; every architectural choice has a dollar sign attached. A poorly designed system can cost 10x more at scale.
- Team size and expertise — use patterns your team can actually debug at 3 AM. Complexity is the enemy of reliability.
- Data consistency requirements — understand if you can tolerate eventual consistency or need strong consistency for financial transactions.
- Time-to-market pressure — scaling problems aren't excuses to over-engineer early, but you must build with scaling in mind from day one.
- Technical debt — systems that scale poorly often have fundamental architectural flaws that are expensive to fix later.
The real constraint is observability of your limits. You can't scale what you don't measure. Before optimizing, measure where your latency and throughput start degrading. That's your baseline.
Critical:
// Simple performance tracking
class PerformanceTracker {
constructor() {
this.metrics = {
responseTimes: [],
errorCount: 0,
requestCount: 0
};
}
trackRequest(startTime, success = true) {
const duration = Date.now() - startTime;
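this.metrics.responseTimes.push(duration);
// Keep a bounded window so memory does not grow without limit
if (this.metrics.responseTimes.length > 10000) this.metrics.responseTimes.shift();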
this.metrics.requestCount++;
if (!success) this.metrics.errorCount++;
// Calculate p95 every 100 requests
if (this.metrics.requestCount % 100 === 0) {
this.calculatePercentiles();
}
}
calculatePercentiles() {
const sorted = [...this.metrics.responseTimes].sort((a, b) => a - b);
const p95Index = Math.floor(sorted.length * 0.95);
console.log(`P95 latency: ${sorted[p95Index]}ms`);
}
}
Architecture Overview (High Level)
A scalable web application is modular and separates responsibilities between layers. It's not just about microservices; it's about isolation of failure domains and clear communication contracts. Think of your architecture as a city — you need good roads (networking), zoning (separation of concerns), and emergency services (monitoring).
Edge / CDN
Handles caching and delivery of static content; acts as your first defense line for DDoS or spikes. CloudFront, Cloudflare, or Fastly.
API Gateway / Load Balancer
Routes requests, applies rate limiting, and performs lightweight authentication. AWS ALB, NGINX, or Kong.
Stateless App Layer
Processes requests; can scale horizontally by adding replicas. Containerized services in ECS, Kubernetes, or EKS.
Stateful Services
Databases, message queues, caches — where durability lives. RDS, Redis, Kafka with proper persistence.
Streaming / Event Backbone
Kafka, Pulsar, or similar for decoupled async workloads and real-time processing.
Observability & Ops
Logs, metrics, tracing — essential for debugging distributed behavior. Prometheus, Grafana, ELK stack.
CI/CD & IaC
Automate deployments, rollback, and reproducibility. Terraform, GitHub Actions, ArgoCD.
The diagram most engineers forget to draw is the one showing who depends on whom. A scalable architecture keeps dependency direction consistent — for instance, API calls flow downward (from gateway → app → data), while async events flow upward (from services → queue → consumers).
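One way to keep that dependency diagram honest is to check it mechanically. The sketch below assumes a hypothetical three-layer ranking (gateway, app, data) and a hand-maintained list of synchronous call edges. The service names and the `findDirectionViolations` helper are made up for illustration and do not come from any real tool.

```javascript
// Hypothetical layer ranks: a lower number sits closer to the edge.
const LAYER_RANK = { gateway: 0, app: 1, data: 2 };

// Return every synchronous call edge that points "upward" (toward the edge)
// or that references a service whose layer we do not know.
function findDirectionViolations(services, calls) {
  const violations = [];
  for (const [from, to] of calls) {
    const fromRank = LAYER_RANK[services[from]];
    const toRank = LAYER_RANK[services[to]];
    if (fromRank === undefined || toRank === undefined) {
      violations.push({ from, to, reason: 'unknown layer' });
    } else if (toRank < fromRank) {
      violations.push({ from, to, reason: 'calls upward' });
    }
  }
  return violations;
}

// Example data (all names are illustrative).
const services = { kong: 'gateway', userApi: 'app', postgres: 'data' };
const calls = [
  ['kong', 'userApi'],
  ['userApi', 'postgres'],
  ['postgres', 'userApi'], // a data store calling back into the app layer
];

console.log(findDirectionViolations(services, calls));
```

A check like this could run in CI against a service manifest. Async events are deliberately left out of `calls`, since they are allowed to flow the other way.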
Design Principle:
// User service composition
class UserService {
constructor({ db, cache, emailQueue }) {
this.db = db;
this.cache = cache;
this.emailQueue = emailQueue;
}
async createUser(userData) {
// Write to primary database
const user = await this.db.users.create(userData);
// Cache user data
await this.cache.set(`user:${user.id}`, user);
// Queue welcome email (async)
await this.emailQueue.publish('user.created', {
userId: user.id,
email: user.email
});
return user;
}
}
1. Make the App Stateless
Statelessness means any app instance can handle any request. No sticky sessions, no in-memory caches that matter, no local file writes. This allows you to spin up or terminate instances freely without breaking user sessions. Think of your application servers as cattle, not pets — they're identical and disposable.
Practical steps:
- Store sessions in Redis or Memcached, or use JWT if you can accept stateless tokens with careful expiration handling.
- Use S3 or object storage for file uploads; never rely on local disk which disappears when containers restart.
- Avoid global mutable state in memory — race conditions scale too, and they're harder to debug across multiple instances.
- Externalize configuration using environment variables or configuration services like etcd or AWS Parameter Store.
- Use distributed locks when you need coordination between instances, but prefer lock-free designs when possible.
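The "externalize configuration" step can be sketched as a small loader that reads settings from the environment and fails fast when something is missing. The variable names (`REDIS_URL`, `SESSION_SECRET`, `PORT`) and the `loadConfig` helper are assumptions for illustration, not a standard.

```javascript
// Read and validate settings from an environment-like object.
// The variable names below are illustrative assumptions.
function loadConfig(env) {
  const missing = ['REDIS_URL', 'SESSION_SECRET'].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required configuration: ${missing.join(', ')}`);
  }
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }
  // Freeze the result so no instance can mutate shared configuration later.
  return Object.freeze({
    redisUrl: env.REDIS_URL,
    sessionSecret: env.SESSION_SECRET,
    port,
  });
}

// In a real service you would call loadConfig(process.env) once at startup.
const config = loadConfig({
  REDIS_URL: 'redis://localhost:6379',
  SESSION_SECRET: 'change-me',
  PORT: '8080',
});
console.log(config.port);
```

Failing at startup is the point: a pod that crashes immediately on bad configuration is much easier to debug than one that fails on its first request.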
// server.js (Express)
const express = require('express');
const session = require('express-session');
const RedisStore = require('connect-redis')(session);
const redis = require('redis');
// Create Redis client with proper connection handling
const client = redis.createClient({
url: process.env.REDIS_URL,
retry_strategy: function(options) {
if (options.error && options.error.code === 'ECONNREFUSED') {
return new Error('The server refused the connection');
}
if (options.total_retry_time > 1000 * 60 * 60) {
return new Error('Retry time exhausted');
}
if (options.attempt > 10) {
return undefined;
}
return Math.min(options.attempt * 100, 3000);
}
});
const app = express();
// Stateless session configuration
app.use(session({
store: new RedisStore({ client }),
secret: process.env.SESSION_SECRET,
resave: false, // Don't resave unchanged sessions
saveUninitialized: false, // Don't save empty sessions
cookie: {
secure: process.env.NODE_ENV === 'production',
httpOnly: true, // Hide the cookie from client-side JavaScript (limits XSS damage)
maxAge: 86400000, // 24 hours
sameSite: 'lax'
},
name: 'sessionId' // Don't use default 'connect.sid'
}));
// Example stateless route
app.get('/api/profile', async (req, res) => {
if (!req.session.userId) {
return res.status(401).json({ error: 'Unauthorized' });
}
try {
// Fetch from DB — not from server memory
const user = await User.findById(req.session.userId);
res.json(user);
} catch (error) {
console.error('Profile fetch error:', error);
res.status(500).json({ error: 'Internal server error' });
}
});
// Health check endpoint for load balancers
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
timestamp: new Date().toISOString(),
uptime: process.uptime()
});
});
A stateless app scales horizontally without orchestration drama. When you deploy, you can kill any pod and the rest keep working. This also enables blue-green deployments and canary releases with minimal user impact.
Tradeoff:
const jwt = require('jsonwebtoken');
// Generate token
function generateToken(user) {
return jwt.sign(
{
userId: user.id,
role: user.role,
// Include minimal claims needed
},
process.env.JWT_SECRET,
{
expiresIn: '24h',
issuer: 'your-app-name',
subject: user.id.toString()
}
);
}
// Verify token middleware
function authenticateToken(req, res, next) {
const authHeader = req.headers['authorization'];
const token = authHeader && authHeader.split(' ')[1]; // Bearer TOKEN
if (!token) {
return res.status(401).json({ error: 'Access token required' });
}
jwt.verify(token, process.env.JWT_SECRET, (err, user) => {
if (err) {
return res.status(403).json({ error: 'Invalid or expired token' });
}
req.user = user;
next();
});
}
2. Scale Horizontally: Process Model & Connection Pooling
Scaling horizontally means adding more instances instead of making one instance bigger. This usually gives better cost control and fault tolerance. But it only works if your app is stateless and your shared dependencies (like the DB) can handle parallelism. Horizontal scaling follows the 'scale out, not up' principle — it's more resilient and cost-effective in cloud environments.
Node Clustering Example
// cluster.js - Utilizing all CPU cores
const cluster = require('cluster');
const os = require('os');
if (cluster.isPrimary) { // named cluster.isMaster before Node 16
console.log(`Master ${process.pid} is running`);
// Fork workers for each CPU core
const numCPUs = os.cpus().length;
console.log(`Forking for ${numCPUs} CPUs`);
for (let i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on('exit', (worker, code, signal) => {
console.log(`Worker ${worker.process.pid} died. Restarting...`);
cluster.fork(); // Restart the worker
});
// Graceful shutdown
process.on('SIGTERM', () => {
console.log('Master received SIGTERM, shutting down...');
for (const id in cluster.workers) {
cluster.workers[id].kill();
}
process.exit(0);
});
} else {
// Workers share the same port
require('./server');
console.log(`Worker ${process.pid} started`);
}
In Kubernetes, clustering inside the container is usually unnecessary: the Deployment runs multiple pod replicas for you, and your scaling strategy shifts from clustering to autoscaling policies. What still needs care is connection pooling, because every replica opens its own pool against the same database.
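To make the pooling concern concrete, here is the arithmetic as a rough sketch. The helper name and the numbers are assumptions for illustration; plug in your own database limit and autoscaler ceiling.

```javascript
// Largest per-process pool size that keeps the whole fleet under the database limit.
function maxPoolSizePerProcess({ dbMaxConnections, reservedConnections, maxReplicas, processesPerReplica }) {
  const available = dbMaxConnections - reservedConnections;
  const processes = maxReplicas * processesPerReplica;
  if (available <= 0 || processes <= 0) {
    throw new RangeError('Need a positive connection budget and at least one process');
  }
  return Math.floor(available / processes);
}

const size = maxPoolSizePerProcess({
  dbMaxConnections: 200,   // e.g. PostgreSQL max_connections (assumed value)
  reservedConnections: 20, // migrations, admin sessions, monitoring
  maxReplicas: 20,         // upper bound of your autoscaler
  processesPerReplica: 1,
});
console.log(size);
```

With these assumed numbers, a pool `max` of 20 per process would ask for 400 connections at full scale-out, twice what the database allows. If the result comes out at zero or close to it, that is usually the signal to put a pooler such as PgBouncer in front of the database.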
const { Pool } = require('pg');
// Configure connection pool
const pool = new Pool({
max: parseInt(process.env.PG_POOL_MAX, 10) || 20, // Maximum connections
min: parseInt(process.env.PG_POOL_MIN, 10) || 4, // Minimum connections
idleTimeoutMillis: 30000, // Close idle connections after 30s
connectionTimeoutMillis: 2000, // Fail fast if can't connect
maxUses: 7500, // Close connection after 7500 queries
connectionString: process.env.DATABASE_URL,
});
// Graceful shutdown
process.on('SIGINT', async () => {
console.log('Shutting down connection pool...');
await pool.end();
process.exit(0);
});
// Example usage with proper error handling
async function getUserById(userId) {
const client = await pool.connect();
try {
const result = await client.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
return result.rows[0];
} finally {
client.release(); // Always release the client back to pool
}
}
module.exports = {
pool,
getUserById
};
Critical Mistake:
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
3. Use Caching Smartly (and Measure Everything)
Caching is the cheapest performance multiplier you have, but also the most dangerous if you don't track hit rates or invalidation. Cache at the edge, then at the app layer, then at the query level. Remember: caching is a trade-off between freshness and performance — know your data's volatility.
- Edge CDN: Cache static assets, or even prerendered pages if content is predictable. Use Cache-Control headers effectively.
- Reverse proxy: Use Nginx or Varnish for response caching via proper Cache-Control headers and cache keys.
- App layer: Redis for hot DB queries, rate limiting, and temporary computations with appropriate TTLs.
- Code-level memoization: Cache heavy function outputs with TTLs, but beware of memory leaks in long-running processes.
- Database query cache: Some databases have built-in query caches, but they're often less effective than application-level caching.
The 80/20 rule applies — caching your slowest 20% of queries usually cuts 80% of latency pain. But measure your cache hit rates! A cache with 40% hit rate might be wasting resources.
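Measuring hit rate does not need heavy tooling. As a minimal sketch of code-level memoization with a TTL, a size cap, and hit/miss counters, consider the helper below. `memoizeWithTtl` is a hypothetical name, it only handles synchronous single-argument functions, and the clock is injectable purely so expiry can be demonstrated without waiting.

```javascript
// Memoize a single-argument function with a TTL, a size cap, and hit/miss counters.
function memoizeWithTtl(fn, { ttlMs, maxEntries = 1000, now = Date.now }) {
  const entries = new Map(); // key -> { value, expiresAt }
  const stats = { hits: 0, misses: 0 };

  function cached(key) {
    const entry = entries.get(key);
    if (entry && entry.expiresAt > now()) {
      stats.hits++;
      return entry.value;
    }
    stats.misses++;
    const value = fn(key);
    entries.delete(key); // re-insert so the key moves to the newest position
    entries.set(key, { value, expiresAt: now() + ttlMs });
    if (entries.size > maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest.
      entries.delete(entries.keys().next().value);
    }
    return value;
  }

  cached.hitRate = () => {
    const total = stats.hits + stats.misses;
    return total === 0 ? 0 : stats.hits / total;
  };
  return cached;
}

// Usage with a fake clock.
let clock = 0;
let calls = 0;
const slowSquare = (n) => { calls++; return n * n; };
const fastSquare = memoizeWithTtl(slowSquare, { ttlMs: 1000, now: () => clock });

fastSquare(4); // miss
fastSquare(4); // hit
clock = 1500;
fastSquare(4); // miss again: the entry expired
console.log(calls, fastSquare.hitRate());
```

The size cap is what keeps this from becoming the memory leak mentioned in the list above. Exporting `hitRate()` as a metric tells you whether the cache is earning its keep.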
Key Insight:
class UserServiceWithWriteThrough {
constructor({ db, cache }) {
this.db = db;
this.cache = cache;
}
async updateUser(userId, updates) {
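// Update database first. Assuming a Sequelize-style model on PostgreSQL:
// update() with `returning: true` resolves to [affectedCount, affectedRows]
const [, [updatedUser]] = await this.db.users.update(updates, {
where: { id: userId },
returning: true
});
if (!updatedUser) return null; // No matching row, nothing to cache
// Update cache immediately; serialize so getUser() can JSON.parse it
await this.cache.set(
`user:${userId}:v3`,
JSON.stringify(updatedUser),
'EX', 3600 // 1 hour TTL
);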
return updatedUser;
}
async getUser(userId) {
// Try cache first
const cached = await this.cache.get(`user:${userId}:v3`);
if (cached) return JSON.parse(cached);
// Fall back to database
const user = await this.db.users.findByPk(userId);
if (user) {
// Populate cache for next time
await this.cache.setex(
`user:${userId}:v3`,
3600,
JSON.stringify(user)
);
}
return user;
}
}
4. Decouple with Async / Message Queues
Async architecture lets your app breathe. Instead of blocking users while heavy jobs run, enqueue and process them later. This smooths spikes and improves UX. Message queues act as shock absorbers for your system, allowing components to work at their own pace without blocking each other.
- Send transactional emails asynchronously — no user should wait for email delivery
- Offload image/video processing to specialized workers
- Perform analytics or denormalization in background without affecting response times
- Integrate with 3rd parties without blocking requests — use webhooks or queue-based integration
- Handle batch operations that would timeout if done synchronously
const { Kafka, logLevel } = require('kafkajs');
const kafka = new Kafka({
clientId: 'user-service',
brokers: process.env.KAFKA_BROKERS.split(','),
logLevel: logLevel.ERROR,
retry: {
initialRetryTime: 100,
retries: 8,
maxRetryTime: 30000
}
});
const producer = kafka.producer();
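// Connect producer on startup. CommonJS has no top-level await,
// so handle the promise explicitly and fail fast if Kafka is unreachable.
producer.connect().catch((error) => {
console.error('Failed to connect Kafka producer:', error);
process.exit(1);
});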
class EventService {
constructor() {
this.producer = producer;
}
async publishUserEvent(eventType, userId, metadata = {}) {
const event = {
type: eventType,
userId,
timestamp: new Date().toISOString(),
service: 'user-service',
version: '1.0',
...metadata
};
try {
await this.producer.send({
topic: 'user-events',
messages: [
{
key: userId.toString(), // Same key ensures ordering for same user
value: JSON.stringify(event),
headers: {
'event-type': eventType,
'version': '1.0'
}
}
]
});
console.log(`Published ${eventType} event for user ${userId}`);
} catch (error) {
console.error('Failed to publish event:', error);
// In production, you might want to store failed events for retry
await this.storeFailedEvent(event, error);
}
}
async storeFailedEvent(event, error) {
// Store in database or dead letter queue for manual processing
console.error('Storing failed event:', event, error);
}
}
// Usage in user registration
app.post('/api/users', async (req, res) => {
try {
const user = await userService.create(req.body);
// Send response immediately
res.status(201).json(user);
// Queue async tasks
await eventService.publishUserEvent('user.registered', user.id, {
email: user.email,
plan: user.plan
});
} catch (error) {
console.error('User creation error:', error);
res.status(500).json({ error: 'Failed to create user' });
}
});
Pattern:
const consumer = kafka.consumer({
groupId: 'email-service',
sessionTimeout: 30000,
heartbeatInterval: 3000
});
// CommonJS has no top-level await, so wrap startup in an async function
async function startConsumer() {
await consumer.connect();
await consumer.subscribe({ topic: 'user-events', fromBeginning: false });
await consumer.run({
eachMessage: async ({ topic, partition, message }) => {
try {
const event = JSON.parse(message.value.toString());
// Offsets are only unique within a partition, so build the
// deduplication key from topic, partition, and offset
const eventKey = event.id || `${topic}:${partition}:${message.offset}`;
// Check if we've already processed this event
const processed = await checkIfProcessed(eventKey);
if (processed) {
console.log('Skipping already processed event:', eventKey);
return;
}
// Process based on event type
switch (event.type) {
case 'user.registered':
await sendWelcomeEmail(event.userId, event.email);
break;
case 'user.upgraded':
await sendUpgradeEmail(event.userId, event.plan);
break;
default:
console.log('Unknown event type:', event.type);
}
// Mark as processed
await markAsProcessed(eventKey);
} catch (error) {
console.error('Error processing message:', error);
// In production, send to dead letter queue
}
}
});
}
startConsumer().catch((error) => {
console.error('Consumer failed to start:', error);
process.exit(1);
});
5. Data Modeling: OLTP vs OLAP
Don't mix operational and analytical workloads on the same database. OLTP (transactions) and OLAP (analytics) have opposite access patterns. OLTP needs fast writes and point reads, while OLAP needs complex aggregations over large datasets. Trying to do both on the same system leads to contention and poor performance for both workloads.
- Use normalized SQL schemas for transactional safety and data integrity.
- Use denormalized stores (replicas, materialized views) for reads to avoid complex joins at query time.
- Feed analytics systems from event streams, not live queries, to avoid impacting user-facing operations.
- Consider time-series databases for metrics and monitoring data with high write volumes.
- Use document databases for flexible schemas when you have hierarchical or polymorphic data.
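-- Create materialized view for fast reporting.
-- Aggregate each child table separately: joining sessions and orders in one
-- pass would multiply order rows by session rows and inflate SUM(amount).
CREATE MATERIALIZED VIEW user_activity_summary AS
SELECT
u.id AS user_id,
u.email,
u.created_at,
COALESCE(s.session_count, 0) AS session_count,
COALESCE(o.order_count, 0) AS order_count,
COALESCE(o.total_spent, 0) AS total_spent,
s.last_active
FROM users u
LEFT JOIN (
SELECT user_id, COUNT(*) AS session_count, MAX(created_at) AS last_active
FROM sessions
GROUP BY user_id
) s ON s.user_id = u.id
LEFT JOIN (
SELECT user_id, COUNT(*) AS order_count, SUM(amount) AS total_spent
FROM orders
GROUP BY user_id
) o ON o.user_id = u.id
WHERE u.deleted_at IS NULL;
-- A unique index must exist before REFRESH ... CONCURRENTLY can be used
CREATE UNIQUE INDEX ON user_activity_summary (user_id);
-- Refresh periodically (could be triggered by change data capture)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_activity_summary;
6. Database Scaling: Replication, Sharding, Proxies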
Start simple. Use one primary and read replicas. Replicas absorb read-heavy traffic while your write path stays consistent. As you grow, you'll need to consider more advanced strategies like connection pooling, read/write splitting, and eventually sharding.
- Primary handles writes; replicas handle reporting or read-heavy endpoints with careful load balancing.
- Remember replication lag — a user might not see a recent write immediately. Design your UX to handle this gracefully.
- Use connection poolers like PgBouncer for PostgreSQL to handle many concurrent connections efficiently.
- Consider using a database proxy like ProxySQL for intelligent query routing and failover.
- For extreme scale, implement sharding by logical separation (tenants) or key ranges (user_id).
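One common way to soften replication lag is "read your own writes": for a short window after a user writes, send that user's reads to the primary. The sketch below is a rough illustration. `ReadRouter` is a hypothetical helper, the 5-second window is an assumed value, and the clock is injectable only so the behavior can be shown deterministically.

```javascript
// Route a user's reads to the primary for a short window after they write,
// so they see their own changes despite replication lag.
class ReadRouter {
  constructor({ stickyMs = 5000, now = Date.now } = {}) {
    this.stickyMs = stickyMs; // should exceed your worst observed replication lag
    this.now = now;
    this.lastWriteAt = new Map(); // userId -> timestamp of the latest write
  }

  recordWrite(userId) {
    this.lastWriteAt.set(userId, this.now());
  }

  targetForRead(userId) {
    const wroteAt = this.lastWriteAt.get(userId);
    if (wroteAt !== undefined && this.now() - wroteAt < this.stickyMs) {
      return 'primary';
    }
    this.lastWriteAt.delete(userId); // window elapsed, forget the entry
    return 'replica';
  }
}

let clock = 0;
const router = new ReadRouter({ stickyMs: 5000, now: () => clock });
router.recordWrite('user-1');
console.log(router.targetForRead('user-1')); // inside the window
clock = 6000;
console.log(router.targetForRead('user-1')); // window elapsed
```

The in-memory `Map` keeps the example short, but it only works within one process. In a stateless fleet the "last write" timestamp would need to live somewhere shared, such as the session store or a signed cookie.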
When write volume exceeds a single node, introduce sharding by key (e.g., user_id ranges). Or move specialized workloads (like metrics) to purpose-built stores (Cassandra, ClickHouse, TimescaleDB).
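Range-based sharding by `user_id` can be sketched as a lookup over sorted range boundaries. The shard names and boundaries below are invented for illustration; a real deployment would load them from configuration and needs a plan for rebalancing, which this sketch ignores.

```javascript
// Each shard owns user IDs from `minId` up to, but not including, the next shard's `minId`.
// Shard names and boundaries are made up for illustration.
const SHARDS = [
  { name: 'shard-a', minId: 0 },
  { name: 'shard-b', minId: 1_000_000 },
  { name: 'shard-c', minId: 2_000_000 },
];

// Binary search for the last shard whose minId is <= userId.
function shardForUser(userId, shards = SHARDS) {
  if (!Number.isInteger(userId) || userId < shards[0].minId) {
    throw new RangeError(`No shard covers user id ${userId}`);
  }
  let lo = 0;
  let hi = shards.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (shards[mid].minId <= userId) lo = mid;
    else hi = mid - 1;
  }
  return shards[lo].name;
}

console.log(shardForUser(42));
console.log(shardForUser(1_000_000));
console.log(shardForUser(5_000_000));
```

Ranges keep neighboring IDs together, which helps range scans but can create hot shards when new users are the most active. Hashing the key spreads load more evenly at the cost of locality.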
Tip:
const { Pool } = require('pg');
// Primary for writes
const primaryPool = new Pool({
host: process.env.DB_PRIMARY_HOST,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 10
});
// Replica for reads
const replicaPool = new Pool({
host: process.env.DB_REPLICA_HOST,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20 // More connections for reads
});
class DatabaseService {
constructor() {
this.primary = primaryPool;
this.replica = replicaPool;
}
// Use replica for reads
async findUserById(userId) {
const client = await this.replica.connect();
try {
const result = await client.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
return result.rows[0];
} finally {
client.release();
}
}
// Use primary for writes
async updateUser(userId, updates) {
const client = await this.primary.connect();
try {
const result = await client.query(
'UPDATE users SET name = $1, updated_at = NOW() WHERE id = $2 RETURNING *',
[updates.name, userId]
);
return result.rows[0];
} finally {
client.release();
}
}
}
7. Search & Analytics: Scaling Data Access Patterns
When your application grows beyond simple database queries, you need specialized solutions for search and analytics. Trying to run complex search queries or analytical workloads on your primary OLTP database will kill performance for both your users and your analytics.
Search Architecture Patterns
- Elasticsearch/OpenSearch — Full-text search with faceting, fuzzy matching, and relevance scoring
- Algolia — Managed search service with excellent UX and performance
- PostgreSQL Full-Text Search — Good enough for basic search needs without additional infrastructure
- ClickHouse — For analytical queries on large datasets with sub-second response times
const { Client } = require('@elastic/elasticsearch');
const esClient = new Client({
node: process.env.ELASTICSEARCH_URL,
auth: {
username: process.env.ELASTICSEARCH_USERNAME,
password: process.env.ELASTICSEARCH_PASSWORD
}
});
class SearchService {
constructor() {
this.client = esClient;
this.indexName = 'products';
}
async indexProduct(product) {
await this.client.index({
index: this.indexName,
id: product.id.toString(),
body: {
id: product.id,
name: product.name,
description: product.description,
category: product.category,
price: product.price,
tags: product.tags,
created_at: product.created_at,
// Search-specific fields
suggest: {
input: [product.name, ...product.tags],
weight: 1
}
}
});
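// Force a refresh so the document is searchable right away.
// This is expensive under load; in production, prefer the index refresh interval.
await this.client.indices.refresh({ index: this.indexName });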
}
async searchProducts(query, filters = {}) {
const mustClauses = [];
// Text search across multiple fields
if (query) {
mustClauses.push({
multi_match: {
query: query,
fields: ['name^3', 'description^2', 'tags'], // Boost certain fields
fuzziness: 'AUTO'
}
});
}
// Add filters
if (filters.category) {
mustClauses.push({
term: { 'category.keyword': filters.category } // exact match on the keyword sub-field used by the aggregation below
});
}
if (filters.minPrice !== undefined) {
mustClauses.push({
range: { price: { gte: filters.minPrice } }
});
}
const searchBody = {
query: {
bool: {
must: mustClauses
}
},
aggs: {
categories: {
terms: { field: 'category.keyword' }
},
price_ranges: {
range: {
field: 'price',
ranges: [
{ to: 50 },
{ from: 50, to: 100 },
{ from: 100, to: 200 },
{ from: 200 }
]
}
}
},
highlight: {
fields: {
name: {},
description: {}
}
},
size: 20,
from: filters.offset || 0
};
const result = await this.client.search({
index: this.indexName,
body: searchBody
});
return {
hits: result.body.hits.hits.map(hit => ({
...hit._source,
highlight: hit.highlight
})),
total: result.body.hits.total.value,
aggregations: result.body.aggregations
};
}
}
8. Autoscaling in Production — Beyond Basic Metrics
Autoscaling isn't just about CPU and memory. Effective autoscaling considers application-level metrics, queue depths, and business indicators. The goal is to have enough capacity to handle load while minimizing costs.
Advanced Autoscaling Strategies
- Horizontal Pod Autoscaling (HPA) — Scale based on CPU, memory, or custom metrics
- Vertical Pod Autoscaling (VPA) — Adjust resource requests/limits for pods
- Cluster Autoscaling — Add/remove nodes from your cluster
- Custom Metrics — Scale based on queue depth, request latency, or business metrics
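It helps to know the arithmetic the Horizontal Pod Autoscaler applies to whichever metric you feed it. The sketch below reproduces the core ratio from the Kubernetes documentation (desired = ceil(current replicas × current metric / target metric), with a default tolerance of 0.1). The real controller also handles pod readiness, missing metrics, stabilization windows, and scaling policies, all of which this simplified function ignores.

```javascript
// Core HPA ratio: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
// No change is made when the ratio is within the tolerance; the result is clamped to min/max.
function desiredReplicas({ currentReplicas, currentMetric, targetMetric, minReplicas, maxReplicas, tolerance = 0.1 }) {
  const ratio = currentMetric / targetMetric;
  if (Math.abs(ratio - 1) <= tolerance) {
    return currentReplicas; // close enough to target: leave the replica count alone
  }
  const desired = Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// CPU at 90% against a 70% target with 4 replicas.
console.log(desiredReplicas({ currentReplicas: 4, currentMetric: 90, targetMetric: 70, minReplicas: 2, maxReplicas: 50 }));

// Average consumer lag of 2500 messages per pod against a target of 1000, with 10 replicas.
console.log(desiredReplicas({ currentReplicas: 10, currentMetric: 2500, targetMetric: 1000, minReplicas: 2, maxReplicas: 50 }));
```

When several metrics are configured, the HPA computes a desired count for each and uses the largest. That is why a queue-depth metric can drive a scale-up even while CPU looks idle.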
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: kafka_lag_messages
        target:
          type: AverageValue
          averageValue: "1000"
    - type: Object
      object:
        metric:
          name: http_requests_per_second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: main-ingress
        target:
          type: Value
          value: "1000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 10
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
const client = require('prom-client');
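const express = require('express');
const Redis = require('ioredis'); // Assumed client; any Redis client exposing llen/scard works
const redis = new Redis(process.env.REDIS_URL);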
// Create custom metrics
const orderQueueDepth = new client.Gauge({
name: 'order_queue_depth',
help: 'Current number of orders waiting in queue',
labelNames: ['queue_name']
});
const activeUserSessions = new client.Gauge({
name: 'active_user_sessions',
help: 'Number of currently active user sessions'
});
const p95ResponseTime = new client.Gauge({
name: 'http_request_duration_seconds_p95',
help: '95th percentile of HTTP request duration',
labelNames: ['method', 'route', 'status_code']
});
class MetricsCollector {
constructor() {
this.app = express();
this.setupMetricsEndpoint();
}
setupMetricsEndpoint() {
this.app.get('/metrics', async (req, res) => {
try {
// Update custom metrics
await this.updateCustomMetrics();
res.set('Content-Type', client.register.contentType);
res.end(await client.register.metrics());
} catch (error) {
res.status(500).end(error.message); // end() accepts a string or Buffer, not an Error object
}
});
}
async updateCustomMetrics() {
try {
// Update queue depth from Redis
const queueDepth = await redis.llen('orders:processing');
orderQueueDepth.set({ queue_name: 'orders' }, queueDepth);
// Update active sessions
const sessionCount = await redis.scard('active_sessions');
activeUserSessions.set(sessionCount);
// These would be updated from your request metrics
} catch (error) {
console.error('Failed to update metrics:', error);
}
}
start(port = 3001) {
this.app.listen(port, () => {
console.log(`Metrics server running on port ${port}`);
});
}
}
// Usage
const metrics = new MetricsCollector();
metrics.start();
9. Observability — Don't Fly Blind
The hardest part of scalability isn't adding servers — it's knowing what's breaking when load hits. Observability is your radar. It's not just about monitoring known issues, but about exploring unknown unknowns. Good observability lets you ask arbitrary questions about your system's behavior.
- Collect metrics (latency, throughput, errors, memory, DB connections) with context and dimensions.
- Trace distributed requests (OpenTelemetry or Jaeger) to understand complex call chains.
- Aggregate logs centrally — grep doesn't scale across distributed systems.
- Set up alerting that wakes you up for real problems, not noise.
- Use structured logging with correlation IDs to trace requests across services.
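Passing a correlation ID through every function signature gets tedious fast. One option in Node is `AsyncLocalStorage` from the built-in `async_hooks` module, which carries a per-request context across async calls. The sketch below is a minimal illustration; `withCorrelationId`, `logLine`, and `loadUser` are hypothetical helpers, and the log format is an assumption.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

const requestContext = new AsyncLocalStorage();

// Run `handler` with a correlation ID that is visible to everything it calls.
function withCorrelationId(incomingId, handler) {
  const correlationId = incomingId || randomUUID();
  return requestContext.run({ correlationId }, handler);
}

// Build one structured log line; the ID is picked up from the ambient context.
function logLine(level, message, fields = {}) {
  const store = requestContext.getStore();
  return JSON.stringify({
    level,
    message,
    correlationId: store ? store.correlationId : null,
    ...fields,
  });
}

// A function deep in the call stack needs no extra parameter to log the ID.
function loadUser(userId) {
  return logLine('info', 'Fetching user', { userId });
}

const line = withCorrelationId('req-123', () => loadUser(42));
console.log(line);
```

In an Express app, the `requestContext.run(...)` call would typically wrap `next()` inside a middleware, so every log statement during that request picks up the same ID.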
Golden Rule:
const { createLogger, format, transports } = require('winston');
const { v4: uuidv4 } = require('uuid');
// Create logger with structured format
const logger = createLogger({
level: 'info',
format: format.combine(
format.timestamp(),
format.errors({ stack: true }),
format.json()
),
defaultMeta: { service: 'user-api' },
transports: [
new transports.Console(),
new transports.File({ filename: 'error.log', level: 'error' }),
new transports.File({ filename: 'combined.log' })
]
});
// Middleware to add correlation ID to each request
function correlationMiddleware(req, res, next) {
const correlationId = req.headers['x-correlation-id'] || uuidv4();
req.correlationId = correlationId;
res.setHeader('X-Correlation-ID', correlationId);
// Add to logger context
req.logger = logger.child({ correlationId });
next();
}
// Usage in route handlers
app.get('/api/users/:id', correlationMiddleware, async (req, res) => {
const startTime = Date.now();
try {
req.logger.info('Fetching user', { userId: req.params.id });
const user = await userService.findById(req.params.id);
req.logger.info('User fetched successfully', {
userId: req.params.id,
duration: Date.now() - startTime
});
res.json(user);
} catch (error) {
req.logger.error('Failed to fetch user', {
userId: req.params.id,
error: error.message,
stack: error.stack,
duration: Date.now() - startTime
});
res.status(500).json({ error: 'Internal server error' });
}
});
10. Reliability Patterns: Building Resilient Systems
Reliability isn't about preventing failures — it's about designing systems that continue working when components fail. Distributed systems fail in complex ways, and your architecture should embrace this reality.
Essential Reliability Patterns
- Circuit Breaker — Prevent cascading failures when dependencies are down
- Retry with Exponential Backoff — Handle transient failures gracefully
- Bulkheads — Isolate failures to specific components
- Timeouts — Never wait indefinitely for responses
- Dead Letter Queues — Handle messages that can't be processed
- Health Checks — Enable load balancers to route traffic away from unhealthy instances
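Of the patterns listed, the bulkhead is the simplest to sketch: cap the number of in-flight calls to one dependency and reject the overflow immediately. The `Bulkhead` class below is a hypothetical minimal version with no queueing and no metrics.

```javascript
// Cap the number of in-flight calls to one dependency; reject the overflow
// immediately instead of letting requests pile up behind a slow service.
class Bulkhead {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.inFlight = 0;
  }

  async run(task) {
    if (this.inFlight >= this.maxConcurrent) {
      throw new Error('Bulkhead full');
    }
    this.inFlight++;
    try {
      return await task();
    } finally {
      this.inFlight--; // free the slot whether the task succeeded or failed
    }
  }
}

// Usage: two hung calls occupy both slots, so the third is rejected right away.
const paymentsBulkhead = new Bulkhead(2);
const never = () => new Promise(() => {}); // stand-in for a hung dependency
paymentsBulkhead.run(never);
paymentsBulkhead.run(never);
paymentsBulkhead.run(never).catch((err) => console.log(err.message));
```

Giving each downstream dependency its own bulkhead means a hung payment provider can exhaust only its own slots, not every request handler in the process.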
class CircuitBreaker {
constructor(timeout = 10000, failureThreshold = 5, resetTimeout = 60000) {
this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
this.failureCount = 0;
this.successCount = 0;
this.nextAttempt = Date.now();
this.timeout = timeout;
this.failureThreshold = failureThreshold;
this.resetTimeout = resetTimeout;
this.lastFailureTime = null;
}
async call(serviceFunction, ...args) {
if (this.state === 'OPEN') {
if (Date.now() < this.nextAttempt) {
throw new Error('Circuit breaker is OPEN');
}
// Reset window elapsed: let trial calls through and count them from zero
this.state = 'HALF_OPEN';
this.successCount = 0;
}
let timer;
try {
const timeoutPromise = new Promise((_, reject) => {
timer = setTimeout(() => reject(new Error('Timeout')), this.timeout);
});
const result = await Promise.race([serviceFunction(...args), timeoutPromise]);
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
} finally {
clearTimeout(timer); // Don't leave the timer pending after the call settles
}
}
onSuccess() {
this.failureCount = 0;
this.successCount++;
if (this.state === 'HALF_OPEN' && this.successCount >= this.failureThreshold) {
this.state = 'CLOSED';
this.successCount = 0;
}
}
onFailure() {
this.failureCount++;
this.lastFailureTime = Date.now();
// Any failure while HALF_OPEN reopens the circuit immediately
if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
this.state = 'OPEN';
this.nextAttempt = Date.now() + this.resetTimeout;
}
}
getStatus() {
return {
state: this.state,
failureCount: this.failureCount,
successCount: this.successCount,
nextAttempt: this.nextAttempt,
lastFailureTime: this.lastFailureTime
};
}
}
// Usage with payment service
const paymentCircuitBreaker = new CircuitBreaker(5000, 3, 30000);
async function processPayment(paymentData) {
return await paymentCircuitBreaker.call(
paymentService.process.bind(paymentService),
paymentData
);
}
async function retryWithBackoff(operation, maxRetries = 5, baseDelay = 1000) {
let lastError;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await operation();
} catch (error) {
lastError = error;
// Don't retry on certain errors
if (error.isNonRetriable) {
break;
}
if (attempt === maxRetries) {
break;
}
// Exponential backoff with jitter
const delay = baseDelay * Math.pow(2, attempt - 1);
const jitter = delay * 0.1 * Math.random();
const totalDelay = delay + jitter;
console.log(`Attempt ${attempt} failed, retrying in ${Math.round(totalDelay)}ms: ${error.message}`);
await new Promise(resolve => setTimeout(resolve, totalDelay));
}
}
throw lastError;
}
// Usage
async function sendEmailWithRetry(emailData) {
return await retryWithBackoff(
() => emailService.send(emailData),
5, // max retries
1000 // base delay (1 second)
);
}
11. Security at Scale — Protecting Distributed Systems
Security becomes more complex as systems scale. Attack surfaces multiply, and traditional perimeter security becomes insufficient. You need defense in depth with security controls at every layer.
Scalable Security Practices
- Zero Trust Architecture — Verify every request, regardless of source
- API Rate Limiting — Prevent abuse and DDoS attacks
- Service-to-Service Authentication — Use mTLS or JWT for internal communication
- Secret Management — Never store secrets in code or configuration files
- Network Policies — Control traffic flow between services
- Regular Security Scanning — Automated vulnerability detection in CI/CD
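Rate limiting is the easiest of these practices to show in code. Below is a minimal token bucket sketch: each client gets a bucket that refills at a steady rate, and a request is allowed only if a token is available. The `TokenBucket` class and its parameters are illustrative assumptions, and the clock is injectable so the refill can be demonstrated without waiting.

```javascript
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = Date.now }) {
    this.capacity = capacity;               // maximum burst size
    this.refillPerSecond = refillPerSecond; // sustained request rate
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true and spends one token if the request is allowed.
  tryRemoveToken() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

let clock = 0;
const bucket = new TokenBucket({ capacity: 2, refillPerSecond: 1, now: () => clock });
console.log(bucket.tryRemoveToken(), bucket.tryRemoveToken(), bucket.tryRemoveToken());
clock = 1000; // one second later, one token has been refilled
console.log(bucket.tryRemoveToken());
```

This version is per-process. With many stateless replicas, the counters need to live in the gateway or in a shared store such as Redis, otherwise each replica enforces its own separate limit.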
12. CI/CD & Infrastructure as Code — Scaling Development
As your team and application grow, manual deployment processes become bottlenecks and sources of errors. CI/CD and Infrastructure as Code (IaC) enable you to scale your development process while maintaining reliability and velocity.
Modern CI/CD Pipeline Components
- Infrastructure as Code — Terraform, CloudFormation, or Pulumi for reproducible environments
- GitOps — Use Git as the single source of truth for both application and infrastructure
- Progressive Delivery — Canary deployments, feature flags, and blue-green deployments
- Automated Testing — Unit, integration, and end-to-end tests that run on every change
- Security Scanning — SAST, DAST, and dependency vulnerability scanning in pipeline
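Progressive delivery usually comes down to one question: is this user in the rollout or not? A common approach is to hash the user ID into a stable bucket and compare it against the rollout percentage. The sketch below uses Node's built-in `crypto` module. The flag name and the helper functions are made up for illustration.

```javascript
const { createHash } = require('node:crypto');

// Map a user to a stable bucket in [0, 100) for a given flag.
// Including the flag name means different flags get independent buckets.
function rolloutBucket(flagName, userId) {
  const digest = createHash('sha256').update(`${flagName}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

// A user is in the canary when their bucket falls below the rollout percentage.
function isInCanary(flagName, userId, percentage) {
  return rolloutBucket(flagName, userId) < percentage;
}

console.log(isInCanary('new-checkout', 'user-42', 10));
```

Because the bucket depends only on the flag and the user, every stateless instance gives the same answer without coordination, and raising the percentage only ever adds users to the canary.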
Final Lessons & Pitfalls
Remember that scalability is a journey, not a destination. Start with the simplest architecture that meets your current needs, but build it in a way that allows for evolution. Monitor everything, automate relentlessly, and always have a rollback plan.