Linux Performance Monitoring and Optimization: sar, iostat, vmstat and Beyond

It's Monday morning, 9 AM. Your e-commerce website is crawling, database queries are timing out, and customer complaints are flooding in. Revenue is dropping by the minute, and everyone's looking at you to fix it. You have tools, you have access, but where do you start? How do you quickly identify whether it's CPU, memory, disk I/O, or network that's causing the bottleneck?

This scenario plays out in IT departments worldwide every day. As someone who has debugged performance issues ranging from overloaded web servers to misconfigured databases, I can tell you that the difference between a good system administrator and a great one isn't knowing every performance tool—it's knowing which tool to use when, and how to interpret what it's telling you.

Today, we'll master the art and science of Linux performance monitoring and optimization. We'll explore the power trio of sar, iostat, and vmstat, along with advanced techniques for identifying and resolving performance bottlenecks before they become business-critical issues.

Understanding Linux Performance: The Four Pillars

The Performance Resource Model

Every performance issue falls into one of four categories:

plaintext
System Performance Resources:

CPU      Memory     Storage     Network
 ↓         ↓          ↓          ↓
Usage    Pressure   I/O Wait   Bandwidth
Queues   Swapping   Latency    Packet Loss
Context  Leaks      IOPS       Congestion

Performance is a chain—the weakest link determines overall system performance.

Performance Methodology: USE Method

  • Utilization: How busy is the resource?
  • Saturation: Is there more work than the resource can handle?
  • Errors: Are there any error conditions?

This methodology helps systematically analyze each resource type.
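
A minimal sketch of applying USE with standard tools (mpstat and iostat ship with the sysstat package; the cutoffs and their interpretation are up to you):

bash
# Utilization: how busy each resource is
mpstat 1 1 | awk '/Average/ {printf "CPU busy: %.1f%%\n", 100-$NF}'
iostat -dx 1 2 | awk '/^[a-z]/ {u[$1]=$NF} END {for (d in u) print d, "util:", u[d] "%"}'
free | awk 'NR==2 {printf "Memory used: %.1f%%\n", $3/$2*100}'

# Saturation: more work queued than the resource can service
vmstat 1 2 | tail -1 | awk -v c=$(nproc) '{print "run queue:", $1, "(CPUs: " c ")  si/so:", $7 "/" $8, "  I/O wait:", $16 "%"}'

# Errors: anything the kernel or drivers are complaining about
dmesg --level=err,warn 2>/dev/null | tail -5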

The Essential Performance Monitoring Trinity

1. vmstat: Virtual Memory Statistics

vmstat provides a snapshot of system activity including processes, memory, paging, block I/O, and CPU usage.

Basic vmstat Usage

bash
# Display current system status
vmstat

# Monitor every 2 seconds, 10 times
vmstat 2 10

# Display in MB instead of KB
vmstat -S M 2 5

Understanding vmstat Output

plaintext
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1843200  80532 947012    0    0     5    23   45   78  3  1 96  0  0

Key Metrics Explained:

Processes (procs):

  • r: Runnable processes (higher than CPU count = CPU pressure)
  • b: Blocked processes waiting for I/O

Memory:

  • free: Available memory
  • buff: Buffers (metadata cache)
  • cache: Page cache (file data cache)
  • si/so: Swap in/out (should be 0 for good performance)

I/O:

  • bi/bo: Blocks in/out per second
  • High values indicate disk bottleneck

System:

  • in: Interrupts per second
  • cs: Context switches per second (high values = CPU thrashing)

CPU:

  • us: User time
  • sy: System time
  • id: Idle time
  • wa: I/O wait (high = storage bottleneck)
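
A compact way to turn the thresholds above into a quick check (the cutoff values here are illustrative, not canonical):

bash
# Flag the usual vmstat warning signs in one pass (second sample = live interval)
vmstat 1 2 | tail -1 | awk -v cpus=$(nproc) '{
    if ($1 > cpus)        print "CPU pressure: runnable (" $1 ") exceeds CPU count (" cpus ")"
    if ($7 > 0 || $8 > 0) print "Memory pressure: swapping active (si=" $7 " so=" $8 ")"
    if ($16 > 20)         print "Storage pressure: I/O wait at " $16 "%"
    if ($12 > 10000)      print "Possible CPU thrashing: " $12 " context switches/sec"
}'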

Real-World vmstat Analysis

    Scenario 1: Memory Pressure

    bash
    $ vmstat 1 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     4  2  45032   3421  15234  89234  123   89    45    67  892 1543 15  8 65 12  0

    Analysis:

  • High si/so (swap activity) = memory pressure
  • r=4 on dual-core system = CPU pressure
  • wa=12% = some I/O waiting
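
    When vmstat shows swapping like this, a useful follow-up (a simple sketch reading VmSwap from /proc) is to see which processes actually hold swap:

    bash
    # List the top swap consumers by reading VmSwap from /proc/<pid>/status
    for pid in /proc/[0-9]*; do
        swap=$(awk '/^VmSwap:/ {print $2}' "$pid/status" 2>/dev/null)
        if [ -n "$swap" ] && [ "$swap" -gt 0 ]; then
            echo "$swap kB  $(basename "$pid")  $(cat "$pid/comm" 2>/dev/null)"
        fi
    done | sort -rn | head -10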

    2. iostat: I/O Statistics

    iostat monitors disk I/O performance and provides detailed statistics about storage subsystem performance.

    Basic iostat Usage

    bash
    # Display I/O statistics
    iostat
    
    # Monitor every 2 seconds
    iostat 2
    
    # Extended statistics with more details
    iostat -x 1 5
    
    # Monitor specific devices
    iostat -x /dev/sda /dev/sdb 2

    Understanding iostat Output

    bash
    $ iostat -x 1 5
    Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    sda               0.12     1.85    2.45    8.67    45.23   156.89    36.32     0.08    7.23    5.67    7.89   2.45   2.8

    Critical Metrics:

  • r/s, w/s: Read/write requests per second
  • rkB/s, wkB/s: Read/write KB per second
  • avgqu-sz: Average queue size (high = bottleneck)
  • await: Average wait time (milliseconds)
  • svctm: Service time (milliseconds)
  • %util: Device utilization percentage

    Performance Thresholds

    bash
    # Good Performance Indicators:
    # %util < 80%          # Device not saturated
    # await < 10ms         # Low latency
    # avgqu-sz < 2         # No queuing pressure
    
    # Warning Signs:
    # %util > 90%          # Device saturated
    # await > 50ms         # High latency
    # avgqu-sz > 5         # Queue pressure

    3. sar: System Activity Reporter

    sar is the most comprehensive performance monitoring tool, capable of collecting and reporting virtually every system metric.

    Essential sar Commands

    bash
    # CPU utilization
    sar -u 1 10
    
    # Memory usage
    sar -r 1 10
    
    # I/O statistics
    sar -b 1 10
    
    # Network statistics
    sar -n DEV 1 10
    
    # Load average
    sar -q 1 10
    
    # All statistics
    sar -A 1 5

    Historical Analysis with sar

    bash
    # View yesterday's data
    sar -u -f /var/log/sysstat/saXX
    
    # Specific time range
    sar -u -s 09:00:00 -e 17:00:00
    
    # Generate daily report
    sar -A > daily-performance-report.txt
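
    Historical reports only exist if the sysstat collector is actually recording data. On Debian/Ubuntu-style layouts that typically means the following (paths and service names are assumptions; RHEL keeps its data under /var/log/sa):

    bash
    # Enable periodic data collection (Debian/Ubuntu layout)
    sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
    sudo systemctl enable --now sysstat
    
    # Confirm daily data files are being written
    ls -l /var/log/sysstat/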

    Advanced Performance Analysis Techniques

    CPU Performance Deep Dive

    Identifying CPU Bottlenecks

    bash
    #!/bin/bash
    # cpu-analysis.sh - Comprehensive CPU analysis
    
    echo "=== CPU Analysis Report ==="
    echo "Date: $(date)"
    echo
    
    # Current CPU usage
    echo "1. Current CPU Usage:"
    vmstat 1 5 | tail -1 | awk '{print "User: " $13 "%, System: " $14 "%, Idle: " $15 "%, I/O Wait: " $16 "%"}'
    echo
    
    # Top CPU consuming processes
    echo "2. Top CPU Consumers:"
    ps aux --sort=-%cpu | head -10 | awk '{printf "%-15s %-8s %-8s %s\n", $1, $2, $3, $11}'
    echo
    
    # Load average analysis
    echo "3. Load Average Analysis:"
    uptime | awk -F'load average:' '{print $2}' | awk '{
        load1=$1; load5=$2; load15=$3;
        gsub(",", "", load1); gsub(",", "", load5); gsub(",", "", load15);
        print "1min: " load1 ", 5min: " load5 ", 15min: " load15
        
        # Get CPU count
        "nproc" | getline cpus
        
        if(load1 > cpus) print "WARNING: 1-minute load exceeds CPU count"
        if(load5 > cpus * 0.8) print "CAUTION: 5-minute load approaching CPU limit"
    }'
    echo
    
    # Context switches and interrupts
    echo "4. System Activity:"
    vmstat 1 3 | tail -1 | awk '{print "Context Switches/sec: " $12 ", Interrupts/sec: " $11}'
    
    # Check for high context switches (>10000/sec is high)
    cs_rate=$(vmstat 1 3 | tail -1 | awk '{print $12}')
    if [[ $cs_rate -gt 10000 ]]; then
        echo "WARNING: High context switch rate detected"
    fi

    CPU Optimization Strategies

    bash
    # 1. Process Priority Management
    # Lower priority for background tasks
    nice -n 19 backup-script.sh
    renice 10 -p $(pgrep backup-process)
    
    # Higher priority for critical services
    renice -10 -p $(pgrep mysql)
    
    # 2. CPU Affinity for Performance-Critical Applications
    # Bind database to specific CPUs
    taskset -cp 0,1 $(pgrep mysql)
    
    # 3. Check CPU Governor Settings
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
    # Set performance mode for high-load scenarios
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
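
    To confirm those changes took effect, a quick verification sketch (assuming cpufreq is exposed under /sys and a MySQL process exists) might look like:

    bash
    # Verify the active governor and watch the resulting clock speeds
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    grep "cpu MHz" /proc/cpuinfo | sort -rn -k4 | head -4
    
    # Confirm the CPU affinity applied with taskset earlier
    taskset -cp $(pgrep -o mysql)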

    Memory Performance Analysis

    Memory Monitoring Script

    bash
    #!/bin/bash
    # memory-analysis.sh - Comprehensive memory analysis
    
    echo "=== Memory Analysis Report ==="
    echo "Date: $(date)"
    echo
    
    # Overall memory usage
    echo "1. Memory Overview:"
    # Use raw KiB figures (not -h) so the percentage arithmetic is reliable
    free | awk 'NR==2{printf "Used: %.1f%%, Available: %.1f GiB\n", $3/$2*100, $7/1048576}'
    echo
    
    # Memory pressure indicators
    echo "2. Memory Pressure Indicators:"
    echo "Swap Usage:"
    free | awk 'NR==3{if($2>0) printf "Swap: %s/%s (%.1f%%)\n", $3, $2, $3/$2*100; else print "No swap configured"}'
    
    echo "Page Faults:"
    vmstat 1 3 | tail -1 | awk '{print "Major faults/sec: " $8 ", Minor faults/sec: " $7}'
    
    # Memory-hungry processes
    echo "3. Top Memory Consumers:"
    ps aux --sort=-%mem | head -10 | awk '{printf "%-15s %-8s %-8s %s\n", $1, $2, $4, $11}'
    echo
    
    # Check for memory leaks
    echo "4. Memory Leak Detection:"
    echo "Checking for processes with unusual memory growth..."
    for pid in $(ps -eo pid --no-headers); do
        if [[ -f "/proc/$pid/status" ]]; then
            vmsize=$(grep VmSize /proc/$pid/status 2>/dev/null | awk '{print $2}')
            vmrss=$(grep VmRSS /proc/$pid/status 2>/dev/null | awk '{print $2}')
            cmd=$(ps -p $pid -o comm= 2>/dev/null)
            
            # Skip kernel threads (no VmSize/VmRSS) and flag unusually large processes
            if [[ -n "$vmsize" && -n "$vmrss" && $vmsize -gt 1000000 && $vmrss -gt 500000 ]]; then
                echo "Large process: PID $pid ($cmd) - VmSize: ${vmsize}kB, VmRSS: ${vmrss}kB"
            fi
        fi
    done

    Memory Optimization Techniques

    bash
    # 1. Tune Virtual Memory Settings
    echo "# Memory tuning" >> /etc/sysctl.conf
    echo "vm.swappiness=10" >> /etc/sysctl.conf          # Reduce swapping
    echo "vm.dirty_ratio=5" >> /etc/sysctl.conf          # Reduce dirty page cache
    echo "vm.dirty_background_ratio=2" >> /etc/sysctl.conf
    
    # Apply immediately
    sysctl -p
    
    # 2. Configure Huge Pages for databases
    # Calculate required huge pages (for 8GB database buffer)
    echo 4096 > /proc/sys/vm/nr_hugepages
    
    # 3. Monitor slab cache (kernel memory usage)
    cat /proc/slabinfo | head -20
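
    A quick sanity check after applying these settings (the values mirror the example above; the huge-page math assumes the default 2 MiB page size):

    bash
    # Confirm the sysctl values actually in effect
    sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio
    
    # Confirm huge pages were allocated (4096 x 2 MiB = 8 GiB)
    grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo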

    Storage I/O Performance Analysis

    Advanced I/O Monitoring

    bash
    #!/bin/bash
    # io-analysis.sh - Comprehensive I/O performance analysis
    
    echo "=== Storage I/O Analysis Report ==="
    echo "Date: $(date)"
    echo
    
    # Current I/O activity
    echo "1. Current I/O Activity:"
    iostat -x 1 3 | grep -E "(Device|sd|md|dm-)" | tail -20
    
    echo
    echo "2. I/O Performance Summary:"
    iostat -x 1 3 | awk '
    # Only evaluate device lines; field positions follow the legacy iostat -x layout shown earlier
    $1 !~ /^(sd|nvme|vd|xvd|hd|md|dm-)/ { next }
    {
        device=$1; util=$NF; await=$(NF-4); avgqu=$(NF-5)
        if(util > 80) printf "WARNING: %s utilization: %.1f%%\n", device, util
        if(await > 20) printf "WARNING: %s average wait: %.1fms\n", device, await
        if(avgqu > 2) printf "WARNING: %s queue size: %.1f\n", device, avgqu
    }'
    
    echo
    echo "3. Top I/O Processes:"
    iotop -a -o -d 1 -n 3 2>/dev/null | grep -v "^$" | head -10
    
    echo
    echo "4. Filesystem I/O Statistics:"
    for mount in $(mount | grep -E 'ext[234]|xfs|btrfs' | awk '{print $3}'); do
        echo "Mount: $mount"
        if command -v iotop >/dev/null; then
            iotop -a -P -d 1 -n 1 2>/dev/null | grep "$mount" | head -5
        fi
    done
    
    echo
    echo "5. Disk Space and Inode Usage:"
    df -h | grep -v tmpfs
    echo
    df -i | grep -v tmpfs | awk 'NR>1 && $5+0 > 80 {print "WARNING: " $1 " inode usage: " $5}'

    I/O Optimization Strategies

    bash
    # 1. Optimize mount options for different workloads (example /etc/fstab entries)
    # Database server (performance over safety: data=writeback and barrier=0 trade crash consistency for speed)
    /dev/sdb1 /var/lib/mysql ext4 defaults,noatime,data=writeback,barrier=0 0 2
    
    # Log server (frequent writes)
    /dev/sdc1 /var/log ext4 defaults,noatime,commit=60 0 2
    
    # Read-heavy server (web content)
    /dev/sdd1 /var/www ext4 defaults,noatime,data=ordered 0 2
    
    # 2. Configure I/O scheduler based on storage type
    # (check /sys/block/<dev>/queue/scheduler for what your kernel offers;
    #  on blk-mq kernels the choices are none, mq-deadline, bfq and kyber)
    # For SSDs - use noop (or none) to minimize scheduling overhead
    echo noop > /sys/block/sda/queue/scheduler
    
    # For HDDs - use cfq (or bfq) for fairness
    echo cfq > /sys/block/sdb/queue/scheduler
    
    # 3. Adjust I/O queue depths
    echo 32 > /sys/block/sda/queue/nr_requests
    
    # 4. Enable read-ahead for sequential workloads
    blockdev --setra 8192 /dev/sda

    Network Performance Monitoring

    bash
    #!/bin/bash
    # network-analysis.sh - Network performance analysis
    
    echo "=== Network Performance Analysis ==="
    echo "Date: $(date)"
    echo
    
    # Network interface statistics
    echo "1. Network Interface Statistics:"
    sar -n DEV 1 3 | grep -E "(IFACE|eth|ens|wlan)" | tail -10
    
    echo
    echo "2. Network Errors and Drops:"
    cat /proc/net/dev | awk '
    NR>2 {
        iface=$1; gsub(":", "", iface)
        rx_drops=$5; tx_drops=$13; rx_errors=$4; tx_errors=$12
        if(rx_drops > 0 || tx_drops > 0 || rx_errors > 0 || tx_errors > 0)
            printf "%s: RX drops:%d errors:%d, TX drops:%d errors:%d\n", 
                   iface, rx_drops, rx_errors, tx_drops, tx_errors
    }'
    
    echo
    echo "3. Network Connections:"
    ss -tuln | wc -l | awk '{print "Total listening ports: " $1}'
    ss -tun | wc -l | awk '{print "Total established connections: " $1}'
    
    echo
    echo "4. Top Network Processes:"
    if command -v nethogs >/dev/null; then
        timeout 5 nethogs -d 1 2>/dev/null | head -10
    else
        echo "Install nethogs for per-process network monitoring"
    fi
    
    echo
    echo "5. Bandwidth Usage:"
    vnstat -l -i eth0 2>/dev/null || echo "Install vnstat for bandwidth monitoring"
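
    Two other quick checks worth running when the counters above look suspicious (eth0 is a placeholder interface name): link speed/duplex, and kernel-wide TCP retransmission counters.

    bash
    # Link speed and duplex (mismatches cause silent slowdowns)
    ethtool eth0 | grep -E 'Speed|Duplex'
    
    # Socket summary and TCP retransmission/timeout counters
    ss -s
    nstat -az 2>/dev/null | grep -E 'TcpRetransSegs|TcpExtTCPTimeouts'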

    Real-World Performance Scenarios

    Scenario 1: High-Traffic Web Server Optimization

    Problem: E-commerce site experiencing slow response times during peak hours.

    Investigation Process:

    bash
    #!/bin/bash
    # web-server-analysis.sh
    
    echo "=== Web Server Performance Analysis ==="
    
    # 1. Check current load
    echo "Current system load:"
    uptime
    echo
    
    # 2. Identify bottlenecks
    echo "Resource utilization:"
    vmstat 1 5 | tail -1 | awk '{
        printf "CPU: User=%s%% System=%s%% I/O Wait=%s%% Idle=%s%%\n", $13, $14, $16, $15
        if($16 > 20) print "WARNING: High I/O wait detected"
        if($15 < 10) print "WARNING: High CPU utilization"
    }'
    
    echo
    echo "Memory pressure check:"
    free | awk 'NR==2{
        used_pct = $3/$2*100
        printf "Memory used: %.1f%%\n", used_pct
        if(used_pct > 90) print "WARNING: High memory usage"
    }'
    
    echo
    echo "Apache/Nginx connection analysis:"
    if pgrep apache2 >/dev/null; then
        echo "Apache processes: $(pgrep apache2 | wc -l)"
        apache2ctl status 2>/dev/null | grep -E "(requests|workers)"
    elif pgrep nginx >/dev/null; then
        echo "Nginx processes: $(pgrep nginx | wc -l)"
        nginx -T 2>/dev/null | grep worker_processes
    fi
    
    echo
    echo "Database connection check:"
    if pgrep mysql >/dev/null; then
        mysql -e "SHOW PROCESSLIST" | wc -l | awk '{print "MySQL connections: " $1}'
        mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries'" 2>/dev/null
    fi
    
    # 3. Check disk I/O for logs and database
    echo
    echo "I/O analysis for critical paths:"
    iostat -x 1 3 | grep -A20 "^Device" | tail -20 | while read line; do
        if [[ $line =~ ^[a-z] ]]; then
            util=$(echo $line | awk '{print $NF}')
            device=$(echo $line | awk '{print $1}')
            if (( $(echo "$util > 80" | bc -l) )); then
                echo "WARNING: $device utilization at ${util}%"
            fi
        fi
    done

    Optimization Solutions:

    bash
    # 1. Web Server Tuning
    # Apache optimization
    echo "# Apache performance tuning" >> /etc/apache2/conf.d/performance.conf
    cat >> /etc/apache2/conf.d/performance.conf << 'EOF'
    # Increase worker limits
    ServerLimit 16
    MaxRequestWorkers 400
    ThreadsPerChild 25
    
    # Enable compression
    LoadModule deflate_module modules/mod_deflate.so
    SetOutputFilter DEFLATE
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
    SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
    EOF
    
    # Nginx optimization
    cat > /etc/nginx/conf.d/performance.conf << 'EOF'
    # Worker processes optimization
    worker_processes auto;
    worker_connections 1024;
    
    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_types text/plain text/css application/json application/javascript;
    
    # Caching
    location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
    EOF
    
    # 2. Database Optimization
    cat >> /etc/mysql/conf.d/performance.cnf << 'EOF'
    [mysqld]
    # InnoDB optimizations
    innodb_buffer_pool_size = 2G
    innodb_log_file_size = 256M
    innodb_flush_log_at_trx_commit = 2
    
    # Query cache (MySQL 5.7 and earlier only; removed in MySQL 8.0)
    query_cache_type = 1
    query_cache_size = 128M
    
    # Connection limits
    max_connections = 200
    connect_timeout = 10
    EOF
    
    # 3. System-level optimizations
    cat >> /etc/sysctl.conf << 'EOF'
    # Network optimizations
    net.core.netdev_max_backlog = 5000
    net.core.rmem_default = 262144
    net.core.wmem_default = 262144
    net.ipv4.tcp_rmem = 4096 65536 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    
    # File system optimizations
    fs.file-max = 100000
    EOF
    
    sysctl -p
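
    Keep in mind that fs.file-max only raises the system-wide ceiling; the per-process limit still comes from ulimit/limits.conf. A sketch of verifying the sysctl values and raising the web server's file-descriptor limit (www-data is an assumed service account):

    bash
    # Confirm the new kernel settings are live
    sysctl net.core.netdev_max_backlog net.ipv4.tcp_rmem fs.file-max
    
    # Raise the per-process open-file limit for the web server account (www-data assumed)
    cat >> /etc/security/limits.d/webserver.conf << 'EOF'
    www-data  soft  nofile  65535
    www-data  hard  nofile  65535
    EOF
    
    # Check what a running worker actually has
    grep "open files" /proc/$(pgrep -o nginx)/limits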

    Scenario 2: Database Performance Bottleneck

    Problem: MySQL database queries becoming increasingly slow.

    Investigation and Optimization:

    bash
    #!/bin/bash
    # database-performance-analysis.sh
    
    echo "=== Database Performance Analysis ==="
    
    # 1. MySQL process analysis
    echo "MySQL Process Analysis:"
    mysql -e "
    SELECT 
        COUNT(*) as total_connections,
        SUM(TIME) as total_time,
        AVG(TIME) as avg_time,
        STATE,
        COMMAND
    FROM INFORMATION_SCHEMA.PROCESSLIST 
    GROUP BY STATE, COMMAND
    ORDER BY total_time DESC;" 2>/dev/null
    
    # 2. Slow query analysis
    echo
    echo "Slow Query Analysis:"
    mysql -e "
    SELECT 
        query_time,
        lock_time,
        rows_sent,
        rows_examined,
        sql_text
    FROM mysql.slow_log 
    ORDER BY query_time DESC 
    LIMIT 10;" 2>/dev/null
    
    # 3. InnoDB status
    echo
    echo "InnoDB Buffer Pool Analysis:"
    mysql -e "
    SHOW ENGINE INNODB STATUS\G" 2>/dev/null | grep -A10 "BUFFER POOL"
    
    # 4. Table lock analysis
    echo
    echo "Table Lock Analysis:"
    mysql -e "SHOW OPEN TABLES WHERE In_use > 0;" 2>/dev/null
    
    # 5. Index analysis
    echo
    echo "Missing Index Analysis:"
    mysql -e "
    SELECT 
        t.TABLE_SCHEMA,
        t.TABLE_NAME,
        t.TABLE_ROWS,
        ROUND(((data_length + index_length) / 1024 / 1024), 2) 'Size_MB'
    FROM information_schema.TABLES t
    ORDER BY (data_length + index_length) DESC
    LIMIT 10;" 2>/dev/null
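
    The mysql.slow_log query above only returns rows when slow-query logging to a table is enabled; a quick sketch of switching it on (the one-second threshold is just an example):

    bash
    # Turn on slow-query logging to the mysql.slow_log table
    mysql -e "
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL log_output = 'TABLE';
    SET GLOBAL long_query_time = 1;" 2>/dev/null
    
    # Verify the settings took effect
    mysql -e "SHOW VARIABLES LIKE 'slow_query_log'; SHOW VARIABLES LIKE 'log_output';" 2>/dev/null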

    Scenario 3: Memory Leak Detection

    Problem: System gradually consuming more memory over time.

    bash
    #!/bin/bash
    # memory-leak-detection.sh
    
    echo "=== Memory Leak Detection ==="
    
    # Create baseline
    mkdir -p /var/log/memory-monitoring
    LOGFILE="/var/log/memory-monitoring/memory-$(date +%Y%m%d).log"
    
    # Monitor memory over time
    while true; do
        timestamp=$(date)
        
        # System memory
        total_mem=$(free | awk 'NR==2{print $2}')
        used_mem=$(free | awk 'NR==2{print $3}')
        free_mem=$(free | awk 'NR==2{print $4}')
        
        # Top memory consumers
        top_processes=$(ps aux --sort=-%mem | head -10 | awk '{print $2 ":" $4 ":" $11}' | tr '\n' '|')
        
        echo "$timestamp,$total_mem,$used_mem,$free_mem,$top_processes" >> "$LOGFILE"
        
        # Analysis: Check for rapid memory growth
        if [[ -f "$LOGFILE" ]] && [[ $(wc -l < "$LOGFILE") -gt 10 ]]; then
            # Check if memory usage increased by more than 5% in last hour
            current_usage=$((used_mem * 100 / total_mem))
            hour_ago_usage=$(tail -60 "$LOGFILE" | head -1 | cut -d',' -f3)
            hour_ago_total=$(tail -60 "$LOGFILE" | head -1 | cut -d',' -f2)
            hour_ago_pct=$((hour_ago_usage * 100 / hour_ago_total))
            
            if [[ $((current_usage - hour_ago_pct)) -gt 5 ]]; then
                echo "ALERT: Memory usage increased by $((current_usage - hour_ago_pct))% in the last hour"
                echo "Current: ${current_usage}%, Hour ago: ${hour_ago_pct}%"
            fi
        fi
        
        sleep 60  # Monitor every minute
    done &
    
    echo "Memory monitoring started. Log file: $LOGFILE"
    echo "PID: $!"

    Performance Optimization Best Practices

    Systematic Performance Tuning Approach

    1. Establish Baseline:

    bash
    #!/bin/bash
    # create-performance-baseline.sh
    
    BASELINE_DIR="/var/log/performance-baseline/$(date +%Y%m%d)"
    mkdir -p "$BASELINE_DIR"
    
    echo "Creating performance baseline..."
    
    # System information
    uname -a > "$BASELINE_DIR/system-info.txt"
    cat /proc/cpuinfo > "$BASELINE_DIR/cpu-info.txt"
    cat /proc/meminfo > "$BASELINE_DIR/memory-info.txt"
    lsblk > "$BASELINE_DIR/storage-info.txt"
    
    # Performance metrics
    vmstat 1 60 > "$BASELINE_DIR/vmstat-baseline.txt" &
    iostat -x 1 60 > "$BASELINE_DIR/iostat-baseline.txt" &
    sar -A 1 60 > "$BASELINE_DIR/sar-baseline.txt" &
    
    echo "Baseline collection started. Will run for 60 seconds."
    echo "Files saved to: $BASELINE_DIR"

    2. Monitor Key Metrics:

    bash
    # Key thresholds to monitor
    cat > /etc/monitoring/performance-thresholds.conf << 'EOF'
    # CPU thresholds
    CPU_LOAD_WARN=0.8    # 80% of CPU count
    CPU_LOAD_CRIT=1.5    # 150% of CPU count
    CPU_IOWAIT_WARN=20   # 20% I/O wait
    CPU_IOWAIT_CRIT=40   # 40% I/O wait
    
    # Memory thresholds
    MEM_USAGE_WARN=80    # 80% memory usage
    MEM_USAGE_CRIT=95    # 95% memory usage
    SWAP_USAGE_WARN=10   # Any swap usage
    SWAP_USAGE_CRIT=50   # 50% swap usage
    
    # I/O thresholds
    IO_UTIL_WARN=80      # 80% disk utilization
    IO_UTIL_CRIT=95      # 95% disk utilization
    IO_AWAIT_WARN=20     # 20ms average wait
    IO_AWAIT_CRIT=100    # 100ms average wait
    
    # Network thresholds
    NET_ERRORS_WARN=10   # 10 errors per interval
    NET_DROPS_WARN=5     # 5 drops per interval
    EOF

    3. Automated Performance Alerts:

    bash
    #!/bin/bash
    # performance-monitor.sh - Automated monitoring with alerts
    
    source /etc/monitoring/performance-thresholds.conf
    
    ALERT_EMAIL="admin@company.com"
    LOG_FILE="/var/log/performance-alerts.log"
    
    check_cpu() {
        load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
        cpu_count=$(nproc)
        load_ratio=$(echo "$load_avg / $cpu_count" | bc -l)
        
        if (( $(echo "$load_ratio > $CPU_LOAD_CRIT" | bc -l) )); then
            alert "CRITICAL: CPU load $load_avg on $cpu_count cores (ratio: $load_ratio)"
        elif (( $(echo "$load_ratio > $CPU_LOAD_WARN" | bc -l) )); then
            alert "WARNING: CPU load $load_avg on $cpu_count cores (ratio: $load_ratio)"
        fi
    }
    
    check_memory() {
        mem_usage=$(free | awk 'NR==2{printf "%.0f", $3/$2*100}')
        
        if [[ $mem_usage -gt $MEM_USAGE_CRIT ]]; then
            alert "CRITICAL: Memory usage at ${mem_usage}%"
        elif [[ $mem_usage -gt $MEM_USAGE_WARN ]]; then
            alert "WARNING: Memory usage at ${mem_usage}%"
        fi
        
        # Check swap
        swap_usage=$(free | awk 'NR==3{if($2>0) printf "%.0f", $3/$2*100; else print "0"}')
        if [[ $swap_usage -gt $SWAP_USAGE_CRIT ]]; then
            alert "CRITICAL: Swap usage at ${swap_usage}%"
        elif [[ $swap_usage -gt $SWAP_USAGE_WARN ]]; then
            alert "WARNING: Swap usage at ${swap_usage}%"
        fi
    }
    
    check_io() {
        iostat -x 1 3 | awk '/^[a-z]/ && NF>10 {
            if($NF > '$IO_UTIL_CRIT') system("echo CRITICAL: Device " $1 " utilization at " $NF "%")
            else if($NF > '$IO_UTIL_WARN') system("echo WARNING: Device " $1 " utilization at " $NF "%")
            
            # await is $(NF-4) in the legacy iostat -x layout used in this guide
            if($(NF-4) > '$IO_AWAIT_CRIT') system("echo CRITICAL: Device " $1 " wait time " $(NF-4) "ms")
            else if($(NF-4) > '$IO_AWAIT_WARN') system("echo WARNING: Device " $1 " wait time " $(NF-4) "ms")
        }'
    }
    
    alert() {
        message="$1"
        timestamp=$(date)
        echo "[$timestamp] $message" | tee -a "$LOG_FILE"
        echo "$message" | mail -s "Performance Alert - $(hostname)" "$ALERT_EMAIL"
    }
    
    # Run checks
    check_cpu
    check_memory
    check_io
    
    # Schedule with cron every 5 minutes:
    # */5 * * * * /usr/local/bin/performance-monitor.sh

    Advanced Troubleshooting Techniques

    Performance Profiling with perf

    bash
    # Install perf tools
    sudo apt-get install linux-tools-generic  # Ubuntu/Debian
    sudo yum install perf                      # CentOS/RHEL
    
    # Profile CPU usage for 30 seconds
    sudo perf record -a -g sleep 30
    sudo perf report
    
    # Find CPU hotspots in specific process
    sudo perf top -p $(pgrep mysql)
    
    # Profile system calls
    sudo perf trace -p $(pgrep apache2) sleep 10
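
    Before recording full profiles, perf stat gives a quick counter-level summary (the PID lookup assumes a running mysqld):

    bash
    # Counter-level summary for one process: IPC, cache misses, context switches
    sudo perf stat -p $(pgrep -o mysqld) sleep 10
    
    # Whole-system view for a fixed window
    sudo perf stat -a sleep 10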

    Using ftrace for Kernel-Level Analysis

    bash
    # Enable function tracing
    echo function > /sys/kernel/debug/tracing/current_tracer
    
    # Trace specific functions (symbol names vary by kernel version; check
    # /sys/kernel/debug/tracing/available_filter_functions, e.g. ksys_write on newer kernels)
    echo 'sys_write' > /sys/kernel/debug/tracing/set_ftrace_filter
    
    # Start tracing
    echo 1 > /sys/kernel/debug/tracing/tracing_on
    
    # View trace
    cat /sys/kernel/debug/tracing/trace
    
    # Stop tracing
    echo 0 > /sys/kernel/debug/tracing/tracing_on

    Performance Optimization Checklist

    Daily Monitoring

  • [ ] Check system load with uptime
  • [ ] Monitor disk space with df -h
  • [ ] Review memory usage with free -h
  • [ ] Check for errors in /var/log/messages

    Weekly Analysis

  • [ ] Generate sar reports for trend analysis
  • [ ] Review slow query logs for databases
  • [ ] Analyze I/O patterns with iostat
  • [ ] Check for memory leaks in long-running processes

    Monthly Optimization

  • [ ] Review and update performance baselines
  • [ ] Analyze capacity planning requirements
  • [ ] Update system tuning parameters
  • [ ] Performance test critical applications

    Conclusion: Building High-Performance Systems

    We've explored the comprehensive world of Linux performance monitoring and optimization, from fundamental tools like vmstat, iostat, and sar to advanced profiling techniques and real-world troubleshooting scenarios.

    Your Performance Mastery Path

    Week 1: Foundation

  • Master vmstat, iostat, and sar basics
  • Establish performance baselines for your systems
  • Set up basic monitoring and alerting

    Month 1: Advanced Analysis

  • Implement comprehensive monitoring scripts
  • Learn to correlate metrics across different subsystems
  • Practice troubleshooting common performance issues

    Month 3: Optimization Expert

  • Master system tuning parameters
  • Implement automated performance optimization
  • Build capacity planning processes

    Performance Principles to Remember

    🎯 Measure First: Never optimize without baseline measurements
    📊 Think Holistically: Performance issues often span multiple subsystems
    🔄 Continuous Monitoring: Performance is not a one-time configuration
    ⚡ Automate Everything: Manual monitoring doesn't scale
    🎛️ Tune Incrementally: Make one change at a time and measure impact

    Final Thoughts

    Performance optimization is both art and science. The tools we've covered give you the scientific measurement capabilities, but the art comes from experience—understanding how different workloads behave, recognizing patterns in metrics, and knowing which optimizations provide the biggest impact.

    Remember: A well-tuned system isn't just faster—it's more reliable, costs less to operate, and provides better user experience. Every optimization you make compounds over time, creating systems that scale gracefully and perform consistently under pressure.

    Start with measurement, optimize systematically, and never stop learning. Your users (and your 3 AM self) will thank you for building systems that perform beautifully under any load.

    ---

    🚀 Complete Your Linux Journey

    This is Part 19 of our comprehensive Linux mastery series - the final piece of your Linux expertise!

    Previous: Storage Management: LVM, RAID & Optimization - Master flexible storage systems

    🎉 Congratulations! You've Completed the Linux Mastery Series

    📚 Your Complete Linux Journey

    Beginner Foundation (Parts 1-5):

  • Part 1: Linux Introduction
  • Part 2: Terminal Commands
  • Part 3: File System Structure
  • Part 4: File Management
  • Part 5: Permissions & Security

    Intermediate Skills (Parts 6-11):

  • Part 6: Text Processing
  • Part 7: Package Management
  • Part 8: User & Group Management
  • Part 9: Process Management
  • Part 10: Environment Variables
  • Part 11: Automation with Cron

    Advanced Mastery (Parts 12-19):

  • Part 12: System Logs Analysis
  • Part 13: Network Configuration
  • Part 14: SSH Mastery
  • Part 15: Service Management
  • Part 16: Advanced Shell Scripting
  • Part 17: Firewall Security
  • Part 18: Storage Management
  • Part 19: Performance Optimization (You are here)

    🎯 What's Next?

    You now have comprehensive Linux expertise! Consider specializing in:

  • DevOps & Automation: Kubernetes, Docker, CI/CD
  • Security: Penetration testing, hardening, compliance
  • Cloud: AWS, Azure, GCP administration
  • Development: System programming, kernel development

    ---

    How has this performance monitoring guide helped you understand system optimization? What performance challenges are you currently facing? Share your experiences—performance expertise grows through shared knowledge and collaborative problem-solving.
