Linux Performance Monitoring and Optimization: sar, iostat, vmstat and Beyond

It's Monday morning, 9 AM. Your e-commerce website is crawling, database queries are timing out, and customer complaints are flooding in. Revenue is dropping by the minute, and everyone's looking at you to fix it. You have tools, you have access, but where do you start? How do you quickly identify whether it's CPU, memory, disk I/O, or network that's causing the bottleneck?

This scenario plays out in IT departments worldwide every day. As someone who has debugged performance issues ranging from overloaded web servers to misconfigured databases, I can tell you that the difference between a good system administrator and a great one isn't knowing every performance tool—it's knowing which tool to use when, and how to interpret what it's telling you.

Today, we'll master the art and science of Linux performance monitoring and optimization. We'll explore the power trio of sar, iostat, and vmstat, along with advanced techniques for identifying and resolving performance bottlenecks before they become business-critical issues.

Understanding Linux Performance: The Four Pillars

The Performance Resource Model

Every performance issue falls into one of four categories:

plaintext
System Performance Resources:

CPU      Memory     Storage     Network
 ↓         ↓          ↓          ↓
Usage    Pressure   I/O Wait   Bandwidth
Queues   Swapping   Latency    Packet Loss
Context  Leaks      IOPS       Congestion

Performance is a chain—the weakest link determines overall system performance.

Performance Methodology: USE Method

  • Utilization: How busy is the resource?
  • Saturation: Is there more work than the resource can handle?
  • Errors: Are there any error conditions?

This methodology helps systematically analyze each resource type.
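
A minimal sketch of applying USE with standard tools (mpstat and iostat ship with the sysstat package; the cutoffs and their interpretation are up to you):

bash
# Utilization: how busy each resource is
mpstat 1 1 | awk '/Average/ {printf "CPU busy: %.1f%%\n", 100-$NF}'
iostat -dx 1 2 | awk '/^[a-z]/ {u[$1]=$NF} END {for (d in u) print d, "util:", u[d] "%"}'
free | awk 'NR==2 {printf "Memory used: %.1f%%\n", $3/$2*100}'

# Saturation: more work queued than the resource can service
vmstat 1 2 | tail -1 | awk -v c=$(nproc) '{print "run queue:", $1, "(CPUs: " c ")  si/so:", $7 "/" $8, "  I/O wait:", $16 "%"}'

# Errors: anything the kernel or drivers are complaining about
dmesg --level=err,warn 2>/dev/null | tail -5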

The Essential Performance Monitoring Trinity

1. vmstat: Virtual Memory Statistics

vmstat provides a snapshot of system activity including processes, memory, paging, block I/O, and CPU usage.

Basic vmstat Usage

bash
# Display current system status
vmstat

# Monitor every 2 seconds, 10 times
vmstat 2 10

# Display in MB instead of KB
vmstat -S M 2 5

Understanding vmstat Output

plaintext
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1843200  80532 947012    0    0     5    23   45   78  3  1 96  0  0

Key Metrics Explained:

Processes (procs):

  • r: Runnable processes (higher than CPU count = CPU pressure)
  • b: Blocked processes waiting for I/O

Memory:

  • free: Available memory
  • buff: Buffers (metadata cache)
  • cache: Page cache (file data cache)
  • si/so: Swap in/out (should be 0 for good performance)

I/O:

  • bi/bo: Blocks in/out per second
  • High values indicate disk bottleneck

System:

  • in: Interrupts per second
  • cs: Context switches per second (high values = CPU thrashing)

CPU:

  • us: User time
  • sy: System time
  • id: Idle time
  • wa: I/O wait (high = storage bottleneck)
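
A compact way to turn the thresholds above into a quick check (the cutoff values here are illustrative, not canonical):

bash
# Flag the usual vmstat warning signs in one pass (second sample = live interval)
vmstat 1 2 | tail -1 | awk -v cpus=$(nproc) '{
    if ($1 > cpus)        print "CPU pressure: runnable (" $1 ") exceeds CPU count (" cpus ")"
    if ($7 > 0 || $8 > 0) print "Memory pressure: swapping active (si=" $7 " so=" $8 ")"
    if ($16 > 20)         print "Storage pressure: I/O wait at " $16 "%"
    if ($12 > 10000)      print "Possible CPU thrashing: " $12 " context switches/sec"
}'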

Real-World vmstat Analysis

    Scenario 1: Memory Pressure

    bash
    $ vmstat 1 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     4  2  45032   3421  15234  89234  123   89    45    67  892 1543 15  8 65 12  0

    Analysis:

  • High si/so (swap activity) = memory pressure
  • r=4 on dual-core system = CPU pressure
  • wa=12% = some I/O waiting
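
    When vmstat shows swapping like this, a useful follow-up (a simple sketch reading VmSwap from /proc) is to see which processes actually hold swap:

    bash
    # List the top swap consumers by reading VmSwap from /proc/<pid>/status
    for pid in /proc/[0-9]*; do
        swap=$(awk '/^VmSwap:/ {print $2}' "$pid/status" 2>/dev/null)
        if [ -n "$swap" ] && [ "$swap" -gt 0 ]; then
            echo "$swap kB  $(basename "$pid")  $(cat "$pid/comm" 2>/dev/null)"
        fi
    done | sort -rn | head -10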

    2. iostat: I/O Statistics

    iostat monitors disk I/O performance and provides detailed statistics about storage subsystem performance.

    Basic iostat Usage

    bash
    # Display I/O statistics
    iostat
    
    # Monitor every 2 seconds
    iostat 2
    
    # Extended statistics with more details
    iostat -x 1 5
    
    # Monitor specific devices
    iostat -x /dev/sda /dev/sdb 2

    Understanding iostat Output

    bash
    $ iostat -x 1 5
    Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    sda               0.12     1.85    2.45    8.67    45.23   156.89    36.32     0.08    7.23    5.67    7.89   2.45   2.8

    Critical Metrics:

  • r/s, w/s: Read/write requests per second
  • rkB/s, wkB/s: Read/write KB per second
  • avgqu-sz: Average queue size (high = bottleneck)
  • await: Average wait time (milliseconds)
  • svctm: Service time (milliseconds)
  • %util: Device utilization percentage

    Performance Thresholds

    bash
    # Good Performance Indicators:
    # %util < 80%          # Device not saturated
    # await < 10ms         # Low latency
    # avgqu-sz < 2         # No queuing pressure
    
    # Warning Signs:
    # %util > 90%          # Device saturated
    # await > 50ms         # High latency
    # avgqu-sz > 5         # Queue pressure

    3. sar: System Activity Reporter

    sar is the most comprehensive performance monitoring tool, capable of collecting and reporting virtually every system metric.

    Essential sar Commands

    bash
    # CPU utilization
    sar -u 1 10
    
    # Memory usage
    sar -r 1 10
    
    # I/O statistics
    sar -b 1 10
    
    # Network statistics
    sar -n DEV 1 10
    
    # Load average
    sar -q 1 10
    
    # All statistics
    sar -A 1 5

    Historical Analysis with sar

    bash
    # View yesterday's data
    sar -u -f /var/log/sysstat/saXX
    
    # Specific time range
    sar -u -s 09:00:00 -e 17:00:00
    
    # Generate daily report
    sar -A > daily-performance-report.txt
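
    Historical reports only exist if the sysstat collector is actually recording data. On Debian/Ubuntu-style layouts that typically means the following (paths and service names are assumptions; RHEL keeps its data under /var/log/sa):

    bash
    # Enable periodic data collection (Debian/Ubuntu layout)
    sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
    sudo systemctl enable --now sysstat
    
    # Confirm daily data files are being written
    ls -l /var/log/sysstat/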

    Advanced Performance Analysis Techniques

    CPU Performance Deep Dive

    Identifying CPU Bottlenecks

    bash
    #!/bin/bash
    # cpu-analysis.sh - Comprehensive CPU analysis
    
    echo "=== CPU Analysis Report ==="
    echo "Date: $(date)"
    echo
    
    # Current CPU usage
    echo "1. Current CPU Usage:"
    vmstat 1 5 | tail -1 | awk '{print "User: " $13 "%, System: " $14 "%, Idle: " $15 "%, I/O Wait: " $16 "%"}'
    echo
    
    # Top CPU consuming processes
    echo "2. Top CPU Consumers:"
    ps aux --sort=-%cpu | head -10 | awk '{printf "%-15s %-8s %-8s %s\n", $1, $2, $3, $11}'
    echo
    
    # Load average analysis
    echo "3. Load Average Analysis:"
    uptime | awk -F'load average:' '{print $2}' | awk '{
        load1=$1; load5=$2; load15=$3;
        gsub(",", "", load1); gsub(",", "", load5); gsub(",", "", load15);
        print "1min: " load1 ", 5min: " load5 ", 15min: " load15
        
        # Get CPU count
        "nproc" | getline cpus
        
        if(load1 > cpus) print "WARNING: 1-minute load exceeds CPU count"
        if(load5 > cpus * 0.8) print "CAUTION: 5-minute load approaching CPU limit"
    }'
    echo
    
    # Context switches and interrupts
    echo "4. System Activity:"
    vmstat 1 3 | tail -1 | awk '{print "Context Switches/sec: " $12 ", Interrupts/sec: " $11}'
    
    # Check for high context switches (>10000/sec is high)
    cs_rate=$(vmstat 1 3 | tail -1 | awk '{print $12}')
    if [[ $cs_rate -gt 10000 ]]; then
        echo "WARNING: High context switch rate detected"
    fi

    CPU Optimization Strategies

    bash
    # 1. Process Priority Management
    # Lower priority for background tasks
    nice -n 19 backup-script.sh
    renice 10 -p $(pgrep backup-process)
    
    # Higher priority for critical services
    renice -10 -p $(pgrep mysql)
    
    # 2. CPU Affinity for Performance-Critical Applications
    # Bind database to specific CPUs
    taskset -cp 0,1 $(pgrep mysql)
    
    # 3. Check CPU Governor Settings
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
    # Set performance mode for high-load scenarios
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
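
    To confirm those changes took effect, a quick verification sketch (assuming cpufreq is exposed under /sys and a MySQL process exists) might look like:

    bash
    # Verify the active governor and watch the resulting clock speeds
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    grep "cpu MHz" /proc/cpuinfo | sort -rn -k4 | head -4
    
    # Confirm the CPU affinity applied with taskset earlier
    taskset -cp $(pgrep -o mysql)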

    Memory Performance Analysis

    Memory Monitoring Script

    bash
    #!/bin/bash
    # memory-analysis.sh - Comprehensive memory analysis
    
    echo "=== Memory Analysis Report ==="
    echo "Date: $(date)"
    echo
    
    # Overall memory usage
    echo "1. Memory Overview:"
    # Use raw KiB figures (not -h) so the percentage arithmetic is reliable
    free | awk 'NR==2{printf "Used: %.1f%%, Available: %.1f GiB\n", $3/$2*100, $7/1048576}'
    echo
    
    # Memory pressure indicators
    echo "2. Memory Pressure Indicators:"
    echo "Swap Usage:"
    free | awk 'NR==3{if($2>0) printf "Swap: %s/%s (%.1f%%)\n", $3, $2, $3/$2*100; else print "No swap configured"}'
    
    echo "Page Faults:"
    vmstat 1 3 | tail -1 | awk '{print "Major faults/sec: " $8 ", Minor faults/sec: " $7}'
    
    # Memory-hungry processes
    echo "3. Top Memory Consumers:"
    ps aux --sort=-%mem | head -10 | awk '{printf "%-15s %-8s %-8s %s\n", $1, $2, $4, $11}'
    echo
    
    # Check for memory leaks
    echo "4. Memory Leak Detection:"
    echo "Checking for processes with unusual memory growth..."
    for pid in $(ps -eo pid --no-headers); do
        if [[ -f "/proc/$pid/status" ]]; then
            vmsize=$(grep VmSize /proc/$pid/status 2>/dev/null | awk '{print $2}')
            vmrss=$(grep VmRSS /proc/$pid/status 2>/dev/null | awk '{print $2}')
            cmd=$(ps -p $pid -o comm= 2>/dev/null)
            
            # Skip kernel threads (no VmSize/VmRSS) and flag unusually large processes
            if [[ -n "$vmsize" && -n "$vmrss" && $vmsize -gt 1000000 && $vmrss -gt 500000 ]]; then
                echo "Large process: PID $pid ($cmd) - VmSize: ${vmsize}kB, VmRSS: ${vmrss}kB"
            fi
        fi
    done

    Memory Optimization Techniques

    bash
    # 1. Tune Virtual Memory Settings
    echo "# Memory tuning" >> /etc/sysctl.conf
    echo "vm.swappiness=10" >> /etc/sysctl.conf          # Reduce swapping
    echo "vm.dirty_ratio=5" >> /etc/sysctl.conf          # Reduce dirty page cache
    echo "vm.dirty_background_ratio=2" >> /etc/sysctl.conf
    
    # Apply immediately
    sysctl -p
    
    # 2. Configure Huge Pages for databases
    # Calculate required huge pages (for 8GB database buffer)
    echo 4096 > /proc/sys/vm/nr_hugepages
    
    # 3. Monitor slab cache (kernel memory usage)
    cat /proc/slabinfo | head -20
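
    A quick sanity check after applying these settings (the values mirror the example above; the huge-page math assumes the default 2 MiB page size):

    bash
    # Confirm the sysctl values actually in effect
    sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio
    
    # Confirm huge pages were allocated (4096 x 2 MiB = 8 GiB)
    grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo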

    Storage I/O Performance Analysis

    Advanced I/O Monitoring

    bash
    #!/bin/bash
    # io-analysis.sh - Comprehensive I/O performance analysis
    
    echo "=== Storage I/O Analysis Report ==="
    echo "Date: $(date)"
    echo
    
    # Current I/O activity
    echo "1. Current I/O Activity:"
    iostat -x 1 3 | grep -E "(Device|sd|md|dm-)" | tail -20
    
    echo
    echo "2. I/O Performance Summary:"
    iostat -x 1 3 | awk '
    # Only evaluate device lines; field positions follow the legacy iostat -x layout shown earlier
    $1 !~ /^(sd|nvme|vd|xvd|hd|md|dm-)/ { next }
    {
        device=$1; util=$NF; await=$(NF-4); avgqu=$(NF-5)
        if(util > 80) printf "WARNING: %s utilization: %.1f%%\n", device, util
        if(await > 20) printf "WARNING: %s average wait: %.1fms\n", device, await
        if(avgqu > 2) printf "WARNING: %s queue size: %.1f\n", device, avgqu
    }'
    
    echo
    echo "3. Top I/O Processes:"
    iotop -a -o -d 1 -n 3 2>/dev/null | grep -v "^$" | head -10
    
    echo
    echo "4. Filesystem I/O Statistics:"
    for mount in $(mount | grep -E 'ext[234]|xfs|btrfs' | awk '{print $3}'); do
        echo "Mount: $mount"
        if command -v iotop >/dev/null; then
            iotop -a -P -d 1 -n 1 2>/dev/null | grep "$mount" | head -5
        fi
    done
    
    echo
    echo "5. Disk Space and Inode Usage:"
    df -h | grep -v tmpfs
    echo
    df -i | grep -v tmpfs | awk 'NR>1 && $5+0 > 80 {print "WARNING: " $1 " inode usage: " $5}'

    I/O Optimization Strategies

    bash
    # 1. Optimize mount options for different workloads (example /etc/fstab entries)
    # Database server (performance over safety: data=writeback and barrier=0 trade crash consistency for speed)
    /dev/sdb1 /var/lib/mysql ext4 defaults,noatime,data=writeback,barrier=0 0 2
    
    # Log server (frequent writes)
    /dev/sdc1 /var/log ext4 defaults,noatime,commit=60 0 2
    
    # Read-heavy server (web content)
    /dev/sdd1 /var/www ext4 defaults,noatime,data=ordered 0 2
    
    # 2. Configure I/O scheduler based on storage type
    # (check /sys/block/<dev>/queue/scheduler for what your kernel offers;
    #  on blk-mq kernels the choices are none, mq-deadline, bfq and kyber)
    # For SSDs - use noop (or none) to minimize scheduling overhead
    echo noop > /sys/block/sda/queue/scheduler
    
    # For HDDs - use cfq (or bfq) for fairness
    echo cfq > /sys/block/sdb/queue/scheduler
    
    # 3. Adjust I/O queue depths
    echo 32 > /sys/block/sda/queue/nr_requests
    
    # 4. Enable read-ahead for sequential workloads
    blockdev --setra 8192 /dev/sda

    Network Performance Monitoring

    bash
    #!/bin/bash
    # network-analysis.sh - Network performance analysis
    
    echo "=== Network Performance Analysis ==="
    echo "Date: $(date)"
    echo
    
    # Network interface statistics
    echo "1. Network Interface Statistics:"
    sar -n DEV 1 3 | grep -E "(IFACE|eth|ens|wlan)" | tail -10
    
    echo
    echo "2. Network Errors and Drops:"
    cat /proc/net/dev | awk '
    NR>2 {
        iface=$1; gsub(":", "", iface)
        rx_drops=$5; tx_drops=$13; rx_errors=$4; tx_errors=$12
        if(rx_drops > 0 || tx_drops > 0 || rx_errors > 0 || tx_errors > 0)
            printf "%s: RX drops:%d errors:%d, TX drops:%d errors:%d\n", 
                   iface, rx_drops, rx_errors, tx_drops, tx_errors
    }'
    
    echo
    echo "3. Network Connections:"
    ss -tuln | wc -l | awk '{print "Total listening ports: " $1}'
    ss -tun | wc -l | awk '{print "Total established connections: " $1}'
    
    echo
    echo "4. Top Network Processes:"
    if command -v nethogs >/dev/null; then
        timeout 5 nethogs -d 1 2>/dev/null | head -10
    else
        echo "Install nethogs for per-process network monitoring"
    fi
    
    echo
    echo "5. Bandwidth Usage:"
    vnstat -l -i eth0 2>/dev/null || echo "Install vnstat for bandwidth monitoring"
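
    Two other quick checks worth running when the counters above look suspicious (eth0 is a placeholder interface name): link speed/duplex, and kernel-wide TCP retransmission counters.

    bash
    # Link speed and duplex (mismatches cause silent slowdowns)
    ethtool eth0 | grep -E 'Speed|Duplex'
    
    # Socket summary and TCP retransmission/timeout counters
    ss -s
    nstat -az 2>/dev/null | grep -E 'TcpRetransSegs|TcpExtTCPTimeouts'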

    Real-World Performance Scenarios

    Scenario 1: High-Traffic Web Server Optimization

    Problem: E-commerce site experiencing slow response times during peak hours.

    Investigation Process:

    bash
    #!/bin/bash
    # web-server-analysis.sh
    
    echo "=== Web Server Performance Analysis ==="
    
    # 1. Check current load
    echo "Current system load:"
    uptime
    echo
    
    # 2. Identify bottlenecks
    echo "Resource utilization:"
    vmstat 1 5 | tail -1 | awk '{
        printf "CPU: User=%s%% System=%s%% I/O Wait=%s%% Idle=%s%%\n", $13, $14, $16, $15
        if($16 > 20) print "WARNING: High I/O wait detected"
        if($15 < 10) print "WARNING: High CPU utilization"
    }'
    
    echo
    echo "Memory pressure check:"
    free | awk 'NR==2{
        used_pct = $3/$2*100
        printf "Memory used: %.1f%%\n", used_pct
        if(used_pct > 90) print "WARNING: High memory usage"
    }'
    
    echo
    echo "Apache/Nginx connection analysis:"
    if pgrep apache2 >/dev/null; then
        echo "Apache processes: $(pgrep apache2 | wc -l)"
        apache2ctl status 2>/dev/null | grep -E "(requests|workers)"
    elif pgrep nginx >/dev/null; then
        echo "Nginx processes: $(pgrep nginx | wc -l)"
        nginx -T 2>/dev/null | grep worker_processes
    fi
    
    echo
    echo "Database connection check:"
    if pgrep mysql >/dev/null; then
        mysql -e "SHOW PROCESSLIST" | wc -l | awk '{print "MySQL connections: " $1}'
        mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries'" 2>/dev/null
    fi
    
    # 3. Check disk I/O for logs and database
    echo
    echo "I/O analysis for critical paths:"
    iostat -x 1 3 | grep -A20 "^Device" | tail -20 | while read line; do
        if [[ $line =~ ^[a-z] ]]; then
            util=$(echo $line | awk '{print $NF}')
            device=$(echo $line | awk '{print $1}')
            if (( $(echo "$util > 80" | bc -l) )); then
                echo "WARNING: $device utilization at ${util}%"
            fi
        fi
    done

    Optimization Solutions:

    bash
    # 1. Web Server Tuning
    # Apache optimization
    echo "# Apache performance tuning" >> /etc/apache2/conf.d/performance.conf
    cat >> /etc/apache2/conf.d/performance.conf << 'EOF'
    # Increase worker limits
    ServerLimit 16
    MaxRequestWorkers 400
    ThreadsPerChild 25
    
    # Enable compression
    LoadModule deflate_module modules/mod_deflate.so
    SetOutputFilter DEFLATE
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
    SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
    EOF
    
    # Nginx optimization
    cat > /etc/nginx/conf.d/performance.conf << 'EOF'
    # Worker processes optimization
    worker_processes auto;
    worker_connections 1024;
    
    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_types text/plain text/css application/json application/javascript;
    
    # Caching
    location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
    EOF
    
    # 2. Database Optimization
    cat >> /etc/mysql/conf.d/performance.cnf << 'EOF'
    [mysqld]
    # InnoDB optimizations
    innodb_buffer_pool_size = 2G
    innodb_log_file_size = 256M
    innodb_flush_log_at_trx_commit = 2
    
    # Query cache (MySQL 5.7 and earlier only; removed in MySQL 8.0)
    query_cache_type = 1
    query_cache_size = 128M
    
    # Connection limits
    max_connections = 200
    connect_timeout = 10
    EOF
    
    # 3. System-level optimizations
    cat >> /etc/sysctl.conf << 'EOF'
    # Network optimizations
    net.core.netdev_max_backlog = 5000
    net.core.rmem_default = 262144
    net.core.wmem_default = 262144
    net.ipv4.tcp_rmem = 4096 65536 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    
    # File system optimizations
    fs.file-max = 100000
    EOF
    
    sysctl -p
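
    Keep in mind that fs.file-max only raises the system-wide ceiling; the per-process limit still comes from ulimit/limits.conf. A sketch of verifying the sysctl values and raising the web server's file-descriptor limit (www-data is an assumed service account):

    bash
    # Confirm the new kernel settings are live
    sysctl net.core.netdev_max_backlog net.ipv4.tcp_rmem fs.file-max
    
    # Raise the per-process open-file limit for the web server account (www-data assumed)
    cat >> /etc/security/limits.d/webserver.conf << 'EOF'
    www-data  soft  nofile  65535
    www-data  hard  nofile  65535
    EOF
    
    # Check what a running worker actually has
    grep "open files" /proc/$(pgrep -o nginx)/limits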

    Scenario 2: Database Performance Bottleneck

    Problem: MySQL database queries becoming increasingly slow.

    Investigation and Optimization:

    bash
    #!/bin/bash
    # database-performance-analysis.sh
    
    echo "=== Database Performance Analysis ==="
    
    # 1. MySQL process analysis
    echo "MySQL Process Analysis:"
    mysql -e "
    SELECT 
        COUNT(*) as total_connections,
        SUM(TIME) as total_time,
        AVG(TIME) as avg_time,
        STATE,
        COMMAND
    FROM INFORMATION_SCHEMA.PROCESSLIST 
    GROUP BY STATE, COMMAND
    ORDER BY total_time DESC;" 2>/dev/null
    
    # 2. Slow query analysis
    echo
    echo "Slow Query Analysis:"
    mysql -e "
    SELECT 
        query_time,
        lock_time,
        rows_sent,
        rows_examined,
        sql_text
    FROM mysql.slow_log 
    ORDER BY query_time DESC 
    LIMIT 10;" 2>/dev/null
    
    # 3. InnoDB status
    echo
    echo "InnoDB Buffer Pool Analysis:"
    mysql -e "
    SHOW ENGINE INNODB STATUS\G" 2>/dev/null | grep -A10 "BUFFER POOL"
    
    # 4. Table lock analysis
    echo
    echo "Table Lock Analysis:"
    mysql -e "SHOW OPEN TABLES WHERE In_use > 0;" 2>/dev/null
    
    # 5. Index analysis
    echo
    echo "Missing Index Analysis:"
    mysql -e "
    SELECT 
        t.TABLE_SCHEMA,
        t.TABLE_NAME,
        t.TABLE_ROWS,
        ROUND(((data_length + index_length) / 1024 / 1024), 2) 'Size_MB'
    FROM information_schema.TABLES t
    ORDER BY (data_length + index_length) DESC
    LIMIT 10;" 2>/dev/null
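
    The mysql.slow_log query above only returns rows when slow-query logging to a table is enabled; a quick sketch of switching it on (the one-second threshold is just an example):

    bash
    # Turn on slow-query logging to the mysql.slow_log table
    mysql -e "
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL log_output = 'TABLE';
    SET GLOBAL long_query_time = 1;" 2>/dev/null
    
    # Verify the settings took effect
    mysql -e "SHOW VARIABLES LIKE 'slow_query_log'; SHOW VARIABLES LIKE 'log_output';" 2>/dev/null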

    Scenario 3: Memory Leak Detection

    Problem: System gradually consuming more memory over time.

    bash
    #!/bin/bash
    # memory-leak-detection.sh
    
    echo "=== Memory Leak Detection ==="
    
    # Create baseline
    mkdir -p /var/log/memory-monitoring
    LOGFILE="/var/log/memory-monitoring/memory-$(date +%Y%m%d).log"
    
    # Monitor memory over time
    while true; do
        timestamp=$(date)
        
        # System memory
        total_mem=$(free | awk 'NR==2{print $2}')
        used_mem=$(free | awk 'NR==2{print $3}')
        free_mem=$(free | awk 'NR==2{print $4}')
        
        # Top memory consumers
        top_processes=$(ps aux --sort=-%mem | head -10 | awk '{print $2 ":" $4 ":" $11}' | tr '\n' '|')
        
        echo "$timestamp,$total_mem,$used_mem,$free_mem,$top_processes" >> "$LOGFILE"
        
        # Analysis: Check for rapid memory growth
        if [[ -f "$LOGFILE" ]] && [[ $(wc -l < "$LOGFILE") -gt 10 ]]; then
            # Check if memory usage increased by more than 5% in last hour
            current_usage=$((used_mem * 100 / total_mem))
            hour_ago_usage=$(tail -60 "$LOGFILE" | head -1 | cut -d',' -f3)
            hour_ago_total=$(tail -60 "$LOGFILE" | head -1 | cut -d',' -f2)
            hour_ago_pct=$((hour_ago_usage * 100 / hour_ago_total))
            
            if [[ $((current_usage - hour_ago_pct)) -gt 5 ]]; then
                echo "ALERT: Memory usage increased by $((current_usage - hour_ago_pct))% in the last hour"
                echo "Current: ${current_usage}%, Hour ago: ${hour_ago_pct}%"
            fi
        fi
        
        sleep 60  # Monitor every minute
    done &
    
    echo "Memory monitoring started. Log file: $LOGFILE"
    echo "PID: $!"

    Performance Optimization Best Practices

    Systematic Performance Tuning Approach

    1. Establish Baseline:

    bash
    #!/bin/bash
    # create-performance-baseline.sh
    
    BASELINE_DIR="/var/log/performance-baseline/$(date +%Y%m%d)"
    mkdir -p "$BASELINE_DIR"
    
    echo "Creating performance baseline..."
    
    # System information
    uname -a > "$BASELINE_DIR/system-info.txt"
    cat /proc/cpuinfo > "$BASELINE_DIR/cpu-info.txt"
    cat /proc/meminfo > "$BASELINE_DIR/memory-info.txt"
    lsblk > "$BASELINE_DIR/storage-info.txt"
    
    # Performance metrics
    vmstat 1 60 > "$BASELINE_DIR/vmstat-baseline.txt" &
    iostat -x 1 60 > "$BASELINE_DIR/iostat-baseline.txt" &
    sar -A 1 60 > "$BASELINE_DIR/sar-baseline.txt" &
    
    echo "Baseline collection started. Will run for 60 seconds."
    echo "Files saved to: $BASELINE_DIR"

    2. Monitor Key Metrics:

    bash
    # Key thresholds to monitor
    cat > /etc/monitoring/performance-thresholds.conf << 'EOF'
    # CPU thresholds
    CPU_LOAD_WARN=0.8    # 80% of CPU count
    CPU_LOAD_CRIT=1.5    # 150% of CPU count
    CPU_IOWAIT_WARN=20   # 20% I/O wait
    CPU_IOWAIT_CRIT=40   # 40% I/O wait
    
    # Memory thresholds
    MEM_USAGE_WARN=80    # 80% memory usage
    MEM_USAGE_CRIT=95    # 95% memory usage
    SWAP_USAGE_WARN=10   # Any swap usage
    SWAP_USAGE_CRIT=50   # 50% swap usage
    
    # I/O thresholds
    IO_UTIL_WARN=80      # 80% disk utilization
    IO_UTIL_CRIT=95      # 95% disk utilization
    IO_AWAIT_WARN=20     # 20ms average wait
    IO_AWAIT_CRIT=100    # 100ms average wait
    
    # Network thresholds
    NET_ERRORS_WARN=10   # 10 errors per interval
    NET_DROPS_WARN=5     # 5 drops per interval
    EOF

    3. Automated Performance Alerts:

    bash
    #!/bin/bash
    # performance-monitor.sh - Automated monitoring with alerts
    
    source /etc/monitoring/performance-thresholds.conf
    
    ALERT_EMAIL="admin@company.com"
    LOG_FILE="/var/log/performance-alerts.log"
    
    check_cpu() {
        load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
        cpu_count=$(nproc)
        load_ratio=$(echo "$load_avg / $cpu_count" | bc -l)
        
        if (( $(echo "$load_ratio > $CPU_LOAD_CRIT" | bc -l) )); then
            alert "CRITICAL: CPU load $load_avg on $cpu_count cores (ratio: $load_ratio)"
        elif (( $(echo "$load_ratio > $CPU_LOAD_WARN" | bc -l) )); then
            alert "WARNING: CPU load $load_avg on $cpu_count cores (ratio: $load_ratio)"
        fi
    }
    
    check_memory() {
        mem_usage=$(free | awk 'NR==2{printf "%.0f", $3/$2*100}')
        
        if [[ $mem_usage -gt $MEM_USAGE_CRIT ]]; then
            alert "CRITICAL: Memory usage at ${mem_usage}%"
        elif [[ $mem_usage -gt $MEM_USAGE_WARN ]]; then
            alert "WARNING: Memory usage at ${mem_usage}%"
        fi
        
        # Check swap
        swap_usage=$(free | awk 'NR==3{if($2>0) printf "%.0f", $3/$2*100; else print "0"}')
        if [[ $swap_usage -gt $SWAP_USAGE_CRIT ]]; then
            alert "CRITICAL: Swap usage at ${swap_usage}%"
        elif [[ $swap_usage -gt $SWAP_USAGE_WARN ]]; then
            alert "WARNING: Swap usage at ${swap_usage}%"
        fi
    }
    
    check_io() {
        iostat -x 1 3 | awk '/^[a-z]/ && NF>10 {
            if($NF > '$IO_UTIL_CRIT') system("echo CRITICAL: Device " $1 " utilization at " $NF "%")
            else if($NF > '$IO_UTIL_WARN') system("echo WARNING: Device " $1 " utilization at " $NF "%")
            
            # await is $(NF-4) in the legacy iostat -x layout used in this guide
            if($(NF-4) > '$IO_AWAIT_CRIT') system("echo CRITICAL: Device " $1 " wait time " $(NF-4) "ms")
            else if($(NF-4) > '$IO_AWAIT_WARN') system("echo WARNING: Device " $1 " wait time " $(NF-4) "ms")
        }'
    }
    
    alert() {
        message="$1"
        timestamp=$(date)
        echo "[$timestamp] $message" | tee -a "$LOG_FILE"
        echo "$message" | mail -s "Performance Alert - $(hostname)" "$ALERT_EMAIL"
    }
    
    # Run checks
    check_cpu
    check_memory
    check_io
    
    # Schedule with cron every 5 minutes:
    # */5 * * * * /usr/local/bin/performance-monitor.sh

    Advanced Troubleshooting Techniques

    Performance Profiling with perf

    bash
    # Install perf tools
    sudo apt-get install linux-tools-generic  # Ubuntu/Debian
    sudo yum install perf                      # CentOS/RHEL
    
    # Profile CPU usage for 30 seconds
    sudo perf record -a -g sleep 30
    sudo perf report
    
    # Find CPU hotspots in specific process
    sudo perf top -p $(pgrep mysql)
    
    # Profile system calls
    sudo perf trace -p $(pgrep apache2) sleep 10
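
    Before recording full profiles, perf stat gives a quick counter-level summary (the PID lookup assumes a running mysqld):

    bash
    # Counter-level summary for one process: IPC, cache misses, context switches
    sudo perf stat -p $(pgrep -o mysqld) sleep 10
    
    # Whole-system view for a fixed window
    sudo perf stat -a sleep 10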

    Using ftrace for Kernel-Level Analysis

    bash
    # Enable function tracing
    echo function > /sys/kernel/debug/tracing/current_tracer
    
    # Trace specific functions (symbol names vary by kernel version; check
    # /sys/kernel/debug/tracing/available_filter_functions, e.g. ksys_write on newer kernels)
    echo 'sys_write' > /sys/kernel/debug/tracing/set_ftrace_filter
    
    # Start tracing
    echo 1 > /sys/kernel/debug/tracing/tracing_on
    
    # View trace
    cat /sys/kernel/debug/tracing/trace
    
    # Stop tracing
    echo 0 > /sys/kernel/debug/tracing/tracing_on

    Performance Optimization Checklist

    Daily Monitoring

  • [ ] Check system load with uptime
  • [ ] Monitor disk space with df -h
  • [ ] Review memory usage with free -h
  • [ ] Check for errors in /var/log/messages

    Weekly Analysis

  • [ ] Generate sar reports for trend analysis
  • [ ] Review slow query logs for databases
  • [ ] Analyze I/O patterns with iostat
  • [ ] Check for memory leaks in long-running processes

    Monthly Optimization

  • [ ] Review and update performance baselines
  • [ ] Analyze capacity planning requirements
  • [ ] Update system tuning parameters
  • [ ] Performance test critical applications

    Conclusion: Building High-Performance Systems

    We've explored the comprehensive world of Linux performance monitoring and optimization, from fundamental tools like vmstat, iostat, and sar to advanced profiling techniques and real-world troubleshooting scenarios.

    Your Performance Mastery Path

    Week 1: Foundation

  • Master vmstat, iostat, and sar basics
  • Establish performance baselines for your systems
  • Set up basic monitoring and alerting

    Month 1: Advanced Analysis

  • Implement comprehensive monitoring scripts
  • Learn to correlate metrics across different subsystems
  • Practice troubleshooting common performance issues

    Month 3: Optimization Expert

  • Master system tuning parameters
  • Implement automated performance optimization
  • Build capacity planning processes

    Performance Principles to Remember

    🎯 Measure First: Never optimize without baseline measurements
    📊 Think Holistically: Performance issues often span multiple subsystems
    🔄 Continuous Monitoring: Performance is not a one-time configuration
    ⚡ Automate Everything: Manual monitoring doesn't scale
    🎛️ Tune Incrementally: Make one change at a time and measure impact

    Final Thoughts

    Performance optimization is both art and science. The tools we've covered give you the scientific measurement capabilities, but the art comes from experience—understanding how different workloads behave, recognizing patterns in metrics, and knowing which optimizations provide the biggest impact.

    Remember: A well-tuned system isn't just faster—it's more reliable, costs less to operate, and provides better user experience. Every optimization you make compounds over time, creating systems that scale gracefully and perform consistently under pressure.

    Start with measurement, optimize systematically, and never stop learning. Your users (and your 3 AM self) will thank you for building systems that perform beautifully under any load.

    ---

    🚀 Complete Your Linux Journey

    This is Part 19 of our comprehensive Linux mastery series - the final piece of your Linux expertise!

    Previous: Storage Management: LVM, RAID & Optimization - Master flexible storage systems

    🎉 Congratulations! You've Completed the Linux Mastery Series

    📚 Your Complete Linux Journey

    Beginner Foundation (Parts 1-5):

  • Part 1: Linux Introduction
  • Part 2: Terminal Commands
  • Part 3: File System Structure
  • Part 4: File Management
  • Part 5: Permissions & Security

    Intermediate Skills (Parts 6-11):

  • Part 6: Text Processing
  • Part 7: Package Management
  • Part 8: User & Group Management
  • Part 9: Process Management
  • Part 10: Environment Variables
  • Part 11: Automation with Cron

    Advanced Mastery (Parts 12-19):

  • Part 12: System Logs Analysis
  • Part 13: Network Configuration
  • Part 14: SSH Mastery
  • Part 15: Service Management
  • Part 16: Advanced Shell Scripting
  • Part 17: Firewall Security
  • Part 18: Storage Management
  • Part 19: Performance Optimization (You are here)

    🎯 What's Next?

    You now have comprehensive Linux expertise! Consider specializing in:

  • DevOps & Automation: Kubernetes, Docker, CI/CD
  • Security: Penetration testing, hardening, compliance
  • Cloud: AWS, Azure, GCP administration
  • Development: System programming, kernel development

    ---

    How has this performance monitoring guide helped you understand system optimization? What performance challenges are you currently facing? Share your experiences—performance expertise grows through shared knowledge and collaborative problem-solving.
