It's Monday morning, 9 AM. Your e-commerce website is crawling, database queries are timing out, and customer complaints are flooding in. Revenue is dropping by the minute, and everyone's looking at you to fix it. You have tools, you have access, but where do you start? How do you quickly identify whether it's CPU, memory, disk I/O, or network that's causing the bottleneck?
This scenario plays out in IT departments worldwide every day. As someone who has debugged performance issues ranging from overloaded web servers to misconfigured databases, I can tell you that the difference between a good system administrator and a great one isn't knowing every performance tool—it's knowing which tool to use when, and how to interpret what it's telling you.
Today, we'll master the art and science of Linux performance monitoring and optimization. We'll explore the power trio of sar, iostat, and vmstat, along with advanced techniques for identifying and resolving performance bottlenecks before they become business-critical issues.
Every performance issue falls into one of four categories:
System Performance Resources:
- CPU: usage, run queues, context switches
- Memory: pressure, swapping, leaks
- Storage: I/O wait, latency, IOPS
- Network: bandwidth, packet loss, congestion
Performance is a chain—the weakest link determines overall system performance.
- Utilization: How busy is the resource?
- Saturation: Is there more work than the resource can handle?
- Errors: Are there any error conditions?

This USE (Utilization, Saturation, Errors) methodology lets you analyze each resource type systematically.
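To make this concrete, here is a rough triage sketch that samples one utilization and one saturation indicator per resource using standard tools (iostat requires the sysstat package; tool choices and output fields are illustrative, not a fixed recipe):

#!/bin/bash
# use-triage.sh - quick USE-style first pass (illustrative only)
echo "== CPU =="
uptime                                       # saturation: load average vs. core count
vmstat 1 2 | tail -1 | awk '{printf "us+sy: %d%%  runnable: %d\n", $13+$14, $1}'
echo "== Memory =="
free -h                                      # utilization
vmstat 1 2 | tail -1 | awk '{print "swap in/out per sec: " $7 " / " $8}'   # saturation
echo "== Storage =="
iostat -dx 1 2 2>/dev/null | awk '/^[a-z]/'  # %util (utilization), await/avgqu-sz (saturation)
echo "== Network =="
ip -s link                                   # error/drop counters (errors)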
vmstat provides a snapshot of system activity including processes, memory, paging, block I/O, and CPU usage.
# Display current system status
vmstat
# Monitor every 2 seconds, 10 times
vmstat 2 10
# Display in MB instead of KB
vmstat -S M 2 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 1843200 80532 947012 0 0 5 23 45 78 3 1 96 0 0
Key Metrics Explained:

Processes (procs):
- r: Runnable processes (higher than CPU count = CPU pressure)
- b: Blocked processes waiting for I/O

Memory:
- free: Available memory
- buff: Buffers (metadata cache)
- cache: Page cache (file data cache)
- si/so: Swap in/out (should be 0 for good performance)

I/O:
- bi/bo: Blocks in/out per second

System:
- in: Interrupts per second
- cs: Context switches per second (high values = CPU thrashing)

CPU:
- us: User time
- sy: System time
- id: Idle time
- wa: I/O wait (high = storage bottleneck)

Scenario 1: Memory Pressure
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 2 45032 3421 15234 89234 123 89 45 67 892 1543 15 8 65 12 0
Analysis:
- si/so (swap activity) = memory pressure
- r=4 on a dual-core system = CPU pressure
- wa=12% = some I/O waiting

iostat monitors disk I/O performance and provides detailed statistics about the storage subsystem.
# Display I/O statistics
iostat
# Monitor every 2 seconds
iostat 2
# Extended statistics with more details
iostat -x 1 5
# Monitor specific devices
iostat -x /dev/sda /dev/sdb 2
$ iostat -x 1 5
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.12 1.85 2.45 8.67 45.23 156.89 36.32 0.08 7.23 5.67 7.89 2.45 2.8
Critical Metrics:
- r/s, w/s: Read/write requests per second
- rkB/s, wkB/s: Read/write KB per second
- avgqu-sz: Average queue size (high = bottleneck)
- await: Average wait time (milliseconds)
- svctm: Service time (milliseconds)
- %util: Device utilization percentage

# Good Performance Indicators:
# %util < 80% # Device not saturated
# await < 10ms # Low latency
# avgqu-sz < 2 # No queuing pressure
# Warning Signs:
# %util > 90% # Device saturated
# await > 50ms # High latency
# avgqu-sz > 5 # Queue pressure
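These thresholds are far easier to act on when you have history to compare against. One simple approach (log path and interval are arbitrary examples) is to append timestamped extended statistics to a daily file:

# Record timestamped extended I/O statistics every 60 seconds for later review
iostat -x -t 60 >> /var/log/iostat-$(date +%F).log &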
sar is the most comprehensive performance monitoring tool, capable of collecting and reporting virtually every system metric.
# CPU utilization
sar -u 1 10
# Memory usage
sar -r 1 10
# I/O statistics
sar -b 1 10
# Network statistics
sar -n DEV 1 10
# Load average
sar -q 1 10
# All statistics
sar -A 1 5
# View yesterday's data
sar -u -f /var/log/sysstat/saXX
# Specific time range
sar -u -s 09:00:00 -e 17:00:00
# Generate daily report
sar -A > daily-performance-report.txt
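The sysstat package also ships sadf, which exports the same historical data in machine-readable form for spreadsheets or scripts; a brief sketch (saXX is a placeholder for the actual daily file name):

# Export historical CPU, I/O and network data as delimited records
sadf -d /var/log/sysstat/saXX -- -u > cpu-history.csv
sadf -d /var/log/sysstat/saXX -- -b > io-history.csv
sadf -d /var/log/sysstat/saXX -- -n DEV > net-history.csv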
#!/bin/bash
# cpu-analysis.sh - Comprehensive CPU analysis
echo "=== CPU Analysis Report ==="
echo "Date: $(date)"
echo
# Current CPU usage
echo "1. Current CPU Usage:"
vmstat 1 5 | tail -1 | awk '{print "User: " $13 "%, System: " $14 "%, Idle: " $15 "%, I/O Wait: " $16 "%"}'
echo
# Top CPU consuming processes
echo "2. Top CPU Consumers:"
ps aux --sort=-%cpu | head -10 | awk '{printf "%-15s %-8s %-8s %s\n", $1, $2, $3, $11}'
echo
# Load average analysis
echo "3. Load Average Analysis:"
uptime | awk -F'load average:' '{print $2}' | awk '{
load1=$1; load5=$2; load15=$3;
gsub(",", "", load1); gsub(",", "", load5); gsub(",", "", load15);
print "1min: " load1 ", 5min: " load5 ", 15min: " load15
# Get CPU count
"nproc" | getline cpus
if(load1 > cpus) print "WARNING: 1-minute load exceeds CPU count"
if(load5 > cpus * 0.8) print "CAUTION: 5-minute load approaching CPU limit"
}'
echo
# Context switches and interrupts
echo "4. System Activity:"
vmstat 1 3 | tail -1 | awk '{print "Context Switches/sec: " $12 ", Interrupts/sec: " $11}'
# Check for high context switches (>10000/sec is high)
cs_rate=$(vmstat 1 3 | tail -1 | awk '{print $12}')
if [[ $cs_rate -gt 10000 ]]; then
echo "WARNING: High context switch rate detected"
fi
# 1. Process Priority Management
# Lower priority for background tasks
nice -n 19 backup-script.sh
renice 10 -p $(pgrep backup-process)
# Higher priority for critical services
renice -10 -p $(pgrep mysql)
# 2. CPU Affinity for Performance-Critical Applications
# Bind database to specific CPUs
taskset -cp 0,1 $(pgrep mysql)
# 3. Check CPU Governor Settings
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Set performance mode for high-load scenarios
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
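If the cpupower utility is available (packaged as linux-tools or kernel-tools depending on the distribution), the same governor change can be applied and verified without writing sysfs files by hand; a small sketch:

# Set and verify the CPU frequency governor via cpupower
sudo cpupower frequency-set -g performance
cpupower frequency-info | grep -i governor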
#!/bin/bash
# memory-analysis.sh - Comprehensive memory analysis
echo "=== Memory Analysis Report ==="
echo "Date: $(date)"
echo
# Overall memory usage
echo "1. Memory Overview:"
free | awk 'NR==2{printf "Used: %.1f GiB (%.1f%%), Available: %.1f GiB\n", $3/1048576, $3/$2*100, $7/1048576}'
echo
# Memory pressure indicators
echo "2. Memory Pressure Indicators:"
echo "Swap Usage:"
free | awk 'NR==3{if($2>0) printf "Swap: %s kB / %s kB (%.1f%%)\n", $3, $2, $3/$2*100; else print "No swap configured"}'
echo "Swap Activity (vmstat si/so):"
vmstat 1 3 | tail -1 | awk '{print "Swap in/sec: " $7 ", Swap out/sec: " $8}'
# Memory-hungry processes
echo "3. Top Memory Consumers:"
ps aux --sort=-%mem | head -10 | awk '{printf "%-15s %-8s %-8s %s\n", $1, $2, $4, $11}'
echo
# Check for memory leaks
echo "4. Memory Leak Detection:"
echo "Checking for processes with unusual memory growth..."
for pid in $(ps -eo pid --no-headers); do
if [[ -f "/proc/$pid/status" ]]; then
vmsize=$(grep VmSize /proc/$pid/status 2>/dev/null | awk '{print $2}')
vmrss=$(grep VmRSS /proc/$pid/status 2>/dev/null | awk '{print $2}')
cmd=$(ps -p $pid -o comm= 2>/dev/null)
if [[ $vmsize -gt 1000000 ]] && [[ $vmrss -gt 500000 ]]; then
echo "Large process: PID $pid ($cmd) - VmSize: ${vmsize}kB, VmRSS: ${vmrss}kB"
fi
fi
done
# 1. Tune Virtual Memory Settings
echo "# Memory tuning" >> /etc/sysctl.conf
echo "vm.swappiness=10" >> /etc/sysctl.conf # Reduce swapping
echo "vm.dirty_ratio=5" >> /etc/sysctl.conf # Reduce dirty page cache
echo "vm.dirty_background_ratio=2" >> /etc/sysctl.conf
# Apply immediately
sysctl -p
# 2. Configure Huge Pages for databases
# Calculate required huge pages (for 8GB database buffer)
echo 4096 > /proc/sys/vm/nr_hugepages
# 3. Monitor slab cache (kernel memory usage)
cat /proc/slabinfo | head -20
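For a quicker overview than raw /proc/slabinfo, the kernel also summarizes slab usage in /proc/meminfo, and slabtop can rank individual caches; a minimal sketch:

# Total kernel slab memory (values in kB)
grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo
# One-shot view of the largest slab caches (requires root)
sudo slabtop -o -s c | head -20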
#!/bin/bash
# io-analysis.sh - Comprehensive I/O performance analysis
echo "=== Storage I/O Analysis Report ==="
echo "Date: $(date)"
echo
# Current I/O activity
echo "1. Current I/O Activity:"
iostat -x 1 3 | grep -E "(Device|sd|md|dm-)" | tail -20
echo
echo "2. I/O Performance Summary:"
iostat -x 1 3 | awk '
# Field offsets assume the legacy iostat -x layout shown earlier:
# ... avgqu-sz await r_await w_await svctm %util
/^(Device|Linux|avg-cpu)/ { next }
/^$/ { next }
NF > 10 {
device=$1; util=$NF; await=$(NF-4); avgqu=$(NF-5)
if(util > 80) printf "WARNING: %s utilization: %.1f%%\n", device, util
if(await > 20) printf "WARNING: %s average wait: %.1fms\n", device, await
if(avgqu > 2) printf "WARNING: %s queue size: %.1f\n", device, avgqu
}'
echo
echo "3. Top I/O Processes:"
iotop -b -a -o -d 1 -n 3 2>/dev/null | grep -v "^$" | head -10
echo
echo "4. Filesystem I/O Statistics:"
for mount in $(mount | grep -E 'ext[234]|xfs|btrfs' | awk '{print $3}'); do
echo "Mount: $mount"
if command -v iotop >/dev/null; then
iotop -b -a -P -d 1 -n 1 2>/dev/null | grep "$mount" | head -5
fi
done
echo
echo "5. Disk Space and Inode Usage:"
df -h | grep -v tmpfs
echo
df -i | grep -v tmpfs | awk 'NR>1 && $5+0 > 80 {print "WARNING: " $1 " inode usage: " $5}'
# 1. Optimize mount options for different workloads
# Database server (performance over safety)
/dev/sdb1 /var/lib/mysql ext4 defaults,noatime,data=writeback,barrier=0 0 2
# Log server (frequent writes)
/dev/sdc1 /var/log ext4 defaults,noatime,commit=60 0 2
# Read-heavy server (web content)
/dev/sdd1 /var/www ext4 defaults,noatime,data=ordered 0 2
# 2. Configure I/O scheduler based on storage type
# For SSDs - use noop or deadline
echo noop > /sys/block/sda/queue/scheduler
# For HDDs - use cfq for fairness
echo cfq > /sys/block/sdb/queue/scheduler
# 3. Adjust I/O queue depths
echo 32 > /sys/block/sda/queue/nr_requests
# 4. Enable read-ahead for sequential workloads
blockdev --setra 8192 /dev/sda
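Note that these sysfs writes are lost on reboot, and on current blk-mq kernels the available schedulers are typically none, mq-deadline, and bfq rather than noop/cfq. One common way to make the choice persistent is a udev rule keyed on the device's rotational flag; a sketch (file name and scheduler choices are examples):

# /etc/udev/rules.d/60-io-scheduler.rules
# Non-rotational devices (SSD/NVMe): minimal scheduling overhead
ACTION=="add|change", KERNEL=="sd[a-z]|nvme[0-9]n[0-9]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
# Rotational devices (HDD): fairness-oriented scheduler
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"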
#!/bin/bash
# network-analysis.sh - Network performance analysis
echo "=== Network Performance Analysis ==="
echo "Date: $(date)"
echo
# Network interface statistics
echo "1. Network Interface Statistics:"
sar -n DEV 1 3 | grep -E "(IFACE|eth|ens|wlan)" | tail -10
echo
echo "2. Network Errors and Drops:"
cat /proc/net/dev | awk '
NR>2 {
iface=$1; gsub(":", "", iface)
rx_drops=$5; tx_drops=$13; rx_errors=$4; tx_errors=$12
if(rx_drops > 0 || tx_drops > 0 || rx_errors > 0 || tx_errors > 0)
printf "%s: RX drops:%d errors:%d, TX drops:%d errors:%d\n",
iface, rx_drops, rx_errors, tx_drops, tx_errors
}'
echo
echo "3. Network Connections:"
ss -tuln | tail -n +2 | wc -l | awk '{print "Total listening sockets: " $1}'
ss -tun state established | tail -n +2 | wc -l | awk '{print "Total established connections: " $1}'
echo
echo "4. Top Network Processes:"
if command -v nethogs >/dev/null; then
timeout 5 nethogs -t -d 1 2>/dev/null | head -10
else
echo "Install nethogs for per-process network monitoring"
fi
echo
echo "5. Bandwidth Usage:"
vnstat -l -i eth0 2>/dev/null || echo "Install vnstat for bandwidth monitoring"
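If neither nethogs nor vnstat is installed, per-interface throughput can still be estimated directly from /proc/net/dev by sampling the byte counters twice; a minimal sketch (eth0 is only an example, pass your interface as the first argument):

#!/bin/bash
# if-throughput.sh - rough per-interface throughput from /proc/net/dev
IFACE=${1:-eth0}
read rx1 tx1 < <(sed 's/:/ /' /proc/net/dev | awk -v i="$IFACE" '$1==i {print $2, $10}')
sleep 1
read rx2 tx2 < <(sed 's/:/ /' /proc/net/dev | awk -v i="$IFACE" '$1==i {print $2, $10}')
echo "$IFACE RX: $(( (rx2-rx1)/1024 )) KB/s  TX: $(( (tx2-tx1)/1024 )) KB/s"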
Problem: E-commerce site experiencing slow response times during peak hours.
Investigation Process:
#!/bin/bash
# web-server-analysis.sh
echo "=== Web Server Performance Analysis ==="
# 1. Check current load
echo "Current system load:"
uptime
echo
# 2. Identify bottlenecks
echo "Resource utilization:"
vmstat 1 5 | tail -1 | awk '{
printf "CPU: User=%s%% System=%s%% I/O Wait=%s%% Idle=%s%%\n", $13, $14, $16, $15
if($16 > 20) print "WARNING: High I/O wait detected"
if($15 < 10) print "WARNING: High CPU utilization"
}'
echo
echo "Memory pressure check:"
free | awk 'NR==2{
used_pct = $3/$2*100
printf "Memory used: %.1f%%\n", used_pct
if(used_pct > 90) print "WARNING: High memory usage"
}'
echo
echo "Apache/Nginx connection analysis:"
if pgrep apache2 >/dev/null; then
echo "Apache processes: $(pgrep apache2 | wc -l)"
apache2ctl status 2>/dev/null | grep -E "(requests|workers)"
elif pgrep nginx >/dev/null; then
echo "Nginx processes: $(pgrep nginx | wc -l)"
nginx -T 2>/dev/null | grep worker_processes
fi
echo
echo "Database connection check:"
if pgrep mysql >/dev/null; then
mysql -e "SHOW PROCESSLIST" | wc -l | awk '{print "MySQL connections: " $1}'
mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries'" 2>/dev/null
fi
# 3. Check disk I/O for logs and database
echo
echo "I/O analysis for critical paths:"
iostat -x 1 3 | grep -A20 "^Device" | tail -20 | while read line; do
if [[ $line =~ ^[a-z] ]]; then
util=$(echo $line | awk '{print $NF}')
device=$(echo $line | awk '{print $1}')
if (( $(echo "$util > 80" | bc -l) )); then
echo "WARNING: $device utilization at ${util}%"
fi
fi
done
Optimization Solutions:
# 1. Web Server Tuning
# Apache optimization
echo "# Apache performance tuning" >> /etc/apache2/conf.d/performance.conf
cat >> /etc/apache2/conf.d/performance.conf << 'EOF'
# Increase worker limits
ServerLimit 16
MaxRequestWorkers 400
ThreadsPerChild 25
# Enable compression
LoadModule deflate_module modules/mod_deflate.so
SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
EOF
# Nginx optimization
# Note: worker_processes belongs in the main context and worker_connections in
# the events {} block of nginx.conf; location blocks go inside a server {} block.
# Only http-context directives can live in a conf.d include like this one.
cat > /etc/nginx/conf.d/performance.conf << 'EOF'
# Gzip compression
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_types text/plain text/css application/json application/javascript;
EOF
# In /etc/nginx/nginx.conf:
#   worker_processes auto;
#   events { worker_connections 1024; }
# In the relevant server {} block, cache static assets:
#   location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
#       expires 1y;
#       add_header Cache-Control "public, immutable";
#   }
# 2. Database Optimization
cat >> /etc/mysql/conf.d/performance.cnf << 'EOF'
[mysqld]
# InnoDB optimizations
innodb_buffer_pool_size = 2G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
# Query cache (MySQL 5.7 / MariaDB only; removed in MySQL 8.0)
query_cache_type = 1
query_cache_size = 128M
# Connection limits
max_connections = 200
connect_timeout = 10
EOF
# 3. System-level optimizations
cat >> /etc/sysctl.conf << 'EOF'
# Network optimizations
net.core.netdev_max_backlog = 5000
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# File system optimizations
fs.file-max = 100000
EOF
sysctl -p
Problem: MySQL database queries becoming increasingly slow.
Investigation and Optimization:
#!/bin/bash
# database-performance-analysis.sh
echo "=== Database Performance Analysis ==="
# 1. MySQL process analysis
echo "MySQL Process Analysis:"
mysql -e "
SELECT
COUNT(*) as total_connections,
SUM(TIME) as total_time,
AVG(TIME) as avg_time,
STATE,
COMMAND
FROM INFORMATION_SCHEMA.PROCESSLIST
GROUP BY STATE, COMMAND
ORDER BY total_time DESC;" 2>/dev/null
# 2. Slow query analysis
echo
echo "Slow Query Analysis:"
mysql -e "
SELECT
query_time,
lock_time,
rows_sent,
rows_examined,
sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;" 2>/dev/null
# 3. InnoDB status
echo
echo "InnoDB Buffer Pool Analysis:"
mysql -e "
SHOW ENGINE INNODB STATUS\G" 2>/dev/null | grep -A10 "BUFFER POOL"
# 4. Table lock analysis
echo
echo "Table Lock Analysis:"
mysql -e "SHOW OPEN TABLES WHERE In_use > 0;" 2>/dev/null
# 5. Largest tables (index and size review candidates)
echo
echo "Largest Tables by Size:"
mysql -e "
SELECT
t.TABLE_SCHEMA,
t.TABLE_NAME,
t.TABLE_ROWS,
ROUND(((data_length + index_length) / 1024 / 1024), 2) 'Size_MB'
FROM information_schema.TABLES t
ORDER BY (data_length + index_length) DESC
LIMIT 10;" 2>/dev/null
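The mysql.slow_log query above only returns rows when the slow query log is enabled with table output; a sketch of turning it on at runtime (the one-second threshold is just an example):

# Enable the slow query log and route it to the mysql.slow_log table
mysql -e "SET GLOBAL slow_query_log = 'ON';
SET GLOBAL log_output = 'TABLE';
SET GLOBAL long_query_time = 1;"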
Problem: System gradually consuming more memory over time.
#!/bin/bash
# memory-leak-detection.sh
echo "=== Memory Leak Detection ==="
# Create baseline
mkdir -p /var/log/memory-monitoring
LOGFILE="/var/log/memory-monitoring/memory-$(date +%Y%m%d).log"
# Monitor memory over time
while true; do
timestamp=$(date)
# System memory
total_mem=$(free | awk 'NR==2{print $2}')
used_mem=$(free | awk 'NR==2{print $3}')
free_mem=$(free | awk 'NR==2{print $4}')
# Top memory consumers
top_processes=$(ps aux --sort=-%mem | head -10 | awk '{print $2 ":" $4 ":" $11}' | tr '\n' '|')
echo "$timestamp,$total_mem,$used_mem,$free_mem,$top_processes" >> "$LOGFILE"
# Analysis: Check for rapid memory growth
if [[ -f "$LOGFILE" ]] && [[ $(wc -l < "$LOGFILE") -ge 60 ]]; then
# Check if memory usage increased by more than 5% in last hour
current_usage=$((used_mem * 100 / total_mem))
hour_ago_usage=$(tail -60 "$LOGFILE" | head -1 | cut -d',' -f3)
hour_ago_total=$(tail -60 "$LOGFILE" | head -1 | cut -d',' -f2)
hour_ago_pct=$((hour_ago_usage * 100 / hour_ago_total))
if [[ $((current_usage - hour_ago_pct)) -gt 5 ]]; then
echo "ALERT: Memory usage increased by $((current_usage - hour_ago_pct))% in the last hour"
echo "Current: ${current_usage}%, Hour ago: ${hour_ago_pct}%"
fi
fi
sleep 60 # Monitor every minute
done &
echo "Memory monitoring started. Log file: $LOGFILE"
echo "PID: $!"
1. Establish Baseline:
#!/bin/bash
# create-performance-baseline.sh
BASELINE_DIR="/var/log/performance-baseline/$(date +%Y%m%d)"
mkdir -p "$BASELINE_DIR"
echo "Creating performance baseline..."
# System information
uname -a > "$BASELINE_DIR/system-info.txt"
cat /proc/cpuinfo > "$BASELINE_DIR/cpu-info.txt"
cat /proc/meminfo > "$BASELINE_DIR/memory-info.txt"
lsblk > "$BASELINE_DIR/storage-info.txt"
# Performance metrics
vmstat 1 60 > "$BASELINE_DIR/vmstat-baseline.txt" &
iostat -x 1 60 > "$BASELINE_DIR/iostat-baseline.txt" &
sar -A 1 60 > "$BASELINE_DIR/sar-baseline.txt" &
echo "Baseline collection started. Will run for 60 seconds."
echo "Files saved to: $BASELINE_DIR"
2. Monitor Key Metrics:
# Key thresholds to monitor
mkdir -p /etc/monitoring
cat > /etc/monitoring/performance-thresholds.conf << 'EOF'
# CPU thresholds
CPU_LOAD_WARN=0.8 # 80% of CPU count
CPU_LOAD_CRIT=1.5 # 150% of CPU count
CPU_IOWAIT_WARN=20 # 20% I/O wait
CPU_IOWAIT_CRIT=40 # 40% I/O wait
# Memory thresholds
MEM_USAGE_WARN=80 # 80% memory usage
MEM_USAGE_CRIT=95 # 95% memory usage
SWAP_USAGE_WARN=10 # Any swap usage
SWAP_USAGE_CRIT=50 # 50% swap usage
# I/O thresholds
IO_UTIL_WARN=80 # 80% disk utilization
IO_UTIL_CRIT=95 # 95% disk utilization
IO_AWAIT_WARN=20 # 20ms average wait
IO_AWAIT_CRIT=100 # 100ms average wait
# Network thresholds
NET_ERRORS_WARN=10 # 10 errors per interval
NET_DROPS_WARN=5 # 5 drops per interval
EOF
3. Automated Performance Alerts:
#!/bin/bash
# performance-monitor.sh - Automated monitoring with alerts
source /etc/monitoring/performance-thresholds.conf
ALERT_EMAIL="admin@company.com"
LOG_FILE="/var/log/performance-alerts.log"
check_cpu() {
load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
cpu_count=$(nproc)
load_ratio=$(echo "$load_avg / $cpu_count" | bc -l)
if (( $(echo "$load_ratio > $CPU_LOAD_CRIT" | bc -l) )); then
alert "CRITICAL: CPU load $load_avg on $cpu_count cores (ratio: $load_ratio)"
elif (( $(echo "$load_ratio > $CPU_LOAD_WARN" | bc -l) )); then
alert "WARNING: CPU load $load_avg on $cpu_count cores (ratio: $load_ratio)"
fi
}
check_memory() {
mem_usage=$(free | awk 'NR==2{printf "%.0f", $3/$2*100}')
if [[ $mem_usage -gt $MEM_USAGE_CRIT ]]; then
alert "CRITICAL: Memory usage at ${mem_usage}%"
elif [[ $mem_usage -gt $MEM_USAGE_WARN ]]; then
alert "WARNING: Memory usage at ${mem_usage}%"
fi
# Check swap
swap_usage=$(free | awk 'NR==3{if($2>0) printf "%.0f", $3/$2*100; else print "0"}')
if [[ $swap_usage -gt $SWAP_USAGE_CRIT ]]; then
alert "CRITICAL: Swap usage at ${swap_usage}%"
elif [[ $swap_usage -gt $SWAP_USAGE_WARN ]]; then
alert "WARNING: Swap usage at ${swap_usage}%"
fi
}
check_io() {
# Field offsets assume the legacy iostat -x layout (await at NF-4, %util at NF)
iostat -x 1 3 | awk -v warn="$IO_UTIL_WARN" -v crit="$IO_UTIL_CRIT" -v a_warn="$IO_AWAIT_WARN" -v a_crit="$IO_AWAIT_CRIT" '
/^[a-z]/ && NF>10 {
util=$NF; await=$(NF-4)
if(util > crit) printf "CRITICAL: Device %s utilization at %s%%\n", $1, util
else if(util > warn) printf "WARNING: Device %s utilization at %s%%\n", $1, util
if(await > a_crit) printf "CRITICAL: Device %s wait time %sms\n", $1, await
else if(await > a_warn) printf "WARNING: Device %s wait time %sms\n", $1, await
}'
}
alert() {
message="$1"
timestamp=$(date)
echo "[$timestamp] $message" | tee -a "$LOG_FILE"
echo "$message" | mail -s "Performance Alert - $(hostname)" "$ALERT_EMAIL"
}
# Run checks
check_cpu
check_memory
check_io
# Schedule with cron every 5 minutes:
# */5 * * * * /usr/local/bin/performance-monitor.sh
# Install perf tools
sudo apt-get install linux-tools-generic # Ubuntu/Debian
sudo yum install perf # CentOS/RHEL
# Profile CPU usage for 30 seconds
sudo perf record -a -g sleep 30
sudo perf report
# Find CPU hotspots in specific process
sudo perf top -p $(pgrep mysql)
# Profile system calls
sudo perf trace -p $(pgrep apache2) sleep 10
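Before committing to a full profile, perf stat gives a lightweight counter summary that often shows whether a workload is CPU-bound, cache-bound, or switching heavily; a short sketch (the mysql process is only an example target):

# Counter summary for one process over 10 seconds (pgrep -o picks the oldest matching PID)
sudo perf stat -p $(pgrep -o mysql) sleep 10
# System-wide IPC, cache misses and context switches over 10 seconds
sudo perf stat -a -e cycles,instructions,cache-misses,context-switches sleep 10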
# Enable function tracing
echo function > /sys/kernel/debug/tracing/current_tracer
# Trace specific functions
echo 'sys_write' > /sys/kernel/debug/tracing/set_ftrace_filter
# Start tracing
echo 1 > /sys/kernel/debug/tracing/tracing_on
# View trace
cat /sys/kernel/debug/tracing/trace
# Stop tracing
echo 0 > /sys/kernel/debug/tracing/tracing_on
- [ ] Check system load with uptime
- [ ] Check disk space with df -h
- [ ] Check memory usage with free -h
- [ ] Scan system logs (/var/log/messages) for errors
- [ ] Generate sar reports for trend analysis
- [ ] Review and update performance baselines
We've explored the comprehensive world of Linux performance monitoring and optimization, from fundamental tools like vmstat, iostat, and sar to advanced profiling techniques and real-world troubleshooting scenarios.
Week 1: Foundation
Month 1: Advanced Analysis
Month 3: Optimization Expert
- 🎯 Measure First: Never optimize without baseline measurements
- 📊 Think Holistically: Performance issues often span multiple subsystems
- 🔄 Continuous Monitoring: Performance is not a one-time configuration
- ⚡ Automate Everything: Manual monitoring doesn't scale
- 🎛️ Tune Incrementally: Make one change at a time and measure its impact
Performance optimization is both art and science. The tools we've covered give you the scientific measurement capabilities, but the art comes from experience—understanding how different workloads behave, recognizing patterns in metrics, and knowing which optimizations provide the biggest impact.
Remember: A well-tuned system isn't just faster—it's more reliable, costs less to operate, and provides better user experience. Every optimization you make compounds over time, creating systems that scale gracefully and perform consistently under pressure.
Start with measurement, optimize systematically, and never stop learning. Your users (and your 3 AM self) will thank you for building systems that perform beautifully under any load.
---
This is Part 19 of our comprehensive Linux mastery series - the final piece of your Linux expertise!
Previous: Storage Management: LVM, RAID & Optimization - Master flexible storage systems
Beginner Foundation (Parts 1-5):
Intermediate Skills (Parts 6-11):
Advanced Mastery (Parts 12-19):
You now have comprehensive Linux expertise! Consider specializing in:
---
How has this performance monitoring guide helped you understand system optimization? What performance challenges are you currently facing? Share your experiences—performance expertise grows through shared knowledge and collaborative problem-solving.