Text Processing Mastery: grep, sed, awk, and Advanced Command Line Tools

Text processing is one of Linux's greatest strengths. Whether you're analyzing log files, processing CSV data, or manipulating configuration files, mastering these tools will make you incredibly productive at the command line.

The Power Trio: grep, sed, and awk

These three tools form the foundation of text processing in Linux; a quick combined example follows the list:

  • grep: Search and filter text patterns
  • sed: Stream editor for filtering and transforming text
  • awk: Programming language for text processing and data extraction
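
As that combined example, here is a small sketch of a pipeline that uses all three; the log file name and its bracketed-timestamp layout are hypothetical:

bash
# Hypothetical app.log: filter with grep, strip the "[timestamp] " prefix with sed,
# then tally ERROR messages by their first word with awk
grep "ERROR" app.log | sed 's/^\[[^]]*\] //' | awk '{count[$1]++} END {for (m in count) print count[m], m}'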

    Mastering grep - The Pattern Hunter

    Basic grep Usage

    bash
    # Basic pattern search
    grep "pattern" file.txt
    grep "error" /var/log/syslog
    
    # Case-insensitive search
    grep -i "ERROR" logfile.txt
    
    # Show line numbers
    grep -n "function" script.py
    
    # Count matching lines
    grep -c "warning" logfile.txt
    
    # Show only filenames with matches
    grep -l "TODO" *.py
    
    # Show filenames without matches
    grep -L "completed" *.txt

    Advanced grep Options

    bash
    # Recursive search in directories
    grep -r "config" /etc/
    grep -r --include="*.conf" "database" /etc/
    
    # Context lines (before and after)
    grep -A 3 "error" logfile.txt        # 3 lines after
    grep -B 2 "error" logfile.txt        # 2 lines before
    grep -C 5 "error" logfile.txt        # 5 lines before and after
    
    # Multiple patterns
    grep -E "error|warning|critical" logfile.txt
    grep -e "pattern1" -e "pattern2" file.txt
    
    # Invert match (lines NOT containing pattern)
    grep -v "debug" logfile.txt
    
    # Whole word matching
    grep -w "user" file.txt              # Won't match "username"
    
    # Fixed string search (no regex)
    grep -F "literal.string" file.txt

    Regular Expressions with grep

    bash
    # Beginning and end of line
    grep "^start" file.txt               # Lines starting with "start"
    grep "end$" file.txt                 # Lines ending with "end"
    grep "^$" file.txt                   # Empty lines
    
    # Character classes
    grep "[0-9]" file.txt                # Lines containing digits
    grep "[A-Z]" file.txt                # Lines containing uppercase
    grep "[aeiou]" file.txt              # Lines containing vowels
    
    # Quantifiers
    grep "colou\?r" file.txt             # "color" or "colour"
    grep "go\+d" file.txt                # "god", "good", "goood", etc.
    grep "ba.*" file.txt                 # "ba" followed by anything
    
    # Word boundaries
    grep "\buser\b" file.txt             # Whole word "user"
    grep "\<user\>" file.txt             # Alternative syntax

    Practical grep Examples

    bash
    # Find all Python functions
    grep -n "^def " *.py
    
    # Find IP addresses in log files
    grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log
    
    # Find email addresses
    grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
    
    # Find lines with specific length
    grep '^.\{80\}$' file.txt            # Lines exactly 80 characters
    grep '^.\{80,\}' file.txt            # Lines 80+ characters
    
    # Extract error codes
    grep -oE 'HTTP [0-9]{3}' access.log
    
    # Find commented lines (various languages)
    grep '^[[:space:]]*#' script.sh     # Shell comments
    grep '^[[:space:]]*//' code.js       # JavaScript comments
    grep '^[[:space:]]*<!--' page.html   # HTML comments

    sed - The Stream Editor

    Basic sed Operations

    bash
    # Substitute (replace) text
    sed 's/old/new/' file.txt            # Replace first occurrence per line
    sed 's/old/new/g' file.txt           # Replace all occurrences (global)
    sed 's/old/new/2' file.txt           # Replace second occurrence per line
    
    # Case-insensitive substitution
    sed 's/old/new/gi' file.txt
    
    # In-place editing (modify original file)
    sed -i 's/old/new/g' file.txt
    sed -i.bak 's/old/new/g' file.txt    # Create backup with .bak extension
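
    Before committing an in-place edit, it is worth previewing the result; one simple sketch is to diff the transformed stream against the original:

    bash
    # Preview what -i would change (no diff output means no change)
    sed 's/old/new/g' file.txt | diff file.txt -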

    Line-Based Operations

    bash
    # Delete lines
    sed '3d' file.txt                    # Delete line 3
    sed '2,5d' file.txt                  # Delete lines 2-5
    sed '/pattern/d' file.txt            # Delete lines containing pattern
    sed '/^$/d' file.txt                 # Delete empty lines
    
    # Print specific lines
    sed -n '10p' file.txt                # Print only line 10
    sed -n '1,5p' file.txt               # Print lines 1-5
    sed -n '/pattern/p' file.txt         # Print lines containing pattern
    
    # Add lines
    sed '3a\New line after line 3' file.txt
    sed '3i\New line before line 3' file.txt
    sed '$a\New line at end' file.txt

    Advanced sed Techniques

    bash
    # Multiple operations
    sed -e 's/old1/new1/g' -e 's/old2/new2/g' file.txt
    sed 's/old1/new1/g; s/old2/new2/g' file.txt
    
    # Using different delimiters
    sed 's|/path/old|/path/new|g' file.txt   # Useful for paths
    sed 's#old#new#g' file.txt               # Alternative delimiter
    
    # Line ranges
    sed '1,10s/old/new/g' file.txt       # Replace only in lines 1-10
    sed '/start/,/end/s/old/new/g' file.txt  # Replace between patterns
    
    # Backreferences
    sed 's/\(word\)/[\1]/g' file.txt     # Wrap "word" in brackets
    sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/g' file.txt  # Swap numbers around dash
    
    # Address patterns
    sed '/pattern/s/old/new/g' file.txt  # Replace only in lines with pattern
    sed '/pattern/!s/old/new/g' file.txt # Replace only in lines WITHOUT pattern
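
    On the replacement side, & stands for the entire matched text, which saves retyping the pattern:

    bash
    # & expands to the whole match
    echo "error 404 found" | sed 's/[0-9][0-9]*/[&]/g'   # -> error [404] found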

    Practical sed Examples

    bash
    # Configuration file editing
    sed -i 's/^#Port 22/Port 22/' /etc/ssh/sshd_config
    sed -i 's/DEBUG/INFO/g' app.conf
    
    # CSV processing
    sed 's/,/\t/g' data.csv              # Convert CSV to TSV
    sed '1d' data.csv                    # Remove header row
    
    # Code formatting
    sed 's/[[:space:]]*$//' file.txt     # Remove trailing whitespace
    sed '/^[[:space:]]*$/d' file.txt     # Remove blank lines
    
    # Log processing
    sed -n '/ERROR/,/^$/p' logfile.txt   # Extract error blocks
    sed 's/.*\[\([^]]*\)\].*/\1/' access.log  # Extract timestamps
    
    # HTML processing
    sed 's/<[^>]*>//g' page.html         # Remove HTML tags
    sed 's/&lt;/</g; s/&gt;/>/g; s/&amp;/\&/g' file.html  # Decode HTML entities

    awk - The Text Processing Language

    Basic awk Concepts

    awk processes text line by line, splitting each line into fields (columns) by default using whitespace as the delimiter.

    bash
    # Basic syntax
    awk 'pattern { action }' file.txt
    
    # Print specific fields (columns)
    awk '{print $1}' file.txt            # First field
    awk '{print $1, $3}' file.txt        # First and third fields
    awk '{print $NF}' file.txt           # Last field
    awk '{print $(NF-1)}' file.txt       # Second-to-last field
    
    # Print entire line
    awk '{print}' file.txt               # Same as cat
    awk '{print $0}' file.txt            # Explicit way

    Field Separation and Variables

    bash
    # Custom field separator
    awk -F: '{print $1}' /etc/passwd     # Use colon as separator
    awk -F',' '{print $2}' data.csv      # Use comma for CSV
    awk -F'\t' '{print $1}' data.tsv     # Use tab for TSV
    
    # Built-in variables
    awk '{print NR, NF, $0}' file.txt    # Line number, field count, full line
    awk '{print FILENAME, FNR}' *.txt    # Filename and line number
    awk 'END {print NR}' file.txt        # Total number of lines
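
    The separators can also be set inside the program in a BEGIN block via the FS and OFS variables, which is equivalent to -F plus control over the output format:

    bash
    # Set input and output separators programmatically
    awk 'BEGIN {FS=":"; OFS=" -> "} {print $1, $7}' /etc/passwd   # username -> login shell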

    Pattern Matching in awk

    bash
    # Pattern matching
    awk '/pattern/ {print}' file.txt     # Lines containing pattern
    awk '/^start/ {print}' file.txt      # Lines starting with "start"
    awk '$1 == "error" {print}' file.txt # First field equals "error"
    awk '$3 > 100 {print}' file.txt      # Third field greater than 100
    
    # Ranges
    awk '/start/,/end/ {print}' file.txt # Lines between patterns
    awk 'NR==5,NR==10 {print}' file.txt # Lines 5 through 10
    
    # Conditional expressions
    awk 'length($0) > 80 {print}' file.txt       # Lines longer than 80 chars
    awk 'NF > 5 {print}' file.txt                # Lines with more than 5 fields
    awk '$2 ~ /pattern/ {print}' file.txt        # Second field matches pattern
    awk '$1 !~ /pattern/ {print}' file.txt       # First field doesn't match

    awk Programming Constructs

    bash
    # Variables and calculations
    awk '{sum += $3} END {print sum}' file.txt          # Sum third column
    awk '{count++} END {print count}' file.txt          # Count lines
    awk '{sum += $1; count++} END {print sum/count}' file.txt  # Average
    
    # Conditionals
    awk '{if ($3 > 100) print "High:", $0}' file.txt
    awk '{if ($1 == "error") errors++; else others++} END {print errors, others}' file.txt
    
    # Loops
    awk '{for(i=1; i<=NF; i++) print i, $i}' file.txt  # Print field numbers and values
    
    # Arrays
    awk '{count[$1]++} END {for (word in count) print word, count[word]}' file.txt

    Practical awk Examples

    bash
    # Log analysis
    awk '/ERROR/ {errors++} /WARNING/ {warnings++} END {print "Errors:", errors, "Warnings:", warnings}' logfile.txt
    
    # CSV processing
    awk -F',' '{print $2, $1}' data.csv                # Swap first two columns
    awk -F',' 'NR>1 {sum+=$3; count++} END {print "Average:", sum/count}' data.csv
    
    # System monitoring
    ps aux | awk '{sum += $3} END {print "Total CPU:", sum "%"}'
    df -h | awk 'NR>1 && $5+0 > 80 {print $6, $5}'     # Show filesystems >80% full ($5+0 strips the % sign)
    
    # Text formatting
    awk '{printf "%-20s %s\n", $1, $2}' file.txt      # Format columns
    awk '{gsub(/old/, "new"); print}' file.txt         # Global substitution
    
    # Data extraction
    awk '/start_marker/,/end_marker/ {if (!/start_marker/ && !/end_marker/) print}' file.txt
    
    # Multiple files processing
    awk 'FNR==1{print "Processing", FILENAME} {print NR, $0}' *.txt

    Additional Text Processing Tools

    sort - Ordering Text

    bash
    # Basic sorting
    sort file.txt                        # Alphabetical sort
    sort -n numbers.txt                  # Numerical sort
    sort -r file.txt                     # Reverse sort
    sort -u file.txt                     # Unique sort (remove duplicates)
    
    # Field-based sorting
    sort -k2 file.txt                    # Sort by second field
    sort -k2,2 file.txt                  # Sort by second field only
    sort -k2n file.txt                   # Numerical sort by second field
    sort -t: -k3n /etc/passwd            # Sort passwd by UID (third field)
    
    # Advanced sorting
    sort -k2,2n -k1,1 file.txt          # Sort by second field numerically, then first alphabetically
    sort -c file.txt                     # Check if file is sorted
    sort -m sorted1.txt sorted2.txt      # Merge sorted files
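
    GNU sort adds two more comparison modes worth knowing (both GNU extensions, so they may be absent on other implementations):

    bash
    # GNU extensions
    du -sh * | sort -h                    # Sort human-readable sizes (K, M, G) correctly
    printf '1.10\n1.2\n1.9\n' | sort -V   # Version sort: 1.2, 1.9, 1.10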

    uniq - Finding Unique Lines

    bash
    # Remove duplicates (requires sorted input)
    sort file.txt | uniq
    
    # Count occurrences
    sort file.txt | uniq -c
    
    # Show only duplicates
    sort file.txt | uniq -d
    
    # Show only unique lines (appear once)
    sort file.txt | uniq -u
    
    # Case-insensitive uniqueness
    sort file.txt | uniq -i
    
    # Check specific fields
    sort file.txt | uniq -f 1            # Skip first field when comparing
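
    When the original order matters, a common awk idiom removes duplicates without sorting first, keeping the first occurrence of each line:

    bash
    # Order-preserving deduplication (no sort required)
    awk '!seen[$0]++' file.txt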

    cut - Extracting Columns

    bash
    # Extract by character position
    cut -c1-10 file.txt                  # Characters 1-10
    cut -c5- file.txt                    # Characters 5 to end
    cut -c-20 file.txt                   # Characters 1-20
    
    # Extract by field
    cut -d: -f1 /etc/passwd              # First field, colon delimiter
    cut -d, -f2,4 data.csv               # Second and fourth fields
    cut -d' ' -f3- file.txt              # Third field to end
    
    # Multiple delimiters (using tr first)
    tr '[:space:]' ',' < file.txt | cut -d, -f2
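
    GNU cut can also rewrite the delimiter on output via --output-delimiter (a GNU extension):

    bash
    # Extract fields and change the delimiter in one step (GNU cut)
    cut -d: -f1,7 --output-delimiter=' -> ' /etc/passwd   # username -> login shell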

    tr - Character Translation

    bash
    # Character replacement
    tr 'a-z' 'A-Z' < file.txt            # Convert to uppercase
    tr 'A-Z' 'a-z' < file.txt            # Convert to lowercase
    tr ' ' '_' < file.txt                 # Replace spaces with underscores
    
    # Character deletion
    tr -d '0-9' < file.txt               # Remove all digits
    tr -d '\n' < file.txt                # Remove newlines (join lines)
    tr -d '[:punct:]' < file.txt         # Remove punctuation
    
    # Character squeezing
    tr -s ' ' < file.txt                 # Squeeze multiple spaces to one
    tr -s '\n' < file.txt                # Remove blank lines
    
    # Complement (everything except)
    tr -cd '0-9\n' < file.txt            # Keep only digits and newlines
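
    As a classic illustration of paired translation sets, ROT13 maps each letter to the one 13 places away:

    bash
    # ROT13 with tr
    echo "Hello, World" | tr 'A-Za-z' 'N-ZA-Mn-za-m'   # -> Uryyb, Jbeyq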

    wc - Counting

    bash
    # Count lines, words, characters
    wc file.txt                          # All counts
    wc -l file.txt                       # Lines only
    wc -w file.txt                       # Words only
    wc -c file.txt                       # Bytes
    wc -m file.txt                       # Characters (multibyte-aware)
    
    # Multiple files
    wc *.txt                             # Count for each file plus total
    find . -name "*.py" | xargs wc -l    # Total lines in all Python files
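
    Note that the find | xargs pipeline above breaks on filenames containing spaces or newlines; a more robust variant passes null-separated names:

    bash
    # Safe with spaces and newlines in filenames
    find . -name "*.py" -print0 | xargs -0 wc -l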

    Advanced Text Processing Combinations

    Pipelines and Command Chaining

    bash
    # Log analysis pipeline
    cat access.log | grep "404" | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -10
    
    # Word frequency analysis
    cat book.txt | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr ' ' '\n' | sort | uniq -c | sort -rn | head -20
    
    # CSV data processing
    cat sales.csv | awk -F',' 'NR>1 {sum+=$3} END {print "Total sales:", sum}' | tee results.txt
    
    # Configuration file processing
    grep -v '^#' config.txt | grep -v '^$' | sed 's/[[:space:]]*#.*//' | awk -F'=' '{print $1}' | sort

    Complex Data Extraction

    bash
    # Extract email domains
    grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | sed 's/.*@//' | sort | uniq -c
    
    # Process JSON with text tools (basic)
    cat data.json | grep '"name"' | sed 's/.*"name"[[:space:]]*:[[:space:]]*"//' | sed 's/".*//' | sort
    
    # Extract URLs from HTML
    grep -oE 'href="[^"]*"' page.html | sed 's/href="//' | sed 's/"//' | sort | uniq
    
    # Process Apache logs
    awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20  # Top IPs
    awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20  # Top pages
    awk '$9 >= 400 {print $9, $7}' access.log | sort | uniq -c         # Error pages

    Practical Exercises

    Exercise 1: Log Analysis

    bash
    # Create sample log file
    cat > sample.log << EOF
    2025-07-19 10:00:01 INFO User login successful: user1
    2025-07-19 10:00:15 ERROR Database connection failed
    2025-07-19 10:00:30 INFO User login successful: user2
    2025-07-19 10:01:45 WARNING High memory usage detected
    2025-07-19 10:02:00 ERROR Authentication failed for user1
    2025-07-19 10:03:15 INFO User logout: user1
    EOF
    
    # Practice tasks
    grep "ERROR" sample.log                                    # Find all errors
    grep -c "INFO\|ERROR\|WARNING" sample.log                 # Count log levels
    awk '{print $3}' sample.log | sort | uniq -c             # Count by log level
    grep -oE 'user[0-9]+' sample.log | sort -u                # Extract unique users
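
    For reference, with the sample file above these tasks should produce roughly the following (assuming the heredoc was written exactly as shown):

    bash
    # awk '{print $3}' sample.log | sort | uniq -c
    #   2 ERROR
    #   3 INFO
    #   1 WARNING
    # grep -oE 'user[0-9]+' sample.log | sort -u
    # user1
    # user2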

    Exercise 2: CSV Processing

    bash
    # Create sample CSV
    cat > employees.csv << EOF
    Name,Department,Salary,Years
    John Doe,Engineering,75000,5
    Jane Smith,Marketing,65000,3
    Bob Johnson,Engineering,80000,7
    Alice Brown,Sales,60000,2
    Charlie Wilson,Engineering,90000,10
    EOF
    
    # Practice tasks
    awk -F',' 'NR>1 {sum+=$3; count++} END {print "Average salary:", sum/count}' employees.csv
    awk -F',' '$2=="Engineering" {print $1, $3}' employees.csv
    awk -F',' 'NR>1 {dept[$2]+=$3; count[$2]++} END {for(d in dept) print d, dept[d]/count[d]}' employees.csv
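
    As a sanity check on the arithmetic, the overall average is (75000+65000+80000+60000+90000)/5 = 74000, and the per-department averages work out as below (awk's for-in loop does not guarantee order):

    bash
    # Expected output of the per-department average one-liner (order may vary)
    # Engineering 81666.7
    # Marketing 65000
    # Sales 60000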

    Exercise 3: Text Cleanup

    bash
    # Create messy text file
    cat > messy.txt << EOF
      This   has   multiple    spaces  
    	And some tabs mixed in
    UPPERCASE and lowercase
    Special characters: !@#$%^&*()
    Numbers123and456letters
    EOF
    
    # Clean up tasks
    sed 's/[[:space:]]\+/ /g' messy.txt                      # Fix spacing
    tr '[:upper:]' '[:lower:]' < messy.txt                   # Normalize case
    tr -d '[:punct:]' < messy.txt                            # Remove punctuation
    tr -cd '[:alnum:][:space:]' < messy.txt                  # Keep only letters, numbers, spaces

    Performance Tips

    Efficient Text Processing

    1. Use appropriate tools: grep for searching, sed for simple substitutions, awk for field processing
    2. Minimize pipeline stages: Combine operations when possible
    3. Use specific options: grep -F for literal strings, sort -n for numbers
    4. Process large files efficiently: Use head or tail to sample first

    bash
    # Efficient: single awk command
    awk '/pattern/ {gsub(/old/, "new"); print}' largefile.txt
    
    # Less efficient: multiple commands
    grep "pattern" largefile.txt | sed 's/old/new/g'

    Memory Considerations

    bash
    # For very large files, process in chunks
    split -l 1000000 hugefile.txt chunk_
    for chunk in chunk_*; do
        process_chunk "$chunk"           # Placeholder: replace with your own command or script
    done
    
    # Use streaming when possible
    tail -f logfile.txt | grep "ERROR" | awk '{print $1, $2}'

    Quick Reference

    grep Essentials

    bash
    grep "pattern" file           # Basic search
    grep -i "pattern" file        # Case insensitive
    grep -r "pattern" dir/        # Recursive
    grep -v "pattern" file        # Invert match
    grep -E "pat1|pat2" file      # Extended regex
    grep -A3 -B3 "pattern" file   # Context lines

    sed Essentials

    bash
    sed 's/old/new/g' file        # Global replace
    sed -i 's/old/new/g' file     # In-place edit
    sed '5d' file                 # Delete line 5
    sed -n '1,10p' file           # Print lines 1-10
    sed '/pattern/d' file         # Delete matching lines

    awk Essentials

    bash
    awk '{print $1}' file         # First column
    awk -F: '{print $1}' file     # Custom delimiter
    awk '/pattern/ {print}' file  # Pattern matching
    awk '{sum+=$1} END {print sum}' file  # Sum column
    awk 'NF>5 {print}' file       # Lines with >5 fields

    Key Takeaways

  • Master grep for pattern searching and filtering
  • Use sed for stream editing and simple text transformations
  • Leverage awk for complex field-based processing and calculations
  • Combine tools with pipes for powerful text processing workflows
  • Choose the right tool for the job to maximize efficiency
  • Practice with real data to build proficiency
  • Regular expressions greatly expand your text processing capabilities

    These text processing skills form the foundation of data manipulation and analysis in Linux. With practice, you'll find yourself solving complex text processing tasks with elegant one-liners that would require complex programs in other environments.

    ---

    🚀 Continue Your Linux Journey

    This is Part 6 of our comprehensive Linux mastery series.

    Previous: Linux Permissions & Security - Master file permissions and ownership

    Next: Package Management - Learn to install, update, and manage software packages

    📚 Complete Linux Series Navigation

    Beginner Foundation:

  • Part 5: Permissions & Security
  • Part 6: Text Processing ← You are here

    Intermediate Skills:

  • Part 7: Package Management
  • Part 8: User & Group Management
  • Part 9: Process Management
  • Part 10: Environment Variables
  • Part 11: Automation with Cron

    Ready to Manage Software? Continue with package management to install and maintain applications!

    ---

    Coming next: Understanding package management - learn how to install, update, and manage software packages across different Linux distributions.
