Mastering Search and Replace in Files with Sed

For a Linux power user, efficiently manipulating text files is an essential skill. Fortunately, with the Stream Editor (sed), Linux provides a lightweight yet incredibly versatile tool for performing find-and-replace operations on text files. In this comprehensive guide, you'll learn sed basics and advance to complex text transformations across multiple files.

Sed Basics: Syntax and Simple Replacements

The sed command has been part of Unix and Linux for decades and remains a staple for shell scripting and basic system administration tasks. At its core, sed allows you to find and replace text in files or input streams using regular expressions.

Here is the basic syntax for using sed for search and replace:

sed 's/search_regex/replace_string/' input_file

Let's break this command template down:

  • sed invokes the stream editor command
  • s stands for "substitution". It tells sed we want to replace text
  • search_regex is the regular expression pattern to search for
  • replace_string is the text that will replace matches
  • input_file is the file sed will process

For example, to replace the first occurrence of "Linux" on each line of text.txt with "GNU/Linux":

sed 's/Linux/GNU\/Linux/' text.txt

This performs a basic search and replace. However, it only prints the result to standard output rather than updating text.txt. To write the changes back to the file, add the -i flag:

sed -i 's/Linux/GNU\/Linux/' text.txt

Now text.txt will contain the updated text after running the command.
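To see the difference between printing to standard output and editing in place, here is a small sketch. It assumes GNU sed (where -i takes no mandatory argument) and works on a scratch file created with mktemp; the variable names and file contents are made up for illustration.

```shell
# Scratch file with made-up contents (hypothetical example data)
demo_file=$(mktemp)
printf 'Linux is free.\nI use Linux daily.\n' > "$demo_file"

# Without -i: the result goes to standard output, the file is untouched
preview=$(sed 's/Linux/GNU\/Linux/' "$demo_file")
unchanged=$(cat "$demo_file")

# With -i (GNU sed syntax): the file itself is rewritten
sed -i 's/Linux/GNU\/Linux/' "$demo_file"
updated=$(cat "$demo_file")

printf '%s\n' "$updated"
```

Running the command without -i first is a cheap way to check a substitution before committing it to the file.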

Regex Flavors: Basic vs Extended

By default, sed expects basic regular expressions (BRE) in the search pattern. For more complex matching, sed can also use extended regular expressions (ERE) when invoked with the -E option (GNU sed also accepts -r as a synonym):

sed -E 's/([0-9]{2})\/([0-9]{2})\/([0-9]{4})/\3-\1-\2/' dates.txt

This example uses capture groups and backreferences to reformat dates in a file, for example turning 12/31/2023 into 2023-12-31.

Understanding the differences between BRE and ERE, and testing patterns on sample input before running them on real files, is crucial for unlocking the full potential of sed. Tools like regex101.com can help with drafting a pattern, but confirm the result with sed itself, since regex flavors differ.
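The practical difference between the two flavors is mostly about backslashes. The sketch below runs the same substitution in both modes on a made-up log line; the variable names are illustrative only.

```shell
line='2024-01-15 error'

# BRE: grouping parentheses and interval braces need backslashes
bre=$(printf '%s\n' "$line" | sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)/\2\/\1/')

# ERE (-E): the same pattern without the backslashes
ere=$(printf '%s\n' "$line" | sed -E 's/([0-9]{4})-([0-9]{2})/\2\/\1/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands produce the same line, so the choice comes down to which form is easier to read for a given pattern.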

Special Characters Must Be Escaped

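Since forward slashes delimit the search and replace strings, any forward slashes inside the pattern or replacement need escaping with backslashes:

sed 's/https:\/\/\(.*\)\//http:\/\/\1\//' urls.txt

Other characters have special meaning in the pattern too. In a BRE, . * [ ] ^ $ and \ are metacharacters and need a backslash to match literally. The characters + ? ( ) { } | are literal in a BRE and only act as operators when escaped (\+, \? and \| being GNU extensions) or when ERE mode is enabled with -E. In the replacement string, & and \ are special. Always check your regular expressions to avoid unexpected matches!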
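Two habits avoid most escaping trouble: choosing a delimiter that does not occur in the text, and escaping dots that are meant literally. A minimal sketch with made-up input strings:

```shell
# Using | as the delimiter means the slashes in the URL need no escaping
url_out=$(printf 'https://example.com/docs/\n' | sed 's|https://|http://|')

# An unescaped dot matches any character; an escaped dot matches only "."
loose=$(printf 'v1x0\n' | sed 's/1.0/2.0/')
strict=$(printf 'v1x0\n' | sed 's/1\.0/2.0/')

printf '%s\n%s\n%s\n' "$url_out" "$loose" "$strict"
```

The loose pattern rewrites "1x0" even though no dot is present, which is exactly the kind of unexpected match worth testing for.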

Replace All Instances or Just Some

By default, sed only replaces the first match on each line. To substitute every match, use the g flag:

# Replace all IP addresses
sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/xxx.xxx.xxx.xxx/g' access.log

You can also specify which match to replace by appending a number:

# Only replace the second phone number on each line
sed 's/[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}/REDACTED/2' contacts.csv
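The effect of each flag is easiest to see side by side on one short line. In this sketch the input string is made up, and the combined 2g form is a GNU sed extension.

```shell
line='a-b-c-d'

first=$(printf '%s\n' "$line" | sed 's/-/+/')     # first match only
all=$(printf '%s\n' "$line" | sed 's/-/+/g')      # every match
second=$(printf '%s\n' "$line" | sed 's/-/+/2')   # second match only
from2=$(printf '%s\n' "$line" | sed 's/-/+/2g')   # GNU sed: second match onward

printf '%s\n' "$first" "$all" "$second" "$from2"
```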

Make Changes Permanent with -i

As mentioned earlier, the -i option writes changes to the files themselves instead of standard output:

sed -i 's/foo/bar/' *.txt

-i is essentially shorthand for writing the output to a temporary file, then replacing the original with it:

sed 's/foo/bar/' file.txt > tmp.txt && mv tmp.txt file.txt

With GNU sed, a bare -i keeps no backup of the original. To keep one, append a suffix, as in -i.bak, which saves the original as file.txt.bak. BSD and macOS sed require an argument after -i, so there an empty suffix (-i '') means no backup.
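When an in-place edit is risky, a backup suffix gives you an easy way back. The following sketch uses a scratch file from mktemp with made-up contents; the attached form -i.bak is accepted by both GNU and BSD sed.

```shell
conf=$(mktemp)
printf 'mode=foo\n' > "$conf"

# -i.bak edits in place and keeps the original as "$conf.bak"
sed -i.bak 's/foo/bar/' "$conf"

edited=$(cat "$conf")
backup=$(cat "$conf.bak")

printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"
```

If the result is wrong, moving the .bak file back over the edited one restores the original.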

Find and Replace Across Multiple Files

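Rather than running sed separately on individual files, you can pass every matching file in a directory to one command with a shell glob:

sed -i 's/Windows/Linux/' ./docs/*.txt

The glob does not descend into subdirectories. To include them, let find locate the files and hand them to sed as arguments through xargs:

find . -name '*.txt' -print0 | xargs -0 sed -i 's/findme/replace/'

The -print0 and -0 options separate file names with NUL bytes, so names containing spaces or other special characters are handled correctly.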
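An alternative that needs no pipe at all is find's own -exec action, which passes file names to sed as arguments. This is a sketch under the assumption of GNU sed; the directory layout and file names are invented for the demonstration.

```shell
docs=$(mktemp -d)
mkdir -p "$docs/sub dir"
printf 'findme here\n' > "$docs/top.txt"
printf 'findme there\n' > "$docs/sub dir/my notes.txt"
printf 'findme\n' > "$docs/skip.log"

# -exec ... {} + hands file names to sed as arguments, so spaces are safe
find "$docs" -type f -name '*.txt' -exec sed -i 's/findme/replace/g' {} +

nested=$(cat "$docs/sub dir/my notes.txt")
skipped=$(cat "$docs/skip.log")

printf '%s\n%s\n' "$nested" "$skipped"
```

Only the .txt files are rewritten, including the one whose path contains spaces; skip.log is left alone.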

Limit Changes to Certain Line Numbers

Addresses let you restrict replacements to particular lines:

# Lines 1-5 only
sed '1,5 s/foo/bar/' file.txt

# Line 10 to last line (\b is a GNU sed word-boundary extension)
sed '10,$ s/\bLinux\b/GNU\/Linux/' file.txt

An address can also be a single line number, and addressed and unaddressed commands can be combined in one script:

sed '3 s/.*/This is line 3/
     s/Linux/GNU\/Linux/' file.txt
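Addresses are not limited to line numbers: a regular expression can select lines, and ! negates any address. A small sketch with four made-up input lines:

```shell
input=$(printf 'alpha foo\nbeta foo\ngamma foo\ndelta foo\n')

# Numeric range: only lines 2 through 3
by_number=$(printf '%s\n' "$input" | sed '2,3 s/foo/bar/')

# Regex address: only lines that contain "gamma"
by_regex=$(printf '%s\n' "$input" | sed '/gamma/ s/foo/bar/')

# Negated address: every line except line 1
negated=$(printf '%s\n' "$input" | sed '1!s/foo/bar/')

printf '%s\n---\n%s\n---\n%s\n' "$by_number" "$by_regex" "$negated"
```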

Conditional Logic and Branching

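In addition to simple substitutions, sed also supports alternate delimiters, a hold space for saving text between lines, and branching with the b and t commands:

# Use # as the delimiter to delete the first "//" on each line
sed 's#//##'

# Print the last matching line: hold each match, branch past the print until the last line
sed -n '/match/h;$!b;g;p'

# Collect every line in the hold space, then print them joined by a literal \n
sed -n 'H;${g;s/^\n//;s#\n#\\n#g;p}'

These features let sed filter input, join or reorder lines, and carry state from one line to the next. Complex scripts can build on them.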
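Loops in sed are built from a label plus a branch back to it. The sketch below shows two common idioms; it assumes GNU sed, which accepts labels separated by semicolons on one line, and the input strings are made up.

```shell
# :a defines a label, N appends the next input line to the pattern space,
# and $!ba branches back to :a on every line except the last.
# Once all lines are collected, the newlines are replaced with commas.
joined=$(printf 'one\ntwo\nthree\n' | sed ':a;N;$!ba;s/\n/,/g')

# t branches only if a substitution succeeded, so this repeats the
# substitution until no double space is left.
squeezed=$(printf 'a    b\n' | sed ':loop;s/  / /;t loop')

printf '%s\n%s\n' "$joined" "$squeezed"
```

The first idiom reads the whole input into memory, so it suits small files better than large streams.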

Optimizing Long Pipelines With Sed

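When processing large files or streams, trimming unnecessary processes from a pipeline reduces overhead.

# Wasteful - an extra cat process, plus a grep that sed can replace
cat file | grep 'pattern' | sed 's/foo/bar/' | sort -u > output

# Leaner - sed reads the file itself and filters with an address
sed -n '/pattern/{s/foo/bar/;p;}' file | sort -u > output

Every stage of a pipeline runs concurrently, so the gain comes from starting fewer processes and copying less data between them. Grouping several edits into a single sed invocation has the same effect. Here pattern, foo and bar are placeholders.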
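As a rough illustration of folding a filter into sed, the sketch below compares a three-stage pipeline with a two-stage one that gives the same result. The log lines are invented for the example.

```shell
log=$(printf 'ERROR disk full\nINFO ok\nERROR disk full\nWARN  slow\n')

# Three processes: grep filters, sed rewrites, sort deduplicates
piped=$(printf '%s\n' "$log" | grep -v '^INFO' | sed 's/ERROR/E/' | sort -u)

# Two processes: sed deletes the INFO lines itself (d) and applies the
# same substitution, with both commands in one script
combined=$(printf '%s\n' "$log" | sed -e '/^INFO/d' -e 's/ERROR/E/' | sort -u)

printf '%s\n' "$combined"
```

Whether the saving matters depends on the data size; for small inputs the readability of the pipeline may count for more.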

Recursive Find and Replace

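Need to substitute text across an entire directory tree?

# Recursive find/replace under /path
find /path -type f -exec grep -lZ 'find' {} + | xargs -0 -r sed -i 's/find/replace/'

grep -l selects only the files that contain a match, and sed -i then rewrites just those files. The -Z, -0 and -r options are GNU extensions: the first two keep unusual file names intact, and -r skips running sed when no file matches.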
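A cautious way to run a recursive replacement is to list the affected files first and only then rewrite them. This sketch assumes GNU grep, xargs and sed; the directory tree is a throwaway one built with mktemp.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/a/b"
printf 'find me\n' > "$workdir/a/one.txt"
printf 'nothing here\n' > "$workdir/a/b/two.txt"

# Dry run: list files that contain the pattern, without modifying anything
to_change=$(grep -rl 'find' "$workdir")

# Apply: rewrite only those files (-Z and -0 pass names NUL-separated)
grep -rlZ 'find' "$workdir" | xargs -0 sed -i 's/find/replace/g'

after=$(cat "$workdir/a/one.txt")
printf 'changed: %s\nnow: %s\n' "$to_change" "$after"
```

Reviewing the dry-run list before the second command is the step that catches an overly broad pattern.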

Sed Editing Scripts

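Rather than one-liners, you can create reusable sed scripts:

# script.sed
# Use # as the delimiter to strip the first "//" on each line
s#//##

# Replace every "match" on each line
s/match/replace/g

# Also write each processed line to output.txt
w output.txt

Then run it on any file with sed -f script.sed input.txt.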
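To try the script-file workflow end to end, the sketch below writes a small script with a here-document and applies it with -f. The script name cleanup.sed, its three commands and the sample input are all hypothetical.

```shell
work=$(mktemp -d)

# Write a small sed script; the quoted 'EOF' keeps the shell from
# expanding anything inside the here-document
cat > "$work/cleanup.sed" <<'EOF'
# Strip trailing whitespace
s/[[:space:]]*$//
# Delete empty lines
/^$/d
# Normalize the key/value separator
s/ *= */=/
EOF

printf 'name = sed  \n\nmode=stream\n' > "$work/input.txt"
result=$(sed -f "$work/cleanup.sed" "$work/input.txt")

printf '%s\n' "$result"
```

Keeping one command per line with a comment above it makes a script like this far easier to maintain than the equivalent one-liner.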

Distributed / Cluster Sed

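sed has no distributed mode of its own, but frameworks that pipe records through an external program can run it on worker nodes. With Hadoop Streaming, for example, sed can serve as the mapper, and Spark offers a similar facility through RDD.pipe. The jar location and HDFS paths below are placeholders:

# Ship script.sed to the workers and run it as a map-only job
hadoop jar hadoop-streaming.jar -files script.sed -D mapreduce.job.reduces=0 -input /data/in -output /data/out -mapper 'sed -f script.sed'

# Each mapper is equivalent to running this on its local split of the data
sed 's/find/replace/' file

Because sed processes its input one line at a time, this approach makes it easier to scale text processing to big data volumes.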

Integrating Sed Into Programs

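Most programming languages can run sed as a child process:

Python

import subprocess

# Run sed on file.txt and capture the transformed text
out = subprocess.check_output(['sed', 's/f/r/', 'file.txt'], text=True)

Node.js

const { exec } = require('child_process');

// Double quotes outside so the sed script can keep its single quotes
exec("sed -i 's/f/r/g' file.txt", (error, stdout, stderr) => {
  if (error) {
    // sed exited with a non-zero status; stderr holds its message
    console.error(stderr);
  }
});

So sed can be leveraged from within application code.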

Comparison to Alternative Tools

Performance Benchmarks

Sed has some advantages over Perl, Python and Awk for stream processing:

[Figure: Sed Performance Comparison. Image source: GeekFlare]

As the chart shows, sed has extremely high throughput on stdin pipes.

Feature Comparison

Tool    Search/Replace  RegEx      Line Addressing  Performance  Programming
sed     Yes             BRE/ERE    Built-in         Fast         Minimal
awk     Yes             ERE        Via NR           Moderate     Full language
perl    Yes             Perl RE    Via $.           Slow         Full language
python  Yes             Python RE  Manual           Slowest      Full language

Use Case Comparison

Task                            Best Suited Tool
Simple substitutions / filters  sed
CSV / Table Processing          awk, python, perl
HTML / XML Parsing              perl, python
Machine Learning Text Analysis  python

As shown, sed has strengths and weaknesses compared with other text-processing tools.

Advanced Sed Tutorials

Now that you have a solid base in sed, where can you learn more advanced techniques?

These tutorials contain additional sed commands, practical examples, and how to integrate sed into shell scripts.

Be sure to also check the manual – man sed.

Business Value and Use Cases

Why should IT departments invest time into leveraging sed – especially with modern alternatives available?

  • Reduced Licensing Costs: sed ships with Linux at no cost, so there is no need to purchase commercial software for routine text processing.
  • Staff Productivity: basic sed is easy to learn, which speeds up engineers, researchers and analysts when cleaning and wrangling text data.
  • Big Data Enabler: sed streams its input line by line with a small memory footprint, so the same script can run in parallel across many servers and very large unstructured datasets.
  • Cloud Migration Accelerator: bulk search/replace on config files helps move legacy apps to cloud platforms faster.
  • Regulatory Compliance: sed can help anonymize log files, reports and transcripts as part of meeting GDPR, HIPAA and other data privacy mandates.

Organizations from startups to the Fortune 500 rely on sed daily as part of essential business operations.

Frequently Asked Questions About sed

Here are answers to some common questions about using the sed editing tool on Linux:

Q: Can sed edit files in-place like perl?

Yes, use the -i option to make substitutions permanent in the actual file instead of just printing to standard output.

Q: How do I replace strings but only on certain lines?

Address ranges restrict a command to given line numbers, for instance sed '20,30 s/find/replace/' file.txt

Q: What if my search string contains forward slash characters?

Escape them with backslashes, such as sed 's/https:\/\//http:\/\//', or pick a different delimiter, as in sed 's|https://|http://|'.

Q: Should I use single or double quotes around the sed script?

Double quotes allow variable substitution, while single quotes treat the script literally. Single quotes avoid extra escaping.
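The quoting difference is easy to verify. In this sketch the variable names old and new and the input line are made up; note that values containing /, & or \ would need escaping before being spliced into a sed script this way.

```shell
old='staging'
new='production'

# Double quotes: the shell expands $old and $new before sed sees the script
expanded=$(printf 'env=staging\n' | sed "s/$old/$new/")

# Single quotes: sed receives the literal text $old, which does not match
literal=$(printf 'env=staging\n' | sed 's/$old/$new/')

printf '%s\n%s\n' "$expanded" "$literal"
```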

Q: Can sed split strings into multiple variables like awk?

No, sed focuses specifically on finding and replacing text patterns. For more advanced text parsing features, use awk or Perl.

Q: What is the difference between the g and p flags?

g means global, i.e. replace all instances on each line, not just the first match. p prints the line when a substitution was made; combined with the -n option, only the changed lines appear in the output.
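A quick way to compare the two flags is to run them on the same two made-up lines, one that matches and one that does not:

```shell
input=$(printf 'cat cat\ndog\n')

# g alone: every line is printed, with all matches replaced
with_g=$(printf '%s\n' "$input" | sed 's/cat/bird/g')

# -n plus p: only lines where a substitution happened are printed
with_p=$(printf '%s\n' "$input" | sed -n 's/cat/bird/p')

# The flags combine: replace every match and print only changed lines
with_gp=$(printf '%s\n' "$input" | sed -n 's/cat/bird/gp')

printf '%s\n---\n%s\n---\n%s\n' "$with_g" "$with_p" "$with_gp"
```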

Conclusion

Sed's simplicity (it has only around 30 commands) hides its power for performing fast, in-place find-and-replace operations on text files. It may lack the conveniences of full scripting languages, but sed remains deeply embedded in the fabric of Unix-style utilities.

Hopefully this overview and reference helps both new and experienced Linux users take advantage of sed for common text transformation tasks. The examples and best practices provide a solid foundation for leveraging sed in bash scripts and sysadmin workflows.

I encourage you to check the sed documentation for even more commands, and examples of chaining them together for advanced text parsing needs on Linux or Unix systems.
