Mastering Search and Replace in Files with Sed
As a Linux power user, you need to manipulate text files efficiently. Fortunately, Linux provides the Stream Editor (sed), a lightweight yet versatile tool for find-and-replace operations on text files. In this guide, you'll learn sed basics and then advance to complex text transformations across multiple files.
Sed Basics: Syntax and Simple Replacements
The sed command has been part of Unix and Linux for decades and remains a staple for shell scripting and basic system administration tasks. At its core, sed allows you to find and replace text in files or input streams using regular expressions.
Here is the basic syntax for using sed for search and replace:
sed 's/search_regex/replace_string/' input_file
Let's break this command template down:
- sed: invokes the stream editor
- s: stands for "substitution" and tells sed we want to replace text
- search_regex: the regular expression pattern to search for
- replace_string: the text that will replace matches
- input_file: the file sed will process
For example, to replace every instance of "Linux" with "GNU/Linux" in text.txt (the trailing g makes the replacement apply to all matches on a line, not just the first):
sed 's/Linux/GNU\/Linux/g' text.txt
This performs a basic search and replace. However, it only prints the result to standard output rather than updating text.txt. To make substitutions permanent, use the -i flag:
sed -i 's/Linux/GNU\/Linux/g' text.txt
Now text.txt will contain the updated text after running the command.
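To see both behaviors side by side without touching real data, here is a minimal sketch. It assumes GNU sed and works on a throwaway file created with mktemp (the variable name tmp is just for illustration):

```shell
# Create a throwaway file so the demo never touches real data
tmp=$(mktemp)
printf 'Linux and Linux\n' > "$tmp"

# Without -i: the result goes to stdout and the file stays unchanged
sed 's/Linux/GNU\/Linux/' "$tmp"    # prints: GNU/Linux and Linux

# With -i (GNU sed) and g: every match is replaced inside the file
sed -i 's/Linux/GNU\/Linux/g' "$tmp"
cat "$tmp"                           # prints: GNU/Linux and GNU/Linux
```

On BSD or macOS sed, write -i '' instead of a bare -i.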
Regex Flavors: Basic vs Extended
By default, sed expects basic regular expressions (BRE) in the search pattern. For more complex matching, sed can also use extended regular expressions (ERE) with the -E option (GNU sed also accepts -r, but -E works in both GNU and BSD sed):
sed -E 's/([0-9]{2})\/([0-9]{2})\/([0-9]{4})/\3-\1-\2/' dates.txt
This example uses capture groups and backreferences to reformat MM/DD/YYYY dates in a file as YYYY-MM-DD.
Understanding the differences between BRE and ERE, and building and testing patterns with tools like regex101.com, is crucial for unlocking the full potential of sed.
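To make the BRE/ERE difference concrete, this sketch applies the same date rewrite in both flavors on a made-up date. It assumes a sed that supports -E, such as GNU or BSD sed. Only the escaping changes:

```shell
# BRE: groups and intervals need backslashes: \( \) and \{ \}
echo '07/04/2023' | sed 's/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)/\3-\1-\2/'

# ERE (-E): ( ) and { } are special without backslashes
echo '07/04/2023' | sed -E 's/([0-9]{2})\/([0-9]{2})\/([0-9]{4})/\3-\1-\2/'

# Both commands print: 2023-07-04
```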
Special Characters Must Be Escaped
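Since forward slashes delimit the search and replace strings, any forward slashes inside them must be escaped with backslashes:
sed 's/https:\/\/\(.*\)\//http:\/\/\1\//' urls.txt
Other metacharacters also have special meaning and must be escaped with a backslash to match literally. In BRE, characters such as . * [ ^ $ and \ itself are special; with -E (ERE), + ? ( ) { } and | are special as well. In the replacement string, & (which stands for the whole match) and \ need escaping. Always check your regular expressions to avoid unexpected matches!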
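Two small sketches of these rules, assuming a POSIX sed and made-up input. sed accepts any character other than backslash or newline as the delimiter after s, which removes the need to escape slashes, and an unescaped dot matches any character:

```shell
# With # as the delimiter, the slashes in the URL need no escaping
echo 'https://example.com/' | sed 's#https://\(.*\)/#http://\1/#'
# prints: http://example.com/

# \. matches only a literal dot; a bare . would also match the x in "axb"
echo 'a.b axb' | sed 's/a\.b/MATCH/g'
# prints: MATCH axb
```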
Replace All Instances or Just Some
By default, sed replaces only the first match on each line. To substitute every match, use the g flag:
# Replace all IP addresses
sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/xxx.xxx.xxx.xxx/g' access.log
You can also replace only the Nth match on each line by appending a number:
# Replace only the second phone number on each line
sed 's/[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}/REDACTED/2' contacts.csv
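Here is a sketch of the numeric flag on a single made-up line of phone numbers (line and pattern are illustrative variable names). Combining a number with g, as in 2g, means "the second match and all later ones"; that combination is a GNU sed extension, and POSIX leaves it unspecified:

```shell
line='555-123-4567,555-987-6543,555-000-1111'
pattern='[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}'

# 2: replace only the second match on the line
echo "$line" | sed "s/$pattern/REDACTED/2"
# prints: 555-123-4567,REDACTED,555-000-1111

# 2g (GNU): replace the second match and every one after it
echo "$line" | sed "s/$pattern/REDACTED/2g"
# prints: 555-123-4567,REDACTED,REDACTED
```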
Make Changes Permanent with -i
As mentioned earlier, the -i option writes changes to the actual files instead of standard output:
sed -i 's/foo/bar/' *.txt
-i is essentially shorthand for redirecting output to an interim file, then overwriting the original:
sed 's/foo/bar/' file.txt > tmp.txt && mv tmp.txt file.txt
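GNU sed's -i keeps no backup by default. To keep one, attach a suffix, as in -i.bak, which saves the original as file.txt.bak. BSD and macOS sed require an argument to -i, so use -i '' there when you do not want a backup.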
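A minimal sketch of keeping a backup, run in a throwaway directory (dir and file.txt are illustrative names). Attaching the suffix directly to -i, with no space, is the form that both GNU and BSD sed accept:

```shell
dir=$(mktemp -d)
printf 'foo\n' > "$dir/file.txt"

# Edit in place, saving the original as file.txt.bak
sed -i.bak 's/foo/bar/' "$dir/file.txt"

cat "$dir/file.txt"      # prints: bar
cat "$dir/file.txt.bak"  # prints: foo
```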
Find and Replace Across Multiple Files
Rather than running sed separately on individual files, you can pass it every matching file in a directory at once with a shell glob:
sed -i 's/Windows/Linux/' ./docs/*.txt
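To reach files in subdirectories as well, let find locate them and hand the names to sed through xargs:
find . -name '*.txt' -print0 | xargs -0 sed -i 's/findme/replace/'
Using -print0 with xargs -0 passes the names NUL-separated, so files with spaces or other special characters are handled safely.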
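find can also run sed itself through its -exec ... {} + action, which batches file names much like xargs and copes with spaces in names. A sketch assuming GNU sed, using a throwaway directory with a deliberately awkward path (all names are illustrative):

```shell
dir=$(mktemp -d)
mkdir "$dir/sub dir"
printf 'findme\n' > "$dir/a.txt"
printf 'findme too\n' > "$dir/sub dir/b c.txt"

# {} + passes many file names to one sed invocation; find handles the quoting
find "$dir" -name '*.txt' -exec sed -i 's/findme/replace/g' {} +

cat "$dir/sub dir/b c.txt"   # prints: replace too
```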
Limit Changes to Certain Line Numbers
Addresses let you restrict replacements to specific lines:
# Lines 1-5 only
sed '1,5 s/foo/bar/' file.txt
# Line 10 to last line (\b word boundaries are a GNU extension)
sed '10,$ s/\bLinux\b/GNU\/Linux/' file.txt
An address can also be a single line number, and one script can mix addressed and unaddressed commands, one per line:
sed '3 s/.*/This is line 3/
s/Linux/GNU\/Linux/' file.txt
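Addresses can also be regular expressions, and ! inverts an address. A short sketch with made-up input, assuming GNU or POSIX sed:

```shell
# Only lines from [start] through [end] are affected
printf 'foo\n[start]\nfoo\n[end]\nfoo\n' | sed '/\[start\]/,/\[end\]/ s/foo/bar/'
# prints: foo, [start], bar, [end], foo (one per line)

# ! negates the address: change every line that is not a # comment
printf '# foo\nfoo\n' | sed '/^#/!s/foo/bar/'
# prints: "# foo" then "bar"
```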
Conditional Logic and Branching
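In addition to simple substitutions, sed supports alternative delimiters, branching, and a second buffer (the hold space) for carrying text from one line to the next:
# Use # as the delimiter (deletes the first "//" on each line)
sed 's#//##'
# Keep the latest matching line in the hold buffer; print it at the end
sed -n '/match/h;${g;p;}'
# Join all lines into one comma-separated line via the hold buffer
sed -n '1h;1!H;${x;s/\n/,/g;p;}'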
These features let sed perform logic such as filters, accumulations, and line joins, and complex scripts can combine them.
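A classic hold-space example is reversing the order of lines, the way tac does. This well-known idiom is shown here as a sketch with made-up input:

```shell
# 1!G : on every line except the first, append the hold space to the pattern space
# h   : copy the pattern space back into the hold space
# $p  : on the last line, print the accumulated (reversed) text
printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p'
# prints: three, two, one (one per line)
```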
Optimizing Long Pipelines With Sed
When processing large files or streams, optimizing long pipelines avoids bottlenecks.
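# Avoid: an extra cat process plus separate grep and sed stages
cat file | grep 'pattern' | sed 's/foo/bar/' | sort -u > output
# Better: sed reads the file directly and does the filtering itself
sed -n '/pattern/{s/foo/bar/;p;}' file | sort -u > output
Combining filtering and substitution steps into a single sed invocation, and dropping unnecessary cat processes, reduces the number of processes and the amount of data copied between them, which can speed up long pipelines.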
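The same idea applies when a pipeline chains several sed calls: one sed can take multiple -e expressions and apply them in order in a single pass. A sketch with made-up input:

```shell
# Two sed processes, each reading and writing the whole stream
printf 'cat dog\n' | sed 's/cat/lion/' | sed 's/dog/wolf/'

# One sed process applying both substitutions, in order, to each line
printf 'cat dog\n' | sed -e 's/cat/lion/' -e 's/dog/wolf/'

# Both print: lion wolf
```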
Recursive Find and Replace
Need to substitute text across an entire directory?
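# Preview: show the matching lines that would change
grep -rn 'find' /path
# Recursive find/replace under /path
find /path -type f -print0 | xargs -0 sed -i 's/find/replace/g'
The grep command previews the lines containing matches; the find and sed pipeline then performs the substitution in place.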
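GNU sed -i rewrites every file it is handed, even when nothing matches, which needlessly updates modification times. One way to restrict the edit to files that actually contain the pattern is to let grep list them first. This sketch assumes GNU grep (-Z) and GNU xargs (-r) and uses a throwaway directory with illustrative file names:

```shell
dir=$(mktemp -d)
printf 'find me\n' > "$dir/hit.txt"
printf 'nothing here\n' > "$dir/miss.txt"

# -r recurse, -l list matching file names, -Z end each name with NUL
# xargs -0 reads NUL-separated names; -r skips sed if there are none
grep -rlZ 'find' "$dir" | xargs -0 -r sed -i 's/find/replace/g'

cat "$dir/hit.txt"    # prints: replace me
cat "$dir/miss.txt"   # prints: nothing here (file left untouched)
```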
Sed Editing Scripts
Rather than one-liners, you can create reusable sed scripts:
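# script.sed
# Delete the first "//" on each line, using # as the delimiter
s#//##
# Global substitution
s/match/replace/g
# Also write each processed line to output.txt
w output.txt
Then run sed -f script.sed input.txt on any file. (Avoid ending such a script with q, which makes sed quit after the first line.)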
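To try a script file end to end, this sketch writes a small script.sed with a here-document in a throwaway directory and runs it with -f (the input text is made up):

```shell
dir=$(mktemp -d)

# Write the script; lines starting with # are comments
cat > "$dir/script.sed" <<'EOF'
# Delete the first "//" on each line, using # as the delimiter
s#//##
# Replace every "match" with "replace"
s/match/replace/g
EOF

printf 'a//b match match\n' > "$dir/input.txt"
sed -f "$dir/script.sed" "$dir/input.txt"
# prints: ab replace replace
```

One caveat: if the first two characters of a script file are #n, sed treats it like the -n option, so start comments with "# " instead.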
Distributed / Cluster Sed
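sed itself is not a distributed tool, but on Hadoop, Spark and other distributed platforms it can run on each worker node against that node's share of the data, for example as a Hadoop Streaming mapper or through Spark's RDD.pipe():
# Ship script.sed to the workers and run sed as a map-only streaming job
# (/data/in and /data/out are placeholder HDFS paths)
mapred streaming -files script.sed \
  -input /data/in -output /data/out \
  -mapper 'sed -f script.sed' -numReduceTasks 0
# Apply to node-local data
sed 's/find/replace/' file
Running sed in parallel this way makes it easier to scale text processing to big data volumes.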
Integrating Sed Into Programs
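Most programming languages can call sed as an external process:
Python
import subprocess
# Run sed on file.txt and capture its output as a string
out = subprocess.check_output(['sed', 's/f/r/', 'file.txt'], text=True)
Node.js
const { execFile } = require('child_process');
// Pass arguments as an array so no shell quoting is needed
execFile('sed', ['-i', 's/f/r/g', 'file.txt'], (error, stdout, stderr) => {
  // Check sed's exit status and error output
  if (error) console.error(stderr);
});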
So sed can be leveraged from within application code.
Comparison to Alternative Tools
Performance Benchmarks
Sed has some advantages over Perl, Python and Awk for stream processing:
[Benchmark chart not reproduced; image source: GeekFlare]
According to that chart, sed showed very high throughput when reading from stdin pipes.
Feature Comparison
Tool | Search/Replace | RegEx | Line Addressing | Performance | Programming |
---|---|---|---|---|---|
sed | Yes | BRE/ERE | Yes | Fast | Minimal |
awk | Yes | ERE | Yes (NR, range patterns) | Moderate | Full language |
perl | Yes | PCRE | Yes ($., range operator) | Slow | Full language |
python | Yes | Python RE | No | Slowest | Full language |
Use Case Comparison
Task | Best Suited Tool |
---|---|
Simple substitutions / filters | sed |
CSV / Table Processing | awk, python, perl |
HTML / XML Parsing | perl, python |
Machine Learning Text Analysis | python |
As the tables show, sed has both strengths and weaknesses compared with awk, Perl, and Python.
Advanced Sed Tutorials
Now that you have a solid base in sed, where can you learn more advanced techniques?
- IBM DeveloperWorks Sed One-Liners – Aggregates useful scripts
- Sed – An Introduction and Tutorial – By Bruce Barnett
These tutorials contain additional sed commands, practical examples, and how to integrate sed into shell scripts.
Be sure to also check the manual: man sed.
Business Value and Use Cases
Why should IT departments invest time into leveraging sed – especially with modern alternatives available?
- Reduced Licensing Costs – No need to purchase expensive commercial software for text processing needs – sed is free as part of Linux.
- Staff Productivity – Basic sed is easy to learn, which increases the output of engineers, researchers, and analysts when cleaning and wrangling text data.
- Big Data Enabler – Run in parallel on worker nodes (for example via Hadoop Streaming), sed can process very large volumes of unstructured data.
- Cloud Migration Accelerator – Bulk search and replace on configuration files can speed up moving legacy apps to cloud platforms.
- Regulatory Compliance – sed can help anonymize log files, reports and transcripts to meet GDPR, HIPAA and other data privacy mandates.
Organizations from startups to the Fortune 500 rely on sed daily as part of essential business operations.
Frequently Asked Questions About sed
Here are answers to some common questions about using the sed editing tool on Linux:
Q: Can sed edit files in-place like perl?
Yes, use the -i option to make substitutions permanent in the actual file instead of just printing to standard output.
Q: How do I replace strings but only on certain lines?
Use an address range of line numbers, for instance sed '20,30 s/find/replace/' file.txt
Q: What if my search string contains forward slash characters?
Escape them with backslashes, such as sed 's/https:\/\//http:\/\//', or pick a different delimiter: sed 's#https://#http://#'
Q: Should I use single or double quotes around the sed script?
Double quotes allow variable substitution, while single quotes treat the script literally. Single quotes avoid extra escaping.
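A sketch of the double-quote approach, with hypothetical variables old and new. Because the shell expands them before sed runs, a value containing the delimiter (/) or & would change the meaning of the command, so such values need escaping or a different delimiter:

```shell
old='Windows'
new='Linux'

# Double quotes: the shell substitutes $old and $new before sed sees the script
echo 'I use Windows' | sed "s/$old/$new/"
# prints: I use Linux

# A value containing / is safer with another delimiter such as |
new='GNU/Linux'
echo 'I use Windows' | sed "s|$old|$new|"
# prints: I use GNU/Linux
```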
Q: Can sed split strings into multiple variables like awk?
No, sed focuses specifically on finding and replacing text patterns. For more advanced text parsing features, use awk or Perl.
Q: What is the difference between the g and p flags?
g means global, i.e. replace all instances on each line, not just the first match. p prints the line whenever a substitution was made; combine it with -n to see only the altered lines.
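A quick sketch of -n combined with the p flag on made-up input; only the line where a substitution happened is printed:

```shell
# -n turns off automatic printing; p prints only lines that were changed
printf 'apple\nbanana\ncherry\n' | sed -n 's/an/AN/gp'
# prints: bANANa
```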
Conclusion
Sed's simplicity (it offers only 30 or so commands) hides its power for performing fast, in-place find-and-replace operations on text files. It may lack the conveniences of full scripting languages, but sed remains deeply embedded in the fabric of Unix-style utilities.
Hopefully this overview and reference helps both new and experienced Linux users take advantage of sed for common text transformation tasks. The examples and best practices provide a solid foundation for leveraging sed in bash scripts and sysadmin workflows.
I encourage you to check the sed documentation for even more commands, and examples of chaining them together for advanced text parsing needs on Linux or Unix systems.