The sed command is a powerful text processing tool in Linux that allows you to edit files directly on the command line. One of sed's most useful features is its ability to use regular expressions (regex) to match complex patterns in text. By combining sed with regex, you can search, find, delete, replace and transform text in extremely flexible ways.
In this comprehensive guide, you'll learn:
- What is sed and regex
- Sed regex basics
- Matching text patterns
- Managing whitespace
- Transforming text case
- Multi-line processing
- Using sed in scripts
By the end, you'll have a deep understanding of how to unleash the full power of sed with regular expressions.
What is Sed and Regex?
Sed stands for "stream editor". It accepts standard input, modifies it according to an editing script, and prints it to standard output. The editing commands allow you to search, find, delete, insert, replace and transform text quickly.
Regular expressions (regex) are patterns used to match character combinations in text strings. Regex provides extremely flexible pattern matching notation to search text.
When sed is combined with a regex, you get a powerful duo for advanced text processing and editing.
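As a minimal sketch of the stream model described above, the following pipes two lines into sed and captures what comes out. The sample text and the variable name out are made up for illustration.

```shell
# Feed two lines to sed on standard input.
# The s command replaces the first "hello" on each line,
# and the edited text is written to standard output.
out=$(printf 'hello world\nhello sed\n' | sed 's/hello/goodbye/')
printf '%s\n' "$out"
```

The input itself is never modified; sed only prints the edited copy.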
Sed Regex Basics
Here are some regex basics to get started with sed:
- . – Matches any single character
- * – Matches 0 or more occurrences of the previous character/class
- + – Matches 1 or more occurrences of the previous character/class (written \+ in GNU sed's default basic syntax, or + with the -E flag)
- [] – Matches any character within the brackets
- ^ – Matches the start of the line
- $ – Matches the end of the line
- \s – Matches whitespace (a GNU sed extension)
- \S – Matches non-whitespace (a GNU sed extension)
To print matched lines with sed, use the p command. For example:
sed -n '/regex/p' file.txt
The -n flag tells sed not to print every line by default. Then the p command prints any lines matching the regex.
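To see what -n changes, here is a small sketch that runs the same p command with and without the flag. The three-line input is invented for the example.

```shell
input=$(printf 'apple\nbanana\ncherry\n')

# With -n, only the lines matching /an/ are printed.
with_n=$(printf '%s\n' "$input" | sed -n '/an/p')

# Without -n, every line is printed once automatically,
# and matching lines are printed a second time by p.
without_n=$(printf '%s\n' "$input" | sed '/an/p')

printf '%s\n' "$with_n"
printf -- '---\n'
printf '%s\n' "$without_n"
```

Forgetting -n is a common source of duplicated lines in sed output.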
Matching Text Patterns
One of sed's most common uses is searching text to find matching patterns.
Exact Matches
To match literal text, use a regex with no special characters:
sed -n '/orange/p' fruits.txt
This prints every line in fruits.txt that contains the string "orange", including lines where it is part of a longer word such as "oranges".
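A plain pattern matches anywhere in the line, even inside a longer word. The sketch below contrasts that with a whole-word match using the \< and \> word-boundary anchors, which are GNU sed extensions (an assumption about your sed). The fruit list is made up.

```shell
fruits=$(printf 'orange\noranges\nblood orange\napple\n')

# Substring match: any line containing "orange" anywhere.
substring=$(printf '%s\n' "$fruits" | sed -n '/orange/p')

# Whole-word match (GNU sed): "oranges" is excluded.
whole_word=$(printf '%s\n' "$fruits" | sed -n '/\<orange\>/p')

printf '%s\n' "$substring"
printf -- '---\n'
printf '%s\n' "$whole_word"
```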
Wildcard Matches
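To perform wildcard matches, use the dot . character, which matches any single character.
For example, to match lines containing "app" followed by anything:
sed -n '/app.*/p' file.txt
The .* matches any 0 or more characters after "app". Because sed selects whole lines here, this prints the same lines as /app/ would; the .* matters mainly inside substitutions, where it controls how much text is replaced.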
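The difference between a single dot and .* is easiest to see side by side. This is a small sketch with an invented word list.

```shell
words=$(printf 'bat\nbit\nbt\nboot\n')

# b.t needs exactly one character between b and t.
one_char=$(printf '%s\n' "$words" | sed -n '/b.t/p')

# b.*t allows any number of characters, including none.
any_run=$(printf '%s\n' "$words" | sed -n '/b.*t/p')

printf '%s\n' "$one_char"
printf -- '---\n'
printf '%s\n' "$any_run"
```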
Repeated Matches
To match repeated instances of a character, use the star * quantifier after the character:
sed -n '/apple*/p' apps.txt
This will match "appl", "apple", "applee" and so on, since e* allows zero or more "e" characters.
You can also match 1 or more repeats with the plus sign: write \+ in GNU sed's default syntax, or + when running sed with the -E flag.
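The following sketch compares the star and plus quantifiers on a made-up word list. It assumes GNU sed for the \+ form; the -E form uses extended regex syntax, which current GNU and BSD sed both accept.

```shell
words=$(printf 'appl\napple\napplee\nbanana\n')

# e* : zero or more "e", so "appl" also matches.
zero_or_more=$(printf '%s\n' "$words" | sed -n '/^apple*$/p')

# e\+ : one or more "e" in basic syntax (GNU extension).
one_or_more_gnu=$(printf '%s\n' "$words" | sed -n '/^apple\+$/p')

# e+ : the same thing in extended syntax.
one_or_more_ere=$(printf '%s\n' "$words" | sed -n -E '/^apple+$/p')

printf '%s\n' "$zero_or_more"
printf -- '---\n'
printf '%s\n' "$one_or_more_ere"
```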
Character Classes
Square brackets [] allow you to match any one character within the class.
For example, to match fruit names starting with a, b or c (assuming one name per line):
sed -n '/^[abc]/p' fruits.txt
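As a short illustration, a bracket expression can be anchored to the start of the line, and a leading ^ inside the brackets negates the class. The list of fruit names is invented, with one name per line.

```shell
fruits=$(printf 'apple\nbanana\ncherry\ndate\nfig\n')

# Lines whose first character is a, b or c.
abc=$(printf '%s\n' "$fruits" | sed -n '/^[abc]/p')

# [^abc] matches any character except a, b or c.
not_abc=$(printf '%s\n' "$fruits" | sed -n '/^[^abc]/p')

printf '%s\n' "$abc"
printf -- '---\n'
printf '%s\n' "$not_abc"
```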
Start and End Anchors
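The caret ^ matches the start of the line, while $ matches the end of the line.
For example, to print lines ending in a question mark:
sed -n '/?$/p' file.txt
In sed's default basic syntax a plain ? is a literal character, so it needs no backslash; GNU sed treats \? as a quantifier instead.
You can combine these into powerful regex patterns to precisely target text.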
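Here is a small sketch of the anchors in use, alone and combined. The sample lines are made up; note that the question mark is left unescaped because it is literal in basic regex syntax.

```shell
lines=$(printf 'How are you?\nFine.\nWhy? Because.\n# comment\n')

# $ anchors the match to the end of the line.
questions=$(printf '%s\n' "$lines" | sed -n '/?$/p')

# ^ anchors the match to the start of the line.
comments=$(printf '%s\n' "$lines" | sed -n '/^#/p')

# Using both anchors matches the whole line exactly.
exact=$(printf '%s\n' "$lines" | sed -n '/^Fine\.$/p')

printf '%s\n' "$questions" "$comments" "$exact"
```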
Managing Whitespace
GNU sed provides handy shortcuts to match and manipulate whitespace in text. On other sed implementations, the POSIX class [[:space:]] is the portable equivalent.
Matching Whitespace
The shortcut \s matches a whitespace character, while \S matches non-whitespace.
For example, to print lines starting with whitespace:
sed -n '/^\s/p' file.txt
And to print non-empty lines without initial whitespace:
sed -n '/^\S/p' file.txt
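The sketch below runs the \s shortcut next to the POSIX class [[:space:]] on the same made-up input. It assumes GNU sed for the \s form; the bracket form should behave the same on any POSIX sed.

```shell
text=$(printf '  indented\nflush\n\ttabbed\n')

# GNU shortcut for a whitespace character.
gnu=$(printf '%s\n' "$text" | sed -n '/^\s/p')

# Portable POSIX character class.
posix=$(printf '%s\n' "$text" | sed -n '/^[[:space:]]/p')

printf '%s\n' "$posix"
```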
Deleting Whitespace
To delete leading whitespace, use the following substitution command:
sed 's/^\s*//' file.txt
The regex ^\s* matches whitespace at the start of the line. Substituting this with nothing deletes the whitespace.
Trailing whitespace can be deleted similarly with:
sed 's/\s*$//' file.txt
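Both trims can be combined in one invocation by passing two -e expressions. This sketch uses the portable [[:space:]] class in place of \s; the padded sample line is invented.

```shell
# Strip leading whitespace first, then trailing whitespace.
trimmed=$(printf '   padded line  \t\n' |
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

# Brackets make the line's edges visible in the output.
printf '[%s]\n' "$trimmed"
```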
Inserting Whitespace
To insert a tab at the start of lines, use:
sed 's/^/\t/' file.txt
In GNU sed, the \t in the replacement inserts a tab character. This is a useful way to indent code.
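Since \t in a replacement is a GNU sed feature, one portable alternative is to put a real tab character into a shell variable and interpolate it. The sketch below shows both; the variable name tab and the sample lines are illustrative.

```shell
# Hold a literal tab character in a shell variable.
tab=$(printf '\t')

# GNU sed: \t in the replacement becomes a tab.
gnu=$(printf 'line one\nline two\n' | sed 's/^/\t/')

# Portable: double quotes let the shell insert the real tab.
portable=$(printf 'line one\nline two\n' | sed "s/^/${tab}/")

printf '%s\n' "$portable"
```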
Transforming Text Case
Sed allows you to easily convert text between upper case and lower case.
To Lower Case
To convert to lower case letters, use:
sed 's/\(.*\)/\L\1/' file.txt
The \L escape sequence, a GNU sed extension, converts the matched text to lower case.
To Upper Case
Likewise, to convert to upper case:
sed 's/\(.*\)/\U\1/' file.txt
The \U escape sequence, also specific to GNU sed, converts text to upper case.
You can also use these to capitalize sentences by lower casing first, then upper casing just the first letter.
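The two-step capitalization mentioned above might look like the sketch below. It assumes GNU sed, since \L, \U and \u are GNU extensions, and uses & (the whole match) in place of a capture group. The input text is made up.

```shell
lower=$(printf 'Hello World\n' | sed 's/.*/\L&/')
upper=$(printf 'Hello World\n' | sed 's/.*/\U&/')

# Lower-case the whole line, then upper-case only its first character.
capitalized=$(printf 'hELLO wORLD\n' | sed 's/.*/\L&/; s/^./\u&/')

printf '%s\n' "$lower" "$upper" "$capitalized"
```

This handles one sentence per line; capitalizing after every full stop within a line would need a more involved pattern.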
Multi-Line Processing
By default sed operates line by line. To match patterns over multiple lines, use the N and D commands.
For example, to match a regex over two lines:
sed 'N; s/regex/replace/' file.txt
The N command appends the next line to the pattern space, separated by a newline character, so the regex can match \n and span both lines. The D command, when added to a script, deletes the pattern space up to the first newline and restarts the cycle with what remains, which lets you slide a two-line window through the file.
By repeating these commands, you can process 3 or more lines too.
This makes sed well suited to finding and replacing across multiple lines.
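Two small sketches of multi-line editing follow, both on invented input. The first joins lines in fixed pairs with N. The second uses the common $!N ... P; D idiom to slide a two-line window, so a match can start on any line, not only on odd-numbered ones.

```shell
# Join each pair of lines: N appends the next line,
# then the embedded newline is replaced with "=".
pairs=$(printf 'key1\nvalue1\nkey2\nvalue2\n' | sed 'N; s/\n/=/')

# Sliding window: $!N appends the next line (except on the last line),
# P prints up to the first newline, and D deletes that part and
# restarts the cycle with the remainder.
window=$(printf 'a\nhello\nworld\nb\n' |
  sed '$!N; s/hello\nworld/X/; P; D')

printf '%s\n' "$pairs"
printf -- '---\n'
printf '%s\n' "$window"
```

With plain N, "hello" and "world" would land in different pairs here and the substitution would never see them together.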
Using Sed in Scripts
In addition to interactive use on the command line, sed editing commands can be saved in a script file for reuse.
For example, save the following into a file script.sed:
# Script to format text (\L, \u and \< are GNU sed extensions)
# Convert to lower case
s/\(.*\)/\L\1/
# Capitalize the first letter of each word
s/\<./\u&/g
# Collapse runs of spaces into a single space
s/  */ /g
The script contains common text formatting tasks.
To run the script on a file:
sed -f script.sed file.txt
This allows you to create reusable sed scripts to automate editing tasks.
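As a self-contained sketch of the -f workflow, the following writes a small script to a temporary file, runs it, and cleans up. The script contents and sample text are illustrative and use only portable commands, so they differ from script.sed above.

```shell
# Create a throwaway script file (mktemp picks the path).
script=$(mktemp)
cat > "$script" <<'EOF'
# Strip trailing whitespace
s/[[:space:]]*$//
# Collapse runs of spaces into a single space
s/  */ /g
# Delete empty lines
/^$/d
EOF

out=$(printf 'too    many   spaces   \n\nsecond line\n' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$out"
```

Each command in the file runs in order on every input line, just as if they had been given with separate -e options.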
Conclusion
Sed is an extremely powerful text processing tool on Linux. Combining it with regular expressions makes it a highly flexible way to manipulate text from the command line.
With the basics covered here, you should now be able to use sed and regex to find, edit, delete, replace and transform text files with ease.
Sed scripts allow you to save common editing tasks for reuse. Over time you can build up a suite of scripts to automate text processing operations.
I hope you found this guide useful! Let me know if you have any sed & regex tips in the comments.