The sed command is a powerful text processing tool in Linux that allows you to edit files directly on the command line. One of sed's most useful features is its ability to use regular expressions (regex) to match complex patterns in text. By combining sed with regex, you can search, find, delete, replace and transform text in extremely flexible ways.

In this comprehensive guide, you'll learn:

  • What is sed and regex
  • Sed regex basics
  • Matching text patterns
  • Managing whitespace
  • Transforming text case
  • Multi-line processing
  • Using sed in scripts

By the end, you'll have a deep understanding of how to unleash the full power of sed with regular expressions.

What is Sed and Regex?

Sed stands for "stream editor". It accepts standard input, modifies it according to an editing script, and prints it to standard output. The editing commands allow you to search, find, delete, insert, replace and transform text quickly.

Regular expressions (regex) are patterns used to match character combinations in text strings. Regex provides extremely flexible pattern matching notation to search text.

When sed is combined with a regex, you get a powerful duo for advanced text processing and editing.

Sed Regex Basics

Here are some regex basics to get started with using sed:

  • . – Matches any single character
  • * – Matches 0 or more occurrences of the previous character/class
  • + – Matches 1 or more occurrences of the previous character/class (written \+ in GNU sed's default basic syntax, or + with the -E flag)
  • [] – Matches any character within the brackets
  • ^ – Matches the start of the line
  • $ – Matches the end of the line
  • \s – Matches whitespace (a GNU extension; the POSIX equivalent is [[:space:]])
  • \S – Matches non-whitespace (a GNU extension; the POSIX equivalent is [^[:space:]])

To print matched lines with sed, use the p command. For example:

sed -n '/regex/p' file.txt

The -n flag tells sed not to print every line by default. Then the p command prints any lines matching the regex.
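As a quick illustration of -n together with p, the sketch below pipes three made-up lines into sed instead of reading a file. The sample text and the variable name matches are assumptions for the demo only.

```shell
# Hypothetical sample input: printf emits one line per argument.
# -n suppresses the default output; p prints only the lines matching /^error/.
matches=$(printf '%s\n' 'error: disk full' 'ok' 'error: timeout' | sed -n '/^error/p')

printf '%s\n' "$matches"
```

Without -n, every matching line would appear twice: once from the default output and once from p.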

Matching Text Patterns

One of sed's most common uses is searching text to find matching patterns.

Exact Matches

To match a literal string, use a regex with no special characters:

sed -n '/orange/p' fruits.txt

This prints every line in fruits.txt that contains the string "orange" anywhere, including inside longer words such as "oranges".
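If you need the whole word rather than a substring, one option is GNU sed's word-boundary escapes \< and \>. The sketch below assumes GNU sed (these escapes are not POSIX), and the sample list and variable names are made up for the comparison.

```shell
# Hypothetical sample data, one entry per line.
fruits=$(printf '%s\n' 'orange' 'oranges' 'blood orange' 'apple')

# Plain pattern: matches any line containing the substring "orange".
substring_hits=$(printf '%s\n' "$fruits" | sed -n '/orange/p')

# GNU word boundaries: "oranges" is no longer selected.
word_hits=$(printf '%s\n' "$fruits" | sed -n '/\<orange\>/p')

printf 'substring:\n%s\n\nwhole word:\n%s\n' "$substring_hits" "$word_hits"
```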

Wildcard Matches

To perform wildcard matches, use the dot . character to match any character.

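For example, to match lines containing "app" followed by any text:

sed -n '/app.*/p' file.txt

The .* matches zero or more of any character after "app". When sed is only selecting lines, /app/ on its own selects the same lines; the .* matters mostly in substitutions, where it decides how much text gets replaced.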
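To see the difference between a single dot and dot-star, here is a small sketch on an invented word list. The words and variable names are assumptions chosen so that each pattern selects a different subset.

```shell
# Hypothetical word list.
words=$(printf '%s\n' 'bat' 'bit' 'boot' 'bt')

# b.t  : exactly one character between b and t.
one_char=$(printf '%s\n' "$words" | sed -n '/b.t/p')

# b.*t : any number of characters (including none) between b and t.
any_run=$(printf '%s\n' "$words" | sed -n '/b.*t/p')

printf 'b.t:\n%s\n\nb.*t:\n%s\n' "$one_char" "$any_run"
```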

Repeated Matches

To match repeated instances of a character, use the star * quantifier after the character:

sed -n '/apple*/p' apps.txt

This will match "appl", "apple", "applee" and so on, because e* allows zero or more "e" characters.

To require 1 or more repeats, use the plus sign instead: \+ in GNU sed's basic syntax, or + with the -E flag.
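The sketch below contrasts the two quantifiers on a made-up list. It uses -E for the plus sign, which both GNU and BSD sed accept; the input and variable names are assumptions for illustration.

```shell
# Hypothetical input with zero, one, two and three "o" characters.
lines=$(printf '%s\n' 'gd' 'god' 'good' 'goood')

# o* allows zero or more "o", so "gd" is included.
zero_or_more=$(printf '%s\n' "$lines" | sed -n '/go*d/p')

# With -E (extended regex), o+ requires at least one "o".
one_or_more=$(printf '%s\n' "$lines" | sed -E -n '/go+d/p')

printf 'go*d:\n%s\n\ngo+d:\n%s\n' "$zero_or_more" "$one_or_more"
```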

Character Classes

Square brackets [] allow you to match any character within the class.

For example, to match fruit names starting with a, b or c (assuming one name per line):

sed -n '/^[abc]/p' fruits.txt
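Bracket expressions also accept ranges such as [0-9] and negation with a leading ^ inside the brackets. The following sketch shows all three forms on an invented list; the data and variable names are assumptions.

```shell
# Hypothetical input, one item per line.
items=$(printf '%s\n' 'apple' 'banana' 'cherry' 'date' '42')

# Lines whose first character is a, b or c.
abc_start=$(printf '%s\n' "$items" | sed -n '/^[abc]/p')

# Negated class: first character is anything except a, b or c.
not_abc=$(printf '%s\n' "$items" | sed -n '/^[^abc]/p')

# Range: lines made up entirely of digits.
digits=$(printf '%s\n' "$items" | sed -n '/^[0-9][0-9]*$/p')

printf '%s\n---\n%s\n---\n%s\n' "$abc_start" "$not_abc" "$digits"
```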

Start and End Anchors

The caret ^ matches the start of the line, while $ matches the end of the line.

For example, to print lines ending in a question mark (in sed's basic syntax a plain ? is a literal character, so it needs no backslash):

sed -n '/?$/p' file.txt

You can combine these into powerful regex patterns to precisely target text.
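Here is a small sketch of the anchors in action, including the common idiom /^$/ for empty lines. The sample lines and variable names are made up for the demo.

```shell
# Hypothetical sample text.
text=$(printf '%s\n' 'Ready?' 'What? No.' '# comment' 'value # trailing')

# $ anchors the match to the end of the line.
questions=$(printf '%s\n' "$text" | sed -n '/?$/p')

# ^ anchors the match to the start of the line.
comments=$(printf '%s\n' "$text" | sed -n '/^#/p')

# ^$ together match a line with nothing in it; d deletes it.
no_blank=$(printf 'a\n\nb\n' | sed '/^$/d')

printf '%s\n---\n%s\n---\n%s\n' "$questions" "$comments" "$no_blank"
```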

Managing Whitespace

Sed provides handy shortcuts to match and manipulate whitespace in text.

Matching Whitespace

The shortcut \s matches a whitespace character, while \S matches non-whitespace. Both are GNU sed extensions; the portable POSIX equivalents are [[:space:]] and [^[:space:]].

For example, to print lines starting with whitespace:

sed -n '/^\s/p' file.txt

And to print non-empty lines that do not start with whitespace:

sed -n '/^\S/p' file.txt
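If the script may run on a sed without the GNU shortcuts, the same selections can be sketched with POSIX character classes. The sample input and variable names below are assumptions for illustration.

```shell
# Hypothetical input: one line indented with spaces, one flush left,
# one indented with a tab (printf expands \t and \n in its format string).
sample=$(printf '  indented\nflush\n\ttabbed\n')

# [[:space:]] is the POSIX spelling of "any whitespace character".
starts_ws=$(printf '%s\n' "$sample" | sed -n '/^[[:space:]]/p')

# [^[:space:]] is the negated form: first character is not whitespace.
starts_text=$(printf '%s\n' "$sample" | sed -n '/^[^[:space:]]/p')

printf '%s\n---\n%s\n' "$starts_ws" "$starts_text"
```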

Deleting Whitespace

To delete leading whitespace, use the following substitution command:

sed 's/^\s*//' file.txt

The regex ^\s* matches whitespace at the start of the line. Substituting this with nothing deletes the whitespace.

Trailing whitespace can be deleted similarly with:

sed 's/\s*$//' file.txt
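Both substitutions can be combined in a single sed call by passing two -e expressions. The sketch below wraps them in a helper function; the name trim is hypothetical, and it uses the POSIX class [[:space:]] rather than \s so it does not depend on GNU sed.

```shell
# trim: hypothetical helper that strips leading and trailing whitespace
# from every line it reads on standard input.
trim() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Sample line with three leading spaces and a trailing space plus tab.
trimmed=$(printf '   padded text \t\n' | trim)

printf '[%s]\n' "$trimmed"
```

Whitespace between the words is left untouched, since both patterns are anchored to a line edge.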

Inserting Whitespace

To insert a tab at the start of lines, use:

sed 's/^/\t/' file.txt

GNU sed interprets \t in the replacement as a tab character. This is a useful way to indent code.
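Not every sed understands \t in the replacement, so a more portable sketch is to let printf produce the tab and pass it to sed through a shell variable. The variable names and sample input are assumptions; the /./ address is an optional refinement that skips empty lines.

```shell
# Let the shell create a literal tab character.
tab=$(printf '\t')

# /./ restricts the substitution to lines with at least one character,
# so blank lines are not given stray indentation.
indented=$(printf 'if ready:\n\nrun()\n' | sed "/./s/^/${tab}/")

printf '%s\n' "$indented"
```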

Transforming Text Case

Sed allows you to easily convert text between upper case and lower case.

To Lower Case

To convert to lower case letters, use:

sed 's/\(.*\)/\L\1/' file.txt

The \L escape sequence, a GNU sed extension, converts the matched text to lower case.

To Upper Case

Likewise, to convert to upper case:

sed 's/\(.*\)/\U\1/' file.txt

The \U escape sequence, also specific to GNU sed, converts text to upper case.

You can also use these to capitalize sentences by lower casing first, then upper casing just the first letter.
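The two-step capitalization just described might look like the sketch below. It assumes GNU sed, since \L and \u are GNU extensions, and it only capitalizes the first character of each line; the sample text and variable name are made up.

```shell
# Step 1: \L& lower-cases the whole line (& stands for the matched text).
# Step 2: \u& upper-cases just the single character matched by ^. .
capitalized=$(printf '%s\n' 'hELLO wORLD' | sed -e 's/.*/\L&/' -e 's/^./\u&/')

printf '%s\n' "$capitalized"
```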

Multi-Line Processing

By default sed operates line by line. To match patterns over multiple lines, use the N and D commands.

For example, to match a regex over two lines:

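sed 'N; s/regex/replace/' file.txt

The N command appends the next line of input to the pattern space, separated by a newline character, so the regex can match across the line break by including \n. The D command, which this simple example does not need, deletes the pattern space up to the first newline and restarts the cycle without reading new input. Paired with P, it lets you slide a two-line window through the file.

By repeating N, you can process 3 or more lines too.

This makes sed useful for finding and replacing text that spans multiple lines.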
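Two short sketches of multi-line processing follow. The first joins each pair of lines with N; the second is the well-known sliding-window idiom that drops consecutive duplicate lines using N, P and D. The sample data and variable names are assumptions for the demo.

```shell
# Join every two lines: N pulls in the next line, then the embedded
# newline (\n) is replaced with "=". Input has an even number of lines.
pairs=$(printf '%s\n' 'name' 'alice' 'role' 'admin' | sed 'N; s/\n/=/')

# Remove consecutive duplicates:
#   $!N  append the next line unless we are on the last line
#   /^\(.*\)\n\1$/!P  print the first line unless both lines are identical
#   D    delete through the first newline and restart with what is left
deduped=$(printf '%s\n' 'a' 'a' 'b' 'b' 'c' | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n---\n%s\n' "$pairs" "$deduped"
```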

Using Sed in Scripts

In addition to interactive use on the command line, sed editing commands can be saved in a script file for reuse.

For example, save the following into a file script.sed:

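# Script to format text (uses GNU sed extensions such as \L and \u)

# Convert to lower case
s/\(.*\)/\L\1/

# Capitalize the first letter of each sentence
s/\(^\|[.!?] \+\)\([a-z]\)/\1\u\2/g

# Collapse runs of two or more spaces into one
s/ \{2,\}/ /g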

The script contains common text formatting tasks.

To run the script on a file:

sed -f script.sed file.txt

This allows you to create reusable sed scripts to automate editing tasks.
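To try the -f workflow end to end without touching real files, the sketch below writes a small script to a temporary file, runs it on piped input, and cleans up. The script contents, sample text and variable names are assumptions, and it sticks to POSIX syntax so it does not depend on GNU sed.

```shell
# Create a throwaway sed script (hypothetical contents).
script=$(mktemp)
cat > "$script" <<'EOF'
# Strip trailing whitespace
s/[[:space:]]*$//
# Collapse runs of two or more spaces into one
s/ \{2,\}/ /g
EOF

# Run the script with -f on sample input.
cleaned=$(printf 'too   many    spaces   \n' | sed -f "$script")
rm -f "$script"

printf '[%s]\n' "$cleaned"
```

Lines starting with # inside the script file are comments, which makes longer scripts easier to maintain.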

Conclusion

Sed is an extremely powerful text processing tool on Linux. Combining it with regular expressions makes it one of the most flexible ways to manipulate text from the command line.

With the basics covered here, you should now be able to use sed and regex to find, edit, delete, replace and transform text files with ease.

Sed scripts allow you to save common editing tasks for reuse. Over time you can build up a suite of scripts to automate text processing operations.

I hope you found this guide useful! Let me know if you have any sed & regex tips in the comments.
