grep explained

The word grep is often used as a verb meaning to find or search for a string, and this usually implies searching for a fixed string. But grep (Global Regular Expression Print), as the acronym indicates, searches for a regular expression; the name comes from the ed editor command g/re/p. It’s a highly versatile *nix command for matching strings. See Computerphile: Where GREP Came From for a brief history.

Terminology

  • Pattern – is a sequence of characters, where each character may be a literal or have a special interpretation.
  • Fixed string – is a search pattern where none of the characters have any special interpretation.
  • Regular Expression (or regex) – is a search pattern where each character is either a metacharacter, having a special meaning, or a regular character that has a literal meaning. There are two variations on the syntax of the pattern. The difference is in the behavior of a few special characters: ?, +, (), {}, and |.
    • BRE (Basic Regular Expression) – these characters do not have special meaning unless prefixed with a backslash \ (the backslashed forms \?, \+ and \| are GNU extensions).
    • ERE (Extended Regular Expression) – these characters are special unless they are prefixed with a backslash \.
  • Glob – is a simpler search pattern used for matching filenames. The special characters *, ? and [] are referred to as wildcards. The confusing part is where the syntax of globbing and regex overlap: these wildcard characters are also regex metacharacters, but with a different interpretation.
GLOB is for files. GREP is for strings.
ERE is the de facto variation of regex you teach, learn and use.
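
The differences above are easier to see on a few sample lines. The following is a minimal sketch, assuming GNU grep and a POSIX shell; the scratch file sample.txt and the variable names are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory with a small sample file (hypothetical name).
tmp=$(mktemp -d)
printf '%s\n' 'ac' 'abc' 'abbc' 'ab+c' > "$tmp/sample.txt"

# BRE: + is an ordinary character, so only the literal line "ab+c" matches.
bre=$(grep -G 'ab+c' "$tmp/sample.txt")

# ERE: + means "one or more of the preceding item", so "abc" and "abbc" match.
ere=$(grep -E 'ab+c' "$tmp/sample.txt")

# ERE with the + backslashed: it is literal again and matches "ab+c".
ere_escaped=$(grep -E 'ab\+c' "$tmp/sample.txt")

# Glob: * stands for any run of characters, so a*c matches "axyzc".
case 'axyzc' in
  a*c) glob='match' ;;
  *)   glob='no match' ;;
esac

# Regex: * means "zero or more of the preceding character",
# so ^a*c$ only matches strings like "c", "ac", "aac", not "axyzc".
if printf '%s\n' 'axyzc' | grep -Eq '^a*c$'; then
  regex='match'
else
  regex='no match'
fi

printf 'BRE ab+c:\n%s\n' "$bre"
printf 'ERE ab+c:\n%s\n' "$ere"
printf 'ERE ab\\+c:\n%s\n' "$ere_escaped"
printf 'glob a*c vs axyzc: %s\n' "$glob"
printf 'regex ^a*c$ vs axyzc: %s\n' "$regex"

rm -rf "$tmp"
```

The same characters, + and *, give different results depending on whether the pattern is read as BRE, ERE or a glob.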

Usage

Specify target files to search.

$ grep [OPTIONS] PATTERN [FILE1 FILE2 FILE3 ...]

Search all files under the target directories, recursively.

$ grep -r [OPTIONS] PATTERN [DIRECTORY1 DIRECTORY2 DIRECTORY3 ...]

The PATTERN is interpreted depending on the matcher selection option.
-E, --extended-regexp
-F, --fixed-strings
-G, --basic-regexp (default)

It’s this option that can make usage of grep frustrating if you are not aware of it.
* The PATTERN by default is interpreted as a regex and not a fixed string.
* Though regex typically implies ERE, the default is BRE.

In addition, there are variant programs egrep and fgrep that are the same as grep -E and grep -F, respectively. These variants are deprecated, but are provided for backward compatibility.

Using the default, or -G, is very confusing. Always use -E (egrep) or -F (fgrep).
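
To see why the matcher option matters, here is a small sketch that runs the same two patterns under different matchers. It assumes GNU grep; the file log.txt and its contents are invented for the example.

```shell
#!/bin/sh
# Hypothetical sample input.
tmp=$(mktemp -d)
printf '%s\n' 'version 1.5' 'version 125' 'error|warn' 'error' > "$tmp/log.txt"

# Default (BRE): . matches any character, so both "1.5" and "125" match.
default_dot=$(grep -c '1.5' "$tmp/log.txt")

# -F: the pattern is a fixed string, so only the line with "1.5" matches.
fixed_dot=$(grep -c -F '1.5' "$tmp/log.txt")

# Default (BRE): | is an ordinary character, so only the literal
# text "error|warn" matches.
default_bar=$(grep -c 'error|warn' "$tmp/log.txt")

# -E: | is alternation, so lines containing "error" or "warn" match.
ere_bar=$(grep -c -E 'error|warn' "$tmp/log.txt")

printf "default '1.5':        %s matching lines\n" "$default_dot"
printf "-F '1.5':             %s matching lines\n" "$fixed_dot"
printf "default 'error|warn': %s matching lines\n" "$default_bar"
printf "-E 'error|warn':      %s matching lines\n" "$ere_bar"

rm -rf "$tmp"
```

With this input the default matcher reports 2 lines for '1.5' and 1 line for 'error|warn', while -F reports 1 and -E reports 2, which is usually what was intended.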

Options

Here are some useful options, especially when searching through source code.

Matching control
-r, --recursive Read all files under each directory, recursively.
-i, --ignore-case Ignore case distinctions in both the PATTERN and the input files.
-w, --word-regexp Match only whole words.
-v, --invert-match Invert the sense of matching, to select non-matching lines.
Output control
-c, --count Print count of matching lines for each input file.
-n, --line-number Prefix each line of output with the 1-based line number within its input file.
-l, --files-with-matches Print only the name of each input file where pattern is matched.
Helpful when there are many matches in each file making the output very long.
--color[=WHEN], --colour[=WHEN] Color the output.
Useful for distinguishing the matched strings, matching lines, file names and line numbers.
File and directory control
-I Ignore binary files.
Useful for suppressing noise such as "Binary file ./.git/index matches".
--exclude=GLOB Use GLOB wildcard matching to skip files when searching.
--include=GLOB Use wildcard matching to search only files that match GLOB.
--exclude-dir=GLOB Use GLOB wildcard matching to skip directories when searching recursively.
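
As an illustration of how these options combine, the sketch below builds a tiny throwaway source tree and searches it. It assumes GNU grep (-r, --include and --exclude-dir are not part of POSIX grep); the directory layout and file contents are invented for the example.

```shell
#!/bin/sh
# Hypothetical project layout in a scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/build"
printf '%s\n' 'int count = 0;' 'int Counter = 1;' '// count items' > "$tmp/src/main.c"
printf '%s\n' 'count is documented here' > "$tmp/src/notes.txt"
printf '%s\n' 'int count = 0;' > "$tmp/build/main.c"

# Recursive (-r), with line numbers (-n), whole words only (-w),
# restricted to *.c files and skipping the build directory.
hits=$(cd "$tmp" && grep -rnw --include='*.c' --exclude-dir='build' -F 'count' .)

# Only the names of files that contain a match, ignoring binary files (-I).
# The traversal order is not guaranteed, so the list is sorted.
files=$(cd "$tmp" && grep -rlI -F 'count' . | sort)

# Count matching lines, ignoring case: "count", "Counter" and "count" again.
ci_count=$(grep -c -i -F 'count' "$tmp/src/main.c")

# Count the lines that do NOT contain "count" (case-sensitive).
inverted=$(grep -c -v -F 'count' "$tmp/src/main.c")

printf 'hits:\n%s\n' "$hits"
printf 'files:\n%s\n' "$files"
printf 'case-insensitive count: %s\n' "$ci_count"
printf 'non-matching lines: %s\n' "$inverted"

rm -rf "$tmp"
```

The first search reports only lines 1 and 3 of ./src/main.c, because -w rejects "Counter", --include drops notes.txt and --exclude-dir skips build.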

find | grep

Though grep provides file and directory control (--include, --exclude, --exclude-dir) to filter input files, it’s a bit clunky and can be confusing and frustrating. So typically the more powerful find command is used to select the files, and its output is passed to grep through xargs.

In its simplest form:

$ find path/to/start/dir [FIND-OPTIONS] FILE-PATTERN | xargs grep [GREP-OPTIONS] STRING-PATTERN

Or, if the filenames contain spaces:

$ find path/to/start/dir [FIND-OPTIONS] FILE-PATTERN -print0 | xargs -0 grep [GREP-OPTIONS] STRING-PATTERN
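
The difference between the two forms can be sketched on a scratch directory that contains a filename with a space. This assumes a find and xargs that support -print0 and -0 (GNU and BSD both do); the file names and contents are invented for the example.

```shell
#!/bin/sh
# Hypothetical files, one of them with a space in its name.
tmp=$(mktemp -d)
mkdir -p "$tmp/docs"
printf 'TODO: write intro\n' > "$tmp/docs/release notes.txt"
printf 'TODO: add tests\n'   > "$tmp/docs/plan.txt"
printf 'TODO: ignored\n'     > "$tmp/docs/plan.md"

# Plain pipe: xargs splits on whitespace, so "release notes.txt" reaches
# grep as two names that do not exist. Errors are discarded here.
unsafe=$(cd "$tmp" && find . -type f -name '*.txt' \
  | xargs grep -l -F 'TODO' 2>/dev/null | sort)

# -print0 / -0: names are NUL-terminated, so each file stays one argument.
safe=$(cd "$tmp" && find . -type f -name '*.txt' -print0 \
  | xargs -0 grep -l -F 'TODO' | sort)

printf 'without -print0:\n%s\n' "$unsafe"
printf 'with -print0:\n%s\n' "$safe"

rm -rf "$tmp"
```

Without -print0 only ./docs/plan.txt is reported; with it, "./docs/release notes.txt" is found as well. In both cases find, not grep, decides that plan.md is skipped.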