The word grep is sometimes used as a verb meaning to find or search for a string. Often this implies searching for a fixed string. But grep (Global Regular Expression Print), as the acronym indicates, searches for a regular expression. It’s a highly versatile *nix command for matching strings. See Computerphile: Where GREP Came From for a brief history.
Terminology
- Pattern – is a sequence of characters, where each character may be a literal or have a special interpretation.
- Fixed string – is a search pattern where none of the characters have any special interpretation.
- Regular Expression (or regex) – is a search pattern where each character is either a metacharacter, having a special meaning, or a regular character that has a literal meaning. There are two variations on the syntax of the specified pattern [1]. The difference is in the behavior of a few special characters, ?, +, (), {}, and |.
  - BRE (Basic Regular Expression) – these characters do not have special meaning unless prefixed with a backslash \.
  - ERE (Extended Regular Expression) – these characters are special unless they are prefixed with a backslash \.
- Glob – is a simpler search pattern used for matching filenames. The special characters *, ? and [] are referred to as wildcards. The confusing part is when the syntax of globbing and regex overlap, because these wildcard characters are the same as regex metacharacters but with a different interpretation [2].
ERE is the de facto variation of regex you teach, learn and use.
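The difference between the terms above is easiest to see by running the same characters through each interpretation. The following is a minimal sketch, not a definitive reference: the sample file, its contents, and the helper function glob_match are made up for illustration, and it assumes a grep that accepts -E.

```shell
# Throwaway directory with a small sample file (hypothetical content).
tmp=$(mktemp -d)
printf '%s\n' 'ac' 'abc' 'abbc' 'ab+c' > "$tmp/sample.txt"

# BRE (the default): + has no special meaning, so only the literal text matches.
bre=$(grep 'ab+c' "$tmp/sample.txt")

# ERE: + means "one or more of the preceding character".
ere=$(grep -E 'ab+c' "$tmp/sample.txt")

# ERE with a backslash: + is literal again.
ere_escaped=$(grep -E 'ab\+c' "$tmp/sample.txt")

# Regex: * means "zero or more of the preceding character", so 'ac' matches too.
regex_star=$(grep 'ab*c' "$tmp/sample.txt")

# Glob: * means "any run of characters". The shell's case statement uses globs.
# glob_match is a hypothetical helper defined only for this sketch.
glob_match() {
  case "$1" in
    ab*c) echo yes ;;
    *)    echo no ;;
  esac
}

printf 'BRE ab+c:\n%s\n\n' "$bre"
printf 'ERE ab+c:\n%s\n\n' "$ere"
printf 'ERE ab\\+c:\n%s\n\n' "$ere_escaped"
printf 'regex ab*c:\n%s\n\n' "$regex_star"
printf 'glob ab*c on "ac": %s\n' "$(glob_match 'ac')"
printf 'glob ab*c on "ab+c": %s\n' "$(glob_match 'ab+c')"

rm -rf "$tmp"
```

Note that the regex ab*c matches the line ac but not ab+c, while the glob ab*c does the opposite. That is the overlap that makes the two easy to confuse.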
Usage
Specify target files to search.
$ grep [OPTIONS] PATTERN [FILE1 FILE2 FILE3 ...]
Search all files under the target directories recursively.
$ grep -r [OPTIONS] PATTERN [DIRECTORY1 DIRECTORY2 DIRECTORY3 ...]
The PATTERN is interpreted depending on the matcher selection option.
-E, --extended-regexp
-F, --fixed-strings
-G, --basic-regexp (default)
It’s this option that can make usage of grep frustrating if you are not aware of it.
* The PATTERN by default is interpreted as a regex and not a fixed string.
* Though regex typically implies ERE, the default is BRE.
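To illustrate both points, here is a small sketch that runs the same patterns under each matcher. The file name and its contents are invented for the example; only the -G, -E and -F options come from the text above.

```shell
# Throwaway directory with a small sample file (hypothetical content).
tmp=$(mktemp -d)
printf '%s\n' 'version 1.0' 'version 1x0' 'version 1+0' 'version 110' > "$tmp/notes.txt"

# Default matcher (-G, BRE): the dot matches any single character,
# so every line in the sample matches.
default_dot=$(grep '1.0' "$tmp/notes.txt")

# Fixed string (-F): the dot is just a dot.
fixed_dot=$(grep -F '1.0' "$tmp/notes.txt")

# Default matcher (-G, BRE): + is a literal plus sign.
default_plus=$(grep '1+0' "$tmp/notes.txt")

# Extended matcher (-E, ERE): + means "one or more of the preceding 1".
extended_plus=$(grep -E '1+0' "$tmp/notes.txt")

printf 'grep 1.0:\n%s\n\n' "$default_dot"
printf 'grep -F 1.0:\n%s\n\n' "$fixed_dot"
printf 'grep 1+0:\n%s\n\n' "$default_plus"
printf 'grep -E 1+0:\n%s\n' "$extended_plus"

rm -rf "$tmp"
```

With this sample, the default search for 1.0 returns all four lines, while -F returns only version 1.0. The pattern 1+0 returns version 1+0 by default but version 110 with -E.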
In addition, there are variant programs egrep and fgrep that are the same as grep -E and grep -F, respectively. These variants are deprecated, but are provided for backward compatibility.
-G is very confusing. Always use -E (egrep) or -F (fgrep).
Options
Here are some useful options, especially when searching through source code.
| Option | Description |
| --- | --- |
| Matching control | |
| -r, --recursive | Read all files under each directory, recursively. |
| -i, --ignore-case | Ignore case distinctions in both the PATTERN and the input files. |
| -w, --word-regexp | Match whole words only. |
| -v, --invert-match | Invert the sense of matching, to select non-matching lines. |
| Output control | |
| -c, --count | Print a count of matching lines for each input file. |
| -n, --line-number | Prefix each line of output with the 1-based line number within its input file. |
| -l, --files-with-matches | Print only the name of each input file where the pattern is matched. Helpful when there are many matches in each file, making the output very long. |
| --color, --colour | Color the output. Useful in distinguishing the matched strings, matching lines, file names and line numbers. |
| File and directory control | |
| -I | Ignore binary files. Useful in reducing output noise such as Binary file ./.git/index matches. |
| --exclude=GLOB | Use GLOB wildcard matching to skip files when searching. |
| --include=GLOB | Use wildcard matching to search only files that match GLOB. |
| --exclude-dir=GLOB | Use GLOB wildcard matching to skip directories when searching recursively. |
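As a rough illustration of how these options combine when searching source code, the sketch below builds a tiny, made-up source tree and runs a few searches against it. The directory layout, file names and contents are assumptions for the example, and --include and --exclude-dir are assumed to behave as in GNU grep.

```shell
# Build a small, hypothetical source tree in a throwaway directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/build"
printf '%s\n' 'int total = 0;' 'int subtotal = 0;' '// TOTAL is reset here' > "$tmp/src/main.c"
printf '%s\n' 'total: 42' > "$tmp/src/notes.txt"
printf '%s\n' 'int total = 0;' > "$tmp/build/main.c"

# -r recurse, -n show line numbers, -w match "total" only as a whole word.
# The result is sorted because recursive traversal order is not guaranteed.
whole_word=$(cd "$tmp" && grep -rnw 'total' src | LC_ALL=C sort)

# -c count matching lines, -i ignore case.
count_any_case=$(cd "$tmp" && grep -ci 'total' src/main.c)

# -v select the lines that do NOT match.
inverted=$(cd "$tmp" && grep -v 'total' src/main.c)

# -l list file names only; search *.c files and skip the build directory.
c_files=$(cd "$tmp" && grep -rl --include='*.c' --exclude-dir=build 'total' .)

printf 'whole word:\n%s\n\n' "$whole_word"
printf 'count, any case: %s\n\n' "$count_any_case"
printf 'inverted:\n%s\n\n' "$inverted"
printf 'C files outside build:\n%s\n' "$c_files"

rm -rf "$tmp"
```

In this tree, -w skips subtotal, -i additionally picks up TOTAL, and the last search reports only ./src/main.c because build/main.c is excluded.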
find | grep
Though grep provides file and directory control (--include, --exclude, --exclude-dir) to filter input files, it’s a bit clunky and can be confusing and frustrating. So typically, the more powerful find command is used to select the files, and its output is piped to the grep command.
In its simplest form:
$ find path/to/start/dir [FIND-OPTIONS] FILE-PATTERN | xargs grep [GREP-OPTIONS] STRING-PATTERN
Or, if the filenames contain spaces:
$ find path/to/start/dir [FIND-OPTIONS] FILE-PATTERN -print0 | xargs -0 grep [GREP-OPTIONS] STRING-PATTERN
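A concrete run of the second form might look like the sketch below. The directory, file names and the TODO search string are invented for the example, and it assumes a find and xargs that support -print0 and -0.

```shell
# Build a small, hypothetical docs tree; one file name contains a space.
tmp=$(mktemp -d)
mkdir -p "$tmp/docs"
printf '%s\n' 'TODO: write intro' > "$tmp/docs/release notes.md"
printf '%s\n' 'TODO: add tests' > "$tmp/docs/plan.md"
printf '%s\n' 'TODO: ignored' > "$tmp/docs/scratch.txt"

# find selects only regular files named *.md and separates them with NUL bytes;
# xargs -0 splits on NUL, so "release notes.md" reaches grep as one argument.
# Without -print0 and -0, xargs would split that name at the space.
hits=$(cd "$tmp" && find docs -type f -name '*.md' -print0 | xargs -0 grep -l 'TODO' | LC_ALL=C sort)

printf '%s\n' "$hits"

rm -rf "$tmp"
```

Here find does the file filtering (-type f -name '*.md') and grep only does the string matching, so scratch.txt is never searched.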