Uniq

From LQWiki

Uniq is a command for removing successive duplicate lines from files.

Usage: uniq [options] [input_file] [output_file]

Unusually for a filter, uniq accepts input and output file names as arguments, so redirection is not required. For instance,

 uniq - foo

will read stdin and store output in file "foo" while

 uniq foo -

will read from file "foo" and display on stdout.
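As a minimal sketch of the two-argument form, the following writes a small test file and runs uniq on it with and without "-". The scratch directory and the file names (input.txt, output.txt, from_stdin.txt) are made up for the illustration.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
workdir=$(mktemp -d)
printf 'foo\nfoo\nbar\n' > "$workdir/input.txt"

# Both arguments given: read input.txt, write the result to output.txt.
uniq "$workdir/input.txt" "$workdir/output.txt"

# "-" as the input file: read stdin, but still write to a named file.
printf 'foo\nfoo\nbar\n' | uniq - "$workdir/from_stdin.txt"

# Both output files now hold the two lines "foo" and "bar".
cat "$workdir/output.txt"
```

Note that the output file is overwritten without warning, so the output argument should not name the input file.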

It is usual to pipe data from sort to uniq, since uniq only compares adjacent lines. For example,

foo
foo
baz
bar
baz
Baz

will be processed as

foo
baz
bar
baz
Baz

because the input is unsorted and the comparison is case-sensitive: the two "baz" lines are not adjacent, and "Baz" differs from "baz".
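Sorting first makes the duplicates adjacent. The sketch below feeds the same six lines through sort and then uniq. The variable names are arbitrary, and LC_ALL=C is set only to pin byte-order collation so the result is reproducible; other locales may order the lines differently.

```shell
# Sample data from the example above; the variable name is arbitrary.
sample='foo
foo
baz
bar
baz
Baz'

# LC_ALL=C forces byte-order sorting, so uppercase "B" sorts before lowercase "b".
deduped=$(printf '%s\n' "$sample" | LC_ALL=C sort | uniq)

printf '%s\n' "$deduped"
# Baz
# bar
# baz
# foo
```

"Baz" and "baz" both survive because the comparison is still case-sensitive.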

Among the many options (see the man page for full details), -c is useful for counting how many times each line occurs, and -i makes the comparison case-insensitive.

 $ sort -f foobarbaz | uniq -ci | sort -r
      3 Baz
      2 foo
      1 bar

(Note that to get case-insensitive, sorted output, sort must be made case-insensitive as well with -f, because it may otherwise place 'Baz' before 'bar', as it does under byte-order collation. Alternatively, tr or sed could avoid the issue by converting every line to a single case first. If counting is not needed and no other uniq options are wanted, sort -fu foobarbaz eliminates the need to pipe to uniq at all.)
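The three approaches mentioned in the note can be compared side by side. This is a sketch assuming GNU coreutils. The variable names are arbitrary, awk is used only to strip the padding uniq -c puts before each count, and sort -rn (numeric) is swapped in for sort -r so that counts of different widths would still order correctly.

```shell
sample='foo
foo
baz
bar
baz
Baz'

# 1. Case-insensitive sort, case-insensitive count, highest count first.
#    The spelling shown per group is whichever variant sorts first.
counts=$(printf '%s\n' "$sample" | LC_ALL=C sort -f | uniq -ci | sort -rn |
         awk '{print $1, $2}')

# 2. Fold everything to lower case with tr, then an ordinary sort and count.
lowered=$(printf '%s\n' "$sample" | tr '[:upper:]' '[:lower:]' |
          LC_ALL=C sort | uniq -c | awk '{print $1, $2}')

# 3. No counting needed: let sort drop the duplicates itself.
unique_only=$(printf '%s\n' "$sample" | LC_ALL=C sort -fu)

printf '%s\n' "$counts" "--" "$lowered" "--" "$unique_only"
```

The second variant loses the original capitalization, which may or may not matter depending on what the output is used for.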

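The -s, -f and -w options give more flexibility when operating on columnar lines or ignoring the beginnings or endings of lines: -f N skips the first N fields, -s N skips the first N characters, and -w N compares at most N characters per line.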
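A short sketch of these three options follows. The log lines and variable names are invented for the illustration, and -w is assumed to be available as in GNU uniq (it is not part of POSIX).

```shell
# Made-up log lines: a timestamp field followed by an event.
log='10:01 login alice
10:02 login alice
10:03 logout alice'

# -f 1: skip the first whitespace-separated field (the timestamp).
by_field=$(printf '%s\n' "$log" | uniq -f 1)

# -s 6: skip the first 6 characters ("HH:MM ") instead.
by_chars=$(printf '%s\n' "$log" | uniq -s 6)

# -w 2: compare only the first 2 characters of each line (GNU extension).
by_prefix=$(printf 'apple\napricot\nbanana\n' | uniq -w 2)

printf '%s\n' "$by_field"
# 10:01 login alice
# 10:03 logout alice
printf '%s\n' "$by_prefix"
# apple
# banana
```

In each case uniq prints the first line of a group of matches, so the earliest timestamp is the one that is kept.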

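The -d, -D, and -u options control which lines are output: -d prints one copy of each repeated line, -D prints every repeated line, and -u prints only lines that are not repeated.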
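The difference between the three is easiest to see on one small, already sorted input. The sketch below assumes GNU uniq (for -D); the data and variable names are arbitrary.

```shell
# Already sorted input: "a" twice, "b" once, "c" three times.
sorted='a
a
b
c
c
c'

repeated=$(printf '%s\n' "$sorted" | uniq -d)   # one copy of each repeated line
singles=$(printf '%s\n' "$sorted" | uniq -u)    # only lines that occur once
all_dups=$(printf '%s\n' "$sorted" | uniq -D)   # every copy of repeated lines

printf 'repeated: %s\n' "$(printf '%s' "$repeated" | tr '\n' ' ')"
# repeated: a c
printf 'singles:  %s\n' "$singles"
# singles:  b
printf 'all_dups: %s\n' "$(printf '%s' "$all_dups" | tr '\n' ' ')"
# all_dups: a a c c c
```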

Related Commands

These relate to managing individual lines or records.

  • cat -- concatenate files and print on standard output
  • shuf -- generate random permutations
  • sort -- sort lines of text files
  • tac -- concatenate and print files in reverse order