sort(1) USER COMMANDS sort(1)


NAME

sort - sort and/or merge files

SYNOPSIS

sort [ options ] [ file ... ]

DESCRIPTION

sort sorts lines of all the files together and writes the result on the standard output. The file name - means the standard input. If no files are named, the standard input is sorted.

The default sort key is an entire line. Default ordering is lexicographic by bytes in machine collating sequence. The ordering is affected globally by the following options, one or more of which may appear. See recsort(3) for details.
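
As a rough illustration of the default ordering, the following Python sketch compares whole lines as byte strings. Python's sorted on bytes objects is only a stand-in for the machine collating sequence; it is not how sort itself is implemented, and the sample lines are made up.

```python
# Whole-line keys compared byte by byte: a stand-in for the default ordering.
lines = [b"banana", b"Cherry", b"apple", b"10", b"9"]

# bytes objects compare lexicographically by byte value, so digits come
# before upper case letters, which come before lower case letters.
ordered = sorted(lines)

for line in ordered:
    print(line.decode("ascii"))
```

Note that 10 sorts before 9 here: without -n the digits are compared as bytes, not by numeric value.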

For backwards compatibility the -o option is allowed in any file operand position when neither the -c nor the -- options are specified.

OPTIONS

-k, --key=pos1[,pos2]|.reclen|.position.length
Restrict the sort key to a string beginning at pos1 and ending at pos2. pos1 and pos2 each have the form m.n, counting from 1, optionally followed by one or more of the flags CMbdfginprZ; m counts fields from the beginning of the line and n counts characters from the beginning of the field. If any flags are present they override all the global ordering options for this key. If .n is missing from pos1, it is taken to be 1; if missing from pos2, it is taken to be the end of the field. If pos2 is missing, it is taken to be the end of the line. The second form specifies a fixed record length reclen, and the last form specifies a fixed field at byte position position (counting from 1) of length bytes. The obsolescent form reclen:fieldlen:offset (byte offset from 0) is also accepted.
-K, --oldkey=pos
Specified in pairs: -K pos1 -K pos2, where positions count from 0.
-R, --record|recfmt=format
Sets the record format to format; newlines will be treated as normal characters. The formats are:
d[terminator]
Variable length with record terminator character, \n by default.
[f]reclen
Fixed record length reclen.
v[op...]
Variable length. If op is omitted, h4o0z2bi (4 byte IBM V format descriptor) is assumed. op may be a combination of:
hn
Header size is n bytes (default 4).
on
Size offset in header is n bytes (default 0).
zn
Size length is n bytes (default min(h-o,2)).
b
Size is big-endian (default).
l
Size is little-endian (default b).
i
Record length includes header (default).
n
Record length does not include header (default i).
%
If the record format is not otherwise specified, and any input file name, from left to right, ends with %format or %format.* then the record format is set to format. In addition, the -o path, if specified and if it does not contain % and if it names a regular file, is renamed to contain the input %format.
-
The first block of the first input file is sampled to check for v variable length and f fixed length format records. Not all formats are detected. sort exits with an error diagnostic if the record format cannot be determined from the sample.
-b, --ignorespace
Ignore leading white space (spaces and tabs) in field comparisons.
-d, --dictionary
`Phone directory' order: only letters, digits and white space are significant in string comparisons.
-C, --codeset|convert=codeset|from:to
The field data codeset is codeset or the field data must be converted from the from codeset to the to codeset. The codesets are:
ascii
8 bit ascii
ebcdic
X/Open ebcdic
o|ebcdic-o
mvs OpenEdition ebcdic
h|ebcdic-h
ibm OS/400 AS/400 ebcdic
s|ebcdic-s
siemens posix-bc ebcdic
i|ebcdic-i
X/Open ibm ebcdic (not idempotent)
m|ebcdic-m
mvs ebcdic
u|ebcdic-u
microfocus cobol ebcdic
native
native code set
-f, --fold|ignorecase
Fold lower case letters onto upper case.
-i, --ignorecontrol
Ignore characters outside the ASCII range 040-0176 in string comparisons.
-J, --shuffle|jumble=seed
Do a random shuffle of the sort keys. seed specifies a pseudo random number generator seed. A seed of 0 generates a seed based on time and pid.
-n, --numeric
An initial numeric string, consisting of optional white space, optional sign, and a nonempty string of digits with optional decimal point, is sorted by value.
-g, --floating
Numeric, like -n, with e-style exponents allowed.
-p, --bcd|packed-decimal
Compare packed decimal (bcd) numbers with trailing sign.
-M, --months
Compare as month names. The first three characters after optional white space are folded to lower case and compared. Invalid fields compare low to jan.
-r, --reverse|invert
Reverse the sense of comparisons.
-t, --tabs=tab-char
`Tab character' separating fields is tab-char.
-c, --check
Check that the single input file is sorted according to the ordering rules; give no output unless the file is out of sort.
-j, --processes|nproc|jobs=processes
Use up to the specified number of separate processes to sort the input. The current implementation still uses one process for the final merge phase; improvements are planned.
-m, --merge
Merge; the input files are already sorted.
-u, --unique
Unique. Keep only the first of two lines that compare equal on all keys. Implies -s.
-s, --stable
Stable sort. When all keys compare equal, preserve input order.
-S, --unstable
Unstable sort. When all keys compare equal, break the tie by using the entire record, ignoring all but the -r option. This is the default.
-o, --output=output
Place output in the designated file instead of on the standard output. This file may be the same as one of the inputs. The file - names the standard output. The option may appear among the file arguments, except after --.
-l, --library=library[,name=value...]
Load the external sort discipline library with optional comma separated name=value arguments. Libraries are loaded, in left to right order, after the sort method has been initialized.
-T, --tempdir=tempdir
Put temporary files in tempdir. The default value is /usr/tmp.
-L, --list
List the available sort methods. See the -x option.
-x, --method=method
Specify the sort method to apply:
rasp
Initial radix split into a forest of splay trees.
radix
Radix sort.
splay
Splay tree sort.
verify
Verify that the input is sorted.
copy
Copy (no sort).
The default value is rasp.
-v, --verbose
Trace the sort progress on the standard error.
-Z, --zd|zoned-decimal
Compare zoned decimal (ZD) numbers with embedded trailing sign.
-z, --size|zip=type[size]
Suggest using the specified number of bytes of internal store to tune performance. Type is a single character and may be one of:
a
Buffer alignment.
b
Input reserve buffer size.
c
Input chunk size; sort chunks of this size and disable merge.
i
Input buffer size.
m
Maximum number of intermediate merge files.
p
Input sort size; sort chunks of this size before merge.
o
Output buffer size.
r
Maximum record size.
I
Decompress the input if it is compressed.
O
gzip(1) compress the output.
-y, --size=size
Equivalent to -zisize.
-X, --test=test
Enables implementation defined test code. Some or all of these may be disabled.
dump
List detailed information on the option settings.
io
List io file paths.
keys
List the canonical key for each record.
read
Force input file read by disabling memory mapping.
show
Show setup information and exit before sorting.
test
Immediately exit with status 0; used to verify this implementation.
-D, --debug=level
Sets the debug trace level. Higher levels produce more output.
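
The variable length record layout selected by -R v can be pictured with a short Python sketch. It splits a byte buffer into records using the h, o, z, b/l and i/n settings listed under -R. The function name split_v_records and its parameter names are hypothetical, and the error handling is an assumption rather than a description of what sort does with damaged input.

```python
def split_v_records(data, h=4, o=0, z=2, big_endian=True, includes_header=True):
    """Split data into record bodies; the defaults model h4o0z2bi."""
    order = "big" if big_endian else "little"
    records = []
    pos = 0
    while pos < len(data):
        header = data[pos:pos + h]
        if len(header) < h:
            raise ValueError("truncated header")
        # The size field occupies z bytes starting o bytes into the header.
        size = int.from_bytes(header[o:o + z], order)
        # With i the size counts the header too; with n it counts the body only.
        length = size if includes_header else size + h
        if length < max(h, 1) or pos + length > len(data):
            raise ValueError("bad record length")
        records.append(data[pos + h:pos + length])
        pos += length
    return records


# Two records in the default layout: 4 byte header, big-endian 2 byte size.
sample = bytes([0, 7, 0, 0]) + b"abc" + bytes([0, 6, 0, 0]) + b"de"
print(split_v_records(sample))
```

The same bodies stored with a little-endian size that excludes the header would be read with big_endian=False and includes_header=False.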

+pos1 -pos2 is the classical alternative to -k, with counting from 0 instead of 1, and pos2 designating next-after-last instead of last character of the key. A missing character count in pos2 means 0, which in turn excludes any -t tab character from the end of the key. Thus +1 -1.3 is the same as -k 2,2.3 and +1r -3 is the same as -k 2r,3.
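
The translation between the two notations can be sketched in Python. The function classic_to_key is a hypothetical helper, not part of sort; it applies the counting rules stated above and reproduces the two examples given there.

```python
import re

# A position is digits, an optional ".digits", and optional flag letters.
_POS = re.compile(r"(\d+)(?:\.(\d+))?([A-Za-z]*)")


def classic_to_key(start, end=None):
    """Translate '+pos1' and an optional '-pos2' into the argument of -k."""
    m = _POS.fullmatch(start[1:]) if start.startswith("+") else None
    if m is None:
        raise ValueError(f"bad start position: {start!r}")
    field, char, flags = int(m.group(1)), m.group(2), m.group(3)
    # pos1 counts from 0 in the classical form and from 1 under -k.
    key = str(field + 1)
    if char is not None:
        key += "." + str(int(char) + 1)
    key += flags
    if end is None:
        return key
    m = _POS.fullmatch(end[1:]) if end.startswith("-") else None
    if m is None:
        raise ValueError(f"bad end position: {end!r}")
    field, char, flags = int(m.group(1)), int(m.group(2) or 0), m.group(3)
    if char == 0:
        # Next-after-last is the start of field `field` (0-based), so the
        # key ends with the whole preceding field.
        key += "," + str(field)
    else:
        key += "," + str(field + 1) + "." + str(char)
    return key + flags


print(classic_to_key("+1", "-1.3"))   # 2,2.3
print(classic_to_key("+1r", "-3"))    # 2r,3
```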

Under option -tx fields are strings separated by x; otherwise fields are non-empty strings separated by white space. White space before a field is part of the field, except under option -b. A b flag may be attached independently to pos1 and pos2.
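
A small Python sketch of these field rules follows. The helpers split_fields and field_key are hypothetical names; the treatment of a missing field and of trailing white space is an assumption made for the sketch.

```python
import re


def split_fields(line, tab=None):
    """Split a line into fields.

    With a tab character, fields are the strings between separators.
    Without one, each field is a run of non-blank characters together
    with the white space that precedes it.
    """
    if tab is not None:
        return line.split(tab)
    return re.findall(r"[ \t]*[^ \t]+", line)


def field_key(line, m, tab=None, skip_blanks=False):
    """Return field m, counting from 1: the key selected by -k m,m."""
    fields = split_fields(line, tab)
    if m > len(fields):
        return ""  # assumption: a missing field yields an empty key
    field = fields[m - 1]
    # skip_blanks models the b flag: leading white space is ignored.
    return field.lstrip(" \t") if skip_blanks else field


rows = ["a  z", "b y"]
plain = sorted(rows, key=lambda r: field_key(r, 2))
blanks_ignored = sorted(rows, key=lambda r: field_key(r, 2, skip_blanks=True))
print(plain)
print(blanks_ignored)
```

Without the b flag the second field of the first row is "  z", whose extra leading blank makes it compare low; with the flag the keys are "z" and "y" and the order flips.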

When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Except under option -s, lines with all keys equal are ordered with all bytes significant. -S turns off -s; the last occurrence, left to right, takes effect.
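
The difference between the stable and the default tie-breaking rules can be sketched in Python. A tuple models a sequence of keys, since a later element is consulted only when the earlier ones are equal. The names sort_records and two_keys and the sample data are hypothetical.

```python
def sort_records(lines, key, stable=False, reverse=False):
    """Order lines by key; ties keep input order (stable) or fall back
    to comparing the entire line (the default)."""
    if stable:
        # Python's sorted is stable, so equal keys keep their input order.
        return sorted(lines, key=key, reverse=reverse)
    # Default: append the whole line as a final tie-breaker.
    return sorted(lines, key=lambda line: (key(line), line), reverse=reverse)


def two_keys(line):
    """First key: the count, numerically. Second key: the first letter."""
    name, count = line.split()
    return (int(count), name[0])


lines = ["pear 2", "plum 2", "apple 1", "peach 2"]
print(sort_records(lines, two_keys, stable=True))
print(sort_records(lines, two_keys))
```

With stable=True the three lines whose keys are all equal stay in input order (pear, plum, peach); by default they are ordered by their full text (peach, pear, plum).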

Sorting is done by a method determined by the -x option. -L lists the available methods. rasp (radix+splay-tree) is the default and is currently the best all-around method.

Single-letter options may be combined into a single string, such as -cnrt:. The option combination -di and the combination of -n with any of -diM are improper. Posix argument conventions are supported.

Options -b, -c, -d, -f, -i, -k, -m, -n, -o, -r, -t, and -u are in the Posix and/or X/Open standards.

DIAGNOSTICS

sort comments and exits with non-zero status for various trouble conditions and for disorder discovered under option -c.

SEE ALSO

comm(1), join(1), uniq(1), recsort(3)

CAVEATS

The never-documented default pos1=0 for cases such as sort -1 has been abolished. An input file overwritten by -o is not replaced until the entire output file is generated in the same directory as the input, at which point the input is renamed.
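
One reading of the -o caveat, sketched in Python: the output is written in full to a temporary file in the input's directory, and only then moved over the input. The function sort_in_place is a hypothetical illustration of that pattern using whole-line byte ordering; it does not describe how sort names or manages its temporary files.

```python
import os
import tempfile


def sort_in_place(path):
    """Sort the lines of path, replacing it only once the output is complete."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        lines = src.read().splitlines()
    # Create the temporary file beside the input so the final rename
    # stays within one file system.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"".join(line + b"\n" for line in sorted(lines)))
        os.replace(tmp, path)  # the input is replaced in a single step
    except BaseException:
        os.unlink(tmp)  # on failure the input is left untouched
        raise
```

If anything fails before the rename, the original input is still intact, which is the property the caveat describes.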

IMPLEMENTATION

version
sort (AT&T Research) 2007-09-05
author
Glenn Fowler <gsf@research.att.com>
author
Phong Vo <kpv@research.att.com>
author
Doug McIlroy <doug@research.bell-labs.com>
copyright
Copyright © 1996-2008 AT&T Intellectual Property
license
http://www.opensource.org/licenses/cpl1.0.txt