Methods and tools for changing the arrangement or presentation of text are often useful for preparing text for printing. This chapter discusses ways of changing the spacing of text and setting up pages, of underlining and sorting and reversing text, and of numbering lines of text.
These recipes are for changing the spacing of text -- the whitespace that exists between words, lines, and paragraphs.
The filters described in this section send output to standard output by default; to save their output to a file, use shell redirection (see Redirecting Output to a File).
To eliminate extra whitespace within lines of text, use the fmt filter; to eliminate extra whitespace between lines of text, use cat.
Use fmt with the `-u' option to output text with "uniform spacing," where the space between words is reduced to one space character and the space between sentences is reduced to two space characters.
$ fmt -u term-paper [RET]
Use cat with the `-s' option to "squeeze" multiple adjacent blank lines into one.
$ cat -s term-paper [RET]
You can combine both of these commands to output text with multiple adjacent blank lines squeezed into one and with uniform spacing between words. The following example sends the output of the combined commands to less so that it can be perused on the screen.
$ cat -s term-paper | fmt -u | less [RET]
Notice that in this example, both fmt and less worked on their standard input instead of on a file -- the standard output of cat (the contents of `term-paper' with extra blank lines squeezed out) was passed to the standard input of fmt, and its standard output (the space-squeezed `term-paper', now with uniform spacing) was sent to the standard input of less, which displayed it on the screen.
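To see the effect of this pipeline without a real `term-paper' file, here is a minimal sketch that feeds a short made-up sample through the same filters. The sample text and the variable names are assumptions for illustration only.

```shell
# Sample input: irregular spacing between words and three blank lines in a row.
sample=$(printf 'The  quick   brown fox.\n\n\n\nIt    jumps.')

# cat -s squeezes the run of blank lines into one;
# fmt -u then reduces the space between words to a single space.
squeezed=$(printf '%s\n' "$sample" | cat -s | fmt -u)

printf '%s\n' "$squeezed"
```

The result has one blank line between the two sentences and single spaces between the words.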
There are many methods for single-spacing text. To remove all empty lines from text output, use grep with the regular expression `.', which matches any character, and therefore matches any line that isn't empty (see Regular Expressions -- Matching Text Patterns). You can then redirect this output to a file, or pipe it to other commands; the original file is not altered.
$ grep . term-paper [RET]
This command outputs all lines that are not empty -- so lines containing only non-printing characters, such as spaces and tabs, will still be output.
To remove from the output all empty lines, and all lines that consist of only space characters, use `[^ ]' as the regexp to search for; it matches any line that contains at least one character other than a space. But this regexp will still output lines that contain only tab characters; to remove from the output all empty lines and lines that contain only a combination of tab or space characters, use `[^[:space:]]' as the regexp to search for. It uses the special predefined `[:space:]' regexp class, which matches any kind of space character at all, including tabs.
To output only the lines from the file `term-paper' that contain more than just space characters, type:
$ grep '[^ ]' term-paper [RET]
To output only the lines from the file `term-paper' that contain more than just space or tab characters, type:
$ grep '[^[:space:]]' term-paper [RET]
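The difference between these regexps is easiest to see by counting the lines each one keeps. The following is a small sketch on made-up input; the sample and the variable names are assumptions. It uses each bracket expression on its own, with no trailing `.', so that lines holding a single character are also kept.

```shell
# Five lines: text, an empty line, a spaces-only line, a tab-only line, text.
sample=$(printf 'first\n\n   \n\t\nlast')

# grep -c prints the number of matching lines instead of the lines themselves.
nonempty=$(printf '%s\n' "$sample" | grep -c '.')
nospaces=$(printf '%s\n' "$sample" | grep -c '[^ ]')
nonblank=$(printf '%s\n' "$sample" | grep -c '[^[:space:]]')

echo "$nonempty $nospaces $nonblank"   # prints: 4 3 2
```

Only the last regexp drops the tab-only line along with the empty and spaces-only lines.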
If a file is already double-spaced, where all even lines are blank, you can remove those lines from the output by using sed with the `n;d' expression.
$ sed 'n;d' term-paper [RET]
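As a quick check of what `n;d' does, this sketch runs it on six lines of made-up double-spaced text (the sample is an assumption): `n' prints the current line and reads the next one, and `d' deletes that next line.

```shell
# Six input lines: every even-numbered line is blank.
single=$(printf 'one\n\ntwo\n\nthree\n\n' | sed 'n;d')

printf '%s\n' "$single"
```

Only the odd-numbered lines survive, so the output is single-spaced.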
To double-space text, where one blank line is inserted between each line in the original text, use the pr tool with the `-d' option. By default, pr paginates text and puts a header at the top of each page with the current date, time, and page number; give the `-t' option to omit this header.
$ pr -d -t term-paper > term-paper.print [RET]
To send the output directly to the printer for printing, you would pipe the output to lpr:
$ pr -d -t term-paper | lpr [RET]
NOTE: The pr ("print") tool is a text pre-formatter, often used to paginate and otherwise prepare text files for printing; there is more discussion on the use of this tool in Paginating Text.
To triple-space text, where two blank lines are inserted between each line of the original text, use sed with the `G;G' expression.
$ sed 'G;G' term-paper > term-paper.print [RET]
The `G' expression appends one blank line to each line of sed's output; by separating expressions with `;' you can give `G' more than once to append more than one blank line (but you must quote this command, because the semicolon (`;') has meaning to the shell -- see Passing Special Characters to Commands). You can use multiple `G' expressions to output text with more than double or triple spacing.
$ sed 'G;G;G' term-paper > term-paper.print [RET]
The usage of sed is described in Editing Streams of Text.
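The effect of each `G' can be confirmed by counting output lines. This is a minimal sketch on a three-line made-up sample; the variable names are assumptions. It also shows that `n;d' undoes the double spacing that a single `G' adds.

```shell
text=$(printf 'a\nb\nc')

# Each G appends one blank line after every input line.
double=$(printf '%s\n' "$text" | sed 'G' | wc -l | tr -d ' ')
triple=$(printf '%s\n' "$text" | sed 'G;G' | wc -l | tr -d ' ')

# Double-space, then single-space again: the original text comes back.
restored=$(printf '%s\n' "$text" | sed 'G' | sed 'n;d')

echo "$double $triple"   # prints: 6 9
```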
Sometimes a file will not have line breaks at the end of each line (this commonly happens during file conversions between operating systems). To add line breaks to a file that does not have them, use the text formatter fmt. It outputs text with lines arranged up to a specified width; if no width is specified, it formats text up to a width of 75 characters per line.
$ fmt term-paper [RET]
Use the `-w' option to specify the maximum line width.
$ fmt -w 80 term-paper [RET]
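To illustrate the `-w' option on something small, the sketch below wraps one long made-up line to a width of 20 and then checks two properties of the result: no output line is longer than the requested width, and joining the lines back together gives the original words. The sample text and variable names are assumptions; exactly where fmt breaks each line is left to fmt.

```shell
long='aaaa bbbb cccc dddd eeee ffff gggg hhhh'

# Wrap the single 39-character line to at most 20 characters per line.
wrapped=$(printf '%s\n' "$long" | fmt -w 20)
printf '%s\n' "$wrapped"

# Length of the longest output line.
longest=$(printf '%s\n' "$wrapped" | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')

# Join the wrapped lines with spaces to recover the original line.
rejoined=$(printf '%s\n' "$wrapped" | tr '\n' ' ' | sed 's/ $//')
```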
Giving text an extra left margin is especially good when you want to print a copy and punch holes in it for use with a three-ring binder.
To output a text file with a larger left margin, use pr with the file name as an argument; give the `-t' option (to disable headers and footers), and, as an argument to the `-o' option, give the number of spaces to offset the text. Add the number of spaces to the page width (whose default is 72) and specify this new width as an argument to the `-w' option.
$ pr -t -o 5 -w 77 owners-manual > owners-manual.pr [RET]
This command is almost always used for printing, so the output is usually just piped to lpr instead of saved to a file. Many text documents have a width of 80 and not 72 columns; if you are printing such a document and need to keep the 80 columns across the page, specify a new width of 85. If your printer can only print 80 columns of text, specify a width of 80; the text will be reformatted to 75 columns after the 5-column margin.
$ pr -t -o 5 -w 85 owners-manual | lpr [RET]
$ pr -t -o 5 -w 80 owners-manual | lpr [RET]
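The margin itself is easy to inspect on the screen before sending anything to the printer. This sketch passes one made-up line through pr with only the `-t' and `-o' options and prints the result between square brackets so that the leading spaces are visible; the sample line and variable name are assumptions.

```shell
# Offset the text by five spaces; -t suppresses the page header and footer.
indented=$(printf 'Chapter 1\n' | pr -t -o 5)

printf '[%s]\n' "$indented"
```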
Use the expand and unexpand tools to swap tab characters for space characters, and to swap space characters for tabs, respectively.
Both tools take a file name as an argument and write changes to the standard output; if no files are specified, they work on the standard input.
To convert tab characters to spaces, use expand. To convert only the initial or leading tabs on each line, give the `-i' option; the default action is to convert all tabs.
$ expand list > list2 [RET]
$ expand -i list [RET]
To convert multiple space characters to tabs, use unexpand. By default, it only converts leading spaces into tabs, counting eight space characters for each tab. Use the `-a' option to specify that all instances of eight space characters be converted to tabs.
$ unexpand list2 > list [RET]
$ unexpand -a list2 [RET]
To specify the number of spaces to convert to a tab, give that number as an argument to the `-t' option.
$ unexpand -t 1 list2 [RET]
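Because tabs and spaces look alike on the screen, it helps to make the conversion visible. The sketch below, on a made-up line with one leading and one interior tab, shows what expand, `expand -i', and `unexpand -a' each produce with the default tab stops of eight columns. The variable names are assumptions, and spaces are displayed as dots.

```shell
tab=$(printf '\t')
tabbed="${tab}indented${tab}word"

# Convert every tab to spaces, or only the leading one with -i.
spaced=$(printf '%s\n' "$tabbed" | expand)
leading_only=$(printf '%s\n' "$tabbed" | expand -i)

# Convert all eight-column runs of spaces back to tabs.
retabbed=$(printf '%s\n' "$spaced" | unexpand -a)

# Show the spaces in the expanded line as dots.
printf '%s\n' "$spaced" | tr ' ' '.'   # prints: ........indented........word
```

Here `unexpand -a' restores the original line exactly, since both runs of spaces end on a tab stop.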
The formfeed character, ASCII C-l or octal code 014, is the delimiter used to paginate text. When you send text with a formfeed character to the printer, the current page being printed is ejected and a new page begins -- thus, you can paginate a text file by inserting formfeed characters at a place where you want a page break to occur.
To insert formfeed characters in a text file, use the pr filter.
Give the `-f' option to omit the footer and separate pages of output with the formfeed character, and use `-h ""' to output a blank header (otherwise, the current date and time, file name, and current page number are output at the top of each page).
$ pr -f -h "" listings > listings.page [RET]
By default, pr outputs pages of 66 lines each. You can specify the page length as an argument to the `-l' option.
$ pr -f -h "" -l 43 listings > listings.page [RET]
NOTE: If a page has more lines than a printer can fit on a physical sheet of paper, the printer will automatically break the text at that line as well as at the places in the text where there are formfeed characters.
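Formfeed characters are invisible in most displays, so one way to confirm that pr has inserted them is to count them. This is a rough sketch on generated input (100 numbered lines from seq, which is an assumption standing in for a real file); it deletes every character except the formfeed and counts what is left. The exact count depends on how many lines pr fits on each page, so the sketch only reports it.

```shell
# Paginate 100 lines into 20-line pages separated by formfeeds,
# then keep only the formfeed characters and count them.
formfeeds=$(seq 1 100 | pr -f -h "" -l 20 | tr -cd '\f' | wc -c | tr -d ' ')

echo "formfeed characters in output: $formfeeds"
```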
You can paginate text in Emacs by manually inserting formfeed characters where you want them -- see Inserting Special Characters in Emacs.
The pr tool is a general-purpose page formatter and print-preparation utility. By default, pr outputs text in pages of 66 lines each, with headers at the top of each page containing the date and time, file name, and page number, and footers containing five blank lines.
To print the file `duchess' with the default pr preparation, type:
$ pr duchess | lpr [RET]
You can also use pr to put text in columns -- give the number of columns to output as an argument. Use the `-t' option to omit the printing of the default headers and footers.
$ pr -4 -t news.update | lpr [RET]
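To see how pr fills columns, the sketch below arranges the numbers 1 through 8 in four columns, first down (the default) and then across with the `-a' option. The input from seq and the variable names are assumptions; awk is used only to collapse the padding between columns to single spaces. The "down" result assumes GNU pr, which balances the column lengths on a page.

```shell
# Four columns, filled down each column in turn.
down=$(seq 1 8 | pr -4 -t | awk '{ print $1, $2, $3, $4 }')

# Four columns, filled across each row in turn.
across=$(seq 1 8 | pr -4 -t -a | awk '{ print $1, $2, $3, $4 }')

printf 'down:\n%s\nacross:\n%s\n' "$down" "$across"
```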
The following table describes some of pr's options; see the pr info for a complete description of its capabilities (see Using the GNU Info System).
OPTION | DESCRIPTION |
+first:last | Specify the first and last page to process; the last page can be omitted, so +7 begins processing with the seventh page and continues until the end of the file is reached. |
-column | Specify the number of columns to output text in, making all columns fit the page width. |
-a | Print columns across instead of down. |
-c | Output control characters in hat notation and print all other unprintable characters in "octal backslash" notation. |
-d | Specify double-spaced output. |
-f | Separate pages of output with a formfeed character instead of a footer of blank lines (63 lines of text per 66-line page instead of 56). |
-h header | Specify the header to use instead of the default; specify -h "" for a blank header. |
-l length | Specify the page length to be length lines (default 66). If page length is less than 11, headers and footers are omitted and existing form feeds are ignored. |
-m | Use when specifying multiple files; this option merges and outputs them in parallel, one per column. |
-o spaces | Set the number of spaces to use in the left margin (default 0). |
-t | Omit the header and footer on each page, but retain existing formfeeds. |
-T | Omit the header and footer on each page, as well as existing formfeeds. |
-v | Output non-printing characters in "octal backslash" notation. |
-w width | Specify the page width to use, in characters (default 72). |
NOTE: You can also use pr to change the spacing of text (see Spacing Text).
In the days of typewriters, text that was meant to be set in an italicized font was denoted by underlining the text with underscore characters; now, it's common practice to denote an italicized word in plain text by typing an underscore character, `_', just before and after a word in a text file, like `_this_'.
Some text markup languages use different methods for denoting italics; for example, in TeX files, italicized text is often denoted with braces and the `\it' command, like `{\it this}'. (LaTeX files accept the same format, but the `\emph' command, as in `\emph{this}', is often used instead.)
You can convert one form to the other by using the Emacs replace-regexp function and specifying the text to be replaced as a regexp (see Regular Expressions -- Matching Text Patterns).
To replace plaintext-style italics with TeX `\it' commands, type:
M-x replace-regexp [RET] _\([^_]+\)_ [RET] {\\it \1} [RET]
To replace TeX-style italics with plaintext underscores, type:
M-x replace-regexp [RET] {\\it \([^}]+\)} [RET] _\1_ [RET]
Both examples above used the special symbol `\1', which in the replacement text stands for the text matched by the first `\( ... \)' construct in the regexp. See Info file `emacs-e20.info', node `Regexps' for more information on regexp syntax in Emacs.
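The same conversion can also be done outside Emacs. As an alternative sketch (not the Emacs method described above), the following uses sed with equivalent regexps on a one-line made-up sample; the sample text and variable names are assumptions. In sed's replacement text, as in Emacs, `\\' produces a single literal backslash.

```shell
plain='an _important_ word'

# _text_  ->  {\it text}
tex=$(printf '%s\n' "$plain" | sed 's/_\([^_]*\)_/{\\it \1}/g')

# {\it text}  ->  _text_
back=$(printf '%s\n' "$tex" | sed 's/{\\it \([^}]*\)}/_\1_/g')

printf '%s\n' "$tex" "$back"
```

The first line printed is `an {\it important} word', and the second is the original text again.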
To put a literal underline under text, you need to use a text editor to insert a C-h character followed by an underscore (`_') immediately after each character you want to underline; you can insert the C-h in Emacs with the C-q function (see Inserting Special Characters in Emacs).
When a text file contains these literal underlines, use the ul tool to output the file so that it is viewable by the terminal you are using; this is also useful for printing (pipe the output of ul to lpr).
$ ul term-paper [RET]
To output such text without the backspace character, C-h, in the output, use col with the `-u' option.
$ col -u term-paper [RET]
You can sort a list in a text file with sort. By default, it outputs text in ascending alphabetical order; use the `-r' option to reverse the sort and output text in descending alphabetical order.
For example, suppose a file `provinces' contains the following:
Shantung
Honan
Szechwan
Hunan
Kiangsu
Kwangtung
Fukien
$ sort provinces [RET]
Fukien
Honan
Hunan
Kiangsu
Kwangtung
Shantung
Szechwan
$
$ sort -r provinces [RET]
Szechwan
Shantung
Kwangtung
Kiangsu
Hunan
Honan
Fukien
$
The following table describes some of sort's options.
OPTION | DESCRIPTION |
-b | Ignore leading blanks on each line when sorting. |
-d | Sort in "phone directory" order, with only letters, digits, and blanks being sorted. |
-f | When sorting, fold lowercase letters into their uppercase equivalent, so that differences in case are ignored. |
-i | Ignore all spaces and all non-typewriter characters when sorting. |
-n | Sort numerically instead of by character value. |
-o file | Write output to file instead of standard output. |
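The `-n' option matters whenever the lines to sort are numbers. The sketch below sorts three made-up numbers by character value, by numeric value, and in reverse numeric order; the sample and variable names are assumptions, and `LC_ALL=C' is set only to keep the character ordering the same on every system. The tr command joins the sorted lines with spaces for display.

```shell
numbers=$(printf '10\n9\n100')

# Character order compares the digits left to right, so 100 sorts before 9.
by_char=$(printf '%s\n' "$numbers" | LC_ALL=C sort | tr '\n' ' ')

# Numeric order compares the values.
by_value=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n | tr '\n' ' ')
descending=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n -r | tr '\n' ' ')

echo "$by_char/ $by_value/ $descending"   # prints: 10 100 9 / 9 10 100 / 100 10 9
```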
There are several ways to number lines of text.
One way to do it is to use the nl ("number lines") tool. Its default action is to write its input (either the files named as arguments, or the standard input) to the standard output, with an indentation and with all non-empty lines preceded by line numbers.
$ nl report | less [RET]
You can set the numbering style with the `-b' option followed by an argument. The following table lists the possible arguments and describes the numbering style they select.
ARGUMENT | NUMBERING STYLE |
a | Number all lines. |
t | Number only non-blank lines. This is the default. |
n | Do not number lines. |
pregexp | Number only lines that contain a match for the regular expression regexp, which is written immediately after the `p' (see Regular Expressions -- Matching Text Patterns). |
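Use the `-v' option to set the number given to the first line, and the `-i' option to set the increment between line numbers. To output the file `report' with line numbers that start at 2 and increase by 4, type:
$ nl -v 2 -i 4 report [RET]
To number only the lines of the file `cantos' that begin with a period (`.'), starting at 0 and increasing by 5, and to write the output to the file `cantos.numbered', type:
$ nl -i 5 -v 0 -b p'^\.' cantos > cantos.numbered [RET]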
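To check which numbers nl assigns, this sketch numbers a three-line made-up sample whose middle line is empty, once with the defaults and once with `-b a -v 2 -i 4'. The sample and variable names are assumptions; awk pulls out just the line numbers so that nl's padding does not get in the way.

```shell
sample=$(printf 'alpha\n\nbeta')

# Default style: the empty line gets no number.
default_nums=$(printf '%s\n' "$sample" | nl | awk 'NF { print $1 }' | tr '\n' ' ')

# Number all lines, starting at 2 and increasing by 4.
all_nums=$(printf '%s\n' "$sample" | nl -b a -v 2 -i 4 | awk '{ print $1 }' | tr '\n' ' ')

echo "$default_nums/ $all_nums"   # prints: 1 2 / 2 6 10
```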
The other way to number lines is to use cat with one of the following two options: the `-n' option numbers each line of its input text, while the `-b' option only numbers non-blank lines.
$ cat -n report | less [RET]
$ cat -b report | less [RET]
In the preceding examples, the output of cat is piped to less for perusal; the original file is not altered.
To take an input file, number its lines, and then write the line-numbered version to a new file, redirect the standard output of the cat command to the new file.
$ cat -n report > report.lines [RET]
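The difference between the two cat options shows up in the number given to the last line. In this sketch, a made-up three-line sample with an empty middle line is numbered both ways, and the number on the final line is extracted; the sample and variable names are assumptions.

```shell
sample=$(printf 'alpha\n\nbeta')

# -n counts the empty line; -b skips it.
n_last=$(printf '%s\n' "$sample" | cat -n | tail -n 1 | awk '{ print $1 }')
b_last=$(printf '%s\n' "$sample" | cat -b | tail -n 1 | awk '{ print $1 }')

echo "$n_last $b_last"   # prints: 3 2
```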
The tac command is similar to cat, but it outputs text in reverse order. There is another difference: tac works on records, sections of text with separator strings, instead of lines of text. Its default separator string is the linebreak character, so by default tac outputs files in line-for-line reverse order.
$ tac prizes [RET]
Specify a different separator with the `-s' option. This is often useful when specifying non-printing characters such as formfeeds. To specify such a character, use the ANSI-C method of quoting (see Passing Special Characters to Commands).
$ tac -s $'\f' prizes [RET]
The preceding example uses the formfeed, or page break, character as the delimiter, and so it outputs the file `prizes' in page-for-page reverse order, with the last page output first.
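Both kinds of reversal can be tried on a few bytes of made-up input instead of a real `prizes' file. In this sketch the formfeed is produced with printf rather than ANSI-C quoting, so that it also works in shells without `$'...'' support; the samples and variable names are assumptions, and tr makes the separators visible in the output.

```shell
# Line-for-line reversal with the default separator.
lines=$(printf 'one\ntwo\nthree\n' | tac | tr '\n' ' ')

# Page-for-page reversal: each record ends with a formfeed.
ff=$(printf '\f')
pages=$(printf 'page1\fpage2\fpage3\f' | tac -s "$ff" | tr '\f' '|')

echo "$lines"   # prints: three two one
echo "$pages"   # prints: page3|page2|page1|
```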
Use the `-r' option to use a regular expression for the separator string (see Regular Expressions -- Matching Text Patterns). You can build regular expressions to output text in word-for-word and character-for-character reverse order:
$ tac -r -s '[^a-zA-Z0-9\-]' prizes [RET]
$ tac -r -s '.\| [RET] ' prizes [RET]
To reverse the characters on each line, use rev.
$ rev prizes [RET]