File management tools include those for splitting, comparing, and compressing files, making backup archives, and tracking file revisions. Other management tools exist for determining the contents of a file, and for changing its timestamp.
When we speak of a file's type, we are referring to the kind of data it contains, which may include text, executable commands, or some other data; this data is organized in a particular way in the file, and this organization is called its format. For example, an image file might contain data in the JPEG image format, or a text file might contain unformatted text in the English language or text formatted in the TeX markup language.
The file tool analyzes files and indicates their type and -- if
known -- the format of the data they contain. Supply the name of a file
as an argument to file and it outputs the name of the file,
followed by a description of its format and type.
$ file /usr/doc/HOWTO/README.gz [RET]
/usr/doc/HOWTO/README.gz: gzip compressed data, deflated, original filename, last modified: Sun Apr 26 02:51:48 1998, os: Unix
$
This command reports that the file `/usr/doc/HOWTO/README.gz'
contains data that has been compressed with the gzip tool.
To determine the original format of the data in a compressed file, use the `-z' option.
$ file -z /usr/doc/HOWTO/README.gz [RET]
/usr/doc/HOWTO/README.gz: English text (gzip compressed data, deflated, original filename, last modified: Sun Apr 26 02:51:48 1998, os: Unix)
$
This command reports that the data in `/usr/doc/HOWTO/README.gz', a compressed file, is English text.
NOTE: Currently, file differentiates among more than 100
different data formats, including several human languages, many sound
and graphics formats, and executable files for many different operating
systems.
Use touch to change a file's timestamp without modifying its
contents. Give the name of the file to be changed as an argument. The
default action is to change the timestamp to the current time.
$ touch pizzicato [RET]
To specify a timestamp other than the current system time, use the `-d' option, followed by the date and time that should be used enclosed in quote characters. You can specify just the date, just the time, or both.
$ touch -d '17 May 1999 14:16' pizzicato [RET]
$ touch -d '14 May' pizzicato [RET]
$ touch -d '14:16' pizzicato [RET]
NOTE: When only the date is given, the time is set to `0:00'; when no year is given, the current year is used.
See Info file `fileutils.info', node `Date input formats', for more information on date input formats.
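The effect of the `-d' option can be checked by reading the timestamp back. The following is a minimal sketch, not part of the original recipes: it assumes GNU date, whose `-r' option prints a file's modification time, and it works in a scratch directory made with mktemp so that no real file is changed.

```shell
# Scratch directory, so nothing real is touched.
workdir=$(mktemp -d)

touch "$workdir/pizzicato"                          # create the file, stamped "now"
touch -d '17 May 1999 14:16' "$workdir/pizzicato"   # set an explicit date and time

# GNU date's -r option prints the modification time of the named file.
stamp=$(date -r "$workdir/pizzicato" '+%Y-%m-%d %H:%M')
echo "$stamp"
```

Both commands interpret the time in the local time zone, so the value read back matches the value given to touch.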
It's sometimes necessary to split one file into a number of smaller
ones. For example, suppose you have a very large sound file in the
near-CD-quality MPEG audio layer 3 ("MP3") format. Your file,
`large.mp3', is 4,394,422 bytes in size, and you want to transfer
it from your desktop to your laptop, but your laptop and desktop are not
connected on a network -- the only way to transfer files between them is
by floppy disk. Because this file is much too large to fit on one
floppy, you use split.
The split tool copies a file, chopping up the copy into separate
files of a specified size. It takes as optional arguments the name of
the input file (using standard input if none is given) and the file name
prefix to use when writing the output files (using `x' if none is
given). The output files' names will consist of the file prefix followed
by a group of letters: `aa', `ab', `ac', and so on -- the
default output file names would be `xaa', `xab', and so on.
Specify the number of lines to put in each output file with the
`-l' option, or use the `-b' option to specify the number of
bytes to put in each output file. To specify the output files'
sizes in kilobytes or megabytes, use the `-b' option and append
`k' or `m', respectively, to the value you supply. If neither
`-l' nor `-b' is used, split defaults to using 1,000
lines per output file.
$ split -b1m large.mp3 large.mp3. [RET]
This command creates five new files whose names begin with `large.mp3.'. The first four files are one megabyte in size, while the last file is 200,118 bytes -- the remaining portion of the original file. No alteration is made to `large.mp3'.
You could then copy these five files onto four floppies (the last file
fits on a floppy with one of the larger files), copy them all to your
laptop, and then reconstruct the original file with cat
(see Concatenating Text).
$ cat large.mp3.* > large.mp3 [RET]
$ rm large.mp3.* [RET]
In this example, the rm tool is used to delete all of the split
files after the original file has been reconstructed.
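The whole round trip can be rehearsed on one machine before trusting it with a real file. This sketch makes some assumptions that are not in the recipe above: the sound file is replaced by a stand-in of the same size filled with arbitrary bytes, the reassembled copy gets the hypothetical name `rebuilt.mp3', and the pieces are deleted only after cmp confirms that the copy matches the original.

```shell
workdir=$(mktemp -d)

# Stand-in for the sound file: 4,394,422 bytes of arbitrary data.
head -c 4394422 /dev/urandom > "$workdir/large.mp3"

# Chop a copy into one-megabyte pieces: large.mp3.aa, large.mp3.ab, ...
split -b 1m "$workdir/large.mp3" "$workdir/large.mp3."

pieces=$(ls "$workdir"/large.mp3.* | wc -l)
last_size=$(wc -c < "$workdir/large.mp3.ae")
echo "pieces: $pieces, size of last piece: $last_size"

# Reassemble under a new name; the shell expands the pattern in
# alphabetical order, which is the order the pieces were written in.
cat "$workdir"/large.mp3.* > "$workdir/rebuilt.mp3"

# Delete the pieces only if the rebuilt file is identical to the original.
if cmp -s "$workdir/large.mp3" "$workdir/rebuilt.mp3"; then
    rm "$workdir"/large.mp3.*
    echo "rebuilt file matches the original"
fi
```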
There are a number of tools for comparing the contents of files in different ways; these recipes show how to use some of them. These tools are especially useful for comparing passages of text in files, but that's not the only way you can use them.
Use cmp to determine whether or not two text files differ. It
takes the names of two files as arguments, and if the files contain the
same data, cmp outputs nothing. If, however, the files differ,
cmp outputs the byte position and line number in the files where
the first difference occurs.
$ cmp master backup [RET]
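Because cmp also reports its result through its exit status (0 when the files are identical, 1 when they differ), it is convenient in scripts. The sketch below assumes two small sample files created only for the demonstration, and uses the `-s' option, which suppresses cmp's normal output.

```shell
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/master"
cp "$workdir/master" "$workdir/backup"

# Identical files: cmp exits with status 0.
if cmp -s "$workdir/master" "$workdir/backup"; then
    before="same"
else
    before="different"
fi

# Change the backup, then compare again: cmp now exits with status 1.
echo 'an extra line' >> "$workdir/backup"
if cmp -s "$workdir/master" "$workdir/backup"; then
    after="same"
else
    after="different"
fi

echo "before the change: $before; after the change: $after"
```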
Use diff to compare two files and output a difference
report (sometimes called a "diff") containing the text that
differs between two files. The difference report is formatted so
that other tools (namely, patch -- see Patching a File with a
Difference Report) can use it to make a file identical to the
one it was compared with.
To compare two files and output a difference report, give their names as
arguments to diff.
$ diff manuscript.old manuscript.new [RET]
The difference report is written to standard output; to save it, redirect the output to a file:
$ diff manuscript.old manuscript.new > manuscript.diff [RET]
In the preceding example, the difference report is saved to a file called `manuscript.diff'.
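A small worked case shows what a saved report looks like. In this sketch the two manuscripts are stand-ins of three lines each, differing only in the second line; note also that diff exits with status 1 when the files differ, which is a normal result and not an error.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/manuscript.old"
printf 'one\n2\nthree\n'   > "$workdir/manuscript.new"

# diff exits with 0 when the files match and 1 when they differ,
# so record the status instead of treating 1 as a failure.
status=0
diff "$workdir/manuscript.old" "$workdir/manuscript.new" \
    > "$workdir/manuscript.diff" || status=$?

echo "diff exit status: $status"
cat "$workdir/manuscript.diff"
```

The report begins with `2c2', meaning that line 2 of the first file must be changed to match line 2 of the second; the old text follows, marked with `<', and then the new text, marked with `>'.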
The difference report is meant to be used with commands such as
patch, in order to apply the differences to a file. See Info file
`diff.info', node `Top', for more information on diff and the format of
its output.
To better see the difference between two files, use sdiff instead
of diff; instead of giving a difference report, it outputs the
files in two columns, side by side, separated by spaces. Lines that
differ in the files are separated by `|'; lines that appear only in
the first file end with a `<', and lines that appear only in the
second file are preceded with a `>'.
$ sdiff laurel hardy | less [RET]
To output the difference between three separate files, use diff3.
$ diff3 larry curly moe > stooges [RET]
To apply the differences in a difference report to the original file
compared in the report, use patch. It takes as arguments the
name of the file to be patched and the name of the difference report
file (or "patchfile"). It then applies the changes specified in the
patchfile to the original file. This is especially useful for
distributing different versions of a file -- small patchfiles may be sent
across networks more easily than large source files.
$ patch manuscript.old manuscript.diff [RET]
File compression is useful for storing or transferring large files. When you compress a file, you shrink it and save disk space. File compression uses an algorithm to change the data in the file; to use the data in a compressed file, you must first uncompress it to restore the original data (and original file size).
The following recipes explain how to compress and uncompress files.
Use the gzip ("GNU zip") tool to compress files. It takes as an
argument the name of the file or files to be compressed; it writes a
compressed version of the specified files, appends a `.gz'
extension to their file names, and then deletes the original files.
$ gzip war-and-peace [RET]
This command compresses the file `war-and-peace', putting it in a
new file named `war-and-peace.gz'; gzip then deletes the
original file, `war-and-peace'.
To access the contents of a compressed file, use gunzip to
decompress (or "uncompress") it.
Like gzip, gunzip takes as an argument the name of the
file or files to work on. It expands the specified files, writing the
output to new files without the `.gz' extensions, and then deletes
the compressed files.
$ gunzip war-and-peace.gz [RET]
This command expands the file `war-and-peace.gz' and puts it in a
new file called `war-and-peace'; gunzip then deletes the
compressed file, `war-and-peace.gz'.
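The two recipes can be tried together on a throwaway file. This sketch assumes a stand-in `war-and-peace' made of 1,000 repetitive lines, plus a hypothetical copy named `reference' that is kept only so the restored file can be compared against something.

```shell
workdir=$(mktemp -d)

# Build a repetitive text file; repetitive data compresses well.
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i of a long and repetitive manuscript"
    i=$((i + 1))
done > "$workdir/war-and-peace"
cp "$workdir/war-and-peace" "$workdir/reference"
original_size=$(wc -c < "$workdir/war-and-peace")

# Compress: war-and-peace is replaced by war-and-peace.gz.
gzip "$workdir/war-and-peace"
compressed_size=$(wc -c < "$workdir/war-and-peace.gz")
[ -e "$workdir/war-and-peace" ] || echo "original replaced by the .gz file"

# Uncompress: war-and-peace.gz is replaced by war-and-peace.
gunzip "$workdir/war-and-peace.gz"
cmp "$workdir/war-and-peace" "$workdir/reference" && echo "contents restored intact"

echo "uncompressed: $original_size bytes; compressed: $compressed_size bytes"
```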
NOTE: You can view a compressed text file without uncompressing
it by using zless. This is useful when you want to view a
compressed file but do not want to write changes to it. (For more
information about zless, see Perusing Text.)
An archive is a single file that contains a collection of other files, and often directories. Archives are usually used to transfer or make a backup copy of a collection of files and directories -- this way, you can work with only one file instead of many. This single file can be easily compressed as explained in the previous section, and the files in the archive retain the structure and permissions of the original files.
Use the tar tool to create, list, and extract files from
archives. Archives made with tar are sometimes called "tar
files," "tar archives," or -- because all the archived files are
rolled into one -- "tarballs."
The following recipes show how to use tar to create an archive,
list the contents of an archive, and extract the files from an
archive. Two common options used with all three of these operations are
`-f' and `-v': to specify the name of the archive file, use
`-f' followed by the file name; use the `-v' ("verbose")
option to have tar output the names of files as they are
processed. While the `-v' option is not necessary, it lets you
observe the progress of your tar operation.
NOTE: The name of this tool comes from "tape archive," because it was originally made to write the archives directly to a magnetic tape device. It is still used for this purpose, but today, archives are almost always saved to a file on disk.
See Info file `tar.info', node `Top', for more information about
managing archives with tar.
To create an archive with tar, use the `-c' ("create")
option, and specify the name of the archive file to create with the
`-f' option. It's common practice to use a name with a `.tar'
extension, such as `my-backup.tar'.
Give as arguments the names of the files to be archived; to create an archive of a directory and all of the files and subdirectories it contains, give the directory's name as an argument.
$ tar -cvf project.tar project [RET]
This command creates an archive file called `project.tar' containing the `project' directory and all of its contents. The original `project' directory remains unchanged.
Use the `-z' option to compress the archive as it is being written.
This yields the same output as creating an uncompressed archive and then
using gzip to compress it, but it eliminates the extra step.
$ tar -zcvf project.tar.gz project [RET]
This command creates a compressed archive file, `project.tar.gz', containing the `project' directory and all of its contents. The original `project' directory remains unchanged.
NOTE: When you use the `-z' option, you should specify the archive name with a `.tar.gz' extension and not a `.tar' extension, so the file name shows that the archive is compressed. This is not a requirement, but it serves as a reminder and is the standard practice.
To list the contents of a tar archive without extracting them,
use tar with the `-t' option.
$ tar -tvf project.tar [RET]
This command lists the contents of the `project.tar' archive. Using
the `-v' option along with the `-t' option causes tar
to output the permissions and modification time of each file, along with
its file name -- the same format used by the ls command with the
`-l' option (see Listing File Attributes).
Include the `-z' option to list the contents of a compressed archive.
$ tar -ztvf project.tar.gz [RET]
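Creating and listing can be tried together on a small sample tree. The sketch below builds a hypothetical `project' directory holding two files, archives it with compression, and captures the listing. It also uses tar's `-C' option, which is not covered in these recipes: it tells tar to change to the named directory before reading the files that follow, so the names stored in the archive begin with `project/'.

```shell
workdir=$(mktemp -d)

# A small sample tree to archive.
mkdir -p "$workdir/project/src"
echo 'draft notes' > "$workdir/project/notes.txt"
echo 'int main(void) { return 0; }' > "$workdir/project/src/main.c"

# Create a gzip-compressed archive of the project directory.
tar -czf "$workdir/project.tar.gz" -C "$workdir" project

# List the archive's contents without extracting anything.
listing=$(tar -ztf "$workdir/project.tar.gz")
echo "$listing"
```

Without `-v', the listing shows only the file names, one per line.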
To extract (or unpack) the contents of a tar archive, use
tar with the `-x' ("extract") option.
$ tar -xvf project.tar [RET]
This command extracts the contents of the `project.tar' archive into the current directory.
If an archive is compressed, which usually means it will have a `.tar.gz' or `.tgz' extension, include the `-z' option.
$ tar -zxvf project.tar.gz [RET]
NOTE: If there are files or subdirectories in the current directory with the same name as any of those in the archive, those files will be overwritten when the archive is extracted. If you don't know what files are included in an archive, consider listing the contents of the archive first (see Listing the Contents of an Archive).
Another reason to list the contents of an archive before extracting them is to determine whether the files in the archive are contained in a directory. If not, and the current directory contains many unrelated files, you might confuse them with the files extracted from the archive.
To extract the files into a directory of their own, make a new directory, move the archive to that directory, and change to that directory, where you can then extract the files from the archive.
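The steps just described might look like the following sketch. It assumes a hypothetical archive, `loose.tar', whose two files are not contained in a directory, and a new directory named `unpacked'; the archive is built here only so that there is something to extract. The cd command runs inside parentheses (a subshell), so the directory you are working in is left unchanged afterward.

```shell
workdir=$(mktemp -d)

# Build a small archive whose files are NOT inside a directory.
echo 'alpha' > "$workdir/a.txt"
echo 'beta'  > "$workdir/b.txt"
tar -cf "$workdir/loose.tar" -C "$workdir" a.txt b.txt

# Inspect first: the listed names have no leading directory.
tar -tf "$workdir/loose.tar"

# Make a new directory, move the archive into it, change to it, extract.
mkdir "$workdir/unpacked"
mv "$workdir/loose.tar" "$workdir/unpacked/"
(cd "$workdir/unpacked" && tar -xf loose.tar)

ls "$workdir/unpacked"
```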
The Revision Control System (RCS) is a set of tools for managing multiple revisions of a single file.
To store a revision of a file so that RCS can keep track of it, you check in the file with RCS. This deposits the revision of the file in an RCS repository -- a file that RCS uses to store all changes to that file. RCS makes a repository file with the same file name as the file you are checking in, but with a `,v' extension appended to the name. For example, checking in the file `foo.text' with RCS creates a repository file called `foo.text,v'.
Each time you want RCS to remember a revision of a file, you check in the file, and RCS writes to that file's RCS repository the differences between the file and the last revision on record in the repository.
To access a revision of a file, you check out the revision from RCS. The revision is obtained from the file's repository and is written to the current directory.
Although RCS is most often used with text files, you can also use it to keep track of revisions made to other kinds of files, such as image files and sound files.
Another revision control system, Concurrent Versions System (CVS), is used for tracking collections of multiple files whose revisions are made concurrently by multiple authors. While considerably more complex than RCS, it is very popular for managing free software projects on the Internet. See Info file `cvs.info', node `Top', for information on using CVS.
When you have a version of a file that you want to keep track of, use
ci to check in that file with RCS.
Type ci followed by the name of a file to deposit that file into
the RCS repository. If the file has never before been checked in,
ci prompts for a description to use for that file; each
subsequent time the file is checked in, ci prompts for text to
include in the file's revision log (see Viewing a File's Revision Log). Log messages may contain more than one line of
text; type a period (`.') on a line by itself to end the entry.
For example, suppose the file `novel' contains this text:
This is a tale about many things, including a long voyage across America.
$ ci novel [RET]
novel,v <-- novel
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> The Great American Novel. [RET]
>> . [RET]
$
This command deposits the file in an RCS repository file called `novel,v', and the original file, `novel', is removed. To edit or access the file again, you must check out a revision of the file from RCS with which to work (see Checking Out a File Revision).
Whenever you have a new revision that you want to save, use ci as
before to check in the file. This begins the process all over again.
For example, suppose you have checked out the first revision of `novel' and changed the file so that it now looks like this:
This is a very long tale about a great many things, including my long voyage across America, and back home again.
$ ci novel [RET]
novel,v <-- novel
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Second draft. [RET]
>> . [RET]
$
If you create a subdirectory called `RCS' (in all uppercase letters) in the current directory, RCS recognizes this specially named directory instead of the current directory as the place to store the `,v' revision files. This helps reduce clutter in the directory you are working in.
If the file you are depositing is a text file, you can have RCS insert a
line of text, every time the file is checked out, containing the name of
the file, the revision number, the date and time in the UTC (Coordinated
Universal Time) time zone, and the user ID of the author. To do this,
put the text `$Id$' at a place in the file where you want
this text to be written. You only need to do this once; each time you
check the file out, RCS replaces this string in the file with the header
text.
For example, this chapter was written to a file,
`managing-files.texinfo', whose revisions were tracked with RCS;
the `$Id$' string in this file currently reads:
$Id: managing-files.texinfo,v 1.32 2001/05/16 16:57:58 m Exp m $
Use co to check out a revision of a file from an RCS repository.
To check out the latest revision of a file that you intend to edit (and
to check in later as a new revision), use the `-l' (for "lock")
option. Locking a revision in this fashion prevents overlapping changes
being made to the file should another revision be accidentally checked
out before this revision is checked in.
$ co -l novel [RET]
This command checks out the latest revision of file `novel' from
the `novel,v' repository, writing it to a file called `novel'
in the current directory. (If a file with that name already exists in
the current directory, co asks whether or not to overwrite the
file.) You can make changes to this file and then check it in as a new
revision (see Checking In a File Revision).
You can also check out a version of a file as read only, where changes cannot be written to it. Do this to check out a version to view only and not to edit.
To check out the current version of a file for examination, type co followed by the name of the file.
$ co novel [RET]
This command checks out the latest revision of the file `novel' from the RCS repository `novel,v' (either from the current directory or in a subdirectory named `RCS').
To check out a version other than the most recent version, specify the version number to check out with the `-r' option. Again, use the `-l' option to allow the revision to be edited.
$ co -l -r1.14 novel [RET]
NOTE: Before checking out an old revision of a file, remember to check in the latest changes first, or they may be lost.
Use rlog to view the RCS revision log for a file -- type
rlog followed by the name of a file to list all of the revisions
of that file.
$ rlog novel [RET]
RCS file: novel,v
Working file: novel
head: 1.2
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2; selected revisions: 2
description:
The Great American Novel.
----------------------------
revision 1.2
date: 1991/06/21 19:03:58; author: leo; state: Exp; lines: +2 -2
Second draft.
----------------------------
revision 1.1
date: 1991/06/20 15:31:44; author: leo; state: Exp;
Initial revision
====================================================================
$
This command outputs the revision log for the file `novel'; it lists information about the RCS repository, including its name (`novel,v') and the name of the actual file (`novel'). It also shows that there are two revisions -- the first, which was checked in to RCS on 20 June 1991, and the second, which was checked in to RCS the next day, on 21 June 1991.