Directory Operations

Moving Around the Directory Tree

Your program runs with a “working directory,” which is the starting point for relative pathnames. That is, if you refer to the file fred, that means fred in the current working directory.

The chdir operator changes the working directory. It’s just like the Unix shell’s cd command:

chdir "/etc" or die "cannot chdir to /etc: $!";

Because this is a system request, the value of $! will be set if an error occurs. You should normally check $! when a false value is returned from chdir, since that indicates that something has not gone as requested.

The working directory is inherited by all processes that Perl starts. However, the change in working directory cannot affect the process that invoked Perl, such as the shell. So you can’t write a Perl program that replaces your shell’s cd command.

If you omit the parameter, Perl determines your home directory as best as it can and attempts to set the working directory to your home directory, similar to using the cd command at the shell without a parameter. This is one of the few places where omitting the parameter doesn’t use $_.
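As a minimal sketch of how chdir affects relative pathnames, the following uses the standard Cwd module (not mentioned above, so treat it as an assumption about what you want to load) to report the working directory before, during, and after a change. The helper name in_directory is hypothetical.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper: run a code block inside $dir, then return
# to the directory we started from.
sub in_directory {
    my ($dir, $code) = @_;
    my $start = getcwd();
    chdir $dir or die "cannot chdir to $dir: $!";
    my @result = $code->();
    chdir $start or die "cannot chdir back to $start: $!";
    return @result;
}

my $before   = getcwd();
my ($inside) = in_directory("/", sub { getcwd() });
my $after    = getcwd();

print "started in $before, visited $inside, back in $after\n";
```

Changing back explicitly matters only inside our own program; whatever shell launched it never sees either change.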

Globbing

Normally, the shell expands any filename patterns on each command line into the matching filenames. This is called globbing. For example, if you give a filename pattern of *.pm to the echo command, the shell expands this pattern into a list of names that match:

$ echo *.pm
barney.pm dino.pm fred.pm wilma.pm
$

The echo command doesn’t have to know anything about expanding *.pm because the shell has already expanded it. But sometimes we end up with a pattern like *.pm inside our Perl program. Can we expand this pattern into the matching filenames without working very hard? Sure — just use the glob operator:

my @all_files = glob "*"; 
my @pm_files = glob "*.pm";

Here, @all_files gets all the files in the current directory, alphabetically sorted, and not including the files beginning with a period, just like the shell. And @pm_files gets the same list that we got before by using *.pm on the command line.

In fact, anything you can say on the command line, you can also put as the (single) argument to glob, including multiple patterns separated by spaces:

my @all_files_including_dot = glob ".* *";

Here, we’ve included an additional “dot star” parameter to get the filenames that begin with a dot as well as the ones that don’t. Please note that the space between these two items inside the quoted string is significant, as it separates two different items you want to glob. The reason this works exactly as the shell works is that prior to Perl version 5.6, the glob operator simply called /bin/csh behind the scenes to perform the expansion. Because of this, globs were time-consuming and could break in large directories, or in some other cases. However, if you’re using a modern version of Perl, you should no longer be concerned about such things.
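To see the difference between these patterns, here is a small sketch that builds a scratch directory and globs it three ways. The file names are invented for illustration, and the File::Temp and Cwd modules are our own choice for keeping the demonstration tidy.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with hypothetical file names, removed at exit.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(fred.pm barney.pm notes.txt .hidden)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $dir/$name: $!";
    close $fh;
}

my $start = getcwd();
chdir $dir or die "cannot chdir to $dir: $!";

my @visible   = glob "*";       # sorted, no dot files
my @pm_files  = glob "*.pm";    # only names ending in .pm
my @with_dots = glob ".* *";    # two patterns separated by a space

chdir $start or die "cannot chdir back to $start: $!";

print "visible:   @visible\n";
print "pm files:  @pm_files\n";
print "with dots: @with_dots\n";
```

The last list starts with the names matched by the first pattern, so the dot files come before the ordinary ones.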

An Alternate Syntax for Globbing

Although we use the term globbing freely, and we talk about the glob operator, you might not see the word glob in very many of the programs that use globbing. Why not? Well, most legacy code was written before the glob operator was given a name. Instead, it was called up by the angle-bracket syntax, similar to reading from a file handle:

my @all_files = <*>; ## exactly the same as my @all_files = glob "*";

The value between the angle brackets is interpolated similarly to a double-quoted string, which means that Perl variables are expanded to their current Perl values before being globbed:

my $dir = "/etc";
my @dir_files = <$dir/* $dir/.*>;

Here, we’ve fetched all the non-dot and dot files from the designated directory because $dir has been expanded to its current value.

So, if using angle brackets means both file handle reading and globbing, how does Perl decide which of the two operators to use? Well, a file handle has to be a Perl identifier. So, if the item between the angle brackets is strictly a Perl identifier, it’s a file handle read; otherwise, it’s a globbing operation. For example:

my @files = <FRED/*>; ## a glob
my @lines = <FRED>; ## a file handle read 
my $name = "FRED";
my @files = <$name/*>; ## a glob

The one exception is if the contents are a simple scalar variable (not an element of a hash or array); then it’s an indirect file handle read, where the variable contents give the name of the file handle you want to read:

my $name = "FRED";
my @lines = <$name>; ## an indirect file handle read of FRED handle

If you want, you can get the operation of an indirect file handle read using the readline operator, which also makes it clearer:

my $name = "FRED";
my @lines = readline FRED; ## read from FRED 
my @lines = readline $name; ## read from FRED

But the readline operator is rarely used, as indirect file handle reads are uncommon and are generally performed against a simple scalar variable anyway.

Directory Handles

Another way to get a list of names from a given directory is with a directory handle. A directory handle looks and acts like a file handle. You open it (with opendir instead of open), you read from it (with readdir instead of readline), and you close it (with closedir instead of close). But instead of reading the contents of a file, you’re reading the names of files (and other things) in a directory. For example:

my $dir_to_process = "/etc";
opendir DH, $dir_to_process or die "Cannot open $dir_to_process: $!"; 
foreach my $file (readdir DH) {
    print "one file in $dir_to_process is $file\n"; 
}
closedir DH;

Like file handles, directory handles are automatically closed at the end of the program or if the directory handle is reopened onto another directory.

Unlike globbing, which in older versions of Perl fired off a separate process, a directory handle never fires off another process. This makes directory handles more efficient for applications that demand every ounce of power from the machine. However, reading a directory handle is also a lower-level operation, meaning that we have to do more of the work ourselves.

For example, the names are returned in no particular order. And the list includes all files, not just those matching a particular pattern (like *.pm from our globbing examples). It also includes the dot files, and particularly the dot and dot-dot entries. So, if we wanted only the pm-ending files, we could use a skip-over function inside the loop:

while (my $name = readdir DIR) {
    next unless $name =~ /\.pm$/;
    # more processing
}

Note here that the syntax is that of a regular expression, not a glob. And if we wanted all the nondot files, we could say that:

next if $name =~ /^\./;

Or if we wanted everything but the common dot (current directory) and dot-dot (parent directory) entries, we could explicitly say that:

next if $name eq "." or $name eq "..";

Now we’ll look at the part that gets most people mixed up, so pay close attention. The filenames returned by the readdir operator have no pathname component. It’s just the name within the directory. So, we’re not looking at /etc/passwd, we’re just looking at passwd. (And because this is another difference from the globbing operation, it’s easy to see how people get confused.)

So you’ll need to patch up the name to get the full name:

opendir SOMEDIR, $dirname or die "Cannot open $dirname: $!"; 
while (my $name = readdir SOMEDIR) {
    next if $name =~ /^\./; # skip over dot files
    $name = "$dirname/$name"; # patch up the path
    next unless -f $name and -r $name; # only readable files 
    #...
}

Without the patch, the file tests would have been checking files in the current directory, rather than in the directory named in $dirname. This is the single most common mistake when using directory handles.
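The same loop can be wrapped up as a reusable function. This is only a sketch: the name readable_files is hypothetical, and it uses a lexical directory handle (opendir my $dh) rather than the bareword handles shown above, which is our assumption about a more modern style and not something the text prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the full paths of the readable plain
# files in $dirname, skipping dot files, in sorted order.
sub readable_files {
    my ($dirname) = @_;
    opendir my $dh, $dirname or die "Cannot open $dirname: $!";
    my @paths;
    while (defined(my $name = readdir $dh)) {
        next if $name =~ /^\./;              # skip over dot files
        my $path = "$dirname/$name";         # patch up the path
        next unless -f $path and -r $path;   # only readable files
        push @paths, $path;
    }
    closedir $dh;
    my @sorted = sort @paths;                # readdir gives no particular order
    return @sorted;
}

print "$_\n" for readable_files(".");
```

Because the file tests run against the patched-up path, the function works no matter what the current working directory happens to be.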

Recursive Directory Listing

Perl comes with a nice library called File::Find, which you can use for nifty recursive directory processing.
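As a rough sketch of what that looks like, the following walks a directory tree and collects every plain file whose name ends in .pm. The helper name find_pm_files is hypothetical; find and $File::Find::name come from the File::Find module itself.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every plain file under $top whose
# name ends in .pm, descending into subdirectories.
sub find_pm_files {
    my ($top) = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the bare name and
            # $File::Find::name is the path including $top.
            push @found, $File::Find::name if -f $_ and /\.pm$/;
        },
        $top,
    );
    my @sorted = sort @found;
    return @sorted;
}

print "$_\n" for find_pm_files(".");
```

By default, find changes into each directory as it goes, which is why the file test on the bare name in $_ works here without patching up the path.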

Removing Files

At the Unix shell level, we’d type an rm command to remove a file or files:

$ rm slate bedrock lava

In Perl, we use the unlink operator:

unlink "slate", "bedrock", "lava";

This sends the three named files away to bit heaven, never to be seen again.

Now, since unlink takes a list, and the glob function returns a list, we can combine the two to delete many files at once:

unlink glob "*.o";

This is similar to rm *.o at the shell, except that we didn’t have to fire off a separate rm process. So we can make those important files go away that much faster!

The return value from unlink tells us how many files have been successfully deleted. So, going back to the first example, we can check its success:

my $successful = unlink "slate", "bedrock", "lava"; 
print "I deleted $successful file(s) just now\n";

Sure, if this number is 3, we know it removed all of the files, and if it’s 0, then we removed none of them. But what if it’s 1 or 2? Well, there’s no clue as to which ones were removed. If you need to know, do them one at a time in a loop:

foreach my $file (qw(slate bedrock lava)) { 
    unlink $file or warn "failed on $file: $!\n";
}

Here, each file being deleted one at a time means the return value will be 0 (failed) or 1 (succeeded), which happens to look like a nice Boolean value, controlling the execution of warn. Using or warn is similar to or die, except that it’s not fatal, of course (as we said back in Chapter 5). In this case, we put the newline on the end of the message to warn because it’s not a bug in our program that causes the message.

Now, here’s a little-known Unix fact. It turns out that you can have a file that you can’t read, you can’t write, you can’t execute, maybe you don’t even own the file—that is, it’s somebody else’s file altogether—but you can still delete it. That’s because the permission to unlink a file doesn’t depend upon the permission bits on the file itself; it’s the permission bits on the directory that contains the file that matters.
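A quick way to convince yourself of this is to strip every permission bit from a file in a directory you own and then delete it anyway. The sketch below assumes a Unix-like system and uses File::Temp (our choice) for the scratch directory; the file name locked is invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # we own this directory and can write to it
my $file = "$dir/locked";

open my $fh, '>', $file or die "cannot create $file: $!";
close $fh;

# Remove read, write, and execute permission for everyone.
chmod 0000, $file or die "cannot chmod $file: $!";

my $removed = unlink $file;
print "removed $removed file(s) despite mode 0000\n";
```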

Renaming Files

Giving an existing file a new name is simple with the rename function:

rename "old", "new";

This is similar to the Unix mv command, taking a file named old and giving it the name new in the same directory. You can even move things around:

rename "over_there/some/place/some_file", "some_file";

This moves a file called some_file from another directory into the current directory, provided the user running the program has the appropriate permissions. Like most functions that request something of the operating system, rename returns false if it fails, and sets $! with the operating system error, so you can (and often should) use or die (or or warn) to report this to the user.

Below is an example showing how to rename everything that ends with .old to the same name with .new:

foreach my $file (glob "*.old") { 
    my $newfile = $file;
    $newfile =~ s/\.old$/.new/;
    if (-e $newfile) {
        warn "can't rename $file to $newfile: $newfile exists\n";
    } elsif (rename $file, $newfile) {
        ## success, do nothing
    } else {
        warn "rename $file to $newfile failed: $!\n";
    }
} 

Those first two lines inside the loop can be combined (and often are) to simply:

(my $newfile = $file) =~ s/\.old$/.new/;

Also, some programmers seeing this substitution for the first time wonder why the backslash is needed on the left, but not on the right. The two sides aren’t symmetrical: the left part of a substitution is a regular expression, and the right part is a double-quoted string.

Links and Files

To create a (hard) link to a file named "chicken", just use the link function:

link "chicken", "egg"
    or warn "Can't link chicken to egg: $!";

This is similar to typing ln chicken egg at the Unix shell prompt. If link succeeds, it returns true. If it fails, it returns false and sets $!.

There’s a rule about the links in directory listings: the inode numbers in a given directory listing all refer to inodes on that same mounted volume. This rule ensures that if the physical medium (the diskette, perhaps) is moved to another machine, all of the directories stick together with their files. That’s why you can use rename to move a file from one directory to another, but only if both directories are on the same filesystem (mounted volume). If they were on different disks, the system would have to relocate the inode’s data, which is too complex an operation for a simple system call.
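When the two directories might be on different filesystems, one option is the move function from the standard File::Copy module, which is documented to rename when it can and otherwise copy the file and delete the original. The text does not mention this module, so take the following as a sketch under that assumption; the two scratch directories stand in for real locations.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir);

my $from_dir = tempdir(CLEANUP => 1);
my $to_dir   = tempdir(CLEANUP => 1);

my $old = "$from_dir/some_file";
my $new = "$to_dir/some_file";

open my $fh, '>', $old or die "cannot create $old: $!";
print {$fh} "hello\n";
close $fh or die "cannot close $old: $!";

# move tries a plain rename first and falls back to copy-then-delete.
move($old, $new) or die "cannot move $old to $new: $!";

print "moved to $new\n" if -e $new and not -e $old;
```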

And yet another restriction on links is that they can’t make new names for directories. That’s because the directories are arranged in a hierarchy. If you were able to change that, utility programs like find and pwd could easily become lost trying to find their way around the filesystem.

So, links can’t be added to directories, and they can’t cross from one mounted volume to another. Fortunately, there’s a way to get around these restrictions on links, by using a new and different kind of link: a symbolic link. A symbolic link (also called a soft link to distinguish it from the true or hard links that we’ve been talking about up to now) is a special entry in a directory that tells the system to look elsewhere. Let’s say that Barney creates a symbolic link with Perl’s symlink function, like this:

symlink "dodgson", "carroll"
    or warn "can't symlink dodgson to carroll: $!";

This is similar to what would happen if Barney used the command ln -s dodgson carroll from the shell.

A symbolic link can freely cross mounted filesystems or provide a new name for a directory, unlike a hard link. In fact, a symbolic link could point to any filename, one in this directory or in another one—or even to a file that doesn’t exist! But that also means that a soft link can’t keep data from being lost as a hard link can, since the symlink doesn’t contribute to the link count. If Barney were to delete dodgson, the system would no longer be able to follow the soft link. Even though there would still be an entry called carroll, trying to read from it would give an error like file not found. The file test -l 'carroll' would report true, but -e 'carroll' would be false: it’s a symlink, but it doesn’t exist.
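The dangling-link case is easy to reproduce. This sketch, which assumes a system that supports symbolic links and uses File::Temp (our choice) for a scratch directory, creates dodgson, points carroll at it, and then deletes dodgson.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/dodgson";
my $link   = "$dir/carroll";

open my $fh, '>', $target or die "cannot create $target: $!";
close $fh;

# The link text is relative, so it is resolved from carroll's directory.
symlink "dodgson", $link or die "can't symlink dodgson to carroll: $!";

my $exists_before = -e $link ? 1 : 0;   # follows the link to dodgson
unlink $target or die "cannot remove $target: $!";
my $is_link       = -l $link ? 1 : 0;   # the link itself is still there
my $exists_after  = -e $link ? 1 : 0;   # but it now leads nowhere

print "before: exists=$exists_before\n";
print "after:  is_link=$is_link exists=$exists_after\n";
```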

Since a soft link could point to a file that doesn’t yet exist, it could be used when creating a file as well. Barney has most of his files in his home directory, /home/barney, but he also needs frequent access to a directory with a long name that is difficult to type: /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin. So he sets up a symlink named /home/barney/my_stuff, which points to that long name, and now it’s easy for him to get to it. If he creates a file (from his home directory) called my_stuff/bowling, that file’s real name is /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin/bowling.

It’s normal for either /usr/bin/perl or /usr/local/bin/perl (or both) to be symbolic links to the true Perl binary on your system. This makes it easy to switch to a new version of Perl. Say you’re the system administrator, and you’ve built the new Perl. Of course, your older version is still running, and you don’t want to disrupt anything. When you’re ready for the switch, you simply move a symlink or two, and now every program that begins with #!/usr/bin/perl will automatically use the new version. In the unlikely case that there’s some problem, it’s a simple thing to replace the old symlinks and have the older Perl running the show again.

To find out where a symbolic link is pointing, use the readlink function. This will tell you where the symlink leads, or it will return undef if its argument wasn’t a symlink:

my $where = readlink "carroll"; # Gives "dodgson"
my $perl = readlink "/usr/local/bin/perl"; # maybe tells where perl is

You can remove either kind of link with unlink—and now you see where that operation gets its name. unlink simply removes the directory entry associated with the given filename, decrementing the link count and thus possibly freeing the inode.
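We can watch the link count change by asking stat for it. In this sketch the helper link_count is hypothetical, the scratch directory comes from File::Temp (our choice), and we assume the filesystem holding it supports hard links.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($chicken, $egg) = ("$dir/chicken", "$dir/egg");

open my $fh, '>', $chicken or die "cannot create $chicken: $!";
close $fh;

# Hypothetical helper: element 3 of the list returned by stat
# is the number of hard links to the file.
sub link_count {
    my ($file) = @_;
    my $nlink = (stat $file)[3];
    return $nlink;
}

my $at_first = link_count($chicken);                  # one name so far
link $chicken, $egg or die "Can't link chicken to egg: $!";
my $linked   = link_count($chicken);                  # two names, one inode
unlink $chicken or die "cannot unlink $chicken: $!";
my $remains  = link_count($egg);                      # back to one name

print "link counts: $at_first, $linked, $remains\n";
```

The data survives the unlink because egg still refers to the same inode; only when the count reaches zero can the inode be freed.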

Making and Removing Directories

Making a directory inside an existing directory is easy. Just invoke the mkdir function:

mkdir "fred", 0755 or warn "Cannot make fred directory: $!";

Again, true means success, and $! is set on failure.

As you saw earlier (in Chapter 2), a string value being used as a number is never interpreted as octal, even if it starts with a leading 0. So this doesn’t work:

my $name = "fred";
my $permissions = "0755"; # danger... this isn't working 
mkdir $name, $permissions;

Oops, we just created a directory with the bizarre 01363 permissions because 0755 was treated as decimal. To fix that, use the oct function, which forces octal interpretation of a string whether or not there’s a leading zero:

mkdir $name, oct($permissions);
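The arithmetic behind that bizarre number is easy to check. This sketch only prints values and creates nothing; the variable names are ours.

```perl
use strict;
use warnings;

my $permissions = "0755";

my $as_decimal = $permissions + 0;     # string used as a number: 755
my $as_octal   = oct($permissions);    # 493, the same value as the literal 0755

printf "decimal reading: %d (octal %o)\n", $as_decimal, $as_decimal;   # 755 (octal 1363)
printf "octal reading:   %d (octal %o)\n", $as_octal,   $as_octal;     # 493 (octal 755)
```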

Of course, if you are specifying the permission value directly within the program, just use a number instead of a string. The need for the extra oct function shows up most often when the value comes from user input. For example, suppose we take the arguments from the command line:

my ($name, $perm) = @ARGV; # first two args are name, permissions 
mkdir $name, oct($perm) or die "cannot create $name: $!";

To remove empty directories, use the rmdir function in a manner similar to the unlink function, although it can only remove one directory per call:

foreach my $dir (qw( fred barney betty)) {
    rmdir $dir or warn "cannot rmdir $dir: $!\n";
}

The rmdir operator fails for nonempty directories. As a first pass, you can attempt to delete the contents of the directory with unlink, then try to remove what should now be an empty directory. For example, suppose we need a place to write many temporary files during the execution of a program:

my $temp_dir = "/tmp/scratch_$$"; # based on process ID; see the text
mkdir $temp_dir, 0700 or die "cannot create $temp_dir: $!";
# ... use $temp_dir as the location of all temporary files ...
unlink glob "$temp_dir/* $temp_dir/.*"; # delete contents of $temp_dir
rmdir $temp_dir; # delete now-empty directory

The initial temporary directory name includes the current process ID, which is unique for every running process and is accessed with the $$ variable (similar to the shell). At the end of the program, that last unlink should remove all the files in this temporary directory, and then the rmdir function can delete the then-empty directory. However, if we’ve created subdirectories under that directory, the unlink operator fails on those, and the rmdir also fails. For a more robust solution, check out the rmtree function provided by the File::Path module of the standard distribution.
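Here is a minimal sketch of that more robust approach. It uses rmtree, along with mkpath from the same File::Path module to build some nested directories first; the subdirectory and file names are invented for illustration.

```perl
use strict;
use warnings;
use File::Path qw(mkpath rmtree);

my $temp_dir = "/tmp/scratch_$$";        # $$ is the current process ID

mkpath("$temp_dir/sub/dir");             # creates the intermediate directories too
open my $fh, '>', "$temp_dir/sub/dir/file"
    or die "cannot create file under $temp_dir: $!";
close $fh;

rmtree($temp_dir);                       # removes files and subdirectories alike

print "cleanup failed: $temp_dir still exists\n" if -e $temp_dir;
```

Since rmtree deletes everything below the directory it is given, it pays to be very sure what that variable holds before calling it.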

Modifying Permissions

The Unix chmod command changes the permissions on a file or directory. Similarly, Perl has the chmod function to perform this task:

chmod 0755, "fred", "barney";

As with many of the operating system interface functions, chmod returns the number of items successfully altered, and when used with a single argument, sets $! in a sensible way for error messages when it fails.

Symbolic permissions (like +x or go=u-w) accepted by the Unix chmod command are not valid for the chmod function.
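If we want the effect of a symbolic change such as +x, we can compute the new numeric mode ourselves from the current one. This is only a sketch: the helper add_execute is hypothetical, and it turns on all three execute bits without consulting the umask.

```perl
use strict;
use warnings;

# Hypothetical helper: roughly what "chmod +x" does for one file.
# Returns the new permission bits.
sub add_execute {
    my ($file) = @_;
    my $mode = (stat $file)[2];
    defined $mode or die "cannot stat $file: $!";
    $mode &= 07777;                      # keep only the permission bits
    my $new_mode = $mode | 0111;         # add execute for user, group, and other
    chmod $new_mode, $file or die "cannot chmod $file: $!";
    return $new_mode;
}
```

Calling add_execute("fred") on a file with mode 0644 would leave it at 0755.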

Changing Ownership

If the operating system permits it, you may change the ownership and group membership of a list of files (or file handles) with the chown function. The user and group are both changed at once, and both have to be the numeric user-ID and group-ID values. For example:

my $user = 1004;
my $group = 100;
chown $user, $group, glob "*.o";

What if you have a username like merlyn instead of the number? Simple. Just call the getpwnam function to translate the name into a number, and the corresponding getgrnam to translate the group name into its number:

defined(my $user = getpwnam "merlyn") or die "bad user";
defined(my $group = getgrnam "users") or die "bad group";
chown $user, $group, glob "/home/merlyn/*";

The defined function verifies that the return value is not undef, which will be returned if the requested user or group is not valid.

The chown function returns the number of files affected, and it sets $! on error.

Changing Timestamps

In those rare cases when you want to lie to other programs about when a file was most recently modified or accessed, you can use the utime function to fudge the books a bit. The first two arguments give the new access time and modification time, while the remaining arguments are the list of filenames to alter to those timestamps. The times are specified in internal timestamp format.

One convenient value to use for the timestamps is “right now,” returned in the proper format by the time function. So to update all the files in the current directory to look like they were modified a day ago, but accessed just now, we could simply do this:

my $now = time;
my $ago = $now - 24 * 60 * 60; # seconds per day
utime $now, $ago, glob "*"; # set access to now, mod to a day ago

The third timestamp (the ctime value) is always set to “now” whenever anything alters a file, so there’s no way to set it (it would have to be reset to “now” after you set it) with the utime function.
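To confirm what utime actually did, we can read all three timestamps back with stat. This sketch works on a single scratch file from File::Temp (our choice) rather than on everything in the current directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

my $now = time;
my $ago = $now - 24 * 60 * 60;           # seconds per day
utime $now, $ago, $file or die "cannot utime $file: $!";

# Elements 8, 9, and 10 of stat are atime, mtime, and ctime.
my ($atime, $mtime, $ctime) = (stat $file)[8, 9, 10];

printf "accessed %s\n", scalar(localtime($atime));
printf "modified %s\n", scalar(localtime($mtime));
printf "changed  %s\n", scalar(localtime($ctime));
```

The access and modification times come back as we set them, while the change time reflects the moment of the utime call itself.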
