Process Management

The system Function

The simplest way to launch a child process in Perl to run a program is the system function. For example, to invoke the Unix date command from within Perl, write:

system "date";

The child process runs the date command, which inherits Perl’s standard input, standard output, and standard error. This means that the normal short date-and-time string generated by date ends up wherever Perl’s STDOUT was already going.

The parameter to the system function is generally whatever you’d normally type at the shell. So, if it were a more complicated command, like ls -l $HOME, we’d just have to put all that into the parameter:

system 'ls -l $HOME';

Note that we had to switch here from double quotes to single quotes, since $HOME is the shell’s variable. Otherwise, the shell would never have seen the dollar sign, since that’s also an indicator for Perl to interpolate. Alternatively, we could write:

system "ls -l \$HOME";

But that can quickly become unwieldy.

Now, the date command is output-only, but let’s say it had been a chatty command, asking first “for which time zone do you want the time?” That’ll end up on standard output, and then the program will listen on standard input (inherited from Perl’s STDIN) for the response. You’ll see the question, and type in the answer (like “Zimbabwe time”), and then date will finish its duty.

While the child process is running, Perl is patiently waiting for it to finish. So, if the date command took 37 seconds, then Perl is paused for those 37 seconds. You can use the shell’s facility to launch a background process, however:

system "long_running_command with parameters &";

Here, the shell launches, then notices the ampersand at the end of the command line, which causes it to put long_running_command into the background. And then the shell exits rather quickly, which Perl notices and moves on. In this case, the long_running_command is really a grandchild of the Perl process, to which Perl really has no direct access or knowledge.

When the command is “simple enough,” no shell gets involved, so for the date command earlier (though not for ls -l $HOME, whose dollar sign needs the shell), the requested command is launched directly by Perl, which searches the inherited PATH to find the command, if necessary. But if there’s anything weird in the string (such as shell metacharacters like the dollar sign, semicolon, or vertical bar), then the standard Bourne Shell (/bin/sh) gets invoked to work through the complicated stuff. In that case, the shell is the child process, and the requested commands are grandchildren (or further offspring). For example, you can write an entire little shell script in the argument:

system 'for i in *; do echo == $i ==; cat $i; done';

Avoiding the Shell

The system operator may also be invoked with more than one argument, in which case, a shell doesn’t get involved, no matter how complicated the text:

my $tarfile = "something*wicked.tar";
my @dirs = qw(fred|flintstone <barney&rubble> betty ); 
system "tar", "cvf", $tarfile, @dirs;

In this case, the first parameter ("tar" here) gives the name of a command found in the normal PATH-searching way, while the remaining arguments are passed, one-by-one, directly to that command. Even if the arguments have shell-significant characters, such as the name in $tarfile or the directory names in @dirs, the shell never gets a chance to mangle the string. So that tar command will get precisely five parameters. Compare this with:

system "tar cvf $tarfile @dirs"; # Oops!

Here, we’ve now piped a bunch of stuff into a flintstone command and put it into the background, and opened betty for output.

And that’s a bit scary, especially if those variables come from user input, such as a web form. So, if you can arrange things so that you can use the multiple-argument version of system, you probably should launch your subprocess that way.
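
To see that the list form hands each argument over untouched, here is a minimal sketch that uses Perl itself as the child command. The use of $^X (the path of the running perl) and the one-line child program are assumptions made for illustration; they are not part of the tar example above.

```perl
use strict;
use warnings;

my $tarfile = "something*wicked.tar";
my @dirs    = qw(fred|flintstone <barney&rubble> betty);

# The child is perl itself ($^X); it prints each argument it received
# in brackets. No shell is involved, so nothing gets expanded or split.
my $status = system $^X, '-e', 'print "[$_]\n" for @ARGV', $tarfile, @dirs;
```

The child should print four bracketed lines, with the star, the vertical bar, the angle brackets, and the ampersand all still in place.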

Note that a single-argument invocation of system is nearly equivalent to this (redundant) multiple-argument version, which makes the shell explicit:

system $command_line;
system "/bin/sh", "-c", $command_line;

The return value of the system operator is based upon the exit status of the child command. In Unix, an exit value of 0 means that everything is okay, and a nonzero exit value usually indicates that something went wrong:

unless (system "date") {
    # Return was zero - meaning success 
    print "We gave you a date, OK!\n";
}

Note that this is backward from the normal “true is good—false is bad” strategy for most of the operators, so to write a typical “do this or die” style, we’ll need to flip false and true. The easiest way is simply to prefix the system operator with a bang (the logical-not operator):

!system "rm -rf files_to_delete" or die "something went wrong";

In this case, including $! in the error message would not be appropriate because the failure is most likely somewhere within the experience of the rm command, and it’s not a system call–related error within Perl that $! can reveal.
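
The number that system returns is the raw wait status (also left in $?), not the bare exit value, so it sometimes helps to take it apart. The following is a sketch under the assumption that Perl itself ($^X) is a convenient stand-in for a failing command; the exit value 3 is arbitrary.

```perl
use strict;
use warnings;

# Run a child that deliberately exits with status 3.
my $rc = system $^X, '-e', 'exit 3';

my ($exit_code, $signal) = (0, 0);
if ($rc == -1) {
    # The command could not be started at all; here $! is meaningful.
    warn "could not launch child: $!\n";
} else {
    $exit_code = $? >> 8;     # high byte: the child's exit value
    $signal    = $? & 127;    # low seven bits: signal that killed it, if any
    print "exit code $exit_code, signal $signal\n";
}
```

Only the -1 case is a failure inside Perl, which is why that is the one branch where $! belongs in the message.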

The exec Function

Everything we’ve just said about system syntax and semantics is also true about the exec function, except for one (very important) thing. The system function creates a child process, which then scurries off to perform the requested action while Perl naps. The exec function causes the Perl process itself to perform the requested action. Think of it as more like a “goto” than a subroutine call.

Why is exec useful? Well, if the purpose of this Perl program were to set up a particular environment to run another program, the purpose is fulfilled as soon as the other program has started. If we’d used system instead of exec, we’d have a Perl program just standing around tapping its toes waiting for the other program to complete, just so Perl could finally immediately exit as well, and that’s a wasted resource.

Because Perl is no longer in control once the requested command has started, it doesn’t make any sense to have any Perl code following the exec, except for handling the error when the requested command cannot be started:

exec "date";
die "date couldn't run: $!";

In fact, if you have warnings turned on, and if you have any code after the exec other than a die or exit, you’ll get notified.

The Environment Variables

When you’re starting another process (with any of the methods discussed here), you may need to set up its environment in one way or another. As we mentioned earlier, you could start the process with a certain working directory, which it inherits from your process. Another common configuration detail is the environment variables.

In Perl, the environment variables are available via the special %ENV hash; each key in this hash represents one environment variable. At the start of your program’s execution, %ENV holds values it has inherited from its parent process (generally the shell). Modifying this hash changes the environment variables, which will then be inherited by new processes and possibly used by Perl as well. For example, suppose you wished to run the system’s make utility (which typically runs other programs), and you want to use a private directory as the first place to look for commands (including make itself). And let’s say that you don’t want the IFS environment variable to be set when you run the command because that might cause make or some subcommand to do the wrong thing. Here we go:

$ENV{'PATH'} = "/home/rootbeer/bin:$ENV{'PATH'}"; 
delete $ENV{'IFS'};
my $make_result = system "make";

Newly created processes will generally inherit from their parent the environment variables; the current working directory; the standard input, output, and error streams; and a few more esoteric items.
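
If a change to %ENV should apply to one child only, one option is to make it with local inside a block, so the old value comes back when the block ends. This is a sketch, not the only way to do it; the variable name LEARNING_PERL_DEMO is made up for the example and is assumed not to be set already, and Perl itself ($^X) plays the part of the child.

```perl
use strict;
use warnings;

# The child exits 0 if it can see the variable, 1 if it cannot.
my @child = ($^X, '-e', 'exit(defined $ENV{LEARNING_PERL_DEMO} ? 0 : 1)');

my ($inside, $outside);
{
    # local restores the previous state of this key when the block ends.
    local $ENV{LEARNING_PERL_DEMO} = 'hello';
    $inside = system @child;     # zero: the child inherited the variable
}
$outside = system @child;        # nonzero: the variable is gone again

print "inside block: $inside, outside block: $outside\n";
```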

Using Backquotes to Capture Output

With both system and exec, the output of the launched command ends up wherever Perl’s standard output is going. Sometimes it’s interesting to capture that output as a string value to perform further processing. And that’s done simply by creating a string using backquotes instead of single or double quotes:

my $now = `date`; # grab the output of date 
print "The time is now $now"; # newline already present

Normally, this date command spits out a string approximately 30 characters long to its standard output, giving the current date and time followed by a newline. When we’ve placed date between backquotes, Perl executes the date command, arranging to capture its standard output as a string value and, in this case, assign it to the $now variable.

This is very similar to the Unix shell’s meaning for backquotes. However, the shell also performs the additional job of ripping off the final end-of-line to make it easier to use the value as part of other things. Perl is honest; it gives the real output. To get the same result in Perl, we can simply add an additional chomp operation on the result:

chomp(my $no_newline_now = `date`);
print "A moment ago, it was $no_newline_now, I think.\n";

The value between backquotes is just like the single-argument form of system and is interpreted as a double-quoted string, meaning that backslash-escapes and variables are expanded appropriately. So, if you want to pass a real backslash to the shell, you’ll need to use two. If you need to pass two (which happens frequently on Windows systems), you’ll need to use four. For example, to fetch the Perl documentation on a list of Perl functions, we might invoke the perldoc command repeatedly, each time with a different argument:

my @functions = qw{ int rand sleep length hex eof not exit sqrt umask }; 
my %about;
foreach (@functions) {
    $about{$_} = `perldoc -t -f $_`; 
}

Instead of the backquotes, you can also use the generalized quoting operator, qx(), which does the same thing:

foreach (@functions) {
    $about{$_} = qx(perldoc -t -f $_);
}

As with the other generalized quotes, you mainly use this when the stuff inside the quotes is also the default delimiter. If you wanted to have a literal backquote in your command, you can use the qx() mechanism to avoid the hassle of escaping the offending character. There’s another benefit to the generalized quoting, too. If you use the single quote as the delimiter, the quoting does not interpolate anything. If you want to use the shell’s process ID variable $$ instead of Perl’s, you use qx'' to avoid the interpolation:

my $output = qx'echo $$';

Using Backquotes in a List Context

The scalar context use of backquotes returns the captured output as a single long string, even if it looks to you like there are multiple “lines” because it has newlines. However, using the same backquoted string in a list context yields a list containing one line of output per element.

For example, the Unix who command normally spits out a line of text for each current login on the system as follows:

merlyn tty/42 Dec 7 19:41 
rootbeer console Dec 2 14:15 
rootbeer tty/12 Dec 6 23:00

The left column is the username, the middle column is the TTY name (that is, the name of the user’s connection to the machine), and the rest of the line is the date and time of login (and possibly remote login information, but not in this example). In a scalar context, you get all that at once, which you would then need to split up on your own:

my $who_text = `who`;
my @who_lines = split /\n/, $who_text;

But in a list context, we automatically get the data broken up by lines:

my @who_lines = `who`;

You’ll have a number of separate elements in @who_lines, each one terminated by a newline. Of course, adding a chomp around the outside of that will rip off those newlines, but you can go a different direction. If you put that as part of the value for a foreach, you’ll iterate over the lines automatically, placing each one in $_:

my %ttys;
foreach (`who`) {
    my($user, $tty, $date) = /(\S+)\s+(\S+)\s+(.*)/; 
    $ttys{$user} .= "$tty at $date\n";
}
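
The “chomp around the outside” variation mentioned above can be sketched like this. It assumes a Unix shell whose printf understands \n in its format string, and it uses single-quote delimiters on qx so that Perl passes the backslashes through to the shell unchanged.

```perl
use strict;
use warnings;

# List context gives one element per line; chomp then strips the
# newline from every element in a single step.
chomp(my @lines = qx'printf "alpha\nbeta\ngamma\n"');

print scalar(@lines), " lines; first is '$lines[0]'\n";
```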

External Processes with IPC::System::Simple

Running or capturing output from external commands is tricky business, especially since Perl aims to work on so many diverse platforms, each with its own way of doing things. Paul Fenwick’s IPC::System::Simple module fixes that by providing a simpler interface that hides the complexity of the operating system-specific stuff. It doesn’t come with Perl (yet), so you have to get it from CPAN.

There’s really not that much to say about this module because it is truly simple. You can use it to replace the built-in system with its own more robust version:

use IPC::System::Simple qw(system);
my $tarfile = 'something*wicked.tar';
my @dirs = qw(fred|flintstone <barney&rubble> betty ); 
system 'tar', 'cvf', $tarfile, @dirs;

It also provides a systemx that never uses the shell, so you should never have the problem of unintended shell actions:

use IPC::System::Simple qw(systemx);
systemx 'tar', 'cvf', $tarfile, @dirs;

If you want to capture the output, you change the system or systemx to capture or capturex, both of which work like backquotes (but better):

use IPC::System::Simple qw(capturex);
my @output = capturex 'tar', 'cvf', $tarfile, @dirs;

Processes as File Handles

So far, you’ve seen ways to deal with synchronous processes, where Perl stays in charge, launches a command, (usually) waits for it to finish, then possibly grabs its output. But Perl can also launch a child process that stays alive, communicating to Perl on an ongoing basis until the task is complete.

The syntax for launching a concurrent (parallel) child process is to put the command as the “filename” for an open call, and either precede or follow the command with a vertical bar, which is the “pipe” character. For that reason, this is often called a piped open. In the two-argument form, the pipe goes before or after the command that you want to run:

open DATE, 'date|' or die "cannot pipe from date: $!";
open MAIL, '|mail merlyn' or die "cannot pipe to mail: $!";

In the first example, with the vertical bar on the right, Perl launches the command with its standard output connected to the DATE file handle opened for reading, similar to the way that the command date | your_program would work from the shell. In the second example, with the vertical bar on the left, Perl connects the command’s standard input to the MAIL file handle opened for writing, similar to what happens with the command your_program | mail merlyn. In either case, the command continues independently of the Perl process. (If the Perl process exits before the command is complete, a command that’s been reading will see end-of-file, while a command that’s been writing will get a “broken pipe” error signal on the next write, by default.) The open fails if Perl can’t start the child process. If the command itself does not exist or exits erroneously, Perl will not see this as an error when opening the file handle, but as an error when closing it. We’ll get to that in a moment.

The three-argument form is a bit tricky because for the read file handle, the pipe character comes after the command. There are special modes for that though. For the file handle mode, if you want a read file handle, you use -|, and if you want a write file handle, you use |- to show which side of the pipe you want to place the command:

open my $date_fh, '-|', 'date' or die "cannot pipe from date: $!"; 
open my $mail_fh, '|-', 'mail merlyn'
    or die "cannot pipe to mail: $!";

The pipe opens can also take more than three arguments. The fourth and subsequent arguments become the arguments to the command, so you can break up that command string to separate the command name from its arguments:

open my $mail_fh, '|-', 'mail', 'merlyn' 
    or die "cannot pipe to mail: $!";

Either way, for all intents and purposes, the rest of the program doesn’t know, doesn’t care, and would have to work pretty hard to figure out that this is a file handle opened on a process rather than on a file. So, to get data from a file handle opened for reading, you read the file handle normally:

$now = <$date_fh>;

And to send data to the mail process (waiting for the body of a message to deliver to merlyn on standard input), a simple print-with-a-file handle will do:

print $mail_fh "The time is now $now"; # presume $now ends in newline

If a process is connected to a file handle that is open for reading, and then exits, the file handle returns end-of-file, just like reading up to the end of a normal file. When you close a file handle open for writing to a process, the process will see end-of-file. So, to finish sending the email, close the handle:

close $mail_fh;
die "mail: non-zero exit of $?" if $?;

Closing a file handle attached to a process waits for the process to complete so that Perl can get the process’s exit status. The exit status is then available in the $? variable (reminiscent of the same variable in the Bourne Shell) and is the same kind of number as the value returned by the system function: zero for success, nonzero for failure. Each new exited process overwrites the previous value though, so save it quickly if you want it.
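
Here is a small self-contained sketch of that write, close, and check sequence. Since mail may not be installed, it assumes Perl itself ($^X) as the child; the one-line child program, which exits 0 only if it reads exactly two lines, is made up for the example.

```perl
use strict;
use warnings;

# The child counts its input lines and exits 0 only if it saw exactly two.
my $child_code = 'my $n = 0; $n++ while <STDIN>; exit($n == 2 ? 0 : 1)';

open my $counter_fh, '|-', $^X, '-e', $child_code
    or die "cannot pipe to child: $!";
print $counter_fh "first line\n";
print $counter_fh "second line\n";

close $counter_fh;               # child sees end-of-file; Perl waits for it
my $status    = $?;              # save it before another process overwrites it
my $exit_code = $status >> 8;    # same decoding as for system
print "child exit code: $exit_code\n";
```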

The processes are synchronized just like a pipelined command. If you try to read and no data is available, the process is suspended (without consuming additional CPU time) until the sending program has started speaking again. Similarly, if a writing process gets ahead of the reading process, the writing process is slowed down until the reader starts to catch up. There’s a buffer (usually 8 KB or so) in between, so they don’t have to stay precisely in lockstep.

Why use processes as file handles? Well, it’s the only easy way to write to a process based on the results of a computation. But if you’re just reading, backquotes are often much easier to manage, unless you want to have the results as they come in.

For example, the Unix find command locates files based on their attributes, and it can take quite a while if used on a fairly large number of files (such as starting from the root directory). You can put a find command inside backquotes, but it’s often nicer to see the results as they are found:

open my $find_fh, '-|',
    'find', qw( / -atime +90 -size +1000 -print )
        or die "fork: $!";
        
while (<$find_fh>) { 
    chomp;
    printf "%s size %dK last accessed %.2f days ago\n", $_, (1023 + -s $_)/1024, -A $_;
}

That find command looks for all the files that have not been accessed within the past 90 days and that are larger than 1,000 blocks (these are good candidates to move to longer-term storage). While find is searching and searching, Perl can wait. As it finds each file, Perl responds to the incoming name and displays some information about that file for further research. Had this been written with backquotes, you would not see any output until the find command had finished, and it’s comforting to see that it’s actually doing the job even before it’s done.

Getting Down and Dirty with Fork

In addition to the high-level interfaces already described, Perl provides nearly direct access to the low-level process management system calls of Unix and some other systems. Let's take a look at a quick reimplementation of this:

system 'date';

You can do that using the low-level system calls:

defined(my $pid = fork) or die "Cannot fork: $!"; 
unless ($pid) {
    # Child process is here
    exec 'date';
    die "cannot exec date: $!";
}
# Parent process is here 
waitpid($pid, 0);

Here, you check the return value from fork, which is undef if it failed. Usually it succeeds, causing two separate processes to continue to the next line, but only the parent process has a nonzero value in $pid, so only the child process executes the exec function. The parent process skips over that and executes the waitpid function, waiting for that particular child to finish.
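
The same building blocks extend to several children at once, which system alone cannot do. The following is a sketch under simple assumptions: each child just exits with its own number where real work would go, and the variable names are invented for the example.

```perl
use strict;
use warnings;

my %exit_code_for;    # pid => decoded exit code
my @pids;

for my $n (1 .. 3) {
    defined(my $pid = fork) or die "Cannot fork: $!";
    if ($pid == 0) {
        # Child: real work would go here; report $n as the exit code.
        exit $n;
    }
    push @pids, $pid;    # Parent: remember the child and keep going
}

# Parent: wait for each child in turn and decode its status.
for my $pid (@pids) {
    waitpid($pid, 0);
    $exit_code_for{$pid} = $? >> 8;
}

print "child $_ exited with $exit_code_for{$_}\n" for @pids;
```

All three children run concurrently; the parent blocks only in the second loop.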

Sending and Receiving Signals

A Unix signal is a tiny message sent to a process. Different signals are identified by a name (such as SIGINT, meaning “interrupt signal”) and a corresponding small integer (in the range from 1 to 16, 1 to 32, or 1 to 63, depending on your Unix flavor). Programs or the operating system typically send signals to another program when a significant event happens, such as pressing the interrupt character (typically Control-C) on the terminal, which sends a SIGINT to all the processes attached to that terminal. Some signals are sent automatically by the system, but they can also come from another process.

You can send signals from your Perl process to another process, but you have to know the target’s process ID number. How you figure that out is a bit complicated, but let’s say you know that you want to send a SIGINT to process 4201. That’s easy enough if you know that SIGINT corresponds to the number 2 (on a Unix system, you can get a list by running kill -l on the command line):

kill 2, 4201 or die "Cannot signal 4201 with SIGINT: $!";

It’s named “kill” because one of the primary purposes of signals is to stop a process that’s gone on long enough. You can also use the string 'INT' in place of the 2, so you don’t have to know the number:

kill 'INT', 4201 or die "Cannot signal 4201 with SIGINT: $!";

You can even use the => operator to automatically quote the signal name:

kill INT => 4201 or die "Cannot signal 4201 with SIGINT: $!";

If the process no longer exists, you’ll get a false return value, so you can also use this technique to see whether a process is still alive. A special signal number of 0 says “just check to see whether I could send a signal if I wanted to, but I don’t want to, so don’t actually send anything.” So a process probe might look like:

unless (kill 0, $pid) {
    warn "$pid has gone away!"; 
}
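
Putting the probe together with fork gives a way to try this without guessing at process IDs. This sketch assumes a Unix-like system and uses a throwaway child that only sleeps; note that the parent reaps the child with waitpid before the second probe, because a child that has exited but has not been waited for still counts as existing.

```perl
use strict;
use warnings;

defined(my $pid = fork) or die "Cannot fork: $!";
unless ($pid) {
    sleep 60;    # Child: idle until signalled
    exit 0;
}

# Parent: probe, terminate, reap, then probe again.
my $alive_before = kill 0, $pid;
kill 'TERM', $pid or die "Cannot signal $pid with SIGTERM: $!";
waitpid($pid, 0);
my $killed_by   = $? & 127;    # number of the signal that ended the child
my $alive_after = kill 0, $pid;

print "before: $alive_before, signal: $killed_by, after: $alive_after\n";
```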

Perhaps a little more interesting than sending signals is catching signals. Why might you want to do this? Well, suppose you have a program that creates files in /tmp, and you normally delete those files at the end of the program. If someone presses Control-C during the execution, that leaves trash in /tmp, a very impolite thing to do. To fix this, you can create a signal handler that takes care of the cleanup:

my $temp_directory = "/tmp/myprog.$$"; # create files below here
mkdir $temp_directory, 0700 or die "Cannot create $temp_directory: $!";

sub clean_up {
    unlink glob "$temp_directory/*";
    rmdir $temp_directory; 
}

sub my_int_handler { 
    &clean_up();
    die "interrupted, exiting...\n"; 
}

$SIG{'INT'} = 'my_int_handler';

# Time passes, the program runs, creates some temporary
# files in the temp directory, maybe someone presses Control-C.
# Now it's the end of normal execution
&clean_up();

If the subroutine returns rather than exiting, execution resumes right where the signal interrupted it. This can be useful if the signal needs to actually interrupt something rather than causing it to stop. For example, suppose processing each line of a file takes a few seconds, which is pretty slow, and you want to abort the overall processing when an interrupt is processed—but not in the middle of processing a line. Just set a flag in the signal procedure and check it at the end of each line’s processing:

my $int_count = 0;
sub my_int_handler { 
    $int_count++ 
} 
$SIG{'INT'} = 'my_int_handler'; 
#...;
while (<SOMEFILE>) {
    #...; # some processing that takes a few seconds ... 
    if ($int_count) {
        # interrupt was seen!
        print "[processing interrupted...]\n"; 
        last;
    } 
}

So, you can either set a flag or break out of the program, and that covers most of what you’ll need from catching signals. For the most part, Perl will only handle a signal once it reaches a safe point to do so. For instance, Perl will not deliver most signals in the middle of allocating memory or rearranging its internal data structures. Perl delivers some signals, such as SIGILL, SIGBUS, and SIGSEGV, right away, so those are still unsafe.
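
To try the flag technique without a keyboard, a program can send the signal to itself. This is a sketch with two deliberate variations from the code above, both assumptions of the example: the handler is an anonymous subroutine (a code reference is accepted in %SIG just as a name is), and a loop over five numbers stands in for reading lines from a file.

```perl
use strict;
use warnings;

my $int_count = 0;

# A code reference works as a handler, just like a subroutine name.
local $SIG{INT} = sub { $int_count++ };

my @done;
for my $item (1 .. 5) {
    push @done, $item;               # stand-in for slow per-line work
    kill INT => $$ if $item == 3;    # simulate Control-C during item 3

    if ($int_count) {
        # interrupt was seen, but only acted on between items
        print "[processing interrupted after item $item]\n";
        last;
    }
}
```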
