USP Print File Management


A Bit of History

Almost every USP application creates a print file containing job information (dataset info, program parameterization, errors, etc.) that is, by default, stored in the directory from which the application was run. The file name has the form PRGM.NNNNN.machine, where PRGM is some form of the program name, NNNNN is an identifier unique on a given machine (the process id), and machine is the hostname of the machine on which the application ran. The hostname wasn't always part of the name. It was added when our old HP clusters had name collisions: all of the machines were booted at the same time, so process ids quite often coincided on different machines. Since our only parallel processing at that time was via IKP, we added the hostname only for jobs run under that system. With grid engine processing, however, the same problem arises, even under scripted flows, so the hostname is now appended for all print files.

Since the simpler days when this mechanism was first developed, the world has progressed to the point where managing these printout files has become a bit cumbersome due to the proliferation of large parallel processing flows and automated scripts.

The first attempt to give users some control over these print files was a script called rmprint, whose job is to safely remove all print files in a directory while leaving data and script files untouched. Users being what they are, however, misuse of this utility has been known to cause problems. In a classic example, the script is run while jobs are still writing to those print files. In principle that shouldn't be a problem, but in the real world of shared file systems it can be, and abnormal program termination can occur.

A more drastic method of preventing "print file buildup" is the environment variable USP_PRINT_BYPASS which, when set, redirects the USP application print file to /dev/null. Obviously, this is not an optimal solution, as those print files may contain crucial information in the event of an abnormal program termination.


A More Elegant Solution


A more robust method of managing printout files under USP was introduced in the first quarter of 2008. Three new environment variables were created to modify the naming and location of the print files produced: USP_PRINT_DIRECTORY, which sets the directory the print files are written to, plus one variable that adds a prefix to the print file name and one that adds a suffix.

These can be set in the user's environment, either via a startup script such as .userrc or .cshrc, or explicitly through a command, for example the csh command setenv USP_PRINT_DIRECTORY ${HOME}/printouts. A more useful approach is to set them on a per-job or per-project basis. For example, a simple shell script shows how this would work -

Now to run the script and examine the printout directories -

This simple demo shows the file names resulting from use of the prefix/suffix features. Usage of any combination of the three new environment variables is allowed. If none are specified, the traditional naming is used.
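As a rough sketch of the per-job approach, the Bourne-shell helper below gives each job its own print directory using only USP_PRINT_DIRECTORY. The helper name run_job, the PRINT_ROOT variable, and the directory layout are illustrative assumptions, not part of USP. The directory is created up front because it is not stated whether USP creates a missing directory itself.

```shell
#!/bin/sh
# Sketch only: run_job and PRINT_ROOT are hypothetical names used for
# illustration; USP_PRINT_DIRECTORY is the only variable taken from USP.

run_job() {
    # First argument is a job name; the rest is the command line to run.
    job_name=$1
    shift

    # One print directory per job, under ${HOME}/printouts by default.
    print_dir="${PRINT_ROOT:-$HOME/printouts}/$job_name"

    # Assumption: the directory should exist before the job starts.
    mkdir -p "$print_dir" || return 1

    # Set the variable for this one command only, so other jobs started
    # from the same shell keep their own settings.
    USP_PRINT_DIRECTORY="$print_dir" "$@"
}
```

A call such as run_job stack_pass1 followed by the usual USP command line would then, presumably, leave that job's print files under ${HOME}/printouts/stack_pass1 (stack_pass1 is just a placeholder job name).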

Because the handling under rmprint gets trickier as the names get more complicated, the prefix and suffix specifications are restricted: the only non-alphanumeric characters allowed are underscore ( '_' ) and dash ( '-' ). Any character other than those allowed will be silently changed to an underscore.
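Since the substitution is silent, it can help to preview what a prefix or suffix will turn into before using it. The function below is a minimal sketch that mirrors the rule as described above; it is not the USP implementation, and the name sanitize_affix is hypothetical.

```shell
#!/bin/sh
# Sketch only: sanitize_affix is a hypothetical helper, not part of USP.
# It replaces every character that is not a letter, digit, underscore,
# or dash with an underscore, following the rule described in the text.

sanitize_affix() {
    # printf avoids a trailing newline, which tr would otherwise also
    # turn into an underscore. LC_ALL=C keeps the ranges ASCII-only.
    printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9_-' '_'
}

sanitize_affix 'run#1.test'; echo    # prints run_1_test
sanitize_affix 'proj-A_01'; echo     # prints proj-A_01 (unchanged)
```

Choosing prefixes and suffixes that already pass through this unchanged avoids any surprise in the resulting file names.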