What's up with VOS structured files and POSIX programs?

If you have ever tried to use POSIX-based programs to process VOS structured files, you may have encountered some restrictions or seen some behavior that you didn’t understand. In this post I will try to explain what’s going on.

VOS supports 4 types of files: sequential, relative, fixed, and stream. The first 3 formats are called “structured” files because the file system keeps track of the record boundaries. The available I/O operations read and write entire records. The last format is called “unstructured” because the record boundaries are implicit; the newline character delimits records. The available I/O operations read and write sequences of bytes.

A POSIX-compliant environment, such as that found on a Unix® or Linux® system, has only one native file type, which it calls a “regular file”. The VOS stream file organization is equivalent to a POSIX regular file.

The VOS POSIX runtime environment classifies all 4 types of files as POSIX regular files. So at least in theory, any POSIX program can read or write any of the 4 file types. However, it is not that simple.

Because the POSIX API defines all I/O operations in terms of sequences of bytes, whereas VOS defines I/O operations (on structured files) in terms of sequences of records, the VOS POSIX runtime mediates the difference by buffering up the current record in user space, performing the POSIX I/O operations on the buffer, and then writing the buffered record back out at appropriate points.

If the POSIX program knows the size of each record (say, for a fixed file), and if it reads and writes the exact number of bytes, then the presence of the buffer does not affect the operation, and the mapping of the two types of I/O operations is easy to understand and efficient.
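
To make this concrete, here is a minimal sketch in C of a program that reads a fixed file one whole record at a time. The record length (RECORD_SIZE) and the function names are hypothetical. I am assuming that the file is opened in the O_BINARY mode described later in this post, so that no newline is added to each record, and that <fcntl.h> is where the runtime defines that mode. The fallback definition of O_BINARY as 0 is only there so the sketch also compiles on systems that lack the extension.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* O_BINARY is an extension to POSIX; define it away where it is missing. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical record length; substitute the real length of your file. */
#define RECORD_SIZE 80

/* Read exactly len bytes unless end of file intervenes.
   Returns the number of bytes actually read, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted; try again */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Count the whole records in the file at path, reading one record per
   request. Returns -1 on error or if the file ends in a partial record. */
long count_records(const char *path)
{
    char record[RECORD_SIZE];
    long count = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY | O_BINARY);

    if (fd < 0)
        return -1;
    while ((n = read_full(fd, record, sizeof record)) == (ssize_t)sizeof record)
        count++;
    close(fd);
    return (n == 0) ? count : -1;
}
```

Because every request is for exactly one record, each read lines up with a record boundary, which is the easy and efficient case described above.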

But if the POSIX program is just reading or writing a stream of bytes without regard to the underlying record size, then the mapping of the POSIX semantics to the VOS semantics, while well-defined, is not generally very useful, and is often quite inefficient. Certain operations, such as seeking to a byte position, or rewriting a sequence of bytes that extends across a record boundary, are inefficient at best and impossible at worst.

Then there is the matter of the handling of newline characters.

Both a POSIX regular file and a VOS stream file use the newline character to distinguish a record boundary. However, VOS records in structured files typically do not contain any newline characters. For example, when a VOS editor (e.g., edit, emacs, or line_edit) creates a new sequential file, each record contains one line of text. But the record does not end in a newline character. By convention, the programs all assume that a sequential file that contains text has an implicit newline at the end of each record. The same holds true for relative and fixed files. But here is an important point: no attribute of any file (VOS, Unix, structured, or unstructured) records whether the file contains text or binary data. The distinction is left up to the programs that access the file.

This situation creates something of a quandary for the VOS POSIX runtime. It needs to know whether a VOS structured file contains text or data in order to know whether to append a newline to each record or not. If the file contains text, it should append a newline. If it contains data, it should not append a newline.
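
The difference between the two interpretations can be modeled with a few lines of C. This is only an illustration of the byte stream a POSIX program would see in each case, not the actual code of the VOS POSIX runtime; the struct record type and the records_to_bytes function are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one record of a structured file. */
struct record {
    const char *data;
    size_t len;
};

/* Lay the records out as a byte stream in out. When text_mode is nonzero,
   a newline follows each record; otherwise the records are simply
   concatenated. Returns the number of bytes written, or (size_t)-1 if
   out is too small. */
size_t records_to_bytes(const struct record *recs, size_t nrecs,
                        int text_mode, char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    assert(recs != NULL && out != NULL);
    for (i = 0; i < nrecs; i++) {
        size_t need = recs[i].len + (text_mode ? 1 : 0);
        if (need > out_size - used)
            return (size_t)-1;      /* output buffer too small */
        memcpy(out + used, recs[i].data, recs[i].len);
        used += recs[i].len;
        if (text_mode)
            out[used++] = '\n';     /* the implicit newline made explicit */
    }
    return used;
}
```

For two records holding "abc" and "de", the text interpretation yields the 7 bytes "abc\nde\n", while the data interpretation yields the 5 bytes "abcde".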

The answer is that the VOS POSIX runtime depends on the caller to provide this information. By default, the POSIX runtime treats structured files as if they contain text, and appends a newline; a caller can explicitly request this behavior by providing the O_TEXT opening mode. A caller that wants to treat a VOS structured file as containing data must supply the O_BINARY opening mode. These two modes are mutually exclusive; if you specify one of them, you must not specify the other one. These modes are not needed for stream files, and so are ignored in this case.
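
Here is a minimal sketch in C of how a caller might choose between the two modes. The count_bytes function is a hypothetical helper, and I am assuming that <fcntl.h> defines O_TEXT and O_BINARY on systems that support them. The fallback definitions of both modes as 0 only let the sketch compile on systems that lack these extensions, where they have no effect.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* O_TEXT and O_BINARY are extensions to POSIX; define them away
   on systems that do not provide them. */
#ifndef O_BINARY
#define O_BINARY 0
#endif
#ifndef O_TEXT
#define O_TEXT 0
#endif

/* Count the bytes a POSIX program sees when it reads the file at path.
   A nonzero as_data requests O_BINARY; zero requests O_TEXT. Exactly one
   of the two modes is passed, never both. Returns -1 on error. */
long count_bytes(const char *path, int as_data)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | (as_data ? O_BINARY : O_TEXT));
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);
    return (n < 0) ? -1 : total;
}
```

On a stream file the two modes should give the same count, since the modes are ignored there. On a structured file, I would expect the text-mode count to exceed the data-mode count by one byte per record, because of the appended newlines.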

For the language lawyers who are reading this post, let me quickly note that both O_TEXT and O_BINARY are extensions to the POSIX standard; they are not defined by POSIX itself. They are typically present only on operating systems that distinguish between text and binary files (as VOS does), or that have a special end-of-line convention (as Windows does, using CR-LF).

By policy, when Stratus ports POSIX-based software to OpenVOS, we modify it to exclude the use of FIXED and RELATIVE files, and we restrict SEQUENTIAL files to read-only access. We also assume that all sequential files contain text. In our experience, these rules make it practical to use either STREAM or SEQUENTIAL files as input to POSIX programs, while avoiding the inherent inefficiencies of trying to perform byte-oriented operations on SEQUENTIAL output files.

In summary, the best approach is to stick to using only STREAM files (for input or output) and SEQUENTIAL files (for input of text only) with POSIX-based programs. If you have binary data in a structured file, write a small program to copy it into a stream file, and then use the stream version of the file with POSIX programs. If you have text data in a FIXED or RELATIVE file that you wish to process with POSIX programs, copy the file into a STREAM file first.
