OpenVOS Blog


Using a File-based Sharable Address Space

3.11.2015

VM regions are often used as sharable address spaces in VOS applications. This is a very efficient way of allowing multiple processes to access the same set of addresses. The main disadvantage is that the space is restricted to well under 2 GB due to VOS’s virtual memory limits, and this usage cuts into VM available for other purposes.

Address space can also be shared via the file system by mapping addresses onto a binary file and accessing those addresses via file I/O operations. This eliminates VM region size restrictions and frees up otherwise dedicated VM. It also allows for coordination via region locking, but is far more expensive than direct access to shared VM, particularly if actual disk I/O is involved.

Posix applications sometimes use stream files in binary mode for this purpose, establishing the desired size of the address space via ftruncate (used for its ability to extend the location of EOF) and then positioning to areas within the file from which data is read or to which it is written. In this way, the file serves as backing store for the address space, and processes share the space using file-oriented interfaces such as fseek/fread/fwrite.
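As a rough illustration of this pattern, here is a minimal Python sketch that treats an ordinary binary file as a shared address space. The helper names (create_address_space, store, load) and the 64 MiB size are hypothetical choices for the example, and Python's file methods stand in for ftruncate/fseek/fread/fwrite; it is not taken from any VOS or Posix application.

```python
import os
import tempfile

ADDRESS_SPACE_SIZE = 64 * 1024 * 1024  # 64 MiB, an arbitrary size for illustration


def create_address_space(path, size):
    """Create the backing file and extend EOF to `size` bytes (the ftruncate step)."""
    with open(path, "wb") as f:
        f.truncate(size)  # bytes between the old and new EOF read back as zeros


def store(path, offset, data):
    """Write `data` at byte `offset` (the fseek/fwrite pattern)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def load(path, offset, length):
    """Read `length` bytes starting at byte `offset` (the fseek/fread pattern)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "scratch")
    create_address_space(path, ADDRESS_SPACE_SIZE)
    store(path, 1_000_000, b"hello")
    print(load(path, 1_000_000, 5))
    print(load(path, 2_000_000, 4))  # never written, so it reads back as zeros
```

Any process that opens the same path sees the same bytes at the same offsets, which is what makes the file usable as a shared address space. Coordination between processes (for example region locking) is left out of the sketch.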

Prior to 64-bit stream files (introduced in Release 17.2), this could be very expensive on VOS because ordinary stream files cannot be sparse, and extending EOF involves explicitly allocating and writing blocks of binary zeros. For example, if the desired address space was, say, 2 GB, then VOS would require 524,288 blocks of disk storage, even though only a small amount of that storage might ever contain values other than binary zeros. With 64-bit stream files, these types of applications can now run efficiently on VOS, requiring only as much disk space as is actually needed. Posix applications automatically get the benefit of 64-bit stream files when built for large file awareness; you should do this to get these performance benefits, even if files are not expected to grow to more than 2 GB. (See OpenVOS POSIX.1 Reference manual, R502, “Porting Existing Applications to the 64-bit Stream File Environment” for more information.)

Similar use of file-backed shared address space is possible in native VOS applications, i.e., those using s$ interfaces. VOS provides a number of features which can greatly reduce disk I/O when using this technique, essentially making CPU usage the primary cost. The introduction of sparse 64-bit stream files in Release 17.2 makes this approach to shared address space even more attractive.

Memory Resident and RAM files

A memory resident file is identified using the set_open_options command. A settable portion of the disk cache is reserved for memory resident files. Depending on physical memory available and other uses of cache, this can be up to 9-10 GB. Blocks of memory resident files once in cache will not incur subsequent disk reads as long as the total number doesn’t exceed that portion of the cache. If it does, then the most recently referenced blocks retain this advantage.

RAM files are files containing non-persistent data and are useful if the file contains data which does not need to be committed to disk when the application is done using it. You can use the set_ram_file command, but any file for which s$delete_file_on_close is called is automatically treated as a RAM file from that point on.

While memory resident files do not incur subsequent disk reads, blocks are still written at regular intervals, and the number of modified blocks allowed in cache is limited just as with any other file. Modified block limits prevent the situation where millions of blocks may need to be written when the file is closed or flushed – a 4 GB file occupies a million blocks. This limitation can slow down an application which modifies data faster than it can be written. RAM files avoid this type of throttling since their data never needs to be written to disk, even when the file is deactivated.

Using memory resident RAM files provides an address space in cache memory that avoids most disk I/O. This address space is limited only by cache size, which in turn is based on available physical memory, not virtual memory (the cache manager shares VM addresses to access physical memory). Note: a single file-based address space can grow up to 512 GB, but when it is larger than the memory resident portion of cache, it loses its I/O advantages, at least for those blocks which have not been recently referenced.

Example

The contents of a stream file can be accessed using s$seq_position with byte oriented opcodes and then examined or modified using s$read_raw/s$write_raw. 64-bit stream files can be up to 512 GB without requiring any significant disk space except for regions in the file which are used, i.e., set to be non-zero.

For example,

create_file scratch -organization stream64 -extent_size 256

This creates a DAE-256 file called “scratch”.

set_ram_file scratch

This allows this file to have unlimited access to cache avoiding any throttling related to the number of modified blocks, and to avoid disk writes altogether except in the background or when cache is exhausted and needed for other purposes. This can be done programmatically as well via s$set_ram_file. When the last opener closes the file, the data in cache is discarded and never entails disk writes. The file must be empty when this command is used.

set_open_options scratch -cache_mode memory_resident

This causes as many as possible of this file’s blocks to be retained in cache indefinitely. The actual number depends on the cache size and memory residence percentage, as set in the set_tuning_parameters command.

The following sequence shows an example of programmatic usage (illustrated using test_system_calls):

tsc: s$attach_port p scratch
tsc: s$open p -io_type update

Now, provide an address space of around 512 GB:

tsc: extend_stream_file p 549235720192 (s$control EXTEND_STREAM_FILE)

and use it to store data:

tsc: s$seq_position_x p bwd_by_bytes 3 (current position after extend is EOF)
tsc: s$write_raw p END
tsc: s$seq_position_x p bof
tsc: s$write_raw p START
tsc: s$seq_position_x p fwd_by_bytes 2000
tsc: s$write_raw p 2000

Note: s$seq_position supports opcodes to position to absolute byte offsets as well.

The result looks like this, with the file occupying just two data blocks on disk:

..dump_file scratch -brief

%swsle#Raid4>otto>d-3>new>scratch 15-02-25 16:04:17 est

Block number 1

000 53544152 54000000 00000000 00000000 |START………..|
010 00000000 00000000 00000000 00000000 |…………….|
=
7D0 00000000 00323030 30000000 00000000 |…..2000…….|
7E0 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

Block number 131870736

000 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00454E44 |………….END|

Note: blocks 1 and 131870736 are in cache and typically will not have been written to disk, although disk space is reserved for them in case they need to be written when cache resources are strained. The dump_file command above is seeing the cached blocks, not reading disk.
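The byte offsets at which the three strings appear in the dump can be checked with a little arithmetic. The following Python sketch assumes 4096-byte blocks and only computes offsets within a block; the variable names are chosen for this example.

```python
BLOCK_SIZE = 4096            # bytes per block
EOF = 549235720192           # size passed to extend_stream_file in the example


def offset_in_block(byte_offset):
    """Offset of a file byte position within its 4096-byte block."""
    return byte_offset % BLOCK_SIZE


end_pos = EOF - 3                             # bwd_by_bytes 3 from EOF
start_pos = 0                                 # bof
text_pos = start_pos + len("START") + 2000    # fwd_by_bytes 2000 after writing START

if __name__ == "__main__":
    for label, pos in (("START", start_pos), ("2000", text_pos), ("END", end_pos)):
        print(f"{label:>5}: in-block offset 0x{offset_in_block(pos):03X}")
```

The computed in-block offsets (0x000, 0x7D5 and 0xFFD) line up with where START, 2000 and END appear in the dump rows labelled 000, 7D0 and FF0.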

When a RAM file is deactivated (is no longer open in any process), VOS truncates the file (avoiding writing modified blocks to disk) and releases all disk space. Under typical circumstances, the above sequence would result in no disk I/O at all except writes of the two file map blocks, which are eventually discarded.

tsc: s$close p

Now the file is empty and occupies no disk space:

tsc: ..dump_file scratch

%swsle#Raid4>otto>d-3>new>scratch 15-02-25 16:07:12 est

Using 64-bit Stream Files

3.4.2015

VOS stream files are limited in growth to about 2 GB. This is because the definitions of interfaces such as s$seq_position, s$get_port_status, s$lock_region, etc. use signed 32-bit variables to indicate position in the file, and 2**31 - 1 is the largest value which can be represented. In stream files, position is a byte offset rather than a record number as in other VOS file organizations.

64-bit stream files are available as of Release 17.2 and allow for growth limited only by the VOS file system (the maximum size of any file is 512 GB). In addition to expanding growth potential, these files offer other advantages, particularly when accessing binary data. Unlike with normal stream files, blocks containing all binary zeros occupy no space on disk, and reading such data requires no disk access.

Posix compliant applications can be easily built for large file awareness without any source changes: new files created on disks running on 17.2+ modules will be 64-bit (often referred to as stream64) files and on pre-17.2 modules will be ordinary stream files. Applications which use the VOS s$ interfaces or VOS Language I/O are called native applications, and those using stream files will be able to access stream64 files as long as either they do not use byte positioning operations (positioning to BOF or EOF is not considered byte positioning) or the file is smaller than 2 GB. However, they will need to be modified in order to create stream64 files.

Here is some information which may be useful if/when you plan to make use of 64-bit stream files. It is organized as follows:

Existing Applications and Compatibility
– Existing Posix Applications
– Existing Native Applications
– Compatibility
– Open Source Products
Sparse Allocation
File Conversion
Physical Characteristics
– Extents
– How Extents Affect Sparse Allocation
– Flexible Extents
Tools
– Locating 64-bit Stream Files on a Module
– Block Compares
– Comparing Sparse Files

Existing Applications and Compatibility
– Existing Posix Applications

As of Release 17.2, Posix applications can be built for large file awareness, allowing them to access 64-bit files and create them if the target is on a disk running on 17.2 or beyond. If the application is Posix-compliant (for example, is coded to use types like off_t instead of int where required), then no source changes are needed; simply build it as described in the OpenVOS POSIX.1 Reference manual, R502, “Porting Existing Applications to the 64-bit Stream File Environment”. If a file being created is on a disk on a module running pre-17.2, then a normal stream file is created, and all will be fine, with failure occurring only if an attempt is made to grow the file > 2 GB. So enabling your Posix application to use large files costs nothing in terms of interoperability. Most VOS supported Open source products will produce and deal properly with 64-bit stream files as of Release 17.2.

Posix compliant applications can be easily built for large file awareness without any source changes: new files created on disks running on 17.2+ modules will be 64-bit stream (often referred to as stream64) files and on pre-17.2 modules will be ordinary stream files. Applications which use VOS s$ interface or VOS Language I/O are called native applications and those which use stream files will be able to access stream64 files as long as either they do not use byte positioning operations (positioning to BOF or EOF is not considered byte positioning) OR if the file is smaller than 2 GB.

– Existing Native Applications

Many native applications simply need to change the use of the create_file command (or s$create_file) to indicate organization is STREAM64 rather than STREAM. If the application uses positioning operations and the file grows to > 2 GB, then s$seq_position will produce an error: to support larger files, s$seq_position calls need to be changed to s$seq_position_x. Again, using _x interfaces does not interfere with the application’s ability to reference normal stream files existing on modules which may not support the _x interfaces, as long as positional arguments are in the supported range. So making the change allows access to any type of stream file and costs nothing in interoperability.

Existing applications which use only interfaces that do not communicate byte position (s$seq_read, s$seq_write, s$seq_position BOF/EOF) will be able to deal with stream64 files without modification. Many applications are written to deal with ASCII files whether they are sequential or stream and thus need no change to deal with 64-bit stream files, regardless of size.

64-bit stream files cannot have indexes, and thus an indexed stream file cannot be converted to stream64. Indexed stream files cannot grow to > 2 GB due to restrictions in the existing index implementation.

– Compatibility

Because 64-bit stream files offer a number of advantages, they should be adopted when possible, if an index is not required.

As long as the file does not grow > 2 GB, it can be copied or moved to modules running older versions of VOS, and will appear there as a normal stream file. You can even move a disk containing stream64 files to an older VOS, providing the files are not “restricted”. A file is restricted when it grows > 2 GB or is sparsely allocated; such files cannot be opened if the disk which contains them is moved to a pre-17.2 VOS (tools are available to identify restricted files if you are planning on such a move). However, applications running older versions of VOS can open, read and write any kind of stream64 files across the network; any failure which occurs is the same as if the application were running on 17.2+ (for example, use of s$seq_position for byte positioning when the file exceeds 2 GB).

– Open Source Products

Most Open Source software supported on VOS has been converted to be large file aware, for example ftp, sftp, samba, etc. That has certain implications. As of Release 17.2, you can ftp a stream file to VOS and not be concerned about the file being too big, since the ftp daemon will create a stream64 file. ftp writes all bytes and thus the resulting file will not be sparse (see next section), even if it consists mostly of binary zeros. Simply copying the transferred file may greatly reduce the disk space it occupies. If you plan on adding an index to the ftp’ed file, you must first convert it to an ordinary stream file (and of course it must not exceed 2 GB in length).

Sparse Allocation

Blocks which contain all binary zeros are not always allocated for stream64 files and thus require no disk space. The gnu “truncate” command (in >system>gnu_library>bin) can be used to both truncate and extend the size of a stream file. In 18.0, VOS provides a similar command called reset_eof. Data past EOF is undefined, but when EOF is extended, the file contents from the point of the current EOF is always set to binary zeros.

Consider an ordinary stream file and a stream64 file:

create_file stm -organization stream
create_file s64 -organization stream64

and assume each has contents ‘abc’. We wish to extend the files so that each occupies 2 billion bytes:

bash
bash-4.2$ truncate -s 2000000000 s64
bash-4.2$ truncate -s 2000000000 stm

The first request finishes immediately while the second takes several minutes. The files show logical equivalency:

bash-4.2$ ls -l
total 244382
-rwxrwxrwx 1 nobody nobody 2000000000 Feb 25 14:08 s64
-rwxrwxrwx 1 nobody nobody 2000000000 Feb 25 14:11 stm

However, the number of disk blocks each file occupies is quite different, as shown by the VOS list command:

list

Files: 2, Blocks: 488764

w 4 s64
w 488760 stm
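The same distinction between logical size and allocated storage can be observed on any host whose file system supports sparse files. The Python sketch below is only an analogy run outside VOS: the helper names are hypothetical, st_blocks is a POSIX field counted in 512-byte units (not VOS 4096-byte blocks), and whether the extended region is actually left unallocated depends on the host file system.

```python
import os
import tempfile


def extend_file(path, new_size):
    """Set the file's logical size to new_size bytes, like `truncate -s`."""
    os.truncate(path, new_size)


def size_and_allocation(path):
    """Return (logical size in bytes, allocated 512-byte units or None)."""
    st = os.stat(path)
    # st_blocks is a POSIX field; it is absent on some platforms (for example Windows).
    return st.st_size, getattr(st, "st_blocks", None)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "s64")
    with open(path, "wb") as f:
        f.write(b"abc\n")
    extend_file(path, 20_000_000)  # smaller than the 2 billion bytes used in the text
    print(size_and_allocation(path))
```

On a file system with sparse file support, the reported allocation stays small even though the logical size is large, which mirrors the difference between s64 and stm in the list output.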

In Release 18.0, the equivalent operations can be performed using the VOS reset_eof command:

reset_eof s64 2000000000

The VOS command will warn if you ask to extend an ordinary stream file by more than a few thousand blocks, since this is such a costly operation:

reset_eof stm 2000000000
Extending eof to 2000000000 will add 488281 blocks to the file. Do you want to extend stm by 1999999996 bytes? (yes, no)
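The numbers in this prompt follow from the file's current contents (the 4 bytes 'abc' plus a newline) and the 4096-byte block size. A small Python sketch of the arithmetic, with function names made up for this example:

```python
BLOCK_SIZE = 4096  # bytes per block


def blocks_for(num_bytes):
    """Whole blocks needed to hold num_bytes, rounding up."""
    return -(-num_bytes // BLOCK_SIZE)


def extension_cost(current_eof, new_eof):
    """Return (bytes added, blocks added) when EOF moves from current_eof to new_eof."""
    added_bytes = new_eof - current_eof
    added_blocks = blocks_for(new_eof) - blocks_for(current_eof)
    return added_bytes, added_blocks


if __name__ == "__main__":
    print(extension_cost(4, 2_000_000_000))
```

For an ordinary stream file every one of those added blocks has to be allocated and written, which is why the command asks for confirmation.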

Here is the 64-bit stream file after being extended. It consists of just two data blocks:

dump_file s64
Block number 1

000 6162630A 00000000 00000000 00000000 |abc………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

Block number 488282

000 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

Dumping the ordinary stream file stm will show all 488282 blocks which have been allocated with most blocks containing binary zeros.

dump_file stm
Block number 1
000 6162630A 00000000 00000000 00000000 |abc………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

Block number 2
000 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

Block number 3

Block number 488282
000 00000000 00000000 00000000 00000000 |…………….|
=
400 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |…………….|
=
FF0 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |…………….|

Nonetheless, compare_files and diff see the files as equivalent, but need to read through all blocks in the ordinary stream file which in this case can take a few minutes:

ready;compare_files s64 stm
ready 14:29:47
ready 14:31:24 45.877 46

File Conversion

The convert_stream_file command is available to convert between any of the sequential and stream file types (stream, 64-bit stream, sequential, and extended sequential) and to change extents. Typically conversion involves copying the contents of the file and in this way is similar to creating an empty target file and using copy_file with the -truncate option. convert_stream_file can be used to change the file in place and has the advantage of making sure the existing contents can be represented in the requested new format before attempting the conversion.

There are cases where a stream file can be reset to stream64 and vice versa without conversion, i.e., without copying contents. This is done by resetting the directory entry and is possible only if the file is not sparse, is less than 2 GB and extent size is not changed. This can be useful when migrating a disk containing stream64 files to a pre-17.2 module and again if the disk is moved back. The set_stream_files command is available for this purpose. It is a privileged command intended for use by a System Administrator: it takes a directory as input and resets files in that directory, and optionally in all sub-directories. It affects only those stream64 files which can be reset without conversion (not sparse and […]

[…]otto>stm
file organization: stream file
last used at: 15-02-25 14:34:45 est

extent size: 1
next byte: 2000000000
blocks used: 488760 <<<<<

Then:
convert_stream_file stm -to stream64 -extent_size 8

display_file_status stm
file organization: stream file (64-bit/rstr)
last used at: 15-02-25 14:39:16 est

extent size: 8
next byte: 2000000000
blocks used: 18 <<<<
sparse: yes

The resulting file is shown as restricted (64-bit/rstr) due to it now being sparse. The copy_file command will also produce a sparse file. Converting a stream64 file to an ordinary stream file will result in all blocks being instantiated and so the resulting file can be considerably larger than the original.

In a stream64 file, there is no guarantee whether a block which happens to contain all binary zeros will be allocated on disk or not. For example, if you write some zeros into a block which then ends up being all zeros, the block remains allocated; the same is true if a program explicitly writes zero data, filling up blocks. Only blocks which are never written remain unallocated; copy_file and convert_stream_file never write blocks containing all binary zeros.

Physical Characteristics
– Extents

The actual growth limit of any VOS file is determined by its extent size, so it is important to understand what an extent is. The VOS file map contains disk addresses for each 4k block in a file; it is limited to 523792 entries and thus any file without extents is limited to 2145452032 (523792 * 4096) bytes. This is slightly under 2**31 and thus is the actual limit on the growth of an ordinary stream file.

An extent is a group of N contiguous blocks on disk, where N is the extent size. The file map for a file allocated via extents contains the address of the first block in the extent and thus allows the file map to represent a significantly larger file. Dynamically allocated extents allow extent size up to 256, allowing for a file containing 512 GB (Note: statically allocated extents which allow extent size greater than 256 are deprecated except for use with paging files; these have many restrictions and performance issues which make them unacceptable for general use).

In order to grow past 2 GB, a stream64 file must have extents, the value of which will determine its actual growth limit. For example,

create_file stm64 -organization stream64 -extent_size 32

will create a file which can grow to 64 GB.

In release 17.2, the default extent size for files created by Posix Runtime is 8, allowing for growth up to 16 GB. Such files grow 8 blocks at a time, and thus the minimum size of any file is 8 blocks. This default can be changed by a System Administrator on a per-module basis, but the larger the growth potential, the larger the minimum size of a file. Larger extents can be a problem for applications which produce many very small files.
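The growth limits quoted in this section can be reproduced from the file map size. The Python sketch below assumes, as the text implies, that the limit is simply the number of file map entries times the extent size times the 4096-byte block size; the function name is made up for this example.

```python
FILE_MAP_ENTRIES = 523792   # file map limit quoted in the text
BLOCK_SIZE = 4096           # bytes per block


def max_file_bytes(extent_size=1):
    """Growth limit in bytes for a file with the given extent size (1 = no extents)."""
    return FILE_MAP_ENTRIES * extent_size * BLOCK_SIZE


if __name__ == "__main__":
    for n in (1, 8, 32, 256):
        limit = max_file_bytes(n)
        print(f"extent size {n:>3}: {limit:>15,} bytes (~{limit / 2**30:.1f} GiB)")
```

With an extent size of 1 the result is 2145452032 bytes, just under 2**31, and with 256 it is 549235720192 bytes, the DAE-256 figure used elsewhere in these posts.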

– How Extents Affect Sparse Allocation

Normally blocks which are never written are not allocated for stream64 files; however if one block in an extent is written, then all blocks in that extent are allocated, even if some contain all binary zeros.
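A simplified way to picture this rule is to round every written block out to the extent that contains it. The Python sketch below is only a model of the behavior described here, not VOS code; block numbers are 1-based as in dump_file output and the function name is hypothetical.

```python
def allocated_blocks(written_blocks, extent_size):
    """Return the sorted 1-based block numbers that end up allocated.

    Model: writing any block allocates the whole extent containing it.
    """
    allocated = set()
    for block in written_blocks:
        # First block of the extent that contains `block`.
        first = ((block - 1) // extent_size) * extent_size + 1
        allocated.update(range(first, first + extent_size))
    return sorted(allocated)


if __name__ == "__main__":
    print(allocated_blocks([1], 1))   # no extents: just the block written
    print(allocated_blocks([1], 8))   # extent size 8: blocks 1 through 8
```

In this model a file with a single written block occupies one block without extents, but eight blocks with an extent size of 8, seven of which contain only binary zeros.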

Suppose a non-extent file containing the string abc (stm1) is converted to a file with extent size 8 (stm8):

dump_file stm1
Block number 1
000 6162630A 00000000 00000000 00000000 |abc………….|
010 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

convert_stream_file stm1 stm8 -extent_size 8

dump_file stm8
Block number 1
000 6162630A 00000000 00000000 00000000 |abc………….|
010 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

Block number 2
000 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

Block number 3

Block number 8
000 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

The -brief option was added to dump_file in release 18.0 and can be useful when you don’t want to see blocks which are physically present, but represent “empty” (potentially non-existent) blocks. Using that option with the DAE-8 file is shown below:

dump_file stm8 -brief
Block number 1
000 6162630A 00000000 00000000 00000000 |abc………….|
010 00000000 00000000 00000000 00000000 |…………….|
=
FF0 00000000 00000000 00000000 00000000 |…………….|

This is also effective in hiding “empty” blocks containing all -1’s for relative or fixed files.

– Flexible Extents

Flexible extents were introduced in Release 18.0, allowing a small file to grow one block at a time; as the file grows larger, the extent value changes. This is the default for all files created by Posix applications in Release 18.0 and is what you’ll get with the create_file command if you don’t give a specific extent size:

create_file flex -organization stream64 -extent_size

Files with flexible extents are referred to as flex files; only stream64 files can have flexible extents. The display_file_status command shows the extent size of a flex file as “flex”.

display_file_status flex
name: %swsle#Raid4>flex
file organization: stream file (64-bit)
last modified at: 15-02-25 12:06:25 est

dynamic extents: yes
extent size: flex

If the display_file_status command is run on a pre-18.0 module, it shows:
extent size: -1

If a flex file is copied to a disk on a 17.2 module, it has extents which are the default for that module. If copied to 17.1 or before, the result is an ordinary stream file.

In release 17.2+, compare_files does not see extent size differences between stream files, and files appear identical regardless of extents. For example, suppose %swsle#m109_mas is on a module running 17.2:

copy_file flex %swsle#m109_mas>Stratus
compare_files flex %swsle#m109_mas>Stratus>flex
ready 14:42:38 0.001 6

Running compare_files on a pre-17.2 module will show this as a difference:

compare_files %swsle#Raid>flex %swsle#m109_mas>Stratus>flex
A (%swsle#Raid4>flex does not match B (%swsle#m109_mas>Stratus>flex).
– Some attributes of the two files do not match:
extent size -1 8

Flex files are able to hold somewhat less than 512 GB, specifically 540,142,534,656 bytes vs 549,235,720,192 with DAE-256, but have the advantage that small files grow one block at a time, then 8, then 32, then 256 as opposed to always 256 blocks at a time.

You should not mount a disk containing flex files on a pre-18.0 module; there are tools which allow you to easily examine a disk for the presence of flex files. If you accidentally do, the files will not be visible and can’t be accessed. Such files or any containing directory cannot be deleted and so will remain intact if the disk is moved back to a module running 18.0+. However, if you salvage on a pre-18.0 module, all flex files will be eliminated and the space recovered.

Tools
– Locating 64-bit Stream Files on a Module

When you need to move a disk to an older version of VOS, you need to locate files which are incompatible with the older release. A command is available to help with this:

locate_stream_files -form

—————————– locate_stream_files ————————-
directory_names:
-depth:
-type: 64-bit
-brief: no
-long: no

-type can also be flex, sparse, or large (which will locate 64-bit stream files with that specific characteristic), or all (which will locate all stream files).

Depending on -depth, subordinate directories are searched and if the file has any of the attributes (not specified by type), information regarding that and extent size is shown as well. If no directory name is given, then all disks on the module are searched from the root. For example:

locate_stream_files -type 64-bit

Checking directories on disk %swsle#raid0-1…
%swsle#raid0-1:
smb_test.stm64 (FLEX/large)
smb_test.stmb
Total of 2 64-bit stream files.

Checking directories on disk %swsle#raid0-2…
%swsle#raid0-2:
big_stream (FLEX)
smb_test.stm (FLEX)
smb_test.stm64a (FLEX/large)
Total of 3 64-bit stream files.

Checking directories on disk %swsle#Raid4…
%swsle#Raid4>otto:
b3 (DAE-8)
big (DAE-256/large/sparse)

– Block Compares

In release 17.2 and 18.0, new options are available for the compare_files command which are useful for stream files. Block by block comparison of VOS structured file types is typically not useful because two blocks with different values often will represent the same logical records. For example, unused data in a relative file’s records past their current length is unpredictable. In addition, sequential files contain undefined filler bytes. The block contents of extended sequential files are completely different if the record size is different, although representing the same logical records.

However, block compares can sometimes be useful for fixed and stream files. The problem is that compare_files is record oriented and designed to show differences in terms of record numbers (or line numbers) which can be located using an editor. This has presented problems for binary stream files which are not organized in terms of records. Since record size for any file is limited to 32767 bytes, using s$seq_read in a stream file when a sequence of more than 32k bytes occurs without an intervening NL record delimiter will deliver the next 32767 characters, and not distinguish whether the next byte was a NL or not. This makes the result of record-oriented comparison ambiguous for stream files which hold binary values rather than a sequence of characters separated by the NL character.

This problem has been dealt with in release 17.2; compare_files will detect cases where false success may have occurred previously – specifically when 32767 characters followed by a NL match 32767 characters not followed by a NL. If this occurs, an error will be reported indicating that the stream file contains records > 32767. If this never occurs, then the record oriented results will be valid and can be trusted. But in any event, when all you want to do is assure files are identical, comparing the blocks of an unstructured file is much faster than the record-oriented method. The problem with block compares is that with no records to serve as guideposts, comparison after the first discrepancy is often meaningless.
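The ambiguity can be illustrated with a toy record reader. The Python sketch below is a simplified model written for this example, not the behavior of s$seq_read itself: it assumes a record ends at a NL or after 32767 bytes, and that a NL immediately following a maximum-length record is consumed as that record's delimiter.

```python
MAX_RECORD = 32767  # maximum record size in bytes, as stated in the text


def read_records(data):
    """Split a byte stream into records under the simplified model described above."""
    records = []
    pos = 0
    while pos < len(data):
        nl = data.find(b"\n", pos, pos + MAX_RECORD)
        if nl != -1:
            records.append(data[pos:nl])
            pos = nl + 1
        else:
            records.append(data[pos:pos + MAX_RECORD])
            pos += MAX_RECORD
            # A NL right after a maximum-length record is treated as its delimiter.
            if data[pos:pos + 1] == b"\n":
                pos += 1
    return records


if __name__ == "__main__":
    with_nl = b"x" * MAX_RECORD + b"\n" + b"y\n"
    without_nl = b"x" * MAX_RECORD + b"y\n"
    print(with_nl == without_nl)                                # the bytes differ
    print(read_records(with_nl) == read_records(without_nl))    # the records match
```

Two streams that differ by one NL byte yield the same sequence of records in this model, which is the kind of false match a purely record-oriented comparison cannot see.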

In release 17.2, the -compare_blocks option was added to the compare_files command for this purpose. This will quickly detect whether or not unstructured files are block for block identical and identify the block at which they differ. For the rare case, where the difference is caused by an overlay (certain bytes being modified as opposed to inserted or deleted) and the rest of the files are identical, the -continue option is also available. This will show you the range of blocks in the remainder of the file which differ. If a byte was inserted for example (as opposed to overlaid), then all remaining blocks will differ; that is, resync’ing as is done with record compares is not possible with block compares. But in the case alluded to, this additional information could ascertain that the only difference was some overlaid bytes in a few blocks, perhaps representing something like a timestamp.

As an example of -compare_blocks with -continue:

compare_files flex dae256 -compare_blocks -continue
A (%swsle#m111_mas>Stratus>Newman>flex) does not match B (%swsle#m111_mas>Stratus>Newman>dae256).
– Data blocks from 2 to 24 differ.
– Data blocks from 33 to 488 differ.
– 272 additional blocks in File B.
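The kind of report shown above can be approximated in a few lines. The Python sketch below is a hypothetical illustration of a block-for-block comparison that keeps going after the first difference; it works on in-memory byte strings and is not the compare_files implementation.

```python
BLOCK_SIZE = 4096  # bytes per block


def split_blocks(data):
    """Split a byte string into consecutive 4096-byte blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def differing_ranges(a, b):
    """Return (ranges, extra_a, extra_b).

    ranges is a list of 1-based inclusive (first, last) block ranges that differ
    within the common length; extra_a/extra_b count additional blocks in each file.
    """
    blocks_a, blocks_b = split_blocks(a), split_blocks(b)
    common = min(len(blocks_a), len(blocks_b))
    ranges, start = [], None
    for i in range(common):
        differs = blocks_a[i] != blocks_b[i]
        if differs and start is None:
            start = i + 1                 # block numbers are 1-based
        elif not differs and start is not None:
            ranges.append((start, i))     # the run ended at the previous block
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges, len(blocks_a) - common, len(blocks_b) - common


if __name__ == "__main__":
    file_a = bytes(4 * BLOCK_SIZE)
    file_b = bytearray(file_a)
    file_b[BLOCK_SIZE] = 1                # overlay one byte in block 2
    file_b[2 * BLOCK_SIZE + 5] = 1        # and one byte in block 3
    file_b += bytes(BLOCK_SIZE)           # file B has one additional block
    print(differing_ranges(file_a, bytes(file_b)))
```

Because blocks are compared position by position, an overlaid byte affects only the block it falls in, whereas an inserted byte shifts every later block and makes the whole remainder differ.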

– Comparing Sparse Files

The compare_files command looks at all physically present blocks in a file. This can be a problem in sparse files, because a block of binary zeros may appear as an allocated block in one file, yet not be present in another. The files may be logically identical, but not block for block. In fact, using copy_file will eliminate blocks of binary zeros in the target, and if any exist, this will guarantee the source and target will not be block for block identical.

This is also a problem in extent based stream files because all blocks in an extent are always instantiated even if they contain all binary zeros. That means that the blocks in a sparse extent file may look different than the blocks in an identical non-extent file, even though they represent the exact same data. Running copy_file cannot eliminate zero blocks in an extent.

The example showing use of -continue above actually involves two identical files, except that one has DAE-256 extents and the other has flexible extents.

So a DAE-8 file with contents “abc” will contain one block with “abc” followed by 7 others containing binary zeros. A non-extent file (or flex file) will just have one block with “abc”, and a DAE-16 file will have an “abc” block followed by 15 blocks of binary zeros.

Using the compare_files command with -compare_blocks will show all these files as being different, even though they all represent the same file data. In Release 18.0, the -compare_all_blocks option is available for this situation. This causes compare_files to logically instantiate missing blocks in files being compared, resulting in a true block by block compare with differences caused by sparseness eliminated. Missing blocks in stream64 files are considered all binary zeros, while in fixed files (or any other file type) they are considered binary -1’s (all FFFF).

In a very sparse file, this can make the comparison much slower than using the -compare_blocks options, but still significantly faster than using the default record-oriented comparison. It is typically only needed when comparing blocks of files having different extents, although there are other cases as mentioned where sparseness can differ. This is useful also for fixed files having different extents.
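The difference between comparing only the blocks that are physically present and comparing all blocks can be modelled with a dictionary that maps block numbers to contents. The Python sketch below is an illustrative model for stream64 files only; the function names are made up, and file length (EOF) is deliberately not modelled.

```python
BLOCK_SIZE = 4096
ZERO_BLOCK = bytes(BLOCK_SIZE)  # how a missing stream64 block is interpreted


def compare_present_blocks(a, b):
    """Equal only if the same blocks are allocated and their contents match."""
    return a == b


def compare_all_blocks(a, b, fill=ZERO_BLOCK):
    """Equal if every block matches once missing blocks are treated as `fill`."""
    return all(a.get(n, fill) == b.get(n, fill) for n in set(a) | set(b))


if __name__ == "__main__":
    abc_block = b"abc\n".ljust(BLOCK_SIZE, b"\x00")
    non_extent = {1: abc_block}                      # one allocated block
    dae8 = {1: abc_block}
    dae8.update({n: ZERO_BLOCK for n in range(2, 9)})  # rest of the extent is zeros
    print(compare_present_blocks(non_extent, dae8))
    print(compare_all_blocks(non_extent, dae8))
```

In this model the non-extent file and the DAE-8 file differ when only present blocks are compared, but match once missing blocks are logically instantiated as binary zeros.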
