Windows XP includes a command-line tool, FSUTIL, that can set file lengths on NTFS volumes.
Ex: FSUTIL FILE setvaliddata
Set the valid data length for a file on an NTFS volume.
In NTFS, there are two important concepts of file length:
- End of File (EOF) marker
- Valid Data Length (VDL)
The EOF indicates the actual length of the file.
The VDL identifies the length of valid data on disk.
Any reads between VDL and EOF automatically return 0 in order to preserve the C2 object reuse requirement.
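To make the read rule concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not how NTFS is implemented; the `VdlFile` class and its fields are hypothetical names made up for illustration.

```python
class VdlFile:
    """Toy in-memory model of a file with a separate EOF and VDL (illustration only)."""

    def __init__(self, eof):
        self.eof = eof                          # logical length of the file
        self.vdl = 0                            # nothing valid has been written yet
        self.disk = bytearray(b"\xAA" * eof)    # stale bytes left by a "previous owner"

    def write(self, offset, data):
        end = offset + len(data)
        if end > self.eof:
            raise ValueError("writes past EOF are not modelled here")
        if offset > self.vdl:
            # Zero the gap so stale bytes can never become "valid" data.
            self.disk[self.vdl:offset] = bytes(offset - self.vdl)
        self.disk[offset:end] = data
        self.vdl = max(self.vdl, end)

    def read(self, offset, size):
        end = min(offset + size, self.eof)
        if offset >= end:
            return b""
        valid_end = min(end, self.vdl)
        valid = bytes(self.disk[offset:valid_end]) if offset < valid_end else b""
        # Everything between VDL and EOF reads back as zeroes.
        return valid + bytes(end - offset - len(valid))


f = VdlFile(eof=8)
f.write(0, b"abc")
data = f.read(0, 8)   # three valid bytes followed by zeroes, never the stale 0xAA bytes
```

The point of the model is that the stale bytes sitting on "disk" beyond the VDL are never handed back to a reader, which is what the object reuse requirement asks for.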
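The layout of the NTFS $FILE_NAME attribute shows where two of these sizes live:
|Offset||Size||Description|
|~||~||Standard Attribute Header|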
|0x00||8||File reference to the parent directory.|
|0x08||8||C Time – File Creation|
|0x10||8||A Time – File Altered|
|0x18||8||M Time – MFT Changed|
|0x20||8||R Time – File Read|
|0x28||8||Allocated size of the file|
|0x30||8||Real size of the file|
|0x38||4||Flags, e.g. Directory, compressed, hidden|
|0x3c||4||Used by EAs and Reparse|
|0x40||1||Filename length in characters (L)|
|0x42||2L||File name in Unicode (not null terminated)|
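As a rough illustration of the table, the fields can be pulled out of a raw buffer with Python's `struct` module. This is a sketch that assumes the buffer starts right after the standard attribute header and that all fields are little-endian; the function name and the dictionary keys are made up for this example. The byte at 0x41 is not listed in the table, so it is skipped.

```python
import struct


def parse_file_name_fields(buf):
    """Decode the fields from the table above (buffer starts after the attribute header)."""
    (parent_ref, c_time, a_time, m_time, r_time,
     allocated_size, real_size, flags, ea_reparse,
     name_len) = struct.unpack_from("<QQQQQQQIIB", buf, 0)
    # File name: name_len UTF-16 code units starting at 0x42, not null terminated.
    name = buf[0x42:0x42 + 2 * name_len].decode("utf-16-le")
    return {
        "parent_ref": parent_ref,
        "c_time": c_time, "a_time": a_time, "m_time": m_time, "r_time": r_time,
        "allocated_size": allocated_size,   # offset 0x28
        "real_size": real_size,             # offset 0x30
        "flags": flags,
        "ea_reparse": ea_reparse,
        "name": name,
    }


# Build a synthetic buffer to exercise the parser (all values are invented).
sample = struct.pack("<QQQQQQQIIB", 5, 1, 2, 3, 4, 4096, 100, 0x20, 0, 5)
sample += b"\x00" + "a.txt".encode("utf-16-le")
fields = parse_file_name_fields(sample)
```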
In NTFS, the “End of File” can be referred to as the “allocated size of the file” while “Valid Data Length” would be the “real size of the file”.
A program can set these values with two functions:
- SetFileValidData sets the “real size of the file”
- SetEndOfFile sets the “allocated size of the file” (physical size)
Together, these functions make “file pre-allocation” possible: you can define the size of the file up front, before or while writing to it.
But wait… You say – this is a blog on the exFAT file system, so why am I talking about NTFS? This is not an NTFS blog.
That is true, and you’re correct. BUT…
If we were to examine a legacy FAT directory entry, we would see that the directory record ends with a 32-bit length of the file. There is only a single length because, effectively, Valid Data Length = Size of File.
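For comparison, here is a small sketch that reads that single length out of a 32-byte legacy FAT directory entry. It assumes a short (8.3) entry with the usual layout: name in bytes 0-10, attributes in byte 11, low word of the first cluster at offset 26, and the 32-bit file size in the last four bytes. The function name is hypothetical.

```python
import struct


def parse_fat_dir_entry(entry):
    """Decode a few fields of a 32-byte legacy FAT short directory entry."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    base = entry[0:8].decode("ascii").rstrip()
    ext = entry[8:11].decode("ascii").rstrip()
    attributes = entry[11]
    # The record ends with the first-cluster low word and the one and only file length.
    first_cluster_lo, file_size = struct.unpack_from("<HI", entry, 26)
    return {
        "name": base + ("." + ext if ext else ""),
        "attributes": attributes,
        "first_cluster_lo": first_cluster_lo,
        "file_size": file_size,   # there is no second field for a valid data length
    }


# Synthetic entry for illustration: README.TXT, archive bit set, 1234 bytes long.
raw = b"README  TXT" + bytes([0x20]) + bytes(14) + struct.pack("<HI", 3, 1234)
info = parse_fat_dir_entry(raw)
```

A 32-bit length also caps a legacy FAT file at just under 4 GiB, which is one reason the single field could not simply be reused.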
Now, let’s look at exFAT:
From the Specification, we get:
|Stream Extension Directory Entry|
We see from the definition of the Stream Extension Directory Entry two lengths, not one. Just like NTFS, exFAT has two separate data lengths, each 8 bytes long, which gives a maximum theoretical file length of 16 EiB. If we look at the definitions of each:
The ValidDataLength field describes how far into the data stream user data has been written. Implementations shall update this field as they write data further out into the data stream. On the storage media, the data between the valid data length and the data length of the data stream is undefined. Implementations shall return zeroes for read operations beyond the valid data length.
If the corresponding File directory entry describes a directory, then the only valid value for this field is equal to the value of the DataLength field. Otherwise, the range of valid values for this field is:
- At least 0, which means no user data has been written out to the data stream
- At most DataLength, which means user data has been written out to the entire length of the data stream
For the DataLength field: if the corresponding File directory entry describes a directory, then the valid value for this field is the entire size of the associated allocation, in bytes, which may be 0. Further, for directories, the maximum value for this field is 256 MB.
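The rules quoted above can be sketched as a small parser and checker. The field offsets used here (ValidDataLength at byte 8, DataLength at byte 24, entry type 0xC0, NoFatChain as bit 1 of the flags byte) come from my reading of the published exFAT specification rather than from the text of this post, and "256 MB" is interpreted as 256 MiB, so treat both as assumptions. The function names are made up for illustration.

```python
import struct

STREAM_EXTENSION = 0xC0                    # assumed EntryType of an in-use Stream Extension
MAX_DIRECTORY_BYTES = 256 * 1024 * 1024    # "256 MB", read here as 256 MiB (assumption)
MAX_FILE_BYTES = 2 ** 64                   # an 8-byte length field: 16 EiB theoretical limit


def parse_stream_extension(entry):
    """Pull the two lengths out of a 32-byte Stream Extension directory entry."""
    if len(entry) != 32:
        raise ValueError("exFAT directory entries are 32 bytes")
    (entry_type, flags, _r1, name_length, name_hash, _r2,
     valid_data_length, _r3, first_cluster, data_length) = struct.unpack(
        "<BBBBHHQIIQ", entry)
    if entry_type != STREAM_EXTENSION:
        raise ValueError("not a Stream Extension entry")
    return {
        "no_fat_chain": bool(flags & 0x02),
        "name_length": name_length,
        "valid_data_length": valid_data_length,
        "first_cluster": first_cluster,
        "data_length": data_length,
    }


def check_lengths(valid_data_length, data_length, is_directory):
    """Return a list of violations of the quoted rules (empty list means OK)."""
    problems = []
    if is_directory:
        if valid_data_length != data_length:
            problems.append("directory: ValidDataLength must equal DataLength")
        if data_length > MAX_DIRECTORY_BYTES:
            problems.append("directory: DataLength exceeds 256 MB")
    elif not 0 <= valid_data_length <= data_length:
        problems.append("file: ValidDataLength must be between 0 and DataLength")
    return problems


# Synthetic entry: a 4096-byte pre-allocated file with 100 bytes written so far.
raw_entry = struct.pack("<BBBBHHQIIQ", 0xC0, 0x03, 0, 5, 0x1234, 0, 100, 0, 7, 4096)
stream = parse_stream_extension(raw_entry)
issues = check_lengths(stream["valid_data_length"], stream["data_length"], False)
```

A reader that honours the specification would then return zeroes for any offset from `valid_data_length` up to `data_length`.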
So, what does this mean?
First, exFAT supports pre-allocating a file before writing to it.
Second, a file can be pre-allocated at a very large size in an attempt to get many contiguous clusters in a single allocation. The file can then use the flag bit in its directory entry that marks it as contiguous (NoFatChain in the Specification), so the FAT chain never has to be updated. Imagine the alternative: a camera is recording a long HD movie. Three hours into the movie, it trips on a cluster already allocated to another file. To continue recording, the camera would need to go back and build the FAT chain. In other words, because the camera went 3 hours without touching the FAT, it would have to go back and fill in a FAT entry for every cluster since the beginning of the file, and depending on cluster size, that could be a lot of fixing up to do.
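To get a feel for how much fixing up that could be, here is a back-of-the-envelope calculation. The 24 Mbit/s bitrate and the 128 KiB cluster size are purely hypothetical figures chosen for illustration; the only fixed assumption is that each exFAT FAT entry is 4 bytes.

```python
FAT_ENTRY_BYTES = 4  # exFAT FAT entries are 32 bits wide


def fat_fixup_cost(file_bytes, cluster_bytes):
    """Return (clusters, FAT bytes to write) if a contiguous file must get a FAT chain."""
    clusters = -(-file_bytes // cluster_bytes)   # ceiling division
    return clusters, clusters * FAT_ENTRY_BYTES


# Hypothetical recording: 3 hours at 24 Mbit/s on 128 KiB clusters.
seconds = 3 * 60 * 60
bitrate_bits_per_second = 24_000_000
recorded_bytes = bitrate_bits_per_second * seconds // 8

clusters, fat_bytes = fat_fixup_cost(recorded_bytes, 128 * 1024)
print(f"{clusters} FAT entries ({fat_bytes} bytes) to write before recording can continue")
```

With smaller clusters the entry count grows proportionally, which is why the cost depends so heavily on cluster size.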
So, this is a feature of the exFAT file system, taken from NTFS and not currently in legacy FAT. Although legacy FAT supports the VDL (Set Valid Data Length) feature, it has only one length field in the directory record, so the VDL is also the physical EOF.