Virtual file system

Virtual File System

Introduction

The virtual file system resides on an existing file system. Applications would use the vfs if they require certain files to be “inaccessible” to the user, but accessible to the application. It does this by storing all files within a single data file, and managing the files with a single index file. Multiple index/data pairs can be referenced by the same application, allowing it to store different files in different vfs's. This also allows files to be compressed and/or encrypted in the vfs, even if the underlying file system does not support these features. In addition, applications can define and query arbitrary application-specific attributes of any file in the vfs.

VFS Structure

As mentioned, each vfs consists of a data file and an index file. The index file represents a file allocation table which manages the list of files and directories in the vfs, and their locations within the data file. The data file contains the files in the vfs. A cluster size is specified at vfs creation, and the data file is a multiple of this cluster size. Files in the vfs occupy one or more clusters in the data file; the locations of these clusters is recorded in the index file. Thus, the structure of the data file is extremely simple; all the work goes into defining the index file.

The index file is also split into “clusters”, but the cluster size in the index file is relatively small – if file attributes do not fit in a single index cluster, the data spills over into another index cluster. Each index cluster ends with a pointer to the next entry for the file the index cluster represents. If there are no more attributes, the pointer is set to 0. This works, because the vfs will only ever need to follow the chain of index clusters to retrive all attributes of a file in the vfs once it knows exactly which file is being queried. The root directory is simply an index cluster containing a list of files and pointers to the index clusters for those files. The root directory is identified as '/' and subdirectories are separated by '/'s (therefore, '/' is not a valid filename in the vfs). Once the index cluster for the root directory becomes full, the next available index cluster is used, and the root directory cluster points to the new index cluster as the next in its chain. Similarly, sub-directories can implemented by allocating an index cluster to the subdirectory, and adding an entry for the sub-directory in the index cluster of the parent directory.

Cluster id
attributes
...
attributes
Pointer to next index cluster

Attributes include: file name, file size, flags indicating if the file is compressed or encrypted, application-defined attributes, and the list of data clusters in the data file that makes up the file. If the file is large, it will contain many data-cluster pointers in its attribute list, and this will spill over into another index cluster. Another interesting feature is that since the file contents can be considered as simply another attribute of the file, the vfs can store the actual file contents in the index cluster if the file data is small enough. If the file becomes too large to store in the index cluster, or if more attributes are added to the file, the vfs can then move the data for the file into the data file.
In addition to file information, the index file stores another important piece of information: a bit-field representing each cluster in the data file, and whether it is currently used or not. This provides two benefits. Firstly, it provides a snapshot of the data file usage, so there is no need to examine any other structures to locate an available data cluster. Also, by knowing beforehand where the free data clusters are located, it is possible to store files optimally to reduce fragmentation – if the vfs is requested to create a file containing say 17 clusters worth of data, it can assign 17 consecutive clusters to that file, even if these are not the first available clusters. Thus, as long as the underlying file system does not fragment the data file much, fragmentation within the vfs can be kept to a minimum.

Typically, to improve performance, instead of resizing the data (and index) file(s) as new files are added to the vfs (or as existing files grow), applications will specify an initial size for the data file. This can then be fixed, or, the vfs can grow if needed (this can either be explicitly requested, or the vfs can anticipate growth when it notices that the data file has reached a certain percentage of its capacity, and grow by a certain factor as needed). Again, if the size of the data file grows such that the bit-field representing data cluster usage can no longer fit in a single index cluster, the bit-field can spill over into another available index cluster as needed. This allows the index file to be fairly robust and dynamic, and without too many restrictions imposed in terms of following a rigid structure – the only concrete structure that needs to be guaranteed is that the very first index cluster contains the first free-space bit-field, and the very next index cluster contains the beginning of the root directory. From here, the vfs can follow the pointers in the root-directory index cluster to build the contents of the root directory, and use these entries to locate files and sub-directories. The vfs can cache certain directory lists in memory instead of reading the contents of the index file repeatedly, although this is probably best left to the caching features of the underlying file system.

Index clusters

Index clusters simply contain a list of attributes for a file, or, in the case of a directory, the list of files in that directory. Thus, apart from a strict beginning and ending, the contents of the index cluster is very dynamic. Index-cluster size is not fixed by the vfs, and a non-default size can be specified by the application upon vfs creation. The first 32 bits contains a unique index-id, and the last 32 bits contains the index-id of the next cluster in the chain for the current file. If there are no more clusters, the last 32 bits are all 0. The remainder of the index-cluster structure is a data blob containing attributes for the file/directory. The attribute definitions do follow a set structure, but because each attribute can be an abitrary size, and different files can have arbitrary attributes, and because an index cluster could be spill-over from a previous index cluster, no assumptions can be made regarding any attribute that must be present in the structure, or its size. Attributes have a name and a value. To maximise efficiency, attribute names should be as short as possible. Applications cannot set vfs-defined attributes as they set application-defined attributes (for example, an application cannot set the filesize attribute, even though the filesize is simply another attribute of a file) – the vfs will return an error if this is attempted, to modify vfs-defined attributes, API functions are provided.

The attribute entries in the data blob have the following structure:

8 bits for length of attribute name
Attribute name
16 bits for length of attribute data
Attribute data

There is no need for NULL-termination, as the length of each field is explicitly defined already. This has the added benefit of being able to allocate buffers that are the exact size before-hand, instead of performing an extra operation to work out the length of each field copy the information from a generic large buffer into an appropriately sized buffer. This does create a problem when an attribute's name or data overflows from one cluster to another. There are two options here. The vfs could simply continue reading the remaining info from the next cluster, as it knows not only the address of the next cluster, but also the number of bytes in the complete field, and the number of bytes left to read. Alternatively, the vfs could just shift attributes that overflow to the beginning of the next cluster, and attempt to optimise cluster usage by moving smaller attributes from the next cluster to the first one, to attempt to use up the remaining space. The first option appears to be the simplest and most efficient, as it eliminates not only wastage of cluster space, but the need to perform frequent “attribute defragmentation”.

Using variable length fields for attributes becomes a problem as soon as an attribute's value is changed such that the length of the attribute changes. If the length decreases, there is now stale, useless data from the end of the attribute until the 8 bits that define the length of the next attribute. The vfs expects the data blob to contain only the length of the attribute name, the attribute name, the length of the attribute data, and the attribute data. Thus, if the length of the attribute data is , it would expect to find the next blob bytes after the beginning of the attribute data. If the length of the attribute data is decreased, this assumption becomes invalid, as the new length is , but the next blob still begins bytes after the beginning of the attribute data. To solve this, the data blob is redifined as follows:

32 bits for length of data blob
8 bits for length of attribute name
Attribute name
16 bits for length of attribute data
Attribute data

By specifying the length of the data blob at the beginning of the blob, the point where the next blob begins is known, regardless of the lengths of the attribute name and data fields. However, if an attribute's length increases, the problem is now that the data could overflow into the next blob, corrupting it. In this case, the solution is either to move the next blob out, or to move the current attribute blob elsewhere, to a location that has enough space to hold the new data. If the new data would overwrite only the next blob, then the next blob is moved out, either into slack space of another blob whose data length had been decreased, or to the end of the index cluster chain. If the new data would overwrite more than one blob, then the blob that is being modified is the one to be moved out. The only exception to this is if the blob being modified is the first blob in the first index cluster of the file, in which case as many blobs as necessary will be moved to make space for the new attribute data.

VFS API (subject to change)

All operations on virtual file systems occur through a custom API. The VFS API exposes ways to create, open, close, and access virtual file systems. The vfs functions return VFSRESULTs, HVFSs or HVFILEs, depending on the operation being performed. VFSRESULTs are return codes defined by the vfs api, which allows for more detailed results to be returned by operations, instead of simply 'true' or 'false' indicating a successful operation or a failure. In this way, if an operation fails, the application requesting the operation will have a more detailed reason as to why it failed. An HVFS is a handle to a virtual file system. HVFSs are returned by the OpenVirtualFileSystem call, and are used to identify the vfs being accessed. In this way, an application may access multiple virtual file systems simultaneously. An HVFILE is a handle to a file in the vfs. For the functions that do not return a VFSRESULT, the GetLastVFSError function will return the last error result that caused a function to fail.

VFSRESULT CreateVirtualFileSystem(char* szPhysicalPath,
char* szVFSName,
DWORD dwDataSize, bool bAutoGrow,
WORD dwClusterSize = 4096, DWORD dwAverageFileSizeKB = 0, DWORD dwAverageNumberOfFiles = 0);

CreateVirtualFileSystem creates a new virtual file system at the specified physical path, with the specified vfs name. dwDataSize specifies a size, in bytes, for the data file (i.e. the size of the vfs). If this file size is not a multiple of the dwClusterSize parameter, it will be rounded up to the next multiple of dwClusterSize. If bAutoGrow is true, the vfs will grow as the vfs nears capacity. The dwAverageFileSizeKB and dwAverageNumberOfFiles parameters are optional, and simply provide hints to the vfs as to how big the index file should be.

HVFS OpenVirtualFileSystem(char* szPhysicalPath,
char* szVFSName,
DWORD dwFlags);

OpenVirtualFileSystem opens the specified vfs and returns a handle to the vfs that an application can use to access it. If the vfs could not be opened, the function will return NULL. Once the vfs has been opened, applications can use the vfs handle that OpenVirtualFileSystem returns, to access the files within the vfs. Once the application is done with the vfs, it must close the vfs using the appropriate function. Applications can specify various options in the dwFlags parameter, such as how locked files should be handled, etc.

VFSRESULT CloseVirtualFileSystem(HVFS hvfs);

CloseVirtualFileSystem closes all the resources associated with an open virtual file system. Note however, that since an application can call OpenVirtualFileSystem multiple times, and that the vfs is only actually opened once, the application must call CloseVirtualFileSystem for each time that it opened it. CloseVirtualFileSystem decreases a reference count that is incremented by OpenVirtualFileSystem – when this count reaches zero, the file system is closed and the resources are released.

HVFILE VFSOpenFile(HVFS hvfs,
char* szPathToFile,
char* szFileName,
DWORD dwFlags);

VFSOpenFile opens a file in the specified virtual file system and returns a handle to the opened file that the application can use to perform operations on the file. Flags defining the access level required can be passed in, for example, to prevent the file from being opened a second time, or to prevent the file from being deleted while open. If the path is NULL, the current directory is used.

VFSRESULT VFSCloseFile(HVFILE hvfile);

VFSCloseFile closes a file in a virtual file system, and releases any resources held by the file. Note that an application can open the same file multiple times. Each open call must be matched by a close call, as the vfs will only release resources once the last open has been closed.

HVFILE VFSCreateFile(HVFS hvfs,
char* szPathToFile,
char* szFileName,
DWORD dwAttributes);

VFSCreateFile creates a file in the specified vfs, in the specified path, with the specified file name. Certain commom attributes can also be provided. VFSCreateFile also opens the file by default, and returns a handle to the opened file. If the application decides to merely create an empty file, it must call VFSCloseFile as well. If the path is NULL, the current directory is used.

VFSRESULT VFSDeleteFile(HVFILE hvfile);

VFSDeleteFile will delete an open file from a vfs. Since this version of VFSDeleteFile operates on an open file handle, the application may have opened the file multiple times, and may be using the same handle elsewhere. Any attempts to use the handle of a deleted file will return an error. Note that VFSDeleteFile will fail if the file was opened with a flag preventing it from being deleted while open.

VFSRESULT VFSDeleteFile(HVFS hvfs,
char* szPathToFile,
char* szFileName);

This version of VFSDeleteFile deletes the specified file from the specified vfs. If the file is open already, and does not have any restrictions on being deleted while open, the file is deleted, and the existing handles to it become invalid. Otherwise, the function fails. If the path is NULL, the current directory is used.

VFSRESULT VFSSetAttribute(HVFILE hvfile,
char* szAttributeName,
DWORD dwAttributeSize,
void* pvAttributeData);
VFSRESULT VFSSetAttribute(char* szPathToFile,
char* szFileName,
char* szAttributeName,
DWORD dwAttributeSize,
void* pvAttributeData);

VFSSetAttribute creates or changes an arbitrary attribute of a file in a vfs. Note that some vfs-defined attributes (e.g. Filesize) cannot be changed. Note that specifying 0 for dwAttributeSize and NULL for pvAttributeData will remove the attribute.

VFSRESULT VFSGetAttribute(HVFILE hvfile,
char* szAttributeName,
DWORD dwBufferSize,
void* pvAttributeDataBuffer);

VFSRESULT VFSGetAttribute(char* szPathToFile,
char* szFileName,
char* szAttributeName,
DWORD dwBufferSize,
void* pvAttributeDataBuffer);

VFSGetAttribute retrieves the specified attribute for the specified file, and copies it into the specified buffer. If the buffer is too small, it is filled to capacity, but an error code is returned.

VFSRESULT VFSRemoveAttribute(HVFILE hvfile,
char* szAttributeName);

VFSRESULT VFSRemoveAttribute(char* szPathToFile,
char* szFileName,
char* szAttributeName);

VFSRemoveAttribute removes the specified attribute. This is equivalent to calling VFSSetAttribute with 0 for dwAttributeSize and NULL for pvAttributeData.

DWORD VFSGetFileSize(HVFILE hvfile);
DWORD VFSGetFileSize(char* szPathToFile,
char* szFileName);

VFSGetFileSize returns the file size in bytes of the specified file. The file can be specified either by a physical name, or a file handle. If the file is currently open, this function returns the latest filesize available.

DWORD VFSGetFilePos(HVFILE hvfile);

VFSGetFilePos returns the current position of the specified file from the beginning of the file. This is the position from which data will be read by VFSReadFile, and written by VFSWriteFile.

VFSRESULT VFSSetFilePos(HVFILE hvfile,
long dwOffset,
int dwOrigin);

VFSSetFilePos will move the current position in the specified file to the specified offset from the specified origin. The origin can be the beginning of the file, the current position, or the end of the file. Negative offsets are allowed, but the function will fail if a negative offset is passed with the beginning of the file specified as the origin. VFSSetFilePos will allow an application to set the file position past the current end-of-file. In this case, the contents of the file from the existing end-of-file to the new file position remain unchanged, and are unpredictable.

VFSRESULT VFSReadFile(HVFILE hvfile,
void* pvBuffer,
DWORD dwBufferSize,
DWORD dwBytesToRead,
DWORD* pdwBytesRead,
TRANSFORMPROC pfnTransform = NULL);

VFSReadFile reads a specified number of bytes from the specified file into the specified buffer. The buffer size must at least as large as the number of bytes to read. If the buffer is too small, data will be read in up to the size of the buffer. The function returns the number of bytes actually read via the pdwBytesRead parameter. Depending on how the vfs was opened, if the file specified to be read is locked at the time of the read request, the function will either return immediately with an error, or it will block until the file is unlocked. The TransformProc parameter specifies an optional function to apply transforms to data once it is read, e.g. Custom decryption / decompression algorithms. If this parameter is present, the data is read from the vfs, and passed through the transform function, before having the output from the transform function copied into the buffer that is returned to the application.

VFSRESULT VFSWriteFile(HVFILE hvfile,
void* pvBuffer,
DWORD dwBufferSize,
DWORD dwBytesToWrite,
DWORD* pdwBytesWritten,
TRANSFORMPROC pfnTransform = NULL);

VFSWriteFile writes a given number of bytes from a buffer to the specified file. The number of bytes actually written is returned via the pdwBytesWritten parameter. If the buffer is smaller than the number of bytes requested to write, the function will copy up to the buffers size into the file. Depending on how the vfs was opened, if the file specified to be written is locked at the time of the write request, the function will either return immediately with an error, or it will block until the file is unlocked. The TransformProc parameter specifies an optional function to apply transforms to data before it is written, e.g. Custom encryption / compression algorithms. If this parameter is present, the data is passed through the transform function first, and the output from the transform function is written to the vfs.

bool VFSIsFileLocked(HVFILE hvfile);

VFSIsFileLocked returns whether a given file is locked for writing or not.

VFSRESULT VFSCreateDirectory(HVFS hvfs,
char* szParentDirectoryName,
char* szNewDirectoryName);

VFSCreateDirectory creates a new directory under the specified parent directory.

VFSRESULT VFSSetCurrentDirectory(HVFS hvfs,
char* szPathName);

VFSSetCurrentDirectory sets the current directory for the specified vfs.

VFSRESULT VFSGetCurrentDirectory(HVFS hvfs,
char* szBuffer,
DWORD dwBufferSize);

VFSGetCurrentDirectory retrieves the current directory for the specified vfs.

VFSRESULT VFSCopyFileFromVfsToVfs(HVFILE hvfile,
HVFS hDestVFS,
char* szDestPath,
char* szDestName,
DWORD dwFlags);

VFSRESULT VFSCopyFileFromVfsToVfs(HVFS hSourceVFS,
char* szSourcePath,
char* szSourceName,
HVFS hDestVFS,
char* szDestPath,
char* szDestName,
DWORD dwFlags);

VFSCopyFileFromVfsToVfs copies an existing file in a vfs to another location in possibly another vfs. If the destination VFS handle is null, the file is copied to the specified location in the current vfs. If the destination directory does not exist, the function fails. If the source or destination directories are null, the current directory is substituted.

VFSRESULT VFSCopyFileFromVfsToUfs(HVFILE hvfile,
char* szDestUFSPath,
char* szDestUFSName,
DWORD dwFlags);
VFSRESULT VFSCopyFileFromVfsToUfs(char* szSourcePath,
char* szSourceName,
char* szDestUFSPath,
char* szDestUFSName,
DWORD dwFlags);

VFSCopyFileFromVfsToUfs copies a file from the vfs to a native underlying file system.

VFSRESULT VFSCopyFileFromUfsToVfs(char* szSourceUFSPath,
char* szSourceUFSName,
HVFS hvfs,
char* szDestVFSPath,
char* szDestVFSName,
DWORD dwFlags);

VFSCopyFileFromUfsToVfs copies a file from a native underlying file system into the specified vfs.

Concurrent Operations

Since an application can open the same vfs multiple times, it is quite possible that different threads of the application may try to access the vfs simultaneously. Because the vfs is a component of the application, and not of the operating system, an application does not know if another application has opened a vfs, and, more importantly, applications cannot synchronise operations to prevent two applications from modifying the same file simultaneously. Because of this, when an application opens a vfs, it will open it exclusively at the OS file system level, preventing another application from opening the vfs. This leaves the only concurrent operations needed to synchronise being those requested by the same application. Note that Read and Write operations only return once complete. While it is possible for writes to cache data in memory and actually write the data to disk later, the file system hosting the vfs would be able to optimise this much better than the vfs would, so it would be redundant to cache writes in the vfs.

Obviously, there's no harm in multiple concurrent reads, the problem arises when one thread is writing to a file, and another thread attempts to access the same file, either reading or writing. When the application makes a write request, the vfs checks that the file isn't locked already, and if not, it locks the file to reads and writes until the write completes. This leaves two options for read/write requests that come through while a file is locked for writing: either fail the operation with an appropriate error message, or block the request until the file is unlocked, at which time the request is processed and the read/write operation succeeds. Because different applications may have different requirements, both options will be supported by the vfs, and applications can choose which method they require when they open the vfs. Applications can test if a file is locked for writing by calling the VFSIsFileLocked function. Note however, that in the time between VFSIsFileLocked returning and any subsequent action, another thread may make a write request which locks the file. Since multiple threads could request operations on the same file, the vfs will queue requests in the order they are issued, if the application requires operations to block until they are able to complete.

Comments

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options