Programming to the Linux Filesystem

From Developer Community

This article provides information on how to program to the Linux filesystem in C++.


   This article is a stub. If you have experience with or knowledge about this topic you can help by adding to it.

Note: This article is related to the Fissle project.

Linux Filesystem Basics

When we talk about the Linux filesystem, there are actually two different things we could be talking about:

  1. We could be talking about the code within the operating system that reads and writes file information to and from the disk and maintains the organization of the data. There are several choices for this type of filesystem available for Linux; see the Linux Filesystem Overview for more information.
  2. We could be talking about the general organization of data on Linux from a user's perspective as compared to other operating systems, like Microsoft Windows.

In the case of this article, we are primarily talking about the second item. Regardless of what software is used to actually manage the file information, all Linux filesystems look pretty much the same to the end user.

Essentially, the filesystem is made up of files. This seems obvious, but the point is that on Linux, all things in the filesystem are technically some type of file, including directories. The organization of the filesystem begins with "/", the root directory. Everything else in the filesystem is ultimately contained within the root directory in a tree-like organization. Besides regular files (e.g. text files, executables, scripts, etc.) and directories, there are other types of files like domain sockets, devices, named pipes, symbolic links, and others.

Linux filesystems support multiple physical and logical volumes, partitions, network mounted volumes, etc. - pretty much everything you would expect from an enterprise-quality filesystem. Volumes are mounted at directory locations in the filesystem. For example, /share may be a mount point for a network-mounted shared drive; /home may be the mount point for a different physical drive than the rest of the filesystem. This is different from Microsoft Windows, which would designate a different drive letter (e.g. F:) for each mounted volume. Because every volume hangs off the same tree, it is easy to search the entire space of files available on the system via a recursive search beginning at the root directory ("/"). Not all the files may be local, though, so be judicious!

Refer to the Linux Filesystem Overview for detailed information.

Programming to the Linux Filesystem in C++

The C++ standard library offers basic support for common file actions like opening, closing, reading, and writing files through its stream classes (for example, fstream). The library implementation takes care of platform differences for you in these areas.

However, there are still many things that are not natively supported by C++ that you may want to do in an application. This section covers some of those topics.

File Attribute Tests

Getting File Status Information

There are many pieces of information about a file you may wish to access within an application, such as:

  • File Type - directory, regular file, socket, etc.
  • IDs of user and group that own the file
  • File size
  • Time of last access or modification

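In C, this is done via the stat() function. This function looks up the filesystem information for a file and fills in a structure (struct stat) that you pass to it. A closely related function, lstat(), does the same thing but does not follow symbolic links; the examples below use it. See the stat man page for detailed information.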
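The fields listed above map onto members of struct stat. As a minimal sketch (the FileSummary type and summarize() function are hypothetical names introduced here for illustration, not part of the article's code), the owner IDs, size, and timestamps could be copied out like this:

```cpp
#include <sys/types.h>
#include <sys/stat.h>
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical summary type, introduced only for this sketch.
struct FileSummary
{
    bool   found;     // false if stat() failed
    uid_t  owner;     // from st_uid
    gid_t  group;     // from st_gid
    off_t  size;      // from st_size, in bytes
    time_t accessed;  // from st_atime
    time_t modified;  // from st_mtime
};

FileSummary summarize(const std::string& strPath)
{
    FileSummary s = FileSummary();   // value-initialized: found == false, rest 0
    struct stat st;
    if ( 0 == stat(strPath.c_str(), &st) )
    {
        s.found    = true;
        s.owner    = st.st_uid;
        s.group    = st.st_gid;
        s.size     = st.st_size;
        s.accessed = st.st_atime;
        s.modified = st.st_mtime;
    }
    return s;
}
```

A caller would check the found member before trusting the other fields, since they are all zero when stat() fails.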

Here's a simple example of a C function that gets the file mode.

    mode_t getMode(const char* strPath)
    {
        struct stat _stat;
        if ( 0 == lstat(strPath, &_stat) )
        {
            return _stat.st_mode;
        }
        return 0;
    }

This function simply uses lstat() to get the status information on a file path that is passed in. It returns 0 if it fails and returns the mode otherwise.

This isn't exactly what we want in C++, however. It's nice to have an object that takes care of some of this for us. Here's another example, this time in C++.

    mode_t getMode(const string& strPath) throw (invalid_argument)
    {
        struct stat _stat;
        if ( 0 > lstat(strPath.c_str(), &_stat) )
        {
            throw invalid_argument("Unable to stat file");
        }
        return _stat.st_mode;
    }

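In this example, we call lstat() as before but take a C++ string object instead of a char*. If we can't stat the file, we throw an exception (invalid_argument, one of the standard C++ exceptions) instead of returning 0. Throwing an exception is less error-prone and more informative, because a return value of 0 is easy to ignore or to mistake for a mode. The function signature also lists the exception it might throw. Be aware that the compiler does not check this list at compile time: a caller that fails to handle the exception gets no warning, and an exception that is not in the list terminates the program at run time by default. Dynamic exception specifications like this one were deprecated in C++11 and removed in C++17, so with a modern compiler you would drop the throw (...) clause and document the exception in a comment instead.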
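As a hedged alternative, the exception can also carry the reason lstat() failed. The sketch below assumes a C++11 or later compiler and uses std::system_error in place of invalid_argument; the getModeOrThrow() name is hypothetical and this is not the article's own function.

```cpp
#include <sys/types.h>
#include <sys/stat.h>
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical variant of getMode(): throws std::system_error so the
// caller can inspect the errno value that lstat() reported.
mode_t getModeOrThrow(const std::string& strPath)
{
    struct stat st;
    if ( 0 > lstat(strPath.c_str(), &st) )
    {
        // errno is read immediately after the failing call.
        throw std::system_error(errno, std::generic_category(),
                                "lstat " + strPath);
    }
    return st.st_mode;
}
```

A caller can then catch std::system_error and compare e.code().value() against constants such as ENOENT or EACCES to tell a missing file from a permission problem.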

In C++, I prefer functions contained within classes rather than free-floating. In this case, I want to create a more general-purpose class that I'm calling LinuxFileInfo. It will include this code, plus some higher-level code that is more useful to the consumer of the class.

Here's a version of the LinuxFileInfo class.

class LinuxFileInfo
{
public:
    static bool isDirectory(const string& strPath) throw (invalid_argument)
    {
        return S_ISDIR(getMode(strPath));
    }
    static bool isRegularFile(const string& strPath) throw (invalid_argument)
    {
        return S_ISREG(getMode(strPath));
    }
protected:
    static mode_t getMode(const string& strPath) throw (invalid_argument)
    {
        struct stat _stat;
        if ( 0 > lstat(strPath.c_str(), &_stat) )
        {
            throw invalid_argument("Unable to stat file");
        }
        return _stat.st_mode;
    }
};

Note that the code that actually gets the file mode is now in the protected interface. Instead, I created two new public functions, isDirectory() and isRegularFile(). Each of these functions passes the result of getMode() to a POSIX macro that tests the mode to see if it is a file of a certain type, for example, S_ISREG() to see if the mode represents a regular file. Because getMode() uses lstat(), a symbolic link is reported as a link rather than as its target, so both functions return false for a symbolic link. In addition, both of these functions are static, so I don't need an instance of LinuxFileInfo in order to get the information I'm looking for.

Of course, not all file types are represented here, but it would be pretty easy to add them.
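As a sketch of how the remaining types could be covered, the other POSIX test macros (S_ISLNK, S_ISCHR, S_ISBLK, S_ISFIFO, S_ISSOCK) follow the same pattern. The describeType() function below is a hypothetical helper, written as a free function without a throw (...) clause so that it stands alone.

```cpp
#include <sys/types.h>
#include <sys/stat.h>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: names the type of the file at strPath.
// Uses lstat(), so a symbolic link is reported as a link.
std::string describeType(const std::string& strPath)
{
    struct stat st;
    if ( 0 > lstat(strPath.c_str(), &st) )
    {
        throw std::invalid_argument("Unable to stat file");
    }
    if ( S_ISREG(st.st_mode) )  return "regular file";
    if ( S_ISDIR(st.st_mode) )  return "directory";
    if ( S_ISLNK(st.st_mode) )  return "symbolic link";
    if ( S_ISCHR(st.st_mode) )  return "character device";
    if ( S_ISBLK(st.st_mode) )  return "block device";
    if ( S_ISFIFO(st.st_mode) ) return "named pipe";
    if ( S_ISSOCK(st.st_mode) ) return "socket";
    return "unknown";
}
```

Each branch could equally become its own static function in the class, in the style of isDirectory() and isRegularFile().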

With the LinuxFileInfo class, it is easy to check what type of file I have. I just call the corresponding function and it returns true or false, throwing an exception if the file cannot be examined, like this:

        try {
            if ( LinuxFileInfo::isRegularFile(candidate) )
            {
                // do something
            }
            else if ( LinuxFileInfo::isDirectory(candidate) )
            {
                // do something different
            }
        }
        catch ( const invalid_argument& ia )
        {
            cerr << "Caught exception on path " << candidate << ":  " << ia.what() << endl;
        }

Given a string object candidate that contains a file path, I can go into this block of code and determine whether that path points to a regular file or a directory.

Getting File Access Information

Another thing you may want to do with a file is to determine whether a user has the right to access it in a certain way, for example, to write it or execute it. In C this is done with the access() function, which checks a supplied pathname against a supplied access mask to see whether that access is allowed. The check is made with the calling process's real user and group IDs rather than its effective IDs. See the access man page for more information.
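Since the second argument to access() is a mask, several permissions can be tested in one call by OR-ing R_OK, W_OK, and X_OK together. The following is a minimal sketch of a direct call; the checkAccess() name is hypothetical.

```cpp
#include <unistd.h>
#include <cassert>
#include <string>

// Hypothetical helper: true if every permission in mask is granted.
// F_OK (which is 0) tests only that the path exists.
bool checkAccess(const std::string& strPath, int mask)
{
    // Reject bits that are not permission flags.
    assert((mask & ~(R_OK | W_OK | X_OK)) == 0);
    return 0 == access(strPath.c_str(), mask);
}
```

For example, checkAccess(path, R_OK | W_OK) is true only when the path is both readable and writable. The answer describes the moment of the call; permissions can change before the file is actually opened.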

Here's an example of a function wrapper for access() in C++.

    bool canAccess(const string& strPath, int mode) throw ( invalid_argument, length_error )
    {
        assert(mode == F_OK || mode == R_OK || mode == W_OK || mode == X_OK);
        
        if (mode != F_OK && mode != R_OK && mode != W_OK && mode != X_OK)
        {
            // Reachable only when assert() is compiled out (NDEBUG defined).
            throw invalid_argument("Unknown file mode");
        }
        if (strPath.length()<1)
        {
            throw length_error("File path string too short");
        }
        return 0 == access(strPath.c_str(), mode);
    }

This function returns true if the user has the rights specified by mode to the path supplied; false otherwise. Note that again we indicate the exceptions we will throw if there is a problem. Between the exceptions and the assert() at the top of the function, we can be fairly confident that access() is only invoked with a valid mode and a non-empty path.

We can add this function to LinuxFileInfo just like the other one. Here's a new version of LinuxFileInfo with the access functions added:

class LinuxFileInfo
{
public:
    static bool canRead(const string& strPath) throw (length_error)
    {
        bool rv = false;
        try
        {
            rv = canAccess(strPath, R_OK);
        }
        catch ( const invalid_argument& e )
        {
            cerr << e.what() << endl;
        }
        return rv;
    }
    static bool canExecute(const string& strPath) throw (length_error)
    {
        bool rv = false;
        try
        {
            rv = canAccess(strPath, X_OK);
        }
        catch ( const invalid_argument& e )
        {
            cerr << e.what() << endl;
        }
        return rv;
    }
    static bool canWrite(const string& strPath) throw (length_error)
    {
        bool rv = false;
        try
        {
            rv = canAccess(strPath, W_OK);
        }
        catch ( const invalid_argument& e )
        {
            cerr << e.what() << endl;
        }
        return rv;
    }
    static bool exists(const string& strPath) throw (length_error)
    {
        bool rv = false;
        try
        {
            rv = canAccess(strPath, F_OK);
        }
        catch ( const invalid_argument& e )
        {
            cerr << e.what() << endl;
        }
        return rv;
    }
    static bool isDirectory(const string& strPath) throw (invalid_argument)
    {
        return S_ISDIR(getMode(strPath));
    }
    static bool isRegularFile(const string& strPath) throw (invalid_argument)
    {
        return S_ISREG(getMode(strPath));
    }
protected:
    static bool canAccess(const string& strPath, int mode) throw ( invalid_argument, length_error )
    {
        assert(mode == F_OK || mode == R_OK || mode == W_OK || mode == X_OK);
        
        if (mode != F_OK && mode != R_OK && mode != W_OK && mode != X_OK)
        {
            // Reachable only when assert() is compiled out (NDEBUG defined).
            throw invalid_argument("Unknown file mode");
        }
        if (strPath.length()<1)
        {
            throw length_error("File path string too short");
        }
        return 0 == access(strPath.c_str(), mode);
    }
    static mode_t getMode(const string& strPath) throw (invalid_argument)
    {
        struct stat _stat;
        if ( 0 > lstat(strPath.c_str(), &_stat) )
        {
            throw invalid_argument("Unable to stat file");
        }
        return _stat.st_mode;
    }
};

As in the example before, we moved canAccess() into the protected interface and we call into it with more meaningful functions, like canWrite() or exists(). Not only do these functions simplify the process of finding out what we want to know, they also make the invoking code more legible. For example:

        if ( LinuxFileInfo::canRead(candidate) )
        {
            // do something
        }

In this example, we do something if the path represented in the C++ string object "candidate" is readable. The code reads like this: "If can read candidate then do something." It is almost a real sentence and is easy to read.

Reading Directory Contents

If your application needs to examine the entries in a directory, you need to use the opendir(), readdir(), and closedir() functions (see the opendir, readdir, and closedir man pages).

Here's an example of this in C:

    const char* path = "/home";
    DIR* d = opendir( path );
    struct dirent* dirp;
    if (d)
    {
        while ( (dirp = readdir(d)) != NULL )
        {
            // do something
        }
        closedir(d);   // close only a directory that was actually opened
    }

This code simply tries to open a directory, and if successful, reads all the entries in the directory in a loop and allows you to do something with each entry. You might use this to look for a file of a given name, open all the files in a directory to look for contents in the files, change the permissions of all the entries, or recursively descend into subdirectories, for example.

Here's an example of a function in C++ that you can use to print any file matching a supplied pattern in a directory and all subdirectories.

typedef bool         (*file_match_function)    ( const string&, const string& );
void findMatch( const string& pattern, const string& path, file_match_function match )
{
    DIR* d = opendir( path.c_str() );
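    struct dirent* dirp;   // plain local; a static would be shared by every call
    vector<string> dirs;
    if (! d)
    {
        int err = errno;   // save errno before stream output can change it
        cerr << "Unable to open directory " << path << endl;
        cerr << "Error is " << strerror(err) << "(" << err << ")" << endl;
        return;
    }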
    while ( (dirp = readdir(d)) != NULL )
    {
        if ( 0 == strcmp( ".", dirp->d_name ) ||
             0 == strcmp( "..",dirp->d_name ) ) continue; // skip . and ..
        string candidate( path + "/" + dirp->d_name );
        try {
            if ( LinuxFileInfo::isRegularFile(candidate) )
            {
                if ( LinuxFileInfo::canRead(candidate) && match(pattern, candidate) )
                {
                    cout << candidate << endl;
                }
            }
            else if ( LinuxFileInfo::isDirectory(candidate) )
            {
                dirs.push_back( candidate );
            }
        }
        catch ( const invalid_argument& ia )
        {
            cerr << "Caught exception on path " << candidate << ":  " << ia.what() << endl;
        }
    }
    closedir(d);
    for ( vector<string>::const_iterator ci=dirs.begin(); ci!=dirs.end(); ++ci )
    {
        findMatch( pattern, *ci, match );
    }
}

In this example, the findMatch function takes three arguments. The first is a C++ string that represents the thing we are trying to match. The second is the directory where we are going to look. The third is a function pointer to a function used to test whether a file matches the pattern. This allows us to match in different ways. You'll also notice this uses the LinuxFileInfo class we described above.
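The article does not show a matching function, so here is one possible sketch with the signature that file_match_function expects. It assumes shell-style wildcard patterns and applies the POSIX fnmatch() function to the last component of the path; the globMatchesName() name is hypothetical.

```cpp
#include <fnmatch.h>
#include <cassert>
#include <string>

// Hypothetical matcher: true if the file name (the part of path after
// the last '/') matches the shell-style wildcard pattern.
bool globMatchesName(const std::string& pattern, const std::string& path)
{
    std::string::size_type slash = path.rfind('/');
    std::string name = (slash == std::string::npos)
                           ? path
                           : path.substr(slash + 1);
    // fnmatch() returns 0 on a match.
    return 0 == fnmatch(pattern.c_str(), name.c_str(), 0);
}
```

It could be passed as the third argument, for example findMatch("*.txt", "/home", globMatchesName). A different matcher, such as a substring or regular-expression test, would plug in the same way.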

The findMatch function opens the directory and examines each directory entry. If the entry is a regular file, is readable, and matches our pattern according to the function pointed to by match, we print its path.

If the entry is a directory instead, we push it into a C++ vector container. Note that we don't recurse at that point. We finish examining the entries, close the directory, and only then iterate through the vector to recurse. This way each directory is closed before we move on to the next one. If we recursed while the directory was still open, a deep directory tree would hold one open directory stream per level, and the process could run out of file descriptors.
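One way to make sure closedir() always runs, even on an early return or an exception, is to tie it to an object's destructor. The following is a minimal sketch under that assumption; the ScopedDir class and the entryNames() function are hypothetical names, not part of the article's code.

```cpp
#include <dirent.h>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical RAII wrapper: the directory stream is closed when the
// object goes out of scope.
class ScopedDir
{
public:
    explicit ScopedDir(const std::string& path) : d_(opendir(path.c_str())) {}
    ~ScopedDir() { if (d_) closedir(d_); }
    bool isOpen() const { return d_ != NULL; }
    // Returns the next entry, or NULL at the end or if the open failed.
    dirent* next() { return d_ ? readdir(d_) : NULL; }
private:
    ScopedDir(const ScopedDir&);             // not copyable
    ScopedDir& operator=(const ScopedDir&);  // not assignable
    DIR* d_;
};

// Collects the entry names in a directory, skipping "." and "..".
// Returns an empty vector if the directory cannot be opened.
std::vector<std::string> entryNames(const std::string& path)
{
    std::vector<std::string> names;
    ScopedDir dir(path);
    while ( dirent* entry = dir.next() )
    {
        if ( 0 == std::strcmp(".", entry->d_name) ||
             0 == std::strcmp("..", entry->d_name) ) continue;
        names.push_back(entry->d_name);
    }
    return names;   // dir is closed here by its destructor
}
```

Because the stream is closed as soon as entryNames() returns, a caller can recurse into the returned names without keeping the parent directory open.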

Reading and Writing Files

Creating, Renaming, and Deleting Files

Links and Symbolic Links

© 2008 Novell, Inc. All Rights Reserved.