Introduction
FLAIM is a FLexible Adaptable Information Management database engine for traditional as well as volatile and complex information. Even though FLAIM provides many traditional database features (e.g., transactions, recovery, reliability, scalability), it was conceived with a broader view toward the greater flexibility and adaptability that is offered by an XML data model. FLAIM is not new; various products have used FLAIM for over 15 years. For instance, Novell’s scalable, reliable directory and collaboration products, eDirectory and GroupWise, both use FLAIM as the data store, with user licenses totaling well into the hundreds of millions.
XFLAIM 5 is the next generation of the FLAIM technology and builds on the proven FLAIM 4 engine used by Novell's eDirectory products. Since FLAIM 4 was conceived with a view toward flexibility and adaptability, it was a logical step to move to a full XML-based engine. Most of the features and concepts that existed in FLAIM 4 also exist in XFLAIM 5, with variations to support the XML/DOM model.
FLAIM has been ported to a wide variety of 32 bit and 64 bit platforms, including SUSE Linux Enterprise Server 9, OpenSUSE 10.0, NetWare 5.1, NetWare 6.x, Windows NT4, Windows 2000, Windows XP, Fedora Core 4, Ubuntu, Solaris 10, AIX 5L, Mac OS X, and HP/UX 11. There is 64 bit support for all of the Linux, Unix, and Unix-like (Mac OS X) platforms, as well as Windows versions that support 64 bit application development.
FLAIM and XFLAIM are embeddable technologies, similar to other embeddable database engines such as SQLite and Sleepycat/Oracle's Berkeley DB. To access the functionality offered by FLAIM or XFLAIM, an application merely needs to link against the library (either statically or dynamically).
FLAIM Features
Transactions
- Transaction begin, commit, abort. Use of rollback log for transaction abort and for recovery after a crash.
- Transaction types:
- Update. Update, read, and query operations allowed.
- Read. Only read and query operations allowed. Read transactions provide a read consistent snapshot of the database as of the point in time the transaction is started.
- Automatic. Single update operations may be told to automatically begin and end (commit or abort) a transaction if no transaction has been explicitly started.
- Automatic rollback of failed transactions (due to application failures or CPU failures).
- Periodic checkpoints to minimize recovery time after a system crash.
- No limit on size of update transactions.
- ACID principles supported: Atomicity, Consistency, Isolation, Durability.
- Group Commit allows multiple update transactions to be committed to disk at once, improving update performance.
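The group-commit idea can be sketched with a hypothetical `LogWriter` that queues committed transactions and makes all of them durable with a single flush. The class and method names are illustrative only, not FLAIM's API. A real engine would also block each committing thread until the flush covering its transaction completes.

```python
class LogWriter:
    """Hypothetical log writer illustrating group commit."""

    def __init__(self):
        self.pending = []     # committed transactions not yet on "disk"
        self.durable = []     # transactions already made durable
        self.flush_count = 0  # number of simulated disk flushes

    def commit(self, txn_id):
        # Committing only queues the transaction for the next flush.
        self.pending.append(txn_id)

    def flush(self):
        # One flush makes every queued transaction durable at once.
        if self.pending:
            self.durable.extend(self.pending)
            self.pending.clear()
            self.flush_count += 1


writer = LogWriter()
for txn in (101, 102, 103):
    writer.commit(txn)
writer.flush()
print(writer.durable, writer.flush_count)  # [101, 102, 103] 1
```

Three commits cost one flush instead of three, which is where the update-performance gain comes from.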
Roll-forward Logging
- Use of roll-forward log to minimize data that has to be written to commit a transaction.
- Roll-forward log is used in automatic recovery after a crash. Transactions that were committed since the last checkpoint will be redone.
- Multiple roll-forward log files may be used to support continuous backup feature. Files are numbered sequentially and are also identified with serial numbers to guarantee proper sequencing - no spoofing. Up to 4 billion log files supported - capacity is practically unlimited.
- Option to use only a single roll-forward log file - for applications that do not care about continuous backup.
- Roll-forward log files may be stored on a separate disk from rest of database.
- Minimal transaction logging. Only deltas logged for record modifies. Only DRNs logged for record deletes.
- Aborted transactions can be logged for debug purposes, but default is to not log them.
- Support for logging of application data.
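The sequencing guarantee for roll-forward log files can be sketched as a validation pass. This assumes, hypothetically, that each log file header records the serial number of the database it belongs to and the file's sequence number; FLAIM's real header layout is not described here.

```python
from dataclasses import dataclass


@dataclass
class LogFileHeader:
    # Hypothetical header fields, for illustration only.
    db_serial: str
    file_number: int


def check_log_sequence(db_serial, headers, first_expected=1):
    """Return True if every log file belongs to this database and the files
    are numbered consecutively from first_expected, with no gaps or repeats."""
    expected = first_expected
    for header in headers:
        if header.db_serial != db_serial:
            return False  # file belongs to a different database
        if header.file_number != expected:
            return False  # gap, duplicate, or out-of-order file
        expected += 1
    return True
```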
Database Reliability and Recovery
- Automatic database recovery after a system crash. Rollback log is used to roll database back to last consistent checkpointed state. Then roll-forward log is used to redo transactions that were committed after the last checkpoint.
- Recovery is idempotent. That is, if a crash occurs during recovery, recovery resumes when the database is next opened.
- Reliability has been tested using an automated pull-the-plug test, which randomly cycles the power on the server during high volume updates to test database recovery. Thousands of pull-the-plug iterations have been performed.
- Handling of disk-full conditions and other disk errors. Database attempts to stall new update transactions until disk-full condition is resolved - without requiring a shutdown.
- Protection against media failure. Customers can take hot backups and put roll-forward logs on a different volume than the database. If they do these things, two simultaneous disk failures would be required to lose any data.
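The two-phase recovery described above (roll back to the last checkpoint, then redo committed transactions) can be sketched on an in-memory stand-in for the database. The sketch assumes redo records carry complete new block contents rather than increments; that is what makes replaying them a second time harmless, so running recovery twice gives the same result. The data layout is hypothetical.

```python
def recover(blocks, rollback_log, roll_forward_log):
    """Sketch of two-phase crash recovery.

    blocks:           dict block_number -> contents found on disk after the crash
    rollback_log:     dict block_number -> contents as of the last checkpoint
    roll_forward_log: list of (committed, [(block_number, new_contents), ...])
                      for transactions started after the last checkpoint
    """
    # Phase 1: roll back to the last consistent checkpoint.
    for block_number, before_image in rollback_log.items():
        blocks[block_number] = before_image
    # Phase 2: redo only the transactions that committed.
    for committed, operations in roll_forward_log:
        if committed:
            for block_number, new_contents in operations:
                blocks[block_number] = new_contents
    return blocks


disk = {1: "a2", 2: "b-partial"}            # block 2 holds an uncommitted write
rollback = {1: "a1", 2: "b1"}
rfl = [(True, [(1, "a2")]), (False, [(2, "b2")])]
recovered = recover(dict(disk), rollback, rfl)
```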
Checksumming
- Block checksum set on all blocks in the database when writing to disk.
- Block checksum verified when reading blocks from disk.
- Checksum used to automatically detect corruption.
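Setting a checksum on write and verifying it on read can be sketched as follows. CRC-32 from Python's `zlib` is used as a stand-in (FLAIM's own checksum algorithm is not described here), and storing it in the first four bytes of the block is an assumed layout.

```python
import zlib

BLOCK_SIZE = 4096
CHECKSUM_BYTES = 4  # assumed: checksum stored in the first 4 bytes of each block


def set_checksum(block: bytearray) -> None:
    # Checksum covers everything except the checksum field itself.
    crc = zlib.crc32(block[CHECKSUM_BYTES:])
    block[:CHECKSUM_BYTES] = crc.to_bytes(CHECKSUM_BYTES, "little")


def verify_checksum(block: bytes) -> bool:
    stored = int.from_bytes(block[:CHECKSUM_BYTES], "little")
    return stored == zlib.crc32(block[CHECKSUM_BYTES:])
```

A block whose bytes changed after the checksum was set (for example, a torn or corrupted write) fails verification when it is read back.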
Concurrency
- One writer, multiple readers.
- Readers don't block writers (they NEVER lock items in the database).
- Writers don't block readers.
- Read consistency for readers (readers get a stable consistent snapshot of the database). Rollback log is used to provide block multi-versioning.
- Uncommitted data is not visible to other transactions.
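Read consistency through block multi-versioning can be sketched with a block that keeps its older versions. FLAIM keeps prior block images in the rollback log; this hypothetical class simply keeps a list. In the sketch, `write` is called only when an update transaction commits, so uncommitted changes never appear among the versions.

```python
class VersionedBlock:
    """Keeps older versions of a block so readers see a stable snapshot."""

    def __init__(self, initial, txn_id=0):
        # (committing transaction id, contents), oldest first
        self.versions = [(txn_id, initial)]

    def write(self, txn_id, contents):
        # The single writer adds a new version; readers are never blocked.
        self.versions.append((txn_id, contents))

    def read(self, snapshot_txn_id):
        # A reader sees the newest version committed at or before its snapshot.
        for txn_id, contents in reversed(self.versions):
            if txn_id <= snapshot_txn_id:
                return contents
        raise LookupError("no version visible to this snapshot")


blk = VersionedBlock("v0")
reader_snapshot = 0        # reader started before transaction 5 committed
blk.write(5, "v5")
```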
Fields and Records
- Variable length fields. Text and binary fields up to 4GB per field.
- All fields are tagged - record is self-describing - no schema for record structure - structure is embedded in each record - XML-like.
- Nested sub-records, N-levels deep.
- Repeating fields and repeating sub-records.
- No storage used for omitted fields or to pad text fields to a fixed length.
- Unregistered fields (can store fields that are not defined in the dictionary).
- Data types: text, numeric, binary, context, blob.
- Text types: UNICODE.
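A self-describing record with nested sub-records and repeating fields can be pictured as an ordered list of (level, tag, value) triples, where level 0 is the record root. This representation and the tag numbers are assumptions for illustration; FLAIM's in-memory and on-disk record formats are not described here.

```python
# Hypothetical tag numbers that would be defined in the dictionary.
PERSON, NAME, PHONE, ADDRESS, CITY = 1, 2, 3, 4, 5

record = [
    (0, PERSON, None),
    (1, NAME, "Ann"),
    (1, PHONE, "555-0100"),   # repeating field
    (1, PHONE, "555-0101"),
    (1, ADDRESS, None),       # nested sub-record
    (2, CITY, "Provo"),
]


def values_of(record, tag):
    """Return every value stored under the given tag, at any level."""
    return [value for _level, field_tag, value in record if field_tag == tag]


def children(record, index):
    """Return the positions of the direct children of the field at index."""
    level = record[index][0]
    result = []
    for i in range(index + 1, len(record)):
        child_level = record[i][0]
        if child_level <= level:
            break  # left this field's subtree
        if child_level == level + 1:
            result.append(i)
    return result
```

Omitted fields simply have no triple, so they use no storage.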
Containers
- Allow application to partition data records physically and/or logically.
- Multiple containers per database.
- Multiple record types per container.
Indexing
- Compound indexes, component fields may be any FLAIM data type except BLOB.
- Optional and/or required fields in compound indexes (key not generated if required field missing).
- Existence indexes (indexes the presence of a field versus the field’s content).
- Case insensitive and case sensitive collation.
- Case insensitive collation with case preserved (post indexes).
- White space compression, other special indexing rules.
- Cross-record type indexes.
- Counter indexes.
- Sub-string indexing.
- Each-word indexing.
- Unique indexes.
- Support for many international languages and collating sequences, including Arabic, Hebrew, Asian (Japanese, Korean, Chinese), etc.
- Each index in a database can have its own international language.
- Fast updating of large reference sets.
- Keys up to 640 bytes long, key truncation supported.
- Multiple indexes per container and/or per record type.
- Left-end compression of index keys.
- Compression of index reference sets.
- APIs for reading of indexes directly (keys and references).
- Dynamically updated when records are added, modified, or deleted.
- Background indexing threads.
- Suspend, resume indexing. Can take indexes “offline.”
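Left-end compression of index keys can be sketched as follows: because keys are stored in sorted order, each key is kept as the number of leading bytes it shares with the previous key plus the remaining suffix. This is a generic illustration of the technique, not FLAIM's block format.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return n


def compress_keys(sorted_keys):
    """Store each key as (bytes shared with previous key, remaining suffix)."""
    compressed = []
    previous = b""
    for key in sorted_keys:
        shared = shared_prefix_len(previous, key)
        compressed.append((shared, key[shared:]))
        previous = key
    return compressed


def decompress_keys(compressed):
    keys = []
    previous = b""
    for shared, suffix in compressed:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = [b"smith", b"smithers", b"smithson"]
```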
Dynamic Dictionary
- Add, modify, drop indexes, containers, field definitions.
- Comment fields allowed in ALL dictionary records.
Query Capabilities
- Rich set of query expression operators:
- Comparison operators (equal, not equal, less than, less than or equal, greater than, greater than or equal, match, match begin, contains, match end). Text comparison operators include wild card matching.
- Arithmetic operators (unary minus, multiply, divide, mod, plus, minus).
- Logical operators (not, and, or).
- Parentheses (used to alter normal operator precedence).
- Simple, powerful mechanism for building up query expression programmatically:
- Expression does not have to be passed in as a string.
- Allows program to add operators, operands, and parentheses to the expression in infix order.
- Allows program to easily use program variables which contain comparison values or field names.
- Allows use of values that are not easily formatted into a string (such as binary).
- Advanced query optimization (FLAIM will automatically select indexes, etc. based on cost estimation).
- Index specification - application may specify an index instead of letting FLAIM choose one.
- Embedded application-defined predicate callbacks.
- Powerful navigational calls for retrieving and browsing through query results (retrieve first, last, next, previous, and current records). Only records which satisfy query expression are retrieved.
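Building a query expression programmatically in infix order can be sketched with a hypothetical `QueryBuilder` that converts the tokens to postfix order using the shunting-yard algorithm. It is not FLAIM's query API, and it supports only binary, left-associative operators plus parentheses. Values are passed as objects, so binary data never has to be formatted into a string.

```python
import operator

# Higher precedence binds tighter.
OPERATORS = {
    "or": (1, lambda a, b: a or b),
    "and": (2, lambda a, b: a and b),
    "==": (3, operator.eq), "!=": (3, operator.ne),
    "<": (3, operator.lt), "<=": (3, operator.le),
    ">": (3, operator.gt), ">=": (3, operator.ge),
    "+": (4, operator.add), "-": (4, operator.sub),
    "*": (5, operator.mul), "/": (5, operator.truediv), "mod": (5, operator.mod),
}


class QueryBuilder:
    """Hypothetical builder: the program appends tokens in infix order."""

    def __init__(self):
        self.output = []  # tokens in postfix order
        self.stack = []   # pending operators and "("

    def field(self, name):
        self.output.append(("field", name))
        return self

    def value(self, v):
        self.output.append(("value", v))
        return self

    def op(self, symbol):
        prec = OPERATORS[symbol][0]
        while (self.stack and self.stack[-1] != "("
               and OPERATORS[self.stack[-1]][0] >= prec):
            self.output.append(("op", self.stack.pop()))
        self.stack.append(symbol)
        return self

    def lparen(self):
        self.stack.append("(")
        return self

    def rparen(self):
        while self.stack[-1] != "(":
            self.output.append(("op", self.stack.pop()))
        self.stack.pop()  # discard "("
        return self

    def evaluate(self, record):
        postfix = self.output + [("op", s) for s in reversed(self.stack)]
        operands = []
        for kind, item in postfix:
            if kind == "field":
                operands.append(record.get(item))
            elif kind == "value":
                operands.append(item)
            else:
                right, left = operands.pop(), operands.pop()
                operands.append(OPERATORS[item][1](left, right))
        return operands[0]


# age >= 21 and (city == "Provo" or city == "Orem")
q = (QueryBuilder().field("age").op(">=").value(21).op("and").lparen()
     .field("city").op("==").value("Provo").op("or")
     .field("city").op("==").value("Orem").rparen())
```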
Read and Update Operations
- Reading data records directly from containers (including dictionary container).
- Reading of indexes directly (keys and references).
- Advanced querying capabilities.
- Navigating forward and backward through containers and indexes.
- Update operations are: add, modify, and delete (including dictionary records).
Caching
- Block cache, shared by all threads in a process - up to 4 GB on 32 bit machines, much more on 64 bit machines.
- Record cache.
- Cache poisoning prevention.
- Cache statistics available - hits, faults, hit looks, fault looks.
- Memory fragmentation prevention. Background thread is continually moving cached items to eliminate fragmentation.
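One common way to prevent cache poisoning is sketched below: an LRU block cache with hit and fault counters in which blocks read by a bulk scan are inserted at the cold end, so a single large scan cannot evict the frequently used blocks. FLAIM's actual poisoning-prevention policy is not described here; `BlockCache` and its `scan` flag are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """LRU block cache with hit/fault counters and scan-resistant inserts."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # function: block number -> contents
        self.blocks = OrderedDict()   # coldest first, hottest last
        self.hits = 0
        self.faults = 0

    def get(self, block_number, scan=False):
        if block_number in self.blocks:
            self.hits += 1
            if not scan:
                self.blocks.move_to_end(block_number)  # promote to hottest
            return self.blocks[block_number]
        self.faults += 1
        contents = self.read_block(block_number)
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the coldest block
        self.blocks[block_number] = contents
        if scan:
            # Scan blocks go in at the cold end, so they are evicted first.
            self.blocks.move_to_end(block_number, last=False)
        return contents


cache = BlockCache(3, lambda n: f"block{n}")
cache.get(1)
cache.get(2)                      # hot blocks
for n in range(10, 20):
    cache.get(n, scan=True)       # large scan
cache.get(1)
cache.get(2)                      # still cached after the scan
```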
Optimized Disk Reading/Writing
- Direct IO - bypass file system cache.
- Asynchronous writes.
- Sorting of blocks to optimize disk head movements. Also attempt to coalesce adjacent dirty blocks into larger write buffers for improved performance. Will fill write buffer with non-dirty blocks that are already in cache if it results in a more optimal write.
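The write planning described above can be sketched as: sort the dirty block numbers, group consecutive blocks into runs, and fill a one-block hole with a clean block already in cache so two writes become one. The one-block-gap rule and the `max_blocks` limit are assumptions for illustration, not FLAIM's actual heuristics.

```python
def plan_writes(dirty, cached_clean, max_blocks):
    """Return runs of consecutive block numbers to write, in ascending order."""
    runs = []
    current = []
    for block in sorted(dirty):
        if current:
            gap = block - current[-1] - 1
            if (gap == 1 and block - 1 in cached_clean
                    and len(current) + 2 <= max_blocks):
                current.append(block - 1)  # fill the hole with a clean cached block
            elif gap != 0 or len(current) >= max_blocks:
                runs.append(current)       # start a new write
                current = []
        current.append(block)
    if current:
        runs.append(current)
    return runs
```

Sorting keeps head movement in one direction, and each run becomes a single larger write.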
Database Validation and Repair
- Routine for checking physical structure of database. Links between blocks verified, B-Tree structure verified, block checksums verified, field and record structures verified, index keys and reference sets verified, data in fields verified.
- Routine for checking indexes. Ensures that all keys that ought to be in an index are, in fact, in the indexes, and that no extra keys are in the indexes. In-line repair of index problems is allowed during index checking. Extra keys will be automatically deleted. Missing keys will be added.
- Routine for repairing database. Can rebuild from a totally trashed file - or even a zero length file!
- Callback facility in all functions to report progress. Allows application to display progress and cancel out if desired. Corruptions are also reported via the callback so that an application can create a detailed log of corruptions found if desired.
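The index check and in-line repair can be sketched as a set comparison: regenerate the keys each record should produce, diff them against the keys actually in the index, then delete the extras and add the missing ones. The function names and the `key_of` callback are hypothetical.

```python
def check_index(records, index_keys, key_of):
    """Return (missing, extra) sets of (key, record_id) pairs.

    records:    dict record_id -> record
    index_keys: set of (key, record_id) pairs currently in the index
    key_of:     function returning the key a record generates, or None
    """
    expected = set()
    for record_id, record in records.items():
        key = key_of(record)
        if key is not None:  # records without the indexed field produce no key
            expected.add((key, record_id))
    return expected - index_keys, index_keys - expected


def repair_index(index_keys, missing, extra):
    # Delete extra keys, add missing ones.
    return (index_keys - extra) | missing


records = {1: {"name": "ann"}, 2: {"name": "bob"}, 3: {}}
index = {("ann", 1), ("zed", 2)}
missing, extra = check_index(records, index, lambda r: r.get("name"))
```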
Backup/Restore
- Hot backup. Backups can be performed without taking the database offline and without stopping updates.
- Continuous backup. Roll-forward logs can be managed in a way that allows them to serve as a “continuous” backup of the database. No committed transaction will be lost.
- Incremental backups. This minimizes what must be backed up - only blocks changed since last backup.
- Capture of output during backup using callbacks. This allows an application to capture backup output and stream it directly to tape or other backup medium without having to stage it to an intermediate disk file first. An application could even choose to send backup data across a network connection to be stored on a remote device. FLAIM uses double-buffering so that an output device can be kept busy while FLAIM is fetching the next set of blocks to backup. This would help prevent a streaming tape device from stalling, resulting in dramatically improved backup throughput.
- All blocks in backup include a checksum to ensure that data is reliable when restored.
- Simple block compression used to minimize size of backup.
- Use of serial numbers in roll-forward log files and backups to ensure identifiability when restoring. Database also has a serial number.
- Restore from full backup, multiple incremental backups, and/or roll-forward logs - all in one call.
- Streaming input during restore using callbacks. Allows an application to restore backed up data directly from tape or other backup medium without having to stage backed up data to an intermediate file first. An application could also use this to restore directly from a remote location by bringing the data over a network connection. FLAIM uses double-buffering so that an input device can be kept busy while FLAIM is writing out blocks from a backup to the database. This would help prevent a streaming tape device from stalling, resulting in dramatically improved restore throughput.
- Status callbacks during backup/restore so that application can report progress and/or abort the backup or restore operation.
- Partial restore supported. An application has the option of stopping a restore operation after either: 1) a full backup or incremental has been restored, or 2) after any transaction in the roll-forward log has been redone.
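The buffering scheme used during backup can be sketched with a producer thread that reads blocks into buffers while the caller's output callback writes the previous buffer. A bounded queue keeps at most one full buffer waiting, so reading and writing overlap without unbounded memory use. `read_blocks` and `write_output` are hypothetical stand-ins for reading the database and for an application's tape or network writer.

```python
import queue
import threading


def backup_buffered(read_blocks, write_output, buffer_blocks=4):
    """Overlap reading blocks with writing them to the output device."""
    filled = queue.Queue(maxsize=1)  # at most one full buffer waiting
    done = object()                  # end-of-backup marker

    def producer():
        buffer = []
        for block in read_blocks:
            buffer.append(block)
            if len(buffer) == buffer_blocks:
                filled.put(buffer)   # waits while the writer is behind
                buffer = []
        if buffer:
            filled.put(buffer)
        filled.put(done)

    thread = threading.Thread(target=producer)
    thread.start()
    while True:
        buffer = filled.get()
        if buffer is done:
            break
        write_output(buffer)         # e.g. stream to tape or across a network
    thread.join()


out = []
backup_buffered(range(10), out.append, buffer_blocks=4)
```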
Database Monitoring, Statistics Collection
- APIs to collect detailed statistics on disk I/O activity and transaction activity.
- APIs to monitor cache utilization, including bytes used, number of blocks and records cached, cache hits, faults, etc.
- APIs to collect detailed information about queries - to see what indexes were used, how many keys were fetched, how many records were fetched, how many failed the criteria, etc. This allows analyzing of query efficiency and troubleshooting of query performance problems.
Database Size
- Database may grow up to 8 terabytes or 4 terabytes (depends on platform). Up to 4096 files may be created. Each file is limited to either ~2GB or ~4GB, depending on operating system limitations.
- Number of records up to 4 billion per container.
- Database grows as-needed. No need to preallocate disk space. However, when extending files, it is more efficient to extend by a large amount than a small amount, so we typically extend a file by 8 MB at a time.
- Routine for re-claiming unused database blocks and log areas and returning to OS. Space may be reclaimed without taking database off-line.
- Benchmarks and comparisons show FLAIM databases to be 25-40% smaller than those of other databases.
- Database block size can be set on database creation to 4K or 8K.
- Sophisticated block splitting and block combining to maximize block utilization.
- Roughly 70% utilization in index blocks.
- Roughly 80-90% utilization in data blocks.
- Left end compression of index keys.
- Compression of index reference sets.
Cross Platform
- Database file is binary portable to ALL supported platforms, no need for conversions when moving database file from platform to platform. Little endian format used for most internal integer values.
- Platforms: NetWare, Windows (NT, 2000, XP-64 bit), Unix (Solaris, AIX, HP/UX), Linux, Mac OS X (both PowerPC and Intel). 64 bit supported for Windows, Linux, and Unix platforms where it is available.
- Source code is developed in the C++ programming language (one source for all platforms), which makes it easy to build FLAIM libraries for other platforms; a new platform is generally an hour or two of work.
- Operating System services are abstracted into common interfaces or C++ classes for upper layers of code so they don’t have to worry about operating system differences. Code is maintained in a handful of files. Abstractions exist for disk I/O, memory management, semaphores and mutexes, and so forth.
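Binary portability rests on a fixed on-disk byte order: every platform encodes and decodes integers the same way, whatever the host CPU's byte order is. A minimal sketch using Python's `struct` module with an explicit little-endian format:

```python
import struct


def write_u32(value):
    # "<I" = little-endian unsigned 32-bit, independent of the host CPU.
    return struct.pack("<I", value)


def read_u32(data, offset=0):
    return struct.unpack_from("<I", data, offset)[0]
```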
Utilities
- Database checking utility (checkdb).
- Database rebuild utility (rebuild).
- Database browser/editor utility (dbshell). Can retrieve, add, modify, and delete records, perform transactions, perform queries, etc.
- Low-level viewers: Physical structure viewer/editor (view) and roll-forward log viewer/searcher (rflview).
- Text interface (TUI) for all platforms - supports colors, rudimentary windowing, keyboard access, and multiple screens. A common cross-platform abstraction for these services hides platform-specific details.
- All utilities build and work on all platforms and have the same look and feel.
XFLAIM Features
Transactions
- Transaction begin, commit, abort. Use of rollback log for transaction abort and for recovery after a crash.
- Transaction types:
- Update. Update, read, and query operations allowed.
- Read. Only read and query operations allowed. Read transactions provide a read consistent snapshot of the database as of the point in time the transaction is started.
- Automatic. Single update operations may be told to automatically begin and end (commit or abort) a transaction if no transaction has been explicitly started.
- Automatic rollback of failed transactions (due to application failures or CPU failures).
- Periodic checkpoints to minimize recovery time after a system crash.
- No limit on size of update transactions.
- ACID principles supported: Atomicity, Consistency, Isolation, Durability.
- Group Commit allows multiple update transactions to be committed to disk at once, improving update performance.
Roll-forward Logging
- Use of roll-forward log to minimize data that has to be written to commit a transaction.
- Roll-forward log is used in automatic recovery after a crash. Transactions that were committed since the last checkpoint will be redone.
- Multiple roll-forward log files may be used to support continuous backup feature. Files are numbered sequentially and are also identified with serial numbers to guarantee proper sequencing - no spoofing. Up to 4 billion log files supported - capacity is practically unlimited.
- Option to use only a single roll-forward log file - for applications that do not care about continuous backup.
- Roll-forward log files may be stored on a separate disk from rest of database.
- Minimal transaction logging. Only deltas logged for record modifies. Only DRNs logged for record deletes.
- Aborted transactions can be logged for debug purposes, but default is to not log them.
- Support for logging of application data.
Database Reliability and Recovery
- Automatic database recovery after a system crash. Rollback log is used to roll database back to last consistent checkpointed state. Then roll-forward log is used to redo transactions that were committed after the last checkpoint.
- Recovery is idempotent. That is, if a crash occurs during recovery, recovery resumes when the database is next opened.
- Reliability has been tested using an automated pull-the-plug test, which randomly cycles the power on the server during high volume updates to test database recovery. Thousands of pull-the-plug iterations have been performed.
- Handling of disk-full conditions and other disk errors. Database attempts to stall new update transactions until disk-full condition is resolved - without requiring a shutdown.
- Protection against media failure. Customers can take hot backups and put roll-forward logs on a different volume than the database. If they do these things, two simultaneous disk failures would be required to lose any data.
Checksums
- Block checksums are set on all blocks in the database when writing to disk and are verified whenever blocks are read from disk.
- The checksums are used to automatically detect database inconsistencies.
Concurrency
- One writer, multiple readers.
- Readers don't block writers (they NEVER lock items in the database).
- Writers don't block readers.
- Read consistency for readers (readers get a stable consistent snapshot of the database). Rollback log is used to provide block multi-versioning.
- Uncommitted data is not visible to other transactions.
DOM Nodes and Documents
- Documents are stored as DOM nodes.
- All element, attribute, and data nodes have a name id tag.
- Each DOM node can contain up to 4 gigabytes of data.
- Data types include text (Unicode and UTF-8), numeric, and binary.
Collections
- Documents are stored in collections.
- There may be multiple collections per database.
- Collections allow data to be logically partitioned.
Indexing
- Compound indexes, key component may be any XFLAIM data type. Contextual relationships between nodes in a document may be specified (sibling, child, parent, etc.) for each component.
- Optional and/or required nodes in compound indexes (key not generated if required nodes are missing).
- Presence indexes (indexes the existence of a node rather than its content).
- Case insensitive and case sensitive collation.
- White space compression and other special key-generation rules.
- Ascending/Descending sort order. Ascending or descending may be specified separately for each key component in a compound index.
- Cross-document type indexes.
- Substring indexing.
- Each-word indexing.
- Approximate indexing (Metaphone).
- Support for many international languages and collating sequences, including Arabic, Hebrew, and Asian (Japanese, Korean, Chinese).
- Each index in a database can have its own international language.
- Keys up to 1024 bytes long, key truncation supported.
- Multiple indexes per collection.
- APIs for reading indexes directly.
- Indexes are dynamically updated when nodes are added, modified, or deleted.
- Indexes can be built in the background.
- Indexes can be taken off-line (suspend) and later resumed.
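Compound keys with required components and a per-component sort direction can be sketched as follows. The index definition format is hypothetical, documents are plain dicts rather than DOM nodes, and sorting missing optional values before present ones (in ascending order) is an assumption.

```python
from functools import cmp_to_key

# Each component: (node name, required?, ascending?), a hypothetical definition.
INDEX_DEF = [("last", True, True), ("age", False, False)]


def make_key(doc, index_def):
    """Build a compound key, or return None if a required component is missing."""
    key = []
    for name, required, _ascending in index_def:
        value = doc.get(name)
        if value is None and required:
            return None
        key.append(value)
    return key


def compare_keys(a, b, index_def):
    for (x, y), (_name, _required, ascending) in zip(zip(a, b), index_def):
        if x == y:
            continue
        if x is None:
            result = -1  # missing value sorts first (ascending order)
        elif y is None:
            result = 1
        else:
            result = -1 if x < y else 1
        return result if ascending else -result
    return 0


def build_index(docs, index_def):
    keys = [k for k in (make_key(d, index_def) for d in docs) if k is not None]
    return sorted(keys, key=cmp_to_key(lambda a, b: compare_keys(a, b, index_def)))


docs = [{"last": "Lee", "age": 30}, {"last": "Lee", "age": 45},
        {"last": "Ames", "age": 20}, {"age": 50}, {"last": "Ames"}]
```

Here `last` sorts ascending and `age` descending within each last name; the document without `last` generates no key.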
Dynamic Dictionary
- Add, modify, and drop index, collection, element, attribute, prefix, and encryption definitions.
Query Capabilities
- XPATH is used as the query language.
- Rich set of query expression operators:
- Comparison operators (equal, not equal, less than, less than or equal, greater than, greater than or equal). Text comparison operators include wild card matching, allowing for match begin, match end, and substring (contains) searching.
- Arithmetic operators (unary minus, multiply, divide, mod, plus, minus).
- Logical operators (not, and, or).
- Parentheses (used to alter normal operator precedence).
- Advanced query optimization (XFLAIM will automatically select indexes, etc. based on least cost estimation).
- Index specification. The application may explicitly specify an index to use.
- Powerful navigational calls for retrieving and browsing through query results (first, last, next, previous, and current node/document).
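Wildcard text comparison covers match begin, match end, and contains with one rule. The sketch below assumes `*` is the only wildcard character and matches any run of characters; XFLAIM's exact wildcard syntax is not described here.

```python
def wildcard_match(text, pattern):
    """'abc*' tests match begin, '*abc' match end, '*abc*' contains."""
    parts = pattern.split("*")
    if len(parts) == 1:
        return text == pattern  # no wildcard: exact comparison
    first, last = parts[0], parts[-1]
    if not text.startswith(first):
        return False
    position = len(first)
    for middle in parts[1:-1]:
        found = text.find(middle, position)  # leftmost match is always safe
        if found < 0:
            return False
        position = found + len(middle)
    # The suffix must fit after everything matched so far.
    return len(text) - len(last) >= position and text.endswith(last)
```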
Read and Update Operations
- Ability to retrieve nodes directly from collections by 64 bit node id. APIs for navigation within a document (next/prev sibling, first/last child, parent, etc.)
- Index keys can be read directly.
- Advanced querying capabilities are supported via XPATH.
- Add, modify, and delete operations are supported.
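Retrieval by node id and navigation within a document can be sketched with nodes that store the ids of their parent, first/last child, and previous/next sibling. The `Node` and `Collection` classes are hypothetical and are not XFLAIM's DOM interfaces.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    node_id: int                  # 64-bit node id, unique within a collection
    name_id: int                  # name id tag from the dictionary
    value: Optional[str] = None
    parent: Optional[int] = None
    first_child: Optional[int] = None
    last_child: Optional[int] = None
    prev_sibling: Optional[int] = None
    next_sibling: Optional[int] = None


class Collection:
    def __init__(self):
        self.nodes: Dict[int, Node] = {}
        self.next_id = 1

    def add(self, name_id, value=None, parent=None):
        """Append a new node as the last child of parent; return its id."""
        node = Node(self.next_id, name_id, value, parent)
        self.next_id += 1
        self.nodes[node.node_id] = node
        if parent is not None:
            p = self.nodes[parent]
            if p.last_child is None:
                p.first_child = node.node_id
            else:
                self.nodes[p.last_child].next_sibling = node.node_id
                node.prev_sibling = p.last_child
            p.last_child = node.node_id
        return node.node_id

    def children(self, node_id):
        child = self.nodes[node_id].first_child
        while child is not None:
            yield child
            child = self.nodes[child].next_sibling


c = Collection()
root = c.add(1)
a = c.add(2, "x", root)
b = c.add(2, "y", root)
```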
Caching
- A block cache is shared by all threads in a process. XFLAIM supports up to 4 GB of cache on 32 bit platforms and much more on 64 bit platforms.
- Document node cache.
- Cache poisoning prevention.
- Memory fragmentation prevention via smart management of cache and node allocations.
- Cache statistics can be queried, and include hits, faults, hit looks, and fault looks.
Optimized Disk Reading / Writing
- Direct I/O allows file system cache to be bypassed.
- Asynchronous writes.
- Cache blocks are written in ascending order to optimize disk head movements. Adjacent blocks are coalesced into larger write buffers for improved performance.
Database Validation and Repair
- Routines for checking the physical and logical structure of database are provided. Links between blocks, the B-Tree structure, block checksums, node/document structure, index keys/reference sets and data in nodes are verified. Damaged indexes can be fixed on-line if problems are encountered during the check.
- Routines for repairing a database allow data recovery from severely damaged databases.
- Progress and status callbacks are possible with all check and repair routines. This allows the application to display progress and cancel the operation if desired. Corruptions are also reported via the callbacks so that an application can create a detailed log of corruptions found if desired.
Backup/Restore
- Hot backup. Backups can be performed without taking the database offline and without stopping updates.
- Continuous backup. Roll-forward logs can be managed in a way that allows them to serve as a continuous backup of the database. No committed transaction will be lost.
- Incremental backups. This minimizes what must be backed up - only blocks changed since last backup.
- Backup and restore use flexible streaming interfaces to allow the application to efficiently select and manage the backup media. For example, an application could even choose to send backup data across a network to be stored on a remote device. XFLAIM uses double buffering so that an output device can be kept busy while XFLAIM is fetching the next set of blocks to backup. This helps prevent streaming devices (such as tape drives) from stalling.
- All blocks in backup include a checksum to ensure that data is reliable when restored.
- Simple block compression used to minimize size of backup.
- Use of serial numbers in roll-forward log files and backups to ensure “identifiability” when restoring. Database also has a serial number.
- Restore from full backup, multiple incremental backups, and/or roll-forward logs - all in one call.
- Status callbacks are supported during backup and restore operations, allowing the application to report progress and/or abort the backup or restore operation.
- Partial restore of a database is supported. An application has the option of stopping a restore operation after either: 1) a full backup or incremental has been restored, or 2) after a particular transaction in the roll-forward log has been re-played.
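Incremental backup and restore can be sketched as follows, assuming (hypothetically) that each block records the id of the last transaction that modified it. An incremental backup then includes only blocks changed since the previous backup, and a restore applies the full backup followed by each incremental in the order taken.

```python
def incremental_backup(blocks, last_backup_txn):
    """Return the numbers of blocks modified after the previous backup.

    blocks: dict block_number -> (last_modifying_txn_id, contents)
    """
    return sorted(
        number for number, (txn_id, _contents) in blocks.items()
        if txn_id > last_backup_txn
    )


def restore(full_backup, *incrementals):
    """Apply a full backup, then each incremental in the order taken.
    Each backup is a dict block_number -> contents."""
    database = dict(full_backup)
    for incremental in incrementals:
        database.update(incremental)  # newer block images replace older ones
    return database


blocks = {1: (10, "a"), 2: (25, "b2"), 3: (40, "c2")}
changed = incremental_backup(blocks, last_backup_txn=20)
```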
Database Monitoring / Statistics Collection
- APIs to collect detailed statistics on disk I/O activity and transaction activity.
- APIs to monitor cache utilization, including bytes used, number of blocks and nodes cached, cache hits, faults, etc.
- APIs to collect detailed information about queries. This includes the ability to see which indexes are used, how many keys are fetched, how many nodes are fetched, how many nodes failed the criteria, etc. This allows analyzing of query efficiency and troubleshooting of query performance problems.
Database Size
- Up to 8 terabytes of data per database.
- Up to 2^64 - 1 document IDs per collection (IDs are 64-bit values).
- Database grows as needed. There is no need to pre-allocate disk space.
- Support is provided for reclaiming unused database blocks and log areas and returning them to the host file system. Space may be reclaimed without taking the database off-line.
- The database block size can be set on database creation to 4, 8, 16 or 32 KB.
- Sophisticated block splitting and block combining to maximize block utilization.
- Roughly 80% utilization in index blocks.
- Roughly 80-95% utilization in data blocks.
Cross Platform
- Database files are binary portable across ALL supported platforms. There is no need for explicit conversions when moving a database from one platform to another. The platform where the database is created determines whether a little-endian or big-endian storage format will be used for database metadata. If a database is moved to a platform with a different endian format, conversions happen automatically as needed. Thus, a database that was originally created on a little-endian platform and subsequently moved to a big-endian platform can gradually migrate to the big-endian format over time.
- Platforms: NetWare, Windows (NT, 2000, XP-64 bit), Unix (Solaris, AIX, HP/UX), Linux, Mac OS X (both PowerPC and Intel). 64 bit supported for Windows, Linux, and Unix platforms where it is available.
- Source code is developed in the C++ programming language (one source for all platforms), which makes it easy to build XFLAIM libraries for other platforms; a new platform is generally an hour or two of work.
- Java APIs are also available for Java developers. JNI is used to interface to the C++ methods.
- Operating System services are abstracted into common interfaces or C++ classes for upper layers of code so they don’t have to worry about operating system differences. Code is maintained in a handful of files. Abstractions exist for disk I/O, memory management, semaphores and mutexes, and so forth.
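The automatic endian conversion and gradual migration can be sketched as follows. The sketch assumes, hypothetically, that each block records the byte order it was written in: reads decode using that recorded order, and any block that is modified is rewritten in the host's native order, so a moved database converts one block at a time. XFLAIM's actual granularity and header layout are not described here.

```python
import struct
import sys

HOST_ORDER = "<" if sys.byteorder == "little" else ">"


def read_counter(block):
    """block = (order, payload), where order is '<' or '>' as recorded."""
    order, payload = block
    return struct.unpack(order + "I", payload)[0]


def write_counter(new_value):
    # Modified blocks are written in the host's native order.
    return (HOST_ORDER, struct.pack(HOST_ORDER + "I", new_value))


old = (">", struct.pack(">I", 7))  # block written on a big-endian platform
new = write_counter(read_counter(old) + 1)
```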
Utilities
- Database checking utility (checkdb).
- Database rebuild utility (rebuild).
- Database browser and editor utility (xshell, DOMEdit). Provides support for retrieving, adding, modifying, and deleting documents and individual nodes.
- Low-level physical structure viewer/editor (view).
- All utilities build and work on all platforms and have the same look and feel.
External Links
FLAIM on Wikipedia
XFLAIM on Wikipedia