Content Management: A Perspective
by Dmitry Goldenberg, Sr. Software Engineer, Novell
Date Created: 2002-04-29 13:28:00.000
  Introduction
  So, what is this beast?
  The Past
  The Present
  The Future
  Conclusion
  References
introduction
We have been working hard to provide a comprehensive, scalable, easy-to-use content management system (CMS) in our Director (formerly known as ePortal) product offering. With Director now an integral part of the exteNd product family, the CMS is becoming more important. Using feedback from the field, we have evolved our CMS into a highly robust system that often drives the company's commercial success. That said, perhaps it's time to take a step back and answer some questions before we move on to the next stage of the game. For example:
  • What is content management (CM)? How do we define it, and what does our CMS provide?
  • Why is CM important?
  • What can we do to ensure our success in providing a valuable CM product?


Looking at our CM solution in perspective - from its inception as a database schema sketched on a whiteboard to its current state as a fully operational machine - might provide a few answers to these questions. And that's just what the following summary attempts to do: outline our take on the past, present, and future of CM.
so, what is this beast?
There are as many opinions about what CM is as there are people talking about it. Most people, however, agree that CM involves the lifecycle management of Web site information. Fundamentally, CM appears to rest on the framework of traditional document management with its ability to create, store, retrieve, update, delete, version, secure, and search documents. On top of that, CM delivers a stack of additional benefits, including: categorization and classification of content; XML integration; content layout and presentation management; personalization; publishing and expiration; scheduling; content usage analysis; and reporting.

There are three essential types of CM solutions distinguished in the marketplace:

A. Database Systems
Database systems tend to work well when the data is more or less homogeneous and can be categorized into specific document types and the like, i.e., when content is standardized. Database systems offer traditional powers such as efficient storage, retrieval, and search capabilities.

B. File-Based Systems
File-based systems tend to work better for varied content, and when data is shared by multiple applications and systems that don't necessarily integrate well with databases. File-based systems sometimes provide easier access to data for both technical and non-technical contributors.

C. Hybrid File/Database Systems
Hybrid file/database systems may use a file system to store content, but will use a database to store metadata and control the workflow.

When evaluating various CM systems (and there are myriad out there), analysts usually recommend that IT specialists focus on the following technical aspects:
  1. System type (see above) according to business need
  2. Functionality richness (both back- and front-end)
  3. Security
  4. System scalability and speed of performance
  5. Application programmer interfaces
  6. Platform (in)dependence
  7. Extensibility
  8. Customizability
  9. Relationship to third party tools
  10. Ability to deal with legacy data
  11. Multilingual and internationalization capabilities
  12. Data conversion and mobility
  13. Event model and scheduling
  14. Reporting
  15. Workflow management
The following review takes a closer look at exteNd's CMS, outlining how we addressed the above criteria and detailing where we could or should take our CMS in the future.
the past
In the glorious days of ePortal 1.0, we set out to build a CM solution with a somewhat simple, albeit rich and to the point, scope. The focus was on providing basic lifecycle CM capabilities to enable application developers to create, manage, and secure documents in a database-driven environment. Technologically, SilverStream's CMS was based on the SilverStream Application Server and its data binding capabilities such as AgDatas and Data Source Objects (DSOs). SilverStream provided a robust API and a Portal Management Console (PMC), which we (SilverStream developers) affectionately referred to as "the revolutionary PMC."

So as it turns out, SilverStream gave the world a baby CMS that grew and matured rapidly. It became the underlying foundation of our e-business solutions framework and a source that hydrated and logically brought together other important product capabilities such as the portal, rules, and workflow. With its CM capabilities, ePortal 1.0 delivered rich, out-of-the-box functionality with a robust, well-documented API, a front end, integrated security, and a portal that emphasized the usage of components that can pump data from inside the CMS.

Yet even as we were still coding, we sensed the urgency to include other system capabilities that could turn SilverStream's CMS into a serious weapon rather than the prototypical toy gun. To be a really useable framework, the product needed speed, extensibility, and customizability.

Moving on from what we call the "Guinness" release (ePortal 1.0) to the "el cheapo vodka" release (ePortal 2.0, a.k.a. "Smirnoff"), we addressed the speed issue by utilizing smarter SQL and caching, especially in the areas of searching, acquiring lists of CM items, and looking up directory entries.

So, what else did we do? We wrote a lot of Java, and drank a lot of it too! We tackled the notion of content personalization by adding the content query action, which allowed developers to create rules that operate on content. We figured out a way of utilizing rules that makes the security filtering of content less weighty, faster, and more dynamic. We even enhanced our whole product's object management and architecture via factories, so that objects could now be dynamically bound at runtime to specific implementations, which no longer had to be canned but could be custom-built.
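The factory idea can be illustrated with a minimal sketch. It is not the ePortal API: `ContentManager`, `ManagerFactory`, and both implementation classes are hypothetical names chosen for illustration. The point is only that calling code asks a factory for an interface, and whatever implementation is bound at runtime is what it gets.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the ePortal API.
class FactorySketch {

    /** The interface that application code programs against. */
    interface ContentManager {
        String describe();
    }

    /** Stand-in for the canned implementation shipped with a product. */
    static class DefaultContentManager implements ContentManager {
        @Override
        public String describe() { return "default"; }
    }

    /** Stand-in for a custom implementation written by an application developer. */
    static class AuditingContentManager implements ContentManager {
        @Override
        public String describe() { return "auditing"; }
    }

    /** Maps an interface to whichever implementation is bound at runtime. */
    static class ManagerFactory {
        private final Map<Class<?>, Supplier<?>> bindings = new ConcurrentHashMap<>();

        <T> void bind(Class<T> type, Supplier<? extends T> supplier) {
            bindings.put(type, supplier);
        }

        <T> T create(Class<T> type) {
            Supplier<?> supplier = bindings.get(type);
            if (supplier == null) {
                throw new IllegalStateException("No binding for " + type.getName());
            }
            return type.cast(supplier.get());
        }
    }

    public static void main(String[] args) {
        ManagerFactory factory = new ManagerFactory();
        factory.bind(ContentManager.class, DefaultContentManager::new);
        System.out.println(factory.create(ContentManager.class).describe());

        // Rebinding swaps the implementation without touching the calling code.
        factory.bind(ContentManager.class, AuditingContentManager::new);
        System.out.println(factory.create(ContentManager.class).describe());
    }
}
```

Because callers only ever name the interface, swapping the canned implementation for a custom-built one is a one-line change at binding time.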

The ePortal 2.0 model rested on the heavy usage of interfaces, which proved to be absolutely the right approach. In fact, with it we had taken a giant leap forward in providing the ability to customize diverse, complex applications. Even so, for SilverStream to gain new market segments and not remain tied to its own application server, we as developers had to immediately face our next target - platform independence. For SilverStream, platform independence meant moving away from Fulcrum's full-text searching technology, which was tightly wired into our application server and stood as an obstacle in our "fight" for independence. Additionally, we needed to better enable applications to search both the document metadata and the content (the structured and unstructured data). And, of course, new requirements continued to filter in from real-life usage in the field.
the present
Server Independence
As ePortal 1.0 has evolved into Director 3.0 (our "cognac" release), the product has matured into a robust, J2EE-compliant application framework that has bolstered SilverStream's strategic exteNd initiative. The CMS and other subsystems in Director 3.0 provide EJB access (using the delegate design pattern to shield the user from the details of local vs. remote references), and the CMS has become part of the overall J2EE-based architecture. In fact, SilverStream's CMS is now packaged and deployed as an enterprise archive within a variety of application servers.
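As a rough illustration of the delegate pattern mentioned above, here is a plain-Java sketch. It deliberately avoids real EJB interfaces: `DocumentService`, `LocalDocumentService`, `RemoteDocumentService`, and `DocumentDelegate` are hypothetical stand-ins, and the "remote" class only simulates what a network-backed reference would do.

```java
// Hypothetical sketch: plain Java stand-ins, not the Director EJB interfaces.
class DelegateSketch {

    /** Operations the caller cares about, regardless of where they run. */
    interface DocumentService {
        String fetchTitle(String documentId);
    }

    /** Stand-in for an in-process (local) reference. */
    static class LocalDocumentService implements DocumentService {
        @Override
        public String fetchTitle(String documentId) {
            return "local:" + documentId;
        }
    }

    /** Stand-in for a remote reference; a real one would go over the network. */
    static class RemoteDocumentService implements DocumentService {
        @Override
        public String fetchTitle(String documentId) {
            return "remote:" + documentId;
        }
    }

    /** The delegate picks the reference type once; callers never see the choice. */
    static class DocumentDelegate implements DocumentService {
        private final DocumentService target;

        DocumentDelegate(boolean sameJvm) {
            this.target = sameJvm ? new LocalDocumentService() : new RemoteDocumentService();
        }

        @Override
        public String fetchTitle(String documentId) {
            return target.fetchTitle(documentId);
        }
    }

    public static void main(String[] args) {
        DocumentService service = new DocumentDelegate(true);
        System.out.println(service.fetchTitle("doc-1"));
    }
}
```

Application code holds a `DocumentService` and never needs to know which kind of reference sits behind it.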

You might say we've cut the umbilical cord that once tied the CMS to SilverStream's application server, removing any vestiges that might render our work proprietary. We no longer speak AgDatas, as our product's data access model has been overhauled and reworked into our own lightweight layer, appropriately nicknamed the DAMN layer (Data Access Model Normalization). And with the CMS being the primary user of this remedy, we've brought a sense of normalcy to the insane world of database access.

WebDAV
The implementation of WebDAV support (see http://www.webdav.org) has created numerous integration possibilities for SilverStream. This is especially true since the middleware of SilverStream's WebDAV support hides the complexity of the product's APIs and allows users to pick and choose their front end applications, be they Microsoft Office, Exchange 2000, Adobe Acrobat 5, or Dreamweaver. This gives our CMS interoperability and extensibility, and it also validates our original vision and design because our WebDAV implementation fits neatly into our CMS object and functionality models. In fact, adding a document to the CMS is now as easy as dragging and dropping a file on your desktop.

Autonomy
With Director 3.0, we revamped the product's overall searching capabilities, and cleaned up and enhanced the SQL-based document metadata searching. Our original full-text searching module became obsolete with the introduction of server independence, leading us to seek a more cutting edge technology to support this area. We went with Autonomy, which gives us several key advantages:

A. Conceptual searching is a cornerstone of Autonomy's technology, whereby documents are indexed and then searched by meaning, using probabilistic algorithms, rather than by keywords and frequency of appearance, which is the traditional approach. This alleviates the infuriating problem users often encounter when searching - too many hits, most of which turn out to be useless noise.
B. Application developers can still enable keyword searching.
C. The metadata and the content can be searched at the same time since both are stored in Autonomy's Dynamic Reasoning Engine (DRE).
D. Batching of results is supported, and users can easily paginate through them.
E. A cap on the maximum number of results returned is supported, which was not the case in SilverStream's previous searching implementations.
F. A "suggest similar documents" type of query is supported, which helps users find what they're really looking for and obtain related information.
G. Autonomy offers many other important capabilities, which are discussed in "The Future" section of this epic document.
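Items D and E in the list above (batching and a cap on the number of results) can be pictured with a small sketch. This is not the Autonomy or Director API; `page` and its parameters are hypothetical, and a real engine would apply the cap and the batching on the server side rather than over an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of result batching with a maximum-results cap.
class ResultBatchingSketch {

    /**
     * Returns one batch of hits. Only the first maxResults hits are ever
     * visible; batchIndex is zero-based.
     */
    static <T> List<T> page(List<T> hits, int maxResults, int batchSize, int batchIndex) {
        if (maxResults < 0 || batchSize <= 0 || batchIndex < 0) {
            throw new IllegalArgumentException("Invalid paging arguments");
        }
        int limit = Math.min(hits.size(), maxResults);
        long from = (long) batchIndex * batchSize; // long avoids int overflow
        if (from >= limit) {
            return new ArrayList<>();
        }
        int to = (int) Math.min(limit, from + batchSize);
        return new ArrayList<>(hits.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Cap at 7 results, 3 per batch: batches are [1, 2, 3], [4, 5, 6], [7].
        for (int i = 0; i < 3; i++) {
            System.out.println(page(hits, 7, 3, i));
        }
    }
}
```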

Search Service was introduced with Director 3.0. It encapsulates the underlying indexing and searching mechanism, making it pluggable if necessary. Search Service also wraps the Autonomy APIs that are used for interacting with the DRE and serves as an intermediary between the CMS and the searching engine. And even though the formerly introduced full-text search interfaces were deprecated with 3.0, it is not a complex undertaking to code a query in CM that goes into the Autonomy DRE and returns results. The CMS imports its own documents into the DRE and indexes them, either immediately when they are added or modified, or in a batch fashion via a scheduled task (with deletions taken care of by the CMS, as well).
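To make the shape of such a service concrete, here is a minimal sketch of a pluggable search layer with immediate and batch indexing. All names (`SearchEngine`, `InMemoryEngine`, `SearchService`, `flushPending`) are hypothetical, and the toy engine does plain substring matching; it stands in for a real engine and is in no way the Autonomy DRE or its API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not the Director Search Service or the Autonomy API.
class SearchServiceSketch {

    /** Whatever engine sits underneath; the CMS only sees this interface. */
    interface SearchEngine {
        void index(String documentId, String text);
        void remove(String documentId);
        List<String> search(String query);
    }

    /** Toy engine: case-insensitive substring match over stored text. */
    static class InMemoryEngine implements SearchEngine {
        private final Map<String, String> docs = new LinkedHashMap<>();

        @Override
        public void index(String documentId, String text) {
            docs.put(documentId, text.toLowerCase(Locale.ROOT));
        }

        @Override
        public void remove(String documentId) {
            docs.remove(documentId);
        }

        @Override
        public List<String> search(String query) {
            String needle = query.toLowerCase(Locale.ROOT);
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, String> entry : docs.entrySet()) {
                if (entry.getValue().contains(needle)) {
                    ids.add(entry.getKey());
                }
            }
            return ids;
        }
    }

    /** Intermediary between the CMS and the engine. */
    static class SearchService {
        private final SearchEngine engine;
        private final boolean indexImmediately;
        private final Map<String, String> pending = new LinkedHashMap<>();

        SearchService(SearchEngine engine, boolean indexImmediately) {
            this.engine = engine;
            this.indexImmediately = indexImmediately;
        }

        /** Called when a document is added or modified. */
        void documentSaved(String documentId, String text) {
            if (indexImmediately) {
                engine.index(documentId, text);
            } else {
                pending.put(documentId, text); // picked up by the next batch run
            }
        }

        /** Deletions are applied right away in either mode. */
        void documentDeleted(String documentId) {
            pending.remove(documentId);
            engine.remove(documentId);
        }

        /** What a scheduled task would call in batch mode. */
        void flushPending() {
            for (Map.Entry<String, String> entry : pending.entrySet()) {
                engine.index(entry.getKey(), entry.getValue());
            }
            pending.clear();
        }

        List<String> search(String query) {
            return engine.search(query);
        }
    }

    public static void main(String[] args) {
        SearchService service = new SearchService(new InMemoryEngine(), false);
        service.documentSaved("doc-1", "Quarterly Sales Report");
        System.out.println(service.search("sales")); // not indexed yet
        service.flushPending();
        System.out.println(service.search("sales")); // indexed by the batch run
    }
}
```

Swapping the engine means supplying a different `SearchEngine` implementation; nothing that calls `SearchService` has to change.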

CMS 3.0 also gives more power to element objects, such as document and folder, as opposed to always doing everything through the manager object. Thus, we're maintaining our object-oriented cleanliness (which is next to godliness), and providing a plethora of new convenience methods on the elements. Well, this has been an exciting and interesting journey. And what's even more exciting is looking into...
the future
Without promising any future functionality, it actually feels great to sit back and irresponsibly speculate on the future of CM. Where are we going? What awaits us in the future, which (with markets changing faster than we can all digest Mylanta) always seems to need to be now, if not yesterday, if not the day before? Well... Analysts tend to agree on several key aspects of CM that may drive the success of CM providers in the near future. These requirements are dictated by the exponential speed at which content is growing in company repositories. They are also driven by a growing urgency for quick and efficient ways to make content available and useable by various people in fast-paced, dynamic environments. So, what areas do the analysts predict will be key in the near future?

Multiple Content Repositories
Content is varied and vast, and only growing more so. Multiple repositories are needed to store the growing number of existing documents, and also for content replication, backup, and recovery, as well as to store content separated by type, original source, or intended use. Multiple repositories are also needed to enable the staging of Web content from development and quality assurance to production. As our customers begin to use the system more and more, SilverStream will need to address these issues.

Reporting
The more data involved, moved, and changed, the more critical monitoring and reporting become. Version control and logging may no longer be sufficient, and that is where, according to the analysts, reporting and analysis of content usage will come into play in a big way.

Conflict Resolution
In the early days of CM, we (SilverStream developers) made the design decision to make our versioning pessimistic, or exclusive, with only one person being able to hold a lock on a document. The wider the audience CM gets, the more urgent it may become to support optimistic locking, conflict resolution, and merging to better serve medium- to large-sized content contributor teams.
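For contrast with the exclusive lock described above, here is a minimal sketch of optimistic locking. It is an illustration under assumed names (`DocumentStore`, `Version`, `ConflictException`), not something Director provides: nobody holds a lock while editing, and a save is rejected if the document has moved on since the editor read it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of optimistic locking; not part of the Director CMS.
class OptimisticLockingSketch {

    /** An immutable snapshot of a document at a given version number. */
    static final class Version {
        final int number;
        final String content;

        Version(int number, String content) {
            this.number = number;
            this.content = content;
        }
    }

    /** Raised when a save is based on a version that is no longer the latest. */
    static class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    static class DocumentStore {
        private final Map<String, Version> latest = new ConcurrentHashMap<>();

        /** Returns the latest version, or null if the document does not exist. */
        Version read(String documentId) {
            return latest.get(documentId);
        }

        /**
         * Saves new content only if baseVersion is still the latest version.
         * Use baseVersion 0 to create a document that does not exist yet.
         */
        synchronized Version save(String documentId, int baseVersion, String content) {
            Version current = latest.get(documentId);
            int currentNumber = (current == null) ? 0 : current.number;
            if (currentNumber != baseVersion) {
                throw new ConflictException("Edit based on version " + baseVersion
                        + " but latest is " + currentNumber);
            }
            Version next = new Version(currentNumber + 1, content);
            latest.put(documentId, next);
            return next;
        }
    }

    public static void main(String[] args) {
        DocumentStore store = new DocumentStore();
        store.save("doc-1", 0, "draft");
        // Two editors both start from version 1; only the first save succeeds.
        store.save("doc-1", 1, "edit by A");
        try {
            store.save("doc-1", 1, "edit by B");
        } catch (ConflictException e) {
            System.out.println("Conflict: " + e.getMessage());
        }
    }
}
```

The rejected save is where conflict resolution or merging would come in: the second editor would re-read the latest version and reconcile the two sets of changes.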

Tools
Tools, tools, tools! We may hear it soon, and in fact, we already do hear it when it comes to the importing and exporting of CM data. Again, to quote the analysts, users want more direct control over their data. With our product maturing, we may have to move quickly from making a lot of things possible to making things nice and easy - having good tools is a major step in that direction.

Event Model and Scheduling
With Director 3.0, we introduced timer-based support for scheduled tasks. We may need to add a more fundamental set of capabilities for scheduling, perhaps based on a third-party solution. Generating and handling notifications of important CM events such as "document added" or "document checked in" is another important area that needs to be evaluated.
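One way to picture the notification side of this is a simple publish/subscribe sketch. The names below (`EventBus`, `ContentEvent`, `ContentListener`, and the two event types) are assumptions made for illustration; they mirror the "document added" and "document checked in" examples but are not an existing Director API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of CM event notification; not an existing Director API.
class ContentEventSketch {

    enum EventType { DOCUMENT_ADDED, DOCUMENT_CHECKED_IN }

    /** Describes something that happened to a document. */
    static final class ContentEvent {
        final EventType type;
        final String documentId;

        ContentEvent(EventType type, String documentId) {
            this.type = type;
            this.documentId = documentId;
        }
    }

    /** Implemented by anything that wants to react to CM events. */
    interface ContentListener {
        void onEvent(ContentEvent event);
    }

    /** Delivers each published event to every subscribed listener, in order. */
    static class EventBus {
        private final List<ContentListener> listeners = new CopyOnWriteArrayList<>();

        void subscribe(ContentListener listener) {
            listeners.add(listener);
        }

        void publish(ContentEvent event) {
            for (ContentListener listener : listeners) {
                listener.onEvent(event);
            }
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.subscribe(event -> System.out.println(event.type + " " + event.documentId));
        bus.publish(new ContentEvent(EventType.DOCUMENT_ADDED, "doc-1"));
        bus.publish(new ContentEvent(EventType.DOCUMENT_CHECKED_IN, "doc-1"));
    }
}
```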

So, what about Autonomy?

Dynamic Categorization
Is this yet another buzzword? What does it mean? With dynamic categorization there is no need to manually assign (tag) documents into categories, which translates into big time savings. And because categories are defined as queries, the approach supports fast-paced, changing environments as well as rapid content deployment. Additionally, category definitions are stored as documents in CM and can be secured, versioned, and published, providing great flexibility in content categorization. A combination of static and dynamic categorization may prove to be the best way for us to address customer requirements in this area. Not only would we use Autonomy as a search engine, but also as a powerful content classification engine, which is where its true value resides.
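The "categories are queries" idea can be sketched in a few lines. This is only an illustration with hypothetical names (`Categorizer`, `Document`, `mentions`), and it uses a plain keyword predicate as a stand-in; Autonomy's conceptual matching is a different and far more capable mechanism.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: keyword predicates stand in for real category queries.
class DynamicCategorySketch {

    static final class Document {
        final String title;
        final String body;

        Document(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    /** Holds category definitions; membership is computed, never stored. */
    static class Categorizer {
        private final Map<String, Predicate<Document>> categories = new LinkedHashMap<>();

        /** Defines or redefines a category as a query over documents. */
        void define(String name, Predicate<Document> query) {
            categories.put(name, query);
        }

        /** Evaluates every category query against the document. */
        List<String> categoriesOf(Document document) {
            List<String> matches = new ArrayList<>();
            for (Map.Entry<String, Predicate<Document>> entry : categories.entrySet()) {
                if (entry.getValue().test(document)) {
                    matches.add(entry.getKey());
                }
            }
            return matches;
        }
    }

    /** A simple query: does the title or body mention the term? */
    static Predicate<Document> mentions(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return doc -> (doc.title + " " + doc.body).toLowerCase(Locale.ROOT).contains(needle);
    }

    public static void main(String[] args) {
        Categorizer categorizer = new Categorizer();
        categorizer.define("Finance", mentions("budget"));
        categorizer.define("Legal", mentions("contract"));

        Document doc = new Document("Q3 Budget", "Includes the vendor contract summary.");
        System.out.println(categorizer.categoriesOf(doc));
    }
}
```

Since no document is ever tagged by hand, redefining a category's query immediately changes which documents fall into it.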

Dynamic Personalization
Monitoring and storing user queries will "teach" systems about user interests, instead of systems having to bombard users with bulky, unmanageable questionnaires that often prove inefficient, if not useless. The more a user uses the system, the better the system will maintain the customer relationship, making the user want to return time and time again.

Communities
Similar queries, similar interests, similar people - all grouped and interacting as a community. This, in basic terms as well as in Autonomy's view, is what Web communities truly are. We've already received urgent requests to include a community management feature, and Autonomy may turn out to be an important building block in this area for future development.

XML Management
Autonomy has capabilities for automatically marking data with XML tags, helping to automatically categorize content and deliver it to the right points in the data processing cycle. Additionally, according to an article on Autonomy's web site: "A more subtle application of Autonomy and XML combined lies in areas such as supply chain management where, building on the strengths of XML to accurately record precise product codes or catalog numbers, additional unstructured information may be required to relay qualitative or supplementary detail."

Voice Suite
Again taken from the Autonomy Web site, this is in reference to Autonomy's voice and speech recognition interface technology: "With the acquisition of SoftSound, a renowned speech recognition company, Autonomy adds to its infrastructure offering the ability to handle multimedia content, such as videos, broadcasts, audio archives, news feed streams, etc."
conclusion
exteNd has come a long way with its CMS. As developers, it's exciting to think about how much we've done and how much exteNd users hopefully appreciate all that Director 3.0 has to offer. And now, with nothing but the possibility of greater things to come, we can only say: You ain't seen nothing yet!
references
www.autonomy.com
"Dispelling Popular Myths of Web Content Management," by Kathleen Means and James Graham, SunServer
"A Framework to Manage Pervasive Content at the Edge of the Network," Internet REPORT, Vol. 4, No. 15, December 1999
The Forrester Report, "Commerce Software Takes Off," by Eric Schmitt with Harley Manning, Yolanda Paul and Sadaf Roshan
The Forrester Report, "Smart Personalization," by Paul R. Hagen with Harley Manning and Randy Souza
www.gigaweb.com, "Content Management in E-Commerce: An Emerging, Converging Market," by Tony White and Kathleen Hall
www.intranetjournal.com, "Effective Web Content Management: Empowering the Business User While IT Maintains Control," Winett Associates