This fall, SilverStream announced an OEM agreement to embed a core piece
of Autonomy's Dynamic Reasoning Engine (DRE) within SilverStream eXtend
Director 3.0. Now, eXtend Director users can benefit from the flexibility
and power of Autonomy's infrastructure software to identify and leverage
relevant information within the enterprise. Let's explore what this
integration means, how it works, and how you can use the resulting
capabilities in your applications.
With version 3.0, the search feature in SilverStream eXtend Director has evolved.
Previously, ePortal 2.x relied on SilverStream Application Server's integration
with Fulcrum technology, which provided only limited search capabilities.
Through that integration, the Fulcrum Search Server indexed only specific
columns of selected database tables. For example, in the CMDOCCONTENTS table,
Fulcrum indexed the CONTENTDATA column, which held the content of documents
stored in the content management repository. The content management
FullTextQuery objects used AgData calls to retrieve a list of documents
that matched a set of full-text search criteria.
The Autonomy integration is more open and versatile and provides the
following capabilities:
- Fetching - This involves importing and indexing. The Autonomy DRE uses a 'fetcher' to connect to a specific repository and import documents before indexing them. SilverStream eXtend Director includes a customized fetcher that lets the DRE request documents from the content management system rather than through direct database access, as was the case with Fulcrum.
- Synchronization - The content management system within SilverStream eXtend Director notifies the DRE when documents change as a result of selected events such as add, remove, update, unlock, rollback, publish, unpublish, checkout, and checkin. Synchronization can occur in real time or as a batched process.
- Querying - The SilverStream eXtend Director search API provides classes and methods for performing a variety of queries.
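The choice between real-time and batched synchronization can be pictured with a small Java sketch. The types below (CmEvent, ChangeNotice, IndexSynchronizer) are hypothetical and are not part of the eXtend Director or Autonomy APIs; the sketch only illustrates the design choice: in real-time mode each change notice is forwarded to the index at once, while in batched mode notices accumulate until a batch size is reached (or until flush() is called, for example on a timer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the CM events that trigger index synchronization.
enum CmEvent { ADD, REMOVE, UPDATE, UNLOCK, ROLLBACK, PUBLISH, UNPUBLISH, CHECKOUT, CHECKIN }

// Hypothetical record of one document change.
final class ChangeNotice {
    final String docId;
    final CmEvent event;

    ChangeNotice(String docId, CmEvent event) {
        this.docId = docId;
        this.event = event;
    }
}

// Hypothetical synchronizer: forwards change notices to an index updater either
// immediately (real time) or in batches of a fixed size.
final class IndexSynchronizer {
    private final boolean realTime;
    private final int batchSize;
    private final Consumer<List<ChangeNotice>> indexUpdater;
    private final List<ChangeNotice> pending = new ArrayList<>();

    IndexSynchronizer(boolean realTime, int batchSize, Consumer<List<ChangeNotice>> indexUpdater) {
        this.realTime = realTime;
        this.batchSize = batchSize;
        this.indexUpdater = indexUpdater;
    }

    // Called by the (hypothetical) CM layer whenever a document changes.
    void onDocumentChanged(String docId, CmEvent event) {
        pending.add(new ChangeNotice(docId, event));
        if (realTime || pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends all pending notices to the index updater in one batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        indexUpdater.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

Batching trades index freshness for fewer, larger index updates; real-time mode keeps the index current at the cost of one update per change.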
SilverStream eXtend Director 3.0 includes an OEM version of Autonomy's server/application
builder, which comprises the DRE, a C API, and Java (JNI) wrappers. This package is
the core technology used by Autonomy's publicly available products. SilverStream's
search service API wraps Autonomy's own Java API. This integration supports
the following query types:
- Conceptual search
- Traditional search (also called full-text search)
- Keyword search (syntax: 'word1:+word2')
- Boolean query (syntax: 'movie+AND+science+AND+fiction')
- Proximity query (syntax: 'movie+"science+fiction"')
- "Suggest More" search (syntax: '<docID1>+<docID2>')
- Fuzzy query (used when the spelling of a term is uncertain)
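If you assemble these strings programmatically, a small helper keeps the syntax consistent. The AutonomyQueryStrings class below is hypothetical, not part of the Director search API; it only reproduces the Boolean and proximity example strings above, assuming that '+' acts as the word separator (as in URL-encoded query text). Keyword, fuzzy, and "Suggest More" strings are left out because their exact rules are not spelled out here.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper that builds query strings in the style of the examples above.
final class AutonomyQueryStrings {
    private AutonomyQueryStrings() {}

    // Replaces runs of whitespace with '+', e.g. "science fiction" -> "science+fiction".
    static String plus(String text) {
        return String.join("+", text.trim().split("\\s+"));
    }

    // Boolean AND query, e.g. movie+AND+science+AND+fiction
    static String and(String... terms) {
        return Arrays.stream(terms)
                .map(AutonomyQueryStrings::plus)
                .collect(Collectors.joining("+AND+"));
    }

    // Proximity query: a term followed by a quoted phrase, e.g. movie+"science+fiction"
    static String proximity(String term, String phrase) {
        return plus(term) + "+\"" + plus(phrase) + "\"";
    }
}
```

For example, `AutonomyQueryStrings.and("movie", "science", "fiction")` yields `movie+AND+science+AND+fiction`.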
You can also use the search system to:
- Query over content and metadata (information about the document such as author, title, summary, etc.) as well as extension metadata (custom metadata)
- Limit the number of results returned
- Set a minimum acceptable relevance, that is, a cutoff threshold for document relevance scores (e.g., return only documents with 70% relevance or higher)
- Sort by date and/or relevance
- Generate automatic abstracts (distinct from the CM Abstract metadata concept)
- Run queries on demand (batch queries)
- Use a thesaurus
- Query multiple repositories (multiple CM repositories as well as 'external' repositories)
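To make the relevance cutoff, sorting, and result-limit options concrete, the following Java sketch applies them client-side to an in-memory list of hits. In Director these options are set on the query object and applied by the search engine; the Hit and ResultShaper classes here are hypothetical and only model the intended semantics: drop hits below the minimum relevance, order by relevance or by date (newest first), then truncate to the maximum number of results.

```java
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical search hit: a document reference, a 0-100 relevance weight, and a date.
final class Hit {
    final String docRef;
    final int weight;
    final Date modified;

    Hit(String docRef, int weight, Date modified) {
        this.docRef = docRef;
        this.weight = weight;
        this.modified = modified;
    }
}

final class ResultShaper {
    private ResultShaper() {}

    // Filters by relevance cutoff, sorts (by date or by relevance, descending),
    // and keeps at most maxResults hits.
    static List<Hit> shape(List<Hit> hits, int relevanceCut, boolean byDate, int maxResults) {
        Comparator<Hit> order = byDate
                ? Comparator.comparing((Hit h) -> h.modified).reversed()
                : Comparator.comparingInt((Hit h) -> h.weight).reversed();
        return hits.stream()
                .filter(h -> h.weight >= relevanceCut)
                .sorted(order)
                .limit(maxResults)
                .collect(Collectors.toList());
    }
}
```

Applying the cutoff before the limit matters: truncating first could discard relevant documents and keep weak ones.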
Additionally, because the integration includes Autonomy's OmniSlave technology,
which extracts text from binary document formats, the following document
types can be indexed:
- HTML
- SGML
- XML
- Plain text
- Microsoft Word for Windows V3.x onwards
- Microsoft Excel V3.x onwards
- Microsoft PowerPoint V4.x onwards
- Adobe Acrobat PDF
With SilverStream eXtend Director 3.0, you can create applications that
incorporate advanced search capabilities within a J2EE development model
and deploy them on multiple application servers, including SilverStream
Application Server 3.7.4, BEA WebLogic 6.1, and IBM WebSphere 4.0.
Full-text searching (which Autonomy also calls keyword or traditional searching)
relies only on keywords and a thesaurus to find relevant hits.
By contrast, Autonomy's conceptual search technology analyzes conceptual
patterns, characterizing documents by the usage, frequency, and relationships
of their terms. Based on this analysis, the Autonomy technology builds a weighted
graph of terms and uses a probabilistic engine to determine relevance
against the search criteria.
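The difference between the two approaches can be illustrated with a toy Java example. This is emphatically not Autonomy's algorithm. It is a deliberately simple stand-in, assuming a term graph whose edge weights are document co-occurrence counts: keyword matching requires every query term to appear, while the "concept" score gives full credit for an exact term and partial credit (0.5 × P(term | query term)) for terms that co-occur with the query term elsewhere in the corpus. ConceptVsKeyword and its scoring rule are hypothetical.

```java
import java.util.*;

// Toy contrast of keyword matching vs. a co-occurrence-weighted term graph.
// Not Autonomy's algorithm; for illustration only.
final class ConceptVsKeyword {
    private final List<Set<String>> docs = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();
    // co.get(a).get(b) = number of documents that contain both a and b
    private final Map<String, Map<String, Integer>> co = new HashMap<>();

    ConceptVsKeyword(List<String> texts) {
        for (String text : texts) {
            Set<String> terms = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
            docs.add(terms);
            for (String a : terms) {
                docFreq.merge(a, 1, Integer::sum);
                for (String b : terms) {
                    if (!a.equals(b)) {
                        co.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
    }

    // Full-text style: the document matches only if it contains every query term.
    boolean keywordMatch(int doc, String... query) {
        for (String q : query) {
            if (!docs.get(doc).contains(q.toLowerCase())) {
                return false;
            }
        }
        return true;
    }

    // Concept style: for each query term, take the best similarity to any document term
    // (1.0 for an exact match, else 0.5 * co-occurrence / document frequency), then sum.
    double conceptScore(int doc, String... query) {
        double score = 0;
        for (String raw : query) {
            String q = raw.toLowerCase();
            double best = 0;
            for (String t : docs.get(doc)) {
                double sim;
                if (t.equals(q)) {
                    sim = 1.0;
                } else {
                    int both = co.getOrDefault(q, Collections.<String, Integer>emptyMap())
                                 .getOrDefault(t, 0);
                    sim = docFreq.containsKey(q) ? 0.5 * both / docFreq.get(q) : 0.0;
                }
                best = Math.max(best, sim);
            }
            score += best;
        }
        return score;
    }
}
```

With the corpus "mutual funds stocks", "stocks bonds portfolio", and "movie science fiction", a keyword query for "funds" matches only the first document, while the concept score also ranks the second (related through "stocks") above the unrelated third.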
SilverStream bundles only the SilverStream eXtend Director CM fetcher with
its product, so customers need to purchase additional fetchers from Autonomy
if they want the DRE to fetch documents from other sources such as file systems
and databases. With those fetchers in place, SilverStream eXtend Director can query
multiple repositories to retrieve documents that pertain to a specific query.
Customers who want to write custom fetchers can obtain the appropriate SDK license
from Autonomy.
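The fetcher architecture can be pictured as an interface with one implementation per source. This is a hypothetical illustration, not Autonomy's fetcher SDK: Fetcher, StaticFetcher, and MultiRepositoryIndex are invented names, and the "index" is a naive whole-word lookup. It only shows how documents imported through several fetchers land in separately named repositories that a single query can span, much like the repository array Director passes to its query delegate.

```java
import java.util.*;

// Hypothetical fetcher contract: a named source that yields documents (id -> text).
interface Fetcher {
    String repositoryName();
    Map<String, String> fetch();
}

// Hypothetical fetcher backed by a fixed map, standing in for a CM, file-system, or database source.
final class StaticFetcher implements Fetcher {
    private final String name;
    private final Map<String, String> docs;

    StaticFetcher(String name, Map<String, String> docs) {
        this.name = name;
        this.docs = docs;
    }

    public String repositoryName() { return name; }

    public Map<String, String> fetch() { return docs; }
}

// Naive multi-repository index: repository name -> (document id -> text).
final class MultiRepositoryIndex {
    private final Map<String, Map<String, String>> index = new LinkedHashMap<>();

    // Imports (and "indexes") everything a fetcher returns under its repository name.
    void importFrom(Fetcher f) {
        index.put(f.repositoryName(), new LinkedHashMap<>(f.fetch()));
    }

    // Returns "repository/docId" for documents containing the word, searching only
    // the named repositories, in the order given.
    List<String> search(String word, String... repositories) {
        List<String> hits = new ArrayList<>();
        for (String repo : repositories) {
            Map<String, String> docs = index.getOrDefault(repo, Collections.<String, String>emptyMap());
            for (Map.Entry<String, String> e : docs.entrySet()) {
                List<String> words = Arrays.asList(e.getValue().toLowerCase().split("\\s+"));
                if (words.contains(word.toLowerCase())) {
                    hits.add(repo + "/" + e.getKey());
                }
            }
        }
        return hits;
    }
}
```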
The FullTextSearch CM API calls found in ePortal 2.3 have largely been superseded.
The recommended way to access this feature is now through the new search service
APIs or through the search-related content management service APIs.
The following example explores the search API a bit further.
(This code can be used in almost any environment within a Director
J2EE application, including servlets, EJBs, SilverStream
eXtend Director portal components, JSPs, etc.)
try {
    // Step 1: acquire an instance of the Query Engine delegate from the search
    // client factory. This delegate performs most search operations.
    com.sssw.search.api.EbiQueryEngineDelegate searchDel =
        com.sssw.search.client.EboFactory.getQueryEngineDelegate();

    // Step 2: get a query object from a different factory. Once all of its
    // parameters have been set, it is passed to the query delegate to run the search.
    com.sssw.search.api.EbiQuery query = com.sssw.search.factory.EboFactory.getQuery();

    // The SilverStream eXtend Director 3.0 content management service supports
    // multiple repositories and vaults. As mentioned earlier, Autonomy supports
    // multiple 'databases' (not SQL databases, but the partitioned indexing spaces
    // associated with a document source), including the SilverStream content
    // management repositories. By default, Director applications are configured with
    // two repositories, com.sssw.cm.Default and com.sssw.cm.System. Here we search
    // only the default repository, where user documents are stored.
    String[] sa_rep = {"com.sssw.cm.Default"};

    // Next, create the query string itself using Autonomy search string syntax.
    // AUTONOMY SYNTAX for SEARCH STRINGS (abbreviated)
    //   Conceptual query:  'Mutual funds'
    //   Traditional query (a.k.a. full-text or keyword search): 'Money:+Market'
    //   Boolean query:     'movie+AND+science+AND+fiction'
    //   Proximity query:   'movie+"science+fiction"'
    //   "Suggest More":    '<docID1>+<docID2>' (find more documents like these)
    String queryString = "holy grail";

    // Set the query parameters.
    query.setText(queryString);

    // Maximum number of results to return. This overrides the default set in the
    // content management service jar's ContentMgmtService-conf/config.xml file:
    //   <key>com.sssw.cm.search.synch.removes.batch.size.Default</key>
    //   <value>100</value>
    query.setMaxNumResults(60);

    // A number of other parameters can be set.
    // Narrow the scope by date:
    query.setDateRange(java.sql.Timestamp.valueOf("1999-03-05 12:00:00"), new java.util.Date());
    // Generate summaries quickly:
    query.setGenerateQuickSummary(true);
    // Return only documents with 70% relevance or higher:
    query.setRelevanceCut(70);
    // Sort results by date:
    query.setSortByDate(true);
    // Sort results by relevance:
    query.setSortByRelevance(true);

    // Run the query through the query delegate, which needs a context: the search runs
    // on behalf of a user (anonymous or not) across the array of repositories (sa_rep).
    // 'context' is assumed to be the request context available in your component.
    // Passing false as the last (return elements) parameter returns a collection of
    // java.util.Map objects; passing true would return a collection of EbiQueryResult.
    java.util.Collection queryResult = searchDel.runQuery(context, query, sa_rep, false);

    // If the result set is not empty, cycle through the collection and read the
    // values for the following keys:
    //   "EngineDocReference" - the URL of the document in the CM system
    //   "DocWeight"          - the relevance of the document to the query
    // A more thorough way to explore the results is to request a collection of EbiQueryResult.
    if (queryResult.size() > 0) {
        java.util.Iterator goThrough = queryResult.iterator();
        while (goThrough.hasNext()) {
            java.util.Map thisDocMap = (java.util.Map) goThrough.next();
            String s_doc = (String) thisDocMap.get("EngineDocReference");
            // fullContextPath is assumed to hold the web application's context path.
            String s_docurl = fullContextPath + "/../WebDAVService/main/" + s_doc;
            String s_docweight = String.valueOf(thisDocMap.get("DocWeight"));
            // ...
        }
    }
} catch (Exception e) {
    // Handle or log search errors as appropriate for your environment.
    e.printStackTrace();
}
As you can see, the SilverStream eXtend Director search system supports both
full-text and conceptual searching of the content repository. SilverStream
eXtend Director also provides a wealth of other functionality (workflow, portal
web tier, content management, user and directory services, a rules engine,
security, etc.) within a pure J2EE framework for building applications that
can be deployed on multiple application servers. To learn more about Director,
its search capabilities, and its other features, download an evaluation copy
of the product and explore the online documentation.