introduction
Explanation and examples of using flat files with, and generating flat files from, xCommerce
eBusiness transactions increasingly rely on files in XML (eXtensible Markup Language) format for data interchange. XML is ideal for this role because XML files are self-describing, easily parsed, inherently cross-platform, and capable of representing hierarchical data of arbitrary complexity. However, not all data files are in XML format. Many legacy applications are limited to importing and exporting flat files. Applications designed to transform XML (for example, Services created with SilverStream's xCommerce) need to be able to deal with flat-file data as well as XML if they are to be useful in eBusiness initiatives.
What are flat files? Typically, a flat file is a data file in which records are laid end-to-end, with no obvious hierarchical structure. (The term "flat" stems from the lack of hierarchy.) Each record is usually made up of atomic units called fields. For example, a file containing employee data might consist of one record for each employee, with a record, in turn, containing fields representing name, social security number, and years with the company. The records might look like:
"Johnson,John",555-66-7788,07
"Clark,Kimberly",454-77-9999,02
"Knight,Dwight",999-22-3333,03
In this case, each record is terminated by an end-of-line character and the fields are separated by commas, with double quotation marks bounding any fields that use commas internally. This kind of file is often called CSV (for Comma Separated Values). Besides the comma, the tab character (ASCII 0x09) is a common delimiter, and in some cases a combination of fixed-length and variable-length fields is used, as in EDI files (which, despite being called "flat," actually conceal a fair amount of hierarchical structure).
Some flat files utilize fixed-length fields. By their nature, fixed-length fields have no delimiters. Instead, data bytes in a fixed-length field are followed by padding characters (for example, ASCII 0x20) so as to make the field occupy the desired amount of space. Files that use fixed-length fields might optionally contain one or more headers that contain field-length or layout information. (For example, dBase files use this technique.)
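As a minimal sketch of the fixed-length case, the ECMAScript below cuts one record into fields by width and strips the trailing padding. The function name splitFixedRecord and the field widths (16, 11, and 2 bytes) are assumptions made for illustration; a real file's layout would come from its documentation or header.

```javascript
// Hypothetical helper: cut one fixed-length record into fields.
// widths  - array of field widths, in characters (assumed layout)
// padChar - the padding character, e.g. " " (ASCII 0x20)
function splitFixedRecord(record, widths, padChar) {
  var fields = [];
  var pos = 0;
  for (var i = 0; i < widths.length; i++) {
    var raw = record.substring(pos, pos + widths[i]);
    // Walk back over trailing padding to find the end of the real data.
    var end = raw.length;
    while (end > 0 && raw.charAt(end - 1) === padChar) {
      end--;
    }
    fields.push(raw.substring(0, end));
    pos += widths[i];
  }
  return fields;
}

// The name field is padded with four spaces to fill 16 characters.
var fixedRecord = "Johnson,John    555-66-778807";
var fixedFields = splitFixedRecord(fixedRecord, [16, 11, 2], " ");
// fixedFields is ["Johnson,John", "555-66-7788", "07"]
```

Note that the comma in the name needs no quoting here, since field boundaries are determined by position alone.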

flat-file input and xCommerce
SilverStream's xCommerce Designer is a tool for creating eBusiness integration applications. xCommerce uses XML as its native data language. As such, it has no direct support for flat files per se. But Designer does incorporate a powerful ECMAScript binding (with extensions for performing file I/O, if needed). It is also possible to load custom Java classes and call custom Java methods directly from ECMAScript, within an xCommerce Service or Component. One tactic for dealing with flat-file input, then, is to create a custom ECMAScript function (or equivalent Java code) for parsing flat-file input into XML. Overall, the flow of events (from the viewpoint of the xCommerce service) might be:
- A small XML file arriving via HTTP triggers the xCommerce service. The XML file contains path information pointing to a flat file.
- The xCommerce service passes the flat file's path to a custom ECMAScript function.
- The custom ECMAScript function loads a custom Java package, instantiates an object, and calls appropriate methods on that object to open the flat file, read it, and parse it into XML, constructing an Output DOM on the fly.
If you want, you can write your flat-file parsing routine entirely in ECMAScript. The custom code can be stored in a Custom Script resource, which makes it available to any xCommerce component that might need to use it. To create a Custom Script resource, bring the xCommerce Designer main window to the front. In the menubar under File, select New xObject, then Resource, then Custom Script. The Script Editor window appears.
Any custom functions can be entered in this window and saved for use in xCommerce components. Be sure the "Function is Public" checkbox is checked.

custom Java classes
Custom Java classes can be called from your ECMAScript functions. The main requirement is that the path to your classes must be included in the XCCLASSPATH environment variable. In Windows, this can be done either at the DOS command line with a SET command, or by setting XCCLASSPATH in the System control panel's Environment tab.
To call the custom Java class, use ECMAScript's Packages facility:
var MyFile = new Packages.MyFileReader();
In this example, the MyFileReader constructor is called. Any methods in MyFileReader can then be called on MyFile.
The choice of whether to use ECMAScript or Java, or some combination of the two, is partly one of convenience and partly one of performance. If execution speed is of the essence, Java is the better choice since ECMAScript can be significantly slower than Java at runtime. On the other hand, for rapid prototyping, ECMAScript is hard to beat. In fact, you can execute custom functions right in the Script Editor window (via the command-line interface at the bottom). This makes code development very quick and painless. (Or at least, quick.)
Bear in mind that if your flat files are large (many megabytes) but relatively simple in structure, you may be bottlenecked by I/O rather than code execution. When in doubt, run some tests. ECMAScript may or may not be the appropriate choice for final production.

designing a custom flat-file parsing function
If you know in advance what kind of flat files you will be receiving (for example, CSV), the most expedient way to proceed might simply be to write a custom function containing hard-coded rules for parsing the file. One of the benefits of ECMAScript, as stated earlier, is that you can write and test such quick-and-dirty code very rapidly.
The only problem with the quick-and-dirty approach is that it leads to a loss of generality. You might very well end up rewriting your code later to handle a new case involving some relatively trivial change in flat-file format. Therefore, if time permits, you should aim for some generality in your code. For example, you might design a FlatFile object that has constructors and methods for both the fixed-field case and the delimited-file case. This approach can be used in ECMAScript as well as Java. ECMAScript allows for the definition of custom objects, with their own constructors, etc.
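One possible shape for such an object is sketched below. This is an illustration only: the options argument, the property names delimiter and widths, and the method names parseRecord and parse are all assumptions, and the delimited branch does not handle escaped delimiters.

```javascript
// Hypothetical FlatFile object covering both layouts.
// Pass { delimiter: "," } for delimited files or { widths: [..] } for
// fixed-length files. All names here are illustrative assumptions.
function FlatFile(options) {
  this.delimiter = options.delimiter || null;
  this.widths = options.widths || null;
  if (!this.delimiter && !this.widths) {
    throw new Error("FlatFile needs either a delimiter or a list of widths");
  }
}

// Split a single record into an array of field strings.
FlatFile.prototype.parseRecord = function (record) {
  if (this.delimiter) {
    return record.split(this.delimiter); // escaped delimiters not handled
  }
  var fields = [];
  var pos = 0;
  for (var i = 0; i < this.widths.length; i++) {
    fields.push(record.substring(pos, pos + this.widths[i]));
    pos += this.widths[i];
  }
  return fields;
};

// Split the whole file text into records (one per line), skipping empty lines.
FlatFile.prototype.parse = function (text) {
  var lines = text.split(/\r?\n/);
  var records = [];
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].length > 0) {
      records.push(this.parseRecord(lines[i]));
    }
  }
  return records;
};

var tabFile = new FlatFile({ delimiter: "\t" });
var tabRecords = tabFile.parse("Clark\t02\nKnight\t03\n");
```

Because callers only ever see parseRecord and parse, a change in file layout means constructing the object differently rather than rewriting the calling code.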
Before writing any code, step back and do some planning. Remember that your ultimate goal is to build a DOM in which element names correspond, in some reasonable fashion, to field or column names in the flat file. For example: If you receive data which has a label saying "NAME:", pick the data after the colon to return as the content of a text node and label it with a tag of NAME.
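The NAME: example can be sketched roughly as follows. The helper names escapeXml and labeledLineToElement are hypothetical, the output is built as an XML string rather than through a DOM API, and the label is assumed to already be a legal XML tag name.

```javascript
// Escape the characters that are not allowed in XML character data.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: turn "LABEL: value" into "<LABEL>value</LABEL>".
// Returns null when the line has no label before a colon.
function labeledLineToElement(line) {
  var colon = line.indexOf(":");
  if (colon < 1) {
    return null;
  }
  // Trim surrounding whitespace from both the label and the data.
  var tag = line.substring(0, colon).replace(/^\s+|\s+$/g, "");
  var value = line.substring(colon + 1).replace(/^\s+|\s+$/g, "");
  return "<" + tag + ">" + escapeXml(value) + "</" + tag + ">";
}

var nameElement = labeledLineToElement("NAME: John Johnson");
// nameElement is "<NAME>John Johnson</NAME>"
```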
In some cases, the incoming file may devote the first line (or, for a dBase file, an initial header section) to listing the headings for the various fields. If you can handle this case, it should be possible to generate a DOM that has tags corresponding directly to field names. But if the file lacks this information, your code should still be able to generate generic tags (with names like FIELD1, FIELD2, etc.) programmatically. You can apply an XSL stylesheet to the resulting file later on, to convert generic tags to custom tags. This separates the label mapping functionality from the data parsing functionality.
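A minimal sketch of that fallback, assuming the records have already been split into arrays of fields: the function name recordsToXml and the wrapper elements ROWS and ROW are assumptions made for this example, and heading values are assumed to be legal XML tag names.

```javascript
// Escape the characters that are not allowed in XML character data.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: serialize parsed rows as XML text.
// rows        - array of records, each an array of field strings
// useHeadings - when true, the first row supplies the tag names;
//               otherwise generic FIELD1, FIELD2, ... tags are generated
function recordsToXml(rows, useHeadings) {
  var names = null;
  var start = 0;
  if (useHeadings && rows.length > 0) {
    names = rows[0];
    start = 1;
  }
  var out = ["<ROWS>"];
  for (var r = start; r < rows.length; r++) {
    out.push("  <ROW>");
    for (var f = 0; f < rows[r].length; f++) {
      // Fall back to a generic tag if this column has no heading.
      var tag = (names && names[f]) ? names[f] : "FIELD" + (f + 1);
      out.push("    <" + tag + ">" + escapeXml(rows[r][f]) + "</" + tag + ">");
    }
    out.push("  </ROW>");
  }
  out.push("</ROWS>");
  return out.join("\n");
}

var genericXml = recordsToXml([["Clark,Kimberly", "454-77-9999", "02"]], false);
```

With useHeadings set to false, the single record above comes out tagged FIELD1 through FIELD3, ready for a later stylesheet to rename.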

questions to ask
Determine the type of flat files you will be processing. Are they delimited or fixed-length? There are multiple conventions to consider in either case, so establish which one(s) the files you will encounter actually follow. Ask yourself the following types of questions.
- What delimiters does the file use?
- What escape characters are used?
- Are there headings in the first line?
- What is the record length, for fixed-length files?
- What is the padding character?
- Will there be a header, containing field-length hints?
- Does the record length or field length depend on type or order?
If your delimited flat file contains no escaped delimiters, you can parse it in ECMAScript using the String object's built-in split() method:
var delim = ",";
var myFields = aRecord.split(delim);
In this case, the variable myFields ends up containing an array, with each member holding the data for one field. The delimiter specified above is the comma. Of course, if the flat file bounds commas in some manner (for example, inside quoted fields), you will have to write extra code for that.
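The extra code for bounded delimiters might look something like the sketch below. It assumes a single-character delimiter and quote character, and it assumes that a doubled quote inside a quoted field stands for one literal quote (a common CSV convention, but not one the file is guaranteed to follow). The name splitDelimited is hypothetical.

```javascript
// Hypothetical helper: split one record on delim, ignoring delimiters
// that appear inside fields bounded by the quote character.
function splitDelimited(record, delim, quote) {
  var fields = [];
  var current = "";
  var inQuotes = false;
  for (var i = 0; i < record.length; i++) {
    var ch = record.charAt(i);
    if (ch === quote) {
      if (inQuotes && record.charAt(i + 1) === quote) {
        current += quote; // doubled quote inside a quoted field (assumed convention)
        i++;
      } else {
        inQuotes = !inQuotes; // opening or closing quote; not part of the data
      }
    } else if (ch === delim && !inQuotes) {
      fields.push(current); // unquoted delimiter ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field has no trailing delimiter
  return fields;
}

var quotedFields = splitDelimited('"Johnson,John",555-66-7788,07', ",", '"');
// quotedFields is ["Johnson,John", "555-66-7788", "07"]
```

A plain split(",") on the same record would return four fields, breaking the name in two.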
In Java, the StreamTokenizer class comes in handy for flat-file parsing, particularly for CSV files. See any good Java reference for information on using that class.

error handling
One other important point to note is that flat files will most likely change as the business changes. Therefore, you want to try to create flexible functions (and/or objects with multiple constructors), and you want to code defensively, anticipating as many pathological cases as possible. If a record begins with an empty field, contains multiple escape characters, or has field names that contain characters that would be illegal in XML tag names, you should aim to handle those things gracefully.
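As one example of coding defensively, the sketch below maps an arbitrary field name to a safe XML tag name. It deliberately restricts itself to a conservative ASCII subset of what XML allows, and the function name toXmlTagName and the FIELDn fallback are assumptions for illustration.

```javascript
// Hypothetical helper: derive a legal XML tag name from a field heading.
// index is the zero-based column number, used when the heading is empty.
function toXmlTagName(fieldName, index) {
  // Trim surrounding whitespace.
  var name = String(fieldName).replace(/^\s+|\s+$/g, "");
  // Replace anything outside a conservative ASCII subset with "_".
  name = name.replace(/[^A-Za-z0-9_.-]/g, "_");
  if (name.length === 0) {
    return "FIELD" + (index + 1); // empty heading: fall back to a generic tag
  }
  // XML names may not start with a digit, "." or "-".
  if (!/^[A-Za-z_]/.test(name)) {
    name = "_" + name;
  }
  return name;
}

var safeTag = toXmlTagName("Years with company", 2);
// safeTag is "Years_with_company"
```

Two different headings can collapse to the same tag under this scheme, so a production version might also need to check for duplicates.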

example project
When we set out to create functions to process flat files (see the example project), we modeled our function on Excel's text import facility. Excel allows imports of fixed-length or delimited files. Excel also allows you to specify the delimiters of the file. We decided this was a good basic model to follow. We created two functions: one for processing delimited files, called ConvertText(), and one for processing fixed-length files, called ConvertFixedText().
The ConvertText function is written in ECMAScript, but calls a custom Java class, MyFileReader. The ConvertText function takes the following seven parameters:
- aInput - Required string - Input file name
- aOutput - Required DOMstring - Destination DOM
- aSepChar - Required string - Field delimiter, e.g. comma for CSV
- aCharSepChar - Required string - String delimiter, e.g. " (usually)
- aHeadings - Optional Boolean - Use first line as headings or not
- aRowHeadings - Optional Boolean - Use first column as row headings or not
- aHeadSepChar - Optional string - Separator used in the header, if different
Using the information from these parameters, the function parses the input file and returns a DOM. The DOM can then be manipulated by an xCommerce Component or written to an XML file.
Here is an example of calling the function from an xCommerce Component.
This Component uses an input DOM to pass the necessary parameters to the ECMAScript ConvertText function.
The function executes using the parameters sent to it from the input DOM. Once executed, the function returns an Output DOM to the Component.
The ConvertFixedText function takes a simple fixed length file as input and returns a DOM. The parameters for the ConvertFixedText function are:
- aList - Required List - List of field start and end positions
- aInput - Required String - Input file name
- aDomOutput - Required DOMstring - Destination DOM
- aHeadings - Required Boolean - Use first line as headings or not
- aRowHeadings - Required Boolean - Use first column as row headings or not
- aHeadSepChar - Required String - The separator char used in the headings
Using the information from these parameters, the function parses the fixed-length input file and returns a DOM. The file's data can then be manipulated by an xCommerce Component or written to an XML file.

XML interchange
The Output DOM created by the ConvertText and ConvertFixedText functions can be saved to an XML file using an xCommerce action called XML Interchange. The XML Interchange action reads external XML documents into a Component's DOM and writes data from a Component's DOM as XML files. There are four types of XML Interchange actions: Get, Put, Post, and Post with Response.
When using the Get interchange, you must supply a URL that points to the XML document you want to bring into the Component. Then you must specify a "Get Document Handle", that is, the DOM which is to receive the XML. If the DOM name you specify does not exist, it will be created. You can optionally specify an HTTP connection Resource to use for the Get interchange type.
When using the Put interchange, you must supply a URL that points to the location to which you want to write the XML document. Then you must specify a "Put Document Handle" (i.e., the name of a DOM in your Component) whose data is to be sent as XML. You can optionally specify an HTTP connection Resource to use for the Put interchange type. The PUT method requests that the enclosed entity be stored under the supplied Request-URI.
When using the Post interchange, you must supply a URL that points to the location to which you want to write the XML document. Then you must specify a "Put Document Handle" (i.e., the name of a DOM in your Component) whose data is to be sent as XML. You can optionally specify an HTTP connection Resource to use. The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI.
When using the Post with Response interchange, you must supply the same parameters as for Post. In addition, you must specify a DOM to receive the Response XML document from the Post with Response action. It is the same as Post above except that the XML Interchange action is expecting a response XML object back from the origin server.
For our purposes in this example, we will use XML Interchange to do a Put. We will put the Output DOM of the ConvertText_Component to an XML file. Below is a screenshot of the XML Interchange action. As you can see, XML Interchange asks for the type of interchange, the file name, the connection name (not applicable for this example), and the DOM that the data is coming from.

conclusion
Flat files continue to be an important data interchange format, and it is important for XML integration applications to be able to process such files. Custom ECMAScript functions (and custom Java code, reachable from those functions) allow any xCommerce Component or Service to handle flat files as input, thus bridging the gap between structured, hierarchical, metadata-rich XML data sources and more traditional, less hierarchically structured (flat) data sources. Once transformed to XML, flat-file data can be submitted to other xCommerce components or services for additional processing, written to disk, or sent to another server via HTTP. The XML Interchange action (a standard feature of xCommerce) makes it easy to send XML documents to URIs using GET, PUT, and two POST methods from within any xCommerce component.
Flat files are a reality of information processing. But, with a little planning and effort, processing them in xCommerce can be a snap.