Accessing Heterogeneous
Databases

Mala Krishnamurthy
John Lifter

CS 561
May 1997


Abstract

The suitability of a commercially available client/server development tool to provide simultaneous, inter-related access to multiple, heterogeneous databases was evaluated. The Collection Programming Language, developed at the University of Pennsylvania, was identified as a successful implementation of this objective. Identical functionality was demonstrated with using the PowerBuilder client/server application development software. We therefore conclude that commercially available development tools are suitable for combining data across a diverse spectrum of sources. Users with a knowledge of the data models employed at the data sources will be able to use these commercially available tools to integrate data from heterogeneous data sources. Additional database system drivers, particularly to object oriented database systems, need to be developed. Current HTML coding specifications are sufficient to allow knowledgeable users to develop WWW query tools for less sophisticated users.

Keywords:

Section 1
Introduction

The objective of database integration efforts is to make data distributed over a group of distinct, heterogeneous data sources accessible through a single user interface. This capability will become increasingly more important as the quantity of information available in publicly accessible databases increases as a result of public funding of multi-year data gathering research programs, and the number of individuals and organizations attempting to access and use this information become more numerous.

Three approaches to database integration can be envisaged: merging of the individual schema into a consensus schema before data entry; transformation of the data in the individual databases into a common database with a rich enough schema to provide for all of the collected data; and real-time querying of multiple heterogeneous databases and coordinated presentation of the selected data to the user. The third approach could be extended to include construction and maintenance of a local user specific data warehouse.

One area of interest to the students is the nucleic acid sequencing, protein sequencing and protein structure data collected through the Human Genome Project. Currently this information is distributed over a number of databases which are supported by different data schema and are executed under a variety of database management systems. Clearly it is too late to establish a consensus schema and software approach as a methodology for integrating the data sources.

While one of the major databases, GenBank, consolidates data from many sources into a single format, this source does not collect all of the pertinent information. Specifically, information cataloging the locations of genes and shorter nucleic acid sequences on the chromosomes is maintained within the Genome Data Bank database. Additionally, new experimental techniques alter the nature of the collected data and may require revision to the database schema or creation of new databases. Consequently, the existence of multiple databases, and a need to execute simultaneous queries against these multiple databases, will continue to be the standard situation. The only approach to database integration which can address this situation in the near future is effective real-time querying of multiple databases.

Buneman, Davidson, Overton, Hart and Wong at the University of Pennsylvania have demonstrated a successful approach to this problem. Within this work, a programming language, CPL (Collection Programming Language) was developed. This language was designed to deal with a variety of collection types, e.g., sets, bags and lists, and to transform data between the variable data formats. The language can serve as a query interface to data stored in these formats. Additionally, the language includes function definition capability which was used to define the set of primitives needed to implement a client/server type application. Using CPL, these workers were able to execute queries across multiple databases containing variable data formats and present the results as a nested report in which related data from the multiple databases were presented as a single unit.

Through an analysis of this work, the essential requirements for a real-time multi-database query tool have been defined. The query tool must be capable of:

Access to the University of Pennsylvania's genome multi-database query tool may be obtained through a WWW site at http://agave.humgen.upenn.edu/cpl/cplhome_queries.html. Through this interface a user may enter arguments to a predefined query, and receive the resulting output, through standard web formats. The following figure reproduces the input form for a typical query.

The results are returned in a format that retains the relationship between the multiple queries.

The objective of this course project was to use commercially available application development software to develop a WWW based application that will allow users to perform multi-database queries. After demonstrating successful multi-database querying, the application will be recast as a WWW based application.

The remainder of this report is divided into the following sections. Section 2 presents a motivating example. Section 3 presents the alternative methods evaluated for recasting the application as a WWW application. Section 4 contrasts the results of this project with the University of Pennsylvania's CPL-based approach, and Section 5 discusses possible future work.

Section 2
PC Based Client/Server Application

A complete description of the PC based client/server application has been presented in the interim report dated March 26, 1997. In brief, two model databases were created: a relational database containing three tables; and a non-relational database containing two spreadsheets. The relational database contains part and lot information for a pharmaceutical manufacturing operation. The non-relational database contains a duplication of selected part information from the relational database and a listing of part numbers and suppliers.

As documented in the interim report, the application was successfully implemented. Critical to the success of the application was the fact that the commercial application development software (PowerBuilder) contained a built-in Open Database Connectivity (ODBC) interface which included drivers to popular spreadsheet applications as well as to the integrated relational database management system (Sybase SQL-Anywhere).

Connection to multiple databases was facilitated through the use of "transaction objects." Within the client/server application, required connection information, e.g., storage location of the database file, appropriate database engine, user identification and password, is coded into the transaction object. Subsequent access to the database was accomplished by associating the transaction object with the query statement. While PowerBuilder provided a base transaction object as part of the class library, similar objects or functionality can be easily created with other client/server application development tools, e.g., Microsoft's Visual Basic or INTERSOLV's Allegris.

The PowerBuilder tool retrieves and manipulates data from a database through the datawindow object. The datawindow object maintains an association between the database ODBC interface, the query statement, and the presentation style. Additionally, the datawindow object automatically codes SQL update statements following modification of the data at the client computer. Multiple datawindow objects can interact; data can be passed from one datawindow object to a second datawindow object, providing a simple technique to use output from one query as a selection criterion for a subsequent query.

In the example application, the results from the primary, independent, query were displayed in a tabular format in a dedicated datawindow object. Selecting a row from this result table initiated a query of the secondary database, and the results were displayed in either tabular or list format in another dedicated datawindow object. Although the example application did not include print capability, the contents of a datawindow object can be easily printed using predefined functions encapsulated within the datawindow object. Maintaining the hierarchy between the database queries was accomplished by displaying the results from both the independent and dependent queries simultaneously.

This example client/server application clearly demonstrated the ability to simultaneously query multiple databases, using information returned from one database as selection criteria for the queries directed to subsequent databases. Extension of the project to developing a WWW based application with similar capability was initiated.

Section 3
WWW Based Application

To transfer the PC client/server application to a WWW format, another commercial product, PowerBuilder Internet Developer Toolkit, was selected. This product provides three approaches to providing database data via the WWW:

The Web.PB approach is similar to common gateway applications; the browser calls an application program located on the web server. The Web.PB approach differs from common gateway interace applications in that the application program located on the web server assumes the client/server role of "client," and all data processing and database accesses are performed by the database server and a finished report, converted to HTML, is transmitted back to the web server and browser. All of the software required to support this approach is installed on either the web server or database server; no plug-in's need to be installed with the browser.

Our first approach to adapting the application to a WWW format was to mimic the functionality of the PC client/server application through the use of HTML frames within a Web.PB approach.

In response to the user's initial request to access the application, the web server will provide an HTML frame page which serves as both the application home page and as the base page for presenting query results. A reproduction of this page is provided on the following page. The top frame serves as the home page, while the bottom frame provides further directions for use. The side frame contains a listing of the query selections. At the current time all queries are fully coded and no user input is required. Form processing could be incorporated into the side frame to allow user input.

For each type of query two alternative presentations are available: multiframe format; and nested format. The multiframe format mimics the PC client/server application. The top frame is populated with the results from the independent query and the bottom frame is populated with the results from the dependent query. The nested format opens a new HTML page in which the results from both the independent and dependent queries are presented in a nested presentation; information from the independent query is immediately followed by related information from the dependent query.

Like many of today's applications, the PowerBuilder Internet Developer Toolkit provides for direct translation of tabular data into HTML table format. Issuing the "htmltable" message to a datawindow object converts the contents of a datawindow object into a string containing the HTML code. This simple functionality was not, however, completely suitable for our requirements for two reasons. Firstly, the htmltable conversion does not allow for creation of hyperlinks. Our multi-frame format required that each result from the independent query contain a hyperlink to a dependent query. Secondly, the htmltable conversion produces standard HTML table code which would not maintain a clear distinction between data derived from the independent query and data derived from the dependent query. Our nested format required a method to clearly and unambiguously present nested information.

Consequently, we needed to go beyond the functionality provided by the Internet Developer Toolkit in creating the pages for the WWW application.

The interaction between the browser, web server, application server, and database management system within the Web.PB approach is summarized in the figure on the following page. Clicking a hyperlink within the list of query selections sends a request to the web server to run a common gateway interface (CGI) program. A common CGI program is called for each Web.PB request. Embedded within the CGI call is information which identifies the desired PowerBuilder application and provides required arguments to the CGI program. The Web.PB CGI program, located on the web server, uses the browser supplied information to identify the application server, database application, and query which has been requested and issues a call to the application server. The application server identifies the query and makes the appropriate connections to the database management system, issues the query, retrieves the results, and formats the results into a string containing the HTML code to be displayed by the browser. The application server returns the HTML code string to the web server which passes the information to the browser for display.

In creating the Web.PB application, the visual PC client/server application needs to be recoded to run as a non-visual application within the application server.

Each Web.PB application is composed of a collection of related functions. The functions composing a Web.PB application are collected into a non-visual object. A Web.PB application can contain any number of functions collected into any number of non-visual objects. Additionally, a single application server can be simultaneously running multiple Web.PB applications. Each Web.PB application is uniquely identified by a name, and each function is also identified by a unique name. The CGI program on the web server maintains a listing of application names, related application servers, and associated application functions and is able to call the requested application and link the return to the requesting browser.

Our Web.PB application contains a single non-visual object which contains 9 functions, summarized in the following figure.

To initiate our Web.PB application, the user requests document "cs561project.htm" from the web server. This document establishes the frame set-up for the application with the following HTML code:

<frameset cols="25%,*">
<frame src="side.htm" noresize name="side">
<frameset rows="*,*">
<frame src="top.htm" name="top">
<frame src="bottom.htm" name="bottom">
</frameset>
</frameset>

The document "side.htm" contains HTML coding for the query selection list; document "top.htm" contains HTML coding for the application's home page information; and document "bottom.htm" contains HTML coding for the users' directions.

For browsers which do not support frames, document "cs561project.htm" sets up a modified home application home page and an entry into the query listing.

<strong>This demonstration project mimics
the functionality of the University of PA's
CPL multi-database query tool.</strong>
<p><hr width="60%">
</center>

<p><h4>Directions:</h4>
<ol>
<li>Select a query from the listing on the
<A HREF="side.htm">next page.</A>
<li>The results from the primary query will be displayed.
<li>Select an entry from the independed query results.
<li>The results from the dependent query will be displayed.
<li>To select another part, move back to the preceding page.
</ol>

Two functions are required for each of the multi-frame queries. The first function is called from the query selection list and fills the top frame with the results from the independent query. Within each of these return items a hyperlink is established to call the second function with an appropriate argument. The second function is called when the user selects an item from the top frame, and fills the bottom frame with the results from the dependent query. Selecting another item from the top frame re-executes the second function and re-populates the bottom frame. This multi-frame format approach mimics the PC client/server application.

One function is required for each of the nested queries. Each of these functions operates in a similar manner. Firstly, the independent query is executed and the results are retrieved into a non-visual datawindow (termed a "datastore" by the Powersoft company). Secondly, the dependent query is called for each item in the independent query result set. If the dependent query returns data, the information from the independent query is combined with the information from the dependent query. If the dependent query does not return data, no output is generated. When all of the items returned from the independent query have been evaluated, the function returns the result HTML string to the web server. This nested format approach closely mimics the CPL approach.

The query listing provides for three queries: active parts and lots; active parts and suppliers; and raw materials and products. As each function progresses, a return string is constructed which contains the HTML coding for the resulting page. In the following sections, each function is briefly described and the coding discussed.

Active Parts and Lots
Multi-Frame Format

The two functions are:
f_parts_lots ( ) returns string; and
f_active_lots ( string part_no ) returns string.

f_parts_lots ( ) returns string

This function is called by clicking on the hyperlink "Multiframe" under the Active Parts and Lots query selection. This hyperlink is associated with the following tag:

<A HREF=/cgi-shl/pbcgi050.exe/cs561/u_cs561/f_parts_lots?
target="top">Multiframe</A>

This tag calls the unique PowerBuilder CGI pbcgi050, with the application name cs561, and calls function f_parts_lots within the u_cs561 non-visual object. The resulting HTML code is targeted to the top frame.

The responsibility of this function is to retrieve the parts list information and to generate an HTML table in which the part number displayed in each line is a hyperlink to the function f_active_lots.

f_active_lots ( string part_no ) returns string

The purpose of this function is to retrieve the active lot information when a entry in the top frame is selected. The function is provided with an argument, the part number of the selected item, and uses this value in the retrieval. Because this function only needs to return a tabular listing of the retrieved information, the built-in datawindow message "htmltable" is used to generate the HTML code.

A representative browser window for this query is reproduced in the figure on the following page.

Active Parts and Suppliers
Multi-Frame Format

The two functions are:
f_parts_suppliers ( ) returns string; and
f_active_suppliers ( string part_no ) returns string.

f_parts_suppliers ( ) returns string

This function is called by clicking on the hyperlink "Multiframe" under the Active Parts and Suppliers query selection. This hyperlink is associated with the following tag:

<A HREF="/cgi-shl/pbcgi050.exe/cs561/u_cs561/f_parts_suppliers?"
target="top">Multiframe</A>

This tag calls the unique PowerBuilder CGI pbcgi050, with the application name cs561, and calls function f_parts_suppliers within the u_cs561 non-visual object. The resulting HTML code is targeted to the top frame.

The responsibility of this function is to retrieve the parts list information and to generate an HTML table in which the part number displayed in each line is a hyperlink to the function f_active_suppliers.

f_active_suppliers ( string part_no ) returns string

The purpose of this function is to retrieve the supplier information when a entry in the top frame is selected. The function is provided with an argument, the part number of the selected item, and uses this value in the retrieval. Because this function only needs to return a tabular listing of the retrieved information, the built-in datawindow message "htmltable" is used to generate the HTML code.

A representative browser window for this query is reproduced in the following figure.

Raw Materials and Products
Multi-Frame Format

The two functions are:
f_raw_materials ( ) returns string; and
f_product_list ( string part_no ) returns string.

f_raw_materials ( ) returns string

This function is called by clicking on the hyperlink "Multiframe" under the Raw Materials and Products query selection. This hyperlink is associated with the following tag:

<A HREF="/cgi-shl/pbcgi050.exe/cs561/u_cs561/f_raw_materials?"
target="top">Multiframe</A>

This tag calls the unique PowerBuilder CGI pbcgi050, with the application name cs561, and calls function f_raw_materials within the u_cs561 non-visual object. The resulting HTML code is targeted to the top frame.

The responsibility of this function is to retrieve the lot list information and to generate an HTML table in which the lot number displayed in each line is a hyperlink to the function f_product_list.

f_product_list ( string part_no ) returns string

The purpose of this function is to retrieve the related lots of product when a entry in the top frame is selected. The function is provided with an argument, the lot number of the selected item, and uses this value in the retrieval. Because this function only needs to return a tabular listing of the retrieved information, the built-in datawindow message "htmltable" is used to generate the HTML code.

A representative browser window for this query is reproduced on the following page.

Active Parts and Lots
Nested Format

The function is:

f_nested_parts_lots ( ) returns string

f_nested_parts_lots ( ) returns string

This function is called by clicking on the hyperlink "Nested" under the Active Parts and Lots query selection. This hyperlink is associated with the following tag:

<A HREF=/cgi-shl/pbcgi050.exe/cs561/u_cs561/f_nested_parts_lots?
target="_top">Nested</A>

This tag calls the unique PowerBuilder CGI pbcgi050, with the application name cs561, and calls function f_nested_parts_lots within the u_cs561 non-visual object. The resulting HTML code is targeted to a new browser window as a result of the target="_top" specification.

This function creates two datastores and retrieves the part number information into the first datastore. For each entry in the datastore, the algorithm extracts the part number from the first column of the datastore and retrieves the lot numbers associated with the selected part number into the second datastore. If data is retrieved into the second datastore, i.e., lots of the selected part are in inventory, the algorithm builds a string which contains data from the current line of the parts datastore and all the data from the associated lots datastore. The loop is repeated until lot information on all parts has been retrieved. The resulting string, containing the HTML code, is returned to the web server.

A representative browser window for this query is reproduced on the following page.

Active Parts and Suppliers
Nested Format

The function is:

f_nested_parts_suppliers ( ) returns string

f_nested_parts_suppliers ( ) returns string

This function is called by clicking on the hyperlink "Nested" under the Active Parts and Suppliers query selection. This hyperlink is associated with the following tag:

<A HREF="/cgi-shl/pbcgi050.exe/cs561/u_cs561/f_nested_parts_suppliers?"
target="_top">Nested</A>

This tag calls the unique PowerBuilder CGI pbcgi050, with the application name cs561, and calls function f_nested_parts_suppliers within the u_cs561 non-visual object. The resulting HTML code is targeted to a new browser window as a result of the target="_top" specification.

This function creates two datastores and retrieves the part number information into the first datastore. For each entry in the datastore, the algorithm extracts the part number from the first column of the datastore and retrieves the suppliers associated with the selected part number into the second datastore. If data is retrieved into the second datastore, i.e., suppliers exist for the selected part, the algorithm builds a string which contains data from the current line of the parts datastore and all the data from the associated suppliers datastore. The loop is repeated until lot information on all parts has been retrieved. The resulting string, containing the HTML code, is returned to the web server.

A representative browser window for this query is reproduced below.

Raw Materials and Products
Nested Format

The function is:

f_nested_lots_products ( ) returns string

f_nested_lots_products ( ) returns string

This function is called by clicking on the hyperlink "Nested" under the Raw Materials and Products query selection. This hyperlink is associated with the following tag:

<A HREF="/cgi-shl/pbcgi050.exe/cs561/u_cs561/f_nested_lots_products?"
target="_top">Nested</A>

This tag calls the unique PowerBuilder CGI pbcgi050, with the application name cs561, and calls function f_nested_lots_products within the u_cs561 non-visual object. The resulting HTML code is targeted to a new browser window as a result of the target="_top" specification.

This function creates two datastores and retrieves the lot information into the first datastore. For each entry in the datastore, the algorithm extracts the lot number from the first column of the datastore and retrieves the products associated with the selected lot number into the second datastore. If data is retrieved into the second datastore, i.e., the selected lot has been used to manufacture other materials in inventory, the algorithm builds a string which contains data from the current line of the lots datastore and all the data from the associated products datastore. The loop is repeated until product information on all lots has been retrieved. The resulting string, containing the HTML code, is returned to the web server.

A representative browser window for this query is reproduced on the following page.

In addition to the non-visual object, "u_cs561," which encapsulates the database access functions described above, our Web.PB application includes several additional objects. The non-visual application object, "demo," serves as a repository for all objects required by the application. The only action associated with this object is the coding to open a notification window on the application server's screen.

// Start the application by opening the
// application window

open (w_cs561demo)

The visual window object, "w_cs561demo," has two important functions. Within the open event of the window are the commands which create the transport object, define the Web.PB application name, identify the application server ("localhost"), and set the driver characteristics ("Winsock"), as shown in the following coding:

// Declare local variables
string errText , errNo

// Set connection information
// The Transport object contains the parameters that
// PowerBuilder uses to process client requests to
// connect to a server application. The Transport object
// controls all communications between the client and
// the server.

mytransport = CREATE transport
mytransport.driver = "Winsock"
mytransport.location = "localhost"
mytransport.application = "cs561demo"

// Start listening for client connections
mytransport.listen ()

// Check for errors
if (mytransport.errCode <> 0) then
errText = mytransport.errText
errNo = string (mytransport.errCode)
MessageBox ("Application Open" , &
"Could not start listener! ~rñ" &
+ errNo + ": " + errText )
close (this)
return
end if

A command button on the window provides the capability to terminate the application by closing the window [ close ( parent ) ]. During the close event of the window, the transport object is destroyed.

// Stop listening for client connections
mytransport.stoplistening ()

// Free memory
DESTROY mytransport

The database access functions refer to "dataobjects," which are the datawindow controls containing the SQL query statement. These objects are the same objects as the datawindow objects used in the PC client/server application, which insures that identical queries are requested for both the PC and Web.PB based applications.

The following table summarizes the relationships between the non-visual object functions, the datawindow objects, and the database queries.

When HTML code includes a reference to the Web.PB CGI program "pbcgi050.exe," additional parameters must be provided. As parameters, the HTML anchor tag or form tag include: a reference to a section of an INI file located in the web browser computer Windows directory; the non-visual object within the Web.PB application which encapsulates the requested function; and the name of the requested function. The web server INI file contains references to the appropriate web server and application server driver software, the name of the application on the application server (as defined in the transport object of the application [see footnote 1, page 31] ), and the internet location of the application server. The web server uses the INI entry to identify the communication protocol, application server, and application name, and programmer supplied arguments to identify the appropriate non-visual object and function.

For our application, the web server "pbweb.ini" file contains the following listing:

[cs561]
INI file section identification

driver="winsock"
communication protocol

application="cs561demo"
transport object property

location="localhost"
application server address

A typical HTML anchor tag reference would be:

HREF="/cgi-shl/pbcgi050.exe/cs561/u_cs561/f_nested_parts_suppliers?"

where

cs561
refers to the section of the INI file

u_cs561
refers to the non-visual object

f_nested_parts_suppliers
refers to the desired function

If the requested function requires an argument(s), it(they) would be included after the "?" in the anchor tag.

For the purposes of our demonstration, the browser application, web server application, and Web.PB application are all running on a common computer under the Windows 95 operating system. The WebSite web server application and the Web.PB application are started; a small window opens indicating that the Web.PB application is running. The browser application is started and the home directory of the web server (IP address "localhost") is requested. From the web server home page, document "cs561project.htm," within the "WPI Project" directory, is requested. This document loads the demonstration project home page (see the Appendix for a listing of the HTML code). To terminate the demonstration, the browser is closed and the "Stop" command button on the Web.PB application window is clicked. The web server application may then be stopped.

Section 4
Comparison of the CPL and PowerBuilder Approaches

The University of Pennsylvania's CPL approach has four characteristics which make it a suitable tool to query multiple databases and integrate data from heterogeneous sources. Firstly, the language can handle a variety of data types and convert data from one data type to another. Secondly, CPL is capable of extracting specific elements from a data collection and providing the extracted element as an argument to subsequent operations. Thirdly, the CPL syntax allows multiple query patterns to be combined into a single expression. And fourthly, the language permits user definition of functions which allows complex operations to be reduce to simple function calls.

Using CPL, functions were designed which established connections with multiple database management systems. Additionally, functions describing standardized queries were coded and made available to interested users via a WWW site. Responses to users' queries are presented in a clear and unambiguous manner, frequently with embedded hyperlinks to further information.

We have achieved identical functionality using commercially available client/server application development software. In our approach, database connections are described through the property settings of a "transaction object." As shown in the accompanying screens, a standardized dialogue box is used to provide database connection information. Once completed, the information contained within these forms in incorporated directly into the PowerBuilder application during the creation of the executable.

For our application, two transaction objects were initialized, the first to the three table relational database and a second to the two spreadsheet database.

Initialization of the transaction objects was performed during the constructor event of the nonvisual object. The following code summarizes the steps involved in creating and initializing the transaction objects.

// Create a second transaction object
// trans2db has been defined as a instance variable,
// of type "transaction," within the nonvisual object
trans2db = CREATE transaction

// Connect to Parts.DB
// Initialize the transaction object properties
trans2db.DBMS = "ODBC"
trans2db.DBParm = "Connectstring='DSN=Parts;UID=DBA;PWD=SQL'"

Connect Using trans2db;

// Connect to Material.DB
// The SQLCA transaction object is a system
// defined global variable and does not
// need to be created by the user

SQLCA.DBMS = "ODBC"
SQLCA.DBParm = "Connectstring='DSN=Material;UID=DBA;PWD=SQL'"
Connect Using SQLCA;

The PowerBuilder application coding environment can also handle variable data types. Once a data source has been described to the application environment (generally by providing an overview of the data table layout and data types through the SQL statements used in defining the datawindow objects, although a field-by-field description is also acceptable), data may be retrieved from, or submitted to, the data base through the datawindow object.

Once data has been collected into the datawindow or datastore object, specific data items can be easily extracted. Functions encapsulated within the datawindow/datastore object allow retrieval of any item of data (equivalent to any attribute value of a tuple within a relation). These extracted data items can be used as retrieval arguments for subsequent database queries. Multiple datawindow or datastore objects can be used to collect data of different types and the information from each of the objects combined into a nested report. Our application utilized this scheme in generating the nested format query results.

The PowerBuilder environment provides for user coding in a C/C++ like language. Complex functions can be defined and encapsulated within non-visual (or visual) objects. Our application utilized 9 functions which provided user execution of three predefined queries. These functions could be easily rewritten to allow for user entry of selection arguments.

Our application also provided a clear and unambiguous reporting capability. Within the PC client/server application, a dual view approach was employed. All data from the independent query was retrieved to the client application and displayed to the user. User selection of data from the independent query triggered execution of the dependent query and retrieval of related data into a second display window. This master/slave approach is suitable for an online client/server application where connections to the distributed databases can be maintained throughout the session and a series of queries will be handled without interruption.

In the WWW application, the dual view approach was recreated through the use of HTML frames. This approach was workable but not as satisfactory as the similar approach within the PC client/server application. Each query from the browser through the web server to the application server will be handled as a separate database transaction requiring reestablishment of the connection to the database management system. The nested reporting format was more efficient. In the nested format, all retrievals of dependent data were completed during the execution of a single function call. Consequently, only a single connection to the database management system must be established. Additionally, the browser user is provided with a complete result data set which may be visually browsed or printed.

We conclude that our application approach and nested report format provide the same querying capability and completeness and clarity of reporting as the CPL approach developed at the University of Pennsylvania.

Section 5
Future Work

Although our work has emphasized PowerBuilder and the PowerBuilder Internet Developer Toolkit, similar approaches could be successfully designed with other commercially available software and client/server development tools. Virtually all word processing, spreadsheet, desktop database management systems, and drawing programs now include the capability of saving documents in HTML format. Consequently, any information can be represented as a static WWW page. For simple tabular or text information, these tools will be sufficient. To provide the WWW user with the ability to interact with the database application, or to provide a clear and unambiguous representation of nested data, a more sophisticated approach is required.

The software we employed in this project met these more rigorous requirements. With the datawindow plug-in approach, the WWW user can access the versatility of a client/server application. As the datawindow is capable of presenting graphical as well as text data, both figures and tabular summaries can be provided. The datawindow format is not suitable for presenting nested data as the simple table to HTML conversion routine and the elementary presentation style of the browser will destroy the formatting that nested reports require. Additionally, the datawindow plug-in format does not permit including hyperlinks in the output. Consequently, complex text based reports, with or without hyperlinks, require that the HTML coding be developed on an element-by-element basis.

To develop fully interactive, dynamic WWW based database applications, the following three capabilities must be provided:

Database connectivity is accomplished through a database management system specific driver. Most client/server development tools provide these drivers for the more popular commercial relational database management systems and spreadsheet programs. These drivers adhere to the ODBC standards and allow queries to be expressed in standard SQL format. For example, through the PowerBuilder dBase ODBC driver our application used SQL commands to interact with the non-relational spreadsheets. Drivers to object oriented database management systems are generally not available as part of the client/server development tools. Future work must include development of additional database system drivers.

The ability to embed database commands in a general purpose programming language is required if the developer wants to include hyperlinks or uniquely formatted output. Again, all of the commercially available client/server development tools provide this capability. These embedded database commands are uniquely identified within the source code (for PowerBuilder, embedded SQL commands end with a semicolon; general source code lines are terminated with the carriage return) and are passed directly to the database management system for execution. It is not required that these commands adhere to standard SQL; query commands optimized for object oriented databases could also be embedded. When using embedded database commands, the developer must identify the variables and/or data structures to store the query results. Our application utilized the PowerBuilder datawindow object to store the query results, but we could have used the approach employed by other development tools and stored the results in named variables of the appropriate data types. This second approach would be required when queries of object oriented databases return data types not supported by the datawindow object.

Because the general purpose programming languages incorporated into the client/server development tools support the standard programming techniques of looping, conditional testing, etc., it is straight forward to code nested queries, with the option of directing the queries to separate databases through unique transaction objects.

In the past, client/server development tools have been used to develop only the client side of the application; these applications could only communicate with the database management system through the ODBC driver. The PowerBuilder Internet Developer Toolkit extends the PowerBuilder application by providing the capability to communicate with the web server. More recently, Sybase has included the ability to communicate with web servers into their relational database management system software, and it is now possible for web server CGI programs to interact directly with the database management system. The functionality of the client/server development tool to create a server application capable of communicating with the web server was the primary enabling technology provided by the toolkit extension. The fact that the server application accesses the database(s) in response to a web browser/server request, results in a highly interactive and dynamic WWW database application equivalent in all aspects to the CPL approach.

We therefore conclude that the commercially available development tools are suitable for combining data across a diverse spectrum of sources. Users with a knowledge of the data models employed at the data sources will be able to use these commercially available tools to integrate data from heterogeneous data sources. Additional database system drivers, particularly to object oriented database systems, need to be developed. Current HTML coding specifications are sufficient to allow knowledgeable users to develop WWW query tools for less sophisticated users.