Integrating Heterogeneous
Mala Krishnamurthy
CS 561
March 26, 1997
Integrating Heterogeneous Data Sources
The objective of database integration efforts is to make data distributed over a group of distinct,
heterogeneous data sources accessible through a single user interface. This capability will become
increasingly more important as the quantity of information available in publicly accessible databases
increases as a result of public funding of multi-year data gathering research programs, and the
number of individuals and organizations attempting to access and use this information become
more numerous.
Three approaches to database integration can be envisaged: merging of the individual schema into
a consensus schema before data entry; transformation of the data in the individual databases into
a common database with a rich enough schema to provide for all of the collected data; and real-time querying of multiple heterogeneous databases and coordinated presentation of the selected
data to the user. The third approach could be extended to include construction and maintenance
of a local user specific data warehouse.
The students are familiar with the work of Buneman, Davidson, Overton, Hart and Wong at the
University of Pennsylvania, in which a programming language (CPL: Collection Programming
Language) has been developed to transform data between multiple data formats and to serve as
a query interface to data stored in these formats. Through an analysis of this work, the essential
requirements for a real-time multi-database query tool have been defined.
The query tool must be capable of:
The objective of this course project will be to use commercially available application development
software [ Windows 95; Sybase SQL Anywhere (RDBMS); dBASE; PowerBuilder; PowerBuilder
Internet Developer Toolkit; WebSite (server); and Netscape (browser) ] to develop a WWW based
application that will allow users to perform multi-database queries. This report summarizes the
current status of the project.
Equipment Selection
The application will be developed on stand-alone Pentium chip based computers. No attempt will
be made to network multiple computers. Consequently, all of the application's software
components will be running simultaneously on a single computer. This arrangement allows each
student to develop components of the application while having the capability of running and
evaluating the complete application.
Software Selection
PowerBuilder 5.0 Desktop and the PowerBuilder Internet Developer Toolkit, from the PowerSoft
Division of Sybase, will be used to develop the database management, common gateway interface,
and database management components of the application. PowerBuilder is an industry standard
tool for developing client/server applications. Both students have had experience using a earlier
release, PowerBuilder 4.0. The Internet Developer Toolkit provides methods to transform
client/server applications into common gateway interface applications; this is a new product with
which the students have no prior experience. These packages are described more completely in
a following section.
WebSite hypertext transfer protocol server software is marketed by the publishers O'Reilly and
Associates. This is also a product with which the students have no prior experience.
The relational database management system will be Sybase SQL Anywhere, a 32-bit Windows
95/NT application. While this software is available as a stand-alone product, it is also included as
part of the PowerBuilder application. The students have had experience using an earlier release
of the product, Watcom SQL, which was a 16-bit application for Windows 3.1 and PowerBuilder 4.0.
Corel Quattro Pro7 will be used to create non-relational databases. Files will be saved as dBASEIII
files, as this is a data source format supported by PowerBuilder.
Model Databases
Although the students have an interest in applying the techniques developed in this project to
querying multiple public databases containing information related to the human genome program,
this project will be developed using model databases stored on the local computer. This approach
was required as the selected version of PowerBuilder only supports local database access, and it
is unlikely that user accounts on genome center databases can be established as part of this
project.
The model relational database was obtained from a material tracking application used in a
pharmaceutical manufacturing operation. The database includes three tables containing part
descriptions, lot descriptions, and product component listings.
The following screen-shot shows the relationships between the tables.
For the purposes of this project, two non-relational databases have been created: a parts
description database, which contains copies of the part_number and description columns of the
relational database parts_list table; and a suppliers database which contains a listing of part
numbers and suppliers (there can be multiple suppliers for a single part number).
PowerBuilder 5.0 Desktop
PowerBuilder 5.0 Desktop is a single user tool set for building Windows applications. As with other
application development tools, e.g., Microsoft Visual Basic or Visual C++, the developer has access
to a large variety of objects, e.g., command buttons, single-line edit boxes, static text. However the
PowerBuilder product line has unique features to simplify the development of client/server
applications that access databases.
Firstly, managing database connections is accomplished through either a built-in Open Database
Connectivity (ODBC) interface or direct connection via a database management system specific
database interface. When using the ODBC interface, the developer uses the PowerBuilder
application to define a data source name, source file, and associated database management
system. Direct connections to network databases are made through add-on modules which
connect to the database through the database vendor's application programming interface. The
PowerBuilder Desktop product can only utilize the ODBC interface to local databases; interfaces
to Access, dBASE, and FoxPro, in addition to 16-bit and 32-bit Sybase SQL Anywhere (Watcom)
interfaces, are included with the product. Direct database connection is supported in the
PowerBuilder Professional and Enterprise editions. Applications developed with the PowerBuilder
Desktop edition can be transported to the more advanced editions. Consequently, adapting the
application developed in this project to network databases is uncomplicated.
Secondly, retrieving and manipulating data from a database is accomplished through the
PowerBuilder datawindow object. The datawindow object maintains an association between the
database ODBC interface, the query statement, and the presentation style. Additionally, the
datawindow object automatically codes SQL update statements following modification of the data
at the client computer. Multiple datawindow objects can interact; data can be passed from one
datawindow object to a second datawindow object, providing a simple technique to use output from
one query as a selection criterion for a subsequent query.
Because the PowerBuilder product line was introduced to develop business applications, the
capability to generate a variety of printed report formats has been included in the product.
Basically, the developer can edit the style, format, and content of the data within the datawindow
object and create an attractive and effective report. Using the Internet Developer Toolkit (described
in following section), the formatted report can be converted into HTML for viewing with web
browsers.
Thus PowerBuilder provides the capabilities to meet all of the requirements of a multi-database
query tool.
PowerBuilder Internet Developer Toolkit
The Internet Developer Toolkit provides three approaches to providing database data via the WWW:
Both the "window plug-in" and "datawindow plug-in" approaches require additional software on the user's computer which is accessed by the browser. While this requirement would be acceptable in a private corporate environment, it is not suitable for a public WWW interface.
The Web.PB approach is similar to common gateway interface applications; the browser calls an application program located on the web server. The Web.PB approach differs from common gateway interface applications in that the application program located on the web server assumes the client/server role of "client," and all data processing and database accesses are performed by the database server and a finished report, converted to HTML, is transmitted back to the web server and browser. Because all of the software required to support this approach can be installed at the source, the Web.PB approach is suitable for a public WWW interface.
This project will utilize the Web.PB approach to provide information from the model databases to the requesting browser.
Client/Server Application
As an initial step in the development program, we used PowerBuilder to create a client/server application which retrieved information into two datawindow controls. The application was written as a multi-document interface, i.e., separate documents within a single application are used to execute independent queries. Queries that extracted data from either one or two databases were written; either the relational or non-relational database could serve as the data source for the independent or dependent query.
The transaction object specifies the parameters that PowerBuilder uses to connect to a database. PowerBuilder has a built-in transaction object (SQLCA: SQL Communication Area) that is created automatically when the application is started. In order to connect to two databases simultaneously, a second transaction object must be created by defining a global object inherited from the built-in transaction object. The variables of each transaction object are populated with the specifics of the database connection.
For this developmental application, two transaction objects were used; the first transaction object, SQLCA, maintains information related to the relational database; the second transaction object, trans2db, is provided with information related to the non-relational database.
Connect Using SQLCA;
// Test for successful connection
// Create a second transaction object
// Connect to Parts.DB
Connect using trans2db;
// Test for successful connection
// PowerBuilder provides a transaction
// object -- SQLCA
// There is no need to create SQLCA
// Connect to Material.DB
SQLCA.DBMS = "ODBC"
SQLCA.DBParm = "Connectstring='DSN=Material;UID=DBA;PWD=SQL'"
If SQLCA.SQLCODE = -1 Then
Messagebox("Database","Unable to connect to Material DB" , information!)
End IF
// trans2db has been defined as a global variable
trans2db = CREATE transaction
trans2db.DBMS = "ODBC"
trans2db.DBParm = "Connectstring='DSN=Parts;UID=DBA;PWD=SQL'"
If trans2db.SQLCODE = -1 Then
Messagebox("Database","Unable to connect with Parts DB", information! )
End IF
The important code in this excerpt are the assignments to the DBMS and DBParm variables.
SQLCA.DBMS = "ODBC"
SQLCA.DBParm = "Connectstring='DSN=Material;UID=DBA;PWD=SQL'"
and
trans2db.DBMS = "ODBC"
trans2db.DBParm = "Connectstring='DSN=Parts;UID=DBA;PWD=SQL'"
These assignment statements associate the ODBC interface and the data source name with each transaction object. Within PowerBuilder, the data source name "Material" had been previously associated with the Sybase SQL Anywhere ODBC driver and Material.DB file and the name "Part" associated with the dBASEIII ODBC driver and the directory containing the non-relational databases.
Three queries have been coded. The first provides a listing of part numbers and descriptions (top datawindow), and for each part number a listing of the lots of that part number (bottom datawindow). Scrolling through the parts listing causes the appropriate lot listing to be retrieved and displayed. A representative screen may be viewed.
The second query lists part numbers and descriptions, and for each part a list of suppliers is retrieved and displayed. A representative screen is reproduced below.
The third query lists the lots of material in inventory, and for each lot provides a listing of products manufactured from that material. To list the products in the dependent datawindow, this query extracts data from multiple tables of the relational database, as described in the following SQL statement.
SELECT "parts_list"."part_number",
"components"."lot_number",
"parts_list"."description"
FROM "components",
"lot_list", "parts_list"
WHERE ( "lot_list"."lot_number" = "components"."lot_number" ) and
( "parts_list"."part_number" = "lot_list"."part_number" ) and
( ( "components"."component" = :lot ) )
ORDER BY "parts_list"."part_number" ASC,
"components"."lot_number" ASC
The argument :lot, corresponds to the lot number of the selected part and is passed to the query from the independent datawindow. A representative screen is reproduced below.
The complete design of the client/server application is documented in the following library listing.
The PowerScript coding associated with each object is presented in the Appendix.
Continuing Work
Initial coding of the client/server application was completed on March 10th. Throughout the remainder of March we will be learning the basics of web server maintenance (WebSite) and the details of the Internet Developer Toolkit. Initial WWW pages will be designed and we will gain experience with forms programming.
In order to convert this client/server application into a web application, the Web.PB approach of the Internet Developer Toolkit will be utilized. We anticipate that the user's initial interaction with the application will be a WWW page allowing selection of one of the pre-coded or a user initiated query. To mimic multiple datawindow controls on a single screen, we will utilize "frame tags" to divide the response WWW page into multiple sections.
During the first two weeks of April, we will transpose the client/server application into a Web.PB application. During the third week of April, we will code queries in which the user provides a part number, lot number or part description and the database application provides some relevant information directly into the Web.PB application, and the last week will be spent completing the project documentation.
References
Buneman, P, Davidson, SB, Hart, K, Overton, C, & Wong, L. A Data Transformation System for Biological Data Sources. In: Proc. of the 21th VLDB Conference. Zurich, Switzerland. 1995.
Davidson, S, Hart, K, & Wong, L. Using CPL. Department of Computer and Information Sciences, University of Pennsylvania.
Davidson, SB, Overton, C, & Buneman, P. Challenges in Integrating Biological Data Sources. Department of Computer and Information Sciences and Department of Genetics, University of Pennsylvania.
Hart, K, & Wong, L. CPL as a Query Language for Genetic Databases. Department of Computer and Information Sciences, University of Pennsylvania.
Hart, K, & Wong, L. A Query Interface for Heterogenous Biological Data Sources. Department of Computer and Information Sciences, University of Pennsylvania.
Musciano, C. & Kennedy, B. HTML: The Definitive Guide. O'Reilly and Associates, Sebastopol, CA. 1996.
Peck, SB, & Arrants, S. Building Your Own WebSite. O'Reilly and Associates, Sebastopol, CA. 1996.
Appendix
The entire application is stored in a single PowerBuilder library file, "project.pbl." Within this file, objects, corresponding to windows, menus, and data window objects (the database queries), etc., are stored. Each object has an associated list of events. For example, the list of events associated with a window object includes "open," "close," "clicked," etc. To carry out the logic of the program, the application programmer associates code with selected events; not all events require code.
The application object, "project," is a nonvisual object whose name becomes the name of the application program. All application specific visual and nonvisual objects are must be associated with an application object. The code presented on page 6 of this report is contained within the "open" event of the application object. Within the "close" event, connections to the databases are released and the second transaction object is destroyed.
// Destroy the second transaction object
// PowerBuilder destroys SQLCA automatically
// Disconnect from the databases
disconnect using sqlca;
disconnect using trans2db;
destroy trans2db
Five window objects are currently required for the application. "w_about" simply presents version information for the application. None of the associated window events contain code.
"w_mdi_frame" is the multi-document interface window, the window in which each of the query windows is displayed. The only coding associated with this window is contained within the "open" event. When this window first opens, no query windows are opened. Consequently, coding within the "open" event resets the opened-window-counter to zero and disables the "File-Close" menu item. [ uo_sheetcounter is a nonvisual object which encapsulates a single variable representing the number of opened query windows. The object also encapsulates three functions which operate on the variable: zero, which resets the variable to zero; increment, which adds one (1) the variable; and decrement, which subtracts one (1) from the variable. As query windows open and close, the increment and decrement functions will be called. ]
// Disable the Close menu item
uo_sheetcounter.zero ()
m_mdi.m_file.m_close.enabled = false
The "open" and "close" events are coded for each of the query windows: "w_parts_lots;" "w_products;" and "w_suppliers." The code within the windows' "open" event increments the opened-window-counter, deactivates a window specific "File-Open-" menu item, associates the datawindow controls with the appropriate transaction object (database), and retrieves data into the independent datawindow.
For the "Parts and Associated Lots" query window, the "open" event code is:
// Deactivate the open menu
// Associate datawindow controls with transaction objects
// Retrieve data into the parts datawindow
// Increment the sheet counter and enable
// the Close menu item
uo_sheetcounter.increment ()
m_mdi.m_file.m_open.m_lotlisting.enabled = false
dw_partslisting.settransobject (trans2db)
dw_lotlisting.settransobject (sqlca)
dw_partslisting.retrieve ()
dw_partslisting.setfocus ()
For the "Active Parts and Suppliers" query window, the "open" event code is:
// Deactivate the open menu
// Associate datawindow controls with transaction objects
// Retrieve data into the parts datawindow
// Increment the sheet counter and enable
// the Close menu item
uo_sheetcounter.increment ()
m_mdi.m_file.m_open.m_supplierlisting.enabled = false
dw_suppliers.settransobject (trans2db)
dw_active_parts.settransobject (sqlca)
dw_active_parts.retrieve ()
dw_active_parts.setfocus ()
And for the "Products" query window, the "open" event code is:
// Deactivate the open menu
// Associate datawindow controls with transaction objects
// Retrieve data into the parts datawindow
// Increment the sheet counter and enable
// the Close menu item
uo_sheetcounter.increment ()
m_mdi.m_file.m_open.m_products.enabled = false
dw_lots.settransobject (sqlca)
dw_products.settransobject (sqlca)
dw_lots.retrieve ()
dw_lots.setfocus ()
The code within the"close" event decrements the opened-window-counter and enables the associated "File-Open" menu item. For the "Products" query window, the "close" event code is:
// Activate the open menu
// Decrement the sheet counter
uo_sheetcounter.decrement ()
m_mdi.m_file.m_open.m_products.enabled = true
Similar code is contained in the "close" event scripts for the other query windows.
As with the application object and windows, events are also associated with objects positioned on the window. The independent datawindow control on each window has associated events which may contain application specific code. For this application, the "rowfocuschanged" event of the independent datawindow contains code which retrieves data into the dependent datawindow. The data retrieved into the dependent datawindow is specific to the data in the selected row of the independent datawindow.
For the w_parts_lots query window, the coding in the "rowfocuschanged" event of the independent datawindow (dw_partslisting) is:
// Highlight the selected row
// Retrieve the part number and description from the row
// Set the title of the lot listing data window
// If no lots are retrieved, notify the user
string ls_partname , ls_partnumber
this.selectrow ( 0 , false )
this.selectrow ( currentrow , true )
ls_partname = this.getitemstring ( currentrow , 2 )
ls_partnumber = this.getitemstring ( currentrow , 1 )
dw_lotlisting.title = "Lot Listing for " + ls_partname
if dw_lotlisting.retrieve ( ls_partnumber ) = 0 then
dw_lotlisting.title = "There are no lots of " + ls_partname + " in inventory."
end if
As the user scrolls through the data, or clicks on a row of data, the "rowfocuschanged" event is triggered. With each execution of the "rowfocuschanged" event script the selected row is highlighted, information is extracted from the data in the selected row, the title of the dependent datawindow (dw_lotlisting) is changed to indicate the selected row, and, finally, data is retrieved into the dependent datawindow with the function call:
dw_lotlisting.retrieve ( ls_partnumber ).
Similar coding is contained within the "rowfocuschanged" event of the independent datawindow (dw_lots) of the w_products query window. The only difference in the script is the syntax of the function call which retrieves data into the dependent datawindow (dw_products):
dw_products.retrieve ( ls_lotnumber ).
While the coding is nearly identical the sources of data differs for the two query windows. w_parts_lots retrieves data from the nonrelational data base into the independent datawindow and from the relational database into the dependent datawindow. w_products retrieves data from the relational database into both datawindows. The appropriate database connections were established within the "open" event of the query window through the function call:
For the w_parts_lots query window:
dw_partslisting.settransobject (trans2db)
dw_lotlisting.settransobject (sqlca)
For the w_products query window:
dw_lots.settransobject (sqlca)
dw_products.settransobject (sqlca)
The transaction object "sqlca" contains connection information to the relational database and transaction object "trans2db" contains connection information to the nonrelational database.
Coding for the "rowfocuschanged" event of the independent datawindow (dw_active_parts) of the w_suppliers query window is slightly simpler:
// Highlight the selected row
// Retrieve the part number and description from the row
// Retrieve the listing of manufacturers
string ls_partname , ls_partnumber
this.selectrow ( 0 , false )
this.selectrow ( currentrow , true )
ls_partname = this.getitemstring ( currentrow , 2 )
ls_partnumber = this.getitemstring ( currentrow , 1 )
dw_suppliers.retrieve ( ls_partnumber )
Data is retrieved into the dependent datawindow (dw_suppliers) through the function call:
dw_suppliers.retrieve ( ls_partnumber )
The appropriate database connections for the w_suppliers query window were established within the window "open" event through the function calls:
dw_suppliers.settransobject (trans2db)
dw_active_parts.settransobject (sqlca)
During the application design process, the programmer associates the database queries with the datawindow object. These pre-defined datawindow objects are then placed into a datawindow control on the appropriate query windows. In event scripts, functions reference the datawindow control, as this is the object on the window, not the datawindow object.
The following table summarizes the relationships between the query windows, the datawindow objects, the datawindow controls, and the database queries.
A single common menu is used throughout the application. A submenu under the "File-Open-" menu allows opening of the query windows. As each window is opened, the corresponding submenu entry is disabled during execution of the "open" event of the query window; as each window is closed, the corresponding submenu entry is enabled during execution of the "close" event of the query window. The script within each submenu item is:
opensheet ( w_parts_lots , w_mdi_frame , 0 , original ! )
,
for query window w_parts_lots;
opensheet ( w_suppliers , w_mdi_frame , 0 , original ! )
,
for query window w_suppliers;
and
opensheet ( w_products , w_mdi_frame , 0 , original! )
,
for query window w_products.
The "File-Close" menu item is disabled as the application starts. Opening a query window enables the "File-Close" menu item during execution of the function call -- uo_sheetcounter.increment (). When a query window is closed, execution of the function call -- uo_sheetcounter.decrement () -- disables the "File-Close" menu item if no other query windows remain open. The script of the "File-Close" menu item first obtains a reference to the active query window and then closes the window. The code for this action is:
close ( parentwindow.getactivesheet ( ) ).
Finally, the "File-Exit" menu item exits the application. The code for this script is:
close ( parentwindow ).
No menu selections exist under the "Window" menu item. As each query window is opened, a reference to the window is added to the "Window" menu item. The "0" within the argument list for the opensheet ( ) function indicates that a reference to the opened window should be added to the next to last menu item on the menu bar (typically the "Window" menu item).
The "Help" menu item has a single selection "About...," which opens the informational window w_about through the script:
open (w_about).