Both data warehousing and data mediation are forms of data intregration

Problem 2. Data Integration, OLAP and Multidimensional Analysis (20 points)

1. (8 points) Compare the data warehouse and the mediation approaches to data integration. Discuss the similarities and differences between the architectures of the two approaches.

Both data warehousing and data mediation are forms of data integration over multiple sources. The primary difference is that a data warehouse acts as a materialized view rather than a dynamic view like data mediation.

Both sources can answer the same queries, however there are tradeoffs to using each. A data warehouse tends to be faster since the view is materialized before the query is run. Mediation on the other hand provides up to date information because the view is not evaluated until runtime, which is why mediation is slower.

Data warehousing requires additional equipment such as a server with a DBMS. Most of this can be bought off the shelf. Mediation on the other hand needs specific programs tailored to the data to act as mediators and wrappers.

2. Consider the following relational table:

Model	Year	Color	Sales
Chevy	1990	red	5
Chevy	1990	white	87
Chevy	1990	blue	62
Chevy	1991	red	54
Chevy	1991	white	95
Chevy	1991	blue	49
Chevy	1992	red	31
Chevy	1992	white	54
Chevy	1992	blue	71
Ford	1990	red	64
Ford	1990	white	62
Ford	1990	blue	63
Ford	1991	red	52
Ford	1991	white	9
Ford	1991	blue	55
Ford	1992	red	27
Ford	1992	white	62
Ford	1992	blue	39

(a) (4 points) Show how the data in the relational table above can be represented in a multidimensional (MOLAP) database.

(b) (8 points) Using the data above explain how the pivoting, roll-up, drill-down, and slice-and-dice operations work.

Pivoting, rollup, drill-down, and slice-and-dice refer to front end query applications of OLAP. The focus of these applications is to compute data

relationships for one or more dimensions of the data by consolidating/aggregating data of interest to the user. A brief summary of these front end applications related to the above database is given below.

Pivoting - Pivoting operation is to select 2 (or more) dimensions of the data to aggregate a particular measure(s) of interest. For the above rational database, an example of pivoting is to view (obtain) all sales of cars that are Model-Chevy, Year-1990 and Color-red. Hence, total sales of 5 would be returned in this example. Sales information about all other years, models, and colors is omitted in this search.

Roll-up - Roll-up operation is to further increase the level of aggregation of the data, perhaps to focus on a single dimension. An example of roll-up using the above data is to view (obtain) all sales of cars that are Model-Chevy. This would return a total of 508 sales of Model-Chevy (irrespective of year and color). Sales information on all other models, years and colors in the database is omitted in this query.

Drill-Down - Drill-down operation is an inverse operation to roll-up, i.e. to decrease the level of aggregation of the data. For the above relational data, we can use drill-down to determine the total sales for each color of a particular car model. For example, given the sales of Model-Chevy, drill-down to obtain sales of all Color-red Chevys. Another example of drill-down for a different dimension is given sales by Model-Chevy, return the sales for a particular year.

Slice-and-Dice - Slice and Dice operation reduces the dimensionality of the data by taking a projection of the data on a subset of dimension for selected values of another dimension. For example, given the total sales by Model, we can slice-and-dice to determine a subset of the sales of a particular color model for a particular year. Using the above data, we can view (obtain) sales for Model-Chevy, Year-1991. This would return 198 sales. Another dimensional view using slice-and-dice is to return all sales of Color-red cars. This would return 233. Given the dimensionality of the above relational datd base, slice-and-dice is similar to pivoting.