Problem 2. Data Integration, OLAP and Multidimensional Analysis (20 points)

1. (8 points) Compare the data warehouse and the mediation approaches to data integration. Discuss the similarities and differences between the architectures of the two approaches.

Both data warehousing and data mediation are forms of data integration over multiple sources. The primary difference is that a data warehouse acts as a materialized view rather than a dynamic view like data mediation.

Both sources can answer the same queries, however there are tradeoffs to using each. A data warehouse tends to be faster since the view is materialized before the query is run. Mediation on the other hand provides up to date information because the view is not evaluated until runtime, which is why mediation is slower.

Data warehousing requires additional equipment such as a server with a DBMS. Most of this can be bought off the shelf. Mediation on the other hand needs specific programs tailored to the data to act as mediators and wrappers.

2. Consider the following relational table:

Model

Year

Color

Sales

Chevy

1990

red

5

Chevy

1990

white

87

Chevy

1990

blue

62

Chevy

1991

red

54

Chevy

1991

white

95

Chevy

1991

blue

49

Chevy

1992

red

31

Chevy

1992

white

54

Chevy

1992

blue

71

Ford

1990

red

64

Ford

1990

white

62

Ford

1990

blue

63

Ford

1991

red

52

Ford

1991

white

9

Ford

1991

blue

55

Ford

1992

red

27

Ford

1992

white

62

Ford

1992

blue

39

 

(a) (4 points) Show how the data in the relational table above can be represented in a multidimensional (MOLAP) database.

 

 

(b) (8 points) Using the data above explain how the pivoting, roll-up, drill-down, and slice-and-dice operations work.

Pivoting, rollup, drill-down, and slice-and-dice refer to front end query applications of OLAP. The focus of these applications is to compute data

relationships for one or more dimensions of the data by consolidating/aggregating data of interest to the user. A brief summary of these front end applications related to the above database is given below.

Pivoting - Pivoting operation is to select 2 (or more) dimensions of the data to aggregate a particular measure(s) of interest. For the above rational database, an example of pivoting is to view (obtain) all sales of cars that are Model-Chevy, Year-1990 and Color-red. Hence, total sales of 5 would be returned in this example. Sales information about all other years, models, and colors is omitted in this search. 

Roll-up - Roll-up operation is to further increase the level of aggregation of the data, perhaps to focus on a single dimension. An example of roll-up using the above data is to view (obtain) all sales of cars that are Model-Chevy. This would return a total of 508 sales of Model-Chevy (irrespective of year and color). Sales information on all other models, years and colors in the database is omitted in this query.

Drill-Down - Drill-down operation is an inverse operation to roll-up, i.e. to decrease the level of aggregation of the data. For the above relational data, we can use drill-down to determine the total sales for each color of a particular car model. For example, given the sales of Model-Chevy, drill-down to obtain sales of all Color-red Chevys. Another example of drill-down for a different dimension is given sales by Model-Chevy, return the sales for a particular year.

Slice-and-Dice - Slice and Dice operation reduces the dimensionality of the data by taking a projection of the data on a subset of dimension for selected values of another dimension. For example, given the total sales by Model, we can slice-and-dice to determine a subset of the sales of a particular color model for a particular year. Using the above data, we can view (obtain) sales for Model-Chevy, Year-1991. This would return 198 sales. Another dimensional view using slice-and-dice is to return all sales of Color-red cars. This would return 233. Given the dimensionality of the above relational datd base, slice-and-dice is similar to pivoting.