Problem 3

Problem 3. Pre-Processing, Feature Selection

1. Consider the Singular Value Decomposition (SVD) method as presented in THE paper "Matrices, Vector Spaces, and Information Retrieval" (by M.W. Berry, Z, Drmac, and E.R. Jessup).
Given the SVD of the specific matrix A into 3 matrices U, Sigma, and V shown on page 349 of the paper, answer the following questions. Explain your answers.
What is the rank of A?

The rank of A is equal to the number of nonzero singular values. In the SVD of A, Sigma contains the singular values of A on its diagonal. This means A has a rank of 4, since Sigma has 4 nonzero rows.

What collection of vectors naturally derived from the SVD of A forms a basis for the row space of A?

The first four columns of U form the basis for the column space of A because Sigma has 4 nonzero values on its diagonal.

What collection of vectors naturally derived from the SVD of A forms a basis for the row space of A?

The first four rows of V form the basis for the row space of A because Sigma has 4 nonzero values on its diagonal.
How can one obtain the best possible rank-3 approximation of A?

The rank 3 approximation of A can be obtained by neglecting the lowest element (0.4195) of Sigma.

According to the Formula mentioned in the paper:

|| A - Ax ||f = min || A - X ||f rank(k)<=k For this example: || A - A3 ||f _______________ = .1876 || A ||f || A - A2 || f ________________ = .42 || A ||f
If we consider 19% a reasonably small change and 42% too large compared to the initial uncertainty, we can obtain the rank 3 approximation.

This technique does lower the rank but at the cost of losing some information.
Suppose that A represents the database that you want to mine for patterns. Explain how you would use SVD to reduce the number of features in A before mining.

Singular Value Decomposition (SVD) is a technique to achieve rank reduction. Unlike the QR factorization approach, which only gives us a rank reduced basis for the column space, SVD simultaneously gives us rank reduction approximations for both spaces (row and column). The technique for using SVD to reduce the number of features in A is thus:

The SVD factorization methods can be used to find redundant features in the data by the zero rows in the singular, diagonal matrix. Those features can be eliminated without any loss of data. (you can use a statistics tool to do this).

Then, we can obtain the lowest possible rank approximation of A by setting all but the k largest singular values of A equal to zero. Note that we will lose some small amount of data, but this is often acceptable because it makes the mining problem less complex. The test shown in the answer to part d above may be used to determine how many features you can remove before the approximation gives unsatisfactory results.
2. In the paper "Selection of Relevant Features in Machine Learning", Langley describes two strategies for evaluating sets of attributes: filter methods and wrapper methods. Describe the two approaches and discuss advantages and disadvantages of them.

Filter methods use the general characteristics of the training set to select some features and exclude others. This is done independently of the induction algorithm that will use them. Wrapper methods generate a set of candidate features, run the induction algorithm on the training data and use the accuracy of the resulting description to evaluate the feature set. It is necessary to pick some estimate of accuracy. In this method one can stop adding or removing attributes when none of the alternatives improve the estimate of classification accuracy.

Filter methods require less computational time but require knowledge of the data that can be used to select the attributes. Wrapper methods have a higher computational cost but they may be more accurate because they do not introduce a new bias by using a different method for selection.