1. Consider the Singular Value Decomposition (SVD) method as presented in the paper "Matrices, Vector Spaces, and Information Retrieval" by M.W. Berry, Z. Drmac, and E.R. Jessup.
Given the SVD of the specific matrix A into the three matrices U, Sigma, and V shown on page 349 of the paper, answer the following questions. Explain your answers.

2. In the paper "Selection of Relevant Features in Machine Learning", Langley describes two strategies for evaluating sets of attributes: filter methods and wrapper methods. Describe the two approaches and discuss their advantages and disadvantages.
- What is the rank of A?
The rank of A equals the number of nonzero singular values. In the SVD of A, the diagonal of Sigma contains the singular values of A. Since Sigma has 4 nonzero diagonal entries, A has rank 4.
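The rank computation above can be sketched numerically. This is an illustrative example with a small hypothetical matrix (not the paper's A): the rank is the count of singular values above a roundoff tolerance.

```python
import numpy as np

# Hypothetical 3x3 example: the third row is the sum of the first two,
# so the matrix has rank 2.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 2.0]])

s = np.linalg.svd(A, compute_uv=False)          # singular values, descending
tol = max(A.shape) * np.finfo(float).eps * s[0]  # tolerance for "numerically zero"
rank = int(np.sum(s > tol))                      # count of nonzero singular values
```

In exact arithmetic the dependent row would make one singular value exactly zero; in floating point it only becomes tiny, which is why a tolerance is needed.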
- What collection of vectors naturally derived from the SVD of A forms a basis for the column space of A?
The first four columns of U form a basis for the column space of A, because Sigma has 4 nonzero values on its diagonal.
- What collection of vectors naturally derived from the SVD of A forms a basis for the row space of A?
The first four columns of V (equivalently, the first four rows of V^T) form a basis for the row space of A, because Sigma has 4 nonzero values on its diagonal.
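The two basis claims above can be checked numerically. A minimal sketch with a hypothetical rank-2 matrix (not the paper's A): the leading columns of U span the column space, and the leading rows of V^T span the row space, so projecting A onto either subspace reproduces A.

```python
import numpy as np

# Build a 4x5 matrix of rank 2 (hypothetical example).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))       # numerical rank

col_basis = U[:, :r]             # first r columns of U: column-space basis
row_basis = Vt[:r, :]            # first r rows of V^T: row-space basis

# Projecting A onto each subspace reproduces A (up to roundoff),
# confirming both collections span the respective spaces.
assert np.allclose(col_basis @ (col_basis.T @ A), A)
assert np.allclose((A @ row_basis.T) @ row_basis, A)
```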
- How can one obtain the best possible rank-3 approximation of A?
The rank-3 approximation of A can be obtained by setting the smallest singular value in Sigma (0.4195) to zero.
According to the formula in the paper, the best rank-k approximation A_k satisfies

    || A - A_k ||_F = min over rank(X) <= k of || A - X ||_F

For this example:

    || A - A_3 ||_F / || A ||_F = 0.1876
    || A - A_2 ||_F / || A ||_F = 0.42

If we consider a 19% change reasonably small and a 42% change too large compared to the initial uncertainty, the rank-3 approximation is the one to use.
This technique lowers the rank, but at the cost of losing some information.
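The truncation and error test above can be sketched as follows. This uses a hypothetical random matrix rather than the paper's A; by the Eckart-Young result quoted above, the Frobenius error of the rank-k truncation equals the norm of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((6, 5))   # hypothetical full-rank example

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k_approx(U, s, Vt, k):
    """Best rank-k approximation in the Frobenius norm: keep only
    the k largest singular values and their singular vectors."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A3 = rank_k_approx(U, s, Vt, 3)

# Relative change, as in the paper's || A - A_k ||_F / || A ||_F test.
rel_err = np.linalg.norm(A - A3) / np.linalg.norm(A)

# Eckart-Young: the error equals the norm of the discarded singular values.
expected = np.sqrt(np.sum(s[3:] ** 2)) / np.linalg.norm(A)
```

Comparing `rel_err` for successive k is exactly the test used above to decide how far the rank can be lowered.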
- Suppose that A represents the database that you want to mine for patterns. Explain how you would use SVD to reduce the number of features in A before mining.
Singular Value Decomposition (SVD) is a technique for achieving rank reduction. Unlike the QR factorization approach, which gives a rank-reduced basis only for the column space, the SVD simultaneously gives rank-reduction approximations for both spaces (row and column). The technique for using SVD to reduce the number of features in A is thus:
- The SVD factorization can be used to find redundant features in the data: zero rows in the singular (diagonal) matrix correspond to linearly dependent features, which can be eliminated without any loss of information. (A statistics tool can be used to do this.)
- Then, we can obtain the best rank-k approximation of A by setting all but the k largest singular values of A equal to zero. We lose a small amount of information, but this is often acceptable because it makes the mining problem less complex. The Frobenius-norm test shown in the answer to the rank-3 question above may be used to determine how many features can be removed before the approximation gives unsatisfactory results.
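The two steps above can be sketched end to end. This is a minimal illustration with invented data (names and dimensions are assumptions, not from the paper): rows are records, columns are features, and the records are projected onto the top-k right singular vectors to produce k derived features for mining.

```python
import numpy as np

# Hypothetical data: 100 records with 10 features, but the features are
# correlated so the matrix has rank 3.
rng = np.random.default_rng(7)
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 10))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 1: drop numerically zero singular values (redundant directions).
k = int(np.sum(s > 1e-8 * s[0]))

# Step 2: project the records onto the top-k right singular vectors,
# reducing 10 original features to k derived features before mining.
A_reduced = A @ Vt[:k].T          # shape: (100, k)
```

The mining algorithm then runs on `A_reduced`; because the discarded directions carried (numerically) zero singular values, `A_reduced @ Vt[:k]` reconstructs A to within roundoff.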