The notion that text documents themselves might be visualizable is certainly intriguing, but not obvious. What types of data are contained in documents? Certainly there exists nominal data -- indeed the document itself might be considered one nominal data value in an infinite range of possible values.
Furthermore, text itself is nominal in nature. The assertions below are clearly meaningless (at least without some context, and even with context I doubt they mean much):

happy < architecture
John < query
Components of Text
First, we note that text does have some interval-based components. We can count the number of words in a document, count the frequency of particular words, classify words into categories and count how many times each category is referred to in a document, and so on.
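As a minimal sketch of the counts just described, the Python below derives a word count, per-word frequencies, and per-class frequencies from a short piece of text. The `WORD_CLASSES` table, the `tokenize` and `text_features` helpers, and the sample sentence are all hypothetical and chosen only for illustration; a real system would need a far richer tokenizer and classification scheme.

```python
import re
from collections import Counter

# Hypothetical word classes, invented for this illustration only.
WORD_CLASSES = {
    "emotion": {"happy", "sad", "angry"},
    "building": {"architecture", "house", "tower"},
}

def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def text_features(text):
    """Return countable (non-nominal) features extracted from the text."""
    words = tokenize(text)
    word_freq = Counter(words)
    # For each class, add up the frequencies of the words belonging to it.
    class_freq = {
        name: sum(word_freq[w] for w in members)
        for name, members in WORD_CLASSES.items()
    }
    return {
        "word_count": len(words),
        "word_freq": word_freq,
        "class_freq": class_freq,
    }

sample = "The happy architect drew a happy house. The tower made nobody sad."
features = text_features(sample)
print(features["word_count"])
print(features["word_freq"]["happy"])
print(features["class_freq"])
```

Unlike the words themselves, each of these numbers can be meaningfully compared: one document can sensibly be said to mention the "emotion" class more often than another.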
Unfortunately, text has so many features (certainly some of them nominal) that the resulting data set is rather high dimensional. We can then use the techniques briefly described prior to this example to scale the data set down to a reasonable size.
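To give a feel for how quickly the dimensionality grows, here is a rough sketch that turns a handful of documents into a document-term matrix, with one column per distinct word. The function name `document_term_matrix` and the three sample documents are assumptions made for this example, and the sketch shows only the high-dimensional starting point, not any particular reduction technique.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Every distinct word across all documents becomes one dimension.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

# Hypothetical sample documents.
docs = [
    "John wrote a query",
    "The architecture made John happy",
    "A happy query is a fast query",
]
vocabulary, matrix = document_term_matrix(docs)
print(len(vocabulary), "dimensions for", len(docs), "documents")
for row in matrix:
    print(row)
```

Even three very short sentences produce more dimensions than documents, and most entries are zero. A realistic collection has a vocabulary of many thousands of words, which is why some form of reduction is needed before the data can be displayed.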
In the video that follows, you will see a visualization system which displays nominal data in an extremely coherent form, doing so through a series of row and column transformations.