A traditional approach to text mining has been to represent a document by a vector. In the
bag-of-words representation, binary vectors are used, and two documents are regarded as similar if the
angle between their corresponding vectors is small (i.e., the cosine of that angle, an uncentered correlation between the vectors, is high).
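As a minimal sketch of the binary bag-of-words comparison, the Python below builds 0/1 vectors over a shared vocabulary and compares two documents by the cosine of the angle between their vectors. The tokenizer (lowercase, whitespace split, simple punctuation stripping) and the three toy documents are assumptions made for illustration only.

```python
import math

def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip simple punctuation.
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]

def binary_vector(doc, vocabulary):
    # Binary weighting: 1 if the term occurs in the document at all, else 0.
    present = set(tokenize(doc))
    return [1 if term in present else 0 for term in vocabulary]

def cosine(u, v):
    # Cosine of the angle between u and v; a small angle gives a value near 1.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus (illustrative only).
docs = [
    "the cat sat on the mat",
    "the cat sat on the hat",
    "stock prices fell sharply",
]
vocabulary = sorted({t for d in docs for t in tokenize(d)})
vectors = [binary_vector(d, vocabulary) for d in docs]

print(round(cosine(vectors[0], vectors[1]), 3))  # 0.8
print(round(cosine(vectors[0], vectors[2]), 3))  # 0.0
```

The first two toy documents share four of their five distinct terms, so their cosine is 4/5 = 0.8; the third shares none with the first and scores 0.0.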
The document vectors may be assembled into a term-document matrix (TDM). A more satisfying representation
of a document can be formulated in terms of bigrams or trigrams, because these have a better chance of
capturing semantic content. Bigram vectors can be assembled into bigram-document matrices (BDM).
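A sketch of assembling both matrices, assuming binary entries (1 if the term or bigram occurs in the document) and the common orientation with features as rows and documents as columns. The helper names `tokenize`, `bigrams`, and `incidence_matrix`, and the toy documents, are hypothetical.

```python
def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip simple punctuation.
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]

def bigrams(tokens):
    # Adjacent token pairs, e.g. ["a", "b", "c"] -> [("a", "b"), ("b", "c")].
    return list(zip(tokens, tokens[1:]))

def incidence_matrix(feature_sets):
    # Rows are features (terms or bigrams), columns are documents; entry is 1 if present.
    features = sorted(set().union(*feature_sets))
    matrix = [[1 if f in fs else 0 for fs in feature_sets] for f in features]
    return features, matrix

docs = [
    "the cat sat on the mat",
    "the cat sat on the hat",
    "stock prices fell sharply",
]
token_lists = [tokenize(d) for d in docs]
terms, tdm = incidence_matrix([set(t) for t in token_lists])           # TDM
pairs, bdm = incidence_matrix([set(bigrams(t)) for t in token_lists])  # BDM

print(len(terms), "terms x", len(docs), "documents")    # 10 terms x 3 documents
print(len(pairs), "bigrams x", len(docs), "documents")  # 9 bigrams x 3 documents
```

Swapping `bigrams` for a three-token window would give a trigram-document matrix in the same way.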
The TDM and BDM resemble the two-mode adjacency matrices associated with social network analysis (SNA).
Using cues from SNA, we form the corresponding one-mode adjacency matrices: document-document
matrices (DD) and bigram-bigram matrices (BB). In this talk I outline the basics, discuss the connection
between text mining and social networks and, by example, illustrate the dimensionality issues raised
by such vector space methods.
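The one-mode matrices can be sketched the way SNA projects a two-mode (affiliation) matrix onto one of its modes: with features as rows, DD = TDM^T TDM and BB = BDM BDM^T. Under the binary-entry assumption, DD[i][j] counts the terms shared by documents i and j, and BB[k][l] counts the documents containing both bigram k and bigram l. The shapes make the dimensionality issue concrete: DD is n × n for n documents, whereas BB is m × m for m distinct bigrams, so it grows quadratically with the bigram vocabulary. Helper names and data are hypothetical, and plain lists stand in for a matrix library to keep the sketch self-contained.

```python
def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip simple punctuation.
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]

def bigrams(tokens):
    # Adjacent token pairs.
    return list(zip(tokens, tokens[1:]))

def incidence_matrix(feature_sets):
    # Features as rows, documents as columns, binary entries.
    features = sorted(set().union(*feature_sets))
    return features, [[1 if f in fs else 0 for fs in feature_sets] for f in features]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Plain matrix product; adequate for small illustrative matrices.
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

docs = [
    "the cat sat on the mat",
    "the cat sat on the hat",
    "stock prices fell sharply",
]
token_lists = [tokenize(d) for d in docs]
terms, tdm = incidence_matrix([set(t) for t in token_lists])
pairs, bdm = incidence_matrix([set(bigrams(t)) for t in token_lists])

# One-mode projections, as for an SNA affiliation matrix.
dd = matmul(transpose(tdm), tdm)  # documents x documents: shared-term counts
bb = matmul(bdm, transpose(bdm))  # bigrams x bigrams: document co-occurrence counts

print(dd)                        # [[5, 4, 0], [4, 5, 0], [0, 0, 4]]
print(len(bb), "x", len(bb[0]))  # 9 x 9
```

The diagonal of DD holds each document's number of distinct terms, and the diagonal of BB holds each bigram's document frequency.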