Text mining is a software technology that can help read unstructured texts. In other words, it recognizes grammatical structures of sentences, phrases and words and creates a kind of “fingerprint” from the source text as a result. Combined with conventional database indexing techniques, the comparison of text fingerprints delivers search and analysis methods that represent a paradigm shift for classical search-related activities.
While traditional search locates documents according to a user-entered keyword or combination of keywords, text mining allows the user to compare documents without entering any keyword(s) to identify documents that are related to the search document based on the fact that they contain similar language. This difference represents a complete change in thinking with regard to search, as explained below.
Figure 1: A document fingerprint
Text mining methods deliver new capabilities to knowledge workers:
They can use any document as a query to find familiar documents from any available source;
We have developed a proprietary, patented text mining method that differs from other methods in that it does not require a taxonomy or lexical dictionary to generate a highly representative fingerprint from a document. Unlike most other methods, it needs neither complex statistical algorithms nor complex indexing. It is purely linguistic in nature using only grammatical processing and therefore covers a text completely, even if the text incorporates words or word combinations that are new to the world (neologisms).
White Spaces
Because of our use of text mining, the researcher can find “unused” terminology or ‘white spaces'. When the software compares a document with a large array of others, it calculates the number of hits for each source or filtered collection, showing which concepts have matches and which ones do not. Visibility of the zero scores allows for analysis and detection of areas of content that are very new, as well as areas in which one would still be relatively free to operate.