Bag-of-words representation of text
Consider the following text:
| A (real) vector is just a collection of real numbers, referred to as the components (or, elements) of the vector; |
The row vector
contains the number of times each word in the list {vector, of, the} appears in the above paragraph. Vectors can thus be used to represent text documents. This representation, often referred to as the bag-of-words representation, is not faithful, since it ignores the order in which the words appear. In addition, stop words (such as "the" or "of") are often ignored as well.
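As a minimal sketch of the counting step described above, the following Python snippet builds the count vector for a fixed word list. The function name `bag_of_words` and the tokenization rule (lowercase the text, keep runs of letters) are assumptions made for illustration and are not part of the original text. The input is the excerpt exactly as quoted above.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return the number of occurrences of each vocabulary word, in vocabulary order.

    Hypothetical helper: tokenization here is an assumed, simple rule
    (lowercase, then keep maximal runs of the letters a-z).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur, so no special case is needed.
    return [counts[word] for word in vocabulary]

# The excerpt as quoted in the text above.
text = ("A (real) vector is just a collection of real numbers, referred to as "
        "the components (or, elements) of the vector;")
vocabulary = ["vector", "of", "the"]

row = bag_of_words(text, vocabulary)
print(row)  # [2, 2, 2] for this excerpt

# Word order is ignored: two different orderings give the same vector.
print(bag_of_words("the vector of", vocabulary) ==
      bag_of_words("of the vector", vocabulary))  # True
```

The last comparison illustrates why the representation is not faithful: any reordering of the words yields the same vector. Dropping stop words would amount to leaving them out of `vocabulary`.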
See also: Bag-of-words representation of text: measure of document similarity.