Bag-of-words representation of text: Measure of document similarity

Returning to the bag-of-words example, we can use the notion of angle to measure how close two documents are to each other.

Given two documents and a pre-defined list of words appearing in them (the dictionary), we can compute the vectors x, y of word frequencies, one vector per document. The angle between the two vectors is a widely used measure of closeness (similarity) between documents: the smaller the angle, the more similar the documents.
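
As a minimal sketch of this computation, the angle θ between two nonzero vectors x and y satisfies cos θ = (x · y) / (‖x‖ ‖y‖). The dictionary, the two sample documents, and the helper names `word_counts` and `angle_between` below are illustrative assumptions, not part of the original text. The sketch uses raw word counts rather than relative frequencies; this should not change the result, since the angle is unaffected by scaling either vector by a positive constant.

```python
import math
from collections import Counter

def word_counts(text, dictionary):
    """Return the vector of counts of each dictionary word in text.

    Tokenization here is a simple lowercase whitespace split (an assumption).
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]

def angle_between(x, y):
    """Return the angle, in radians, between two nonzero vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos_theta)

# Hypothetical dictionary and documents, chosen only for illustration.
dictionary = ["cat", "dog", "fish"]
doc_a = "cat dog cat"
doc_b = "dog fish dog"

x = word_counts(doc_a, dictionary)  # counts of cat, dog, fish in doc_a
y = word_counts(doc_b, dictionary)  # counts of cat, dog, fish in doc_b
theta = angle_between(x, y)
print(x, y, math.degrees(theta))
```

Because word counts are nonnegative, the angle always lies between 0 and 90 degrees: 0 when the two count vectors are proportional, and 90 when the documents share no dictionary word.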

License

Linear Algebra and Applications Copyright © 2023 by VinUniversity is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.
