CS-Notes-byCKL

Bag of Words


Last updated 7 years ago

What is Bag of Words useful for?

Bag of Words is a simple and widely used method for representing texts as word counts, which can then be used to measure the similarity between two texts.

How does it work?

Given two texts, the number of occurrences of each word is counted in each text. If the same words occur many times in both texts, Bag of Words tells us that the texts are similar to each other.
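A minimal Python sketch of the counting step, assuming a simple tokenizer (lowercased runs of letters) that these notes do not specify. The helpers `word_counts` and `shared_word_overlap` are hypothetical names, and the overlap score is only one illustrative way to turn shared counts into a number; it is not a standard part of Bag of Words.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count how often each word occurs in `text` (case is ignored)."""
    # Assumption: a "word" is a run of letters a-z after lowercasing.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def shared_word_overlap(text_a: str, text_b: str) -> int:
    """Hypothetical score: for each word in both texts, add the smaller count."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Counter & Counter keeps the minimum count of every word present in both.
    return sum((a & b).values())


if __name__ == "__main__":
    t1 = "The dog chased the cat."
    t2 = "The cat saw the dog."
    print(word_counts(t1))
    print(shared_word_overlap(t1, t2))
```

The more often the same words appear in both texts, the higher this score becomes, which mirrors the intuition described above.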

  • Given X documents containing N distinct words in total (the vocabulary), Bag of Words represents each document as a vector in an N-dimensional vector space, with one dimension per word.

  • In this space, every document is just a point. We can then measure the similarity between documents with standard linear algebra, e.g. the Euclidean distance between their points: the smaller the distance, the more similar the documents.
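The projection into an N-dimensional space and the distance measurement can be sketched in Python as follows. This is a minimal sketch under the same simple-tokenizer assumption as above; `build_vocabulary`, `to_vector` and `euclidean_distance` are hypothetical helper names, and the example documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of letters a-z as words.
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents: list[str]) -> list[str]:
    """Collect the N distinct words across all documents, in sorted order."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def to_vector(document: str, vocabulary: list[str]) -> list[int]:
    """Map a document to a point in N-dimensional space: one count per vocabulary word."""
    counts = Counter(tokenize(document))
    return [counts[word] for word in vocabulary]


def euclidean_distance(u: list[int], v: list[int]) -> float:
    """Straight-line distance between two document vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    docs = ["lion lion dog", "lion dog", "cat cat cat"]
    vocab = build_vocabulary(docs)  # ['cat', 'dog', 'lion']
    vectors = [to_vector(d, vocab) for d in docs]
    print(vectors)  # [[0, 1, 2], [0, 1, 1], [3, 0, 0]]
    print(euclidean_distance(vectors[0], vectors[1]))  # 1.0
    print(euclidean_distance(vectors[0], vectors[2]))  # sqrt(14), about 3.74
```

Sorting the vocabulary only fixes a consistent order for the dimensions; any fixed order works as long as every document uses the same one. For real projects, scikit-learn's `CountVectorizer` performs the same counting step.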

Example of a 3-dimensional document space, in which only the words lion, dog and cat occur
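As a worked example for such a 3-dimensional space, assume three documents with the hypothetical word counts below (coordinates ordered as lion, dog, cat; the numbers are made up for illustration). The Euclidean distance between two document vectors is

```latex
d(\mathbf{u}, \mathbf{v}) = \sqrt{\sum_{i=1}^{N} (u_i - v_i)^2}

\mathbf{x}_1 = (2, 0, 1), \quad \mathbf{x}_2 = (1, 0, 1), \quad \mathbf{x}_3 = (0, 2, 0)

d(\mathbf{x}_1, \mathbf{x}_2) = \sqrt{(2-1)^2 + (0-0)^2 + (1-1)^2} = 1

d(\mathbf{x}_1, \mathbf{x}_3) = \sqrt{(2-0)^2 + (0-2)^2 + (1-0)^2} = \sqrt{9} = 3
```

Documents 1 and 2 mostly use the same words (lion and cat), so they lie close together, while document 3 only mentions dog and lies farther away.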