Introduction to SVM, hyperplane, TF-IDF and BoW

Upasana | August 05, 2019 | 2 min read | 217 views

Explain SVM (Support Vector Machine)

The Support Vector Machine algorithm is usually known by its short form, SVM. SVM is an extension of the support vector classifier, which is itself an extension of the maximal margin classifier. If you know random forests, the relationship is similar: a random forest is an improved extension of decision trees and bagging. SVM suits classification problems where the boundary between the classes is well defined, and it performs particularly well on binary classification problems. It uses a hyperplane as the boundary; more precisely, it finds the hyperplane that maximizes the margin between the two labels.
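To make the idea of a margin-maximizing hyperplane concrete, here is a minimal sketch of a linear SVM in plain Python. It assumes a simple training scheme (sub-gradient descent on the regularised hinge loss), which is not necessarily what a library solver does. The toy points, the function names and the hyperparameters `lam`, `lr` and `epochs` are all made up for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Point is misclassified or inside the margin: move the boundary.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is outside the margin: only shrink w, which widens the margin.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Label a point by the side of the hyperplane w . x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Two well-separated classes (hypothetical toy data), labelled -1 and +1.
points = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
labels = [-1, -1, -1, 1, 1, 1]

svm_w, svm_b = train_linear_svm(points, labels)
predictions = [predict(svm_w, svm_b, p) for p in points]
margin_width = 2 / sum(wi * wi for wi in svm_w) ** 0.5  # distance between the two margin lines
```

The learned `svm_w` and `svm_b` define the separating hyperplane, and `margin_width` is the gap that the SVM tries to make as large as possible. In practice you would normally rely on a tested library implementation rather than a hand-written loop like this one.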

What is a hyperplane?

A hyperplane is a flat (affine) subspace whose dimension is always one less than that of the space around it. For example, in a 3-D vector space a hyperplane is a 2-D plane, and in a 2-D space it is a line.
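A hyperplane can be written as the set of points x with w . x + b = 0, where w is a normal vector and b an offset. The short sketch below assumes that notation (the names `normal` and `offset` are ours, not the article's) and shows how the sign of w . x + b tells which side of the hyperplane a point lies on.

```python
def signed_distance(normal, offset, point):
    """Signed distance from a point to the hyperplane normal . x + offset = 0."""
    norm = sum(n * n for n in normal) ** 0.5
    return (sum(n * p for n, p in zip(normal, point)) + offset) / norm


# In 3-D, normal = (0, 0, 1) and offset = -2 describe the 2-D plane z = 2.
normal, offset = (0.0, 0.0, 1.0), -2.0

above = signed_distance(normal, offset, (4.0, -1.0, 5.0))     # point with z = 5
below = signed_distance(normal, offset, (0.0, 3.0, 0.5))      # point with z = 0.5
on_plane = signed_distance(normal, offset, (7.0, 7.0, 2.0))   # point with z = 2
```

Points on one side get a positive value, points on the other side a negative value, and points on the hyperplane itself get zero. A classifier such as SVM uses exactly this sign to assign a label.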

Why didn’t you normalise the dataset?

The project was a classification project. The features already showed good correlation without being standardized or normalized, so there was no need to normalize the data. Note that normalization is not always necessary, and that standardization and normalization are not the same thing.
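The difference between the two terms can be sketched as follows, assuming the common usage in which standardization means rescaling to zero mean and unit standard deviation (z-scores) and normalization means min-max rescaling to the range [0, 1]. The `ages` column is a made-up example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]           # hypothetical feature column
z_scores = standardize(ages)          # centred on 0, spread of 1
scaled = min_max_normalize(ages)      # squeezed into [0, 1]
```

Both functions assume the column is not constant; a column with a single repeated value would cause a division by zero and needs special handling.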

How to build a model on textual data?

Algorithms cannot work on raw text directly because it is not in numeric form, so we convert the text data into fixed-length vectors.

How to convert text data to vector format?

We can use techniques such as Bag-of-Words and TF-IDF to convert text to vectors.

What is TF-IDF?

TF-IDF stands for Term Frequency - Inverse Document Frequency. The TF part gives more weight to words that are frequent within a given text, and the IDF part down-weights words that occur frequently across all the text data. In short, TF-IDF highlights the words that characterise the context of a text and converts the text into a fixed-length vector.
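As a rough illustration, the sketch below computes TF-IDF by hand, assuming the plain textbook weighting tf x log(N / df) and whitespace tokenisation. Library implementations typically use smoothed variants, so their numbers will differ. The three example sentences are made up.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[word] / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word in vocabulary
        ])
    return vocabulary, vectors


docs = ["the cat sat on the mat", "the dog sat on the log", "the cat chased the dog"]
vocab, tfidf = tf_idf_vectors(docs)

the_weight = tfidf[0][vocab.index("the")]   # "the" occurs in every document
mat_weight = tfidf[0][vocab.index("mat")]   # "mat" occurs only in the first document
```

Every vector has the same length as the vocabulary. With this unsmoothed formula a word such as "the", which appears in every document, gets a weight of zero, while a word specific to one document, such as "mat", gets the largest weight.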

What is the difference between Bag of Words and TF-IDF?

TF-IDF weights each word: the TF part favours words that are frequent in the text, and the IDF part down-weights words that occur frequently across all the text data. As a result, TF-IDF brings out the words that characterise the context of the text.

Bag-of-Words (BoW), by contrast, simply assigns a unique index to every word, counts how often each word occurs in the text, and converts the text into a fixed-length vector of those counts.
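A minimal Bag-of-Words sketch, assuming whitespace tokenisation and an alphabetically sorted vocabulary (both are choices made here for illustration), looks like this:

```python
from collections import Counter


def bag_of_words(documents):
    """Return a word-to-index mapping and one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    words = sorted({word for tokens in tokenized for word in tokens})
    # Assign every distinct word a fixed position in the vector.
    word_index = {word: i for i, word in enumerate(words)}
    vectors = []
    for tokens in tokenized:
        vector = [0] * len(word_index)
        for word, count in Counter(tokens).items():
            vector[word_index[word]] = count
        vectors.append(vector)
    return word_index, vectors


docs = ["the cat sat on the mat", "the dog sat on the log"]
word_index, count_vectors = bag_of_words(docs)
# word_index orders the words as: cat, dog, log, mat, on, sat, the
```

Here the common word "the" receives the highest count in both vectors, because BoW only counts occurrences. TF-IDF would reduce the weight of such a word since it appears in every document.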

