# Machine Learning: Understanding Logistic Regression

Upasana | May 22, 2019

## Introduction

This article introduces the logistic regression algorithm and covers the following:

1. When to use Logistic Regression

2. How it works

3. Logistic function

4. Working of Logistic Function

5. Analysing Model results

## Classification

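Classification is the process of assigning observations to one of a set of labels (the unique values of the response variable). In machine learning, classification is a supervised task: the model learns from examples whose labels are already known. Grouping unlabeled data is usually called clustering instead.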

But why classification, why not regression?

Suppose that we are trying to predict the medical condition of a patient in the emergency room on the basis of her symptoms. In this simplified example, there are three possible diagnoses: stroke, drug overdose, and epileptic seizure. We could consider encoding these values as a quantitative response variable, Y , as follows:

$$
Y = \begin{cases} 1 & \text{if stroke} \\ 2 & \text{if drug overdose} \\ 3 & \text{if epileptic seizure} \end{cases}
$$

Using this coding, least squares could be used to fit a linear regression model to predict Y on the basis of a set of predictors X1, ..., Xp. Unfortunately, this coding implies an ordering on the outcomes, putting drug overdose in between stroke and epileptic seizure, and insisting that the difference between stroke and drug overdose is the same as the difference between drug overdose and epileptic seizure. (ref: ISLR)

This implied ordering is wrong because the values of Y are categorical labels, not quantities on a numeric scale. This is why we should use classification rather than regression here.

A classification model makes no such assumption about how the three conditions relate to one another. It treats the encodings purely as category labels and predicts the probability of each category given a set of observations.

There are many classification techniques; here we will discuss logistic regression.

## Logistic Regression

In logistic regression, we predict the probability that the response variable Y is true (Y = 1) given a set of observations X. In other words, logistic regression models a conditional probability:

$$p(X) = \Pr(Y = 1 \mid X)$$

Because we are predicting a probability, the predicted value always stays between 0 and 1.

## Logistic Function

Next, we need to define the relationship between X and p(X). Logistic regression uses the logistic function for this:

$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
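As a minimal sketch, the logistic function above can be evaluated directly in Python. The coefficient values `beta0 = -3.0` and `beta1 = 0.5` are made up for illustration and do not come from any fitted model; the function name `logistic` is likewise just a label chosen here.

```python
import math

def logistic(x, beta0, beta1):
    """Return p(X) = e^(beta0 + beta1*x) / (1 + e^(beta0 + beta1*x))."""
    z = beta0 + beta1 * x
    # Use an algebraically equivalent form for z >= 0 to avoid overflow in exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -3.0, 0.5

# Evaluate the curve at a few values of X.
probs = [logistic(x, beta0, beta1) for x in (0, 6, 12)]
print(probs)
```

With these assumed coefficients, the exponent is zero at X = 6, so the predicted probability there is exactly 0.5, and every output stays strictly between 0 and 1.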

This function can be rearranged to give the odds, and then the logit, as follows:

$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}$$

The quantity p(X)/(1 - p(X)) is called the odds, and it ranges from 0 to infinity.

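Probability and odds have different properties: probability is bounded between 0 and 1, whereas odds have no upper bound. For example, a probability of 0.8 corresponds to odds of 0.8/0.2 = 4. The odds capture how X continuously affects the likelihood that Y is true.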

Taking the logarithm of both sides gives:

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$

In the equation above, log(p(X)/(1 - p(X))) is called the log-odds or logit.
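The conversions between probability, odds and log-odds can be sketched in a few lines of Python. The helper names `odds`, `logit` and `inverse_logit` are chosen here for illustration; this is not any particular library's API.

```python
import math

def odds(p):
    """Convert a probability (0 < p < 1) to odds p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Convert a probability to log-odds, log(p / (1 - p))."""
    return math.log(odds(p))

def inverse_logit(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities below 0.5 give negative log-odds, above 0.5 positive.
for p in (0.2, 0.5, 0.8):
    print(p, odds(p), logit(p))
```

Applying `inverse_logit` to the output of `logit` recovers the original probability, which mirrors how the logistic function and the logit equation are two views of the same relationship.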

## Working of Logistic Function

From the logistic function, we can see that:

1. If X increases by 1 unit, the log-odds change by β1; equivalently, the odds are multiplied by e^(β1).

2. The change in p(X) caused by a 1-unit change in X depends on the current value of X.

3. If β1 is positive, increasing X increases p(X).

4. If β1 is negative, increasing X decreases p(X).
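The first two points can be checked numerically with a short sketch. The coefficients below are hypothetical values picked for illustration, and the helper names `p_of_x` and `odds_of_x` are assumptions of this example.

```python
import math

def p_of_x(x, beta0, beta1):
    """Logistic function: probability that Y = 1 at the given x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds_of_x(x, beta0, beta1):
    """Odds p(X) / (1 - p(X)) at the given x."""
    p = p_of_x(x, beta0, beta1)
    return p / (1.0 - p)

# Hypothetical coefficients, not taken from a fitted model.
beta0, beta1 = -3.0, 0.5
xs = (0, 6, 12)

# Ratio of odds after and before a 1-unit increase in X.
odds_ratios = [odds_of_x(x + 1, beta0, beta1) / odds_of_x(x, beta0, beta1) for x in xs]

# Change in probability for the same 1-unit increase.
prob_changes = [p_of_x(x + 1, beta0, beta1) - p_of_x(x, beta0, beta1) for x in xs]

print(odds_ratios)   # each ratio is approximately e^(beta1)
print(prob_changes)  # differs by starting point
```

Under these assumptions the odds ratio is the same at every starting point, while the change in probability is largest near the middle of the S-curve and smaller in the tails.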

We can also conclude that the logistic function is a natural fit for binary classification: the logistic model suits cases where the response variable has exactly two classes.

Another conclusion is that the logistic function is non-linear, so the relationship between p(X) and X follows an S-shaped curve. Note that the log-odds are still linear in X, as the log-odds equation above shows.

## Analysing Model results

The null hypothesis states that the feature has no effect on the response:

$$H_0 : \beta_1 = 0$$

For each coefficient, the logistic model reports a z-statistic. For a feature with coefficient β1, the z-statistic is always the ratio of the estimate of β1 to its standard error:

$$z = \frac{\beta_1}{SE(\beta_1)}$$

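A large absolute value of the z-statistic is evidence against the null hypothesis, which also means the p-value is small. When the p-value is small (for example, below 0.05), we reject the null hypothesis and conclude that the feature has a significant association with the response, so it should be kept. Conversely, when the p-value is large, we cannot reject the null hypothesis, and the feature is a candidate for removal.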

We can evaluate the other features in the same way to check whether each has a significant effect on the response variable.
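As a rough sketch of this test, the z-statistic and a two-sided p-value can be computed with the standard library alone, assuming the z-statistic is approximately standard normal under the null hypothesis. The estimate, standard error and the 0.05 threshold below are made-up illustrative values, not results from a real model.

```python
import math

def z_statistic(beta_hat, std_error):
    """Ratio of the coefficient estimate to its standard error."""
    return beta_hat / std_error

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical estimate and standard error for beta1.
beta1_hat, se_beta1 = 0.5, 0.1

z = z_statistic(beta1_hat, se_beta1)
p_value = two_sided_p_value(z)

# Assumed significance level of 0.05; reject H0 when the p-value falls below it.
reject_null = p_value < 0.05
print(z, p_value, reject_null)
```

In practice a statistics package reports these values in its model summary; the sketch only shows where the numbers come from.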

## Evaluating model

Below are the metrics that can be used to evaluate the model:

1. Accuracy score

2. Confusion matrix

3. If the data is imbalanced (the ratio of class frequencies in the response variable is far from 1), use precision, recall and F-score to evaluate the model.
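The metrics listed above can be sketched in plain Python for a binary problem. The labels and predictions below are a small made-up imbalanced sample, and the function names `confusion_counts` and `evaluate` are assumptions of this example rather than any library's API.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, guarding against division by zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up imbalanced sample: 8 negatives, 2 positives, one positive missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]

metrics = evaluate(y_true, y_pred)
print(confusion_counts(y_true, y_pred))
print(metrics)
```

On this sample the accuracy looks high even though half of the positive cases are missed, which is why recall and F-score are more informative when classes are imbalanced.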