Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. It is used when the dependent variable is binary, meaning it has only two possible outcomes. The goal of logistic regression is to predict the probability that a given set of independent variables will result in a particular outcome.
In logistic regression, the dependent variable is binary and represented by a value of 0 or 1. The independent variables can be continuous or categorical. The relationship between the independent variables and the probability of the outcome is modeled using a logistic function.
The logistic function, also known as the sigmoid function, is defined as:
$$ P(Y=1|X) = \frac{1}{1 + e^{-(b_0 + b_1X_1 + b_2X_2 + ... + b_nX_n)}} $$
Where:
The coefficients $$ b_0, b_1, b_2, ..., b_n $$ are estimated using maximum likelihood estimation, a method that finds the values of the coefficients that maximize the likelihood of the observed data given the model.
The coefficients in logistic regression represent the effect of each independent variable on the log-odds of the outcome being 1. A positive coefficient indicates that as the value of the independent variable increases, the log-odds of the outcome being 1 also increase. A negative coefficient indicates the opposite relationship.
The odds ratio can be calculated from the coefficients to interpret the effect of an independent variable on the odds of the outcome. The odds ratio is defined as:
$$ OR = e^{b_j} $$
Where $$ b_j $$ is the coefficient of the independent variable of interest. An odds ratio greater than 1 indicates a positive relationship between the independent variable and the outcome, while an odds ratio less than 1 indicates a negative relationship.
There are several metrics that can be used to evaluate the performance of a logistic regression model:
Additionally, a confusion matrix can be used to visualize the performance of the model by showing the true positive, false positive, true negative, and false negative predictions.
Logistic regression is widely used in various fields for binary classification tasks. Some common applications include:
Advantages of logistic regression include: