Definition

Logistic regression is a technique for modeling the relationship between a binary response (yes/no) and one or more input factors (Xs), and for determining which of those factors contribute most significantly to the response. The logistic model can be extended to response variables with more than two categories, either nominal (unordered, discrete categories) or ordinal (ordered, discrete categories). The explanatory variables may be categorical or continuous.

Application

For the sake of discussion, let Y = 1 represent a “success” and Y = 0 represent a “failure” (the labels could just as easily be reversed).

Then,

P(Y=1) = the probability of success

P(Y=0) = the probability of failure = 1 - P(Y=1)

The odds of success are given by: P(Y=1)/P(Y=0)
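
As a small illustration of the odds formula above, the following sketch converts a probability of success into odds. The function name `odds` is made up for this example and is not taken from any library.

```python
def odds(p_success: float) -> float:
    """Return the odds of success, P(Y=1) / P(Y=0), given P(Y=1)."""
    if not 0.0 <= p_success < 1.0:
        # At p_success = 1 the probability of failure is 0 and the odds are undefined.
        raise ValueError("p_success must lie in [0, 1)")
    p_failure = 1.0 - p_success
    return p_success / p_failure


print(odds(0.5))   # 1.0  (success and failure equally likely)
print(odds(0.75))  # 3.0  (success is three times as likely as failure)
```

Odds of 3.0 are often read as "3 to 1 in favor of success".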

If we used ordinary least squares to fit a straight line to the binary response, the fitted values would lie on an unbounded continuous scale and could fall below 0 or above 1, so they could not be interpreted as probabilities. Logistic regression overcomes this problem by fitting an S-shaped curve to P(Y=1), the probability of success, which always stays between 0 and 1. To convert this curve to a linear form, we take the natural logarithm (log to the base e) of the odds of success, ln[P(Y=1)/P(Y=0)], and use it as the response. This log-odds of success is modeled as a linear function of the explanatory variables, and the coefficients are estimated by maximum likelihood.
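
The difficulty with a least squares line can be seen on a small example. The sketch below fits a straight line to a made-up binary data set (the values of `xs` and `ys` are invented for illustration) and shows that some fitted values cannot be read as probabilities.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: one explanatory variable and a binary (0/1) response.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = least_squares_line(xs, ys)
fitted = [intercept + slope * x for x in xs]

# The smallest fitted value is below 0 and the largest is above 1,
# so the straight line does not give valid probabilities.
print(min(fitted), max(fitted))
```

On this data the line already leaves the interval from 0 to 1 within the observed range of X, which is the behavior the S-shaped logistic curve avoids.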

Thus, a logistic regression model takes the form:

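ln[P(Y=1)/P(Y=0)] = α + b_{1}X_{1} + b_{2}X_{2} + ... + b_{k}X_{k}

Here,

ln[P(Y=1)/P(Y=0)] = the log-odds (logit) of success

α = the intercept, i.e. the log-odds of success when all Xs equal zero

b_{i} = the coefficient of the i^{th} explanatory variable X_{i}, for i = 1, ..., k

Unlike a least squares model, this equation carries no additive random error term ε: the randomness enters through Y itself, which equals 1 with probability P(Y=1) and 0 otherwise.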
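
Solving the model equation for P(Y=1) gives P(Y=1) = 1 / (1 + e^{-z}), where z = α + b_{1}X_{1} + b_{2}X_{2} + ... is the linear part of the model. The sketch below evaluates both quantities; the function names and the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def log_odds(alpha, coefs, xs):
    """Linear part of the model: alpha + b_1*x_1 + b_2*x_2 + ..."""
    return alpha + sum(b * x for b, x in zip(coefs, xs))


def probability_of_success(alpha, coefs, xs):
    """Invert the logit: P(Y=1) = 1 / (1 + exp(-log_odds))."""
    z = log_odds(alpha, coefs, xs)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients, for illustration only.
alpha = -1.0
coefs = [0.5, -0.25]

print(log_odds(alpha, coefs, [2.0, 0.0]))                # 0.0
print(probability_of_success(alpha, coefs, [2.0, 0.0]))  # 0.5
```

A log-odds of 0 corresponds to odds of 1 and therefore to a probability of 0.5; positive log-odds give probabilities above 0.5 and negative log-odds give probabilities below it.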

Each b_{i} gives the change in the log-odds of success for a unit increase in the i^{th} explanatory variable X_{i}, keeping all other Xs constant: an increase if b_{i} > 0 and a decrease if b_{i} < 0. Equivalently, a unit increase in X_{i} multiplies the odds of success by e^{b_{i}}.
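
To make the interpretation of a coefficient concrete, the sketch below fits a model with one explanatory variable by maximizing the log-likelihood with plain gradient ascent, then checks that a unit increase in X multiplies the odds of success by e^{b}. Everything here is an assumption made for illustration: the data are invented, and `fit_logistic`, the learning rate, and the iteration count are hypothetical choices. Statistical packages typically use Newton-type iterations instead of gradient ascent.

```python
import math


def sigmoid(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, learning_rate=0.05, n_iter=20000):
    """Estimate alpha and b by gradient ascent on the log-likelihood."""
    alpha, b = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient of the log-likelihood: sum of (y - p) and sum of (y - p) * x.
        residuals = [y - sigmoid(alpha + b * x) for x, y in zip(xs, ys)]
        alpha += learning_rate * sum(residuals)
        b += learning_rate * sum(r * x for r, x in zip(residuals, xs))
    return alpha, b


# Hypothetical data; the two classes overlap, so the estimates are finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

alpha, b = fit_logistic(xs, ys)


def odds_at(x):
    """Odds of success implied by the fitted model at X = x."""
    p = sigmoid(alpha + b * x)
    return p / (1.0 - p)


# Ratio of odds for a one-unit increase in X, compared with exp(b).
odds_ratio = odds_at(3.0) / odds_at(2.0)
print(b, math.exp(b), odds_ratio)
```

The last two printed numbers agree up to rounding, whichever pair of X values one unit apart is used, because the log-odds are linear in X.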