Introduction

Probability is one of the central concepts in statistics and mathematics. It expresses how likely an event is to occur and always takes a value between 0 and 1. A value of 0 means that the event will never occur, while a value of 1 means that it will occur with certainty. For intermediate values, probability can be read as the proportion of all trials in which the event occurs, that is, successes divided by the sum of successes and failures. In logistic regression, probability is first transformed into odds, the ratio of the probability of occurrence to the probability of non-occurrence (successes to failures), and then into the natural logarithm of the odds, so that a linear model in the independent variables can be applied.

Probabilities, Odds, and Logarithms

To better understand the relationship between probabilities, odds, and the logistic model, let us look at some examples. If the probability of an event is π = 0.2, then the odds are 0.2 to 0.8, that is, 0.25. The logit, which is the natural logarithm of the odds, is ln(0.25) ≈ -1.386. Similarly, if π = 0.7, then the odds are 0.7 to 0.3, that is, about 2.33, and the logit is ln(0.7/0.3) ≈ 0.847. In a third case, if π = 0.9, then the odds are 0.9 to 0.1, that is, 9, and the logit is ln(9) ≈ 2.197. These examples show that the logit increases as the probability increases: it is negative when π is below 0.5, zero when π = 0.5, and positive when π is above 0.5, with no upper or lower limit. This property makes it especially useful in statistical modeling, because it maps probabilities strictly between 0 and 1 onto a scale that extends from negative infinity to positive infinity.
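The three worked examples above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names odds and logit are chosen here for illustration and do not come from any particular library.

```python
import math

def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Natural logarithm of the odds (requires 0 < p < 1)."""
    return math.log(odds(p))

# The three probabilities used in the text.
for p in (0.2, 0.7, 0.9):
    print(f"pi = {p:.1f}   odds = {odds(p):.3f}   logit = {logit(p):.3f}")
```

Note that the logit is undefined at exactly 0 and 1, where the odds are 0 or the division fails, which is why the sketch restricts p to values strictly between the two.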

Probability Ratios and Odds Ratios

Odds are another numerical way of expressing probability. For example, odds of 10 mean that the event occurs ten times for each time that it does not occur. This representation allows direct comparisons: odds of 9 to 1 are three times as large as odds of 3 to 1. It is important, however, to distinguish between odds and odds ratios. Odds refer to the ratio of the probability of occurrence to the probability of non-occurrence of a single event. The odds ratio, in contrast, compares the odds of two different groups or situations.

A characteristic example comes from the General Social Survey of 1994, in which 29.5% of men and 13.1% of women reported owning a gun. For men, the odds are 0.295 to 0.705, that is, 0.418. This means that there are about four male gun owners for every ten men who do not own a gun. For women, the odds are 0.131 to 0.869, that is, 0.151, or about one and a half owners for every ten women who do not own a gun. Comparing the two groups, the odds ratio of men to women is 0.418 to 0.151, that is, approximately 2.77. This means that the odds of owning a gun are almost three times as high for men as for women. It does not mean that men are almost three times as likely to own a gun: the ratio of the probabilities themselves is 0.295 to 0.131, or about 2.25. The example shows how the odds ratio summarizes the difference between two categories in a single number, and why it should not be read as a ratio of probabilities.
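The gun ownership comparison can be checked numerically. The following sketch uses only the two percentages quoted in the text; the variable names are illustrative, and the probability ratio is computed alongside the odds ratio purely for comparison.

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

# Shares reported in the text (General Social Survey, 1994).
p_men, p_women = 0.295, 0.131

odds_men = odds(p_men)        # about 0.418
odds_women = odds(p_women)    # about 0.151

odds_ratio = odds_men / odds_women   # compares the two odds
prob_ratio = p_men / p_women         # compares the two probabilities

print(f"odds (men)   = {odds_men:.3f}")
print(f"odds (women) = {odds_women:.3f}")
print(f"odds ratio   = {odds_ratio:.2f}")
print(f"prob. ratio  = {prob_ratio:.2f}")
```

Without intermediate rounding the odds ratio comes out near 2.78; the 2.77 quoted in the text results from dividing the odds after they have been rounded to three decimals.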

The Logistic Model

Logistic regression connects the probability of an event occurring with a set of independent variables. With a single independent variable X, the basic equation is: ln(πi/(1-πi)) = β0 + β1Xi. The left-hand side is the logarithm of the odds, while the right-hand side describes a linear relationship with X. To transform the log-odds back into a probability, we exponentiate both sides and solve for πi, which gives πi = e^(β0+β1Xi) / (1 + e^(β0+β1Xi)). In this way, we can estimate the probability of an event for different values of the independent variable. While the relationship between the probability and X is nonlinear, the relationship between the log-odds and X remains linear, which is what makes the coefficients interpretable: a one-unit increase in X changes the log-odds by β1 and therefore multiplies the odds by e^β1.
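The back-transformation from log-odds to probability can be sketched as follows. The coefficients beta0 = -1.5 and beta1 = 0.8 are hypothetical values picked only for illustration; they are not estimates from any data set, and the function name probability is likewise an assumption of this sketch.

```python
import math

def probability(x, beta0, beta1):
    """Probability implied by the model ln(pi / (1 - pi)) = beta0 + beta1 * x."""
    z = beta0 + beta1 * x               # linear predictor (log-odds)
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.5, 0.8

for x in (0, 1, 2, 3):
    p = probability(x, beta0, beta1)
    print(f"x = {x}   log-odds = {beta0 + beta1 * x:+.2f}   "
          f"odds = {p / (1.0 - p):.3f}   pi = {p:.3f}")
```

In the printed table the log-odds rise by the same amount, beta1, at every step, and the odds are multiplied by the same factor, e^beta1, while the probability changes by a different amount each time. For very large values of the linear predictor, math.exp can overflow, so a production implementation would likely use a numerically safer formulation.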

Conclusion

In summary, logistic regression is a particularly useful statistical tool for analyzing data in which the dependent variable is binary. Transforming probability into odds and then into log-odds allows a linear model to be applied to an outcome whose probability is confined to the interval from 0 to 1. The distinction between probabilities, odds, and odds ratios supports clearer interpretation and practical comparisons. Examples such as gun ownership show how the odds ratio can quantify differences between groups, which is useful in social research as well as in many other scientific fields. The strength of the model lies in linking a bounded probability to an unbounded linear scale, which makes predictions and comparisons possible on solid mathematical foundations.