Perform a Likelihood Ratio Test in R

Perform a likelihood ratio test in R. Use the lrtest() function from the lmtest package to compare two nested linear regression models, and learn how the likelihood-ratio test extends to generalized linear models.
Dr Zubair Goraya

You've built two regression models—one simple, one more complex. The complex one seems to fit slightly better, but is that improvement statistically meaningful, or just noise in the observed data? How do you show, with statistical evidence, that the extra complexity is actually worth it?


You prove it using the Likelihood Ratio Test (LRT). In essence, the LRT is a hypothesis test explicitly designed to compare the goodness of fit of two nested models. One model must be a simpler version (a subset) of the other. The test calculates a statistic based on the difference in the log-likelihood values between your two models. This test statistic follows a known chi-squared distribution. By comparing our result to this distribution, we get a p-value. If this p-value is below our chosen statistical significance level (e.g., 0.05), we reject the null hypothesis. This provides strong evidence that the more complex model offers a significant improvement in fit. If the p-value is high, we conclude the extra complexity isn't justified.

library(lmtest)
library(readr)
Bank_and_insurance <- read_csv("Bank and insurance.csv")
# Model 1: Simpler model
model_1 <- lm(InsurancePremium ~ Age + CreditScore, data = Bank_and_insurance)
# Model 2: More complex model with an extra variable
model_2 <- lm(InsurancePremium ~ Age + CreditScore + AccountBalance, data = Bank_and_insurance)
# Perform the likelihood ratio test
lrtest(model_1, model_2)

Key Points

  • The Likelihood Ratio Test (LRT) is used to compare the goodness of fit of two nested statistical models.
  • The core requirement is that one model must be a simpler subset of the other (e.g., the same model with fewer predictors).
  • The LRT statistic follows a chi-squared distribution, which is used to calculate a p-value.
  • A low p-value (typically < 0.05) means you can reject the null hypothesis and conclude the more complex model provides a significantly better fit.
  • In R, the easiest way to perform a likelihood ratio test is with the lrtest() function from the lmtest package.

What is a Likelihood Ratio Test and Why Should You Care?

The likelihood ratio test (LRT) is a general statistical procedure for comparing two competing hypotheses: 

  • a restricted (null) model and
  • a more general (alternative) model.

The comparison is based on their respective likelihood functions (Deeks & Altman, 2004; Habibzadeh & Habibzadeh, 2019). In its most basic formulation, one computes the ratio

r = L(θ₀) / L(θ̂),

where L(θ₀) is the maximum likelihood under the null hypothesis (with parameters restricted to satisfy H₀) and L(θ̂) is the overall maximum likelihood obtained by optimizing over all allowable parameter values (Deeks & Altman, 2004; Cheng et al., 2022). The rationale behind this procedure is that a considerably lower likelihood under the null than under the alternative implies that the data do not support the constraints imposed by H₀.
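
A useful bridge to the formula used later in this article: by Wilks' theorem, if H₀ is true, the quantity −2 × log(r) = 2 × (log L(θ̂) − log L(θ₀)) approximately follows a chi-squared distribution whose degrees of freedom equal the number of parameters restricted under H₀. This is why the test is carried out on the log scale and why the result is compared against a chi-squared benchmark.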

The Core Idea of the Likelihood-Ratio Test

Think of the likelihood ratio test as a statistical referee. You have two competing statistical models—a simple one and a more complex model—and you need to know which one is the champion. From my experience, students often assume that a model with more variables is automatically better. The LRT challenges that assumption by asking a critical question:

Does the extra complexity significantly improve the model's goodness of fit, or is the improvement just due to random chance?

It's a formal method for comparing two models and determining if the added complexity is genuinely worthwhile. It prevents you from building overly complicated models that are hard to interpret and may not perform well on new data. It helps you choose the best model that is both accurate and simple.

The "Nested" Rule: Why Your Models Must Be Related

The "Nested" Rule: Why Your Models Must Be Related

This is the most essential rule of the likelihood ratio test: the two models you compare must be nested.

What does that mean?

It means that one model (the simpler one) must be a special case, or a subset, of the other. All predictor variables in the simpler model must also be included in the more complex model. For example, you can't use the LRT to compare a model using Age and CreditScore to predict InsurancePremium with a completely different model that uses PolicyType and IncomeCategory. The test won't work. The second model must contain all the variables from the first one, plus at least one additional variable.

Model Type              | Predictors for InsurancePremium  | Is it Nested?
Simpler Model (Model 1) | Age, CreditScore                 | N/A (reference model)
Complex Model (Model 2) | Age, CreditScore, AccountBalance | Yes, Model 1 is nested in Model 2.
Different Model         | PolicyType, IncomeCategory       | No, not nested with Model 1.

When to Use the LRT: Common Scenarios

So, when do you use the nested model comparison power of the LRT? It’s most valuable when you need to make clear decisions during model building. From my work with researchers, the LRT is the go-to tool in several everyday situations. It provides a clear p-value to support your choices, which is essential for any thesis or research paper.

Here are a few classic scenarios:

  • Deciding if a variable is necessary: You have a core regression model and want to know if adding a new predictor variable (or covariate) makes a real difference. The LRT provides a straightforward answer.
  • Testing multiple variables at once: You can test whether a group of variables (e.g., three different demographic predictors) collectively improves the model fit.
  • Checking interaction terms: You have a hypothesis that two variables interact (e.g., the effect of Age on ClaimAmount depends on Gender). The LRT can tell you if that interaction term is a significant improvement to the model (see the sketch after this list).
  • Comparing different error distributions: In generalized linear models, you might want to know if one error distribution (like a Poisson distribution) fits the observed data better than another (like a negative binomial). The LRT can help you decide.
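
For instance, the interaction-term scenario might look like the following minimal sketch; it assumes the article's dataset also contains a Gender column, which is not shown elsewhere in this tutorial.

# Main-effects model vs. a model adding an Age x Gender interaction
main_effects <- lm(ClaimAmount ~ Age + Gender, data = Bank_and_insurance)
with_interaction <- lm(ClaimAmount ~ Age * Gender, data = Bank_and_insurance)
# A significant result supports keeping the interaction term
lrtest(main_effects, with_interaction)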

The Statistical Foundation

Understanding Maximum Likelihood and Log-Likelihood

Maximum Likelihood Estimation (MLE) is about finding the best explanation for the data you already have. Imagine you have a set of observed data and a model with a tunable knob, which we call a parameter. MLE is the process of turning that knob until the model makes your data look as probable as possible. It finds the parameter value that maximizes the likelihood function. 

However, working with likelihoods, which are often tiny probabilities multiplied together, can be tricky. That’s where log-likelihood comes in. By taking the natural logarithm, we convert multiplication into addition, making the mathematical calculations much simpler without changing the core result. It's the same concept, just in a more convenient package.
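
In R, you can inspect the maximized log-likelihood of any fitted model with the logLik() function. A minimal sketch, assuming the Bank_and_insurance data from this article is already loaded:

# Fit a simple model and extract its maximized log-likelihood
fit <- lm(InsurancePremium ~ Age, data = Bank_and_insurance)
logLik(fit)  # prints the log-likelihood and the number of estimated parameters (df)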

The LRT Statistic Formula Explained

The formula for the likelihood-ratio test may appear technical, but the underlying idea is straightforward. From my own experience, the key is to see it as a way to measure the "improvement" gained by a more complex model. The test statistic is calculated as:

LR = 2 × (logLik(full) − logLik(reduced))

Here, we take the log-likelihood of the full model (the alternative hypothesis) and subtract the log-likelihood of the simpler, reduced model (the null hypothesis). A larger difference indicates that the full model fits the data meaningfully better. We multiply by two so that, under the null hypothesis, the final statistic matches a known chi-squared distribution. If this LR value is significant, it suggests the improvement is real, making it unlikely the simpler model is sufficient and leading us to reject it.
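
To demystify what lrtest() computes, here is a minimal sketch that reproduces the LR statistic by hand. It assumes the model_1 and model_2 objects fitted in the introductory example are in your workspace.

# Compute the LR statistic manually from the two fitted models
LR <- as.numeric(2 * (logLik(model_2) - logLik(model_1)))
# Degrees of freedom: the difference in the number of estimated parameters
df_diff <- attr(logLik(model_2), "df") - attr(logLik(model_1), "df")
# p-value from the chi-squared distribution
pchisq(LR, df = df_diff, lower.tail = FALSE)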

The Chi-Squared Distribution and Degrees of Freedom

Once we have our LR statistic, how do we know if it's "large enough" to be meaningful? We compare it to the chi-squared distribution. Think of this distribution as a benchmark that tells us what to expect if the simpler model were accurate. The shape of this distribution depends on the degrees of freedom, which is simply the difference in the number of parameters between your two models. For example, if your complex model has five parameters and your simpler model has three, the degrees of freedom equal two. This comparison yields a p-value, which represents the probability of obtaining our result (or a more extreme one) if there were no real improvement.
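
To put numbers on that example, here is a quick illustration in R (the LR value of 7.3 below is purely hypothetical):

# The 5% chi-squared cutoff with 2 degrees of freedom
qchisq(0.95, df = 2)                     # about 5.99
# p-value for a hypothetical LR statistic of 7.3
pchisq(7.3, df = 2, lower.tail = FALSE)  # about 0.026, below 0.05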

How to Perform a Likelihood Ratio Test | R Code

Setting Up: The lmtest Package

To get started, you need the lmtest package in R. This package is a standard for hypothesis testing in regression models and contains the lrtest() function we'll use. If you haven't installed it yet, you can do so with a single command; installation is a one-time step. After that, loading the package with library() at the start of each session is all it takes before you can perform a likelihood ratio test whenever you need to compare models.


# Install the lmtest package if you don't have it
# install.packages("lmtest")

# Load the package into your R session
library(lmtest)

Example 1: Is an Extra Predictor Variable Worth It?

Let's test this with a real-world scenario using our insurance dataset. Suppose we have a linear model predicting a customer's InsurancePremium based on their Age and CreditScore. This is our simpler model (Model 1). Now, we wonder if adding AccountBalance as a predictor offers a significant improvement. This becomes our complex model (Model 2). We can use the lrtest() function to find out. A low p-value (e.g., < 0.05) would lead us to reject the null hypothesis and keep the AccountBalance variable.

# Load the data set (https://docs.google.com/spreadsheets/d/1TRuo3JtoSRHXVBdSMTht9Vf60-0VKtZj0u4yRenHd6g/edit?usp=sharing)
library(readr)
Bank_and_insurance <- read_csv("Bank and insurance.csv")
# Model 1: Simpler model
model_1 <- lm(InsurancePremium ~ Age + CreditScore, data = Bank_and_insurance)
# Model 2: More complex model with an extra variable
model_2 <- lm(InsurancePremium ~ Age + CreditScore + AccountBalance, data = Bank_and_insurance)
# Perform the likelihood ratio test
lrtest(model_1, model_2)
Fitting the simpler and complex models (Model 1 and Model 2) with lm() and comparing them with lrtest().

Example 2: Testing a Categorical Covariate

The likelihood ratio test is also suitable for determining whether a categorical variable is significant. Let's build on our previous analysis. We aim to predict a customer's Claim Amount. Our null hypothesis is that MaritalStatus has no effect, so we create a simpler model using only Age as a predictor. Our alternative hypothesis is that marital status does matter, so we make a second model that includes both Age and marital status. By comparing nested models this way, the LRT helps us determine if adding the categorical covariate results in a statistically better fit for the observed data.

# Model 1: Simpler model without the categorical variable
simple_model <- lm(ClaimAmount ~ Age, data = Bank_and_insurance)
# Model 2: Complex model including the categorical covariate
complex_model <- lm(ClaimAmount ~ Age + MaritalStatus, data = Bank_and_insurance)
# Use the lrtest to compare the goodness of fit of two models
lrtest(simple_model, complex_model)
Fitting the simpler model without the categorical variable and the complex model including the categorical covariate, then comparing their goodness of fit with lrtest().

Interpreting the Results: What Do the Numbers Mean?

The All-Important p-value: Your Decision Rule

After all the calculations, everything comes down to the p-value. This number is your guide for making a decision. The rule is simple: if the p-value is smaller than your chosen significance level (the most common choice is alpha = 0.05), you reject the null hypothesis. This means you have strong evidence that your complex model provides a significantly better fit. If the p-value is larger than your significance level, you fail to reject the null hypothesis. In this case, you stick with the simpler model because the extra complexity didn't provide enough of an improvement to be statistically meaningful.

Understanding the Output of lrtest()

When you run the lrtest() function in R, you'll get a small table of results. From my experience helping students, breaking this table down makes it much less intimidating. Each column gives you a key piece of information for your hypothesis test.

  • #Df: The number of estimated parameters (model degrees of freedom) for each model.
  • LogLik: The log-likelihood value for each model. For models fit to the same data, a higher log-likelihood means a better fit.
  • Df: The degrees of freedom for the test. This is the difference in the number of parameters between the two models.
  • Chisq: The chi-squared test statistic, calculated from the formula shown earlier.
  • Pr(>Chisq): This is your p-value. It's the most critical number for making your decision (see the sketch after this list for extracting it in code).
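
Since lrtest() returns its table as a data-frame-like object, you can also pull these numbers out programmatically; a minimal sketch using the model_1 and model_2 objects from earlier:

# Store the result and extract the key values
res <- lrtest(model_1, model_2)
res$Chisq[2]          # the chi-squared statistic
res$`Pr(>Chisq)`[2]   # the p-value for the comparison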

Reporting Your Findings in a Paper or Report

Reporting your results is crucial for any research paper or assignment. You need to state the test you used, the result, and your conclusion. Don't just say "the result was significant." Provide the numbers to support it.

A good template is: A likelihood ratio test was conducted to determine if adding [Variable(s)] significantly improved the model fit. The test was statistically significant, indicating that the more complex model was a better fit for the data, χ²([Df]) = [Chisq value], p < [threshold, e.g., .001].

This sentence conveys your method, results, and their implications.

Beyond the Basics: Advanced Considerations

LRT vs. Wald Test vs. AIC/BIC: Choosing the Right Tool

The LRT is not the only tool for model comparison, so it's essential to understand how it differs from others. The Wald test is another standard asymptotic test, but it can be less reliable with small sample sizes, where the LRT is often preferred. On the other hand, criteria such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are used for model selection rather than formal hypothesis testing.

They help you compare a whole set of models (even non-nested ones) but don't give you a p-value. Think of it this way: use the LRT when you have a specific hypothesis to test between two nested models. Use AIC/BIC when you want to explore and find the best model from a larger group of candidates.
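
As a quick, hedged illustration of the information-criterion route, using the model_1 and model_2 objects from earlier:

# AIC/BIC comparison: lower values favour a model; no p-value is produced
AIC(model_1, model_2)
BIC(model_1, model_2)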

Common Pitfalls and How to Avoid Them

While robust, the likelihood ratio test can be misused. I've seen a few common mistakes in student projects that are easy to avoid. The most significant error is trying to compare models that are not nested; the test results will be meaningless. Another issue is miscalculating the degrees of freedom—always remember it's the difference in the number of parameters. 

Finally, be cautious when your observed data is sparse, as the asymptotic assumptions of the test might not hold, which can lead to an inaccurate p-value. Always double-check that you are comparing nested models and that your sample size is adequate for the complexity of your models.

Conclusion

The Likelihood Ratio Test (LRT) helps you decide if a complex statistical model is genuinely better than a simpler version. The most important rule is that the models must be nested, meaning the simple model is just a subset of the complex one. The test works by comparing how well each model explains the data and calculates a p-value to determine if the improvement is statistically significant. In R, you can easily do this using the lrtest() function. The interpretation is straightforward: if the p-value is low (usually less than 0.05), the complex model is a significantly better fit. If the p-value is high, the extra complexity isn't justified, and you should stick with the simpler model. Ultimately, using the LRT helps you build powerful and efficient models, ensuring your conclusions are backed by strong statistical evidence without being needlessly complicated.

Struggling to compare your statistical models or make sense of your R output? The Likelihood Ratio Test is a powerful tool, but it's just one piece of the puzzle. At RStudioDatalab, our experts can guide you through every step of your data analysis, from model selection to interpreting complex results. Contact us today for personalized assistance with your research, thesis, or assignments, and ensure your conclusions are statistically sound.

Frequently Asked Questions (FAQs)

What is a likelihood ratio test in R?

A likelihood ratio test in R is a statistical method used to compare two "nested" models. Nested means one model is a simpler version of the other. The test helps you decide if the more complex model explains your data significantly better than the simpler one. For example, you can test if adding a new predictor variable to a regression model is actually worthwhile. R makes this easy to perform, often with just a single function call.

What is the `lrtest()` function in R?

The lrtest() function is the most common tool in R for performing a likelihood ratio test. It is part of the popular lmtest package. To use it, you simply provide two fitted model objects (e.g., from lm() or glm()), and the function calculates the test statistic and p-value for you. Its purpose is to make the process of comparing nested models quick and reliable. You would use it like this: lrtest(simple_model, complex_model).

What is the purpose of the likelihood ratio test in logistic regression and GLM?

In logistic regression and other Generalized Linear Models (GLMs), the purpose of the likelihood ratio test is to determine if adding one or more predictor variables significantly improves the model's fit. Since logistic regression doesn't use R-squared like linear regression, the LRT becomes a key tool for model building. It helps you decide, for instance, if a variable like 'Age' significantly helps predict an outcome like 'Customer Churn'. It provides a p-value to give you statistical evidence for keeping or removing variables.
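
A hypothetical sketch of that workflow; the binary Churn outcome below is invented for illustration and is not part of this article's dataset.

# Logistic regression: does CreditScore add to a model with Age?
glm_simple  <- glm(Churn ~ Age, data = Bank_and_insurance, family = binomial)
glm_complex <- glm(Churn ~ Age + CreditScore, data = Bank_and_insurance, family = binomial)
lrtest(glm_simple, glm_complex)  # a low p-value supports keeping CreditScore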

What does the likelihood ratio test actually test?

The likelihood ratio test tests a specific null hypothesis: that the simpler model is just as good as the more complex model. The alternative hypothesis is that the complex model is significantly better. Essentially, it tests if the extra parameters (or variables) in the complex model have a real effect. If the p-value is low, you reject the null hypothesis and conclude the complex model is superior. If the p-value is high, you stick with the simpler model.

How do you interpret likelihood ratio test results?

Interpretation focuses on the p-value. A small p-value (typically less than 0.05) means the improvement from the complex model is statistically significant. You would then choose the complex model. A large p-value (greater than 0.05) means the improvement is likely due to chance, so you should choose the simpler, more parsimonious model. While a likelihood ratio value itself (e.g., 1.0 or 2.0) is calculated, you don't interpret it directly; you interpret the final p-value that comes from comparing the test statistic to a chi-squared distribution.

What is the difference between ANOVA and a likelihood ratio test?

Both tests compare models, but they are used in different contexts. ANOVA is typically used for linear models (like those from lm()) and works by comparing the variance (sums of squares) between models. The likelihood ratio test is more general and can be used for a wider variety of models, especially Generalized Linear Models (like logistic regression from glm()). The LRT works by comparing the log-likelihoods of the models. For linear models with normal errors, the two tests often give the same conclusion.
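
You can see both approaches side by side on the article's linear models; a minimal sketch assuming model_1 and model_2 from the main example:

# F-test based on sums of squares (linear models)
anova(model_1, model_2)
# Likelihood ratio test based on log-likelihoods
lrtest(model_1, model_2)
# For glm fits, request the LRT explicitly: anova(fit1, fit2, test = "LRT")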

What is the difference between a chi-square test and a likelihood ratio test?

This can be confusing! A chi-square test often refers to Pearson's chi-square test, which is used to check for independence between two categorical variables in a contingency table. The likelihood ratio test is a different tool used to compare how well two nested statistical models fit the data. The confusion arises because the likelihood ratio test uses the chi-squared distribution to find its p-value. So, they are different tests, but the LRT relies on the chi-squared distribution for its final step.

What is the formula for the likelihood ratio test?

The formula calculates a test statistic, often called the G-statistic or LR statistic. It is:
LR = 2 * (logLik(Complex Model) - logLik(Simple Model))
Where logLik is the log-likelihood of each model. This statistic measures the improvement in fit. A larger LR value suggests a bigger improvement, which is more likely to be significant. This LR value is then compared to a chi-squared distribution to get the final p-value.

What is the likelihood function in R?

The likelihood function is a statistical concept, not a single function in R. It asks, "Given my observed data, what is the probability of seeing this data for different values of a model parameter?" R calculates this behind the scenes when fitting models with functions like lm() or glm(). You can extract the log-likelihood (a more convenient, logged version of this value) from a fitted model using the logLik() function. This value is the core ingredient for the likelihood ratio test.

What is the `dplyr` function in R?

The term dplyr refers to a very popular R package, not a single function. It is the go-to tool for data manipulation and cleaning. It provides a set of easy-to-understand "verbs" or functions like filter() to select rows, select() to choose columns, mutate() to create new columns, and summarise() to calculate summary statistics. It is not used for statistical testing like the likelihood ratio test, but it is essential for preparing your data *before* you build models.
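
A minimal, hedged dplyr sketch using columns from this article's dataset (the HighCredit flag is a made-up derived column):

library(dplyr)
Bank_and_insurance %>%
  filter(Age >= 18) %>%                        # keep adult customers
  mutate(HighCredit = CreditScore > 700) %>%   # hypothetical derived flag
  group_by(HighCredit) %>%
  summarise(mean_premium = mean(InsurancePremium, na.rm = TRUE))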

References 

  • Akobeng, A. (2007). Understanding diagnostic tests 2: likelihood ratios, pre‐ and post‐test probabilities and their use in clinical practice. Acta Paediatrica, 96(4), 487-491. https://doi.org/10.1111/j.1651-2227.2006.00179.x 
  • Cheng, Y., Wang, H., & Li, X. (2022). The geometry of generalized likelihood ratio test. Entropy, 24(12), 1785. https://doi.org/10.3390/e24121785 
  • Deeks, J., & Altman, D. (2004). Diagnostic tests 4: likelihood ratios. BMJ, 329(7458), 168-169. https://doi.org/10.1136/bmj.329.7458.168 
  • Habibzadeh, F., & Habibzadeh, P. (2019). The likelihood ratio and its graphical representation. Biochemia Medica, 29(2), 193-199. https://doi.org/10.11613/bm.2019.020101 
  • Huber, F. (2008). Milne's argument for the log-ratio measure. Philosophy of Science, 75(4), 413-420. https://doi.org/10.1086/595838

Transform your raw data into actionable insights. Let my expertise in R and advanced data analysis techniques unlock the power of your information. Get a personalized consultation and see how I can streamline your projects, saving you time and driving better decision-making. Contact me today at contact@rstudiodatalab.com or visit our website to schedule your discovery call.
