Hello, I am Zubair Goraya, a Ph.D. scholar, a certified data analyst, and a freelancer with 5 years of experience. I will explain how to perform and report parametric tests with R, using examples of different parametric tests. But before we start, let me ask you a few questions:
How do you determine if the data you are analyzing, whether parametric or nonparametric, is reliable and valid?
How do you conduct hypothesis testing and conclude the statistical data analysis?
How do you communicate your analysis findings, derived from comparing two data sets using parametric or nonparametric tests, and make recommendations to your audience?
Data analysts confront these questions daily and must use appropriate statistical tools and techniques to answer them. One of data analysts' most common and powerful tools is parametric tests.
Key Points
- Parametric tests, such as t-tests and ANOVA, rely heavily on assumptions, including normality, homogeneity of variance, independence of observations, and random sampling. Ensuring these conditions are met is crucial for valid results.
- Parametric tests are designed for interval and ratio data, where measurement scales are precise and the intervals between values are meaningful, facilitating detailed analysis. Nonparametric tests, which do not assume a specific distribution, are used when these criteria are not met.
- When the necessary assumptions, such as normality and homoscedasticity, are met, parametric tests generally provide higher statistical power than nonparametric tests. This means they are more likely to detect genuine effects when they exist, enhancing the credibility of the results.
- Parametric tests, especially those involving two or more variables (such as regression analysis), often require larger sample sizes. A larger sample size contributes significantly to the robustness and precision of statistical inferences, whether parametric or nonparametric.
- Parametric tests are widely employed in experimental research, while nonparametric tests find their niche in studies where the parametric assumptions do not match the data's nature. Both parametric and nonparametric statistics are pivotal in fields such as psychology, biology, and medicine.
What is a Parametric Test?
A parametric test is a statistical test that assumes the data follows a specific distribution, usually the normal distribution, described by parameters such as the mean and variance. Parametric tests rest on four common assumptions:
- Normality,
- Homogeneity of variance,
- Independence,
- Random sampling
Difference Between Parametric and Non-Parametric Tests?
Parametric tests assume that the data follows a specific distribution, typically normal, and compare parameters such as means and variances. Non-parametric tests make few or no distributional assumptions and are often based on ranks, which makes them suitable for ordinal or skewed data.
What are the Assumptions of Parametric Tests?
Assumptions are the conditions the data must meet for a parametric test to be applicable and reliable. If the assumptions are violated, the parametric test may produce inaccurate or misleading results, such as false positives, false negatives, or incorrect estimates. There are four common assumptions of parametric tests.
Normality
The data follows a normal distribution, or a bell-shaped curve, where most values are clustered around the mean and the tails are symmetrical and thin.
Homogeneity of variance
The data has equal or similar variances, or spreads, across different groups or levels of the independent variable. It means the data has consistent variability and does not have outliers or extreme values.
Independence
The observations are independent of each other, meaning no observation is influenced by another. It means the data is collected without hidden or confounding factors linking the measurements.
Random sampling
The data is randomly sampled, or selected by chance, from the population of interest. It means the data is representative and has no bias or selection error.
These assumptions are important because they ensure that the parametric test is appropriate and valid for the data and that the results are generalizable and meaningful. Therefore, it is essential to check these assumptions before performing any parametric test and to deal with violations appropriately.
What happens if assumptions are violated?
If the assumptions are violated, the p-values, confidence intervals, and estimates produced by a parametric test may be inaccurate or misleading, leading to false positives or false negatives. In that case, you can transform the data so that it better satisfies the assumptions or switch to a nonparametric alternative, as discussed later in this article.
Validating Parametric Test Assumptions in R
There are two methods to check the assumptions of parametric tests in R, which are:
- Graphical methods
- Numerical methods
Graphical methods
These methods use plots or graphs to visually inspect the data and look for patterns or deviations that may indicate violations of assumptions. Some examples of graphical methods are:
- Histograms,
- QQ-plots,
- Boxplots,
- Residual plots.
Numerical methods
These methods use statistics or formal tests to measure the data numerically and look for values or results that may indicate violations of assumptions. Some examples of numerical methods (a code sketch follows this list):
- Mean,
- Standard deviation,
- Skewness and kurtosis,
- Shapiro-Wilk test,
- Levene’s test,
- Durbin-Watson test.
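As a minimal sketch of these numerical checks, here is how they could be computed in R. This assumes the `moments` package for skewness and kurtosis and the `car` package for Levene's and Durbin-Watson tests; `x` and `grp` are made-up example data, not the article's data set:

```r
# A minimal sketch of the numerical checks listed above.
# Assumes the `moments` and `car` packages are installed.
library(moments)  # skewness(), kurtosis()
library(car)      # leveneTest(), durbinWatsonTest()

set.seed(1)
x   <- rnorm(50, mean = 180, sd = 10)       # example measurements
grp <- factor(rep(c("A", "B"), each = 25))  # example grouping

mean(x); sd(x)                 # location and spread
skewness(x); kurtosis(x)       # ~0 and ~3, respectively, for normal data
shapiro.test(x)                # H0: the data are normally distributed
leveneTest(x ~ grp)            # H0: group variances are equal
durbinWatsonTest(lm(x ~ grp))  # H0: residuals are uncorrelated
```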
Both graphical and numerical methods have advantages and disadvantages and may complement or contradict each other. Therefore, it is recommended to use both approaches to check the assumptions of parametric tests and to apply your judgment and common sense when deciding whether the assumptions are met.
Check Parametric Test Assumptions in R
Load the data set and the packages. The data set is a vector containing the heights, in centimeters, of 50 students.

```r
# Load the packages
library(ggplot2)
library(dplyr)
library(car)

# Load the data set: heights (cm) of 50 students
heights <- c(168, 172, 171, 169, 173, 174, 175, 176, 177, 178,
             179, 180, 181, 182, 183, 184, 185, 186, 187, 188,
             189, 190, 191, 192, 193, 194, 195, 196, 197, 198,
             199, 200, 201, 202, 203, 204, 205, 206, 207, 208,
             209, 210, 211, 212, 213, 214, 215, 216, 217, 218)
```
Normality Assumption
The normality assumption states that the data follows a normal distribution. We will use a histogram and a QQ plot to visually inspect the data and a Shapiro-Wilk test to measure the data numerically.
```r
# Check the normality assumption
# Plot a histogram of the data
ggplot(data = data.frame(heights), aes(x = heights)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Heights", x = "Height (cm)", y = "Frequency")
```

```r
# Plot a QQ-plot of the data
ggplot(data = data.frame(heights), aes(sample = heights)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "QQ-plot of Heights", x = "Theoretical Quantiles", y = "Sample Quantiles")
```

```r
# Perform a Shapiro-Wilk test on the data
shapiro.test(heights)
```
The histogram shows the data is roughly symmetrical and bell-shaped, with no obvious outliers or skewness. The QQ plot shows that the data points are mostly aligned with the diagonal line, with no obvious deviations or patterns.
The Shapiro-Wilk test shows that the p-value is 0.8793, greater than 0.05, the common significance level. It means that we fail to reject the null hypothesis that the data is normally distributed. Therefore, based on this analysis, we can conclude that the normality assumption is met for the data.
Homogeneity of Variance
The homogeneity of variance assumption states that the data has equal or similar variances across different groups or levels of the independent variable. Since we only have one group or sample in this example, this assumption is not applicable, and we can skip it.
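For illustration only, here is a hypothetical sketch of how the check would look if we had heights from two classes. The data and grouping below are our assumptions, not part of the article's data set:

```r
# Hypothetical example: Levene's test across two made-up classes
library(car)  # provides leveneTest()

set.seed(2)
class_a <- rnorm(25, mean = 175, sd = 8)
class_b <- rnorm(25, mean = 178, sd = 9)
height  <- c(class_a, class_b)
grp     <- factor(rep(c("A", "B"), each = 25))

leveneTest(height ~ grp)  # H0: the two classes have equal variances
```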
Independence assumption
Next, we will check the independence assumption, which states that the data is independent or not influenced by each other across different observations or samples. This assumption is usually verified by the study's design or the data collection process rather than by data analysis. Therefore, we need to rely on the information about the data set and use our common sense and judgment to decide whether this assumption is met.
In this example, we are told that the data set contains the heights of 50 students from a class. We can assume that the heights of the students are not influenced by each other and that they are randomly collected from the class. Therefore, the independence assumption is met for the data.
Random sampling assumption
The random sampling assumption states that the data is randomly sampled, or selected by chance, from the population of interest. This assumption is also verified by the study's design or the data collection process rather than by data analysis. Therefore, we need to rely on the information about the data set and use our common sense and judgment to decide whether this assumption is met.
In this example, we are told that the data set contains the heights of 50 students from a class. If the class can be regarded as representative of the population of interest, the heights can be treated as a random sample from that population. Therefore, the random sampling assumption is met for the data.
How to Report Parametric Tests in R?
After we have performed and interpreted the parametric tests in R, we need to report them clearly and professionally, using APA style and best practices. Reporting parametric tests in R involves two main steps (a code sketch of the first step follows this list):
- Writing the results section, which summarizes the main findings and statistics of the parametric tests using text and numbers.
- Creating and formatting the tables and figures, which display the data and the results of the parametric tests, using visuals and labels.
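As a minimal sketch of the first step, here is one way to pull reportable numbers out of a test result. The one-sample t-test against mu = 175 is our own illustrative choice, not part of the article's analysis:

```r
# A minimal sketch: extracting APA-style numbers from a t-test result.
# The test value mu = 175 is an assumption chosen for illustration.
res <- t.test(heights, mu = 175)

# Assemble an APA-style summary string, e.g., "t(df) = ..., p = ..."
sprintf("t(%.0f) = %.2f, p = %.3f",
        res$parameter,  # degrees of freedom
        res$statistic,  # t statistic
        res$p.value)    # p-value
```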
Common Mistakes in Parametric Testing
Violation of Assumptions
One common mistake involves disregarding the assumptions of parametric tests. The results may be unreliable if the data fails to meet these assumptions. To assess normality, it is vital to employ graphical methods, such as histograms or normal probability plots.
In cases where non-normality is observed, alternative approaches like transformations or non-parametric tests should be considered.
Improper Sample Size
Small sample sizes may result in underpowered tests, diminishing the ability to detect significant differences. Power analysis can help determine the appropriate sample size based on the expected effect size and significance level.
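For example, here is a hedged sketch using the `pwr` package (our choice of tool; install it first if needed):

```r
# A minimal sketch of a power analysis with the `pwr` package (an assumption)
library(pwr)

# Required sample size per group for a two-sample t-test,
# assuming a medium effect size (d = 0.5), alpha = 0.05, and 80% power
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "two.sample")
```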
Misinterpretation of Results
Misinterpreting the results is a common error in parametric testing. It is crucial to comprehend the output generated by R and accurately interpret its implications. Seeking guidance from statisticians or consulting reputable sources can help mitigate misinterpretation.
Best Techniques and Practices
Data transformation
If the assumption of normality is violated, data transformation techniques may be employed to bring the data distribution closer to normal. Frequently used transformations include logarithmic, square-root, and Box-Cox transformations. These transformations can bring the data into line with the required assumptions, allowing parametric tests to be used.
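A minimal sketch, using simulated right-skewed data rather than the heights data set, of how a log transformation can help:

```r
# Simulated right-skewed data; log() pulls it back toward normality
set.seed(123)
x <- rlnorm(50, meanlog = 5, sdlog = 0.5)  # log-normal, right-skewed

shapiro.test(x)       # typically rejects normality for the raw data
shapiro.test(log(x))  # the log-transformed data are normal by construction
```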
Non-Parametric Tests
When the assumptions of parametric tests cannot be fulfilled, non-parametric tests provide a reliable alternative. The Kruskal-Wallis and Wilcoxon rank-sum tests are examples of non-parametric tests that do not rely on specific distributional assumptions. They are especially handy when dealing with ordinal or skewed data.
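A minimal sketch of both tests, run on made-up skewed data:

```r
# Made-up skewed data in two groups
set.seed(42)
score <- c(rexp(20, rate = 1), rexp(20, rate = 0.5))
grp   <- factor(rep(c("A", "B"), each = 20))

wilcox.test(score ~ grp)   # Wilcoxon rank-sum (Mann-Whitney U) test
kruskal.test(score ~ grp)  # Kruskal-Wallis test (works for 2+ groups)
```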
Increase the Sample Size
The statistical power of parametric tests increases as the sample size grows. A larger sample reduces the standard error and increases the likelihood that genuine differences, when they exist, will be detected. Depending on the specific research question, a statistician's guidance may be needed to choose an appropriate sample size.
Conclusion
In this article, we have learned how to perform and report parametric tests in R, using worked examples. We have also learned how to check the assumptions of parametric tests and how to use packages and functions that can help with the analysis and the reporting process.
Parametric tests are powerful and widely used statistical tools that can help us test hypotheses and draw conclusions about data. However, they also require conditions to be met, such as normality, homogeneity of variance, independence, and random sampling. Therefore, we must be careful and rigorous when performing and reporting parametric tests in R and follow APA style and best practices.
We hope this article has been informative and helpful and that you have gained some insights and skills on performing and reporting parametric tests in R. If you have any questions or feedback, please contact us. Thank you for reading.
Frequently Asked Questions (FAQs)
What are the parametric tests in R?
Common parametric tests in R include t-tests (e.g., `t.test()`), ANOVA (e.g., `aov()`), and linear regression (e.g., `lm()`).
Which statistical test should I use in R?
The choice depends on your data and research question. For comparing means, use t-tests or ANOVA; for relationships, Pearson correlation (e.g., `cor.test()`) or linear regression.
How do you know if data is parametric or non-parametric in R?
Graphical methods (histograms, QQ-plots) and statistical tests (Shapiro-Wilk, Kolmogorov-Smirnov) help assess normality. Non-parametric tests may be suitable if normality assumptions are violated.
What is a Non-parametric test?
Non-parametric tests are used when parametric assumptions cannot be met, allowing for robust analysis without specific distributional requirements.
What are the non-parametric tests in R studio?
Non-parametric tests in R include the Mann-Whitney U test (`wilcox.test()`), the Kruskal-Wallis test (`kruskal.test()`), and Spearman correlation (`cor.test(..., method = "spearman")`).
What is an ANOVA test in R?
ANOVA (Analysis of Variance) in R is performed with `aov()` to compare means of more than two groups, testing whether significant differences exist.
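For instance, a minimal sketch using the built-in `PlantGrowth` data set (our choice of example):

```r
# One-way ANOVA: does mean plant weight differ across the three groups?
fit <- aov(weight ~ group, data = PlantGrowth)  # PlantGrowth ships with R
summary(fit)  # F statistic and p-value for the group effect
```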
What is the Pearson chi-test in R?
The Pearson chi-squared test (`chisq.test()`) in R assesses independence between categorical variables in a contingency table.
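A minimal sketch on a hypothetical 2x2 contingency table (the counts are made up for illustration):

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(Group   = c("A", "B"),
                              Outcome = c("Yes", "No")))
chisq.test(tab)  # H0: Group and Outcome are independent
```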
Is Chi-Square a parametric test?
The Chi-Square test is non-parametric as it makes no assumptions about the data distribution.
What are the 4 non-parametric tests?
Common non-parametric tests include Mann-Whitney U test, Kruskal-Wallis test, Wilcoxon signed-rank test, and Spearman correlation.
When not to use a parametric test?
Avoid parametric tests when assumptions (normality, homogeneity of variance) are violated or for non-continuous data.
How do I run a non-parametric test in R?
Use functions like `wilcox.test()` or `kruskal.test()` for non-parametric tests in R, depending on your study design.
What is the difference between parametric and non-parametric tests in R?
Parametric tests assume specific data distributions. In contrast, non-parametric tests make fewer distributional assumptions, offering robustness in various scenarios.
Is Pearson's R a parametric test?
Yes, Pearson's correlation (R) is a parametric test, assuming a normal distribution of variables.
Is the Mann-Whitney U test a parametric test?
No, the Mann-Whitney U test is non-parametric, suitable for ordinal or continuous data that doesn't meet parametric assumptions.
Should I use Spearman or Pearson?
Use Pearson for linear relationships with continuous data; use Spearman for monotonic relationships or when assumptions of linearity are not met.
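A minimal sketch contrasting the two on simulated data with a monotonic but non-linear relationship:

```r
# Monotonic but non-linear relationship between x and y
set.seed(1)
x <- rnorm(30)
y <- x^3 + rnorm(30, sd = 0.2)

cor.test(x, y, method = "pearson")   # assumes a linear relationship
cor.test(x, y, method = "spearman")  # rank-based, captures monotonic trends
```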
Parametric tests with R examples
Parametric tests compare means (or model relationships) between groups under the assumption of normality. Parametric test examples in R include t-tests (`t.test()`), ANOVA (`aov()`), and linear regression (`lm()`).
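Minimal sketches of a t-test and a linear regression, using data sets that ship with R (our choice of examples; an `aov()` example appears under the ANOVA question above):

```r
# Two-sample t-test on the built-in `sleep` data set
t.test(extra ~ group, data = sleep)

# Simple linear regression on the built-in `mtcars` data set
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)  # slope, intercept, and significance tests
```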
Characteristics of parametric tests
Parametric tests assume specific data distributions, continuous variables, and adherence to assumptions like normality and homogeneity of variance.
Conditions for parametric tests
Conditions for parametric tests include normality, homogeneity of variance, independence, and continuous data.
References
- Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. Sage Publications.
- R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.
- Dahl, D. B. (2016). xtable: Export Tables to LaTeX or HTML. R package version 1.8-2. https://CRAN.R-project.org/package=xtable
- Hlavac, M. (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
- Fox, J., & Weisberg, S. (2019). An R Companion to Applied Regression. Sage Publications.
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Routledge.