The cor function in R computes correlation coefficients between numeric variables — either as a single value between two vectors or as a full correlation matrix across a data frame. Paired with cor.test(), it also delivers the p-value and confidence interval you need to report statistically sound results. This guide covers every practical use case: basic syntax, all three methods (Pearson, Spearman, Kendall), missing-data handling, significance testing, and visualization — with copy-ready code for each.
| Task | Code | Output |
| Correlation between two vectors | cor(mtcars$mpg, mtcars$hp) | Single coefficient, −1 to 1 |
| Full correlation matrix | cor(mtcars) | n × n matrix |
| Correlation test with p-value | cor.test(mtcars$mpg, mtcars$hp) | t, df, p-value, 95% CI |
| Spearman correlation | cor(x, y, method = "spearman") | Rank-based coefficient |
| Kendall correlation | cor(x, y, method = "kendall") | Concordance-based coefficient |
| Handle missing values (listwise) | cor(df, use = "complete.obs") | Matrix, complete rows only |
| Handle missing values (pairwise) | cor(df, use = "pairwise.complete.obs") | Matrix, all available pairs used |
| Correlation matrix with p-values | rcorr(as.matrix(df)) (Hmisc) | Matrix + significance levels |
Key Points
- The cor function in R returns a value between −1 and 1. A result near +1 means a strong positive relationship; near −1 means a strong negative relationship; near 0 means no linear relationship.
- Use cor.test() — not just cor() — whenever you need to report whether the correlation is statistically significant. It gives you a t-statistic, degrees of freedom, p-value, and a 95% confidence interval.
- Choose the right method: Pearson for continuous, normally distributed data; Spearman for ranked or non-normal data; Kendall's tau for small samples or data with many tied ranks.
- Always handle missing values explicitly. Leaving the use parameter at its default ("everything") returns NA for any correlation that involves a variable with missing values. Use "complete.obs" or "pairwise.complete.obs" instead.
- Visualize with corrplot or ggplot2 to spot patterns across many variables simultaneously. A correlation heatmap communicates structure that a raw matrix cannot.
- Correlation is not causation. A coefficient of −0.78 between mpg and hp (as in mtcars) tells you they move together — it does not tell you that horsepower causes poor fuel economy.
What Is the cor Function in R?
The cor function in R is a base-R function that computes the correlation coefficient — a standardized measure of the linear relationship between two or more numeric variables. It accepts either two separate vectors or a full data frame, and returns either a single number or a square correlation matrix.
| Aspect | Detail |
| Function name | cor() |
| Package | Base R — no installation needed |
| Output range | −1 to +1 |
| Default method | Pearson |
| Companion function | cor.test() — adds p-value and CI |
cor() Syntax
cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))
- x: a numeric vector, matrix, or data frame.
- y: NULL (the default) or a second numeric vector, matrix, or data frame. Omit it when x is a matrix or data frame to get the full correlation matrix.
- use: how to handle missing values. Options: "everything" (default), "all.obs", "complete.obs", "na.or.complete", and "pairwise.complete.obs".
- method: the correlation algorithm: "pearson" (default), "kendall", or "spearman".
How to Interpret the Correlation Coefficient
| Value range | Meaning |
| 0.9 to 1.0 (or −0.9 to −1.0) | Very strong correlation |
| 0.7 to 0.9 (or −0.7 to −0.9) | Strong correlation |
| 0.5 to 0.7 (or −0.5 to −0.7) | Moderate correlation |
| 0.3 to 0.5 (or −0.3 to −0.5) | Weak correlation |
| 0.0 to 0.3 (or 0.0 to −0.3) | Negligible or no correlation |
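If you need to label many coefficients at once, the cut-offs in the table can be encoded in a small helper. The name interpret_r() is hypothetical, and the band boundaries simply mirror this rule-of-thumb table (a value exactly on a boundary falls into the lower band). Other fields use different cut-offs, so treat this as a sketch rather than a standard.

```r
# Hypothetical helper: map correlation coefficients to the verbal labels
# in the interpretation table. The sign is ignored; only magnitude is classified.
interpret_r <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.3, 0.5, 0.7, 0.9, 1),
      labels = c("Negligible", "Weak", "Moderate", "Strong", "Very strong"),
      include.lowest = TRUE)  # so that r = 0 is labelled "Negligible"
}

interpret_r(c(-0.776, 0.12, 0.95))
```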
Basic Usage of the cor Function in R
Calculating Correlation Between Two Variables
Pass two numeric vectors to cor() to get a single correlation coefficient. The example below uses the built-in mtcars dataset to check the relationship between miles-per-gallon (mpg) and horsepower (hp).
data(mtcars)
cor(mtcars$mpg, mtcars$hp)
The result is −0.776 — a strong negative correlation. As horsepower increases, fuel efficiency decreases. The value is between −1 and 1, where the sign gives direction and the magnitude gives strength.
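To see what cor() computes by default, the Pearson coefficient can be reproduced from its definition: the sum of cross-products of deviations, scaled by the spread of each variable, which is the same as the covariance divided by the product of the two standard deviations. This is only a check of the formula, not a replacement for cor().

```r
x <- mtcars$mpg
y <- mtcars$hp

# Pearson r from the definition: cross-products of deviations from the mean,
# divided by the square root of the product of the sums of squares
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Equivalent form using base R's covariance and standard deviations
r_cov <- cov(x, y) / (sd(x) * sd(y))

c(manual = r_manual, via_cov = r_cov, cor = cor(x, y))
```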
Generating a Correlation Matrix
Pass a full data frame to cor() to produce a correlation matrix — a symmetric table showing the coefficient for every pair of numeric variables at once.
library(dplyr)
mtcars %>% select(where(is.numeric)) %>%
cor()
The diagonal always shows 1.000 — every variable is perfectly correlated with itself. Off-diagonal values are the pairwise coefficients you analyze. This matrix is the fastest way to scan for strong relationships across many variables simultaneously.
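With many variables, scanning the matrix by eye gets tedious. One way to pull out the strongest relationships is to keep only the upper triangle (so each pair appears once) and sort by absolute value. The 0.8 threshold below is an arbitrary choice for illustration.

```r
corr_matrix <- cor(mtcars)

# Keep each pair once: upper triangle, excluding the diagonal of 1s
idx <- which(upper.tri(corr_matrix), arr.ind = TRUE)

pairs <- data.frame(
  var1 = rownames(corr_matrix)[idx[, "row"]],
  var2 = colnames(corr_matrix)[idx[, "col"]],
  r    = corr_matrix[idx]
)

# Sort by strength (absolute value) and keep |r| >= 0.8 (arbitrary cut-off)
pairs <- pairs[order(-abs(pairs$r)), ]
strong_pairs <- pairs[abs(pairs$r) >= 0.8, ]
strong_pairs
```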
Correlation Test in R: Using cor.test()
cor() gives you the coefficient. cor.test() gives you the full statistical picture: the coefficient, a t-statistic, degrees of freedom, p-value, and a 95% confidence interval. Use it whenever you need to report whether a correlation is statistically significant.
cor.test() Syntax
cor.test(x, y, method = c("pearson", "spearman", "kendall"),
alternative = c("two.sided", "less", "greater"),
conf.level = 0.95)
Running a Correlation Test in R — Example
cor.test(mtcars$mpg, mtcars$hp)
How to Read the cor.test() Output — Line by Line
| Output line | What it means |
| t = -6.7424 | The test statistic. Larger absolute values mean stronger evidence against zero correlation. |
| df = 30 | Degrees of freedom = n − 2. Here n = 32 cars, so df = 30. |
| p-value = 1.788e-07 | Probability of observing a correlation at least this extreme if the true correlation were zero. Below 0.05 = significant at the conventional threshold. |
| 95 percent confidence interval: -0.8852686 -0.5860994 | The interval does not cross zero, so the negative correlation is reliably non-zero. |
| cor = -0.7761684 | The Pearson correlation coefficient. Same as cor(mtcars$mpg, mtcars$hp). |
The p-value of 1.79e−07 is far below 0.05, so the correlation is highly significant. The confidence interval (−0.885 to −0.586) does not cross zero, which indicates that the negative relationship is unlikely to be a sampling artefact.
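Because cor.test() returns an object of class htest, every number in the printed output can be pulled out by name, which is handy when you build tables or reports. The sketch below also recomputes the t-statistic from t = r·√(n − 2) / √(1 − r²) as a sanity check on where that number comes from.

```r
ct <- cor.test(mtcars$mpg, mtcars$hp)

r      <- unname(ct$estimate)   # correlation coefficient
t_stat <- unname(ct$statistic)  # t-statistic
dof    <- unname(ct$parameter)  # degrees of freedom (n - 2)
p      <- ct$p.value            # two-sided p-value
ci     <- ct$conf.int           # 95% confidence interval (Fisher z-based)

# Recompute t from r and n to see where the statistic comes from
n <- length(mtcars$mpg)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

c(r = r, t = t_stat, t_manual = t_manual, df = dof, p = p,
  lower = ci[1], upper = ci[2])
```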
Interpreting p-values and Confidence Intervals
| p-value | Interpretation |
| p < 0.001 | Very strong evidence of a real correlation |
| p < 0.05 | Statistically significant at the standard threshold |
| p < 0.10 | Marginally significant — interpret cautiously |
| p ≥ 0.10 | No statistically significant correlation detected |
Choosing the Right Correlation Method in R
The cor function in R supports three methods. Picking the wrong one can give you a coefficient that misrepresents the actual relationship in your data.
| Situation | Method | Code |
| Continuous data, normally distributed, no major outliers | Pearson (default) | cor(x, y) |
| Ranked / ordinal data, non-normal, or outliers present | Spearman | cor(x, y, method = "spearman") |
| Small sample size or many tied ranks | Kendall's tau | cor(x, y, method = "kendall") |
Pearson, Spearman, and Kendall — Side-by-Side
# Pearson (default) — linear relationship
cor(mtcars$mpg, mtcars$hp)
# Spearman — rank-based, robust to non-normality
cor(mtcars$mpg, mtcars$hp, method = "spearman")
# Kendall — concordance of ranks, best for small n or ties
cor(mtcars$mpg, mtcars$hp, method = "kendall")
The three methods generally return different values for the same data. Pearson measures the linear relationship; Spearman measures the monotonic relationship using ranks; Kendall's tau compares the numbers of concordant and discordant pairs of observations. The relationship between mpg and hp is strong enough that all three point in the same direction here, but that will not always be the case in your data.
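The description of Spearman as a rank-based version of Pearson can be checked directly: rank both variables with rank() (tied values receive their average rank) and run an ordinary Pearson correlation on those ranks.

```r
x <- mtcars$mpg
y <- mtcars$hp

rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y))   # Pearson correlation on average ranks

c(builtin = rho_builtin, from_ranks = rho_ranks)
```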
Running a Spearman or Kendall Correlation Test in R
# Spearman correlation test in R
cor.test(mtcars$mpg, mtcars$hp, method = "spearman")
# Kendall correlation test in R
cor.test(mtcars$mpg, mtcars$hp, method = "kendall")
Note: with method = "spearman" or "kendall", base R's cor.test() does not report a confidence interval; the output simply omits that line. To get confidence intervals for Spearman, use the DescTools package with SpearmanRho(x, y, conf.level = 0.95).
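If you prefer not to add a dependency, a percentile bootstrap is one common base-R way to put an interval around Spearman's rho. This is a swapped-in technique, not necessarily what DescTools does internally, and the 2,000 resamples and the seed are arbitrary choices. The sketch also passes exact = FALSE, which requests the large-sample p-value directly; mpg and hp both contain tied values, and without it R warns that it cannot compute an exact p-value with ties.

```r
x <- mtcars$mpg
y <- mtcars$hp

# exact = FALSE: use the asymptotic p-value (avoids the ties warning)
spearman_test <- cor.test(x, y, method = "spearman", exact = FALSE)

# Percentile bootstrap CI for Spearman's rho (assumption: 2000 resamples)
set.seed(123)
n <- length(x)
boot_rho <- replicate(2000, {
  i <- sample.int(n, n, replace = TRUE)   # resample cars with replacement
  cor(x[i], y[i], method = "spearman")
})
ci <- quantile(boot_rho, c(0.025, 0.975))

spearman_test$estimate
ci
```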
Handling Missing Data in cor() and cor.test()
The default use = "everything" returns NA for any pair that includes a missing value. This silently breaks your correlation matrix. Always set the use parameter explicitly.
| use parameter | Behaviour | When to use it |
| "everything" | Returns NA for any pair involving a missing value | Only when you are certain the data are complete |
| "complete.obs" | Listwise deletion: drops an entire row if any value is NA | When missing data are rare and random |
| "pairwise.complete.obs" | Uses all available pairs for each correlation separately | When missing data are common; preserves more observations |
# Listwise deletion — only rows with complete data across ALL variables
cor(mtcars, use = "complete.obs")
# Pairwise deletion — maximum data used per pair
cor(mtcars, use = "pairwise.complete.obs")
For most research datasets, "pairwise.complete.obs" keeps more data because it does not discard entire rows for unrelated missing values. The trade-off is that each coefficient may rest on a different subset of rows, so the resulting matrix is not guaranteed to be positive semi-definite. Use "complete.obs" when you need every correlation in the matrix to be computed on the same set of observations, as some downstream analyses such as PCA require.
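mtcars has no missing values, so both calls above return the same matrix. The sketch below blanks out two values on purpose (the positions are arbitrary) to show how each use setting behaves and how many rows each one keeps.

```r
df <- mtcars[, c("mpg", "hp", "wt")]
df$hp[1] <- NA   # arbitrary missing value for illustration
df$wt[2] <- NA   # a second one, in a different row

cor(df)["mpg", "hp"]                                 # NA under the default "everything"
cor(df, use = "complete.obs")["mpg", "hp"]           # uses only fully complete rows
cor(df, use = "pairwise.complete.obs")["mpg", "hp"]  # uses every row where mpg and hp exist

# How many rows each approach keeps for the mpg-hp pair
sum(complete.cases(df))               # listwise: 30 rows
sum(complete.cases(df$mpg, df$hp))    # pairwise: 31 rows
```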
Advanced: Correlation Matrix with p-values Using Hmisc
Base R's cor() does not attach significance markers to a matrix. The Hmisc package's rcorr() function solves this — it returns both the coefficient matrix and a matrix of p-values simultaneously.
if(!require(Hmisc)){
install.packages("Hmisc")
library(Hmisc)
}
corstudiodatalab <- function(x){
  x <- as.matrix(x)
  rc <- Hmisc::rcorr(x)      # compute once: $r = coefficients, $P = p-values
  R <- rc$r
  p <- rc$P
  # "**" for p < .01, "*" for p < .05; diagonal p-values are NA
  mystars <- ifelse(p < .01, "**|", ifelse(p < .05, "* |", "  |"))
  # Round to 3 decimals; the dummy -1.111 column forces a uniform print width
  R <- format(round(cbind(rep(-1.111, ncol(x)), R), 3))[, -1]
  Rnew <- matrix(paste(R, mystars, sep = ""), ncol = ncol(x))
  diag(Rnew) <- paste(diag(R), "  |", sep = "")
  rownames(Rnew) <- colnames(x)
  colnames(Rnew) <- paste(colnames(x), "|", sep = "")
  as.data.frame(Rnew)
}
mtcars %>% select(where(is.numeric)) %>%
corstudiodatalab()
The ** marker means p < 0.01; * means p < 0.05. This format shows at a glance which correlations are statistically significant. Note that the stars are not adjusted for multiple comparisons, so with many variables some of them will be chance findings (see Pitfall 5).
Visualization of Correlation Matrices in R
A raw correlation matrix with 10+ variables is difficult to scan. Visualizations let you identify clusters of related variables, spot sign changes, and communicate findings to non-statistical audiences.
| Tool | Best for |
| corrplot | Quick, publication-quality correlation plots with minimal code |
| ggplot2 + reshape2 | Fully customizable heatmaps integrated into a ggplot2 workflow |
corrplot — Circle Method
library(corrplot)
corr_matrix <- cor(mtcars)
corrplot(corr_matrix, method = "circle")
ggplot2 Heatmap
library(ggplot2)
library(reshape2)
# Compute and reshape correlation matrix
corr_matrix <- cor(mtcars)
melted_corr <- melt(corr_matrix)
# Build heatmap
ggplot(data = melted_corr, aes(x = Var1, y = Var2, fill = value)) +
geom_tile(color = "white") +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limits = c(-1, 1), space = "Lab",
name = "Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1)) +
labs(title = "Correlation Matrix Heatmap", x = "", y = "")
Red tiles show strong positive correlations; blue tiles show strong negative correlations; white tiles indicate near-zero relationships. This layout makes it immediately obvious which variable pairs deserve further analysis.
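Either plot is easier to read when correlated variables sit next to each other. One base-R way to get such an order is hierarchical clustering on the distance 1 − |r|; that distance and hclust()'s default complete linkage are assumptions, not the only reasonable choices. The reordered matrix can then be passed to corrplot() or to the ggplot2 heatmap code (corrplot() also offers order = "hclust" as a built-in option).

```r
corr_matrix <- cor(mtcars)

# Distance: strongly correlated variables (of either sign) are "close"
d  <- as.dist(1 - abs(corr_matrix))
hc <- hclust(d)   # complete linkage by default

# Reorder rows and columns by the dendrogram order
ordered_corr <- corr_matrix[hc$order, hc$order]
round(ordered_corr[1:4, 1:4], 2)
```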
Integrating Correlation Analysis in a Reproducible Workflow
A correlation analysis that cannot be reproduced is not publishable. Use R Markdown to embed your cor() and cor.test() calls inside a document that renders code, output, and narrative together. Use Shiny when your audience needs to explore the matrix interactively — for example, filtering by variable group or switching between Pearson and Spearman dynamically.
Best Practices Checklist
- Set a random seed (set.seed()) before any sampling or imputation steps that precede correlation analysis.
- Check for outliers with a scatterplot matrix (pairs(mtcars)) before choosing Pearson vs Spearman.
- Test normality with shapiro.test() on each variable; use Spearman if any variable fails.
- When testing many pairs simultaneously, apply a Bonferroni correction to control the family-wise error rate, or the Benjamini-Hochberg procedure to control the false discovery rate.
- Document your use parameter choice and justify it in your methods section.
- Never report a correlation coefficient without its p-value or confidence interval.
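The normality and outlier checks in this checklist take only a few lines. This sketch applies shapiro.test() to each column of a small subset of mtcars (the choice of columns is arbitrary) and flags p-values below 0.05. Treat the flags as one input to the Pearson vs Spearman decision rather than an automatic rule: Shapiro-Wilk tends to reject trivial departures in large samples and miss real ones in small samples.

```r
vars <- mtcars[, c("mpg", "hp", "wt", "qsec")]

# Shapiro-Wilk p-value for each variable
shapiro_p <- sapply(vars, function(v) shapiro.test(v)$p.value)

# Flag variables whose p-value falls below 0.05
data.frame(variable   = names(shapiro_p),
           p_value    = round(shapiro_p, 4),
           non_normal = shapiro_p < 0.05,
           row.names  = NULL)

# Visual check for outliers and non-linearity before choosing a method
pairs(vars)
```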
Common Pitfalls and How to Avoid Them
Pitfall 1 — Correlation Does Not Mean Causation
A coefficient of −0.78 between mpg and hp tells you these variables move together consistently. It does not tell you that increasing horsepower causes lower fuel efficiency. There may be a confounding variable (e.g., vehicle weight) driving both. Always pair correlation analysis with domain knowledge and, when appropriate, regression modelling.
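One way to probe the weight confounder is a partial correlation: regress mpg and hp on wt separately and correlate the residuals. This is a sketch of the standard residual method in base R, not a causal analysis; it only removes the linear effect of wt. The same number falls out of the first-order partial correlation formula applied to the correlation matrix.

```r
# Residuals of mpg and hp after removing the linear effect of weight (wt)
res_mpg <- resid(lm(mpg ~ wt, data = mtcars))
res_hp  <- resid(lm(hp  ~ wt, data = mtcars))

partial_r <- cor(res_mpg, res_hp)

# Same quantity from the correlation matrix (first-order partial formula)
R <- cor(mtcars[, c("mpg", "hp", "wt")])
partial_formula <- (R["mpg", "hp"] - R["mpg", "wt"] * R["hp", "wt"]) /
  sqrt((1 - R["mpg", "wt"]^2) * (1 - R["hp", "wt"]^2))

c(raw = R["mpg", "hp"], partial = partial_r, formula = partial_formula)
```

Here the mpg-hp association weakens once weight is held constant, which is consistent with weight driving part of the raw correlation.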
Pitfall 2 — Using Pearson on Non-Normal Data
Pearson's r measures only linear association, and its significance test assumes the two variables are approximately bivariate normal. Applying it to ordinal survey responses, count data, or heavily skewed distributions can produce a coefficient that understates or overstates the true relationship. Check normality first; switch to Spearman when in doubt.
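A quick synthetic illustration of Pearson understating a relationship: with y = exp(x), y rises strictly with x, so Spearman's rho is exactly 1 because the ranks agree perfectly, while Pearson's r is pulled well below 1 by the extreme right skew.

```r
x <- 1:20
y <- exp(x)   # strictly increasing, heavily right-skewed (synthetic data)

c(pearson  = cor(x, y),
  spearman = cor(x, y, method = "spearman"))
```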
Pitfall 3 — Ignoring Outliers
A single extreme observation can move a Pearson coefficient by 0.2 or more in small samples. Plot your data before running cor(). If outliers are present and cannot be removed on scientific grounds, use Spearman — it is rank-based and therefore robust to extreme values.
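An exaggerated toy example of the outlier problem: twenty points on a perfect straight line, with the last y value replaced by an extreme value (synthetic data). Pearson's r collapses and even changes sign, while Spearman's rho only drops to 5/7, because working on ranks caps how much influence a single extreme value can have.

```r
x <- 1:20
y <- 1:20        # perfect positive relationship
y[20] <- -100    # one extreme, mis-recorded value

c(pearson  = cor(x, y),
  spearman = cor(x, y, method = "spearman"))
```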
Pitfall 4 — Leaving use = "everything" (the default)
If your data frame has any missing values, the default setting returns NA for every affected pair without warning. Always set use explicitly. If you receive a matrix full of NAs, this is almost certainly the cause.
Pitfall 5 — Multiple Testing Without Correction
A 10-variable correlation matrix produces 45 unique pairs. At a 0.05 threshold, if no pair is truly correlated you would expect roughly 2 to 3 significant results purely by chance (45 × 0.05 = 2.25). Apply the Bonferroni correction (p.adjust(p_values, method = "bonferroni")) or the Benjamini-Hochberg FDR procedure (p.adjust(p_values, method = "BH")) when testing many pairs.
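A base-R sketch of that workflow: run cor.test() on every pair of columns, collect the raw p-values, and adjust them with p.adjust(). The helper pairwise_cor_tests() is hypothetical (not from any package) and uses Pearson by default.

```r
# Hypothetical helper (not from any package): test every pair of columns
# and add Bonferroni- and Benjamini-Hochberg-adjusted p-values
pairwise_cor_tests <- function(data, method = "pearson") {
  pairs <- combn(names(data), 2)   # 2 x k matrix of column-name pairs
  tests <- apply(pairs, 2, function(pr) {
    ct <- cor.test(data[[pr[1]]], data[[pr[2]]], method = method)
    c(r = unname(ct$estimate), p = ct$p.value)
  })
  res <- data.frame(var1 = pairs[1, ], var2 = pairs[2, ],
                    r = tests["r", ], p = tests["p", ])
  res$p_bonferroni <- p.adjust(res$p, method = "bonferroni")
  res$p_bh         <- p.adjust(res$p, method = "BH")
  res[order(res$p), ]
}

out <- pairwise_cor_tests(mtcars)   # 11 columns -> 55 pairs
head(out)
```

Bonferroni is the stricter of the two; Benjamini-Hochberg keeps more power when many pairs are tested and you can tolerate a controlled share of false discoveries.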
Conclusion
The cor function in R and its companion cor.test() together cover the full workflow of correlation analysis: computing the coefficient, testing its significance, and building publication-ready matrices. Use Pearson for linear continuous data, Spearman for ranked or non-normal data, and Kendall's tau for small samples with ties. Always handle missing values explicitly with the use parameter, verify significance with cor.test(), and visualize results with corrplot or ggplot2 to communicate findings clearly. Embedding this workflow in R Markdown ensures it stays reproducible and shareable.
Frequently Asked Questions
What is the difference between cor() and cor.test() in R?
cor() returns only the correlation coefficient. cor.test() returns the coefficient plus a t-statistic, degrees of freedom, p-value, and 95% confidence interval. Use cor() for quick exploration; use cor.test() whenever you need to report whether the result is statistically significant.
How do I run a correlation test in R?
Use cor.test(x, y). Example: cor.test(mtcars$mpg, mtcars$hp). Add method = "spearman" or method = "kendall" for non-parametric variants. The default tests Pearson correlation.
What does a p-value in cor.test() mean?
The p-value is the probability of observing a correlation this extreme (or more extreme) if the true correlation were zero. A p-value below 0.05 means the correlation is statistically significant at the 0.05 level: a result like this would be unlikely if there were no true correlation.
When should I use Spearman instead of Pearson correlation in R?
Use Spearman when your data is ordinal or ranked, clearly non-normal, or contains outliers you cannot remove. Spearman measures the monotonic relationship on ranks rather than the raw values, making it more robust. Code: cor(x, y, method = "spearman") or cor.test(x, y, method = "spearman").
How do I handle missing values in cor() in R?
Set the use parameter explicitly. use = "complete.obs" drops any row with a missing value across all variables. use = "pairwise.complete.obs" uses all available pairs for each correlation separately — preserving more data. The default "everything" returns NA for any affected pair.
What is the difference between COV and COR in R?
cov() computes covariance — an unstandardized measure of how two variables change together. Its magnitude depends on the scale of the variables. cor() standardizes covariance to produce a value always between −1 and 1, making it scale-independent and directly comparable across different variable pairs.
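A small demonstration of the scale point: converting mpg from miles per US gallon to kilometres per litre (factor 0.425144, rounded) multiplies its covariance with hp by that factor but leaves the correlation unchanged. cov2cor() converts a whole covariance matrix into the corresponding correlation matrix.

```r
kpl <- mtcars$mpg * 0.425144   # miles per US gallon -> kilometres per litre

cov(mtcars$mpg, mtcars$hp)     # depends on units
cov(kpl, mtcars$hp)            # scaled by the conversion factor
cor(mtcars$mpg, mtcars$hp)     # unit-free
cor(kpl, mtcars$hp)            # same value as the line above

# cov2cor() standardizes a whole covariance matrix into a correlation matrix
all.equal(cov2cor(cov(mtcars)), cor(mtcars))
```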
How do I get a correlation between two columns in R?
Use cor(df$column1, df$column2). For significance testing: cor.test(df$column1, df$column2).
What is the Hmisc rcorr() function and when should I use it?
rcorr() from the Hmisc package computes an entire correlation matrix and its corresponding p-value matrix simultaneously, using either Pearson or Spearman correlation (type = "pearson" or "spearman"). Use it when you need significance levels for every pair in a matrix: base R's cor() does not report p-values at all, and cor.test() handles only one pair at a time.
Need help applying these techniques to your own dataset? Our team at RStudioDatalab supports researchers, students, and businesses with one-on-one sessions via Zoom, Google Meet, or chat. Contact us at contact@rstudiodatalab.com or schedule a discovery call.