Descriptive statistics summarize a dataset's center, spread, and shape using values like the mean, median, mode, standard deviation, and quartiles. In R, the fastest way to get them is summary(data) for a quick overview or sapply(data, function) to pull one specific statistic across every column.

Descriptive statistics describe the sample you have. Inferential statistics use that sample to draw conclusions about a larger population — they answer different questions and use different tools.
R's built-in summary() and sapply() functions cover most use cases without installing anything extra.
The summarytools and psych packages add frequency tables, cross-tabulations, and group-wise summaries with minimal code.
Excel can compute the same statistics manually or via the Data Analysis ToolPak — useful context if you're moving between the two tools.



Descriptive Statistics in R: Complete Tutorial with Code Examples


Descriptive statistics are the numerical and graphical summaries — mean, median, mode, standard deviation, quartiles, range — that describe the basic shape of a dataset before any modeling begins. In R, you can get most of them in one line: summary(your_data). This guide covers what descriptive statistics are, how they differ from inferential statistics, how to compute and interpret them in R using both base functions and the summarytools package, worked examples on a real dataset, and a short comparison with doing the same calculations in Excel.

What Is Descriptive Statistics?

Descriptive statistics is the branch of statistics concerned with summarizing, organizing, and presenting the main features of a dataset — without drawing conclusions beyond that dataset. They reduce a long column of numbers into a handful of interpretable values, and they're typically the first step in any data analysis project, R-based or otherwise.

Descriptive statistics fall into two groups:

Measures of central tendency — the mean, median, and mode, which describe where the data is "centered."
Measures of dispersion — the range, variance, standard deviation, and interquartile range, which describe how spread out the data is.

Two further descriptive measures, skewness and kurtosis, describe the shape of the distribution (how lopsided or peaked it is). They are covered in the advanced section below.
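Base R has no built-in skewness or kurtosis function, so here is a minimal sketch of the simple moment-based formulas. The helper names `skewness_m()` and `excess_kurtosis_m()` are hypothetical, and packages such as psych may apply small-sample adjustments by default, so their reported values can differ slightly from these.

```r
# Moment-based skewness: third central moment divided by SD^3.
# Uses population (divide-by-n) moments. Hypothetical helper name.
skewness_m <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  m <- mean(x)
  m2 <- mean((x - m)^2)      # second central moment
  m3 <- mean((x - m)^3)      # third central moment
  m3 / m2^1.5
}

# Moment-based excess kurtosis: fourth central moment divided by variance^2,
# minus 3 so that a normal distribution scores about 0. Hypothetical helper name.
excess_kurtosis_m <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)      # fourth central moment
  m4 / m2^2 - 3
}

skewness_m(c(1, 2, 3, 4, 5))         # 0: perfectly symmetric data
excess_kurtosis_m(c(1, 2, 3, 4, 5))  # about -1.3: flatter than a normal curve
skewness_m(iris$Petal.Length)
```

Positive skewness means a longer right tail; negative means a longer left tail.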

Descriptive Statistics vs. Inferential Statistics

This is the single most common point of confusion for students starting out in R, so it deserves a direct answer: descriptive statistics describe the data you have; inferential statistics use that data to make claims about a population you don't have. A mean and standard deviation calculated on your sample are descriptive. A t-test or confidence interval that tells you something about the broader population is inferential.

| Aspect | Descriptive Statistics | Inferential Statistics |
|---|---|---|
| Purpose | Summarize the sample at hand | Generalize from a sample to a population |
| Example R functions | `summary()`, `sapply()`, `sd()`, `IQR()` | `t.test()`, `aov()`, `lm()`, `cor.test()` |
| Typical output | Mean, median, mode, SD, quartiles | p-values, confidence intervals, test statistics |
| Answers | "What does this data look like?" | "Can I trust this pattern beyond my sample?" |
| Requires assumptions? | No | Yes: normality, independence, etc. |

In practice, every statistical workflow starts with descriptive statistics and moves to inferential statistics only once the data's shape is understood. If you're checking whether your data meets the assumptions for an inferential test, see how to run a Shapiro-Wilk normality test in R or the Levene test for equal variances. For a deeper treatment of inferential methods themselves, see our guide to inferential statistics in R.

Measures of Central Tendency

Mean

The arithmetic average — sum of all values divided by the count. Sensitive to outliers, which can pull it noticeably higher or lower than the "typical" value.

Median

The middle value once the data is sorted. Less sensitive to outliers than the mean, which makes it the better central-tendency measure for skewed data like income or housing prices.
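A quick way to see the difference is to add one extreme value to otherwise similar numbers. The vector below is hypothetical illustration data, not taken from a real dataset:

```r
# Hypothetical incomes (in thousands) with one extreme value
incomes <- c(32, 35, 38, 41, 44, 250)

mean(incomes)     # 73.33: pulled far upward by the single outlier
median(incomes)   # 39.5: average of the two middle values, 38 and 41
```

The median stays close to where most of the values sit, while the mean lands above five of the six observations.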

Mode

The most frequently occurring value. Most useful for categorical or discrete data; a dataset can have one mode, several, or none at all.
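One R-specific trap: base R has no function for the statistical mode. `mode()` exists, but it returns an object's storage mode (for example `"numeric"`), not its most frequent value. Here is a minimal sketch of a helper, called `stat_mode()` here (a hypothetical name), that returns every most-frequent value so ties and multiple modes stay visible:

```r
# Return all values that occur most often. Hypothetical helper name.
stat_mode <- function(x) {
  x <- x[!is.na(x)]                              # ignore missing values
  counts <- table(x)                             # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]  # values tied for the top count
  # table() stores values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

mode(c(2, 3, 3, 5))          # "numeric": the storage mode, not the statistical mode
stat_mode(c(2, 3, 3, 5))     # 3
stat_mode(c(1, 1, 2, 2, 4))  # 1 2 (two modes)
stat_mode(iris$Species)      # all three species tie at 50
```

If every value appears exactly once, this helper returns all of them, which you may prefer to report as "no mode".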

Measures of Dispersion

Range

Maximum minus minimum. Simple, but says nothing about how the values are distributed between those two extremes.
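Note that R's `range()` returns the minimum and maximum as a two-element vector rather than their difference. To get the single-number range described here, take the difference explicitly:

```r
x <- iris$Sepal.Length

range(x)          # 4.3 7.9: the two extremes
diff(range(x))    # 3.6: maximum minus minimum
max(x) - min(x)   # the same value, written out
```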

Variance and Standard Deviation

Variance measures the average squared deviation from the mean; R's var() divides the sum of squared deviations by n - 1 rather than n, which gives the sample variance. Standard deviation is its square root, which puts the measure back into the same units as the original data. That is why standard deviation is reported far more often than variance.
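A short sketch confirming that `var()` and `sd()` use the n - 1 (sample) denominator, by computing the variance by hand on `iris$Sepal.Length`:

```r
x <- iris$Sepal.Length
n <- length(x)

# Sample variance by hand: sum of squared deviations divided by n - 1
var_by_hand <- sum((x - mean(x))^2) / (n - 1)

var_by_hand          # about 0.686, identical to var(x)
var(x)
sqrt(var_by_hand)    # about 0.828, identical to sd(x)
sd(x)
```

If you need the population version (divide by n), multiply `var(x)` by `(n - 1) / n`.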

Quartiles and Interquartile Range (IQR)

Quartiles split the sorted data into four equal parts: Q1 is the 25th percentile, Q2 is the median, and Q3 is the 75th percentile. The IQR (Q3 minus Q1) measures the spread of the middle 50% of the data and, unlike the range, isn't distorted by extreme outliers.
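In R, `quantile()` returns the quartiles and `IQR()` returns their difference. `quantile()` supports several definitions through its `type` argument (the default is `type = 7`), so other software can report slightly different quartiles, especially on small samples:

```r
x <- iris$Sepal.Length

# Q1, median (Q2), and Q3 using the default type = 7 interpolation
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))
quartiles                              # 5.1 5.8 6.4

# IQR() is Q3 minus Q1
IQR(x)                                 # 1.3
unname(quartiles[3] - quartiles[1])    # the same value, computed directly
```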

How to Calculate Descriptive Statistics in R

We'll use R's built-in iris dataset throughout — 150 measurements of sepal and petal dimensions across three flower species — since it's available in every R installation with no download required.

Load and inspect the data.
```
dat <- iris   # copy the built-in iris dataset into a working object
head(dat)     # first 6 rows
str(dat)      # structure: column types and dimensions
```
Info!
iris has 150 observations across 5 columns: four numeric measurements (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and one factor (Species, with 3 levels).

Get a full summary with one function.

```
summary(dat)
```
```
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width  
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50
```

summary() returns the minimum, 1st quartile, median, mean, 3rd quartile, and maximum for every numeric column (plus an NA's count when a column has missing values), and a frequency count for each level of a factor column. That's six labeled numbers per numeric variable from a single call.

Pull a single statistic across every column with sapply().
```
# standard deviation for each numeric variable
sapply(dat[, -5], sd)

# variance for each numeric variable
sapply(dat[, -5], var)

# interquartile range for each numeric variable
sapply(dat[, -5], IQR)
```
Warning!
dat[, -5] excludes the 5th column (Species) because it's categorical — functions like sd() and var() will error or return NA on non-numeric data.
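Dropping a column by position works for iris, but it silently picks the wrong column if the column order ever changes. A slightly more defensive sketch selects the numeric columns by type instead:

```r
dat <- iris

# TRUE for each numeric column, FALSE for the Species factor
is_num <- sapply(dat, is.numeric)

# apply sd() only to the numeric columns, whatever their position
sapply(dat[is_num], sd)
```

Single-bracket indexing (`dat[is_num]`) always returns a data frame, even when only one column is numeric.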

Summarize a categorical variable with table().

```
table(dat$Species)
```
```
    setosa versicolor  virginica 
        50         50         50
```
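Categorical variables are often reported as percentages rather than raw counts. `prop.table()` converts a frequency table into proportions; a minimal sketch:

```r
counts <- table(iris$Species)

# share of observations in each category (sums to 1)
prop.table(counts)

# rounded percentages, the form usually reported in write-ups
round(100 * prop.table(counts), 1)   # 33.3 for each species
```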

Descriptive Statistics Examples

A worked example makes the formulas concrete. Using dat$Sepal.Length from the iris dataset:

| Statistic | R code | Result |
|---|---|---|
| Mean | `mean(dat$Sepal.Length)` | 5.843 |
| Median | `median(dat$Sepal.Length)` | 5.8 |
| Standard deviation | `sd(dat$Sepal.Length)` | 0.828 |
| Variance | `var(dat$Sepal.Length)` | 0.686 |
| Range | `range(dat$Sepal.Length)` | 4.3 – 7.9 |
| IQR | `IQR(dat$Sepal.Length)` | 1.3 |

Read together: the average sepal length across all 150 flowers is 5.84 cm, the typical (median) flower is close behind at 5.8 cm — so the distribution isn't badly skewed — and the middle 50% of flowers fall within a 1.3 cm band, while the full range spans 3.6 cm.
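All of the values in the table can be collected into a single named vector, which is convenient for rounding, printing, or exporting. A minimal sketch (the name `sepal_stats` is just an illustrative choice):

```r
x <- iris$Sepal.Length

# one named vector holding every statistic from the worked example
sepal_stats <- c(
  mean   = mean(x),
  median = median(x),
  sd     = sd(x),
  var    = var(x),
  min    = min(x),
  max    = max(x),
  iqr    = IQR(x)
)

round(sepal_stats, 3)
```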

Advanced Descriptive Statistics in R

Base R functions cover the essentials. For more polished, report-ready output, the summarytools package is a widely used add-on, built around four functions:

freq() — frequency tables for categorical variables
ctable() — cross-tabulations between two categorical variables
descr() — descriptive statistics for numeric variables, including by-group breakdowns
dfSummary() — a full data-frame summary with mini-graphs for every column at once

```
# install.packages("summarytools")   # run once if the package isn't installed
library(summarytools)

# frequency table for a categorical variable
freq(dat$Species)

# descriptive statistics for numeric variables
descr(dat, stats = "common")

# full data frame summary with mini-plots
dfSummary(dat)
```

The psych package's describeBy() function is a common choice when you need statistics broken out by group. It also reports skewness and kurtosis, the shape measures mentioned earlier:

```
library(psych)

# numeric columns only, split by species
describeBy(dat[, -5], group = dat$Species)
```

For grouped summaries using the tidyverse approach, dplyr's group_by() and summarise() are the most widely used combination in modern R code — see our dplyr guide for the full syntax.
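If you'd rather avoid an extra dependency, base R can produce similar grouped summaries with `tapply()`, `aggregate()`, and `by()`. This is a swapped-in base R alternative to the dplyr approach, sketched on iris:

```r
dat <- iris

# one statistic per group: mean sepal length for each species
group_means <- tapply(dat$Sepal.Length, dat$Species, mean)
group_means

# several statistics per group in one table
aggregate(Sepal.Length ~ Species, data = dat,
          FUN = function(v) c(mean = mean(v), sd = sd(v)))

# full summary() output for the four numeric columns, split by species
by(dat[, 1:4], dat$Species, summary)
```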

Descriptive Statistics in Excel

If you're coming from Excel or need to cross-check R output against a spreadsheet, the same statistics are available two ways.

Manually, with individual functions: =AVERAGE() for the mean, =MEDIAN(), =MODE.SNGL(), =STDEV.S() for sample standard deviation, =VAR.S() for sample variance, and =QUARTILE.INC() for quartiles.

In bulk, via Excel's Data Analysis ToolPak: Data tab → Data Analysis → Descriptive Statistics. This returns the mean, standard error, median, mode, standard deviation, sample variance, kurtosis, skewness, range, minimum, maximum, sum, and count in a single output table. It overlaps heavily with R's summary() but adds several extras and leaves out the quartiles. The ToolPak is an Excel add-in and may need enabling first under File → Options → Add-ins.
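To cross-check a ToolPak output table, most of its rows can be reproduced in base R. A minimal sketch on `iris$Sepal.Length`; the row labels are shortened versions of Excel's, and mode, skewness, and kurtosis are left out to keep it short:

```r
x <- iris$Sepal.Length
n <- length(x)   # assumes no missing values, as is true for iris

toolpak_like <- c(
  Mean     = mean(x),
  StdError = sd(x) / sqrt(n),   # standard error of the mean: SD / sqrt(n)
  Median   = median(x),
  StdDev   = sd(x),             # sample SD, the same definition as =STDEV.S()
  Variance = var(x),            # sample variance, the same definition as =VAR.S()
  Range    = max(x) - min(x),
  Minimum  = min(x),
  Maximum  = max(x),
  Sum      = sum(x),
  Count    = n
)
round(toolpak_like, 3)

# =QUARTILE.INC() interpolates the same way as R's default quantile(type = 7)
quantile(x, probs = c(0.25, 0.50, 0.75), type = 7)
```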

Descriptive Statistics in Psychology

In psychology research specifically, descriptive statistics serve the same summarizing role but are reported with a standard set of conventions: mean and standard deviation (written as M and SD) for continuous variables, and frequencies or percentages for categorical ones, typically presented in APA format — for example, "M = 5.84, SD = 0.83." Psychological studies almost always report descriptive statistics before any inferential test (t-tests, ANOVA) so readers can judge whether the sample looks reasonable before trusting the inferential conclusions drawn from it.
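An APA-style line like the one above can be generated directly from the data with `sprintf()`, which avoids copy-and-round mistakes. A minimal sketch (APA italicizes M and SD, which plain console output can't show):

```r
x <- iris$Sepal.Length

# mean and standard deviation rounded to two decimals
apa <- sprintf("M = %.2f, SD = %.2f", mean(x), sd(x))
apa   # "M = 5.84, SD = 0.83"
```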

Quick Reference: R Functions for Descriptive Statistics

| Package | Function | Description |
|---|---|---|
| base | `summary()` | Summarize an object |
| base | `sapply()` | Apply a function across all columns |
| base | `table()` | Create a frequency table |
| base | `quantile()` | Calculate sample quantiles |
| summarytools | `dfSummary()` | Full data-frame summary with mini-graphs |
| summarytools | `freq()` | Frequency table |
| summarytools | `descr()` | Descriptive statistics by variable or group |
| summarytools | `ctable()` | Cross-tabulation of two categorical variables |
| psych | `describeBy()` | Descriptive statistics by group, with skew/kurtosis |
| dplyr | `summarise()` | Custom summary statistics with group_by() |

Conclusion

Descriptive statistics are the foundation every later analysis — t-tests, ANOVA, regression — gets built on. In R, summary() and sapply() alone cover the vast majority of real-world needs, and summarytools or psych close the gap when you need report-ready output or group comparisons. Once your descriptive statistics are in hand and you understand your data's shape, the next step is usually checking whether it meets the assumptions for an inferential test — start with a normality test if you're heading toward a t-test or ANOVA.

Frequently Asked Questions

What are descriptive statistics in R?

Descriptive statistics in R are numerical summaries — mean, median, mode, standard deviation, range, and quartiles — computed using functions like summary(), mean(), median(), and sd(). They describe the dataset you have without making claims about a larger population.

What is the difference between descriptive and inferential statistics?

Descriptive statistics summarize the sample you've collected. Inferential statistics use that sample to draw conclusions about a broader population, typically through hypothesis tests, confidence intervals, or regression models — and they require assumptions descriptive statistics don't.

What are the 5 basic descriptive statistics?

The five most commonly reported are the mean, median, mode, range, and standard deviation. Together they describe a dataset's center (mean, median, mode) and spread (range, standard deviation).

How do I get descriptive statistics for a data frame in R?

Run summary(your_dataframe) for a one-line overview of every column, or sapply(your_dataframe, function_name), e.g. sapply(df, sd), to apply one specific statistic to every column at once. Exclude non-numeric columns first, since functions like sd() fail on factors.

How do I compute descriptive statistics by group in R?

Use by(data, data$grouping_column, summary) in base R, describeBy(data, data$grouping_column) from the psych package, or group_by() combined with summarise() from dplyr.

Is standard deviation a descriptive statistic?

Yes. Standard deviation measures how spread out the data is around the mean and is one of the core measures of dispersion in descriptive statistics, alongside variance and the interquartile range.

Is correlation a descriptive statistic?

Correlation describes the strength and direction of a relationship between two variables and is generally classified as descriptive — it summarizes a pattern in the sample. Testing whether that correlation is statistically significant (with cor.test()) moves into inferential statistics.
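The distinction is easy to see in code: `cor()` returns only the coefficient for your sample, while `cor.test()` adds a p-value and a confidence interval for the population correlation. A short sketch on two iris columns:

```r
# descriptive: the correlation coefficient summarizes this sample
r <- cor(iris$Petal.Length, iris$Petal.Width)
r

# inferential: significance test and confidence interval for the population value
ct <- cor.test(iris$Petal.Length, iris$Petal.Width)
ct$p.value
ct$conf.int
```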

What is the main purpose of descriptive statistics?

To summarize and characterize the key features of a dataset — its central tendency, variability, and distribution — so the data can be understood and communicated before any further modeling or hypothesis testing.

How do I handle missing values when computing descriptive statistics in R?

Most base R functions accept an na.rm = TRUE argument — for example mean(x, na.rm = TRUE) — to exclude missing values from the calculation. Alternatively, use na.omit(data) to remove rows with missing values before computing statistics.
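A small sketch of both approaches on hypothetical data. Keep in mind that `na.omit()` drops an entire row when any column in it is missing, which can discard valid values in the other columns:

```r
# hypothetical vector with one missing value
x <- c(4, 8, NA, 15, 16)

mean(x)                 # NA: a single missing value propagates
mean(x, na.rm = TRUE)   # 10.75: mean of the four observed values
sd(x, na.rm = TRUE)

# row-wise alternative: drop incomplete rows before summarizing
df <- data.frame(a = c(1, 2, NA), b = c(3, NA, 5))
na.omit(df)             # keeps only the first row, the one complete case
```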

What are the limitations of descriptive statistics?

Descriptive statistics summarize data but can't establish causation or explain why a pattern exists. They're also sensitive to outliers (particularly the mean and range) and don't tell you whether a pattern would hold in a different sample.


RStudioDataLab
