Curriculum Overview
The story that we tell our students about statistics is a tale that can be told in many ways. This document summarizes the story as ActivStats tells it. It then provides detailed information and advice on teaching with ActivStats. Finally, we provide a detailed syllabus that discusses each lesson in turn, highlighting the big ideas conveyed in that lesson and noting pedagogical points of interest.
Course Overview
We study statistics to better understand the world through better understanding of data. Data are (usually quantitative) information along with a context for that information. The principal concern of statistics is the variation in those data; if there were no variation, we could understand the world with no further analysis. We seek patterns and relationships that show through the variation, and we hope to draw conclusions about the world that are based on those patterns and that honestly reflect the limitations imposed by that variation.
When we first encounter data, we ask after their pedigree. If we do not know what was measured, we can never learn from the data. Data are structured as variables, which record the same information in the same units for many individuals or cases. Ordinarily, variables appear as columns and cases as rows in a data table.
We display data, seeking patterns. The intuitive area principle guides the design of displays, devoting equal display area to equal amounts or values. For individual variables we display distributions and think about numbers of modes, skewness and symmetry, and the possibility of outliers. A standard of comparison is the Normal density. Density curves formalize the intuitive area principle by representing relative frequency as area under a curve.
We summarize individual variables by finding their center and spread. For pairs of variables measured on the same individuals we make scatterplots and consider the direction, form, and strength of the relationship. We summarize linear relationships with least squares lines and with residuals away from those lines.
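As a minimal sketch of these summaries, the following Python example computes center and spread for one variable and then fits a least squares line to a pair of variables, reporting the residuals. The data values and the names `x`, `y`, `slope`, and `intercept` are made up for illustration and are not taken from ActivStats.

```python
import statistics

# Hypothetical paired measurements on five individuals (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Center and spread of a single variable.
center = statistics.mean(y)
middle = statistics.median(y)
spread = statistics.stdev(y)  # sample standard deviation

# Least squares line: slope = Sxy / Sxx, intercept = ybar - slope * xbar.
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residuals are the vertical distances from each point to the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("mean:", center, "median:", middle, "sd:", round(spread, 3))
print("line: y =", round(intercept, 3), "+", round(slope, 3), "* x")
print("residuals:", [round(r, 3) for r in residuals])
```

The residuals from a least squares line with an intercept always sum to zero (up to rounding), which is a handy check on the arithmetic.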
We next consider how good data arise, with particular attention to the role of randomization. We randomize experiments to reduce bias and minimize the effect of factors that we cannot control. We draw random samples so that they will represent the underlying population. To characterize randomization we turn to probability.
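The two uses of randomization described above can be sketched in a few lines of Python. This is an illustration only: the subject labels, the sample size of 20, and the even split into two groups are assumptions chosen for the example.

```python
import random

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population of 200 labeled individuals.
population = [f"subject_{i:03d}" for i in range(200)]

# Random sampling: every individual has the same chance of selection.
sample = rng.sample(population, 20)

# Randomized assignment: shuffle the sampled subjects, then split them
# into treatment and control groups of equal size.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:10], shuffled[10:]

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in which group, factors we cannot control tend to balance out between the groups over many repetitions.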
We consider random outcomes. An outcome is random if we cannot predict individual values but can anticipate long-run regularity. We call this long-run regularity probability. Experiments with random outcomes reveal the law of large numbers and the central limit theorem.
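A small simulation can make long-run regularity concrete. The sketch below, which assumes a fair coin and uses the hypothetical helper `running_proportion`, tracks the proportion of heads as the number of flips grows. Individual flips are unpredictable, but the running proportion settles near 0.5, as the law of large numbers suggests.

```python
import random

def running_proportion(n_flips, seed=0):
    """Return the proportion of heads after each of n_flips fair coin flips."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:  # a head occurs with probability 0.5
            heads += 1
        proportions.append(heads / i)
    return proportions

props = running_proportion(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: proportion of heads = {props[n - 1]:.4f}")
```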
As we appreciate randomness, we realize that the summary statistics we discussed early in the course are themselves random, because repeating a randomized experiment or random sample will yield different summary values. As random phenomena, these statistics exhibit long-run regularity; in particular, they have sampling distributions, which we can examine. The sampling distribution of a statistic permits formal inference because it describes what to expect if we repeat the study many times. We note specifically that sampling distributions arise because of the randomness that we deliberately introduced in random sampling and randomized assignment of experimental treatments.
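To illustrate, the following sketch draws many random samples from one population and records the mean of each. The skewed population, the sample size of 25, and the helper name `sample_means` are assumptions made for this example. The collection of sample means approximates the sampling distribution of the mean.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples simple random samples and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical right-skewed population of 10,000 values.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=25, n_samples=2_000)

# The sample means center on the population mean, and their spread is
# close to (population SD) / sqrt(sample size).
print("population mean:     ", round(statistics.mean(population), 3))
print("mean of sample means:", round(statistics.mean(means), 3))
print("predicted SD of mean:", round(statistics.pstdev(population) / 25 ** 0.5, 3))
print("observed SD of means:", round(statistics.stdev(means), 3))
```

Even though the population is skewed, a histogram of `means` would look roughly symmetric and bell-shaped, in line with the central limit theorem.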
This lets us consider ways to be precise in describing our uncertainty about the true state of the world. With the central idea of understanding what would happen over many repetitions, we construct confidence intervals and hypothesis tests. Simulation makes it easier to see confidence intervals and hypothesis tests as statements whose probability component talks about how often they are correct over many repetitions.
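The "many repetitions" reading of a confidence interval can be checked by simulation. The sketch below is a simplified illustration, not ActivStats' own simulation: it assumes a Normal population with a known standard deviation, so that a z-interval applies, and the helper name `coverage` and all parameter values are hypothetical. It counts how often the interval captures the true mean.

```python
import random
import statistics

def coverage(mu, sigma, n, n_repeats, level=0.95, seed=0):
    """Fraction of repeated z-intervals for the mean that contain mu."""
    rng = random.Random(seed)
    # Critical value for the requested confidence level (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / n_repeats

rate = coverage(mu=50.0, sigma=10.0, n=30, n_repeats=2_000)
print(f"intervals containing the true mean: {rate:.1%}")
```

Any single interval either contains the true mean or does not. The 95% describes the procedure: over many repetitions, about 95% of the intervals it produces are correct.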
These methods of inference are much the same across a variety of applications, and the way in which ActivStats visualizes them emphasizes that consistency. We close the course with discussions of inference in a variety of settings.