III. Generating Data
Good data arise from careful data collection. An important insight of modern statistics is the central role of randomness in helping to obtain modest amounts of data that describe entire populations and in making subsequent inference possible. A laudable trend in statistics teaching has moved toward greater and more careful discussion of good data collection. ActivStats supports this trend with real-world video stories, discovery learning based on student experiments, and simulations.
10. Sample Surveys
We distinguish a sample from the underlying population. We hope to learn about the population by collecting and examining sample data. A video shows Frito-Lay sampling to accept a truckload of potatoes. Students try sampling from a large population of potatoes in Data Desk, observing the ways in which the samples do and do not resemble the population, and how sample size affects the resemblance.
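For teachers who want to reproduce the sampling activity outside Data Desk, the following is a minimal sketch in Python, not the ActivStats implementation. The potato weights, the population size, and the function name `sample_mean_spread` are all invented for illustration. It draws many random samples of two different sizes and measures how much the sample means vary from sample to sample.

```python
import random
import statistics

def sample_mean_spread(population, n, trials=500, seed=0):
    """Draw `trials` simple random samples of size n and return the
    standard deviation of their means. A smaller value means samples of
    this size resemble the population more consistently."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

# Hypothetical population: 10,000 potato weights in grams (made-up numbers).
rng = random.Random(1)
population = [rng.gauss(170, 30) for _ in range(10_000)]

spread_small = sample_mean_spread(population, n=10)
spread_large = sample_mean_spread(population, n=250)

print(f"spread of sample means, n=10:  {spread_small:.2f} g")
print(f"spread of sample means, n=250: {spread_large:.2f} g")
```

The means of the larger samples cluster more tightly around the population mean, which is the numerical counterpart of watching a sample histogram come to resemble the population histogram.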
A discussion of bias in sample surveys leads to a video presentation of the Literary Digest poll story.
If your sample is not representative, even a large sample size can't correct for the bias.
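A short simulation can make this point concrete. The sketch below uses an entirely hypothetical population (it does not reproduce the Literary Digest figures): 30% of voters appear on a "listed" sampling frame, and listed voters favor candidate A less often than unlisted voters. It compares a large sample drawn only from the list with a much smaller simple random sample drawn from everyone.

```python
import random

rng = random.Random(42)

# Hypothetical population of 100,000 voters; every number here is invented.
# Each voter is a pair (listed, favors_a).
population = []
for _ in range(100_000):
    listed = rng.random() < 0.30
    favors_a = rng.random() < (0.40 if listed else 0.65)
    population.append((listed, favors_a))

true_share = sum(f for _, f in population) / len(population)

# Biased sample: very large, but drawn only from the listed voters.
listed_only = [p for p in population if p[0]]
biased = rng.sample(listed_only, 20_000)
biased_share = sum(f for _, f in biased) / len(biased)

# Simple random sample: far smaller, but drawn from the whole population.
srs = rng.sample(population, 1_000)
srs_share = sum(f for _, f in srs) / len(srs)

print(f"true share favoring A:          {true_share:.3f}")
print(f"biased sample (n=20,000):       {biased_share:.3f}")
print(f"simple random sample (n=1,000): {srs_share:.3f}")
```

The biased sample is twenty times larger, yet its estimate misses the true share by far more than the small random sample does. Increasing its size further would only make the wrong answer more precise.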
We define a Simple Random Sample and introduce the idea of random sampling. An amusing video demonstrates how poor question wording can introduce bias.
Key Points
| Population | the entire group of individuals or instances about whom we hope to learn. |
| Sample | a subset of a population, examined in hope of learning about the population. We want the sample to be representative, but not every sample is. |
| Simple Random Sample | A simple random sample of n elements is one in which each set of n elements in the population has an equal chance of selection. |
| Bias | Any systematic failure of a sample to represent its population is bias. Common errors include relying on voluntary response, undercoverage of the population, and nonresponse bias. |
| Randomization | The best defense against bias is randomization, in which each individual is given a fair, random chance of selection. |
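The definition of a Simple Random Sample can be checked by brute force on a tiny population. This is an illustrative sketch, with a made-up five-element population: it lists every possible sample of size 2 and then tallies how often Python's `random.sample` produces each one over many draws.

```python
import itertools
import random
from collections import Counter

population = ["A", "B", "C", "D", "E"]  # hypothetical five-element population
n = 2

# Every possible sample of size n (order ignored): 10 in this case.
all_samples = list(itertools.combinations(population, n))

rng = random.Random(0)
draws = 20_000
counts = Counter(
    tuple(sorted(rng.sample(population, n))) for _ in range(draws)
)

# Share of draws that produced each possible sample.
shares = {s: counts[s] / draws for s in all_samples}
for s, share in shares.items():
    print(s, f"{share:.3f}")
```

Each of the 10 possible samples turns up in about one tenth of the draws, which is what "each set of n elements has an equal chance of selection" means in practice.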
Teachers' Notes: Although everyone expects a spoonful of soup to represent how the entire pot tastes, seeing the histogram of a sample mimic the histogram of a population is new to most students.
This is the first mention of randomness. ActivStats presents randomization as a good thing. A goal of this discussion is to motivate the study of random behavior.
The Literary Digest story, although old, introduces George Gallup as he founds his polling organization. The story is often told in textbooks, but the video shows and tells more of the story than usually appears in texts.
This is a good place to discuss internet-based voluntary response "surveys" and warn students that they are no more valid (and usually less so) than the Literary Digest survey and for most of the same reasons. It is easy to find such "surveys" online, so the discussion can be about a specific survey going on at the time of the class. It is also a good time to look at the "random" digits in the class survey.
Background Notes: This is the first mention of Sample and Population in ActivStats. Many textbooks introduce these concepts early in an attempt to explain what Statistics is about, but the concepts are not needed in any of the discussions up to this point. By delaying the introduction of these concepts, we have allowed students to focus on the data.
Of the two common definitions of an SRS, "all samples of n are equally likely" and "each case has an equal and independent chance of selection," we use the first here because the concept of independence has not yet been defined. Beware of texts that define an SRS only as giving each individual the same chance of selection without mentioning independence.
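To see why giving each individual the same chance of selection is not enough, consider a sketch of a design that does exactly that and still fails to be an SRS. The four-element population and its two fixed pairs are hypothetical: one pair is chosen by a fair coin flip, so every individual is selected half the time, yet most samples of size 2 can never occur.

```python
import itertools
import random
from collections import Counter

population = ["A", "B", "C", "D"]      # hypothetical population
clusters = [("A", "B"), ("C", "D")]    # fixed pairs; one pair is chosen at random

rng = random.Random(0)
draws = 10_000
sample_counts = Counter()
individual_counts = Counter()
for _ in range(draws):
    chosen = rng.choice(clusters)
    sample_counts[chosen] += 1
    individual_counts.update(chosen)

# Each individual's chance of selection is about 1/2 ...
individual_shares = {x: individual_counts[x] / draws for x in population}

# ... but 4 of the 6 possible samples of size 2 are never drawn.
possible = list(itertools.combinations(population, 2))
never_drawn = [s for s in possible if sample_counts[s] == 0]

print(individual_shares)
print("samples that can never occur:", never_drawn)
```

Individuals are treated equally here, but their selections are not independent: choosing A guarantees B and rules out C and D. Under the "all samples of n are equally likely" definition, this design is plainly not an SRS.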
11. Designed Experiments
A video shows Union Carbide performing experiments to compare a new foam to other kinds of foam. Randomized comparative experiments and random sampling both use chance deliberately to reduce bias. Experiments have advantages, especially when we hope to show causation.
The three rules of experiment design are:
Students perform an experiment on themselves, testing whether they can read a pie chart or a stacked bar chart more accurately. A case study returns to the earlier circle-clicking experiment, now viewing the student's individual data in terms of factors and responses.
Key Points
| Experiment | An experiment deliberately imposes one of two or more randomly assigned treatments on individuals in order to observe and compare their responses to the different treatments. |
Teachers' Note: The importance and effectiveness of randomized assignment of subjects to treatments are central themes here. It is important to overcome the natural feeling that "random" means "haphazard; out of control; scary" and to replace it with an association with "fair and equitable; unbiased." We are still motivating the study of random behavior soon to come.
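Randomized assignment is easy to demonstrate in class. The sketch below is one simple way to do it, under stated assumptions: the function name `randomize_assignment`, the twelve subject labels, and the two treatment labels are hypothetical and are not taken from the ActivStats activity. It shuffles the subjects and then deals them out to the treatments in turn, so that chance alone decides who receives which treatment and the groups come out equal in size.

```python
import random

def randomize_assignment(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to treatments in turn.
    Chance alone decides who gets which treatment."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical class of 12 students and two hypothetical treatments.
subjects = [f"student_{i}" for i in range(1, 13)]
treatments = ["pie chart", "stacked bar chart"]
groups = randomize_assignment(subjects, treatments, seed=7)

for treatment, members in groups.items():
    print(treatment, members)
```

Passing a seed makes the assignment reproducible for discussion. Omitting it gives a fresh random assignment each time, which helps students see that "random" here means that no one, including the experimenter, chooses who goes where.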