The Teaching with Real Data project collects real-world data from sources such as scholarly journals and government agencies for use in statistics and data science courses. The data sets can be sorted by use, making it easier for educators to identify data sets that fit their needs.

All data was listed as open source, but if you would like a data set removed, send me an email. Feedback and data suggestions are welcome. This repository is licensed under GPL-2.

Visit my Substack, Briefed by Data, if you are interested in data or current events, or are simply curious. My faculty webpage has my vita and other information.

Note: I started this in August 2025 and I will be adding datasets regularly throughout the fall.

Other Resources




Datasets

The Stats, Stats2, DataSci, and Graphs columns suggest specific, common uses for each data set, though there are certainly others. Separate columns are used to make sorting easier. The Notes column gives useful information for instructors, whereas the meta column contains citations and variable definitions.