By developing modules that teach data science in STEM courses at Dartmouth, we create a flexible and reusable set of tools and methods that faculty can use to enrich learning objectives through hands-on exploration of data collection, analysis, and visualization.
DIFUSE Modules
Our team works with faculty in the sciences and social sciences to build data science learning modules for existing courses. A module can be a short assignment or a longer-running exercise with skill-building components. Each module team consists of 2-3 students (graduate and undergraduate) and one of the DIFUSE grant PIs. We do the heavy lifting, with input from the faculty member during weekly meetings.
Modeling First Order Systems with Footage of a Small Motorized Cart
This module examines the open-loop response of a small motorized cart when a voltage is applied to its motor. The module has two components: individual data collection and analysis, followed by conclusions drawn from the aggregated class data.
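The description does not specify the model or the analysis steps, so the sketch below is only an illustration under stated assumptions. It assumes the cart's velocity behaves as a first-order step response, v(t) = v_ss(1 - e^(-t/tau)), and that cart positions are tracked frame by frame from the footage. The function name, frame rate, and parameter values are hypothetical, and the "footage" is synthetic. Velocity is obtained by finite differences of the positions, and the time constant tau comes from the 63.2% rule.

```python
import numpy as np

def estimate_first_order(t, v, n_tail=10):
    """Estimate steady-state value and time constant tau from a step response.

    Assumes the last n_tail samples have settled and uses the 63.2% rule:
    tau is the time at which v first reaches (1 - 1/e) of its final value.
    """
    v_ss = np.mean(v[-n_tail:])               # settled (steady-state) velocity
    target = (1.0 - np.exp(-1.0)) * v_ss      # about 63.2% of steady state
    idx = int(np.argmax(v >= target))         # first sample at or above target
    if idx == 0:
        raise ValueError("response already starts above 63.2% of steady state")
    # interpolate linearly between the two samples that bracket the target
    t0, t1, v0, v1 = t[idx - 1], t[idx], v[idx - 1], v[idx]
    tau = t0 + (target - v0) * (t1 - t0) / (v1 - v0)
    return v_ss, tau

# Hypothetical tracked cart positions at 30 frames per second (synthetic, not real footage)
fps = 30
t = np.arange(0, 4, 1 / fps)
true_v_ss, true_tau = 1.2, 0.5                # used only to generate the synthetic data
# position obtained by integrating the first-order velocity response from rest
x = true_v_ss * (t - true_tau * (1 - np.exp(-t / true_tau)))
v = np.gradient(x, t)                         # finite-difference velocity from positions

v_ss, tau = estimate_first_order(t, v)
print(f"steady-state velocity ~ {v_ss:.3f}, time constant ~ {tau:.3f} s")
```

Working from positions mirrors what video tracking provides. With real footage, smoothing the positions before differentiating is usually needed because frame-to-frame noise is amplified by the derivative.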
Exploring the Relationships between Land Use, Deer Population, and Lyme Cases in Four U.S. States
This module allows students to explore data on Lyme disease cases, deer population, land use, and other environmental factors in four states (Connecticut, Maryland, New Hampshire, and Massachusetts) using a variety of data analysis techniques. Six Canvas quizzes, consisting mainly of short-answer questions with a few multiple-choice questions, guide students through a Google Colab application.
Using the Wind Power Equations to Site a Wind Farm
This module allows students to work with the wind power equations and explore other factors in the siting of a wind farm. Students complete three block assignments in Google Colab, beginning with the wind power equations and culminating in the considerations involved in siting a wind farm.
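The module's own equations and notebooks are not reproduced here. The sketch below uses the standard wind power relation, P = 0.5 * rho * A * v^3 * Cp, where rho is air density, A is the rotor's swept area, v is wind speed, and Cp is the power coefficient. Cp cannot exceed the Betz limit of 16/27. As one example of a siting consideration, it also extrapolates measured wind speed to hub height with the empirical power law. The default Cp = 0.4, the shear exponent alpha = 1/7, and all example heights and speeds are assumptions chosen for illustration.

```python
import math

AIR_DENSITY = 1.225       # kg/m^3, standard sea-level air density
BETZ_LIMIT = 16 / 27      # theoretical maximum power coefficient (~0.593)

def wind_power(v, rotor_diameter, cp=0.4, rho=AIR_DENSITY):
    """Power in watts: P = 0.5 * rho * A * v**3 * Cp.

    v: wind speed in m/s; rotor_diameter in m; cp: power coefficient (0.4 is an assumed default).
    """
    if not 0 <= cp <= BETZ_LIMIT:
        raise ValueError("cp must lie between 0 and the Betz limit (16/27)")
    area = math.pi * (rotor_diameter / 2) ** 2   # swept rotor area, m^2
    return 0.5 * rho * area * v ** 3 * cp

def speed_at_height(v_ref, h_ref, h, alpha=1 / 7):
    """Extrapolate wind speed with the power law v = v_ref * (h / h_ref)**alpha.

    alpha = 1/7 is a common rule of thumb for open terrain; real sites vary.
    """
    return v_ref * (h / h_ref) ** alpha

# Example: 100 m rotor, wind measured at 10 m, hub at 80 m (hypothetical values)
for v10 in (4.0, 6.0, 8.0):
    v_hub = speed_at_height(v10, 10, 80)
    power_mw = wind_power(v_hub, 100) / 1e6
    print(f"{v10:.0f} m/s at 10 m -> {v_hub:.2f} m/s at hub -> {power_mw:.2f} MW")
```

The cubic dependence on v is why siting matters: doubling the wind speed multiplies the available power by eight. Real turbines also have cut-in, rated, and cut-out speeds, which this formula ignores.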
Using Statistics and Supervised Machine Learning to Inform Airline Decision Making
This module reinforces underlying statistical concepts through the process of building a data analysis pipeline. In Part 1, students apply statistical concepts to understand the airline data. In Part 2, they use the data to implement machine learning models. The final deliverable is a slide deck in which students act as consultants for Phoenix Sky Harbor International Airport, using insights from a supervised machine learning analysis of the relationship between airline carrier delays and passengers per flight.
Using Footprint Data to Make Inferences about Historical Societies
In this module, students learn and apply the systematic steps that anthropologists may take to make deductive inferences about historical societies from fossil footprint records. Students first collect data on their own footprints using a sandbox built by DIFUSE, then analyze aggregated data from the entire class, and finally use their insights to make inferences about the social behavior of historical populations.
Quantifying Behavior Using Focal Bout and Instantaneous Scan Sampling
This course module is a two-step assignment in which students record the shots taken in a provided basketball game video using the two main data collection methods in primate behavior research: focal bout sampling and instantaneous scan sampling. The class data are then aggregated, and visual representations are created and discussed. The goal of the module is for students to understand the respective strengths and weaknesses of the two data collection methods.
Examining Air Quality Data in Germany
This module consists of six assignments in which students learn and then apply air quality dispersion modeling using an R-based programming module, with the help of the 'openair' package and open-source air quality datasets from cities in Germany.
Examining the Effect of Different Factors on Self-Rated Health in Texas Counties
This course module consists of four assignments in which students explore different categories of factors that could affect self-rated health in Texas counties. Students use linear regression and a heat map to explore these relationships.
Modeling the Glucose-Insulin System
This module consists of two assignments. The first guides students through modeling simple ODEs in MATLAB. The second, longer assignment guides them through modeling the glucose-insulin system in MATLAB using Euler's method. Students then explore this model by using the least squares method to optimize one parameter against a given data set.
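The module itself uses MATLAB and its glucose-insulin model is not given here, so the following Python sketch only illustrates the two techniques named: forward Euler integration and a one-parameter least squares fit. The model is a hypothetical simplification in which glucose G relaxes toward a basal level at rate k, dG/dt = -k(G - G_basal). It is not the module's glucose-insulin system. The basal level, initial value, and "measurements" are synthetic values invented for illustration, and k is fitted by a simple grid search over candidate values.

```python
import numpy as np

def euler(f, y0, t):
    """Forward Euler: y[i+1] = y[i] + h * f(t[i], y[i]) on the time grid t."""
    y = np.empty_like(t, dtype=float)
    y[0] = y0
    for i in range(len(t) - 1):
        h = t[i + 1] - t[i]
        y[i + 1] = y[i] + h * f(t[i], y[i])
    return y

G_BASAL = 90.0   # hypothetical basal glucose (mg/dL)
G0 = 180.0       # hypothetical glucose level right after a meal (mg/dL)

def simulate(k, t):
    """Hypothetical one-parameter model: glucose decays toward basal at rate k."""
    return euler(lambda _t, g: -k * (g - G_BASAL), G0, t)

def fit_k(t, observed, candidates):
    """Least squares: return the candidate k with the smallest sum of squared residuals."""
    sse = [np.sum((simulate(k, t) - observed) ** 2) for k in candidates]
    return candidates[int(np.argmin(sse))]

t = np.linspace(0, 180, 361)                 # minutes, step h = 0.5 min
true_k = 0.02                                # 1/min, used only to make synthetic data
rng = np.random.default_rng(0)
observed = simulate(true_k, t) + rng.normal(0, 2.0, t.size)  # synthetic noisy "measurements"

candidates = np.linspace(0.001, 0.1, 1000)
k_hat = fit_k(t, observed, candidates)
print(f"estimated k = {k_hat:.4f} per minute")
```

A grid search keeps the least squares idea visible. With a real model, a numerical optimizer (for example MATLAB's fminsearch) would typically replace it, and the Euler step size should be kept small relative to 1/k for stability.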
Examining the Racial, Environmental, and Economic Influences on COVID-19 Mortality in Louisiana
This course module is a web app accompanied by a short-answer assignment that guides students through it. Students use spatial data to visualize human-environment relationships and analyze those relationships through plotting and linear regression.
Statistics in R
This course module consists of Jupyter notebooks that introduce students to basic R commands and procedures while tying in key statistical content. It aims to build competence in R among novice students while still challenging experienced ones.
Stars and the Milky Way
This course module is a series of group exercises and one problem set designed to introduce students to the way astrophysicists manipulate data and perform analyses in Python, with an emphasis on data visualization and plot interpretation.
Exploring the Eddy Covariance Method
The purpose of this lab is to introduce students to the basics of the eddy covariance method, to have them explore raw measurement data to observe patterns across seasons and times of day, and to help them discover meaningful relationships between variables important to the ecosystem.
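At its core, the eddy covariance method estimates the turbulent flux of a quantity as the covariance between vertical wind speed w and that quantity's concentration c over an averaging period. The usual averaging period is 30 minutes. The sketch below applies this Reynolds-decomposition idea, F = mean(w'c'), to a synthetic 10 Hz record. The function names, sampling rate, and data are hypothetical and do not come from the lab's dataset. The 400-unit scalar is merely CO2-like.

```python
import numpy as np

def eddy_flux(w, c):
    """Turbulent flux over one averaging period as the covariance mean(w' * c').

    Reynolds decomposition splits each signal into a mean and a fluctuation:
    w = mean(w) + w', c = mean(c) + c'.
    """
    w_prime = w - w.mean()
    c_prime = c - c.mean()
    return np.mean(w_prime * c_prime)

def block_fluxes(w, c, samples_per_block):
    """Split a record into consecutive averaging blocks and return one flux per block."""
    n_blocks = len(w) // samples_per_block
    return np.array([
        eddy_flux(w[i * samples_per_block:(i + 1) * samples_per_block],
                  c[i * samples_per_block:(i + 1) * samples_per_block])
        for i in range(n_blocks)
    ])

# Synthetic 10 Hz record, one hour long (hypothetical values, not real tower data)
rng = np.random.default_rng(42)
hz, seconds = 10, 3600
w = rng.normal(0.0, 0.3, hz * seconds)                  # vertical wind speed, m/s
c = 400.0 - 5.0 * w + rng.normal(0.0, 1.0, w.size)      # scalar that is lower in updrafts
fluxes = block_fluxes(w, c, samples_per_block=hz * 1800)  # 30-minute blocks
print(fluxes)  # negative values indicate net downward transport (uptake) of the scalar
```

Operational eddy covariance processing adds steps omitted here, such as despiking, coordinate rotation, and density (WPL) corrections. Computing one flux per block is what makes diurnal and seasonal patterns visible when the blocks are plotted over time.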
Environmental Change
Through this project, students measure environmental change and work with public datasets on temperature and insolation. The project is appropriate for introductory environmental science and earth science courses, as well as other climate-related courses.
Climate Extremes in a Warming Planet
The problem sets were designed to introduce students to important concepts and applications in Python and to connect them to the lecture content. To keep the problem sets simple and avoid overwhelming students, the material was divided into five separate, shorter assignments. The contents of the problem sets are outlined below to indicate after which lecture each should be introduced.
Data Science in Psychology
This course module is designed to give students a high-level view of what data science in psychology is like. We want them to see how real-world data are collected and how those data are translated into hypotheses we can test through experiments.
Differential Equations
This module aims to ensure that students appreciate the need to understand domain context (Math 23 concepts) when deriving ODE solutions.