online course: intro to stats with R

This summer, Felicia Zhang and I are developing an online course with the Princeton McGraw Center for Teaching and Learning. Below is an overview of the course, which will be accessible in fall 2018:

Two of the biggest challenges for undergraduates in psychology are understanding key concepts in statistics and applying those concepts to analyze data and interpret findings. These challenges make it difficult for students to follow material in lectures and labs. They also make it difficult for instructors to help, because students who feel defeated become unwilling to engage with the course. Therefore, we are designing an online course to introduce statistics and R programming. Integrating the two in one course is ideal for learning: statistics is essential for students to understand research more broadly, and R programming is an important tool for students to engage with research directly. For example, a psychology student must interpret statistical results from prior experiments as well as analyze their own data for a senior thesis. In sum, our course is designed to help students with minimal prior experience understand key concepts in statistics and apply those concepts to realistic problems in psychology research.

The course will include 6 modules (i.e., getting started with statistics; getting started with R; descriptive statistics; correlation; one-sample t-test and binomial test; two-sample t-test), and each module will have the same basic structure.

The first portion helps students to understand key concepts. Students will watch narrated text, live drawings, or videos, and then complete multiple choice questions to assess their understanding.

The next portion helps students to apply key concepts via R programming. First, we will pose realistic psychology research questions (e.g., Do toddlers who hear more language from caregivers tend to have larger vocabularies?). Students will observe how we answer each question with the appropriate statistical test (e.g., correlation) and R syntax (e.g., cor.test(data$language, data$vocabulary)). Next, we will pose new, similar questions (e.g., Do toddlers who have more books at home tend to hear more language from caregivers?) and students will attempt to answer them. To assess their understanding, students will input their answers and receive feedback.

The last portion helps students to review key concepts, again through narrated text, live drawings, or videos. Finally, students will complete a module quiz to assess their overall understanding.
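To give a feel for the correlation example, here is a minimal sketch in R. The data are simulated purely for illustration (the sample size, means, and slope are made-up assumptions, not course data); only the column names and the cor.test() call come from the description above.

```r
# Simulated, hypothetical data: NOT real toddler data.
set.seed(1)
n <- 50
language <- rnorm(n, mean = 1000, sd = 200)             # words heard (made up)
vocabulary <- 50 + 0.1 * language + rnorm(n, sd = 20)   # vocabulary size (made up)
data <- data.frame(language, vocabulary)

# Pearson correlation test between the two variables
result <- cor.test(data$language, data$vocabulary)

result$estimate   # the correlation coefficient, r
result$p.value    # the p-value for the null hypothesis that r = 0
```

The same call with different columns (e.g., a hypothetical data$books column) would address the follow-up question about books at home.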

After participating in our course, students will have fundamental knowledge of statistics and R programming. Although both are extremely important for students in psychology, students need more resources to understand key concepts in statistics and to apply those concepts to real research (e.g., their senior thesis). Our course will provide these resources, with two notable strengths: First, unlike other online R programming courses, we will use realistic, psychology-specific examples. This design enables direct connections between what students learn in lecture, in lab, and in our online course. Second, although our course is tailored to the needs of psychology students, basic knowledge of statistics and R programming is applicable to a growing number of fields (e.g., sociology, politics). In sum, our online course will support learning among undergraduates in psychology and could have wide-reaching impact among undergraduates in the sciences more broadly.

princeton R workshop

Almost as soon as I finished the NIRS workshop in Rochester, I was back on the road! This past week, I was in Princeton for a statistics workshop.

The invited speaker was Dr. Stefan Th. Gries from UC Santa Barbara. He’s a corpus linguist, so thankfully most of his examples were somewhat familiar to me. Variables like “givenness of the subject” can be a little tricky if you haven’t seen them before! He also came prepared with all the R code we’d need and several sample datasets.

During the workshop, we used R to explore the sample datasets with models and plots. The main focus was the mixed-effects model selection process. This has always been confusing for me, and I sincerely appreciated the clarity of the overview. It was a great confidence-booster!

Here’s Dr. Gries’ recommended strategy:

  • Formulate a model with the most complete fixed effects and random effects structure, using REML.
  • Run the model selection process for the random effects (i.e., simplify the random effects structure).
  • Refit that model using ML.
  • Run the model selection process for the fixed effects (i.e., simplify the fixed effects structure).
  • Refit the final model with REML again.
  • Interpret the output of the final model, and run diagnostics (e.g., categorization accuracy).
  • Plot fixed and random effects.
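For my own notes, here is a rough sketch of how those steps might look in R. This is not Dr. Gries’ code: I’m assuming the lme4 package and its built-in sleepstudy dataset (neither came from the workshop), and the candidate models are just the simplest ones I could think of. The model names (m_max, m_int, and so on) are my own labels. Because the outcome here is continuous, the diagnostic is a residual plot rather than categorization accuracy.

```r
library(lme4)     # assumed package for mixed-effects models
library(lattice)  # for dotplot() of the random effects

# 1. Most complete fixed and random effects structure, fitted with REML
m_max <- lmer(Reaction ~ Days + (1 + Days | Subject),
              data = sleepstudy, REML = TRUE)

# 2. Model selection for the random effects: compare against a simpler
#    structure (random intercepts only), keeping the REML fits
m_int <- lmer(Reaction ~ Days + (1 | Subject),
              data = sleepstudy, REML = TRUE)
print(anova(m_int, m_max, refit = FALSE))
# For this sketch, assume the comparison favours keeping the random slopes.

# 3. Refit the chosen model using ML
m_ml <- lmer(Reaction ~ Days + (1 + Days | Subject),
             data = sleepstudy, REML = FALSE)

# 4. Model selection for the fixed effects: does dropping Days hurt the fit?
m_ml_null <- lmer(Reaction ~ 1 + (1 + Days | Subject),
                  data = sleepstudy, REML = FALSE)
print(anova(m_ml_null, m_ml))
# For this sketch, assume Days stays in the model.

# 5. Refit the final model with REML again
m_final <- lmer(Reaction ~ Days + (1 + Days | Subject),
                data = sleepstudy, REML = TRUE)

# 6. Interpret the output and run diagnostics (residuals vs. fitted values)
print(summary(m_final))
plot(fitted(m_final), resid(m_final),
     xlab = "Fitted values", ylab = "Residuals")

# 7. Inspect the fixed effects and plot the random effects
print(fixef(m_final))
print(dotplot(ranef(m_final, condVar = TRUE)))
```

With real data, the choices in steps 2 and 4 should follow from the model comparisons rather than being assumed up front, and there would usually be more than two candidate models at each stage.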

And, of course, here are some photos from the trip!

[Photos: IMG_6983, IMG_7010, IMG_7014, IMG_7032]