Thursday, November 8, 4:00 - 5:00 p.m.,
S107 Pappajohn Business Building

Hosted by the DeLTA Center. TLC credit: traditional, REACH

Presenter: Neil Heffernan - Professor, Department of Computer Science; Director, Learning Science & Technologies Graduate Program, Worcester Polytechnic Institute
Title: A Brief History on the Last 20 Years of Educational Data Mining: A Personal Perspective


Personalizing education has emerged as a popular topic in recent years. Mark Zuckerberg's foundation, for instance, is devoting large sums of money toward incorporating personalized instruction and pacing into the educational experience. Intuitively, if a human or computer instructor thinks that a student knows a particular topic, then the student should be allowed to progress to new topics. The question then becomes how this estimation of knowledge is made. How can researchers, developers, and administrators operationalize student knowledge (i.e., the ‘if a student knows a particular topic’ aspect of the problem)? The field of educational data mining (EDM), including a large society of over 200 scientists, has focused on this problem for the last 20+ years. The field now has a thriving journal and an annual conference that has just met for its 10th consecutive year. In 2010, the KDD Cup Competition raised a great deal of interest in predicting student knowledge, as individuals and teams competed for prizes awarded to the algorithms that could best predict what is now termed “Next Problem Correctness” (NPC): given a student's history of problem correctness over time, the task is to predict whether the student will answer the next problem correctly. In the past decade, a large number of papers have explored how best to make this prediction using a variety of techniques including Random Forests, Bayesian Networks, Logistic Regression-based techniques, and several others. Some of these approaches have even attempted to personalize the predictions, using techniques such as clustering or learning individualized latent attributes such as prior knowledge and learning rates. More recently, Deep Learning techniques have been applied to the task as well. In this talk I will give a personal history of this research area, in which I have remained actively involved.
I will talk about what we have learned as a field, and I will also discuss what I believe the new agenda needs to be, as predicting NPC has serious limitations and has waned in its usefulness to the field. In particular, I suggest that the EDM community needs to focus on how to act to benefit students in a personalized manner; if ‘personalization’ means anything, a system should be able to decide what to give each student (i.e., what assistance to provide) to most benefit that student. Toward achieving this goal, we, as a field, need to run Randomized Controlled Trials (RCTs) that compare different instructional messages and analyze them with the latest state-of-the-art techniques to determine which types of instruction work best for each individual student. I will talk about a recent attempt my graduate student and I have made to apply Deep Learning to predict the results of 22 experiments conducted inside ASSISTments, a web-based tutoring system used by 50,000 kids to do their homework. Lastly, I will end with a shared conversation about how we are further using reinforcement learning (i.e., bandit algorithms) to attack these problems. While accurate predictions are presumably an important step toward acting on students' behalf, I will posit that if our field is going to achieve its lofty goals, we will need not just to PREDICT but to take ACTION.