Online learning, after all, is just a form of learning: time spent studying is one of the best predictors of success.
Both the pattern and the exceptions can be seen quite clearly in the Open University Learning Analytics dataset, which collects anonymized data about the personal characteristics and, crucially, the interactions with the Open University's Virtual Learning Environment (as counts of clicks by date) of 32,593 students registered in 22 courses; see the linked entry in Nature for a detailed description of the data set. For this quick exploratory analysis I chose to focus on students who either passed or failed their courses, ignoring those who withdrew along the way; withdrawal is a very frequent outcome in this kind of setting (31% of cases in the data set), but one that merits a separate analysis.
Of those students that completed the course, 68.6% passed it (13.5% of them with Distinction), and 31.4% failed. To what degree was this a matter of sheer effort?
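For readers who want to follow along, here's a minimal sketch of that first cut. It assumes the standard OULAD file layout (a studentInfo.csv table whose final_result column takes the values 'Pass', 'Distinction', 'Fail' and 'Withdrawn'); the file and column names are assumptions of the sketch, not something stated in the text:

```python
import pandas as pd

# Assumed OULAD layout: studentInfo.csv, with final_result in
# {'Pass', 'Distinction', 'Fail', 'Withdrawn'}.
info = pd.read_csv("studentInfo.csv")

# Keep only students who completed the course, i.e. passed or failed.
completed = info[info["final_result"] != "Withdrawn"].copy()
completed["passed"] = completed["final_result"].isin(["Pass", "Distinction"])

print(completed["passed"].mean())  # share of completers who passed

# Share of passers who did so with Distinction.
passers = completed[completed["passed"]]
print((passers["final_result"] == "Distinction").mean())
```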
Here the data supports what teachers and parents always say. Only about a third of the students who interacted with the learning platform on between 10 and 23 days (the second decile of activity) passed the course, while 94% of those who did so on between 120 and 155 days (the ninth decile) did. This is perhaps an obvious effect, but it's noteworthy that even among the highest deciles of activity, more activity leads to a better result: moving from the eighth to the ninth decile of activity (from, say, 110 days of activity to 140) raises the probability of passing the course by an extra five percent.
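The decile comparison can be reproduced along these lines, again assuming the published OULAD schema: a studentVle.csv table with one row per student, material, and day, keyed by code_module, code_presentation, and id_student. These names are assumptions of the sketch:

```python
# Days of activity per registration: count distinct dates with any clicks.
vle = pd.read_csv("studentVle.csv")
keys = ["code_module", "code_presentation", "id_student"]
days = (vle.groupby(keys)["date"]
           .nunique()
           .rename("active_days")
           .reset_index())

# Attach days of activity to the completed registrations from above.
df = completed.merge(days, on=keys, how="left").fillna({"active_days": 0})

# Pass rate per decile of activity, mirroring the comparison in the text.
df["decile"] = pd.qcut(df["active_days"], 10, labels=False, duplicates="drop") + 1
print(df.groupby("decile")["passed"].mean())
```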
There are things we can say about the probability of somebody passing the course before it begins. Most significantly, the probability of passing the course among students who finish it grows strongly with the student's previously achieved educational level (note that this data refers to the UK educational system).
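Using the same (assumed) column names, the comparison by prior education is a one-liner on top of the frame built above:

```python
# Pass rate among completers, by prior educational attainment.
print(completed.groupby("highest_education")["passed"]
               .agg(["mean", "size"])
               .sort_values("mean"))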
There's nothing mysterious about the mechanics of it. By and large, better-educated students interact more often with the platform, and the extra days explain much of the variability in outcome.
This is the point where reading the data becomes tricky, and domain experience and a healthy dose of skepticism become useful. There's both a correlation and a reasonable mechanism of influence between studying more days and getting a better outcome, which (as a hypothesis to guide interventions) suggests we should attempt to get students to interact with the platform more often. But understanding why they don't already do so on their own is critical to understanding what would help, and that's not necessarily obvious from this data. For example, one possibility is that students simply underestimate how many days of study they'll need in order to have a reasonable chance of passing the course; if that's the case, then explicit, dynamic guidance on this could be of use (including something like a regular, model-based Estimated Probability of Passing alert).
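To make the idea concrete, here is a deliberately toy sketch of what such an estimate could look like. It fits on end-of-course day counts rather than on the mid-course data a real alert would need, and the single-feature logistic model is purely illustrative:

```python
from sklearn.linear_model import LogisticRegression

# Toy "Estimated Probability of Passing": P(pass) as a function of days
# of activity. A real alert would use data observed up to the current date.
X = df[["active_days"]].to_numpy()
y = df["passed"].astype(int).to_numpy()

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[30], [80], [140]])[:, 1])  # estimated P(pass) at 30/80/140 days
```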
On the other hand, the data does suggest that more exogenous constraints probably play a role. To its credit (and this is something every educational system should attempt to replicate), this Open University data set also includes socio-economic information in the form of the student's approximate Index of Multiple Deprivation, a statistical proxy (based on a ranking comparison between places in England) for issues like crime prevalence, unemployment, education, and income in the place where the student lived during the course.
This index is correlated with the outcome of the course, as would be expected (a higher IMD band indicates a more favorable socio-economic context):
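For reference, the numbers behind that chart reduce to a simple group-by (again assuming the imd_band column name from studentInfo.csv):

```python
# Pass rate by IMD band of the student's home area; as noted above,
# a higher band indicates a more favorable socio-economic context.
print(df.groupby("imd_band")["passed"].mean())
```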
But also with, and arguably through, the number of days students interact with the platform:
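And the same grouping applied to days of activity rather than to outcomes:

```python
# Median days of activity by IMD band: if socio-economic context shapes
# outcomes partly through study time, it should show up here as well.
print(df.groupby("imd_band")["active_days"].median())
```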
So there are factors at play which could be cultural, but might just as well be (and we could easily imagine are) related to constraints on the resources, time, energy, support networks, etc., of students living in more deprived areas. If, or to the degree that, the latter is the cause, "gamification" features like the one described above would at best be useless and at worst a mockery. The point of data-driven analysis is to be able to determine what's going on, in order to guide our intuition about what could help; this data set suggests possibilities, but that's as far as we can get with it.
Of course, in this post we're just playing at reinventing the wheel, and poorly at that. Education experts are deeply familiar with everything we've discussed so far, from the impact of study time on outcomes to the effect of socioeconomic constraints. The point isn't that we have found anything new, but rather to show how already-known things surface very quickly and obviously whenever data is gathered in a sufficiently comprehensive and open way, and the possibilities for personalized diagnostics and scalable assistance that this offers as a way of helping educational systems.
On the topic of things already well known: we've seen that putting in days of interaction with the platform improves students' chances of passing the course, and that better-educated students have a higher a priori chance of doing so. Is the increased time the whole story? In other words, do higher educational achievements, besides being correlated with exogenous and endogenous factors related to being able to study more, also enable students to study better? Do students with different educational backgrounds get different amounts of value from a day of interacting with the system?
The data set only offers indirect clues to this, but as far as we can see, this is true pretty consistently. For each intensity of interaction with the platform, students with a higher level of education will, generally speaking, do better (click on the graph for a larger version):
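The table behind that graph can be approximated with a pivot over the two (assumed) columns used above:

```python
# Pass rate by activity decile and prior education: within each level of
# platform use, compare outcomes across educational backgrounds.
pivot = df.pivot_table(index="decile",
                       columns="highest_education",
                       values="passed",
                       aggfunc="mean")
print(pivot.round(2))
```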
Of course, this data can't tell us the details of how this happens; "better study habits" can include anything from a larger store of previous knowledge to draw relationships from, to a better physical environment in which to study. The often large correlations between different factors are part of what makes research in the social sciences both difficult and important. But we see there's a difference, which means there's also potential for improved outcomes.
Online learning isn't, in many ways, a radical departure from traditional education: we can see how the traditional issues of socio-economic context, educational history, and effort continue to play the roles they always have. However, the increased legibility of the online process, and the enormous flexibility it offers for interventions and experiments, make it not just a powerful teaching mechanism on its own, but also a tool to help us understand and improve learning in general.