Chaos, mass protests and resignations. What we can learn about using algorithms from the UK’s diabolical student exam grading

On the morning of 13th August, as generations before them have done, British students opened envelopes and checked emails with eager anticipation to find out their A-level results (the school leaver exams). These results have a huge impact on what students do next: universities make offers based on predicted grades, on the condition that students actually achieve them.

Unlike generations past, these students never sat their exams because of the COVID-19 lockdown. Instead, the UK government decided to base results on teachers’ predictions and statistical modeling. Teachers were asked to predict the grades they thought pupils would have achieved in their exams, based on homework assignments and practice exams, and to rank each student against the others within the school.

This year’s results day was a complete mess. Around 40% of teachers’ predicted grades, equating to around 200,000 results, were downgraded, leading to uproar amongst students, schools and universities. Universities didn’t know how to respond to students who had missed their conditional offers, while hundreds of students and schools protested in Parliament Square and outside the Department for Education, burning their exam results amidst the uncertainty.

Even worse, the results were mired in inequality and set back social mobility. Students from disadvantaged backgrounds were disproportionately affected: private schools saw a significant jump in A grades, while state-funded institutions such as further education colleges saw a more modest rise. Around 85% of students from the poorest households were predicted a C or above by their teachers, but that figure fell by over 10% under the new moderation process, whereas the figure for students from the wealthiest families dropped by only 8%. Students in the North of England also saw the lowest increases in A grades, whereas the richer South East and London saw larger increases.

The UK’s Office of Qualifications and Examinations Regulation (Ofqual) initially defended its position and insisted that students could appeal if their grade seemed unfair. However, after heaps of public pressure, the government reversed its decision and awarded grades based on teachers’ predictions.

So how did a “mutant algorithm”, as described by Prime Minister Boris Johnson, get deployed to determine a generation of British students’ academic futures and job prospects?

Having worked in tech, I’m fully aware that algorithms and statistical modeling are everywhere. From social media to car insurance to recommendations of what show to watch, so many things that we use today are powered by statistical modeling and extrapolation. In developing this algorithm, however, plenty of mistakes were made along the way that could easily have been avoided by learning from industry best practices.

Did we even need an algorithm?

Back in March, the exams regulator, Ofqual, advised the government to hold socially-distanced exams, delay them until the peak of the virus had passed, or use some form of teacher- and school-assessed certificates. Its least favourable option was to combine school-based assessments with a statistical model to grade students. Ofqual warned that these predictions could cause “widespread dissatisfaction”.

An algorithm with a bad set of instructions

An algorithm is simply a set of instructions that acts on certain inputs. This algorithm’s job was to generate exam results that, at a national level, were on average similar to results in previous years. The Department for Education didn’t want teachers’ predictions to inflate students’ grades, so it used other inputs such as historic school performance, students’ prior exam attainment (16+ exams) and practice exam results. These inputs, however, were deeply flawed and baked in inequality from the start. For example, historic school performance biased results against students from deprived schools, which are less well funded than more affluent schools. Students’ prior exam results (GCSEs) from two to three years back were considered even though the curricula were largely unrelated. Lastly, not all schools hold practice exams, nor do they all collect that data centrally for their own students. The outcome was a very biased set of results based on unfair inputs. High-performing students at underperforming schools were lost in the statistics, while average students at better schools were given more leeway. It was a clear reflection of the bias in the UK’s school system.
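To make the point about high performers being “lost in the statistics” concrete, here is a minimal Python sketch. It is not Ofqual’s actual model; it is a deliberately simplified illustration that assumes grades are handed out purely by a student’s rank within the school and the school’s historic grade shares. The function name, student names and percentages are all hypothetical.

```python
# Illustrative sketch only: NOT Ofqual's actual model.
# Assumption: grades are allocated purely from (a) the teacher's ranking of
# students within a school and (b) that school's historic grade shares.

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst


def allocate_by_rank(ranked_students, historic_share_pct):
    """Assign grades so the school's results mirror its historic shares.

    ranked_students: names ordered best first (the teacher's ranking).
    historic_share_pct: grade -> whole-number percentage of past students
    at this school who achieved that grade (hypothetical data).
    """
    n = len(ranked_students)
    results = {}
    for position, student in enumerate(ranked_students):
        # Midpoint percentile of this student within the school's ranking.
        percentile = (position + 0.5) / n * 100
        cumulative = 0
        for grade in GRADES:
            cumulative += historic_share_pct.get(grade, 0)
            if percentile <= cumulative:
                results[student] = grade
                break
        else:
            # Shares summed to less than 100: fall back to the lowest grade.
            results[student] = GRADES[-1]
    return results


# A hypothetical school that has never produced an A* or an A.
school_history_pct = {"B": 20, "C": 50, "D": 30}
ranking = ["Asha", "Ben", "Chloe", "Dev"]  # Asha is the top-ranked student

awarded = allocate_by_rank(ranking, school_history_pct)
print(awarded)  # {'Asha': 'B', 'Ben': 'C', 'Chloe': 'C', 'Dev': 'D'}
```

Under these assumptions, the top-ranked student cannot be awarded more than a B, however strong her own work or her teacher’s prediction, simply because nobody at her school achieved a higher grade in the past.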

Confusion without communication

The silence from the government after results were shared with students was deafening. After the initial results were published on 13th August, the Prime Minister and education minister insisted that students had been awarded a “robust set of grades” and that students who were unhappy could appeal. On 15th August, England’s exam regulator withdrew its guidance on the appeals process just hours after posting it. Finally, on 17th August, the government ditched the algorithm and reverted to the grades predicted by teachers. Throughout the whole process, from as early as the cancellation of the exams in mid-March, there was minimal communication about how results would be determined. During this stressful week, when students were meant to have secured university places and sealed job prospects, they faced mind-numbing levels of anxiety and confusion, all amidst a global pandemic.

Learn from others and work with the experts

If England’s education ministry wanted a preview of how things might shake out, it was right in front of them. Two weeks prior, there was outrage in Scotland caused by an algorithm that downgraded approximately 25% of grades. The figures also showed that students from deprived backgrounds were more likely to have their scores downgraded. The Scottish government was forced to reverse its decision and lean on teachers’ predicted grades. England could’ve seen it coming! In Germany, students took tests in spaces where they could be kept at a suitable distance, while in Italy, students were asked to take oral exams rather than written tests. In France, students were awarded an average grade based on their performance in the first two terms of the school year. The UK government had plenty of countries to learn from that had delivered simple, secure and fair solutions.

The government even refused the distinguished Royal Statistical Society’s offer of advice. After the society objected to the confidentiality agreement it was asked to sign, it didn’t receive a response to its concerns. I’m sure the Department for Education would have found expert advice helpful given the complexity of the work and the serious risks of putting a generation’s future in the hands of an algorithm.

As algorithms are applied to more diverse use cases, and by entities that don’t traditionally have statistical expertise, there are important steps these entities can take and best practices they can apply. They should be fully aware of the ethical ramifications that these algorithms can have. Google has plans to launch an AI ethics service to advise institutions on catching the kinds of mistakes that have been made in the past. Ideally, this expertise can be developed by a council of public sector and academic experts that provides guidance to these entities on developing algorithms, so we can avoid chaos, mass protests and resignations caused by a “mutant algorithm”.

Former Product Manager @Google. Worked on 3 different continents across lots of different products. Former Co-founder @Luna
