This is a grave new world of assessment. In my post yesterday I referenced, via an article in Schools Week, the assessment company No More Marking [a ruse of a title if I have ever heard one] and, as is so often the case with me, it has rankled my sensibilities about teaching, learning and assessment.
There are two things I want to quote from the No More Marking website for any interested readers to check out and judge for yourselves [I think you are entitled to judge for yourself, without access to a graph/matrix/model….].
The first, which I found mildly amusing, is their ‘Colours Test Demo’, here, which is meant to prove the hypothesis as follows [and quoted in yesterday’s posting]:
Marking does not work when it involves any degree of human judgement. This is due to a simple principle.
“There is no absolute judgment. All judgments are comparisons of one thing with another”. (Human Judgment: The Eye of the Beholder by Donald Laming, p.9).
I can confirm that when I completed the demo I was unable to retain the information needed to make the ‘correct’ judgement about the sequence of colours. I am flummoxed by how this relates to, let alone proves, the claim that I cannot effectively compare and comment on a range of writing. With 30 years of human teaching experience I feel I have the expertise to do so [accepting there will be variations in aesthetic appeal/expectation – what makes writing, especially creative, what it is in its infinite variety] and by extrapolation I reckon that if I had worked with that sequence of coloured squares for 30 years I would be able to sequence them precisely as originally sequenced.
Second, and to leave readers with, is the following. On the one hand, in not understanding this I could just be hugely out of my comfort zone in not comprehending the mathematics/statistics of it all [and of course I am!]; on the other, it could just be totally ridiculous, an emperor’s new clothes of assessment gobbledygook that sums up its meaninglessness to me as a human English teacher in its meaninglessness to me as a human English teacher. I will of course be making a found poem out of this stuff:
Following a series of pairwise judgements we can establish a measurement scale using a statistical model. The most commonly used model is the Bradley Terry model (Hunter, 2004) which predicts the outcome from any comparison. The statistical model enables us to build a measurement scale without having to make all the possible pairwise comparisons that would otherwise be required.
The measurement scale that results from a CJ study has some powerful characteristics. The Bradley Terry model is algebraically equivalent to the Rasch model (Rasch, 1960), so the measurement scale shares the advantages of a Rasch measurement scale. The scale is linear, robust to missing data, has estimates of precision, detects misfit, and the parameters of the objects being measured can be separated from the measurement instrument being used.
A CJ scale can therefore be examined in terms of its reliability and consistency: a high value of reliability would suggest we could replicate the scale. The linear scale means that CJ studies can be anchored together using a sub-set of common items, which can be useful, for example, in measuring progress over time. Misfit to the model can be detected both for objects being measured and for the judges doing the measurement. An object may misfit if there is no consensus amongst judges over the quality of the object. A judge may misfit if their judgements are not consistent with the overall measurement scale. Misfit is useful in understanding the traits under consideration and the interactions between judges and the traits (Pollitt, 2012).
© No More Marking [colour chart and quoted sections] – https://www.nomoremarking.com/aboutcj
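For any reader who wants to see what the quoted passage is actually describing, here is a minimal sketch of fitting a Bradley-Terry model to a handful of pairwise judgements. It is my own illustration and not No More Marking's code: the function names `fit_bradley_terry` and `predict`, the scripts "A", "B" and "C", and the judgements themselves are all made up, and the fitting loop is a plain version of the iterative (MM) update usually associated with the Hunter (2004) reference.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200):
    """Estimate a 'strength' for each item from (winner, loser) pairs.

    Hypothetical helper for illustration only. It assumes every item
    wins at least once and that the comparisons link all items together.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)         # number of times each item was preferred
    pair_counts = defaultdict(int)  # number of times each pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        # Rescale so the strengths average 1; only their ratios matter.
        total = sum(updated.values())
        strength = {i: v * len(items) / total for i, v in updated.items()}
    return strength


def predict(strength, a, b):
    """Model's probability that item a is preferred to item b."""
    return strength[a] / (strength[a] + strength[b])


# Invented judgements: A vs B and B vs C are compared, A vs C never is.
judgements = (
    [("A", "B")] * 3 + [("B", "A")] * 1 +
    [("B", "C")] * 3 + [("C", "B")] * 1
)
strength = fit_bradley_terry(judgements)
print(sorted(strength, key=strength.get, reverse=True))
print(predict(strength, "A", "C"))
```

In this toy data nobody ever compares A with C directly, yet the model still offers a prediction for that pair (it should settle close to 0.9 here), which is the sense in which the scale is built "without having to make all the possible pairwise comparisons". Whether that tells you anything about a piece of writing is, of course, a separate question.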