Uncertainty Made Measurable
Rob Selzer sizes up a human confidence interval.
You, like most people, would probably prefer a doctor who’s competent. Perhaps one, I would imagine, who’s even good. One who has the skills to take a blood pressure measurement, diagnose a tumour, smile reassuringly when your self-diagnosed angina turns out to be simple indigestion. And like most people, you instantly know a good doctor when you see one. Measuring those doctoring skills, however, is harder than you think. It requires swarms of experts to be called into service. These experts assess future doctors to within an inch of their lives as the students journey from school uniform to white coat (although quite frankly, the last time I saw someone wearing a white coat, they were applying make-up in a department store).
I am one of those examiners. But something about the whole examination process has always bothered me. It's not the measurement of the students' knowledge; about that, I'm confident. Written examinations such as multiple choice questions give a fairly accurate gauge of a student's memory for facts – details such as which antibiotics to prescribe for pneumonia, or the ECG changes in a heart attack. Likewise, discrete skills such as listening to heart sounds or tapping out a knee reflex are easily measured because they are objective, circumscribed, and have clear outcomes.
What bugs me is the way that we experts measure complex skills used in messier, real-life situations, like taking a history of an illness from a distraught parent, or doing a physical exam on a frail pensioner. Such contexts are teeming with multiple, minute, but important two-way interactions, at the end of which an examiner is expected to give a grade.
Let's say, for example, that I'm marking a candidate on their skill in interviewing a patient with major depression, with possible scores from 1 to 10 and 5 being a pass. I give Prakash a 5, and I'm confident he deserves that mark. Now it's Mary's turn. She speaks so softly I can hardly hear what she says, and when I can hear her, I'm not entirely sure what her questions are getting at. For instance, she asks whether the patient still works, which could be an enquiry about (a) anhedonia (loss of enjoyment of activities, a symptom of depression); (b) whether the patient is financially secure; or (c) whether anxiety is interfering with their day-to-day activities, such as work. I can't tell which she means, and that makes her appear disorganized. She comes across as smart and caring, but there are also important questions she doesn't ask, such as whether the patient has a history of thyroid disease. Maybe she was just nervous, I tell myself. I give her a 5 – a pass – but in my mind she could be anywhere from a 4 to a 6.
Prakash gets a 5. Mary gets a 5. And that’s what gets recorded on their official mark sheet. Nowhere, though, is there any mention of my confidence in that mark.
Image: Uncertain Thinking © Rose de Castellane 2023
The Scale of the Problem
Why is this important? Well, let me ask you, would you buy a set of bathroom scales that were accurate to within 0.1 kg? Yes? But what if they were accurate to within 1 kilogram? What if they were only accurate to within 5 kilograms?
Each of these scales has a different confidence interval. This is a concept familiar to anyone who spends time with their nose in scientific journals. For the most accurate scales, the confidence interval is narrow: plus or minus 0.1kg around the weight they display. If they show a reading of 75.5kg, then you can be confident your weight lies somewhere between 75.4 and 75.6kg. But with the last set of scales, the confidence interval is a gigantic 5kg either side of the reading. If they read 75.5kg, your weight could be anywhere between 70.5 and 80.5kg – a huge margin! Let's face it, you wouldn't buy those scales even if they were on special offer.
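For readers who like to see the arithmetic spelled out, here is a minimal sketch that treats each scale's stated accuracy as a symmetric margin either side of the reading, as in the examples above. The function name `reading_range` is purely illustrative, not taken from any real library.

```python
def reading_range(reading_kg: float, margin_kg: float) -> tuple[float, float]:
    """Return the (low, high) range implied by a reading and a symmetric ± margin.

    Assumes the scale's accuracy is the same on both sides of the reading.
    """
    # Round to one decimal place to hide floating-point noise such as 75.39999999999999.
    return round(reading_kg - margin_kg, 1), round(reading_kg + margin_kg, 1)


# The three hypothetical scales from the text, each showing 75.5kg.
for margin in (0.1, 1.0, 5.0):
    low, high = reading_range(75.5, margin)
    print(f"±{margin} kg: {low} to {high} kg")
```

The reading is identical on every scale; only the width of the range changes, and that width is what tells you whether the scales are worth buying.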
All measuring devices, from bathroom scales to thermometers, and even some statistics, come with a confidence interval, which is often stated explicitly. Humans, too, have confidence intervals, but we tend to state them only when making informal measurements. For example, a friend asks me, "How many people were at the concert last night?" I reply, "Five or six hundred." Another example: it takes me "about twenty minutes to walk home, give or take a few minutes." We mortals do this sort of approximation all the time, because we recognize that we, as flesh-and-blood instruments, are very imprecise. We allow for a wide confidence interval – often even wider than that dodgy set of scales.
Circling back to examinations: exam markers are never asked to state their confidence interval. And yet, taking a measurement of a student’s complex performance is arguably much less precise than estimating how many people were at the concert last night or how long it takes me to walk home. Nonetheless, I, as an examiner, am not required to declare my confidence in my mark.
Weird, huh? In an informal situation where the stakes are low, we give it, or at least infer it, frequently. But when it comes to high stakes situations, where for instance it can mean the difference between someone becoming a doctor or not, we ignore it altogether. If examiners were bathroom scales, no one would buy us because purchasers would have no idea how precise or not we really are.
We can take some comfort from the fact that students do not become doctors, or fail to become doctors, on the basis of just one exam. They take many, many tests, and are assessed by legions of experts over many years of training. There are also statistical methods that can approximate the accuracy of a particular examination – but such blunt arithmetic methods do not take into account individual examiners’ confidence in their own marks.
Vote of Confidence
During a sabbatical, a colleague and I were pondering this very conundrum, and we came up with a novel idea: examiners could give their usual point score (in Mary’s case, a 5) and a confidence interval (in my mind hers would be 4 to 6). If the confidence interval crosses into the fail zone (less than 5, like Mary’s), the student wouldn’t fail, but would be re-examined – because the exam, not the student, had failed to perform adequately.
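The re-examination rule can be sketched as a small decision procedure. This is a minimal illustration, assuming the 1 to 10 scale with 5 as the pass mark from Mary's example; `ExaminerMark` and `outcome` are hypothetical names. The rule as described only covers intervals that dip into the fail zone, so treating a failing mark whose interval reaches up into the pass zone the same way (re-examine) is an added assumption.

```python
from dataclasses import dataclass

PASS_MARK = 5  # on the 1-10 scale from Mary's example, 5 or above is a pass


@dataclass
class ExaminerMark:
    """Hypothetical record: the usual point mark plus the examiner's confidence interval."""
    point: int
    low: int
    high: int

    def __post_init__(self) -> None:
        # A sanity check: the point mark should sit inside its own interval.
        if not (self.low <= self.point <= self.high):
            raise ValueError("the point mark must lie inside its confidence interval")


def outcome(mark: ExaminerMark) -> str:
    """Pass or fail only when the whole interval sits on one side of the pass mark."""
    if mark.low >= PASS_MARK:
        return "pass"
    if mark.high < PASS_MARK:
        return "fail"
    # The interval straddles the pass mark: the exam, not the student, was inconclusive.
    return "re-examine"


print(outcome(ExaminerMark(point=5, low=5, high=5)))  # Prakash -> pass
print(outcome(ExaminerMark(point=5, low=4, high=6)))  # Mary -> re-examine
```

Prakash's interval is taken here as 5 to 5 to reflect the examiner's confidence in his mark; Mary's 4 to 6 straddles the pass mark, so she would be re-examined rather than failed.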
We were very proud of ourselves for having this idea, but I soon finished my sabbatical and then left that academic position. Nonetheless, a human confidence interval became a lens through which I began to view how we mere mortals, the least precise of all measuring devices, might incorporate uncertainty into our decision-making.
This notion of a human confidence interval extends much further than the world of education. It could come into play any time an individual is asked to make an assessment of almost anything using their wits alone. Think about it. When was the last time you had to make an appraisal of something important? Perhaps it was, "Should the person I just interviewed get the job?" or, "Would Rishi Sunak (or Joe Biden, etc) make a good leader of the country?" I reckon you weren't 100% sure about either of those decisions (although I reserve the right to be wrong about the political question).
You might ask, but what difference would it make? Beyond the stuffy world of academic assessments, of what practical use is a human confidence interval?
Take something important, like an election. Traditionally, you would put an X on the ballot paper next to your preferred candidate. Come tally time, the candidate with the most Xs wins. Seems fair, no? But what would happen if you included an estimate of how confident you were in your vote? Not necessarily a confidence interval per se, but a simple level of confidence in your decision. For example, you vote for Jayashri, and you are 100% sure she would make an excellent MP (or Senator, etc). I vote for John, but mainly because he looks good on the posters; honestly, I'm only about 50% sure he'll be any good as a politician. In this election, in addition to our usual Xs, we would also write our approximate percentage confidence alongside them. Then, when all the votes are counted, John wins on the absolute number of votes, but – and here's the kicker – when each vote is multiplied by its confidence and the results are added up, Jayashri wins!
To break it down: John got 100,000 votes and Jayashri got 80,000, but the people who voted for Jayashri were much more certain of her capabilities than those who voted for John. The average confidence among John's voters was 70%; among Jayashri's, it was 90%. Thus, John gets 70% × 100,000 = 70,000 confidence-adjusted votes, while Jayashri gets 90% × 80,000 = 72,000. She wins by 2,000.
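The tally can be sketched in a few lines. This is a minimal illustration under stated assumptions: confidence is recorded as a whole-number percentage on each ballot, and the split of ballots below is invented purely so that the averages come out at the article's 70% and 90%; only those averages and the vote totals come from the example above. `BALLOTS` and `tally` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical ballots: (candidate, confidence in whole percent, number of such ballots).
# Invented split: John's voters average 70%, Jayashri's average 90%.
BALLOTS = [
    ("John", 50, 50_000),
    ("John", 90, 50_000),
    ("Jayashri", 100, 40_000),
    ("Jayashri", 80, 40_000),
]


def tally(ballots: list[tuple[str, int, int]]) -> tuple[dict[str, int], dict[str, float]]:
    """Return (raw vote counts, confidence-weighted votes) per candidate."""
    raw: dict[str, int] = defaultdict(int)
    weighted_pct: dict[str, int] = defaultdict(int)  # summed percentages, kept as integers
    for candidate, confidence_pct, count in ballots:
        raw[candidate] += count
        # Each ballot contributes its own confidence, not a full vote.
        weighted_pct[candidate] += confidence_pct * count
    # Dividing the summed percentages by 100 converts them into confidence-adjusted votes.
    weighted = {c: total / 100 for c, total in weighted_pct.items()}
    return dict(raw), weighted


raw, weighted = tally(BALLOTS)
print("Most votes:", max(raw, key=raw.get), raw)
print("Most confidence-adjusted votes:", max(weighted, key=weighted.get), weighted)
```

Adding up every ballot's individual confidence gives exactly the same total as multiplying the number of votes by the average confidence, which is why the shortcut with averages in the worked example is legitimate. Summing whole-number percentages before dividing by 100 also sidesteps floating-point rounding in the totals.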
This methodology takes into account the individual’s subjective level of certainty – and if I’m honest, I’ve never been 100% sure of any candidate I’ve voted for in any election, nor have any of my non-party-affiliated friends. Such a method is also simple, practicable and, most importantly, a truer reflection of our thoughts and beliefs.
Whether it's deciding which students should pass an exam, picking a talent contest winner, or electing a political representative, incorporating a measure of subjective uncertainty is an advance on traditional thinking about decision-making. It treats a decision less as a binary, yes/no outcome, and more as a nuanced process. Whether it is expressed as a confidence interval, a simple confidence percentage, or some other measure of subjective uncertainty, acknowledging our uncertainty gives a more accurate representation of our internal states of mind.
If we require the measuring devices we make to declare their level of accuracy, why would we not expect the same of ourselves? It is, after all, a more honest expression of the implicit fallibility of what it is to be human. And that, for me, is a quality I really appreciate in my doctor.
© Rob Selzer 2023
Rob Selzer is an adjunct associate professor at Monash University, and a psychiatrist at Alfred Health, Melbourne.