Failure, The Lollipop Fort of Death, Clinical AI and Healthcare

I have dismantled our Lollipop Fort of Death. For 12 years, this shack, balanced 5 feet in the air atop a single post, was a wobbly fixture of our backyard. I built it after seeing an instructional music video. It’s a tree-house alternative for yards that lack suitable trees.
The fort’s signs of physical failure were obvious. A portion of the roof had blown off, the siding was peeling in places, and the floor was spongy in ways that wood should not be. Usage metrics were way down, too. At launch, it was at capacity for games and sleepovers. Over the last 12 months, the cat was the only family member to spend any time inside the fort.
Because the signs of failure were obvious, and usage was low, I was able to take it out of service before there was a catastrophic failure—collapse with someone inside or underneath. Wood, nails, and screws are well-established and well-understood technology. We recognize the signs of deterioration, and the deterioration is incremental. One board breaking doesn’t cause the whole structure to fall.
A risk with novel technology is that signs of deterioration are not always readily apparent. A user’s first indication of deterioration may be by way of a catastrophic event. This was a concern in the early days of carbon mountain bike frames. Riders knew how to inspect alloy frames for dents, bends, and cracking welds. These indicators of impending failure in the legacy technology could not be directly applied to the new technology. Carbon frames, the rumor went, could accumulate stress in ways that weren’t obvious to the human eye. You wouldn’t know a frame was brittle until it came apart under you.
Clinical AI applications, like the Lollipop Fort of Death, will experience performance degradation. (Vela, D., Sharp, A., Zhang, R., et al., “Temporal quality degradation in AI models,” is a seminal article on the topic; Sahiner, B., et al., “Data drift in medical machine learning,” is a very approachable overview.) Like early carbon bike frames, the signs of degradation may be impossible to detect through the techniques we used for older solutions. This is especially true as we bring online clinical AI applications that go beyond automating human skills and are used to make inferences that a human cannot.
Furthermore, because this is healthcare, we need to be concerned about the welfare of each individual on whom the tool is used. 98% accuracy is great unless you are one of the 2% with inaccurate results. A gradual decay in overall performance can be a catastrophic failure on the individual level.
What does this mean for healthcare providers? Legacy quality control and IT governance procedures are suboptimal for clinical AI. The causes, signs, and patterns of clinical AI performance degradation are novel and may not be effectively or efficiently captured by legacy systems.
What does this mean for innovators entering clinical AI? Your customers are increasingly aware that clinical AI performance degrades and needs to be monitored. So are regulators. Both will want to know about your post-market surveillance tools and your predetermined change control plan. Sales needs to be able to explain the tools you have in place for detecting localized performance degradation. Customers may also ask how you determine whether the drift is due to a change in the underlying population or in the data collection method.
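Answering the population-vs-collection question starts with detecting a distribution shift at all. As a hedged illustration (not any vendor’s actual tooling, and the function name and thresholds here are my own assumptions), one widely used drift statistic is the Population Stability Index, computed on a single model input: a baseline sample captured at deployment versus a recent sample from production.

```python
import math
from typing import Sequence

def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index: how far `actual` has drifted from `expected`.

    A common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift.
    """
    lo, hi = min(expected), max(expected)

    def bin_fractions(sample: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each fraction so the log term below is always defined.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI spike tells you *that* an input has drifted; distinguishing a true population change from a change in data collection (a new scanner, a new lab assay, a remapped EHR field) still requires tracing the drifted feature back to its source.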
Does your team need additional bandwidth or expertise to address the issues raised by this article? Asher Orion Group, using tools developed by Asher Informatics, can help.