Human vs. Computer: Who Can Identify Carious Lesions More Accurately?

Pearl's founders - Cambron Carter (CTO), Dr. Kyle Stanley (CCO), Ophir Tanz (CEO) - review an intraoral scan marked by Pearl's AI platform

Sponsored Content

Artificial intelligence is rapidly revolutionizing the world around us, as driverless cars weave their way through traffic, computer speech becomes indistinguishable from human speech, and computer programs defeat world champions at complex games like chess and Go. Medical applications of computer vision have been especially surprising, with computer analyses of medical images, such as chest X-rays, equaling or surpassing the sensitivity and accuracy of experienced human clinicians. The powerful combination of computer vision and artificial intelligence is now moving into dentistry, scanning dental X-rays with the accuracy of a skilled clinician at hundreds of times the speed.

To ascertain whether parity between human and computer diagnostic abilities has really been achieved, Pearl conducted a pilot study comparing the performance of three experienced dental radiograph readers with a computer vision/machine learning (CV/ML) system for identifying caries. The human and digital analysts annotated a sample of more than 10,000 dental X-rays, scoring them for the presence or absence of caries. The study compared levels of diagnostic agreement and disagreement among the human readers, singly and in combination, and compared those with the computational results from the CV/ML system.

The results showed that the CV/ML system was better at predicting the existence of caries on the basis of radiographic images than the human readers. The superiority of the CV/ML system against human experts in this study echoes what has been shown in artificial intelligence-based diagnostic implementations across other medical fields. The study suggests that CV/ML systems can supplement the work of a dentist, both by pre-screening images and identifying suspect areas, and by providing a second opinion that is demonstrably more reliable than one from a human clinician. A CV/ML system, working in tandem with the human practitioner, can improve diagnostic accuracy, reduce costs by enhancing early detection, increase patient confidence in diagnoses, reduce liability exposure, and improve long-term outcomes – a win-win for patients and dentists alike.

Three experienced dental clinicians were asked to mark carious lesions in 10,617 radiographs, comprising 4,147 bitewings and 6,440 periapicals. The online annotation tool, created by Pearl, Inc. (West Hollywood, CA), let them outline each suspected lesion with a rectangular bounding box and classify it as "Enamel only," "Into dentin," or "Approaching or into pulp." (Note that the images used in this study were not among those used in the training or validation of the CV/ML system.)
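Because the study scores images at the image level, each clinician's box-level annotations have to be collapsed into a single yes/no call per image. The sketch below shows one plausible way to do this; the `Box` and `Severity` types, the `image_level_labels` helper, and the image IDs are hypothetical and are not part of Pearl's annotation tool.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # The three categories offered by the annotation tool.
    ENAMEL_ONLY = "Enamel only"
    INTO_DENTIN = "Into dentin"
    APPROACHING_OR_INTO_PULP = "Approaching or into pulp"


@dataclass(frozen=True)
class Box:
    # Hypothetical record of one rectangular annotation (pixel coordinates).
    image_id: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    severity: Severity


def image_level_labels(image_ids: list[str], boxes: list[Box]) -> dict[str, bool]:
    """Mark an image caries-positive if it carries at least one box of any severity."""
    flagged = {box.image_id for box in boxes}
    return {image_id: image_id in flagged for image_id in image_ids}


if __name__ == "__main__":
    images = ["bw_001", "bw_002", "pa_001"]
    annotations = [
        Box("bw_001", 40, 60, 90, 110, Severity.ENAMEL_ONLY),
        Box("bw_001", 200, 58, 240, 100, Severity.INTO_DENTIN),
        Box("pa_001", 15, 30, 70, 95, Severity.APPROACHING_OR_INTO_PULP),
    ]
    labels = image_level_labels(images, annotations)
    positives = sum(labels.values())
    print(labels)  # {'bw_001': True, 'bw_002': False, 'pa_001': True}
    print(f"{positives} of {len(images)} images caries-positive ({positives / len(images):.1%})")
```

Counting the positive images per clinician in this way yields per-reader totals of the kind reported in Table 1.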

Table 1: Number of caries-positive images identified by each of the three clinicians.

Clinician   Caries-positive images   % of images
C1          1,644                    15.9%
C2          783                      8.6%
C3          514                      5.3%

Performance metrics of human readers
The table below presents the percentage of images in which pairs of readers agreed that caries existed.

Clinician pair   Agree caries present (% of images)
C1-C2            7.2%
C2-C3            4.3%
C3-C1            5.2%

Table 2: Agreement between clinicians on the existence of carious lesions at the image level, i.e., both clinicians judged the image to contain at least one carious lesion.
These percentages are consistent with the expected incidence of caries in a large, unselected set of radiographs: we estimate that between 2% and 12% of bitewings and periapicals are caries-positive [1,2]. The relatively large gap between the pair of readers with the highest agreement (C1-C2, 7.2%) and the pair with the lowest (C2-C3, 4.3%) presumably reflects the difficulty of positively identifying a carious lesion in borderline cases.

The next table shows the converse: the level of agreement that no caries are present.

Clinician pair   Agree no caries (% of images)
C1-C2            79.6%
C2-C3            89.5%
C3-C1            79.5%

Table 3: Agreement between clinicians on the absence of carious lesions at the image level, i.e., both clinicians judged the image to contain no carious lesions.
Not surprisingly, pairs of clinicians found it easier to agree when no suspicion of a lesion was present.

The pie charts in Figure 3 show the level of disagreement within each pair. Notably, clinician 1 was more than twice as likely to disagree with either of the other clinicians as clinicians 2 and 3 were to disagree with each other.

Figure 3: Visualization of inter-clinician agreement and disagreement. (Top) C1 compared with C2. (Middle) C2 compared with C3. (Bottom) C3 compared with C1.
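The disagreement rates behind the pie charts can be recovered from Tables 2 and 3. The sketch below assumes that every image received an image-level call from both readers in a pair, so that "both say caries," "both say no caries," and "they disagree" add up to 100%; the helper name `disagreement` is ours.

```python
# Image-level agreement percentages copied from Tables 2 and 3.
AGREE_CARIES = {"C1-C2": 7.2, "C2-C3": 4.3, "C3-C1": 5.2}
AGREE_NO_CARIES = {"C1-C2": 79.6, "C2-C3": 89.5, "C3-C1": 79.5}


def disagreement(pair: str) -> float:
    """Share of images (%) on which the two readers gave different image-level calls."""
    return round(100.0 - AGREE_CARIES[pair] - AGREE_NO_CARIES[pair], 1)


if __name__ == "__main__":
    for pair in AGREE_CARIES:
        print(f"{pair}: {disagreement(pair):.1f}% disagreement")
    # Compare each pair involving C1 with the C2-C3 pair.
    baseline = disagreement("C2-C3")
    for pair in ("C1-C2", "C3-C1"):
        print(f"{pair} vs C2-C3: {disagreement(pair) / baseline:.2f}x")
```

Under that assumption the pairs disagree on 13.2% (C1-C2), 6.2% (C2-C3), and 15.3% (C3-C1) of images, i.e., about 2.13 and 2.47 times the C2-C3 rate.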


Figure 4: Visualization of agreement and disagreement between multiple annotators


Performance metrics of CV/ML system
Finally, the performance of a CV/ML system trained to analyze dental radiographs was benchmarked against the human clinicians, singly and in combination, in determining whether or not an image contained a carious lesion. The test consisted of several iterations; in each, the judgment of one human clinician was assumed to be correct, and the other humans and the computer were scored on their agreement with this assumed ground truth (GT). Accuracy was measured as the area under the receiver operating characteristic (ROC) curve, or AUC. ROC curves were normalized to the unit square so that the absolute yes/no judgments of human readers could be compared with the fractional confidence levels generated by the CV/ML system. The results are shown below in Table 4.

Ground truth   C1      C2      C3      CV/ML
GT-C1          X       0.684   0.627   0.810
GT-C2          0.846   X       0.734   0.850
GT-C3          0.862   0.844   X       0.880

Table 4: Image-level AUC of three clinicians and one computer vision model, with each of the three humans held in turn as ground truth. Each row is the clinician used as ground truth; each column is the predictor (another clinician or the CV/ML model). X marks a clinician compared with themselves.
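The study does not publish its AUC code, so the following is only a minimal sketch of how yes/no human calls and continuous model scores can share one metric. It assumes AUC is computed as the Mann-Whitney probability that a randomly chosen caries-positive image scores higher than a randomly chosen caries-negative one, with ties counted as half. Under that definition a yes/no reader, whose ROC curve is a single operating point joined to the corners of the unit square, gets an AUC equal to the mean of its sensitivity and specificity. The labels and scores below are invented for illustration.

```python
from itertools import product
from typing import Sequence


def auc(ground_truth: Sequence[bool], scores: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, g in zip(scores, ground_truth) if g]
    neg = [s for s, g in zip(scores, ground_truth) if not g]
    if not pos or not neg:
        raise ValueError("ground truth needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(ground_truth: Sequence[bool], calls: Sequence[bool]) -> tuple[float, float]:
    """Sensitivity and specificity of binary calls against binary ground truth."""
    pairs = list(zip(ground_truth, calls))
    tp = sum(1 for g, c in pairs if g and c)
    fn = sum(1 for g, c in pairs if g and not c)
    tn = sum(1 for g, c in pairs if not g and not c)
    fp = sum(1 for g, c in pairs if not g and c)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented example: 3 caries-positive and 7 caries-negative images.
    gt = [True, True, True, False, False, False, False, False, False, False]
    human_calls = [True, True, False, True, False, False, False, False, False, False]
    model_scores = [0.91, 0.72, 0.55, 0.60, 0.30, 0.20, 0.15, 0.10, 0.40, 0.05]

    sens, spec = sensitivity_specificity(gt, human_calls)
    human_auc = auc(gt, [float(c) for c in human_calls])
    print(f"human AUC {human_auc:.3f}, (sens + spec) / 2 {(sens + spec) / 2:.3f}")  # both 0.762
    print(f"model AUC {auc(gt, model_scores):.3f}")  # 0.952
```

The design choice here is that the binary reader is not penalized or rewarded for lacking a confidence score: its single operating point is simply scored by the same ranking rule as the model's full set of scores.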

One observation worth noting is that the comparison between two readers is not symmetric: the result depends on which reader is held as ground truth and which serves as the predictor. For example, Clinician 1 predicts with an AUC of 0.846 against Clinician 2 as ground truth, whereas in the inverse situation Clinician 2 predicts with an AUC of only 0.684. The asymmetry is due to the effect of class imbalance on the ROC paradigm. When positives are plentiful, each Type II error (false negative) is a small fraction of all positives and has little impact on the AUC. Bitewing and periapical radiographs, however, are far more likely to be caries-negative (healthy) than caries-positive, so each false negative removes a large share of the sensitivity and weighs heavily on the AUC. Consequently, clinicians with an affinity for sensitivity will generally outperform those with an affinity for specificity. As the Venn diagram in Figure 4 shows, C1 > C2 > C3 with respect to sensitivity.
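To see why swapping the roles of two readers changes the score, recall that a yes/no reader's AUC reduces to the mean of its sensitivity and specificity. The counts below are hypothetical (100 images; reader A flags 16, reader B flags 8, and 7 are flagged by both), chosen only to mimic a liberal and a conservative reader on a set dominated by caries-negative images; they are not the study's data.

```python
def binary_auc(tp: int, fn: int, tn: int, fp: int) -> float:
    """AUC of a single yes/no operating point: (sensitivity + specificity) / 2."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Hypothetical overlap of two readers' image-level calls on 100 images.
both, only_a, only_b, neither = 7, 9, 1, 83

# Reader A as ground truth, reader B as predictor: B's misses of A's positives are false negatives.
auc_b_given_a = binary_auc(tp=both, fn=only_a, tn=neither, fp=only_b)
# Reader B as ground truth, reader A as predictor: A's extra flags are false positives.
auc_a_given_b = binary_auc(tp=both, fn=only_b, tn=neither, fp=only_a)

print(f"B predicting A: {auc_b_given_a:.3f}")  # 0.713 (sensitivity 7/16, specificity 83/84)
print(f"A predicting B: {auc_a_given_b:.3f}")  # 0.889 (sensitivity 7/8, specificity 83/92)
```

The two readers agree on the same 90 images in both directions, yet the liberal reader A scores far higher as a predictor: with only 8 or 16 positives, each false negative moves sensitivity by 6 to 12 percentage points, while each false positive moves specificity by roughly one point.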

Ground truth   C1      C2      C3      CV/ML
GT-C1-C2       X       X       0.780   0.906
GT-C2-C3       0.912   X       X       0.928
GT-C3-C1       X       0.884   X       0.902

Table 5: Results when a pair of readers is held as ground truth (unanimous, i.e., an image counts as caries-positive only if both readers flagged it), measuring the remaining clinician and the computer vision model as predictors.
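The article does not spell out how images that the two ground-truth readers disagreed on were handled. The sketch below assumes the simplest reading of "intersection": an image is positive only when both readers flagged it and negative otherwise. The `consensus` helper and the reader calls are hypothetical.

```python
from typing import Sequence


def consensus(reader_a: Sequence[bool], reader_b: Sequence[bool]) -> list[bool]:
    """Unanimous-positive ground truth: positive only if both readers flagged the image."""
    if len(reader_a) != len(reader_b):
        raise ValueError("readers must score the same images")
    return [a and b for a, b in zip(reader_a, reader_b)]


if __name__ == "__main__":
    # Hypothetical image-level calls for six images.
    c1 = [True, True, False, True, False, False]
    c2 = [True, False, False, True, True, False]
    gt = consensus(c1, c2)
    print(gt)  # [True, False, False, True, False, False]
    print(f"{sum(gt)} consensus positives out of {len(gt)} images")  # 2 out of 6
```

An alternative design would drop the disputed images from the evaluation entirely rather than counting them as negatives; which variant the study used is not stated.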

As dental technology progresses, diagnosis, treatment planning, and clinical practice can be expected to evolve and improve. At present, however, a widespread problem in the industry, from the point of view of the patient, is the lack of diagnostic unanimity among clinicians, which may confuse patients and generate mistrust.

The results of this study indicate that three dentists will disagree on the existence of caries for approximately 17% of bitewing and periapical radiographs, whereas any two of them disagree less often (between 6.2% and 15.3% of images, according to Tables 2 and 3). The CV/ML tool is superior to clinicians in predicting the existence of caries from radiographic images. This holds both when a single clinician is used as ground truth and when CV/ML predictions are compared with the intersection of two clinicians' annotations.

Takeaways for the industry:

  • Typically, practices lose a significant number of their new patients every year and struggle to replace them. Contributors to this dynamic include patients' lack of trust in diagnoses they perceive as expensive and their concern that they are being "sold" rather than treated. The lack of diagnostic consensus among dentists validates a patient's instinct to try another dentist, who will often give a different or more favorable diagnosis.
  • Because CV/ML is not only more accurate but also more consistent, it holds the promise of defining a new standard of care that will improve practice economics and increase the overall health of the patient population by increasing trust in the diagnostic process.
  • Dental practitioners will limit their liability exposure by incorporating a second opinion into their diagnosis and treatment planning sessions.
  • The knock-on effects will also improve systemic health by increasing the overall patient population in treatment.


  1. World Health Organization. Factsheet 355, Noncommunicable Diseases. Updated June 2017. Accessed online July 2017:
  2. World Health Organization. Projections of mortality and causes of death, 2015 and 2030. Online database ‘WHO Regions’ accessed 28 October 2016:


Pearl’s founders, Ophir Tanz and Dr. Kyle Stanley, recently joined the Group Dentistry Now podcast to discuss how Pearl is revolutionizing the dental industry. Dental groups and DSOs with the best equipment, data and analysis have the edge, but many dentists still analyze X-rays manually. Pearl’s AI technology identifies dental pathologies in X-rays with three times the accuracy of the average dentist. Listen to this podcast to learn more about the future of technology-enhanced dental care, including Pearl’s new “Second Opinion” technology. The two guests also give their opinions about the future of AI in the dental industry.
