RA2-DREAM Challenge Demonstrates Successful Crowdsourcing Approach in Developing Joint Damage Algorithms

Machine learning models developed through a crowdsourcing approach provide feasible, rapid, and accurate methods for quantifying joint damage in rheumatoid arthritis (RA) and could potentially be incorporated into electronic health records.

The Rheumatoid Arthritis 2–Dialogue for Reverse Engineering Assessment and Methods (RA2-DREAM) Challenge used a crowdsourcing approach to successfully develop machine learning models that provide feasible, rapid, and accurate methods of quantifying joint damage in RA.

“These results suggest that after being refined and validated with larger cohorts, these algorithms alone or in combination could be incorporated into electronic health records, contributing to more informed and accurate management of RA,” the study authors said.

The study was published in JAMA Network Open.

The RA2-DREAM Challenge used 674 sets of radiographs of the hands, wrists, and feet, along with expert-curated Sharp-van der Heijde (SvH) scores, from 2 clinical studies. The sets were divided among training (367 sets), the leaderboard (119 sets), and final evaluation (188 sets).

The challenge included 3 sub-challenges, tasking participants with developing methods to automatically quantify overall damage (sub-challenge 1), joint space narrowing (sub-challenge 2), and erosions (sub-challenge 3).

A total of 173 submissions from 26 participants or teams from 7 countries were entered, and 13 submissions were included in the final evaluation. These submissions came from experts in biomedicine, computer science and engineering.

The performance and reproducibility of each model were assessed by comparing the submitted score for each joint with the ground-truth SvH score, using a patient-weighted root mean square error (RMSE) approach.
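As a rough illustration of what a patient-weighted RMSE can look like, the Python sketch below computes an RMSE over each patient's joint scores and then averages those values using per-patient weights. The article does not describe the challenge's actual weighting scheme, so the weights, function names, and toy scores here are assumptions made purely for illustration.

```python
import math


def patient_rmse(predicted, truth):
    """Root mean square error over one patient's joint scores."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("predicted and truth must be non-empty and equal in length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def weighted_rmse(patients, weights):
    """Weighted mean of per-patient RMSE values.

    patients: dict mapping patient id -> (predicted scores, ground-truth scores)
    weights:  dict mapping patient id -> non-negative weight (hypothetical scheme)
    """
    total_weight = sum(weights[pid] for pid in patients)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        weights[pid] * patient_rmse(pred, true)
        for pid, (pred, true) in patients.items()
    )
    return weighted_sum / total_weight


# Toy data, not taken from the study: (predicted, ground truth) per patient.
patients = {
    "patient_a": ([0, 1, 2], [0, 1, 2]),  # predictions match exactly
    "patient_b": ([3, 1], [1, 3]),        # each joint is off by 2
}
weights = {"patient_a": 1.0, "patient_b": 3.0}  # assumed weights

print(weighted_rmse(patients, weights))
```

In this toy case the second patient has an RMSE of 2 and carries three-quarters of the total weight, so the combined value is 1.5. A lower value means the submitted scores sit closer to the expert-curated ones.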

According to the authors, the weighted RMSE results showed that the winning algorithms produced scores very close to the expert-curated SvH scores.

“Although there is inherent complexity and variability in patient images, the most successful algorithms achieved relatively high accuracy and were reproducible,” they said.

The authors also made 2 main observations, both of which matched their expectations.

First, the scoring of the metacarpophalangeal and proximal interphalangeal joints of the hands and forefoot was more accurate than the scoring of those of the wrist.

“This can probably be explained by the anatomical complexity of the joints of the 8 carpal bones,” the authors noted, adding that “the posteroanterior images lead to difficulties in visualizing all the components of the joints.”

Additionally, the models' joint space narrowing scores were more concordant with the SvH scores than their joint erosion scores were. This may be because joint space narrowing is a more direct measure of distance, while the measure of joint erosion depends on bone morphological characteristics and bone breakage.

Most of the submitted methods used deep learning-based approaches, reflecting both a trend in image analysis research and the ease of reusing pretrained models such as DenseNet, ResNet, and U-Net.

Once the models have been refined, optimized, and validated in real-world studies, they could eventually be adopted in practice.

“The results of this RA2-DREAM challenge prognostic study suggest that an international, award-winning, participatory collaboration could create robust and reproducible algorithms for interpreting radiographic images of bones and joints,” the authors concluded. “Such algorithms have great potential to improve outcomes in patients with RA and other chronic forms of arthritis.”


Sun D, Nguyen TM, Allaway RJ, et al. A crowdsourcing approach to develop machine learning models to quantify radiographic joint damage in rheumatoid arthritis. JAMA Netw Open. 2022;5(8):e2227423. doi:10.1001/jamanetworkopen.2022.27423

Sharon D. Cole