Automated identification of hip arthroplasty implants using artificial intelligence

Study design and radiograph acquisition

After institutional review board approval, we retrospectively collected all radiographs taken between June 1, 2011 and Dec 1, 2020 at one university hospital. The images are collected by Neusoft PACS/RIS Version 5.5 on a personal computer running Windows 10. We confirm that all methods were performed in accordance with the relevant guidelines and regulations. Images were collected from surgeries performed by 3 fellowship-trained arthroplasty surgeons to ensure a variety of implant manufacturers and implant designs. At the time of collection, images had all identifying information removed and were thus de-identified. Implant type was identified through the primary surgery operative note and crosschecked with implant sheets. Implant designs were only included in our analysis if more than 30 images per model were identified14.

From the medical records of 313 patients, a total of 357 images were included in this analysis.

Although Zimmer and Biomet merged (Zimmer Biomet), these were treated as two distinct manufacturers. The following 4 designs from the four industry leading manufacturers were included: Biomet Echo Bi-Metric (Zimmer Biomet), Biomet Universal RingLoc (Zimmer Biomet), Depuy Corail (Depuy Synthes), Depuy Pinnacle (Depuy Synthes), LINK Lubinus SP II, LINK Vario cup, and Zimmer Versys FMT and Trilogy (Zimmer Biomet). Implant designs that did not meet the 30-implant threshold were not included. figure 1 demonstrated an example of Cup and Stem anterior–posterior (AP) radiographs of each included implant design. The four types of implants are denoted as type A, type B, type C, and type D respectively in this paper.

Figure 1
figure 1

Demonstrated an example of cup and stem radiographs of each included implant design.

Overview of framework

We used convolutional neural network-based (CNN) algorithms for classification of hip implants. Our training data consists of images of anteroposterior (AP) view of the hips. For each image, we manually cut the image into two parts: the stem and the cup. We will train four CNN models, the first one using stem images (stem network), the second one using cup images (cup network), and the third one using the original uncut images (combined network). The fourth one is an integration of the models trained with stem network and the cup network (joint network).

Since the models involve millions of parameters, while our data set only contained less than one thousand images, it was infeasible to train a CNN model from scratch using our data. Therefore, we adopted the transfer learning framework to train our networks17. The transfer learning framework is a paradigm in the machine learning literature that is widely applied in scenarios where the training data is scarce compared to the scale of the model18. Under the transfer learning framework, the model is first initialized to some model pretrained with other data sets that contain enough data for a different purpose related task. Then, we tune the model using our data set by performing gradient descent (backward-propagation) only on the last two layers of the networks. As the parameters in the last two layers of the network are comparable with the size of our data set (for the target task), and the parameters in the previous layers have been tuned from the pre-trained model, the resulting network model can have satisfactory performance on the target task.

In our case, our CNN models we used are based on the established ResNet50 network pre-trained on the ImageNet data set19. The target task and our training data sets correspond to the images of the AP views of the hips (stem, cup, and combined).

figure 2 demonstrates the overview of the framework of our deep learning-based method.

Figure 2
figure 2

Overview of the framework of our deep learning-based method.

Data set

Our dataset contained 714 images from 4 different kinds of implants.

Image pre-processing

We followed standard procedures to pre-process our training data so that it could work with a network trained on ImageNet. We rescaled each image to a size of 224*224 and normalized it according to ImageNet standards. We also performed data augmentation, ie, random rotation, horizontal flips, etc., to increase the amount of training data and make our algorithm robust to the orientation of the images.

partition data set

We first divided the set of patients into three groups of sizes ~ 60% (group 1), ~ 30% (group 2), and ~ 10% (group 3). This split technique was used on a per-design basis to ensure the ratio of each implant remained constant. Next, we used the cup and stem images of patients in group 1 for training, those of patients in group 2 for validation, and those of patients in group 3 for testing. The validation set was used to compute cross-validation loss for hyper-parameter tuning and early stopping determination.

Model training

We adopted the adaptive gradient method ADAM20 to train our models. Based on the cross-validation loss, we chose the hyper-parameters for ADAM as (learning rate (mathrm{alpha }) = 0.001, ({upbeta }_{1}=0.9, {beta }_{2}=0.99, epsilon ={10}^{-8},) weight_decay = 0). The maximum number of epochs was 1000 and the batch size was 16. The early stopping threshold was set to 8. During the training process of each network, the early stopping threshold was hit after around 50 epochs. As we mentioned above, we trained four networks in total.

The first network is trained with the stem images, the second with the cup images. The third network is trained with the original uncut images, which is one way we propose to combine the power of stem images and cup images. We further integrate the first and the second network as an alternative way of jointly utilizing stem and cup images. The integration was done via the following logistical-regression based method. We collected the outputs of the stem network and the cup network (both are of the form of a 4-dimensional vector, with each element corresponding to the classification weight the network gives to the category of implants), and then fed them as the input to a two-layer feed-forward neural network, and trained the network with the data from the validation set. The integration is similar to a weighted-voting procedure among the outputs of the stem network and the cup network, with the weighting votes computed through the validation data set. Note that the above construction related on our dataset division procedure, where the training set, validation set, and testing set, each contained the stem and cup images of the same set of patients. We referred to the resulting network constructed from the outputs of stem network and cup network as the “joint network”.

Model testing

We tested our models (stem, cup, Joint) using the testing set. The prediction result on each testing image was a 4-dimensional vector, with each coordinate representing the classification confidence of the corresponding category of implants.

Statistical analysis

Since we were studying a multi-class classification problem, we would directly present the confusion matrices of our methods on the testing data, and compute the operation characteristics generalized for multi-class classification.

Ethics review committee

The institutional review board approved the study with a waiver of informed consent because all images were anonymized before the time of the study.


Please enter your comment!
Please enter your name here