A Deep-Learning–Based Partial-Volume Correction Method for Quantitative 177Lu SPECT/CT Imaging

Julian Leube; Johan Gustafsson; Michael Lassmann; Maikol Salas-Ramirez; Johannes Tran-Gia

doi:10.2967/jnumed.123.266889

Abstract

With the development of new radiopharmaceutical therapies, quantitative SPECT/CT has progressively emerged as a crucial tool for dosimetry. One major obstacle of SPECT is its poor resolution, which results in blurring of the activity distribution. Especially for small objects, this so-called partial-volume effect limits the accuracy of activity quantification. Numerous methods for partial-volume correction (PVC) have been proposed, but most methods have the disadvantage of assuming a spatially invariant resolution of the imaging system, which does not hold for SPECT. Furthermore, most methods require a segmentation based on anatomic information. Methods: We introduce DL-PVC, a methodology for PVC of ¹⁷⁷Lu SPECT/CT imaging using deep learning (DL). Training was based on a dataset of 10,000 random activity distributions placed in extended cardiac–torso body phantoms. Realistic SPECT acquisitions were created using the SIMIND Monte Carlo simulation program. SPECT reconstructions without and with resolution modeling were performed using the CASToR and STIR reconstruction software, respectively. The pairs of ground-truth activity distributions and simulated SPECT images were used for training various U-Nets. Quantitative analysis of the performance of these U-Nets was based on metrics such as the structural similarity index measure or normalized root-mean-square error, but also on volume activity accuracy, a new metric that describes the fraction of voxels in which the determined activity concentration deviates from the true activity concentration by less than a certain margin. On the basis of this analysis, the optimal parameters for normalization, input size, and network architecture were identified. 
Results: Our simulation-based analysis revealed that DL-PVC (0.95/7.8%/35.8% for structural similarity index measure/normalized root-mean-square error/volume activity accuracy) outperforms SPECT without PVC (0.89/10.4%/12.1%) and after iterative Yang PVC (0.94/8.6%/15.1%). Additionally, we validated DL-PVC on ¹⁷⁷Lu SPECT/CT measurements of 3-dimensionally printed phantoms of different geometries. Although DL-PVC showed activity recovery similar to that of the iterative Yang method, no segmentation was required. In addition, DL-PVC was able to correct other image artifacts such as Gibbs ringing, making it clearly superior at the voxel level. Conclusion: In this work, we demonstrate the added value of DL-PVC for quantitative ¹⁷⁷Lu SPECT/CT. Our analysis validates the functionality of DL-PVC and paves the way for future deployment on clinical image data.

Quantitative SPECT/CT has become the method of choice to spatially resolve activity distributions for the dosimetry of radiopharmaceutical therapies. One of the most important radionuclides used today is ¹⁷⁷Lu (1,2). Mainly because of imperfect collimation and the resulting relatively poor spatial resolution (1–2 cm for ¹⁷⁷Lu and medium-energy collimation (3)), ¹⁷⁷Lu SPECT imaging reaches its limitations for small structures such as lesions or small organs (4). When activity quantification is based on volumes of interest, poor spatial resolution leads to spatial allocation uncertainty, which is referred to as the partial-volume effect. For many imaging modalities, the acquired activity distribution can be described in good approximation by a convolution of the true activity distribution with the point-spread function of the imaging system. Since this approximation holds for PET, several techniques for partial-volume correction (PVC) of PET have been proposed (5). However, the fundamental problem for transferring such methodology to SPECT is that the poor spatial resolution of gamma cameras inevitably leads to information loss. Hence, purely data-driven methods, such as resolution modeling during reconstruction (resolution recovery [RR]) (6) or postreconstruction deconvolution, will never result in partial-volume effect–free activity concentration estimates. Accordingly, some form of prior information has to be supplied, as performed, for example, in the iterative Yang technique for postreconstruction PVC (5), an enhancement of the Yang method (7). Iterative Yang PVC (IY-PVC) uses prior knowledge about the spatial resolution to fold the activity back into an estimated mask of active volume. For practical implementation of PVC methods, a spatially invariant point-spread function is often assumed for the sake of simplicity, which approximately holds true for PET imaging.
For SPECT, however, this may introduce substantial errors, as the spatially variant SPECT resolution cannot be well approximated by a single value (3). Furthermore, the exact distribution of radiopharmaceuticals in the structures under investigation is typically unknown and can be only roughly estimated from morphologic imaging such as CT. When the active regions cannot be properly defined on the basis of morphologic imaging, substantial errors may be introduced.
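The convolution description of the partial-volume effect given above can be illustrated with a minimal 1-dimensional sketch. The Gaussian point-spread function, the 15-mm full width at half maximum, the 4.8-mm grid, and the object widths are illustrative assumptions chosen only to show the trend; they are not the system model used in this study, and a spatially invariant kernel is used here precisely because it is the simplification discussed in the text.

```python
import numpy as np

def gaussian_psf(fwhm_mm, voxel_mm):
    """Discrete, unit-sum 1D Gaussian kernel with the given FWHM (assumed PSF)."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm  # in voxels
    half = int(np.ceil(4 * sigma))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def peak_recovery(width_vox, fwhm_mm=15.0, voxel_mm=4.8, n=129):
    """Blur a uniform 1D object and return blurred peak / true peak."""
    truth = np.zeros(n)
    start = (n - width_vox) // 2
    truth[start:start + width_vox] = 1.0
    blurred = np.convolve(truth, gaussian_psf(fwhm_mm, voxel_mm), mode="same")
    return blurred.max() / truth.max()

small = peak_recovery(width_vox=2)    # object of roughly 1 cm
large = peak_recovery(width_vox=20)   # object of roughly 10 cm
print(f"peak recovery: small object {small:.2f}, large object {large:.2f}")
```

In this toy setting, the peak of the small object is clearly underestimated, whereas the center of the large object is recovered; the activity missing from the small object has spilled into neighboring voxels.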

In recent years, convolutional neural networks have demonstrated their tremendous potential in medical image processing. In the field of SPECT imaging, convolutional neural networks have been used for automated segmentation (8), CT-free attenuation correction (9), acceleration of SPECT imaging (10,11), and denoising (12). In addition, deep learning (DL) techniques have recently been used for PVC (13,14). Xie et al. (13) trained a neural network to perform IY-PVC without the need for segmentation using uncorrected SPECT images as input and images corrected with IY-PVC as target. As mentioned above, however, the prior knowledge used to train the network can introduce systematic errors. Li et al. (14) proposed a DL-based enhancement of dose calculations. More specifically, they used [⁶⁸Ga]Ga-DOTATATE PET/CT patient data as ground truth to reduce the partial-volume effect in [¹⁷⁷Lu]Lu-DOTATATE SPECT/CT–based absorbed dose distributions. Although the method was shown to enhance the dose maps, it works only under the assumption that the distribution of radiopharmaceutical is comparable despite the different radiopharmaceuticals scanned at different measurement times after administration. In addition, differences in ligand amount, affinity, and internalization have not yet been sufficiently investigated, possibly leading to additional errors (15). Both studies, although demonstrating the potential of DL for PVC, suffer, like many other published implementations of DL for clinical applications, from small dataset sizes (28 and 14 patients in the work of Xie et al. (13) and Li et al. (14), respectively) and lack of ground-truth activity distributions for training.

In this work, we present DL-PVC, a methodology for PVC of ¹⁷⁷Lu SPECT/CT imaging using DL trained on a large dataset of 10,000 pairs of random patient-shaped activity distributions and associated SPECT images generated using Monte Carlo radiation transport simulations. These pairs are used as input and target for a convolutional neural network, trained to perform PVC without segmentation. For performance evaluation, we investigated the impact of different normalization (i.e., activity conservation) methods, input matrix sizes, and network architectures on the performance of DL-PVC. Subsequently, we compared our new methodology with IY-PVC as a reference method and performed a validation based on ¹⁷⁷Lu SPECT/CT measurements of 3-dimensionally printed phantoms of different geometries.

MATERIALS AND METHODS

Generation of a Dataset of Random Activity Distributions

A large database of 3-dimensional activity distributions of randomly arranged random shapes and corresponding SPECT simulations was created to train neural networks for PVC. A schematic overview of the dataset generation is given in Figure 1. First, density maps and activity masks were generated. The activity masks were then transformed into inhomogeneous activity distributions with a patientlike activity range. Next, simulations were performed in the SIMIND (Simulating Medical Imaging Nuclear Detectors) Monte Carlo simulation program (16), using these masks to obtain SPECT projections. Last, iterative reconstructions with RR (Software for Tomographic Image Reconstruction, STIR (17)) and without RR (Customizable and Advanced Software for Tomographic Reconstruction, CASToR (18)) were performed to obtain SPECT images. The approach is based on previously described work (11). A detailed description of the generation of the dataset is given in the supplemental materials (supplemental materials are available at http://jnm.snmjournals.org). In addition, the complete dataset is available at https://doi.org/10.5281/zenodo.8282567.

FIGURE 1.

Schematic overview of dataset generation used in this study. CASToR = Customizable and Advanced Software for Tomographic Reconstruction; SIMIND = Simulating Medical Imaging Nuclear Detectors; STIR = Software for Tomographic Image Reconstruction.

The most important features of the dataset are as follows:

Realistic Attenuation and Scattering Conditions

Extended cardiac–torso (XCAT) phantoms (19) were used to achieve realistic attenuation and scatter conditions. By varying the size scaling of individual organs or areas, 250 variations of 16 patients (6 female, 9 male; age, 18–76 y; body mass index, 18.6–38.0) resulted in a total of 4,000 different density maps. By defining 3 bed positions, we generated a total of 10,000 attenuation images (4,000 thoracic, 4,000 abdominal, and 2,000 head images; matrix, 256; voxel size, 2.4 mm).

Patientlike Binary Activity Masks

Patientlike binary activity masks (0, no activity; 1, activity) were created by placing random shapes (minimal and maximum shape sizes of 4 and 100 voxels, respectively), created using previously described methodology (11), inside the XCAT-based attenuation mask until a randomly selected, patient-representative target volume was reached.

Nonuniform Activity Distributions

Each activity mask was multiplied voxelwise by a spatially contiguous, nonuniform pattern (11) to create more complex, heterogeneous activity distributions. An example of the resulting target datasets used to train the neural network is shown in Figure 2.

FIGURE 2.

Example target dataset as used for neural network. XCAT phantom is shown in gray scale, whereas random activity distribution is shown in color map. At top are axial sections, and at bottom are coronal sections. From left to right, 3 bed positions are shown: head, thorax, and abdomen. Camera orbits are indicated as blue dotted lines.

Realistic Activity Distributions

To resemble ¹⁷⁷Lu SPECT patient acquisitions as closely as possible, the activity distributions were scaled on the basis of the active volumes and total activities of 717 peritherapeutic ¹⁷⁷Lu SPECT/CT acquisitions (429 [¹⁷⁷Lu]Lu-PSMA-I&T and 288 [¹⁷⁷Lu]Lu-DOTATATE SPECT/CT examinations of 202 different patients), which had been conducted at University Hospital Würzburg between January 2014 and June 2021 (waiver 20230207 04).

Monte Carlo–Based SPECT Simulations

For each of the 10,000 random activity distributions, a set of realistic SPECT projections was generated by SIMIND Monte Carlo simulations (16). The simulations were set up to replicate a ¹⁷⁷Lu SPECT acquisition on our Siemens Intevo Bold SPECT/CT system (9.5-mm crystal; medium-energy low-penetration collimator; 9% energy resolution; 120 projections of 30 s each; noncircular orbit; matrix, 128; pixel size, 4.79 mm; 20% main energy window at 208 keV; and 2 adjacent 10% scatter windows). As described previously (11), Poisson noise was added to the simulated (noise-free) projections to obtain realistic (noisy) projections for the given activities and acquisition parameters.
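The noise step can be sketched as follows. This is a minimal illustration, assuming that the noise-free projections have already been scaled to expected counts per pixel for the given activity and the 30-s projection time; that scaling follows the earlier work (11) and is not reproduced here. The function name and the uniform test projections are hypothetical.

```python
import numpy as np

def add_poisson_noise(noise_free_counts, seed=None):
    """Draw one Poisson realization per pixel from expected counts."""
    rng = np.random.default_rng(seed)
    # Negative expectations are not valid Poisson means; clip them to zero.
    expected = np.clip(noise_free_counts, 0.0, None)
    return rng.poisson(expected).astype(np.float64)

# Hypothetical stand-in for one simulated acquisition:
# 120 projections of 128 x 128 pixels with 5 expected counts per pixel.
projections = np.full((120, 128, 128), 5.0)
noisy = add_poisson_noise(projections, seed=0)
```

Because the variance of a Poisson variable equals its mean, the relative noise in each pixel grows as the expected counts decrease, which is why the scaling to realistic count levels matters.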

SPECT Reconstructions

SPECT reconstructions (voxel size, 4.8 mm) were performed for all 2 × 10,000 projection sets (noise-free and noisy) using 2 different reconstructions: CASToR, an ordered-subset expectation maximization reconstruction (10 iterations, 2 subsets, attenuation correction, scatter correction) without RR (18), and STIR, an ordered-subset expectation maximization reconstruction (6 iterations, 6 subsets, attenuation correction, scatter correction) with RR (17). Accordingly, 4 different SPECT datasets were available for training and analysis of the presented approach: CASToR (noRR) or STIR (RR) performed with noise-free (nf) or noisy (n) projections (noRR_nf/noRR_n and RR_nf/RR_n, respectively).

Evaluation of Activity Conservation

An important criterion for any PVC is that the correction preserves the total activity. Before the U-Net was applied, the input SPECT images were normalized by their maximum activity concentration to an interval of [0,1]. In this study, we investigated 2 different approaches for scaling the output of the proposed PVC. The first was rescaling the output of DL-PVC with the maximum activity concentration of the input SPECT image, and the second was normalization of the sum of all voxel values of the output of DL-PVC to the total activity of the input SPECT image.
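The input normalization and the 2 output scalings can be sketched as follows. The function names are hypothetical, the squaring operation merely stands in for the trained network, and the sum-based variant assumes voxels of equal volume so that the sum of activity concentrations is proportional to total activity.

```python
import numpy as np

def normalize_input(spect):
    """Scale a SPECT image to [0, 1] by its maximum activity concentration."""
    peak = spect.max()
    return spect / peak, peak

def rescale_by_maximum(net_output, input_peak):
    """Approach 1: undo the input normalization with the input maximum."""
    return net_output * input_peak

def rescale_by_total_activity(net_output, spect):
    """Approach 2: force the output sum to match the input SPECT sum."""
    return net_output * (spect.sum() / net_output.sum())

spect = np.array([[0.0, 2.0], [4.0, 2.0]])
net_input, peak = normalize_input(spect)
net_output = net_input ** 2  # placeholder for the U-Net prediction
by_max = rescale_by_maximum(net_output, peak)
by_sum = rescale_by_total_activity(net_output, spect)
```

Only the second approach guarantees that the total activity of the corrected image equals that of the uncorrected SPECT image; the first preserves it only if the network output happens to do so.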

Evaluation of Input Matrix Size

In our work, we investigated 2 input matrix sizes to which the PVC method was applied. In the first, DL-PVC is directly applied to the entire field of view (FOV), in which case the entire SPECT image (matrix size, 128 × 128 × 128) and the entire ground-truth activity distribution serve as input and target, respectively. In the second, DL-PVC is applied to smaller patches (cube-shaped image sections with an edge length of 32 voxels), which are subsequently reassembled (more details can be found in the supplemental materials).
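The patch-based variant can be sketched as follows, assuming a simple nonoverlapping tiling of the 128 × 128 × 128 volume into 64 cubes with an edge length of 32 voxels. Whether the actual reassembly uses overlap or blending is described in the supplemental materials and is not assumed here; the function names are hypothetical.

```python
import numpy as np

def split_into_patches(volume, edge=32):
    """Tile a 3D volume into nonoverlapping cubes of the given edge length."""
    nz, ny, nx = (s // edge for s in volume.shape)
    blocks = volume.reshape(nz, edge, ny, edge, nx, edge)
    # Bring the 3 block indices to the front, then flatten them.
    return blocks.transpose(0, 2, 4, 1, 3, 5).reshape(-1, edge, edge, edge)

def reassemble_patches(patches, shape, edge=32):
    """Invert split_into_patches for a volume of the given shape."""
    nz, ny, nx = (s // edge for s in shape)
    grid = patches.reshape(nz, ny, nx, edge, edge, edge)
    return grid.transpose(0, 3, 1, 4, 2, 5).reshape(shape)

volume = np.arange(128 ** 3, dtype=np.float32).reshape(128, 128, 128)
patches = split_into_patches(volume)          # shape (64, 32, 32, 32)
restored = reassemble_patches(patches, volume.shape)
```

In such a scheme, the network would be applied to each patch independently between the 2 calls, which limits the spatial context available to each prediction compared with the full-FOV variant.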

Evaluation of U-Net Architecture

A 3-dimensional U-shaped convolutional neural network (U-Net) (20) based on the fastMRI architecture (21) and implemented using the PyTorch library (22) with Adam optimizer (23) was used to perform the PVC. A more detailed explanation of the architecture is given in the supplemental materials. In addition to the standard U-Net architecture, 4 other architectures were tested: R2U-Net by Alom et al. (24); AttU-Net by Oktay et al. (25); R2AttU-Net, a combination of both methods (26); and U-Net++, a nested U-Net proposed by Zhou et al. (27). The performance of these 5 network architectures was compared on the basis of the RR_n and noRR_n datasets. PVC was performed on the entire FOV, preserving the total activity.

Evaluation Criteria for PVC Performance

Several evaluation metrics were used to evaluate the quality of the different PVC methods. Their calculation was restricted to a masked region within each test dataset in which ground-truth activity was present. Besides structural similarity index measure (SSIM) (28) and normalized root-mean-square error (NRMSE), a volume activity accuracy (VAA) was defined. It indicates the proportion of voxels in which the relative deviation in activity concentration was less than α (fixed at 5%). More information is given in the supplemental materials. In addition, the deviation between total activity before and after PVC was calculated as percentage difference. Because not all evaluation metrics were normally distributed, paired Wilcoxon tests with a significance level of 1% were chosen for the statistical analysis.
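A minimal sketch of the masked metrics is given below. The VAA follows the definition in the text (fraction of voxels with a relative deviation below α = 5%). The normalization of the NRMSE by the maximum ground-truth value and the sign convention of the activity deviation are assumptions made for illustration; the definitions actually used are given in the supplemental materials.

```python
import numpy as np

def volume_activity_accuracy(estimate, truth, alpha=0.05):
    """Fraction of active voxels whose relative deviation is below alpha."""
    mask = truth > 0  # restrict to voxels with ground-truth activity
    rel_dev = np.abs(estimate[mask] - truth[mask]) / truth[mask]
    return float(np.mean(rel_dev < alpha))

def nrmse(estimate, truth):
    """RMSE over active voxels, normalized by the maximum truth (assumption)."""
    mask = truth > 0
    rmse = np.sqrt(np.mean((estimate[mask] - truth[mask]) ** 2))
    return float(rmse / truth[mask].max())

def activity_deviation_percent(before, after):
    """Percentage difference in total activity after versus before PVC."""
    return float(100.0 * (after.sum() - before.sum()) / before.sum())

truth = np.array([0.0, 1.0, 2.0, 4.0])
estimate = np.array([5.0, 1.04, 2.2, 4.0])
vaa = volume_activity_accuracy(estimate, truth)  # 2 of 3 active voxels pass
error = nrmse(estimate, truth)
```

In this example, the first voxel is ignored by both metrics because it lies outside the ground-truth mask, even though the estimate there is far from zero.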

Comparison with Iterative Yang Technique

To compare the proposed DL-PVC methodology with an already-established PVC method, IY-PVC (5) was applied to all SPECT reconstructions. First, a matched filter analysis (3) was used to determine the spatial resolution for STIR (8.75 mm; applies to RR_n, RR_nf) and for CASToR (21.35 mm; applies to noRR_n, noRR_nf). Subsequently, 10 iterations of IY-PVC were performed using the PETPVC toolbox (29) with spatial resolution and ground-truth activity mask as input.
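For orientation, the iterative Yang update as it is commonly described (regional means, piecewise-constant image, voxelwise correction factor) can be sketched in 1 dimension. This is not the PETPVC implementation used in this study. The Gaussian kernel, the 4.8-mm grid, the toy object, and all function names are illustrative assumptions, and the true mask is supplied as in the comparison above.

```python
import numpy as np

def gaussian_kernel(fwhm_mm, voxel_mm):
    """Discrete, unit-sum 1D Gaussian kernel with the given FWHM."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    half = int(np.ceil(4 * sigma))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def iterative_yang_1d(observed, labels, kernel, n_iter=10):
    """Toy iterative Yang correction on a 1D profile with a region label map."""
    corrected = observed.copy()
    for _ in range(n_iter):
        # Piecewise-constant image built from the current regional means.
        piecewise = np.zeros_like(observed)
        for region in np.unique(labels):
            piecewise[labels == region] = corrected[labels == region].mean()
        blurred = np.convolve(piecewise, kernel, mode="same")
        # Voxelwise factor: piecewise image over its blurred version.
        factor = np.divide(piecewise, blurred,
                           out=np.zeros_like(blurred), where=blurred > 0)
        corrected = observed * factor
    return corrected

kernel = gaussian_kernel(fwhm_mm=8.75, voxel_mm=4.8)
labels = np.zeros(101, dtype=int)
labels[48:54] = 1                      # active region of 6 voxels
truth = labels.astype(float)
observed = np.convolve(truth, kernel, mode="same")
corrected = iterative_yang_1d(observed, labels, kernel)
```

In this noise-free toy case with an exact mask and a matching kernel, the regional mean is restored; the sketch also shows why the method depends on both the assumed resolution and the segmentation.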

Investigation of Minimum Feature Size

To determine the minimum feature size that DL-PVC can still resolve, further simulations based on the XCAT phantom dataset were performed. For this purpose, multiple SPECT simulations of random activity distributions were performed, in which a cube with an edge length of 1–10 voxels (increment, 1 voxel) was introduced centrally into the activity distribution. Recovery coefficients (RCs) were calculated to determine how well DL-PVC recovers the activity in the different-sized cubes. More detail on the simulations and the calculation of the RCs is given in the supplemental materials.
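A recovery coefficient for a centrally placed cube can be sketched as follows, assuming that the RC is the ratio of the summed image values to the summed ground-truth values inside the cube. The exact definition used in this work is given in the supplemental materials; the function names, the 32-voxel toy volume, and the halved stand-in image are hypothetical.

```python
import numpy as np

def central_cube_mask(shape, edge):
    """Boolean mask of a cube with the given edge length at the volume center."""
    mask = np.zeros(shape, dtype=bool)
    starts = [(s - edge) // 2 for s in shape]
    mask[tuple(slice(st, st + edge) for st in starts)] = True
    return mask

def recovery_coefficient(image, truth, mask):
    """Summed image activity over summed true activity inside the mask."""
    return float(image[mask].sum() / truth[mask].sum())

shape = (32, 32, 32)
for edge in range(1, 11):  # edge lengths of 1-10 voxels, increment of 1 voxel
    mask = central_cube_mask(shape, edge)
    truth = mask * 2.0
    image = truth * 0.5    # stand-in for a SPECT image with 50% recovery
    rc = recovery_coefficient(image, truth, mask)
```

An RC of 1 would indicate full recovery of the cube activity, and plotting the RC against edge length gives an estimate of the smallest feature that is still resolved.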

Activity Concentration–Voxel Histograms

To illustrate differences in the distribution of activity concentrations for the SPECT simulations without PVC, after IY-PVC, and after DL-PVC, activity concentration–voxel histograms (proportion of voxels containing a given relative activity concentration plotted against the respective relative activity concentration) were created. More details can be found in the supplemental materials.
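Such a histogram can be sketched as follows. The normalization to a reference maximum (for example, the maximum of the ground truth), the restriction to voxels with nonzero activity, and the binning are assumptions made for illustration; the procedure actually used is described in the supplemental materials.

```python
import numpy as np

def activity_concentration_histogram(image, reference_max, bins=50,
                                     value_range=(0.0, 1.5)):
    """Proportion of active voxels per bin of relative activity concentration."""
    relative = image[image > 0] / reference_max
    counts, edges = np.histogram(relative, bins=bins, range=value_range)
    # Voxels outside value_range are not counted, so proportions may sum to < 1.
    return counts / relative.size, edges

image = np.array([0.0, 0.5, 0.5, 1.0, 1.0])
proportions, edges = activity_concentration_histogram(
    image, reference_max=1.0, bins=3)
```

For an ideal correction of a piecewise-uniform distribution, the histogram would collapse onto the true concentration levels, whereas blurring spreads the proportions toward lower relative concentrations.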

Phantom Measurement

To justify application of DL-PVC in a clinical context, we validated the methodology on increasingly patient-realistic ¹⁷⁷Lu SPECT/CT phantom measurements. For this purpose, a previously published series of ¹⁷⁷Lu SPECT/CT measurements of 3 self-designed 3-dimensional phantoms (sphere, ellipsoid, and renal cortex geometry, all with the same filling volume of 100 mL) was used (30). In addition, a 3-dimensionally printed 2-organ phantom (International Commission on Radiological Protection publication 110 [ICRP110]–based 2-compartment kidney and spleen) was analyzed to evaluate DL-PVC on a phantom that is more representative of patient data. These data had been acquired at our institution as part of the Europe-wide MRTDosimetry comparison exercise for quantitative ¹⁷⁷Lu SPECT/CT imaging (31). The acquisition parameters had been the same as the parameters chosen for the Monte Carlo simulations. On the basis of these 4 measured projection datasets, SPECT reconstructions were performed with CASToR and STIR with the same parameters as for the simulated SPECT projections. For analysis, all SPECT images were interpolated to CT resolution (matrix, 512; voxel size, 0.98 mm) using trilinear interpolation. These were compared with the ground truth created by multiplying the masks used for phantom fabrication by the nominal activity concentrations (1.08 ± 0.03 MBq/mL for sphere/ellipsoid/cortex, 1.44 ± 0.04 MBq/mL for spleen/cortex, and 0.47 ± 0.01 MBq/mL for medulla of the ICRP110-based phantom), the determination of which was previously described (30).

RESULTS

Optimal Selection of Activity Conservation, Input Matrix Size, U-Net Architecture, and Resolution Modeling

In light of the investigations regarding activity conservation, input matrix size, U-Net architecture (20,24–27), and the application of RR, an optimal configuration and reconstruction method, DL-PVC, was determined for further analysis. It comprises the following components, the selection of which, including statistical tests based on the evaluation metrics, is described in detail in the supplemental materials: SPECT reconstruction with RR (RR_nf or RR_n); activity conservation based on the total activity of the uncorrected SPECT; input matrix size: direct application of PVC to the entire FOV; and R2U-Net network architecture (24).

Comparison with Iterative Yang Technique as Reference Method

Table 1 and Figure 3 show a numeric and visual comparison of the evaluation metrics for SPECT without PVC, after DL-PVC, and after IY-PVC. In both cases, DL-PVC demonstrates significantly superior evaluation metrics. In addition, Figure 4 gives a visual impression of the different image qualities. Visually, activity distributions corrected with DL-PVC closely resemble the ground-truth activity distribution, which is illustrated by cross sections. A considerable increase in the number of cyan voxels in the VAA maps indicates that the activity concentration after DL-PVC better matches the true activity concentration. Furthermore, the true activity concentration is restored, especially in voxels located at the center of larger shapes. However, deviations can still be seen at the edges of larger objects or for smaller objects.

TABLE 1.

Mean Evaluation Metrics for Both SPECT Datasets with RR Without PVC, After IY-PVC, and After DL-PVC over All 500 Test Activity Distributions

FIGURE 3.

Comparison of different PVC approaches. Depicted are violin plots of evaluation metrics for SSIM (A), NRMSE (B), VAA (C), and activity deviation (D) for reconstructions without PVC, after IY-PVC, and after DL-PVC. Darker shades represent reconstructions without RR, and brighter shades represent reconstructions with RR. Inside violins, solid lines represent median, and dashed lines represent upper and lower quartiles. Note that activity deviation is same without PVC and after DL-PVC because of activity conservation in DL-PVC. For SSIM and VAA, higher values correspond to better performance, whereas for NRMSE and activity deviation, better performance is indicated by values closer to 0%. AD = activity deviation; NRMSE = normalized root-mean-square error; SSIM = structural similarity index measure; VAA = volume activity accuracy.

FIGURE 4.

Visual performance analysis of different PVC approaches. (A) Top: axial slice of example activity distribution from test dataset reconstructed with RR without PVC, after IY-PVC, and after DL-PVC. White numbers correspond to SSIM values with respect to ground truth. Bottom: VAA maps of corresponding SPECT reconstructions with respect to ground truth. Red represents deviation in voxel’s activity concentration by more than or equal to α = 5%; cyan represents deviation smaller than α. Black numbers indicate VAA between SPECT reconstruction and ground truth. (B) Cross-sections indicated by cyan lines in SPECT reconstructions in A. SSIM = structural similarity index measure; VAA = volume activity accuracy.