Our cohort comprised 277 patients diagnosed with primary nasopharyngeal carcinoma (NPC) at The First People’s Hospital of Foshan, China. All selected subjects had histopathologically confirmed primary NPC and no prior history of radiation therapy, chemotherapy, or other malignancies that could distort tumor morphology. All patients were restaged according to the eighth edition of the Union for International Cancer Control/American Joint Committee on Cancer (UICC/AJCC) staging system27. Age, gender, TNM stage, histopathological diagnosis, EBV status (VCA-IgA, EBV-DNA), and five-year progression-free survival were collected for each patient to form the clinical database. The data processing workflow is depicted in Fig. 1.
Graphic abstract and showcase. The showcase presents one patient’s T1WI, T2WI, and CE-T1WI images, each manually annotated with labeled tumor boundaries and highlighted tumor areas.
In our study, the image acquisition process was structured to ensure high-quality data for investigating nasopharyngeal carcinoma (NPC) segmentation. All cases underwent both non-contrast and contrast-enhanced MRI scans, providing comprehensive imaging and clinical data (see Table 4). The exclusion criteria were: 1. patients who had undergone radiotherapy or chemotherapy before the MRI examination, because the internal tumor structure and lesion boundaries may change after treatment and no longer reflect the tumor’s original growth state; 2. patients with a history of other malignant tumors, which could introduce confounding variables and affect the study’s outcomes; 3. images that did not meet quality standards: (1) Incomplete scan coverage: when lesions were excessively large, parts of them extended beyond the imaging field, preventing evaluation of the entire NPC lesion. (2) Insufficient image resolution: low-resolution images hindered accurate segmentation of tumor boundaries, potentially leading to imprecise measurements and analysis. (3) Presence of artifacts: imaging artifacts, such as motion or susceptibility effects, could obscure or distort the appearance of lesions, reducing the reliability of assessments.
The dataset was acquired on six MR scanners, including GE Discovery MR750w 3.0T and Philips Achieva 1.5T systems, with Gadoteric Acid Meglumine Salt as the contrast agent. The setup and calibration of the six scanners (see Table 3) were similar, as shown by the standard deviations (STD) of the machine parameters in Table 1.
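The STD-based comparison of scanner settings can be illustrated with a short sketch. The parameter values below are hypothetical placeholders, not the values reported in Table 1, and the `summarize` helper is our own naming; the sketch only shows how a mean, a sample standard deviation, and a coefficient of variation might be computed for one acquisition parameter across six scanners.

```python
from statistics import mean, stdev

# Hypothetical repetition times (ms) for one sequence on six scanners.
# These numbers are illustrative only and are NOT taken from Table 1.
tr_ms = {
    "scanner_1": 600.0,
    "scanner_2": 620.0,
    "scanner_3": 610.0,
    "scanner_4": 605.0,
    "scanner_5": 615.0,
    "scanner_6": 598.0,
}


def summarize(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    m = mean(values)
    s = stdev(values)  # sample STD, n - 1 in the denominator
    return m, s, s / m


m, s, cv = summarize(list(tr_ms.values()))
print(f"TR mean = {m:.1f} ms, STD = {s:.1f} ms, CV = {cv:.1%}")
```

A small STD relative to the mean would support the claim that a parameter was set similarly across scanners. Whether Table 1 uses the sample or population STD is not stated in the text.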
To protect patient privacy and confidentiality, a strict anonymization protocol was applied. Personal identifiers such as name, age, birth date, sex, weight, and content date were removed. Unique patient IDs were replaced with each patient’s index within the dataset. Indirect identifiers, e.g., study date and institution information, were also deleted to prevent re-identification.
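A minimal sketch of this kind of header anonymization is given below. It assumes the image metadata has already been read into a plain dictionary keyed by DICOM attribute keywords. The field sets, the zero-padded ID format, and the `anonymize` function are illustrative assumptions, not the exact protocol used for this dataset.

```python
# Assumed field lists, using standard DICOM attribute keywords.
# The exact set of fields removed in the study is not specified beyond the text.
DIRECT_IDENTIFIERS = {
    "PatientName",
    "PatientAge",
    "PatientBirthDate",
    "PatientSex",
    "PatientWeight",
    "ContentDate",
}
INDIRECT_IDENTIFIERS = {"StudyDate", "InstitutionName"}


def anonymize(headers, index):
    """Return a copy of `headers` with identifying fields removed.

    The original patient ID is replaced by the patient's index within the
    dataset (the three-digit zero padding is an assumption).
    """
    removed = DIRECT_IDENTIFIERS | INDIRECT_IDENTIFIERS
    cleaned = {key: value for key, value in headers.items() if key not in removed}
    cleaned["PatientID"] = f"{index:03d}"
    return cleaned


example = {
    "PatientName": "DOE^JANE",
    "PatientID": "H-20931",
    "StudyDate": "20200101",
    "Modality": "MR",
}
print(anonymize(example, 7))
```

The function returns a new dictionary rather than editing in place, so the original headers remain available until the anonymized copy has been verified.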
Standardization and Calibration of MRI Machines
To ensure consistency across different imaging sessions, a rigorous standardization and calibration protocol was implemented for all MRI scanners following international quality assurance standards. The calibration process aimed to minimize variations due to machine-specific differences and environmental factors.
Calibration Procedure
Each MRI machine was calibrated before the commencement of the study and checked regularly throughout the imaging period. The main aspects of the calibration process included:
- Geometric calibration: Ensuring the geometric accuracy of the images by calibrating the scanner’s spatial resolution settings using a standardized phantom object.
- Signal intensity calibration: Standardizing the signal intensity levels by adjusting the MR signal parameters to match a predefined baseline, ensuring consistent image brightness and contrast across sessions.
- Magnetic field homogeneity adjustment: Regularly assessing and optimizing the magnetic field homogeneity to reduce artifacts and improve the accuracy of the image data.
Quality Assurance
To maintain the calibration standards, quality assurance tests were conducted periodically. These tests involved:
- Acquiring images of a standard test phantom that includes structures to evaluate resolution, contrast, and signal uniformity.
- Comparing these images against reference images to detect any deviations that might indicate a need for recalibration.
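The comparison against reference images can be sketched as a simple tolerance check on summary metrics extracted from the phantom scans. The metric names, the values, and the 5% tolerance below are hypothetical; the text does not specify which metrics or thresholds were used.

```python
def relative_deviation(measured, reference):
    """Absolute deviation of `measured` from `reference`, as a fraction."""
    return abs(measured - reference) / abs(reference)


def needs_recalibration(measured, reference, tolerance=0.05):
    """Return the sorted names of metrics that deviate beyond `tolerance`.

    `measured` and `reference` map metric names to numeric values.
    The default tolerance of 5% is an illustrative assumption.
    """
    return sorted(
        name
        for name, ref_value in reference.items()
        if relative_deviation(measured[name], ref_value) > tolerance
    )


# Hypothetical phantom metrics: signal-to-noise ratio, percent uniformity,
# and measured phantom diameter.
reference = {"snr": 100.0, "uniformity_pct": 90.0, "diameter_mm": 190.0}
measured = {"snr": 90.0, "uniformity_pct": 89.0, "diameter_mm": 190.5}

print(needs_recalibration(measured, reference))
```

An empty result means every metric stayed within tolerance. Any listed metric would prompt a closer look at the corresponding calibration step.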
Segmentation procedure
Recognizing the importance of high-quality manually segmented data in advancing NPC imaging research, we have decided to share our meticulously curated dataset of manual NPC segmentations. This dataset, created by experienced radiologists using a standardized protocol, encompasses multiple MRI sequences and provides a valuable resource for researchers and developers working on improving NPC segmentation techniques.
The segmentation procedure was a critical step in preparing the MRI data for analysis. Because manual segmentation is widely regarded as the gold standard owing to its high accuracy, all MRI images in this study were manually segmented.
1. Image Review and Tumor Identification: All three sequences were thoroughly reviewed to define tumor regions and areas of surrounding invasion. The process involved: (1) Localization: Confirming that the lesion originated from the nasopharyngeal mucosa or submucosa, typically presenting as early-stage mucosal thickening or soft tissue mass formation. On T1WI, lesions appeared as equal or high signal, while on T2WI they showed high signal. CE-T1WI demonstrated significant enhancement. (2) Invasion Assessment: Larger lesions often invaded surrounding structures. This could manifest as: a. Parapharyngeal space involvement, characterized by the disappearance of surrounding fat planes and muscle invasion. b. Upward extension with skull base bone destruction and intracranial invasion; bone destruction was evident where high-signal yellow bone marrow was replaced by low-signal tumor tissue. c. Intracranial invasion, commonly affecting the cavernous sinus, temporal lobe, and cerebellopontine angle region. d. Cervical lymph node metastases, typically following a top-down, ipsilateral-to-contralateral sequential pattern.
2. Tumor Boundary Determination: Nasopharyngeal MRI scans usually include T1WI, T2WI, and CE-T1WI sequences in axial, coronal, and sagittal planes. Radiologists integrated information from all three sequences to determine tumor boundaries accurately: T1WI effectively displayed the surrounding fat spaces and muscle structures; CE-T1WI was crucial for precise boundary determination of early mucosal thickening; and T2WI helped differentiate the tumor mass from mucosa, with the mass typically showing lower signal intensity than the mucosa.
3. Manual Segmentation Procedure: Two diagnostic radiologists, each with over ten years of experience, independently performed layer-by-layer manual segmentation of lesion boundaries using ITK-SNAP software (version 3.6.1)28. The polygon drawing mode was used for all three sequences in the axial plane.
4. Specific Steps for Manual Segmentation: (1) Data Import: DICOM-format neck MR images covering the nasopharyngeal region were exported from the Picture Archiving and Communication System (PACS) and imported into ITK-SNAP. (2) Data Retrieval and Visualization: Window width and level were adjusted for optimal lesion edge visualization, and the slices containing lesions were selected for region of interest (ROI) annotation. (3) ROI Delineation and Adjustment: ROIs were manually delineated layer by layer along the inner lesion boundary to reduce partial volume effects. For extensive tumors, care was taken to exclude adjacent structures (e.g., blood vessels, lymph nodes) while including areas of direct invasion. Boundaries that were difficult to determine in one sequence were cross-referenced with the other sequences. Both continuous and point-by-point delineation methods were used, with manual adjustments as needed to ensure accuracy. After all layers were completed, 3D segmentation images were generated for each sequence per patient.
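The window width and level adjustment mentioned in step (2) can be illustrated with a simplified linear mapping from raw intensity to an 8-bit display value. This is a sketch of the general idea only: the `apply_window` function and the example numbers are assumptions, and the mapping used internally by ITK-SNAP or a PACS viewer may differ in detail.

```python
def apply_window(value, level, width):
    """Map a raw intensity to a display value in [0, 255].

    Intensities below (level - width / 2) are shown as black, intensities
    above (level + width / 2) as white, and values in between are scaled
    linearly. This is a simplified form of display windowing.
    """
    if width <= 0:
        raise ValueError("window width must be positive")
    low = level - width / 2.0
    high = level + width / 2.0
    if value <= low:
        return 0
    if value >= high:
        return 255
    return round((value - low) / (high - low) * 255)


# Hypothetical raw intensities along a line crossing a lesion edge.
profile = [20, 60, 75, 110, 140, 200]
print([apply_window(v, level=100, width=100) for v in profile])
```

Narrowing the width increases displayed contrast around the chosen level, which is why the setting is tuned per case to make the lesion edge easier to follow.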
The segmented tumor regions were then converted into binary masks, representing the presence or absence of tumor tissue in each pixel. This rigorous segmentation procedure ensured the creation of a highly accurate and reliable dataset, forming a robust foundation for our subsequent analyses on NPC segmentation.
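The conversion from a delineated contour to a binary mask can be sketched as follows. This assumes a single closed polygon per slice given as (x, y) vertices in pixel coordinates, and it marks a pixel as tumor when its center lies inside the polygon (even-odd ray casting). The function names are illustrative; this is not the rasterization routine used by ITK-SNAP.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, height, width):
    """Return a height x width mask with 1 = tumor and 0 = background.

    Each pixel is tested at its center, i.e. at (column + 0.5, row + 0.5).
    """
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical rectangular contour on a 4 x 5 slice.
contour = [(1, 1), (4, 1), (4, 3), (1, 3)]
for mask_row in polygon_to_mask(contour, height=4, width=5):
    print(mask_row)
```

Stacking the per-slice masks in slice order yields the 3D binary volume for a sequence.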