Our dataset was collected by deploying camera-equipped logistics vehicles, which captured images across a variety of locations and environmental conditions. These conditions are primarily weather-related, especially the position and intensity of the sun: changes in sunlight intensity, angle, and shadows alter the appearance of objects in images, affecting contrast, color, and texture. Because the dataset covers these diverse conditions, it is representative of real-life scenarios. To protect privacy while preserving the dataset’s usefulness for pavement distress detection, identifiable features were anonymized using image inpainting and selective blurring, so that the resulting images remain suitable for analysis.
Performance of object detection algorithms using PaveTrack_OD
Our dataset is used to train and assess seven predominant object detection algorithms, including Faster-RCNN [15], YOLOv5 [16], YOLOv8 [17], YOLOX [18], YOLOv11 [19], and RT-DETRv2 [20]. The models are evaluated using four metrics: precision, recall, mAP50, and FLOPs. Together, these metrics describe both the accuracy with which a model identifies and classifies each category in the dataset and its computational cost. For the dataset collected in China, RT-DETRv2 showed strong detection performance, with an mAP50 of 0.593. As shown in Table 3, YOLOv11 follows closely with an mAP50 of 0.569. These results underscore the efficacy of our dataset in conjunction with deep learning algorithms for the detection of pavement distresses.
For the dataset collected in the United States, model performance differs from that observed on the Chinese dataset, owing to differences in dataset volume and distribution. As shown in Table 4, YOLOv8 achieves an mAP50 of 0.561, significantly outperforming the other models. Faster-RCNN, although a classical object detection method proposed several years ago, attains the highest precision on this dataset, at 0.782.
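To make the accuracy metrics above concrete, the following is a minimal sketch of how precision and recall are conventionally computed at an IoU threshold of 0.5 for a single class in a single image. The exact evaluation protocol behind Tables 3 and 4 is not described here, so the greedy matching rule and the function names `iou` and `precision_recall` are illustrative assumptions. mAP50 would additionally average, over all classes, the area under the precision-recall curve obtained at this same threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Precision and recall for one class in one image.

    predictions:  list of (box, confidence_score)
    ground_truth: list of boxes
    Assumption: predictions are matched greedily in order of decreasing
    confidence, and each ground-truth box can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_index, best_iou = None, iou_threshold
        for j, gt_box in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_index, best_iou = j, overlap
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    false_positives = len(predictions) - true_positives
    false_negatives = len(ground_truth) - true_positives
    precision = true_positives / (true_positives + false_positives) if predictions else 0.0
    recall = true_positives / (true_positives + false_negatives) if ground_truth else 0.0
    return precision, recall
```

A model can therefore rank first on precision without ranking first on mAP50, since precision counts only how many of its detections are correct, whereas mAP50 also reflects how many ground-truth distresses are recovered across confidence thresholds.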
Baseline methods and results for pavement distress matching and tracking using PaveTrack_PD
For the second part of the dataset, we designed a three-step matching algorithm to filter a large number of pavement distresses, with the specific design as follows:
Step 1 (GPS Clustering): This step groups images taken at the same location. Because tall buildings can introduce offsets in GPS data, this study employs an improved K-means algorithm that filters outliers during the clustering process. With outliers removed, the algorithm determines cluster centers more accurately, so the clusters more closely reflect actual conditions, and both the accuracy and the efficiency of clustering improve. To account for GPS positioning errors in urban environments with building obstructions and multi-lane roads, images within a range of 5 to 20 meters of one another are clustered together to facilitate matching at each location.
Step 2 (Background Matching): To accurately identify and match images that are close in GPS coordinates but have visual content differences, we need to match different scenes based on the background features of the images. The SuperPoint algorithm is first used to detect keypoints and extract descriptors for the images to be matched. The extracted keypoints and descriptors are then input into the SuperGlue algorithm for matching. Through SuperGlue’s graph attention mechanism, the similarity between keypoints is learned, establishing reliable matching correspondences. Based on the matching results, the similarity between two images is evaluated to achieve scene matching.
Step 3 (Adjacent Local Area Matching): The SuperGlue network provides keypoint matches between two images, while the object detection algorithm draws bounding boxes in each image. If the matched points fall within a bounding box in each image, the two boxes contain the same pavement distress. However, because of different shooting angles, unremarkable features, or unusual weather conditions, the same distress in two images may not share any matching features. Repaired defects and potholes are easily matched because their features are more pronounced, whereas unrepaired defects are not prominent within the image and are difficult to match directly using SuperGlue. To address this, we designed an algorithm that matches specific pavement distresses by comparing adjacent local areas within the images. Specifically, we extract the local area around each distress and then use a feature matching algorithm to compare the similarity of these regions. By calculating the relative position and orientation of the local areas, we determine whether the two distresses are the same. This method allows defects in two images to be matched accurately even if their shapes and sizes differ.
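The following is a minimal sketch of Steps 1 and 3 under stated assumptions; it is not the implementation used to build the dataset. For Step 1, the outlier filter is assumed to be a simple distance cutoff: a point farther than `max_radius_m` from its nearest center is labelled -1 and excluded from the center update. For Step 3, keypoint matches are assumed to be already available from Step 2 as a list of point pairs, and two bounding boxes are declared the same distress when enough matches fall inside both expanded local areas. The relative position and orientation check described above is omitted. All function names, parameters, and default values (`to_local_meters`, `kmeans_with_outlier_filter`, `same_distress`, `margin`, `min_matches`) are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def to_local_meters(points, origin):
    """Project (lat, lon) in degrees to planar (x, y) meters around `origin`
    using an equirectangular approximation (adequate over short distances)."""
    lat0, lon0 = math.radians(origin[0]), math.radians(origin[1])
    return [
        (EARTH_RADIUS_M * (math.radians(lon) - lon0) * math.cos(lat0),
         EARTH_RADIUS_M * (math.radians(lat) - lat0))
        for lat, lon in points
    ]


def kmeans_with_outlier_filter(xy, initial_centers, max_radius_m=20.0, n_iter=20):
    """K-means in which points farther than `max_radius_m` from their nearest
    center get label -1 and do not contribute to the center update."""
    centers = list(initial_centers)
    labels = [-1] * len(xy)
    for _ in range(n_iter):
        # Assignment step, with the distance cutoff acting as the outlier filter.
        for i, point in enumerate(xy):
            dists = [math.dist(point, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            labels[i] = k if dists[k] <= max_radius_m else -1
        # Update step, using inliers only.
        for k in range(len(centers)):
            members = [p for p, lab in zip(xy, labels) if lab == k]
            if members:
                centers[k] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels, centers


def expand_box(box, margin):
    """Grow an (x1, y1, x2, y2) box by `margin` pixels to cover its adjacent local area."""
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)


def contains(box, point):
    """True if `point` (x, y) lies inside or on the border of `box`."""
    x1, y1, x2, y2 = box
    return x1 <= point[0] <= x2 and y1 <= point[1] <= y2


def same_distress(box_a, box_b, matches, margin=50.0, min_matches=3):
    """Decide whether two detections show the same distress.

    matches: list of ((xa, ya), (xb, yb)) keypoint pairs between image A and
    image B, assumed to come from the Step 2 matcher.
    """
    area_a, area_b = expand_box(box_a, margin), expand_box(box_b, margin)
    support = sum(1 for pa, pb in matches
                  if contains(area_a, pa) and contains(area_b, pb))
    return support >= min_matches
```

The 20-meter default for `max_radius_m` follows the upper end of the clustering range given in Step 1. Expanding each box before counting matches reflects the idea that a faint crack may offer few keypoints of its own, while the surrounding pavement and road markings often do.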
The matching framework also makes it possible to observe pavement performance degradation over time. Five common degradation scenarios are illustrated in the following cases.
Case 1 (Fig. 9) illustrates the onset of pavement distress: within a span of 20 days, the initially smooth pavement developed cracks. This suggests that distress can appear abruptly rather than accumulating gradually.
Successful pavement distress matching case 1.
Case 2 (Fig. 10) depicts a scenario where no significant deterioration occurred in the pavement. Over a four-month monitoring period, the crack at this location remained in its initial state.

Successful pavement distress matching case 2.
Case 3 (Fig. 11) shows an instance of crack propagation, likely due to the combined effects of water infiltration and heavy vehicular loading. After four months, the crack had nearly doubled in length.

Successful pavement distress matching case 3.
Case 4 (Fig. 12) presents a situation where a pothole was repaired. A pothole that later appears as a repaired defect typically indicates that the road maintenance unit has intervened. Tracking such transitions lets us follow changes in pavement damage status at high frequency and thereby optimize maintenance timing.

Successful pavement distress matching case 4.
Case 5 (Fig. 13) demonstrates the progression of a crack into a pothole, indicating ongoing deterioration of the pavement.

Successful pavement distress matching case 5.
Annotation validation
The annotation process was a collaborative effort among team members, with images fairly distributed among the initial annotators. Dr. Liu Chenglong played a pivotal role in reviewing and refining these annotations to ensure accuracy. Following annotation, the images were anonymized and further processed using inpainting techniques to maintain the integrity of the dataset without compromising network performance.
Influence of the inpainting process
To evaluate the impact of our anonymization techniques on model performance, we conducted a comparative analysis. Initially, the model was trained using the original dataset without anonymization. The accuracy of the model was validated against different validation sets to establish a performance baseline. Subsequently, the same model was evaluated using the anonymized dataset, ensuring consistency in testing conditions. Notably, the comparative analysis revealed no significant differences in validation results between the anonymized and non-anonymized datasets. This finding indicates that our inpainting-based anonymization process does not impair the model’s ability to detect pavement distress, confirming the integrity of our data processing methodology.