Retrieval Algorithm for Matching Night-shot Infrared Pedestrian Images against a Visible-light Pedestrian Image Library
Image retrieval algorithms for person re-identification search a gallery of pedestrian images, captured by non-overlapping video surveillance cameras, to find a given individual. However, night-shot infrared pedestrian images captured by surveillance systems follow a different data distribution from conventional RGB color images and lack the color components that are crucial for pedestrian comparison. The difference between the two kinds of images is referred to as modality discrepancy, which leads to large intra-class variations and a modality gap across cameras. It is therefore both necessary and valuable to bridge this modality gap and reduce the heterogeneity between pedestrian images shot at night in infrared mode and those recorded in the daytime, so that existing retrieval algorithms perform better against the widely used visible-light image databases. Accordingly, a dual-path framework with partially shared parameters was designed on top of a deep convolutional neural network (CNN) to extract features from images of the two modalities at both global coarse granularity and local fine granularity. A pre-trained ResNet50-based network was trained under the constraints of a cross-modality entropy loss, a center-based triplet loss, and a sample-based triplet loss, and the final model was obtained by adjusting the weights of these three loss terms. Feature measurement was then carried out to validate the recognition performance on cross-modality images.
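The weighted combination of the three losses can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the weights (w_id, w_sample, w_center), the margin, and the choice of Euclidean distance are all assumptions made for illustration, and the entropy term is assumed here to be an ordinary softmax cross-entropy over identity labels. The center-based term is read as a triplet loss computed on per-identity feature centers of each modality, which is one plausible interpretation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (assumed identity loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet(anchor, positive, negative, margin):
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

def mean_vector(vectors):
    """Feature center of a group of vectors (per identity and modality)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def total_loss(l_id, l_sample, l_center,
               w_id=1.0, w_sample=1.0, w_center=1.0):
    """Weighted sum of the three loss terms; the weights are hypothetical."""
    return w_id * l_id + w_sample * l_sample + w_center * l_center

# Toy features: identity 0 in infrared and visible light, identity 1 in visible light.
ir_id0 = [[0.0, 1.0], [0.2, 0.8]]
rgb_id0 = [[1.0, 1.0], [0.8, 1.2]]
rgb_id1 = [[3.0, 3.0], [3.2, 2.8]]

l_id = cross_entropy([2.0, 0.5], label=0)
# Sample-based term: individual feature vectors form the triplet.
l_sample = triplet(ir_id0[0], rgb_id0[0], rgb_id1[0], margin=0.3)
# Center-based term: per-identity, per-modality centers form the triplet.
l_center = triplet(mean_vector(ir_id0), mean_vector(rgb_id0),
                   mean_vector(rgb_id1), margin=0.3)
loss = total_loss(l_id, l_sample, l_center, w_id=1.0, w_sample=1.0, w_center=0.5)
```

In such a sketch, tuning the three weights changes how strongly the model is pushed toward identity classification versus cross-modality compactness of samples and centers.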
Experimental results showed that the extracted multi-granularity features performed well on the SYSU-MM01 dataset in single-shot mode. Verification and visualization on this dataset demonstrated that the multi-granularity components complement each other and achieve a testing accuracy competitive with several related algorithms. This research can support video investigation work and assist the ongoing comparison of dynamic and static pedestrian images. The proposed algorithm has broad application prospects in both video investigation and future intelligent applications.
