Abstract: Earlier deep learning-based detectors perform poorly in dock scenes, where scale variations, background variations, and visual occlusions reduce localization accuracy in raw images. Therefore, an improved YOLOv5s+DeepSORT model is proposed to enhance adaptability to dock environments. To improve robustness to loading objects of different scales, multi-scale convolution is embedded in YOLOv5s, and an efficient pyramid split attention (EPSA) network is added to achieve stronger feature fusion and multi-scale representation, raising the mean average precision (mAP) from 90.05% to 90.90%. Optimizing the original classification loss function with a distributional ranking loss reduces the impact of imbalanced loading objects and background changes in dock image sequences, resulting in a 4.8% improvement in multiple object tracking accuracy (MOTA). Experiments on self-built datasets show an average accuracy of 90.9% and a detection accuracy of 92.2%.
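
For readers unfamiliar with the tracking metric reported above, the following is a minimal sketch of how MOTA is commonly computed under the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, with the error counts summed over all frames. The function name `mota` and the example counts are hypothetical and are not taken from the paper's experiments; the paper may compute the metric with a different toolkit.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple object tracking accuracy (CLEAR MOT definition).

    All four arguments are totals accumulated over every frame of the
    evaluated sequence(s). num_gt is the total number of ground-truth
    object instances.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    # MOTA can be negative when the tracker makes more errors than
    # there are ground-truth objects.
    return 1.0 - errors / num_gt


# Hypothetical counts, for illustration only (not the paper's data).
score = mota(false_negatives=120, false_positives=80,
             id_switches=10, num_gt=2000)
print(f"MOTA = {score:.3f}")  # about 0.895
```

Because false negatives and false positives come from the detector, this form of the metric suggests why detector-side changes such as a modified classification loss can move MOTA even when the tracker itself is unchanged.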