Technology trends in deep learning networks for real-time object detection in drone environment

Jonghyeon Mun; Chaebong Sohn

doi:10.37944/jams.v6i2.220

Authors

Jonghyeon Mun Kwangwoon University
Chaebong Sohn Kwangwoon University

Keywords:

drones, computer vision, deep learning network, real-time object detection

Abstract

With the recent announcement of the Defense Innovation 4.0 Basic Plan by the Ministry of National Defense, the role and operational scope of drones are expanding as a key force in AI-based, unmanned, and autonomous systems. Real-time object detection is therefore becoming increasingly important as drones take on diverse missions, including delivering, analyzing, and assessing target-related information in real time. The recent rise of deep learning has driven substantial advances in computer vision, particularly in object detection, and deep learning-based detectors are being actively researched with a focus on algorithms suited to embedded and mobile platforms such as drones. The main goal of this research is to develop models that run in real time while accurately identifying objects of varied shapes and sizes. Recent object detection models are commonly decomposed into three components: a backbone network that extracts features, a neck network that fuses features across scales, and a head network that predicts object classes and locations. Treating these components separately allows each to be designed around the requirements of drone operations. In this paper, we survey technology trends in deep learning network models that can be deployed on drones for real-time object detection, with the aim of contributing to more effective drone use in military operations and supporting related research and decision-making.
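The backbone/neck/head decomposition can be illustrated with a deliberately tiny, dependency-free Python sketch. Everything in it is an assumption for illustration only: the names `toy_backbone`, `toy_fpn_neck`, `toy_head`, and `Detector` are hypothetical; feature maps are single-channel lists of floats rather than learned tensors; the backbone is plain 2x2 average pooling instead of a trained convolutional network; the neck is an FPN-style top-down addition without lateral convolutions; and the head simply thresholds feature values instead of regressing boxes and class scores. It shows only how the three stages compose and why each can be replaced independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Grid = List[List[float]]  # toy single-channel feature map (stands in for a tensor)
Detection = Tuple[int, int, int, float]  # (pyramid level, row, col, score)


def avg_pool2(x: Grid) -> Grid:
    """Halve the spatial resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(x) // 2, len(x[0]) // 2
    return [[(x[2 * i][2 * j] + x[2 * i][2 * j + 1]
              + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def upsample2(x: Grid) -> Grid:
    """Double the spatial resolution by nearest-neighbour repetition."""
    return [[x[i // 2][j // 2] for j in range(2 * len(x[0]))]
            for i in range(2 * len(x))]


def toy_backbone(image: Grid, levels: int = 3) -> List[Grid]:
    """Backbone (hypothetical): feature pyramid at strides 2, 4, 8, ..., finest first."""
    feats, x = [], image
    for _ in range(levels):
        x = avg_pool2(x)
        feats.append(x)
    return feats


def toy_fpn_neck(feats: List[Grid]) -> List[Grid]:
    """Neck (hypothetical): top-down pass; each level adds the upsampled coarser fused level."""
    fused = [feats[-1]]
    for f in reversed(feats[:-1]):
        up = upsample2(fused[0])
        fused.insert(0, [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(f, up)])
    return fused


def toy_head(feats: List[Grid], threshold: float = 0.5) -> List[Detection]:
    """Head (hypothetical): treat each cell's value as a score and keep cells above threshold."""
    return [(lvl, i, j, v)
            for lvl, f in enumerate(feats)
            for i, row in enumerate(f)
            for j, v in enumerate(row)
            if v > threshold]


@dataclass
class Detector:
    """Composes the three stages; any one can be swapped without touching the others."""
    backbone: Callable[[Grid], List[Grid]]
    neck: Callable[[List[Grid]], List[Grid]]
    head: Callable[[List[Grid]], List[Detection]]

    def __call__(self, image: Grid) -> List[Detection]:
        return self.head(self.neck(self.backbone(image)))


if __name__ == "__main__":
    # 8x8 image with a bright 2x2 patch in the top-left corner.
    image = [[1.0 if i < 2 and j < 2 else 0.0 for j in range(8)] for i in range(8)]
    detector = Detector(toy_backbone, toy_fpn_neck, toy_head)
    print(detector(image))  # [(0, 0, 0, 1.3125)]
```

Swapping in an identity neck, `Detector(toy_backbone, lambda fs: fs, toy_head)`, leaves the backbone and head untouched and returns `[(0, 0, 0, 1.0)]`: the finest level no longer receives context from the coarser levels. In a real drone detector, the same modularity is what lets a lighter backbone or a different neck be substituted to meet a latency budget.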

Author Biographies

Jonghyeon Mun, Kwangwoon University

(First Author) Kwangwoon University, Department of Electronic Communication Engineering, Master’s student, https://orcid.org/0000-0002-9921-2796.

Chaebong Sohn, Kwangwoon University

(Corresponding Author) Kwangwoon University, Department of Defense Acquisition Program Division, Professor, https://orcid.org/0000-0001-9584-7930.
