UAVM '23: Proceedings of the 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective

SESSION: Session 1: Cross-view Drone-based Geo-localization

An Orthogonal Fusion of Local and Global Features for Drone-based Geo-localization

  • Tian Zhan
  • Cheng Zhang
  • Sibo You
  • Kai Sun
  • Di Su

Drone-based geo-localization is an image retrieval task that underpins many drone-based multimedia applications, such as object detection, drone navigation, and mapping. It is challenging due to the large changes in visual appearance caused by viewpoint variation and time misalignment. Existing methods primarily focus on global representation embeddings while disregarding local features. We propose a CNN-based model containing a global and a local branch that extracts features from these two perspectives, which are then aggregated by orthogonal fusion. We achieve results on the University-1652/160k datasets that are competitive with ViT-based state-of-the-art models. Experimental and qualitative results validating the effectiveness of our solution are also presented.
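
As a rough illustration of the orthogonal fusion idea summarised above (a minimal sketch, not the authors' implementation; the tensor shapes and the final concatenation are assumptions), the local feature map can be decomposed into the component orthogonal to the global descriptor before aggregation:

    import torch

    def orthogonal_fusion(local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        """Sketch: keep only the part of each local feature vector that is
        orthogonal to the global descriptor, then concatenate the global
        descriptor back so both views contribute to the final embedding.

        local_feat:  (B, C, H, W) feature map from the local branch (assumed shape)
        global_feat: (B, C) pooled descriptor from the global branch (assumed shape)
        """
        B, C, H, W = local_feat.shape
        g = global_feat.view(B, C, 1, 1)
        # Projection coefficient of each local vector onto the global descriptor
        coeff = (local_feat * g).sum(dim=1, keepdim=True) / (g.pow(2).sum(dim=1, keepdim=True) + 1e-6)
        orth = local_feat - coeff * g          # component not already captured globally
        fused = torch.cat([orth, g.expand(-1, -1, H, W)], dim=1)  # (B, 2C, H, W)
        return fused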

Orientation-Guided Contrastive Learning for UAV-View Geo-Localisation

  • Fabian Deuser
  • Konrad Habel
  • Martin Werner
  • Norbert Oswald

Retrieving relevant multimedia content is one of the main problems in a world that is increasingly data-driven. With the proliferation of drones, high-quality aerial footage is now available to a wide audience for the first time. Integrating this footage into applications can enable GPS-less geo-localisation or location correction. In this paper, we present an orientation-guided training framework for UAV-view geo-localisation. Through hierarchical localisation, the orientations of the UAV images are estimated in relation to the satellite imagery. We propose a lightweight prediction module for these pseudo-labels which predicts the orientation between the different views based on the contrastively learned embeddings. We experimentally demonstrate that this prediction supports the training and outperforms previous approaches. The extracted pseudo-labels also enable aligned rotation of the satellite image as an augmentation to further strengthen generalisation. During inference, we no longer need this orientation module, which means that no additional computations are required. We achieve state-of-the-art results on both the University-1652 and University-160k datasets.
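
As a sketch of how such a lightweight orientation module might sit on top of contrastively learned embeddings (the embedding size, number of orientation bins, and layer choices below are assumptions, not the paper's configuration):

    import torch
    import torch.nn as nn

    class OrientationHead(nn.Module):
        """Sketch: predict the relative orientation between a UAV embedding and a
        satellite embedding from their concatenation; at inference time this head
        can simply be dropped, so it adds no test-time cost."""

        def __init__(self, embed_dim: int = 512, num_bins: int = 8):
            super().__init__()
            self.classifier = nn.Sequential(
                nn.Linear(2 * embed_dim, embed_dim),
                nn.ReLU(inplace=True),
                nn.Linear(embed_dim, num_bins),
            )

        def forward(self, uav_emb: torch.Tensor, sat_emb: torch.Tensor) -> torch.Tensor:
            return self.classifier(torch.cat([uav_emb, sat_emb], dim=-1))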

Navigating the Metaverse: UAV-Based Cross-View Geo-Localization in Virtual Worlds

  • Ryota Yagi
  • Takehisa Yairi
  • Akira Iwasaki

In GPS-denied UAV flight, understanding the visual scene ahead is crucial. This paper focuses on cross-view geo-localization from the UAV's viewpoint in the Coarse-to-Fine problem setting. Previous research in this area has been limited due to the challenges of obtaining images from the drone's perspective. To overcome this limitation, we leverage the flexibility provided by virtual space to create datasets. Moreover, we created datasets with varying illumination and weather conditions, which would be costly to obtain in real-world scenarios. Additionally, we introduced an angle loss and evaluated it on these datasets, confirming that the proposed loss function performs effectively on datasets with diverse settings and on an unseen dataset.

A Cross-View Matching Method Based on Dense Partition Strategy for UAV Geolocalization

  • Yireng Chen
  • Zihao Yang
  • Quan Chen

This paper reports our solution for the ACM Multimedia 2023 cross-view geo-localization challenge (Zheng et al., 2023), which aims to solve a real-world geo-localization task with an extremely large set of satellite-view gallery distractors. Our solution is built on SwinV2 (Liu et al., 2022) and LPN (Wang et al., 2021). Concretely, we adopt SwinV2-B, a current mainstream transformer-based feature extractor, as the backbone of our model. Inspired by the feature partition strategy of LPN, we design a more efficient partition strategy named the dense partition strategy. It segments and combines features to alleviate the problem of feature discontinuity at the boundaries of different partitions. We achieved eighth place in the geo-localization track on the official leaderboard.

Dual-branch Pattern and Multi-scale Context Facilitate Cross-view Geo-localization

  • Bing Zhang
  • Jing Sun
  • Rui Yan
  • Fuming Sun
  • Fasheng Wang

Cross-view geo-localization aims to locate the target image of the same geographic location from different viewpoints, which is a challenging task in the field of computer vision. Due to the interference of similar images and the surrounding environment of the target building, matching accuracy drops significantly in complex scenes. To address this problem, we propose a cross-view geo-localization method based on a dual-branch pattern and multi-scale context, providing a solution for challenging datasets with numerous distractors. The method exploits a Transformer feature extraction network to reduce the loss of fine-grained features. Meanwhile, a dual-branch structure is designed to capture image semantic information and local context information bidirectionally, which effectively deals with the large number of interference items in satellite images and improves the accuracy of geo-localization in complex scenes. Quantitative experiments show that both recall (Recall) and image retrieval average precision (AP) are significantly improved on the benchmark dataset University-1652 and the challenging dataset University-160K, demonstrating that our method achieves advanced cross-view geo-localization performance.

Modern Backbone for Efficient Geo-localization

  • Runzhe Zhu
  • Mingze Yang
  • Kaiyu Zhang
  • Fei Wu
  • Ling Yin
  • Yujin Zhang

With the development of autonomous driving technology, vision-based geo-localization has attracted a consistently growing following. Matching the correct image pair across different perspectives is the key technology. Existing geo-localization methods focus on designing complex attention mechanisms on top of traditional backbones, e.g., VGG and ResNet, but neglect the importance of the backbone network itself. In this article, we propose a modern-backbone-based geo-localization method (MBEG). MBEG introduces the recent vision foundation model EVA-02 as its backbone, which has been extensively pre-trained on large datasets. In addition, a feature rotation encoding strategy is presented to eliminate the effects of image rotation. We also apply knowledge distillation to compress the network's parameters for practical deployment. Our method exhibits excellent performance on the University-1652 dataset, and our solution attained the top-1 ranking in the UAVs in Multimedia Challenge on the University-160k dataset.
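
The distillation step mentioned above can be illustrated with a generic objective (a sketch under common assumptions, not the MBEG training code; the temperature T and weight alpha are placeholders): the compressed student is trained on ground-truth labels while matching the softened predictions of the EVA-02-based teacher.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.5):
        """Generic knowledge-distillation loss: cross-entropy on the labels plus a
        KL term pulling the student's softened distribution towards the teacher's."""
        ce = F.cross_entropy(student_logits, labels)
        kd = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * ce + (1.0 - alpha) * kd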

AFPN: Attention-guided Feature Partition Network for Cross-view Geo-localization

  • Zhifeng Lin
  • Ranran Huang
  • Jiancheng Cai
  • Xinmin Liu
  • Changxing Ding
  • Zhenhua Chai

Cross-view geo-localization aims to retrieve images of the same geographic target across different platforms. Since drones have received increasing attention in recent years because of their ability to capture high-quality multimedia data from the sky, we focus on image retrieval from the drone platform to the satellite platform in this paper. We propose an attention-guided feature partition network (AFPN) which leverages learnable spatial attention maps to divide the global high-level feature map into class-aware foreground and class-agnostic background features in an end-to-end learning manner. Our backbone is based on the powerful vision transformer to model long-range global dependencies between patches. Data augmentation and multiple sampling strategies are also adopted in our experiments. Our method achieves Recall@1 accuracy of 95.60% on University-1652 and 94.48% on University-160k, and ranks 2nd in the ACMMM23 Multimedia Drone Satellite Matching Challenge.
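
A minimal sketch of the attention-guided partition idea (illustrative only; the 1x1 convolution, sigmoid gating, and global average pooling are assumptions rather than the AFPN architecture):

    import torch
    import torch.nn as nn

    class AttentionPartition(nn.Module):
        """Sketch: a learnable spatial attention map splits a high-level feature
        map into a foreground descriptor and a complementary background descriptor."""

        def __init__(self, channels: int):
            super().__init__()
            self.attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

        def forward(self, feat: torch.Tensor):
            a = self.attn(feat)                       # (B, 1, H, W) attention map
            fg = (feat * a).mean(dim=(2, 3))          # attended foreground feature
            bg = (feat * (1.0 - a)).mean(dim=(2, 3))  # class-agnostic background feature
            return fg, bg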

Dual Path Network for Cross-view Geo-Localization

  • Leyi Dong
  • Yuhui Wang
  • Junshi Huang
  • Xueming Qian
  • Mingyuan Fan
  • Shenqi Lai

Cross-view geo-localization aims to find images of the same geographical target from different views. We reevaluate the square-ring partition strategy proposed in LPN and identify a limitation in its fixed feature-splitting scheme. Specifically, we observe that this method struggles to accommodate variations in target size across images; consequently, the performance of LPN declines substantially when the dataset is expanded. To address this problem, we propose a multi-scale ring partition strategy that partitions the features at different scales to accommodate changes in target scale. Building upon this, we construct the Dual Path Network (DPN). We also investigate the distribution of training images in University-1652 and propose a new method for sampling training data, which improves Recall@1 by 4.5%. Experimental results on the University-1652 and University-160k datasets demonstrate the effectiveness of our method.
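
To make the ring-partition idea concrete, the sketch below shows an LPN-style square-ring pooling over a feature map (equal-width rings are an assumption; a multi-scale variant would repeat this with different ring widths, and none of the names below come from the DPN code):

    import torch

    def square_ring_partition(feat: torch.Tensor, num_rings: int = 4) -> torch.Tensor:
        """Sketch: split a (B, C, H, W) feature map into concentric square rings
        around the centre and average-pool each ring into its own descriptor."""
        B, C, H, W = feat.shape
        ys = torch.arange(H, device=feat.device).view(H, 1).expand(H, W)
        xs = torch.arange(W, device=feat.device).view(1, W).expand(H, W)
        # Chebyshev-style distance from the centre, normalised to [0, 1)
        d = torch.maximum((ys - (H - 1) / 2).abs() / (H / 2),
                          (xs - (W - 1) / 2).abs() / (W / 2))
        ring_idx = torch.clamp((d * num_rings).long(), max=num_rings - 1)
        parts = []
        for r in range(num_rings):
            mask = (ring_idx == r).float().view(1, 1, H, W)
            pooled = (feat * mask).sum(dim=(2, 3)) / mask.sum().clamp(min=1.0)
            parts.append(pooled)                      # one (B, C) descriptor per ring
        return torch.stack(parts, dim=1)              # (B, num_rings, C)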

Drone Satellite Matching based on Multi-scale Local Pattern Network

  • Haoran Li
  • Quan Chen
  • Zhiwen Yang
  • Jiong Yin

In this technical report, we present our solution for the ACMMM23 Multimedia Drone Satellite Matching Challenge. Our solution focuses on exploiting a feature aggregation strategy to develop a robust cross-view geo-localization system for drone-satellite matching. In particular, we propose an end-to-end framework named the Multi-scale Local Pattern Network (MLPN), which builds upon LPN and incorporates a multi-scale aggregation block. LPN is employed to divide high-level features at different scales, and the multi-scale aggregation block, as the name implies, is utilized to aggregate the local features obtained by this division. Experiments show that MLPN can effectively match UAV images with satellite images and achieves competitive accuracy on University-160k. Additionally, our solution achieved fourth place in the Multimedia Drone Satellite Matching Challenge.

SESSION: Session 2: Drone-based Object Detection / Scene Understanding

Edge Intelligence Resource Consumption by UAV-based IR Object Detection

  • Andrii Polukhin
  • Yuri Gordienko
  • Mairo Leier
  • Gert Jervan
  • Oleksandr Rokovyi
  • Oleg Alienin
  • Sergii Stirenko

We investigate the feasibility of using the YOLO (You Only Look Once) architecture for object detection in infrared images from unmanned aerial vehicles (UAVs) on low-power devices, specifically the Raspberry Pi and Orange Pi. The study measures the computing resources consumed on each device: inference time (ms), peak power consumption (W), memory consumption (MB), inference energy (J), and storage consumption (MB). It also investigates the correlation between the number of model parameters and the resource consumption of the different YOLO model sizes. Finally, the study draws conclusions about the expediency and realism of using YOLO on low-power devices for Edge Intelligence and proposes methods for speeding it up. The results show that YOLO can be used effectively on low-power devices with some optimizations to increase performance and energy efficiency.
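
For context, this kind of measurement typically wraps the detector in timing and memory probes on the target board; the sketch below is a rough illustration with the model and input as placeholders, and it omits the external power metering needed for the W and J figures.

    import time

    import psutil
    import torch

    def profile_inference(model, dummy_input, warmup: int = 5, runs: int = 50):
        """Sketch: average inference time and resident memory while running a
        detector on a low-power device such as a Raspberry Pi."""
        model.eval()
        with torch.no_grad():
            for _ in range(warmup):                   # warm-up to stabilise caches/clocks
                model(dummy_input)
            start = time.perf_counter()
            for _ in range(runs):
                model(dummy_input)
            elapsed_ms = (time.perf_counter() - start) * 1000.0 / runs
        rss_mb = psutil.Process().memory_info().rss / (1024 ** 2)
        return {"inference_ms": elapsed_ms, "memory_mb": rss_mb}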

CDMNet: Contrastive Distribution Mapped Network for Infrared Small Target Detection

  • Chengtao Lv
  • Jinyang Guo
  • Jiaqi Yu
  • Ruiyan Zhang
  • Xianglong Liu

Single-frame infrared small target (SIRST) detection is an extremely challenging task due to its low signal-to-noise ratio and low contrast. Previous methods fail to achieve promising performance as they do not consider the similar and blurred surrounding background. To this end, we first propose a prototype-based contrastive loss (PCL) that models the foreground targets and the nearest surrounding backgrounds. As a result, the prototypes of different categories in the latent space can be kept far apart, which enables the model to make clear decisions at the boundaries of infrared small targets. Moreover, previous methods neglect the distribution inconsistency caused by feature fusion in U-shaped architectures. Therefore, we design a multi-scale distribution-mapped fusion (MDMF) module, which greatly mitigates the distribution inconsistency issue. We incorporate the proposed PCL and MDMF module into an existing SIRST detection method to construct a new SIRST detection framework called the Contrastive Distribution Mapped Network (CDMNet). Extensive experiments on two infrared small target datasets, NUDT-SIRST and IRSTD-1k, demonstrate that our model outperforms current competitive models on a variety of metrics.
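
The prototype-based contrastive idea can be sketched as follows (an illustrative simplification, not the CDMNet loss; the masking scheme, normalisation, and temperature are assumptions): target pixels and nearby background pixels are pooled into two prototypes, and each labelled pixel is pulled towards its own prototype and pushed away from the other.

    import torch
    import torch.nn.functional as F

    def prototype_contrastive_loss(feat, target_mask, bg_mask, temperature: float = 0.1):
        """Sketch of a prototype-based contrastive objective for small targets.

        feat:        (B, C, H, W) pixel embeddings
        target_mask: (B, 1, H, W) binary mask of target pixels
        bg_mask:     (B, 1, H, W) binary mask of the surrounding background pixels
        """
        def prototype(f, m):
            return F.normalize((f * m).sum(dim=(0, 2, 3)) / m.sum().clamp(min=1.0), dim=0)

        proto_fg, proto_bg = prototype(feat, target_mask), prototype(feat, bg_mask)
        pix = F.normalize(feat, dim=1)
        sim_fg = (pix * proto_fg.view(1, -1, 1, 1)).sum(dim=1) / temperature
        sim_bg = (pix * proto_bg.view(1, -1, 1, 1)).sum(dim=1) / temperature
        logits = torch.stack([sim_fg, sim_bg], dim=1)          # (B, 2, H, W)
        labels = (bg_mask.squeeze(1) > 0).long()               # 0 = target, 1 = background
        weight = (target_mask + bg_mask).squeeze(1).float()    # ignore unlabelled pixels
        loss = F.cross_entropy(logits, labels, reduction="none") * weight
        return loss.sum() / weight.sum().clamp(min=1.0)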

SkySea: Connecting Satellite, UAV and Underwater Imagery for Benthic Habitat Mapping

  • Brendan Do
  • Jiajun Liu
  • Ziwei Wang
  • Brano Kusy
  • Torsten Merz
  • Andy Steven
  • Geoffrey Carlin
  • Joseph Crosswell
  • Yang Li
  • Nicholas Mortimer
  • Fereshteh Nayyeri
  • Mat Vanderklift
  • Mark Wilson

Satellite imagery, UAV imagery, and geo-referenced underwater photo transects (taken from the surface) are different methods used in marine monitoring and benthic habitat mapping to collect observations at different spatial scales. There are, however, challenges in linking them all together to provide fine-grained mapping and analysis of underwater benthic habitats with complex geometric and ecological properties. We propose a novel framework called SkySea that offers users access to aligned observational data at multiple spatial scales. SkySea can integrate satellite images (e.g., from SENTINEL-2 at 10 m resolution), UAV images (<5 cm ground sampling distance), detailed underwater images, and 3D reconstructions of the seafloor/benthos from underwater images, and makes the data available through a commonly used interface such as QGIS. Initial evaluation indicates that the spatial overlay achieves sub-meter accuracy, while the underwater 3D reconstruction reaches an average relative error of less than 10% for size estimation with reference objects. We believe this is a novel framework for achieving a seamless connection across an enormous gap in scales, from satellite images to regional UAV images, local underwater images, and local 3D reconstructions of the underwater environment, for benthic habitat mapping. It enables marine biologists to perform survey planning, species mapping, and model validation tasks in an integrated pipeline.

Research on Unmanned Aerial Vehicle Flight Attitude Control Method Based on Sparrow Search Algorithm and PID Optimization

  • Yanpeng Sun
  • Jinhai Miao

To improve the adjustment speed of quadrotor unmanned aerial vehicle (UAV) flight attitude under disturbance and to address control accuracy and robustness issues during flight, the sparrow search algorithm (SSA) is combined with proportional-integral-derivative (PID) control to design an SSA-based PID controller for precise and stable flight control of UAVs. Taking the quadrotor UAV as the research object, this paper first briefly introduces its structure and working principle, then designs the controller based on the system model, and finally verifies the effectiveness of the control algorithm using MATLAB/Simulink simulation. Comparing the adjustment time of pitch-angle control between the SSA-PID and traditional PID controllers, the simulation results show that the SSA-based PID controller controls the quadrotor UAV more stably than the traditional PID controller.
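
Purely as an illustration of the control law being tuned (a Python sketch under assumed parameters, not the paper's MATLAB/Simulink model), the SSA searches over the PID gains so as to minimise a step-response cost such as the ITAE of the pitch angle:

    import numpy as np

    class PID:
        """Discrete PID controller; Kp, Ki and Kd are the gains the sparrow
        search algorithm would tune."""

        def __init__(self, kp: float, ki: float, kd: float, dt: float):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_err = 0.0

        def step(self, setpoint: float, measurement: float) -> float:
            err = setpoint - measurement
            self.integral += err * self.dt
            deriv = (err - self.prev_err) / self.dt
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv

    def itae_fitness(gains, simulate_pitch_response):
        """Fitness an SSA-style optimiser could minimise: the time-weighted
        absolute error (ITAE) of the simulated pitch-angle step response.
        `simulate_pitch_response` is a placeholder for the UAV attitude model."""
        kp, ki, kd = gains
        t, err = simulate_pitch_response(PID(kp, ki, kd, dt=0.01))
        return float(np.sum(t * np.abs(err)))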