MMCPP: A MULTI-MODAL CONTRASTIVE PRE-TRAINING MODEL FOR PLACE REPRESENTATION BASED ON THE SPATIO-TEMPORAL FRAMEWORK

Abstract. The concept of "place" is crucial for understanding geographical environments from a human perspective. Place representation learning involves converting places into numerical low-dimensional dense vectors and is a fundamental procedure for artificial intelligence in geography (GeoAI). However, most studies ignore multi-level distance constraints and spatial proximity interactions that enable behavioral interactions between places. Furthermore, representing the temporal characteristics of these interactions in trajectory sequences poses a challenge for natural language processing and other field techniques. In addition, most existing methods rely on all modalities from inputs as they use joint training to integrate multiple modalities. To address these issues, we propose a Multi-Modal Contrastive Pre-training model for Place representation (MMCPP). Our model consists of three encoders that capture corresponding place attributes across different modalities, including point of interests (POIs), images, and trajectories. The trajectory encoder, named RodtFormer, takes fine-grained spatio-temporal trajectories as input and leverages self-attention with rotary temporal interval position embedding to simulate dynamic spatial and behavioral proximity interactions between places. By using a coordinated pre-training framework, MMCPP independently encodes place representations across different modalities and improves model reusability. We verify the effectiveness of our model on a taxi trajectory dataset using the location prediction task at next n seconds, including 30 seconds (s), 180 (s), 300 (s). Our results demonstrate that compared to existing embedding methods, our model is capable of learning higher-quality position representations during pre-training, leading to improved performance on downstream tasks.

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online-Ressource
Language
Englisch

Bibliographic citation
MMCPP: A MULTI-MODAL CONTRASTIVE PRE-TRAINING MODEL FOR PLACE REPRESENTATION BASED ON THE SPATIO-TEMPORAL FRAMEWORK ; volume:X-1/W1-2023 ; year:2023 ; pages:303-310 ; extent:8
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences ; X-1/W1-2023 (2023), 303-310 (gesamt 8)

Creator
Chen, Y.
Yu, X. S.
Qin, K.

DOI
10.5194/isprs-annals-X-1-W1-2023-303-2023
URN
urn:nbn:de:101:1-2023120703301687889527
Rights
Open Access; Der Zugriff auf das Objekt ist unbeschränkt möglich.
Last update
15.08.2025, 7:32 AM CEST

Data provider

This object is provided by:
Deutsche Nationalbibliothek. If you have any questions about the object, please contact the data provider.

Associated

  • Chen, Y.
  • Yu, X. S.
  • Qin, K.

Other Objects (12)