Abstract: Ontology alignment has been studied for over a decade, and in that time researchers have developed many alignment systems and methods for finding simple 1:1 equivalence matches between two ontologies. However, very few alignment systems focus on finding complex correspondences. One reason for this limitation may be that there are no widely accepted alignment benchmarks that contain such complex relationships. In this paper, we propose a real-world data set from the GeoLink project as a potential complex ontology alignment benchmark. The data set consists of two ontologies, the GeoLink Base Ontology (GBO) and the GeoLink Modular Ontology (GMO), together with a manually created reference alignment that was developed in consultation with domain experts from different institutions. The alignment includes 1:1, 1:n, and m:n equivalence and subsumption correspondences, and is available in both the Expressive and Declarative Ontology Alignment Language (EDOAL) and rule syntax. The benchmark has been expanded from its original version to include real-world instance data from seven geoscience data providers, published according to both ontologies. This allows the benchmark to be used by extensional alignment systems and by systems that require training data. The benchmark has been incorporated into the complex track of the Ontology Alignment Evaluation Initiative (OAEI) to help researchers test their automated alignment systems and algorithms. This paper also analyzes the challenges inherent in effectively generating, detecting, and evaluating complex ontology alignments and provides a road map for future work on this topic.
Keywords: Complex ontology alignment; Real-world ontology; Ontology population; Benchmark