In our paper "Template-Based Question Answering over Linked Geospatial Data" we have presented a gold standard for geospatial question answering over the knowledge graph DBpedia and the geospatial datasets GADM and OpenStreetMap. This standard can be found here. In the GeoQA project, we extend this standard for question answering over the YAGO2geo knowledge graph as described below.
In GeoQA we are developing the geospatial knowledge graph YAGO2geo by extending the well-known knowledge graph YAGO2 with geographical administrative data provided by a range of official sources, the Global Administrative Areas dataset and OpenStreetMap.
The Greek Administrative Geography (GAG) dataset contains information about the administrative divisions of Greece, taken from official sources for the Kallikratis law, which defined the administrative divisions of Greece in 2011. According to Kallikratis, the administrative divisions of Greece consist of decentralized administrations, regions, regional units, municipalities, municipal units and municipal communities. The Kallikratis administrative divisions have been defined as linked data by our group in the past and have been made publicly available here.
The Ordnance Survey dataset contains the administrative boundaries of Great Britain. We have focused on the information about the following administrative divisions: European regions, counties, districts and metropolitan districts, unitary authorities, boroughs, wards, parishes, and communities. Ordnance Survey Northern Ireland (OSNI) is the official cartographic agency of Northern Ireland. Users can obtain its data through the OSNI Open Data portal. In this project we use the datasets NI Outline, Local Government Districts 2012, Wards 2012 and Townlands. Ordnance Survey Ireland is the national mapping agency of the Republic of Ireland and provides multiple products and datasets. To extend the geospatial information of entities that belong to the Republic of Ireland, we consider the datasets (i.e., administrative areas) city and county council, county council, city council, municipal district, barony, parish, townland and rural area.
The USGS Governmental Unit Boundaries dataset (NBD) from the National Map represents major civil areas for the U.S., including States or Territories, counties (or equivalents), Federal and Native American areas, congressional districts, minor civil divisions, incorporated places (such as cities and towns), and unincorporated places. Boundaries data, acquired from a variety of government sources, include extents of forest, grassland, park, wilderness, wildlife, and other reserve areas.
OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. It contains information about various features such as rivers, lakes, cities, roads and points of interest (e.g., museums, restaurants and schools). The geometries of these features can be points, lines or polygons. In addition to the geometry of a feature, OSM contains information such as name, feature class, layer, OSM ID, OSM code and population. More details about the OSM dataset are available here. OSM data can be obtained in various formats. We obtained the dataset in shapefile format and converted it into RDF using the GeoTriples tool. The OSM ontology that we used is available in graphical format and in RDF/XML format. The ontology uses the GeoSPARQL vocabulary to model the geometries of the various OSM features. We use a subset of OSM which consists of all the data from two English-speaking areas of Europe: the United Kingdom (England, Scotland, Wales and Northern Ireland) and the Republic of Ireland.
Global Administrative Areas (GADM) is a dataset containing information about the administrative divisions and their boundaries. GADM gives the geometry of each administrative area and it also provides some other information such as its name and variant names.
Following the tradition of other researchers, we have developed a benchmark set of 201 natural
language questions, expressed in English, that can be used to evaluate geospatial question
answering engines over our dataset. The benchmark contains the questions and their respective
GeoSPARQL/SPARQL queries, written by members of our team.
The questions for the benchmark were collected from students of the Department of Informatics and
Telecommunications of the National and Kapodistrian University of Athens, in the context of an
Artificial Intelligence undergraduate course taught by Prof. Manolis Koubarakis (academic year
2017-2018).
These questions have a geospatial flavour and they target DBpedia, and the United Kingdom and
Ireland part of OpenStreetMap and GADM.
The questions have been classified as "simple" by the students, who were also asked to write
"complex" geospatial questions. The complex geospatial questions are currently being processed
by our team and will also be released on this Web site soon.
The 201 questions and their translation into SPARQL or GeoSPARQL using the ontologies are listed
below. For the first five questions, you can also execute the corresponding GeoSPARQL queries
and see the results. The questions are also available in a CSV file.
Many of the queries use the SPARQL keyword SERVICE to target the different endpoints where our
data sources are located. The DBpedia data source is the official DBpedia SPARQL endpoint,
while the GADM and OpenStreetMap data sources are stored at our public SPARQL endpoint.
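As a rough illustration of how SERVICE splits a query across endpoints, the Python sketch below assembles a federated query string with one SERVICE block per data source. The helper name build_federated_query, the second endpoint URL (an example.org placeholder), the example graph patterns and the owl:sameAs link between the two sources are all assumptions made for illustration; they are not the project's actual endpoint or ontology terms.

```python
# Placeholder endpoints (assumptions): replace with the real endpoint URLs.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
GEO_ENDPOINT = "http://example.org/geoqa/sparql"  # hypothetical placeholder

PREFIXES = [
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>",
    "PREFIX dbo: <http://dbpedia.org/ontology/>",
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>",
]


def build_federated_query(select_vars, dbpedia_pattern, geo_pattern,
                          dbpedia_endpoint=DBPEDIA_ENDPOINT,
                          geo_endpoint=GEO_ENDPOINT):
    """Wrap two graph patterns in SERVICE blocks, one per endpoint."""
    lines = PREFIXES + [
        "SELECT " + " ".join(select_vars),
        "WHERE {",
        # Thematic information is requested from the first endpoint.
        f"  SERVICE <{dbpedia_endpoint}> {{ {dbpedia_pattern} }}",
        # Geometries are requested from the second endpoint.
        f"  SERVICE <{geo_endpoint}> {{ {geo_pattern} }}",
        "}",
    ]
    return "\n".join(lines)


# Illustrative patterns only: the owl:sameAs link is an assumption.
query = build_federated_query(
    ["?river", "?wkt"],
    "?river a dbo:River .",
    "?feature owl:sameAs ?river ; geo:hasGeometry ?g . ?g geo:asWKT ?wkt .",
)
print(query)
```

The shared variable ?river is what joins the results of the two SERVICE blocks, so each pattern only needs to mention the terms that its own endpoint can answer.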
The complete 201 questions are listed below.
*In the resulting GeoSPARQL queries, we assume the following distances for "near":
Near to a Restaurant: 500 meters
Near to a City: 5 km
Near to a Hotel: 1 km
Near to a Landmark: 1 km
Near to a Park: 500 meters
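To make the "near" convention concrete, the following Python sketch maps each feature type listed above to its distance in metres and emits a GeoSPARQL distance filter. The helper name near_filter, the variable names ?wktA and ?wktB, and the choice of "<=" as the comparison are assumptions for illustration; the benchmark queries may express the threshold differently.

```python
# Distances for "near", in metres, taken from the list above.
NEAR_DISTANCE_METRES = {
    "restaurant": 500,
    "city": 5000,
    "hotel": 1000,
    "landmark": 1000,
    "park": 500,
}


def near_filter(feature_type, geom_var_a="?wktA", geom_var_b="?wktB"):
    """Return a GeoSPARQL FILTER clause encoding 'near <feature_type>'.

    Assumes the enclosing query declares these prefixes:
      geof: <http://www.opengis.net/def/function/geosparql/>
      uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
    """
    metres = NEAR_DISTANCE_METRES[feature_type.lower()]
    return (f"FILTER(geof:distance({geom_var_a}, {geom_var_b}, uom:metre) "
            f"<= {metres})")


print(near_filter("City"))
```

Keeping the thresholds in a single table means that a question such as "Which hotels are near a park?" and its query can be checked against the same value.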