Background

Many intelligent applications are driven today by knowledge graphs such as the Google KG, DBpedia and YAGO. The first version of YAGO was released in 2007. YAGO was created by combining knowledge from WordNet and Wikipedia, and it is one of the first open and free knowledge graphs.

YAGO2, the second version of YAGO, was released in 2011. YAGO2 adds geospatial and temporal information to the YAGO knowledge graph by introducing geoentities. Geospatial information in YAGO2 comes not only from Wikipedia but also from the gazetteer GeoNames. It is represented with the properties hasLongitude and hasLatitude, which give the longitude and latitude of the center of a geoentity.
Temporal information is attached in YAGO2 to entities of type people, groups, artifacts or events. It is represented using dates, which follow the ISO 8601 format (YYYY-MM-DD) and denote time points. To model an interval, e.g., the lifetime of a person, we can use a pair of properties such as wasBornOnDate and diedOnDate, each of which connects an entity with a date.
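
The representation described above can be illustrated with a minimal Python sketch. It assumes facts are held as plain (subject, property, value) tuples, which is only a stand-in for YAGO2's actual serialization. The entity names and values are made up for illustration, and the helpers value_of, lifetime and center are hypothetical.

```python
from datetime import date

# Illustrative facts only: entity names and values are made up, and plain
# tuples stand in for YAGO2's actual serialization.
facts = [
    ("Example_Person", "wasBornOnDate", "1900-01-15"),
    ("Example_Person", "diedOnDate", "1980-06-30"),
    ("Example_Town", "hasLatitude", "37.98"),
    ("Example_Town", "hasLongitude", "23.73"),
]


def value_of(entity, prop):
    """Return the first value of `prop` for `entity`, or None if absent."""
    for subject, predicate, value in facts:
        if subject == entity and predicate == prop:
            return value
    return None


def lifetime(entity):
    """Build a (start, end) interval from the two date-valued properties."""
    born = value_of(entity, "wasBornOnDate")
    died = value_of(entity, "diedOnDate")
    if born is None or died is None:
        return None
    # Each date is an ISO 8601 time point (YYYY-MM-DD).
    return date.fromisoformat(born), date.fromisoformat(died)


def center(entity):
    """Return the (latitude, longitude) center of a geoentity, if known."""
    lat = value_of(entity, "hasLatitude")
    lon = value_of(entity, "hasLongitude")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)
```

Calling lifetime("Example_Person") yields a pair of date objects, while center("Example_Town") yields a single coordinate pair. A geoentity in this model is reduced to one point, however large or elongated it is.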

Contributions

The main technical contributions of our research on YAGO2geo are the following:

  • We develop a new version of YAGO2, called YAGO2geo, with more precise geospatial information. YAGO2geo contains 640 thousand polygons and 137 thousand lines. In many cases, this line and polygon information is more meaningful than the coordinate pairs that exist in YAGO2. For example, we no longer need to model the longitude/latitude center of a stream or of another geoentity for which it is not clear what the center is. YAGO2geo can also be used to answer questions that require precise geospatial information, which was not possible with YAGO2. Examples of such questions are “What is the city of Germany where two streams meet at a lake?” and “Which are the neighboring municipalities of the municipality of Athens?”.
    The extension, in combination with the 12 million coordinate pairs of YAGO2, yields a KG that is much richer in geospatial knowledge than DBpedia, which contains 1 million coordinate pairs, and Wikidata, which contains almost 2 million coordinate pairs and only 2 thousand shapes. This makes YAGO2geo the richest publicly available, open-source knowledge graph in terms of geospatial information.

  • We draw the new geospatial information from two sources. First, we utilize administrative data taken from official datasets of three countries: the Greek Administrative Geography (GAG) dataset, the administrative divisions dataset for the United Kingdom obtained from Ordnance Survey (OS) and Ordnance Survey Northern Ireland (OSNI), and the administrative division datasets of the Republic of Ireland obtained from Ordnance Survey Ireland (OSI). To obtain the geometries of administrative divisions of countries worldwide, we also utilize the latest (2018) version of the Global Administrative Areas dataset (GADM).

  • We also introduce to YAGO2geo geospatial information from the biggest volunteered, crowdsourced and open dataset with geospatial information, OpenStreetMap (OSM).
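
To illustrate why a question such as the one about neighboring municipalities needs polygons rather than center points, here is a minimal Python sketch. It is not the YAGO2geo query mechanism. It assumes a strong simplification: each municipality is a ring of vertices, and two municipalities are treated as neighbors when they share an identical boundary edge. The names A, B and C, their coordinates, and the helpers edges, are_neighbors and neighbors_of are all hypothetical.

```python
def edges(polygon):
    """Return the set of undirected boundary edges of a vertex ring."""
    n = len(polygon)
    return {frozenset((polygon[i], polygon[(i + 1) % n])) for i in range(n)}


def are_neighbors(a, b):
    """Simplified adjacency test: the polygons share an identical edge."""
    return bool(edges(a) & edges(b))


# Made-up unit squares standing in for municipality boundaries.
municipalities = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}


def neighbors_of(name):
    """List every other municipality that shares an edge with `name`."""
    shape = municipalities[name]
    return sorted(
        other
        for other, other_shape in municipalities.items()
        if other != name and are_neighbors(shape, other_shape)
    )
```

Here neighbors_of("A") finds B through the shared edge between (1, 0) and (1, 1), whereas the center points of A and B alone carry no information about adjacency. Real boundaries rarely share exact vertices, so a practical system would rely on a topological "touches" test over the stored geometries instead of exact edge matching.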

More information on YAGO2geo can be found at http://yago2geo.di.uoa.gr/.