Publications Office of the EU
Identifiers and how to use more than one - EU Vocabularies

Identifiers and how to use more than one

As a believer in the principles of reference data, you might have asked yourself why the Publications Office does not “walk the talk” by using a single unique identifier for each concept listed in the authority tables. The answer is mixed: the Publications Office (OP) does believe in the utility of a unique identifier, but it also has to publish multiple identifiers at the same time.

If that does not sound very logical, the description of the context below should help explain how these decisions are made.

As you might already know, the reference data catalogue of the Publications Office contains close to two hundred vocabularies. Some of them are domain-specific, and others are technical tables that serve our common needs for accessing and organising the data (e.g., Use context).

The main subject-based vocabularies have to follow the standards applicable in each field. In most cases those standards are maintained by organisations such as ISO or IANA, and following their identifiers is a logical decision. Things get more complicated in situations where:

  • multiple standards exist in that specific field
  • some applications have used earlier codes in the past, and these have to be kept for backward compatibility

As a result, some of the vocabularies you can find in the Publications Office catalogue do list a multitude of identifiers.

Let’s take one well-known example, the Country table.

If you look inside it for a concept such as Austria, you will find the following list of identifiers:

Identifier          Associated code
IANA                .at
ISG COU             AT
ISO 3166-1 α-2      AT
ISO 3166-1 α-3      AUT
ISO 3166-1 num      040
TIR                 A
UNSD M49            040
FD_010              A
FD_040              AT
FD_050              A
FD_110              AT
FD_140              AT
FD_160              AT
FD_290              A
FD_325              A
FD_375              AT
FD_380              AT
FD_400              AT
MNE                 AT
PUB_LOC             AT
PUB_LOC             {AUT}
TED                 AT
TED Schema          AT

 

Two groups are easy to distinguish in this list. The first contains the identifiers associated with standards: IANA, the Interinstitutional Style Guide (ISG), ISO, TIR and the UNSD notation. The second consists of codes that have been used in other systems (e.g., TED) or references to other datasets that contain the same value (FD_nnn).
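The two groups can be made explicit with a few lines of Python. This is only an illustrative sketch: the pairs are transcribed from the Austria list above (the split between identifier type and code is our reading of that list), and the prefix rule in `STANDARD_PREFIXES` is an assumption made for this example, not something published in the Country table.

```python
# (identifier type, code) pairs for the concept "Austria", transcribed from the list above.
AUSTRIA_IDENTIFIERS = [
    ("IANA", ".at"), ("ISG COU", "AT"),
    ("ISO 3166-1 α-2", "AT"), ("ISO 3166-1 α-3", "AUT"), ("ISO 3166-1 num", "040"),
    ("TIR", "A"), ("UNSD M49", "040"),
    ("FD_010", "A"), ("FD_040", "AT"), ("FD_050", "A"), ("FD_110", "AT"),
    ("FD_140", "AT"), ("FD_160", "AT"), ("FD_290", "A"), ("FD_325", "A"),
    ("FD_375", "AT"), ("FD_380", "AT"), ("FD_400", "AT"),
    ("MNE", "AT"), ("PUB_LOC", "AT"), ("PUB_LOC", "{AUT}"),
    ("TED", "AT"), ("TED Schema", "AT"),
]

# Assumption for this sketch: identifier types backed by an external standard
# can be recognised by the name of the standard at the start of the type.
STANDARD_PREFIXES = ("IANA", "ISG", "ISO", "TIR", "UNSD")


def split_identifiers(pairs):
    """Return (standards_based, system_specific) lists of (type, code) pairs."""
    standards = [p for p in pairs if p[0].startswith(STANDARD_PREFIXES)]
    others = [p for p in pairs if not p[0].startswith(STANDARD_PREFIXES)]
    return standards, others


standards_based, system_specific = split_identifiers(AUSTRIA_IDENTIFIERS)
print(len(standards_based), "standards-based,", len(system_specific), "system-specific")
```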

A total of over 20 identifiers associated with a single concept does not look very tidy. Yet this is acceptable for a reference data asset that is constructed as a knowledge graph: you do not need to interact with all of them. A SPARQL query can be written to return just one of them, or a chosen set of identifiers, as follows:

PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX lbl: <http://publications.europa.eu/resource/authority/label-type/>
PREFIX euvoc: <http://publications.europa.eu/ontology/euvoc#>
  
SELECT ?label (?code AS ?ISO_3166_1_3)
 
# choose the table you are looking into
FROM <http://publications.europa.eu/resource/authority/country>
 
WHERE
{
    ?c skosxl:prefLabel|skosxl:altLabel ?xlLabel .

    # pick the label type: Standard label
    VALUES ?Labeltype { <http://publications.europa.eu/resource/authority/label-type/STANDARDLABEL> }
    ?xlLabel dct:type ?Labeltype .

    ?xlLabel skosxl:literalForm ?xlLiteralForm.

    filter (lang(?xlLiteralForm)="en")

    # convert label to string
    BIND ( str(?xlLiteralForm) as ?label)

    ?c euvoc:xlNotation ?notation .
 
    # look for a specific notation type: ISO 3166-1 α-3
    VALUES ?notationType { <http://publications.europa.eu/resource/authority/notation-type/ISO_3166_1_ALPHA_3> }
 
    ?notation dct:type ?notationType .
 
    # select the code
    ?notation euvoc:xlCodification ?xcode.

    # convert code to string
    BIND ( str(?xcode) as ?code)

     
}
ORDER BY ?label
LIMIT 10

Executed on the CELLAR SPARQL endpoint, this query returns the following:

label                  ISO_3166_1_3
Afghanistan            AFG
Albania                ALB
Algeria                DZA
American Samoa         ASM
Andorra                AND
Angola                 AGO
Anguilla               AIA
Antarctica             ATA
Antigua and Barbuda    ATG
Argentina              ARG

 

As you can see, the data returned is clean and clear, listing just the basic values that are needed for a particular system or project. This is a very simple demonstration of the practical use of the knowledge architecture used by the Publications Office for the storage and dissemination of reference data.
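The same pattern extends to a set of identifiers by listing several notation types in the VALUES clause. The Python sketch below builds such a query as a string. The function name `build_notation_query` is hypothetical, and the notation-type name `ISO_3166_1_ALPHA_2` in the usage line is assumed by analogy with `ISO_3166_1_ALPHA_3`; please check the names against the notation-type table before relying on them.

```python
NOTATION_TYPE_BASE = "http://publications.europa.eu/resource/authority/notation-type/"
STANDARD_LABEL = "http://publications.europa.eu/resource/authority/label-type/STANDARDLABEL"


def build_notation_query(table_graph, notation_types, lang="en", limit=10):
    """Build a SPARQL query returning one row per concept and requested notation type."""
    # One IRI per requested notation type, e.g. <.../notation-type/ISO_3166_1_ALPHA_3>
    values = " ".join(f"<{NOTATION_TYPE_BASE}{name}>" for name in notation_types)
    return f"""
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX euvoc: <http://publications.europa.eu/ontology/euvoc#>

SELECT ?label ?notationType ?code
FROM <{table_graph}>
WHERE
{{
    # standard label in the requested language
    ?c skosxl:prefLabel|skosxl:altLabel ?xlLabel .
    ?xlLabel dct:type <{STANDARD_LABEL}> ;
             skosxl:literalForm ?xlLiteralForm .
    FILTER (lang(?xlLiteralForm) = "{lang}")
    BIND (str(?xlLiteralForm) AS ?label)

    # keep only the requested notation types
    VALUES ?notationType {{ {values} }}
    ?c euvoc:xlNotation ?notation .
    ?notation dct:type ?notationType ;
              euvoc:xlCodification ?xcode .
    BIND (str(?xcode) AS ?code)
}}
ORDER BY ?label ?notationType
LIMIT {int(limit)}
"""


query = build_notation_query(
    "http://publications.europa.eu/resource/authority/country",
    ["ISO_3166_1_ALPHA_3", "ISO_3166_1_ALPHA_2"],  # second name is an assumption
)
print(query)
```

Each concept then appears once per requested notation type, so the consuming system can pick the column it needs from the ?notationType value.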

 

Note: All the SPARQL queries mentioned in the article can be tested on the SPARQL endpoint of the Publications Office found at the following address:  http://publications.europa.eu/webapi/rdf/sparql  
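For readers who prefer to call the endpoint from a script instead of the web form, here is a minimal Python sketch using only the standard library. It assumes the endpoint follows the SPARQL 1.1 Protocol (query passed as the `query` parameter of an HTTP GET) and answers with the standard SPARQL JSON results format when asked through the Accept header; the function names `run_query` and `parse_bindings` are ours and purely illustrative.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://publications.europa.eu/webapi/rdf/sparql"


def parse_bindings(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query with HTTP GET and return the parsed rows (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

With the query from this article stored in a string `query_text`, `run_query(query_text)` would return one dictionary per result row, keyed by the variable names in the SELECT clause.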
