Semantic knowledge base
As a believer in the principles of reference data, you might have asked yourself why the Publications Office does not “walk the talk” by using a single unique identifier for the concepts listed in the authority tables. The answer is mixed: the Publications Office (OP) does believe in the utility of a unique identifier, but it also has to publish multiple identifiers at the same time.
If that does not sound very logical, the description of the context below should help explain how this decision was reached.
As you might already know, the reference data catalogue of the Publications Office contains close to two hundred vocabularies. Some of them are domain-specific, and others are technical tables that serve our common needs in terms of access to and organisation of the data (e.g., Use context).
The main subject-based vocabularies have to follow the standards applicable in each field. In most cases those standards are maintained by organisations such as ISO or IANA, and following their identifiers is a logical decision. This also extends to situations where:
- multiple standards exist in that specific field
- codes were used in the past by some applications and have to be kept for backward compatibility
As a result, some of the vocabularies you can find in the Publications Office catalogue do list a multitude of identifiers.
Let’s take one well-known example, the Country table.
If you look up a concept such as Austria in it, you will find the following list of identifiers:
Identifier | Associated code |
---|---|
IANA | .at |
ISG COU | AT |
ISO 3166-1 α-2 | AT |
ISO 3166-1 α-3 | AUT |
ISO 3166-1 num | 040 |
TIR | A |
UNSD M49 | 040 |
FD_010 | A |
FD_040 | AT |
FD_050 | A |
FD_110 | AT |
FD_140 | AT |
FD_160 | AT |
FD_290 | A |
FD_325 | A |
FD_375 | AT |
FD_380 | AT |
FD_400 | AT |
MNE | AT |
PUB_LOC | AT |
PUB_LOC | {AUT} |
TED | AT |
TED Schema | AT |
Two groups are easy to distinguish in this list of identifiers. The first contains the identifiers associated with standards, such as IANA, the Interinstitutional Style Guide, ISO, TIR, and the UNSD notation. The second is formed by codes that have been used in other systems (e.g., TED) or references to other datasets that contain the same value (FD_nnn).
A total of over 20 identifiers associated with a single concept does not look very tidy. Yet this is acceptable for a reference data asset that is constructed as a knowledge graph, because you do not need to interact with all of them. A SPARQL query can be written to return just one of them, or a chosen set of identifiers, as follows:
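To make the two groups concrete, here is a minimal Python sketch that holds the Austria rows from the table as plain data and looks up codes by scheme. The names `AUSTRIA_IDENTIFIERS`, `STANDARD_SCHEMES`, `codes_for` and `split_by_origin` are hypothetical and exist only for this illustration; the assignment of the first seven schemes to the "standards" group is an assumption based on the paragraph above, not something read from the knowledge graph.

```python
# Identifier rows for the concept "Austria", copied from the table above.
AUSTRIA_IDENTIFIERS = [
    ("IANA", ".at"), ("ISG COU", "AT"),
    ("ISO 3166-1 α-2", "AT"), ("ISO 3166-1 α-3", "AUT"),
    ("ISO 3166-1 num", "040"), ("TIR", "A"), ("UNSD M49", "040"),
    ("FD_010", "A"), ("FD_040", "AT"), ("FD_050", "A"),
    ("FD_110", "AT"), ("FD_140", "AT"), ("FD_160", "AT"),
    ("FD_290", "A"), ("FD_325", "A"), ("FD_375", "AT"),
    ("FD_380", "AT"), ("FD_400", "AT"), ("MNE", "AT"),
    ("PUB_LOC", "AT"), ("PUB_LOC", "{AUT}"),
    ("TED", "AT"), ("TED Schema", "AT"),
]

# Assumption: these seven schemes form the "standards" group described above.
STANDARD_SCHEMES = {
    "IANA", "ISG COU", "ISO 3166-1 α-2", "ISO 3166-1 α-3",
    "ISO 3166-1 num", "TIR", "UNSD M49",
}


def codes_for(scheme):
    """Return every code listed for one scheme (a scheme can repeat, e.g. PUB_LOC)."""
    return [code for name, code in AUSTRIA_IDENTIFIERS if name == scheme]


def split_by_origin():
    """Split the rows into standards-based identifiers and all remaining codes."""
    standards = [row for row in AUSTRIA_IDENTIFIERS if row[0] in STANDARD_SCHEMES]
    others = [row for row in AUSTRIA_IDENTIFIERS if row[0] not in STANDARD_SCHEMES]
    return standards, others


print(codes_for("ISO 3166-1 α-3"))
print(codes_for("PUB_LOC"))
```

A consuming system would typically care about a single scheme, which is why a lookup keyed on the scheme name is enough here; the list-of-pairs layout is kept because one scheme can carry more than one code.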

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX lbl: <http://publications.europa.eu/resource/authority/label-type/>
    PREFIX euvoc: <http://publications.europa.eu/ontology/euvoc#>

    SELECT ?label (?code AS ?ISO_3166_1_3)
    # choose the table you are looking into
    FROM <http://publications.europa.eu/resource/authority/country>
    WHERE {
      ?c skosxl:prefLabel|skosxl:altLabel ?xlLabel .
      # pick the label type: Standard label
      VALUES ?Labeltype { <http://publications.europa.eu/resource/authority/label-type/STANDARDLABEL> }
      ?xlLabel dct:type ?Labeltype .
      ?xlLabel skosxl:literalForm ?xlLiteralForm .
      FILTER (lang(?xlLiteralForm) = "en")
      # convert label to string
      BIND (str(?xlLiteralForm) AS ?label)
      ?c euvoc:xlNotation ?notation .
      # look for a specific notation type: ISO 3166-1 α-3
      VALUES ?notationType { <http://publications.europa.eu/resource/authority/notation-type/ISO_3166_1_ALPHA_3> }
      ?notation dct:type ?notationType .
      # select the code
      ?notation euvoc:xlCodification ?xcode .
      # convert code to string
      BIND (str(?xcode) AS ?code)
    }
    ORDER BY ?label
    LIMIT 10

Executed on the CELLAR SPARQL endpoint, this query returns the following:
label | ISO_3166_1_3 |
---|---|
Afghanistan | AFG |
Albania | ALB |
Algeria | DZA |
American Samoa | ASM |
Andorra | AND |
Angola | AGO |
Anguilla | AIA |
Antarctica | ATA |
Antigua and Barbuda | ATG |
Argentina | ARG |
As you can see, the data returned is clean and clear, listing just the basic values that are needed for a particular system or project. This is a very simple demonstration of the practical use of the knowledge architecture used by the Publications Office for the storage and dissemination of reference data.
Note: All the SPARQL queries mentioned in the article can be tested on the SPARQL endpoint of the Publications Office found at the following address: http://publications.europa.eu/webapi/rdf/sparql
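For readers who prefer to call the endpoint from a script, the following Python sketch shows one possible way to build the query for a chosen notation type, send it, and read the answer. It uses only the standard library. It assumes that the endpoint accepts the query as a `query` GET parameter and returns the standard SPARQL JSON results format when asked through the `Accept` header; the helper names `build_query`, `run_query` and `parse_results` are hypothetical. Only the `ISO_3166_1_ALPHA_3` notation type is confirmed by the query above, so any other notation code passed in is an assumption to verify against the catalogue.

```python
import json
import urllib.parse
import urllib.request

# Endpoint address taken from the note above.
ENDPOINT = "http://publications.europa.eu/webapi/rdf/sparql"
NOTATION_BASE = "http://publications.europa.eu/resource/authority/notation-type/"

# Same graph pattern as the query above; braces are doubled for str.format.
QUERY_TEMPLATE = """
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX euvoc: <http://publications.europa.eu/ontology/euvoc#>
SELECT ?label ?code
FROM <http://publications.europa.eu/resource/authority/country>
WHERE {{
  ?c skosxl:prefLabel|skosxl:altLabel ?xlLabel .
  ?xlLabel dct:type <http://publications.europa.eu/resource/authority/label-type/STANDARDLABEL> ;
           skosxl:literalForm ?xlLiteralForm .
  FILTER (lang(?xlLiteralForm) = "en")
  BIND (str(?xlLiteralForm) AS ?label)
  ?c euvoc:xlNotation ?notation .
  ?notation dct:type <{notation_type}> ;
            euvoc:xlCodification ?xcode .
  BIND (str(?xcode) AS ?code)
}}
ORDER BY ?label
LIMIT {limit}
"""


def build_query(notation_code, limit=10):
    """Fill the template for one notation type, e.g. 'ISO_3166_1_ALPHA_3'."""
    if not notation_code.replace("_", "").isalnum():
        raise ValueError("unexpected characters in notation code")
    return QUERY_TEMPLATE.format(
        notation_type=NOTATION_BASE + notation_code, limit=int(limit)
    )


def run_query(query, timeout=30.0):
    """Send the query over HTTP GET and return the decoded JSON document."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def parse_results(payload):
    """Turn a SPARQL JSON result document into (label, code) pairs."""
    return [
        (binding["label"]["value"], binding["code"]["value"])
        for binding in payload["results"]["bindings"]
    ]
```

Calling `parse_results(run_query(build_query("ISO_3166_1_ALPHA_3")))` should then yield label and code pairs like those in the result table, provided the endpoint is reachable and behaves as assumed here.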