Publication guidelines

Dataset publication guidelines

To ensure the proper ingestion of your datasets and to facilitate their dissemination on the EU Vocabularies website, we advise you to comply with the following basic rules:

Packaging format and communication

The content of the future publication will be delivered as a zip archive.

The delivery has to take place in accordance with the scheduled “code freeze” date.

Any change of date has to be communicated at least 2 weeks in advance of “code freeze”.

Unless defined otherwise, the package will be sent to the following email address:

OP-EU-VOCABULARIES@publications.europa.eu

Content of the publication package

A package will not be accepted for publication unless the following components are included:

Dataset file(s)

The actual dataset files will always be located in the root folder of the archive
Depending on the type, the files will be in one of the following formats:
- Semantic vocabularies: RDF, TTL, XML, JSON-LD
- Generic vocabularies: CSV, GC, XML, SVG
- Models: OWL, XML Schema, DTD, XML, TTL
- Alignments: RDF, TTL, XML

Documentation

Every dataset type intended for publication will be accompanied by at least a documentation file and a release note
All documentation files associated with the dataset will be stored in the Documentation folder
The Documentation folder will be located in the root folder of the main package
The documentation will be provided only in HTML or PDF format
Every documentation file will clearly state the dataset name and the title of the document at the beginning (first page or first screen to be displayed)
If only one documentation file is provided, this file will contain at least the following sections:
- Title of the document
- Title of the dataset
- The scope and intended target of the document
- A basic description of the dataset

A main section presenting the dataset at large, as well as its intended use, should be included. Such a description might give details about the structure, usage principles, data models, associated statistics, etc.

The release notes will be stored in the Release folder, which is located in the root folder of the main package
The release notes will be delivered as an HTML, PDF or TXT file
The release notes will contain at a minimum: the version ID, a list of the distribution formats included in the release, the contact details of the copyright owner and, if possible, a list of the new elements that the release provides

Optionally, and if relevant for the scope of the dataset, a publication package may also contain:

Sample files – Packed together as a zip file with the name Samples. Stored in the root folder of the main package
Diff files – Stored as independent files under the folder Diff that is located in the root folder of the main package

Depending on the type of dataset, some elements of the package might differ.

Any such deviation has to be clarified in advance with the publication team (OP-EU-VOCABULARIES@publications.europa.eu)
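The layout rules above can be checked before a package is sent. The following is a minimal sketch, not an official validation tool: the function name `check_package`, the file extensions used to recognise HTML, PDF and TXT files, and the assumption that the optional samples archive is called `Samples.zip` are all illustrative choices that do not come from the guidelines.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed extensions for the formats named in the guidelines.
DOC_EXTENSIONS = {".html", ".htm", ".pdf"}
RELEASE_EXTENSIONS = DOC_EXTENSIONS | {".txt"}


def check_package(zip_path):
    """Return a list of human-readable problems found in the archive layout."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries; keep only real files.
        files = [PurePosixPath(n) for n in archive.namelist() if not n.endswith("/")]

    # Dataset files sit in the root folder; the optional samples archive
    # (assumed to be named Samples.zip) does not count as a dataset file.
    root_files = [f for f in files
                  if len(f.parts) == 1 and f.name.lower() != "samples.zip"]
    docs = [f for f in files if len(f.parts) > 1 and f.parts[0] == "Documentation"]
    releases = [f for f in files if len(f.parts) > 1 and f.parts[0] == "Release"]

    if not root_files:
        problems.append("no dataset file in the root folder")
    if not docs:
        problems.append("Documentation folder is missing or empty")
    if not releases:
        problems.append("Release folder is missing or empty")
    for f in docs:
        if f.suffix.lower() not in DOC_EXTENSIONS:
            problems.append(f"documentation must be HTML or PDF: {f}")
    for f in releases:
        if f.suffix.lower() not in RELEASE_EXTENSIONS:
            problems.append(f"release note must be HTML, PDF or TXT: {f}")
    for f in files:
        if " " in str(f):
            problems.append(f"file name contains a space: {f}")
    return problems
```

An empty list means that none of the checks in this sketch failed. It does not cover the content requirements for the documentation and release notes, nor any deviation agreed with the publication team.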

File naming and conventions

To make the scope of each file clear to its intended users, it is advisable to use a consistent naming convention for the files stored in the publication package.

Our preferred file naming structure follows the rules below:

DA – [Required] Dataset name or acronym (e.g. EuroVoc, IMMC, ECLAS, etc.)
FC – [Required] File content, intent or distribution (e.g. Alignment, Example, User_manual, Release_note, Diff, SKOS, MARC, etc.)
VS – [Optional] Version ID or date of the dataset
EXT – File extension (e.g., RDF, TTL, XML, PDF, CSV, etc.)

File name = DA_FC_VS.EXT

No spaces are accepted in the name of the package or in the names of the files included in it.
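The naming rule above can be sketched as a small helper that assembles DA_FC_VS.EXT and rejects names containing spaces. The function name `build_file_name` and the version string used in the usage note are hypothetical; only the structure of the name comes from the guidelines.

```python
def build_file_name(dataset, content, ext, version=None):
    """Assemble a file name following the DA_FC_VS.EXT pattern.

    dataset -- DA, required dataset name or acronym
    content -- FC, required file content, intent or distribution
    ext     -- EXT, file extension, with or without a leading dot
    version -- VS, optional version ID or date
    """
    if not dataset or not content or not ext:
        raise ValueError("dataset, content and ext are required")
    parts = [dataset, content]
    if version:
        parts.append(version)
    name = "_".join(parts) + "." + ext.lstrip(".")
    # The guidelines do not accept spaces; other whitespace is rejected too.
    if any(ch.isspace() for ch in name):
        raise ValueError(f"file name must not contain spaces: {name!r}")
    return name
```

For example, `build_file_name("EuroVoc", "SKOS", "rdf", "4.20")` gives `EuroVoc_SKOS_4.20.rdf`, and omitting the version gives `EuroVoc_SKOS.rdf`. Because FC may itself contain underscores (User_manual, Release_note), the sketch only builds names and does not try to split an existing name back into its components.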

In case of non-compliance

If an existing convention (for content, labels, etc.) was defined and/or used for previously published packages, please inform the publication team (OP-EU-VOCABULARIES@publications.europa.eu) so that the best approach can be identified.

SPARQL samples

Downloads via SPARQL queries

Downloads retrieve the officially published data directly from the common data repository of the Publications Office (Cellar) in the specified format. The download links point directly to the query results: the HTML format opens as a table in the browser, while CSV and JSON results are downloaded as files.
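A direct download link of this kind is, in essence, a SPARQL query encoded into a URL together with the requested result format. The sketch below shows how such a link could be assembled. It makes several assumptions: the endpoint address is passed in by the caller and is not given here, the `query` parameter follows the SPARQL 1.1 Protocol, and the `format` parameter is a convention of some endpoints that should be checked against the actual Cellar links. The function name `build_download_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_download_url(endpoint, query, result_format):
    """Return a GET URL that asks `endpoint` to run `query`.

    result_format is a media type such as "text/html", "text/csv" or
    "application/sparql-results+json". The "format" parameter name is an
    assumption; it is not part of the SPARQL protocol itself.
    """
    params = {"query": query, "format": result_format}
    # urlencode percent-encodes the query text and the media type.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint and query, for illustration only.
    print(build_download_url(
        "https://example.org/sparql",
        "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
        "text/csv",
    ))
```

Keeping the query text and the result format as separate arguments mirrors the way the same query is offered as HTML, JSON and CSV links.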

The CSV files can be imported into Excel. Do not use "Open with"; instead, open Excel first and import the data: choose Data > From Text/CSV, select the downloaded file, click Import, choose comma as the delimiter, then click Load. The links that return JSON can be included directly in external systems that use the data.
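For external systems that consume the JSON links, the result can be flattened into plain rows. The sketch below assumes that the response follows the W3C SPARQL 1.1 Query Results JSON Format (a `head.vars` list and a `results.bindings` list); the function name `bindings_to_rows` and the sample payload are placeholders, not real data from the dataset.

```python
import json


def bindings_to_rows(payload):
    """Convert a SPARQL JSON result document into a list of dicts.

    Each dict maps a variable name to its plain string value, or to None
    when the variable is unbound in that row.
    """
    data = json.loads(payload)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows


if __name__ == "__main__":
    # Made-up payload; the URI and label are placeholders.
    sample = json.dumps({
        "head": {"vars": ["country_uri", "country_en"]},
        "results": {"bindings": [{
            "country_uri": {"type": "uri", "value": "http://example.org/country/XXX"},
            "country_en": {"type": "literal", "value": "Example country"},
        }]},
    })
    for row in bindings_to_rows(sample):
        print(row["country_uri"], row["country_en"])
```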

List of the EU Member States in protocol order

EU Member States must be listed alphabetically using the spelling of their source language. The protocol order attribute can be used to retrieve the correct listing.

For other countries and territories no specific recommendations are made.

The following query retrieves the current EU Member States, their English preferred label, long label and protocol order.

The results are returned in 4 columns:

?country_uri: the identifier of the country inside the dataset
?country_en: the preferred label (in English)
?longLabel_en: the long label (in English)
?protocol_order: the protocol order

Current EU members in HTML

Current EU members in JSON

Current EU members in CSV for Excel

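List of the current countries and territories issuing citizenship, with the English preferred label, ISG and ISO codes

The following query retrieves the current countries and territories issuing citizenship (see concept scheme 0001: Current countries and territories issuing citizenship), together with their English preferred label, Interinstitutional Style Guide (ISG) code and ISO codes.

The results are returned in 7 columns: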

country_uri: the identifier of the country or territory
country_en: the preferred label in English
named_authority_code: same as the ISO_31661_alpha3 when it exists, or a user-assigned alpha-3 code when there is no ISO code
interinstitutional_style_guide_code: an alpha-2 code used in EU institutions which is the same as ISO_31661_alpha2 except for Greece and UK
ISO_31661_alpha2: two-letter country code based on ISO 3166-1
ISO_31661_alpha3: three-letter country code based on ISO 3166-1
ISO_31661_num: three-digit country code based on ISO 3166-1
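The relationship between the interinstitutional style guide code and the ISO alpha-2 code can be sketched as a lookup with two exceptions. The source only states that Greece and the UK differ; the concrete values used below (EL instead of GR, UK instead of GB) are supplied from general knowledge of the Interinstitutional Style Guide and should be verified against the dataset. The function name `isg_code` is hypothetical.

```python
# Countries whose style guide code differs from ISO 3166-1 alpha-2.
# Values are an assumption to verify against the dataset.
ISG_EXCEPTIONS = {
    "GR": "EL",  # Greece
    "GB": "UK",  # United Kingdom
}


def isg_code(iso_alpha2):
    """Return the style guide code for an ISO 3166-1 alpha-2 code."""
    code = iso_alpha2.upper()
    # Every code other than the two exceptions is returned unchanged.
    return ISG_EXCEPTIONS.get(code, code)
```

With this mapping, `isg_code("GR")` returns `"EL"`, while `isg_code("FR")` returns `"FR"` unchanged.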

Current countries and territories in HTML

Current countries and territories in JSON

Current countries and territories in CSV - for Excel

List of all the countries and territories available in the Countries and territories dataset (including historical data)

The results include all the countries and territories in the dataset, including the "deprecated" countries, which remain available for historical reasons.

They are shown in 2 columns:

country_uri: the identifier of the country or territory in the dataset
country_en: the preferred label in English

Full list of countries and territories in HTML

Full list of countries and territories in JSON

Full list of countries and territories in CSV - for Excel
