Dataset publication guidelines
To ensure the proper ingestion of your datasets and facilitate their dissemination on the EU Vocabularies website, we advise you to comply with the following basic rules:
Packaging format and communication
The content of the future publication will be delivered as a zip archive.
The delivery has to take place in accordance with the scheduled “code freeze” date.
Any change of date has to be communicated at least 2 weeks before the “code freeze”.
Unless defined otherwise, the package will be sent to the following email address:
OP-EU-VOCABULARIES@publications.europa.eu
Content of the publication package
A package will not be accepted for publication unless the following components are included:
Dataset file(s)
- The actual dataset files will always be located in the root folder of the archive
- Depending on the type, the files will be in one of the following formats:
  - Semantic vocabularies: RDF, TTL, XML, JSON-LD
  - Generic vocabularies: CSV, GC, XML, SVG
  - Models: OWL, XML Schema, DTD, XML, TTL
  - Alignments: RDF, TTL, XML
Documentation
- Every dataset type intended for publication will be accompanied by at least one documentation file and a release note
- All documentation files associated with the dataset will be stored in the Documentation folder
- The Documentation folder will be located in the root folder of the main package
- The documentation will be provided only in HTML or PDF format
- Every documentation file will clearly state the dataset name and the title of the document at the beginning (on the first page or first screen displayed)
- If only one documentation file is provided, this file will contain at least the following sections:
  - Title of the document
  - Title of the dataset
  - The scope and intended target of the document
  - A basic description of the dataset
    A main section presenting the dataset at large, as well as its intended use, should be included. Such a description might give details about the structure, usage principles, data models, associated statistics, etc.
- The release notes will be stored in the Release folder that is located in the root folder of the main package
- The release notes will be delivered as an HTML, PDF or TXT file
- The release note will contain at a minimum: the version ID, a list of the distribution formats included in the release, contact details of the copyright owner and, if possible, a list of the new elements that the release provides
Optionally, and if relevant for the scope of the dataset, a publication package might also contain:
- Sample files – Packed together as a zip file with the name Samples. Stored in the root folder of the main package
- Diff files – Stored as independent files under the folder Diff that is located in the root folder of the main package
Depending on the type of dataset, some elements of the package might differ.
Any such deviation has to be clarified in advance with the publication team (OP-EU-VOCABULARIES@publications.europa.eu).
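As an illustration only, the mandatory components listed above could be checked before sending a package with a short script. The sketch below is not an official validation tool: the function names are hypothetical, the extensions ".jsonld" (for JSON-LD) and ".xsd" (for XML Schema) are assumptions about how the listed formats map to file extensions, and the archive is assumed to have no wrapper folder around its content.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions assumed to match the dataset formats listed above
# (".jsonld" and ".xsd" are assumptions, not taken from the guidelines).
DATASET_EXTS = {".rdf", ".ttl", ".xml", ".jsonld", ".csv",
                ".gc", ".svg", ".owl", ".xsd", ".dtd"}
DOC_EXTS = {".html", ".pdf"}
RELEASE_EXTS = {".html", ".pdf", ".txt"}


def check_package_layout(names):
    """Return a list of problems found in a list of archive member names."""
    # Skip directory entries, which end with "/" in zip archives.
    files = [PurePosixPath(n) for n in names if n and not n.endswith("/")]
    problems = []

    # At least one dataset file must sit directly in the root folder.
    in_root = [p for p in files
               if len(p.parts) == 1 and p.suffix.lower() in DATASET_EXTS]
    if not in_root:
        problems.append("no dataset file in the root folder")

    # Documentation files live in Documentation/ and are HTML or PDF.
    docs = [p for p in files
            if len(p.parts) > 1 and p.parts[0] == "Documentation"]
    if not docs:
        problems.append("no documentation file in Documentation/")
    for p in docs:
        if p.suffix.lower() not in DOC_EXTS:
            problems.append(f"documentation file is not HTML or PDF: {p}")

    # Release notes live in Release/ and are HTML, PDF or TXT.
    notes = [p for p in files
             if len(p.parts) > 1 and p.parts[0] == "Release"]
    if not notes:
        problems.append("no release note in Release/")
    for p in notes:
        if p.suffix.lower() not in RELEASE_EXTS:
            problems.append(f"release note is not HTML, PDF or TXT: {p}")

    return problems


def check_zip(path):
    """Run the layout check on a zip archive stored on disk."""
    with zipfile.ZipFile(path) as archive:
        return check_package_layout(archive.namelist())
```

An empty list returned by `check_zip` means that none of these structural checks failed. It does not replace the review carried out by the publication team.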
File naming and conventions
To make the scope of each file clear to its intended users, we advise using a consistent naming convention for the files stored in the publication package.
Our preferred file naming structure follows the rules below:
DA – [Required] Dataset name or acronym (e.g. EuroVoc, IMMC, ECLAS, etc.)
FC – [Required] File content, intent or distribution (e.g. Alignment, Example, User_manual, Release_note, Diff, SKOS, MARC, etc.)
VS – [Optional] Version ID or date of the dataset
EXT – File extension (e.g., RDF, TTL, XML, PDF, CSV, etc.)
File name = DA_FC_VS.EXT
Spaces are not accepted in the name of the package or in the names of the files it contains.
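The naming structure can be sketched as a small helper that assembles DA_FC_VS.EXT and rejects names containing whitespace. This is a minimal illustration, not an official tool: the function name `build_file_name` and the version value used in the example are hypothetical.

```python
import re


def build_file_name(dataset, content, ext, version=None):
    """Assemble DA_FC_VS.EXT; the version part (VS) is optional."""
    if not dataset or not content or not ext:
        raise ValueError("dataset, content and extension are required")
    parts = [dataset, content]
    if version:
        parts.append(version)
    # Accept the extension with or without a leading dot.
    name = "_".join(parts) + "." + ext.lstrip(".")
    if re.search(r"\s", name):
        raise ValueError(f"file names must not contain spaces: {name!r}")
    return name


print(build_file_name("EuroVoc", "Release_note", "pdf"))
# EuroVoc_Release_note.pdf
print(build_file_name("EuroVoc", "SKOS", "rdf", version="2024-01-31"))
# EuroVoc_SKOS_2024-01-31.rdf
```

Multi-word parts such as User_manual use underscores internally, so a file name cannot always be split back into its parts unambiguously. The helper therefore only builds names and does not attempt to parse them.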
In case of non-compliance
If a convention (for content, labels, etc.) was already defined and/or used for previously published packages, please inform the publication team (OP-EU-VOCABULARIES@publications.europa.eu) so that the best approach can be identified.
Advanced: Growing as a Data Steward
For advanced learners, we offer resources that focus on specific tools and platforms. These materials are designed to help you grow as a data steward.
Tutorials on our tools
VocBench
VocBench is a web-based multilingual collaborative development platform for managing controlled vocabularies. VocBench is designed to help public administrations maintain and publish their controlled vocabularies in an open and interoperable way. It can be accessed only via an account. The English videos guide you through the process of using VocBench, from accessing the tool to developing your vocabulary and maintaining your data. To play the videos on this site, you need to allow external content.
Start with accessing VocBench, setting up your preferences and selecting projects
2 Interface and accessing to data
Learn how to navigate, search and display content inside projects
Why do I need a concept scheme, and how do I create one? Here are the answers
4 Adding concepts/terms to your vocabulary
Create your top and narrower concepts
5 Adding properties to a concept
Enrich your concepts with labels and definitions in different languages
6 Importing and exporting RDF data
Import and export your data in an RDF-compatible way
7 Improving your editing workflow
Master your editing skills by using validation and project history to ensure your data quality
ShowVoc
ShowVoc is an online platform built to facilitate the access to, dissemination of and visualisation of controlled vocabularies. ShowVoc helps communities and teams explore, engage with and promote the use of semantic technologies and reference data. It can be accessed with or without an account. The English videos explain how to access and use ShowVoc to interact with and discover data assets efficiently and in a user-friendly way. To play the videos on this site, you need to allow external content.
Start with accessing ShowVoc, setting up your preferences and selecting projects
2 Interacting with a vocabulary
Set the language of the content and search inside a vocabulary, use the graphical view and the SPARQL interface
Search catalogues and access alignments, and try useful advanced functions