Follow-up questions
As we've heard earlier this morning: reaching agreement in the first place is hard. What role does your organization play here, and how do your tools help with that?
We have different governance frameworks in place that define the collaboration with the EU institutions. The Publications Office has the role of coordinating and communicating with the stakeholders. As an example, we collect all modification proposals for an interinstitutional vocabulary, share them with the institutions, collect feedback via written procedure or at meetings, and, once agreement is reached, implement the modifications and finally publish the updated vocabulary. Currently we use standard tools such as SharePoint, Teams, Confluence, and email to support these procedures. VocBench supports the collaborative editing and validation workflow, and in one specific case, EuroVoc, we also use it to collect feedback. In this use case our editorial team adds the proposals for new concepts to VocBench, and the subgroup of the EuroVoc governance committee responsible for content improvement reviews and comments on them directly in VocBench. After this step, the final agreement is discussed at dedicated meetings.
What is the reliability of alignments generated by AI?
The reliability of alignments generated by AI is generally good but not perfect. AI tools can quickly identify many correct matches between terms, especially when the wording is similar, but they can still make mistakes when concepts are complex or depend on context. These alignments should therefore be seen as useful suggestions that save time and effort, not as final results. Human review and validation are still essential to ensure accuracy and trust in the final mappings.
For now there is still a human in the loop. But for how long? Will this still be needed once the AI models are better trained?
For now, having humans in the loop is still important to make sure alignments are accurate and meaningful. As AI models become more advanced and better trained on domain-specific data, the need for human intervention will decrease, but it won’t disappear completely. Experts will still be needed to check sensitive or ambiguous cases and to ensure that the final alignments follow agreed standards and interpretations.
What solution do you use to tag documents with semantic artifacts: in the document itself or in the hosting platform? How do you ensure the documents keep their semantics when moved around?
We usually tag documents with semantic information directly in the hosting platform rather than inside the document itself. This allows the tags to be managed, updated, and linked to official vocabularies without changing the document content. To make sure the semantics are not lost when documents are shared or moved, we keep the links to stable identifiers (persistent URIs) that point to the official vocabularies. In this way, even if a document changes location, its meaning and connections to the reference data remain intact.
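As a minimal sketch of this approach (the document URI, predicate choice, and EuroVoc concept ID below are illustrative assumptions, not our actual implementation), a tag stored in the hosting platform is simply a triple pointing from the document's identifier to a persistent concept URI:

```python
from rdflib import Graph, Namespace, URIRef

DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("dct", DCT)

# Hypothetical document identifier managed by the hosting platform.
doc = URIRef("https://example.org/documents/12345")
# Persistent EuroVoc concept URI (the numeric ID here is illustrative).
concept = URIRef("http://eurovoc.europa.eu/100142")

# The tag lives in the platform's metadata store, not inside the document,
# so moving the document does not break the link to the vocabulary.
g.add((doc, DCT.subject, concept))
print(g.serialize(format="turtle"))
```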
Do you create your semantic assets from scratch, or can a Member State offer its assets to be used across the whole EU?
We have different types of semantic assets:
- created by the Publications Office from scratch;
- created by other EU institutions and agencies, either from scratch or as an export of existing vocabularies, and published by the Publications Office.
Our role is to support EU institutions in the creation, maintenance and publication of semantic assets. Member States can benefit from free, open-source solutions and platforms such as ShowVoc and Cellar to retrieve the data, and VocBench, which they can install on their own premises. Additionally, the datasets of Member States are harvested by data.europa.eu, the central portal for European data.
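For example, data can be retrieved programmatically from Cellar's public SPARQL endpoint. The sketch below (the exact query shape and the choice of the Country authority table are illustrative, not an official API) lists a few concepts from an authority table:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Public Publications Office (Cellar) SPARQL endpoint.
endpoint = SPARQLWrapper("http://publications.europa.eu/webapi/rdf/sparql")
endpoint.setQuery("""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:inScheme <http://publications.europa.eu/resource/authority/country> ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["concept"]["value"], "-", row["label"]["value"])
```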
How do you evaluate the quality of the semantic tagging and alignment after having trained your algorithms?
We evaluate the quality of semantic tagging and alignment by comparing the AI results with a set of manually validated examples, often called a “golden dataset.” This helps us see how many of the AI’s suggestions are correct and where it still makes mistakes. We also review cases where the AI disagrees with human experts to understand why. Over time, these checks help us fine-tune the algorithms and improve their accuracy, ensuring that the tags and alignments become more consistent and trustworthy.
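A minimal sketch of this kind of check, with purely hypothetical mapping pairs, compares the AI's suggested alignments against the golden dataset and computes the usual precision, recall, and F1 figures:

```python
# Hypothetical alignment pairs: (source concept, target concept).
golden = {("eurovoc:100142", "at:DATPRO"), ("eurovoc:2002", "at:AGRI")}
suggested = {("eurovoc:100142", "at:DATPRO"), ("eurovoc:2002", "at:FISH")}

true_positives = len(golden & suggested)
precision = true_positives / len(suggested)   # share of suggestions that are correct
recall = true_positives / len(golden)         # share of validated mappings recovered
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```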
When comparing the ATTO and Authority tables, did you encounter challenges that you were not anticipating? If so, which one(s)?
Yes. Certain tables in particular were challenging due to semantic discrepancies.
One example was comparing our Corporate authors table (PUB_CORP) with the AT Corporate Body: there was a semantic difference. The ATTO table, respecting the cataloguing rule that a corporate body is established under 'the name by which it is commonly identified', contained more entities than the AT Corporate Body, where corporate bodies are established as entities whose labels vary over time. With the help of the Vocabularies Team, mappings from ATTO to AT historic labels for corporate bodies were established. This enables the extraction of bibliographic data using either the historic label (valid at the time of publication) or the generic label (the corporate body as an entity in the linked-data universe).
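Purely as an illustration of that mapping idea (the URIs, the `ex:historicLabelMatch` predicate, and the SKOS-XL modelling below are all assumptions; the actual AT/ATTO model may differ), one corporate body can be a single entity carrying time-bounded historic labels, with an ATTO entry mapped both to the entity and to the label valid at publication time:

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ex:     <https://example.org/> .

# Generic label: the corporate body as one entity in the linked-data universe.
ex:corporate-body-42 a skos:Concept ;
    skosxl:prefLabel ex:label-1999, ex:label-2010 .

# Historic labels, each valid for a period of time.
ex:label-1999 a skosxl:Label ; skosxl:literalForm "Former body name"@en .
ex:label-2010 a skosxl:Label ; skosxl:literalForm "Current body name"@en .

# ATTO entry established under the commonly identified name, mapped to the
# entity and to the historic label valid at publication time.
ex:atto-entry-7 skos:closeMatch ex:corporate-body-42 ;
    ex:historicLabelMatch ex:label-1999 .    # ex:historicLabelMatch is hypothetical
""", format="turtle")

print(g.serialize(format="turtle"))
```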
Another example was the definition of semantics for the tables AT Resource Type and AT Product Form. The conclusion of a one-year analysis was that AT Resource Type mostly relates to content, while AT Product Form relates to format. This led to the revision of many values and their definitions, the deprecation of some values, and the creation of new tables, i.e., AT Carrier and AT Type of Binding.
Authority Tables are not used only by the cataloguing team or librarians. There are other users, and their needs had to be respected too. For this reason, we found it useful to think outside the box and include other potential clients who might want to use our ATs in the future. Very often we needed to adapt our wording to the standard, but with the help of the Vocabularies Team we managed well.
Not all datasets have developed SHACL shape validation, but all of them need automatic documentation/UML-diagram generation. Could your approach be applied, with due workarounds, to datasets missing SHACL shapes?
Yes. SHACL Play contains a feature to automatically derive the SHACL structure from a dataset (see https://shacl-play.sparna.fr/play/generate#documentation). The documentation can then be produced from this auto-generated SHACL profile.
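As a rough sketch of what such derivation does (SHACL Play's generator is far more thorough; the sample data and namespaces below are illustrative), one can emit a NodeShape per class observed in the data, with one property constraint per predicate seen on its instances:

```python
from collections import defaultdict
from rdflib import RDF, BNode, Graph, Namespace

SH = Namespace("http://www.w3.org/ns/shacl#")
EX = Namespace("https://example.org/shapes/")

data = Graph()
data.parse(data="""
@prefix ex: <https://example.org/> .
ex:doc1 a ex:Document ; ex:title "A" ; ex:subject ex:topic1 .
ex:doc2 a ex:Document ; ex:title "B" .
""", format="turtle")

# Collect the predicates observed on instances of each class.
props = defaultdict(set)
for instance, _, cls in data.triples((None, RDF.type, None)):
    for _, pred, _ in data.triples((instance, None, None)):
        if pred != RDF.type:
            props[cls].add(pred)

# Emit one NodeShape per class, one property constraint per observed predicate.
shapes = Graph()
shapes.bind("sh", SH)
for cls, preds in props.items():
    shape = EX[cls.split("/")[-1] + "Shape"]
    shapes.add((shape, RDF.type, SH.NodeShape))
    shapes.add((shape, SH.targetClass, cls))
    for pred in preds:
        constraint = BNode()
        shapes.add((shape, SH.property, constraint))
        shapes.add((constraint, SH.path, pred))

print(shapes.serialize(format="turtle"))
```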
How do you take care of governance if you work with SHACL at the surface level? I think OWL should describe the WHAT, and SHACL should describe the restrictions used for validation, not so much serve as the primary description. (S = shape, like shadow)
I agree that "OWL should describe the WHAT and SHACL describes the restrictions". OWL ontologies still exist and declare classes and properties; the difference from the pre-SHACL era is that the OWL now contains far fewer restrictions than before (typically, very few domain or range declarations). This makes the semantics of the ontology more reusable across contexts.
SHACL is not "at the surface level"; it simply applies at a different scope: OWL ontologies are scoped by the knowledge domain to which they apply, but do not care about how datasets and data flows are implemented in real systems. SHACL specifications apply to a precise dataset and encode its precise structure, whether at the data acquisition step or at the data dissemination step.
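A minimal sketch of this division of labour (the vocabulary is hypothetical; pyshacl is used here as one possible validator): the data below is perfectly consistent with an ontology that merely declares ex:Publication and ex:title without restrictions, yet fails the dataset-specific shape:

```python
from pyshacl import validate
from rdflib import Graph

# Instance data: fine for the ontology, but missing the title THIS dataset requires.
data = Graph().parse(data="""
@prefix ex: <https://example.org/> .
ex:pub1 a ex:Publication .
""", format="turtle")

# Dataset-specific SHACL: encodes the structure of one concrete data flow.
shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <https://example.org/> .
ex:PublicationShape a sh:NodeShape ;
    sh:targetClass ex:Publication ;
    sh:property [ sh:path ex:title ; sh:minCount 1 ] .
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: valid against the ontology, invalid for this dataset
print(report)
```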
The VC spec uses JSON-LD, but the schema is described in JSON Schema (as far as I understand). Is there any future plan to support a more LOD-native schema language instead?
As far as I know, the Working Group has no plan in this direction. But nothing prevents users from using SHACL or another RDF-based shape/schema language to validate their credentials. This was done, for example, in the CIRPASS project (https://cirpassproject.eu/).
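To illustrate the point (with a heavily simplified credential; this is not the Working Group's approach, just one possible RDF-based check), a JSON-LD credential can be parsed as RDF and validated against a SHACL shape:

```python
from pyshacl import validate
from rdflib import Graph

# A heavily simplified credential in JSON-LD (illustrative, not spec-complete).
credential = """
{
  "@context": {"cred": "https://www.w3.org/2018/credentials#"},
  "@id": "https://example.org/credentials/1",
  "@type": "cred:VerifiableCredential",
  "cred:issuer": {"@id": "https://example.org/issuers/42"}
}
"""

# An RDF-based check: require every credential to carry an issuer.
shapes = """
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix cred: <https://www.w3.org/2018/credentials#> .
<https://example.org/shapes/vc> a sh:NodeShape ;
    sh:targetClass cred:VerifiableCredential ;
    sh:property [ sh:path cred:issuer ; sh:minCount 1 ] .
"""

data = Graph().parse(data=credential, format="json-ld")   # needs rdflib >= 6
conforms, _, report = validate(data, shacl_graph=Graph().parse(data=shapes, format="turtle"))
print(conforms)   # True: the issuer is present
```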
Could these identity standards use existing authorities, like ORCID (Open Researcher and Contributor ID), VIAF (Virtual International Authority File) or ISNI (International Standard Name Identifier) in any way?
In general, yes, it is possible. One could define an `orcid` DID method (resp. `viaf` or `isni`) and describe what the DID document for these DID methods would contain. In those cases, the "Verifiable Data Registry" would be the respective services. Some people in the DID community may consider this an abuse of the DID technology, because each of these registries is in fact centralized. Others would counter that decentralization can come from the choice among multiple registries, not necessarily from all individual registries being decentralized. I personally sympathize with the latter argument.
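As a purely hypothetical sketch (no `orcid` DID method is registered; the identifier below is the well-known example value from ORCID's own documentation), a DID document under such a method might look like this:

```python
import json

did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:orcid:0000-0002-1825-0097",
    "alsoKnownAs": ["https://orcid.org/0000-0002-1825-0097"],
    # Verification material would have to be published through the registry,
    # which is exactly where the centralization concern above comes in.
    "verificationMethod": [],
}
print(json.dumps(did_document, indent=2))
```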
For a future CORDIS you could consider obliging organizations to provide a standard identifier such as ROR or ISNI to identify themselves; that might help with reconciliation. Is this on the roadmap?
Yes, this is something we plan to achieve by linking with Wikidata, which contains ROR identifiers; for ISNI we can investigate relying on their linked-data platform.
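A hedged sketch of that reconciliation idea: Wikidata records ROR IDs (property P6782), so an organization can be looked up by its ROR identifier (CERN's, here, purely as an example):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Wikidata asks clients to identify themselves with a User-Agent string.
wd = SPARQLWrapper("https://query.wikidata.org/sparql",
                   agent="reconciliation-sketch/0.1")
wd.setQuery("""
SELECT ?org ?orgLabel WHERE {
  ?org wdt:P6782 "01ggx4157" .   # P6782 = ROR ID; value is CERN's, as an example
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
""")
wd.setReturnFormat(JSON)

for row in wd.query().convert()["results"]["bindings"]:
    print(row["org"]["value"], "-", row["orgLabel"]["value"])
```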
Question for Xueying Deng: Why does Flanders need a data space different from Belgium's, or from Europe's? Aren't human biology and the legislation about the same?
Xueying Deng: Even though the human body and European health-data laws are common, the ecosystem of actors, data flows, standards, language, and governance in Flanders is different enough that a dedicated Flemish Health Data Space makes sense. It allows us to build trust, harmonise locally, and deliver value regionally, and then plug into the bigger Belgian and European stage.
Question for Xueying Deng: Since March this year, the EU has adopted a Regulation to build the European Health Data Space (ref: EHDS Regulation). The DCAT Health Application Profile, which extends DCAT-AP with vocabularies like DPV, CSVW, or DQV, is a pillar for achieving interoperability across the health data landscape.
Xueying Deng: Thanks so much for the information; this is very good to know. The Flemish Health Data Space (FHDS) is designed to adhere to the key principles of the European Health Data Space (EHDS), such as data sovereignty, interoperability, secure exchange, and governed access. The EHDS indeed serves as a regulatory and conceptual framework for the FHDS. For more details, please refer to our final report: Vlaamse Health Data Space project. Eindrapport.
