Semantic knowledge base
This article describes the outcomes, deliverables and methodology of an alignment project. The main idea is to automatically align two RDF (mainly SKOS) datasets by comparing their lexical content. The expected result is a set of resource pairs, one resource from each dataset, that are considered the same or similar, with varying degrees of confidence.
Goals
- One or more alignment files (in SKOS or EDOAL formats)
- One or more files containing evaluation samples
- A report describing the preliminary dataset assessment, the designed process and parameters, the output alignment files and a final assessment
Methodology
Preliminary assessment
In this step the asset pair, or set of asset pairs, is established and the initial state of each asset is assessed to determine whether it is suitable as input for the automatic alignment software. Attention shall be paid to technical and content quality, available languages, presence of duplicates, encoding, the estimated pre-processing operations and other aspects. At this step it is important to document the initial state of the resources, their business relevance, some of their history and their internal structure, then to describe the expected final outcomes, followed by an enumeration of the operations to be performed.
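Part of this assessment can be scripted. The following is a minimal sketch, assuming the labels have already been extracted from each dataset as (concept URI, language, label) tuples; the function name `assess_labels` and the sample URIs are hypothetical and only serve as illustration. It counts concepts, labels per language and labels shared by more than one concept.

```python
from collections import Counter, defaultdict


def assess_labels(labels):
    """Summarise (concept_uri, language, label) tuples for a first assessment."""
    concepts = {uri for uri, _, _ in labels}
    per_language = Counter(lang for _, lang, _ in labels)

    # Group concepts by (language, whitespace-collapsed, case-folded label)
    # so that labels differing only in case or spacing count as duplicates.
    by_label = defaultdict(set)
    for uri, lang, label in labels:
        key = (lang, " ".join(label.split()).casefold())
        by_label[key].add(uri)

    duplicates = {
        key: sorted(uris) for key, uris in by_label.items() if len(uris) > 1
    }
    return {
        "concepts": len(concepts),
        "labels_per_language": dict(per_language),
        "duplicate_labels": duplicates,
    }


# Hypothetical sample data, not taken from any real vocabulary.
sample = [
    ("http://example.org/a/1", "en", "Fishing"),
    ("http://example.org/a/2", "en", "fishing "),
    ("http://example.org/a/2", "fr", "Pêche"),
]
report = assess_labels(sample)
```

A report like this does not replace a manual review of content quality, but it gives a quick indication of language coverage and of duplicates that may need attention during pre-processing.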
Pre-processing
Based on the initial assessment the input datasets are cleaned up, normalised and transformed into a form suitable for the automatic alignment software.
Useful tools during the pre-processing phase are:
- VocBench3: Sheet2RDF tool
- KNIME
- LinkedPipes ETL
- SKOS Play from Sparna
- OpenRefine
- Custom Python scripts
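As an illustration of what a custom Python script might do in this phase, here is a minimal sketch of label normalisation using only the standard library. The function name `normalise_label` and the `strip_diacritics` option are assumptions made for this example; which normalisation steps are appropriate depends on the datasets and languages involved.

```python
import unicodedata


def normalise_label(text, strip_diacritics=False):
    """Return a normalised form of a label for lexical comparison."""
    # NFKC folds compatibility characters such as ligatures,
    # full-width forms and non-breaking spaces into their plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace and trim both ends.
    text = " ".join(text.split())
    # Case-fold for caseless comparison (more aggressive than lower()).
    text = text.casefold()
    if strip_diacritics:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text
```

Stripping diacritics is left optional because it can merge distinct words in some languages; it is probably best enabled only for lower-confidence matching.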
Alignment design
The following parameters of the project are established in this step:
- Main inputs: a pair of datasets or, in the case of batch alignments, many-to-one or one-to-many sets of datasets (many-to-many alignments should be avoided)
- Main outputs: SKOS and/or EDOAL formats
- Matching rules:
  - Exact matches: based only on a perfect equality operator (one output expected) OR
  - Close matches: based on a designed comparison operator (multiple outputs expected, one per degree of confidence: high, medium, low)
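The two matching rules can be sketched in plain Python as follows. This is an illustration only, not a Silk configuration: it uses `difflib.SequenceMatcher` from the standard library as a stand-in comparison operator, and the thresholds in `BANDS` are hypothetical values that would need to be tuned against evaluation samples.

```python
from difflib import SequenceMatcher

# Hypothetical confidence thresholds, ordered from strictest to loosest.
BANDS = [("high", 0.95), ("medium", 0.85), ("low", 0.70)]


def classify(label_a, label_b):
    """Return 'exact', a confidence band name, or None if there is no match."""
    if label_a == label_b:
        return "exact"  # perfect equality operator
    ratio = SequenceMatcher(None, label_a, label_b).ratio()
    for band, threshold in BANDS:
        if ratio >= threshold:
            return band
    return None


def split_by_confidence(pairs):
    """Group (uri_a, label_a, uri_b, label_b) tuples into one output per band."""
    outputs = {"exact": [], "high": [], "medium": [], "low": []}
    for uri_a, label_a, uri_b, label_b in pairs:
        band = classify(label_a, label_b)
        if band is not None:
            outputs[band].append((uri_a, uri_b))
    return outputs
```

Each list in the returned dictionary corresponds to one output file: the `exact` list for the exact-match rule, and the `high`, `medium` and `low` lists for the close-match rule.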
Comparison operator(s) design
The operators are encoded in the Silk Workbench as a linking task.
The main fields considered by the alignment comparison operator are linguistic in nature. This means that aspects such as language, word, spacing, sequencing, capitalisation, script, encoding, transliteration and others shall be taken into consideration. In the case of SKOS datasets (most datasets are expected to be such), the following properties are considered of primary relevance (with various weights):
- skos:prefLabel, skos:altLabel
- skos:definition
- skos:scopeNote
- rdfs:label, rdfs:comment
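To make the idea of weighted properties concrete, here is a minimal Python sketch of a weighted comparison over the properties listed above. It is not the Silk linking task itself: the weights in `WEIGHTS` are hypothetical, concepts are assumed to be plain dictionaries mapping a property name to a list of string values, and language tags are ignored for brevity.

```python
from difflib import SequenceMatcher

# Hypothetical weights; real values would be set during alignment design.
WEIGHTS = {
    "skos:prefLabel": 0.50,
    "skos:altLabel": 0.20,
    "skos:definition": 0.15,
    "skos:scopeNote": 0.05,
    "rdfs:label": 0.05,
    "rdfs:comment": 0.05,
}


def best_similarity(values_a, values_b):
    """Highest case-insensitive similarity over all value pairs (0.0 if none)."""
    return max(
        (
            SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
            for a in values_a
            for b in values_b
        ),
        default=0.0,
    )


def weighted_score(concept_a, concept_b, weights=WEIGHTS):
    """Weighted average similarity over properties present on both concepts."""
    total = 0.0
    used = 0.0
    for prop, weight in weights.items():
        values_a = concept_a.get(prop, [])
        values_b = concept_b.get(prop, [])
        if not values_a or not values_b:
            continue  # a property missing on either side does not count
        total += weight * best_similarity(values_a, values_b)
        used += weight
    return total / used if used else 0.0
```

Dividing by the sum of the weights actually used keeps the score between 0 and 1 even when one concept lacks some properties; whether missing properties should instead lower the score is a design decision for the alignment.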
In designing the alignment procedure, please consider the relevant factors from the systematisation presented below.