Generate a SHACL profile

This algorithm derives a set of SHACL constraints from an RDF dataset. It can work from an uploaded RDF dataset, or from an online SPARQL endpoint. Detailed documentation is available below.

  Dataset

You can select several files at once. Supported extensions: .rdf, .ttl, .n3, .trig. Files with any other extension are treated as RDF/XML. You can also upload zip files.
URL of a valid RDF file. Supports the same extensions as the file upload above.
Supported syntaxes: Turtle, RDF/XML, JSON-LD, TriG, TriX, N-Quads. Turtle is preferred.

  SPARQL endpoint

URL of a public SPARQL endpoint, preferably one that does not hold too much data (do not try DBpedia or Wikidata, it will not work)

  Options

/!\ This takes time! Runs an additional analysis to count the number of targets of each node shape, and the number of occurrences and the number of distinct values of each property shape. Expresses the result as void:classPartition and void:propertyPartition entries, using the predicates void:entities, void:triples and void:distinctObjects.
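
To make the three counts concrete, here is a minimal Python sketch over an in-memory list of triples. It is not the tool's implementation, which relies on SPARQL queries; the `count_partition` function, the tuple representation of triples, and the sample data are all illustrative assumptions.

```python
RDF_TYPE = "rdf:type"


def count_partition(triples, target_class, prop):
    """Return the three counts for one class and one property.

    `triples` is an iterable of (subject, predicate, object) tuples.
    """
    triples = list(triples)
    # Entities targeted by the node shape: the instances of target_class.
    entities = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    # Every occurrence of the property on one of those entities.
    values = [o for s, p, o in triples if p == prop and s in entities]
    return {
        "void:entities": len(entities),
        "void:triples": len(values),
        "void:distinctObjects": len(set(values)),
    }


# Hypothetical sample data: two books and one person.
SAMPLE = [
    ("ex:a", "rdf:type", "ex:Book"),
    ("ex:b", "rdf:type", "ex:Book"),
    ("ex:c", "rdf:type", "ex:Person"),
    ("ex:a", "dcterms:language", "en"),
    ("ex:b", "dcterms:language", "en"),
    ("ex:b", "dcterms:language", "fr"),
    ("ex:c", "dcterms:language", "de"),
]

counts = count_partition(SAMPLE, "ex:Book", "dcterms:language")
print(counts)
```

In this sample the property occurs three times on the two books but takes only two distinct values, which is the difference between the void:triples and void:distinctObjects counts.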

Documentation

This algorithm is derived from an original implementation by Cognizone, to whom credit is due. It has been improved in significant ways:

  • Used a layered visitor-pattern architecture for more modularity
  • Used a sampling technique to work with large datasets
  • Improved the NodeShape derivation algorithm to exclude certain types when entities have multiple types
  • Added counting of entities and properties

The algorithm works best if the dataset:

  • Uses one and only one rdf:type value per entity (although the algorithm can be smart enough to exclude some types, see below)
  • Contains only data, not the RDFS/OWL model
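
Both conditions can be checked before running the generator. The following is a minimal sketch in plain Python, assuming the dataset is available as (subject, predicate, object) tuples; the function names and the list of "model" types are illustrative assumptions, not part of the tool.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Types that usually indicate RDFS/OWL model statements (an assumed, partial list).
MODEL_TYPES = {
    "rdfs:Class",
    "owl:Class",
    "rdf:Property",
    "owl:ObjectProperty",
    "owl:DatatypeProperty",
}


def multi_typed_entities(triples):
    """Map each entity that has more than one rdf:type to its sorted types."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    return {s: sorted(t) for s, t in types.items() if len(t) > 1}


def model_terms(triples):
    """List the subjects that are declared as classes or properties."""
    return sorted({s for s, p, o in triples if p == RDF_TYPE and o in MODEL_TYPES})


# Hypothetical sample: one doubly typed concept and one class declaration.
SAMPLE = [
    ("ex:c1", "rdf:type", "skos:Concept"),
    ("ex:c1", "rdf:type", "ex:Theme"),
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:Person", "rdf:type", "owl:Class"),
]

multi = multi_typed_entities(SAMPLE)
model = model_terms(SAMPLE)
print(multi)
print(model)
```

A non-empty result from either function suggests the dataset departs from the ideal case described above.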

SHACL generation algorithm

The algorithm follows these steps to generate the SHACL:

  1. Find all types in the dataset. Relies on this SPARQL query. Generates one sh:NodeShape for each type, with sh:targetClass set to the type.
  2. For each found type, find all properties used on instances of this type. Relies on this SPARQL query. Generates one sh:PropertyShape for each property on the type, with an sh:path set to this property.
  3. For each property shape previously found, determine its node kind (IRI or Literal). Relies on this SPARQL query, this one, and this one. Generates the sh:nodeKind constraint on the property shape accordingly.
  4. For each property shape previously found with an sh:nodeKind of IRI or BlankNode, determine the types of the property values. Relies on this SPARQL query. Generates the sh:class constraint on the property shape accordingly. If more than one class is found, the algorithm determines whether some can be removed:
    • If one class is a superset of all the other classes found (indicating that the dataset types its instances redundantly, e.g. assigning both skos:Concept and a subclass of skos:Concept to entities), but is also a superset of classes outside this list, then this superset class (e.g. skos:Concept) is removed from the list, and only the most precise class(es) are kept.
    • If one class is a superset of all the other classes found, and is not a superset of any class outside this list, then only the superset class is kept, and the other, more precise classes are removed from the list.
  5. For each property shape previously found with a sh:nodeKind Literal, determine the datatype and languages of the property values. Relies on this SPARQL query, and this one. Generates the sh:datatype and sh:languageIn constraints on the property shape accordingly.
  6. For each property shape previously found, determine the cardinalities of the property. Relies on this SPARQL query, and this one. This step can only detect minimum and maximum cardinalities equal to 1. Generates the sh:minCount and sh:maxCount constraints on the property shape accordingly.
  7. For each property shape previously found, list the values of the property if it has a limited number of possible values. Relies on this SPARQL query. This is done only if the property has 3 or fewer distinct values. Generates an sh:in or sh:hasValue constraint on the property shape accordingly.
  8. For each node shape previously found, determine whether one of its property shapes is a label of the entity. If a property skos:prefLabel, foaf:name, dcterms:title, schema:name or rdfs:label (in this order) is found, it is marked as a label. Otherwise, the algorithm looks for a literal property of datatype xsd:string or rdf:langString with an sh:minCount of 1; if only one is found, it is marked as a label. Generates a dash:propertyRole with the value dash:LabelRole accordingly.
  9. If requested, for each node shape and property shape previously found, count the number of instances of the node shapes, the number of occurrences of the property shapes, and the number of distinct values. This currently works only with sh:targetClass target definitions, but can easily be extended to deal with other target definitions. Generates a void:Dataset, void:classPartition and void:propertyPartition, with a dcterms:conformsTo pointing to the corresponding shapes. Stores the counts in the void:entities, void:triples, or void:distinctObjects properties.
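
The class-reduction rule of step 4 is the least obvious part of the algorithm, so here is a minimal Python sketch of one possible reading of it. It assumes that "superset" means inclusion of instance sets, and that "other classes" means classes that were not found on the property values. The `prune_classes` function, its inputs, and the sample data are hypothetical; the actual tool works through SPARQL queries.

```python
def prune_classes(found, instances_by_class):
    """Reduce the set of classes found on the values of one property.

    `found` is the set of classes observed on the property values.
    `instances_by_class` maps every class of the dataset to its instances.
    Assumption: class A is a "superset" of class B when every instance
    of B is also an instance of A.
    """
    for candidate in sorted(found):  # sorted for a deterministic result
        others = found - {candidate}
        if not others:
            continue
        members = instances_by_class[candidate]
        if not all(instances_by_class[c] <= members for c in others):
            continue  # candidate does not cover every other class found
        # Does the candidate also cover classes that were NOT found?
        covers_outside = any(
            cls not in found and inst and inst <= members
            for cls, inst in instances_by_class.items()
        )
        # If so, it is too generic: keep the more precise classes.
        # Otherwise it summarises the others exactly: keep only the candidate.
        return set(others) if covers_outside else {candidate}
    return set(found)  # no class covers all the others: keep everything


# Hypothetical dataset: concepts are also typed as themes or places,
# and agents are also typed as persons or organizations.
INSTANCES = {
    "skos:Concept": {"a", "b", "c", "d"},
    "ex:Theme": {"a", "b"},
    "ex:Place": {"c", "d"},
    "ex:Agent": {"p", "q", "r"},
    "ex:Person": {"p", "q"},
    "ex:Organization": {"r"},
}

too_generic = prune_classes({"skos:Concept", "ex:Theme"}, INSTANCES)
exact_cover = prune_classes({"ex:Agent", "ex:Person", "ex:Organization"}, INSTANCES)
unrelated = prune_classes({"ex:Theme", "ex:Person"}, INSTANCES)
print(too_generic, exact_cover, unrelated)
```

In the first call skos:Concept also covers ex:Place, which was not found on the values, so it is dropped in favour of ex:Theme. In the second call ex:Agent covers exactly the classes found, so it alone is kept. In the third call neither class covers the other, so both are kept.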