This algorithm derives a set of SHACL constraints from an RDF dataset. It can work from an uploaded RDF dataset, or from an online SPARQL endpoint. Detailed documentation is available below.
This algorithm is derived from an original implementation by Cognizone; credit goes to them. It was improved in significant ways:
The algorithm works best if the dataset:
The algorithm follows these steps to generate the SHACL:
Generates an sh:NodeShape for each type, with sh:targetClass set to the type.
Generates an sh:PropertyShape for each property on the type, with an sh:path set to this property.
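As a rough illustration of this discovery step, the sketch below groups predicates by the rdf:type of their subjects. It is not the tool's actual implementation: triples are modelled as plain Python tuples, the `ex:` names are hypothetical, and excluding rdf:type itself from the property list is an assumption.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def discover_shapes(triples):
    """Map each type (future sh:NodeShape) to the predicates used on its
    instances (future sh:PropertyShape paths)."""
    # First pass: record the types of every subject.
    types_of = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of[s].add(o)
    # Second pass: attach each predicate to every type of its subject.
    shapes = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # assumption: rdf:type gets no property shape of its own
        for t in types_of.get(s, ()):
            shapes[t].add(p)
    # Types whose instances carry no other property still get an entry.
    for subject_types in types_of.values():
        for t in subject_types:
            shapes.setdefault(t, set())
    return dict(shapes)

# Hypothetical sample data
triples = [
    ("ex:paris", RDF_TYPE, "ex:Place"),
    ("ex:paris", "rdfs:label", "Paris"),
    ("ex:paris", "owl:sameAs", "ex:other"),
    ("ex:rome", RDF_TYPE, "ex:Place"),
    ("ex:rome", "rdfs:label", "Roma"),
]
shapes = discover_shapes(triples)
```

Here `shapes` maps `ex:Place` to the two predicates found on its instances, which corresponds to one node shape with two property shapes.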
Sets an sh:nodeKind constraint on the property shape accordingly.
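One plausible way to pick the sh:nodeKind value is to look at which kinds of nodes appear as values of the property. The sketch below assumes each value is a tuple whose first element is a hypothetical tag (`"iri"`, `"bnode"` or `"literal"`). The six node kinds are the ones defined by SHACL; whether the tool emits the combined kinds is an assumption.

```python
# The six node kinds defined by SHACL, keyed by the kinds of values observed.
NODE_KINDS = {
    frozenset({"iri"}): "sh:IRI",
    frozenset({"bnode"}): "sh:BlankNode",
    frozenset({"literal"}): "sh:Literal",
    frozenset({"bnode", "iri"}): "sh:BlankNodeOrIRI",
    frozenset({"bnode", "literal"}): "sh:BlankNodeOrLiteral",
    frozenset({"iri", "literal"}): "sh:IRIOrLiteral",
}

def infer_node_kind(values):
    """Return the narrowest sh:nodeKind covering all values, or None when
    no single node kind applies (no values, or all three kinds present)."""
    kinds = frozenset(kind for kind, *_ in values)
    return NODE_KINDS.get(kinds)

# Hypothetical values of one property
same_as_values = [("iri", "ex:a"), ("iri", "ex:b")]
mixed_values = [("iri", "ex:a"), ("bnode", "b1")]
```

With these inputs, `infer_node_kind(same_as_values)` yields `sh:IRI` and `infer_node_kind(mixed_values)` yields `sh:BlankNodeOrIRI`.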
Sets an sh:class constraint on the property shape accordingly. If more than one class is found, the algorithm determines whether some can be removed:
Sets sh:datatype and sh:languageIn constraints on the property shape accordingly.
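The following sketch shows how datatype and language constraints might be derived from the literal values of a property. Literals are modelled as hypothetical `(lexical, datatype, language)` tuples. The rules used here are assumptions, not confirmed behaviour: sh:datatype is set only when all values share a single datatype, and sh:languageIn lists every language tag found.

```python
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

def infer_literal_constraints(literals):
    """Derive sh:datatype / sh:languageIn from (lexical, datatype, lang) tuples."""
    datatypes = {datatype for _, datatype, _ in literals}
    languages = sorted({lang for _, _, lang in literals if lang})
    constraints = {}
    # Assumption: a datatype is asserted only when it is unanimous.
    if len(datatypes) == 1:
        constraints["sh:datatype"] = next(iter(datatypes))
    # Assumption: every language tag found is listed.
    if languages:
        constraints["sh:languageIn"] = languages
    return constraints

# Hypothetical label values
labels = [
    ("Paris", RDF_LANGSTRING, "fr"),
    ("Paris", RDF_LANGSTRING, "en"),
]
label_constraints = infer_literal_constraints(labels)
```

For the sample labels, `label_constraints` contains rdf:langString as the datatype and the languages `en` and `fr`.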
Sets sh:minCount and sh:maxCount constraints on the property shape accordingly.
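Cardinalities can be derived by counting values per instance. The sketch below assumes a simple rule that the source does not spell out: sh:minCount 1 when every instance of the type has at least one value, and sh:maxCount 1 when no instance has more than one. Triples are plain tuples and all names are hypothetical.

```python
from collections import Counter

def infer_cardinality(instances, triples, prop):
    """Derive sh:minCount / sh:maxCount for `prop` over the given instances."""
    # Number of values of `prop` per instance (absent instances count as 0).
    counts = Counter(s for s, p, _ in triples if p == prop and s in instances)
    constraints = {}
    if instances and all(counts[s] >= 1 for s in instances):
        constraints["sh:minCount"] = 1
    if counts and max(counts.values()) <= 1:
        constraints["sh:maxCount"] = 1
    return constraints

# Hypothetical sample data
places = {"ex:paris", "ex:rome"}
triples = [
    ("ex:paris", "rdfs:label", "Paris"),
    ("ex:paris", "rdfs:label", "Parigi"),
    ("ex:rome", "rdfs:label", "Roma"),
    ("ex:paris", "owl:sameAs", "ex:other"),
]
label_card = infer_cardinality(places, triples, "rdfs:label")
same_as_card = infer_cardinality(places, triples, "owl:sameAs")
```

In this sample every place has a label but one place has two, so only sh:minCount is derived for the label; owl:sameAs is optional and single-valued, so only sh:maxCount is derived.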
Sets an sh:in or sh:hasValue constraint on the property shape accordingly.
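A minimal sketch of how a value-list constraint could be chosen, assuming the decision depends on the number of distinct values: a single distinct value gives sh:hasValue, a small set gives sh:in. The `max_in_size` threshold is a hypothetical parameter introduced for illustration; the criteria actually used by the tool are not stated here.

```python
def infer_value_constraint(values, max_in_size=3):
    """Suggest sh:hasValue or sh:in from the values observed for a property.

    `max_in_size` is a hypothetical threshold, not a documented setting.
    """
    distinct = sorted(set(values))
    if len(distinct) == 1:
        return {"sh:hasValue": distinct[0]}
    if 1 < len(distinct) <= max_in_size:
        return {"sh:in": distinct}
    return {}

# Hypothetical values
status_values = ["ex:draft", "ex:published", "ex:draft"]
country_values = ["ex:FR", "ex:FR"]
```

A real implementation would probably also require a minimum number of occurrences before trusting such a constraint; this sketch ignores that.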
Sets dash:propertyRole with the dash:LabelRole value accordingly.
Describes the dataset with void:Dataset, void:classPartition, and void:propertyPartition, with a dcterms:conformsTo pointing to the corresponding shapes.
Stores the counts in the void:entities, void:triples, or void:distinctObjects properties.
Here is an example of how statistics are expressed:
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# The dataset being analyzed
<https://xxx/sparql> a void:Dataset ;
  # One partition is created per NodeShape
  void:classPartition <https://xxx/partition_Place> ;
  # Total number of triples in the dataset
  void:triples "11963716"^^xsd:int ;
  # A pointer to the URI of the shapes graph used to generate these statistics
  sh:suggestedShapesGraph <https://xxx/shapes/> .

# A "Node Shape partition", that is, a partition of the entire dataset
# corresponding to all targets of one NodeShape
<https://xxx/partition_Place>
  # Link to the NodeShape
  dct:conformsTo <https://xxx/shapes/Place> ;
  # When the NodeShape targets instances of a class, the partition being described
  # is a class partition, and the class can be indicated here
  void:class <https://www.ica.org/standards/RiC/ontology#Place> ;
  # Total number of targets of that shape in the dataset
  void:entities "4551"^^xsd:int ;
  # One property partition is created per property shape in the node shape
  void:propertyPartition <https://xxx/partition_Place_label> , <https://xxx/partition_Place_sameAs> .

# A "Property Shape partition", that is, a sub-partition of a "Node Shape partition"
# corresponding to all triples matching the path of the property
<https://xxx/partition_Place_label>
  # A link to the property shape
  dct:conformsTo <https://xxx/shapes/Place_label> ;
  # Number of distinct values of the property shape
  void:distinctObjects "17330"^^xsd:int ;
  # When the path of the property shape is a simple predicate, it can be repeated here,
  # and the partition is then a real property partition
  void:property <http://www.w3.org/2000/01/rdf-schema#label> ;
  # Number of triples corresponding to the property shape
  void:triples "17567"^^xsd:int .

<https://xxx/partition_Place_sameAs>
  dct:conformsTo <https://xxx/shapes/Place_sameAs> ;
  void:distinctObjects "14847"^^xsd:int ;
  void:property <http://www.w3.org/2002/07/owl#sameAs> ;
  void:triples "14854"^^xsd:int .
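The three counts shown in the example above could be computed along the following lines. This is a sketch under simplifying assumptions: the dataset is an in-memory list of distinct triples modelled as tuples, the node shape targets a class, and the property shape path is a simple predicate. The function name and the `ex:` identifiers are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def partition_statistics(triples, cls, prop):
    """Count entities of `cls`, and triples / distinct objects of `prop`
    on those entities."""
    # Targets of the node shape: all instances of the class.
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    # Values of the property on those instances.
    objects = [o for s, p, o in triples if p == prop and s in instances]
    return {
        "void:entities": len(instances),
        "void:triples": len(objects),
        "void:distinctObjects": len(set(objects)),
    }

# Hypothetical sample data
triples = [
    ("ex:paris", RDF_TYPE, "ex:Place"),
    ("ex:rome", RDF_TYPE, "ex:Place"),
    ("ex:paris", "rdfs:label", "Paris"),
    ("ex:paris", "rdfs:label", "Parigi"),
    ("ex:rome", "rdfs:label", "Paris"),
]
stats = partition_statistics(triples, "ex:Place", "rdfs:label")
```

In the sample, two places carry three label triples but only two distinct label values, which is why void:triples and void:distinctObjects can differ, as they do in the example above.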