
Validation

Several types of validation can be performed on a STEMMA source file.

 

Syntax

Whenever software loads the data into an XML DOM (Document Object Model), the parser automatically checks that the XML is “well-formed”. A source file should never be anything other than well-formed unless it has been edited by hand or corrupted.
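
As a minimal sketch of this automatic check, the following Python uses the standard library's `xml.dom.minidom` to build a DOM and reports whether parsing succeeded. The function name `is_well_formed` and the element and attribute names in the sample strings are hypothetical, chosen only for illustration; they are not taken from the STEMMA schema.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text can be parsed into a DOM (i.e. is well-formed)."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        # The parser rejects the text before any DOM is produced.
        return False


# Hypothetical sample content, not real STEMMA markup.
good = "<Dataset Name='Example'><Person Key='pJohn'/></Dataset>"
bad = "<Dataset Name='Example'><Person Key='pJohn'></Dataset>"  # Person never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that a successful parse says nothing about whether the elements are valid STEMMA; it confirms only that the text is syntactically sound XML.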

 

Schema

The structure of the XML content can be validated against an XSD schema that defines the XML representation of the STEMMA data model. This is an explicit step that the software must request, rather than one that happens automatically on loading, but it is not a difficult one. It differs from the automatic well-formedness check because it validates the particular XML dialect defined here.

 

If there are formal extensions to the XML schema (see Extended Schema) then they must have associated XSD definitions that can additionally be used to validate them.

 

Semantic

This is the validation of the data content itself and of the implications of the data stored in the source file. Here are some aspects of the data that could be validated:

 

  • Symbolic names (e.g. Keys) have valid format.
  • All referenced Key names are actually defined in a given Dataset.
  • No duplicate Key names defined in a given Dataset.
  • No imported Key names are already defined in a Dataset.
  • Key type matches reference type.
  • No circularity in biological lineage. Links should constitute a DAG.
  • No circularity in hierarchies for Places, Groups, Events, or Citations.
  • No circularity in derived Groups.
  • No circularity in inheritance relationships.
  • No circularity in DetLnk/DetKey references (see Source/Matrix entities).
  • Event constraints are valid (see below).
  • All STEMMA tag values (e.g. types, Property names, modes) are valid.
  • All Property values match the defined data-type.
  • Parameter names and types match their definitions.
  • Links between top-level entities are unique (see Dataset Structure).
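
Two of the checks listed above, the Key-name checks and the circularity checks, can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed implementation: it assumes the Keys and links have already been extracted from the DOM into plain Python lists and dictionaries, and the names `check_keys` and `find_cycle` are hypothetical.

```python
from collections import Counter


def check_keys(defined, referenced):
    """Return (duplicates, undefined) for lists of defined and referenced Key names."""
    counts = Counter(defined)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    undefined = sorted(set(referenced) - set(counts))
    return duplicates, undefined


def find_cycle(links):
    """Return one cycle in a {key: [linked keys]} mapping, or None if it is a DAG.

    Uses a depth-first search that tracks the keys on the current path;
    meeting a key that is already on the path means a cycle exists.
    """
    ON_PATH, DONE = 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for target in links.get(node, ()):
            if state.get(target) == ON_PATH:
                # Slice out the loop and repeat its first key to close it.
                return path[path.index(target):] + [target]
            if target not in state:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in list(links):
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical data: child -> parents links.
lineage = {"pChild": ["pFather", "pMother"], "pFather": ["pGrandfather"]}
broken = {"pA": ["pB"], "pB": ["pC"], "pC": ["pA"]}

print(check_keys(["pA", "pB", "pA"], ["pB", "pC"]))
print(find_cycle(lineage))
print(find_cycle(broken))
```

The same `find_cycle` routine applies to any of the hierarchies mentioned in the list (Places, Groups, Events, Citations, inheritance), since each reduces to a directed graph of Key references. The recursive search is adequate for a sketch; very deep hierarchies would call for an iterative version.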

 

Validating the constraints in the Events to ensure that there are no contradictions or impossibilities is an exercise in graph theory. It has been solved before for ‘scheduling constraints’ in project management software.
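
One standard technique from scheduling, shown here as a sketch only, treats each constraint as "event B occurs at least g time units after event A" and looks for a contradictory loop using Bellman-Ford relaxation. The constraint form, the function name `constraints_consistent`, and the sample events are assumptions made for illustration; the actual Event constraints are described elsewhere in the STEMMA documentation and may need a richer model.

```python
def constraints_consistent(events, constraints):
    """Check whether a set of minimum-gap constraints can all hold at once.

    Each constraint is a tuple (earlier, later, min_gap) meaning
    time[later] - time[earlier] >= min_gap. The set is consistent exactly
    when the equivalent difference-constraint graph has no negative cycle.
    """
    # Starting every value at 0 is equivalent to a virtual source node
    # with a zero-weight edge to every event.
    dist = {e: 0 for e in events}
    for _ in range(len(events) + 1):
        changed = False
        for earlier, later, min_gap in constraints:
            # Rewritten as time[earlier] - time[later] <= -min_gap,
            # i.e. an edge later -> earlier with weight -min_gap.
            if dist[later] - min_gap < dist[earlier]:
                dist[earlier] = dist[later] - min_gap
                changed = True
        if not changed:
            return True  # relaxation converged, so a valid assignment exists
    return False  # still changing after |events| + 1 passes: contradiction


# Hypothetical events and gaps (in years).
events = ["birth", "marriage", "death"]
ok = [("birth", "marriage", 16), ("marriage", "death", 0)]
contradictory = ok + [("death", "birth", 1)]  # birth placed after death

print(constraints_consistent(events, ok))
print(constraints_consistent(events, contradictory))
```

Maximum-gap constraints (for example, "death no more than 120 years after birth") fit the same framework: they are difference constraints pointing the other way, so they can be added as extra edges without changing the algorithm.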