

The Source entity is a core component of STEMMA’s informational sub-model, that is, the support that relates source descriptions, the assimilation of information therein, analysis, transcriptions, and citations. More than that, it provides a link from conclusions back through the reasoning and evidence to the information involved, and to the underlying sources.


The Source entity is primarily built from a number of linked items, called profiles, which relate source fragments to concepts and relationships in a human-readable manner. Such information may be used to implement a Graphic Organiser that allows those relationships to be analysed visually in a process known as Link Analysis.




<Source Key=’key’>

[ <Title> source-title </Title> ]




[ TEXT_SEG ] ...






[ <Where DetLnk=’key’/> ]

[ <When DetLnk=’key’ | Value=’std-date’/> ]


[ <Credibility> information-credibility </Credibility> ]

[ <Reliability> information-reliability </Reliability> ]

[ <Quality> source-quality </Quality> ]

[ TEXT_SEG ] ...





<SourceLet [Key=’key’]>

[ <Title> sourcelet-title </Title> ]



[ TEXT_SEG ] ...





<ProtoPerson { DetKey=’key’ || Key=’key’ }>

[ <Title> proto-title </Title> ]


[ TEXT_SEG ] ...


<!-- or ProtoAnimal, ProtoPlace, ProtoGroup, ProtoEvent -->




<Commentary DetKey=’key’>

[ <Title> commentary-title </Title> ]


[ TEXT_SEG ] ...





<ProtoDate DetKey=’key’>

[ <Title> date-title </Title> ]


[ TEXT_SEG ] ...





<Link  [Type=’link-type’] { DetLnk=’key’ || Value=’value’ }>

[ TEXT_SEG ] ...



A ProtoPerson profile basically corresponds to what is described as a “persona” elsewhere, but there are also equivalent profiles for references to animals, places, and groups.


The various profiles (prototype subjects, prototype events, prototype dates, and commentary) have one or more links, either to source fragments or to other profiles. Each link has a link-phrase that is used to describe the semantics of its data. Any outputs from a profile (i.e. Value and/or DetLnk) are associated with this link-phrase, and by default all links from an input profile are carried through to higher-level profiles, creating threads of information. For instance, a prototype person may have links to several source fragments that use the same link-phrase of “name”. By default, they would be merged, in their declared order, to yield a single Value and/or DetLnk, but an output link may make an inference to change the exported value for that thread. This is basically what happens in any profile: the links are acted upon in their declared order to control the output threads, but the profile as a whole may be thought of as a “black box” that just relates inputs to outputs. There are very few rules, and the connectivity is subjective, according to the assimilation-analysis process of the user.
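The thread behaviour described above can be sketched in Python. This is only an illustration under stated assumptions: the Link class, the output_threads function, and the dictionary of profiles are hypothetical and not part of STEMMA. "Merging" is modelled here as a later non-empty item replacing an earlier one for the same link-phrase, which the text above does not spell out. The key dsMrSmith and the value "Mr Smith" are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Thread = Tuple[Optional[str], Optional[str]]  # (Value, DetLnk)


@dataclass
class Link:
    """Hypothetical in-memory form of one <Link> element."""
    phrase: Optional[str] = None    # link-phrase from the <Text> child, if any
    value: Optional[str] = None     # Value attribute
    det_lnk: Optional[str] = None   # DetLnk attribute
    link_type: str = "Input"        # Input is the default link-type


def _merge(threads: Dict[str, Thread], phrase: str, value, det_lnk) -> None:
    # ASSUMPTION: a later non-empty item replaces an earlier one.
    old_value, old_det = threads.get(phrase, (None, None))
    threads[phrase] = (value if value is not None else old_value,
                       det_lnk if det_lnk is not None else old_det)


def output_threads(key: str, profiles: Dict[str, List[Link]]) -> Dict[str, Thread]:
    """Fold the links of profile `key`, in declared order, into output threads."""
    threads: Dict[str, Thread] = {}
    for link in profiles[key]:
        if link.link_type == "Input":
            # Input links give continuance to every thread of the input profile.
            if link.det_lnk in profiles:
                inherited = output_threads(link.det_lnk, profiles)
                for phrase, (value, det) in inherited.items():
                    _merge(threads, phrase, value, det)
            # A Value with a link-phrase adds or modifies a thread.
            if link.phrase is not None:
                _merge(threads, link.phrase, link.value, None)
        elif link.phrase is not None:
            # Other link-types contribute an output only when a link-phrase is given.
            _merge(threads, link.phrase, link.value, link.det_lnk)
    return threads


profiles = {
    "dpJohn": [Link("name", "John", "dsJohn", "Source")],
    "dpMrSmith": [Link("name", "Mr Smith", "dsMrSmith", "Source")],
    "dpJohnSmith": [
        Link(det_lnk="dpJohn"),
        Link(det_lnk="dpMrSmith"),
        Link("name", "John Smith", link_type="Inference"),
    ],
}
result = output_threads("dpJohnSmith", profiles)
print(result["name"][0])
```

In this sketch the two Input links bring the lower-level “name” threads forward, and the final Inference link replaces the exported value for that thread.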


The distinction between “links”, which are mechanically created using the <Link> element, and “threads”, which are a notional concept, may appear confusing at first. The easiest way to visualise it is that the threads carry named items of information, and the profiles are snapshots of that information and knowledge at a particular point in the assimilation and analysis. The links indicate a dependency of one profile on another.


In software terms, the threads constitute notional named tuples of the form link-phrase={Value,DetLnk} that are carried by the links. The links must constitute a Directed Acyclic Graph (DAG), meaning no circularity.
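A processor might enforce the no-circularity rule with a depth-first search over the link dependencies. The following is a minimal sketch, assuming the links have already been reduced to a dictionary that maps each profile's DetKey to the DetKeys it links to; the find_cycle function and that dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular chain of keys, or None if the links form a DAG."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for nxt in edges.get(node, ()):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                # nxt is already on the current path: a circular dependency.
                return path[path.index(nxt):] + [nxt]
            if colour == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in list(edges):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


acyclic = {"dpJohnSmith": ["dpJohn", "dpMrSmith"], "dpJohn": ["dsJohn"]}
circular = {"dpA": ["dpB"], "dpB": ["dpA"]}
print(find_cycle(acyclic))
print(find_cycle(circular))
```

The keys dpA and dpB are invented purely to show a rejected case.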


The following link-types are currently predefined in the partially-controlled vocabulary:


  • Source — Provides a link to a source fragment identified by mark-up elements such as: <Mark>, <PersonRef>, <DateRef>, etc. The Value attribute may be used to provide a normalised interpretation of the fragment, if necessary.
  • Input (default) — Link to another profile used as input. For instance, a profile to which something is being added or changed, or from which some inference is derived. All the associated threads have continuance, and could be used to build multi-tier “Personae”. The Value may be used to add a thread, or to modify a prior one, if a link-phrase is provided.
  • Reference — Identical to Input but with no continuance of threads from the input profile. An output DetLnk is effectively a ‘Reference’ to the respective profile.
  • Reading — A reading of information from the previous Source-type link in the profile. Any Value and/or DetLnk is effectively an output from the profile, if a link-phrase is provided. Together, for instance, they may describe a relationship.
  • Inference — Comment, observation, conclusion, etc., on the inputs to the current profile. Any Value and/or DetLnk is effectively an output from the profile, if a link-phrase is provided.


One of the functions of the link-type is to support applications that want to display different types of connector in a graphic organiser, or to allow a user to filter them based on the contents of the link-phrase. Other link-types could be defined to support the evidence categories in the Evidence Analysis Process Map, or to distinguish links according to whether they relate to dates/events, persons, places, etc. See Extended Vocabularies for creating custom link-types.


The <Frame> element enumerates a list of materials relevant to the current source (such as citations, images, and transcriptions), and specifies the where and when relevant to the general source. The <Where> and <When> elements will usually refer to a profile in the body of the corresponding Source or SourceLet, but <When> may also take an explicit date value if it is known for the source but not visible in the transcription (e.g. for a census page).


The <Quality>, <Credibility>, and <Reliability> elements characterise the confidence in a source, and in information derived from it. Note that these do not relate to a specific datum from the source; the Surety data-attribute is provided for that case. See Extended Vocabularies for defining custom values.


    • Quality — source quality:

      • Original — Material in its original recorded form.

      • Copy — Facsimile of original, e.g. image copy, certified copy.

      • Derivative — Manipulated version of original, e.g. translation, abstract, extract.

      • Authored — Narrative work using other sources but providing independent conclusions.

      • Unknown — Unknown or unspecified assessment.

    • Reliability — information reliability:

      • Primary — Details provided by someone with first-hand knowledge.

      • Secondary — Details provided by someone with second-hand or more-distant knowledge.

      • Unknown — Unknown or unspecified assessment.

    • Credibility — credibility of information author, compiler, or reporter:

      • Expert — Information from someone with relevant expertise.

      • Questionable — Questionable credibility of information, as in interviews and oral genealogies, or with potential for bias as in an autobiography.

      • Trusted — Information from a trusted source.

      • Unsubstantiated — Claims or opinions.

      • Unknown — Unknown or unspecified assessment.


When a source has disjointed parts (such as a multi-page census household, or a book’s pages), or it contains anterior references, that is, references from a previous time (such as in a diary, chronological narrative, or recollections during a story), then smaller sets of linked profiles can be specified in corresponding <SourceLet> elements. These will have their own <Frame> elements that are more precise in relation to the associated material, and which may specify a different where and when. For instance, their citation may specify an actual page or entry, and any Resource entity may include a specific transcription of it. In order to facilitate their use for more-localised citations, any Parameters used in associated <CitationLnk> or <ResourceLnk> elements are deemed to be inclusive of those specified in the main <Frame> for the same entity reference. In other words, it may only be necessary to include a page Parameter and leave the remainder implicit. Also, the <Where> and <When> elements inherit from the main <Frame> if unspecified.
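The inheritance from the main <Frame> can be pictured as a dictionary merge. The sketch below is an assumption-laden illustration, not STEMMA's definition. The effective_frame function, the dictionary layout, the citation key cCensus1851, and all parameter names and values are hypothetical. It also assumes that a SourceLet parameter takes precedence when both frames name the same parameter, which the text does not state.

```python
from typing import Any, Dict


def effective_frame(main: Dict[str, Any], sourcelet: Dict[str, Any]) -> Dict[str, Any]:
    """Combine a SourceLet frame with the main Source frame it inherits from."""
    merged: Dict[str, Any] = {
        # <Where> and <When> fall back to the main frame if unspecified.
        "where": sourcelet.get("where") or main.get("where"),
        "when": sourcelet.get("when") or main.get("when"),
        "citations": {},
    }
    for key, params in sourcelet.get("citations", {}).items():
        # Parameters for the same entity reference are inherited from the main frame.
        inherited = main.get("citations", {}).get(key, {})
        # ASSUMPTION: the SourceLet value wins if both specify a parameter.
        merged["citations"][key] = {**inherited, **params}
    return merged


main_frame = {
    "where": "dpNottingham",
    "when": "1851-03-30",
    "citations": {"cCensus1851": {"Piece": "2130", "Folio": "12"}},
}
sourcelet_frame = {"citations": {"cCensus1851": {"Page": "17"}}}

frame = effective_frame(main_frame, sourcelet_frame)
print(frame["citations"]["cCensus1851"])
```

Here the SourceLet supplies only a page parameter, and the remaining parameters, along with the where and when, come from the main frame.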


At the lowest level, a ProtoPerson profile, for instance, may relate to a specific person reference in a transcription, and allows the details of that reference to be collected together, including the subject’s relationships. For example, a simple statement of “John was the neighbour of Samuel” might be represented as follows:


<ProtoPerson DetKey='dpJohn'>
  <Link DetLnk='dsJohn' Value='John' Type='Source'>
    <Text>name</Text>
  </Link>
  <Link DetLnk='dsJohnRel' Value='neighbour' Type='Source'>
    <Text>relation 1</Text>
  </Link>
  <Link DetLnk='dpSamuel' Type='Inference'>
    <Text>relation 1</Text>
  </Link>
</ProtoPerson>

<ProtoPerson DetKey='dpSamuel'>
  <Link DetLnk='dsSamuel' Value='Samuel' Type='Source'>
    <Text>name</Text>
  </Link>
</ProtoPerson>





This represents two prototype persons: dpJohn and dpSamuel. The first has a <Link> element identifying the name “John” in the transcribed source, and another pair of links identifying the relationship to dpSamuel, again beginning with a relevant source fragment as their input.


Note that the link-phrase may be chosen freely by the researcher, and may be expressed in multiple languages if required. Also, the link-phrase is not constrained by any taxonomy or controlled vocabulary.


The associated transcription would have labels to which these links would be connected. For instance:


<PersonRef DetKey='dsJohn'>John</PersonRef> was the <Mark DetKey='dsJohnRel'>neighbour of</Mark> <PersonRef DetKey='dsSamuel'>Samuel</PersonRef>


A link-type of Input may be used to build up a multi-tier persona-like profile. For instance:


<ProtoPerson DetKey='dpJohnSmith'>
  <Link DetLnk='dpJohn' Type='Input'/>
  <Link DetLnk='dpMrSmith' Type='Input'/>
  <Link Value='John Smith' Type='Inference'>
    <Text>name</Text>
  </Link>
</ProtoPerson>





NB: This merged prototype effectively embraces all the threads from the lower-level prototypes (merged in order), and an inferred name is established to override the merged threads that have the link-phrase “name”.


The same principles apply identically to ProtoAnimal, ProtoPlace, and ProtoGroup profiles. If they are defined with a DetKey of their own then they can be referenced in higher profiles in the detail-link network (see below). At the very top, a Key attribute may be specified that links to a conclusion entity of a corresponding type, and either Key or DetKey, or both, must be specified. For instance:


<ProtoPerson Key='pJohnSmith'>


That would mark the end of that chain of prototypes, since a direct connection to a conclusion entity has been established.


The profiles ProtoEvent and/or ProtoDate might be connected using link values (again, free-form) to indicate their relative ordering. Elements of logic and inference can be specified on their own via the Commentary profile if necessary.


A more complex example may be found at: Census Roles.



Detail Linkage

Support for the Source and Matrix entities uses an independent set of keys to chain conclusions to reasoning, to evidence, and to the original information. The entity keys used elsewhere are strongly typed: when something expects a Person key then only a Person key will do. The detail-links, on the other hand, are untyped but scoped, meaning that they have referential containment. For clarity, attributes that define these keys are called DetKey, and the ones that reference them are called DetLnk.


The DATA_ATTRIBUTE syntax includes a DetLnk attribute and this means that it can be added to many conclusion items in a subject entity, including Property values. Because Property values are defined within a SourceLnk, their DetLnk instances can only refer to DetKey keys defined in the respective Source entities, or in an associated SourceLet. Other DetLnk instances can refer to DetKey keys in any Matrix or Source entity.


DetLnk instances in a Matrix entity can only refer to DetKey keys defined in the Source entities specified in its <Frame> element. DetLnk instances in a Source or SourceLet can only refer to DetKey keys in the same entity (or a lower SourceLet) or in a transcription associated with a Resource specified in the respective <Frame> element.
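The scoping rule for a Source or SourceLet lends itself to a simple validation pass. The following sketch assumes the relevant keys have already been collected into sets. The unresolved_links function and its parameters are hypothetical, and the example deliberately includes a misspelt transcription key to show what such a check would catch.

```python
from typing import Iterable, List, Set


def unresolved_links(det_lnks: Iterable[str],
                     own_keys: Set[str],
                     sourcelet_keys: Set[str],
                     transcription_keys: Set[str]) -> List[str]:
    """Return the DetLnk values that fall outside the permitted scope.

    A DetLnk in a Source or SourceLet may refer to a DetKey in the same entity,
    in a lower SourceLet, or in a transcription named by the <Frame>.
    """
    visible = own_keys | sourcelet_keys | transcription_keys
    return sorted(lnk for lnk in det_lnks if lnk not in visible)


# DetLnk values used by the profiles in a Source (illustrative).
links_used = ["dsJohn", "dsJohnRel", "dpSamuel", "dsSamuel"]
profile_keys = {"dpJohn", "dpSamuel"}
# The transcription key for Samuel is misspelt on purpose.
marked_up = {"dsJohn", "dsJohnRel", "dsSamual"}

print(unresolved_links(links_used, profile_keys, set(), marked_up))
```

Because the keys are untyped, a check of this kind can only confirm that a target exists within scope, not that it is of a suitable kind.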