STEMMA History

Development of the STEMMA® (“Source Text for Event and Ménage MApping”) data model and source format began around 2011. This page charts its chronological history.

Draft [2012-01-02]


First draft specification uploaded to the new STEMMA Web site.

Draft [2012-04-27]

The STEMMA research notes were collected together and made (almost) readable. The 70+ pages were uploaded to the STEMMA site as a resource for anyone undertaking similar work on family-history data.

V1.0 [2012-07-16]

STEMMA passed from being a draft specification to the first fully working version.

A number of its features were streamlined or revised as a result of applying it to my own data, and following further research. The copious associated Research Notes were updated in line with the new specification, and supplemented by a Data Model section that shows the model applied to a number of case studies.

New or improved features include:


  • Rationalised way of extending partially controlled vocabularies in order to support custom types, subtypes, roles, styles, and other tag values.
  • Unified approach to defining core and custom properties for Persons and Places.
  • Streamlined handling of multi-valued properties (and of Citation/Resource parameters) such as 'Roles'.
  • Support for local events (i.e. events that affect only one person).
  • Support for Events with multiple sources of information, i.e. multiple sets of properties for each associated Person.
  • Support for Dublin Core semantic tags for both Person/Place properties and Resource/Citation parameters. Support for their machine-readable OpenURLs.
  • Streamlined approach to Person and Place names that retains their unified handling but accommodates name types, name styles and sorting for different cultures.
  • Support for Dual Dates (aka Double Years).
  • Support for URL hyperlinks in narrative.
  • Support for general reference notes in narrative.
  • Support for copyright and other permissions/prohibitions.
  • Identification of physical artefacts as Resources.
  • Extended inheritance mechanism to Resources (e.g. attachments) and Citations (e.g. sources) so that the same details may be shared between multiple entities.


V2.0 [2013-05-28]


STEMMA underwent a considerable number of refinements to both strengthen and streamline its specification. Features include:


  • Better support for recording transcriptions, including uncertain characters, marginalia, original emphasis, alternative spellings/meanings.
  • Better separation of evidence from conclusion for marked-up references and for Property values.
  • Generic Group concept that can be used to model time-dependent Sets of Persons, e.g. family units.
  • Support for attribution to individuals, whether represented within the family history or external to it, including contact details such as addresses, phone numbers, email addresses, Web sites, and messaging systems.
  • Revised date-string representation for world calendars.
  • Downloads section added to Web site.


V2.1 [2013-10-16]


Changes include:


  • Changes to NoteRef. Added new Anom element for transcription anomalies.
  • Allow <br> in orig-text data.
  • Added Hamlet place-type.


V2.2 [2014-04-17]


Changes include:


  • Changed Group to also represent real-life entities rather than just Person Sets, including hierarchy, events, alternative names, subtype, and resources (docs & photos).
  • Added GroupRef mark-up.
  • Added GroupRef data-type.
  • Moved Person GroupLnk inside EventLnk/Eventlet.
  • Split BirthEvent/DeathEvent to allow for Eventlet.
  • Added equivalents of Birth/Death (Creation & Demise) for Group and Place, replacing Void in Places.
  • Handled "related entities" with JoinFrom (in Creation), SplitTo (in Demise), and RelatedTo elements in Place/Group.
  • Simplified Eventlet by removing hierarchy support.
  • Persisted Counters in Dataset header for assisted key generation.
  • Added GroupProperties to ExtendedProperties.
  • Event Place optional in Eventlet.
  • Resource entity distinguishes physical artefacts and images thereof.
  • Improved digital data-types for Resources.
  • Sensitivity levels accepted on Resources (e.g. photographs and documents).
  • External IDs accepted on Person, Place, Group, and Event entities.



V3.0 [2014-10-20]


Major change to trim excess flexibility, and to address certain known failings:


  • Could not represent the Properties for unidentified or incidental people in a given source.
  • Overloading of Role Property with relationships.
  • Problems representing a “directed Property” that points to another entity reference, as opposed to, say, Head.Wife.
  • Could not inherit from an Event when it had Detail elements in it.
  • Representation of top-level research reports.


Changes include:


  • Reversal of Person-to-Event (etc.) links to place Properties in the Event, alongside the respective source details.
  • Added References element to Event for representing subject references in the sources, and their respective Properties. This element supersedes the previous Detail element.
  • Introduction of “abstract” entities for the sole purposes of inheritance.
  • Make Event hierarchies bottom-up rather than top-down for consistency and ease of validation.
  • Deprecation of parameter substitution into Citation URIs: both named parameter markers and the ‘=?’ form.
  • Inclusion of NARRATIVE as a top-level Dataset entity for research reports and authored works.
  • Changed semantic types on Properties and Parameters to use “DC:” namespace prefix rather than simply “DC.”. DCType attribute changed to SemType.
  • Reinstatement of Event-specific Property values to represent named items of information for an event.
  • Explicit control over entity-Key imports for multi-Dataset Documents and multi-Document collections.
  • PersonEL, PlaceEL, and GroupEL data-types added for Properties that describe a relationship between two evidential subjects, such as person-to-person.
  • Addition of ‘Header’ TEXT_TYPE for details of authorship, title, etc., in narrative works.
  • Adjustments to NAME_VARIANTS to move the Type attribute, add an Initial=‘boolean’ option for using initials, add an indication of cultural style, and add an optional override for character sorting.
  • Added optional PersonalName (within Person entity) to complement PlaceName (in Place) and GroupName (in Group).
  • Revised syntax of the <Constraints> element to associate narrative with a specific constraint, e.g. to express causal relationships.
  • Added optional coordinates to a Place in order to represent a point, an enclosed area (i.e. polygon), or an open line (e.g. for a street).
  • Separation of Relationship from Role.
  • Several new event-types and event-subtypes.



V4.0 [2015-11-22]


Major change to finally accommodate sources, information, evidence, and conclusions in a single model that supports the major approaches to research and representation.


Changes include:


  • Introduction of a new Source entity that embraces both Citations and Resources for a particular information source. Citation and Resource entities are now connected to the Source entity rather than to each other.
  • Support for source assimilation & analysis, source mining, and the ability to drill-down on conclusions, all provided via the Source entity.
  • The <References> element, within Events, is now superseded by <SourceLnk> which links to the new Source entity. Enclosed *Ref elements (e.g. <PersonRef>) changed to *Lnk elements for consistency. Removal of the ID attribute introduced in V3.0.
  • Support for cross-source analysis and correlation via a new Matrix entity.
  • Support for a generalised approach to multi-tier personae.
  • Addition of an Animal entity, strongly modelled on the Person entity, including related mark-up and namespaces.
  • <CitationLnk>/<ResourceLnk> from Person, Place, Group, and Event entities changed to <SourceLnk>.
  • Review of the goal of sticking to XHTML tags for presentation; replacement of the <Hi> element with HTML-like ones; and the addition of support for <sup>/<sub> elements, columnar text, simple tables, and indentation.
  • Removal of ‘Unreadable’ mode from the <Anom> element.
  • Support for distinguishing manuscript and typescript transcriptions in the <Text> element. Support for numbering lines and pages in transcriptions. Positional control over annotations such as marginalia.
  • <FromText> element added to <Narrative> in order to share re-usable sections of text. This meant that the NoteKey attribute, in the semantic mark-up, was no longer required, so it was deleted.
  • Categorisation of the layers in a Citation chain.
  • The optional <DisplayFormat> element of the Citation entity has been re-interpreted as a set of pre-formatted language-specific strings. This may exist in addition to the mandatory set of named parameter values, and the two together can also be used as a simple citation-template.
  • The Intrinsic Functions, mentioned at the end of Semantic Mark-up, have been changed to Intrinsic Methods in preparation for defining a run-time object model. The set has also been supplemented with methods for accessing subject-entity names.
  • Small changes to subject-entity *-name-mode vocabularies to factor-out a generic name-mode (missing from previous specification).
  • Place coordinates (including bounding shapes) are now time-dependent, the same as any parent-Place link.
  • Added Canton and Colony to place-type vocabulary. The place-type of House is now replaced by Number and Apartment for flexibility.
  • <Quality>, <Reliability>, and <Credibility> elements moved from the Citation entity to the new Source entity.


Although refinements will continue, I expect this to be the last major change to the STEMMA specification. I will, therefore, concentrate subsequent efforts on describing its advantages and philosophy, and on providing more worked examples.



V4.1 [2017-04-19]


Refinements to STEMMA specification, especially in the areas of transcription (multiple contributors, audio, and linking to images or recordings) and narrative mark-up (tabulated data, and citations).


The revised narrative support can be seen in use in the fully worked examples at: and


Changes include:


  • ‘WhereIn’ attribute added to Citation Parameter definitions. This finally provides the missing criteria necessary for the automatic generation of shortened subsequent reference-note citations. ‘Subst’ attribute added to Citation parameter values in order to override formatting, or to provide a substitution where a value is unavailable.
  • <ParentCitationLnk> now allowed in both <CitationLnk> and <CitationRef> elements in order to create transient chained citations.
  • Quality element, within Source entity, moved inside the Frame element.
  • Review of entries in citation-layer-type namespace.
  • DataControl element of Resource entity supports attribution text.
  • Control of table widths, and individual column widths and alignments.
  • Ability to align images when embedded within narrative.
  • Ability to hyperlink images embedded in narrative.
  • Requirement for enclosing Narrative element dropped for Text elements, except for top-level Narrative entities. Text elements can now be nested.
  • <cb> replaced with <col>, and relationship between paragraphs and columns now reversed (paragraphs now within columns).
  • ResourceRef Mode=SynchImage allows synchronisation between images and transcriptions.
  • Corresponding SVG-x/y coordinates added to elements <page>, <col>, <p>, and <line>. Additional <posn> element defined to associate coordinates with arbitrary text locations.
  • <Page>/<Line> renamed to <page>/<line> and moved alongside <p>/<col>, as they relate to structure and content rather than semantics.
  • Mode=Tablenote attribute supplementing Footnote and Endnote in various places.
  • Text-element Header=boolean attribute replaced with Class=Header | H1 | H2 | H3 | Caption | Footnote | Endnote | Legend | Tablenote.
  • Text-element Class=Caption attribute used in Resource/ResourceRef and tables for generating captions.
  • Text-element Class=Footnote | Endnote | Tablenote attribute used in CitationRef to allow pre-formed (preferred) citations.
  • Deprecated the <Text> attributes Abstract=boolean, Extract=boolean, Manuscript=boolean, and Transcript=boolean.
  • <voice> mark-up added to supplement existing <ts>/<ms> mark-up. <ts>/<ms>/<voice> all enhanced to cope with different hands, voices, fonts, colours, etc.
  • In transcripts of audio recordings, support for multiple voices, overlapping dialogue, intonation, gestures, noises, pauses, timestamps, etc.
  • ResourceRef Mode=SynchAudio allows synchronisation between audio recordings and transcriptions, analogous to SynchImage for textual transcription (above).
  • Complete revision of Mode values for CitationRef element.
  • Relaxation of Date Parameters in order to cover the full range of calendars. One requirement was to represent the date-of-issue for newspaper sources that predated the Julian-to-Gregorian changeover.