Development of the STEMMA® (“Source Text for Event and
Ménage MApping”) data model and source format began around 2011. This page
charts its chronological history.
First draft specification uploaded to new STEMMA Web site.
The STEMMA research notes were collected together and made (almost) readable.
The 70+ pages were uploaded to the STEMMA site as a resource for any similar
work on family history data to utilise.
STEMMA passed from being a draft specification to the first fully working
A number of its features were streamlined or revised as a result of it being
applied to my own data, and following further research. The copious associated
Research Notes were updated in keeping with the new specification, and
supplemented by a Data Model section that shows the model being applied to a
number of case studies.
New or improved features include:
- Rationalised way of
extending partially controlled vocabularies in order to support custom
types, subtypes, roles, styles, and other tag values.
- Unified approach to
defining core and custom properties for Persons and Places.
- Streamlined handling of
multi-valued properties (and of Citation/Resource parameters) such as
- Support for local-events
(i.e. that only affect one person).
- Support for Events with
multiple sources of information, i.e. multiple sets of properties for each
- Support for Dublin Core
semantic tags for both Person/Place properties and Resource/Citation
parameters. Support for their machine-readable OpenURLs.
- Streamlined approach to
Person and Place names that retains their unified handling but
accommodates name types, name styles and sorting for different cultures.
- Support for Dual Dates
(aka Double Years).
- Support for URL hyperlinks
- Support for general
reference notes in narrative.
- Copyright and other
- Identification of physical
artefacts as Resources.
- Extended inheritance
mechanism to Resources (e.g. attachments) and Citations (e.g. sources) so
that the same details may be shared between multiple entities.
STEMMA underwent a considerable number of refinements to
both strengthen and streamline its specification. Features include:
- Better support for
recording transcriptions, including uncertain characters, marginalia,
original emphasis, alternative spellings/meanings.
- Better separation of
evidence from conclusion for marked-up references and for Property values.
- Generic Group concept that
can be used to model time-dependent Sets of Person, e.g. family units.
- Support for attribution of
individuals, whether represented within the family history or external to
it. Contact details, including address, phone, email addresses, Web sites,
and messaging systems.
- Revised date-string
representation for world calendars.
- Downloads section added to
- Changes to NoteRef. Added
new Anom element for transcription anomalies.
- Allow <br> in
- Added Hamlet place-type.
- Changed Group to also represent
real-life entity rather than just Person Sets. Include hierarchy, events,
alternative names, subtype, resources (docs & photos).
- Added GroupRef mark-up.
- Added GroupRef data-type.
- Move Person GroupLnk
BirthEvent/DeathEvent to allow for Eventlet.
- Have equivalent to
Birth/Death (Creation & Demise) for Group and Place. Replaces Void in
- Handle "related
entities" with JoinFrom (in Creation), SplitTo (in Demise), and
RelatedTo elements in Place/Group.
- Simplify Eventlet by
removing hierarchy support.
- Persisted Counters in
Dataset header for assisted key generation.
- Added GroupProperties to
- Event Place optional in
- Resource entity
distinguishes physical artefacts and images thereof.
- Improved digital data-types
- Sensitivity levels
accepted on Resources (e.g. photographs and documents).
- External IDs accepted on
Person, Place, Group, and Event entities.
Major change to trim excess flexibility, and to address
certain known failings:
- Could not represent the
Properties for unidentified or incidental people in a given source.
- Overloading of Role
Property with relationships.
- Problems representing a
“directed Property”, to another entity reference, as opposed to, say,
- Could not inherit from an
Event when it had Detail elements in it.
- Representation of
top-level research reports.
- Reversal of
Person-to-Event (etc) links to place Properties in the Event, alongside
the respective source details.
- Added References element
to Event for representing subject references in the sources, and their
respective Properties. This element supersedes the previous Detail
- Introduction of “abstract”
entities for the sole purposes of inheritance.
- Make Event hierarchies
bottom-up rather than top-down for consistency and ease of validation.
- Deprecation of parameter
substitution into Citation URIs; both named parameter markers and the ‘=?’
- Inclusion of NARRATIVE as
a top-level Dataset entity for research reports and authored works.
- Changed semantic types on Properties
and Parameters to use “DC:” namespace prefix rather than simply “DC.”.
DCType attribute changed to SemType.
- Reinstatement of
Event-specific Property values to represent named items of information for
- Explicit control over
entity-Key imports for multi-Dataset Documents and multi-Document
- PersonEL, PlaceEL, and
GroupEL data-types added for Properties that describe a relationship
between two evidential subjects, such as person-to-person.
- Addition of ‘Header’
TEXT_TYPE for details of authorship, title, etc., in narrative works.
- Adjustments to
NAME_VARIANTS to move the Type attribute, add an Initial=‘boolean’ option
for using initials, an indication of cultural style, and an optional
override for character sorting.
- Added optional PersonalName
(within Person entity) to complement PlaceName (in Place) and GroupName
- Revise syntax of
<Constraints> element to associate narrative with a specific
constraint, e.g. to express causal relationships.
- Added optional coordinates
to a Place in order to represent a point, an enclosed area (i.e. polygon),
or an open line (e.g. for a street).
- Separation of Relationship
- Several new event-types
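
As an illustration of the semantic-type change listed above (the “DC:” prefix replacing “DC.”, and DCType renamed to SemType), the following Python sketch migrates such attributes in an XML fragment. Only the DCType and SemType attribute names and the two prefixes come from the notes above; the element names and sample values (Dataset, Property, Name, DC.title) are assumptions made for the example and are not taken from the STEMMA specification.

```python
import xml.etree.ElementTree as ET


def migrate_semantic_types(root):
    """Rename DCType to SemType and rewrite a leading 'DC.' as 'DC:'.

    Returns the number of elements that were changed.
    """
    changed = 0
    for elem in root.iter():
        if "DCType" in elem.attrib:
            value = elem.attrib.pop("DCType")
            if value.startswith("DC."):
                value = "DC:" + value[len("DC."):]
            elem.set("SemType", value)
            changed += 1
    return changed


# Hypothetical fragment: the element and attribute layout is illustrative only.
sample = ET.fromstring(
    "<Dataset>"
    '<Property Name="Title" DCType="DC.title"/>'
    '<Property Name="Occupation"/>'
    "</Dataset>"
)
count = migrate_semantic_types(sample)
print(count, ET.tostring(sample, encoding="unicode"))
```

Elements without a DCType attribute are left untouched, so the function can safely be run more than once over the same tree.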
Major change to finally accommodate sources, information,
evidence, and conclusions in a single model that supports the major approaches
to research and representation.
- Introduction of a new
Source entity that embraces both Citations and Resources for a particular
information source. Citations and Resource entities are now connected to
Source entity rather than to each other.
- Support for source
assimilation & analysis, source
mining, and the ability to drill-down on conclusions, all provided via
the Source entity.
- The <References>
element, within Events, is now superseded by <SourceLnk> which links
to the new Source entity. Enclosed *Ref elements (e.g. <PersonRef>)
changed to *Lnk elements for consistency. Removal of the ID attribute introduced
- Support for cross-source
analysis and correlation via a new Matrix entity.
- Support for a generalised
approach to multi-tier personae.
- Addition of Animal
entity, strongly modelled on Person entity, including related mark-up and
from Person, Place, Group, and Event entities, changed to <SourceLnk>.
- Reviewed the goal of
sticking to XHTML tags for presentation,
replacement of the <Hi> element with HTML-like ones, and the
addition of support for <sup>/<sub> elements, columnar text, simple
tables, and indentation.
- Removal of ‘Unreadable’
mode from the <Anom> element.
- Support for distinguishing
manuscript and typescript transcriptions in the <Text> element.
Support for numbering lines and pages in transcriptions. Positional
control over annotations such as marginalia.
- <FromText> element
added to <Narrative> in order to share re-usable sections of text.
This means that the NoteKey attribute, in the semantic mark-up, is no
longer required and so has been deleted.
- Categorisation of the
layers in a Citation chain.
- The optional
<DisplayFormat> element of the Citation entity has been
re-interpreted as a set of pre-formatted language-specific strings. This may
exist in addition to the mandatory set of named parameter values, and the
two together can also be used as a simple citation-template.
- The Intrinsic Functions,
mentioned at the end of Semantic
Mark-up, have been changed to Intrinsic Methods in preparation for defining
a run-time object model. The set is also supplemented by ones for
accessing subject-entity names.
- Small changes to
subject-entity *-name-mode vocabularies to factor-out a generic name-mode
(missing from previous specification).
- Place coordinates
(including bounding shapes) are now time-dependent, the same as any
- Added Canton and Colony to
place-type vocabulary. The place-type of House is now replaced by Number
and Apartment for flexibility.
<Reliability>, and <Credibility> elements moved from the
Citation entity to the new Source entity.
Although refinements will continue, I anticipate this to be
the last major change to the STEMMA specification. I will, therefore,
concentrate subsequent efforts on describing its advantages and philosophy, and
on providing more worked examples.
Refinements to STEMMA specification, especially in the areas
of transcription (multiple contributors, audio, and linking to images or
recordings) and narrative mark-up (tabulated data, and citations).
The revised narrative support can be seen in use in
the fully-worked examples at: www.parallaxview.co/familyhistorydata/downloads/JessonLesson.xml
- ‘WhereIn’ attribute added
to Citation Parameter definitions. This finally provides the missing
criteria necessary for the automatic generation of shortened subsequent
reference-note citations. ‘Subst’ attribute added to Citation parameter
values in order to override formatting, or to provide a substitution in
cases where a value is unavailable.
now allowed in both <CitationLnk> and <CitationRef> elements
in order to create transient chained citations.
- Quality element, within
Source entity, moved inside the Frame element.
- Review of entries in citation-layer-type
- DataControl element of
Resource entity supports attribution text.
- Control of table widths,
and individual column widths and alignments.
- Ability to align images when
embedded within narrative.
- Ability to hyperlink
images embedded in narrative.
- Requirement for enclosing
Narrative element dropped for Text elements, except for top-level
Narrative entities. Text elements can now be nested.
- <cb> replaced with
<col>, and relationship between paragraphs and columns now reversed
(paragraphs now within columns).
Mode=SynchImage allows synchronisation between images and transcriptions.
- Corresponding SVG-x/y
coordinates added to elements <page>, <col>, <p>, and
<line>. Additional <posn> element defined to associate
coordinates with arbitrary text locations.
renamed to <page>/<line> and moved alongside
<p>/<col> as related to structure and content rather than
- Mode=Tablenote attribute supplementing
Footnote and Endnote in various places.
Header=boolean attribute replaced with Class=Header | H1 | H2 | H3 | Caption
| Footnote | Endnote | Legend | Tablenote.
- Text-element Class=Caption
attribute used in Resource/ResourceRef and tables for generating captions.
- Text-element Class=Footnote
| Endnote | Tablenote attribute used in CitationRef to allow pre-formed
- Deprecated the <Text>
attributes Abstract=boolean, Extract=boolean, Manuscript=boolean, and Transcript=boolean.
- <voice> mark-up added to supplement existing <ts>/<ms> mark-up. <ts>/<ms>/<voice>
all enhanced to cope with different hands, voices, fonts, colours, etc.
- In transcripts of audio recordings,
support for multiple voices, overlapping dialogue, intonation, gestures,
noises, pauses, timestamps, etc.
Mode=SynchAudio allows synchronisation between audio recordings and
transcriptions, analogous to SynchImage for textual transcription (above).
- Complete revision of Mode
values for CitationRef element.
- Relaxation of Date
Parameters in order to cover the full range of calendars. One requirement
was to represent the date-of-issue for newspaper sources that predated the