Development of the STEMMA® ("Source Text for Event and
Ménage MApping") data model and source format began around 2011. This page
charts its chronological history.
First draft specification uploaded to new STEMMA Web site.
The STEMMA research notes were collected together and made (almost) readable.
The 70+ pages were uploaded to the STEMMA site as a resource for anyone doing
similar work on family history data.
STEMMA passed from being a draft specification to the first fully working
A number of its features were streamlined or revised as a result of it being
applied to my own data, and following further research. The copious associated
Research Notes were updated in keeping with the new specification, and
supplemented by a Data Model section that shows the model being applied to a
number of case studies.
New or improved features include:
- Rationalised way of
extending partially controlled vocabularies in order to support custom
types, subtypes, roles, styles, and other tag values.
- Unified approach to
defining core and custom properties for Persons and Places.
- Streamlined handling of
multi-valued properties (and of Citation/Resource parameters) such as
- Support for local-events
(i.e. that only affect one person).
- Support for Events with
multiple sources of information, i.e. multiple sets of properties for each
- Support for Dublin Core
semantic tags for both Person/Place properties and Resource/Citation
parameters. Support for their machine-readable OpenURLs.
- Streamlined approach to
Person and Place names that retains their unified handling but
accommodates name types, name styles and sorting for different cultures.
- Support for Dual Dates
(aka Double Years).
- Support for URL hyperlinks
- Support for general
reference notes in narrative.
- Copyright and other
- Identification of physical
artefacts as Resources.
- Extended inheritance
mechanism to Resources (e.g. attachments) and Citations (e.g. sources) so
that the same details may be shared between multiple entities.
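As an illustration of the Dual Dates (Double Years) item in the list above, the following is a minimal sketch of splitting a dual-dated year such as 1750/51 into its two years. The function name `parse_dual_year` and the input format are assumptions made for this example only; they are not the STEMMA date-string syntax.

```python
import re


def parse_dual_year(text):
    """Split a dual-dated year such as '1750/51' into (first_year, second_year).

    Hypothetical helper for illustration; not part of the STEMMA specification.
    """
    match = re.fullmatch(r"(\d{4})/(\d{1,4})", text.strip())
    if match is None:
        raise ValueError(f"not a dual year: {text!r}")
    first, suffix = match.groups()
    first_year = int(first)
    # The suffix replaces the trailing digits of the first year,
    # so '1750/51' yields 1751 and '1699/1700' yields 1700.
    second_year = int(first[: len(first) - len(suffix)] + suffix)
    if second_year != first_year + 1:
        raise ValueError(f"dual years must be consecutive: {text!r}")
    return first_year, second_year


print(parse_dual_year("1750/51"))
```

The consecutive-year check reflects the usual convention that a dual date names two adjacent years, one for each year-numbering style.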
STEMMA underwent a considerable number of refinements to
both strengthen and streamline its specification. Features include:
- Better support for
recording transcriptions, including uncertain characters, marginalia,
original emphasis, alternative spellings/meanings.
- Better separation of
evidence from conclusion for marked-up references and for Property values.
- Generic Group concept that
can be used to model time-dependent Sets of Persons, e.g. family units.
- Support for attribution of
individuals, whether represented within the family history or external to
it. Contact details, including address, phone, email addresses, Web sites,
and messaging systems.
- Revised date-string
representation for world calendars.
- Downloads section added to
- Changes to NoteRef. Added
new Anom element for transcription anomalies.
- Allow <br> in
- Added Hamlet place-type.
- Changed Group to also represent
a real-life entity rather than just Person Sets. This includes hierarchy, events,
alternative names, subtype, and resources (docs & photos).
- Added GroupRef mark-up.
- Added GroupRef data-type.
- Move Person GroupLnk
BirthEvent/DeathEvent to allow for Eventlet.
- Have equivalent to
Birth/Death (Creation & Demise) for Group and Place. Replaces Void in
- Handle "related
entities" with JoinFrom (in Creation), SplitTo (in Demise), and
RelatedTo elements in Place/Group.
- Simplify Eventlet by
removing hierarchy support.
- Persisted Counters in
Dataset header for assisted key generation.
- Added GroupProperties to
- Event Place optional in
- Resource entity
distinguishes physical artefacts and images thereof.
- Improved digital data-types
- Sensitivity levels
accepted on Resources (e.g. photographs and documents).
- External IDs accepted on
Person, Place, Group, and Event entities.
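The persisted counters mentioned in the list above can be pictured with a minimal sketch: a counter per key prefix is stored with the data, so that newly generated keys never collide with existing ones after a reload. The class name `KeyCounters`, the prefixes, and the key format are all assumptions for this example; the actual Dataset header representation is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class KeyCounters:
    """Hypothetical stand-in for counters persisted alongside a dataset."""

    counters: dict = field(default_factory=dict)

    def next_key(self, prefix):
        # Increment the stored counter for this prefix and build a key from it.
        number = self.counters.get(prefix, 0) + 1
        self.counters[prefix] = number
        return f"{prefix}{number}"


# Counters as they might be restored from a previously saved dataset.
saved = KeyCounters({"p": 41})
print(saved.next_key("p"))
print(saved.next_key("e"))
```

Saving the `counters` mapping back with the data is what makes the scheme "persisted": the next session continues from the last number issued rather than rescanning every existing key.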
Major change to trim excess flexibility, and to address
certain known failings:
- Could not represent the
Properties for unidentified or incidental people in a given source.
- Overloading of Role
Property with relationships.
- Problems representing a
“directed Property”, to another entity reference, as opposed to, say,
- Could not inherit from an
Event when it had Detail elements in it.
- Representation of
top-level research reports.
- Reversal of
Person-to-Event (etc.) links to place Properties in the Event, alongside
the respective source details.
- Added References element
to Event for representing subject references in the sources, and their
respective Properties. This element supersedes the previous Detail
- Introduction of “abstract”
entities for the sole purposes of inheritance.
- Make Event hierarchies
bottom-up rather than top-down for consistency and ease of validation.
- Deprecation of parameter
substitution into Citation URIs; both named parameter markers and the ‘=?’
- Inclusion of NARRATIVE as
a top-level Dataset entity for research reports and authored works.
- Changed semantic types on Properties
and Parameters to use “DC:” namespace prefix rather than simply “DC.”.
DCType attribute changed to SemType.
- Reinstatement of
Event-specific Property values to represent named items of information for
- Explicit control over
entity-Key imports for multi-Dataset Documents and multi-Document
- PersonEL, PlaceEL, and
GroupEL data-types added for Properties that describe a relationship
between two evidential subjects, such as person-to-person.
- Addition of ‘Header’
TEXT_TYPE for details of authorship, title, etc., in narrative works.
- Adjustments to
NAME_VARIANTS to move the Type attribute, add an Initial='boolean' option
for using initials, an indication of cultural style, and an optional
override for character sorting.
- Added optional PersonalName
(within Person entity) to complement PlaceName (in Place) and GroupName
- Revise syntax of
<Constraints> element to associate narrative with a specific
constraint, e.g. to express causal relationships.
- Added optional coordinates
to a Place in order to represent a point, an enclosed area (i.e. polygon),
or an open line (e.g. for a street).
- Separation of Relationship
- Several new event-types
Major change to finally accommodate sources, information,
evidence, and conclusions in a single model that supports the major approaches
to research and representation.
- Introduction of a new
Source entity that embraces both Citations and Resources for a particular
information source. Citation and Resource entities are now connected to
the Source entity rather than to each other.
- Support for source
assimilation & analysis, source
mining, and the ability to drill-down on conclusions, all provided via
the Source entity.
- The <References>
element, within Events, is now superseded by <SourceLnk> which links
to the new Source entity. Enclosed *Ref elements (e.g. <PersonRef>)
changed to *Lnk elements for consistency. Removal of the ID attribute introduced
- Support for cross-source
analysis and correlation via a new Matrix entity.
- Support for a generalised
approach to multi-tier personae.
- Addition of Animal
entity, strongly modelled on Person entity, including related mark-up and
from Person, Place, Group, and Event entities, changed to <SourceLnk>.
- Reviewed the goal of
sticking to XHTML tags for presentation,
replacement of the <Hi> element with HTML-like ones, and the
addition of support for <sup>/<sub> elements, columnar text, simple
tables, and indentation.
- Removal of ‘Unreadable’
mode from the <Anom> element.
- Support for distinguishing
manuscript and typescript transcriptions in the <Text> element.
Support for numbering lines and pages in transcriptions. Positional
control over annotations such as marginalia.
- <FromText> element
added to <Narrative> in order to share re-usable sections of text.
This means that the NoteKey attribute, in the semantic mark-up, is no
longer required and so has been removed.
- Categorisation of the
layers in a Citation chain.
- The optional
<DisplayFormat> element of the Citation entity has been
re-interpreted as a set of pre-formatted language-specific strings. This may
exist in addition to the mandatory set of named parameter values, and the
two together can also be used as a simple citation-template.
- The Intrinsic Functions,
mentioned at the end of Semantic
Mark-up, have been changed to Intrinsic Methods in preparation for defining
a run-time object model. The set is also supplemented by ones for
accessing subject-entity names.
- Small changes to
subject-entity *-name-mode vocabularies to factor-out a generic name-mode
(missing from previous specification).
- Place coordinates
(including bounding shapes) are now time-dependent, the same as any
- Added Canton and Colony to
place-type vocabulary. The place-type of House is now replaced by Number
and Apartment for flexibility.
<Reliability>, and <Credibility> elements moved from the
Citation entity to the new Source entity.
Although refinements will continue, I anticipate this to be
the last major change to the STEMMA specification. I will, therefore,
concentrate subsequent efforts on describing its advantages and philosophy, and
on providing more worked examples.