

A Citation entity identifies the source of some information mentioned in the Dataset. Examples might include books, newspaper articles, BMD certificates, census data, tax records, court records, film, tombstones, military service records, journey manifests, cemetery records, oral history, church records, pension records, land or property transfers, etc.


These are common historical sources, and there are accepted printed citation formats applicable to each of them, but this Citation entity goes further; it can also identify a collection of works, a repository or institution, or even represent attribution to an individual.


In the citations of normal written or printed works, there are two main citation modes that may be employed within text: document labels and source labels, applicable to documents and images respectively. Citations may involve reference notes linked to inline superscript indicators in the main text. Alternatively, they may involve a source list or bibliography at the end of the work. Parenthetical in-text citations, such as “Smith (2004, p. 39) claims that...” or “…(Smith 2004, p. 39)…” when all details are parenthesised, are commonly associated with a bibliography, but they are less appropriate for genealogical or historical citations because they do not accommodate the source provenance or analytical notes that are frequently required.


There are citation conventions that apply to different source types and scenarios in order to provide some consistency, and these have precise specifications for layout, quotation marks, punctuation, and use of italics. Several citation styles are in common use. For instance, the humanities use the Modern Language Association (MLA) style, Harvard referencing, the Modern Humanities Research Association (MHRA) style, and the Chicago Manual of Style (CMOS). Other styles are commonly used in law and the sciences.


The Board for Certification of Genealogists (BCG) recommends CMOS, which utilises footnotes, endnotes, and bibliographies. Genealogy is very demanding in the variety of sources that need to be cited, and Elizabeth Shown Mills[1] has extended the conventional CMOS guidelines to include many of those additional source types. It should be understood, though, that all these citation styles and modes relate to final-form written or printed citations. Their application is therefore relevant to presentation for a specific end-user rather than to computer storage (see Cite Seeing).


Since those final-form citations are designed to be human-readable, they also embody elements of a specific locale, culture, and preferred style. This is a problem for electronic documents because such citations are not computer-readable, and so cannot be adjusted to suit the locale or preferences of an arbitrary end-user. It is therefore necessary to go back to the essence of a citation rather than consider specific physical implementations, i.e. to provide sufficient information, through a digested citation, to uniquely identify a source, its characteristics, and any analytical assessment. These citation-elements, implemented through STEMMA’s Parameter mechanism, should be sufficient to support the formatting appropriate for any given end-user.


The scheme presented here is a generalised computer-readable one designed to cope with all possible source types and scenarios. It does not attempt to enumerate all possible source types, specify which elements they require, or mandate a particular presentation style. Its main goals are to keep the scheme open-ended so that source types can be defined freely, to parameterise it so that it can interface to external citation-templates, and to give it a hierarchical structure for representing different layers of a citation (e.g. for provenance or location).




<Citation Key='key' [Abstract='boolean']>
    [ <Title> citation-title </Title> ]
    <URI> source-type-uri </URI>
    [ <Params>
        { PARAM_DEF... } | { PARAM_VALUE … }
    </Params> ]
    [ <DisplayFormat [Mode='citation-format-mode']>
        format-string
    </DisplayFormat> ] …
    [ <ParentCitationLnk Key='key' [Type='layer-type']>

    </ParentCitationLnk> ]
    [ <BaseCitationLnk Key='key'>

    </BaseCitationLnk> ]







<Param Name='name' [Type='type'] [SemType='sem-type'] [ItemList='boolean'] [Optional='boolean']>







{ <Param Name='name' [Key='key']>

</Param> }

{ <Param Name='name'>
    { <Item [Key='key']> value </Item> } …
</Param> }



The parameterisation is available in the citation-title, the format-string, narrative elements, and the values of Parameters themselves (i.e. within a Params element).


Note that Parameter names are local to the corresponding source-type. There is no sharing of Parameter names between different source-types, and no semantics are implied by any of their names. If two source-types each have a Parameter called ‘Publisher’, then each is interpreted in the context of its respective source-type. In effect, no semantics are conveyed directly by the Parameter name; that is the purpose of the SemType attribute.


The valid Parameter data-types are documented at: Data Types. Lists are handled with the same ItemList approach as for Property values. The semantic type is indicated by the SemType attribute, which may use the Dublin Core vocabulary, e.g. SemType='DC:Title' or SemType='DC:Publisher.CorporateName.Address'. The default value for the Optional attribute is 0 (i.e. false), which means that a non-blank value must be provided.
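To make the Optional and ItemList rules concrete, here is a minimal Python sketch of checking a set of Parameter values against the PARAM_DEF entries of a source-type. The ParamDef class and the validate() function are hypothetical illustrations rather than part of STEMMA; checking values against the Data Types page is deliberately left out, and an absent value is treated the same as a blank one.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical in-memory form of a PARAM_DEF; the fields mirror the
# Name, Type, SemType, ItemList and Optional attributes of <Param>.
@dataclass
class ParamDef:
    name: str
    type: Optional[str] = None
    sem_type: Optional[str] = None
    item_list: bool = False
    optional: bool = False  # Optional defaults to 0 (false)

# A value is a single string, a list of <Item> strings, or None if absent.
ParamValue = Union[str, list[str], None]

def validate(defs: list[ParamDef], values: dict[str, ParamValue]) -> list[str]:
    """Return a list of problems; an empty list means the values are acceptable."""
    problems = []
    known = {d.name for d in defs}
    for name in values:
        if name not in known:
            problems.append(f"{name}: not defined for this source-type")
    for d in defs:
        value = values.get(d.name)
        if d.item_list and isinstance(value, str):
            problems.append(f"{d.name}: expected a list of Item values")
            continue
        if not d.item_list and isinstance(value, list):
            problems.append(f"{d.name}: expected a single value")
            continue
        items = value if isinstance(value, list) else [value]
        # Blank means absent, an empty list, or only whitespace.
        blank = all(v is None or not v.strip() for v in items)
        if blank and not d.optional:
            problems.append(f"{d.name}: a non-blank value is required")
    return problems

defs = [ParamDef("Author"), ParamDef("Title"), ParamDef("Pages", optional=True)]
print(validate(defs, {"Author": "James Granger", "Title": "  "}))
# ['Title: a non-blank value is required']
```

Here the blank 'Pages' Parameter is accepted only because it is declared optional; without that, the default of Optional='0' would demand a value.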


The <BaseCitationLnk> element may nominate an Abstract Citation from which data may be inherited by the current Citation, in much the same vein as base classes and derived classes in object-oriented programming. An Abstract Citation must define no embedded Keys, can only reference other abstract entities, and must contain Parameter definitions rather than Parameter settings. Any Parameter substitution must therefore occur after the inheritance process has completed. If an implementation creates a temporary conglomerate entity in memory by doing a physical merge, then that entity must not be persisted back to the data file; otherwise it constitutes data corruption. See Inheritance and Parameters for more information.
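The following Python sketch shows one way an implementation might honour these rules. It follows the BaseCitationLnk chain, checks that every inherited-from Citation is abstract and holds only definitions, and builds a temporary merged view from deep copies so that the stored entities are never altered. The dictionary layout (the "base", "abstract", "uri", "defs" and "values" keys) and the resolve() function are assumptions made for illustration; any Parameter substitution would then be applied to the returned view, never to the originals.

```python
import copy

# Hypothetical in-memory Citations: "base" holds the BaseCitationLnk key,
# "defs" holds Parameter definitions and "values" holds Parameter settings.
def resolve(key: str, citations: dict) -> dict:
    """Return a temporary merged view of a Citation and its abstract bases.

    The result is a working copy only; it must not be written back to the data file.
    """
    chain, seen = [], set()
    current = key
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current!r}")
        seen.add(current)
        cit = citations[current]
        if current != key:
            # Anything inherited from must be abstract and hold only definitions.
            if not cit.get("abstract"):
                raise ValueError(f"{current!r} is not an Abstract Citation")
            if cit.get("values"):
                raise ValueError(f"abstract {current!r} sets Parameter values")
        chain.append(cit)
        current = cit.get("base")

    merged = {"key": key, "uri": None, "defs": {}, "values": {}}
    # Apply the most distant base first so that nearer entries override it.
    for cit in reversed(chain):
        merged["uri"] = cit.get("uri", merged["uri"])
        merged["defs"].update(copy.deepcopy(cit.get("defs", {})))
        merged["values"].update(copy.deepcopy(cit.get("values", {})))
    return merged
```

Deep-copying is the design choice that matters here: edits made to the merged view during formatting cannot leak back into the Abstract Citation that other Citations also inherit from.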


It is important to retain a clear view of the distinction between a Citation and a Resource. As an example, consider UK BMD references. These might be linked to the defining body, say with something like, in order to create a unique source citation. However, if you wanted to be able to pull up the appropriate census page from some Web site, then that would be done via a corresponding Resource entity.


Some related articles may be found at: Cite Seeing and Citations for Online Trees.


Semantic Typing


The simple Dublin Core terms (see Dublin Core Metadata Initiative) cannot clearly distinguish, say, the title of an article from the title of the journal containing it, nor clearly indicate other data relating to the containing journal, such as its publication date (as distinct from the article submission date) or its volume and issue numbers. That same DCMI page recommends the use of the OpenURL (ANSI/NISO standard Z39.88-2004) ContextObject for representing the context of a bibliographic citation, although it does not take this to the level of a hierarchical chain. The OpenURL concept is designed to provide the context of a citation in a machine-readable form that can be resolved by an unspecified library or archive. In other words, the Dublin Core recommendation doesn’t cite a source directly but provides a library-independent hyperlink to its content. At best, it constitutes a reference to an indefinite source.


The SemType attribute associates such semantic information with the individual citation-elements (i.e. Parameters) but leaves the Parameter names to be chosen independently to suit the source-type. Other semantic types could be applied using the same attribute, but with a different namespace.
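Because Parameter names are local to their source-type, software that needs, say, the title of whatever work is being cited should look it up by semantic type rather than by name. The minimal Python sketch below assumes that each source-type's definitions are available as a simple name-to-SemType mapping; the newspaper source-type with its "Masthead" and "Printer" Parameters is a hypothetical example of local names that differ from those used for books.

```python
from typing import Optional

def parse_sem_type(sem_type: str) -> tuple[str, list[str]]:
    """Split 'DC:Publisher.CorporateName.Address' into its namespace and path."""
    namespace, sep, path = sem_type.partition(":")
    if not sep:
        raise ValueError(f"SemType {sem_type!r} has no namespace prefix")
    return namespace, path.split(".")

def find_by_sem_type(sem_types: dict[str, str], values: dict[str, str],
                     wanted: str) -> Optional[str]:
    """Return the value of whichever local Parameter carries the wanted SemType."""
    for name, sem_type in sem_types.items():
        if sem_type == wanted:
            return values.get(name)
    return None

# Two hypothetical source-types that name their title Parameter differently.
book_types = {"Title": "DC:Title", "Publisher": "DC:Publisher"}
paper_types = {"Masthead": "DC:Title", "Printer": "DC:Publisher"}

print(find_by_sem_type(book_types, {"Title": "OLD NOTTINGHAM"}, "DC:Title"))
# OLD NOTTINGHAM
print(find_by_sem_type(paper_types, {"Masthead": "Ulster Gazette"}, "DC:Title"))
# Ulster Gazette
```

Splitting off the namespace prefix is what would let a second vocabulary coexist in the same attribute.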


The STEMMA scheme described here is fully in keeping with those Dublin Core recommendations but is not specifically tied to them. It allows each type of source to be represented by a source-type-uri, and Parameters can then be applied to build up a citation description for a specific instance of that source-type. The source-type-uri also acts as a global key for retrieving localised text used when soliciting Parameter values, data-types for validating those values, and an interface to a citation-template system that generates a formatted string for the user.
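As an illustration of the source-type-uri acting as a global key, the Python sketch below retrieves localised prompt text for a Parameter, falling back from a regional locale to its base language and then to English. The registry contents, including the example.org URI and the French prompts, are purely hypothetical; only the idea of keying such data on the source-type-uri comes from the text above.

```python
# Hypothetical registry keyed on source-type-uri, then Parameter name, then locale.
PROMPTS = {
    "http://example.org/source-types/book": {
        "Author": {"en": "Author", "fr": "Auteur"},
        "Publisher": {"en": "Publisher", "fr": "Éditeur"},
    },
}

def prompt_for(uri: str, param: str, locale: str, fallback: str = "en") -> str:
    """Return the prompt for a Parameter, trying e.g. 'fr-CA', then 'fr', then the fallback."""
    texts = PROMPTS[uri][param]
    language = locale.split("-")[0]
    for candidate in (locale, language, fallback):
        if candidate in texts:
            return texts[candidate]
    raise KeyError(f"no prompt for {param!r} under {uri!r}")
```

The same URI could equally key the data-type definitions and the external citation-template; keeping all three behind one key is what lets source types be defined freely.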


Citation Chain


Citations may be linked to describe the provenance of a source, the provenance of the information itself, where the originals are held, and any analytical comments. These are known as citation layers and the associated chain forms part of a hierarchy created through the use of the <ParentCitationLnk> element.


The STEMMA Citation hierarchy allows the individual parts of a layered citation to be described, and re-used for related references. For a facsimile copy (digital or otherwise, but not a database), it generally places a reference to the indefinite source[2] at the lowest level (e.g. an unaccessed original), and then links that to an actual instance (a definite source), such as an online copy. Further layers might identify the source-of-the-source, the location of the originals, or analytical comments. With a book, for example, the indefinite source could be identified by the title, author, publisher, and edition, while the definite source could have been an online copy.


Note that STEMMA syntax does not differentiate between citing a specific source of information, citing a collection or work that contained the information, or citing a repository or institution hosting that work or collection: they are all citing something in the more literal sense. Supporting citation layers avoids duplication and provides a stronger representation overall.


The Dublin Core Metadata Initiative has encountered the issue of a chain but has tried to solve it by adding further terms and namespaces (see dc-citation-guidelines/).


The links between the layers may be characterised using the Type='layer-type' attribute, as follows:





A brief summary or a précis of --


Information cited by the source. Source-of-the-Source.


Analytical comments.


Database extract (usually cited in the first layer)


Database extract with images


Extracted portion from --


Scan, photocopy, photograph, etc.


Media conversion from --


Other provenance information, differing from ‘Citing’.


Location of source.


Transcribed details from --


Translated details from --
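A Citation chain can be traversed by repeatedly following ParentCitationLnk keys. The Python sketch below does this, reporting the layer-type of the link used to reach each layer and refusing to loop forever on a malformed chain. The CitationNode class is a hypothetical in-memory form, and the Type values used in the example ("scan-of" and "location") are invented placeholders rather than STEMMA's own layer-type keywords.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class CitationNode:
    key: str
    title: str
    parent_key: Optional[str] = None  # Key of <ParentCitationLnk>, if any
    layer_type: Optional[str] = None  # its Type='layer-type' attribute

def citation_layers(start: str, citations: dict[str, CitationNode]
                    ) -> Iterator[tuple[Optional[str], CitationNode]]:
    """Yield (layer-type of the incoming link, citation) from start upwards."""
    seen = set()
    key, link_type = start, None
    while key is not None:
        if key in seen:
            raise ValueError(f"citation chain loops back to {key!r}")
        seen.add(key)
        node = citations[key]
        yield link_type, node
        key, link_type = node.parent_key, node.layer_type

# Hypothetical three-layer chain: an online image of a register entry,
# the original register, and the archive holding it.
chain = {
    "cImage": CitationNode("cImage", "Online image", "cRegister", "scan-of"),
    "cRegister": CitationNode("cRegister", "Parish register", "cArchive", "location"),
    "cArchive": CitationNode("cArchive", "County record office"),
}
for link_type, node in citation_layers("cImage", chain):
    print(link_type, node.title)
```

Because each layer is a separate Citation, the "Parish register" and "County record office" layers could be shared by every image taken from that register.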



Display Format


The display format is partially provided as a convenience for preserving hand-crafted citations. Citation entities will require formatting to a given style and locale before they can be displayed. A later version may allow styles to be selected automatically from Citation Style Language (CSL) templates. CSL is an open XML-based language for defining the parameters and formatting for different citation types. Such styles can be browsed and searched via the Zotero Style Repository, although it currently has no concept of a URI string, which is unfortunate because a URI would be a convenient handle for distinguishing the templates and their applicable source-types in the repository. A problem with such citation-template schemes is that they try to format plain textual elements into a simple template, whereas STEMMA assumes that objects representing, say, a Person, Place, or Contact can be provided. The advantage of the latter is that the template system can call back on well-defined methods to obtain a particular style of name, or specific contact details; with plain textual elements, the genealogical software product must instead have intimate knowledge of the specific template.


In the absence of any external formatting support for citations, the <DisplayFormat> element can also be used as a simple STEMMA-defined citation-template. It allows a number of language-specific text strings to be defined for different formatting modes (e.g. the full reference note, which is the default), and these can use mark-up and parameterisation so that they can be employed in multiple scenarios. Although some brief examples are presented below, a fuller example may be found at: Citation Template. NB: this template feature is purely declarative and currently contains no decisional control over the generation of the citation text.
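As a rough model of this declarative template feature, the Python sketch below stores one format-string per mode and substitutes Parameter values into it, falling back to the default full-reference-note mode when a requested mode is not defined. The mode names and the $Name placeholder syntax (borrowed from Python's string.Template) are assumptions standing in for STEMMA's own mark-up. Like the <DisplayFormat> element, the sketch makes no decisions about what to include; it only fills in values.

```python
from string import Template

DEFAULT_MODE = "full"  # hypothetical name for the full reference note mode

def render(formats: dict[str, str], params: dict[str, str],
           mode: str = DEFAULT_MODE) -> str:
    """Fill the format-string for a mode with Parameter values."""
    template = formats.get(mode)
    if template is None:
        template = formats[DEFAULT_MODE]  # undefined mode: use the default
    # safe_substitute leaves any unknown $Name placeholder untouched.
    return Template(template).safe_substitute(params)

formats = {
    "full": "$Author, $Title ($Publisher, $Date), $Pages.",
    "short": "$Author, $Pages.",
}
params = {
    "Author": "James Granger",
    "Title": "OLD NOTTINGHAM : Its Streets, People, etc",
    "Publisher": "Nottingham Daily Express Office",
    "Date": "1904",
    "Pages": "46-48",
}
print(render(formats, params, "short"))  # James Granger, 46-48.
```

Language-specific strings could be handled by keying the format-strings on a (mode, language) pair instead of the mode alone.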




Here’s a simple example of a traditional book citation:


<Citation Key='cOldNottm'>
    <Title>Old Nottingham Notes</Title>
    <URI> http://stemma </URI>
    <Params>
        <Param Name='Author'>James Granger</Param>
        <Param Name='Title'>OLD NOTTINGHAM : Its Streets, People, etc</Param>
        <Param Name='Publisher'>Nottingham Daily Express Office</Param>
        <Param Name='Date'>1904</Param>
        <Param Name='Pages'/>
    </Params>

    Reprinted from the Nottingham Daily Express, October 3rd, 1903 – July 9th, 1904.
</Citation>

A corresponding citation invocation, for a specific page, might appear as:


<CitationRef Key='cOldNottm'>
    <Param Name='Pages'>46-48</Param>
</CitationRef>



Whether this generates a source-list reference or a short/long reference note depends on the selected citation mode.
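One plausible reading of this invocation is that the Parameter values carried by the <CitationRef> are combined with those of the referenced Citation before formatting, so that the empty 'Pages' Parameter receives '46-48'. The Python sketch below assumes exactly that; the rules that a reference may only supply Parameters the Citation declares, and that its values take precedence, are assumptions made for illustration.

```python
def apply_reference(citation_values: dict[str, str],
                    ref_values: dict[str, str]) -> dict[str, str]:
    """Combine a Citation's Param values with those supplied by a <CitationRef>."""
    merged = dict(citation_values)  # never modify the stored Citation
    for name, value in ref_values.items():
        if name not in merged:
            raise KeyError(f"{name!r} is not a Parameter of the cited Citation")
        merged[name] = value
    return merged

stored = {"Author": "James Granger", "Date": "1904", "Pages": ""}
print(apply_reference(stored, {"Pages": "46-48"}))
# {'Author': 'James Granger', 'Date': '1904', 'Pages': '46-48'}
```

Keeping the page number on the reference rather than the Citation is what allows one Citation to serve every page cited from the same book.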


Citations can become very complex, since the author will want to cite not only the source and the information obtained from it, but also the context of how it substantiates or contradicts their assertions and conclusions. This often involves some type of analytical commentary in the citation. For instance:


Death notices, Ulster Gazette and Daily National Intelligencer, both dated 24 January 1815. Corra Bacon-Foster, "The Story of Kalorama," Records of the Columbia Historical Society (1910), 108, states Louisa left four children; three have been identified. In 1810, Charles "Cating" and a female, both over 44, were enumerated with one male and female aged 26-44; one male and female aged 16-25; and one male under 10 - suggesting that George, Louisa, and their first son may have been living in the Catton household. See 1810 U.S. census, Ulster County, New York, New Paltz, p. 116, line 6; NA micropublication M252, roll 37.


This type of text could make extensive use of the STEMMA narrative support, and could be constructed as a single footnote with embedded, inline citations. See Cite Seeing for a deeper discussion.


[1] Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore: Genealogical Publishing Co., 2009).

[2] Note that academic citations, such as those in journals, often refer to an indefinite source. This allows them to be much briefer, but it only works because such sources are published and easily accessible; it makes no difference where the article or paper was obtained.