

A Citation is the identification of the source of some information recorded in the Dataset. Examples might include books, newspaper articles, BMD certificates, census data, tax records, court records, film, tombstones, military service records, journey manifests, cemetery records, oral history, church records, pension records, land or property transfers, etc.


These are common historical sources, and specific printed citation formats are applicable to each of them, but this Citation entity goes further. It can also identify a collection of works, a repository or institution, or even represent attribution to an individual.


In the citations of normal written or printed works, there are two main citation modes that may be employed. Citations may involve footnotes/endnotes referenced by inline superscripted indicators which are usually numeric (possibly in brackets) but sometimes symbols. Alternatively, they may involve parenthetical in-text citation such as “Smith (2004, p. 39) claims that...”, or “…(Smith 2004, p.39)…” if all details are parenthesised, in conjunction with a full reference list or bibliography at the end of the work.


In these printed citations, conventions apply to the different source types to ensure consistency, and they specify precise rules for punctuation, the use of italics, etc. Several citation styles are in common use. For instance, in the humanities there are: Modern Language Association (MLA), Harvard referencing, Modern Humanities Research Association (MHRA), and the Chicago Manual of Style (CMOS). There are other styles commonly used in law or the sciences too.


The Board for Certification of Genealogists (BCG) recommends CMOS, which uses footnotes, endnotes, and bibliographies. Genealogy is very demanding in the variety of sources that need to be cited, and Elizabeth Shown Mills[1] has extended the conventional CMOS guidelines to include many of those additional source types. It should be understood, though, that these citation styles and modes relate to written or printed citations, so they apply to the presentation of genealogical reports, including on-screen computer displays and charts.


Those citations, though, are designed to be human-readable, and so embody elements of a specific locale, culture, and preferred style. This is a problem for electronic data because such citations are not computer-readable, and so cannot be adjusted to suit the locale or preferences of an arbitrary end-user. It is therefore necessary to go back to the essence of a citation rather than consider specific physical implementations, i.e. to record enough information in a digested citation to uniquely identify a source, to enable it to be re-examined if necessary, and to support formatting appropriate for the current end-user. The scheme presented here is a generalised computer-readable one intended to cope with all possible source types. It does not attempt to enumerate those source types, to specify what parameters they require, or to mandate a particular presentation style. Its main goals are to be open-ended, so that citation types can be defined freely; to be parameterised; and to have a hierarchical structure with parameter inheritance.




<Citation Key='key'>
    [ <Title> citation-title </Title> ]
    [ <DisplayFormat> format-string </DisplayFormat> ]
    [ <URI> base-uri </URI> ]
    [ <Params>
        { <Param Name='name' [Type='type'] [Key='key'] [DCType='dc-type'] [ItemList='boolean'] [Optional='boolean']> default-value </Param> } ...
    </Params> ]
    [ <Quality> source-quality </Quality> ]
    [ <Credibility> information-credibility </Credibility> ]
    [ <Reliability> information-reliability </Reliability> ]
    [ <ParentCitationLnk Key='key'>
    </ParentCitationLnk> ]
    [ <BaseCitationLnk Key='key'>
    </BaseCitationLnk> ]
</Citation>

{ <Param Name='name' [Key='key']> value </Param> }

{ <Param Name='name'>
    { <Item [Key='key']> value </Item> } ...
</Param> }
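The syntax above maps naturally onto a small in-memory model. The following is a minimal sketch in Python using the standard-library xml.etree.ElementTree parser, assuming well-formed XML with straight quotes. The names ParamDef, CitationDef, and parse_citation, and the fallback type name 'Text', are hypothetical; the Quality, Credibility, and Reliability elements and the bodies of the link elements are not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParamDef:
    """One <Param> from a <Params> block (hypothetical in-memory model)."""
    name: str
    type: str = "Text"            # placeholder type name when Type is absent
    dc_type: Optional[str] = None
    optional: bool = False        # Optional defaults to false
    default: Optional[str] = None


@dataclass
class CitationDef:
    key: Optional[str]
    title: Optional[str] = None
    display_format: Optional[str] = None
    uri: Optional[str] = None
    params: dict = field(default_factory=dict)
    parent_key: Optional[str] = None
    base_key: Optional[str] = None


def _truthy(value: Optional[str]) -> bool:
    # Treat '1' and 'true' as true; anything else (including absent) as false.
    return (value or "").strip().lower() in ("1", "true")


def parse_citation(xml_text: str) -> CitationDef:
    """Read one <Citation> element into a CitationDef."""
    root = ET.fromstring(xml_text)
    cit = CitationDef(key=root.get("Key"))
    cit.title = root.findtext("Title")
    cit.display_format = root.findtext("DisplayFormat")
    uri = root.findtext("URI")
    cit.uri = uri.strip() if uri else None
    for p in root.findall("Params/Param"):
        text = (p.text or "").strip()
        cit.params[p.get("Name")] = ParamDef(
            name=p.get("Name"),
            type=p.get("Type", "Text"),
            dc_type=p.get("DCType"),
            optional=_truthy(p.get("Optional")),
            default=text or None,   # an empty <Param/> has no default
        )
    parent = root.find("ParentCitationLnk")
    if parent is not None:
        cit.parent_key = parent.get("Key")
    base = root.find("BaseCitationLnk")
    if base is not None:
        cit.base_key = base.get("Key")
    return cit
```

An empty <Param/> is kept with no default, so a later check can insist that a value be supplied wherever the citation is used.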



Note that STEMMA syntax does not differentiate between citing a specific source of information, citing a collection or work that contained the information, or citing a repository or institution hosting that work or collection; they are all citing something. The <Citation> entity is therefore hierarchical, so that these related data can be arranged in a chain, using <ParentCitationLnk> to indicate each parent. This avoids duplication and provides a stronger representation overall. It may be controversial, but STEMMA considers all aspects of a source to be citations of the true source, since the true source itself is not present in the data.
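Following <ParentCitationLnk> from a specific citation up to its repository is a simple walk over keys. This sketch assumes the parent links have already been read into a dictionary mapping each Citation key to its parent's key; the keys in the usage line (cEntry, cRegister, cArchive) are hypothetical.

```python
from typing import Dict, List, Optional


def citation_chain(key: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return [key, parent, grandparent, ...] by following parent links.

    `parents` maps each Citation key to the key named by its
    <ParentCitationLnk>; a missing entry or None ends the chain.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = key
    while current is not None:
        if current in seen:   # malformed data: a citation is its own ancestor
            raise ValueError(f"cycle in citation chain at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return chain
```

For example, citation_chain("cEntry", {"cEntry": "cRegister", "cRegister": "cArchive"}) returns ["cEntry", "cRegister", "cArchive"]. The cycle check matters because each level is stored only once and shared, so a bad link could otherwise loop forever.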


The Dublin Core Metadata Initiative has encountered the issue of a chain but has tried to solve it by adding further terms and namespaces (see dc-citation-guidelines/). Basically, the simple Dublin Core terms cannot clearly distinguish, for instance, the title of an article from the title of the journal containing that article, or give a clear indication of other data related to the containing journal, such as its publication date (as distinct from the article submission date) or its volume and issue numbers. The same page recommends using the OpenURL (ANSI/NISO standard Z39.88-2004) ContextObject to represent the context of a bibliographic citation, although it does not take this to the level of a hierarchical chain. OpenURL is designed to provide the context of a citation in a machine-readable form that can be resolved by an unspecified library or archive. In other words, the Dublin Core recommendation does not cite a source directly, but rather as a library-independent hyperlink to its content.


The STEMMA scheme described here is fully in keeping with those Dublin Core recommendations but is not specifically tied to them. It allows each source-type to provide a parameterised base URI, and parameters can be applied to build up an expanded URI representing a specific citation. Those expanded URIs provide a unique handle for spotting repeated citations locally, but may also constitute a global handle for the unambiguous exchange of machine-readable citations. Irrespective of whether a given base URI can represent a citation chain, the fact that STEMMA represents chains internally, and that parameters may be propagated along a chain, means that the requirements of any such URI standard can be supported.
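One way to turn a parameterised base URI into an expanded URI is to append the non-blank parameters as a query string. This is an assumption made for illustration, not STEMMA's own URI mechanism. The sketch sorts the parameter names so that equal citations always produce identical strings, which lets the expanded URI serve as a dictionary key for spotting repeats. The base URI http://example.org/book is a placeholder.

```python
from urllib.parse import urlencode


def expanded_uri(base_uri: str, params: dict) -> str:
    """Append non-blank parameters to a base URI as a sorted query string.

    Sorting the parameter names gives a canonical form, so the same
    source-type plus the same parameter values always yields the same URI.
    """
    pairs = sorted((k, v) for k, v in params.items() if v not in (None, ""))
    if not pairs:
        return base_uri
    sep = "&" if "?" in base_uri else "?"
    return base_uri + sep + urlencode(pairs)


def find_repeats(citations):
    """Group (label, base_uri, params) tuples by expanded URI; keep repeats."""
    groups = {}
    for label, base, params in citations:
        groups.setdefault(expanded_uri(base, params), []).append(label)
    return {uri: labels for uri, labels in groups.items() if len(labels) > 1}
```

With the placeholder base, expanded_uri("http://example.org/book", {"Pages": "46-48", "Author": "James Granger"}) gives "http://example.org/book?Author=James+Granger&Pages=46-48", regardless of the order in which the parameters were supplied.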


The STEMMA Citation hierarchy generally places a reference to the indefinite source at the lowest level, and then links that to an actual instance (the definite source), and then to its location, etc. With a book, for example, the indefinite source could be identified by the title, author, publisher, and edition. The definite source could be a book from a library, my own copy of the book, or an online copy, but that would be further up the hierarchy and not normally cited in the bibliography. For a digital image of a church record, the indefinite source would be the entry in the parish register; a higher level might be the scanned copy at, say, findmypast or FamilySearch. The same applies to a census. If I only want to see the unique census-page reference in a reference note, then that is my indefinite source. Whether I saw the image at the National Archives, online at findmypast or Ancestry, or on some published CDs would be recorded at a higher level.


The display format is part of the Citation entity for convenience. However, many citation types will require formatting to a given style and locale. A later version may allow styles to be selected automatically from Citation Style Language (CSL) templates. CSL is an open XML-based language for defining the parameters and formatting of different citation types, and its styles can be browsed and searched via the Zotero Style Repository. CSL currently has no concept of a base URI, which is unfortunate because one would provide a convenient handle for distinguishing the templates and their applicable source-types in the repository. In the absence of any external formatting support for citations, the <DisplayFormat> element is used as a default.


The named substitution style of parameterisation is available in the citation-title, the format-string, the base-uri, and the values of parameters themselves (e.g. within a Params or ResourceLnk element). The base-uri can also be parameterised using the URI mechanism, similar to that in the Resource URL, but it is more useful here since it enables the base-uri to retain its traditional form for use as a source-type handle.
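Named substitution can be sketched with a regular expression. The {Name} placeholder syntax below is an assumption for this example only; STEMMA defines its own substitution syntax elsewhere, and this sketch does not handle escaping. The same function could be applied to a citation-title, a format-string, a base-uri, or a parameter value.

```python
import re

# Assumed placeholder syntax for this sketch: {Name}.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(template: str, params: dict) -> str:
    """Replace each {Name} in template with str(params[Name])."""
    def repl(match):
        name = match.group(1)
        if name not in params:
            # Fail loudly rather than leave an unresolved placeholder.
            raise KeyError(f"no value for parameter {name!r}")
        return str(params[name])
    return _PLACEHOLDER.sub(repl, template)
```

For example, substitute("{Author}, p. {Pages}", {"Author": "James Granger", "Pages": "46-48"}) returns "James Granger, p. 46-48".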


Note that parameter names are local to the corresponding source-type. There is no sharing of parameter names between different source-types, and no implied semantics in any of their names. If two source-types each have a parameter called ‘Publisher’ then they are each interpreted in the context of their respective source-types. In effect, no semantics are conveyed directly by the parameter name – that is the purpose of the DCType attribute.
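Because a parameter name has no meaning outside its source-type, software that needs, say, the title of any cited work should look it up by DCType rather than by name. A minimal sketch, assuming each source-type's parameter definitions have been reduced to a name-to-DCType mapping; the function name and the 'Masthead' parameter in the test data are hypothetical.

```python
from typing import Dict, Optional


def value_by_dc_type(param_defs: Dict[str, Optional[str]],
                     values: Dict[str, str],
                     dc_type: str) -> Optional[str]:
    """Return the value of the first parameter whose DCType matches dc_type.

    param_defs maps each local parameter name to its DCType (or None);
    values maps local parameter names to their values. Local names carry
    no meaning of their own, so only DCType is consulted.
    """
    for name, declared in param_defs.items():
        if declared == dc_type and name in values:
            return values[name]
    return None
```

Two source-types can therefore both have a parameter called 'Publisher' without either being treated as DC.Publisher unless its definition says so.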


The parameter data-type values expressed by the Type attribute are currently similar to those allowed in Extended Properties, except that Measure, Enum, and EnumList are not supported, and Date only accepts ISO dates (no non-Gregorian calendars). Lists are handled with the same ItemList approach as Property values. The semantic type is indicated by the DCType attribute, which uses the Dublin Core vocabulary, e.g. DCType='DC.Title' or DCType='DC.Publisher.CorporateName.Address'. The default value for the Optional attribute is 0 (i.e. false), which means that a non-blank value must be provided.
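The Optional and Type rules can be checked mechanically before a citation is formatted. The sketch below is a hypothetical validator covering only three rules from the paragraph above: a non-optional parameter must be non-blank, a Date must be an ISO 8601 calendar date, and an ItemList parameter carries a list of items. The 'Text' default type name is a placeholder, and the date pattern does not check day-of-month against month length.

```python
import re
from dataclasses import dataclass
from typing import List

# ISO 8601 calendar dates at year, year-month, or full-date precision
# (Gregorian only). Week dates, ordinal dates, and times are not accepted.
_ISO_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


@dataclass
class ParamSpec:
    name: str
    type: str = "Text"        # placeholder type name
    optional: bool = False    # Optional defaults to 0 (false)
    item_list: bool = False


def check_param(spec: ParamSpec, value) -> List[str]:
    """Return a list of problems with value; an empty list means it passes."""
    items = value if spec.item_list else [value]
    if not isinstance(items, list):
        return [f"{spec.name}: ItemList parameter needs a list of items"]
    problems = []
    if not spec.optional and all((v or "").strip() == "" for v in items):
        problems.append(f"{spec.name}: a non-blank value is required")
    if spec.type == "Date":
        for v in items:
            if v and not _ISO_DATE.match(v.strip()):
                problems.append(f"{spec.name}: {v!r} is not an ISO date")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets an editor report every faulty parameter of a citation at once.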


The Quality, Credibility, and Reliability elements characterise the confidence in a source and in information derived from it. Note that these do not relate to a specific datum from the source; the Surety data-attribute is provided for that case. See Extended Vocabularies for defining custom values.


  • Unknown – Unknown or unspecified assessment.
  • Credibility – Expert. Information from someone with relevant expertise.
  • Credibility – Questionable. Questionable credibility of information, as in interviews and oral genealogies, or with potential for bias as in an autobiography.
  • Credibility – Trusted. Information from a trusted source.
  • Credibility – Unsubstantiated. Unsubstantiated claims or opinions.
  • Quality – Original. Material in its original recorded form.
  • Quality – Copy. Facsimile of original, e.g. image copy, certified copy.
  • Quality – Derivative. Manipulated version of original, e.g. translation, abstract, extract.
  • Quality – Authored. Narrative work using other sources but providing independent conclusions.
  • Reliability – Primary. Details provided by someone with first-hand knowledge.
  • Reliability – Secondary. Details provided by someone with second-hand or more-distant knowledge.
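For validation, the built-in values listed above can be held in a lookup table that an application extends with custom values. The value tokens below are assumptions based on the labels in the list (the exact spellings used in STEMMA files are not given here), and the extensions argument is a hypothetical stand-in for whatever Extended Vocabularies defines.

```python
from typing import Dict, Iterable, Optional

# Hypothetical tokens taken from the labels in the list above;
# "Unknown" is listed without a category, so it is allowed for all three.
BUILT_IN: Dict[str, frozenset] = {
    "Quality": frozenset({"Unknown", "Original", "Copy", "Derivative", "Authored"}),
    "Credibility": frozenset({"Unknown", "Expert", "Questionable", "Trusted",
                              "Unsubstantiated"}),
    "Reliability": frozenset({"Unknown", "Primary", "Secondary"}),
}


def is_known_value(element: str, value: str,
                   extensions: Optional[Dict[str, Iterable[str]]] = None) -> bool:
    """True if value is built in for element, or added by an extension."""
    allowed = set(BUILT_IN.get(element, frozenset()))
    if extensions:
        allowed.update(extensions.get(element, ()))
    return value in allowed
```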


The BaseCitationLnk element may nominate a generic base Citation from which data may be inherited by the current Citation, in much the same way as derived classes inherit from base classes in software programming. Any parameter substitution must therefore occur after the inheritance process has completed. If an implementation creates a temporary merged entity in memory, that entity must not be written back to the data file, as doing so would corrupt the data.
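Inheritance through BaseCitationLnk can be sketched as a merge along the base chain into a fresh dictionary, followed by substitution. This is a minimal illustration assuming each Citation's parameters are held as a name-to-value dictionary and that substitution uses {Name} placeholders (an assumption; STEMMA's own syntax is defined elsewhere). The generic base key cBook is hypothetical.

```python
from typing import Dict, Optional


def resolve_params(key: str,
                   defs: Dict[str, dict],
                   bases: Dict[str, Optional[str]]) -> dict:
    """Merge parameters along the BaseCitationLnk chain, base first.

    Values in a derived Citation override inherited ones. A new dict is
    built, so the stored definitions in `defs` are never modified.
    """
    lineage = []
    seen = set()
    current: Optional[str] = key
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in base chain at {current!r}")
        seen.add(current)
        lineage.append(current)
        current = bases.get(current)
    merged = {}
    for k in reversed(lineage):     # most generic base first
        merged.update(defs[k])
    return merged
```

Substitution is then applied to the merged result, e.g. "{Author} ({Edition} edn)".format_map(resolve_params("cOldNottm", defs, bases)). Building a new dictionary, rather than updating the stored definitions, keeps the merged entity out of anything that would be saved.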


Here’s a simple example of a traditional book citation:


<Citation Key='cOldNottm'>
    <Title>Old Nottingham Notes</Title>
    <URI> http://stemma</URI>
    <Params>
        <Param Name='Author'>James Granger</Param>
        <Param Name='Title'>OLD NOTTINGHAM : Its Streets, People, etc</Param>
        <Param Name='Publisher'>Nottingham Daily Express Office</Param>
        <Param Name='Date' Type='Date'>1904</Param>
        <Param Name='Pages'/>
    </Params>




Reprinted from the Nottingham Daily Express, October 3rd, 1903 – July 9th, 1904





A corresponding citation might appear as:


<CitationLnk Key='cOldNottm'>
    <Param Name='Pages'>46-48</Param>
</CitationLnk>



Whether this generates an in-text reference or a short/long reference note depends on the selected citation mode.
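Applying the CitationLnk above to its Citation amounts to overlaying the link's parameter values on the Citation's defaults, checking that no required parameter is still blank, and formatting the result. The sketch below uses Python's xml.etree.ElementTree. Since the example Citation has no DisplayFormat, the format-string and its {Name} placeholders are hypothetical, and the snippets are reproduced with straight quotes (the URI is omitted) so that they parse.

```python
import xml.etree.ElementTree as ET

CITATION = """<Citation Key='cOldNottm'>
  <Title>Old Nottingham Notes</Title>
  <Params>
    <Param Name='Author'>James Granger</Param>
    <Param Name='Title'>OLD NOTTINGHAM : Its Streets, People, etc</Param>
    <Param Name='Publisher'>Nottingham Daily Express Office</Param>
    <Param Name='Date' Type='Date'>1904</Param>
    <Param Name='Pages'/>
  </Params>
</Citation>"""

LINK = """<CitationLnk Key='cOldNottm'>
  <Param Name='Pages'>46-48</Param>
</CitationLnk>"""

# Hypothetical format-string; the example Citation has no DisplayFormat.
FORMAT = "{Author}, {Title} ({Publisher}, {Date}), {Pages}."


def param_values(elements):
    """Map Param Name -> stripped text ('' when the element is empty)."""
    return {p.get("Name"): (p.text or "").strip() for p in elements}


def render(citation_xml: str, link_xml: str, fmt: str) -> str:
    citation = ET.fromstring(citation_xml)
    link = ET.fromstring(link_xml)
    if link.get("Key") != citation.get("Key"):
        raise ValueError("CitationLnk does not refer to this Citation")
    defs = citation.findall("Params/Param")
    values = param_values(defs)
    values.update(param_values(link.findall("Param")))   # link values win
    optional = {p.get("Name") for p in defs
                if p.get("Optional", "0").lower() in ("1", "true")}
    missing = [n for n, v in values.items() if v == "" and n not in optional]
    if missing:
        raise ValueError(f"required parameters left blank: {missing}")
    return fmt.format_map(values)
```

Here render(CITATION, LINK, FORMAT) yields "James Granger, OLD NOTTINGHAM : Its Streets, People, etc (Nottingham Daily Express Office, 1904), 46-48.", whereas a CitationLnk without a Pages value is rejected because Pages has no default and is not optional.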


The Board for Certification of Genealogists (BCG) has an interesting ‘work sample’ on its Web site that presents a multi-part citation:


Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore: Genealogical Publishing Co., 2007)—or the earlier abridged edition, Evidence! Citation & Analysis for the Family Historian (1997) together with its companion QuickSheet: Citing Online Historical Resources Evidence! Style (rev. 2007).


This might be represented as a compound Citation that brings together the referenced simple Citations through its Narrative support.


Citations can become even more complex than this, since the author will want to cite not only the source and the information obtained from that source, but also the context of how it substantiates or contradicts their assertions and conclusions. This often involves some type of analytical commentary in the citation. For instance:


Death notices, Ulster Gazette and Daily National Intelligencer, both dated 24 January 1815. Corra Bacon-Foster, "The Story of Kalorama," Records of the Columbia Historical Society (1910), 108, states Louisa left four children; three have been identified. In 1810, Charles "Cating" and a female, both over 44, were enumerated with one male and female aged 26-44; one male and female aged 16-25; and one male under 10 - suggesting that George, Louisa, and their first son may have been living in the Catton household. See 1810 U.S. census, Ulster County, New York, New Paltz, p. 116, line 6; NA micropublication M252, roll 37.


This type of text could make extensive use of the STEMMA Narrative support but should it appear in the Citation definition, the relevant CitationLnk, or elsewhere? The Narrative Structure section illustrates how Evidence and Conclusion (E&C) trees can be built up from named Narrative elements.


It is important to retain a clear view of the distinction between a Citation and a Resource. As another example, consider UK BMD references. These might be linked to the defining body in order to create a unique abstract source citation. However, if you wanted to be able to pull up the appropriate index page on some Web site, then that would be done via a Resource entity.


[1] Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore: Genealogical Publishing Co., 2009)