EAC-CPF revision half-way into stage 2

After successfully releasing a minor revision of the Encoded Archival Context – Corporate Bodies, Persons, and Families (EAC-CPF) in December 2018, the EAC-CPF team of the Technical Subcommittee on Encoded Archival Standards (TS-EAS) is now half-way into the second stage of the standard’s major revision. While all current issues are discussed on GitHub (https://github.com/SAA-SDT/eac-cpf-schema/issues) and during the monthly team meetings, a few general topics have emerged from these conversations that call for more in-depth analysis.

One day dedicated to EAC-CPF

TS-EAS therefore decided to devote one day of its meeting at the Annual Meeting of the Society of American Archivists (SAA) in Austin in August 2019 to selected topics from the EAC-CPF revision. The topics on the table were "Dates", "Names", "Identifiers" and the new topic "Assertion Description". Some of the updates detailed below are still in flux and will require further discussion over the next few months.

Dates

EAC-CPF uses a variety of elements to encode date information, but it offers only limited means to express uncertainty about dates, let alone to mark part of a date range as unknown. The EAC-CPF team has looked at how other standards handle such uncertainty, including Encoded Archival Description (EAD 2002 and EAD3), the Extended Date/Time Format (EDTF) Specification, the Text Encoding Initiative (TEI) and the Metadata Object Description Schema (MODS). Based on this comparison, the suggested changes include:

  • Adding the attribute @certainty (from EAD3) to the elements <date>, <fromDate> and <toDate>;
  • Recommending EDTF-inspired values such as "uncertain", "approximate" and "uncertain and approximate" for the newly added attribute @certainty;
  • Introducing a new attribute @status for the elements <date>, <fromDate> and <toDate> to mark a date as "unknown" or "open" (e.g. for persons who are still alive).
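To make the proposed date attributes more concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It builds a date range whose <fromDate> carries the proposed @certainty and whose <toDate> carries the proposed @status, then checks those attribute values. The value lists are assumptions based on the examples above (the proposal says "such as"), the helper names are hypothetical, the EAC-CPF namespace is omitted, and the final schema may differ.

```python
import xml.etree.ElementTree as ET

# Assumed value lists, taken from the examples in the proposal; not final schema values.
CERTAINTY_VALUES = {"uncertain", "approximate", "uncertain and approximate"}
STATUS_VALUES = {"unknown", "open"}
DATE_ELEMENTS = {"date", "fromDate", "toDate"}


def make_date_range(from_text, from_certainty=None, to_status=None):
    """Build a hypothetical <existDates><dateRange> fragment (namespace omitted)."""
    exist_dates = ET.Element("existDates")
    date_range = ET.SubElement(exist_dates, "dateRange")
    from_date = ET.SubElement(date_range, "fromDate")
    from_date.text = from_text
    if from_certainty is not None:
        from_date.set("certainty", from_certainty)  # proposed @certainty
    to_date = ET.SubElement(date_range, "toDate")
    if to_status is not None:
        to_date.set("status", to_status)  # proposed @status, e.g. "open"
    return exist_dates


def check_dates(root):
    """Return a list of unexpected @certainty/@status values on date elements."""
    problems = []
    for elem in root.iter():
        if elem.tag not in DATE_ELEMENTS:
            continue
        certainty = elem.get("certainty")
        status = elem.get("status")
        if certainty is not None and certainty not in CERTAINTY_VALUES:
            problems.append(f"<{elem.tag}>: unexpected @certainty {certainty!r}")
        if status is not None and status not in STATUS_VALUES:
            problems.append(f"<{elem.tag}>: unexpected @status {status!r}")
    return problems


fragment = make_date_range("1950", from_certainty="approximate", to_status="open")
print(ET.tostring(fragment, encoding="unicode"))
print(check_dates(fragment))  # []
```

For a living person born around 1950, the approximate birth year and the open end date then sit side by side in one range, and a value outside the assumed lists would be reported by check_dates.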

Names

While EAC-CPF makes it relatively straightforward to encode names and their constituent parts with the element <nameEntry> and its sub-element <part>, questions remain about the appropriate use of its other sub-elements: <authorizedForm>, <alternativeForm> and <preferredForm>. These elements are meant to indicate the rule or convention under which a specific form of name counts as "authorized", "alternative" or "preferred", but indirectly they also convey the status of the name given in their parent <nameEntry>. The EAC-CPF team has discussed options for disentangling these two functions, e.g. by:

  • Recommending more strongly that rules and conventions are encoded via the element <conventionDeclaration> in the <control> section of an EAC-CPF instance;
  • Adding the attribute @rules (from EAD3) to <nameEntry> to briefly note the applied rule, along with an IDREF-type attribute on <nameEntry> that points to the corresponding <conventionDeclaration> for further details;
  • Introducing a new attribute @status for the element <nameEntry> to indicate the status of the name as being "authorized" or "alternative";
  • Investigating the possibility of turning <preferredForm> into an attribute as well.

Alongside the expected changes to <nameEntry>, the EAC-CPF team is also considering renaming <nameEntryParallel> to the more general <nameEntrySet>. The attribute @localType would then be recommended to indicate whether the names grouped within <nameEntrySet> are "parallel" forms (the specific use case in the US American context), "former" forms or "translation" forms of the name.
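The following Python sketch illustrates how the proposed @status, @rules and IDREF link between <nameEntry> and <conventionDeclaration>, as well as @localType on <nameEntrySet>, might be checked in an instance. The IDREF attribute name conventionDeclarationReference is hypothetical (the proposal does not name it), as are the sample names and values; the namespace is omitted and the fragment is simplified.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: conventionDeclarationReference is a placeholder name.
SAMPLE = """<eac-cpf>
  <control>
    <conventionDeclaration xml:id="rda">
      <citation>RDA</citation>
    </conventionDeclaration>
  </control>
  <cpfDescription>
    <identity>
      <nameEntry status="authorized" rules="RDA" conventionDeclarationReference="rda">
        <part>Example, Jane</part>
      </nameEntry>
      <nameEntrySet localType="former">
        <nameEntry status="alternative"><part>Sample, Jane</part></nameEntry>
      </nameEntrySet>
    </identity>
  </cpfDescription>
</eac-cpf>"""

# Assumed value lists, taken from the examples in the proposal.
NAME_STATUS_VALUES = {"authorized", "alternative"}
SET_TYPES = {"parallel", "former", "translation"}


def check_names(root):
    """Report unexpected name statuses, unresolved IDREFs and untyped name sets."""
    problems = []
    declared = {el.get(XML_ID) for el in root.iter("conventionDeclaration")}
    for entry in root.iter("nameEntry"):
        status = entry.get("status")
        if status is not None and status not in NAME_STATUS_VALUES:
            problems.append(f"unexpected @status {status!r}")
        ref = entry.get("conventionDeclarationReference")
        if ref is not None and ref not in declared:
            problems.append(f"IDREF {ref!r} matches no <conventionDeclaration>")
    for name_set in root.iter("nameEntrySet"):
        kind = name_set.get("localType")
        if kind not in SET_TYPES:
            problems.append(f"<nameEntrySet> with @localType {kind!r}")
    return problems


print(check_names(ET.fromstring(SAMPLE)))  # []
```

Keeping the rule itself in <conventionDeclaration> and only a short @rules note plus a pointer on <nameEntry> means the details of a convention are recorded once, however many names apply it.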

Identifiers

In discussing the various ways to identify an EAC-CPF instance, its versions and parts, the entity (or identity) it describes, and related resources and entities, the EAC-CPF team decided to focus on more specific descriptions and more appropriate examples that clarify which ID element or attribute to use in which case. As a starting point, three types of identifiers have been defined, one of which is further divided into two sub-groups:

  • Database primary keys, used to uniquely identify each record within a given context, e.g. the elements <recordId> and <otherRecordId>, which hold the current and possibly previously used identifiers of the EAC-CPF instance;
  • Identifiers used to distinguish and determine entities;
    • Informational identifiers, e.g. the alphanumeric string representing the name of an entity as given in <nameEntry>, which establishes a meaningful connection with the entity it represents;
    • Non-informational identifiers, e.g. the mostly, but not exclusively, numeric string of a globally unique and persistent identifier as given in <entityId>, which has no meaningful connection with the entity it represents;
  • Identifiers used to mark unique locations within an EAC-CPF instance, i.e. the attribute @xml:id, which identifies a specific element within the EAC-CPF XML.

For identifiers of EAC-CPF instances that have been merged or translated into the current one, the EAC-CPF team has decided to promote the use of <source> rather than <otherRecordId>. In addition, <entityId> will be renamed to the more appropriate <identityId>.
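As an illustration of the three identifier types, here is a minimal Python sketch that walks an instance and groups its identifiers by type. The element-to-type mapping is an assumption drawn from the examples in the list above, not an official table; <identityId> is the proposed replacement name for <entityId>, so both are accepted. The sample record and its example.org URL are placeholders, and the namespace is omitted.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed mapping from element names to the identifier types described above.
IDENTIFIER_TYPES = {
    "recordId": "record key",
    "otherRecordId": "record key",
    "nameEntry": "informational",
    "entityId": "non-informational",    # current name
    "identityId": "non-informational",  # proposed new name
}

SAMPLE = """<eac-cpf>
  <control><recordId>rec-001</recordId></control>
  <cpfDescription>
    <identity xml:id="id1">
      <identityId>https://example.org/id/123</identityId>
      <nameEntry><part>Example, Jane</part></nameEntry>
    </identity>
  </cpfDescription>
</eac-cpf>"""


def classify_identifiers(root):
    """Group the identifier values found in an instance by identifier type."""
    found = {}
    for elem in root.iter():
        kind = IDENTIFIER_TYPES.get(elem.tag)
        if kind is not None:
            # itertext() collects the text of <part> children for <nameEntry>.
            value = " ".join(t.strip() for t in elem.itertext() if t.strip())
            found.setdefault(kind, []).append(value)
        if elem.get(XML_ID) is not None:
            # @xml:id marks a location inside the instance, not the entity.
            found.setdefault("internal location", []).append(elem.get(XML_ID))
    return found


for kind, values in classify_identifiers(ET.fromstring(SAMPLE)).items():
    print(kind, values)
```

Separating the types this way makes it explicit that, for example, an @xml:id only locates an element within one XML file, whereas <identityId> is meant to identify the described identity beyond it.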

Assertion Description

Beyond the three topics concerning existing elements, the EAC-CPF team also discussed a new feature request: enabling users to encode, within an EAC-CPF instance, the source of specific pieces of information. This is particularly relevant when sources potentially contradict each other, e.g. on the name of an entity or on the date or place of birth of a person. Discussions on this topic are still ongoing, but the intent is:

  • To introduce a new element called <evidence> or similar as a sub-element of most descriptive elements within EAC-CPF;
  • To include a sub-element <foundData> within this new element to briefly describe the evidence found in the (new) source;
  • To use attributes to point to the exact element that contains the assertion and to refer to potentially contradicting assertions within the same EAC-CPF instance;
  • To enable connections between the new element <evidence> and the elements <source> and <maintenanceEvent> in the <control> section, in order to encode information about the source in general as well as about the agent making the assertion and the date of the assertion.
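Since this feature is still under discussion, the following Python sketch is only one possible reading of the intent above. The element names <evidence> and <foundData> come from the proposal, but the attribute names target, contradicts, sourceReference and maintenanceEventReference are hypothetical placeholders, as is the simplified sample record. The sketch checks that every reference resolves to an @xml:id in the instance and pairs contradicting assertions about the same element.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical attribute names linking <evidence> to other parts of the instance.
REFERENCE_ATTRIBUTES = ("target", "contradicts", "sourceReference", "maintenanceEventReference")

SAMPLE = """<eac-cpf>
  <control>
    <maintenanceHistory>
      <maintenanceEvent xml:id="me1"><eventType>updated</eventType></maintenanceEvent>
    </maintenanceHistory>
    <sources>
      <source xml:id="s1"><sourceEntry>Parish register</sourceEntry></source>
      <source xml:id="s2"><sourceEntry>Newspaper obituary</sourceEntry></source>
    </sources>
  </control>
  <cpfDescription>
    <description>
      <existDates>
        <date xml:id="d1">1901</date>
        <evidence xml:id="e1" target="d1" sourceReference="s1" maintenanceEventReference="me1">
          <foundData>Born 1901</foundData>
        </evidence>
        <evidence xml:id="e2" target="d1" sourceReference="s2" contradicts="e1">
          <foundData>Born 1902</foundData>
        </evidence>
      </existDates>
    </description>
  </cpfDescription>
</eac-cpf>"""


def index_ids(root):
    """Map every @xml:id value in the instance to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}


def dangling_references(root):
    """List (evidence id, attribute, value) for references that resolve to nothing."""
    ids = index_ids(root)
    dangling = []
    for ev in root.iter("evidence"):
        for attr in REFERENCE_ATTRIBUTES:
            value = ev.get(attr)
            if value is not None and value not in ids:
                dangling.append((ev.get(XML_ID), attr, value))
    return dangling


def conflicts(root):
    """Pair each contradicting <evidence> with the assertion it contradicts."""
    ids = index_ids(root)
    pairs = []
    for ev in root.iter("evidence"):
        other = ids.get(ev.get("contradicts"))
        if other is not None and other.tag == "evidence":
            pairs.append((ev.get("target"), ev.findtext("foundData"), other.findtext("foundData")))
    return pairs


root = ET.fromstring(SAMPLE)
print(dangling_references(root))  # []
print(conflicts(root))
```

Pointing from <evidence> to <source> and <maintenanceEvent> by reference, rather than repeating their content, keeps the general source description and the agent and date of each assertion in the <control> section, where the proposal places them.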

Next steps

In its monthly meetings between December 2019 and March 2020, the EAC-CPF team will tackle the pending questions on these and other topics that still require further consideration, culminating in a three-day meeting from 9 to 12 March 2020 in Berlin, Germany. We invite you to follow and participate in our conversations on GitHub (https://github.com/SAA-SDT/eac-cpf-schema/issues) at any time.