EAC-CPF revision half-way into stage 2

After successfully releasing a minor revision of the Encoded Archival Context – Corporate Bodies, Persons, and Families (EAC-CPF) in December 2018, the EAC-CPF team of the Technical Subcommittee on Encoded Archival Standards (TS-EAS) is now half-way into the second stage of the standard’s major revision. While all current issues are discussed on GitHub https://github.com/SAA-SDT/eac-cpf-schema/issues and during the monthly team meetings, a few general topics emerged from these conversations that seemed to qualify for more in-depth analysis.

One day dedicated to EAC-CPF

TS-EAS therefore dedicated one day to selected topics from the EAC-CPF revision when it met during the Annual Meeting of the Society of American Archivists (SAA) in Austin in August 2019. The topics on the table were "Dates", "Names", "Identifiers" and the new addition of "Assertion Description". Some of the updates detailed below are still in flux and will require further conversations during the next few months.

Dates

EAC-CPF uses a variety of elements to encode date information, but it offers only limited means to express uncertainty about dates or to classify part(s) of a given date range as unknown. The EAC-CPF team has been looking at the implementation of such uncertainty in other standards, including Encoded Archival Description (EAD 2002 and EAD3), the Extended Date/Time Format (EDTF) Specification, the Text Encoding Initiative (TEI) and the Metadata Object Description Schema (MODS). Based on this comparison, the suggested changes include:

  • Adding the attribute @certainty (from EAD3) to the elements <date>, <fromDate> and <toDate>;
  • Recommending the use of values inspired by EDTF such as "uncertain", "approximate" and "uncertain and approximate" with the newly added attribute @certainty;
  • Introducing a new attribute @status for the elements <date>, <fromDate> and <toDate> to indicate their status as being "unknown" or "open" (e.g. for persons who are still alive).
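As a rough illustration of the proposals above, the following Python sketch parses a hypothetical date range that carries the suggested attributes. The attribute names come from the list above. The sample values, the omission of the EAC-CPF namespace and the helper `describe` are assumptions made for this example, and the final syntax may differ once the revision settles.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: @certainty and @status are proposals under
# discussion, not part of a released EAC-CPF schema. The namespace is
# omitted to keep the sketch short.
SAMPLE = """
<existDates>
  <dateRange>
    <fromDate standardDate="1920" certainty="approximate">circa 1920</fromDate>
    <toDate status="open"/>
  </dateRange>
</existDates>
"""

def describe(element):
    """Return a short label for a date element (illustrative helper)."""
    status = element.get("status")
    if status is not None:
        # An "open" or "unknown" date carries no date value to report.
        return f"{element.tag}: {status}"
    label = f"{element.tag}: {element.get('standardDate')}"
    certainty = element.get("certainty")
    if certainty is not None:
        label += f" ({certainty})"
    return label

root = ET.fromstring(SAMPLE)
labels = [
    describe(el)
    for el in root.iter()
    if el.tag in ("date", "fromDate", "toDate")
]
print(labels)
```

In this sketch an "open" end date is kept apart from an uncertain start date, which is the distinction the two proposed attributes are meant to make.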

Names

While in EAC-CPF it is relatively straightforward to use the element <nameEntry> plus <part> to encode names and their constituent parts, there remain questions around the appropriate use of its other sub-elements: <authorizedForm>, <alternativeForm> and <preferredForm>. These elements are meant to name the rule or convention under which a specific form of name counts as "authorized", "alternative" or "preferred", but indirectly they also convey the status of the name given in their parent <nameEntry>. The EAC-CPF team has discussed options to disentangle these two functions, e.g. by:

  • Recommending more strongly that rules and conventions are encoded via the element <conventionDeclaration> in the <control> section of an EAC-CPF instance;
  • Adding the attribute @rules (from EAD3) to <nameEntry> to briefly note the applied rule, plus adding an IDREF type attribute to <nameEntry> to enable pointing to the corresponding <conventionDeclaration> for further details;
  • Introducing a new attribute @status for the element <nameEntry> to indicate the status of the name as being "authorized" or "alternative";
  • Investigating the possibility of turning <preferredForm> into an attribute as well.

Alongside the expected changes for <nameEntry>, the EAC-CPF team is also considering renaming <nameEntryParallel> to the more general <nameEntrySet>. The attribute @localType would then be recommended to indicate whether the names grouped within a <nameEntrySet> are all "parallel" (the specific use case in the US context), all "former" forms of the name, or each a "translation" of the name.
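To make the proposed name encoding more concrete, here is a minimal Python sketch that reads a hypothetical <identity> fragment. The element <nameEntrySet> and the attributes @status and @rules follow the proposals described above and are not part of a released schema. The names, the value "RDA" for @rules and the omission of the EAC-CPF namespace are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment based on the proposals under discussion.
SAMPLE = """
<identity>
  <nameEntrySet localType="parallel">
    <nameEntry status="authorized" rules="RDA">
      <part>National Library of Example</part>
    </nameEntry>
    <nameEntry status="authorized" rules="RDA">
      <part>Nationalbibliothek Beispiel</part>
    </nameEntry>
  </nameEntrySet>
  <nameEntry status="alternative">
    <part>NLE</part>
  </nameEntry>
</identity>
"""

root = ET.fromstring(SAMPLE)

# The status is read from the attribute instead of being inferred from
# the presence of <authorizedForm> or <alternativeForm>.
authorized = [
    entry.findtext("part")
    for entry in root.iter("nameEntry")
    if entry.get("status") == "authorized"
]

# @localType states why the names in a set belong together.
set_types = [group.get("localType") for group in root.iter("nameEntrySet")]

print(authorized, set_types)
```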

Identifiers

Talking about the various ways to identify an EAC-CPF instance, its versions, its parts, the entity – or identity – it describes, as well as related resources and related entities, the EAC-CPF team has decided to focus especially on providing more specific descriptions and more appropriate examples to clarify which ID element – or attribute – to use for which use case. As a starting point, three types of identifiers have been defined, one of which can furthermore be divided into two sub-groups:

  • Database primary keys, used to uniquely identify each record within a given context; e.g. elements <recordId> and <otherRecordId> holding current and maybe previously used identifiers of the EAC-CPF instance;
  • Identifiers used to distinguish and determine entities;
    • Informational identifiers, e.g. the alphanumeric string representing the name of an entity as given in <nameEntry>, which establishes a meaningful connection with the entity it represents;
    • Non-informational identifiers, e.g. the primarily, but not exclusively numeric string of a globally unique and persistent identifier as given in <entityId>, which does not have a meaningful connection with the entity it represents;
  • Identifiers used to create unique locations within an EAC-CPF instance; i.e. the attribute @xml:id providing identification for a specific element within the EAC-CPF XML.

With regard to identifiers of EAC-CPF instances that have been merged or translated into the current one, the EAC-CPF team has decided to promote the use of <source> rather than <otherRecordId>. Furthermore, <entityId> will be renamed more appropriately to <identityId>.
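The identifier types described above can be sketched in Python as follows. The fragment uses element names from the current schema (<entityId> rather than the planned <identityId>), but it is deliberately incomplete and not a schema-valid instance. All identifier values and the category labels in the dictionary are invented for this example, and the EAC-CPF namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative fragment with made-up identifier values.
SAMPLE = """
<eac-cpf>
  <control>
    <recordId>example-000123</recordId>
    <otherRecordId>legacy-42</otherRecordId>
  </control>
  <cpfDescription>
    <identity xml:id="identity1">
      <entityId>https://example.org/id/000123</entityId>
      <nameEntry><part>Example, Person</part></nameEntry>
    </identity>
  </cpfDescription>
</eac-cpf>
"""

root = ET.fromstring(SAMPLE)

identifiers = {
    # Keys of the record itself within a given context.
    "record keys": [
        el.text for el in root.iter() if el.tag in ("recordId", "otherRecordId")
    ],
    # Meaningful identifier of the entity: its name.
    "informational": [el.findtext("part") for el in root.iter("nameEntry")],
    # Opaque, persistent identifier of the entity.
    "non-informational": [el.text for el in root.iter("entityId")],
    # Locations inside the XML instance.
    "internal locations": [
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    ],
}

for kind, values in identifiers.items():
    print(kind, values)
```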

Assertion Description

In addition to the three topics on existing elements above, the EAC-CPF team also discussed a new feature request, which deals with enabling users to encode the source of specific information as part of an EAC-CPF instance. This becomes relevant especially when looking at potentially contradicting sources e.g. for the name of an entity or the date or place of birth of a person. Discussions are still ongoing with regard to this topic, but the intent is:

  • To introduce a new element called <evidence> or similar as sub-element to most descriptive elements within EAC-CPF;
  • To include a sub-element <foundData> with this new element to encode a brief description of the evidence data found in the (new) source;
  • To work with attributes to point to the exact element that includes the assertion and to refer to potentially contradicting assertions within the same EAC-CPF instance;
  • To enable connections between the new element <evidence> and the elements <source> and <maintenanceEvent> in the <control> section to encode information about the source in general as well as about the agent making the assertion and the date of the assertion.
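Since this feature is still under discussion, the following Python sketch is only one possible reading of the intent listed above. The element names <evidence> and <foundData> come from the text. The attribute names `target` and `sourceReference`, the placement of <evidence> inside <existDates>, the sample values and the omission of the namespace are all assumptions made for this example and are not taken from any schema draft.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical instance: attribute names on <evidence> are invented here.
SAMPLE = """
<eac-cpf>
  <control>
    <sources>
      <source xml:id="src1">
        <sourceEntry>Parish register (example)</sourceEntry>
      </source>
    </sources>
  </control>
  <cpfDescription>
    <existDates>
      <date xml:id="birth1" standardDate="1850">1850</date>
      <evidence target="birth1" sourceReference="src1">
        <foundData>Baptism entry dated 1850</foundData>
      </evidence>
    </existDates>
  </cpfDescription>
</eac-cpf>
"""

root = ET.fromstring(SAMPLE)

# Index every element that carries an xml:id so references can be resolved.
by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}

resolved = []
for evidence in root.iter("evidence"):
    assertion = by_id[evidence.get("target")]        # element making the assertion
    source = by_id[evidence.get("sourceReference")]  # <source> in <control>
    resolved.append(
        (assertion.text, source.findtext("sourceEntry"), evidence.findtext("foundData"))
    )

print(resolved)
```

A link to <maintenanceEvent>, for the agent and date of the assertion, could be resolved the same way through a further reference attribute.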

Next steps

The EAC-CPF team will tackle pending questions with regard to these topics as well as others, which still require further consideration, in the context of its monthly meetings between December 2019 and March 2020, culminating in a three-day meeting from 9 to 12 March 2020 in Berlin, Germany. We invite you to follow and participate in our conversations on GitHub https://github.com/SAA-SDT/eac-cpf-schema/issues at any time.

New EAC-CPF release – 2010 schema version revised

The TS-EAS is pleased to announce the completion of the first phase of the revision. Revised schema files and an updated tag library can be found on the EAC-CPF site at: https://eac.staatsbibliothek-berlin.de/.

In 2017 the Technical Subcommittee on Encoded Archival Standards (TS-EAS) of the Society of American Archivists agreed to undertake a revision of the standard EAC-CPF. The revision process is following a two-tiered strategy, starting with a technical update that includes minor enhancements and a general clean-up of the standard. The second phase of the revision will be a major overhaul of the standard and a reconciliation with EAD3. The updated schema will be backwards compatible as long as the attribute @accuracy isn’t used and values of @xml:id attributes are unique.

This update solves 15 issues, which can be viewed in full in GitHub. The changes include:

  • relaxed data types for the elements <preferredForm> and <otherAgencyCode>
  • made the elements <languageDeclaration>, <agencyName>, <eventDescription>, <sourceEntry>, <placeEntry> within <relations> elements repeatable
  • added the value ‘unknown’ to the attributes @eventType and @agentType
  • added the term ‘published’ to the element <publicationStatus>
  • added the term ‘deletedMerged’ to the element <maintenanceStatus>
  • added the new optional element <rightsDeclaration> with child elements to <control>, as in EAD3
  • added the optional attribute @localType to the elements <fromDate> and <toDate>
  • removed maximum year 2999 from the attributes @standardDate and @standardDateTime
  • corrected the typo in the attribute @accuracy
  • corrected data type for xml:id in eac.rng schema file
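One of the two backwards-compatibility conditions stated above, unique @xml:id values, can be checked before validating existing instances against the revised schema. The Python sketch below is one simple way to do that with the standard library. The function name `duplicate_xml_ids` and the sample fragment are assumptions for illustration, and the sketch does not cover the @accuracy condition.

```python
from collections import Counter
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def duplicate_xml_ids(xml_text):
    """Return the sorted xml:id values that occur more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# Illustrative fragment, not a complete EAC-CPF instance; the value "a1"
# is repeated on purpose.
SAMPLE = """
<eac-cpf>
  <control xml:id="a1"/>
  <cpfDescription xml:id="a1"/>
  <relations xml:id="b2"/>
</eac-cpf>
"""

duplicates = duplicate_xml_ids(SAMPLE)
print(duplicates)
```

An empty result would suggest that the instance meets the uniqueness condition. A non-validating parser such as the one used here accepts duplicate values, so the check has to be made explicitly.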

Please send questions and comments by e-mail to Silke Jagodzinski (s.jagodzinski@bundesarchiv.de).
Silke Jagodzinski (Bundesarchiv), EAC-CPF Team Lead
Kathy Wisser (Simmons College), TS-EAS Co-chair

Revision of Encoded Archival Context – Corporate Bodies, Persons, Families – Call for Comments

In 2017 the Technical Subcommittee on Encoded Archival Standards (TS-EAS) of the Society of American Archivists agreed to undertake a revision of the standard EAC-CPF.

The revision follows a two-tier strategy, starting with a technical update that includes minor enhancements and a general clean-up. This will be followed by a major overhaul of the standard and a reconciliation with EAD3. The updated schema will be backwards compatible as long as the attribute @accuracy isn’t used and values of @xml:id attributes are unique.

The TS-EAS is calling for comments on the first set of changes, which are reflected in the updated schema files, cpf.xsd & cpf.rng, available at GitHub. Please send feedback and comments by e-mail to Silke Jagodzinski (s.jagodzinski@bundesarchiv.de). Particularly, feedback on the usage of the attribute @accuracy and the usage of the provided RNG schema file is very welcome. To ensure that the revision process is as open as possible, all comments must be attributable to named individuals and affiliated organisations where appropriate. Anonymous responses will not be considered. All change proposals will be made publicly available, with attribution, through the GitHub Portal as established during the EAD revision process. E-mail addresses are requested so that we may contact respondents for clarification, but will not be shared.

This update solves 16 issues, which can be viewed in full in GitHub. The changes include:

  • relaxed data types for the elements <preferredForm> and <otherAgencyCode>
  • made the elements <languageDeclaration>, <agencyName>, <eventDescription>, <sourceEntry>, <placeEntry> within <relations> elements repeatable
  • added the value ‘unknown’ to the attributes @eventType and @agentType
  • added restricted content to the elements <publicationStatus> and <maintenanceStatus>
  • added the new optional element <rightsDeclaration> with child elements to <control>, as in EAD3
  • added the optional attribute @localType to the elements <fromDate> and <toDate>
  • removed maximum year 2999 from the attributes @standardDate and @standardDateTime
  • corrected the typo in the attribute @accuracy
  • corrected data type for xml:id in eac.rng schema file

This call for comments is open for two months and ends on 31 October 2018.

Silke Jagodzinski (Bundesarchiv), EAC-CPF Team Lead
Kathy Wisser (Simmons College), TS-EAS Co-chair

Next steps for EAC-CPF

EAC-CPF is in its sixth year as a standard! Several community members have identified issues or suggested revisions to update, emend or enhance the standard. A call for comments and input from the EAD revision process has solicited a number of questions and suggestions that will form the launching point for the revision work. TS-EAS has determined that the revision process will be conducted through a two-tier strategy.

Update plan

In order to satisfy the requirements of the users that have implemented the standard, we plan to publish a technical update of EAC-CPF in 2018. This update will comprise minor enhancements and a clean-up: we are going to relax selected element contents and attribute values defined by regular expressions, add optional elements and attributes, and align element and attribute definitions with EAD3 where feasible. This update stage will ensure that the new EAC-CPF schema is backwards compatible. The EAC-CPF Tag Library will be updated accordingly.

Revision plan

While the EAC-CPF update will answer some questions and solve some open issues, more significant questions have emerged about certain aspects of the standard, and the overall approach to defining and using it has been questioned. Work on these general issues and on a major overhaul of EAC-CPF starts in 2018. We will publish all announcements about this process on the EAC-CPF website as it moves forward.

Stay updated

The TS-EAS subteam for EAC-CPF is discussing all issues via the EAC-CPF GitHub portal. Feel free to check this page to stay updated about the details. We will also be seeking community input at each stage of the process and hope everyone will be engaged. We will announce next steps via the homepage news, and the update release will certainly be communicated via the EAD mailing list.

Revision of Encoded Archival Context – Corporate Bodies, Persons, Families – Call for Comments

At the annual meeting in Portland, Oregon this year, the Technical Subcommittee on Encoded Archival Standards (TS-EAS) of the Society of American Archivists agreed to undertake a revision of the standard EAC-CPF.

The subcommittee is calling for proposed changes to the current version of EAC-CPF. To ensure the greatest possible input from users of EAC-CPF and other relevant standards, the deadline for change proposals is 11 December 2017. At that time, all proposals will be made publicly available through the EAC-CPF GitHub portal.

Please send feedback and comments by e-mail to Silke Jagodzinski (s.jagodzinski@bundesarchiv.de). To ensure that the revision process is as open as possible, all comments must be attributable to named individuals and affiliated organisations where appropriate. Anonymous responses will not be considered. All change proposals will be made publicly available, with attribution, through the GitHub Portal as established during the EAD revision process. E-mail addresses are requested so that we may contact respondents for clarification, but will not be shared.

Final Release of the Tag Library of the EAC-CPF Schema

The Technical Subcommittee on Encoded Archival Context is pleased to announce the publication of the final release of the Tag Library of the EAC-CPF 2010 schema – the EAC-CPF Tag Library 2014.

The previous release was published in 2010 and had the status of draft. Between 2011 and 2013 the Draft Tag Library was translated into French, Spanish, German, Italian and Greek. The process of translation was also an excellent opportunity to review and comment on the content of the Draft Tag Library, by pointing out inconsistencies with the schema and other editorial issues. On the basis of these comments and other queries and suggestions from the international community, the TS-EAC has undertaken a thorough revision of the Draft Tag Library text.

Note that this revision concerns only the Tag Library. The EAC-CPF 2010 schema itself has not changed.

In parallel, a new model for the TEI encoding of the Tag Library was designed. The objective of this model is to facilitate the maintenance, update and publication of the Tag Library as living documentation for the EAC-CPF schema. The new model is also intended to facilitate the encoding of the various linguistic versions of the Tag Library. The current release of the EAC-CPF Tag Library 2014 is encoded following the new TEI encoding model.

Scholarship Program for EAC-CPF

Starting in March 2012 SAA will offer the first of seven regional workshops to be scheduled through June 2013 to facilitate the dissemination of the new standard, Encoded Archival Context – Corporate Bodies, Persons, and Families. To alleviate the pressures of decreasing professional development budgets, the Institute of Museum and Library Services (IMLS) is funding twenty scholarships for each of the seven workshops. The workshops will be hosted across the country and the first workshop is scheduled for March 23, 2012, in Austin, Texas.

The program is part of the project "Building a National Archival Authority Infrastructure".

http://socialarchive.iath.virginia.edu/NAAC_index.html

New Funding for the Social Networks and Archival Context Project

Daniel Pitti, associate director of the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH), in collaboration with Ray Larson at the School of Information at the University of California, Berkeley, and Adrian Turner and Brian Tingle at the California Digital Library, has received a grant from the Mellon Foundation to vastly expand Social Networks and Archival Context (SNAC), a research and demonstration project (http://socialarchive.iath.virginia.edu).

The SNAC project is addressing a longstanding research challenge: discovering, locating, and using distributed historical records. Scholars use these records as primary evidence for the lives and work of historical persons and the events in which they participated. These records are held in archives and manuscript libraries, large and small, around the world, and scholars may need to search scores of different archives, following clues, hunches, and leads to find the records relevant to their topic (and it is likely that at least some records will remain undiscovered). SNAC aims not only to make the records more easily discovered and accessed but also, at the same time, to build an unprecedented resource that provides access to the socio-historical contexts (which include people, families, and corporate bodies) in which the records were created.

The project uses a recently released Society of American Archivists communication standard for encoding information about persons, corporate bodies, and families, Encoded Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF). EAC-CPF standardizes descriptions of people and groups who are documented in archival records.

The pilot stage of the project was funded by a 2010 grant from the National Endowment for the Humanities, which supported development of a prototype historical research and access system (http://socialarchive.iath.virginia.edu/prototype.html). This next stage encompasses a range of tasks: the project team will vastly expand the source data employed in the project; develop new methods and tools for extracting and assembling archival authority descriptions; enhance methods for matching and combining records describing the same entity; develop methods for accommodating descriptive data in languages other than English; add geographic coordinates to place names; develop timeline-map rendering of chronological biographies or histories (lists of dates, places, and events); enable scholarly users of the prototype to query social-professional networks; develop graphical displays of complex, dense networks; and develop graphical displays of organizational charts, and sequential displays of organizations merging or dividing.

Thirteen consortia and over thirty-five leading research repositories in the U.S., U.K., and France are contributing source data, either finding aids or archival authority records. Among the contributing repositories are the U.S. National Archives and Records Administration (NARA), Smithsonian Institution, Library of Congress, British Library (BL), Archives nationales (France), and the Bibliothèque nationale de France (BnF). OCLC WorldCat is contributing over one million MARC archival descriptions. OCLC VIAF (Virtual International Authority File) and the Getty Vocabulary Program are contributing authority records to be used in match processing. By expanding the quantity and diversity of the data, the project will be able to further develop its processing, indexing, and display methods and its public interface design, as well as address the challenge of scale.

For more information, please visit the SNAC web site (http://socialarchive.iath.virginia.edu).