National Film and Sound Archive of Australia

Fantastic Futures 2024 - Day 2 - Session 12

2024

    Just one more access point: LLM assistance with authority control

    Presenter: Laura McGuiness

    The National Security Research Center is a library within the Los Alamos National Laboratory (LANL) – a US laboratory responsible for solving national security challenges. In a field where reliable author attribution and access to research are critical, Laura McGuiness talks about the challenges of name disambiguation stemming from inconsistent sources, the library's investigation into utilising machine learning to create valid name authority records, and the complexities of implementing a robust system.

    Fantastic Futures 2024

    Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.

    Learn more about this event at the Fantastic Futures 2024 hub

    • This transcript was generated by NFSA Bowerbird and may contain errors.

      Hi everyone. So my name is Laura McGuiness. I'm a metadata librarian at Los Alamos National Laboratory. I don't know how popular the Oppenheimer movie was outside of the United States, so for those of you who haven't seen it, I'll introduce Los Alamos just a little bit. Let's see. Okay, so Los Alamos National Laboratory is a United States national laboratory responsible for solving national security challenges. LANL's National Security Research Center is the lab's library, housing tens of millions of scientific, technical, and historical materials related to United States nuclear history.

      Authority records, for those of you who are not familiar with them, are used to disambiguate author names within databases. Frequently created in MARC 21, authority records are highly structured data that authorize a preferred name format and then cross-reference variant or related names to the correct author. These authority records ensure standardized access points for users, creating a linking framework for related or identical names in a database.

      As you could imagine, at LANL a researcher's work is often critical to ensuring national security, making it imperative that all possible access points are available for use in a database. Currently, LANL has several large institutional databases lacking name authority records, so this lack of access points is just contributing to information noise, producing overwhelming and ambiguous search results. I believe that we can solve LANL's name authority problem with three steps: the first being the generation of valid name authority records, the second being the disambiguation of similar names, and the last being the implementation of automatic authority control within our institutional repositories. Automatic authority control is my utopia, and I think anyone who's spent a significant amount of time doing authority control would probably agree with that.
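The structure of an authority record described above (one preferred name form, with variant forms cross-referenced to it) can be sketched roughly as follows. This is a minimal illustration, not LANL's system: the `AuthorityRecord` and `AuthorityFile` classes, the `normalize` helper, and the example names are all hypothetical. In MARC 21 authority terms, the preferred form and the variants correspond loosely to the 1XX heading and 4XX see-from tracing fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore trivial differences."""
    return " ".join(name.lower().split())


@dataclass
class AuthorityRecord:
    # Hypothetical, simplified record: one authorized form plus its variants.
    preferred: str
    variants: List[str] = field(default_factory=list)


class AuthorityFile:
    """Maps every known form of a name to a single authority record."""

    def __init__(self) -> None:
        self._index: Dict[str, AuthorityRecord] = {}

    def add(self, record: AuthorityRecord) -> None:
        # Index the preferred form and every variant under the same record.
        for form in [record.preferred, *record.variants]:
            self._index[normalize(form)] = record

    def resolve(self, name: str) -> Optional[str]:
        """Return the preferred form for any known variant, else None."""
        record = self._index.get(normalize(name))
        return record.preferred if record else None


authority_file = AuthorityFile()
authority_file.add(
    AuthorityRecord(
        preferred="Doe, Jane A.",
        variants=["Doe, J.", "Doe, J. A.", "Jane Doe"],
    )
)

print(authority_file.resolve("doe,  j."))
```

Every variant acts as an access point that leads to the same record, which is the linking framework the talk refers to. Exact lookup like this only resolves forms that are already recorded, so unseen variants still need a disambiguation step.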
      So far, we've pursued a couple of different options for authority record generation, including the use of an LLM. At this time, we've found success with using named entity recognition to identify parts of names and then employing a transformer embedding model to create a vector representation of those names. This embedding output has allowed us to query the records with the provided author name, and then an inferred corresponding name in the database can be obtained. We're using several API calls to grab identifiers such as ORCIDs, Scopus IDs, Library of Congress and VIAF authority records, et cetera, just to increase the fullness of the record and ensure it's adhering to applicable standards so it can be used in our library. To aid in disambiguation and ensure adherence to authority record standards, we've also chosen to include author affiliations if present.

      For authority control within the repositories, automatic authority control that is, we're considering an LLM or RAG architecture to compare new author data to existing authority records. At first, manual inspection of the authority records is going to be key. But after that, we're hoping to create a harmonic score where we can compare our authority files to ground truth authority files pulled from VIAF, Library of Congress, created by me, et cetera. Then we'll push that to do iterative changes to the LLM, if that's what we choose to use, essentially engaging in reinforcement learning from human feedback.

      All right, and just a side note that if anyone has the same particular interest as I do in ethical questions in name authority control, there's a really great book. So accurate authority records are especially important for traditionally underrepresented groups who've been disproportionately affected by name ambiguity in databases. This includes individuals who are more likely to change their name over time, and names that have been transliterated from a foreign language into English or remain in non-Latin script.
      The last couple of talks were really interesting to me for that reason. Double-barreled surnames are affected as well. Name disambiguation contributes to reliable bibliometrics, and it helps ensure scientific discoveries at LANL are correctly attributed. In a field where gender and racial minorities often experience lower authorship attributions, authority control is giving recognition to scientists whose contributions have been overlooked in scientific databases for several decades.

      While the benefits to name authority control are many, there are a lot of ethical concerns to consider in the project surrounding both AI use and authority control. For the former, and we talked a lot about it here, we're going to be mindful of ensuring transparency of the training set, of the outputs, of the decision-making processes. If we do use an LLM at any point, our team is going to be documenting criteria chosen for inclusion or exclusion of certain data sets to provide a clearer understanding of the LLM's learning processes. This includes ethical concerns with the training data, such as the incorporation of RDA's Rule 9.7 or MARC 21's 375 field, which required assigning gender to authority records. Attributed by scholars as responsible for outing gender identities through authority records, this is one piece of training data that we're going to exclude. While LANL librarians didn't abide by these cataloging rules to begin with, the possibility of using external authority records from VIAF or Library of Congress, for instance, may introduce this concept, so that's something that we have to keep an eye out for.

      To mitigate these concerns, our project team is currently establishing internal accountability measures that are going to assess potential impact and biases. That's probably the end of my five minutes, so I'm going to turn it over to the next speaker, but thank you so much.
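The matching step described earlier in the talk (represent each name as a vector, then query stored records for the closest one) can be sketched as below. This is an illustration under stated assumptions, not the method used at LANL: character trigram counts stand in for a transformer embedding model, the named entity recognition step is omitted, and the function names, the `0.5` threshold, and the sample names are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(name: str, n: int = 3) -> Counter:
    """Toy stand-in for an embedding: counts of character n-grams."""
    # Strip punctuation, lowercase, collapse whitespace, then pad both ends.
    cleaned = " ".join(name.lower().replace(",", " ").replace(".", " ").split())
    text = f"  {cleaned}  "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(
    query: str, known_names: List[str], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (name, score) for the closest stored name, or None if too dissimilar."""
    query_vec = embed(query)
    scored = [(name, cosine(query_vec, embed(name))) for name in known_names]
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, score) if score >= threshold else None


# Hypothetical preferred name forms already held in the database.
known = ["Doe, Jane A.", "Roe, Richard", "Smith, John Q."]

print(best_match("Jane Doe", known))
print(best_match("Zhang Wei", known))
```

The threshold is what keeps an unknown author from being silently merged into the nearest existing record. In practice it would need tuning against manually checked records, in line with the manual inspection the talk describes as key at first.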
