ROBIN FAY: Are they making the appropriate decisions? And do you trust them to make the appropriate decisions? And how can you trust AI? AI has no right or wrong right now. It just is a giant remixer. I kind of called it the Mad Libs of data. It just mashes up all of this data it finds. It fills in boxes. It doesn't understand why it's filling in the boxes or what that actually means. [MUSIC PLAYING] CHARLIE BENNETT: You are listening to WREK Atlanta. And this is Lost in the Stacks-- The Research Library Rock 'n' Roll Radio Show. I'm Charlie Bennett, in the studio with Fred Rascoe and a whole bunch of broken cardboard boxes that are filled with who knows what. Each week on Lost in the Stacks, we pick a theme and then use it to create a mix of music and library talk. Whichever you're here for, we hope you dig it. FRED RASCOE: Our show today is called "There's a New Metadata Intern in the Office." CHARLIE BENNETT: Oh. I totally missed that search committee. Did we-- who's the intern? FRED RASCOE: Well, the intern, Charlie, is AI-- CHARLIE BENNETT: Oh. FRED RASCOE: --a tool for information processing that has, much like interns, great promise, little experience, and probably an overwhelming amount of student loan debt. CHARLIE BENNETT: That joke cuts today. All right, I'll go with it. Our guest is a metadata and cataloging librarian and teacher. And she's going to tell us just what this new intern in the office can do for us in libraries, what it can't do, and why we have to watch it like a hawk so it doesn't steal the petty cash. FRED RASCOE: Honestly, maybe one of the least harmful things it could do. Our songs today are about patterns, the lack of human qualities, and things that don't quite know what they're doing. CHARLIE BENNETT: Fred, I am right here in the studio with you. FRED RASCOE: No, it's not you, Charlie. The AI, ChatGPT, is trying to work its way into the workplace. And, like a new and inexperienced intern, it may have some neat tricks, but you've really got to closely supervise it and double-check its work. After all, it's just a robot. So let's start with "Robot" by The Futureheads, right here on Lost in the Stacks. CHARLIE BENNETT: Ooh, quick stop. That was "Robot" by The Futureheads. This is Lost in the Stacks. And today's show is called "There's a New Metadata Intern in the Office." It's all about AI creating metadata. FRED RASCOE: And on the show today is past and likely future guest Robin Fay, sometimes known as georgiawebgurl. That's all one word. And girl is spelled G-U-R-L. Robin is a cataloging metadata librarian and trainer who has worked in libraries and higher education and consortia throughout her career. She teaches Library Juice Academy classes, writes books, and thinks a lot about metadata. CHARLIE BENNETT: Will you start us off with as succinct a definition of AI as you think is useful? ROBIN FAY: There are two pieces of AI. There is the artificial intelligence, the AI piece of it. And then there's machine learning, which does fall into AI. So machine learning is really the ability of machines and computers and things like that to recognize patterns of data, sort of like relevance ranking in a search engine, and to pull that information together. We sometimes hear the term "big data" associated with this, too: taking all of this data and repackaging it in a way that is useful.
That often falls on machine learning: being able to look at data, being able to look at patterns, and to make not really a decision based upon any specific logic, but more to apply logic rules. Does this term appear the most frequently? Where does this term appear in the data, things like that. Artificial intelligence uses machine learning. So that's how it learns. It is using that same process to pull together data and then repackage-- at least in terms of what we're doing with these AI generators, to repackage that data back to us. So it is looking at the landscape of data and then taking what it can find and making decisions. But it's really pattern recognition, things like relevance ranking, data volume, things like that, sort of the same principles we think of with big data, looking at all of that and then repackaging data back to us. CHARLIE BENNETT: Then what follows from that is, what is metadata? If data is all that stuff that it's pulling in and handing back, what is metadata? ROBIN FAY: Well, metadata is really interesting because it is a combination of a human-centered approach to data, but it also taps into automation, into automated processes, which are a little bit different from AI because they typically have at least a little bit more control or organization surrounding them. So metadata is really that data across the landscape. And I would say there is a very big human component to that because we're either evaluating the data, writing the systems that use the data, or creating the data ourselves. Whoever is working with that data is part of that landscape. So even though metadata as a whole is out there everywhere, and cameras are embedding metadata or phones embed metadata, there's still a person behind that. There's always that programmer or person behind that particular piece of software who makes the decision about what kind of information we record or what kind of data that particular system is looking at. Now, you could say a little bit of the same thing for artificial intelligence. If you go back far enough, there are people behind it, who developed it. But it is meant to run on itself, using its own patterns of recognition and looking at this large group of data, really without any human oversight. I think that's, to me, the distinction between AI and traditional metadata: there is much more human involvement either in the development of the data, deciding on what data is recorded, or in the creation of the data, whereas with artificial intelligence, yes, there are people who wrote the programs when they first started, but it is continuing on its own to look at patterns of data and make decisions. And so it might deviate from something that was originally decided by its programmers. I don't really know how many guardrails have been put into place for these things. CHARLIE BENNETT: It feels like maybe AI is the tool to make it so we have to make fewer decisions or automate decisions. Does that sound right to you, making decisions about data, curating it in a way, or deciding what's important or what's going to be revealed? ROBIN FAY: There's a lot of different goals with AI. The AI evangelists or champions would say it's going to make your life easier. It will free you up from ordinary tasks, and you'll be able to focus on other things. That's been the argument for computers since they first came around: we're going to be able to do more with less effort. That was even the argument of automation and the industrialization age, right?
We're going to do more with less. We'll have fewer people working there, but we will build more cars. And I think it just is a continuation of that philosophy of doing more with less. But I do think in terms of data, that piece of it gets really interesting because as more of this data gets out there, then there is that opportunity for the data to be used for either good or bad, depending upon what kind of data it's creating. So it's a very interesting place because, unlike a piece of equipment or even a robot that you may program to do a specific task, AI really doesn't have as many specific tasks built into it. As far as content-creating AIs go, they're meant to be able to create kind of any content and, as more content is put into them, as more people interact with them, to look at what they're told is right and wrong, look at the content that they're given, and to be able to then build more of that kind of content. So it'd be very easy to derail it. FRED RASCOE: So from the librarian perspective, I guess the obvious question is, in our work, a lot of a librarian's work is assigning metadata. Can that metadata assignment be done by a machine accurately, either now or in the future? ROBIN FAY: So I think definitely in the future. There are challenges to the resources we create. It's a gradient. So if you have a commercially published book-- the latest John Grisham novel, every library in the United States has at least one copy of it-- there's going to be so much data out there from the publisher, from Goodreads, from all these different data provider sources, even blog posts that people have written about it. FRED RASCOE: Just like any web scraper could grab it. ROBIN FAY: Right. So then what AI would do is it's going to look at all of that. And then, if I say, "I want to write metadata about that," it's going to go to the obvious sources. It's going to go to worldcat.org. It might go to your library website. It's going to look at those data sources for that type of data. So if I say I want to create a MARC record, it's going to go out and look for MARC records. And it's going to find a MARC record and start mashing up that information with other stuff that it finds. For a commercially published thing, where there's lots of them, it works OKish. But it doesn't understand the best practices, standards, and rules that we use. It just does not understand how to apply those. Can it code, properly code, a MARC record? It can do that because there's the MARC manual, and it tells you what to put in there. And so it can take that information. Mostly it's looking at other MARC records to pull that information in. What information? How is a publisher field coded? Well, I'll code my publisher field like that. In some cases, it just straight up steals information. I've seen it create records and say the Library of Congress created the record. CHARLIE BENNETT: This is Lost in the Stacks. We'll be back with more about this new intern who steals in the office with librarian Robin Fay after a music set. FRED RASCOE: And file this set under TK7882.P3J36. [MUSIC PLAYING] "Electrical Language" by Be Bop Deluxe. Before that, "Pattern People," the Jimmy Webb song, by The 5th Dimension. Those were songs about recognizing patterns in people and how they get repeated artificially. [MUSIC PLAYING] CHARLIE BENNETT: This is Lost in the Stacks. Today's show is called "There's a New Metadata Intern in the Office." And that intern is AI, doing the metadata and cataloging work.
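To make that "mashing up" concrete: below is a minimal Python sketch of the majority-vote behavior Robin describes, where a tool copies whatever field values appear most often across records it has scraped. The records, field names, and values are all invented for this example; no real cataloging system or AI is being quoted here.

```python
from collections import Counter

# Toy records "scraped" from different sources for the same book.
# Everything here is invented for illustration.
scraped_records = [
    {"title": "The Exchange", "publisher": "Doubleday", "year": "2023"},
    {"title": "The Exchange", "publisher": "Doubleday", "year": "2023"},
    {"title": "The exchange :", "publisher": "Doubleday,", "year": "2023"},
]

def mash_up(records):
    """Build a 'new' record by majority vote per field -- pure pattern
    matching, with no judgment about which source was actually correct."""
    merged = {}
    for field in records[0]:
        counts = Counter(r.get(field, "") for r in records)
        merged[field] = counts.most_common(1)[0][0]
    return merged

print(mash_up(scraped_records))
# {'title': 'The Exchange', 'publisher': 'Doubleday', 'year': '2023'}
```

Note that the most frequent value wins even if the minority value happened to be the one coded to standard, which is exactly the quality problem the conversation turns to next.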
FRED RASCOE: And our guest is Robin Fay, a metadata and cataloging librarian who has been thinking a lot about AI recently. CHARLIE BENNETT: OK, so let's try and sum up at this moment. So we are pattern-recognizing animals. And we now have a tool that will do the pattern recognizing for us even better. And then it'll make decisions for us, especially when we're trying to categorize and classify information. And a big part of those decisions comes from whatever is most prominent, whatever is oldest and most accepted. So AI is a deeply conservative Luddite. ROBIN FAY: Kind of. So I say it's like a person who's new to work and to the profession. CHARLIE BENNETT: Oh wow. ROBIN FAY: So let's say you have no training. Let's say you have no training in libraries or you have no training in metadata. So what would be one of the first things you would do? You're going to go look at other library records. And you might not even understand what those fields mean, but you think, oh, that's a Library of Congress record, or Georgia Tech created that record, or whoever created that. They're a good library. So I'm going to pattern what I do based upon that. I'm not going to understand whether or not that record is actually right or whether or not that record has been updated to a modern standard or whether or not you've added some local information. FRED RASCOE: Or if it's biased in any way. ROBIN FAY: Or if it's biased in any way. I'm not going to understand that. I'm only going to look at that and make the decision, this is good data. So it doesn't have the ability to really make a decision. It's just basing it upon, I've gone to this data provider that's ranked higher in my list, maybe. CHARLIE BENNETT: Yeah, OK. So I am an utter cliche. What I hear is this is a way to remove a job that a person would have, right, to cut costs and create efficiency based on a mediocre result as opposed to an experienced, qualified, eccentric, or even inspired choice that might be made by a person. I also hear that it's the triumph of the majority, the tyranny of the majority, but in this case, the majority opinion. And so part of me is totally 16 and reading The Illuminatus! Trilogy and getting very angry about the fact that the creeping sameness of culture is coming for us. ROBIN FAY: And I think that's a really valid point. If everybody uses AI to create their data, it will be very standard in that way. There will be no looking at records to see, I'm an engineering library. Should I use a different controlled vocabulary for my users? I mean, that wouldn't exist unless you tell it to go out and look at this vocabulary. Now, in terms of how accurate it can be, it does have issues interpreting our standards. What I have found from the projects I've either lurked on or been part of is that if I want to assign, say, subject headings or a controlled vocabulary term to pick a topic for this thing, and I'm at a very general level-- virtual reality, dogs, libraries-- it can generally look at a resource and make that high-level assignment. So it can be a good starting point. If I'm new to libraries and have no idea what I'm doing, it can certainly be a good starting point.
My concern is that it will be seen as, well, we just don't need these people because this can do the work, without understanding that we're completely throwing quality control out the window-- and not only quality control, but what we tout as such an important part of libraries, meeting the needs of our users, because AI doesn't really know who our users are or what our user needs are. So it's just going to create this generic record based upon what is hodgepodged from the internet. And I think that piece of it is one of the things that I'm really concerned about. I can see it as a great tool. If I need to start a record, we have other ways to start a record. We can actually clone or copy a record and then edit an existing record. So that's existed for a long time, the ability to copy data and to edit it. I could also use ChatGPT for the same thing. Now, if I am working on an image or some multimedia, ChatGPT-- and I'm not just pointing the finger at that one, because I've found it with other editors, too-- if there's no known information about that resource that it can get to, then it may just completely give up. You can stump these things if you have something that there's just no information about. So for digital archives, special collections, community history projects, digital humanities projects, scholarly communication projects, where the only information coming from that resource is the resource itself, if that resource is not already digitized so that artificial intelligence can get into that content, it's not going to be able to do anything with it. And it gives up very quickly. It's just like, sorry. Unlike a person, who would say, OK, I don't really know about this, but I'm going to find out, and I will make some data about it. These things just give up. So I think that's the other piece that people don't understand: it really only works for known things, it doesn't understand the standards, and it really assigns topics at a very general level. If you start wanting to assign things very granularly, like this topic and in a particular region, getting to all of those various pieces of metadata where we get very, very granular, it just gives up. Well, it doesn't really give up. It's just wrong a lot of times. [LAUGHS] FRED RASCOE: Yeah, sometimes it would be preferable if it gave up. But a lot of times, ChatGPT just makes stuff up. ROBIN FAY: Yes, it does. FRED RASCOE: I've asked it to summarize things that exist as text on the internet but, obviously, haven't been fed into its large language model or whatever. And it confidently gives me entirely false information about who created them. And, obviously, I can check and say, well, that's obviously wrong. But I think I would prefer it if it would just say, yeah, I don't know. ROBIN FAY: Yeah, and actually, I did an experiment just with myself and asked it to write a bio for me. And it did pretty well for the first paragraph-- it took my Library Juice bio word for word. So it just went out, found it, plopped it in. But then when it tried to build the next paragraph, it mixed me up with a musician. So then I became a librarian and famous musician. And I was like, whoa. It doesn't understand identity that well at all. So it does have some potential, but right now at least, it does not have the ability to analyze data in a way that we can or to make judgments and decisions.
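A rough sketch of the two behaviors Robin just described: broad topics can be matched against a controlled vocabulary, while an undigitized resource with no machine-readable text yields nothing at all. The vocabulary, function name, and sample inputs below are hypothetical, a toy under those assumptions rather than any real tool's logic.

```python
# All names and data below are invented for illustration.
BROAD_VOCAB = {"virtual reality", "dogs", "libraries"}

def assign_subjects(text):
    """Return broad vocabulary terms found in the text. When the resource
    carries no machine-readable description, there is nothing to return --
    the tool just 'gives up'."""
    if not text:  # e.g., an undigitized community-history photo
        return []
    lowered = text.lower()
    return sorted(term for term in BROAD_VOCAB if term in lowered)

print(assign_subjects("A study of virtual reality use in libraries."))
# ['libraries', 'virtual reality']
print(assign_subjects(""))  # no OCR text, no publisher data, no reviews
# []
```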
FRED RASCOE: You are listening to Lost in the Stacks. And we will continue our performance evaluation of the new metadata intern in the office on the left side of the hour. [MUSIC PLAYING] [OK GO, "WTF?"] DAMIAN KULASH: Hey, this is Damian from OK Go. And you are listening to Lost in the Stacks on WREK Atlanta. OK GO: (SINGING) I've been trying to get my head around what the [INAUDIBLE] is happening. CHARLIE BENNETT: Today's show is called "There's a New Metadata Intern in the Office." We are, of course, not talking about a real intern, but artificial intelligence. So we'd like to read to you from an article about artificial intelligence creating metadata, on this very topic. This is from the conclusion of an article titled "The Viability of Using an Open Source Locally Hosted AI for Creating Metadata in Digital Image Collections." And let me say that again: here is an article about whether you can do what we're talking about. It's written by Ingrid Reiche and published in the Code4Lib Journal last month. Sheeko, mentioned here, is an open source AI project overseen by a team from the J. Willard Marriott Library at the University of Utah. FRED RASCOE: The goal of this case study was to test two of Sheeko's pretrained machine learning models to ascertain if they would produce captions that could be used as descriptions or titles and, thereby, reduce the amount of time and labor required by staff to describe images in the University of Calgary's digital collections. CHARLIE BENNETT: Sheeko's results show that it does produce captions that could be used as a basis for descriptions with moderate human intervention. The captions would not be suitable for title creation without significant human intervention. FRED RASCOE: Were Sheeko to be used for description generation, it is unclear if the amount of time required to select the most appropriate caption from the three choices generated and then create additional information or modify existing description with information from the Sheeko-generated captions would be less time than it would take for a person to create description metadata from scratch. CHARLIE BENNETT: At the end of this case study, it was determined that Sheeko would not be pursued as a potential AI solution for creating descriptive metadata at this time. Ooh, Fred, I guess we'll have to check in on the University of Calgary in two or three years to see if they're getting back into that Sheeko or not. FRED RASCOE: I think so. CHARLIE BENNETT: File this set under BF408.N3548. [MUSIC PLAYING] "Sunset on Humanity" by Dear Nora, and before that, "Expressive Machine" by J. Fernandez. Those were songs about the loss of creativity and other unique human qualities. [MUSIC PLAYING] This is Lost in the Stacks. And today's show is called "There's a New Metadata Intern in the Office." FRED RASCOE: And our guest today is Robin Fay, a metadata and cataloging librarian. We are discussing AI's potential as a metadata-creating tool. CHARLIE BENNETT: We've talked about how AI is not dangerous in a Skynet kind of way. It's not going to destroy the world, but it is going to make us all stupid and boring. Obviously, I'm having a little bit of fun, but I'm having fun because the top of my head is going to pop off from how stressed I feel about the kind of corporatization or McDonaldization of metadata that this implies. Only the stuff that's easily consumable and easily replicable and is generic to the point of mass application will survive if everything is AI generated. So what can we do?
What is the proper and useful criticism, the practical critique, that we can apply to AI? ROBIN FAY: Well, I think there's a couple of things. One, I do encourage people to use AI and play with it so that they can see for themselves. But, as I've seen people do, you can't just create a record and go, wow, look, it created a record. We have higher standards than that. And just about any system can create a record for us. So it's about evaluating the data. And the same thing when I asked it to write a bio for me: I looked at that first paragraph and I was like, wow, I know exactly where that information comes from. Now, did it cite Library Juice Academy as the source? Oh no, no. It just took that information right from the website and plopped it in as the bio that it was going to give to me. So I think that's the other piece of it, looking at experimenting with these things. How would we use this in a library setting? Are we going to use it to answer reference questions? Are we going to use it to respond to email requests or scheduling requests? Are we going to use it to write content for the website or create a blog post or a record? And if so, what does that actually look like? And I don't mean it created some content that looks good. I mean, if you actually read it and review it as a human being with the proofreading hat on, does it meet that standard? FRED RASCOE: So I attended an event about ChatGPT here at Georgia Tech not too long ago. And one of the faculty members presenting said, "Treat ChatGPT like an intern." Is that the level where you are? ROBIN FAY: Yes, yes. I have said, it is like a person who has never worked in a library before, and they've been plopped down in front of a computer to do whatever job you tell them to do. CHARLIE BENNETT: This is really resonating with me in my feelings about Wikipedia. I love Wikipedia as a resource to, say, read the liner notes of an album that I have only heard through streaming. It can tell me exactly what was already printed-- you know, who played bass and who produced. But if we use Wikipedia as a way to structure what we think of a topic, if we rely on that created-by-committee, secondary-source tool, it numbs and deadens the way we think about things. And that's really my worry here. ROBIN FAY: Well, I do think it has the ability to reduce creativity. Now, one of the things I went to was a whole forum or discussion on AI for artists and creators and things like that. And some of the ideas out of that were: use it if you encounter writer's block. So let's say, for example, you don't know what to write about a particular topic for a library event that's coming up, or you don't know anything about a particular area of, say, virtual reality, and you have this mixed media thing that you need to create metadata about. Go ahead and use it to see what it gives you. And use that as a starting point. So it would be kind of like having an intern. They write some copy for you-- copy for data, copy for a website, copy for whatever. And then you're going to sit down and edit it, make sure that they haven't said anything that would be an embarrassment or violate policies or is just wrong. You're going to do all of that. But you're going to have to do that every time you use AI because you're never going to truly know: to what extent does it understand our library policies? To what extent does it understand our specific community needs in terms of data?
Our local policies, our own standards, or the larger national standards? Which does it give preference to? What does it understand in terms of a national standard? Right now, it's just looking at data recognition patterns and looking at all these different data sources to cobble something together. So it doesn't have the ability yet. Now, will we get there? Will we get to the point where we go beyond machine learning and artificial intelligence is actually applying its own understanding of information? Probably one day. We're not there yet. In spite of the fact that people see artificial intelligence and think that somehow it does understand, it really doesn't. It's just looking at all of this different data. And the decisions are very logic oriented. Choose this because of this reason, or this data appears more frequently, use that. Or, within the summary of this particular work, it mentions virtual reality 10 times, so let's use that as a subject. That's in the controlled vocabulary. That's the kind of thing it does right now. But then the human side of it is that we may get away with less data entry. It may be that as metadata practitioners, we're doing less typing. I mean, that's been my hope my whole career, that we can just stop keying in stuff so much. Let's link to data. Let's use the data that's out there. But still have that human oversight and that human judgment of, I'm going to link to this source because it's a quality source that provides additional information. Maybe I'm going to link to Wikipedia because it provides some things that are not anywhere else in the data, or maybe I'm not going to link to Wikipedia because that's just a stub article and it doesn't really add any information. But that takes my ability to evaluate that information. It's not about how short or long a Wikipedia entry is. It's about whether or not it's quality information that would add to that. And right now, these AI creators are just creating data. They don't have the ability to really analyze and apply quality standards. I don't mean to get pessimistic about it because I see so much opportunity. And I do think people should experiment with artificial intelligence so that when administrators or anyone else come to you and say, well, we don't need these staff, or this person is retiring, we can just use our artificial intelligence instead, then you can have that nuanced conversation about here's where we can use artificial intelligence, the intern, and what it can do, and here's where we need an expert. CHARLIE BENNETT: Robin, thank you for being on the show today. ROBIN FAY: You're welcome, as always. CHARLIE BENNETT: Our guest today is Robin Fay, sometimes known as georgiawebgurl. Robin is a cataloging metadata librarian and a trainer who has worked in higher education and with consortia throughout her career. She teaches Library Juice Academy classes, writes books, and thinks a lot about metadata. Fred, did you notice that I just took the bio from the beginning and slapped it right in there like an AI? FRED RASCOE: I'm going to call you CharlieGPT. File this set under Q335.A7852. CHARLIE BENNETT: You wound me, Fred. [MUSIC PLAYING] FRED RASCOE: "Inept" by a/lpaca. And before that, "No Reason" by Sunny War. Those are songs about things that don't know what they're doing. [MUSIC PLAYING] CHARLIE BENNETT: Today's show is called "There's a New Metadata Intern in the Office." That intern is AI. And if it creates metadata, a person has to clean it up.
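Earlier in that answer, Robin outlined the "logic-oriented" rule: a term that appears often enough in a work's summary and also exists in the controlled vocabulary gets assigned as a subject. A few lines of Python can sketch that rule; the vocabulary, threshold, and summary below are all made up for the example and stand in for no real system.

```python
from collections import Counter
import re

# Invented toy vocabulary and threshold -- not any real system's rule.
CONTROLLED_VOCAB = {"virtual reality", "metadata", "cataloging"}
THRESHOLD = 3  # how many mentions before a term becomes a subject

def pick_subjects(summary):
    """Assign subjects purely by counting term frequency in the summary."""
    lowered = summary.lower()
    counts = Counter(
        {term: len(re.findall(re.escape(term), lowered))
         for term in CONTROLLED_VOCAB}
    )
    return [term for term, n in counts.items() if n >= THRESHOLD]

summary = ("This work examines virtual reality in teaching. " * 10
           + "It also touches briefly on metadata.")
print(pick_subjects(summary))
# ['virtual reality'] -- 'metadata' appears only once, so it is skipped
```

Frequency stands in for meaning here, which is why granular or regional topics defeat this kind of rule: nothing in it knows what the work is actually about.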
Fred, is there a part of your job that you would, with minimal oversight or even no oversight at all, hand off to AI? FRED RASCOE: Minimal or no oversight? I don't think so. I think there are probably things that I do that eventually-- I mean, the trend seems to be that AI will probably be able to do some of those things. CHARLIE BENNETT: Get that deep machine learning, figure out how to do a lot more stuff. FRED RASCOE: Like right now, as we talked about in the interview today, I have to create metadata for stuff that goes in the repository. And AI is not going to do that now. But I'm not going to say strictly that it will never happen. I wouldn't be that bold. CHARLIE BENNETT: OK, but nothing right now? FRED RASCOE: I do use autocompletes and things like that, those minimal uses of AI. CHARLIE BENNETT: That's like baby AI, isn't it? FRED RASCOE: Yeah, and when I see that things like that work, I use them to make my life easier when I can. How about you? CHARLIE BENNETT: Well, I think you hit it: email. My first thought is, absolutely not. There's no AI replacement for stuff I do, even the automated tasks I do. I don't know how I would tell AI to do it. But maybe email responses. Shout out to John Flintoff, the author who I interviewed a long time ago. "It would be great if AI could figure out when to send that email that says, thank you so much for your email. I will never respond to it." And I think it probably could. Well, while I dream of that, let's go ahead and roll the credits. [MUSIC PLAYING] Lost in the Stacks is a collaboration between WREK Atlanta and the Georgia Tech Library, written and produced by me, Charlie Bennett, Fred Rascoe, which is you-- FRED RASCOE: It is me. CHARLIE BENNETT: --and Marlee Givens. FRED RASCOE: You're getting into the credits guitar music. CHARLIE BENNETT: I'm so into it. FRED RASCOE: Legal counsel and a set of very elaborate descriptive tags for our filed trademarks-- CHARLIE BENNETT: Thank you. FRED RASCOE: --were provided by the Burrus Intellectual Property Law Group in Atlanta, Georgia. Thanks, Philip. CHARLIE BENNETT: Special thanks to Robin for being on the show, and to all the librarians trying to understand AI and what it can do. And I'll just riff this. And thanks to Soundgarden, Black Sabbath, and Chief Seattle for this song that's happening right now. And, thanks, as always, to each and every one of you for listening. FRED RASCOE: Our web page is library.gatech.edu/lostinthestacks, where you'll find our most recent episode, a link to our podcast feed, and a web form if you want to get in touch with us. CHARLIE BENNETT: Next week's Lost in the Stacks is a rerun. And we'll be back with some material oomph the week after that. FRED RASCOE: OK, it's time for our last song today. OpenAI's ChatGPT can do some very interesting things. But at least for now, I think it's a glorified party trick. CHARLIE BENNETT: Yeah, all right. FRED RASCOE: So for anything beyond repetitive data entry, humans usually do better, for now. Let's close with a song about trying to measure up. This is the classic show tune, "Anything You Can Do I Can Do Better," as recorded by Bing Crosby and Rosemary Clooney, right here on Lost in the Stacks. CHARLIE BENNETT: Fred, can you read a room, man? We are listening to Soundgarden cover Black Sabbath's "Into the Void," and you want to follow up with a show tune? FRED RASCOE: Oh, we're not listening to that anymore. We're listening to a show tune now. CHARLIE BENNETT: Yeah, all right. FRED RASCOE: All right.
Have a great weekend, everybody. [BING CROSBY AND ROSEMARY CLOONEY, "ANYTHING YOU CAN DO"] ROSEMARY CLOONEY: Anything you can do I can do better. I can do anything better than you. BING CROSBY: No, you can't. ROSEMARY CLOONEY: Yes, I can.