AI Assisted Collections Exploration


Digital Media
November 14, 2023

Coverage of improvements in AI involving Large Language Models (LLMs) such as OpenAI’s ChatGPT, Google’s Bard, and Meta’s and Microsoft’s LLaMA has been impossible to escape in recent times, both in discussion of what models like these can be used for now in generating text (and images, and videos), and in concerns over what they may be able to do in the future and the impacts this may have on the creative industries (among others). For a quick background on how LLMs operate, this guide from the National Cyber Security Centre (NCSC) may be useful, but an even briefer introduction is to compare them with a smartphone’s predictive text functionality: both try to guess the next appropriate word in a sentence based on the past texts they have been trained on (hence ‘machine learning’), but LLMs do so on a vastly grander scale, being trained on billions of words of text taken from the internet, documents, books, articles and so on.
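As a toy illustration of that ‘guess the next word’ idea (and nothing more: a real LLM uses a neural network trained on vastly more text, not word-pair counts), here is a minimal next-word predictor built from a three-sentence corpus:

from collections import Counter, defaultdict

# A tiny "training corpus": a real LLM is trained on billions of words.
corpus = ("the queen laid the foundation stone "
          "the queen opened the museum "
          "the queen visited the museum").split()

# Count which word follows which (a simple bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))         # 'queen': seen most often after 'the'
print(predict_next("foundation"))  # 'stone'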

Exactly what these training data sources are is one of the many major ethical problems that come with LLMs (other varied concerns include privacy, copyright, environmental running costs, the potential overthrow of the human race and so on), as the full details of the training text involved are unknown (although researchers are attempting to discover more about what that training text contains). What is known is that texts taken from the internet may contain “offensive content and language”, which could be replayed when an LLM is prompted to generate new text. Efforts to prevent this through moderation rules have been implemented in the software used to manage LLMs, controlling what kinds of prompts an LLM will respond to, and which outputs it will generate (or not), but ways to work around these rules are being discovered at the same rate as the rules are added.

As is obligatory with discussions of LLMs, we’ll provide a contrived demonstration of one in action by instructing it to imagine some text for us. We asked ChatGPT (v3.5) to imagine Queen Victoria’s diary entry on her visit to lay the foundation stone for the newly renamed Victoria and Albert Museum in 1899 (as depicted here in an early form of cinematography – the Mutoscope) by using information from our collection records. The exact prompt used was, “Search for a recording of Queen Victoria laying the foundation stone of the V&A museum. Using the information in the object record found, write a diary entry in the style of Queen Victoria describing the day”. The result was:

Today, I am proud to share with the world a momentous occasion - the laying of the foundation stone for the Victoria & Albert Museum in Great Britain. I am honored to have been part of this event, and my joy is captured in a Mutoscope photograph which has been preserved in the Prints, Drawings & Paintings Collection of the V&A. It is my hope that this moment will be remembered for generations to come.
Fake diary entry generated by ChatGPT – see full notebook for complete sequence.

and we can compare it with the real diary entry as made available by the Bodleian and Royal Archives:

[…]At ½ p. 4 started in the open landau at 4, the postillions in Ascot Livery, with Lenchen, Beatrice & Arthur, the rest of the family following, as well as the suite. I had a Sovereign’s Escort. Went up Constitution Hill, down Knightsbridge & Brompton Road to South Kensington, to the site of the new Victoria & Albert Museum.

Immense & enthusiastic crowds everywhere, reminding me of the Jubilee. On arriving I entered a Pavilion, which was very handsome & full of people. Bertie & the rest of the family were on the Dais as well as Ministers, Ambassadors & many Notables. Two verses of “God save the Queen” were sung, as also an Ode, words by the Poet Laureate, & music by McKenzie. Before the latter, the Duke of Devonshire, as Ld President, read an address. I handed my answer & said “It gives me great pleasure to lay the foundation stone of this fine Museum which is to bear the name of my dear Husband & myself”. I then laid the stone, Bertie helping me to spread the Mortar &c, & struck the stone 3 times with mallet, after which Bertie in loud voice said he was commanded by me to declare the stone well & truly laid. A bouquet was presented by one of the students of the Royal College of Music. The ceremony concluded with the Archbishop of Canterbury offering up a prayer & giving the blessing. I drove off amidst a flourish of trumpets, “God save the Queen” & loud cheers. Went straight to Paddington station where there were also immense crowds. Got back to Windsor shortly before 6. — Only the Ladies to dinner.
Diary entry for Queen Victoria for 17 May 1899

So while not particularly close in this case, the real diary entry mentions many unique details which couldn’t be predicted by a language model not trained on relevant knowledge, and evidently the style of writing does not match, as it were, the subject. (Of course, it’s also unfair in that we asked ChatGPT to write about the day in relation to the Mutoscope object, even though that wouldn’t be part of the diary entry.)

The Conditional Probability Machine print by Eduardo Paolozzi. Museum no. CIRC.549-1971

So with all these issues and limitations in mind, is there anything useful we can do with these undeniably powerful LLMs, while avoiding as much as possible the ethical pitfalls and the variable quality of text generation? One area of interest for us is whether we can use an LLM to make it easier for users to find items of interest on our collections site. Both Explore the Collections and Search the Archives let you (a human) browse and search for objects or archive items via our web pages, while our Application Programming Interface (API) lets machines (currently programmed by humans) retrieve our collections data via API queries for data exploration. For humans, the user interfaces presented for online searching of cultural heritage collections like the V&A’s have remained more or less the same since their first appearance in the 1990s: a search session usually consists of an initial search for a word or phrase (e.g. “Limerick Lace”), perhaps then refined by a facet such as “Person” to only show matching objects related to a particular person.

For some researchers this is a useful research process to carry out, seeing the results at each step in a query and then refining further as needed after reviewing the results. But if you already know your question is “Does the V&A have any objects made of Limerick Lace worn by the actress Ellen Terry?” you might just want to be able to ask that question immediately and get a useful response back, without having to break the query process down into several steps. Here, LLMs may be able to assist us in searching collections, by responding to a query (“prompt”) from a user, transforming this natural language query into a precise programmatic request to our collections API, and then formatting the results into whatever format the user wants (e.g. a list, a table, etc.).
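In outline that is a three-stage pipeline. The sketch below is purely illustrative: the ask_llm helper and the endpoint URL are hypothetical stand-ins, not the real V&A API, and it exists only to show where the LLM sits in the flow.

import json
import requests

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to some hosted LLM service."""
    raise NotImplementedError

def assisted_search(question: str) -> str:
    # 1. Ask the LLM to turn the natural language question into API parameters.
    params = json.loads(ask_llm(
        "Return, as JSON, the query parameters a collections search API "
        f"would need to answer this question: {question}"
    ))
    # 2. Call the collections API with those parameters (endpoint is illustrative).
    response = requests.get("https://api.example.org/collections/search", params=params)
    # 3. Ask the LLM to summarise the raw results in the format the user asked for.
    return ask_llm(f"Summarise these search results for the user: {response.text}")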

To test this out, we carried out a few experiments with an LLM and the V&A Collections API to see what might be possible. Some immediate caveats:

  • These searches are artificial and took some time to get any meaningful results. Prompt Engineering is an art form at the moment but will surely improve over time, reducing the strange variations in interpretation that can occur with seemingly trivial changes to a prompt.
  • Consideration was given to the running cost of each query, which restricted some of the exploration (especially as it was all carried out on a personal credit card – the overall costs for researching this post were around £5 as of May 2023).
  • Much of the code is based on the excellent examples set in the langchain documentation.
  • Much of the same work can be carried out by currently less trendy Natural Language Processing frameworks anyway.

Bowie & AI Aided Archives Exploration

To make things simple, our first test of an LLM is against our archives site (Search the Archives). Our current API for querying archive records has fewer options than our museum collections API: you can only send search word(s), any matching records are returned, and no other API options can be sent. This makes it easier for the LLM to use the API, as it doesn’t have to decide which of many different query parameters to send (and, incidentally, it also reduces the cost of using a paid AI service, as usage is charged per token (word) parsed).
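For reference, calling that API directly without any LLM looks something like this (a minimal sketch; the host is taken from the V&A OpenAPI specification used below, and the path and parameters from its search operation):

import requests

# Direct query to the archives search endpoint: no LLM involved, just the 'q'
# parameter plus optional paging, as described above.
response = requests.get(
    "https://api.vam.ac.uk/v2/archives/search",
    params={"q": "David Bowie", "page": 1, "page_size": 10},
)
records = response.json()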

The full details are available in this notebook if you would like to see all the steps (with more detailed discussion of some of the programming issues), but in summary:

  • We tell the LLM where our API specification is, which it then reads in to try to understand how to make use of our API
  • The LLM reads and interprets the user query
  • The LLM then formulates a query to the API based on a default “prompt” instructing it to carry out a relevant API query on behalf of the user to satisfy their search query
  • The LLM interprets the response received from our Archives API and summarises the results for the user as requested in the original query or prompt

The code for doing that looks like this (we are making use of the excellent open source langchain framework, which lets us put these steps together):

# Imports as in the langchain OpenAPI chain example (paths may vary by langchain version)
from langchain.llms import OpenAI
from langchain.requests import Requests
from langchain.tools import OpenAPISpec, APIOperation
from langchain.chains import OpenAPIEndpointChain

llm = OpenAI()  # any langchain-supported LLM

# Read the archives API specification and pick out the search operation
spec = OpenAPISpec.from_url("https://api.vam.ac.uk/specs/openapi-vam-archives.yaml")
operation = APIOperation.from_openapi_spec(spec, '/v2/archives/search', "get")
# Chain that interprets the question, calls the API, then summarises the response
chain = OpenAPIEndpointChain.from_api_operation(
    operation,
    llm,
    requests=Requests(),
    verbose=True,
    return_intermediate_steps=True  # Return request and response text
)
output = chain("What David Bowie items do you have in your archive ?")

which runs the query (output reduced & reformatted, see full output in the notebook):

Entering new OpenAPIEndpointChain chain...
Prompt after formatting: You are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.
API_SCHEMA: ```typescript /* Full text search of archive records using the 'q' query parameter as a search query. Returns matching a subset of fields from matching records */ type archive_search_v2_archives_search_get = (_: { q?: string, page?: number, page_size?: number, }) => any; ```
USER_INSTRUCTIONS: "What David Bowie items do you have in your archive ?"
> Finished chain. {"q": "David Bowie"}

This first step looks at the API documentation and works out from the user query what it should send to the Archives API. The response is any archive records that contain the text “David Bowie” (see what our API response to this query would be in full). After the API results are received we send them on to the next step in the chain to get a summary.

> Entering new APIResponderChain chain...
Prompt after formatting: You are a helpful AI assistant trained to answer user queries from API responses. You attempted to call an API, which resulted in: API_RESPONSE: {
[API response removed for space]
}
> Finished chain.
We have 11 David Bowie items in our archive, including a 12" record of <i>Baal</i>, a press and media pack, and four files entitled 'David Bowie is missing', 'David Bowie is here', 'David Bowie is ahead of himself', and 'David Bowie is watching you'. We also have a cuttings book for October 1977 to February 1978 for Round House concerts, and a file called 'David Bowie is happening now. A live nationwide cinema event. 19.00 Tuesday 13 August 2013.'

which you can see in a different form if you use the existing search on Search the Archives. (It should be noted these results are our present holdings; the recent acquisition of the David Bowie Archive will greatly increase these numbers as cataloguing work starts.)

So a fairly simple query was carried out successfully and a page of results of JSON data was turned into a helpful summary paragraph. So far, so good.

AI Aided Collections Search

But really we didn’t need an LLM for that very simple query. For a more complex example we can use our museum collections API, which has many more options to let users search specific fields, browse records based on matching identifiers (used to standardise concepts such as Places, Persons, Materials, Techniques, etc.), control which fields are returned, filter on records only with images, and so on. The LLM would need to decide which of those many options it should use in a query (we do give it a bit of a helping hand by telling it which of the multiple API endpoints to use instead of letting it guess the right one, which again reduces the costs by not parsing unnecessary tokens).

To make it even more useful for our users, we can see whether the LLM can aid a search by converting any terms in a question that can be expressed more precisely using controlled vocabulary identifiers. For example, instead of searching for the semi-precious stone “jet” you can say you only want objects that match on the material identifier “AAT:45514” (the Getty vocabulary identifier we use, in a slightly altered format). This avoids any confusion with jet being used with a different meaning, e.g. jet planes.
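As a concrete (and heavily hedged) illustration of the difference, here is what the two kinds of query might look like as direct API calls. The endpoint path, the id_material parameter and the identifier format are all assumptions made for illustration, by analogy with the id_style parameter that appears in the generated query further down this post:

import requests

# Assumed collections search endpoint; identifier-filter parameter names are
# assumptions by analogy with the id_style parameter shown later in this post.
SEARCH_URL = "https://api.vam.ac.uk/v2/objects/search"

# Plain keyword search: "jet" will also match jet planes, jet engines, and so on.
keyword_results = requests.get(SEARCH_URL, params={"q": "jet"}).json()

# Identifier-based search: filter on the material identifier instead of the word,
# removing the ambiguity (id_material and the identifier format are assumptions).
material_results = requests.get(SEARCH_URL, params={"id_material": "AAT:45514"}).json()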

Steps we need for this are (see all the steps and code in full in this notebook):

  • We ask the LLM to identify any words in the query that might refer to an artistic style (a sketch of this step follows the list).
  • The LLM then looks those words up on the Getty Vocabulary website to turn them into vocabulary identifiers.
  • We tell the LLM where our Collections API specification is, which it then reads in to try to understand how to make use of our API.
  • The LLM reads and interprets the user query, removing words it has recognised in step one as an artistic style.
  • The LLM then formulates a query to the API based on a default “prompt” instructing it to carry out a relevant API query on behalf of the user to satisfy their search query (but with filtering on artistic styles if detected).
  • The LLM interprets the response received from our Collections API and summarises the results for the user as requested in the original query or prompt.
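The first of those steps can be run as a tiny chain of its own. A minimal sketch, with our own illustrative prompt wording (the notebook’s actual prompt differs):

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0)  # deterministic output suits an extraction task

# Step 1: ask the LLM to pull out any artistic style terms from the user's query.
style_prompt = PromptTemplate(
    input_variables=["query"],
    template=(
        "List any artistic or historical style terms (such as 'Art Deco' or "
        "'early Victorian') mentioned in the following search query, as a "
        "comma-separated list. Query: {query}"
    ),
)
style_chain = LLMChain(llm=llm, prompt=style_prompt)

styles = style_chain.run(
    "Search for an oil painting showing an interior done in early victorian style"
)
# styles should contain something like "Early Victorian", ready to be looked up
# against the Getty vocabularies to retrieve an identifier for the id_style filter.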

Code & Results (abbreviated)

Example (summarised here, full notebook available)

User Query - "Search for an painting with a title of 'interior' painted in early victorian style"

Artistic Styles - {'search_q': 'Search for an oil painting showing an interior done in early victorian style', 'styles': '\n\nEarly Victorian,'}

Revised API Query - {"q": "interior", "q_object_title": "interior", "id_style": "300021233", "search_profile": "ymal", "data_profile": "search", "images_restrict": "2", "data_restrict": "all", "images_exist": true}

Results - The API response contains two records which match your query parameters. The first is an oil painting titled 'An Interior', created by William Mulready (II) in ca. 1835-ca. 1842 in Great Britain. The second is an order book titled 'Cowtan wallpaper order book' created by Cowtan & Son in 1836-1841 in London. Both objects match the 'early Victorian' style, as indicated in the response

(The second result shouldn’t really be returned as it doesn’t match on ‘q_object_title’ so it looks like the LLM has revealed a bug in our API!)

This has quickly found the painting we were looking for. An example search just using the words “interior early victorian” would return a very different set of results. So, with some effort (finding a relevant AAT identifier is a little trickier than expected, as we currently have to parse the HTML response from the Getty Vocabulary site), we can improve a user search for queries in this form.

AI Aided Results Interpretation

So, the LLM can (sometimes) generate an improved API query from a question and return the results. We can perhaps improve on this even more by asking it to format the response so that only the fields of interest are returned.

Steps we need for this are:

  • We tell the LLM where our Collections API specification is, which it then reads in to try to understand how to make use of our API
  • The LLM reads and interprets the user query
  • The LLM then formulates a query to the API based on a default “prompt” instructing it to carry out a relevant API query on behalf of the user to satisfy their search query
  • The results are received by the LLM, which re-formats them into a table as asked (a sketch of this step follows the list)
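That last step is essentially one more LLM call. A minimal sketch of just that step (the prompt template is our own illustration; the notebook wires these steps together differently):

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0)

# Final step only: hand the raw API response to the LLM together with the
# formatting instruction, and let it produce the table.
format_prompt = PromptTemplate(
    input_variables=["instruction", "api_response"],
    template="{instruction}\n\nAPI_RESPONSE:\n{api_response}",
)
format_chain = LLMChain(llm=llm, prompt=format_prompt)

csv_response_text = "..."  # stands in for the CSV returned by the collections API

table = format_chain.run(
    instruction=("Output a table with columns 'Decade' and 'Object Count' based on "
                 "the following data, grouping objects by decade using the "
                 "accessionYear column."),
    api_response=csv_response_text,
)
print(table)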

Output (abbreviated, full code & output in this notebook)

Results Formatting Instruction - "Output a table with columns 'Decade' and 'Object Count' based on the following data, grouping objects by decade using the accessionYear column."

User Query - Objects by John Constable from the C18th. Ask for a CSV response with a page size of 45.

API Query - {"q_actor": "John Constable", "year_made_from": 1700, "year_made_to": 1800, "search_profile": "ymal", "data_profile": "search", "response_format": "csv", "page_size": 45}

API Response - [CSV of matching results omitted for space]

Formatted Results:

| Accession Number | Primary Title | Primary Maker | Accession Year |
|-----------------|--------------|--------------|---------------|
| A.202-2002 | The White Horse | John Constable | 1790 |
| A.202-2003 | The Hay Wain | John Constable | 1821 |
| A.202-2004 | The Cornfield | John Constable | 1826 |
| A.202-2005 | Salisbury Cathedral | John Constable | 1826 |
| A.202-2006 | Flatford Mill | John Constable | 1812 |

Decade | Object Count
---- | ----
1790s | 1
1800s | 2
1810s | 1
1820s | 2

Quite a few things have gone wrong here, but even so it’s amazing to see how well the overall query has been handled and that the output is in the structure asked for (but with mostly invented data). The API query has searched based on the artist/maker name (“q_actor”) and set the year dates exactly as needed for objects from the eighteenth century (but it has then also set a couple of other unnecessary parameters for some reason). The results return the objects as expected (see the full output in the notebook), but then the LLM seems to ignore that data entirely and has instead listed 6 paintings by John Constable that are held by the Tate, the Frick and the National Gallery, but none of the ones at the V&A (this must be based on data from its training). It has then correctly summarised that erroneous data into a table as requested. (No doubt this could be resolved by further tweaking the prompt to make it work more as expected.)

Conclusions

Obviously these were artificial examples, but even with minimal investigation the power (and risks) of LLMs become apparent. What is perhaps most impressive from a developer’s point of view is how they make trivial some tasks that would have required custom coding in the past. But the issues with “hallucinations” and biases in training data are unavoidable, in both the interpretation of the query and of the results.

One perhaps useful way, for now, to incorporate LLMs with full transparency into a collection search interface would be to show the user, as they type in a collection question, what the assisted search would be if it were carried out, allowing them to see if it has understood the question correctly and translated it into the different fields that would need setting in an API call (i.e. selecting the right identifiers for the identifier facets, setting the right date range, selecting the right location, and choosing the search words/phrase for a text search, etc.).
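A rough sketch of that idea, reusing the hypothetical ask_llm helper from the earlier sketches: only the query-construction step runs, and its proposed parameters are shown to the user before any search is executed.

import json

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call, as in the earlier sketches."""
    raise NotImplementedError

def propose_api_query(question: str) -> dict:
    """Ask the LLM only for the API parameters it would use; nothing is executed yet."""
    return json.loads(ask_llm(
        "Return only the JSON query parameters you would send to the collections "
        f"search API to answer this question: {question}"
    ))

# Shown to the user as they type, so they can check that the question has been
# understood (identifiers, date range, search terms) before the search is run.
preview = propose_api_query(
    "Does the V&A have any objects made of Limerick Lace worn by the actress Ellen Terry?"
)
print(json.dumps(preview, indent=2))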

An optimistic view for cultural heritage usage is that LLMs can be used as an aid in constructing search queries to improve relevancy of results and, perhaps, in quickly summarising the results before more detailed review. In some ways the limitations of existing collections’ APIs are part of the problem (being designed for humans to programme against, not for machines to read) and will need some revisions to improve how they work with current LLMs. With improvements on both sides and clear indications to the user of what is happening, this could be a step forward (or, for those with long internet memories, a step back) in how exploring museum collections can be made more accessible to all audiences.

But more pessimistic views are also available.

About the author


Richard Palmer is a web developer at the V&A, interested in data and systems, and the complicated connections between them. ORCID: 0000-0002-3177-1313
