Flattening Complexity for Search

When I first started on the backend development for Explore the Collections, the complexity of the data held in our museum cataloguing system both impressed and overwhelmed me. We expose close to 60 different fields for each museum object – and some of these fields expand to reveal lists of subfields that in turn have subfields of their own.

To illustrate, let’s take a look at the data available for our famous cast of David in the Cast Courts.

Figure: Snapshot of a complex field in the data of David (the artistMakerPerson field, from an extract of the API return)

If you’re interested in the object, this richness is admirable and quite probably fascinating. If your desire is to programmatically traverse a series of objects, the complexity might be seen as a bit of a drag.

We use Elasticsearch to store and query our data. Restricting our search to relevant fields within artistMakerPerson requires the use of a nested search query. For example, this type of search would enable us to construct a query for objects where the ‘sculptor’ was ‘Michelangelo’ and not objects by the many other sculptors we have in our collection.
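To make the idea concrete, here is a minimal sketch of what such a nested query body could look like, written as a Python dictionary. The subfield paths `artistMakerPerson.name.text` and `artistMakerPerson.association.text`, and the helper name `person_in_role_query`, are assumptions made for illustration; they are not taken from our actual index mapping.

```python
def person_in_role_query(person_name, role):
    """Build a nested query body: both conditions must match the SAME sub-object."""
    return {
        "query": {
            "nested": {
                # The nested field whose sub-objects are searched one at a time.
                "path": "artistMakerPerson",
                "query": {
                    "bool": {
                        "must": [
                            # Assumed subfield holding the person's name.
                            {"match": {"artistMakerPerson.name.text": person_name}},
                            # Assumed subfield holding the person's role.
                            {"match": {"artistMakerPerson.association.text": role}},
                        ]
                    }
                },
            }
        }
    }


query = person_in_role_query("Michelangelo", "sculptor")
```

Because both clauses sit inside the same nested query, an object would only match when a single artistMakerPerson entry is both named ‘Michelangelo’ and labelled ‘sculptor’.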

One of the downsides of nested search queries is that they add complexity and are therefore costly in terms of performance. More significantly, in order to infer intent from the searcher we would require an advanced search interface.

You can read about our designer’s thought processes and – specifically – desire to remove the need for advanced search functionality, in her upcoming post.

Rather than force the searcher to explicitly state their intent when exploring the collections, we have chosen to present a broad interpretation of their needs and then allow them to refine with filters.

For example, if our searcher enters Silver Peacock into the search box, we cannot tell whether they were looking for this beautiful ‘peacock’ feather fabric by Arthur ‘Silver’ or this exquisite wall light or sconce, which depicts a ‘peacock’ and was made by the silversmith Alexander Fisher. However, by presenting generalised results for objects matching both silver and peacock, we let the former be revealed quickly by filtering for ‘person’ and the latter with a ‘materials’ filter, and in the process we have opened the exploration to serendipitous finds.

The result of this simplification in the search interface is that when you enter the term William Morris you will see objects returned that were made by (or are otherwise associated with) William Morris the textile designer, but also Jane Morris (an embroiderer and also William Morris’ wife) and William Blake (poet and painter), among many others. Elasticsearch does its best assessment of relevance though, and most of the first page of results consists of items relating to ‘William Morris’.

Simplifying the search interface to a single search box with extensive filter options removed the need for me to create a search API that maintains the integrity of complex nested data structures. Obviously we still need to retain the complexity within the results, because when you find the object of interest and view more details you want to be able to tell that Michelangelo was the sculptor and Papi Clemente the cast-maker.

That freed me to create a series of personalised fields where the complexity of the underlying object is flattened. All these developer-created fields are indicated in the API with a leading underscore, for example _flatPersonsNameId, to distinguish them from the true object record – which was populated by subject experts.

In our example of David, the new field looks like a simple list of strings:

"_flatPersonsNameId": [
    "A3843 - Michelangelo",
    "AUTH345377 - Papi, Clemente",
    "N582 - David King of Israel (King)"
]

By creating new fields we have been able to expose additional functionality within the API. So, for example, _flatPersonsNameId holds all the individuals (along with their controlled term id) mentioned in the object record – whether as the maker, as an associated person, or even as the content. In this case, it also includes King David of Israel, who is the subject of the sculpture.
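As a rough sketch of the flattening step, the function below walks a handful of person-bearing fields and emits ‘id - name’ strings. The source field names `associatedPerson` and `contentPerson`, the exact shape of each entry, and the function name `flatten_persons` are assumptions for illustration only; the real record structure may differ.

```python
def flatten_persons(record, person_fields=("artistMakerPerson",
                                           "associatedPerson",
                                           "contentPerson")):
    """Collect every person in the record as an 'id - name' string."""
    flat = []
    for field in person_fields:
        for entry in record.get(field, []):
            # Assumption: some entries wrap the term in "name",
            # others hold "id" and "text" directly.
            term = entry.get("name", entry)
            value = f"{term['id']} - {term['text']}"
            if value not in flat:  # keep each person once, in first-seen order
                flat.append(value)
    return flat


# A cut-down, hypothetical version of the record for David.
david = {
    "artistMakerPerson": [
        {"name": {"text": "Michelangelo", "id": "A3843"},
         "association": {"text": "sculptor"}},
        {"name": {"text": "Papi, Clemente", "id": "AUTH345377"},
         "association": {"text": "cast-maker"}},
    ],
    "contentPerson": [
        {"text": "David King of Israel (King)", "id": "N582"},
    ],
}

flat_persons = flatten_persons(david)
```

The role information (sculptor, cast-maker) is deliberately dropped here; it stays available in the full record for the object detail view.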

By bringing all these names together in a single field we can easily aggregate or cluster the results using them, so we can present a list of ‘persons’ as a filter option.
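A minimal sketch of the kind of request body that could drive such a filter list is shown below. It assumes `_flatPersonsNameId` is mapped as a keyword field; the aggregation name `persons`, the helper name `person_filter_request` and the choice of a `multi_match` query are illustrative assumptions rather than our production query.

```python
def person_filter_request(search_text, max_persons=10):
    """Build a search body that also buckets results by flattened person."""
    return {
        # Broad free-text match, standing in for the single search box.
        "query": {"multi_match": {"query": search_text}},
        "aggs": {
            # One bucket per distinct 'id - name' keyword value.
            "persons": {
                "terms": {"field": "_flatPersonsNameId", "size": max_persons}
            }
        },
    }


body = person_filter_request("Silver Peacock")
```

Each bucket in the response would carry one ‘id - name’ string as its key together with a document count, which is enough to render a list of filter options.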

Figure: Explore the Collections using the Person filter

So now we can reveal all of the other objects that refer to David King of Israel, of which we have 125, including one tiny locket from the Museum of Childhood.


We’ve had fun with these home-grown fields within our V2 API. They have enabled us to include additional functionality such as aggregating (and then filtering) by things depicted, or even more specifically by people depicted in the artworks.

Here is an example of a search revealing items that depict A8676, which is the controlled term for William Morris, the textile designer:

http://api.vam.ac.uk/v2/objects/search?id_depicts=A8676&data_profile=full

While we have 909 objects that reference William Morris as either the maker or as an otherwise associated person, we have only 16 objects in which William Morris is depicted.

This is functionality that is not in our main Explore the Collections user interface (perhaps it will be in the next release – depending on the outcome of user research), but it is available within our API, so that developers can implement the feature in their own applications.
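For developers who want to try this, here is a small sketch that builds the search URL for any controlled term id using only the Python standard library. The `id_depicts` and `data_profile` parameters come from the example URL in this post; the helper name `depicts_search_url` is made up for illustration, and the sketch only constructs the URL without calling the API.

```python
from urllib.parse import urlencode

# Base endpoint taken from the example URL in this post.
SEARCH_ENDPOINT = "http://api.vam.ac.uk/v2/objects/search"


def depicts_search_url(term_id, data_profile="full"):
    """Build a search URL for objects depicting the given controlled term id."""
    params = {"id_depicts": term_id, "data_profile": data_profile}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = depicts_search_url("A8676")
print(url)
```

The resulting URL can then be fetched with whichever HTTP client your application already uses.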

Quick note on the combination id-name fields

You might have noticed that the newly created fields I mentioned above are combinations of text and id. This is because an Elasticsearch terms aggregation buckets on a single keyword value: each bucket is keyed by one term, so I can’t produce an aggregation of people that returns both their id and their text field (or name).

For our front end we want to return the text field (for example ‘William Morris’), but we want to retain the knowledge of the id so that we can use that controlled term as a filter. We have two people with the name William Morris in our collection, so we don’t want to filter on the name if our searcher has confirmed they are only interested in ‘A1324’ William Morris, the glass sculptor.

By creating this combination field, which is a list of keywords [‘A1234 - William Morris’], Elasticsearch will naturally return this entire value in its aggregation. It’s a simple process to then parse this result back into its component id and text fields.
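The parsing step might look something like the sketch below. It assumes the separator is always ‘ - ’ and that the id itself never contains it, so the string is split only at the first occurrence. The function names and the bucket shown are illustrative, with the key taken from the David example earlier in this post.

```python
def split_id_name(value):
    """Split 'id - name' at the first ' - ' into its id and name parts."""
    term_id, name = value.split(" - ", 1)
    return term_id, name


def buckets_to_filters(buckets):
    """Turn aggregation buckets into filter options for the front end."""
    filters = []
    for bucket in buckets:
        term_id, name = split_id_name(bucket["key"])
        filters.append({"id": term_id, "label": name, "count": bucket["doc_count"]})
    return filters


# Illustrative bucket in the shape a terms aggregation returns.
sample_buckets = [{"key": "N582 - David King of Israel (King)", "doc_count": 125}]
filters = buckets_to_filters(sample_buckets)
```

The front end can then display the label while sending the id back as the filter value, so two people who share a name are never confused.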
