Tate Artwork Data 1950-1955

chart-1

chart-2

Introduction

My dataset comes from the Tate Collection data published on GitHub. The subset used here covers artworks in Tate’s collection that were made between 1950 and 1955. In this project, I analyze the types of artworks and the materials used in that period with data analysis and visualization techniques. I used Python for data processing and analysis, and Flourish as the visualization tool. The following sections describe each step in more detail.

Sources

The dataset comes from Tate’s publicly available database on GitHub. It contains information about artworks made between 1950 and 1955 that Tate fully or partially owns, including artwork titles, artists, types, acquisition methods, and entry dates. The dataset also includes a link to each artwork’s page on Tate’s online museum, where images and detailed information can be viewed. Note that this repository has not been updated since 2014, so the data may be out of date or not entirely accurate. I used Google Sheets for basic formatting and processing, removing information that this project does not need and correcting anomalies in the data.

Processes

For the analysis itself, I used Python to process the data. For data visualization, I chose Flourish to create a pie chart because it produces HTML-based output and gives a clear visual representation of proportions.

In this project, I applied two different data processing methods.

First, I grouped the data by the distinct values in the medium column, counted the occurrences of each value, and created a table along with a pie chart (Chart 1). This chart gives a general idea of the mediums artists used at the time, but the method has a limitation: every distinct string is treated as its own medium. For example, “oil painting on canvas” and “oil painting on paper” were counted as two separate mediums, producing two separate slices in the chart. The focus of my project is the artists’ choices of materials, not a list of every minor variation in medium names.
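
The exact-value count described above can be sketched in a few lines of Python. This is a minimal illustration rather than the script used for the project: the sample rows and the function name count_mediums are hypothetical, and in practice the strings would be read from the medium column of the cleaned spreadsheet.

```python
from collections import Counter

# Hypothetical sample rows standing in for the "medium" column.
sample_mediums = [
    "oil painting on canvas",
    "oil painting on canvas",
    "oil painting on paper",
    "Ink and Watercolour",
]

def count_mediums(mediums):
    """Count how often each exact medium string occurs, skipping empty cells."""
    cleaned = [m.strip() for m in mediums if m and m.strip()]
    return Counter(cleaned)

medium_counts = count_mediums(sample_mediums)

# Print the table from most to least frequent value.
for medium, n in medium_counts.most_common():
    print(f"{medium}: {n}")
```

Because the strings are compared exactly, the two oil painting variants end up in separate rows, which is the limitation noted above.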

To address this, I applied a keyword-based classification method. Based on the insights from Chart 1, I designed a set of keywords (such as “ink,” “oil painting,” and “watercolour”) and counted how many medium descriptions contain each one. For example, if an artwork’s medium is “Ink and Watercolour,” it is counted twice, once for Ink and once for Watercolour. As a result, the totals from this method count keyword occurrences rather than individual artworks.
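
A keyword count of this kind might look like the sketch below. It is an assumption-laden illustration, not the project’s actual code: the keyword list contains only the three examples named above, the sample strings are hypothetical, and the matching rule (case-insensitive, whole words only) is my own choice for the sketch.

```python
import re
from collections import Counter

# Only the three keywords named in the text; the full project list is longer.
KEYWORDS = ["ink", "oil painting", "watercolour"]

def count_keywords(mediums, keywords=KEYWORDS):
    """Count, for each keyword, how many medium strings mention it."""
    counts = Counter()
    for medium in mediums:
        text = medium.lower()
        for kw in keywords:
            # \b keeps "ink" from matching inside longer words such as "pink".
            if re.search(r"\b" + re.escape(kw) + r"\b", text):
                counts[kw] += 1
    return counts

# Hypothetical sample rows standing in for the "medium" column.
sample = ["Ink and Watercolour", "oil painting on canvas", "oil painting on paper"]
keyword_counts = count_keywords(sample)

for kw in KEYWORDS:
    print(kw, keyword_counts[kw])
```

With these sample rows, the two oil painting variants fall under one keyword, and the “Ink and Watercolour” row contributes to two keywords at once.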

Using the results from this method, I created Chart 2, which gives a more comprehensive and project-relevant picture than Chart 1.

Presentation

After completing the tables and charts, I embedded both visualizations at the top of the website. I considered whether to include Chart 1 in the final webpage, and ultimately decided to keep it. The comparison between Chart 1 and Chart 2 helps highlight the importance of data processing techniques and provides viewers with a clearer understanding of both the original and refined datasets. Additionally, embedded charts offer much better interactivity compared to static images. For this reason, I chose to embed them using HTML instead of inserting them as images.

Significance

Overall, my project presents a statistical analysis of the artistic materials used in artworks from 1950 to 1955. From the charts, we can see that in the mid-20th century, at least as represented in Tate’s collection, oil painting remained a dominant artistic medium, accounting for nearly half of all recorded artworks. Interestingly, photography was also a significant artistic medium at the time, with its occurrence even surpassing that of watercolour. Meanwhile, mosaic, which was highly popular during the 15th and 16th centuries, is not present in the dataset. Whether this absence reflects a decline of mosaic art during this period or simply the fact that Tate does not hold such works remains an open question that would require further research and data analysis.

This project reflects the approach of Digital Arts & Humanities. My work was not just data cleaning; it drew on knowledge of art history and artistic materials to design relevant keywords and refine the dataset accordingly. This shows that Digital Humanities is not merely about applying data science to humanities topics. It requires a deep understanding of the subject matter, combined with modern technological tools, to analyze and interpret those topics in depth.