This is designed as a follow-on activity to our Introducing Python for Data Science.
In this series of challenges, you'll be working with a real historical dataset: Welsh crime and punishment records from 1730 to 1830, kindly provided by The National Library of Wales. These records offer a fascinating glimpse into everyday life, social structures, and the justice system of the time - and you'll be using Python to uncover patterns hidden within them.
The original archive includes a wide range of offences, described in the language of the period. For this activity, the dataset has been carefully cleaned so that you can explore it safely. Violent, graphic, sexual, and other sensitive offences have been removed, leaving behind non-violent property and economic crimes, nuisance cases, and some wonderfully quirky misdemeanours.
You will also notice that some entries include the death sentence. This has not been removed, because it was a standard part of the legal system during this period, known as the "Bloody Code". The term refers to a set of laws that made many non-violent crimes punishable by death. Although the sentence appears in the records, it does not include graphic detail, and it is included here to help you understand the historical context of the data.
Every effort has been made to make the dataset appropriate for you to work with. However, historical records can be inconsistent or unexpectedly phrased, so there's always a small chance that something borderline may have slipped through. If you come across anything that feels unsuitable or uncomfortable, you can simply skip that entry - and you're very welcome to let us know at outreach@aber.ac.uk so we can remove it from future versions.
With that in mind, you're ready to start exploring how Python can help you analyse real historical data and discover the stories it contains.
We've put together a set of starter questions to help you build confidence and get used to working with the dataset. Once you're comfortable, you'll have the freedom to explore your own ideas, interests, and questions using the data.
First you will need a copy of the dataset:
Create a new folder to save this into. Then, in the same folder, create a Python file using your chosen editor. All examples, walk-throughs, and answers presented will match the Thonny editor's format. Colour schemes vary between editors.
Import your CSV file and set up your records list in Python. You can choose your own variable names or use our set-up below. Treat all the data as strings; in other words, do not convert any values to integers as we did in our previous activity.
import csv
crime_records = []
with open("Crime_Punishment_Clean.csv", newline="") as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row of column names
    for row in reader:
        row_id = row[0]
        file_no = row[1]
        doc_no = row[2]
        f_name = row[3]
        surname = row[4]
        sex = row[5]
        alias = row[6]
        parish = row[7]
        county = row[8]
        status = row[9]
        crime_id = row[10]
        day = row[11]
        month = row[12]
        year = row[13]
        parish_of_crime = row[14]
        county_of_crime = row[15]
        accusor = row[16]
        plea = row[17]
        verdict = row[18]
        punishment = row[19]
        testimonial_1 = row[20]
        testimonial_2 = row[21]
        testimonial_3 = row[22]
        testimonial_4 = row[23]
        testimonial_5 = row[24]
        # Store every column (all as strings) as one tuple per record
        crime_records.append((row_id, file_no, doc_no, f_name, surname, sex, alias, parish, county, status, crime_id, day, month, year, parish_of_crime, county_of_crime, accusor, plea, verdict, punishment, testimonial_1, testimonial_2, testimonial_3, testimonial_4, testimonial_5))
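To see what csv.reader and next(reader) are doing, here is a small sketch that reads a made-up CSV held in memory (using io.StringIO) instead of the real file. The column names and values are hypothetical; the point is that the header row is skipped and every value arrives as a string.

```python
import csv
import io

# A tiny, made-up CSV standing in for Crime_Punishment_Clean.csv
sample_csv = "id,year,punishment\n1,1750,Fined\n2,1790,Transported\n"

records = []
reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)  # next() takes the header row, so the loop below skips it
for row in reader:
    records.append(tuple(row))  # every value is still a string, e.g. "1750"

print(header)
print(records)
print(type(records[0][1]))
```

Because year is the string "1750" rather than the number 1750, any comparison in the exercises that follow is a text comparison.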
Using for-loops with if-statements to find specific information in our dataset.
In case you skipped straight to the exercises: these test knowledge covered in our Introducing Python activity.
We have included the option of viewing hints, walk-throughs, and/or answers.
How many alleged crimes involved the theft of nutmeg?
Remember, we are using crime_records for our imported data.
for record in crime_records:
for record in crime_records:
    if "nutmeg" in record[20] or "Nutmeg" in record[20]:
nutmeg_counter = 0
for record in crime_records:
    if "nutmeg" in record[20] or "Nutmeg" in record[20]:
        nutmeg_counter += 1
print("Total references to nutmeg:", nutmeg_counter)
Total references to nutmeg: 7
nutmeg_counter = 0
for record in crime_records:
    if "nutmeg" in record[20] or "Nutmeg" in record[20]:
        nutmeg_counter += 1
        print(nutmeg_counter, record[20])
print("Total references to nutmeg:", nutmeg_counter)
There are 7 references to nutmeg within the testimonials. Four of these concern the theft of nutmeg graters, so the answer is 3.
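Checking for both "nutmeg" and "Nutmeg" works, but it would miss a spelling such as "NUTMEG". A common alternative, sketched below on some made-up testimonial strings, is to convert the text to lower case with .lower() before searching. The second loop also shows one way to leave out the graters automatically. The sample testimonials are hypothetical, not taken from the dataset.

```python
# Hypothetical testimonials standing in for record[20]
testimonials = [
    "Theft of a nutmeg grater",
    "Stealing Nutmeg and sugar",
    "Theft of NUTMEG",
    "Theft of a sheep",
]

nutmeg_counter = 0
for text in testimonials:
    # .lower() makes the search ignore capital letters
    if "nutmeg" in text.lower():
        nutmeg_counter += 1

non_grater_counter = 0
for text in testimonials:
    lowered = text.lower()
    # count nutmeg only when the testimonial does not mention a grater
    if "nutmeg" in lowered and "grater" not in lowered:
        non_grater_counter += 1

print("Total references to nutmeg:", nutmeg_counter)
print("References not about graters:", non_grater_counter)
```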
Can you determine which crime was allocated the rhif_trosedd (crime_id) of 2050?
if "2050" in record[10]:
if "2050" in record[10]:
    print("Crime listed against 2050:", record[20])
nutmeg_counter = 0
for record in crime_records:
    if "nutmeg" in record[20] or "Nutmeg" in record[20]:
        nutmeg_counter += 1
        #print(nutmeg_counter, record[20])
    if "2050" in record[10]:
        print("Crime listed against 2050:", record[20])
print("Total references to nutmeg:", nutmeg_counter)
Crime 2050 refers to three cases of illegal sheep shearing - shearing a sheep that does not belong to you and stealing the wool.
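Because record[10] is a string, "2050" in record[10] is a substring test: it would also be true for a longer code that happens to contain 2050. The sketch below, using hypothetical codes, compares it with an exact match using ==. It may make no difference with the codes in this dataset, but an exact match is the safer habit when you are looking for one specific ID.

```python
# Hypothetical crime codes, stored as strings like the real data
crime_ids = ["2050", "12050", "20500", "1200"]

substring_matches = []
exact_matches = []
for code in crime_ids:
    if "2050" in code:    # substring test: "2050" anywhere in the text
        substring_matches.append(code)
    if code == "2050":    # exact test: the whole value must be "2050"
        exact_matches.append(code)

print(substring_matches)
print(exact_matches)
```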
What is the full name of the woman alleged to have used a 'pickle' as a weapon?
nutmeg_counter = 0
for record in crime_records:
    #Exercise 1 -----
    #if "nutmeg" in record[20] or "Nutmeg" in record[20]:
        #nutmeg_counter += 1
        #print(nutmeg_counter, record[20])
    #-----
    #Exercise 2 -----
    #if "2050" in record[10]:
        #print("Crime listed against 2050:", record[20])
    #-----
    #Exercise 3 -----
    if "pickle" in record[20]:
#Exercise 1 -----
#print("Total references to nutmeg:", nutmeg_counter)
#-----
if "pickle" in record[20]:
    print("Full name of pickle wielder:", record[3], record[4])
The name of the woman who allegedly used a pickle to maim a horse is: Jane Richards.
What was the crime for which someone escaped the death penalty with a King's special pardon?
if "King's special pardon" in record[19]:
if "King's special pardon" in record[19]:
    print("The King's special pardon was given to the person found guilty of:", record[20])
The King's special pardon was given to a person found guilty of the crime of 'Coining' - the counterfeiting of coins.
There are a number of cases where someone is accused of the 'Rescue' of another person. How many people were rescued from gaol (the traditional British spelling of jail)?
if "Rescue" in record[20] and "gaol" in record[20]:
if "Rescue" in record[20] and "gaol" in record[20]:
    print("Occurrence of jail-break:", record[20])
There are 6 alleged crimes of rescuing someone from jail. However, because some of these crimes had more than one accused, the testimonials show that only three different people were broken out of jail.
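The answer above relies on noticing that several records share the same testimonial. One way to let Python spot this, sketched here on hypothetical records, is to add each matching testimonial to a set, which keeps only one copy of each distinct value. This assumes that co-accused people share an identical testimonial string, which may not always hold in historical records.

```python
# Hypothetical (name, testimonial) pairs standing in for crime_records
records = [
    ("Accused A", "Rescue of prisoner X from gaol"),
    ("Accused B", "Rescue of prisoner X from gaol"),
    ("Accused C", "Rescue of prisoner Y from gaol"),
    ("Accused D", "Theft of a sheep"),
]

rescue_count = 0
distinct_rescues = set()
for name, testimonial in records:
    if "Rescue" in testimonial and "gaol" in testimonial:
        rescue_count += 1
        distinct_rescues.add(testimonial)  # a set ignores duplicates

print("Accusations:", rescue_count)
print("Distinct testimonials:", len(distinct_rescues))
```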
What are the full names of the owners of forty hounds that triggered a nuisance complaint?
if "forty hounds" in record[20]:
if "forty hounds" in record[20]:
    print("Full name of one owner of forty hounds:", record[3], record[4])
This exercise illustrates how historical and real-world data can often be incomplete. There are 4 allegations of nuisance involving forty hounds barking through the night. However, only three of the accused names are complete: Henry William, Maurice Stephens and Richard Tudor. The fourth has the Tudor surname but we do not know if this was a second accusation raised against Richard or another person with the same surname.
If you have documented and commented out each of the previous exercises as you worked through them, your code should look like this:
nutmeg_counter = 0
for record in crime_records:
    #Exercise 1 -----
    #if "nutmeg" in record[20] or "Nutmeg" in record[20]:
        #nutmeg_counter += 1
        #print(nutmeg_counter, record[20])
    #-----
    #Exercise 2 -----
    #if "2050" in record[10]:
        #print("Crime listed against 2050:", record[20])
    #-----
    #Exercise 3 -----
    #if "pickle" in record[20]:
        #print("Full name of pickle wielder:", record[3], record[4])
    #-----
    #Exercise 4 -----
    #if "King's special pardon" in record[19]:
        #print("The King's special pardon was given to the person found guilty of:", record[20])
    #-----
    #Exercise 5 -----
    #if "Rescue" in record[20] and "gaol" in record[20]:
        #print("Occurrence of jail-break:", record[20])
    #-----
    #Exercise 6 -----
    if "forty hounds" in record[20]:
        print("Full name of one owner of forty hounds:", record[3], record[4])
    #-----
#Exercise 1 -----
#print("Total references to nutmeg:", nutmeg_counter)
#-----
We've already looked at creating our own counters in Python - adding one to a variable each time a requirement is met. For example, in exercise 1 we added a counter for the number of times "nutmeg" appeared in one of the data columns.
However, what if we wanted to find out which name/crime/verdict/sentence is most common? With the Python skills we have so far, this would involve finding every possible answer and then doing a count for each.
Don't worry - we will not be asking you to do this. Instead, we're going to introduce a new tool to our programs which can do it all for us. This new tool is called Counter, and it comes from a module called collections, which is part of Python's standard library, so there is nothing extra to install.
Example walk-through
First, we need to import this new tool. The best place to add this line is at the start of the program (where we are already importing the csv module):
from collections import Counter
We can now continue our program underneath the last exercise.
For this example, let's say we want to know the most common day on which crimes were allegedly committed. The first thing this new tool needs is a new list variable (called day_list) that stores all the values from a chosen column, in this case the one labelled dydd (day).
day_list = [record[11] for record in crime_records]
Notice how this new variable takes the relevant column from each record, using the same for loop structure we've been using for our own queries.
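The one-line form used for day_list is called a list comprehension. As a sketch, here it is side by side with the longer for loop and append() version it replaces, using a few hypothetical records where index 1 plays the role of the day column.

```python
# Hypothetical records: (row_id, day)
records = [("1", "12"), ("2", "3"), ("3", "12")]

# Long form: build the list one item at a time
day_list_loop = []
for record in records:
    day_list_loop.append(record[1])

# List comprehension: the same result in a single line
day_list = [record[1] for record in records]

print(day_list_loop == day_list)  # True
print(day_list)                   # ['12', '3', '12']
```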
Now we can use our new tool to store a count of each option in a variable called day_count:
day_list = [record[11] for record in crime_records]
day_count = Counter(day_list)
We can now write a new for loop that retrieves each value and its total number of occurrences, printing only the top result(s).
For the highest occurrence:
for value, count in day_count.most_common(1):
    print(count, "of day", value)
For the three highest occurrences:
for value, count in day_count.most_common(3):
    print(count, "of day", value)
For the ten highest occurrences:
for value, count in day_count.most_common(10):
    print(count, "of day", value)
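Counter counts how many times each value appears, and most_common(n) returns a list of (value, count) pairs with the largest count first. That is why the for loops above unpack two variables. The sketch below uses a short hypothetical day_list rather than the real column.

```python
from collections import Counter

# Hypothetical values standing in for the day column
day_list = ["12", "3", "12", "25", "3", "12"]
day_count = Counter(day_list)

print(day_count["12"])           # 3
print(day_count.most_common(2))  # [('12', 3), ('3', 2)]

for value, count in day_count.most_common(1):
    print(count, "of day", value)  # 3 of day 12
```

When two values have the same count, most_common() lists the one it met first, so a "top five" can depend on the order of the records when there are ties.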
What is the most common first name of the accused?
f_name_list = [record[3] for record in crime_records]
f_name_list = [record[3] for record in crime_records]
f_name_count = Counter(f_name_list)
f_name_list = [record[3] for record in crime_records]
f_name_count = Counter(f_name_list)
for value, count in f_name_count.most_common(1):
f_name_list = [record[3] for record in crime_records]
f_name_count = Counter(f_name_list)
for value, count in f_name_count.most_common(1):
    print("The most common first name is:", value)
The most common first name of the accused in these records is 'John'.
What are the five most common home counties for the accused?
county_list = [record[8] for record in crime_records]
county_count = Counter(county_list)
county_list = [record[8] for record in crime_records]
county_count = Counter(county_list)
for value, count in county_count.most_common(5):
    print("There are", count, "mentions of", value, "as home county for the accused")
The five most common home counties for the accused are:
What are the top three crime ids and what crimes do they represent?
crime_code_list = [record[10] for record in crime_records]
crime_code_count = Counter(crime_code_list)
for value, count in crime_code_count.most_common(3):
    print("There are", count, "accusations of crime_id:", value)
for record in crime_records:
    #Exercises 1-6 commented out
    #Exercise 9 -----
    if "1200" in record[10]:
        print(record[20])
The most common crime_id is 1200, which refers to sheep theft. The next most common is crime_id 3400 (breaking and entering), followed by crime_id 1460 (theft of food).
This is the complete program for this batch of exercises, excluding the CSV import.
#Exercise 7 -----
f_name_list = [record[3] for record in crime_records]
f_name_count = Counter(f_name_list)
for value, count in f_name_count.most_common(1):
    print("The most common first name is:", value)
#Exercise 8 -----
county_list = [record[8] for record in crime_records]
county_count = Counter(county_list)
for value, count in county_count.most_common(5):
    print("There are", count, "mentions of", value, "as home county for the accused")
#Exercise 9 -----
crime_code_list = [record[10] for record in crime_records]
crime_code_count = Counter(crime_code_list)
for value, count in crime_code_count.most_common(3):
    print("There are", count, "accusations of crime_id:", value)
for record in crime_records:
    if "1200" in record[10]:
        print(record[20])
    if "3400" in record[10]:
        print(record[20])
    if "1460" in record[10]:
        print(record[20])
Now that we've practised using a counter, we can start to introduce some mathematics to our Python program so that it calculates percentages for us.
Let's look at how to do this by calculating the percentage of the accused with the first name John.
We have already created the necessary list and count variables in exercise 7 for us to start with:
f_name_list = [record[3] for record in crime_records]
f_name_count = Counter(f_name_list)
We can now add a new variable to store the total number of records:
total_count = len(crime_records)
This variable stores the total number of crime records, found by measuring the length of the list with len().
We now need to tell the program to 'get' the count for John, then divide it by the total count and multiply by 100 to calculate the percentage. In Python, the mathematical symbols are + (add), - (subtract), / (divide) and * (multiply).
total_count = len(crime_records)
percent_John = (f_name_count.get("John", 0) / total_count) * 100
print(percent_John, "% of accused have the first name of John")
The 0 inside get() is a default value: if the program cannot find what you've requested, get() returns 0 instead.
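To see the default in action, here is a sketch with a small hypothetical Counter of first names. get() returns the stored count when the name is present and the default when it is not; the rest is the same percentage calculation as above.

```python
from collections import Counter

# Hypothetical first names standing in for f_name_list
f_name_count = Counter(["John", "Mary", "John", "David"])
total_count = 4

print(f_name_count.get("John", 0))       # 2
print(f_name_count.get("Gwenllian", 0))  # 0, the default, because this name is not in the Counter

percent_John = (f_name_count.get("John", 0) / total_count) * 100
print(percent_John)  # 50.0
```

A Counter also gives 0 for a missing name written as f_name_count["Gwenllian"], but get() with an explicit default works the same way on ordinary dictionaries too.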
To round the result to 2 decimal places before printing it, we wrap the calculation in the round() function:
total_count = len(crime_records)
percent_John = round((f_name_count.get("John", 0) / total_count) * 100, 2)
print(percent_John, "% of accused have the first name of John")
When you run this new program, you should find that 16.54% of the accused have the first name John.
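The same divide, multiply and round steps appear in every percentage exercise. If you prefer, you could wrap them in a small function. The name percentage() below is our own hypothetical helper, not part of Python; it also returns 0 when the total is 0, so an empty list can't cause a ZeroDivisionError.

```python
def percentage(part, whole, places=2):
    """Return part as a percentage of whole, rounded to `places` decimal places."""
    if whole == 0:
        return 0  # avoid dividing by zero for an empty list
    return round((part / whole) * 100, places)

print(percentage(1, 3))      # 33.33
print(percentage(5, 40, 1))  # 12.5
print(percentage(3, 0))      # 0
```

With it, the John example could be written as percentage(f_name_count.get("John", 0), total_count).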
We've put together some exercises to help you practise writing the code needed to calculate percentages.
What percentage of alleged crimes were thefts of sheep?
Hints
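Use crime_code_count from exercise 9 with the same percentage calculation as our John example (including the round() function) to find out the answer.
Walk-through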
percent_1200 = round((crime_code_count.get("1200", 0) / total_count) * 100, 2)
percent_1200 = round((crime_code_count.get("1200", 0) / total_count) * 100, 2)
print(percent_1200, "% of crimes involved the theft of sheep")
Answer
10.62% of allegations involved the theft of sheep.
What percentage of alleged crimes were committed in the county of Brecon?
Hints
Walk-through
county_of_crime_list = [record[15] for record in crime_records]
county_of_crime_count = Counter(county_of_crime_list)
county_of_crime_list = [record[15] for record in crime_records]
county_of_crime_count = Counter(county_of_crime_list)
percent_Brecon = round((county_of_crime_count.get("Brecon", 0) / total_count) * 100, 2)
print(percent_Brecon, "% of crimes occurring in Brecon")
Answer
13.04% of crimes occurred in the county of Brecon.
Approximately what percentage of the accused were found guilty?
Hints
Walk-through
verdict_list = [record[18] for record in crime_records]
verdict_count = Counter(verdict_list)
for value, count in verdict_count.most_common():
    print("There are", count, "verdicts of", value)
Using the length function, len(), on verdict_count lets us see just how many different answers there are in the verdict column:
print("There are", len(verdict_count), "different answers")
Unlike earlier questions, this one uses a column that isn't tidy. The verdicts were written by many different clerks over 100 years, so the wording varies massively. Because of that, we can't rely on automated counting tools - we must decide what counts as 'Guilty' and search for it manually. This is exactly what real data scientists do: define rules, accept limitations, and work with imperfect information.
guilty_counter = 0
for record in crime_records:
    #Previous exercises commented out
    #Exercise 12 -----
    if "Guilty" in record[18]:
        guilty_counter += 1
print("There are", guilty_counter, "verdicts containing 'Guilty'")
percent_Guilty = round((guilty_counter / total_count) * 100, 2)
print(percent_Guilty, "% of verdicts containing the term 'Guilty'")
Answer
A good approximation would be around 25%.
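The walk-through uses "Guilty" in record[18], which is true for any verdict that contains the word anywhere. That is a rule worth choosing deliberately: a verdict such as "Not Guilty" would also contain "Guilty", whereas .startswith() only matches at the beginning. The sketch below compares the two on some hypothetical verdict strings (made up, not taken from the dataset), after tidying each one with .strip() and .lower().

```python
# Hypothetical verdicts with inconsistent wording
verdicts = ["Guilty", "guilty ", "Not Guilty", "Guilty of part", "No true bill"]

contains_count = 0
starts_count = 0
for verdict in verdicts:
    cleaned = verdict.strip().lower()  # remove stray spaces and ignore capitals
    if "guilty" in cleaned:
        contains_count += 1
    if cleaned.startswith("guilty"):
        starts_count += 1

print(contains_count)  # 4
print(starts_count)    # 3
```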
This is the complete program for this batch of exercises, excluding the CSV import.
#Exercise 10 -----
total_count = len(crime_records)
crime_code_list = [record[10] for record in crime_records]
crime_code_count = Counter(crime_code_list)
percent_1200 = round((crime_code_count.get("1200", 0) / total_count) * 100, 2)
print(percent_1200, "% of crimes involved the theft of sheep")
#Exercise 11 -----
county_of_crime_list = [record[15] for record in crime_records]
county_of_crime_count = Counter(county_of_crime_list)
percent_Brecon = round((county_of_crime_count.get("Brecon", 0) / total_count) * 100, 2)
print(percent_Brecon, "% of crimes occurring in Brecon")
#Exercise 12 -----
verdict_list = [record[18] for record in crime_records]
verdict_count = Counter(verdict_list)
#for value, count in verdict_count.most_common():
    #print("There are", count, "verdicts of", value)
#print("There are", len(verdict_count), "different answers")
guilty_counter = 0
for record in crime_records:
    if "Guilty" in record[18]:
        guilty_counter += 1
#print("There are", guilty_counter, "verdicts containing 'Guilty'")
percent_Guilty = round((guilty_counter / total_count) * 100, 2)
print(percent_Guilty, "% of verdicts containing the term 'Guilty'")
Sometimes, to answer a query we need to create a sub-list from the dataset. This means we're creating a smaller set of records that all match a criterion. We can then use 'for loops' to explore these instead of the full list.
This allows us to study and analyse a group within the data. This example walk-through will look to answer the question "What percentage of female accused were issued the death penalty?". This single question could be answered using an 'if statement' within our existing 'for loop':
if "F" in record[5] and "Death" in record[19]:
However, if you wish to re-use the female records for multiple queries, creating a sub-list can save time, processing and coding, and reduce errors.
To create a new sub-list in which to store our female-only records, we first create an empty list variable:
female_records = []
Then, inside a for loop that goes through all the records, we add a new if statement:
female_records = []
for record in crime_records:
    if "F" in record[5]:
Inside this we tell our program to add the record to our female_records list:
female_records = []
for record in crime_records:
    if "F" in record[5]:
        female_records.append(record)
Now we have a new list that we can search through in a new for loop to count the death sentences:
female_records = []
for record in crime_records:
    if "F" in record[5]:
        female_records.append(record)
female_death_counter = 0
for record in female_records:
    if "Death" in record[19]:
        female_death_counter += 1
total_female_records = len(female_records)
female_death_percent = round((female_death_counter / total_female_records) * 100, 2)
print(female_death_percent, "% of accused women were sentenced to death")
This tells us that 4.04% of the accused women were sentenced to death.
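A sub-list can also be built with the list comprehension style used in the Counter exercises, by adding an if at the end. Here is a minimal sketch on hypothetical (sex, punishment) records, where index 0 and index 1 stand in for record[5] and record[19] in the real data.

```python
# Hypothetical records: (sex, punishment)
records = [("F", "Death"), ("M", "Fined"), ("F", "Whipped"), ("F", "Fined"), ("M", "Death")]

# Keep only the records where the sex column contains "F"
female_records = [record for record in records if "F" in record[0]]

female_death_counter = 0
for record in female_records:
    if "Death" in record[1]:
        female_death_counter += 1

female_death_percent = round((female_death_counter / len(female_records)) * 100, 2)
print(len(female_records), female_death_percent)  # 3 33.33
```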
The exercises below are designed to help you practise creating and using sub-lists.
What are the three most common female first names for the accused?
Hints
Walk-through
female_records = []
for record in crime_records:
    if "F" in record[5]:
        female_records.append(record)
female_name_list = [record[3] for record in female_records]
female_name_count = Counter(female_name_list)
for value, count in female_name_count.most_common(3):
    print("The most common female first names:", value, "with", count, "listings")
Answer
The most common female names of the accused in these records are: Mary, Elizabeth, and Margaret.
Using a sub-list of records for crime_id 1200, approximately what percentage of these crimes resulted in a transportation order?
Hints
Walk-through
crime_1200_records = []
for record in crime_records:
    if "1200" in record[10]:
        crime_1200_records.append(record)
transported_1200_counter = 0
for record in crime_1200_records:
    if "Transported" in record[19] or "transported" in record[19]:
        transported_1200_counter += 1
total_1200_records = len(crime_1200_records)
transported_1200_percent = round((transported_1200_counter / total_1200_records) * 100, 1)
print("Approximately", transported_1200_percent, "% of 1200 crimes resulted in transportation")
Answer
By searching a sub-list of crime_id 1200 records for references to transportation in the punishment column, you will get an answer of 6.7%.
Compare the approximate percentages of guilty verdicts for male and female accused.
Hints
Walk-through
female_records = []
male_records = []
for record in crime_records:
    if "F" in record[5]:
        female_records.append(record)
    elif "M" in record[5]:
        male_records.append(record)
female_guilty_counter = 0
for record in female_records:
    if "Guilty" in record[18]:
        female_guilty_counter += 1
total_female_records = len(female_records)
female_guilty_percent = round((female_guilty_counter / total_female_records) * 100, 1)
male_guilty_counter = 0
for record in male_records:
    if "Guilty" in record[18]:
        male_guilty_counter += 1
total_male_records = len(male_records)
male_guilty_percent = round((male_guilty_counter / total_male_records) * 100, 1)
print(female_guilty_percent, "% of women vs", male_guilty_percent, "% of men were found Guilty")
Answer
Approximately 35% of women were found guilty, compared with 23% of men.
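The male and female calculations above repeat the same lines. One way to avoid that, sketched here, is a small function that works on any sub-list. guilty_percent() is a hypothetical name of our own, and index 1 stands in for the verdict column (record[18] in the real data).

```python
def guilty_percent(records, verdict_index, places=1):
    """Percentage of records whose verdict column contains 'Guilty'."""
    if not records:
        return 0  # no records, so nothing to divide
    guilty = 0
    for record in records:
        if "Guilty" in record[verdict_index]:
            guilty += 1
    return round((guilty / len(records)) * 100, places)

# Hypothetical (sex, verdict) records
women = [("F", "Guilty"), ("F", "Acquitted")]
men = [("M", "Guilty"), ("M", "Acquitted"), ("M", "Acquitted"), ("M", "Acquitted")]

print(guilty_percent(women, 1), "% of women vs", guilty_percent(men, 1), "% of men")
```

In your own program you would call it as guilty_percent(female_records, 18) and guilty_percent(male_records, 18).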
Are the 5 most common crimes the same for men and women?
Hints
Walk-through
female_crimes_list = [record[10] for record in female_records]
female_crimes_count = Counter(female_crimes_list)
for value, count in female_crimes_count.most_common(5):
    print("The most common female crimes:", value, "with", count, "listings")
male_crimes_list = [record[10] for record in male_records]
male_crimes_count = Counter(male_crimes_list)
for value, count in male_crimes_count.most_common(5):
    print("The most common male crimes:", value, "with", count, "listings")
Answer
No, the top five crimes allegedly committed by women are not the same as those for men.
The most common crime codes/ids for women were: 1500, 1490, 1460, 3400, and 1540.
The most common crime codes/ids for men were: 1200, 3400, 1460, 1160, and 1260.
Feel free to explore the data further to discover which crimes these codes refer to.
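To compare two top-five lists without reading them side by side, you could turn the codes into sets and use & (in both) and - (in one but not the other). The codes and counts below are hypothetical; in your program you would use female_crimes_count and male_crimes_count.

```python
from collections import Counter

# Hypothetical crime codes standing in for female_crimes_list and male_crimes_list
female_crimes_count = Counter(["101", "101", "101", "102", "102", "103"])
male_crimes_count = Counter(["104", "104", "104", "102", "102", "105"])

# Keep just the codes from each top-five list
female_top = {code for code, count in female_crimes_count.most_common(5)}
male_top = {code for code, count in male_crimes_count.most_common(5)}

print("In both:", sorted(female_top & male_top))
print("Women only:", sorted(female_top - male_top))
print("Men only:", sorted(male_top - female_top))
```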
This is the complete program for this batch of exercises, excluding the CSV import.
female_records = []
male_records = []
crime_1200_records = []
for record in crime_records:
    if "F" in record[5]: #for exercises 13, 15 and 16
        female_records.append(record)
    elif "M" in record[5]: #for exercises 15 and 16
        male_records.append(record)
    if "1200" in record[10]: #for exercise 14
        crime_1200_records.append(record)
#Exercise 13 -----
female_name_list = [record[3] for record in female_records]
female_name_count = Counter(female_name_list)
for value, count in female_name_count.most_common(3):
    print("The most common female first names:", value, "with", count, "listings")
#Exercise 14 -----
transported_1200_counter = 0
for record in crime_1200_records:
    if "Transported" in record[19] or "transported" in record[19]:
        transported_1200_counter += 1
total_1200_records = len(crime_1200_records)
transported_1200_percent = round((transported_1200_counter / total_1200_records) * 100, 1)
print("Approximately", transported_1200_percent, "% of 1200 crimes resulted in transportation")
#Exercise 15 -----
female_guilty_counter = 0
for record in female_records:
    if "Guilty" in record[18]:
        female_guilty_counter += 1
total_female_records = len(female_records)
female_guilty_percent = round((female_guilty_counter / total_female_records) * 100, 1)
male_guilty_counter = 0
for record in male_records:
    if "Guilty" in record[18]:
        male_guilty_counter += 1
total_male_records = len(male_records)
male_guilty_percent = round((male_guilty_counter / total_male_records) * 100, 1)
print(female_guilty_percent, "% of women vs", male_guilty_percent, "% of men were found Guilty")
#Exercise 16 -----
female_crimes_list = [record[10] for record in female_records]
female_crimes_count = Counter(female_crimes_list)
for value, count in female_crimes_count.most_common(5):
    print("The most common female crimes:", value, "with", count, "listings")
male_crimes_list = [record[10] for record in male_records]
male_crimes_count = Counter(male_crimes_list)
for value, count in male_crimes_count.most_common(5):
    print("The most common male crimes:", value, "with", count, "listings")
This activity has explored several different ways to use Python to search and begin analysing data. Along the way, you've seen how real-world datasets can be inconsistent, incomplete, or unexpectedly complex - and how, in those situations, simpler approaches often work best.
Working with historical court records also highlighted another important aspect of real data: when we adapt a dataset to make it age-appropriate, we inevitably lose part of the full picture. In the filtered version used here, the most common offences are theft of sheep, breaking and entering, and theft of food. In the complete public record, however, the most frequent offences are assault, riot involving assault, and then theft of sheep.
Across these exercises, you've strengthened your understanding of how to import data from a CSV file, retrieve information using Python, and apply additional tools and methods to answer specific questions, identify patterns, and generate simple statistics.
The dataset used in this activity is part of the public collections held by the National Library of Wales, and many more are available. If a particular topic interests you, try exploring another dataset and see what conclusions you can draw using the skills you've developed here.
We hope you've enjoyed this activity.