This activity is your first step into using Python to explore data.
We'll work through everything together: bringing a spreadsheet into your program, understanding how the data is stored, and writing simple code to pull out the information you're interested in. Each example builds gently on the last, helping you grow your confidence as you go.
By working through this activity, you'll learn how to:
Click on the next button to get started.
A Welsh STEM Outreach Team has been running a new programme where schools can order hands-on classroom kits to support science and technology lessons. Each kit focuses on a different area of STEM learning - Electronics, Robotics, and General Science.
Over the past few weeks, schools have been placing orders either online or in person at community events. The team now needs help organising and analysing this information so they can understand demand, plan stock levels, and prepare future deliveries.
You've been given a small sample of recent orders.
Each row in the below table represents one order from one school, showing:
| OrderID | Channel | ElectronicsKits | RoboticsKits | ScienceKits |
|---|---|---|---|---|
| 201 | Online | 42 | 15 | 63 |
| 202 | InPerson | 58 | 22 | 41 |
| 203 | Online | 37 | 18 | 55 |
| 204 | InPerson | 12 | 47 | 72 |
| 205 | Online | 29 | 33 | 88 |
| 206 | InPerson | 64 | 11 | 49 |
| 207 | Online | 23 | 9 | 70 |
| 208 | Online | 51 | 27 | 46 |
| 209 | InPerson | 34 | 14 | 39 |
| 210 | InPerson | 19 | 31 | 77 |
Please Note: As we are planning to use this dataset inside a Python program, all spaces have been removed. As an alternative you could replace them with underscores, for example 'In_Person'.
1. Copy the above table into a spreadsheet.
2. Which order includes the highest number of Robotics Kits?
3. How many Science Kits were ordered in total?
4. How many orders were placed online vs in-person?
A Python program cannot access data in our current spreadsheet. So, we need to change the format into something called a .csv file - which Python can read.
CSV stands for Comma-Seperated Values.
Instead of storing data with colours, borders, formulas, or formatting (like Excel does), a CSV stores only the raw text of the table:
That's it. No hidden features. No styling. Just plain text.
Because it's so simple, almost every programming language - including Python - can read CSV files easily
It also reduces the file size substantially from that of a spreadsheet.
Let's return to our spreadsheet, make sure we've removed any additions (such as functions, calculations, or totals), and then select the 'save as' option from the file menu.
Create a folder for this activity's files in a place you'll remember.
In the 'save as type' dropdown menu, select: CSV(comma delimited)
Give your file a name (without spaces), such as stem_orders.csv
Click save.
If you get a warning message warning about the loss of features, this is normal (CSV files only store plain text), choose yes to continue.
If you are curious about how the data in a CSV file differs from that in a spreadsheet, you can open your file using notepad.
We are now going to look at setting up a Pyton program to use our CSV file but first, let us run through some options for software in which to write a Python program.
If you already have an editor, feel free to scroll down to the bottom of this section to click on the next button.
There are a large number of Python editors available.
Please make sure you have permission before installing new software. There are also in-browser options that do not require admin permissions.
Here are our recommendations:
This is a clean, simple editor perfect for complete beginners.
It requires installation from thonny.org.
A more modern, flexible coding environment that is often already installed on Microsoft systems.
Has a good selection of tools and add-ins for beginners through to experts (this whole website was written using the VS Code editor).
Can be downloaded and installed from Visual Studio. Please note that you will also need to install the Python extension.
Has an in-browser version available at vscode.dev if unable to install.
Preferred by data scientists with previous experience of coding and Python.
Requires installation from spyder-ide.org.
An alternative to Spyder for data scientists with previous Python experience.
Can be downloaded and installed from jupyter.org or used in-browser (test version) from the same link.
It is also possible to use online as part of a Google Collaboration without installation.
Please choose the one that best suits your needs and installation permissions.
Now that you've chosen a Python editor, let's look at how to import data from a CSV file into your program
First thing to do is to create a new python file and save it in the same folder as your CSV file.
We'll build the program step by step, explaining what each part does and how to test that everything is working.
For the full program without explanation, please scroll down to step 5.
Step 1: Importing tools/libraries
Python programs often use extra tools, called libraries, to help them do specific jobs.
In this case we need to import (use) a library called 'csv' to provide us with the code to handle our CSV file.
Step 2: Adding a variableWe then need to create a variable (a named place to store the data). We shall call this variable stem_orders and tell the program this variable will be a list using empty square brackets.
Step 3: Accessing our CSVNow we're ready to tell our program about the CSV file it will need to use for this.
Before entering this next snippet of code, you will need to make sure that your Python file is saved in the same folder as the CSV you've already created.
You will also need to know the exact name you gave your CSV file. For these examples the file was called 'Stem_Orders.csv'.
The code below adds three new lines.
1 import csv
2
3 stem_orders = []
4
5 with open("Stem_Orders.csv", newline="") as file:
6 reader = csv.reader(file)
7 next(reader)
Indentations: Lines 6 and 7 are indented because they belong to the 'with' block above them. In Python, indentation shows which instructions are grouped together.
The most important rule is to use only spaces or only tabs, never a mix.
Step 4: Creating the listNow that our program has opened the CSV file, it needs to read each row, understand what each value represents, and add those values to our 'stem_orders' list.
We shall also give each value a clear variable name, so our program is easier to read and understand later.
1 import csv
2
3 stem_orders = []
4
5 with open("Stem_Orders.csv", newline="") as file:
6 reader = csv.reader(file)
7 next(reader)
8
9 for row in reader:
10 order_id = int(row[0])
11 channel = row[1]
12 electronics = int(row[2])
13 robotics = int(row[3])
14 science = int(row[4])
15 stem_orders.append((order_id, channel, electronics, robotics, science))
For loops: Line 9 starts a 'for loop'. This tells Python to go through the CSV one row at a time. For each row, it retrieves the values, converts the ones we need as numbers, and adds everything to our list. The loop repeats until there are no rows left.
List Order: Python starts counting at 0. This means:
Integer vs String:When Python reads data from a CSV, it treats everything as a string - text inside quotes. But some of our values are numbers that we want to use for calculations. To turn a string into a whole number, we use int(). We don't convert channel because it contains words ("Online" or "In_Person"), not numbers.
Step 5: TestingAt the moment, our program doesn't show anything when we run it. That's because everything it has done so far happens in the background.
Testing is an important part of programming. Before we move on, we want to check that our program has opened the file correctly and stored the data in our list
To check our work, we can use the 'print()' command. This tells Python to display something in the shell or terminal - in this case, the contents of our 'stem_orders' list.
1 import csv
2
3 stem_orders = []
4
5 with open("Stem_Orders.csv", newline="") as file:
6 reader = csv.reader(file)
7 next(reader)
8
9 for row in reader:
10 order_id = int(row[0])
11 channel = row[1]
12 electronics = int(row[2])
13 robotics = int(row[3])
14 science = int(row[4])
15 stem_orders.append((order_id, channel, electronics, robotics, science))
16
17 print(stem_orders)
If you now click the run command inside your editor you should see the data from our CSV reproduced in the shell/terminal window of your editor. It should look like this:
[(201, 'Online', 42, 15, 63), (202, 'InPerson', 58, 22, 41), (203, 'Online', 37, 18, 55),
(204, 'InPerson', 12, 47, 72), (205, 'Online', 29, 33, 88), (206, 'InPerson', 64, 11, 49),
(207, 'Online', 23, 9, 70), (208, 'Online', 51, 27, 46), (209, 'InPerson', 34, 14, 39),
(210, 'InPerson', 19, 31, 77)]
Depending on your editor, the output may appear on one long line or wrap across several lines. Both are normal.
Step 6: DebuggingIt is very rare for a program to work perfectly on the first try, especially when you're new to it.
We have put together the common errors which may stop your code from working using the error messages produced inside the Thonny editor.
Many Python editors give additional information such as line number, suggested syntax, or the problem variable to help you locate the error.
Debugging is a normal part of programming. If you get an error, read it slowly - it often tells you exactly where to look.
Now that your data is stored in a list, you can loop through it to calculate totals or answer questions. Each row contains the values from one order, and by using the correct position (index) you can choose which column you want to work with. Python will then do the adding, counting, or comparing for you.
Before we start, let us remove line 17 from our code - the testing instruction to print the whole dataset. This is no longer needed - we know the data is there now.
Reminder: Each order in our stem_orders list looks like (order_id, channel, electronics, robotics, science), with order[0] referring to order_id, order[1] being channel, order[2] is electronics, and so on...
Total number of electronic kits ordered
How can we use our program to find out the total number of Electronics Kits ordered?
We will need to create a new variable to store this total in. Initially we shall set its value to 0.
We can then use a for loop to go through each row of data within our list, adding the number from the electronics column to our new variable, producing a total.
Finally, we need to have that variable printed so we can see it and have a string of words in front of it, so we know what is being printed in the shell/terminal.
17 total_electronics = 0
18
19 for order in stem_orders:
20 total_electronics += order[2]
21
22 print("Total electronics kits:", total_electronics)
For this example, we chose to use the word 'order' instead of 'row' in the for loop as each set of data within the stem_orders list represents an order. It also separates our usage of the data from our importing of it.
When creating or changing a variable we use an equals sign. However, if we want to add to a variable's existing value/list/string we use +=, this is easier than writing:
Once ready, run your program and hopefully you should see the following in your shell/terminal window:
>>> %Run stem_orders.py
Total electronics kits: 369
>>>
Total number of robotics kits
We can use the same structure with a renamed variable and string.
Remember: the robotics kits are in a different column of the dataset.
24 total_robotics = 0
25
26 for order in stem_orders:
27 total_robotics += order[3]
28
29 print("Total robotics kits:", total_robotics)
>>> %Run stem_orders.py
Total electronics kits: 369
Total robotics kits: 227
>>>
Number of online vs in-person orders
To determine the number of online vs in-person orders we need to learn how to check for strings.
We know that our data only has two options within this column: "Online" or "InPerson". This means we can use an if statement with an else option.
We will need two variables to store the totals for each and have a counter that adds one to the correct variable.
31 online_count = 0
32 inperson_count = 0
33
34 for order in stem_orders:
35 if order[1] == "Online":
36 online_count += 1
37 else:
38 inperson_count += 1
39
40 print("Online orders:", online_count)
41 print("In-person orders:", inperson_count)
Running this program should now produce the following in your shell/terminal.
>>> %Run stem_orders.py
Total electronics kits: 369
Total robotics kits: 227
Online orders: 5
In-person orders: 5
>>>
The if-statement we've used here only checks our data against the word "Online", with no double checking that if it's not that, it must be "InPerson". For example, if someone had mistyped "Online" as "online" in the original spreadsheet, this does not match with the if statement and would therefore be counted as an In-person order.
To catch these incorrectly entered channel names without skewing the counts, you could use an if-statement with an else if and an else section:
Highest number of science kits in a single order
Instead of totalling the data in a column, we shall write a program that finds the highest value in it.
Using the same for loop structure as all the previous examples along with a single if-statement, we can produce this short bit of code to reach the answer.
43 highest_science = 0
44
45 for order in stem_orders:
46 if order[4] > highest_science:
47 highest_science = order[4]
48
49 print("Highest science order:", highest_science)
This program only changes the 'highest_science' variable if the current orders value is higher than the previous.
When you run your program now, the resulting shell/terminal screen should contain the following:
>>> %Run stem_orders.py
Total electronics kits: 369
Total robotics kits: 227
Online orders: 5
In-person orders: 5
Highest science order: 88
>>>
In this section we will provide you with additional challenges using the same dataset. It is your decision whether to use our hints, walk-throughs and/or answers to assist.
How many science kits have been ordered in total?
Which order id has the most items
If electronics kits cost £45.49, science kits cost £39.26, and robotics kits cost £54.98 - How much is the most expensive order?
Using the same cost values, what is the average spend for each kit type?
Which in-person order id has the highest total cost?
What is the total amount spent across all orders?
This activity might have felt like turning something quick and simple in a spreadsheet into a much more complex and time-consuming process.
However, there are good reasons why learning to work with CSVs and Python is worthwhile:
In short: Although the first attempt may feel slow or complicated, the skills you've developed here will help you work confidently with much bigger and more complex datasets in the future.