This activity aims to introduce learners to the idea of data science and how it is a natural progression from analysis skills they are already developing in science lessons.
Click on the 'next' button below to get started.
A group of students are investigating how much rainfall their school gets each month throughout a year.
To do this, they have set up three gauges in different locations on the school grounds. They've chosen three locations to provide an average rainfall value across the area, and for repetition to increase fairness and accuracy.
The results are then collected at the end of each month and logged in the results table below.
Monthly rainfall (in millimetres) from three gauges
| Month | Gauge A | Gauge B | Gauge C | Average |
|---|---|---|---|---|
| January | 142 | 138 | 145 | |
| February | 108 | 112 | 110 | |
| March | 95 | 90 | 100 | |
| April | 78 | 82 | 80 | |
| May | 74 | 70 | 76 | |
| June | 88 | 84 | 86 | |
| July | 92 | 68 | 95 | |
| August | 115 | 108 | 112 | |
| September | 128 | 135 | 130 | |
| October | 162 | 158 | 165 | |
| November | 170 | 175 | 168 | |
| December | 155 | 150 | 160 | |
You will need to copy the results table and complete the last column by calculating the mean average for each month.
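If you would like to check your hand calculations, the mean for each month can also be worked out with a short program. The following is a minimal sketch in Python, not part of the original activity; the names `readings` and `monthly_means` are chosen here for illustration, and the values are copied from the results table.

```python
# Monthly readings in millimetres, copied from the results table.
# Each tuple holds (Gauge A, Gauge B, Gauge C).
readings = {
    "January": (142, 138, 145), "February": (108, 112, 110),
    "March": (95, 90, 100), "April": (78, 82, 80),
    "May": (74, 70, 76), "June": (88, 84, 86),
    "July": (92, 68, 95), "August": (115, 108, 112),
    "September": (128, 135, 130), "October": (162, 158, 165),
    "November": (170, 175, 168), "December": (155, 150, 160),
}


def monthly_means(data):
    """Return the mean reading for each month, rounded to one decimal place."""
    return {
        month: round(sum(values) / len(values), 1)
        for month, values in data.items()
    }


means = monthly_means(readings)
for month, mean in means.items():
    print(f"{month}: {mean} mm")
```

The mean is the sum of the three readings divided by three, so February works out as (108 + 112 + 110) / 3 = 110 mm.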
Using the average values for each month, draw an appropriate graph.
Once you've finished your graph, click on the next button below to start the analysis.
When writing up science experiments, you need to follow your results with an analysis. This is where we write about the results and what they show us.
Write a paragraph explaining the results. See the list below for ideas on what you could include.
So far, everything should be familiar to you - we have been following the standard format of a science experiment report. The next section will start to show you how to apply data science principles to the same information.
Let's begin our journey towards thinking like a data scientist by questioning the data.
Click on any of the questions below to find out more about why asking this is important.
You've been provided with a basic description of the experiment. However, the method of collection/measuring is not specified. Consider how this may have affected the results.
The reason for repetition within an experiment is to take multiple measurements of the same thing to improve accuracy. It can also highlight anomalous results and reveal issues within the methodology.
The data provided may seem similar - we can examine it more deeply by producing a comparison graph instead of only using the average.
This graph shows that Gauge B's line does not match the others. Interestingly, this is also the gauge that produced both the lowest and the highest reading in the whole table.
This is information we lose when we only analyse the average value - you may have picked up on the anomalous July result for Gauge B previously, but it is clearer now.
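The same comparison can be made with numbers as well as with a graph. The sketch below is one possible approach, not the method used by the activity: for each month it measures the spread (highest reading minus lowest reading), then, for the month with the widest spread, it finds the gauge furthest from the middle (median) reading. The function names `spread` and `odd_one_out` are made up for this example.

```python
from statistics import median

GAUGES = ("A", "B", "C")

# Monthly readings in millimetres, copied from the results table.
readings = {
    "January": (142, 138, 145), "February": (108, 112, 110),
    "March": (95, 90, 100), "April": (78, 82, 80),
    "May": (74, 70, 76), "June": (88, 84, 86),
    "July": (92, 68, 95), "August": (115, 108, 112),
    "September": (128, 135, 130), "October": (162, 158, 165),
    "November": (170, 175, 168), "December": (155, 150, 160),
}


def spread(values):
    """Difference between the highest and lowest reading in one month."""
    return max(values) - min(values)


def odd_one_out(values):
    """Label of the gauge whose reading is furthest from the median."""
    middle = median(values)
    distances = [abs(value - middle) for value in values]
    return GAUGES[distances.index(max(distances))]


widest_month = max(readings, key=lambda month: spread(readings[month]))
widest_spread = spread(readings[widest_month])
suspect_gauge = odd_one_out(readings[widest_month])

print(f"Widest spread: {widest_month} ({widest_spread} mm)")
print(f"Gauge furthest from the others: Gauge {suspect_gauge}")
```

With the table's values this flags July, where the readings differ by 27 mm. A large spread does not prove a reading is wrong; it only tells us where to look more closely.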
What sort of factors could answer why these gauge readings do not match more closely?
Think about how temperature and other weather conditions may affect the results. Also, think about how positioning of gauges related to buildings or trees may affect such readings too.
This question looks at how further data collection could confirm or rule out the additional variables considered above.
What would you measure alongside the rainfall gauges to include other factors?
We do not know what the students who carried out this experiment were looking to prove. This may have introduced a form of bias to their techniques, location selection, and/or data collection.
This is why a hypothesis is important when discussing data. Who carried out the experiment, and why, may influence the approach, the expectations, and how the results are selected/represented.
For example, if we wanted to know if bacon is safe to eat, should we use data provided by a butcher, the NHS, or a charity? How might each of these groups influence (intentionally or not) those results, and why?
Bias in science experiments refers to the incorrect, inaccurate and/or influenced methods used to collect data. With the information available for this experiment, we can consider these possible forms:
Produce a new method for attempting your own version of this experiment after considering all the above questions.
Data science is not just about questioning data; that is only the first step in the process. The next section looks at creating rules and/or models from data.
One of the aims of data science is to take data to create rules and models to aid with predictions and understanding.
The results gathered in this experiment are very limited - we only have one year of monthly data points with no other factors/variables measured. However, there is still enough to make some general rules, such as:
The data limitations do not allow more detailed rules - and the ones we can design should not be taken as accurate until several more years' worth of data has been collected to check.
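One way to see how well a rule fits is to test it against every month and list the exceptions. The sketch below uses a hypothetical rule invented for this illustration (it is not one of the activity's own examples): "months from September to February have a mean rainfall above 100 mm, and the other months do not". The threshold of 100 mm and the names `WETTER_MONTHS` and `rule_exceptions` are assumptions.

```python
# Monthly readings in millimetres, copied from the results table.
readings = {
    "January": (142, 138, 145), "February": (108, 112, 110),
    "March": (95, 90, 100), "April": (78, 82, 80),
    "May": (74, 70, 76), "June": (88, 84, 86),
    "July": (92, 68, 95), "August": (115, 108, 112),
    "September": (128, 135, 130), "October": (162, 158, 165),
    "November": (170, 175, 168), "December": (155, 150, 160),
}
means = {month: sum(v) / len(v) for month, v in readings.items()}

# Hypothetical rule, for illustration only.
WETTER_MONTHS = {"September", "October", "November",
                 "December", "January", "February"}
THRESHOLD_MM = 100


def rule_exceptions(monthly_means):
    """Return the months where the rule's prediction does not match the data."""
    return [
        month
        for month, mean in monthly_means.items()
        if (month in WETTER_MONTHS) != (mean > THRESHOLD_MM)
    ]


exceptions = rule_exceptions(means)
print("Months that break the rule:", exceptions)
```

Any month that appears in the list is a reminder that a rule built from a single year of data is a starting point to test, not a fact.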
Can you write a rule using this data? Perhaps by reversing the wording of one of the above examples or, perhaps, you've spotted a different pattern.
We shall cover more about rules and data modelling in our Data Modelling activity.
By creating rules and/or models, data scientists are able to make predictions.
These predictions are only as accurate as the data available to them.
We've already looked at how limited the provided dataset is, which makes any rules, models, and predictions unreliable.
However, as part of this activity, we shall use the data we have to make some predictions for the next year's rainfall at the school.
Can you predict which season will be driest/wettest?
Which month will see the least/most rain?
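The simplest possible prediction is to assume that next year will look like this year. The sketch below follows that assumption, which is a very weak model given only one year of data. It also assumes northern hemisphere meteorological seasons (winter is December to February, and so on); the `SEASONS` grouping is not stated in the activity.

```python
# Monthly readings in millimetres, copied from the results table.
readings = {
    "January": (142, 138, 145), "February": (108, 112, 110),
    "March": (95, 90, 100), "April": (78, 82, 80),
    "May": (74, 70, 76), "June": (88, 84, 86),
    "July": (92, 68, 95), "August": (115, 108, 112),
    "September": (128, 135, 130), "October": (162, 158, 165),
    "November": (170, 175, 168), "December": (155, 150, 160),
}
means = {month: sum(v) / len(v) for month, v in readings.items()}

# Assumed season grouping (meteorological seasons, northern hemisphere).
SEASONS = {
    "Winter": ("December", "January", "February"),
    "Spring": ("March", "April", "May"),
    "Summer": ("June", "July", "August"),
    "Autumn": ("September", "October", "November"),
}

# Total of the monthly means for each season.
season_totals = {
    season: sum(means[month] for month in months)
    for season, months in SEASONS.items()
}

driest_season = min(season_totals, key=season_totals.get)
wettest_season = max(season_totals, key=season_totals.get)
driest_month = min(means, key=means.get)
wettest_month = max(means, key=means.get)

print(f"Driest season: {driest_season}, wettest season: {wettest_season}")
print(f"Driest month: {driest_month}, wettest month: {wettest_month}")
```

Compare the program's output with your own predictions, and think about what extra data you would need before trusting either.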
We hope that you've enjoyed this activity and have started to recognise how data science is an evolution of the analysis you already do regularly in the classroom.
If you would like an additional extension activity before moving on, please click on the next button below. Otherwise, feel free to click on the data science page button at the very bottom of the screen to explore the subject further.
Using everything you've learned through questioning each stage of the experiment and data provided, can you write your own version of the experiment for your school or neighbourhood? We have provided a quick checklist below.
Introduction - A brief description of the experiment.
Hypothesis - What are you trying to prove?
Method - The equipment, the locations, and how the experiment will be carried out.
Results - What measurements will be taken, when, and how? Recording methods and table design should be considered for this section.
Analysis - How will you process this data to prove/disprove your hypothesis? What predictions and/or rules do you hope to be able to make?
The ultimate task for this entire exercise is to write an evaluation of the original experiment, using your analysis and experiment re-design to assist.