Testbase Standardised Tests – Standard Setting
Information for schools about the standard setting process
To set the standards for our tests, we used a widely used, industry-standard approach based on the Angoff Method (more details in the Appendix). We combined two reference points to set the ‘working at the expected standard’ threshold for each of our tests:
- Expert judgement from a panel of primary curriculum experts and practising teachers as to which skills and knowledge would be expected to be secure at the point of assessment, i.e. a criterion-referenced standard.
- The mark distribution for each test, taking into account the percentage of pupils who have achieved the expected standard in previous tests and trials, i.e. a norm-referenced standard.
These thresholds and the mark data were then used to calculate standardised scaled scores, in a similar style to the Key Stage 2 national curriculum tests. A scaled score of 100 marks the threshold for ‘working at the expected standard for this point in the academic year’, so a pupil who achieves a scaled score of 100 is just at this threshold.
Interpreting scaled scores
| Scaled score | Interpretation |
| --- | --- |
| None | If no scaled score is given, this is usually because a pupil scored very few marks and so we can’t reliably give a scaled score |
| < 80 | Scores indicate that pupils might not currently be working within year group expectations |
| 80 – 99 | Scores within this band indicate that pupils could still need to secure knowledge and skills before we can be confident that they are meeting the expected standard |
| 100 | Scores indicate that pupils could be on track to meet the expected standard for their academic year |
| 101 – 120 | Scores within this band indicate that pupils are increasingly secure in their knowledge and understanding and are likely to meet the expected standard for their academic year |
| > 120 | These pupils are performing at or above the 80th to 90th percentile (i.e. in the top ~10–20% of pupils, depending on the test). This may indicate that they are working at the ‘higher standard’ and will need activities that stretch and challenge them if they are to continue to progress |
Please note: The above applies to Key Stage 2 only. For Key Stage 1, there might be greater variation in the scaled scores due to the small number of marks on the papers combined with the fact that pupils of this age are still learning underlying skills.
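For schools that process results in a spreadsheet export or script, the bands above can be expressed as a simple lookup. This is only a minimal sketch: the function name and the short band labels are our own illustrative paraphrases, not part of any Testbase or MERiT tool, and it assumes whole-number Key Stage 2 scaled scores.

```python
def interpret_scaled_score(score):
    """Return a short label for a Key Stage 2 scaled score.

    `score` is a whole number, or None when no scaled score was reported.
    The labels are illustrative paraphrases of the interpretation table.
    """
    if score is None:
        return "no scaled score reported"
    if score < 80:
        return "might not be working within year group expectations"
    if score < 100:
        return "still securing knowledge and skills"
    if score == 100:
        return "just at the expected standard threshold"
    if score <= 120:
        return "increasingly secure"
    return "possibly working at the higher standard"


# Example: label a small, made-up set of results.
for s in (None, 76, 93, 100, 112, 124):
    print(s, "->", interpret_scaled_score(s))
```

The boundaries follow the table exactly: 99 falls in the 80 – 99 band, 100 is treated on its own, and 120 is the top of the 101 – 120 band.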
Statistically, we would expect approximately 2 out of 3 pupils to have a scaled score between 85 and 115. However, unlike the SATs, we will report scaled scores beyond the 80 – 120 range to ensure teachers have the fullest information available to them. Our ‘floor’ and ‘ceiling’ are 70 and 130, though you should bear in mind that scaled scores at these extremes are less meaningful.
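The ‘2 out of 3’ figure is what one would expect if scaled scores were roughly normally distributed with a mean of 100 and a standard deviation of 15. That distribution is an assumption made here for illustration only; it is not a statement of how the scores are actually calculated. Under that assumption, the arithmetic can be checked as follows:

```python
from statistics import NormalDist

# Assumption for illustration: scaled scores ~ Normal(mean=100, sd=15).
scores = NormalDist(mu=100, sigma=15)

# Expected share of pupils scoring between 85 and 115 (within one SD).
share_85_to_115 = scores.cdf(115) - scores.cdf(85)

# Expected share of pupils between the floor (70) and ceiling (130).
share_70_to_130 = scores.cdf(130) - scores.cdf(70)

print(round(share_85_to_115, 2))  # 0.68, i.e. roughly 2 out of 3 pupils
print(round(share_70_to_130, 2))  # 0.95
```

Under the same assumption, only around 5% of pupils would fall outside the 70 to 130 range, which is consistent with scores at these extremes being less meaningful.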
Teachers can therefore use the scaled scores from tests throughout the academic year to help monitor pupil progress, i.e. they can see whether or not a pupil remains within the expected standard from test to test and if they are making expected progress over the year.
Scaled score conversion tables can be downloaded from the Testbase MERiT page.
Age-based standardised score
Unlike the scaled scores, age-based scores take into account the month of birth of pupils within their academic year and are based on our analyses of recent test data. They are intended to provide teachers with a clearer indicator of whether an individual pupil is achieving as expected for their age and, when compared over time, whether they are making the expected progress for their age.
Please note that because the age-based scores are derived from the performance of different age-groups in our test data, there are two important details to remember when referring to them:
- They are not criterion-referenced, unlike the scaled scores. This means that an age-based standardised score of 100 should not be interpreted as having met any age-related threshold or criteria.
- They are for comparison only, i.e. they should only be used in the context of the test to determine how a pupil is performing compared with other pupils of a similar age within their school year. An age-based standardised score of 100 therefore reflects that a pupil is performing as expected for the month of their birth within the academic year.
Pupils who are not within the expected age range for a given school year will not receive an age-based standardised score, for example pupils with delayed entry to school, or those in higher school years taking tests intended for a lower school year.
Monitoring Pupil Progress
Because the scaled scores are calculated the same way for each test, a pupil who is just reaching the expected standard at each testing point in the academic year will achieve a scaled score of 100, i.e. their scaled score will not increase across the academic year. This is because the thresholds reflect an expectation of pupils gaining new knowledge, skills, etc. from the teaching and practice they experience from testing point to testing point.
Please also remember that making expected progress is separate from reaching the expected standard. A pupil who has not attained a scaled score of 100 can still show expected progress from one test to the next if their scaled score remains similar across the academic year.
Variation of a few points up or down from one test to the next is to be expected. However, a large drop in performance (a lower scaled score than the previous test) suggests a pupil may not have made progress since the last test.
Equally, a significant jump in scaled score from the previous test would indicate greater progress than might otherwise be expected.
The same principles apply to the age-based standardised scores too, i.e. if a pupil’s age-based score remains very similar at each testing point in the year for a particular subject, then they are progressing at the expected rate for their age.
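The comparison described above can be sketched as a small helper. Note that this guidance does not define how many points count as ‘a few’, so the `tolerance` value below is a hypothetical placeholder that a school would need to choose for itself; the function name and labels are likewise illustrative.

```python
def describe_progress(previous_score, current_score, tolerance=5):
    """Compare two scaled (or age-based) scores from consecutive tests.

    `tolerance` is a hypothetical cut-off for 'a few points'; it is an
    assumption for this sketch, not a published Testbase figure.
    """
    change = current_score - previous_score
    if change < -tolerance:
        return "less progress than expected"
    if change > tolerance:
        return "more progress than expected"
    return "expected progress"


# A pupil below the expected standard can still show expected progress.
print(describe_progress(92, 94))    # expected progress
print(describe_progress(104, 95))   # less progress than expected
print(describe_progress(98, 109))   # more progress than expected
```

The first example shows a pupil who has not reached a scaled score of 100 but whose score is stable, which is what expected progress looks like on this scale.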
Please note: As these tests have been designed to judge whether pupils are meeting the expected standard, they might not cover the full range of attainment within your class. As such, pupils who are working below the expected standard, or are not achieving age-related expectations (ARE), might benefit from other types of assessment more suited to their needs. Pupils working at the higher standard would benefit from further, more challenging formative assessments to test the depth of their understanding and to ensure they continue to be engaged and make progress.
APPENDIX
Our Standard Setting Process
There were two panels for each subject, operating independently from each other. They were composed of experienced teachers, test developers and educational consultants, who had no involvement in setting the questions.
Panel members were individually asked to review each item on the tests (as well as review the test as a whole) and to make a judgement as to how many pupils (out of 100) who are working just at the expected standard at the point of assessment would get each question correct. This meant they had to take into account the point in each academic year that the tests are recommended to be used and what would be expected to have been taught at each point. The outcomes were used to create a criterion-referenced test threshold for each test.
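As an illustration of how item-level judgements of this kind are typically combined in an Angoff-style exercise, the sketch below averages the panel's estimates for each question and sums them across the test. It is a generic example under stated assumptions, not a description of our exact calculation: the function name and sample figures are invented, and weighting multi-mark questions by their mark value is an assumption.

```python
def angoff_threshold(judgements, marks_per_item):
    """Estimate a test threshold from panel judgements.

    judgements[j][i] is panel member j's estimate of how many pupils
    (out of 100) working just at the expected standard would get
    question i correct. marks_per_item[i] is the marks for question i.
    """
    n_judges = len(judgements)
    threshold = 0.0
    for i, marks in enumerate(marks_per_item):
        # Average the panel's estimates for this question.
        mean_out_of_100 = sum(judge[i] for judge in judgements) / n_judges
        # Expected marks a borderline pupil would gain on this question.
        threshold += marks * mean_out_of_100 / 100
    return threshold


# Invented example: two panel members, three questions (1, 1 and 2 marks).
panel = [
    [80, 50, 30],
    [70, 60, 20],
]
print(round(angoff_threshold(panel, [1, 1, 2]), 2))  # 1.8
```

In this made-up case a pupil working just at the expected standard would be expected to score about 1.8 of the 4 available marks, which would then be rounded and reviewed against pupil performance data before a final threshold is agreed.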
An expert panel meeting was then organised, at which each test threshold was discussed and compared with the live, incoming pupil performance data from MERiT, as well as historical item performance from previous years (to give a norm-referenced viewpoint). The final test thresholds were then agreed amongst panel members based on both expert judgement and analysis of pupil test performance.
This process is similar to the one used by the Standards and Testing Agency (STA).
Outcomes
Testbase considered the recommendations of the two panels before setting the final standards.