Every year, over 16% of the Australian population enrol in schools across the country. Some of these students choose to pay for their education, whilst others don’t. Inevitably, a discussion will arise regarding whether there is any real academic advantage in a more expensive education; the age-old private vs public debate.
Due to the swift rise of data science techniques and resources over the previous decade, an understanding of this problem through data science is now more attainable than ever. This article seeks to assess the relationship between socio-educational factors and a student’s academic achievement and, most importantly, to determine whether paying more for schooling is justified by higher academic outcomes, in particular those measured on standardised tests. We do this through a detailed statistical analysis of both the 2016 Census and the 2018 NAPLAN datasets.
The results reveal some fascinating facts about what seems to matter for education and what doesn’t.
The Education Problem
The goal of this research is to understand how socio-educational factors affect student academic achievement. Taking the NAPLAN data, and purely considering a school’s average score across each of the six tests, we find the results presented in the following figures.
From the above figure, independent and Catholic schools do perform marginally better when simply compared to government schools. We could take this at face value, finish the problem here and say, yes, paying more for schooling and going to a private school holds more promise of achieving a better result.
Unfortunately, drawing this conclusion is not realistic. Quite simply, comparing schools solely on their results does not paint the full picture. A multitude of other factors may ultimately affect how a student performs at school, such as a student’s upbringing, a population’s demographics, or a school’s characteristics. Accounting for these, we see a very different answer to the problem: school sector has a relatively insignificant impact on students’ performance.
For a concrete example of how other factors may affect student performance, take the ACARA-defined ICSEA (Index of Community Socio-Educational Advantage) value, best explained here. The ICSEA value tells us a variety of interesting things about a school’s student base and its socio-educational position. If we illustrate the distribution of ICSEA scores across each school type, we find the results shown in the following figure.
In the above figure, we see a much wider spread of ICSEA scores for government schools, and higher averages for independent and Catholic schools. Based on this, it is fair to expect independent and Catholic schools to, on average, enrol students from more advantaged socio-educational backgrounds.
If students from a more advantaged background perform better on average, then by extension, we can assume that independent and Catholic school results are influenced by the students they enrol. So naturally, we would see independent and Catholic schools performing better than government schools, based on their student demographic. This simple intuition can be extended to many other factors, and illustrates the need to account for as many relevant variables as possible when looking to truly understand how tuition fees affect student success.
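The intuition above can be sketched with a small simulation. This is an illustration on synthetic data only: the sample size, the 30% share of non-government schools, the 60-point ICSEA advantage and the score equation are all assumptions made for the example, not values from NAPLAN, the Census or the authors' model. Scores are generated so that sector has no direct effect at all, yet a naive comparison still shows a gap.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data only: none of these numbers come from NAPLAN or the Census.
private = rng.random(n) < 0.3                    # hypothetical share of non-government schools
icsea = rng.normal(1000, 80, n) + 60 * private   # assumed: more advantaged intake in that sector
# Assumed score equation: depends on ICSEA only, with no direct sector effect.
score = 400 + 0.5 * (icsea - 1000) + rng.normal(0, 20, n)

# Naive comparison: difference in mean scores between sectors.
naive_gap = score[private].mean() - score[~private].mean()

# Adjusted comparison: regress score on sector and ICSEA together.
X = np.column_stack([np.ones(n), private.astype(float), icsea - 1000])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted_gap = coef[1]  # sector coefficient once ICSEA is held fixed

print(f"naive sector gap:    {naive_gap:.1f} points")
print(f"adjusted sector gap: {adjusted_gap:.1f} points")
```

In this toy setting the naive gap is large while the adjusted gap sits close to zero, which is the pattern the argument above describes: a raw sector comparison can reflect who enrols rather than what the school adds.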
Approaching the Problem
Our approach takes the NAPLAN results from 2018, combined with the ABS 2016 Census data, and uses these to understand how different socio-economic factors affect student performance. The main problem we’re trying to understand is enormously complex, with many variables that could ultimately affect academic achievement. In particular, we’re looking to understand the following:
- A student’s background, i.e. family encouragement, parent contributions, extra-curricular activities, etc.,
- Contributing factors to school fees, i.e. how tuition fee depends on school size, location, and reputation in a community.
Naturally, these involve a huge number of possible variables, some of which are particularly difficult to quantify. For example, how do we express a student’s natural academic ability as a numeric value? This question is at the heart of the main challenge the problem presents.
Considering this, our final solution needs to account for high levels of complexity, whilst making do with the limited data available. We took advantage of two core approaches:
- Making assumptions to reduce the scope of the problem. Notably, to limit the effect of confounding variables, the scope of the project was narrowed to Melbourne schools only, and to grade three and grade five results. This controls for the effects of prior education, which would be more prevalent if we analysed later years (i.e. after completing primary school).
- Using proven and well-understood statistical techniques to establish the associations between variables, as we don’t have the luxury of being able to carry out randomised controlled trials. We implemented our solution through Instrumental Variable analysis, a form of linear regression that can account for unobserved variations in data. This analysis ultimately gives us a means of accurately modelling the problem using a restricted dataset. Additionally, to ensure the results are as robust as possible, we applied a number of techniques to account for confounders, collinearities and other problems that cause such analyses to go wrong.
The beauty of using a regression model is that it lets us identify the strength and direction of the association between each of our model’s variables and school achievement, which is incredibly useful in our use case.
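To give a feel for what Instrumental Variable analysis does, here is a minimal two-stage least squares sketch on synthetic data. It is not the authors' model: the article does not state which instrument or software was used, so the `instrument` variable, the unobserved `ability` term and every coefficient below are hypothetical, chosen so that the true effect of `fee` on `score` is known to be 2.0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Synthetic data only. "ability" plays the role of an unobserved confounder.
ability = rng.normal(size=n)       # never shown to the regressions below
instrument = rng.normal(size=n)    # hypothetical: shifts fees, unrelated to ability
fee = instrument + ability + rng.normal(size=n)
score = 2.0 * fee + 3.0 * ability + rng.normal(size=n)  # true fee effect is 2.0


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


ones = np.ones(n)

# Plain regression of score on fee: biased because ability is omitted.
ols_effect = ols(np.column_stack([ones, fee]), score)[1]

# Stage 1: predict fee from the instrument alone.
Z = np.column_stack([ones, instrument])
fee_hat = Z @ ols(Z, fee)

# Stage 2: regress score on the predicted fee.
iv_effect = ols(np.column_stack([ones, fee_hat]), score)[1]

print(f"OLS estimate: {ols_effect:.2f}")
print(f"IV estimate:  {iv_effect:.2f}")
```

In this setup the plain regression overstates the fee effect, while the two-stage estimate lands near the true value. The sketch only recovers point estimates; standard errors from a hand-rolled second stage are not valid, so a real analysis would rely on a dedicated IV routine.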
In simple terms, the model returns a single numeric value for each variable, termed a model coefficient. For a one-unit increase in a variable, with all other variables held constant, the predicted academic achievement changes by that variable’s coefficient. For example, if the model estimates a coefficient of positive four for the percentage of female teachers in a school, a one percentage point increase in female teachers corresponds to an average increase of four NAPLAN points.
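The reading of a coefficient can be checked with a toy linear predictor. Only the coefficient of four echoes the example in the text; the intercept, the second variable and the school's values are hypothetical and exist purely so the prediction can be computed.

```python
def predict_score(intercept, coefficients, values):
    """Linear prediction: intercept plus the sum of coefficient * value."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


# Hypothetical numbers for illustration only.
intercept = 300.0
coefficients = {"pct_female_teachers": 4.0, "other_variable": 1.5}

school = {"pct_female_teachers": 60.0, "other_variable": 10.0}
# Same school, with one more percentage point of female teachers.
school_plus_one = {**school, "pct_female_teachers": 61.0}

change = (predict_score(intercept, coefficients, school_plus_one)
          - predict_score(intercept, coefficients, school))
print(change)
```

Because everything else is held fixed, the intercept and the other variable cancel out, and the change in the prediction equals the coefficient on the variable that moved.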
Using this method, we can summarise the model results using the illustration shown in the above figure, which shows for each variable how changes in these values in the dataset affect average student academic achievement. Although we used the average NAPLAN score as education output to build our model (instead of each subject area, writing, numeracy, etc), a similar result also applies to separate analysis on the different NAPLAN subjects.
Furthermore, if we take the average effect of the values shown previously, we find the result shown in the below figure. You can see tuition fee highlighted in red; clearly still positive, but providing a relatively insignificant effect compared to its counterparts.
What is a Dollar of Education Worth?
Most importantly, the final model coefficient for tuition fee is 0.001044, which is statistically significant. This means that a one-dollar increase in the amount paid for schooling corresponds to a 0.001044 increase in a school’s expected NAPLAN score. In simpler terms, a $1,000 increase corresponds to a 1.044 point score increase.
We can investigate how the NAPLAN results truly scale to get a better understanding of what this means. The table below shows the scale score range for each NAPLAN band in the grade three and grade five Numeracy exams from 2018. Note that this information is publicly available here.
| Band | Grade 3 scale score | Grade 5 scale score |
|---|---|---|
| Band 1 | 92.4 – 256.1 | – |
| Band 2 | 270.8 – 318.2 | – |
| Band 3 | 328.3 – 365.6 | 195.5 – 368.6 |
| Band 4 | 374.4 – 417.9 | 379.2 – 423.8 |
| Band 5 | 426.8 – 475.0 | 431.7 – 476.9 |
| Band 6 | 486.0 – 682.5 | 484.3 – 522.2 |
| Band 7 | – | 530.2 – 573.7 |
| Band 8 | – | 583.5 – 777.8 |
The national minimum standards for this exam are band two for students in grade three and band four for students in grade five. For a grade three student, the jump from the median band two score to the median band three score is approximately 55 points. A similar result holds for grade five students, and for subsequent jumps between bands.
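As a rough check on the size of a band jump, the sketch below takes the grade three ranges for bands two and three from the table above and compares their midpoints. Using the midpoint of each range as a stand-in for the median score is an assumption made here, since the article does not say how the median was computed.

```python
# Grade 3 Numeracy scale score ranges for 2018, copied from the table above.
grade3_bands = {
    2: (270.8, 318.2),
    3: (328.3, 365.6),
}


def midpoint(score_range):
    """Midpoint of a (low, high) range, used here as a proxy for the median score."""
    low, high = score_range
    return (low + high) / 2


gap = midpoint(grade3_bands[3]) - midpoint(grade3_bands[2])
print(f"band 2 midpoint: {midpoint(grade3_bands[2]):.2f}")
print(f"band 3 midpoint: {midpoint(grade3_bands[3]):.2f}")
print(f"gap:             {gap:.2f}")
```

Under this midpoint assumption the gap comes out a little over 52 points, in the same region as the approximate figure of 55 quoted in the text.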
Therefore, for a student to move up a single band purely by paying more for tuition, they would need approximately 55 additional NAPLAN scale score points. Based on the associations estimated by our model, this is equivalent to paying approximately $52,000 more in tuition fees alone.
In practice, the most a student in Victoria will pay for schooling is around $30,000, putting a 55-point gain out of reach. The median private school charges around $9,000 in tuition, which would give a little under 10 additional predicted points. Clearly, gains of this size cannot by themselves drive student achievement into higher bands.
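The dollar figures in this section follow directly from the tuition coefficient. The sketch below redoes the arithmetic using only numbers quoted in the article; the function names are illustrative.

```python
FEE_COEFFICIENT = 0.001044   # NAPLAN points per extra dollar of tuition (from the article)
BAND_JUMP_POINTS = 55        # approximate size of a one-band jump (from the article)


def points_from_fee(dollars):
    """Predicted score change associated with paying `dollars` more in tuition."""
    return FEE_COEFFICIENT * dollars


def fee_for_points(points):
    """Extra tuition associated with a predicted gain of `points`."""
    return points / FEE_COEFFICIENT


print(f"fee for a one-band jump: ${fee_for_points(BAND_JUMP_POINTS):,.0f}")
print(f"points at $30,000:       {points_from_fee(30_000):.1f}")
print(f"points at $9,000:        {points_from_fee(9_000):.1f}")
```

The fee for a one-band jump comes out between $52,000 and $53,000, while even the highest fee quoted is associated with roughly 31 points and the median fee with about 9.4. These are associations from a linear model, so they describe predicted school averages rather than a guaranteed return for any one student.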
By comparison, the median percentage of people in a population whose marital status is classified as married was 40.44%. With a coefficient of 1.104, this percentage is associated with a score increase of 44.66. The maximum percentage of married people in a population used in this study was 51.61%, which would correspond to a score increase of 56.99. The interpretation here is that intact families assist students with their academic achievements, which helps put the tuition fee benefits in perspective.
Similarly, the median proportion of a population who regularly undertake volunteering duties was around 15.25%. With a parameter estimate of 1.72, this means the median score increase associated with volunteering was 26.28 NAPLAN score points. The maximum effect based on data used in this study was a population in which 24.19% of people volunteered, offering an increase of 41.69 points.
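To put the three variables side by side, the sketch below multiplies each quoted coefficient by the quoted median and maximum values. The coefficients are used as printed in the article, so the products can differ from the article's figures in the second decimal place, presumably because the published coefficients are rounded.

```python
# (coefficient, median value, maximum value) as quoted in the article.
effects = {
    "married (%)":      (1.104, 40.44, 51.61),
    "volunteering (%)": (1.72, 15.25, 24.19),
    "tuition fee ($)":  (0.001044, 9_000, 30_000),
}

# Associated score change at the median and at the maximum of each variable.
contributions = {
    name: (coef * median, coef * maximum)
    for name, (coef, median, maximum) in effects.items()
}

for name, (at_median, at_max) in contributions.items():
    print(f"{name:<18} median: {at_median:6.2f}   max: {at_max:6.2f}")
```

Even at the highest fee quoted, the tuition contribution stays below the contribution of the median married percentage, which is the comparison the surrounding paragraphs draw.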
In comparison to tuition fees, it’s clear other variables carry significantly more influence in our final model. Ultimately, this demonstrates how, in practice, school results are influenced by many different factors, not just the school type or amount paid. This allows parents to make better decisions about where to invest their money, and to consider whether living in the catchment of a school with high volunteering rates, a high share of married families and highly educated parents could be a much better investment in their children’s education than paying significant tuition fees.
Our analysis indicates that paying more for schooling is positively related to academic achievement. However, as might be expected, the effect is not large enough on its own to lift students to outstanding results. Instead, other more important factors seem to encourage academic achievement, including family composition, a population’s level of education, community involvement, and a school’s structure.
The key takeaway from this report is that these influences, namely family contributions and population characteristics, play a far more important role in determining student outcomes than the amount paid for tuition. This aligns with the findings of Fryer & Levitt (2004), who took a similar regression-style approach to understanding the racial test-score gap in early-year students. They show that the following are important for academic outcomes:
- A child has highly educated parents
- A child’s parents have high socio-economic status
- A child’s parents speak English at home
These results further reinforce our finding that tuition fees are not as important a contributor to student success. For further details on the analysis that led to the results presented here, readers are encouraged to download the reports, which describe the limitations and depth of the analysis. The reports can be downloaded here and here.
With this in mind, we need to acknowledge the complexity of this problem and the limitations of the data made available to this project. Whilst these results go some way to understanding how paying for schooling may affect student achievement, a further study involving a more comprehensive dataset and deeper statistical analysis would be necessary to truly understand the full relationship between tuition fee and academic success. In particular, the results presented here are based on regression techniques, which can establish associations but cannot guarantee causation.
We encourage future researchers to build upon these results to establish the causes of the effects we observe.
Most importantly, the work presented here only investigates academic achievement at the school level. When deciding where to send a child to school, our results are worth taking into account, but it matters most to consider the student as an individual, asking questions such as:
- What do they like to study?
- What opportunities are available at different schools?
- What will give the student the best compromise between doing well academically and making the most of what a school can offer socially?
Whilst the mathematics might suggest one thing, in practice it might not always hold, a cautionary tale that extends to modern data science and machine learning more broadly.
Fryer R, Levitt S. Understanding The Black-White Test Score Gap in the First Two Years of School. The Review of Economics and Statistics. 2004.
For inquiries, reach out to the authors, Nicholas Thompson, Meichen Qian, Jonathon Allport and Evan Shellshear.