2 April 2025

Education policy does not need populism (opinion piece, Suomen Kuvalehti, 2 April 2025)

"From a research perspective, there are no grounds for ability-tracked courses or final examinations in basic education. Changes to education should be made on the basis of research evidence," write researchers from the EDUCA flagship research programme. The literature the authors refer to is listed at the end of this text.

Schools and learning stir debate especially ahead of elections. With the municipal and county election campaigns under way, issues such as the “mobile phone law” now in preparation have come to the fore, but proposals to introduce ability groups and final examinations are also surfacing in the gaps of the debate.

These proposals are justified by the need to take account of pupils' individual skills and motivation. Among the governing parties, Riikka Purra, chair of the Finns Party (perussuomalaiset), raised ability-tracked courses when presenting her party's municipal and county election programme late last year.

As researchers, we recognise that education is a topic that ought to be the subject of political and ideological debate.

Behind education policy positions that look simple in themselves, however, lies a complex field of research, which we would like to see reflected in political debate as well.

In the early years of the comprehensive school (peruskoulu), ability-tracked courses were used in the teaching of mathematics and languages. Some pupils were sorted into the basic-level courses, which did not confer eligibility for further study in general upper secondary school (lukio).

Ability-tracked courses were abolished in 1985, after studies showed that selection into them was not based on pupils' skills alone. The courses became instruments of socioeconomic and gendered selection which, to put it pointedly, blocked pathways to further study for working-class boys in particular.

Today, one of the strengths of the Finnish education system is considered to be that it has no educational dead ends: every student can, if they wish, advance along different routes all the way to a doctoral degree.

Nor does the research offer strong evidence that assessment of the kind represented by final examinations in basic education could promote deep learning.

Instead, there is ample international evidence of the undesirable consequences of such examinations for both individuals and society.

Models that do not have such harmful consequences have therefore been developed on a research basis. Since 2018, the Finnish National Agency for Education (Opetushallitus) has been developing criterion-based assessment in basic education, whose strength is considered to be that the learning objectives and the competence required for each grade are defined openly in advance.

This gives the teacher the opportunity to recognise and take into account pupils' many kinds of skills while guiding their learning.

Instead of final examinations, pupil assessment could be developed by building a national, subject-specific task bank. From it, teachers could order pre-tested sets of tasks that would help determine pupils' level of competence without the results having to be published as school-level ranking lists. Research shows that such lists do more harm than good.

Populist school debate that focuses on single issues does draw ideological dividing lines, but the overall picture is lost in the process. There is a great deal of research on learning and on the various individual and societal factors related to it.

What we need now is a synthesis of this research, so that instead of individual, perhaps sensational viewpoints we make visible the complex factors related to learning and education.

On that basis, political value choices can also be grounded in research evidence.

Mira Kalalahti is an associate professor at the University of Jyväskylä
Taina Saarinen is a director and research professor at the University of Jyväskylä
Janne Varjo is a professor at the University of Helsinki

The authors are researchers in the EDUCA – Koulutuksen tulevaisuus flagship research programme.

Research literature on final examinations and ability grouping

High-stakes testing studies focusing on final examinations (not necessarily national tests, which can also be low-stakes in nature):

Nichols, S. L. (2007). High-Stakes Testing: Does It Increase Achievement? Journal of Applied School Psychology, 23(2), 47–64. https://doi.org/10.1300/J370v23n02_04

Literature review on the impact on student achievement of high-stakes testing. The review concludes there is no consistent evidence to suggest high-stakes testing leads to increases in student learning. Some evidence suggests it may have a negative effect for some student groups and in some important subject areas (e.g., reading).

Berliner, D. (2011). Rational responses to high stakes testing: the case of curriculum narrowing and the harm that follows. Cambridge Journal of Education, 41(3), 287–302. https://doi.org/10.1080/0305764X.2011.607151

The inevitable responses to high stakes testing, wherein students’ test scores are highly consequential for teachers and administrators, include cheating, excessive test preparation, changes in test scoring and other forms of gaming to ensure that test scores appear high. Over the last decade this has been demonstrated convincingly in the USA, but examples in Great Britain abound. Yet the most pernicious response to high stakes testing is perhaps the most rational, namely, curriculum narrowing. In this way more of what is believed to be on the test is taught.

French, S., Dickerson, A. & Mulder, R.A. A review of the benefits and drawbacks of high-stakes final examinations in higher education. High Educ 88, 893–918 (2024). https://doi.org/10.1007/s10734-023-01148-z

High-stakes examinations enjoy widespread use as summative assessments in higher education. We review the arguments for and against their use, across seven common themes: memory recall and knowledge retention; student motivation and learning; authenticity and real-world relevance; validity and reliability; academic misconduct and contract cheating; stress, anxiety and wellbeing; and fairness and equity. For each theme, we evaluate empirical evidence for the perceived pedagogical benefits and pedagogical drawbacks of high-stakes examinations. We find that relatively few of the perceived academic benefits of high-stakes examinations have a strong evidence base.

A research review of peer-reviewed international articles on national tests, published in 2010–2024 and retrieved from three databases. The review is not limited to final examinations but examines national tests in general. Press release on the study: https://www.jyu.fi/fi/uutinen/tutkimustieto-ei-puolla-perusopetuksen-kansallisia-kokeita

Studies on tracking and ability grouping:

Eric A. Hanushek, Ludger Wößmann, Does Educational Tracking Affect Performance and Inequality? Differences‐in‐Differences Evidence Across Countries, The Economic Journal, Volume 116, Issue 510, March 2006, Pages C63–C76, https://doi.org/10.1111/j.1468-0297.2006.01076.x

Even though some countries track students into differing‐ability schools by age 10, others keep their entire secondary‐school system comprehensive. To estimate the effects of such institutional differences in the face of country heterogeneity, we employ an international differences‐in‐differences approach. We identify tracking effects by comparing differences in outcome between primary and secondary school across tracked and non‐tracked systems. Six international student assessments provide eight pairs of achievement contrasts for between 18 and 26 cross‐country comparisons. The results suggest that early tracking increases educational inequality. While less clear, there is also a tendency for early tracking to reduce mean performance.

Betts, J. R., & Shkolnik, J. L. (2000). The effects of ability grouping on student achievement and resource allocation in secondary schools. Economics of Education Review, 19(1), 1-15.

A school policy of grouping students by ability has little effect on average math achievement growth. Unlike earlier research, this paper also finds little or no differential effects of grouping for high-achieving, average, or low-achieving students. One explanation is that the allocation of students and resources into classes is remarkably similar between schools that claim to group and those that claim not to group. The examination of three school inputs: class size, teacher education, and teacher experience, indicates that both types of schools tailor resources to the class ability level in similar ways, for instance by putting low-achieving students into smaller classes.

Terrin, É., & Triventi, M. (2022). The Effect of School Tracking on Student Achievement and Inequality: A Meta-Analysis. Review of Educational Research, 93(2), 236-274. https://doi.org/10.3102/00346543221100850

This meta-analysis examines the effects of sorting secondary students into different tracks (“between-school” tracking) or classrooms (“within-school” tracking) on the efficiency and inequality levels of an educational system. Efficiency is related to the overall learning achievement of students, whereas inequality can refer to “inequality of achievement” (i.e., the dispersion of outcomes) or “inequality of opportunity” (i.e., the strength of the influence of family background on student achievement). The selected publications are 53 analyses performed in the period from 2000 to 2021, yielding 213 estimates on efficiency and 230 estimates on inequality.

Lavrijsen, J., & Nicaise, I. (2015). New empirical evidence on the effect of educational tracking on social inequalities in reading achievement. European Educational Research Journal, 14(3-4), 206-221. https://doi.org/10.1177/1474904115589039

One of the major imperatives behind the comprehensivisation of secondary education was the belief that postponing the age at which students are tracked in different educational routes would mitigate the effect of social background on educational outcomes. Comparative investigations of large-scale international student achievement tests in secondary education, such as PISA, have indeed suggested that individual test results depend less on social origin in countries that have postponed tracking age. However, a crucial pitfall in such cross-sectional studies is that many other factors influence the effect of social origin on achievement as well. In order to account for possible unobserved confounder bias, and to acknowledge the fact that part of the social origin effect already exists prior to the introduction of tracking, we apply a difference-in-differences analysis to data from PIRLS (primary education, 2006, N = 33, n = 171.486) and PISA (secondary education, 2012, N = 33, n = 235.378). Our results confirm that the introduction of tracking increases the effect of social origin on reading achievement between primary and secondary education. This lends further support to the argument that postponing the tracking age can foster social equity in educational achievement.

Strello, A., Strietholt, R., Steinmann, I. et al. Early tracking and different types of inequalities in achievement: difference-in-differences evidence from 20 years of large-scale assessments. Educ Asse Eval Acc 33, 139–167 (2021). https://doi.org/10.1007/s11092-020-09346-4

Research to date on the effects of between-school tracking on inequalities in achievement and on performance has been inconclusive. A possible explanation is that different studies used different data, focused on different domains, and employed different measures of inequality. To address this issue, we used all accumulated data collected in the three largest international assessments—PISA (Programme for International Student Assessment), PIRLS (Progress in International Reading Literacy Study), and TIMSS (Trends in International Mathematics and Science Study)—in the past 20 years in 75 countries and regions. Following the seminal paper by Hanushek and Wößmann (2006), we combined data from a total of 21 cycles of primary and secondary school assessments to estimate difference-in-differences models for different outcome measures. We synthesized the effects using a meta-analytical approach and found strong evidence that tracking increased social achievement gaps, that it had smaller but still significant effects on dispersion inequalities, and that it had rather weak effects on educational inadequacies. In contrast, we did not find evidence that tracking increased performance levels. Besides these substantive findings, our study illustrated that the effect estimates varied considerably across the datasets used because the low number of countries as the units of analysis was a natural limitation. This finding casts doubt on the reproducibility of findings based on single international datasets and suggests that researchers should use different data sources to replicate analyses.

Nomi, T. (2009). The Effects of Within-Class Ability Grouping on Academic Achievement in Early Elementary Years. Journal of Research on Educational Effectiveness, 3(1), 56–92. https://doi.org/10.1080/19345740903277601

By incorporating two theoretical frameworks this study examines how school characteristics shape first-grade reading ability-grouping practices, and how this, in turn, affects students’ reading achievement. The author uses the data from the Early Childhood Longitudinal Study and applies the propensity-score method to examine whether first-grade ability grouping improves student achievement, whether ability grouping increases achievement inequalities, and whether its effects vary by student initial abilities and/or school contexts. Findings support an argument that ability grouping is an organizational response to problems of diversity in the student body. Schools that use ability grouping are likely to have heterogeneous ability compositions. They are also public, low-performing, low socioeconomic status, and high-minority schools. In these schools, ability grouping has no effects or negative effects, particularly for low-ability students. In contrast, ability grouping may improve achievement for all students in schools with advantageous characteristics, mostly private schools, and may reduce achievement inequalities, because low-ability students benefit the most from this practice.

Deunk, M. I., Smale-Jacobse, A. E., de Boer, H., Doolaard, S., & Bosker, R. J. (2018). Effective differentiation practices: A systematic review and meta-analysis of studies on the cognitive effects of differentiation practices in primary education. Educational Research Review, 24, 31-54.

This systematic review gives an overview of the effects of differentiation practices on language and math performance in primary education, synthesizing the results of empirical studies (n = 21) on this topic since 1995. We extracted 78 effect sizes from the included studies. We found that using computerized systems as a differentiation tool and using differentiation as part of a broader program or reform had small to moderate positive effects on students’ performance. Between- or within-class homogeneous ability grouping had a small negative effect on low-ability students, but no effect on others. The finding that computer technology can be a useful tool to facilitate differentiated instruction is not covered in earlier reviews. Moreover, our findings emphasize that homogeneous ability grouping alone is not enough to guarantee differentiated instruction. This stresses the importance of embedding differentiation practices in a broader educational context.

Au, W. (2022). Unequal by design: High-stakes testing and the standardization of inequality (2nd ed.). Routledge. https://doi.org/10.4324/9781003005179

Ball, S. J. (2003). The teacher’s soul and the terrors of performativity. Journal of Education Policy, 18(2), 215–228.

Ball, S. J. (2018). Banality of numbers. In B. Hamre, A. Morin, & C. Ydesen (Eds.), Testing and inclusive schooling: International challenges and opportunities (pp. 79–86). Routledge. https://doi.org/10.4324/9781315204048

Brass, J., & Holloway, J. (2021). Re-professionalizing teaching: The new professionalism in the United States. Critical Studies in Education, 62(4), 519–536. https://doi.org/10.1080/17508487.2019.1579743

Darling-Hammond, L. (2004). Standards, accountability, and school reform. Teachers College Record, 106(6), 1047–1085.

Hardy, I. (2015). Data, numbers and accountability: The complexity, nature and effects of data use in schools. British Journal of Educational Studies, 63(4), 467–486. https://doi.org/10.1080/00071005.2015.1066489

Hardy, I. (2018). Governing teacher learning: Understanding teachers’ compliance with and critique of standardization. Journal of Education Policy, 33(1), 1–22. https://doi.org/10.1080/02680939.2017.1325517

Holloway, J. (2022). Metrics, standards and alignment in teacher policy critiquing fundamentalism and imagining pluralism. Springer. https://doi.org/10.1007/978-981-33-4814-1

Holloway, J., & Brass, J. (2018). Making accountable teachers: The terrors and pleasures of performativity. Journal of Education Policy, 33(3), 361–382. https://doi.org/10.1080/02680939.2017.1372636

Högberg, B., Lindgren, J., Johansson, K., Strandh, M., & Petersen, S. (2021). Consequences of school grading systems on adolescent health: Evidence from a Swedish school reform. Journal of Education Policy, 36(1), 84–106. https://doi.org/10.1080/02680939.2019.1686540

Kelly, P., Andreasen, K. E., Kousholt, K., McNess, E., & Ydesen, C. (2018). Education governance and standardised tests in Denmark and England. Journal of Education Policy, 33(6), 739–758. https://doi.org/10.1080/02680939.2017.136051

Lewis, S., Savage, G. C., & Holloway, J. (2020). Standards without standardisation? Assembling standards-based reforms in Australian and US schooling. Journal of Education Policy, 35(6), 737–764. https://doi.org/10.1080/02680939.2019.1636140

Polesel, J., Rice, S., & Dulfer, N. (2014). The impact of high-stakes testing on curriculum and pedagogy: A teacher perspective from Australia. Journal of Education Policy, 29(5), 640–657. https://doi.org/10.1080/02680939.2013.865082

Sahlberg, P. (2016). The global educational reform movement and its impact on schooling. In K. Mundy, A. Green, B. Lingard, & A. Verger (Eds.), The handbook of global education policy (pp. 128–144). John Wiley & Sons.

Sturrock, S. (2024). ‘Gaming’ in the English primary school: ‘Do whatever you need to do to make your data look good.’ Journal of Education Policy, 1–23. https://doi.org/10.1080/02680939.2024.2360993

Thrupp, M. (2018). The search for better educational standards. A cautionary tale. With Responses from Bob Lingard, Meg Maguire and David Hursh. Springer.

Torrance, H. (2017). Blaming the victim: Assessment, examinations, and the responsibilisation of students and teachers in neo-liberal governance. Discourse: Studies in the Cultural Politics of Education, 38(1), 83–96. https://doi.org/10.1080/01596306.2015.1104854

Verger, A., Parcerisa, L., & Fontdevila, C. (2019a). The growth and spread of large-scale assessments and test-based accountabilities: A political sociology of global education reforms. Educational Review, 71(1), 5–30. https://doi.org/10.1080/00131911.2019.1522045

Verger, A., Fontdevila, C., & Parcerisa, L. (2019b). Reforming governance through policy instruments: How and to what extent standards, tests and accountability in education spread worldwide. Discourse: Studies in the Cultural Politics of Education, 40(2), 248–270. https://doi.org/10.1080/01596306.2019.1569882

Mitä empiiristen tutkimusten perusteella tiedetään kansallisista kokeista? [What is known about national tests on the basis of empirical studies?] https://journal.fi/kasvatus/article/view/157366

A research review by Laura Ketonen and Päivi Atjonen of peer-reviewed international articles on national tests, published in 2010–2024 and retrieved from three databases. Press release on the study: https://www.jyu.fi/fi/uutinen/tutkimustieto-ei-puolla-perusopetuksen-kansallisia-kokeita