A Review of European Research on Content and Language Integrated Learning

Introduction. Integrating subjects in formal teaching can play an important role in promoting authentic and meaningful learning, as opposed to rote memorisation. Content and language integrated learning (CLIL) has been the subject of educational research for three decades. The scope of research is broad: from an initial focus on foreign-language performance, it has gradually been extended to the impact of CLIL on content learning and on the mother tongue. The purpose of this research is to summarise selected research articles on the application of CLIL and to estimate its summary average effect on content learning in students aged 10‒16. Materials and Methods. The article presents a systematic review of studies published in the Web of Science database in the last decade (2010–2020) and surveys selected empirical studies that focus on the impact of CLIL implementation on content subjects at primary and secondary schools. Sixteen studies met the inclusion criteria and were included in the analysis. Data from six studies were also statistically evaluated using Comprehensive Meta-Analysis and RevMan software, and the synthesis is presented in the Results and Discussion sections. Results. Based on the results of the 16 discussed studies, CLIL interventions produce positive added value; however, the statistical meta-analysis showed no statistically significant differences in content knowledge between the CLIL and non-CLIL groups, with the results favouring the non-CLIL groups. As the group sizes differed, the pooled standard deviation was used to reflect the sample sizes: the standard deviations were averaged, with more weight given to the larger samples. Discussion and Conclusion.
The practical significance and prospects of the study lie in pointing out the benefits of CLIL and stressing the importance of including it in teacher training programmes, along with developing pre-service teachers' creativity, critical thinking and ability to create their own materials.


Introduction
Communication in foreign languages is one of the key competences for lifelong learning. Communication involves not only skills but also intercultural understanding. Foreign languages have been learnt and taught for centuries, and research into the history of teaching English as a foreign language maps a variety of methods and approaches. These methods have reflected contemporary knowledge and theories as well as learners' needs. Content and language integrated learning (CLIL) is a methodology in which a foreign language functions as a means of communication rather than the aim of teaching, so that language learning becomes meaningful and leads to better long-term retention. It is based on dual aims (content/subject-discipline aims and language-teaching aims), and its main principles are often summarised as the 4Cs 1 (content, communication, cognition and culture) or as the extended 5Cs, which add the development of competence 2 . Competence in this context means that teachers should consider the "can-do" statements 3 describing what they want their students to be able to do by the end of the lesson. The 4C+1 framework was suggested by Lynch [1], who also addresses the theory presented by Coyle, Hood and Marsh 4 . The term CLIL came into existence in 1994 [2]; however, the approach has a much longer history, and CLIL principles can be traced in various approaches, e.g. integrated thematic instruction (a school model designed by Susan Kovalik 5 ), immersion [3], content-based instruction [4], task-based language teaching [5], English for specific purposes [6][7][8][9][10] and bilingual education (see, e.g., [11]). Marsh refers to CLIL as a "generic term" and describes it as an "educational approach in which diverse methodologies are used which lead to dual-focussed education where attention is given to both topic and language of instruction that includes a wide range of approaches" [12, p. 233].
This is one of the reasons why CLIL studies can be difficult, or even inappropriate, to compare. Studies that define their approach as CLIL may differ in their understanding of what CLIL is, and more rigorous researchers might classify the same practice as immersion or content-based instruction (CBI); see [13; 14].
The use of CLIL has increased substantially across Europe [15], and this is reflected in research. The earliest studies primarily examined the impact of CLIL on foreign-language performance. Later, the focus shifted to its impact on content learning and to the various factors that may influence the results of CLIL on language and content learning. These factors, e.g. classroom interaction [16], strategies [17; 18], age [19] and affective factors [20; 21] [25; 26], became the subject of more in-depth study. The experimental data are largely positive [27][28][29][30][31], especially concerning the impact of CLIL on language gains, increased motivation and more positive attitudes towards foreign-language learning; however, the issue is complex, and the impact has to be studied and interpreted in a much broader context. Pérez Cañado studied the effects of CLIL on the mother tongue (L1) and on content learning; she stresses that CLIL students who receive instruction in L1 outperform their peers, especially in the long term, and suggests that increased time and input are crucial for CLIL students "to achieve either the same or superior content results as their monolingual peers" [27].
In their study, Castellano-Risco, Alejo-González & Piquer-Píriz reported that the CLIL students in their sample had nearly double the receptive vocabulary knowledge of regular EFL learners [28]. The study presents the results of an experiment with a convenience sample of 44 third-grade secondary school students, in which the CLIL group had been exposed to approximately 810 more hours of the foreign language than the non-CLIL group. The authors highlight an added value of CLIL, namely its effect on the way learners perform when learning languages: CLIL students apply language learning strategies more effectively than their non-CLIL peers. Learning is influenced by students' motivation and attitudes towards the subject. Students' attitudes towards the target language have also been the subject of numerous studies, some of which indicate that they decline over time [29]. Martínez Agudo [30] confirmed the results of research conducted by Somers & Llinares [31] and claimed that the motivation observed at the initial stages of CLIL implementation starts to decline when CLIL becomes regular practice and is no longer a novelty for learners.
1 Coyle D. Developing CLIL: Towards a Theory of Practice. In: Monograph 6. Barcelona, Spain: APAC Barcelona; 2006. (In Eng.)
2 Montalto S.A., Walter L., Theodorou M., Chrysanthou K. The CLIL Guide Book. Lifelong Learning Program. 2014. Available at: https://www.languages.dk/archive/clil4u/book/CLIL%20Book%20En.pdf (accessed 25.11.2020). (In Eng.)
3 Ibid.
4 Coyle D., Hood P., Marsh D. CLIL: Content and Language Integrated Learning. Cambridge: Cambridge University Press; 2010. Available at: https://abdn.pure.elsevier.com/en/publications/content-and-languageintegrated-learning (accessed 25.11.2020). (In Eng.)
The impact on content and the possible influence (both positive and negative) on L1 should be studied systematically, together with the effect on the foreign language. Notable results concerning CLIL and L1 proficiency are presented in several studies, e.g. [27; 32-34], and indicate that research in this field should be extended. Naturally, using a target language in a content subject raises questions about L1 knowledge and possible problems, e.g. attention divided between learning the language and learning the content, a smaller L1 vocabulary compared with learnersʼ monolingual peers, and problems with subject terminology in L1. L1 was the central focus of a study by Navarro-Pablo & Gándara [32], in which the authors found that CLIL does not negatively influence the development of learners' L1 skills and knowledge; they suggest that appropriate CLIL delivery, depending on the years of implementation and experience, may benefit it. Similarly, Pérez Cañado claims, based on her research, that "CLIL is not detrimentally impacting L1 competence and is not watering down content learning, on which the positive impact of CLIL is particularly felt in the long term" [27, p. 18]. Esther Nieto-Moreno-de-Diezmas studied the effect of content and language integrated learning on the development of reading competence in the mother tongue and found a significantly positive impact of bilingual education on L1 vocabulary [33]. The author claims that "bilingual education (CLIL) does not harm the acquisition of reading competence in the mother tongue, since there are no significant overall differences between the CLIL experimental group and mainstream students" [33, p. 50]. What is more, the results of the nationwide Spanish study [32] show not just a neutral effect but a positive one. The sample consisted of 271 primary (age 11) and secondary school (age 15) students from seven public schools, together with 38 EFL teachers, content teachers and language assistants.
One aim of the study was to examine differences in Spanish Language and Literature results between CLIL and non-CLIL students. There were statistically significant differences favouring the CLIL students: both primary and secondary CLIL students outperformed non-CLIL students in their end-of-year assessment in Spanish Language and Literature, despite expectations derived from reduced L1 input.
The research results, however, have to be interpreted carefully since, as already mentioned, many moderating variables influence the strength or direction of an effect in the educational process (age, gender, proficiency level, motivation, exposure time, intensity, social status and others; see also [35]).
Replication studies are important in research generally, and in education especially. Experimenters should follow the same procedures; however, it is very difficult to minimise the factors that influence educational results. Few studies have examined the effects of CLIL on content subjects and provided quantitative evidence of this impact. Experimental studies frequently use relatively small convenience samples, and the effect size (the value of a statistic calculated from a sample of data), like statistical significance, can be difficult to interpret. Systematic reviews and meta-analyses can be used to aggregate effect sizes by integrating the results of different studies (selected on the basis of criteria defined according to the research question); meta-analyses enable researchers to synthesise data from studies with the same characteristics.
This study aims to establish comprehensive evidence about the effect of CLIL on the content subject. We therefore conduct a review and meta-analysis of data from selected studies comparing CLIL and non-CLIL groups. We acknowledge that the analysed studies are not replication studies; instead, we tried to select studies that work with similar samples (age, type of school) and that focus on the influence of CLIL on a subject discipline (content knowledge) based on the analysis of experimental data.

The Effect of CLIL on Content Learning
Bonces defines CLIL as "a coherent way of doubling the amount of exposure to the language, without the necessity of adding more room in the timetable for language (only) lessons" [36, p. 183], which might seem to suggest that the primary focus of CLIL is language learning. This is not entirely accurate, as CLIL is content-driven 6 and its dual aims should guarantee content development. Yet Cenoz, Genesee & Gorter indicate that the evidence for a balanced pedagogic integration of content and language in CLIL is unconvincing [37]. Ten years after that text was published, we still lack data indicating a change. To date, studies investigating CLIL and its impact on the content subject have produced equivocal results: some have shown beneficial effects of CLIL, while others have shown a deterioration in results.
A systematic review by Graham et al. presents the results of 25 studies focused on language and/or content development resulting from CLIL [38]. Six of them dealt with content: Mathematics (2), Physics (1), Accounting, Finance and History (1), World economy history and world economy (1) and Science (1). In one case there was no statistically significant difference between the groups; in two studies (both focused on Mathematics) the CLIL groups performed better. Across all 25 studies, only three reported results in favour of the non-CLIL groups, eight found no difference between the CLIL and non-CLIL groups, and in the remaining 14 the CLIL students achieved better results than the non-CLIL students. The authors stress the need for caution, as not all studies present pre-test results. It is equally important to note that various studies report not only gains in knowledge but also, based on both quantitative and qualitative data, changes in motivation [39], attitudes [40; 41], use of strategies [42], ways of thinking, including critical thinking [43], analogical reasoning [44], etc.
This study synthesises research on CLIL teaching conducted in the last 10 years and retrieved from the Web of Science (WoS) database, and examines the impact of CLIL on content knowledge gains among 10-16-year-old learners in CLIL and non-CLIL settings.

Materials and Methods
Review question. This review focuses on studies exploring the effectiveness of CLIL for content-subject knowledge in students aged 10-16. In most European countries, CLIL is implemented at lower secondary level (ISCED 2), which corresponds to ages 10/11 to 15/16. The review also aims to identify which methods are commonly used to assess content learning in CLIL and to define possible gaps in the research.
Selection of the studies. To identify relevant studies, four databases of the Web of Science Core Collection were used as sources of high-quality peer-reviewed studies: (1) Science Citation Index Expanded, (2) Social Sciences Citation Index, (3) Arts & Humanities Citation Index and (4) Emerging Sources Citation Index. The timespan was limited to studies published from 2010 to 2020. The basic search (for "CLIL research" studies) returned 395 studies.
Inclusion criteria. All texts were screened following the steps of the PRISMA protocol. The PRISMA statement is a set of items for reporting systematic reviews, consisting of a checklist and a flow diagram; it is available online 7 and has been published in several journals. After the first screening, six articles were identified as duplicates published in two different journals. To be included in the study, a text had to (1) apply quantitative research methods, (2) use a sample whose age corresponds to the research question (10-16 years), (3) where possible, provide statistical data (n, mean, SD), (4) compare an intervention group with a conventional group, and (5) focus on content/subject-discipline knowledge (gain).
Forty-five articles were excluded after title screening. Some titles (e.g. Teaching linguistics to low-level English language users in a teacher education programme: an action research study; Languages of schooling in European policymaking: present state and future outcomes; Empowering Teachers, Triggering Change: A Case Study of Teacher Training through Action Research) indicate the sample age or the focus, so it was possible to exclude them at this stage. Abstract screening reduced the list to 112 articles, of which 94 full texts were retrieved. Most abstracts state the sample age and focus, and on this basis a relatively large number of articles were excluded. As school systems and terminology differ across countries (e.g. the term "secondary school student" corresponds to a 15-19-year-old student in Slovakia but to a 12-16-year-old learner in Spain), the age of the sample could not always be identified from the abstract, which led to the subsequent exclusion of articles on grounds of sample age (n = 25) after full-text screening. Another 40 articles were excluded because their focus did not correspond to the focus of the present systematic review, three articles were excluded because they were written in Spanish (language was one of the criteria, even though three Spanish texts were included), and four articles were critical reviews that did not present research in the studied field of CLIL (see the figure). The list of studies was thus reduced to 22, of which, after full-text reading and critical appraisal, the following sixteen studies (Table 1) were selected for the present analysis. The six studies excluded during this second reading included, e.g., research based on a single 40-minute lesson and research in which data were collected through teacher questionnaires to evaluate the impact of the approach on teaching and learning.
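The screening counts reported above can be checked with a short bookkeeping script in the spirit of a PRISMA flow diagram. This is an illustrative sketch only: the variable names are hypothetical, and the stage order (duplicates removed before title screening) together with the derived intermediate counts are assumptions, since the text reports only the figures 395, 6, 45, 112, 94, 25, 40, 3, 4, 22, 6 and 16.

```python
# Screening counts as reported in the text.
identified = 395              # basic search in the WoS Core Collection, 2010-2020
duplicates = 6                # same articles published in two different journals
excluded_by_title = 45
after_abstract_screening = 112
full_texts_retrieved = 94
full_text_exclusions = {
    "sample age": 25,
    "focus": 40,
    "written in Spanish": 3,
    "critical reviews": 4,
}
excluded_after_appraisal = 6  # e.g. single-lesson studies, teacher questionnaires

# Derived counts. ASSUMPTION: duplicates were removed before title screening.
after_deduplication = identified - duplicates
after_title_screening = after_deduplication - excluded_by_title
excluded_by_abstract = after_title_screening - after_abstract_screening
not_retrieved = after_abstract_screening - full_texts_retrieved
after_full_text = full_texts_retrieved - sum(full_text_exclusions.values())
included = after_full_text - excluded_after_appraisal

stages = [
    ("records identified", identified),
    ("after removing duplicates", after_deduplication),
    ("after title screening", after_title_screening),
    ("after abstract screening", after_abstract_screening),
    ("full texts retrieved", full_texts_retrieved),
    ("after full-text screening", after_full_text),
    ("included in the review", included),
]
for label, n in stages:
    print(f"{label:<28}{n:>4}")
print("excluded by abstract:", excluded_by_abstract, "| not retrieved:", not_retrieved)
```

The two totals stated directly in the text, 22 studies after full-text screening and 16 included studies, follow from the reported exclusions (94 - 25 - 40 - 3 - 4 = 22 and 22 - 6 = 16); the intermediate counts depend on the assumed stage order.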
Six articles presented data (n, mean and SD for both CLIL and non-CLIL groups) that allowed us to carry out a statistical meta-analysis. Comprehensive Meta-Analysis (CMA version 3.3.070, trial/evaluation version) and Review Manager (RevMan 5) software were used to conduct the meta-analysis. The intervention outcome measures were generally continuous data based on test results, and we used means and standard deviations to compare the effect. Even though we tried to select studies that met the set criteria, the effect size could vary because of uncontrolled variables or the broadly set limits (e.g. age, different populations); we therefore applied a random-effects model. The level of statistical significance was set at p < 0.05. As the studies in the analysis did not use the same scale, it would not be appropriate to use raw differences in means; to assess the outcome, the standardised mean difference (δ) and its unbiased estimate, Hedges' g, were used. Hedges' g, also called the corrected effect size, is computed as
$$g = J\,\frac{M_1 - M_2}{SD^{*}_{pooled}}, \qquad SD^{*}_{pooled} = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}, \qquad J = 1 - \frac{3}{4(n_1 + n_2) - 9},$$
where M_1, M_2, SD_1, SD_2 and n_1, n_2 are the means, standard deviations and sizes of the two groups, J is the small-sample correction factor, and SD*_pooled is the pooled standard deviation, a weighted average of the standard deviations of two or more groups. The pooled standard deviation reflects the sample sizes: the standard deviations are averaged, with more weight given to the larger groups. Glenn 8 introduces three levels or categories of effects: (a) small effect (cannot be discerned by the naked eye) = 0.2, (b) medium effect = 0.5 and (c) large effect = 0.8.
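The calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CMA or RevMan implementation: it applies the small-sample correction J = 1 - 3/(4·df - 1) (equivalent to 4(n1 + n2) - 9), uses a common textbook approximation for the variance of g (software packages differ slightly here), and pools studies with the DerSimonian-Laird estimator of the between-study variance τ², a common choice for random-effects models; the estimator is not named in the text. All function names and the example numbers are hypothetical and are not data from the reviewed studies.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled SD: variances weighted by degrees of freedom (n - 1), so larger groups count more."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance of g) for group 1 (e.g. CLIL) versus group 2 (e.g. non-CLIL)."""
    d = (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)  # standardised mean difference
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor J
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def random_effects(effects):
    """Pool (g, variance) pairs with the DerSimonian-Laird random-effects model.

    Returns the pooled g, its 95% confidence interval, tau^2 and a two-sided p-value.
    """
    w = [1 / v for _, v in effects]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    g_fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sum_w
    q = sum(wi * (g - g_fixed) ** 2 for wi, (g, _) in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for _, v in effects]  # random-effects weights
    g_re = sum(wi * g for wi, (g, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(g_re / se) / math.sqrt(2))  # two-sided p-value, normal approximation
    return g_re, (g_re - 1.96 * se, g_re + 1.96 * se), tau2, p


def effect_category(g):
    """Label |g| with the 0.2 / 0.5 / 0.8 thresholds; the label below 0.2 is an assumption."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Hypothetical (mean, SD, n) for CLIL and non-CLIL groups; NOT data from the reviewed studies.
    hypothetical = [
        (6.1, 1.9, 30, 6.4, 2.1, 32),
        (55.0, 12.0, 120, 57.5, 11.0, 118),
        (14.2, 3.5, 45, 14.0, 3.8, 44),
    ]
    effects = [hedges_g(*row) for row in hypothetical]
    g, (low, high), tau2, p = random_effects(effects)
    print(f"pooled g = {g:.3f}, 95% CI [{low:.3f}, {high:.3f}], tau^2 = {tau2:.4f}, p = {p:.3f}")
    print("magnitude:", effect_category(g), "| significant at 0.05:", p < 0.05)
```

A negative pooled g here means the non-CLIL group scored higher. Because τ² is added to every study's variance, the random-effects weights are more even than fixed-effect weights, so a single large study dominates the pooled estimate less.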

Results and Discussion
The content subjects in the studies differ: five studies focused on natural sciences (2 Natural Sciences, 1 Geography, 1 Biology and 1 Physics), four on Mathematics, two on History, and one each on Social Science, Science, Music, Physical Education and Digital Skills. More than one-third of the studies were conducted in Spain (6), four in Germany, and one each in Belgium, Cyprus, the Czech Republic, Greece, Norway and Switzerland.
Arts and Mathematics communicate meaning in a "universal language": they need little or no translation, as they convey meaning through symbols and images, and many teachers therefore choose to introduce CLIL through these subjects. This allows teachers to apply multiple foci, create a safe and rich learning environment, design authentic tasks, support active learning, and visualise and scaffold the content 9 . Surmont et al. observed a group of first-year pupils in secondary education in the Dutch-speaking part of Belgium (n = 107; 53 male, 54 female; age 12.3 years) [45]. The pupils chose voluntarily whether to follow the CLIL course (n CLIL = 35). The CLIL group followed the course for ten months, and three tests were administered during this period (T0 at the beginning, T1 after three months and T2 after ten months, at the end of the course) to track the progress of each group and to compare the results. The researchers used three versions of the Mathematical Assessment Test-Help (MATH). After ten months, the mean scores of both groups had improved significantly compared with the baseline. Comparison between the groups showed that the CLIL pupils' progress over the ten-month period was significantly better (p < 0.5) than that of the control group. The difference was already evident after three months (T1). A short-term study [47] conducted in the Czech Republic focuses primarily on students' perceptions of learning mathematics through CLIL. The authors applied quantitative and qualitative tools with a sample of 5th and 6th graders (n = 55) at three different schools. In the experimental groups, CLIL was applied in Mathematics classes at least four times per month. In addition to the attitude tests that students completed before the experiment, selected lessons were video-recorded and analysed, and the researchers interviewed teachers and pupils.
The attitude tests showed significantly different results for the CLIL and non-CLIL groups, with more positive results in the CLIL group. As mentioned above, numerous studies have reported a positive shift shortly after CLIL is introduced, followed by a decline. Although the authors mention longitudinal observations, the study covers two one-month periods, which, compared with other studies, should be regarded as short-term. Regarding the observations, the authors report an interesting finding about teachers in CLIL classes: they used more activating methods and communicated more intensively with their pupils [47, p. 110]. The integration of mathematics content with Italian (2 groups) or Romansh (1 group) as a second language was the subject of a study by Serra [48] in three Swiss primary schools. The researchers report on a longitudinal study in which learners were observed from Grade 1 to Grade 6. Our inclusion criterion required a sample aged 10-16; although the pupils in this sample were aged 6 to 12, the study was included because in its last three years the pupils were aged 10-12 and thus met the criterion. The study examined oral and written production in L2, the role of interaction in L2 acquisition, and progress in mathematics, which was evaluated with standardised mathematics tests. The narrative analysis focused on "the relevance of repair sequences to draw focus on form while negotiating meaning" [48, p. 583]. The researchers highlight that grammar instruction and the search for accuracy were mostly connected to content-focused activities; this made teaching more authentic and meaningful, as it responded to a real need rather than being an occasional shift to features of the linguistic code.
Such situations "bring learners to notice the relationship that exists between meaning, forms and function in a highly context-sensitive situation" [48, p. 586]. The mathematics results compare the experimental groups not only with control groups but also with a representative sample (the Cockpit reference sample, n = 450). The experimental groups were tested yearly, the control groups in Grades 1, 2 and 4, and the Cockpit reference results were used for comparison in Grades 3-6. In Grade 1, all control groups performed better than the experimental groups. The results changed in Grade 2, when only one Italian group achieved weaker results than the control group. In Grade 3, as in Grade 6, all bilingual classes performed better than their monolingual counterparts (two groups significantly better). The Italian groups' results in Grades 4 and 5 were lower than those of the Cockpit reference sample, whereas the control classes in Grade 4 performed better than the reference sample. The mean percentage of correct answers in the L2 mathematics test was 59.8% for the Romansh group and 67.2% and 63.5% for the Italian groups, compared with the Cockpit reference sample average of 50%. An important conclusion the authors draw regarding content teaching and learning is that rephrasings in L2 and L1 involved "both the subject language and the everyday language to convey the intended meaning" [48, p. 600], which effectively supported the processing of subject content.
Piesche et al. studied the impact of CLIL on Physics knowledge in a group of German 6th graders (n = 722) [49]. Compared with the studies mentioned above, it was a short-term study (five 90-minute lessons). Group comparisons showed that monolingually taught learners (n non-CLIL = 360) outperformed those in the CLIL groups. The researchers report a small effect size (-0.2) and, as possible reasons, mention the learners' lack of previous CLIL experience and their level of language proficiency. They also mention the relative amount of mother tongue and target language used as a possible factor influencing the results.
A similar sample was the subject of Spanish research (6th graders, n = 709) by Fernández-Sanjurjo et al. [50], who also present data in which the non-CLIL group (n = 357) outperformed the experimental CLIL group in Natural Science (p < 0.001). The test of science content was specially designed according to the curricular content and included closed and open questions. Students in the CLIL group learn two subjects in L2 and have two English lessons per week.
The non-CLIL students also outperformed their CLIL counterparts in a study by Mattheoudakis et al. [51], who present data from a one-year study in which CLIL was applied in Geography (currently considered a subject bridging the natural and social sciences) in the 6th grade (n CLIL = 26; n non-CLIL = 25) in Greece. CLIL learners had two Geography classes per week taught in English, and both groups had eight English classes per week. Participants were tested three times, and "CLIL learners scored higher in two out of the three tests; in content test 2 this difference reached statistical significance (p < 0.001)" [51, p. 9]. In content test 3, the non-CLIL group scored higher than the CLIL group. The content tests covered the same topics for both groups but were in different languages (English for the CLIL group, Greek for the non-CLIL group). Foreign-language receptive skills were also tested, and the researchers concluded that CLIL practice had a positive impact on foreign-language learning (although the result was not statistically significant).
Meyerhöffer & Dreesmann [52] applied CLIL with slightly older students. They studied learning gains and motivation in 9th-grade CLIL and non-CLIL Biology classes in Germany. In their study, they highlight the importance of the student selection applied in Germany: CLIL students are selected on the basis of their previous academic results, their attitude towards foreign-language learning and their motivation at school. This is why they compared groups of pre-selected students with bilingually inexperienced, non-selected students. The sample consisted of 243 students (on average 14.3 years old, ranging from 12 to 16). The CLIL (experimental) group (n = 168) consisted of 85 bilingually inexperienced learners and 82 learners pre-selected for bilingual or gifted programmes. The authors compared both the total scores and the gains of the control and experimental groups. They conclude that the increase in content knowledge was similar across the groups and point out that the results "provide evidence against concerns that teaching non-selected students bilingually might lead to deficits in content knowledge acquisition" [52, p. 1].
The application of CLIL in the Cypriot context, with a focus on L2 vocabulary and content knowledge, was the subject of study [53]. Two quasi-experiments with different groups are described in the article, and both qualitative and quantitative data were collected. Pre- and post-treatment tests were administered in the control and experimental groups to assess vocabulary breadth and content knowledge, and video and audio recordings from the experimental classrooms were analysed to interpret the quantitative data. There were no statistically significant differences between the groups at the outset of the experiment. The subject-matter tests, with L1 and L2 items, were administered four days after the experiments. Both the CLIL and non-CLIL groups showed a significant increase, indicating a positive impact of CLIL on content knowledge. Although there was a positive mean difference between the groups, it was not statistically significant. This held for both experiments.
The correlation between affective variables and content learning achievement in CLIL programmes was studied by Martínez Agudo [30]. English level was evaluated from learners' English grades, a battery of tests was used to assess learners' intellectual aptitudes, and content knowledge was measured by learners' final grades (out of a total score of 10). The author stresses that "summative assessment may certainly generate heightened test anxiety in many cases due to added pressure on CLIL students to show both language and content related competences" [34]. Based on a discriminant analysis, the author concludes that lack of interest was the variable with the greatest weight in explaining the differences in Natural Sciences achievement between the CLIL and non-CLIL groups.
The study by Isidro & Lasagabaster [54] presents interesting data on CLIL teaching in Social Sciences classes in Spain. The experiment lasted for two years, which allowed the researchers to observe students at different points (after year 1 and after year 2), and the students in the sample were in their 3rd year of secondary education (14-15-year-olds). A specially designed test was prepared to measure prior knowledge, and at the beginning of the experiment the groups were homogeneous in terms of language and Social Sciences performance. Interestingly, the mean scores of the non-CLIL students decreased slightly across the phases, whereas the CLIL cohort showed no significant changes. The comparison between the groups showed no statistically significant differences between the means, and the researchers state that CLIL "did not have any detrimental effect on CLIL students' learning of content" [54, p. 14].
History was the curricular area for CLIL implementation in a study conducted in Norway [55]. The author focused on learners' willingness to communicate orally and their motivation. It was a small-scale, short-term experiment (a 6-week intervention), which led to the conclusion that "the CLIL intervention had reinforced most studentsʼ motivation and WTC [willingness to communicate] orally compared to their regular EFL lessons" [55]. Given the studies reporting that motivation declines over time, it would be interesting to extend the study [55] and observe possible changes in WTC. It is equally important to note that Scandinavian countries share several features that distinguish them from other European countries with respect to foreign-language learning; for example, the fact that most television programmes are not dubbed can significantly influence EFL teaching. This is also indicated by Sylvén [56], who described the contextual differences among four European countries and analysed possible factors that may affect the success of CLIL. She identifies policy, teacher (education), age (and cognitive development) and extramural English (and the amount of exposure) as the key factors that may influence the results. She states that "regarding the amount of exposure to English outside of school there are huge differences" [56, p. 315], and Sweden, compared with Finland, Germany and Spain, also achieved the highest scores in English language skills. This might be one of the reasons why CLIL is less successful in some countries.
Dallinger et al. [57] focussed on a wider range of aspects: besides language and content, they also studied motivation, demographic characteristics and cognitive abilities. Regarding History, CLIL students reached significantly higher results; after the second year, however, the differences indicated only a non-significant advantage for CLIL students.
Another small-scale, short-term experiment [58] was carried out in Spain and studied the integration of Music and EFL teaching. The researchers used a questionnaire with factual and attitudinal questions, worked with qualitative data and concluded that teaching Music through English has beneficial effects. On the other hand, the authors warn that some learners experience stress because of their low English proficiency, so using English as the medium of instruction in other subjects remains questionable. A study in Spain [59] combined quantitative and qualitative data analysis: the researchers collected data through interviews with 12 participants, a sociometric questionnaire and a quasi-experiment. The impact of CLIL on physical activity in Physical Education lessons was evaluated using accelerometry, which allowed the researchers to measure sedentary-to-light physical activity and moderate-to-vigorous physical activity (MVPA). Their findings show that the CLIL group reached higher levels of MVPA than the non-CLIL group. The authors make an interesting remark about integrating FL teaching and Physical Education: teachers are concerned with the vocabulary and language structures they use, and they "may overuse language learning materials such as flashcards and, consequently, the teacher talking time is increased while students' activity time is diminished" [59, p. 7-8].
The study [59] focused not on the integration of a content subject and foreign language teaching but rather on differences in learners' digital competence. We included this study because ICT, computing and digital skills are usually compulsory subjects at elementary and/or secondary schools. The researchers tested 2nd-year students in compulsory secondary education (aged 13-14) on (a) communicative digital competence and, a year later, (b) informational digital competence.
In the first year, the sample comprised 18 093 CLIL students and 2 152 non-CLIL students; in the second year, it comprised 2 581 CLIL and 17 553 non-CLIL students. It is important to note that the two tests were not taken by the same students but by two consecutive cohorts enrolled in the 2nd year of secondary education. The results show that CLIL students significantly outperformed their peers in both tests, communicative competence and information competence. Communicative competence was evaluated against six standards (respecting the rules of participation in virtual networks, handling network communication tools, using the internet as a source of information, sending email, understanding the risks of sharing personal data, and managing files and folders). Information competence was evaluated against fourteen standards (e.g. compressing folders, copying files to share, creating back-up copies, editing with a word processor, using spreadsheets, selecting information critically, and drawing and editing images). The CLIL students reached better results than their non-CLIL counterparts in 07 out of 20 standards. The author suggests that one reason may be that CLIL creates a methodological framework that naturally fosters cross-curricular competencies.
The table below summarises the data from the included studies that reported on the influence of CLIL on the content subject. Studies in bold presented the statistical data used in the meta-analyses (see the text and Table 2).
As can be seen, it is difficult to evaluate the results in general terms. The nature of the content subjects differs, as do the teaching methods and approaches applied. The results presented above suggest that CLIL has a positive impact on content learning: in eleven studies, CLIL students outperformed non-CLIL students, either significantly or non-significantly. To be selected, studies had to focus on content teaching and on the impact of CLIL on results in content subjects. However, some of the studies reported noteworthy limitations, e.g. the length of the study or the sample size. Another critical factor that must be mentioned is publication bias, the tendency for positive effects in education to be reported more often than negative or null results.
Of the 16 studies presented, six included data that could be evaluated using RevMan software, which allows the selected studies to be combined, synthesised and analysed. Studies that did not report the p-value or SD, or that simply presented data that were not statistically significant, were not included in the following meta-analyses. Table 3, with the forest plot, summarises the means, SDs, effect sizes (described above) and confidence intervals of the studies. A confidence interval is a range of values that is likely to contain a population parameter, here the true effect size. Table 3 shows that the research results have been contradictory. Altogether, the studied sample comprised 3 303 pupils (n CLIL = 1 847; n non-CLIL = 1 456).
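The per-study effect sizes in the forest plot are reported as Hedges's g. As an illustration only, the Python sketch below computes g and its 95% confidence interval from two groups' means, SDs and sample sizes, using the pooled standard deviation (which weights each group's SD by its sample size, so larger groups count for more) and the usual small-sample correction. It is a minimal sketch under these textbook formulas, not the exact routine implemented in RevMan or Comprehensive Meta-Analysis; the function name `hedges_g` and the input values are hypothetical and are not taken from the included studies.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Hedges's g for group 1 (e.g. CLIL) versus group 2 (e.g. non-CLIL).

    Hypothetical helper for illustration.
    Returns (g, standard error of g, (ci_low, ci_high)).
    """
    df = n1 + n2 - 2
    # Pooled SD: each group's variance is weighted by its degrees of freedom,
    # so the larger group contributes more.
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * df - 1)             # small-sample correction factor
    g = j * d
    # Approximate sampling variance of d, rescaled by j^2 for g.
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    se_g = math.sqrt(j**2 * var_d)
    return g, se_g, (g - z * se_g, g + z * se_g)


# Hypothetical example: two groups of 20 pupils with equal SDs.
g, se, ci = hedges_g(m1=10.0, sd1=2.0, n1=20, m2=9.0, sd2=2.0, n2=20)
print(f"g = {g:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

With these made-up inputs g is about 0.49 and the interval includes zero. A negative g would mean that the second (non-CLIL) group scored higher, which is the direction of the studies favouring non-CLIL groups.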
The summary results show that the individual effect sizes range from -0.61 to 0.19; the proportion of the observed variance that reflects real differences between studies (I²) is very high (92%), which means that we are dealing with substantial heterogeneity. The combined effect size is -0.14 (which can be evaluated as a small effect), with a 95% confidence interval of -0.40 to 0.13; the intervals are wider because a random-effects model was used. The p-value for the summary effect is 0.31, so, consistent with a confidence interval that includes zero, the summary effect is not statistically significant. The between-study variance (τ²), which reflects the variance of the true effects, is 0.09, a small value.
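To make the summary statistics above more concrete, the sketch below pools per-study effect sizes with a random-effects model and reports the pooled effect, its 95% confidence interval, τ², I² and a two-sided p-value. It assumes the classic DerSimonian-Laird moment estimator of τ², which is widely used as a default; the text does not state which estimator was applied. The function name `random_effects` and the three example studies are hypothetical and do not reproduce the data of the six included studies.

```python
import math


def random_effects(effects, variances, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling (hypothetical helper).

    effects   -- per-study effect sizes (e.g. Hedges's g)
    variances -- their sampling variances (SE squared)
    """
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed")
    w = [1 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    # I^2: share of observed variance attributed to true heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau^2 to every study's variance, which
    # evens out the weights and widens the confidence interval.
    w_re = [1 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se = math.sqrt(1 / sw_re)
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value
    return {
        "pooled": pooled,
        "ci": (pooled - z_crit * se, pooled + z_crit * se),
        "tau2": tau2,
        "i2": i2,
        "p": p,
    }


# Hypothetical effect sizes and variances for three studies.
res = random_effects([-0.6, 0.2, -0.1], [0.01, 0.02, 0.015])
print(f"pooled = {res['pooled']:.2f}, "
      f"95% CI = ({res['ci'][0]:.2f}, {res['ci'][1]:.2f}), "
      f"tau2 = {res['tau2']:.3f}, I2 = {res['i2']:.0%}, p = {res['p']:.3f}")
```

The design choice worth noting is the extra τ² term in the weights: when heterogeneity is high (a large I²), small and large studies are weighted more equally, and the summary interval becomes wider than under a fixed-effect model.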
"Criticism has recently been leveled at CLIL due to the plethora of models or variants which can be identified within it" [60, p. 14], and Banegas [61] stresses that the shortcomings of CLIL need to be addressed. Researchers are calling for further, more precise research on the aspects that may influence results and their interpretation.
Spain is probably the most experienced European country in applying CLIL, and numerous Spanish studies have focused on L2, content and various other factors, as well as L1. In Spain, the government substantially supports the application of CLIL; in other countries it is implemented systematically but offered as an option (e.g. in Germany, students apply to secondary schools with a CLIL programme); and in still others it is implemented rather haphazardly, depending on the capacity and willingness of teachers and the approval of school management and parents. This is an important factor that may influence the results and interpretation of research conducted in different countries. Although numerous factors may influence the results, we believe that replication studies and meta-analyses can shed more light on the positive or negative effects of CLIL on both content and language learning. Of the studies included in the meta-analysis, three were conducted in Germany and one each in Spain, Belgium and Greece. The length of treatment varied from 5 lessons to 4 years. For most of the tests used, internal consistency was checked and Cronbach's alpha was reported. Most studies used a pre-test/post-test design, with the pre-test confirming the groups' homogeneity (no statistically significant differences between the groups). The researchers mostly used tests, some standardised and others validated or non-validated. In four studies the non-CLIL students outperformed their CLIL peers; however, the difference between the groups was not statistically significant. This can be perceived positively, as it suggests that CLIL does not negatively affect learning in the content subject. However, this should be evaluated together with language performance.

Limitations of the Study
As an observational study of selected studies, a meta-analysis synthesises data from different (even small) samples whose results may, for various reasons, not be statistically significant. At the same time, some aspects may be perceived as threats, risks or drawbacks. The studies were not all conducted under the same conditions, and not all of them controlled for every effect (see the text). The selection of studies can also be regarded as a limitation, as "some studies have not been published, or have been published in a form to which the researcher has no access, or have been published in a language that the researcher cannot read, etcetera" [62]. The same authors (ibid.) also mention problems with probability sampling, missing cases, pre-test/post-test designs and test differences.
Table 2.
Table 3. Forest plot illustrating the results using a random-effects model (columns: Study ID; statistics for each study; Hedges's g and 95% CI; statistics for the studies and summary)
Table 4.

In the studies presented, we could not obtain enough information on sample selection, the varying amounts of target-language and mother-tongue use or the language used for testing; similarly, the content of the tests could not be evaluated. The lack of unified terminology must also be mentioned as a limitation of the study. Today the term bilingual education covers different models, introduced at schools with different aims, so the terms CLIL and bilingual education are sometimes used interchangeably.
The generalisability of the results is subject to a significant limitation: the study presents research conducted in a rather limited range of countries. In most of these countries (except Greece and the Czech Republic), the mother tongue belongs to an analytic rather than a synthetic language group. This may be a very serious factor, as speakers of Germanic and Romance languages are believed to master English more easily than speakers of other languages.
Publication bias must also be mentioned; it was not estimated in the present study because the number of studies was low, even though the search term was very broad. Restricting the selection to studies indexed in the Web of Science (WOS), which was done intentionally to ensure study quality, may also mean that we missed important data that could significantly influence the summary result.

Conclusion
This research aimed to draw attention to, and contribute to an understanding of, the need for further study of CLIL through different subject-specific lenses. The number of studies focusing on content outcomes is much lower than the number focusing on the impact of CLIL on L2 development, which also limited the number of studies included in the present review. Although the search initially returned a relatively high number of studies (n = 395), only 16 met the inclusion/exclusion criteria, and after checking the homogeneity of the data available for pooled analysis, only six could be used for the meta-analysis. Thirty years of CLIL and its application in different countries indicate that its implementation has benefits. The difficulties caused by the vagueness of the term content and language integrated learning could be addressed by narrowing the definition or by defining categories of CLIL. This would also enable the replication of studies, which could then be synthesised, compared and evaluated. What is more, it would allow the principles of CLIL, and the conditions under which it can be effective, to be defined more precisely. As for pedagogical implications, although the teacher's role has not been discussed here, teachers' attitudes, motivation and self-confidence [63][64][65][66][67] play a crucial role in the quality of CLIL. If CLIL is to be introduced in our schools, one of the main tasks of universities should be the systematic preparation of pre-service teachers for integrating subject content (not only through CLIL) and the development of their creativity [68; 69], their ability to create materials, and their critical thinking [70].
As a final comment, I would like to stress that CLIL methodology undoubtedly has potential. For further study, however, CLIL, CBI (content-based instruction) and bilingual education need clear definitions, since they are similar but not synonymous, and these terms need to be used properly in research reports. Similarly, the studies lack information on the setting, the controlled variables and the form of teacher collaboration (team teaching, co-teaching), as well as on the proportion of CLIL teaching in the curriculum, language awareness and teaching time in the target language, which makes them difficult to compare and analyse.