Assessment of young language learners: Using rubrics to bridge the gap between praxis and curriculum

This article discusses an action research project that took place during the author's second year of teaching English as a foreign language. During the course, the author/teacher was given the responsibility of introducing English to a group of beginning young language learners (first graders; ages five to seven). Following a few months of teaching, the author detected the need to develop better strategies for the assessment of the pupils’ oral skills. This led to the subsequent steps of designing and implementing an Action Research (AR) project with the following aims: to find an assessment system that worked well for the author/teacher, that provided clear evidence of the learners' improvements in their communicative competence and that matched the objectives as stated in the national curriculum for foreign language teaching and learning. This article will describe and discuss the design and outcomes of the AR.


Introduction
Action research (AR) is a methodological approach to research that consists of a cyclical process initiated by the need to solve a problem, in this case, the need for a more appropriate method to assess the communicative competence of students (Dick & Swepson, 1997). AR is mainly a way of understanding and improving one's own teaching practices. Put simply, action research is "'learning by doing': a group of people identify a problem, do something to resolve it, see how successful their efforts were, and if not satisfied, try again" (O'Brien, 2001, p.2). Even though action research is currently widely used in education, it did not originate in this field.
The term "action research" was introduced by Lewin in 1946: "Lewin is credited with coining the term 'action research' to describe work that did not separate the investigation from the action needed to solve the problem" (McFarland & Stansell, 1993, p.14). He characterized the action research process as "a spiral of steps, each of which is composed of a circle of planning, action, and fact-finding about the result of the action" (Lewin, 1946, in O'Brien, 2001, p.6). Years later, action research started to be used in the field of education for its potential to help improve teaching
2. It aims to solve real world problems in real world situations (2001, p.3).
3. The researcher makes no effort to remain objective (2001, p.3), as she is an active part of the analyzed issue.
Of course, it is also necessary to consider the drawbacks of AR in order to be aware of the weaknesses our research can have and how they can be addressed. One major disadvantage of action research is that it does not normally allow generalization. It gives answers to particular situations (in this case, the oral skills of a particular group of first graders) and, therefore, these cannot be applied to other situations, as they have a different nature. Some authors also point out that, because the researcher is an active member of the community of practice being studied, the validity of the results could be affected (Cohen, Manion & Morrison, 2000).
Having acknowledged these limitations, it can be pointed out that this study aimed to reduce these drawbacks as much as possible by a) making the study results public through a report and subsequent publications, thereby affording more generalizability; b) interacting with a study supervisor and other researchers, which gave the study framework greater validity; and c) sustaining rigor in data analysis through the same network of participants mentioned in point b, along with carefully designed criteria for selection and analysis (these will be discussed in further detail in the article).
Bellaterra Journal of Teaching & Learning Language & Literature. 7.1 (Feb-Mar 2014) ISSN
This AR project is rooted in the postulates of the sociocultural perspective, from which the idea that learning occurs through interaction emerges. This paradigm implies that individuals learn while they participate in social activities with other people, and that learning takes place when people take part in meaningful activities that require interaction with equals (e.g. peers), experts (e.g. teachers) or novices (e.g. pupils).
Departing from this premise in order to design the AR, two key questions emerged: what did I want to achieve? How could I do it? As the main aim was to modify a problematic issue in my teaching practices (how to best assess young language learners' oral competence in English as a Foreign Language), an Action Research project was considered the best option to address it. I wanted to find an assessment system that provided real and valuable information about my students' progress regarding their oral skills. I wanted to see and keep track of their evolution, their attitude and their willingness to learn. I wanted to give them the chance to speak in English without any assessment pressure, and to use the language in other contexts apart from our class. And perhaps most importantly, this needed to take place in a class of very young learners just beginning to be exposed to a foreign language.

Theoretical Background
For many years, both research and teaching trends in the field of second language acquisition focused on the study of linguistic forms separated from their communicative function (Masats, 2008), rather than on language learning as a global process that implies using language actively to communicate. More recently, many teachers and researchers have been moving on to new theories, such as the sociocultural perspective, that fit more current views of what language learning entails (Johnson, 2009). The main idea of the sociocultural perspective is that learning occurs through interaction, while learners participate in social activities in particular contexts with a particular goal.
This implies that individuals are an active part of their learning process, not just receptors of what experts (e.g. teachers) put "in their heads". The sociocultural perspective is formed by different theories that interact with each other and that researchers use in a multimodal way (Masats, 2008). A major contributor to these ideas is Vygotsky (1978), who developed many of the concepts that have influenced, in one way or another, many aspects of the social sciences; these concepts have in turn been taken up by many of his followers, who have ended up developing their own theories. Vygotsky has been so influential in theories of learning that the thoughts he expressed are still being developed by contemporary psychologists and educators in different parts of the world (Koshmanova, 2007).
Following the Vygotskian concept of language, learning a language is, then, interacting with more skilled speakers engaged in a social activity while learning how to build up meaningful statements (Masats, 2008). Interaction (amongst equals, with experts, with novices...) is, then, the key to language learning (and to any kind of learning). In particular, it is important to highlight the role that social interaction plays in language learning, according to the socioconstructivist paradigm (Richards, 2002).
Nonetheless, the best way to assess language learning within a socioconstructivist paradigm, especially in the case of beginner learners, remains under debate. In general it is agreed that all teachers need to collect data, either in a formal or informal way, to determine whether their students are progressing correctly in their learning process and whether they (the teachers) are doing well in their teaching practices. Rea-Dickins and Rixon (2000, p.89) place this approach within continuous assessment, explaining the process as "the collection of data on language use by pupils in classroom language learning". Often, the term 'assessment' is confused with the term 'evaluation', especially in Spanish-speaking contexts, where only one word is used to refer to both concepts. As Ioannou-Georgiou and Pavlou (2003) highlight, evaluation seeks to determine whether a language program meets its goals, considering exam results and parents' and teachers' opinions, while assessment is a more general term that refers to all methods used by teachers to characterize children's performance.
Generally, in language teaching, there are five main reasons why and how students are assessed (adapted from Cajkler & Addelman, 2000, as cited in Brewster, Ellis, & Girard, 2002, p. 245):
1. Formative: assessment is part of the continuous learning process.
2. Summative: to give pupils feedback on their progress on a particular moment (often done through tests).
3. Informative: to give pupils, parents or teachers feedback on the general progress.
4. Diagnostic: to help identify particular needs and strengths (often done at the beginning of a teaching period).
Based on these observations, it was decided to employ rubrics as the principal means of assessing the students' development. A rubric is a tool for recording students' performance in particular tasks or when we observe them from a global perspective. Rubrics (also known as scoring rubrics) are used when the quality of students' performances is considered; they do not judge whether these are right or wrong. Instead, a rubric sets particular criteria for what is expected and classifies children's performance according to these criteria. Rubrics can be used either by the teacher or by the students themselves (adapted depending on their age and their reading/understanding abilities). In this project, rubrics were chosen for data collection about students' performances because they were considered a tool that provided a fair and reliable overview of what children could do and how they could do it, and also because, used periodically, they would provide a global and realistic view of children's progress.
However, there are also a few drawbacks that need to be considered when trying to use rubrics for classroom purposes. Creating rubrics is a very time-consuming process: first, you need to decide the items you want to assess and then establish the achievement levels and the criteria for each level; then you have to field test the rubric to make sure it accomplishes its goals and make any modifications that are needed. This is one of the reasons why generic rubrics are more efficient for teachers. Once you have found a rubric that works for you, you need to implement its use in a real classroom task and then (which was the case for this teacher-researcher) transfer rubric criteria results to school-acceptable results, which are often numeric and require a conversion from what the criteria have stated, often making the teacher divide and multiply several times, which, again, is time-consuming.

Assessment
The steps taken to design and modify the assessment process form part of the AR described herein.

Data collection
The context
The collection of this data was carried out during the 2011/12 school year in a public school in the Metropolitan Area of Barcelona, in the north of the Baix Llobregat area. The town where the school is located has seen a huge increase in its population in the last twenty years, doubling its number of inhabitants between 1991 and 2012. The home language of the majority of the children and their families is Spanish; Catalan, the language of the school, is not the predominant language of the pupils.
The data collection and the subsequent research were carried out in a research-friendly context and had the collaboration of other subject teachers and the school's governing body.

The participants
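The students who participated in this research project were 1st grade pupils within an age range from 5 to 7 years old from a class of 26 students, comprised of 11 girls and 15 boys. Only one of them had special educational needs, but she still participated in the lessons. In general they were a very participative group, motivated to learn and use English, easy to engage in activities, and well behaved. Inevitably, at times there were minor behavior issues but they could be easily solved.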
These children had just started primary education, and as a group, their only previous contact with the English language had been in some symbolic play lessons in their last year of kindergarten. This was, therefore, the first time they were receiving formal instruction in English as a Foreign Language (EFL). For the study, only two out of twenty-six students were selected, according to the following criteria:
-General performance in all subjects.
-Attitude and participation in class.
-Absence of major learning issues.
Names of students, teachers and other participants have been changed to respect their privacy and anonymity.

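Student | General performance | Attitude and participation | Absence of major learning issues
Sara | Very high | Very active, engaged, motivated | X
Pablo | Average level | Participative with minor behavior issues | X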

The activities: description
As mentioned in the introduction, this research came to life after the teacher-researcher observed her own teaching practice in terms of assessment and decided to change it and improve her teaching praxis, along the lines of Action Research (see O'Brien, 2001). From there, many activities emerged in the modified planning, which can be divided into two main sequenced learning events: 1) Mr. Camera, intended to last for the whole year as a routine in the EFL classes, and 2) sporadic big events, lasting up to a month (a role play of the Enormous Carrot was selected for this study). Although these activities were part of the AR, they were included in the routine operation of the lessons and were not perceived by students as isolated or out of the ordinary. For the Mr. Camera activity, the camera stood on a tripod on the teacher's desk and the children addressed it every time they arrived in or left the class. For the theatre activity, the teacher recorded the play while taking on the role of the narrator. Data collection was also supported by after-lesson note taking by the teacher, though this was only done to jot down interesting episodes that occurred during the activities; the notes have only been used as a reminder for the teacher and as a link to specific data sections, not as a research tool.

Data Presentation and Discussion
Considering the established criteria in the analytical approach section, an initial rubric was designed in order to categorize students' performances in the Mr. Camera activity:
Figure 5: Sample of initial rubric
The rubric consisted of different sections:
• Language production: this section was related to the criterion "vocabulary range" and it was aimed at categorizing students' use of the vocabulary and structures introduced in class and their evolution.
• Pronunciation: this section was related to the criterion "fluency" and it was aimed at categorizing students' ability to speak accurately according to what was being worked on in class.
• Reception: this section was not related to any criterion but it was aimed at describing students' understanding of oral instructions.
• Non-linguistic competences: not related to any criterion but aimed at assessing students' use of non-linguistic strategies (e.g. gestures, facial expression) to support oral communication.
• Attitude: related to the criterion "willingness to know more formulae". This item was aimed at classifying students' attitude regarding the activity and the learning of English as a Foreign Language.
As described in figure 4, above, the teacher-researcher introduced the first activity by presenting "Mr. Camera" as a new visitor to the class that would be there every day to greet them. The whole class did a small brainstorming session to share different ways in which they could greet the camera so that they could start using them right away. Greeting formulae such as "Hello" or "Bye Bye" were the most commonly proposed by the children, although some others, like "Good morning" or "Good afternoon", appeared too. In that very same class, students said "goodbye" to Mr. Camera for the first time.
After watching the videos, the first rubric was filled out with the following results:
Upon revision of the first rubric and a careful analysis of the transcription of what actually took place, it became apparent that the students' attitude during the task strongly influenced the teacher's predisposition to give higher or lower marks when assessing them. When it came to Sara's assessment, it was clear that she would get a high mark because she made an effort to perform well, but with Pablo, as he was acting silly, the teacher was ready to give him a low mark. Initially, a lower grade was given in all sections, strongly influenced by Pablo's attitude. It was not until the teacher-researcher had revised the descriptors in the rubric again that she realized that she was giving a low mark only because of Pablo's attitude, although he had accomplished what the descriptors stated in all other categories (as can be seen in line 5, transcript 1). At that point, the teacher-researcher revised Pablo's marks and changed them to fit what the criteria stated. Therefore, we can state that, in this case, the use of rubrics strongly supported fair assessment, as students were assessed according to criteria previously set to ensure impartiality.
In the following class, which was the next morning, the teacher placed the camera on her desk and stood at the door to greet the students as they came in. As the children entered the class, they greeted Mr. Camera.
These video recordings were assessed afterwards, with the following results:
"good morning" was better than saying "hello". We can deduce that she guessed the structure "good morning" was more complex than "hello" and, therefore, that using the first one instead of the second one would mean she had made a bigger effort and performed better in the activity. This is also related to sociopragmatic competence, as Sara showed proof of knowing when to use particular types of language: she used 'good morning' instead of 'hello' first thing in the morning.
In this second rubric, Pablo had a very low mark as he acted silly again and did not use the expected formulae, greeting Mr. Camera in Spanish and not in English. This influenced the categories related to language production, pronunciation, reception and attitude. He still got a good grade in non-linguistic competences as he used gestures to support communication. Again, in this case rubrics worked as a system to ensure fairness: after a second revision of Pablo's rubric and performance, his grades were revised and changed, as he had actually used the expected language, not to greet the camera but to greet the teacher instead.
These two rubrics were the first two to be filled in the AR project.
Immediately a major limitation emerged: due to the age and language level of the learners, the actual target language production was quite short, reducing the evidence available for assessment to a minimum. These rubrics were filled in during the first week, and after they were completed, many additional questions arose regarding diverse issues, such as:
-Difficulty in filling out a rubric based on a total score over 15 when the grades needed to be given over 10, which meant an extra effort in calculating the final grades.
-The descriptors in the section "Attitude" appeared to be irrelevant to the purpose of the rubric, because they were focused more on activity preparation (which was not needed for this activity) rather than on the criteria previously set, which referred to the "willingness to know more formulae".
-Absence of any category in the rubric referred to the third criterion "use of greetings in foreign language in different situations".
-Inappropriateness of the section "Reception", as this rubric was aimed at assessing students' speaking competences.
After these observations, some modifications were made, resulting in the following rubric:
The new features were the following:
-Labeling the levels of achievement from "extraordinary" to "not acceptable".
-Addition of an extra level of achievement ("extraordinary") as the maximum grade for a particular category, for two main reasons:
1. Transforming the maximum score into 20/20 to make calculation easier (only needing to divide by 2).
2. Slightly modifying the rest of the descriptors to make them adequate for the reality observed in the first recordings: sometimes grades were given without being 100% sure they fit how students performed.
-Elimination of the category "Reception".
-Incorporation of the category "Social English" to refer to the criterion "use of greetings in foreign language in different situations".
-Modification of the category "Attitude" to fit the criterion "willingness to know more formulae" so that it reflected students' efforts to perform well and learn more.
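The grade conversion mentioned in the observations and modifications above can be illustrated with a minimal sketch. It assumes a simple linear rescaling from the rubric total to the school's ten-point scale; the function name to_school_grade and the sample scores are hypothetical and are not data from the study. The five categories with four points each are also an assumption, inferred from the 20/20 maximum.

```python
def to_school_grade(raw_score, max_score, scale=10):
    """Linearly rescale a rubric total to a grade out of `scale` (hypothetical helper)."""
    if not 0 <= raw_score <= max_score:
        raise ValueError("raw_score must lie between 0 and max_score")
    return raw_score * scale / max_score


# Hypothetical points per category for one pupil (not data from the study).
# Assumption: five categories, each worth up to 4 points, giving a total of 20.
modified_rubric = {
    "Language production": 3,
    "Pronunciation": 4,
    "Non-linguistic competences": 2,
    "Social English": 3,
    "Attitude": 4,
}

total = sum(modified_rubric.values())  # 16 out of 20

# Modified rubric: a base of 20 only requires dividing by 2.
print(to_school_grade(total, 20))  # 8.0

# Initial rubric: a base of 15 requires multiplying by 10 and dividing by 15.
print(round(to_school_grade(11, 15), 1))  # 7.3
```

With a base of 20 the conversion can be done mentally, whereas a base of 15 usually yields fractional grades that need rounding, which may be part of why the initial rubric was felt to require extra calculation.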
In the next session, which was the third one, the children and the teacher talked about some other ways to say 'goodbye', and the formula 'see you tomorrow' appeared in the conversation. This session was more directed, as the teacher called the children one by one to say goodbye and then start picking up, since it was the last session of the day and the children were about to be dismissed to go home.
Results in session 3 were compiled by means of the modified rubric:
In this case, Sara performed differently from her norm. She went in front of the camera very quickly and said the first formula that came to her mind: "bye bye". She hesitated for some reason, maybe because she was used to saying the most complicated formulae from the options presented in class, and in this case she had used a rather simple one. After hesitating she continued using that formula and began to pick up as rapidly as she could. This makes us think that Sara only wanted to finish quickly so that she could be the first to line up, something that she normally enjoyed doing.
Although Pablo showed excitement in his facial expression, in this case he also showed great interest in using the new formula just introduced. He hesitated but, instead of using what he already knew, he asked the teacher so he could use the new option of "see you tomorrow". This attitude and willingness to learn resulted in a fairly high mark.
A final session was implemented before the teacher-researcher analysed the effects of the improved rubric to ensure the validity of the consequent reflections. The next morning, the children were recorded and assessed again, using the new rubric:
Their rubrics were scored as follows. Sara performed correctly according to the descriptors, using very good pronunciation and the appropriate formula. In this case, she did not get the highest mark in the attitude section as she was expected to use a different formula from "good morning", which had already been introduced in class. Pablo always shows excitement and interest about speaking in front of the camera. This interest is often transformed into a not-so-positive attitude, reflected in his silliness in front of the camera. He performed well in terms of language production and pronunciation but did not show an appropriate attitude.

Evaluation of Rubric Usefulness
After the second use of the improved rubric, it was time to reflect on how it had worked and see if it had had the expected effects:
-The transformation of the rubric into a base of 20 clearly helped make the teacher's job easier and reduced calculating time.
-The addition of an extra level of achievement ("extraordinary") helped fit performances that were above expectations into the rubric.
-The modification of the descriptors under the "attitude" category helped classify attitudes for this particular activity better as they now referred to effort in performance instead of effort in preparation.
-The addition of the category "social English" helped focus on the use of the language in social situations, but it also showed a major drawback: as the target language production was really short, it was difficult to state, only considering the recordings, whether the students were using English in social situations with peers or other teachers, unless these productions were recorded by chance.
Once the initial rubric was modified and the improved rubric was used and tested in class, it was decided that this last rubric was appropriate for the research purposes and it was then used throughout the rest of the project for this particular activity. Following these first four sessions, this assessment system was established as a part of the teaching and assessment cycle for this class and activity. Rubrics proved to be a reliable system which ensured fair and continuous assessment and which showed proof of students' performances according to the observations made for this particular activity. In order to double-check the reliability of the rubric, a complementary activity was carried out to ensure its validity in activities other than "Mr. Camera".
This activity was called "The Enormous Carrot" and it consisted of a short theatre play based on a story that had previously been worked on in class for several sessions (see figure 2 for more information). One particular aspect of the rubric that did not fit the aims of the activity was the social English category. In this case, the use of the language in contexts other than the interaction with the teacher was not an aim, as it was a rather directed activity that implied using the structures worked on in class. On the other hand, a category that would have been very helpful when assessing this activity was the one related to "reception", which had previously been removed from the original rubric as it did not fit the aims of the Mr. Camera activity. In this case, children had to make some gestures as they heard their peers and the teacher narrating the story, so this showed proof of understanding spoken English. These reflections led to the conclusion that this particular rubric did not work for all purposes, as it had to be modified to fit the aims of each implemented activity in order to ensure its validity. What was done afterwards was to modify the rubric by deleting the category "social English" and incorporating the category "reception" again.
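The adaptation described above, where one category is dropped and another is restored for a different activity, can be sketched as a small operation on a list of categories. This is only an illustration under stated assumptions: the helper adapt_categories is hypothetical, and including "Non-linguistic competences" among the categories of the improved Mr. Camera rubric is an assumption based on the description of the modifications.

```python
# Categories of the improved Mr. Camera rubric as read from the text;
# the presence of "Non-linguistic competences" is an assumption.
MR_CAMERA_CATEGORIES = [
    "Language production",
    "Pronunciation",
    "Non-linguistic competences",
    "Social English",
    "Attitude",
]


def adapt_categories(base, remove=(), add=()):
    """Return a new list: drop the `remove` items, then append the `add` items (hypothetical helper)."""
    kept = [category for category in base if category not in remove]
    return kept + [category for category in add if category not in kept]


# Theatre activity: "Social English" is deleted and "Reception" is incorporated again.
theatre_categories = adapt_categories(
    MR_CAMERA_CATEGORIES, remove=["Social English"], add=["Reception"]
)
print(theatre_categories)
```

Because the helper returns a new list, the Mr. Camera rubric stays unchanged while each activity gets its own set of categories. Under these assumptions the number of categories is also the same in both rubrics.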

Rubrics as a way of keeping track of students' progress
The previous section showed, step by step and illustrated by some meaningful episodes of children's interaction, the process followed to create appropriate rubrics for this research project. The aim of this section is to analyze how these rubrics helped the teacher to keep track of her students' progress according to the criteria set for the AR project. The following criteria served as the basis for the creation of the categories and the descriptors of the rubric.

Criterion | Category in the rubric
Vocabulary range (whether it was increasing or not). | Language production
Fluency (whether they were speaking more fluently or not). | Pronunciation
Use of greetings in foreign language in different situations (whether they were using greetings in other situations apart from the activities, e.g. with friends, other teachers...). | Social English
Willingness to know more greeting formulae (whether they wanted to use more and different formulae to use in the activities and in the classroom). | Attitude
Figure 13: analytical criteria and rubric categories
In this case, vocabulary referred to the greeting formulae used by the students (e.g. hello, good morning). The following chart shows which greeting formulae Pablo and Sara used in each session to talk to Mr. Camera. It is color-coded so blue stands for "Hello", red for "Bye bye", yellow for "good morning", green for "see you tomorrow" and purple for "how are you".
[Chart: greeting formulae used by Pablo (P) and Sara (S) in sessions 1 to 16]
This criterion also helped categorize students' attitude and effort, as Sara does when looking for the approval of the teacher after an effort with pronunciation: "see you tomorrow mister camera" (smiling and looking at the teacher, making an effort with /r/ pronunciation).
It was found that rubrics were an effective resource for data collection and continuous assessment for classroom purposes. After being tested and modified, the rubrics were helpful for gathering information about the analytical criteria that had been set. It has to be noted, however, that some particular aspects, such as fluency, were rather difficult to assess by means of a rubric due to the limitations in students' productions.
Moreover, the rubrics not only helped the teacher keep track of her students' learning process; they also influenced the teacher's decisions and the children's evolution in several ways that were not initially planned. A principal unexpected outcome was the fact that the focus on the efficacy of rubrics brought about modifications of her teaching practices. Designing the rubrics made this teacher-researcher aware of the pedagogical implications of her decisions and therefore helped her create activities that focused on the criteria previously established while designing the rubric. For instance, when the teacher-researcher had to create the Mr. Camera activity, she focused on the pre-established criteria in order to plan a communicative activity that fitted within the criteria to be evaluated.

Conclusions and suggestions for improvement
While working on the different phases of the AR project, a careful analysis of the data collected was done to check whether rubrics were an efficient means of assessing young language learners' emergent oral competences in the target language. Before analyzing the data, a set of analytical criteria was established in order to check whether the rubrics were accomplishing their function: the use of rubrics should bring reliable and real information about students' performances in terms of 1) vocabulary range, 2) fluency, 3) use of English in social situations and 4) willingness to know more formulae (attitude).
It was found that the rubrics were a reliable assessment tool for teachers for several reasons: 1) they ensured fair assessment by creating the need to refer to a set of descriptors; and 2) they helped the teacher collect data and keep track of students' progress.They also had some unexpected outcomes: 3) the use of rubrics shaped the activities' design as they made the teacher reflect upon the implications of her decisions and 4) plan improved activities for achieving the criteria, thereby 5) supporting student learning by focusing the activities on better means of improving their oral skills.
At the same time, some drawbacks of their use were detected. First of all, the use of rubrics is a very time-consuming assessment system and, therefore, requires a high degree of commitment from the teacher. Moreover, in the case of beginner learners, it was nearly impossible to keep track of students' use of English for social purposes while teaching at the same time, and some meaningful data was lost along the way.
There are also some observations concerning AR that can be made. First of all, the "teacher hat" was a lot more present than the "researcher hat" and that influenced aspects such as data collection, tools or the overall planning. For instance, data collection was highly influenced by the school's timetable (sometimes recordings were interrupted by school events). Also, data were only compiled by means of

Figure 1: The teaching and assessment cycle
age range from 5 to 7 years old from a class of 26 students, comprised of 11 girls and 15 boys. Only one of them had special educational needs, but she still participated in the lessons. In general they were a very participative group, motivated to learn and

Figure 2: criteria for selecting the participants

Figure 8: Sample of the modified rubric

Figure 11: Students represent The Big Carrot

Figure 12: rubric for the theatre activity