ASSESSING ORAL PROFICIENCY: Problems and Suggestions for Elicitation Techniques

I Made Sujana (Pendidikan Bahasa dan  Seni FKIP UNRAM)

Abstrak. Tujuan pembelajaran bahasa Inggris pada era komunikatif adalah untuk meningkatkan kemampuan berkomunikasi. Banyak usaha telah dilakukan oleh guru untuk mencapai tujuan ini. Akan tetapi, permasalahan besar muncul ketika berbicara tentang bagaimana menilai kemampuan berbicara pembelajar. Artikel ini memuat permasalahan yang sering dihadapi dalam menilai kemampuan berbicara, beberapa pertimbangan dalam membuat tes berbicara, dan memberikan beberapa saran teknik dalam tes berbicara.

 Kata-kata kunci: berbicara, profisiensi berbicara, teknik tes berbicara

 Abstract. Since the purpose of teaching English in the communicative era is to improve students’ ability to speak English, many teachers devote their teaching to achieving this target. However, problems arise when they assess the students’ oral proficiency or achievement. This article discusses problems in testing oral proficiency, considerations in constructing oral proficiency tests, and some suggested elicitation techniques for assessing oral proficiency.

 Keywords: speaking/oral proficiency, elicitation techniques



  1. Introduction

In the communicative language teaching era, we all agree that the main target of learning English is to enable students to use the target language. To achieve this, many language teachers have devoted themselves to developing students’ speaking ability by searching for and applying various techniques to get students to speak. However, the teachers’ efforts result in frustration when it comes to assessing students’ ability. The frustration may come from the teacher himself when he has to conduct the speaking test, or from an outside body or institution when it conducts a formal examination (such as Ujian Akhir Nasional, EBTANAS, and the like). The former relates to the difficulties of conducting the test in terms of time availability, class size, practicality, and setting sensible criteria. The latter relates to the mismatch between what the teacher does in class (i.e. improving students’ oral proficiency) and what students face in the formal examination (i.e. a paper-and-pencil test). No wonder that the application of communicative language testing is left far behind the application of communicative language teaching, owing to the nature of speaking/communication tests (see Sujana, 2000). There is even a tendency for communicative language teaching to be dragged into teaching what is likely to be tested in the examination. Ironically, there is general agreement among language practitioners that testing students’ oral proficiency is one of the most important aspects of an overall evaluation (see, for example, Morrow, 1982).

This article further discusses some problems that may arise in assessing oral proficiency, considerations that need to be taken into account in constructing the tests, and some elicitation techniques for testing oral proficiency.

  2. Problems in Testing Oral Proficiency

Owing to the complexity of the aspects involved in testing oral proficiency, many teachers tend to avoid assessing the speaking skill. Madsen (1983) points out that, of all language exams, testing speaking ability is the most challenging in terms of test preparation, administration, and scoring. Some of the reasons why the speaking test seems so challenging are (1) the difficulty of defining the nature of the speaking skill; (2) the difficulty of choosing the criteria for testing speaking ability; (3) the involvement of other factors such as listening ability, interpretation of tone, and reasoning ability; and (4) the difficulty of getting students to speak (i.e. finding techniques that elicit speech).

Speaking ability involves many aspects, which can be analyzed at two levels: the elements of the speaking skill and the overall speaking proficiency (speaking for functional purposes). At the element level (the primary level), speaking involves pronunciation, intonation, stress, and other suprasegmental features. At this level, speaking also requires correct structure and correct idiomatic use of vocabulary in the target language (Vallette cited in Mukminatien, 1995). At the functional level, speaking involves integrating the elements of the language with the function for which the language is used. On the basis of its function, language can be used for maintaining social relationships (the interactional function) and for giving information (the transactional function) (Brown & Yule cited in Mukminatien, 1995). In testing, interactive speaking can take the form of an interview, role play, discussion, and the like, while transactive speaking may take the form of storytelling, an oral report, a description of an object or person, a speech, and so on.

The two levels of assessment in speaking tests cause problems in choosing the criteria for assessing students’ ability. The problem lies in deciding which aspects to look for: should the examiners focus on the elements of the speaking skill or on the overall speaking proficiency (speaking for functional purposes)? The test designers, therefore, should first determine the purpose of the test, which can be derived from the objectives of language learning. From the purpose and objectives of the test, they can choose an appropriate type of testing procedure: discrete-point, integrative, or pragmatic. A discrete-point test attempts to assess one particular element of language at a time, such as pronunciation, stress, intonation, structure, or vocabulary. An integrative test attempts to assess learners’ ability to use many bits of their skills at the same time. A pragmatic test is a procedure or task that requires learners to process sequences of elements in a language in a way that conforms to the normal contextual constraints of that language, and to relate those sequences of linguistic elements to extralinguistic contexts in a meaningful way (Hughes, 1993).

Test designers should also keep in mind that the success of communication depends heavily on other factors such as listening ability, the ability to interpret tone and other suprasegmental features of expressions, initiative in asking for clarification, and turn-taking. A failure in a speaking activity may well be due to weaknesses in these factors. The test designer therefore has to anticipate these problems when assessing oral performance.

In a speaking test, it is not always easy to get students to speak. Sometimes the tasks we expect to motivate students to speak do not work as expected. To overcome this, in addition to carefully designing the speaking tasks to suit the students’ level and to cover the speaking aspects to be assessed, the examiner can act as a partner who stimulates the students to speak.

In line with the opinion above, Morrow (1982) gives some other reasons why it is difficult to assess speaking ability, which lead to the test being avoided in practice. (1) Oral testing is very time-consuming. We would all agree that the neglect of speaking tests in the Indonesian educational context is due to this reason. The average class size in SMA/SMK/SMP is 40-45 students, and a teacher usually teaches 4 or 5 parallel classes. How much time would the teachers have to spend conducting the test? As a result, a paper-and-pencil communicative test, an indirect way of testing communication, is used to replace the direct way of testing oral proficiency or achievement. (2) It is difficult to get students to say anything interesting. As Morrow says, this does not mean expecting them to entertain the examiner with brilliant conversation or witty anecdotes, but the test should fulfill at least one of the following criteria: (a) the student must have a chance to show that he can use the language for a variety of purposes (describing, narrating, apologizing, etc.); (b) he must have a chance to show that he can take part in spontaneous conversation, responding appropriately to what is said to him and making relevant contributions; and (c) he must have a chance to show that he can perform linguistically in a variety of situations, adopting different roles and talking about different topics. (3) Marking oral proficiency tests is problematic. What sort of criteria can we use to assess students’ performance? Is there any standard guideline for setting up the criteria?

To overcome those problems, Morrow (1982) further suggests (1) designing tasks or activities that the students perform by using language, and that are as close as possible to the real world; (2) setting group work, which can at least ease the time problem and give students a chance to use the language spontaneously for a variety of functions; and (3) setting clear criteria, so that the examiner has a clear idea of what is being looked for in a particular test.
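The time problem, and the saving that group work can offer, may be made concrete with a little arithmetic. The sketch below is illustrative only: the class size and the number of parallel classes are the figures quoted above, while the function name `total_testing_hours`, the 10-minute individual session, and the 20-minute session for a group of four are assumptions chosen for the example, not figures from Morrow (1982).

```python
def total_testing_hours(students_per_class, classes, minutes_per_session, group_size=1):
    """Hours needed to test every student when `group_size` students share one session."""
    # Ceiling division: a partly filled last group still needs a full session.
    sessions_per_class = -(-students_per_class // group_size)
    return sessions_per_class * classes * minutes_per_session / 60


# Individual testing: 40 students x 4 classes, with an assumed 10 minutes each.
individual = total_testing_hours(40, 4, 10)

# Group work: the same classes tested four students at a time,
# with an assumed (hypothetical) 20-minute session per group.
grouped = total_testing_hours(40, 4, 20, group_size=4)

print(f"individual: {individual:.1f} h, grouped: {grouped:.1f} h")
```

Under these assumptions, individual testing takes about 26.7 hours and group testing about 13.3 hours. At the upper end of the quoted figures (45 students, 5 classes, 15 minutes each), individual testing would take 56.25 hours.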

According to Weir (1990), tests of speaking ability should be designed to meet the criteria of communicative testing: (1) the tasks developed should be purposive, interesting, and motivating, with a positive washback effect on the teaching that precedes the test; (2) interaction should be a key feature; (3) there should be a degree of intersubjectivity among the participants; (4) the output should be to a certain extent unpredictable; (5) a realistic context should be provided; and (6) processing should be done in real time.

  3. Constructing Oral Proficiency Tests

Testing, according to Bachman (1990), is a procedure designed to elicit certain behavior from which one can make inferences about the characteristics of an individual. Thus, what is tested or observed in a test is a sample of behavior. From the performance on that sample, the examiner draws inferences about the testee’s ability and then interprets the performance against scoring criteria. In a speaking test, one’s performance on a ten-to-fifteen-minute speaking task is often used to judge the testee’s overall speaking ability.

Because this sample of behavior will be used to represent the testee’s overall performance, the speaking tasks must be designed carefully in order to get a valid and reliable description of the testee’s ability. A number of considerations need to be taken into account in constructing a speaking or oral test. These include, among others, the objectives of the test, the length of the test, the representativeness of the sample, the testee’s language level, and the use of multiple formats.

The objectives of the speaking test need to be specified before the tasks are constructed. There are two major objectives of a speaking test: to measure specific aspects of the speaking skill such as structure, vocabulary, pronunciation, intonation, and stress; and to measure the overall speaking proficiency (i.e. speaking for functional purposes). The objectives will determine whether the test applies discrete-point, integrative, or pragmatic testing procedures. At the functional level, the form of speaking to be tested, interactive or transactive, will determine the tasks. Interactive speaking means speaking interpersonally (i.e. the participants, the speaker and the hearer, interact with each other in conversation). The tasks used to elicit it might take the form of an interview, role play, discussion, and the like. Transactive speaking, on the other hand, means one-way speaking (i.e. the speaker gives information to the hearer without asking for a response). The tasks used for this kind of speaking might take the form of giving a speech, describing things or people, storytelling, an oral presentation, and the like.

The second consideration is the length of the speaking test. Hughes (1993) suggests that a speaking test should be made as long as is feasible. The tasks should give the learners adequate time to show their speaking ability. A longer test makes it possible to assess learners’ consistency in using the language and to include more samples of behavior. In other words, the longer the speaking test is, the more reliable and valid the information obtained.

The third consideration is the representativeness of the sample taken in the test. Testing cannot be separated from sampling, since it is impossible to include all the materials taught or all aspects of language proficiency in one test. The speaking test should include as wide a sample of the specified content as possible in the time available. The more samples included in the speaking test, the more chances the testee has to show his speaking ability, and hence the more valid and reliable the test will be. The validity and reliability of the test depend heavily on the representativeness of the samples included. The test designer should therefore select what is regarded as a representative sample of the specified content and, more importantly, decide how to elicit the necessary behavior. This can be achieved by using more than one format or task.

The next consideration in constructing an oral proficiency test is the use of multiple formats. Using multiple formats in a speaking test gives the testee as many “fresh starts” as possible (Hughes, 1993). The testee will be able to show his/her ability in using various language functions. At the same time, it keeps the testee from getting stuck on the test because of his/her inability to talk about one particular topic or function. One good example of the use of multiple formats in an oral proficiency test is the IELTS Speaking Test, which is divided into five phases lasting 11-15 minutes in total (Deakin, 1994):

  • Phase 1: Introduction (1-2 minutes). The examiner introduces himself, checks the testee’s identity, and may also check through the Personal Details form, which must be filled out before entering the test room.
  • Phase 2: Extended Discourse (3-4 minutes). The examiner asks the testee to talk on a familiar topic, which may be linked to information on the Personal Details form. The topics can cover a wide range, including home country, local customs, festivals, aspects of life in the country where the testee is going to study, etc.
  • Phase 3: Elicitation (3-4 minutes). The testee asks the examiner questions based on a simple role-play situation. There is an “information gap”, and the testee needs to complete his knowledge of the situation by asking appropriate questions.
  • Phase 4: Speculation and Attitudes (3-4 minutes). The examiner asks the testee about his future plans, which may involve a discussion of study and career options, etc.
  • Phase 5: Conclusion (1 minute). The examiner brings the interview to an end, wishes the testee luck, and takes leave.

This wide range of tasks gives the testee many chances to show his ability to use various functions of spoken language on various topics. Moreover, the test is efficient, because little time is lost when the testee has difficulty with one particular topic or language function.
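The stated overall length can be checked against the phase durations given above. The following is a minimal sketch: the phase names and minute ranges are taken from the description of the test, while the names `PHASES` and `total_range` are introduced here only for illustration.

```python
# (phase name, minimum minutes, maximum minutes), as listed in the description above.
PHASES = [
    ("Introduction", 1, 2),
    ("Extended Discourse", 3, 4),
    ("Elicitation", 3, 4),
    ("Speculation and Attitudes", 3, 4),
    ("Conclusion", 1, 1),
]


def total_range(phases):
    """Return the shortest and longest possible total test length in minutes."""
    shortest = sum(low for _, low, _ in phases)
    longest = sum(high for _, _, high in phases)
    return shortest, longest


print(total_range(PHASES))  # (11, 15)
```

The result agrees with the 11 to 15 minutes quoted for the whole test, and the same small table could be adapted by a teacher planning a shorter classroom test with several formats.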

  4. Suggestions for Elicitation Techniques in Assessing Oral Proficiency

The selection of appropriate elicitation techniques for a speaking test depends on the specifications of the test (the testee’s level, objectives, language aspects to be assessed, time availability, etc.). There is a wide range of techniques that can be used to elicit one’s speaking ability. From a study of speaking tests in use at the time, involving 121 respondents, Jones and Madsen (cited in Madsen, 1981) found more than two dozen elicitation techniques used in oral proficiency tests. These techniques can be grouped into five broad categories, ranging from question types designed to generate communicative language to techniques that facilitate discrete measurement of specific subskills. The categories are Communicative Discourse, Pseudo-Communicative Discourse, Connected Discourse, Controlled Response, and Linguistic Skills. The following discussion of the techniques within these categories starts from the most mechanical and moves to the most communicative.

  • Linguistic Skills. These oral tests attempt to measure specific linguistic skills such as grammar, vocabulary, and pronunciation. Although the tests are intended to measure linguistic aspects, they can be designed anywhere from communicative to mechanical.

Tests of this kind are less common in today’s oral tests. Testing individual sounds (and other discrete-point testing) was very popular in the audio-lingual period, which emphasized the learners’ ability to produce native-like pronunciation. In the communicative language era, in which the main purpose of learning English is to be able to communicate effectively, the language components are normally evaluated in conjunction with listening and speaking; the components therefore tend to be embedded in context and meaning. Madsen (1983) argues that it is not productive to spend time evaluating small points that even native speakers pay little attention to.

At certain levels, oral linguistic skill tests are still used to measure specific points. For this purpose, the following elicitation techniques can be applied: (a) Sentence Completion (the testee repeats and completes a sentence orally, e.g. “I was born in _____ on _____” / “I was born in Mataram on 27 June 1975”); (b) Grammatical Manipulation (the testee changes a given sentence into the required form, e.g. Make a question out of this sentence: “She speaks English” / “Does she speak English?”); (c) Elicited Imitation (mimicry of spoken words, phrases, or sentences); (d) Reading Aloud, a variation of elicited imitation (the testee reads aloud a printed sentence or passage); (e) Bipolar Response (the testee judges minimal pairs of oral utterances by simply saying that the words are the “same” or “different”); (f) Directed Translation (the testee translates a word, phrase, or sentence from the native language into the target language or vice versa at the examiner’s direction, e.g. “What is the English word for ‘matahari’?” / “Sun”); (g) Picture-Cued Vocabulary (items can range from individual sketches of an object or actual realia to complex sketches of buildings, streets, etc.); (h) Oral Cloze Production (the testee supplies the deleted words); and (i) Synonym/Antonym Production (the testee provides a synonym or antonym for stimulus words). Another way of eliciting a response is to have the testee respond to a listening task. The response can take the form of (i) Picture Identification; (ii) Total Physical Response (TPR); (iii) Printed Multiple-Choice Response; (iv) Memory; or (v) Native Language Response.

The elicitation techniques for linguistic skills are most appropriate for testing the language components involved in communication, such as grammar and pronunciation, in order to measure how well each component has been mastered individually. The use of these techniques should be adjusted to the purpose of the test, the testee’s age and language ability, and the skill or subskills in focus.

  • Controlled Discourse/Limited Response. These testing techniques can be used for testees with limited speaking skills. A few elicitation techniques can be applied to generate the testee’s oral production, including Visual + Description Items, Elicited Imitation, and Directed Response.
      • Visual + Description Item. Depending on the level of the testees, the task can range from a one-sentence explanation of a simple line drawing to an extended description of the items or activities represented in a sketch. At advanced levels, the testee might be required to describe an object or a technical drawing. The item can combine a visual with a student description or a visual with examiner questions.
      • Elicited Imitation. This involves controlled reading aloud, especially for beginning students. The teacher reads one sentence at a time, or a group of sentences, and the students repeat them.
      • Directed Response. The teacher gives the students a statement or situation and then asks them to restate it using other expressions. In a rather simple form of this test, the teacher says something like “Tell me that you are a student”, and the student responds “I am a student”. For more advanced students, the teacher gives situations and the students turn them into utterances; for example, the teacher says: “An urgent letter your secretary has typed is full of mistakes: without offending her, persuade her to do it again”, and the expected response is “There are one or two small errors in this letter, do you think you could perhaps do it again?”
  • Connected Discourse. This testing technique is commonly used as a guided oral communication test, but it is felt to typify real communication. It can take the form of giving a talk, narrating from pictures, retelling reading passages, etc.
      • Oral Presentation. This approximates real-life communication; the students are asked to prepare a talk and present it in front of the class or an examiner.
      • Retell Story. This requires the student to read a passage and retell what he/she has read. Sometimes the test requires the testee to retell a story presented to him/her orally. In another version of this test, the testee retells the story from ideographs or multiple sketches, which can reduce memory problems (Madsen, 1981).
      • Explanation and Description. These connected discourse techniques require the testee to explain situations or events and to describe things. The former can be such items as “Explain how Muslims in Lombok celebrate Idul Fitri” or “Explain how teenagers in Lombok celebrate Valentine’s Day.” The latter can involve such descriptions as “Describe a cow” or “Describe a durian.” The tasks in these techniques can vary in the degree of control and difficulty, but both require varying amounts of connected speech.
  • Pseudo-Communicative Discourse. This kind of technique is used to provide somewhat more control over the language produced by the testee while still maintaining communicative forms (Madsen, 1981; Madsen, 1987).
      • Role Play. This technique is widely used in testing oral communication. The testee plays a role based on a given situation. A variety of situations can be prepared, from which the testee chooses one at random. In a classroom context, two or more students can take part at the same time, and the teacher acts simply as an observer or rater. In the IELTS Speaking Test, the rater also acts as the partner in the role play.
      • Directed Request. This technique requires the testee to recast a given situation in other expressions. For example, the testee is given “Would you please ask the man if we could look at his telephone directory a moment?” The expected answer is “Excuse me. Can we use your directory for a few minutes?”
      • Interpreter Task. This technique requires the testee to interpret for a second person who pretends to speak only the language being tested. The testee is required to engage in two-way translation: native language to foreign language and foreign language to native language. This technique is used in the FSI Oral Interview.
  • Communicative Discourse. The most frequently used procedure in assessing oral proficiency is the direct measurement of speaking ability. The techniques can vary from simple Questions and Answers to a complex Oral Interview.
      • Conversation. Conversation techniques are very common in testing speaking. They can vary from such simple questions as “What is your name?”, “Where were you born?”, “Where do you live?”, and “Why are you learning English?” to freer conversation on certain topics to promote genuine interaction. The most complex form is the interview.
      • Dyad Interaction. The testees exchange information with a peer in activities ranging from evaluating a topic to problem solving.
      • Group Evaluation. A group of testees (4-6) is given a topic to discuss. To introduce the topic, the tester can start by showing a video or playing a tape.
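For planning purposes, the five categories and the techniques discussed under them can be kept in a simple lookup table. The sketch below is only one possible arrangement: the category and technique names follow the discussion above, ordered from the most mechanical to the most communicative, while the names `ELICITATION_TECHNIQUES` and `categories_for` are hypothetical and introduced here for illustration.

```python
# Categories ordered from the most mechanical to the most communicative.
ELICITATION_TECHNIQUES = {
    "Linguistic Skills": [
        "Sentence Completion", "Grammatical Manipulation", "Elicited Imitation",
        "Reading Aloud", "Bipolar Response", "Directed Translation",
        "Picture-Cued Vocabulary", "Oral Cloze Production",
        "Synonym/Antonym Production",
    ],
    "Controlled Discourse/Limited Response": [
        "Visual + Description Item", "Elicited Imitation", "Directed Response",
    ],
    "Connected Discourse": [
        "Oral Presentation", "Retell Story", "Explanation and Description",
    ],
    "Pseudo-Communicative Discourse": [
        "Role Play", "Directed Request", "Interpreter Task",
    ],
    "Communicative Discourse": [
        "Conversation", "Dyad Interaction", "Group Evaluation",
    ],
}


def categories_for(technique):
    """Return every category under which a technique is listed."""
    return [
        category
        for category, techniques in ELICITATION_TECHNIQUES.items()
        if technique in techniques
    ]


print(categories_for("Role Play"))
print(categories_for("Elicited Imitation"))
```

The second call returns two categories, which reflects the fact that Elicited Imitation is discussed both as a linguistic skill technique and as a controlled response technique.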

  5. Conclusion

In the communicative language teaching era, assessing one’s ability to use language in real communication becomes the main concern. However, the complexity of the aspects involved in speaking tests leads teachers to avoid direct testing of oral proficiency; instead, they use indirect or semi-direct testing, that is, testing oral ability with paper-and-pencil dialogue tests. No wonder that the application of communicative language testing lags far behind the application of communicative language teaching. This is due, on the one hand, to the difficulty of conducting speaking tests and, on the other, to the low standing of speaking tests in educational contexts, in which written tests are more dominant than speaking tests in determining learners’ achievement.

In the future, with the application of Authentic Assessment in Contextual Teaching and Learning (CTL) in the 2004 National Curriculum, in which students’ performance is assessed at the same time as the teaching and learning activities take place, the kind of assessment expected in communicative language testing will be better accommodated. However, this requires careful preparation, and teachers must be ready to give objective marks.

The inclusion of a practicum in the National Examination for SMP and SMA, one component of which is speaking for English, will force teachers to find appropriate techniques to elicit students’ speech. Teachers can apply various elicitation techniques in speaking tests depending on the objectives, time availability, students’ levels, the ratio of raters to students, and other considerations.


References

Alderson, J. Charles, Caroline Clapham, and Diane Wall, 1995. Language Test Construction and Evaluation. Cambridge: CUP.

Bachman, Lyle F., 1990. Fundamental Considerations in Language Testing. Oxford: OUP.

Brindley, Geoff (ed.), 1995. Language Assessment in Action. Sydney: NCELTR.

Heaton, J. B., 1990. Classroom Testing. London: Longman Group Ltd.

Heaton, J. B., 1991. Writing English Language Tests. London: Longman Group Ltd.

Hughes, Arthur, 1993. Testing for Language Teachers. Oxford: OUP.

Madsen, Harold S., 1987. Techniques in Testing. Oxford: OUP.

Madsen, Harold S., 1981. “Selecting Appropriate Elicitation Techniques for Oral Proficiency Tests”, in John A.S. Read (ed.). Directions in Language Testing. Singapore: RELC, pp. 87-99.

Morrow, Keith, 1982. “Testing Spoken Language”, in J. B. Heaton (ed.). Language Testing. Great Britain: Modern British Publication Ltd, pp. 56-58.

Mukminatien, 1995. “The Scoring Procedures of Speaking Assessment”, English Language Education, Volume 1 Number 1, July 1995.

Underhill, Nic, 1982. “The Great Reliability/Validity Trade-off: problems in assessing oral productive skills”, in J. B. Heaton (ed.). Language Testing. Great Britain: Modern British Publication Ltd, pp. 17-23.

Weir, Cyril J., 1990. Communicative Language Testing. New York: Prentice Hall.

