THE INTERNATIONAL ENGLISH LANGUAGE TESTING SYSTEM (IELTS): OVERVIEW AND EVALUATION

I MADE SUJANA (EED, the University of Mataram)

 

INTRODUCTION

 1.1 The IELTS test : an Overview

The recent development in language teaching theories and attitudes, from the structural to the communicative trend, has inevitably also had an impact on developments in language testing. Test developers compete to introduce the best ways to assess language proficiency and to revise existing tests so that they are more up to date and internationally accepted. The International English Language Testing System — known as IELTS — is one product of this development. The IELTS test, a considerably revised version of the English Language Testing Service (ELTS) test previously used for English proficiency assessment in Great Britain, was born in response to the need for a suitable test for assessing the English proficiency of an increasing number of overseas candidates wishing to study or be trained in Australia, Britain and other English-speaking countries (Ingram, 1990).

   The IELTS test is the result of a joint project conducted by Australia and Britain, with contributions from Canada and some other countries. Its development was inspired by a number of considerations. Firstly, the movement of testing away from a focus on language form towards tests closely related to language use, from the psychometric to the sociolinguistic era (Alderson, n.d.). Secondly, the need to provide reliable and valid assessment of candidates' English proficiency in all four macroskills (Listening, Speaking, Reading, and Writing). Thirdly, the need to ensure face validity for the test by providing a variety of test items which would adequately assess the language proficiency expected for academic study and professional training. Fourthly, the need for the test to be readily available and easily marked in any test centre around the world (Ingram, 1990).

   The IELTS test is designed to assess test-takers' English proficiency in the four macroskills, i.e. listening, speaking, reading, and writing. The score achieved from the assessment provides a description of a test-taker's English proficiency, which determines whether their English will enable them to perform successfully in tertiary study and to cope with social-life situations when living in English-speaking countries without being disadvantaged by their English. The assessment of English proficiency is therefore crucial for candidates who want to undertake study in English-speaking countries. They are required to achieve a certain level of proficiency to enable them to cope well with study overseas because, as Garbutt and O'Sullivan (1995) point out: (i) studying in tertiary institutions in English-speaking countries may demand reading books and journals, writing assignments, listening to lectures, and participating in tutorials and seminars; and (ii) living in English-speaking countries may involve reading newspapers, notices, signs and instruction manuals, writing to institutions and individuals, listening to the radio, instructions and casual conversations, and speaking about oneself: background, country, study plans, etc.

   Since the purpose of the IELTS test is to assess candidates' English proficiency in the four macroskills needed to perform successfully in tertiary study and to cope with social-life situations when living and studying in English-speaking countries, it is clear what types of tasks and texts should be included in the test. The IELTS test developers (e.g. Ingram, 1990) feel confident that the test has been designed to closely approximate language tasks that occur in real life and are commonly used in academic settings.

   The test battery consists of two main modules: a General Training Module (GTM) and an Academic Module. The General Training Module is intended for test-takers who are going abroad to take vocational courses and training programmes (non-degree), while the Academic Module is for those who will undertake tertiary study (degree). The format of the GTM is identical to that of the Academic Module, but the reading passages and writing tasks do not reflect tertiary study requirements (Garbutt & O'Sullivan, 1995; Griffin, 1988).

   Since April 1995, there have been some changes to the IELTS test. In the new version, the three academic Reading and Writing modules (Modules A, B, and C) are replaced by one single module. This means that all candidates who intend to undertake tertiary study in English-speaking countries take the same test. The changes also concern test-taking times, the number of words required in writing (the length of the essays), and the independence of the Writing subtest from the Reading subtest. In order to give an overview of the test, the following section provides a description of the IELTS subtests, including the task and text types in each subtest, the time allocation, and how the test is marked (scored). (For further explanation of the test, see Garbutt & O'Sullivan, 1995; Ingram, 1990.)

 1.1.1 Test Description

1.1.1.1 Listening Subtest

The IELTS Listening subtest covers the two situations the test-takers will experience overseas: the first relates to social situations and the second to course-related situations. The test, lasting 30 minutes, generally contains four sections and involves a variety of item types, including information transfer such as form-filling, completing a diagram, and following routes on a map; true-false and multiple-choice items; and open-ended questions. Discourse styles vary throughout the test and may include monologues, conversations, and formal and informal lectures, spoken by "intelligible native speakers" from the countries participating in the project (Australia, Britain, or Canada), in varied accents and situations and at different utterance rates (Garbutt & O'Sullivan, 1995; Ingram, 1990).

   The tape is played only once and all instructions are given on the tape, not announced by the examiner in the testing room. At the beginning of each section the examinees are given 30 seconds to study the tasks, and another 30 seconds at the end of each section to check their answers. At the end of the whole Listening subtest, they are given an extra 10 minutes to transfer their answers to a special answer sheet (in the old version, only one minute was given to check the answers for the whole listening test).

 1.1.1.2 Speaking Subtest

 The Speaking section of the IELTS test is a direct test of oral proficiency (one in which the tasks directly measure what is intended to be measured, i.e. the candidates' ability to speak is measured by giving them tasks which require them to speak). The test is a structured oral interview, designed to encourage the candidates to demonstrate their ability to speak English (IELTS: An Introduction, 1989). The interview lasts eleven to fifteen minutes and is divided into five stages:

Stage 1: Introduction: introduction and greetings; basic questions about the candidate;

Stage 2: Extended Discourse: questions about general topics;

Stage 3: Elicitation: the candidate asks the interviewer questions based on a task;

Stage 4: Speculation and Attitudes: more detailed questions, particularly focusing on the candidate's future plans; and

Stage 5: Conclusion: closing the interview (Garbutt & O'Sullivan, 1995; IELTS Specimen Materials, 1990).

    However, not all of these stages are always passed through; the interviewer has the flexibility to adjust the interview to the current proficiency of the test-taker. Before the interview the candidate must provide information about himself/herself by filling in a curriculum vitae, which gives the interviewer some basic information for carrying out the tasks in Stages 1 and 4.

   Interviewers are native speakers and trained ESL/EFL teachers who have undergone short formal training in administering the Speaking subtest, as well as in marking the Writing subtest, in order to maintain inter-rater reliability (agreement among different raters on the same trait) and intra-rater reliability (the consistency of the same rater from one occasion to another). The interview used to be audio-recorded, but this is no longer done (it is not clear whether the reason for the change is cost effectiveness, avoiding candidates' nervousness, or that recording is regarded as time-consuming).

1.1.1.3 Reading Subtest

The IELTS Reading subtest mostly assesses the kinds of reading skills required of test-takers in tertiary study or academic reading. The tasks are designed to help the candidates read in effective ways. The candidates are required to read three or four passages/sections and answer around 30 to 45 questions in 60 minutes (55 minutes in the old version); the questions cover a wide range of tasks such as identifying structure, content and procedure, following instructions, finding main ideas, reaching conclusions, drawing logical inferences, etc. They are formatted in a variety of item types including multiple-choice, gap-filling/cloze, summary completion, table completion, heading insertion, and open-ended questions. The reading passages are taken from magazines, books, academic papers and journals, well-written newspaper articles, etc. (Garbutt & O'Sullivan, 1995; Ingram, 1991).

   As noted above, because those taking the GTM and those taking the Academic Module differ in their purposes and in the length of their stay in English-speaking countries, the tests are constructed differently. Passages in the GTM are shorter, less linguistically complex, and less academic in style and content than those in the Academic Module (Garbutt & O'Sullivan, 1995).

 1.1.1.4 Writing Subtest

 In the old version, the Writing subtest was always integrated with the Reading subtest, that is, one of the tasks in the writing test (Task 2) drew on the material in one or more of the reading passages as input; in the new version, both tasks are independent of the reading passages. In all modules there are two tasks: Task 1 requires a minimum of 150 words for the Academic Module and 80 words for the GTM in 20 minutes; in Task 2 a candidate should spend 40 minutes producing a 250-word essay for the Academic Module or a 120-word essay for the GTM (in the old version, 100 words in 15 minutes for Task 1 and 150 words in 30 minutes for Task 2).

   According to Ingram (1990), the writing tasks are made as realistic as possible and require the kind of activity that candidates will have to carry out when entering and pursuing their courses. The tasks may involve, among other things, organising and presenting data, describing an object or event, explaining how things work, and comparing and contrasting evidence.

1.1.2 Marking and Interpreting Scores in the IELTS test

 

The Reading and Listening subtests are marked objectively according to predetermined answers, while the marking of the Writing and Speaking subtests is based on graded criteria which are converted into a single band scale score. The scoring systems of the Writing and Speaking subtests are more analytical than those of the Reading and Listening subtests and hence demand expertise in the form of experienced ESL/EFL teachers who have been trained to mark the papers or interviews.

   Each subtest is marked individually in the form of Band Scores; the final scores range from 0 to 9, with 0.5 gradations between 1 and 9 in the Reading and Listening subtests, while the Writing and Speaking subtests are reported in whole bands only. The overall test score is the mean of the four subtests. The score can then be interpreted using the Band Score Descriptors, which provide a brief statement of what an individual score signifies in terms of a candidate's performance (Gibson & Rusek, 1992), for example:

 9  Expert User. Has fully operational command of the language : appropriate, accurate and fluent with complete understanding.

 8  Very Good User. Has fully operational command of the language with only occasional unsystematic inaccuracies and inappropriacies. Misunderstandings may occur in unfamiliar situations. Handles complex detailed argumentation well.

 7  Good User. Has operational command of the language, though with occasional inaccuracies, inappropriacies and misunderstanding in some situations. Generally handles complex language well and understands detailed reasoning.

 6  Competent User. Has generally effective command of the language despite some inaccuracies, inappropriacies and misunderstandings. Can use and understand fairly complex language, particularly in familiar situations.

etc. … (IELTS: An Introduction, 1989: 6)
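To make the scoring arithmetic concrete, the following minimal sketch (in Python) shows how an overall band might be derived from the four subtest bands, assuming the overall score is simply the mean of the four bands rounded to the nearest half band; the rounding convention here is an assumption for illustration, not something documented in this paper.

def overall_band(listening, reading, writing, speaking):
    # Sketch: overall IELTS band as the mean of the four subtest bands.
    # Assumption: the mean is rounded to the nearest 0.5 band; the real
    # IELTS rounding rules may differ and are not specified in this paper.
    mean = (listening + reading + writing + speaking) / 4.0
    return round(mean * 2) / 2.0  # round to the nearest half band

# Example: half bands are possible for Listening and Reading, whole bands
# only for Writing and Speaking in the version described here.
print(overall_band(6.5, 6.0, 6.0, 7.0))  # 6.375 rounds to 6.5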

    The above gives an overview of the tasks, text types, procedures and scoring system of the IELTS test. As an assessment of candidates' language proficiency, the IELTS test inevitably raises a number of issues concerning its validity and reliability. Regarding these issues, several questions arise: Does the inclusion of the task and text types mentioned above as representative samples of the behaviour domain guarantee the validity of the test? Does the involvement of native speakers and trained ESL/EFL teachers as examiners guarantee consistency in marking? What does the score on each item of the Listening and Reading subtests mean in terms of item difficulty? The present study is intended to address such issues using test-takers' reactions to the IELTS test as the main basis of evaluation.

 1.2 The Aim of this Study

The aim of this study is to evaluate the IELTS test from the test-takers' point of view with respect to the test content, procedures, and results. The main topics under investigation are the test-takers' reactions to the relevance of the test content to their (academic and social) performance in the target situations; the level of difficulty of the test tasks; the sufficiency of the test-taking time; the objectivity of scoring; and the adequacy of the test as a predictor of performance in the target situations. It is hoped that the results of this study will at least provide information on the test-takers' comments on the standardised test they have taken, against which the IELTS test itself can be evaluated.

1.3 Methodology

1.3.1 Test-Takers as Respondents

Recent research in language testing has recognised test-takers' reactions as part of the information that has to be taken into account. The motivation for collecting test-takers' reactions, however, varies from one study to another. Bradshaw (1990: 14), citing researchers' and writers' opinions, concludes that the purposes of involving test-takers' reactions in language testing include reducing examinees' dissatisfaction and making the test comfortable, and increasing face validity and public accountability (Nevo and Stetz, 1985); investigating the reliability and validity of test items (Shohamy, 1982, 1983); and examining the interaction between item types, students' background, and proficiency on various dimensions (Scott and Madsen, 1983).

   Inspired by such opinions, the present study involves test-takers in evaluating the IELTS test. The respondents are postgraduate students from Indonesia who are studying various disciplines (Physics, Chemistry, Computing, Statistics, Linguistics, etc.) at Macquarie University, Sydney, Australia. The total number of respondents is 16, and all of them have taken the IELTS test at least once before coming to Australia, both as one of the requirements for obtaining a scholarship and as a basis for determining the length of the English course needed before studying in Australia.

   In order to minimise extraneous variables as far as possible, the respondents were limited to those who had been studying in Australia for at least one semester and at most one and a half years, and who had never studied abroad before. The assumption underlying the lower limit is that after one semester the test-takers have adequate experience to evaluate the problems they face, while the upper limit of one and a half years was set so that the test-takers could still remember the test they took.

 1.3.2 Data Collection Procedures

 A questionnaire and an interview were used to elicit the test-takers' perspectives on the IELTS test. The questionnaire, adapted from that of Brown and McNamara (1992), consists of two types of questions: a) rated-response questions using a Likert scale and Osgood's semantic differential (i.e. the respondents rate items from 1, strongly agree/very much, to 5, strongly disagree/very little); and b) open-ended questions (i.e. the respondents make general comments, in order to obtain information which cannot be covered by the rated-response questions). The questionnaire is divided into five sections, four of which are based on the IELTS subtests (Listening, Speaking, Reading, and Writing), and one of which contains general questions.

   A semi-structured/open-ended interview was used as a supplementary data collection method. This kind of interview allows both the researcher (interviewer) and the respondents (interviewees) some control over the direction of the interview (Downsett, 1986, cited in Gibson & Rusek, 1992).

   In order to assist the test-takers' memory of the task and text types of the IELTS test, they were shown samples of the test booklets.

 1.4 Data Analysis

 Data obtained from the rated-response questions were analysed using descriptive statistics in order to find the frequency (f) of responses to each question and the mean (x) of each question, while the open-ended questions and the interview were analysed descriptively.
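As an illustration of this analysis, the short Python sketch below computes the frequency and mean for one rated-response question; the ratings shown are hypothetical, not the study's actual data.

from collections import Counter

# Hypothetical ratings from 16 respondents for one question
# (1 = strongly agree/very much ... 5 = strongly disagree/very little)
ratings = [4, 3, 5, 4, 2, 4, 5, 3, 4, 1, 4, 5, 3, 2, 4, 5]

freq = Counter(ratings)              # frequency (f) of each scale point
mean = sum(ratings) / len(ratings)   # mean (x) of the question

print(dict(sorted(freq.items())))    # {1: 1, 2: 2, 3: 3, 4: 6, 5: 4}
print(round(mean, 2))                # 3.62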

SOME RELATED STUDIES

 Hamp-Lyons (1989) noted two prior ethical requirements of a language test: the first is the requirement of a degree of validity and the second is the requirement of a degree of reliability, both of which depend on the purpose to which the test is put.

   As an international assessment of candidates' English language proficiency for living and pursuing courses in English-speaking countries, the IELTS test has 'invited' a number of issues concerning its validity and reliability. On the basis of data gathered from questionnaires completed by trialing candidates, teachers, test administrators, and subject specialists, the IELTS test developers felt confident in making statements about the validity and reliability of the test. They believed that they had fulfilled all the requirements with this test by defining the target language needs of candidates, preparing the test criteria for the test centres, varying the text types, and training assessors (Alderson, cited in Gibson & Rusek, 1992).

   Up to now, only a few studies have been conducted on the validity of the IELTS test. A study involving 63 undergraduate and postgraduate students enrolled in one of the three South Australian universities was conducted by Gibson and Rusek (1992) in order to see whether an IELTS band score of 6.0 predicted academic success. They found that the results of the study were equivocal: those who achieved a score of 6.0 were successful, but those with scores from 4.0 to 5.5 were also successful. They argued that the reason for the success of those whose scores were lower than 6.0 may lie in the differing language requirements of different courses. From these findings, they concluded that "…while the IELTS test is a useful predictor of language proficiency, there is no direct link between IELTS scores and academic success" (p. 57).

   Similar results were previously reported by Criper and Davies (in Davies, 1990), who found a non-significant correlation between English language proficiency measured using the ELTS test and academic success, with a correlation coefficient of .30. They then claimed that language proficiency plays only a trivial part, about 10%, in academic performance. This is in accordance with Graham (1987), who observed that the relationship between English test scores and academic success is "murky".
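The arithmetic behind the 10% figure, which is implied rather than spelled out in the sources cited, is simply the squared correlation, i.e. the proportion of variance in academic performance shared with measured proficiency:

\[
r = 0.30 \;\Rightarrow\; r^{2} = 0.30^{2} = 0.09 \approx 10\%
\]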

   Another study of the relationship between English language proficiency measured by the IELTS test and academic performance was conducted by Elder (1993). Involving trainee teachers, she also failed to obtain conclusive evidence about the value of the IELTS as a predictor of academic performance, and her findings confirm the evidence from the two previous studies.

   From these three studies it is clear that the relationship between English language proficiency as measured by the IELTS test and academic performance is equivocal. Since the IELTS test developers claim that the test has been designed to closely approximate the language task and text types needed in real-life situations, these findings echo the question raised by Elder (1993): 'to what extent are the representative samples of language elicited through performance on different types of tests generalizable to "real life" language use in specialized contexts?'

   Studies involving native speakers of English as subjects have also been conducted in order to examine the validity and reliability of the IELTS test. A study by Evans (1990, cited in Hamilton, Lopes, McNamara, and Sheridan, 1993) showed that native speakers' performance on the IELTS test was far from uniform and far from perfect. Her subjects (N = 16) at a tertiary institute in Melbourne were only capable of achieving the middle range on the Reading subtest, just below the level required for entrance by foreign students to the institution concerned. Similar results were reported by Hamilton et al. (1993), who found that native speakers' scores on the IELTS test were neither homogeneous nor high.

   These two findings pose an important question concerning the term "expert user" (Band 9 in the Band Score Descriptors). The "expert user", as mentioned in the previous section, is defined as follows: "has fully operational command of the language; appropriate, accurate and fluent with complete understanding" (IELTS: An Introduction, 1989: 6). It is clear from these findings that the native speakers of English in the studies failed to perform in such a way. So who is the expert user in this context?

III. RESULTS AND DISCUSSION

3.1 Results

Data on the test-takers' reactions to the IELTS test are shown in Table 1 below. Because of the small number of respondents, the response categories, measured on a five-point scale ranging from 1 (strongly agree/very easy/very much) to 5 (strongly disagree/very difficult/very little), were classified according to the percentages of agreeing/easy (scales 1 and 2), disagreeing/difficult (scales 4 and 5) and neutral (scale 3) responses.
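The sketch below illustrates how the five-point responses are collapsed into the three categories reported in Table 1; the response vector is hypothetical, for illustration only.

# Hypothetical responses (1-5) from 16 respondents to one statement
responses = [4, 4, 5, 2, 3, 4, 1, 5, 4, 3, 5, 4, 2, 4, 5, 4]
n = len(responses)

agree    = sum(1 for r in responses if r in (1, 2))   # scales 1 and 2
neutral  = sum(1 for r in responses if r == 3)        # scale 3
disagree = sum(1 for r in responses if r in (4, 5))   # scales 4 and 5

for label, count in (("Agree/Easy", agree), ("Neutral", neutral), ("Disagree/Diff.", disagree)):
    print(f"{label}: {100 * count / n:.1f}% ({count})")
print(f"Mean (X): {sum(responses) / n:.2f}")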

Table 1 : Test-takers' Reactions (N = 16)

Summary of the Questions                        Agree/Easy   Neutral     Disagree/Diff.   Mean (X)
                                                % (N)        % (N)       % (N)

Section 1 : Listening
1. Item Difficulty                              25.0(4)      12.5(2)     62.5(10)         3.50
2. a. Cope with social life situations          18.1(3)      31.2(5)     50.0(8)          3.25
   b. Cope with academic situations             37.5(6)      25.0(4)     37.5(6)          3.06
3. Time Allotment                               25.0(4)      6.2(1)      68.5(11)         3.87

Section 2 : Speaking
1. Item Difficulty                              68.5(11)     25.0(4)     6.2(1)           2.25
2. Opportunity to show speaking                 56.2(9)      12.5(2)     31.2(5)          2.43
3. Time Allotment                               56.2(9)      6.2(1)      37.5(6)          2.50
4. Usefulness                                   50.0(8)      43.7(7)     6.2(1)           2.31
5. a. Cope with social situations               62.5(10)     18.1(3)     18.1(3)          2.50
   b. Cope with academic situations             56.2(9)      25.0(4)     18.1(3)          2.56

Section 3 : Reading
1. Item Difficulty                              18.1(3)      25.0(4)     56.2(9)          3.50
2. Time Allotment                               18.1(3)      12.5(2)     68.1(11)         3.75
3. Topics and field of study relevance          12.5(2)      37.5(6)     50.0(8)          3.68
4. Reflect ability to read                      25.0(4)      31.2(5)     43.7(7)          3.37

Section 4 : Writing
1. Item Difficulty                              43.7(7)      25.0(4)     31.2(5)          2.75
2. Time Allotment                               43.7(7)      12.5(2)     43.7(7)          3.00
3. Topic and field of study relevance           12.5(2)      50.0(8)     37.5(6)          2.81
4. Cope with academic situations                25.0(4)      50.0(8)     25.0(4)          3.06

Section 5 : General Questions
1. Reflect Language Requirements                37.5(6)      31.2(5)     31.2(5)          3.00
2. Predictive                                   31.2(5)      18.1(3)     50.0(8)          3.50
3. Role of Background Knowledge                 56.2(9)      37.5(6)     6.2(1)           2.18

   The results show that in general the respondents gave low ratings, in the sense that they reacted negatively or found the test difficult, to the Listening, Reading and Writing subtests and the General Questions. However, positive reactions were recorded for the statements on the Speaking subtest.

   As indicated in Table 1, the majority of respondents found the test-taking time problematic for answering the questions on Listening (mean score 3.87) and Reading (mean score 3.75). This was supported by the test-takers' comments on the open-ended questions and in the general comments, in which 13 out of 16 respondents (81.25%) said that they could not complete all the tasks. In the open-ended questions and the interview they also commented that they felt rushed for time, which often disturbed their concentration.

   In relation to the topics of the Reading subtest, it was found that the topics were somewhat irrelevant to the respondents' field of study or background knowledge (mean score 3.68). This tendency was also reflected in the open-ended questions and general comments, in which 62.5% of the respondents pointed out that the topics in the reading passages did not match their field of study. On the Writing topics, on the other hand, there is some contradiction between the rated-scale statements, to which the respondents reacted positively (mean score 2.81), and the test-takers' statements in the open-ended questions and general comments, in which 62.5% (10 respondents) noted a lack of relevance between the test topics and their field of study.

   The lack of relevant topics is closely related to the absence of module divisions. Merging the three academic modules (Modules A, B, and C) of the old version into one single module increases the chance that the topics of the Reading and Writing subtests will be irrelevant to test-takers from various disciplines. The need for topics relevant to the test-takers' background is closely related to familiarity with technical terms, vocabulary, background knowledge, etc. On this issue, various comments were recorded: 62.5% (10 respondents) suggested that there should be more divisions in order to avoid irrelevant topics; two of them (12.5%) felt that, although the materials and topics were irrelevant to their background because the divisions no longer exist in the new version, the IELTS topics are better than those of the TOEFL; another two were satisfied with the single module because all candidates face the same topics; and the rest gave no comment.

   There were also various comments on the questions of whether the test helped the respondents cope with real-life (social and academic) situations. The majority of them responded negatively for the Listening, Reading, and Writing subtests and positively for the Speaking subtest. They also felt that the scores they obtained could not predict their academic success (mean score 3.50).

3.2 Discussion

 As mentioned in the previous section, there are two prior ethical requirements in determining the quality of a language test: the first is the requirement of a degree of validity and the second is the requirement of a degree of reliability, both of which depend on the purpose to which the test is put (Hamp-Lyons, 1989). This section discusses the validity and reliability of the IELTS test using the test-takers' reactions as the main basis of the analysis.

3.2.1 Validity

A test is said to be valid if it measures accurately what it is intended to measure (Hughes, 1993: 22). This definition seems very simple; however, on closer examination the concept of validity reveals a number of aspects and can be approached from a number of perspectives, such as content, face, construct, criterion, washback, etc. (Hughes, 1993; Weir, 1990). Considering the various limitations of this study, only content and face validity will be discussed in the following sections.

   Content Validity. Content validity involves investigating whether the selection of tasks is representative of the larger set (universe) of tasks of which the test is assumed to be a sample (Palmer & Groot, 1980, cited in Hamp-Lyons, 1989). This implies that a language test has content validity if its content covers a representative sample of the language skills, structures, etc. with which it is meant to be concerned. To have high content validity, a test should contain task and text types similar to those which candidates would encounter in real life (Hughes, 1993).

   Since the purpose of the IELTS test is to assess candidates' English proficiency in the four language macroskills needed to perform successfully in tertiary study and to cope with social-life situations when living and studying in English-speaking countries (that is to say, the candidates are expected to achieve a certain level of proficiency on the test to enable them to cope well with their study overseas), it is clear what text and task types should be included in the test. To fulfil the requirements of content validity, the IELTS test tasks have been designed to closely approximate language tasks that occur in real life and are commonly used in academic settings. For example, filling in a form, following instructions, etc. in the listening tasks; comparing and contrasting evidence, describing an object or event, etc. in the writing tasks; role play, describing future plans, etc. in speaking; and drawing conclusions and identifying general and specific information in the reading passages may represent samples of the language needed in social and academic settings. Similarly, the inclusion of text types such as monologues and dialogues in listening; detailed discussion of the field of study and future plans in speaking; the use of scientific magazines, academic papers, and articles in reading; and the use of reading passages as references in the writing tasks may also represent samples of the behavioural domain.

   It is interesting to note that although the IELTS test developers felt confident in making statements about the validity of the test (Gibson & Rusek, 1992), the test-takers under study reacted negatively to the content validity of the IELTS test. Except for the Speaking subtest (mean score 2.50 for coping with social situations and 2.56 for coping with academic situations), the majority of the test-takers responded negatively to the statements on whether the materials of the test helped them to cope with social and academic situations, with mean scores of 3.06 for Writing and 3.25 and 3.06 for Listening; and they were neutral (mean score 3.00) towards Statement 1 of the General Questions: "the IELTS test reflects the type of language required by candidates to study abroad".

   A question therefore arises from the findings of this study: since the IELTS test has been constructed to include task and text types closely related to language needs in social and academic settings, "to what extent are the representative samples of language elicited through performance on different types of tests generalizable to 'real-life' language use in specialized contexts?" (Elder, 1993: 84)

   Face Validity. Face validity refers to the extent to which a test looks as if it measures what it is supposed to measure (Hughes, 1993). The ELTS, the origin of the IELTS test, was designed to put into practice three theoretical positions/constructs concerning how language proficiency is composed. According to Carroll (1981, cited in Hamp-Lyons, 1988: 10), the three theoretical constructs underlying the construction of the ELTS test are:

“The first was that language proficiency can be divided into skills (listening, speaking, reading, writing); the second was that for university or college applicants language proficiency is divisible into ‘general’ and ‘study’ proficiency. The third construct viewed language in this context as divisible into discipline-specific proficiency”.

   It is clear that the IELTS test has fulfilled the requirements of face validity, which it achieves by directly assessing the four language macroskills (speaking, listening, reading, and writing) and by including both general and academic modular sections. The test provides tasks and texts close to what candidates will encounter in real language use. For instance, news broadcasts, announcements, weather forecasts, and lectures in the Listening subtest; describing future plans and asking questions in Speaking; evaluating, comparing, and contrasting evidence in Writing; and drawing conclusions and identifying general and specific information in Reading will all be found in real-life situations, whether social or academic.

   In the questionnaire, especially the open-ended questions, and in the interview, the test-takers commented positively on the 'surface' performance of the IELTS test. They acknowledged that the test directly measured what it intended to measure, for instance measuring speaking ability by directly asking the candidate to speak and writing ability by asking the candidate to write essays, rather than through the knowledge underlying these abilities, such as grammar or vocabulary. The division of the modules into the GTM and the Academic Module and the inclusion of task and text types closely related to language needs in social and academic settings have also supported the face validity of the IELTS test.

   However, if we closely examine the third construct mentioned by Carroll (see the quotation above), the single academic module in the new version of the test, taken by all candidates from diverse disciplines, fails to fulfil that construct. If each discipline requires discipline-specific language proficiency, the IELTS test lacks face validity from the viewpoint of discipline specificity in language proficiency and of the role of background knowledge in testing. By merging the three modules of the old version into a single module, the test may fail to capture the candidates' potential language proficiency, a failure which may be caused by the test-takers' unfamiliarity with terms and vocabulary and by the mismatch between the test-takers' background knowledge and the topics of the test.

3.2.2 Reliability

The reliability of the IELTS test relates to several questions: Are there inter- and intra-rater consistencies in marking the Speaking and Writing subtests? Does the test maintain its secure status while being frequently administered in more than 160 test centres throughout the world? Can the criteria of the Band Score Descriptors reflect the test-takers' English proficiency? And so on.

   The IELTS test developers believe that the test has been thoroughly trialed for its reliability (Griffin, 1990, cited in Gibson & Rusek, 1992). The involvement of native speakers and trained ESL/EFL teachers as interviewers and markers is intended to ensure consistency in scoring (for reliability purposes). In addition, regular monitoring of the taped Speaking subtest interviews and of the Writing subtest scripts (by returning 10% samples of the writing scripts and taped interviews to Australia or Britain) was to be carried out to ensure that the quality of the interviews was not declining and that the ratings of both subtests remained valid and reliable (Ingram, 1990).

   However, the questionnaire revealed that the majority of the respondents (75%) 'felt' that there was some subjectivity involved in scoring (especially in the Speaking subtest), while four respondents (25%) gave no comment. They considered that the interview score was largely determined by who the interviewer was and when the interview was conducted; an interviewer may not be able to maintain consistent scoring over interviews of similar quality on the same day, when one interviewer often undertakes up to 10 interviews. Similar comments were recorded in the interviews. Most of the respondents who had been trained in a language centre in Indonesia before going abroad stated that there was subjectivity in marking the Speaking subtest, a view based on their experience of discussions at the language centre.

   In addition, the regular monitoring described by Ingram (1990) above may need to be questioned, because the interviews are no longer recorded. Another question that needs to be asked is this: since the IELTS test is administered in more than 160 test centres around the world, has the possibility of test leakage been considered? Advances in technology such as e-mail, facsimile, and international telephone calls need to be taken into account. It is quite possible for such technology to be used to discuss the topics, formats, and other information, not least the tasks of the Writing subtest, whose instructions can easily be memorised since they usually consist of one or two sentences.

   In relation to Band Scores, test scores and descriptions, some problems also exist. What does a raw score of, say, 30/50 mean in terms of difficulty, the nature of the test content, the types of items, etc.? (Alderson, 1991). It seems that little attention is paid to such factors in the IELTS scoring system, as a marker of the Reading and Listening subtests simply counts the number of correct answers (paying no attention to the weight of the items in terms of complexity, difficulty, type, and demands) and then converts the total to the predetermined Band Scores.
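The sketch below makes this 'count and convert' procedure concrete. The conversion thresholds are invented purely for illustration, since the actual raw-score-to-band mapping is not given in this paper; the point is only that each item counts equally, whatever its difficulty or type.

# Hypothetical raw-score-to-band table for a 50-item Reading/Listening paper.
# The cut-offs are illustrative only; the real IELTS conversion is not published here,
# and real band scores also include half-band steps.
BAND_CUTOFFS = [(45, 9.0), (40, 8.0), (35, 7.0), (30, 6.0), (25, 5.0), (20, 4.0), (0, 3.0)]

def raw_to_band(answers, key):
    # Every item carries equal weight: simply count the correct answers...
    raw = sum(1 for given, correct in zip(answers, key) if given == correct)
    # ...then look up the band, ignoring item difficulty, type, or demand.
    for cutoff, band in BAND_CUTOFFS:
        if raw >= cutoff:
            return raw, band

# A candidate with 30 correct answers out of 50 gets band 6.0 here, whether
# those 30 items were the easiest or the hardest on the paper.
print(raw_to_band(["a"] * 30 + ["b"] * 20, ["a"] * 50))  # (30, 6.0)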

   Another problem relates to the nine Band Scores and their descriptive statements. Who is the "expert user" of a language? Is a native speaker of English an expert user in this context? If so, the band scale of the IELTS test needs to be questioned, since the studies reported above reveal that native speakers' scores on the IELTS test were neither homogeneous nor high (see Hamilton et al., 1993).

3.2.3 Test Difficulty

Most of the respondents found the Listening (mean score 3.50) and Reading (mean score 3.50) subtests difficult. However, they differed in their reasons for the difficulty of these subtests, citing, among others, time limits, unfamiliar topics, and the speed and accent of the speakers in the Listening subtest. Reactions, both in the rated-scale responses and in the open-ended questions, indicated that the time allowed was indeed a problem for the test-takers in these subtests; they felt "rushed" for time. As a result, most of them admitted that they could not answer all the questions on the Reading and Listening subtests and did not have time to recheck what they had written.

   Topics that were unfamiliar and irrelevant to the test-takers' background knowledge, caused by the lack of division of the academic module mentioned above, were another source of test difficulty. Merging the academic modules into one single module for candidates from various disciplines increases the chance that topics will be irrelevant. This problem also seems to affect candidates from other countries. In the Reading subtest, for example, a German candidate quoted by Garbutt and O'Sullivan (1995) admitted:

"I couldn't believe my eyes when I opened the test paper and saw that there was a passage about laser physics, with a really complicated diagram. I don't know anything about laser physics. I am a system engineer!" (p. 8).

Regarding this, most of the respondents suggested that there should be more divisions in order to avoid such problems. More divisions in the academic module would allow candidates to draw on their background knowledge, familiarity with terms and vocabulary, etc. when taking the test, and this knowledge helps in understanding the test. In other words, the more familiar we are with the content of the test, the easier the test will be. This is in line with Munby's (cited in Alderson, 1988) model of needs analysis, that is:

“… different skills are required in different situations. In this view, ability in one area would not easily be predicted by performance in a different area; so the ability to comprehend engineering lectures in English would not necessarily entail the ability to follow conversation in a cocktail party or to understand a political party broadcast on TV” (p. 95).

   With regard to the Listening subtest, almost all of the respondents interviewed found this section the most difficult of the four subtests. The sources of difficulty included the speed and accent of the speakers on the tape, which sometimes could not be heard clearly. In addition, the respondents said that the complexity of the activities involved in the Listening test was another source of difficulty: candidates have to read the questions, listen to the tape, think, and write the answers almost at the same time.

IV. CONCLUDING REMARKS

1. The majority of the test-takers reacted negatively to most statements relating to the Listening, Reading, and Writing subtests, in terms of the very limited test-taking time, the unfamiliar topics of the test (sometimes caused by the "rough" division of modules), and the subjectivity in scoring the Writing and Speaking subtests. However, they commented positively on most statements on the Speaking subtest in relation to test difficulty, time allocation, the opportunity to show speaking ability, and coping with academic and social situations.
2. As a test designed to assess a candidate's English proficiency in the four language skills needed to perform successfully in tertiary study and to cope with social-life situations when living and studying in English-speaking countries, the IELTS test can be considered to have face validity by directly assessing those four macroskills and by including both general and academic modules.
3. Further studies on the same topics need to be carried out in order to get a better picture of test-takers' reactions to the IELTS test. Better preparation is needed in terms of more detailed specifications in the questionnaire, the number of respondents sampled, and the timing of distributing the questionnaire (the majority of the target respondents were busy with examination and assignment preparation).

 References :

Alderson, J. Charles, 1988. "New Procedures for Validating Proficiency Tests of ESP: Theory and Practice." Language Testing, 2/5, pp. 220-232.

Alderson, J. Charles, n.d. "The Relationship between Grammar and Reading in an EAP Test Battery". n.i., pp. 1-18.

Alderson, J. Charles, 1988. "Testing and its Administration", in ESP in the Classroom: Practice and Evaluation. ELT Documents 128, pp. 87-97.

Alderson, J. Charles, 1991. "Bands and Scores", in Alderson, J.C. and Brian North (eds), Language Testing in the 1990s: The Communicative Legacy. London: Macmillan Publishers Limited, pp. 71-86.

Bradshaw, Jenny, 1990. "Test-Takers' Reactions to a Placement Test", Language Testing, 7/1, pp. 13-30.

Brown, Ann and Tim McNamara, 1992. "The Role of Test-Takers' Feedback in the Test Development Process: Test-Takers' Reactions to a Tape-Mediated Test of Proficiency in Spoken Japanese", Melbourne Papers in Language Testing, 1/1, pp. 53-101.

Davies, A., 1990. Principles of Language Testing. Oxford: Blackwell.

Elder, C., 1993. "Language Proficiency as a Predictor of Performance in Teacher Education", Melbourne Papers in Language Testing, 2/1, pp. 69-89.

Garbutt, M. and Kerry O'Sullivan, 1995. IELTS: Strategy for Study. Sydney: NCELTR.

Gibson, C. and W. Rusek, 1992. The Validity of an Overall Band Score of 6.0 on the IELTS Test as a Predictor of Adequate English Language Level Appropriate for Successful Academic Study. Unpublished MA dissertation, Macquarie University, Sydney, Australia.

Graham, J.A., 1987. "English Language Proficiency and the Prediction of Academic Success", TESOL Quarterly, 21/3, pp. 505-521.

Hamilton, Jan, et al., 1993. "Rating Scales and Native Speakers' Performance on a Communicatively Oriented EAP Test", Melbourne Papers in Language Testing, 2/1, pp. 1-24.

Hamp-Lyons, Liz, 1989. "Language Testing and Ethics", Prospect, 5/1, pp. 7-15.

Hughes, Arthur, 1993. Testing for Language Teachers. Cambridge: Cambridge University Press.

Ingram, D.E., 1990. "The International English Language Testing System (IELTS): Its Nature and Development". Paper presented at the RELC Seminar on Language Testing and Program Evaluation, Singapore.

Shohamy, E., 1985. A Practical Handbook in Language Testing for Second Language Teachers. Tel Aviv: Tel Aviv University Press.

The International English Language Testing System: Specimen Materials for Modules A, B, C, General Training, Listening and Speaking, 1990. The British Council, UCLES, IDP of Australian Universities and Colleges.

The International English Language Testing System: An Introduction, 1989. The British Council, UCLES, IDP of Australian Universities and Colleges.

Weir, C.J., 1990. Communicative Language Testing. London: Prentice Hall.
