
Unit 15 Assessment in Language Teaching

1. Teaching Aims: To discuss assessment purposes, methods, criteria and principles, and testing in teaching English.
2. Teaching Content: 1) Assessment purposes 2) Assessment methods 3) Assessment criteria 4) Assessment principles 5) Testing in assessment 6) Conclusion
3. Teaching Hours: 2 periods
4. Teaching Materials: 1) Textbook 2) CAI
5. Teaching Methods: 1) Lecture (Computer-aided Instruction)
6. Teaching Procedures:

Part One Some Kinds of Assessment

1) Practical Assessment
The final result is a negotiated decision, taking into account both externally and internally assessed teaching.

2) External Assessment
Course participants are required to teach one lesson, which is observed by an external assessor. This may take place during the last week of the course, using your TP (teaching practice) class, or after the end of the course, using your own class.

3) Internal Assessment
Course participants are assessed continuously during the course. International House provides more than the minimum amount of observed teaching, and consequently the best lessons are considered for assessment.

4) Course Assignments
You have to do seven assignments during the course. Five of these are called Practical Teaching Assignments (PTAs) and two are called Practical Written Assignments (PWAs).

5) The Extended Assignment
This involves carrying out a diagnostic test and writing up a case study of an individual learner.

In both the coursework and the extended assignment, presentation and use of English are important. Care should be taken with punctuation, spelling and layout.

Emphasizes principles and procedures of assessment that are of primary importance to educational practitioners. Includes construction of classroom tests, observation techniques, and performance measures; integration of assessment and instruction; norm- and criterion-referenced assessment; uses of standardized tests; and current issues and controversies.

Part Two A Comparison of Three Types of Classroom Assessment

Types compared: Sizing Up, Instructional, Official

Purpose
• Sizing Up: Provide the teacher with a quick perception and practical knowledge of pupils' characteristics
• Instructional: Plan instructional activities and monitor the progress of instruction
• Official: Carry out the bureaucratic aspects of teaching, such as grading, grouping, and placing
Timing
• Sizing Up: During the first week or two of school
• Instructional: Daily throughout the school year
• Official: Periodically during the school year
Evidence-Gathering Method
• Sizing Up: Largely informal observation
• Instructional: Formal observation and pupil papers for planning; informal observation for monitoring
• Official: Formal tests, papers, reports, quizzes, and assignments
Type of Evidence Gathered
• Sizing Up: Cognitive, affective, and psychomotor
• Instructional: Largely cognitive and affective
• Official: Mainly cognitive
Recordkeeping
• Sizing Up: Information kept in the teacher's mind; few written records
• Instructional: Written lesson plans; monitoring information not written down
• Official: Formal records kept in the teacher's mark book or school files

Aspects of Testing
• TEST: content, format, procedures, activities
• EVALUATIVE: measurement, discrimination, comparability
• PRACTICAL: administrability, economy, environment, acceptability
• INSTRUCTIONAL: feedback, test-course interdependence (content, level, backwash)
• THEORETICAL: theory of language, language learning, language teaching, language testing

Types of Tests

1) Achievement: A measure of what has been learned from what was taught in a particular course or series of courses; measures the extent of learning of the material presented in a particular course, textbook, or program of instruction.
• Largely discrete-point in nature.
• Assesses progress in a particular curriculum using a
specific set of course materials and a specific instructional syllabus.
• Measures primarily grammatical accuracy.

2) Proficiency: A test that measures one's knowledge and/or ability in a foreign language without regard to formal study or text used.
• Largely integrative in nature.
• Assesses acquired language independent of course, teacher, time, and text.
• Measures not only accuracy, but also the appropriate use of language in context for particular purposes.

3) Hybrid: A test in which specific lexical, grammatical, sociolinguistic, and discourse features treated in a curricular sequence are tested as they operate in naturalistic contexts.
• A blend of open-ended (divergent) responses with specific, convergent items.
• Based on a limited corpus of material.
• Elicits specified features of the target language using naturalistic discourse in a situational format.
• Combines grammar and context, structure and situation.

4) Creative: A series of test activities or tasks in which the test designer determines the structures to be used within a naturalistic context, yet allows the student to demonstrate spontaneously and
autonomously the ability to use these and other structures to carry out the task successfully.

5) Pro-achievement: Such tests elicit actual communicative use of the language (both reception and production) within the context, and the associated limitations, of the particular lexicon and structures covered in the textbook at the time of testing.

Part Three Testing Glossary

ACHIEVEMENT TEST: a measure of what has been learned from what was taught in a particular course or series of courses; measures the extent of learning of the material presented in a particular course, textbook, or program of instruction.
APTITUDE TEST: designed to measure capability or potential, whether it is the capability to succeed in an academic program, to learn a foreign language, to acquire a specific vocation, or some other capability.
CLOZE TEST: a test procedure that elicits the completion of blanks deleted from a text; it requires filling in the blanks in a passage from which there have been systematic or random deletions. Usually every fifth or seventh word has been removed from the passage, beginning at a randomized starting point. The word "cloze" was coined in reference to the notion of psychological "closure."
COMPUTER ADAPTIVE TESTING: a procedure using computer hardware and software to present test content to examinees in ways that allow for iterative consideration of the ability demonstrated in the ongoing testing process. Items are chosen to match individual test-taker ability.
CRITERION-REFERENCED TEST: assesses achievement or performance against a cut-off score that is determined as a reflection of mastery or attainment of specified objectives; evaluates individual performance in terms of some predetermined criterion for success at performing some behavior with some result under certain conditions and judged by certain standards, e.g., writing a friendly letter in the target language (the behavior), consisting of 50 words or more (the result)
within 10 minutes (the condition), with no more than three morphological errors (the standard). The focus is on the ability to perform tasks rather than on group ranking.
DIAGNOSTIC TEST: designed to provide information about the specific strengths and weaknesses of the test taker. It is usually designed to guide remedial instruction.
DIRECT TEST: one that measures ability directly in an authentic context and format, as opposed to an indirect test, which requires performance of a contrived task from which an inference is drawn about the presence of the ability concerned.
DISCRETE-POINT TESTING: testing of one point at a time, i.e., only one element (e.g., the negative singular past auxiliary "didn't") from one component of language (e.g., syntax) is assessed in one skill (e.g., reading, a receptive skill). Example: a multiple-choice test of article usage.
FACE VALIDITY: a subjective impression, usually on the part of examinees, of the extent to which the test and its format fulfill the intended purpose of measurement, i.e., does the test appear to measure what it claims to measure?
FORMATIVE EVALUATION: an evaluation that is ongoing and iterative during an instructional sequence. This kind of evaluation permits midstream adaptation and improvement of the program.
FREE RESPONSE: respondents have the liberty to say or write what they choose, usually within certain parameters.
FUNCTIONAL LANGUAGE ABILITY: the ability to use target-language knowledge in natural or naturalistic communicative situations.
INDIRECT TEST: a test that is contrived and/or different from the situation of interest, e.g., students are given a short-answer grammar test as an indirect measure of their actual grammatical performance in normal classroom routines or out of class. It fosters inference about one kind of behavior or performance through measurement of another, related kind of performance.
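The CLOZE TEST entry above describes a mechanical deletion rule. Below is a minimal Python sketch of that rule, assuming whitespace-separated words and a deletion interval `n` (every fifth or seventh word, per the entry). The function name `make_cloze` and the numbered-blank format are illustrative choices, not a standard. It deletes every n-th word from a random starting point and returns the gapped passage with its answer key; refinements such as leaving an opening sentence intact or separating punctuation from deleted words are deliberately left out.

```python
import random
from typing import List, Optional, Tuple


def make_cloze(text: str, n: int = 7, seed: Optional[int] = None) -> Tuple[str, List[str]]:
    """Delete every n-th word, starting at a random position among the first n words.

    Returns the gapped passage and the answer key (the deleted words, in order).
    Simplification: punctuation attached to a deleted word is deleted with it.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    rng = random.Random(seed)   # seedable, so the same test form can be regenerated
    start = rng.randrange(n)    # randomized starting point: index 0 .. n-1
    gapped: List[str] = []
    answers: List[str] = []
    for i, word in enumerate(words):
        if i >= start and (i - start) % n == 0:
            answers.append(word)
            gapped.append(f"({len(answers)}) _____")   # numbered blank
        else:
            gapped.append(word)
    return " ".join(gapped), answers


if __name__ == "__main__":
    # Illustrative passage, made up for this example.
    passage = ("The cloze procedure asks readers to restore words that have been "
               "removed from a text at regular intervals, which draws on their "
               "knowledge of grammar, vocabulary and meaning in context.")
    gapped, key = make_cloze(passage, n=7, seed=1)
    print(gapped)
    print("Answer key:", key)
```

Passing a fixed `seed` keeps the deletions reproducible, which matters if the same cloze form has to be given to several classes; omitting it gives a different starting point each time.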

INTEGRATIVE TEST: one that measures knowledge of a variety of language features, modes, or skills simultaneously, i.e., testing two or more points together, usually a number of such points at once. An example would be a dictation, which could be used to measure listening comprehension, spelling, or general language proficiency.
INTERRATER RELIABILITY: a method of estimating the reliability of independent ratings; a measure of the degree to which two or more raters agree in their ratings on some behavior assessed in one or more subjects.
ITEM DIFFICULTY: the proportion of correct responses to total responses on a test item, e.g., if 20 out of 30 students get an item right, the item difficulty is 20/30, or about 67%.
ITEM DISCRIMINATION: how well an item distinguishes better students from poorer ones. For example, if the upper third of the students get the item correct and the lower two thirds generally get it wrong, the item is a good discriminator between these two groups.
LANGUAGE APTITUDE: basic ability to learn a new language, including verbal intelligence, auditory and visual memory span, sound-symbol associative skill, and skill at grammatical analysis.
LINGUISTIC COMPETENCE: the breadth of knowledge that the learner has regarding the linguistic elements of the language: pronunciation, vocabulary, and structure.
MEAN SCORE: the average score for a given group of students, obtained by adding all of the individual scores and then dividing by the total number of scores.
MEDIAN: the centermost score in a distribution of scores arranged in sequence. In even-numbered distributions with no central score, the median is either the midpoint between the two center-bounding scores or a weighted point between them.
MEDIAL INTERVAL: the interval that includes the point below and above which 50% of the scores occur.
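ITEM DIFFICULTY and ITEM DISCRIMINATION are both simple proportions. The Python sketch below computes them, assuming each student's answer to one item is recorded as True (correct) or False and that students are ranked by total test score. Discrimination is taken here as upper-group difficulty minus lower-group difficulty, comparing the top and bottom thirds by default. This upper-lower index is one common convention, not the only one, so the group fraction is a parameter (the glossary's own example contrasts the upper third with the lower two thirds). The function names are illustrative.

```python
from typing import Sequence


def item_difficulty(responses: Sequence[bool]) -> float:
    """Proportion of correct responses to total responses (0.0 to 1.0)."""
    if not responses:
        raise ValueError("no responses")
    return sum(responses) / len(responses)   # True counts as 1, False as 0


def item_discrimination(item_correct: Sequence[bool],
                        total_scores: Sequence[float],
                        fraction: float = 1 / 3) -> float:
    """Upper-group difficulty minus lower-group difficulty for one item.

    Students are ranked by total test score; the top and bottom `fraction`
    of the group form the upper and lower groups. Results range from -1 to 1:
    values near 1 mean stronger students get the item right and weaker ones
    do not; values near 0 or below flag an item worth reviewing.
    """
    n = len(total_scores)
    if len(item_correct) != n:
        raise ValueError("need one item response per total score")
    if n < 2:
        raise ValueError("need at least two students")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the groups do not overlap")
    k = max(1, min(round(n * fraction), n // 2))   # students per group
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = [item_correct[i] for i in ranked[:k]]
    lower = [item_correct[i] for i in ranked[-k:]]
    return item_difficulty(upper) - item_difficulty(lower)


if __name__ == "__main__":
    # The glossary example: 20 of 30 students answer the item correctly.
    print(round(item_difficulty([True] * 20 + [False] * 10), 2))   # 0.67
```

In practice both figures are read together: an item that nearly everyone gets right (difficulty close to 1.0) cannot discriminate much, whatever the group split.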

MODE: the point at which scores occur most frequently (i.e., the highest point of the distribution curve). Some distributions are irregular, having more than one mode.
MINIMAL PAIR: two words sounding alike in all but one feature, e.g., "heating"/"hitting", where the distinguishing feature is the first vowel.
NATURALISTIC COMMUNICATIVE SITUATION: a staged situation, usually in a classroom, that is intended to simulate natural communication removed from the intervention of an instructor.
NORM-REFERENCED TEST: evaluates ability against a standard of mean or normative performance of a group. It usually implies standardization through prior administration to a large sample of examinees.
NORMS: an empirically derived distribution of scores on a test, which provides reference data for appropriate groups of examinees; e.g., students' results on the Test of English as a Foreign Language (TOEFL) are reported with reference to norms so that students can see where they stand in comparison with the general population of foreign students.
OBJECTIVE TEST: a test that can be scored with reference to a scoring key and, therefore, does not require expert judgment in the scoring process. This is unlike a subjective test, which depends on impression and opinion at the time of scoring.
PERCENTAGE SCORE: the number of correct items divided by the total number of items on the test, times 100. It can also be expressed as 100 times the obtained score divided by the total score possible.
PERCENTILE RANK: a number indicating the percentage of individuals within the specific norm group who scored lower than the raw score of a given student.
PROFICIENCY TEST: a measure of the linguistic knowledge that students have in a language and/or their ability to apply this knowledge functionally; it measures general ability or skill, as opposed to an achievement test, which measures the extent of learning of specific material presented in a particular course, textbook, or program of instruction.
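Several of the glossary entries (MEAN SCORE, MEDIAN, MODE, PERCENTAGE SCORE, PERCENTILE RANK) are arithmetic, and the contrast between norm-referenced and criterion-referenced interpretation is easiest to see on a single score. The Python sketch below uses the standard `statistics` module for the group figures. `percentile_rank` counts only scores strictly below the student's, as the PERCENTILE RANK entry states; some conventions also count half of the tied scores. The cut-off passed to `meets_criterion` is a hypothetical value the test designer would choose, and the sample scores are made up for illustration.

```python
import statistics
from typing import Dict, List, Sequence


def summarize(scores: Sequence[float]) -> Dict[str, object]:
    """Mean, median, and mode(s) of a group's scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),     # midpoint of the two central scores when n is even
        "modes": statistics.multimode(scores),   # several values if the distribution has more than one mode
    }


def percentage_score(raw: float, total_possible: float) -> float:
    """100 times the obtained score divided by the total score possible."""
    return 100 * raw / total_possible


def percentile_rank(raw: float, norm_group: Sequence[float]) -> float:
    """Percent of the norm group scoring strictly lower than `raw` (norm-referenced)."""
    if not norm_group:
        raise ValueError("empty norm group")
    below = sum(1 for s in norm_group if s < raw)
    return 100 * below / len(norm_group)


def meets_criterion(raw: float, cut_off: float) -> bool:
    """Criterion-referenced decision: mastery if the score reaches the cut-off."""
    return raw >= cut_off


if __name__ == "__main__":
    norm_group: List[float] = [12, 15, 15, 17, 18, 20]   # made-up scores out of 25
    print(summarize(norm_group))
    student = 17
    print(percentage_score(student, 25))          # 68.0
    print(percentile_rank(student, norm_group))   # 50.0
    print(meets_criterion(student, cut_off=18))   # False (hypothetical cut-off)
```

The same raw score of 17 reads as "68%", "better than half of this group", or "not yet mastered", depending on whether it is compared with the total possible, with a norm group, or with a criterion.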

QUIZ: a short measure of class material, possibly informal in nature; e.g., a quiz may just check for the ability to use 10 target-language words in a sentence.
RAW SCORE: the score obtained on a test before any adjustment, transformation, weighting, or rescaling is done. On an item-based test the raw score is usually equal to the number of correct items.
RELIABILITY: the accuracy with which an item or test measures what it is measuring, i.e., the likelihood that the obtained result would be replicated if the item or test were given again to the same students; the consistency of scores obtainable from a test. It is usually an estimate, on a scale of zero to one, of the likelihood that the test would rank test takers in the same order from one administration to another, proximate one.
RESPONSE VALIDITY: the extent to which examinee responses to a test or questionnaire can be said to reflect the intended purpose in measurement. Lack of adequate instructions, incentives, task familiarity, or courtesy could invalidate responses.
SKILL-GETTING ACTIVITIES: activities aimed at developing linguistic competence, i.e., a perception of language categories, functions, and the rules relating the two; practice in producing sound segments and in formulating communication.
SKILL-USING ACTIVITIES: activities aimed at developing functional language ability, i.e., an ability to perform in natural or naturalistic communicative situations.
STANDARDIZED TEST: a measure that has been piloted (usually on a large sample representing different types of respondents) and for which interpretive data, such as norms, reliability, and validity coefficients, have been provided. It has been administered to a large group of examinees from a target population, often more than 1,000 persons, and has been analyzed and normed for use with other samples from that population.
SUMMATIVE EVALUATION: evaluation that comes at the conclusion of an educational program or instructional sequence.
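The RELIABILITY entry frames reliability as the likelihood that a test ranks test takers in the same order on two proximate administrations. One way to estimate that is a test-retest rank correlation (Spearman's rho: the Pearson correlation of the two sets of ranks, with tied scores given their average rank). The Python sketch below implements it. This is only one estimation method, chosen for illustration; internal-consistency estimates such as split-half or Cronbach's alpha are also widely used. A rank correlation can fall below zero, so it is values near 1 that indicate a consistent ordering. The function names and sample scores are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of tied values that starts at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average 1-based position of the tie block
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all scores are equal")
    return cov / (sx * sy)


def retest_reliability(first: Sequence[float], second: Sequence[float]) -> float:
    """Spearman's rho between two administrations to the same students."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2 students)")
    return pearson(average_ranks(first), average_ranks(second))


if __name__ == "__main__":
    week1 = [34, 28, 41, 22, 37, 30]   # made-up raw scores for the same six students
    week2 = [36, 27, 40, 25, 35, 31]
    print(round(retest_reliability(week1, week2), 2))   # 0.94
```

Here only two students swap places between the two sittings, so the ordering is largely preserved and the estimate is high.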