Performance Assessment



change; uneven voice projection, not heard by all in room, some words slurred; loosely organized, repetitive, contains many incomplete thoughts; poor summary.

Poor: Pupil body movements distracting, little eye contact or voice change; words slurred, speaks in monotone, does not project voice beyond first few rows, no consistent or logical pacing; rambling presentation, little organization with no differentiation between major and minor points; no summary.

 

The teacher would use such a descriptive rating scale by comparing a pupil's numerical ratings to the descriptions, seeking to make a judgment about which description comes closest to the pupil's rated performance. If we look at Sarah Jackson's ratings and try to describe her overall performance in terms of the above descriptions, it is clear that Sarah is neither excellent nor poor. Her performance is either good or fair. The teacher would have to make a judgment as to which of these categories best describes Sarah's performance. The fact that Sarah had more 4s and 3s than 2s and 1s led Sarah's teacher to place her performance in the "good" category.
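To make the counting logic concrete, here is a minimal sketch in Python. It is an illustration only: Sarah's actual ratings are not reproduced in this excerpt, so the numbers below are hypothetical, the 1-to-5 scale is an assumption, and a real teacher would still exercise judgment between adjacent categories.

```python
# Minimal sketch (my own illustration, not from the text) of summarizing
# per-criterion numerical ratings into a descriptive category.

def summarize_ratings(ratings):
    """Compare counts of high ratings (3 and above) with low ratings (2 and below)."""
    high = sum(1 for r in ratings if r >= 3)
    low = sum(1 for r in ratings if r <= 2)
    if high > low:
        return "good"
    if low > high:
        return "fair"
    return "borderline: teacher judgment needed"

# Hypothetical ratings across the presentation criteria (1 = poor ... 5 = excellent)
sarah = [4, 3, 4, 2, 3, 4, 1, 3]
print(summarize_ratings(sarah))  # -> "good": the 4s and 3s outnumber the 2s and 1s
```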

Below are performance criteria developed to summarize a pupil's oral reading performance. Good examples of scoring criteria for different aspects of essay writing can be found in Stiggins (1987).

 

Excellent performance: Groups words logically when reading aloud; changes voice tone to emphasize important content; alters voice and pace in accordance with text punctuation; can be heard by all in audience; enunciates each word clearly.

Average performance: Usually groups words in a logical manner; uneven emphasis given to important content; tone and pace follow text punctuation fairly well; loudness of voice varies; mispronounces some words.

Poor performance: Reads word by word with no logical grouping; speaks in a monotone, with little change in pace or voice inflection; speaks too softly to be heard by all in audience; slurs and mumbles words.

 

In summary, there are four essential characteristics of performance assessment. First, the assessment must have a clear purpose that is understood by the teacher. This is so the evidence-gathering and scoring procedures can be fitted to the purpose. Common purposes for classroom performance assessment are grading, grouping, and diagnosing. Second, the behaviors a pupil should demonstrate when carrying out a performance or the characteristics of a pupil's product must be stated as performance criteria that can be observed, rated, and judged. Without a clear set of performance criteria, it will be difficult to accomplish consistent, structured assessments of pupils' performances. Third, a setting must be arranged which allows the performance to be shown. The setting may be the normal classroom if the activity to be observed occurs frequently in this setting. If not, a special setting must be established. It is very desirable to keep the conditions of performance the same for all pupils. Finally, performance must be scored and rated. Going from most qualitative to most quantitative, three common ways to summarize pupils' performance are anecdotal records, checklists, and rating scales. Checklists and rating scales are based on the performance criteria specified in step 2 of the performance assessment process. Checklists allow the teacher to record only whether a specific behavior was demonstrated during a performance, while rating scales allow a judgment of quality. Performance on each of these instruments can be summarized using a numerical or a descriptive scale.
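The difference between the two recording instruments can be sketched in code. This is a hypothetical illustration, not an instrument from the chapter: the criterion wording loosely echoes the oral reading criteria above, and the 1-to-5 scale is an assumption.

```python
# Illustrative sketch (not from the text): the same performance criteria recorded
# as a checklist (behavior shown or not) and as a rating scale (quality judgment).
from dataclasses import dataclass, field

CRITERIA = [
    "groups words logically",
    "changes voice tone to emphasize important content",
    "can be heard by all in audience",
    "enunciates each word clearly",
]

@dataclass
class Checklist:
    """Records only whether each behavior was demonstrated."""
    observed: dict = field(default_factory=dict)

    def mark(self, criterion, demonstrated):
        self.observed[criterion] = bool(demonstrated)

@dataclass
class RatingScale:
    """Records a judgment of quality for each behavior (1 = poor ... 5 = excellent)."""
    ratings: dict = field(default_factory=dict)

    def rate(self, criterion, score):
        if not 1 <= score <= 5:
            raise ValueError("rating must be on the 1-5 scale")
        self.ratings[criterion] = score

checklist = Checklist()
checklist.mark(CRITERIA[0], True)   # behavior was shown; nothing recorded about its quality

scale = RatingScale()
scale.rate(CRITERIA[0], 4)          # same behavior, now with a quality judgment
```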

 

VALIDITY AND RELIABILITY OF PERFORMANCE ASSESSMENTS

 

Since formal performance assessments are used to make decisions about pupils, it is important that they be valid and reliable, just like informal and paper-and-pencil assessments. This section describes steps that can be taken to obtain high-quality performance assessments, ones which can be trusted and used confidently in making decisions about pupils. Three general areas are considered: clarity of purpose, instructing and informing pupils of desired performance criteria, and improving validity and reliability.



 

Clarity of Purpose

The purpose of a formal performance assessment influences how the assessment is carried out and scored. Depending on whether a teacher carries out a performance assessment to grade, group, or diagnose pupils, the information observed and the way the observation is summarized will differ. Thus, without knowledge of the decision to be made, useful assessments cannot be planned.

 

Instructing and Informing Pupils of Desired Performance Criteria

The performances that classroom teachers observe and judge during formal assessment are focused mainly on behaviors pupils have been taught and have practiced prior to the assessment. In most cases, the purpose of performance assessment is the same as that of paper-and-pencil classroom tests: to assess pupil learning from instruction. The difference between performance assessments and paper-and-pencil tests is in the ways they gather information about the pupil's learning.

In Chapter Five we saw that teachers do a number of things to get their pupils ready for assessment. First and foremost, they provide good instruction on the content and behaviors the pupils are expected to learn. In helping pupils to achieve, there is no substitute for good instruction. This maxim is true for performance assessments as well as for paper-and-pencil tests. Just as teachers should not test pupils on concepts they were not taught, so too they should not carry out formal performance assessment without providing instruction and practice on the desired behaviors. Pupils learn to set up and focus microscopes, build bookcases, give a good oral speech, measure with a ruler, zip a zipper, and speak French the same way they learn to solve simultaneous equations, find countries on a map, write a topic sentence, or balance a chemical equation: they are given instruction and practice. Achievement is dependent on pupils being taught the things on which they are being assessed. One of the advantages of checklists and rating scales is that they can be used to identify pupils' weak points so they can work on improving them.

Also, in preparing pupils for performance assessment, the teacher should inform pupils of the performance criteria on which they will be judged. One way to do this is for the teacher to give pupils a copy of the checklist or rating form that will be used during the assessment. Or the teacher may tell the pupils what particular behaviors or characteristics will be observed and judged during the performance. If what is expected in a formal performance assessment is not made clear to pupils, they may perform poorly not because they are incapable, but because they were not aware of the teacher's expectations and criteria for a good performance. The author recently heard of a high school pupil who got a B+ on an oral report he gave in class. The teacher informed him that he had presented an outstanding report—clear, organized, proper details, and well delivered. The pupil was told that he would have gotten an A on his report if he had moved around the classroom more while delivering it. However, the pupils were not told in advance that moving around the classroom was an important performance criterion the teacher would observe and judge. In this case, the performance rating did not reflect the pupil's true ability to perform, so the information could lead to invalid decisions about the pupil.

 

 

IMPROVING PERFORMANCE ASSESSMENTS

 

There are three distinct aspects to rating a pupil's performance: observation, responding, and rating (Almy and Genishi, 1979). When teachers observe, they see how their pupils look, watch what they do, and hear what they say. Most teachers can describe the behaviors they have seen and heard immediately after observation. As they observe pupil behaviors, teachers also respond to what they see. They are pleased by the pupil's appearance and performance; they are annoyed at the pupil's lack of attention to the task; they feel sympathy for the pupil who is trying very hard but can't seem to pull off a successful performance. Teachers can rarely be completely dispassionate observers of what their pupils do, because they know their pupils too well and have a built-in set of affective responses to each. This responding aspect of performance assessment is one which teachers are often encouraged to ignore because it gets in the way of their making an objective assessment of the pupil. In reality, it is very difficult to turn off one's feelings and eliminate affective responses to a pupil's performance. There will almost always be an emotional response. However, teachers should at least recognize these feelings about their pupils and try as much as possible to judge the performance in terms of the criteria established.

The final aspect of performance assessment is rating, which involves making a judgment about the quality of the performance. Clearly, sloppy observation or overemphasis on feelings rather than actual pupil performance can diminish the quality of the rating and, in turn, the appropriateness of decisions based on the rating.

Observing, responding to, and rating pupil performance are similar to the process of scoring essay questions, because in each case there are many irrelevant and distracting factors that can influence the teacher's judgments. The key to improving rating or scoring in these situations is to try to eliminate the distracting factors so that the performance rating or essay score will more closely reflect the pupils' actual performance. In essay scoring, the distracting factors include penmanship, sentence structure, vocabulary choice, spelling and grammar, teacher fatigue, and even the position of the essay in a pile of pupils' responses. In performance assessment, the main source of error is the observer. It is the observer who judges both what is happening during a performance and the quality of the performance. Any distractions or subjectivity that arise in the observation process can introduce error into the assessment, thereby reducing its validity and reliability.

Validity is concerned with whether the information obtained from an assessment permits the teacher to make a correct decision about a pupil's learning. We have already seen that failure to instruct pupils on desired performances and failure to inform them of the important dimensions of performance can produce assessment information that is invalid. That is, the information will not lead to a correct decision about how well a pupil has learned.

In addition to not teaching pupils the desired performances and not informing them about the criteria that will be used to judge their performance, the major problem that affects the validity of performance assessments is that they are based on observation and judgment. Observation and the resulting judgments will be subjective if the conditions of performance assessment are not controlled. Subjectivity introduces error and imprecision into observations and judgments, diminishing the usefulness of the performance assessment for decision making.

One factor which can reduce the validity of formal performance assessment is bias. We saw in Chapter Two how teachers' personal preferences or biases could affect their judgments of pupils during informal assessment used to size up pupils. Bias can also occur in formal observation and assessment. The intent in any assessment is to ensure that the results obtained reflect the pupils' learning, rather than other, irrelevant pupil characteristics. When factors such as native language, prior experience, gender, and race differentiate the scores of one group of pupils from those of another, we say the scores are biased. Bias is error that leads to misinterpretation of pupils' performance because one group of pupils is being judged on different criteria or assessed on different characteristics than another.

Suppose that an oral reading assessment were given to pupils in a second grade classroom. Suppose also that in the classroom was a group of pupils whose first language was Spanish. The oral reading assessment involved reading aloud from a storybook written in English. When the teacher looked at the performance of the pupils in the class, she saw that the Spanish-speaking pupils as a group did very poorly. Would the teacher be correct in saying that the Spanish-speaking pupils have poor oral reading skills? Would this be a valid conclusion to draw from the assessment evidence?

A more reasonable interpretation would be that the oral reading assessment was measuring something different for the Spanish-speaking pupils than for the rest of the class. For most of the class, the assessment provided an indication of their oral reading performance. For the Spanish-speaking pupils, however, the assessment was much more an indication of their familiarity with the English language than it was a true indication of their oral reading performance. How might the English-speaking pupils have performed if the assessment required reading in the Spanish language? In essence, the assessment provided different information about the behaviors of the two groups (oral reading proficiency versus facility in the English language). It would be a misinterpretation of the evidence to conclude that the Spanish-speaking pupils had poorer oral reading skills without taking into account the unfairness of having them read and pronounce English words.

A biased test or performance assessment can also lead to misinterpretation of the reasons for some pupils' achievement. A teacher who simply concluded that the Spanish-speaking pupils in the class were poor oral readers would be misinterpreting their poor performance. This misinterpretation, in turn, would lead to decisions that were erroneous. When an assessment instrument provides information that is irrelevant to the decisions it was intended to help make, the instrument is not valid. Thus, in all forms of assessment, but especially performance assessment, a teacher must select and use procedures that do not give an unfair advantage to some pupils because of cultural background, language, or sex (Hills, 1981; Nitko, 1983; Stiggins, 1984).

Other sources of error that commonly affect the validity of performance assessments are teachers' failure to use the entire rating scale in judging pupil performance, reliance on mental record keeping of pupils' performance, and being influenced by prior perceptions of a pupil (Gronlund, 1985; Stiggins, 1984). Each of these errors threatens the validity of the interpretations and subsequent decisions (see Chapter Two).

Many teachers have a tendency to rate all pupils at about the same position on a scale. The position may vary from teacher to teacher, some rating only at the high end of the scale, some only at the low end, and some in the middle. There are two undesirable consequences of this practice (Gronlund, 1985). First, any one rating of an individual is of questionable value, since it may reflect the personal rating preference of the teacher rather than the pupil's actual performance. Second, by concentrating ratings in a single region of a rating scale, the ratings of pupils cluster together, making it difficult to make reliable distinctions between their performances. It is advisable to know the meaning of the different positions on a rating scale and to use these to provide a true judgment of each pupil's performance.

It also is advisable to write down the results of performance assessments at the time they are observed. Failure to write down ratings or judgments of a pupil's performance at the time it is carried out forces a teacher to rely on memory when the ratings are finally written. The longer the interval between the observation and the written rating, the more likely the teacher is to forget important features of the pupil's performance, both good and bad. Moreover, when a number of pupils are observed performing one after the other, it is difficult to remember and differentiate between the performances at some later point. Failure to record observations and judgments at the time of the performance introduces memory error into the ratings or scores that are awarded later.

Ideally, ratings should be based solely on the pupil's performance of the desired behaviors. Unfortunately, at least from the viewpoint of objective rating, classroom teachers know a great deal about their pupils, and that knowledge can lessen the objectivity of their ratings. Factors such as personality, effort, work habits, cooperativeness, and the like are all part of a teacher's perception of each pupil who is being assessed. Often, these prior perceptions can influence the rating a pupil is given: the likable, cooperative pupil with the pleasant personality may receive a higher rating than the standoffish, belligerent pupil, even though they performed very similarly during their performance assessments.

Since reliability is concerned with the stability and consistency of scores or performance ratings, the logical way to obtain information about the reliability of pupil performance is to observe the performance many times. Doing this would provide direct evidence about how stable and consistent the pupil's performance is. However, this is not a reasonable approach for many school performances, which require special settings or equipment to accomplish. Since each pupil must perform individually, few teachers can spend the class time necessary to obtain multiple observations of each pupil. There are always many other new activities that must be taught.

Of course, some performances—e.g., holding a pencil, tracing, cutting with scissors, using a screwdriver, measuring with a ruler, or weighing with a gram scale—can be observed and judged quickly, often in the natural flow of classroom events. Such behaviors are not difficult to observe many times in order to be sure that one has a stable and consistent assessment of pupils' performances.

If performance criteria or rating categories are vague and unclear, the teacher will have to interpret what a criterion or category means, thus introducing error and subjectivity into the ratings. One way to eliminate much of the inconsistency in scores or ratings is to be explicit about the purpose of the performance assessment and to state the performance criteria and rating categories in terms of observable pupil behavior. Another way to check on the objectivity of an observation is to have several individuals independently observe and rate a pupil's performance, as in a diving competition. This technique is usually impractical for classroom use and is predicated on training observers so that they understand the performance criteria and rating scale.
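Where a second observer is available, the two sets of ratings can be compared for consistency. The sketch below is my own illustration: the chapter does not prescribe a statistic, and the simple exact-agreement proportion used here is only one rough option among several.

```python
# Illustrative sketch (not from the text): checking how often two independent
# observers gave the identical rating on the same set of criteria.

def exact_agreement(rater_a, rater_b):
    """Proportion of criteria on which two raters gave the identical rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of criteria")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Hypothetical ratings on five criteria, each on a 1-5 scale
teacher = [4, 3, 5, 2, 4]
second_observer = [4, 3, 4, 2, 4]
print(f"exact agreement: {exact_agreement(teacher, second_observer):.0%}")  # 80%
```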

All of the above sources of error reduce the validity or reliability of performance assessments because they distort the rating or score attained by a pupil. They produce scores or ratings that are based on characteristics that have nothing to do with the pupil's actual performance. One remedy for these potential problems is to cut down the number of distractions that a teacher must face when trying to carry out performance assessment. The more such distractions as unclear and inappropriate rating criteria, personal bias, and failure to write down assessment ratings can be reduced, the more likely it is that the teacher's observation will provide a valid indication of the quality of a pupil's performance.

To overcome these problems and obtain more valid and reliable performance assessment information, a number of steps can be taken (Stiggins, 1984). As noted, it is important to (1) know the purpose of the assessment before beginning, (2) teach pupils the desired performance, and (3) make pupils aware of the criteria by which their performance will be judged. In addition, the following steps can be implemented. The primary aim of these suggestions is to ensure that the teacher observes the desired pupil behaviors and judges the pupils on their demonstrated performance—not on some other, spurious criteria.


• State the performance criteria in terms of observable behaviors that the pupil can be seen doing or not doing. Avoid using adverbs in performance criteria, because the interpretation of the adverbs may shift from pupil to pupil. Overt, well-described behaviors can be seen by an observer and therefore are less subject to interpretation.

• Select performance criteria that are at an appropriate level of difficulty for the pupils. The criteria used to judge the oral speaking performance of third-year debate pupils would likely be more detailed and specific than those used to judge first-year debate pupils. We would not expect first-year accordion players to perform the same pieces as fifth-year players. Make the performance criteria realistic for the pupils' level.

• Limit the performance criteria to a manageable number. A large number of criteria make observation difficult, because the teacher has too much to look for and too little time to observe and rate all the criteria. This causes errors in observation and rating that reduce the validity of the assessment information. Ten to fifteen performance criteria are a reasonable target, depending on the length of the performance. More criteria can be used when a product is being assessed, because the teacher usually has more time to observe and judge a product.

• Maintain a written record of pupil performance. Inevitably, when a teacher observes many pupils perform and tries to keep a mental record of each pupil's performance, things are forgotten and mistakes are made. This lowers the validity of the assessments because it distorts assessment of the true performance of the pupils. Checklists and rating scales are the easiest method of recording pupil performance on the important criteria (a brief sketch follows this list). Tape recordings or videotapes may be used to provide a record of performance, so long as their use does not upset or distract the pupils. If a formal instrument cannot be used to record impressions of the performance, then make informal notes of its strong and weak points. Some written record is better than none at all.
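As one concrete form a written record might take, here is a minimal sketch of appending each pupil's ratings and informal notes to a file at the time of observation. Everything specific here is hypothetical: the criterion names loosely echo the presentation criteria earlier in the chapter, and the 1-to-5 scale and CSV layout are my own choices; a paper checklist or rating form serves the same purpose.

```python
# Minimal sketch (my own, not from the text) of a written record made at the time
# of observation: ratings on a small, fixed set of criteria plus informal notes.
import csv
from datetime import datetime

CRITERIA = ["eye contact", "voice projection", "organization", "summary"]

def record_performance(path, pupil, ratings, notes=""):
    """Append one pupil's ratings (1-5 per criterion) and informal notes to a CSV file."""
    if set(ratings) != set(CRITERIA):
        raise ValueError("rate every listed criterion, and only those criteria")
    if any(not 1 <= r <= 5 for r in ratings.values()):
        raise ValueError("ratings must be on the 1-5 scale")
    row = [datetime.now().isoformat(timespec="minutes"), pupil]
    row += [ratings[c] for c in CRITERIA]
    row.append(notes)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)

# Recorded during or immediately after the performance, not from memory later
record_performance(
    "oral_reports.csv",
    "Sarah Jackson",
    {"eye contact": 4, "voice projection": 3, "organization": 4, "summary": 3},
    notes="clear major points; summary brief but present",
)
```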

 

In the end, performance assessments are like essay questions in that it is impossible to eliminate all subjectivity from the scoring or rating process. While this is the reality of the situation, it also is true that performance assessment is the only method that can gather appropriate evidence about many important school outcomes. This reality is not meant to condone sloppy and error-laden performance assessment. Quite the opposite. It is to point out the fact that there are problems in performance assessment techniques that threaten the validity and reliability of their results. In this respect, performance assessment is no different from the other assessment techniques that have been discussed. In all cases, problems can be reduced by following suggested practices. It is better to use evidence from imperfect performance assessments than it is to make uninformed decisions about pupil achievement of important school outcomes.

 

CHAPTER SUMMARY

 

Performance assessments include those assessments in which a teacher observes a pupil carrying out a process and rates the pupil's performance. They also include instances in which a teacher rates a product that a pupil has produced. Many aspects of school instruction are directed toward teaching pupils to perform complicated behaviors. In the areas of oral communication, psychomotor skills, athletic activities, concept acquisition, and affective characteristics, assessment is best carried out by having pupils perform an activity rather than respond to a paper-and-pencil test. Direct observation is the best means of determining whether pupils have accomplished the desired behaviors.

Successful performance assessments have four characteristics: a clearly stated purpose for the assessment, a set of behaviors which make up the performance to be observed and rated, a setting in which the performance is to take place, and a procedure for describing and scoring pupil performance. Most teachers carry out performance assessment in their classrooms, because most teachers recognize that not all desired school outcomes can be assessed with paper-and-pencil tests. Unfortunately, the press of time and the multitude of events that occur daily in the classroom often lead to performance assessment which omits one or more of the characteristics necessary for success. The consequence of these omissions is diminished validity and reliability of the assessments, which means that decisions made on the basis of the observations may be incorrect.

Conducting structured performance assessments in classrooms is not a terribly hard thing to do; it is certainly less difficult than constructing one's own paper-and-pencil test. However, unlike paper-and-pencil achievement tests, which are provided as part of the instructional package by most textbook publishers, commercial performance assessment instruments have not yet been provided on a broad scale. Certainly the lack of textbook-provided performance assessment instruments is part of the reason why teachers must rely on imperfect approaches.

But performance assessments of the type described in this chapter are not beyond the ability of most teachers to construct and use. So long as one selects the ten or so most important aspects of a performance and describes these in observable terms, one can construct suitable instruments. Look back at the many examples presented in the chapter, and it will be clear that useful performance assessment instruments can be quick and easy to construct and need not be hard or time-consuming to use. Moreover, the instruments shown have two additional desirable properties which make them efficient for classroom use. They can be used again and again to rate a performance, and they provide specific diagnostic information about a pupil's strong and weak points. These are characteristics that few paper-and-pencil tests have.

Performance assessment is a necessary aspect of a teacher's classroom assessment practices, because much of what teachers want pupils to learn is best assessed by observing the pupils perform. As the level of enthusiasm for paper-and-pencil achievement tests continues to wane (Educational Leadership, 1989; Phi Delta Kappan, 1989), there will be increased emphasis on performance assessment, which is an important, but underused, strategy for assessing pupils' classroom learning.

 

 

REFERENCES

 

Almy, M., and Genishi, C. (1979). Ways of studying children. New York: Teachers College Press.
Carey, L. M. (1988). Measuring and evaluating school learning. Boston: Allyn & Bacon.
Cartwright, C. A., and Cartwright, G. P. (1984). Developing observational skills (2d ed.). New York: McGraw-Hill.
Cazden, C. B. (1971). Preschool education: Early language development. In B. S. Bloom, J. T. Hastings, and G. F. Madaus (eds.), Handbook on formative and summative evaluation of student learning (pp. 345-398). New York: McGraw-Hill.
Ebel, R. L., and Frisbie, D. A. (1986). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.
Educational Leadership. (1989). 46(7).
Fitzpatrick, R., and Morrison, E. J. (1971). Performance and product evaluation. In R. L. Thorndike (ed.), Educational measurement (pp. 237-270). Washington, DC: American Council on Education.
Goodwin, W. L., and Driscoll, L. A. (1980). Handbook for measurement and evaluation in early childhood education. San Francisco: Jossey-Bass.
Gronlund, N. E. (1985). Measurement and evaluation in teaching. New York: Macmillan.
Guerin, G. R., and Maier, A. S. (1983). Informal assessment in education. Palo Alto, CA: Mayfield.
Gullickson, A. R. (1986). Teacher education and teacher-perceived needs in educational measurement and evaluation. Journal of Educational Measurement, 23(4), 347-354.

