For the Love of Learning


Volume II: Learning - Our Vision for Schools

Chapter 11: Evaluating Achievement

It would seem self-evident that, no matter how carefully designed the curriculum, or how thoroughly prepared the teachers, we cannot know how well students are learning without measuring and describing - assessing and evaluating - their level of achievement and their progress. However, until recently, such information has been scanty and unclear in Ontario.

Assessment, especially when it is used for decision-making purposes, exerts powerful influences on curriculum and instruction ... If assessment exerts these influences it should be carefully shaped to send signals that are consistent with the kinds of learning desired and the approaches to curriculum and instruction that will support such learning.(1)

While recognizing that, as public institutions, schools are obliged to report to the public on how well they have fulfilled their mandate, educators point to many obstacles to assessing and evaluating effectively, efficiently, and constructively. Professionals who specialize in the complex and technical area of assessment of student achievement acknowledge that it is easier to carry out poorly than well, easier to mislead than to inform with statistics, and easier to spend a great deal of money in assessing what students know than to improve teaching or learning effectively. (We are referring here to professional educators, not to those who have tried - and, in many places, succeeded - in creating profitable businesses built on mass testing that is saleable rather than genuinely useful.)

As the discussion of curriculum emphasized, learning does not proceed in neat steps, each one exactly equal, nor in an unvarying sequence; therefore, tests cannot be applied to students as simply as quality control can be applied to objects coming off a conveyor belt. Tests will not fix students' problems or improve teaching; they will not guarantee that students will find successful jobs or careers. At best, they can tell parents something (but never everything) about what their children know, and give teachers useful information about what material they have taught successfully, and what they need to approach differently.

We know that the schools, the boards, and the province have an obligation to ensure that student learning is assessed fairly and clearly, and that it is reported in a readily understandable way. At the same time, we caution that, no matter how simple it may appear to be to undertake, assessment is complex and costly. It must be done, and done well, but without losing sight of the fact that assessment is a means to an end, not an end in itself. Not only must it enable us to describe what students know and what they have been taught, it must show where improvement is possible and desirable. And, although there is abundant evidence that assessment can cause educators, however unwittingly, to narrow the curriculum and limit students' and teachers' horizons, it must not do so. In Ontario, we need more and better information on what students are learning; we do not need a large-scale testing industry or an educational system that is driven - and limited - by the need to teach only what is easily measured, or to measure only what is easily taught.

This chapter considers issues inherent in monitoring and reporting student achievement, and in ensuring quality and consistency in evaluating students' work. We describe good assessment practices, and identify ways in which those responsible for education in Ontario can be more accountable to the public; as well, we chart directions that will lead to the continuous improvement that is characteristic of a healthy learning organization. System accountability, as differentiated from student assessment, is discussed in Chapter 19.

Student assessment: What people told us

We heard a great deal of concern, mostly from parents and students, but from others as well, about measuring a student's learning.

Parents want information: to be told, fully, honestly, in a language they can understand, and in a timely way, how well their children are progressing in school, and what teachers will do if students are not making satisfactory progress. Parents want standards in order to know how well their children are doing, compared to others of their age, or according to some accepted and consistent criterion of what children their age should know.

The word "standard" is confusing, because it has a general and a specific meaning, and both are used in conversations about learning and assessment. The general meaning is the one implied in a remark such as, "We need high standards." In this general sense, standard is often synonymous with goal or expectation, and refers to an ideal; it connotes a passion for excellence and habitual attention to quality.

"Standards are objective, exemplary ideals that serve as worthy and tangible goals for everyone, even if some cannot (yet) reach them."(2)

In its more specific meaning, often used by the parents we heard from, a standard is a reference point against which performance is measured. Educators compare a student's achievements to a number of different reference points. Performance is compared to that of other students in the same class, the school system, or the province (norm-referenced); or it is compared to some pre-determined, expected level of performance (criterion or outcomes-referenced). Standard in this sense is similar to yardstick, and refers to a typical, rather than to an ideal, state. Both norm-referenced and criterion-referenced assessments allow us to describe the individual student as performing below, at, or above the standard, whether the standard is other students' performances, or mastery of content. When people call for "standardized testing," they can mean either a norm-referenced test or a criterion-referenced test, although those outside the system tend to be most familiar with the norm-referenced variety. Examples of the latter include the Canadian Test of Basic Skills and the Gates-MacGinitie Reading Test. The old Grade 13 departmental exams were examples of criterion-referenced standardized tests.
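The difference between the two reference points can be made concrete with a minimal sketch. It is an illustration only: the function names, the scores, and the expected level of 75 are hypothetical and are not drawn from any actual test. The first function places a student relative to other students (norm-referenced); the second compares the same score to a fixed, pre-determined level (criterion-referenced).

```python
from bisect import bisect_left


def norm_referenced(score, peer_scores):
    """Percentile rank: the percentage of peers scoring strictly below this student."""
    ranked = sorted(peer_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)


def criterion_referenced(score, expected_level):
    """Describe a score as below, at, or above a fixed expected level."""
    if score < expected_level:
        return "below"
    if score == expected_level:
        return "at"
    return "above"


# Hypothetical class scores and a hypothetical expected level of 75.
class_scores = [50, 60, 70, 80, 90]
print(norm_referenced(70, class_scores))   # standing relative to classmates
print(criterion_referenced(70, 75))        # standing relative to the fixed level
```

The same score can look quite different under the two approaches: a student may sit in the middle of a class and still fall below the expected level, or the reverse.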

Students, post-secondary educators, employers, and the general public - like parents - are concerned about standards, each group from a particular vantage point and interest.

Students told us they are concerned about information: they want teachers to tell them clearly and promptly what they need to do in order to improve; they want fairness: they believe (as do many adults) that some teachers and some schools mark "harder" than others, putting students at a disadvantage when making application to college or university. (Or, conversely, marking too easily, and putting students at a disadvantage because they are ill-prepared for the next grade, or for college or university.) Thus they, too, are concerned about common standards for assessment.

Representatives of various sectors of the public - post-secondary institutions, the business community, some professional groups - expressed concern about the lack of information about what students know and the existing information that indicates to them that students are not learning well enough. They were often among those calling for an increase in standardized testing, as a way of obtaining more information, and demanding higher expectations (standards) in learning and teaching.

While many parents and community members recommended some kind of standardized testing program as a vehicle for increased consistency and clarity about actual student achievement, some parent groups were concerned about the effects of standardized testing. They noted it might have a particularly harmful impact on minority, low-income, and special-needs students, whose real achievement level might not be reflected because of language differences or difficulties with the test's form, rather than its content; some teacher groups expressed fears that the results of such tests might be misinterpreted.

The recent history of student assessment in Ontario

In recent years, there has been an increasing emphasis on assessment, as well as an increasing concern about the nature of the most widely used forms of student assessment and uses that are made of the results.(3)

The fact that many people are asking, with some impatience and a sense of urgency, for more information about student achievement across Ontario, reflects the lack of such data over the last several decades, compared to earlier times and other jurisdictions, and the current crise de confiance about education, an anxiety which is certainly fed by lack of concrete information.

Ontario has had very little tradition of standardized testing. Throughout the '50s and '60s, standardized exit exams in Grade 13 (departmental exams) were given in all subject areas, and formed the sole basis for entry to university. In the mid-1960s that changed: results from the exams were coupled with teachers' marks. In the late '60s, the exams were discontinued and teachers' marks became the only basis for university entrance. That change was made in part because it was learned that teachers' marks predicted university achievement as well as the exams. This should not be a surprise: one would expect that a teacher who has known a student for a year, and judged his or her performance on a variety of formal and informal criteria, would be a better predictor of potential success than any single test. Traditional tests, of the Grade 13 variety, tended to reflect ability to memorize and regurgitate, and to bear up under stress - useful abilities, certainly, but not the kind of serious thinking and knowledge acquisition our schools should foster, and not the kind of shallow goals that should shape the curriculum.

Teachers have had considerable autonomy in designing their own assessments, and in making judgments about the quality of a student's work. Teachers' marks have been viewed as an acceptable and adequate method of deciding whether students should be promoted, where they should be placed, and what programs they should undertake.(4)

In the 1970s and early '80s, when other provinces and many American states were expanding their assessment programs, Ontario was leaving assessment in the hands of educators. A program called the Ontario Assessment Instrument Pool (OAIP), for example, created banks of test and assessment items from which teachers of various subjects at different grade levels could choose. The OAIP had potential for bringing greater consistency to student assessment, but its implementation was left largely to chance and individual initiative, and its potential was never realized.

This policy of leaving assessment to the discretion of individual teachers was clearly stated in the OSIS policy document (1984) for Grades 7-12/OAC:

For the most part, it is recognized that the most effective form of evaluation is the application of the teacher's professional judgment to a wide range of information gathered through observation and assessment. In order to help teachers evaluate student achievement, curriculum guidelines will describe appropriate evaluation techniques.

Thus, evaluation techniques were described, but standards against which to evaluate were not specified.

The first of Ontario's recent large-scale assessments directed at evaluating the school system's performance were in science and mathematics. During the 1980s, the province participated in several of them. The results were reported by the media as generally indicating that Ontario students scored mid-way, with about half the other jurisdictions (which usually included a few other provinces as well as many other countries) scoring higher, and half lower. While this "middle-of-the-pack" score was an accurate reflection of Ontario's performance for some tests, it was not for others. In fact, this kind of reporting ignored the size and meaning of differences; in some cases, these were so small as to be insignificant and unreliable. What looked like higher or lower scores in a ranking table were often actually ties, because the spread in points was minuscule. For example, in the Second International Mathematics Study, while Ontario was reported as being in the middle of the table in most areas, in fact only Japan scored higher in algebra; Ontario and British Columbia were tied with two other countries; and the rest had lower scores. The same was true in geometry: Japan at the top; Ontario, British Columbia, and five others tied below it, and the rest below them. But in typical "league-table" reporting, the results seemed far worse.

Having said that, however, it is true that the performance of Ontario's students on the math and science tests overall indicated adequate but not outstanding performance; they tended to be stronger on the basic skills components than on higher-level problem-solving.

We think that the more impressive distinction between Ontario and some higher-scoring jurisdictions (these differed from one test to another and, in addition to Japan, included Hungary, Korea, Taiwan, Alberta, British Columbia, and Quebec) is not how well our students learned, but how much they were taught. The results of comparing what is asked on a test to what the curriculum in a particular jurisdiction is supposed to cover are calculated as the "opportunity to learn" (OTL). What is found, when this comparison is made, is that students in Ontario are simply being taught less - fewer concepts and topics - in mathematics and science than students in some other countries and provinces. Thus, the problem is not achievement - our students show similar mastery of what they have been taught. It is a problem of input, not outcome. While it is possible that our students might be taught some things which were not included on the tests, it is clear that they are not being taught many things which students in other countries are given the opportunity to learn.

In many ways, the OTL data are more compelling than the achievement results ... the cause of [different OTL results] is that some countries teach a lot more mathematics or science than others ... it does raise the issue of whether we ought to be teaching more mathematics and science ... a topic agreed upon for inclusion [in an international test] is not necessarily more important than material not included. However, when one country gives high OTL to twice as many items as another country, it certainly must raise the question of whether that second country is teaching enough ... the question of whether we want to teach more material is settled by examination of subject matter content and societal needs, and not the achievement results. The comparative OTL data point to the problem, and curricular analysis answers it.(5)

(In 1995, the Third International Mathematics and Science Study will involve Ontario students in Grades 3, 4, 7, and 8, as well as secondary school students, and will include mathematics, science, and physics.)

Recently, the Council of Ministers of Education of Canada (CMEC) embarked on national assessments in its School Achievement Indicators Program (SAIP), which samples students in each of the participating provinces. The first test, in 1993, was in mathematics and included a sample of 13- and 16-year-old anglophone and francophone students from across Ontario. Results indicated that the two groups were similar to the national average in their knowledge of content (number and operations, algebra, measurement, geometry, statistics, etc.) and problem-solving; like other Canadian students, and as international tests have also shown, their problem-solving skills lagged behind their knowledge of content; and relatively few students were working at the highest levels of achievement. There was considerable inter-provincial variation, with students from Quebec (both francophone and anglophone) tending to score higher than those from other provinces. (Future SAIP testing is scheduled to include reading and writing in 1994 and 1998, science in 1996 and 1999, and mathematics in 1997 and 2000.)

In addition, the Ministry has undertaken provincial reviews of senior geography (1987), senior chemistry and physics (1988), mathematics and reading in Grade 6 (1989), mathematics in Grade 8, 10 (general) and 12 (advanced) (1990), and writing in Grade 12 (1992). These are assessments of curriculum effectiveness based on testing a representative sample of students, plus data based on interviews and observations. (In some cases, school boards extended testing to all students.) Although the provincial reviews were not based on explicit learner outcomes, they have been a good source of information about how well students are learning. The Grade 12 writing review, for example, demonstrated that, while the majority of students were able to write at a "satisfactory" level, very few reached the "superior" category.

All these international, national, and provincial studies have used student samples, which is a much more economical way to assess general student achievement, although it obviously does not permit reporting on the individual student or school. For example, we are advised by the Ministry of Education and Training that the cost of a provincial review is about one quarter the cost of a test given to every student in Ontario. Thus, the Grade 12 reading/writing review cost about $750,000, while the Grade 9 reading/writing test cost about $3,000,000.

The results of these studies have contributed to public discussion and concern about education in Ontario, and led to increased interest in routine student assessment. In 1993, the government responded by modifying a planned Grade 9 reading/writing review (which would have used a random sample of Grade 9 students across the province) to become a test taken by all 140,000 Grade 9 students in Ontario. (A second Grade 9 reading/writing test is planned for 1994/95, and it, too, will be given to all students.) The 1993 review was based on a two-week curriculum on the theme of food (anglophone) and media literacy (francophone) and included an extensive written portion; test scores counted for 20 percent of a student's final mark. The majority of students performed at or above the level deemed "adequate." Some of the media, however, questioned the validity of the terms "adequate," "competent," and "proficient," based on examples of students' writing graded in those terms. Clearly, there is no pre-determined standard for what constitutes a given level of writing or problem-solving.

Chapters 7, 8, and 9 referred to the development of learner outcomes against which progress can be measured; these have been defined for Grades 1 to 9, and we have recommended that they be expanded to the other grades and levels, and that they be improved. As well, we made reference to the standards being developed in language and mathematics, and we recommended that they be established in other foundation areas. These standards could and should play a key role in future student assessment.

Developing standards depends both on examining actual performance of different groups and trying to develop consensus among educators and the public.

Standards may exist at many levels of sophistication and excellence. They can be set very high (Elvis Stojko's skating, Margaret Atwood's writing, or John Polanyi's work in chemistry), or they can describe realistic expectations and worthy and appropriate goals by which to judge student performance. It is important to note that there is no one way to define a standard: there must be a variety of concrete examples, known to all concerned, that make expectations clear.

One of the most difficult and challenging tasks in education today is establishing these standards, based on informed consensus. Once we have a useful set of outcomes that describe what students should know and be able to do, for example in mathematics by the end of Grade 3, we can assess their performance and compare it to the standards that have been established.

The Ministry of Education and Training has begun to develop standards in language/literacy and mathematics/numeracy. These are based on the learner outcomes for The Common Curriculum for language and mathematics and suggest different levels of performance such as "limited," "adequate," and "proficient" for students at the end of Grades 3, 6, and 9. A student's performance can fall into one category or another in each subject, and within each subject in several areas. The math standards, for example, are built on areas within math that are specified in The Common Curriculum as "measurement," "problem-solving," "algebra and patterning," etc. These standards are intended to provide descriptions of expected levels of achievement by which students' learning can be assessed, and to provide a clear basis for board-wide and provincial assessments of student achievement. As we said earlier, learner outcomes and standards must be very clear for all foundation subjects: language, mathematics, science, computer literacy, and group learning/interpersonal skills. As these standards are developed and refined, they will become the yardstick against which teachers and the public can measure student performance. In fact, the Ministry of Education and Training has already indicated that it plans to use the standards as a basis for assessment at the end of the three grades, although it has not been specific about how it intends to carry that out.

We are convinced that the Ontario government, and educators' professional associations and bodies, must make a serious, long-term commitment to assessment, both for improvement and for public reporting and accounting. While public discussion of the issue often focuses on large-scale assessment as an indicator of how the system is working, it is also a tool for improvement. As a commission on learning, we are very concerned about the quality of assessment, formal and informal, that occurs daily in the classroom, and that informs, or should inform, students, teachers, and parents about improving performance. Much more than large-scale assessment for public accounting, this level of frequent and cumulative assessment has the potential to increase and enhance learning.

Assessing individual students

This section covers four issues. The first, and most important, is assessment for improvement; second is reporting clearly, accurately, and fairly what has been learned. In our opinion, fairness means that individual student assessment is consistent - that a 75 percent at one school is not a 65 or an 85 percent at another; moreover, parents must be accurately informed about what their children have achieved in relation to explicit and universally applied standards.

Third is the role of information technology, which has a significant contribution to make to improving assessment practice. Finally, there are issues of bias in assessment - evaluating students fairly across gender, social, and cultural lines.

Assessing for individual improvement: The most important reason

The most important use of assessment is as a way of finding out how well students are doing in order to help them learn better, more, and faster. Assessing what students know - and what they don't - enables teachers to capitalize on students' knowledge, and focus on gaps in it. Furthermore, by examining student performance, teachers have the opportunity to assess the success of their own methods and efforts. Evaluating students regularly enables teachers to monitor learning, and make changes when learning is not occurring, not occurring fast enough, or not occurring in sufficient depth. Regular evaluations, with frequent and detailed feedback from teachers, assure students that they understand what is being taught and can move on to the next task, thus advancing student learning. We call this formative evaluation, because it helps form the learning and teaching needed to achieve success.

Large-scale assessments, used to monitor the school, school board, or province as a whole, and individual assessments (such as final exams) used for marks and accountability, are not very useful to individual students. First, students, who need immediate feedback, typically do not find out how well they did on these tests for some time. Second, the results may be just a letter or a number, rather than an analysis of strengths and weaknesses. Third, large-scale tests usually ask questions that are easy to mark, but do not measure problem-solving, analytic ability, or understanding.(6) While marking of surface features like capitalization and punctuation may be carried out by computer, such assessment methods cannot adequately cover content, style, and other elements; nor can they distinguish between a wrong answer which reflects real misunderstanding or ignorance, and a wrong answer which reflects simply a mechanical error.

Teachers and students alike show disrespect for learning and teaching that emphasize "just the facts," are not applied to "real" problems, are "low level," or require "regurgitation." In spite of these espoused beliefs, much teaching and learning is shallow, and there is legitimate concern that this is the result of evaluation practices and perceptions of them.(7)

It is essential that assessment be a regular part of learning. In Ontario, classroom assessment has been the typical vehicle for assessing individual student learning. It is part of the daily experience of educators and students, an integral part of classroom activity, and occurs frequently. It may be formal or informal and is often indistinguishable from instruction; it may take place with an individual or in a group. Classroom assessment includes oral questions, teacher-created tests, quizzes, essays, assignments, examinations, projects, as well as observations of performance, and any other products or samples of work that might provide information about performance. Because it is frequent and varied, classroom assessment can tell far more about what a student knows and has learned than any single test. Teachers have opportunities to observe whether or not students are learning to think critically, to make connections between prior and new learning, and whether they take pleasure in learning. "Using one assessment procedure is like using a hammer to do everything from brain surgery to pile driving."(8)

If a test is to give accurate data on a student's full knowledge and understanding of a single concept, it must comprise a number of questions. Telling, reliably, what a 10-year-old knows about math requires a lengthy test. A test that would give reliable information on what that 10-year-old knows about math, language, science, and computers would have to be administered over several sessions, would probably take on a significance in the minds of teachers and students that exceeded its value, and still could not provide the accurate and meaningful evaluation of continuous classroom assessment.

In the classroom, students can work on projects that result in a useful product, or in a real discovery about how things operate. They can write - on paper or on a computer screen - for a real audience, whether a student in another school, near or far, or for the newspaper of the school or the town.

A lot of intelligences really can't be tested for, in the sense that we usually use the word "test." What we need to do is to create school environments where you can observe a lot about what kids are good at, what interests them, and where they show substantial growth.(9)

While professional preparation and continuing professional education may expose teachers to all kinds of assessments, good assessment for improvement requires much more attention than it has traditionally received, more than can be delivered in a one- or two-year pre-service program. Designing and marking tests and other assignments (papers, presentations, projects) should be a priority in professional development, as should the systematic use and interpretation of information based on observing and meeting with students. Such training cannot stop when a credential is awarded: it must continue in schools.

Although it is common for educators to point out that the danger of large-scale testing is that it tends to measure what is most easily measurable, it is equally true that accurately evaluating more complex thinking skills in the classroom demands careful training, extensive supervised practice, and the development of skills that are seriously neglected in teacher education.

For example, when students are asked to summarize a story, their product - the summary - can be at the simple level of listing all the ideas in the story or text, in which case the writer shows immaturity in carrying out the assignment. (This may be quite appropriate for a young learner, but it is unsatisfactory later on.) A more adequate summary shows some judgment: the reader selects the main ideas, and links them together sequentially. But this kind of summary still attempts to pay equal attention to each section or episode of the text, to summarize the plot, and usually goes on at length. A summary which shows real comprehension and proficiency (beyond listing and linking main ideas) examines underlying themes, pays more attention to some main ideas than others, or even constructs new ideas, by building on the significant themes of the text - the famous "reading between the lines." Reading and assessing students' work for higher levels of literacy, what some call depth of processing in learning,(10) is not something that all teachers know how to do, or how to describe to students and parents. But it is the kind of analysis and assessment that is necessary, if we are to teach and to assess for understanding.

Based on what we learned in the hearings and from the research, teachers must provide more and better feedback to students and parents, which pinpoints strengths and weaknesses, results in teachers and students and parents doing things differently, and is timely enough that it contributes to what the student is learning now, and what the teacher is teaching now, rather than to what was taught but not learned weeks or months ago.

In essence, this is like coaching: for example, a teacher observes a student making an oral presentation on the use of computers in graphic design and finds that he or she speaks too quickly and does not frame the presentation in a manner that allows the listener to follow easily. Rather than waiting until the term report and noting that the student is weak in presentation skills, the teacher needs to tell the student as soon as possible that speed and organization need improvement, help map out a possible reorganization, discuss techniques for slowing speech, and offer an opportunity to try the presentation again.

Our belief is that the first report card of the year, whether at the end of October or in December/January, should not contain surprises for parents. It should not, for example, indicate that the youngster is reading below grade expectations, when the parent has not previously been made aware of the problem. We know (because we heard about it and because some of us have experienced it) that it does happen, and that it should not. The report card may not always bear good news, but the contact between parents and teachers should be frequent and consistent, whether or not students are performing according to expectations.

Parents need to see the results of routine classroom tests and the evaluations of regular classroom assignments throughout the year, starting in September, as well as portfolios of students' work, with indications of progress made from earlier to later efforts. Teachers need to inform parents about what has been covered in recent weeks and what is coming up; they should tell parents how, at home, they can support their children in gaining specific skills or knowledge.

Our strategy for enhancing individual student assessment for improvement, including helpful feedback, involves giving teachers the information and skills to link better assessment to student learning. Programs that build the capacity to reliably and consistently evaluate writing, problem-solving, understanding, and analysis in all subject areas - in other words, to assess the achievement of the higher-order literacies that we want our graduates to have - are an investment in the ability to measure what matters most. They are a commitment to teach, re-teach, and teach better. Such programs demand considerable time, and thus can be expensive, as is most high-quality, professional training. But, to the extent that we can teach teachers to evaluate complex thinking skills well and consistently, we build the capacity to measure well what matters most.

Consistency is tied to fairness - a subject about which students said a great deal. Right now, the only training teachers get on consistency in assessing critical thinking and communicating skills is in relation to provincial subject reviews and OAC examinations (given in the final year of high school for students preparing to enter university); these do not affect most teachers. But all teachers need to be better educated in assessing, whether that is being done through written tests, essays, presentations, or projects.

Because we care above all about learning, our first concern with assessment centres on teachers' ability to assess student work accurately and consistently, and to communicate effectively to students (and to parents) how they can improve. We are convinced that assessing for purposes of improvement always depends on the teacher's ability both in assessment itself and in responding to the results. That is why the first recommendation we make about assessment is that efforts in this area be the subject of teacher education at every level: in faculties of education, school boards, schools, and continuing professional education at such post-graduate institutions as the Ontario Institute for Studies in Education (OISE).

Assessing understanding, critical thinking, and the ability to generalize, synthesize, and apply knowledge from one situation to another is very complicated and requires considerable experience and practice. Reporting the results of such assessment takes time - to think, to write, and, often, to discuss results with the student and/or parent. The necessary skills are built throughout the teacher's career. We believe that a great deal of the practice and training should take place in the school, with teachers working systematically in teams to mark papers and presentations, and to discuss their ratings, guided by consultants who have expertise in assessment.

Recommendation 46

*We recommend that significantly more time in pre-service and continuing professional development be devoted to training teachers to assess student learning in a way that will help students improve their performance, and we recommend supervised practice and guidance as the principal teaching/learning mechanism for doing so.

We hasten to point out that we are not suggesting that teachers test or assess more or mark more papers, but that they bring a higher level of professional training and expertise to the process of assessing and reporting on what students have achieved.

Accounting for student assessment: Reporting what is learned

Accountability begins, then, with something more humble than large-scale testing: it begins with ... teachers monitoring and adjusting the daily homework and classwork of students rigorously and consistently. It begins with not accepting work that is shoddy ... It begins with a policy that says schools will send reports home more than twice a year. In short, if you want to stop the kind of minimal compliance and perfunctory work that can sink a school, you'll need an effective and timely grading system, reporting mechanisms, and promotion standards.(11)

Thus far, we have discussed the importance of assessing students for improvement, giving all students a fair opportunity to demonstrate what they know, and offering feedback to students and parents to keep them apprised of the students' progress through frequent and consistent communication.

The final report for the term or year/semester is particularly important: it tells student and parent what level of learning has been achieved in the required knowledge and skills for that course or year. The evaluation summary that appears on the year-end report is permanent: it goes into the Ontario Student Record and may be used by other teachers for planning, or as a way of diagnosing student performance. The report may also be a factor in decisions about course or class placement, streaming, and planning for post-secondary education. Hence, the quality of that assessment has long-term significance. Schools and teachers are accountable to parents for its accuracy and reliability.

We heard from parents and others that report cards are not very helpful: they are unclear or lack sufficient information on how much the student has learned and where the focus for improvement should be. While some parents want marks in letters or numbers, others want more detail and a better sense of how their children are doing. Many parents brought report cards to our public hearings, or sent them, pointing out inconsistencies and "edu-babble." These examples did not reflect well on the teachers, principals, schools, or school boards involved.

While parents who are in regular and friendly communication with a child's teacher are likely to be well informed about the child's progress, that level of communication isn't always maintained: a parent may not be able or willing to articulate concerns or misgivings, or may not always understand or agree with the teacher's analysis. More frequent and more candid communication would do more to correct this problem than any increase in assessments or testing.

Teachers have an obligation to be sensitive to parents who don't understand, don't agree, or who have difficulty articulating their concerns. They have to reassure parents who are afraid to voice misgivings, lest their children suffer some form of retaliation. The fact is that no report card, no matter how precise, makes good communication between teacher and parent obsolete or less vital to the student's well-being.

Parents also want to know how their children are progressing in terms of acceptable and universal standards which, until recently, had not been established. Now that they have begun to be established, standardized assessment is possible - as long as teachers are equipped to carry it out.

As already noted, the recent development of learner outcomes and standards is helping to create a clearer and more provincially consistent basis for curriculum and standards on which assessment will be built. That is a crucial step. We have urged the Ministry of Education and Training to develop "curriculum guidelines based on the learner outcomes that will give teachers and parents a clear idea of the basic structure of each curriculum area each year." (See recommendations in Chapter 8.) We have recommended that, at the beginning of each school year or semester, schools give parents and students information on course content, based on clear learner outcomes. We have also suggested that the learner outcomes in the common curriculum courses be made more readily understandable, and that outcomes statements are needed for all grades and subjects, including the specialized curriculum in Grades 10 to 12.

Clearly written learner outcomes, even without descriptions of different levels or standards of achievement, would make it considerably easier for parents to know what their children are expected to learn and what they have learned. The standards (which have been developed for language/literacy and mathematics/numeracy, and which we have recommended be developed for science, computer literacy, and group learning skills) give parents information they need if they are to better understand and informally assess their children's progress. We believe that reporting to parents should be based on the same learner outcomes and standards as the curriculum. Thus, in a parent-teacher-student conference, parents should be shown examples of work of different standards, so that they can fully understand their own child's level and mark. Report cards should reflect the student's level of attainment of major outcomes, measured by adherence to clear and universal standards.

Goals are made clear if, at the beginning of the school year, parents and students are provided with a written description of expected outcomes, and then get feedback on students' learning throughout the term or session; report cards must be consistent with this information. The importance of evaluating students according to uniform and explicit standards also pertains to issues of fairness and consistency.

An individual student or parent says, "It isn't fair that teacher X (and/or school Y) gives much easier marks than my teacher (school). It gives those students the advantage of a higher average and means they get a place in university that is denied to me, even though my 80 percent is worth as much as their 90 percent." Beyond the individual's complaint, universities and colleges worry about screening applicants to get students who are most likely to be successful. Employers worry about the meaning and value of a transcript or diploma. Society worries about whether its best and brightest have opportunities for higher education so that they can become pillars of a productive and competitive society.

Because teachers have been held responsible for using uniform, consistent standards that did not exist, they have used their own. The supposed objectivity of numbers, percentages, and letter grades obscures the fact that standards differ; a provincial standard should mean that, while differences in teachers' marks will never completely disappear, they will be fewer, smaller, and less significant.

It is of course true that we can never eliminate all subjectivity in assessment, and cannot pretend that there is or ever will be a fool-proof objective test of everything we want students to know. We can, however, take steps to modify and decrease, albeit not eliminate, inconsistency among teachers in marking.

We have spoken earlier of the necessity to improve teachers' ability to assess students' work accurately and consistently, and of our belief that this professional education must begin early and continue through the teaching career. In order for that training and practice to be most efficient and effective, it is highly desirable that its content be determined by the learner outcomes and standards which teachers will be assessing students on. In order to offer this support, it will be essential to create resource materials and manuals keyed to the curriculum, to guide teachers both at the training and application stages. Such materials must give multiple examples of how the achievement of specific outcomes at various levels (or standards) can be consistently measured.

"There is no reason why we have to be assessed in the same way ... If I understand a mathematical principle and I can show you it one way, it's not really important that I show it to you in another way."(12)

Recommendation 47

We recommend that the Ministry of Education and Training begin immediately to develop resource materials that help teachers learn to assess student work accurately and consistently, on the specific learner outcomes upon which standardized assessment and reporting will be based.

One valuable resource has already been developed, but needs to be updated and refined: the Ontario Assessment Indicators Program (OAIP), referred to earlier, which contains assessment items and ideas for many grade levels and subjects.

The next step, we suggest, is the creation of a provincial report card, an Ontario Student Achievement Report (OSAR), based on the outcomes and standards expected in each grade and each subject. In addition to a global mark for each subject or interdisciplinary area (e.g., math or arts), students should be rated on a set of specific outcomes, derived from the common curriculum and provincial standards documents. In the first and second terms, the report should indicate the extent to which the student is (or is not) making good progress toward the achievement of each of the several outcomes related to the particular subject and, at the end of the school year, whether the student has or has not achieved that outcome at a satisfactory level.

In the term (and possibly the final) reports, the teacher should include practical and specific suggestions to students and parents about what progress is needed and how it can be achieved. The teacher who works at being a capable assessor of foundation skills will give parents the information they want: a clear indication of where their children stand as measured by provincial standards. In other words, we believe the accountability so many parents are asking for is based on clear standards, and on able teacher-assessors making unambiguous reports, the core of which (the key learner outcomes reflected in the report) will be the same for all teachers of the same grade or course. We also believe that teacher comments are a very important part of any report card, and should refer to significant, authentic demonstrations of knowledge and skills, or to indications of genuine difficulties.

We also suggest that, after Grade 9, when students follow different programs each semester or year, it is desirable to have the same kind of standard reporting format. We have recommended the development of learner outcomes for the courses that follow the common curriculum of Grades 1 to 9; once they exist, the OSAR is equally appropriate after Grade 9. Each subject teacher would indicate the extent to which the student is achieving the expected outcomes, give the student a global mark in the subject, and include helpful comments to the parent. In keeping with current practice, subject teachers' reports would be combined into a single report, possibly with comments from the home-room teacher or advisor-teacher who examined the student's progress across subjects. All of this could be greatly facilitated through the use of standard forms and computer programs developed centrally by the Ministry of Education and Training.

We do not want to remove the flexibility of teachers and schools in reporting to parents in a way that reflects local needs and preferences. We suggest that the Ministry prepare a common report card based on the expected outcomes in each grade within the common curriculum (and each course within the specialized curriculum) and that it provide an electronic copy to every board; boards could seek permission from the Ministry to make additions, but not deletions, and any substantial changes in content or format would require the approval of the Ministry. Of course, boards could add other documents, as long as the Ontario Student Achievement Report was the main vehicle of communication. There should be ample room for teacher comments as well as the check-offs on achievement levels. Translations should be provided by the Ministry for parents who do not read French or English, and a Braille version could also be developed.

The Ontario Student Achievement Report should be designed by a team of educators and assessment experts, with significant input from the community (through the Ontario Parent Council, for example) and, at least at the secondary level, from the three student federations or the Ontario Student Council (see Chapter 17). The OSAR should be field-tested initially and reviewed regularly to ensure that it meets the needs of teachers, parents, and students.

We are not suggesting that the OSAR for Grade 1 be the same as for Grade 7, even with differences in outcomes. We believe that professional educators, students, and parents are in the best position to decide how reports should be structured, given the differences from one age to another. The key criteria are clarity, a direct link to learner outcomes in the curriculum, and input from the users.

Recommendation 48

*Therefore, we recommend that the Ministry of Education and Training, in conjunction with professional educators, assessment experts, parents, students, and members of the general public, design a common report card appropriate for each grade. To be known as the Ontario Student Achievement Report, it would relate directly to the outcomes and standards of the given year or course and, in all years, would be used as the main vehicle for communicating, to parents and students, information about the students' achievements. While school boards would not be permitted to delete any part of the OSAR, they could seek permission from the Ministry to add to it.

We come now to the matter of setting a standard for communication, one that recognizes the importance of assessment and the right and need of parents to have information on their children's progress, if they are to support learning and the school.

We believe that, in each school year, all teachers should have a minimum of two conversations, in person or by phone, with the parents or guardians of each student for whom they carry prime responsibility.

These conversations (and we see two as a minimum), which are in addition to the formal conference at report-card time, should focus on student achievement, improvement, and concrete suggestions about what parents can do to support their children's learning. From kindergarten to Grades 5/6, this would include all the students in the "main" class, while students in a rotary system would be the responsibility of a home-room teacher or a teacher-advisor, as recommended in Chapters 8 and 9.

We suggest that the first conversation take place prior to the first report if, as often occurs, that is scheduled as late as December; beginning in Grade 7, the discussion would probably make reference to the development of a Cumulative Educational Plan (CEP). (See Chapter 8.)

We are convinced that the key to assessment for accountability to parents is teacher-based standardized assessment which indicates how much progress students make over a year toward the achievement of critical learning outcomes. We think that the government would be wise to invest the considerable monies necessary for good assessment where there is the biggest payoff for students: in extensive, high-quality teacher education for extensive, high-quality, standardized, classroom-based assessment.

The uses of information technology in improving student assessment

In our opinion, information technologies, and in particular microcomputers, can help implement educational practices in accordance with the principles of formative assessment. First, they enable data to be collected and analyzed coherently, and second, they help to improve teaching and student learning.(13)

We agree that the computer has an important place in individual student assessment, particularly in its potential for giving students quick feedback on how much and how well they have learned.

Eric Dempster, head of the Business Department at Sir Wilfrid Laurier Collegiate Institute in Scarborough, e-mailed a submission to the Royal Commission, giving an example of the way technology can be used in testing, in order to improve student learning. Mr. Dempster says he first used computers for assessment six years ago and allowed students, including those who would have failed but had never been given the opportunity to do better, to take tests more than once. Mr. Dempster averaged the test marks, which provided an incentive to do well the first time, but also showed students they could improve. "The overall result [was] that the poor students felt empowered and realized quickly that they could improve."

His present testing software randomly generates questions, prevents students from restarting a test, and includes graphics.

The students in Mr. Dempster's class are learning more than just the subjects he teaches: they are discovering that they can improve, and that self-assessment is an important part of the process. Many employers told us that, if they are to stay competitive, future workers will have to be experienced in self-assessment. And, because it involves the student guiding his or her own learning with the support of technology, self-assessment also has the potential to increase the teacher's role as coach and mentor.

Mr. Dempster's experiences have been replicated in classrooms where Computerized Adaptive Testing (CAT) is being used: the computer chooses a question on the basis of the answer to the previous one.(14) A correct response results in a harder question, while an incorrect one elicits an easier question. This quickly clarifies the level at which a student is working, and uses few questions to do so; it also pinpoints for students the areas in which they need more help and/or more practice, and makes them responsible for their own progress.
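
The selection rule described above (a harder question after a correct answer, an easier one after an incorrect answer) can be illustrated with a minimal sketch. It is not the software used in any of the classrooms mentioned; it is a simple "staircase" rule, whereas operational adaptive tests generally rely on more elaborate statistical models. The names run_adaptive_test, answer_fn, and simulated_student, the five difficulty levels, and the six-question length are all assumptions made for illustration.

```python
def run_adaptive_test(answer_fn, num_levels=5, start_level=3, num_questions=6):
    """Staircase sketch of adaptive testing (illustrative assumption only).

    answer_fn(level) is a hypothetical callback: it presents one question at
    the given difficulty level and returns True if the answer was correct.
    """
    level = start_level
    history = []  # (difficulty level asked, whether the answer was correct)
    for _ in range(num_questions):
        correct = answer_fn(level)
        history.append((level, correct))
        if correct:
            # Correct answer: the next question is harder, up to the top level.
            level = min(num_levels, level + 1)
        else:
            # Incorrect answer: the next question is easier, down to level 1.
            level = max(1, level - 1)
    return level, history


def simulated_student(ability):
    """Hypothetical student who answers correctly whenever level <= ability."""
    return lambda level: level <= ability


if __name__ == "__main__":
    next_level, history = run_adaptive_test(simulated_student(4))
    for asked, correct in history:
        print(f"level {asked}: {'correct' if correct else 'incorrect'}")
```

In this simulation the sequence of questions settles quickly into alternating between the highest level the student can answer and the level just above it, which is how a small number of questions can indicate the level at which a student is working.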

Immediate feedback can be used to motivate students who might otherwise have very little interest in school. This was one finding of a pilot project in New York City(15) that involved a group of inner-city students considered most at risk of dropping out. They visited the computer lab once a week and took computer-generated "adaptive" math tests. The computer provided students and the teacher with immediate feedback, "rewarded" students who reached 100 percent in each topic with a graphic of a hamburger, and generated practice sheets for the rest of the week.

Contrary to common expectations of them, many at-risk students in the experimental group sought to do well in the computer tests. Sometimes they argued with the teacher that a response marked by the computer as incorrect was, in fact, right, thus indicating that the assessment mattered to them. An unexpected result of the pilot project was student-generated competition for the hamburger. Over time, the students did better in math, as the result of the "friendly competition," the immediate feedback, and the work of the classroom teacher; moreover, they were less often found to be "off task," doing something other than the work at hand.

It is also interesting to note that, contrary to other research findings, the female students were more comfortable with the computer than were the males.

For some time, technology has been used in assessment, to collect and sometimes analyze achievement data. Teachers are already keeping track of how well students do in assignments and tests, and there is software that enables teachers to graph or otherwise display and analyze the data.

We are certain that, with more and better data, teachers will be in a better position to decide on the best types of programs and interventions for their students. Better information and new ways of displaying it will mean improved reporting to parents. As well, computer-based assessment and diagnosis will reduce marking time for teachers, eliminate errors in marking, and offer opportunities for different test formats and for tests in other languages.(16)

However, good assessment software (of which there is an inadequate supply) should do more, moving students from simply accumulating facts to organizing, analyzing, and transforming data. It should measure the quality, rather than simply the quantity, of the student's understanding. And it should be capable of making assessments using portfolios and "real-life" performances based on provincially set standards, with fewer multiple-choice (sometimes called "multiple-guess") tests to compare one student with others in the class, school, or province. Software that requires students to solve problems, that includes high-quality three-dimensional graphics, and that requires students to present their answers and solutions in a variety of formats, will challenge students to show they understand rather than just remember.

There is a long way to go before Mr. Dempster's on-line assessment is the norm in Ontario's schools. Change of this nature requires professional development, adequate hardware, and the right kinds of software, screened for bias. (And, as we make clear in the next section, equal access to computers is a necessary element in eliminating assessment bias.)

We believe that the potential of information technology to improve assessment is substantial, and suggest that information technology play a prominent role in teacher development in assessment, and that the Ministry of Education and Training, in making high-quality software available to Ontario schools, place emphasis on the potential that software offers for improving assessment.

Avoiding bias in assessment: Respecting differences, recognizing diversity

The notion that a student, because of colour, race, or handicap might be streamed to an educational program which is not consistent with the attributes and abilities of that individual is unacceptable.(17)

We have discussed the importance of frequent and accurate assessment of student learning and literacies, and recognized the link between timely feedback and effective student learning, as well as the need to report to parents and the larger public. However, the Commission is very aware that assessment, when not carried out well, can have serious negative repercussions on individuals and on groups of students. The challenge to be effective, helpful, and fair means ensuring that assessment is done well, not that it is avoided.

Assessment must be as bias-free as possible, so that gender, social class, race, culture, and disability are not treated as negative factors. The results of assessment, even of routine classroom assessment, are likely to have an important effect on the confidence and motivation of students, which, in turn, affects performance. Assessment may also have an impact on the student's academic career, and has the potential to cause life-long damage to the person who is assessed below his or her real ability and streamed into lower groups (the "lambs" rather than the "lions" reading group), special education classes or non-university high school streams.

A growing number of parents and educators are raising questions about the over-representation of minority students in special education, vocational, and basic-level programs. The essential concern focuses upon the perceived use of inappropriate testing materials, assessment practices, placement strategies, and restrictive learning opportunities in some jurisdictions.(18)

Many groups are concerned about bias.(19) Various forms of assessment have shown that those who are poor, who are members of some minority groups, or who are female perform less well than their knowledge or skills would warrant. Some communities complain that their students have been negatively streamed because of biased assessments. For example, more than a decade ago, a York University symposium on racial and ethnic relations in city school boards was told by Marcela Duran that

we were able to institute an experimental program, in co-operation with the Jamaican-Canadian Association, in which 100 West Indian children who had been placed in vocational schools were re-assessed, using different testing instruments. According to this process, 90 of these students were found to have been wrongly placed.(20)

We agree that there is ample evidence that students from some groups are more likely to be placed in lower "ability" classes and streams than others,(21) and that assessment methods may figure in those decisions. But we are convinced that improvement depends on more than just modifying assessment procedures: changes are needed in curriculum, teaching methods, and other areas (including, as we make clear elsewhere in this report, a fundamental reduction in streaming).

Given the importance of assessment, it must not only avoid bias on the basis of gender, social class, or cultural background, it must reflect diverse skills and knowledge, valuing what students know and can do, even if they express it unconventionally or do it in different ways.

In Ontario, as in other Canadian jurisdictions, in the United States and in England, a great deal of attention has been paid to the way assessment bias affects minorities and immigrants. This is because some minorities and immigrant groups, as well as students from poor families or communities, are over-represented in special education classes and non-university streams.(22)

Test bias exists in many different contexts: for example, despite our support for computer-based assessment, we recognize that bias can be found in, and perhaps even made worse by, the use of information technology. We know that students from different socio-economic backgrounds have different levels of access to computers and, therefore, that some will be more at ease than others and that comfort levels undoubtedly affect results.

Four potential causes of bias have been identified in assessing students who are members of ethnic or racial minorities or who are immigrants: bias in the test's content and form; in the way the test is given; as a result of factors in the student's environment, in or outside school; and in the ways results are interpreted and reported.(23) Many of these are related to the inadequacy of teacher education in assessment, and lead to inappropriate student placements.

Educators must also be careful, when assessing students of ethnic/racial minority backgrounds for placement in special education programs, to ensure that due consideration has been given to linguistic and/or cultural factors that can preclude fair and accurate assessment.(24)

Assessments of many second-language students do not adequately differentiate between language-related difficulties and the actual level of knowledge or skill the students possess. The person who thoroughly understands all the material at hand will not be able to answer even the simplest question, if he or she does not comprehend the language in which it is being asked. There is the related problem of confusing linguistic deficits with deficits in ability. Students who have emigrated to Ontario may need time to learn the language, but that does not necessarily mean they need remedial or special education.

There is also the issue of measuring students in terms of what they have learned or are capable of learning, in contrast to assessments that have more to do with the learning environment than with any inherent characteristic of the learner.(25) Is the "learning-disabled" student genuinely disabled, or is the problem a lack of instruction in reading, in disguise?

Before decisions are made to place students in special education classes or in non-university streams, there should be evidence that they cannot achieve progress through a change in curricular material or assignment to a different teacher, and that modified regular-classroom teaching strategies that are being used successfully with other youngsters from a variety of ethnic, linguistic, and socio-economic groups are not working.

Stereotypes develop as we attempt to organize people into categories and to make sense of our world. That in itself is not the problem. However we are in real trouble when these categories are so closed that they prevent us from seeing people's full potential.(26)

There is also evidence that, on multiple-choice tests, girls and women do not do as well as boys and men. According to a joint study by the College Board and the Educational Testing Service in the United States,(27) "the gender gap is substantially larger for multiple-choice items than for other types of questions." The study found that the gender gap narrowed or disappeared when students had to write their answers, as in essays or word problems. The study concludes that a mix of assessment instruments is necessary to ensure equity in high-stakes standardized testing.

Another form of gender bias is found in tests that include questions or examples related to activities more frequently of interest to males than to females - certain sports, for example. Obviously, assessment tools must treat male and female students equally, and must meet the needs of our diverse school populations.(28)

In trying to remove bias from tests, efforts have tended to focus more on the material than on training teachers to construct bias-free tests or to use fair testing techniques. This is baffling, given that most forms of assessment - tests, assignments, projects, oral discussions, etc. - are part of the daily interaction between the teacher and students. Clearly, more attention must be paid to teacher education and to on-going professional development.

More frequent and more varied classroom assessment is another way of minimizing bias, but it presupposes that the teacher is familiar with a variety of techniques. When testing or examining students, giving them a choice in the way a question is answered also helps.

A fair assessment also takes the individual student's environment into account. For example, assessing for placement purposes may be inappropriate for a recent refugee or for a student who has just moved from French immersion to an English-language program. Assessment in the student's first language has been shown to isolate problems related to acquiring a second language, rather than to gaps in knowledge or skill, and it should be used where suitable and possible.

Teachers must have a sense of whether or not students and parents believe that an assessment is fair; if they see it as unfair, there is, at the very least, a problem of communication and there may also be one of equity. When it is impossible to test a student in a first language or to delay assessment of a refugee student, it is vital that the student not suffer as the result of our lack of resources or time. That means, for example, not placing the refugee student with younger children when a test might reveal that what is needed is a specially planned program with specific kinds of support.

Bias in assessment will become increasingly important as Ontario participates more regularly in assessments that encompass other provinces and other nations. This is particularly true in a province that is geographically and socially diverse, and that will become even more culturally and linguistically varied. Fair assessment is vital if the system is to more fully reflect the needs of all students.

As a tool for tracking students into different courses, levels, and kinds of instructional programs, testing has been a primary means of limiting or expanding students' life choices and their avenues for demonstrating competence ... [T]he goals ... of assessment are being transformed from deciding who will be permitted to become well-educated to helping ensure that everyone will learn successfully.(29)

In our view, the Ministry must take the lead role in ensuring that its own assessment instruments treat all students equitably and that the materials used in schools are appropriate and fair. It can do this by evaluating the substance and procedures used in assessment and by monitoring the placement of various groups by stream (or track). The Ministry's new anti-racism, equity, and access division can lead the effort to ensure fairness in assessment. It should also be responsible for monitoring implementation of recommendations made by the Consultative Committee on Assessment and Program Placement of Minority Students for Educational Equity.(30)

Recommendation 49

*We recommend that the Ministry monitor its own assessment instruments for possible bias, and work with boards and professional bodies to monitor other assessment instruments; that teachers be offered more knowledge and training in detecting and eradicating bias in all aspects of assessment; and that the Ministry monitor the effects of assessment on various groups.

Large-scale assessment of student achievement and the effectiveness of school programs

Large-scale assessment of student achievement

Having said that assessments should be based on agreed-on standards, and that teachers should be trained to use them skillfully and fairly and to communicate their results clearly, we turn now to the matter of external tests, given simultaneously to all students in a grade or course. Some people believe that these are a more objective and therefore fairer and more accurate measure of what students have learned. We believe that some system-wide testing should be built in, as a check on student learning at a few critical transition points, and as a vehicle for assuring people that, at those points, all students are being assessed according to the same yardstick.

However, it is important to emphasize that large-scale testing has limitations; otherwise, people reach what we are convinced is the mistaken conclusion that these few tests are the most important in the student's school career, or that many such tests would be ideal. In our opinion, large-scale testing is unlikely to be a more fair and accurate representation of student learning than the best judgment of the well-trained teacher-assessor. Moreover, such testing is easily misused. The following are the three basic problems of using large-scale testing as the major form of student assessment.

First, any external testing is, of necessity, much briefer than classroom-based assessment: a single test cannot reflect everything students are expected to learn over a year. For example, to get a true reading of what a Grade 6 student has learned in math, a number of tests would be necessary, each quite lengthy, to overcome such irrelevancies as the student's level of well-being (hours of sleep, nutrition) that day, or the use of an unfamiliar word in a problem (which might lead to the erroneous conclusion that the student didn't understand the question or the mathematical operation), etc. We are urging that the major source of data on student achievement be that which is collected by the classroom teacher over the year precisely because that is what offers the greatest potential for reflecting, cumulatively and in summary, what has been learned. A simple achievement test, such as the Canadian Test of Basic Skills, or others of that kind, is not designed to reflect what children know in any depth. Its purpose is to arrange students along a continuum, from those who know most to those who know least, in order to make placement decisions. Such tests are not measures of how well teaching and learning have occurred.

Let's say, for example ... that you get a certain score on a standardized test. Can I assume then that you understand something? You might say, "Sure, because those tests test for understanding." But ... research indicates that most students in most schools ... do not really understand ... When you ask students who get very high grades ... to explain a physical phenomenon, not only can they not explain it but they actually give the same sort of explanations that four- and five-year-olds give ... We can only really determine whether a student understands something when we give the student something new, and they can draw upon what they have learned to help answer a question, illuminate a problem, or explain a phenomenon to someone else.(31)

Testing is no panacea for an education system under stress. After all, a mechanic can inspect a car without making the necessary repairs. The long-term educational improvement lies with a comprehensive restructuring of the enterprise, not in resorting to the proverbial "quick fix" of a standardized test. The public needs to be informed about the growing array of assessment tools, but also about how they should be interpreted to improve student, school, and system-wide performance in education. For that reason, testing is only one part of a more comprehensive education restructuring package.(32)

Second, because of their necessary brevity and because thousands of tests must be marked quickly, external tests usually tend toward short-answer and multiple-choice questions, with all their severe limitations on measuring understanding and learning skills. They are the classic case of measuring what is easiest to measure, not what is most important. We are not suggesting that such tests can't measure certain important abilities we expect all students to have, only that they cannot and do not measure all, or any representative sample, of them. They are biased toward certain kinds of learning, and there is ample evidence that such bias distorts the curriculum in ways that are unhealthy in an educational system that is serious about learning.(33)

Third, any single test used for large-scale assessment and reporting assumes a distorted importance, and can - and often does - have long-term, frequently negative consequences for students and for the learning system, because of the inappropriate ways the information is used. Tests meant to measure whether most children have learned the year's material should not be used to make decisions about students' capacity for learning, or their long-term ability to succeed in school or in the regular program. The problem is that, typically, test scores end up being put to such inappropriate uses. Placement decisions should not be made on the basis of any single test given on a single day in a student's year; however, that is precisely how they are frequently used.

As early as the late 1970s, evidence began to accumulate showing that high-stakes standardized testing policies were highly corruptible, creating greater incentives for cheating than for actually improving instruction, and that the use of standardized tests for accountability had actually narrowed curricula and driven instruction increasingly towards pedagogies based on memorization and basic skills rather than improving educational quality.(34)

The 1993-94 Ontario Grade 9 testing for language and literacy (with a similar test being given in 1994-95) can be used as an illustration of these points. It is, in fact, a very good test: first, it took place over more than six hours, spread over a two-week interval, thus giving students an opportunity to demonstrate their knowledge and understanding in a way that would be impossible in a typical one-hour "test of basic skills" or the like. Second, the test did not just ask short-answer questions, but was a genuine assessment of performance.

Nonetheless, by itself, the test would tell us less about what students learned about reading and writing in nine (or fewer) years of schooling than would teacher reports based on clear and consistent standards. Moreover, it did not differentiate among students schooled in Ontario for one, two, or nine years prior to the testing. But it did give us valuable data on how well Ontario's Grade 9 students understand what they read and whether they can write clearly, expressively, and to the point. We do not know yet whether the test will lead to improved teaching and learning, but it was a much better accountability mechanism than most tests - and, of course, at about two million dollars to administer each year, much more expensive. (As we have already pointed out, however, good assessment is very expensive.)

We applaud the Ministry's attempt at large-scale testing in order to measure learning authentically. Despite the test's strengths, however, its ability to withstand inappropriate or damaging misuse is much more problematic. The Minister made it clear to educators that the test was to count for 20 percent of the course mark, but was not to be used for making major decisions about student achievement. It was not to affect whether the grade was passed or failed, or whether the students were to attend summer school or be placed in different programs or "streams" in Grade 10. Nonetheless, informally and unofficially, there are indications that, in some instances, it has been used in exactly those ways.

Whether these reports are accurate, and irrespective of the number of cases to which they might apply, we see such uses as the natural outcome of large-scale external testing. It becomes "high stakes" testing, even when it is not intended to be.

While we want to be very clear about our lack of enthusiasm for extensive, expensive, universal testing, as opposed to sample-based assessment, we recognize the public's need for some measure of basic student achievement that is applied in the same way to every student at a few points in time. That is why we are recommending two province-wide assessments to be given to all students relatively early in their schooling, with the understanding that educators (most especially school principals) will make it clear that the results of such assessment are to be used by teachers, individually and collectively, for purposes of diagnosing and remediating the individual student's difficulties or gaps in learning. In addition, the tests are to be used to enhance reporting to parents and to examine the content and delivery of curriculum. Test results are, most emphatically, not to be used to place or sort students for any reason. They will serve as a central check on how effectively the curriculum is serving the learning needs of the students, and can be an aid in revising or refining curriculum content or teaching strategies.

We are also recommending that a test, to be given much later in a student's school career, make the secondary school diploma a literacy guarantee.

Assessment for early acquisition of literacy and numeracy: getting it right from the start

We have built a learning system on a strong, early foundation. (See Chapter 7.) We have urged that all children be helped to become literate and numerate by the end of Grade 3. By that time, we expect that almost all children should be able to read and understand materials appropriate to their age, and to write on an assigned topic, or a topic of their choice, showing reasonable understanding of conventional rules of grammar, spelling, and punctuation, as well as an ability to bring organization and a "voice" to their writing. As well, we expect them to be able to use the four arithmetic operations, and to understand when to apply them. We see the value of a check on the success of the system in delivering a program that brings all or nearly all children to a point, by about age 9, that enables them to build on dependable foundation skills so that they can acquire more sophisticated knowledge and understanding. We think that parents will also welcome conversations with their child's teacher that include the results of this universal assessment, and a discussion of the child's future progress.

Recommendation 50

*Therefore we recommend that all students be given two uniform assessments at the end of Grade 3, one in literacy and one in numeracy, based on specific learner outcomes and standards that are well known to teachers, parents, and to students themselves.

And, in order that these tests have high credibility in the eyes of the public:

Recommendation 51

*We recommend that their construction, administration, scoring, and reporting be the responsibility of a small agency independent of the Ministry of Education and Training, and operating at a very senior level, to be called the Office of Learning Assessment and Accountability.

This agency will consult with provincial leaders in literacy and numeracy education who can provide leadership in creating assessment instruments that are as valid and reliable, as authentic and comprehensive, as possible. We recognize that principals and teachers will need support and assistance in interpreting and reporting the information gained from these instruments, and would expect both the agency (through the written material it prepares) and the Ministry to act as sources of expertise for school boards.

The results of these tests should be reported promptly and in clear language to parents individually, to every teacher whose students have been tested, to the local community at the school level, and to the general public at the board and provincial levels.

Assessment for graduation: the diploma as a literacy guarantee

The value of assessment at an early stage, such as the end of Grade 3, is that it gives a clear indication of a child's strengths and weaknesses, and shows where school and home efforts must be focused and monitored. There is also value of a different kind in assessment for accountability near the end of the student's secondary schooling: as a fundamental guarantee, the education system must assure the public that a high school diploma signals adult literacy; that no high school graduate is incapable of reading and writing well enough to communicate in a post-secondary classroom, on the job, or in order to meet the demands of everyday life as a citizen and voter.

Recommendation 52

*We recommend that a literacy test be given to students, which they must pass before receiving their secondary school diploma.

The test would be given in Grade 11, the year before graduation. Students who did not pass the first time would be able to retake the test until they did, but graduation would be dependent on passing.

Some students who took the test the first time might find that they needed help in order to pass, and they would have an opportunity to find that help, and prepare again for the exam. The test would be inappropriate for some students in specially modified programs (such as those in schools for the severely developmentally handicapped) that do not now generally lead to a diploma. However, we believe that it is reasonable to award a diploma only to those who pass the literacy test.

We propose that other large-scale assessments be applied, not to individual students, but to representative samples of students. These would be used to judge how well the curriculum was being learned, as now occurs in the case of provincial, national, and international assessments in mathematics, science, and other subjects.

The effectiveness of school programs: program and examination review

As we have seen, individual students are assessed by their teachers, with the addition of occasional large-scale assessments, and students' progress and achievement must be reported very regularly to parents.

Furthermore, those who are responsible for the overall quality of the system - the provincial government and local boards - must not only ensure that individual students are progressing, but that the curriculum is being delivered effectively and that, on the whole, students in each grade and subject are learning what they are expected to learn.

This is system-level monitoring of achievement. It does not involve testing or assessing every student or every classroom but depends on monitoring student achievement and teacher practices by testing representative samples drawn from across the province; in addition, these samples must be of sufficient size to provide reliable data at the individual school board level.

In Ontario, two processes are used to accomplish those goals and both are extremely sound approaches to system monitoring. The first of these is the process known as the provincial reviews of curriculum, and the second is the examination review process at the senior level, known as the OAC/TIP program. Both have applications well beyond their present restricted use and reporting. At present, both suffer because they are applied sporadically, rather than systematically, across the curriculum, and because the results are under-reported.

Provincial reviews of curriculum

From time to time, provincial reviews of a variety of elementary and secondary courses are undertaken. In each case, the review includes testing of a representative sample of students on the content of the course (for example, Grade 6 reading or senior-level geography), as well as an inspection of curriculum materials, interviews with teachers and students, and other information that helps describe what is taught and learned.

As a result of a provincial review, the Ministry and all school boards have concrete information about the parts of the reading or geography curriculum that are being successfully delivered to students and the parts that are not, based on student performance. As well, they can identify the kinds of resource materials that may be lacking, and the areas in which further teacher education should be offered. These reviews are useful, for both large-scale assessment purposes and for teacher and curriculum development. But they are scheduled sporadically and unpredictably and are publicly under-reported. Moreover, because clear and consensual standards are not established in advance, the results of such assessments are sometimes questioned.

In order to build a good program for educators and make it an effective monitoring mechanism as well, the Ministry of Education and Training should commit to a regular review cycle in all subjects that are part of the common curriculum, with more frequent review in the foundation areas. Subjects should be reviewed at points within the common and specialized curriculum; for example, a history or a geography review might occur every five years and include Grades 6, 9 and 10/11.

Some school boards have used the provincial review to include all students, with no individual identification attached to the test. We applaud this concern for accountability at the local level, and consider it very appropriate because it does not confuse individual scores with evaluating the performance of the staff and students of an institution.

There are, of course, serious concerns about invidious comparisons that ignore many factors over which the individual school has no control. However, the provincial review data have been, and should continue to be, used by schools and school boards to improve teaching and learning at the local level. We believe that review results should be shared with the professional staff and school governance committees of schools that participate, as well, of course, as school board administrators responsible for monitoring and supporting schools. That, after all, is the level at which the data are useful for making improvements to a school. (See the following section for a more extended discussion of this issue.)

The provincial curriculum reviews have also involved teachers as markers, a process exactly like that we described earlier as the ideal professional training for classroom assessment. Working in groups, with the support of experienced markers, teachers reach agreement on what makes one paragraph or paper more or less satisfactory than another, and they establish criteria for judging performance consistently. Thus, the teacher development "spin-off" of the monitoring process is, itself, an investment in better assessment in the classroom.

The examination monitoring process

In the 1980s the Ministry of Education began monitoring examinations used in the Ontario Academic Courses (OACs). This process, which is officially called the OAC/TIP (for "teacher in-service program"), was designed to ensure consistency in the quality and coverage of the exam and the marking standards set by each teacher in every course that helps to qualify students for university. The process involves collecting and scrutinizing examinations teachers set and the marks they award to the students' examination papers. All publicly supported secondary schools, as well as inspected private schools that offer university-preparatory courses in the final year (OAC), must participate in this examination review process. At this point, the process, which has been virtually invisible and unreported publicly, has not been extended to any other courses.

After surveying practices under the OAC/TIP, the Ministry of Education and Training develops a handbook on designing and marking examinations in a particular subject area. Teacher in-service programs inform teachers about the contents of the handbooks, and schools submit copies of their final examinations and scoring keys, as well as a range of test papers representing high, average, and low scores.

An analysis of the examinations and their consistency with expected standards enables the Ministry to judge the impact of standards; schools that vary from them are required to take corrective action and report to the Ministry on the steps they are taking.

University teachers are also part of this process, although their participation has tended to be based on individual expertise, rather than encompassing any responsibility to represent and report to the larger university community. We suggest that, in future, universities and colleges see their role in the process as an opportunity to present their needs and requirements as part of the formation of standards, rather than remaining outside of that conversation.

We further suggest that professors and instructors who teach undergraduates in a discipline, rather than those at the professional (faculty of education) level, take part in the process. People who will be teaching English, geography, or other courses to first-year university and college students are better placed to participate in decisions about acceptable levels of performance in Grade 12, and to work with secondary educators to help students make the transition from high school to college or university.

To date, the OAC examination review has been conducted in several subject areas (English language and literature, visual arts, calculus, economics, accounting, physics, chemistry, and Français) and is currently scheduled to add one subject per year through 1996. While it is expected that schools or teachers will take action when a review indicates that there are areas that require attention, implementation has not been systematically monitored, and results have not been publicly reported.

This process, like the provincial curriculum review, is especially worthwhile because it involves many teachers in the marking exercise, and, thereby, expands their professional capacity for assessment. Teachers must become more skilled at making professional judgments on the quality of responses to questions that are not simple, multiple-choice or otherwise close-ended. Building this kind of skill and expertise educates teachers in consistent assessment of high-level learning.

The OAC/TIP examination process has all the elements of good assessment and teacher development, but needs better quality control, much more public visibility, and very considerable expansion. As a monitoring program, it can help ensure that a teacher's application of assessment standards is accurate and consistent; this will give increased credibility to a system that depends fundamentally (as any school system must, and any honest school system will readily admit) on teacher education and expertise.

The examination review process, in combination with provincial reviews, gives a reasonably complete picture of what is being learned, and how fairly and consistently that is being assessed. It can and should be taken to the next step, implementing changes in programs, teacher training, and marking procedures, based on what is learned. Furthermore, implementation should be monitored.

The examination review procedure should be expanded to include the full range of Grade 12 courses. Because the process has significant potential for helping to achieve consistency, and because we believe the process should be transparent, it should be extended, and all results should be reported to the public.

Without doubt, considerably expanding program and examination reviews will involve educators in Ontario in more program evaluation than they are accustomed to doing, and will necessitate diverting more funds to assessment. We believe that such efforts and investments are essential; we are convinced that they will be supported by the public, as long as they are carefully designed and implemented, and as long as results are clearly, promptly, and publicly communicated. We see curriculum and examination reviews (what have been called program reviews and the OAC/TIP model of examination review) as an important and ongoing responsibility of the Ministry, in the development of curriculum outcomes, standards, and assessment measures or strategies; and the administration, scoring, and reporting of results.

We envision a cyclic large-scale and province-wide assessment program that:

  • identifies the one or two areas (skill, subject, cross-curricular) to be assessed for each of the next three years, with a commitment to extend this schedule by announcing another program each year;
  • is centred on established outcomes and standards for assessment that will form the basis for judgments about students' levels of attainment, to be shared with educators and the public for discussion;
  • is based on a statistically reliable sample at the provincial level;
  • will be planned and conducted by teachers and experts in assessment, working together;
  • requires each board to participate in a board-wide assessment, so that the content and process are consistent throughout the province, and the results comparable from one jurisdiction to another.

Recommendations 53, 54, 55

We recommend that:

*the Ministry continue to be involved in and to support national and international assessments, and work to improve their calibre;

*the Ministry develop detailed, multi-year plans for large-scale assessments (program reviews, examination monitoring), which establish the data to be collected and the way implementation will be monitored, and report the results publicly, and provide for the interpretation and use of results to educators and to the public;

*initially, and for a five- to seven-year period, until the process is well-established in the school system and in the public consciousness, an independent accountability agency be charged with implementing and reporting the Grades 3 and 11 universal student assessments. The reports and recommendations of the Office of Learning Assessment and Accountability would go directly to the Minister, the College of Teachers, and the public.

The other responsibilities of the Office of Learning Assessment and Accountability are detailed in Chapter 19.

Reporting the results of large-scale assessments

While large-scale assessments are complex and expensive, the results they produce, and the wealth of information they contain, must be reported in ways that can be easily understood without being trivialized. The results achieved by Ontario students in international and national assessments have raised public awareness and concern, particularly because they identified some areas that need concerted attention. As we have pointed out, however, the results have sometimes been used - and misused - to rank Ontario in terms of other jurisdictions, but without thoughtful consideration and interpretation of the studies themselves. While not a simple task (it is a major challenge for the future), reporting results understandably and usefully is vital. This is an area in which the media also have serious responsibilities, to inform, not thoughtlessly arouse, the public.

Although the provincial government's main interest is in the overall state of education in Ontario, information about large-scale assessments is more useful to parents and educators when it is available for their particular school and school system; educators are concerned that any potential usefulness is offset by the possible misuse of the information.

Their concerns are not unique: there have been vigorous debates in other jurisdictions, especially where school results are reported as rankings or "league tables," and have been used as simple indicators of the relative quality of schools. Even a cursory look shows that these kinds of comparisons are totally inappropriate and ignore such crucial influences on student achievement as socio-economic family status, parental literacy, facility in the language of use, etc. Merely ranking schools may identify the area in which the most privileged students live, but it does not indicate the degree to which any school has helped its students develop. The fact that a school is apparently successful may be the result of non-school factors, just as the schools in which achievements seem modest may, in fact, be serving students who enter with low performance levels and improve greatly.

The issue of the value added by schools has become very heated, engendering both political and technical problems. Particularly in Britain, where the process has been in place for a while, teachers rightly point out that achievement results are inadequate measures of a school's contribution to student learning, and some have even refused to participate in the national testing program.

The British experience shows clearly that when the purpose of the study is to establish the effectiveness of the school, it must include information about contextual conditions, such as the readiness of students to learn, the nature of instruction, and the resources available. A statistician who has considered this problem in Britain says that:

[It] is not technically possible with any reasonable certainty to give an unequivocal ranking of schools ... it is important to avoid the trap of supposing that the provision of some information about schools is better than no information. The problem is that such information will be biased and misleading.(35)

The overall complexity of adjusting scores, and the overly simplistic approach of publishing raw scores, bring into question the usefulness of ranking schools. Britain's National Commission on Education concluded that a single statistic was not an adequate summary of a school's effect on the progress of students.

This is not intended to suggest that information should not be provided about how schools are doing. But it does highlight the problems of making valid school comparisons on the basis of simple scores and the importance of schools and school boards giving results that include comprehensive information about themselves.

The most appropriate and constructive use of school results for comparative purposes is to look at results in the same school over time. Barring very major changes in neighbourhood demographics (which usually occur only over a number of decades), the population of a given school is more comparable to itself over time than to that of another school:

For example, comparing a student assessment in 1997 with the results of the same assessment at the same school in 1995 offers teachers and the principal an important indicator of progress and quality. When such comparisons are anticipated and planned for, staff have a real incentive to develop targeted school improvement plans, and to compare the next set of results to those plans. Making schools accountable for improving, as opposed to making them accountable for factors beyond their control, gives the promise of really adding value and quality to existing school practices.

To assess value added - and to gain valid insights into whether your schools are effective - you have to compare tests or other results over a period of time, with the same group of students.(36)

Another difficulty related to reporting is that of obtaining results of large-scale assessments broken down according to such sub-groups as gender, ethnicity, socio-economic status, and geographic region. Although this kind of analysis is technically possible if the information is available, detailed demographic data on students is not collected by most school boards. As well, as in the case of reporting results for individual schools, it would be almost impossible to explain differences that might be found among the population groups, unless a great deal of contextual information was added. Without these breakdowns of results, however, educators cannot fulfill their responsibility to monitor equity of outcomes.

Policy makers must accept responsibility for actively communicating with the public about large-scale assessment results, and must work with technical specialists who know the study and can help them interpret the results accurately to the public in many forms and forums. The major challenge is to provide as much information as possible, accurately and succinctly, without oversimplifying the message.

Large-scale assessment rarely provides unequivocal answers, but it does create a context within which different interests - policy makers, professional educators, and parents, among others - can find a basis for informed dialogue. It can provide the foundation for debates about public policy, and identify the general direction for making changes in emphasis or focus. More than anything, policy makers must create a range of action plans for responding directly to the results of the assessments.

We urge that school boards and schools be provided with direction and training (initially by the independent accountability agency) to ensure they are able to report results of provincially directed assessments accurately and clearly to their respective communities, and that, when they wish to do their own assessments, they be helped to do so, using high-quality tools.

Recommendation 56

*We recommend that the Ministry of Education and Training, in consultation with community members and researchers, develop a specific procedure for collecting and reporting province-wide data on student achievement (marks, and Grade 3 and Grade 11 literacy test results) for groups identified according to gender, race, ethno-cultural background, and socio-economic status.

Conclusion

Because they represent the visible products of schools, student assessments and program reviews are key elements in the process of education reform. The Commissioners are very conscious of the impact our recommendations will have on curricula, instruction, teachers, administrators, and, most of all, students. As the focus of education moves towards raising the levels of literacies for all our students, we can no longer rely on simply sorting and comparing students. The Commission is saying that, instead, we want clear descriptions of whether students are achieving the complex learning outcomes they will need if they are to succeed in the 21st century.

  

__________
Endnotes (Chapter 11)

  1. L. Darling-Hammond, "Performance-Based Assessment and Educational Equity," Harvard Educational Review 64, no. 1 (1994): 5-30.
  2. G. Wiggins, "Standards, Not Standardization: Evoking Quality Student Work," Educational Leadership 48, no. 5 (1991): 18-25.
  3. R.L. Linn, E.L. Baker, and S.B. Dunbar, "Complex, Performance-Based Assessment: Expectations and Validation Criteria," Educational Researcher 20, no. 8 (1991): 15-21.
  4. H. Russell, C. Wolfe, and R. Traub, "Interface: Some Cold Facts on a Hot Argument," E+M Newsletter, OISE, no. 27 (1977).
  5. Philip Nagy, "National and International Comparisons of Student Achievement: Implications for Ontario." Report written for the Ontario Royal Commission on Learning, 1994.
  6. For example, a study in the United States of both standardized science and math texts - and the tests included with the textbook series - found that they contain almost entirely (close to 95 percent) items which test memorization and quick recall, and omit, almost entirely, items which test the higher-order functions involved in genuine problem-solving. See C. Holden, "Study Flunks Science and Math Tests," Science 258 (October 1992): 541.
  7. J.R. Kirby and R.A. Woodhouse, "Measuring and Predicting Depth of Processing in Learning," Alberta Journal of Educational Research 40, no. 2 (1994): 148.
  8. W. Haney, "We Must Take Care: Fitting Assessments to Function," in Expanding Student Assessment, ed. V. Perrone (Alexandria, VA: Association for Supervision and Curriculum Development, 1991), p. 142-66.
  9. Howard Gardner, quoted in E.D. Steinberger, "Howard Gardner on Learning for Understanding," School Administrator 51, no. 1 (1994): 28.
  10. Kirby and Woodhouse, "Measuring and Predicting Depth of Processing."
  11. G. Wiggins, "None of the Above," Executive Educator 16, no. 7 (1994): 16-17.
  12. Howard Gardner, quoted in Steinberger, "Howard Gardner," p. 28.
  13. Clement Dassa, Jesus Vazquez-Abad, and Djavid Ajar, "Formative Assessment in a Classroom Setting: From Practice to Computer Innovation," Alberta Journal of Educational Research 39, no. 1 (1993): 118.
  14. Vicki Hancock and Frank Betts, "From the Lagging to the Leading Edge," Educational Leadership 51, no. 7 (1994): 26.
  15. Barbara R. Signer, "CAI and At-Risk Minority Urban High School Students," Journal of Research on Computing in Education 24, no. 2 (1991): 189-203.
  16. Lauren H. Sandals, "An Overview of the Uses of Computer-Based Assessment and Diagnosis," Canadian Journal of Educational Communication 21, no. 1 (1992): 71. This article lists a variety of other benefits.
  17. Raymond T. Chodzinski, "Teacher Strategies for Non-Biased Student Evaluation and Program Delivery: A Multicultural Perspective," Canadian Modern Language Review 45, no. 1 (1988): 65-75.
  18. Ontario, Ministry of Education, Consultative Committee on Assessment and Program Placement of Minority Students for Educational Equity, Equal Educational Opportunity: Student Assessment and Placement (Toronto, 1987), p. 2.
  19. Child, Youth and Family Policy Research Centre, "Visible Minority Youth Project," p. 44. Report prepared by J. Cummins for the Ontario Ministry of Citizenship, 1989; and C. Tator and F. Henry, Multicultural Education: Translating Policy into Practice (Ottawa: Ministry of Multiculturalism and Citizenship, 1991), p. 20.
  20. M. Duran, speech given at Words-into-Action, a symposium on race-ethnic relations in large urban school boards, York University, Toronto, 1983, p. 67.
  21. See, for example, M. Cheng and A. Soudack, Anti-Racist Education: A Literature Review, report 206 (Toronto Board of Education Research Services, 1994), p. 39.
  22. See, for example, Ontario, Ministry of Education, Consultative Committee on Assessment and Program Placement of Minority Students for Educational Equity, Equal Educational Opportunity; Samuel Messick, "Assessment in Context: Appraising Student Performance in Relation to Instructional Quality," Educational Researcher 13, no. 3 (1984): 3-8; and Ronald Samuda, New Approaches to Assessment and Placement of Minority Students: A Review for Educators (Toronto: Ontario Ministry of Education, 1990).
  23. Chodzinski, "Teacher Strategies for Non-Biased Student Evaluation," p. 69. The list is based on work by Ronald Samuda.
  24. Ontario, Ministry of Education and Training, Changing Perspectives (Toronto, 1992), p. 20.
  25. Messick, "Assessment in Context," p. 3-8.
  26. Enid Lee, Letters to Marcia (Toronto: Cross Cultural Communication Centre, 1985), p. 60.
  27. Quoted in FairTest Examiner, National Center for Fair and Open Testing, vol. 7, no. 4 (1993-94): 14.
  28. In 1980, a study by R. MacIntyre, A. Keeton, and R. Agard found that certain diagnostic tests, such as the Bender Visual Perception, the Wepman Auditory Discrimination Test, and the Wechsler Intelligence Scale for Children, were not adequate to identify learning disabilities among minority children. Quoted in Samuda, New Approaches to Assessment and Placement, p. 7.
  29. Darling-Hammond, "Performance-Based Assessment and Educational Equity," p. 8-9.
  30. Ontario, Ministry of Education, Consultative Committee on Assessment and Program Placement of Minority Students for Educational Equity, Equal Educational Opportunity.
  31. Howard Gardner, quoted in Steinberger, "Howard Gardner," p. 26-27.
  32. J. Lewington and G. Orpwood, Overdue Assignment: Taking Responsibility for Canada's Schools (Rexdale, ON: John Wiley, 1993).
  33. For an extended discussion of this evidence, see T. Toch, In the Name of Excellence (New York: Oxford University Press, 1991).
  34. S. Reardon, K. Scott, and J. Verre, "Equity in Educational Assessment: Introduction," Harvard Educational Review 64, no. 1 (1994): 1-4.
  35. H. Goldstein, "Assessment and Accountability." Brief to United Kingdom Parliament, 1993.
  36. Wiggins, "Standards, Not Standardization," p. 17.
  

ISBN 0-7778-3577-0
©Copyright 1994, Queen's Printer for Ontario