Chapter 11: Evaluating Achievement
It would seem self-evident that, no matter how carefully designed the
curriculum, or how thoroughly prepared the teachers, we cannot know how
well students are learning without measuring and describing - assessing
and evaluating - their level of achievement and their progress. However,
until recently, such information has been scanty and unclear in Ontario.
Assessment, especially when it is used for decision-making
purposes, exerts powerful influences on curriculum and instruction ... If
assessment exerts these influences it should be carefully shaped to send
signals that are consistent with the kinds of learning desired and the
approaches to curriculum and instruction that will support such learning.(1)
While recognizing that, as public institutions, schools are obliged to
report to the public on how well they have fulfilled their mandate,
educators point to many obstacles to assessing and evaluating
effectively, efficiently, and constructively. Professionals who
specialize in the complex and technical area of assessment of student
achievement acknowledge that it is easier to carry out poorly than well,
easier to mislead than to inform with statistics, and easier to spend a
great deal of money in assessing what students know than to improve
teaching or learning effectively. (We are referring here to professional
educators, not to those who have tried - and, in many places, succeeded -
in creating profitable businesses built on mass testing that is saleable
rather than genuinely useful.)
As the discussion of curriculum emphasized, learning does not proceed in
neat steps, each one exactly equal, nor in an unvarying sequence;
therefore, tests cannot be applied to students as simply as quality
control can be applied to objects coming off a conveyor belt. Tests will
not fix students' problems or improve teaching; they will not guarantee
that students will find successful jobs or careers. At best, they can tell
parents something (but never everything) about what their children know,
and give teachers useful information about what material they have taught
successfully, and what they need to approach differently.
We know that the schools, the boards, and the province have an
obligation to ensure that student learning is assessed fairly and clearly,
and that it is reported in a readily understandable way. At the same time,
we caution that, no matter how simple it may appear,
assessment is complex and costly. It must be done, and done well, but
without losing sight of the fact that assessment is a means to an end, not
an end in itself. Not only must it enable us to describe what students
know and what they have been taught, it must show where improvement is
possible and desirable. And, although there is abundant evidence that
assessment can cause educators, however unwittingly, to narrow the
curriculum and limit students' and teachers' horizons, it must not do so.
In Ontario, we need more and better information on what students are
learning; we do not need a large-scale testing industry or an educational
system that is driven - and limited - by the need to teach only what is
easily measured, or to measure only what is easily taught.
This chapter considers issues inherent in monitoring and reporting
student achievement, and in ensuring quality and consistency in evaluating
students' work. We describe good assessment practices, and identify ways
in which those responsible for education in Ontario can be more
accountable to the public; as well, we chart directions that will lead to
the continuous improvement that is characteristic of a healthy learning
organization. System accountability, as differentiated from student
assessment, is discussed in Chapter 19.
Student assessment: What people told us
We heard a great deal of concern, mostly from parents and students, but
from others as well, about measuring a student's learning.
Parents want information: to be told, fully, honestly, in a language
they can understand, and in a timely way, how well their children are
progressing in school, and what teachers will do if students are not
making satisfactory progress. Parents want standards in order to know how
well their children are doing, compared to others of their age, or
according to some accepted and consistent criterion of what children their
age should know.
The word "standard" is confusing, because it has a general and
a specific meaning, and both are used in conversations about learning and
assessment. The general meaning is the one implied in a remark such as, "We
need high standards." In this general sense, "standards" is often
synonymous with "goal" or "expectation," and refers to an ideal; it connotes a
passion for excellence and habitual attention to quality.
"Standards are objective, exemplary ideals that serve as worthy and
tangible goals for everyone, even if some cannot (yet) reach them."(2)
In its more specific meaning, often used by the parents we heard from,
a standard is a reference point against which performance is measured.
Educators compare a student's achievements to a number of different
reference points. Performance is compared to that of other students in the
same class, the school system, or the province (norm-referenced); or it is
compared to some pre-determined, expected level of performance (criterion- or
outcomes-referenced). "Standard" in this sense is similar to "yardstick,"
and refers to a typical, rather than to an ideal, state. Both
norm-referenced and criterion-referenced assessments allow us to describe
the individual student as performing below, at, or above the standard,
whether the standard is other students' performances, or mastery of
content. When people call for "standardized testing," they can
mean either a norm-referenced test or a criterion-referenced test,
although those outside the system tend to be most familiar with the
norm-referenced variety. Examples include the Canadian Tests of Basic
Skills and the Gates-MacGinitie Reading Test. The old Grade 13 departmental
exams were examples of criterion-referenced standardized tests.
Students, post-secondary educators, employers, and the general public -
like parents - are concerned about standards, each group from a particular
vantage point and interest.
Students told us they are concerned about information: they want
teachers to tell them clearly and promptly what they need to do in order
to improve. They also want fairness: they believe (as do many adults) that
some teachers and some schools mark "harder" than others, putting their
students at a disadvantage when applying to college or university, while
others mark too easily, leaving their students ill-prepared for the next
grade, or for college or university. Thus they, too, are concerned about
common standards for assessment.
Representatives of various sectors of the public - post-secondary
institutions, the business community, some professional groups - expressed
concern about the lack of information about what students know, and about
existing information that, in their view, indicates that students are not
learning well enough. They were often among those calling for an increase
in standardized testing, as a way of obtaining more information, and for
higher expectations (standards) in learning and teaching.
While many parents and community members recommended some kind of
standardized testing program as a vehicle for increased consistency and
clarity about actual student achievement, some parent groups were
concerned about the effects of standardized testing. They noted it might
have a particularly harmful impact on minority, low-income, and
special-needs students, whose real achievement level might not be
reflected because of language differences or difficulties with the test's
form, rather than its content; some teacher groups expressed fears that
the results of such tests might be misinterpreted.
The recent history of student assessment in Ontario
In recent years, there has been an increasing emphasis on
assessment, as well as an increasing concern about the nature of the most
widely used forms of student assessment and uses that are made of the
results.(3)
The fact that many people are asking, with some impatience and a sense
of urgency, for more information about student achievement across Ontario
reflects the lack of such data over the last several decades, compared to
earlier times and other jurisdictions. It also reflects the current crisis
of confidence about education, an anxiety that is certainly fed by the lack
of concrete information.
Ontario has had very little tradition of standardized testing.
Throughout the '50s and '60s, standardized exit exams in Grade 13
(departmental exams) were given in all subject areas, and formed the sole
basis for entry to university. In the mid-1960s that changed: results from
the exams were combined with teachers' marks. In the late '60s, the exams
were discontinued and teachers' marks became the only basis for university
entrance. That change was made in part because it was learned that
teachers' marks predicted university achievement as well as the exams.
This should not be a surprise: one would expect that a teacher who has
known a student for a year, and judged his or her performance on a variety
of formal and informal criteria, would be a better predictor of potential
success than any single test. Traditional tests, of the Grade 13 variety,
tended to reflect the ability to memorize and regurgitate, and to bear up
under stress - useful abilities, certainly, but not the kind of serious
thinking and knowledge acquisition our schools should foster, and too
shallow to be the goals that shape the curriculum.
Teachers have had considerable autonomy in designing their own
assessments, and in making judgments about the quality of a student's
work. Teachers' marks have been viewed as an acceptable and adequate
method of deciding whether students should be promoted, where they should
be placed, and what programs they should undertake.(4)
In the 1970s and early '80s, when other provinces and many American
states were expanding their assessment programs, Ontario was leaving
assessment in the hands of educators. A program called the Ontario
Assessment Instrument Pool (OAIP), for example, created banks of test and
assessment items from which teachers of various subjects at different
grade levels could choose. The OAIP had potential for bringing greater
consistency to student assessment, but its implementation was left largely
to chance and individual initiative, and its potential was never realized.
This policy of leaving assessment to the discretion of individual
teachers was clearly stated in the OSIS policy document (1984) for Grades
7-12/OAC:
For the most part, it is recognized that the most effective
form of evaluation is the application of the teacher's professional
judgment to a wide range of information gathered through observation and
assessment. In order to help teachers evaluate student achievement,
curriculum guidelines will describe appropriate evaluation techniques.
Thus, evaluation techniques were described, but standards against which
to evaluate were not specified.
Ontario's first recent large-scale assessments directed at
evaluating the school system's performance were in science and
mathematics. During the 1980s, the province participated in several of
them. The results were reported by the media as generally indicating that
Ontario students scored mid-way, with about half the other jurisdictions
(which usually included a few other provinces as well as many other
countries) scoring higher, and half lower. While this "middle-of-the-pack"
score was an accurate reflection of Ontario's performance for some tests,
it was not for others. In fact, this kind of reporting ignored the size
and meaning of differences; in some cases, these were so small as to be
insignificant and unreliable. What looked like higher or lower scores in a
ranking table were often actually ties, because the spread in points was
minuscule. For example, in the Second International Mathematics Study,
while Ontario was reported as being in the middle of the table in most
areas, in fact only Japan scored higher in algebra; Ontario and British
Columbia were tied with two other countries; and the rest had lower
scores. The same was true in geometry: Japan at the top; Ontario, British
Columbia, and five others tied below it, and the rest below them. But in
typical "league-table" reporting, the results seemed far worse.
Having said that, however, it is true that the performance of Ontario's
students on the math and science tests overall indicated adequate but not
outstanding performance; they tended to be stronger on the basic skills
components than on higher-level problem-solving.
We think that the more striking distinction between Ontario and some
higher-scoring jurisdictions (these differed from one test to another and,
in addition to Japan, included Hungary, Korea, Taiwan, Alberta, British
Columbia, and Quebec) is not how well our students learned, but how much
they were taught. Comparing what is asked on a test with what the
curriculum in a particular jurisdiction is supposed to cover yields a
measure called "opportunity to learn" (OTL). What is found,
when this comparison is made, is that students in Ontario are simply being
taught less - fewer concepts and topics - in mathematics and science than
students in some other countries and provinces. Thus, the problem is not
achievement - our students show similar mastery of what they have been
taught. It is a problem of input, not outcome. While it is possible that
our students might be taught some things which were not included on the
tests, it is clear that they are not being taught many things which
students in other countries are given the opportunity to learn.
In many ways, the OTL data are more compelling than the
achievement results ... the cause of [different OTL results] is that some
countries teach a lot more mathematics or science than others ... it does
raise the issue of whether we ought to be teaching more mathematics and
science ... a topic agreed upon for inclusion [in an international test]
is not necessarily more important than material not included. However,
when one country gives high OTL to twice as many items as another country,
it certainly must raise the question of whether that second country is
teaching enough ... the question of whether we want to teach more material
is settled by examination of subject matter content and societal needs,
and not the achievement results. The comparative OTL data point to the
problem, and curricular analysis answers it.(5)
(In 1995, the Third International Mathematics and Science Study will
involve Ontario students in Grades 3, 4, 7, and 8, as well as secondary
school students, and will include mathematics, science, and physics.)
Recently, the Council of Ministers of Education of Canada (CMEC)
embarked on national assessments in its School Achievement Indicators
Program (SAIP), which samples students in each of the participating
provinces. The first test, in 1993, was in mathematics and included a
sample of 13- and 16-year-old anglophone and francophone students from
across Ontario. Results indicated that the two groups were similar to the
national average in their knowledge of content (number and operations,
algebra, measurement, geometry, statistics, etc.) and problem-solving;
like other Canadian students, and as international tests have also shown,
their problem-solving skills lagged behind their knowledge of content; and
relatively few students were working at the highest levels of achievement.
There was considerable inter-provincial variation, with students from
Quebec (both francophone and anglophone) tending to score higher than
those from other provinces. (Future SAIP testing is scheduled to include
reading and writing in 1994 and 1998, science in 1996 and 1999, and
mathematics in 1997 and 2000.)
In addition, the Ministry has undertaken provincial reviews of senior
geography (1987), senior chemistry and physics (1988), mathematics and
reading in Grade 6 (1989), mathematics in Grades 8, 10 (general), and 12
(advanced) (1990), and writing in Grade 12 (1992). These are assessments
of curriculum effectiveness based on testing a representative sample of
students, plus data based on interviews and observations. (In some cases,
school boards extended testing to all students.) Although the provincial
reviews were not based on explicit learner outcomes, they have been a good
source of information about how well students are learning. The Grade 12
writing review, for example, demonstrated that, while the majority of
students were able to write at a "satisfactory" level, very few
reached the "superior" category.
All these international, national, and provincial studies have used
student samples, which is a much more economical way to assess general
student achievement, although it obviously does not permit reporting on
the individual student or school. For example, we are advised by the
Ministry of Education and Training that the cost of a provincial review is
about one quarter the cost of a test given to every student in Ontario.
Thus, the Grade 12 reading/writing review cost about $750,000, while the
Grade 9 reading/writing test cost about $3,000,000.
The results of these studies have contributed to public discussion and
concern about education in Ontario, and led to increased interest in
routine student assessment. In 1993, the government responded by modifying
a planned Grade 9 reading/writing review (which would have used a random
sample of Grade 9 students across the province) to become a test taken by
all 140,000 Grade 9 students in Ontario. (A second Grade 9 reading/writing
test is planned for 1994/95, and it, too, will be given to all students.)
The 1993 test was based on a two-week curriculum unit on the theme of food
(anglophone) and media literacy (francophone) and included an extensive
written portion; test scores counted for 20 percent of a student's final
mark. The majority of students performed at or above the level deemed
"adequate." Some of the media, however, questioned whether the labels
"adequate," "competent," and "proficient" were meaningful, citing examples
of students' writing that had been graded with those labels. Clearly, there
is as yet no pre-determined standard for what constitutes a given level of
writing or problem-solving.
Chapters 7, 8, and 9 referred to the development of learner outcomes
against which progress can be measured; these have been defined for Grades
1 to 9, and we have recommended that they be expanded to the other grades
and levels, and that they be improved. As well, we made reference to the
standards being developed in language and mathematics, and we recommended
that they be established in other foundation areas. These standards could
and should play a key role in future student assessment.
Developing standards depends both on examining the actual performance of
different groups and on developing consensus among educators and the
public.
Standards may exist at many levels of sophistication and excellence.
They can be set very high (Elvis Stojko's skating, Margaret Atwood's
writing, or John Polanyi's work in chemistry), or they can describe
realistic expectations and worthy and appropriate goals by which to judge
student performance. It is important to note that there is no one way to
define a standard: there must be a variety of concrete examples, known to
all concerned, that make expectations clear.
One of the most difficult and challenging tasks in education today is
establishing these standards, based on informed consensus. Once we have a
useful set of outcomes that describe what students should know and be able
to do, for example in mathematics by the end of Grade 3, we can assess
their performance and compare it to the standards that have been
established.
The Ministry of Education and Training has begun to develop standards in
language/literacy and mathematics/numeracy. These are based on the
learner outcomes for The Common Curriculum for language and mathematics
and suggest different levels of performance such as "limited," "adequate,"
and "proficient" for students at the end of Grades 3, 6, and 9.
A student's performance can fall into one category or another in each
subject, and within each subject in several areas. The math standards, for
example, are built on areas within math that are specified in The Common
Curriculum as "measurement," "problem-solving," "algebra
and patterning," etc. These standards are intended to provide
descriptions of expected levels of achievement by which students' learning
can be assessed, and to provide a clear basis for board-wide and
provincial assessments of student achievement. As we said earlier, learner
outcomes and standards must be very clear for all foundation subjects:
language, mathematics, science, computer literacy, and group
learning/interpersonal skills. As these standards are developed and
refined, they will become the yardstick against which teachers and the
public can measure student performance. In fact, the Ministry of Education
and Training has already indicated that it plans to use the standards as a
basis for assessment at the end of Grades 3, 6, and 9, although it has not
been specific about how it intends to carry that out.
We are convinced that the Ontario government, and educators'
professional associations and bodies, must make a serious, long-term
commitment to assessment, both for improvement and for public reporting
and accounting. While public discussion of the issue often focuses on
large-scale assessment as an indicator of how the system is working, it is
also a tool for improvement. As a commission on learning, we are very
concerned about the quality of assessment, formal and informal, that
occurs daily in the classroom, and that informs, or should inform,
students, teachers, and parents about improving performance. Much more
than large-scale assessment for public accounting, this level of frequent
and cumulative assessment has the potential to increase and enhance
learning.
Assessing individual students
This section covers four issues. The first, and most important, is
assessment for improvement; second is reporting clearly, accurately, and
fairly what has been learned. In our opinion, fairness means that
individual student assessment is consistent: that a 75 percent at one
school is not a 65 or an 85 percent at another; moreover, parents must be
accurately informed about what their children have achieved in relation to
explicit and universally applied standards.
Third is the role of information technology, which has a significant
contribution to make to improving assessment practice. Finally, there are
issues of bias in assessment: evaluating students fairly across gender,
social, and cultural lines.
Assessing for individual improvement: The most important reason
The most important use of assessment is as a way of finding out how well
students are doing in order to help them learn better, more, and faster.
Assessing what students know - and what they don't - enables teachers to
capitalize on students' knowledge, and focus on gaps in it. Furthermore,
by examining student performance, teachers have the opportunity to assess
the success of their own methods and efforts. Evaluating students
regularly enables teachers to monitor learning, and make changes when
learning is not occurring, not occurring fast enough, or not occurring in
sufficient depth. Regular evaluations, with frequent and detailed feedback
from teachers, let students know whether they understand what is being taught
and can move on to the next task, thus advancing student learning. We call
this formative evaluation, because it helps form the learning and teaching
needed to achieve success.
Large-scale assessments, used to monitor the school, school board, or
province as a whole, and individual assessments (such as final exams) used
for marks and accountability, are not very useful to individual students.
First, students, who need immediate feedback, typically do not find out
how well they did on these tests for some time. Second, the results may be
just a letter or a number, rather than an analysis of strengths and
weaknesses. Third, large-scale tests usually ask questions that are easy
to mark, but do not measure problem-solving, analytic ability, or
understanding.(6) While marking of surface features
like capitalization and punctuation may be carried out by computer, such
assessment methods cannot adequately cover content, style, and other
elements; nor can they distinguish between a wrong answer which reflects
real misunderstanding or ignorance, and a wrong answer which reflects
simply a mechanical error.
Teachers and students alike show disrespect for learning and
teaching that emphasize "just the facts," are not applied to "real"
problems, are "low level," or require "regurgitation."
In spite of these espoused beliefs, much teaching and learning is shallow,
and there is legitimate concern that this is the result of
evaluation practices and perceptions of them.(7)
It is essential that assessment be a regular part of learning. In
Ontario, classroom assessment has been the typical vehicle for assessing
individual student learning. It is part of the daily experience of
educators and students, an integral part of classroom activity, and occurs
frequently. It may be formal or informal and is often indistinguishable
from instruction; it may take place with an individual or in a group.
Classroom assessment includes oral questions, teacher-created tests,
quizzes, essays, assignments, examinations, projects, as well as
observations of performance, and any other products or samples of work
that might provide information about performance. Because it is frequent
and varied, classroom assessment can tell far more about what a student
knows and has learned than any single test. Teachers have opportunities to
observe whether or not students are learning to think critically, to make
connections between prior and new learning, and whether they take pleasure
in learning. "Using one assessment procedure is like using a hammer
to do everything from brain surgery to pile driving."(8)
If a test is to give accurate data on a student's full knowledge and
understanding of a single concept, it must comprise a number of questions.
Telling, reliably, what a 10-year-old knows about math requires a lengthy
test. A test that would give reliable information on what that 10-year-old
knows about math, language, science, and computers would have to be
administered over several sessions, would probably take on a significance
in the minds of teachers and students that exceeded its value, and still
could not provide the accurate and meaningful evaluation of continuous
classroom assessment.
In the classroom, students can work on projects that result in a useful
product, or in a real discovery about how things operate. They can write -
on paper or on a computer screen - for a real audience, whether for a
student in another school, near or far, or for the school or town
newspaper.
A lot of intelligences really can't be tested for, in the
sense that we usually use the word "test." What we need to do is
to create school environments where you can observe a lot about what kids
are good at, what interests them, and where they show substantial growth.(9)
While professional preparation and continuing professional education may
expose teachers to all kinds of assessments, good assessment for
improvement requires much more attention than it has traditionally
received, more than can be delivered in a one- or two-year pre-service
program. Designing and marking tests and other assignments (papers,
presentations, projects) should be a priority in professional development,
as should the systematic use and interpretation of information based on
observing and meeting with students. Such training cannot stop when a
credential is awarded: it must continue in schools.
Although it is common for educators to point out that the danger of
large-scale testing is that it tends to measure what is most easily
measurable, it is equally true that accurately evaluating more complex
thinking skills in the classroom demands careful training, extensive
supervised practice, and the development of skills that are seriously
neglected in teacher education.
For example, when students are asked to summarize a story, their product
- the summary - can be at the simple level of listing all the ideas in the
story or text, in which case the writer shows immaturity in carrying out
the assignment. (This may be quite appropriate for a young learner, but it
is unsatisfactory later on.) A more adequate summary shows some judgment:
the reader selects the main ideas, and links them together sequentially.
But this kind of summary still attempts to pay equal attention to each
section or episode of the text, to summarize the plot, and usually goes on
at length. A summary which shows real comprehension and proficiency
(beyond listing and linking main ideas) examines underlying themes, pays
more attention to some main ideas than others, or even constructs new
ideas, by building on the significant themes of the text - the famous "reading
between the lines." Reading and assessing students' work for higher
levels of literacy, what some call depth of processing in learning,(10)
is not something that all teachers know how to do, or how to describe to
students and parents. But it is the kind of analysis and assessment that
is necessary, if we are to teach and to assess for understanding.
Based on what we learned in the hearings and from the research, teachers
must provide more and better feedback to students and parents, which
pinpoints strengths and weaknesses, results in teachers and students and
parents doing things differently, and is timely enough that it contributes
to what the student is learning now, and what the teacher is teaching now,
rather than to what was taught but not learned weeks or months ago.
In essence, this is like coaching: for example, a teacher observes a
student making an oral presentation on the use of computers in graphic
design and finds that he or she speaks too quickly and does not frame the
presentation in a manner that allows the listener to follow easily. Rather
than waiting until the term report and noting that the student is weak in
presentation skills, the teacher needs to tell the student as soon as
possible that speed and organization need improvement, help map out a
possible reorganization, discuss techniques for slowing speech, and offer
an opportunity to try the presentation again.
Our belief is that the first report card of the year, whether at the end
of October or in December/January, should not contain surprises for
parents. It should not, for example, indicate that the youngster is
reading below grade expectations, when the parent has not previously been
made aware of the problem. We know (because we heard about it and because
some of us have experienced it) that it does happen, and that it should
not. The report card may not always bear good news, but the contact
between parents and teachers should be frequent and consistent, whether or
not students are performing according to expectations.
Parents need to see the results of routine classroom tests and the
evaluations of regular classroom assignments throughout the year, starting
in September, as well as portfolios of students' work, with indications of
progress made from earlier to later efforts. Teachers need to inform
parents about what has been covered in recent weeks and what is coming up;
they should tell parents how, at home, they can support their children in
gaining specific skills or knowledge.
Our strategy for enhancing individual student assessment for
improvement, including helpful feedback, involves giving teachers the
information and skills to link better assessment to student learning.
Programs that build the capacity to reliably and consistently evaluate
writing, problem-solving, understanding, and analysis in all subject areas
- in other words, to assess the achievement of the higher-order literacies
that we want our graduates to have - are an investment in the ability to
measure what matters most. They are a commitment to teach, re-teach, and
teach better. Such programs demand considerable time, and thus can be
expensive, as is most high-quality, professional training. But, to the
extent that we can teach teachers to evaluate complex thinking skills well
and consistently, we build the capacity to measure well what matters most.
Consistency is tied to fairness - a subject about which students said a
great deal. Right now, the only training teachers get on consistency in
assessing critical thinking and communicating skills is in relation to
provincial subject reviews and OAC examinations (given in the final year
of high school for students preparing to enter university); these do not
affect most teachers. But all teachers need to be better educated in
assessing, whether that is being done through written tests, essays,
presentations, or projects.
Because we care above all about learning, our first concern with
assessment centres on teachers' ability to assess student work accurately
and consistently, and to communicate effectively to students (and to
parents) how they can improve. We are convinced that assessing for
purposes of improvement always depends on the teacher's ability both in
assessment itself and in responding to the results. That is why the first
recommendation we make about assessment is that efforts in this area be
the subject of teacher education at every level: in faculties of
education, school boards, schools, and continuing professional education
at such post-graduate institutions as the Ontario Institute for Studies in
Education (OISE).
Assessing understanding, critical thinking, and the ability to
generalize, synthesize, and apply knowledge from one situation to another
is very complicated and requires considerable experience and practice.
Reporting the results of such assessment takes time - to think, to write,
and, often, to discuss results with the student and/or parent. The
necessary skills are built throughout the teacher's career. We believe
that a great deal of the practice and training should take place in the
school, with teachers working systematically in teams to mark papers and
presentations, and to discuss their ratings, guided by consultants who
have expertise in assessment.
Recommendation 46
*We recommend
that significantly more time in pre-service and continuing professional
development be devoted to training teachers to assess student learning in
a way that will help students improve their performance, and we recommend
supervised practice and guidance as the principal teaching/learning
mechanism for doing so.
We hasten to point out that we are not suggesting that teachers test or
assess more or mark more papers, but that they bring a higher level of
professional training and expertise to the process of assessing and
reporting on what students have achieved.
Accounting for student assessment: Reporting what is learned
Accountability begins, then, with something more humble
than large-scale testing: it begins with ... teachers monitoring and
adjusting the daily homework and classwork of students rigorously and
consistently. It begins with not accepting work that is shoddy ... It
begins with a policy that says schools will send reports home more than
twice a year. In short, if you want to stop the kind of minimal compliance
and perfunctory work that can sink a school, you'll need an effective and
timely grading system, reporting mechanisms, and promotion standards.(11)
Thus far, we have discussed the importance of assessing students for
improvement, giving all students a fair opportunity to demonstrate what
they know, and offering feedback to students and parents to keep them
apprised of the students' progress through frequent and consistent
communication.
The final report for the term or year/semester is particularly
important: it tells student and parent what level of learning has been
achieved in the required knowledge and skills for that course or year. The
evaluation summary that appears on the year-end report is permanent: it
goes into the Ontario Student Record and may be used by other teachers for
planning, or as a way of diagnosing student performance. The report may
also be a factor in decisions about course or class placement, streaming,
and planning for post-secondary education. Hence, the quality of that
assessment has long-term significance. Schools and teachers are
accountable to parents for its accuracy and reliability.
We heard from parents and others that report cards are not very helpful:
they are unclear or lack sufficient information on how much the student
has learned and where the focus for improvement should be. While some
parents want marks in letters or numbers, others want more detail and a
better sense of how their children are doing. Many parents brought report
cards to our public hearings, or sent them, pointing out inconsistencies
and "edu-babble." These examples did not reflect well on the
teachers, principals, schools, or school boards involved.
While parents who are in regular and friendly communication with a
child's teacher are likely to be well informed about the child's progress,
that level of communication isn't always maintained: a parent may not be
able or willing to articulate concerns or misgivings, or may not always
understand or agree with the teacher's analysis. More frequent and more
candid communication would do more to correct this problem than any
increase in assessments or testing.
Teachers have an obligation to be sensitive to parents who don't
understand, don't agree, or who have difficulty articulating their
concerns. They have to reassure parents who are afraid to voice
misgivings, lest their children suffer some form of retaliation. The fact
is that no report card, no matter how precise, makes good communication
between teacher and parent obsolete or less vital to the student's
well-being.
Parents also want to know how their children are progressing in terms of
acceptable and universal standards which, until recently, had not been
established. Now that they have begun to be established, standardized
assessment is possible - as long as teachers are equipped to carry it out.
As already noted, the recent development of learner outcomes and
standards is helping to create a clearer and more provincially consistent
basis for curriculum and standards on which assessment will be built. That
is a crucial step. We have urged the Ministry of Education and Training to
develop "curriculum guidelines based on the learner outcomes that
will give teachers and parents a clear idea of the basic structure of each
curriculum area each year." (See recommendations in Chapter 8.) We
have recommended that, at the beginning of each school year or semester,
schools give parents and students information on course content, based on
clear learner outcomes. We have also suggested that the learner outcomes
in the common curriculum courses be made more readily understandable, and
that outcome statements be written for all grades and subjects, including
the specialized curriculum in Grades 10 to 12.
Clearly written learner outcomes, even without descriptions of different
levels or standards of achievement, would make it considerably easier for
parents to know what their children are expected to learn and what they
have learned. The standards (which have been developed for language/
literacy and mathematics/numeracy, and which we have recommended be
developed for science, computer literacy, and group learning skills) give
parents information they need if they are to better understand and
informally assess their children's progress. We believe that reporting to
parents should be based on the same learner outcomes and standards as the
curriculum. Thus, in a parent-teacher-student conference, parents should
be shown examples of work of different standards, so that they can fully
understand their own child's level and mark. Report cards should reflect
the student's level of attainment of major outcomes, measured against
clear and universal standards.
Goals are made clear if, at the beginning of the school year, parents
and students are provided with a written description of expected outcomes,
and then get feedback on students' learning throughout the term or
session; report cards must be consistent with this information. The
importance of evaluating students according to uniform and explicit
standards also pertains to issues of fairness and consistency.
An individual student or parent says, "It isn't fair that teacher X
(and/or school Y) gives much easier marks than my teacher (school). It
gives those students the advantage of a higher average and means they get
a place in university that is denied to me, even though my 80 percent is
worth as much as their 90 percent." Beyond the individual's
complaint, universities and colleges worry about screening applicants to
get students who are most likely to be successful. Employers worry about
the meaning and value of a transcript or diploma. Society worries about
whether its best and brightest have opportunities for higher education so
that they can become pillars of a productive and competitive society.
Because teachers have been held responsible for using uniform,
consistent standards that did not exist, they have used their own. The
supposed objectivity of numbers, percentages, and letter grades obscures
the fact that standards differ; a provincial standard should mean that,
while differences in teachers' marks will never completely disappear, they
will be fewer, smaller, and less significant.
It is of course true that we can never eliminate all subjectivity in
assessment, and cannot pretend that there is or ever will be a fool-proof
objective test of everything we want students to know. We can, however,
take steps to modify and decrease, albeit not eliminate, inconsistency
among teachers in marking.
We have spoken earlier of the necessity to improve teachers' ability to
assess students' work accurately and consistently, and of our belief that
this professional education must begin early and continue through the
teaching career. In order for that training and practice to be most
efficient and effective, it is highly desirable that its content be
determined by the learner outcomes and standards on which teachers will be
assessing students. In order to offer this support, it will be
essential to create resource materials and manuals keyed to the
curriculum, to guide teachers both at the training and application stages.
Such materials must give multiple examples of how the achievement of
specific outcomes at various levels (or standards) can be consistently
measured. "There is no reason why we have to be assessed in the same
way ... If I understand a mathematical principle and I can show you it one
way, it's not really important that I show it to you in another way."(12)
Recommendation 47
We recommend
that the Ministry of Education and Training begin immediately to develop
resource materials that help teachers learn to assess student work
accurately and consistently, on the specific learner outcomes upon which
standardized assessment and reporting will be based.
One valuable resource has already been developed, but needs to be
updated and refined: the Ontario Assessment Indicators Program (OAIP),
referred to earlier, which contains assessment items and ideas for many
grade levels and subjects.
The next step, we suggest, is creation of a provincial report card, an
Ontario Student Achievement Report (OSAR) based on the outcomes and
standards expected in each grade and each subject. In addition to a global
mark for each subject or interdisciplinary area (e.g., math or arts),
students should be rated on a set of specific outcomes, derived from the
common curriculum and provincial standards documents. In the first and
second terms, the report should indicate the extent to which the student
is (or is not) making good progress toward the achievement of each of the
several outcomes related to the particular subject and, at the end of the
school year, has or has not achieved that outcome at a satisfactory level.
In the term (and possibly the final) reports, the teacher should include
practical and specific suggestions to students and parents about where
progress is needed and how it can be achieved. The teacher who works at being a capable
assessor of foundation skills will give parents the information they want:
a clear indication of where their children stand as measured by provincial
standards. In other words, we believe the accountability so many parents
are asking for is based on clear standards, and on able teacher-assessors
making unambiguous reports, the core of which (the key learner outcomes
reflected in the report) will be the same for all teachers of the same
grade or course. We also believe that teacher comments are a very
important part of any report card, and should refer to significant,
authentic demonstrations of knowledge and skills, or to indications of
genuine difficulties.
We also suggest that, after Grade 9, when students follow different
programs each semester or year, it is desirable to have the same kind of
standard reporting format. We have recommended the development of learner
outcomes for the courses that follow the common curriculum of Grades 1 to
9; once they exist, the OSAR is equally appropriate after Grade 9. Each
subject teacher would indicate the extent to which the student is
achieving the expected outcomes, give the student a global mark in the
subject, and include helpful comments to the parent. In keeping with
current practice, subject teachers' reports would be combined into a
single report, possibly with comments from the home-room teacher or
advisor-teacher who examined the student's progress across subjects. All
of this could be greatly facilitated through the use of standard forms and
computer programs developed centrally by the Ministry of Education and
Training.
We do not want to remove the flexibility of teachers and schools in
reporting to parents in a way that reflects local needs and preferences.
We suggest that the Ministry prepare a common report card based on the
expected outcomes in each grade within the common curriculum (and each
course within the specialized curriculum) and that it provide an
electronic copy to every board; boards could seek permission from the
Ministry to make additions, but not deletions, and any substantial changes
in content or format would require the approval of the Ministry. Of
course, boards could add other documents, as long as the Ontario Student
Achievement Report was the main vehicle of communication. There should be
ample room for teacher comments as well as the check-offs on achievement
levels. Translations should be provided by the Ministry for parents who do
not read French or English, and a Braille version could also be developed.
The Ontario Student Achievement Report should be designed by a team of
educators and assessment experts, with significant input from the
community (through the Ontario Parent Council, for example) and, at least
at the secondary level, from the three student federations or the Ontario
Student Council (see Chapter 17). The OSAR should be field-tested
initially and reviewed regularly to ensure that it meets the needs of
teachers, parents, and students.
We are not suggesting that the OSAR for Grade 1 be the same as for Grade
7 apart from differences in outcomes. We believe that professional
educators, students, and parents are in the best position to decide how
reports should be structured, given the differences from one age to
another. The key criteria are clarity, a direct link to learner outcomes
in the curriculum, and input from the users.
Recommendation 48
*Therefore, we
recommend that the Ministry of Education and Training, in conjunction with
professional educators, assessment experts, parents, students, and members
of the general public, design a common report card appropriate for each
grade. To be known as the Ontario Student Achievement Report, it would
relate directly to the outcomes and standards of the given year or course
and, in all years, would be used as the main vehicle for communicating, to
parents and students, information about the students' achievements. While
school boards would not be permitted to delete any part of the OSAR, they
could seek permission from the Ministry to add to it.
We come now to the matter of setting a standard for communication, one
that recognizes the importance of assessment and the right and need of
parents to have information on their children's progress, if they are to
support learning and the school.
We believe that, in each school year, all teachers should have a minimum
of two conversations, in person or by phone, with the parents or guardians
of each student for whom they carry prime responsibility.
These conversations (and we see two as a minimum), which are in addition
to the formal conference at report-card time, should focus on student
achievement, improvement, and concrete suggestions about what parents can
do to support their children's learning. From kindergarten to Grades 5/6,
this would include all the students in the "main" class, while
students in a rotary system would be the responsibility of a home-room
teacher or a teacher-advisor, as recommended in Chapters 8 and 9.
We suggest that the first conversation take place prior to the first
report if, as often occurs, that is scheduled as late as December;
beginning in Grade 7, the discussion would probably make reference to the
development of a Cumulative Educational Plan (CEP). (See Chapter 8.)
We are convinced that the key to assessment for accountability to
parents is teacher-based standardized assessment which indicates how much
progress students make over a year toward the achievement of critical
learning outcomes. We think that the government would be wise to invest
the considerable monies necessary for good assessment where there is the
biggest payoff for students: in extensive, high-quality teacher education
for extensive, high-quality, standardized, classroom-based assessment.
The uses of information technology in improving student assessment
In our opinion, information technologies, and in particular
microcomputers, can help implement educational practices in accordance
with the principles of formative assessment. First, they enable data to be
collected and analyzed coherently, and second, they help to improve
teaching and student learning.(13)
We agree that the computer has an important place in individual student
assessment, particularly in its potential for giving students quick
feedback on how much and how well they have learned.
Eric Dempster, head of the Business Department at Sir Wilfrid Laurier
Collegiate Institute in Scarborough, e-mailed a submission to the Royal
Commission, giving an example of the way technology can be used in
testing, in order to improve student learning. Mr. Dempster says he first
used computers for assessment six years ago and allowed students,
including those who would have failed but had never been given the
opportunity to do better, to take tests more than once. Mr. Dempster
averaged the test marks, which provided an incentive to do well the first
time, but also showed students they could improve. "The overall
result [was] that the poor students felt empowered and realized quickly
that they could improve."
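Mr. Dempster's point is that averaging all attempts rewards a retake without making the first attempt meaningless. The report says only that the marks were "averaged"; the sketch below assumes a plain mean of every attempt, and contrasts it with a hypothetical "keep the best attempt" rule (not described in the submission) to show where the incentive comes from. The scores are illustrative, not data from the classroom.

```python
from statistics import mean


def averaged_mark(attempts: list[float]) -> float:
    """Final mark as the plain average of every attempt (assumed scheme)."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    return mean(attempts)


def best_mark(attempts: list[float]) -> float:
    """Hypothetical alternative for comparison: keep only the best attempt."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    return max(attempts)


weak_start = [40, 80]    # illustrative: poor first try, strong retake
strong_start = [70, 80]  # illustrative: good first try, strong retake

# Averaging rewards improvement (the weak student's 40 becomes 60) ...
print(averaged_mark(weak_start), averaged_mark(strong_start))  # 60 75
# ... whereas best-attempt marking gives both students the same result.
print(best_mark(weak_start), best_mark(strong_start))  # 80 80
```

Under averaging, the student who tried hard the first time still finishes ahead, which is the incentive the submission describes; under a best-attempt rule, the first attempt would not count at all.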
His present testing software randomly generates questions, prevents
students from restarting a test, and includes graphics.
The students in Mr. Dempster's class are learning more than just the
subjects he teaches: they are discovering that they can improve, and that
self-assessment is an important part of the process. Many employers told
us that, if they are to stay competitive, future workers will have to be
experienced in self-assessment. And, because it involves the student
guiding his or her own learning with the support of technology,
self-assessment also has the potential to increase the teacher's role as
coach and mentor.
Mr. Dempster's experiences have been replicated in classrooms where
Computerized Adaptive Testing (CAT) is being used: the computer chooses a
question on the basis of the answer to the previous one.(14)
A correct response results in a harder question, while an incorrect one
elicits an easier question. This quickly clarifies the level at which a
student is working, and uses few questions to do so; it also pinpoints for
students the areas in which they need more help and/or more practice, and
makes them responsible for their own progress.
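The harder-if-right, easier-if-wrong rule can be illustrated with a simple "staircase" over difficulty levels. This is a deliberately simplified sketch, not how production CAT systems work: they normally select items using statistical models (item response theory) rather than a fixed one-step rule. The difficulty scale of 1 to 9, the eight-question length, and the simulated student are all hypothetical.

```python
from typing import Callable


def adaptive_test(answer: Callable[[int], bool], n_items: int = 8,
                  lowest: int = 1, highest: int = 9) -> tuple[list[int], int]:
    """Simplified up/down adaptive test (hypothetical staircase, not IRT).

    `answer(level)` returns True if the student answers a question of that
    difficulty correctly. Returns the levels asked and a rough estimate of
    the student's working level (highest level answered correctly).
    """
    level = (lowest + highest) // 2      # start in the middle of the range
    asked: list[int] = []
    best_correct = lowest - 1            # nothing answered correctly yet
    for _ in range(n_items):
        asked.append(level)
        if answer(level):
            best_correct = max(best_correct, level)
            level = min(level + 1, highest)  # correct -> harder question
        else:
            level = max(level - 1, lowest)   # incorrect -> easier question
    return asked, best_correct


# Hypothetical student who answers correctly at difficulty 3 or below.
asked, estimate = adaptive_test(lambda difficulty: difficulty <= 3)
print(asked)     # [5, 4, 3, 4, 3, 4, 3, 4]
print(estimate)  # 3
```

After two misses the test settles around the student's level and keeps checking the boundary, which is why few questions are needed. Because a single lucky answer would raise this crude estimate, real adaptive tests weigh the whole pattern of responses instead.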
Immediate feedback can be used to motivate students who might otherwise
have very little interest in school. This was one finding of a pilot
project in New York City(15) that involved a group
of inner-city students considered most at-risk of dropping out. They
visited the computer lab once a week and took computer-generated "adaptive"
math tests. The computer provided students and the teacher with immediate
feedback, "rewarded" students who reached 100 percent in each
topic with a graphic of a hamburger, and generated practice sheets for the
rest of the week.
Contrary to common expectations of them, many at-risk students in the
experimental group sought to do well in the computer tests. Sometimes they
argued with the teacher that a response marked by the computer as
incorrect was, in fact, right, thus indicating that the assessment
mattered to them. An unexpected result of the pilot project was
student-generated competition for the hamburger. Over time, the students
did better in math, as the result of the "friendly competition,"
the immediate feedback, and the work of the classroom teacher; moreover,
they were less often found to be "off task," doing something
other than the work at hand.
It is also interesting to note that, contrary to other research
findings, the female students were more comfortable with the computer than
were the males.
For some time, technology has been used in assessment, to collect and
sometimes analyze achievement data. Teachers are already keeping track of
how well students do in assignments and tests, and there is software that
enables teachers to graph or otherwise display and analyze the data.
We are certain that, with more and better data, teachers will be in a
better position to decide on the best types of programs and interventions
for their students. Better information and new ways of displaying it will
mean improved reporting to parents. As well, computer-based assessment and
diagnosis will reduce marking time for teachers, eliminate errors in
marking, and offer opportunities for different test formats and for tests
in other languages.(16)
However, good assessment software (of which there is an inadequate
supply) should do more, moving students from simply accumulating facts to
organizing, analyzing, and transforming data. It should measure the
quality, rather than simply the quantity, of the student's understanding.
And it should be capable of making assessments using portfolios and "real-life"
performances based on provincially set standards, with fewer
multiple-choice (sometimes called "multiple-guess") tests to
compare one student with others in the class, school, or province.
Software that requires students to solve problems, that includes
high-quality three-dimensional graphics, and that requires students to
present their answers and solutions in a variety of formats, will
challenge students to show they understand rather than just remember.
There is a long way to go before Mr. Dempster's on-line assessment is
the norm in Ontario's schools. Change of this nature requires professional
development, adequate hardware, and the right kinds of software, screened
for bias. (And, as we make clear in the next section, equal access to
computers is a necessary element in eliminating assessment bias.)
We believe that the potential of information technology to improve
assessment is substantial, and suggest that information technology play a
prominent role in teacher development in assessment, and that the Ministry
of Education and Training, in making high-quality software available to
Ontario schools, place emphasis on the potential that software offers for
improving assessment.
Avoiding bias in assessment: Respecting differences, recognizing diversity
The notion that a student, because of colour, race, or
handicap might be streamed to an educational program which is not
consistent with the attributes and abilities of that individual is
unacceptable.(17)
We have discussed the importance of frequent and accurate assessment of
student learning and literacies, and recognized the link between timely
feedback and effective student learning, as well as the need to report to
parents and the larger public. However, the Commission is very aware that
assessment, when not carried out well, can have serious negative
repercussions on individuals and on groups of students. The challenge to
be effective, helpful, and fair means ensuring that assessment is done
well, not that it is avoided.
Assessment must be as bias-free as possible, so that gender, social
class, race, culture, and disability are not treated as negative factors.
The results of assessment, even of routine classroom assessment, are
likely to have an important effect on the confidence and motivation of
students, which, in turn, affects performance. Assessment may also have an
impact on the student's academic career, and has the potential to cause
life-long damage to the person who is assessed below his or her real
ability and streamed into lower groups (the "lambs" rather than
the "lions" reading group), special education classes or
non-university high school streams.
A growing number of parents and educators are raising
questions about the over-representation of minority students in special
education, vocational, and basic-level programs. The essential concern
focuses upon the perceived use of inappropriate testing materials,
assessment practices, placement strategies, and restrictive learning
opportunities in some jurisdictions.(18)
Many groups are concerned about bias.(19) Various
forms of assessment have shown that those who are poor, who are members of
some minority groups, or who are female perform less well than their knowledge
or skills would warrant. Some communities complain that their students
have been negatively streamed because of biased assessments. For example,
more than a decade ago, a York University symposium on racial and ethnic
relations in city school boards was told by Marcela Duran that
we were able to institute an experimental program, in
co-operation with the Jamaican-Canadian Association, in which 100 West
Indian children who had been placed in vocational schools were
re-assessed, using different testing instruments. According to this
process, 90 of these students were found to have been wrongly placed.(20)
We agree that there is ample evidence that students from some groups are
more likely to be placed in lower "ability" classes and streams
than others,(21) and that assessment methods may
figure in those decisions. But we are convinced that improvement depends
on more than just modifying assessment procedures: changes are needed in
curriculum, teaching methods, and other areas (including, as we make clear
elsewhere in this report, a fundamental reduction in streaming).
Given the importance of assessment, it must not only avoid bias on the
basis of gender, social class, or cultural background, but must also reflect
diverse skills and knowledge, valuing what students know and can do, even
if they express it unconventionally or do it in different ways.
In Ontario, as in other Canadian jurisdictions, in the United States and
in England, a great deal of attention has been paid to the way assessment
bias affects minorities and immigrants. This is because some minorities
and immigrant groups, as well as students from poor families or
communities, are over-represented in special education classes and
non-university streams.(22)
Test bias exists in many different contexts: for example, despite our
support for computer-based assessment, we recognize that bias can be found
and perhaps even made worse by the use of information technology. We know
that students from different socio-economic backgrounds have different
levels of access to computers and, therefore, that some will be more at
ease than others and that comfort levels undoubtedly affect results.
Four potential causes of bias have been identified in assessing students
who are members of ethnic or racial minorities or who are immigrants: bias
in the test's content and form; in the way the test is given; as a result
of factors in the student's environment, in or outside school; and in the
ways results are interpreted and reported.(23) Many
of these are related to the inadequacy of teacher education in assessment,
and lead to inappropriate student placements.
Educators must also be careful, when assessing students of
ethnic/racial minority backgrounds for placement in special education
programs, to ensure that due consideration has been given to linguistic
and/or cultural factors that can preclude fair and accurate assessment.(24)
Assessments of many second-language students do not adequately
differentiate between language-related difficulties and the actual level
of knowledge or skill the students possess. The person who thoroughly
understands all the material at hand will not be able to answer even the
simplest question, if he or she does not comprehend the language in which
it is being asked. There is the related problem of confusing linguistic
deficits with deficits in ability. Students who have emigrated to Ontario
may need time to learn the language, but that does not necessarily mean
they need remedial or special education.
There is also the issue of measuring students in terms of what they have
learned or are capable of learning, in contrast to assessments that have
more to do with the learning environment than with any inherent
characteristic of the learner.(25) Is the "learning-disabled"
student genuinely disabled, or is the problem a lack of instruction in
reading, in disguise?
Before decisions are made to place students in special education classes
or in non-university streams, there should be evidence that they cannot
achieve progress by changing curricular material or being assigned to a
different teacher, and that modified regular-classroom teaching strategies
that are being used successfully with other youngsters from a variety of
ethnic, linguistic, and socio-economic groups are not working.
Stereotypes develop as we attempt to organize people into
categories and to make sense of our world. That in itself is not the
problem. However we are in real trouble when these categories are so
closed that they prevent us from seeing people's full potential.(26)
There is also evidence that, on multiple-choice tests, girls and women
do not do as well as boys and men. According to a joint study by the
College Board and the Educational Testing Service in the United States,(27)
"the gender gap is substantially larger for multiple-choice items
than for other types of questions." The study found that the gender
gap narrowed or disappeared when students had to write their answers, as
in essays or word problems. The study concludes that a mix of assessment
instruments is necessary to ensure equity in high-stakes standardized
testing.
Another form of gender bias is found in tests that include questions or
examples related to activities more frequently of interest to males than
to females - certain sports, for example. Obviously, assessment tools must
treat male and female students equally, and must meet the needs of our
diverse school populations.(28)
In trying to remove bias from tests, efforts have tended to focus more
on the material than on training teachers to construct bias-free tests or
to use fair testing techniques. This is baffling, given that most forms of
assessment - tests, assignments, projects, oral discussions, etc. - are
part of the daily interaction between the teacher and students. Clearly,
more attention must be paid to teacher education and to on-going
professional development.
More frequent and more varied classroom assessment is another way of
minimizing bias, but it presupposes that the teacher is familiar with a
variety of techniques. When testing or examining students, giving them a
choice in the way a question is answered also helps.
A fair assessment also takes the individual student's environment into
account. For example, assessing for placement purposes may be
inappropriate for a recent refugee or for a student who has just moved
from French immersion to an English-language program. Assessment in the
student's first language has been shown to isolate problems related to
acquiring a second language, rather than to gaps in knowledge or skill,
and it should be used where suitable and possible.
Teachers must have a sense of whether or not students and parents
believe that an assessment is fair; if they see it as unfair, there is, at
the very least, a problem of communication and there may also be one of
equity. When it is impossible to test a student in a first language or to
delay assessment of a refugee student, it is vital that the student not
suffer as the result of our lack of resources or time. That means, for
example, not placing the refugee student with younger children when a test
might reveal that what is needed is a specially planned program with
specific kinds of support.
Bias in assessment will become increasingly important as Ontario
participates more regularly in assessments that encompass other provinces
and other nations. This is particularly true in a province that is
geographically and socially diverse, and that will become even more
culturally and linguistically varied. Fair assessment is vital if the
system is to more fully reflect the needs of all students.
As a tool for tracking students into different courses,
levels, and kinds of instructional programs, testing has been a primary
means of limiting or expanding students' life choices and their avenues
for demonstrating competence ... [T]he goals ... of assessment are being
transformed from deciding who will be permitted to become well-educated to
helping ensure that everyone will learn successfully.(29)
In our view, the Ministry must take the lead role in ensuring that its
own assessment instruments treat all students equitably and that the
materials used in schools are appropriate and fair. It can do this by
evaluating the substance and procedures used in assessment and by
monitoring the placement of various groups by stream (or track). The
Ministry's new anti-racism, equity, and access division can lead the
effort to ensure fairness in assessment. It should also be responsible for
monitoring implementation of recommendations made by the Consultative
Committee on Assessment and Program Placement of Minority Students for
Educational Equity.(30)
Recommendation 49
*We recommend
that the Ministry monitor its own assessment instruments for possible
bias, and work with boards and professional bodies to monitor other
assessment instruments; that teachers be offered more knowledge and
training in detecting and eradicating bias in all aspects of assessment;
and that the Ministry monitor the effects of assessment on various groups.
Large-scale assessment of student achievement and the effectiveness of
school programs
Large-scale assessment of student achievement
Having said that assessments should be based on agreed-on standards, and
that teachers should be trained to use them skillfully and fairly and to
communicate their results clearly, we turn now to the matter of external
tests, given simultaneously to all students in a grade or course. Some
people believe that these are a more objective and therefore fairer and
more accurate measure of what students have learned. We believe that some
system-wide testing should be built in, as a check on student learning at
a few critical transition points, and as a vehicle for assuring people
that, at those points, all students are being assessed according to the
same yardstick.
However, it is important to emphasize that large-scale testing has
limitations; otherwise, people reach what we are convinced is the mistaken
conclusion that these few tests are the most important in the student's
school career, or that many such tests would be ideal. In our opinion,
large-scale testing is unlikely to be a more fair and accurate
representation of student learning than the best judgment of the
well-trained teacher-assessor. Moreover, such testing is easily misused.
The following are the three basic problems of using large-scale testing as
the major form of student assessment.
First, any external testing is, of necessity, much briefer than
classroom-based assessment: a single test cannot reflect everything
students are expected to learn over a year. For example, to get a true
reading of what a Grade 6 student has learned in math, a number of tests
would be necessary, each quite lengthy, to overcome such irrelevancies as
the student's level of well-being (hours of sleep, nutrition) that day, or
the use of an unfamiliar word in a problem (which might lead to the
erroneous conclusion that the student didn't understand the question or
the mathematical operation), etc. We are urging that the major source of
data on student achievement be the data collected by the classroom teacher
over the year precisely because that is what offers
the greatest potential for reflecting, cumulatively and in summary, what
has been learned. A simple achievement test, such as the Canadian Test of
Basic Skills, or others of that kind, is not designed to reflect what
children know in any depth. Its purpose is to arrange students along a
continuum, from those who know most to those who know least, in order to make
placement decisions. Such tests are not measures of how well teaching and
learning have occurred.
Let's say, for example ... that you get a certain score on a
standardized test. Can I assume then that you understand something? You
might say, "Sure, because those tests test for understanding." But
... research indicates that most students in most schools ... do not
really understand ... When you ask students who get very high grades ...
to explain a physical phenomenon, not only can they not explain it but
they actually give the same sort of explanations that four- and
five-year-olds give ... We can only really determine whether a student
understands something when we give the student something new, and they
can draw upon what they have learned to help answer a question,
illuminate a problem, or explain a phenomenon to someone else.(31)
Testing is no panacea for an education system under stress. After
all, a mechanic can inspect a car without making the necessary repairs.
The long-term educational improvement lies with a comprehensive
restructuring of the enterprise, not in resorting to the proverbial "quick
fix" of a standardized test. The public needs to be informed about
the growing array of assessment tools, but also about how they should be
interpreted to improve student, school, and system-wide performance in
education. For that reason, testing is only one part of a more
comprehensive education restructuring package.(32)
Second, because of their necessary brevity and because thousands of
tests must be marked quickly, external tests tend toward
short-answer and multiple-choice questions, with all their severe
limitations on measuring understanding and learning skills. They are the
classic case of measuring what is easiest to measure, not what is most
important. We are not suggesting that such tests can't measure certain
important abilities we expect all students to have, only that they cannot
and do not measure all, or any representative sample, of them. They are
biased toward certain kinds of learning, and there is ample evidence that
such bias distorts the curriculum in ways that are unhealthy in an
educational system that is serious about learning.(33)
Third, any single test used for large-scale assessment and reporting
assumes a distorted importance, and can - and often does - have long-term,
frequently negative consequences for students and for the learning system,
because of the inappropriate ways the information is used. Tests meant to
measure whether most children have learned the year's material should not
be used to make decisions about students' capacity for learning, or their
long-term ability to succeed in school or in the regular program. The
problem is that, typically, test scores end up being put to such
inappropriate uses. Placement decisions should not be made on the basis of
any single test given on a single day in a student's year; however, that
is precisely how they are frequently used.
As early as the late 1970s, evidence began to accumulate
showing that high-stakes standardized testing policies were highly
corruptible, creating greater incentives for cheating than for actually
improving instruction, and that the use of standardized tests for
accountability had actually narrowed curricula and driven instruction
increasingly towards pedagogies based on memorization and basic skills
rather than improving educational quality.(34)
The 1993-94 Ontario Grade 9 testing for language and literacy (with a
similar test being given in 1994-95) can be used as an illustration of
these points. It was, in fact, a very good test: first, it took place over
more than six hours, spread over a two-week interval, thus giving students
an opportunity to demonstrate their knowledge and understanding in a way
that would be impossible in a typical one-hour "test of basic skills"
or the like. Second, the test did not just ask short-answer questions, but
was a genuine assessment of performance.
Nonetheless, by itself, the test would tell us less about what students
learned about reading and writing in nine (or fewer) years of schooling
than would teacher reports based on clear and consistent standards.
Moreover, it did not differentiate among students schooled in Ontario for
one, two, or nine years prior to the testing. But it did give us valuable
data on how well Ontario's Grade 9 students understand what they read and
whether they can write clearly, expressively, and to the point. We do not
know yet whether the test will lead to improved teaching and learning, but
it was a much better accountability mechanism than most tests - and, of
course, at about two million dollars to administer each year, much more
expensive. (As we have already pointed out, however, good assessment is
very expensive.)
We applaud the Ministry's attempt at large-scale testing in order to
measure learning authentically. Despite its strengths, however, a test's
ability to withstand inappropriate or damaging misuse is much more
problematic. The Minister made it clear to educators that the test was to
count for 20 percent of the course mark, but was not to be used for making
major decisions about student achievement. It was not to affect whether
the grade was passed or failed, or whether the students were to attend
summer school or be placed in different programs or "streams" in
Grade 10. Nonetheless, informally and unofficially, there are indications
that, in some instances, it has been used in exactly those ways.
Whether or not these reports are accurate, and irrespective of the number of
cases to which they might apply, we see such uses as the natural outcome
of large-scale external testing. It becomes "high stakes"
testing, even when it is not intended to be.
While we want to be very clear about our lack of enthusiasm for
extensive, expensive, universal testing, as opposed to sample-based
assessment, we recognize the public's need for some measure of basic
student achievement that is applied in the same way to every student at a
few points in time. That is why we are recommending two province-wide
assessments to be given to all students relatively early in their
schooling, with the understanding that educators (most especially school
principals) will make it clear that the results of such assessment are to
be used by teachers, individually and collectively, for purposes of
diagnosing and remediating the individual student's difficulties or gaps
in learning. In addition, the results are to be used to enhance reporting
to parents and to examine the content and delivery of curriculum. Test results
are, most emphatically, not to be used to place or sort students for any
reason. They will serve as a central check on how effectively the
curriculum is serving the learning needs of the students, and can be an
aid in revising or refining curriculum content or teaching strategies.
We are also recommending that a test, to be given much later in a
student's school career, make the secondary school diploma a literacy
guarantee.
Assessment for early acquisition of literacy and numeracy: getting it
right from the start
We have built a learning system on a strong, early foundation. (See
Chapter 7.) We have urged that all children be helped to become literate
and numerate by the end of Grade 3. By that time, we expect that almost
all children should be able to read and understand materials appropriate
to their age, and to write on an assigned topic, or a topic of their
choice, showing reasonable understanding of conventional rules of grammar,
spelling, and punctuation, as well as an ability to bring organization and
a "voice" to their writing. As well, we expect them to be able
to use the four arithmetic operations, and to understand when to apply
them. We see the value of a check on the success of the system in
delivering a program that brings all or nearly all children to a point, by
about age 9, that enables them to build on dependable foundation skills so
that they can acquire more sophisticated knowledge and understanding. We
think that parents will also welcome conversations with their child's
teacher that include the results of this universal assessment, and a
discussion of the child's future progress.
Recommendation 50
*Therefore we
recommend that all students be given two uniform assessments at the end of
Grade 3, one in literacy and one in numeracy, based on specific learner
outcomes and standards that are well known to teachers, parents, and to
students themselves.
And, in order that these tests have high credibility in the eyes of the
public:
Recommendation 51
*We recommend
that their construction, administration, scoring, and reporting be the
responsibility of a small agency independent of the Ministry of Education
and Training, and operating at a very senior level, to be called the
Office of Learning Assessment and Accountability.
This agency will consult with provincial leaders in literacy and
numeracy education who can provide leadership in creating assessment
instruments that are as valid and reliable, as authentic and
comprehensive, as possible. We recognize that principals and teachers will
need support and assistance in interpreting and reporting the information
gained from these instruments, and would expect both the agency (through
the written material it prepares) and the Ministry to act as sources of
expertise for school boards.
The results of these tests should be reported promptly and in clear
language to parents individually, to every teacher whose students have
been tested, to the local community at the school level, and to the
general public at the board and provincial levels.
Assessment for graduation: the diploma as a literacy guarantee
The value of assessment at an early stage, such as the end of Grade 3,
is that it gives a clear indication of a child's strengths and weaknesses,
and shows where school and home efforts must be focused and monitored.
There is also value of a different kind in assessment for accountability
near the end of the student's secondary schooling: as a fundamental
guarantee, the education system must assure the public that a high school
diploma signals adult literacy; that no high school graduate is incapable
of reading and writing well enough to communicate in a post-secondary
classroom, on the job, or in order to meet the demands of everyday life as
a citizen and voter.
Recommendation 52
*We recommend
that a literacy test be given to students, which they must pass before
receiving their secondary school diploma.
The test would be given in Grade 11, the year before graduation.
Students who did not pass the first time would be able to retake the test
until they did, but graduation would be dependent on passing.
Some students who took the test the first time might find that they
needed help in order to pass, and they would have an opportunity to find
that help, and prepare again for the exam. The test would be inappropriate
for some students in specially modified programs (such as those in schools
for the severely developmentally handicapped) that do not now generally
lead to a diploma. However, we believe that it is reasonable to award a
diploma only to those who pass the literacy test.
We propose that other large-scale assessments be applied, not to
individual students, but to representative samples of students. These
would be used to judge how well the curriculum was being learned, as now
occurs in the case of provincial, national, and international assessments
in mathematics, science, and other subjects.
The effectiveness of school programs: program and examination
review
As we have seen, individual students are assessed by their teachers,
with the addition of occasional large-scale assessments, and students'
progress and achievement must be reported very regularly to parents.
Furthermore, those who are responsible for the overall quality of the
system - the provincial government and local boards - must not only ensure
that individual students are progressing, but that the curriculum is being
delivered effectively and that, on the whole, students in each grade and
subject are learning what they are expected to learn.
This is system-level monitoring of achievement. It does not involve
testing or assessing every student or every classroom but depends on
monitoring student achievement and teacher practices by testing
representative samples drawn from across the province; in addition, these
samples must be of sufficient size to provide reliable data at the
individual school board level.
In Ontario, two processes are used to accomplish those goals and both
are extremely sound approaches to system monitoring. The first of these is
the process known as the provincial reviews of curriculum, and the second
is the examination review process at the senior level, known as the
OAC/TIP program. Both have applications well beyond their present
restricted use and reporting. At present, both suffer because they are
applied sporadically, rather than systematically, across the curriculum,
and because the results are under-reported.
Provincial reviews of curriculum
From time to time, provincial reviews of a variety of elementary and
secondary courses are undertaken. In each case, the review includes
testing of a representative sample of students on the content of the
course (for example, Grade 6 reading or senior-level geography), as well
as an inspection of curriculum materials, interviews with teachers and
students, and other information that helps describe what is taught and
learned.
As a result of a provincial review, the Ministry and all school boards
have concrete information about the parts of the reading or geography
curriculum that are being successfully delivered to students and the parts
that are not, based on student performance. As well, they can identify the
kinds of resource materials that may be lacking, and the areas in which
further teacher education should be offered. These reviews are useful both
for large-scale assessment purposes and for teacher and curriculum
development. But they are scheduled sporadically and unpredictably and are
publicly under-reported. Moreover, because clear and consensual standards
are not established in advance, the results of such assessments are
sometimes questioned.
In order to build a good program for educators and make it an effective
monitoring mechanism as well, the Ministry of Education and Training
should commit to a regular review cycle in all subjects that are part of
the common curriculum, with more frequent review in the foundation areas.
Subjects should be reviewed at points within the common and specialized
curriculum; for example, a history or a geography review might occur every
five years and include Grades 6, 9 and 10/11.
Some school boards have used the provincial review to include all
students, with no individual identification attached to the test. We
applaud this concern for accountability at the local level, and consider
it very appropriate because it does not confuse individual scores with
evaluating the performance of the staff and students of an institution.
There are, of course, serious concerns about invidious comparisons that
ignore many factors over which the individual school has no control.
However, the provincial review data have been, and should continue to be,
used by schools and school boards to improve teaching and learning at the
local level. We believe that review results should be shared with the
professional staff and school governance committees of schools that
participate, as well, of course, as school board administrators
responsible for monitoring and supporting schools. That, after all, is the
level at which the data are useful for making improvements to a school.
(See the following section for a more extended discussion of this issue.)
The provincial curriculum reviews have also involved teachers as
markers, a process exactly like the one we described earlier as the ideal
professional training for classroom assessment. Working in groups, with
the support of experienced markers, teachers reach agreement on what makes
one paragraph or paper more or less satisfactory than another, and they
establish criteria for judging performance consistently. Thus, the teacher
development "spin-off" of the monitoring process is, itself, an
investment in better assessment in the classroom.
The examination monitoring process
In the 1980s the Ministry of Education began monitoring examinations
used in the Ontario Academic Courses (OACs). This process, which is
officially called the OAC/TIP (for "teacher in-service program")
was designed to ensure consistency in the quality and coverage of the exam
and the marking standards set by each teacher in every course that helps
qualify students for university. The process involves collecting and
scrutinizing examinations teachers set and the marks they award to the
students' examination papers. All publicly supported secondary schools, as
well as inspected private schools that offer university-preparatory
courses in the final year (OAC), must participate in this examination
review process. At this point, the process, which has been virtually
invisible and unreported publicly, has not been extended to any other
courses.
After surveying practices under the OAC/TIP, the Ministry of Education
and Training develops a handbook on designing and marking examinations in
a particular subject area. In-service programs inform teachers about
the contents of the handbooks, and schools submit copies of their final
examinations and scoring keys, as well as a range of test papers
representing high, average, and low scores.
An analysis of the examinations and their consistency with expected
standards enables the Ministry to judge the impact of standards; schools
that vary from them are required to take corrective action and report to
the Ministry on the steps they are taking.
University teachers are also part of this process, although their
participation has tended to be based on individual expertise, rather than
encompassing any responsibility to represent and report to the larger
university community. We suggest that, in future, universities and
colleges see their role in the process as an opportunity to present their
needs and requirements as part of the formation of standards, rather than
remaining outside of that conversation.
We further suggest that professors and instructors who teach
undergraduates in a discipline, rather than those at the professional
(faculty of education) level, take part in the process. People who will be
teaching English, geography, or other courses to first-year university and
college students are better placed to participate in decisions about
acceptable levels of performance in Grade 12, and to work with secondary
educators to help students make the transition from high school to college
or university.
To date, the OAC examination review has been conducted in several
subject areas (English language and literature, visual arts, calculus,
economics, accounting, physics, chemistry, and Français) and is currently
scheduled to add one subject per year through 1996. While it is expected
that schools or teachers will take action when a review indicates that
there are areas that require attention, implementation has not been
systematically monitored, and results have not been publicly reported.
This process, like the provincial curriculum review, is especially
worthwhile because it involves many teachers in the marking exercise, and,
thereby, expands their professional capacity for assessment. Teachers must
become more skilled at making professional judgments on the quality of
responses to questions that are not simple multiple-choice or otherwise
closed-ended. Building this kind of skill and expertise educates teachers
in consistent assessment of high-level learning.
The OAC/TIP examination process has all the elements of good assessment
and teacher development, but needs better quality control, much more
public visibility, and very considerable expansion. As a monitoring
program, it can help ensure that a teacher's application of assessment
standards is accurate and consistent; this will give increased credibility
to a system that depends fundamentally (as any school system must, and any
honest school system will readily admit) on teacher education and
expertise.
The examination review process, in combination with provincial reviews,
gives a reasonably complete picture of what is being learned, and how
fairly and consistently that is being assessed. It can and should be taken
to the next step, implementing changes in programs, teacher training, and
marking procedures, based on what is learned. Furthermore, implementation
should be monitored.
The examination review procedure should be expanded to include the full
range of Grade 12 courses. Because the process has significant potential
for helping to achieve consistency, and because we believe the process
should be transparent, it should be extended, and all results should be
reported to the public.
Without doubt, considerably expanding program and examination reviews
will involve educators in Ontario in more program evaluation than they are
accustomed to doing, and will necessitate diverting more funds to
assessment. We believe that such efforts and investments are essential; we
are convinced that they will be supported by the public, as long as they
are carefully designed and implemented, and as long as results are
clearly, promptly, and publicly communicated. We see curriculum and
examination reviews (what have been called program reviews and the OAC/TIP
model of examination review) as an important and ongoing responsibility of
the Ministry, encompassing the development of curriculum outcomes,
standards, and assessment measures or strategies, as well as the
administration, scoring, and reporting of results.
We envision a cyclic large-scale and province-wide assessment program
that:
- identifies the one or two areas (skill, subject, cross-curricular) to
be assessed for each of the next three years, with a commitment to
extend this schedule by announcing another program each year;
- is centred on established outcomes and standards for assessment that
will form the basis for judgments about students' levels of attainment,
to be shared with educators and the public for discussion;
- is based on a statistically reliable sample at the provincial level;
- will be planned and conducted by teachers and experts in assessment,
working together;
- requires each board to participate in a board-wide assessment, so
that the content and process are consistent throughout the province, and
the results comparable from one jurisdiction to another.
Recommendations 53, 54, 55
We recommend
that:
*the Ministry
continue to be involved in and to support national and international
assessments, and work to improve their calibre;
*the Ministry
develop detailed, multi-year plans for large-scale assessments (program
reviews, examination monitoring), which establish the data to be collected
and the way implementation will be monitored; report the results
publicly; and help educators and the public interpret and use the results;
*initially, and
for a five- to seven-year period, until the process is well-established in
the school system and in the public consciousness, an independent
accountability agency be charged with implementing and reporting the
Grades 3 and 11 universal student assessments. The reports and
recommendations of the Office of Learning Assessment and Accountability
would go directly to the Minister, the College of Teachers, and the
public.
The other responsibilities of the Office of Learning Assessment and
Accountability are detailed in Chapter 19.
Reporting the results of large-scale assessments
While large-scale assessments are complex and expensive, the results
they produce, and the wealth of information they contain, must be reported
in ways that can be easily understood without being trivialized. The
results achieved by Ontario students in international and national
assessments have raised public awareness and concern, particularly because
they identified some areas that need concerted attention. As we have
pointed out, however, the results have sometimes been used - and misused -
to rank Ontario in terms of other jurisdictions, but without thoughtful
consideration and interpretation of the studies themselves. While not a
simple task (it is a major challenge for the future), reporting results
understandably and usefully is vital. This is an area in which the media
also have serious responsibilities, to inform, not thoughtlessly arouse,
the public.
Although the provincial government's main interest is in the overall
state of education in Ontario, information about large-scale assessments
is more useful to parents and educators when it is available for their
particular school and school system; educators are concerned that any
potential usefulness is offset by the possible misuse of the information.
Their concerns are not unique: there have been vigorous debates in other
jurisdictions, especially where school results are reported as rankings or
"league tables" and used as simple indicators of the relative quality of
schools. Even a cursory look shows that such comparisons are totally
inappropriate, because they ignore crucial influences on student
achievement such as family socio-economic status, parental literacy, and
facility in the language of instruction. Merely ranking schools may
identify the areas in which the most privileged students live, but it does
not indicate the degree to which any school has helped its students
develop. A school's apparent success may be the result of non-school
factors, just as schools whose achievement seems modest may, in fact, be
serving students who enter with low levels of performance and improve
greatly.
The question of the value added by schools has become very heated,
raising both political and technical problems. In Britain in particular,
where school results have been published for some time, teachers rightly
point out that achievement results are inadequate measures of a school's
contribution to student learning, and some have even refused to
participate in the national testing program.
The British experience shows clearly that when the purpose of the study
is to establish the effectiveness of the school, it must include
information about contextual conditions, such as the readiness of students
to learn, the nature of instruction, and the resources available. A
statistician who has considered this problem in Britain says:
[It] is not technically possible with any reasonable
certainty to give an unequivocal ranking of schools ... it is important to
avoid the trap of supposing that the provision of some information about
schools is better than no information. The problem is that such
information will be biased and misleading.(35)
The complexity of adjusting scores appropriately, together with the
oversimplification inherent in publishing raw scores, calls into question
the usefulness of ranking schools. Britain's National Commission on
Education concluded that a single statistic was not an adequate summary
of a school's effect on the progress of students.
This is not to suggest that information about how schools are doing
should not be provided. But it does highlight the problems of making valid
comparisons between schools on the basis of simple scores, and the
importance of schools and school boards reporting their results together
with comprehensive information about their own circumstances.
The most appropriate and constructive use of school results for
comparative purposes is to look at results in the same school over time.
Barring very major changes in neighbourhood demographics (which usually
occur only over several decades), the population of a given school is
more comparable to itself over time than to that of another school. For
example, comparing the results of a student assessment in 1997 with those
of the same assessment at the same school in 1995 offers teachers and the
principal an important indicator of progress and quality. When such
comparisons are anticipated and planned for, staff have a real incentive
to develop targeted school improvement plans and to measure the next set
of results against those plans. Making schools accountable for improving,
as opposed to holding them accountable for factors beyond their control,
holds real promise of adding value and quality to existing school
practices.
To assess value added - and to gain valid insights into
whether your schools are effective - you have to compare tests or other
results over a period of time, with the same group of students.(36)
Another difficulty related to reporting is that of obtaining results of
large-scale assessments broken down by sub-groups such as gender,
ethnicity, socio-economic status, and geographic region. Although this
kind of analysis is technically possible if the information is available,
most school boards do not collect detailed demographic data on students.
Moreover, as with reporting results for individual schools, it would be
almost impossible to explain any differences found among population groups
unless a great deal of contextual information were added. Without these
breakdowns of results, however, educators cannot fulfill their
responsibility to monitor equity of outcomes.
Policy makers must accept responsibility for actively communicating with
the public about large-scale assessment results, and must work with
technical specialists who know the study and can help them explain the
results accurately to the public in many forms and forums. The major
challenge is to provide as much information as possible, accurately and
succinctly, without oversimplifying the message.
Large-scale assessment rarely provides unequivocal answers, but it does
create a context within which different interests - policy makers,
professional educators, and parents, among others - can find a basis for
informed dialogue. It can provide the foundation for debates about public
policy, and identify the general direction for making changes in emphasis
or focus. More than anything, policy makers must create a range of action
plans for responding directly to the results of the assessments.
We urge that school boards and schools be provided with direction and
training (initially by the independent accountability agency) to ensure
they are able to report results of provincially directed assessments
accurately and clearly to their respective communities, and that, when
they wish to do their own assessments, they be helped to do so, using
high-quality tools.
Recommendation 56
We recommend
that the Ministry of Education and Training, in consultation with
community members and researchers, develop a specific procedure for
collecting and reporting province-wide data on student achievement (marks,
and Grade 3 and Grade 11 literacy test results) for groups identified
according to gender, race, ethno-cultural background, and socio-economic
status.
Conclusion
Because they represent the visible products of schools, student
assessments and program reviews are key elements in the process of
education reform. The Commissioners are very conscious of the impact our
recommendations will have on curricula, instruction, teachers,
administrators, and, most of all, students. As the focus of education
moves towards raising the levels of literacies for all our students, we
can no longer rely simply on sorting and comparing students. Instead, the
Commission wants clear descriptions of whether students are achieving the
complex learning outcomes they will need to succeed in the 21st century.
__________
Endnotes (Chapter 11)
- L. Darling-Hammond, "Performance-Based
Assessment and Educational Equity," Harvard Educational Review
64, no. 1 (1994): 5-30.
- G. Wiggins, "Standards, Not
Standardization: Evoking Quality Student Work," Educational
Leadership 48, no. 5 (1991): 18-25.
- R.L. Linn, E.L. Baker, and S.B. Dunbar, "Complex,
Performance-Based Assessment: Expectations and Validation Criteria,"
Educational Researcher 20, no. 8 (1991): 15-21.
- H. Russell, C. Wolfe, and R. Traub, "Interface:
Some Cold Facts on a Hot Argument," E+M Newsletter, OISE,
no. 27 (1977).
- Philip Nagy, "National and International
Comparisons of Student Achievement: Implications for Ontario."
Report written for the Ontario Royal Commission on Learning, 1994.
- For example, a study in the United States of
both standardized science and math tests - and the tests included with
textbook series - found that close to 95 percent of their items test
memorization and quick recall, and that they almost entirely omit items
testing the higher-order functions involved in genuine problem-solving.
See C. Holden, "Study Flunks Science and Math Tests," Science 258
(October 1992): 541.
- J.R. Kirby and R.A. Woodhouse, "Measuring
and Predicting Depth of Processing in Learning," Alberta
Journal of Educational Research 40, no. 2 (1994): 148.
- W. Haney, "We Must Take Care: Fitting
Assessments to Function," in Expanding Student Assessment,
ed. V. Perrone (Alexandria, VA: Association for Supervision and
Curriculum Development, 1991), pp. 142-66.
- Howard Gardner, quoted in E.D. Steinberger, "Howard
Gardner on Learning for Understanding," School Administrator
51, no. 1 (1994): 28.
- Kirby and Woodhouse, "Measuring and
Predicting Depth of Processing."
- G. Wiggins, "None of the Above," Executive
Educator 16, no. 7 (1994): 16-17.
- Howard Gardner, quoted in Steinberger, "Howard
Gardner," p. 28.
- Clement Dassa, Jesus Vazquez-Abad, and Djavid
Ajar, "Formative Assessment in a Classroom Setting: From Practice
to Computer Innovation," Alberta Journal of Educational
Research 39, no. 1 (1993): 118.
- Vicki Hancock and Frank Betts, "From the
Lagging to the Leading Edge," Educational Leadership 51,
no. 7 (1994): 26.
- Barbara R. Signer, "CAI and At-Risk
Minority Urban High School Students," Journal of Research on
Computing in Education 24, no. 2 (1991): 189-203.
- Lauren H. Sandals, "An Overview of the
Uses of Computer-Based Assessment and Diagnosis," Canadian
Journal of Educational Communication 21, no. 1 (1992): 71. This
article lists a variety of other benefits.
- Raymond T. Chodzinski, "Teacher Strategies
for Non-Biased Student Evaluation and Program Delivery: A Multicultural
Perspective," Canadian Modern Language Review 45, no. 1
(1988): 65-75.
- Ontario, Ministry of Education, Consultative
Committee on Assessment and Program Placement of Minority Students for
Educational Equity, Equal Educational Opportunity: Student
Assessment and Placement (Toronto, 1987), p. 2.
- Child, Youth and Family Policy Research Centre,
"Visible Minority Youth Project," p. 44. Report prepared by J.
Cummins for the Ontario Ministry of Citizenship, 1989; and C. Tator and
F. Henry, Multicultural Education: Translating Policy into Practice
(Ottawa: Ministry of Multiculturalism and Citizenship, 1991), p. 20.
- M. Duran, speech given at Words-into-Action, a
symposium on race-ethnic relations in large urban school boards, York
University, Toronto, 1983, p. 67.
- See, for example, M. Cheng and A. Soudack, Anti-Racist
Education: A Literature Review, report 206 (Toronto Board of
Education Research Services, 1994), p. 39.
- See, for example, Ontario, Ministry of
Education, Consultative Committee on Assessment and Program Placement of
Minority Students for Educational Equity, Equal Educational
Opportunity; Samuel Messick, "Assessment in Context: Appraising
Student Performance in Relation to Instructional Quality," Educational
Researcher 13, no. 3 (1984): 3-8; and Ronald Samuda, New
Approaches to Assessment and Placement of Minority Students: A Review
for Educators (Toronto: Ontario Ministry of Education, 1990).
- Chodzinski, "Teacher Strategies for
Non-Biased Student Evaluation," p. 69. The list is based on work by
Ronald Samuda.
- Ontario, Ministry of Education and Training,
Changing Perspectives (Toronto, 1992), p. 20.
- Messick, "Assessment in Context," pp. 3-8.
- Enid Lee, Letters to Marcia (Toronto:
Cross Cultural Communication Centre, 1985), p. 60.
- Quoted in FairTest Examiner, National
Center for Fair and Open Testing, vol. 7, no. 4 (1993-94): 14.
- In 1980, a study by R. MacIntyre, A. Keeton,
and R. Agard found that certain diagnostic tests, such as the Bender
Visual Perception, the Wepman Auditory Discrimination Test, and the
Wechsler Intelligence Scale for Children, were not adequate to identify
learning disabilities among minority children. Quoted in Samuda, New
Approaches to Assessment and Placement, p. 7.
- Darling-Hammond, "Performance-Based
Assessment and Educational Equity," pp. 8-9.
- Ontario, Ministry of Education, Consultative
Committee on Assessment and Program Placement of Minority Students for
Educational Equity, Equal Educational Opportunity.
- Howard Gardner, quoted in Steinberger, "Howard
Gardner," pp. 26-27.
- J. Lewington and G. Orpwood, Overdue
Assignment: Taking Responsibility for Canada's Schools (Rexdale, ON:
John Wiley, 1993).
- For an extended discussion of this evidence,
see T. Toch, In the Name of Excellence (New York: Oxford
University Press, 1991).
- S. Reardon, K. Scott, and J. Verre, "Equity
in Educational Assessment: Introduction," Harvard Educational
Review 64, no. 1 (1994): 1-4.
- H. Goldstein, "Assessment and
Accountability." Brief to the United Kingdom Parliament, 1993.
- Wiggins, "Standards, Not Standardization,"
p. 17.
ISBN 0-7778-3577-0
© Copyright 1994, Queen's Printer for Ontario