Postsecondary Student Assessment
and Placement:
History, Status, Direction
Dr. Gene Kerstiens, Andragogy
Associates
This paper is not an intentionally objective,
dutifully documented appraisal of student assessment as
it obtains in postsecondary institutions. Rather, it is
an observation based upon a review of the professional literature
together with experience derived from forty years of
serving on and visiting a variety of campuses. At many of these
institutions the author was tested,
evaluated, and advised under the same conditions endured
by students, who seldom find them edifying. Without apology,
this exposure has produced a mindset that considers assessment
chiefly from the student's point of view. For those interested
in an explication of postsecondary student assessment and
placement from the perspective of consumer advocacy, read
on.
Briefly, assessment and placement as it is
most commonly practiced on college campuses today consists
principally of basic skills testing: reading, writing,
and math skills as measured on a standardized, time-critical,
objective, paper-and-pencil test battery. Scores on these
tests are used to qualify students for "college level"
courses or to direct underprepared students to some intervention
calculated to prepare them for the rigors of the postsecondary
curricular experience. To be sure, at some institutions
other measures and means are included in the assessment
package: writing samples, high school grades or class standing,
learning skills surveys, experiential credit, and learning
styles and self-esteem surveys. But a review of research
(Gabriel, 1989), as well as the latest national survey (Boylan,
Bliss, and Bonham, 1992), reveals that 96% of colleges use
scores earned on basic skills tests as the principal, if
not the sole, determinant for student placement.
Our arrival at this condition and predicament
has interesting historical roots in nineteenth-century intelligence
testing and some of the conceptual and measurement errors
related to that movement (Gould, 1981). But how we have
persisted in this tradition in spite of the preponderance
of research criticizing it and our own less than exemplary
experience while carrying on the practice (Weber, 1985; Kerstiens,
1993) is not a pretty story. Perhaps understandably it is
avoided as a treatable topic in certain graduate schools
of education that regard such ironies as sensible and inevitable
and logic as a systematic method of arriving at the wrong
conclusion with confidence.
By the turn of the century, Binet, Terman,
Spearman, and others had established the construct of general
intelligence. They came to believe that ability to learn
(IQ), especially in the academic arena, could be measured
on paper-and-pencil intelligence tests and that IQ scores
predicted success in college at least reasonably well. This
opinion survives in some quarters today, sometimes transmuted
into curious and convenient persuasions (Conner, 1989).
But by 1921, Gates was successfully selling the notion that
a more specific ability, the ability to read and comprehend
texts, best accounted for proficiencies related to success
in college. This rationale inspired a flood of reading comprehension
tests that were often used to the exclusion of other instruments
of assessment. Finally, by the late 1950s, writing and math
skills also were construed as skills necessary to college
learning. When tests measuring these
skills were incorporated with the well entrenched
reading skills requirement, the practice of basic skills
testing emerged and blossomed. During the 1970s, basic skills
batteries became regarded as the common and acceptable method
for measuring college aptitude. Allowing only slight modifications,
many academics sustain unshakable confidence in this system,
relinquishing their hold on speeded, objective, basic skills
tests only when we pry them from their cold, dead fingers.
Of course, most sobering is the fact that
these practices have flourished through the process of reification,
that is, our coming to regard a theory or construct as having
real or concrete existence. Which is to say that basic skills
objective testing is commonly equated in the public mind
and in many academics' consciousness with assessment itself,
so that testing and assessment are now blurred or even indistinguishable
constructs. And because standardized basic skills tests
are inexpensive, their administration conveniently conforms
to institutional time frames, and faculty-authored textbooks
teaching test-taking strategies have become embedded in
developmental course curricula, we have found it expedient
to remain serenely indifferent to critical research that
should discourage their employment.
Accordingly, through the years, standardized,
paper-and-pencil basic skills testing has collected popular
support as a ship collects barnacles. As the tests enjoyed
wider use, they were naturally cited more often in studies
and reports. Deft references to these tests abound in the
professional literature, especially during the last 30 years,
this frequency and duration implying respectability. Consequently,
the political correctness of this assessment practice has
become well established through its popularity rather than
any proven validity, accuracy of measurement, or track record
of efficient placement. Even normally scrutinous and skeptical
professionals have substituted their faith in group preference
for their own independent judgment based upon observation,
research, or gut feeling. It's amazing how colleagues' sentiments
are influenced when they take notice of the adoption trends
generated by their fellows.
As indicated, the movement toward this assessment
system grew, nourished by glittering advertisement claims
and practitioner endorsements, but in spite of a steady
stream of unfavorable professional literature. During the
past seven decades extensive and comprehensive research
reviews have repeatedly pointed out not only the limitations
of the method but also its debilitating effects on our student
constituency (Gates, 1921; Flanagan, 1939; Preston &
Botel, 1951; Rankin, 1962; Tillman, 1977; Stetson, 1982). These
writers indict most standardized, basic skills instruments
and, in almost unanimous agreement, make nine charges: the tests
sacrifice accuracy of response for speed of response, encourage
chance-success responses (guessmanship), discourage analytical
reasoning, unnecessarily elevate anxiety, delay feedback
of test results, rely on norm-referenced measurement, facilitate
or demand mass testing format, provide a scarcity of alternative
test forms, and promote inconvenient scheduling of test
administration. Tillman (1977) succinctly identified the
inconsistency between research findings and our assessment
practices: "Ironically, the increasing popularity of
certain tests seems to be inversely related to the negative
comments of critics" (p. 253).
Meanwhile, back in the political and popular
opinion arenas, vigorous unrest concerning assessment/placement
practices is abundantly evident. During the past
twelve years, no fewer than eight bureaus, commissions,
and councils have been appointed by the President or his
designee to study the problem, to make recommendations,
and to serve as federal assessment regulatory agencies.
The results have been unrewarding. The latest in this succession
of failures was the National Council on Education Standards
and Testing (Public Law 102-62, 1992) whose recommendations
were evaluated in Congressional Testimony as follows:
We believe that the proposed NESAC would
not be capable of evaluating the new standards and examinations
meaningfully. We see the need for an independent, non-partisan
body with sufficient expertise and credibility to evaluate
the technical qualities of alternative assessments, examine
the evidence about their feasibility and costs, monitor
the consequences of their use, and judge the comparability
of results. (Institute for Education and Training, 1992,
p. 1)
Echoing these concerns are articles in the
Chronicle of Higher Education, like George Madaus's
(1990) "Standardized Testing Needs a Consumer Protection
Agency," and the following charges leveled by Rand
researchers:
Our testing policies have failed to achieve
many of their intended positive effects, while creating
some clearly negative consequences. Initially created to
facilitate tracking and sorting of students, these instruments
were not intended to support or enhance instruction. Because
of the way in which the tests are constructed, they place
test takers in a passive, reactive role, rather than a role
that engages their capacities to structure tasks, produce
ideas, and solve problems. The tests thus exclude many kinds
of knowledge and types of performance that we expect of
students. They are inappropriate tools for many of the purposes
that they are expected to serve. (Darling-Hammond and Lieberman,
1992, B-1)
Of course, we have been invited to believe
that one reason why these tests fail to fairly evaluate
matriculating students is that they are based on norms developed
years ago when, presumably, the norming population possessed
better skills. Reinforcing this belief are countless alarming
reports in both the media and professional press about students'
declining scores. Much as investors pore over stock indexes
with frightful eagerness, we have been preoccupied with
periodic reports of disappointing fluctuations in student
scores, some of us becoming operatic about declining standards
and evangelical about reestablishing them.
But such data contribute to a distorted view
of the students we serve. Indeed, if we compare today's
average student ACT and SAT scores with those of twenty
years ago, the results are surprising.
ACT Mean Composite Score Comparison*    SAT Mean Composite Score Comparison*
(Maximum Score = 36)                    (Maximum Score = 800)

YEAR    SCORE                           YEAR    SCORE
1970    19.9                            1970    474
1990    20.6                            1990    450

* Source: WORLD ALMANAC, Scripps Howard Company,
NY, 1991, 218-219.
Since ACT and SAT tests are patently heavy
hitters in the postsecondary testing industry, these data
should provide a credible comparison of the entrance scores
of today's students with scores of students 20 years their
senior. Relative to the 1970 means, there is an increase of
0.7 of a point (about 3.5%) on the ACT and a decrease of
24 points (about 5%) on the SAT. These modest changes run in
opposite directions and would appear to offset each
other, representing overall score levels that are essentially
unchanged. Which is to say that students may be different
from what they were 20 years ago, but, as they are measured
on standardized entrance examinations that enjoy high usage,
they are no worse; or, let us say, they are just as bad.
Therefore, the declining scores scenario does not account
for the tests' mismeasurement of today's student population.
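The size of these shifts can be checked directly from the table values. The following is a minimal sketch in Python. The function name percent_change is an illustrative assumption, and the change is expressed relative to the 1970 mean, which is only one of several reasonable baselines (the maximum possible score is another).

```python
def percent_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Mean scores as reported in the table (World Almanac, 1991).
act_1970, act_1990 = 19.9, 20.6
sat_1970, sat_1990 = 474, 450

act_change = percent_change(act_1970, act_1990)
sat_change = percent_change(sat_1970, sat_1990)

# Print the raw point difference and the relative change for each test.
print(f"ACT: {act_1990 - act_1970:+.1f} points ({act_change:+.1f}%)")
print(f"SAT: {sat_1990 - sat_1970:+d} points ({sat_change:+.1f}%)")
```

Measured this way, the ACT mean rises by roughly 3.5% and the SAT mean falls by roughly 5%, small movements in opposite directions.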
Whatever the historical circumstances that
have occasioned or sustained the sad condition of postsecondary
assessment and placement, there is wide agreement that changes
need to be made. Students have voiced their protests and
have even initiated organizations designed either to facilitate
reform or abolish the entire process. At the federal level,
regulatory agencies with a firm grasp of the obvious have
registered displeasure and frustration. Few academics are
satisfied with their institution's assessment process. Face-to-face
inquiries elicit guardedly discreet responses from professors,
counselors, advisors, and administrators, most of whom confess
that their institution's assessment/placement strategies
are ineffective and probably unfair.
What can be done to improve the typical assessment
process? There are three modest measures, based upon the
best available research and experience, not really too threatening,
and certainly cost-effective, that most institutions can
apply on a given Monday morning to take significant steps
toward a solution.
First of all, those institutions employing
mandatory placement need to reconsider this policy, especially
because research supports discretionary placement. According
to the latest national survey (Boylan, Bliss, and Bonham,
1992), 57% of postsecondary institutions stated
that their placement was mandatory as a result
of assessment. However, on six success variables including
persistence, success in critical classes, and cumulative
GPA, students enrolling in colleges with mandatory placement
policies were significantly less successful than students
attending institutions allowing options. Additionally, Utterback's
(1989) exhaustive review of research, together with his
own well-designed study, lends credence to the position that
insisting on student participation in interventions based
on questionable assessment practices is not only unwarranted
but untenable.
Next, most schools need to consider augmenting
and enriching their assessment packages. While a majority
of campuses will probably continue to employ paper-and-pencil
objective basic skills testing, they might choose to include
promising alternative means and measures for assessing and
placing students. One example supporting such consolidation
stands out in the research. In their national survey, Boylan,
Bliss, and Bonham (1992) learned that 26% of institutions
incorporated learning skills inventories as a component
of their assessment system. On seven success variables including
mean first-semester GPA, persistence and success in critical
classes, and graduation rates, students enrolled in schools
utilizing learning skills inventories as part of their assessment
system were significantly more successful than students
in schools that did not. Additionally, Bliss and Mueller
(1987) learned that results on one learning skills inventory
predicted first-semester GPAs at an unprecedented .79, a
correlation high enough to engage our actuarial and statistical
attention and encourage implementation.
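One common way to read a correlation of this size is to square it, which gives the proportion of variance in the outcome that a linear predictor would account for. The short Python sketch below applies that standard interpretation to the reported figure. It is an illustration only, not a calculation reported by Bliss and Mueller.

```python
# Correlation between inventory results and first-semester GPA,
# as reported in the text (Bliss and Mueller, 1987).
r = 0.79

# Coefficient of determination: the share of GPA variance that a
# linear prediction from the inventory would account for.
r_squared = r ** 2

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```

On this reading, roughly three-fifths of the variation in first-semester GPA would be associated with inventory results, assuming the relationship is linear.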
Finally, consider adopting a computer-adaptive
test to replace the paper-and-pencil basic skills instrument
most probably in place on most campuses. Why?
First of all, computer-adaptive basic skills testing addresses
the nine most common researcher objections mentioned in
the seventh paragraph of this article. It manages to avoid
most if not all of the negative features cited by
critics of objective basic skills testing. Next, because its
format and presentation are based on item response theory,
the instrument presents a student with items of optimum
challenge rather than displaying an entire spectrum of item
difficulty that either encourages guessmanship or occasions
boredom. Finally, since test items measure research-based
proficiencies typically required of students engaging in
the college experience (College Board, 1983), test results
need not be revealed in terms of points or percentiles but
can be reported in criterion terms as levels of proficiency
and performance. Although only one computer-adaptive basic
skills instrument is presently available (College Board),
another is being prepared for marketing in the near future
(American College Testing).
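How an adaptive test keeps items at "optimum challenge" can be illustrated with a small Python sketch. It assumes the simplest item response model (the one-parameter Rasch model) and a deliberately crude update rule for the ability estimate. The item bank, the difficulty values, and every function name are hypothetical, and nothing here describes the scoring actually used by the College Board or American College Testing instruments.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (one-parameter logistic) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_information(ability: float, difficulty: float) -> float:
    """Information an item yields at this ability; highest when p is near 0.5."""
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)


def next_item(ability: float, remaining: list) -> float:
    """Choose the unused item that is most informative at the current estimate."""
    return max(remaining, key=lambda d: item_information(ability, d))


def run_adaptive_test(answer, difficulties, n_items=5, step=2.0):
    """Administer n_items adaptively; answer(difficulty) returns True or False."""
    ability = 0.0  # assumed starting estimate: the middle of the scale
    remaining = list(difficulties)
    administered = []
    for k in range(1, n_items + 1):
        item = next_item(ability, remaining)
        remaining.remove(item)
        correct = answer(item)
        # Simplified update (an assumption, not operational scoring):
        # shift the estimate by observed minus expected, with a shrinking step.
        observed = 1.0 if correct else 0.0
        ability += (step / k) * (observed - p_correct(ability, item))
        administered.append((item, correct))
    return ability, administered


# Hypothetical item bank, with difficulties on the same scale as ability.
bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

# Hypothetical examinee who answers correctly whenever difficulty <= 1.0.
ability, administered = run_adaptive_test(lambda d: d <= 1.0, bank)

for difficulty, correct in administered:
    print(f"item difficulty {difficulty:+.1f}: {'right' if correct else 'wrong'}")
print(f"final ability estimate: {ability:.2f}")
```

In this sketch the examinee never sees the four easiest items in the bank. After an opening item of middling difficulty, the selections cluster around the point where the examinee begins to miss, which is the behavior the paragraph above describes.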
Of course, a problem that has been decades
in the making is not likely to be remedied with dispatch.
Nor are mandated regulations likely to inspire an epiphany
of collective insight in an infrastructure colonized by
petty bureaucrats more interested in turf concerns than
what is right for students, the course of study they face,
and the faculty delivering instruction. Oxymoronically speaking,
it is difficult to provoke revisionist thinking among those
seeking innovation without change.
References
Bliss, L., and
Mueller, R. (1987). Assessing study behaviors of college
students: Findings of a new instrument. Journal of Developmental
Education, 11(2), 14-18.
Boylan, H., Bliss,
L. and Bonham, B. (1992). National study of developmental
education, National Center for Developmental Education,
Appalachian State University.
College Board.
(1983). Academic preparation for college: What students
need to know and be able to do. New York: The College
Board.
Conner, J. (1989).
Renee's St. George vs Binet's dragon. Journal of Developmental
Education, 13(2), 28-29.
Darling-Hammond,
L. & Lieberman, A. (1992). The shortcomings of standardized
tests. Chronicle of Higher Education, January 29,
38, B1-B3.
Flanagan, J. (1939).
A study of the effect of comprehension of varying speeds
of reading. In Research in the foundations of American
education (pp.47-50). Washington, DC: American Educational
Research Association.
Gabriel, D. (1989).
Assessing assessment. Review of Research in Developmental
Education, 6(5), 1-6.
Gates, A. I. (1921).
An experimental and statistical study of reading and reading
tests, Journal of Educational Psychology, 12, September,
October, November 1921, 303-314, 378-391, 445-464.
Gould, S. (1981).
The mismeasure of man. New York: W.W. Norton.
Institute for
Education and Training (1992). National educational standards
and testing: A response to the recommendations of the national
council on education standards and testing. Santa Monica,
CA: The Rand Corporation, 90407-2138.
Kerstiens, G.
(1993). A quarter-century of student assessment in CRLA
publications. Journal of College Reading and Learning,
25(2), in press.
Madaus, G. (1990).
Standardized testing needs a consumer-protection agency.
Chronicle of Higher Education, September 5, A-52.
Preston, R. &
Botel, M. (1951). Reading comprehension under timed and
untimed conditions. School & Society, 74, 71.
Rankin, E. (1962).
The relationship between reading rate and comprehension.
In E. Bliesmer & R. Staiger (Eds.), Eleventh yearbook
of the national reading conference (pp. 1-5). Boone,
NC: The National Reading Conference.
Stetson, E. (1982).
Reading tests don't cheat, do they? Journal of Reading,
25, 634-639.
Tillman, C. (1977).
Readability and other factors in college reading tests.
In D. Pearson & J. Hank (Eds.), Twenty-sixth yearbook
of the national reading conference (pp. 253-258). Rochester,
NY: The National Reading Conference.
Utterback, J. (1989).
Closing the door: A critical review of forced placement.
Journal of College Reading and Learning, 22(1), 14-22.
Weber, J. (1985).
Assessment and placement: A review of the research. Community
College Review, 13(3), 21-33.