The CEA Forum
Winter/Spring 2006: 35.1
Machine Scoring of Student Essays: Truth and Consequences. Edited by Patricia Freitag Ericsson and Richard Haswell. Logan, UT: Utah State University Press, 2006.
by Elizabeth Fleitz, Bowling Green State University
As Patricia Freitag Ericsson and Richard Haswell note in the opening to their anthology, “Machine scoring no longer has a foot in the door of higher education. It's sitting comfortably in the parlor” (4). This statement summarizes the importance of computerized essay scoring to higher education, and hints at the urgency with which the issue should be considered. The issue is urgent, Ericsson and Haswell explain, because there has been little discussion of the subject in an academic forum. The vast majority of voices that have dealt with the topic have not been instructors, students, or administrators; rather, company spokespeople dominate the field, replacing scholarly debate with sales pitches for their software. Thus, Machine Scoring of Student Essays was conceived as an anthology of academic voices on the issue, representing a variety of viewpoints on the use of automatic computer scoring in higher education.
Discussion of the use of computers and other technologies in writing instruction, however, has been more frequent. In 2004, CCCC published a “Position Statement on Teaching, Learning, and Assessing Writing in Digital Environments,” which outlines the criteria effective educational technologies should meet: hands-on use of technology, application to specific realms of a student's career or personal life, engagement of students in critical evaluation of information, and encouragement of reflective practice. It is interesting to note that while much high-quality academic scholarship has appeared in response to this position statement, the vast majority has dealt with technology and instruction rather than technology and assessment. Ericsson and Haswell's anthology attempts to fill that gap in the scholarship.
However, this perceived gap could be attributed to the fact that, simply put, there just isn't much to discuss. At first glance, the topic seems less debatable than the use of technology in teaching and learning. The 4C's statement is very clear in its disapproval of machine assessment programs for any purpose, asserting that “because all writing is social, all writing should have human readers, regardless of the purpose of the writing. [. . .] We oppose the use of machine-scored writing in the assessment of writing.” Statements as specific as these imply that the issue is black and white, and that no further debate is needed.
So one must ask: is an entire anthology discussing the use of automatic essay assessment in higher education really necessary? The 4C's statement effectively shut the door on any debate over whether the practice is useful, and this fact alone may make a potential reader wonder if the anthology isn't just overkill. On the contrary, Ericsson and Haswell's anthology does quite a bit more with the issue, going past the boundaries drawn by the 4C's statement. As the editors explain in their introductory chapter, the book is intended to begin the academic debate on assessment and technology. Not only has there been “silence” from academia, but, as mentioned before, the voices talking about machine scoring are those selling it, to more and more financially strapped colleges and universities each year (2). Most importantly, students and teachers, the people most affected by it, have been “systematically excluded” from voicing their thoughts and expert advice on the issue (6). This anthology, then, is meant to stand as the first to establish an academic forum for technology's role in writing assessment. The editors insist that the subject is not only important to discuss but urgently awaiting academic debate.
In Machine Scoring 's sixteen essays, the editors attempt to bring together voices from a variety of areas of higher education, addressing not only composition instructors but also WPAs, administrators, university presidents, admissions decision-makers, and instructors from any and all disciplines; in effect, anyone who needs to have student essays evaluated is a potential audience for this text. The anthology as a whole is quite comprehensive, including several theory-based essays as well as a series of case studies of specific software programs, among them the Intelligent Essay Assessor, ACCUPLACER, E-Write, and WritePlacer Plus. Several essays deal generally with the role of technology in assessment, while many others concentrate on a particular aspect of the topic, for instance Ericsson's essay challenging what the software companies call “meaning.” The essays work effectively together, often citing one another, and each seems to respond to and build on the work of the previous article.
Not surprisingly, each of these articles sides with the 4C's statement, although some do so to a greater degree than others. The anthology builds on that statement by contemplating and challenging the current views of the field. In their essay “Interested Complicities,” Ken S. McAllister and Edward M. White point out that the issue is much more complex than a good-versus-evil debate in which the humans are always good and the computers always evil (11). The topic is not so black and white as the 4C's statement might be taken to imply. What matters is examining the various components, the “interested complicities,” and how they work together, in order to seriously analyze the use of technology in assessment (9-10). The editors note early on that the anthology does not intend merely to counter the industry viewpoint but to complicate it: this series of essays seeks to question the publicized “truth” of machine essay grading (2).
A potential weakness of the anthology, albeit a minor one, is its repetitive essay format. A series of articles in the volume challenge industry viewpoints by analyzing specific software programs, building case studies from research and experiments with a given program. These articles are strikingly similar in methods, arguments, and conclusions. While the treatment of a single software program is undoubtedly an effective way to challenge the industry viewpoint, because the same point is made several chapters in a row, the reader may wonder whether one or two articles in this format might have been just as effective.
A particular strength of the anthology is the editors' enthusiasm for beginning the debate, and their subsequent encouragement of readers to take action by joining that conversation. Each article includes recommendations that help readers understand what role they can play in the debate, along with advice on how to think critically when deciding whether to use automated essay scoring for particular purposes. At the end of the volume, Haswell provides a comprehensive bibliography of materials in the field dating back to 1962, as well as a glossary of terms dealing with computers and assessment. In all, the anthology is an excellent introduction to the issue of technology's role in assessment, and it successfully initiates a new academic debate on the utility and validity of machine scoring. Readers will certainly feel persuaded to join that debate and to critically analyze the sales-driven voices that dominate the issue.