Backgrounder: Methods for the Assessment of Teaching

Introduction

The Complementary Teaching Assessment Project Team (CTAPT) conducted a literature review and environmental scan of U15 current and best practices for the assessment of teaching, other than student evaluation of teaching (SET).
CTAPT proposes that the University of Waterloo adopt Teaching Dossiers (TD) and Peer Review of Teaching (PRT) as complementary methods, because they provide valuable and robust ways of documenting evidence of teaching. CTAPT recognizes that some units at UWaterloo already employ PRT or TD.
This Backgrounder outlines CTAPT’s key findings and provides an overview of current practices and a framework for implementing and supporting complementary methods for the assessment of teaching. It also includes “Fact Sheets” that describe best practices for Teaching Dossiers and Peer Review, and how to use them to assess teaching effectiveness.
For more details on CTAPT’s research findings, please consult the full report here: CTAPT Committee Report on Assessment Methods.

Core elements of an effective evaluation framework

The assessment of teaching is done for two main purposes: i) providing feedback for faculty growth, teaching development or improvement, and ii) providing evaluative information for personnel decisions (Arreola 1995: 2; Wright et al. 2014: 36).
Much of the literature advocates for an “improvement” or “growth-orientated” evaluation framework (e.g. Arreola 2007, 1995; Chism 1999; Seldin 2010, 2006, 1984; Wright et al. 2014).
A recent and thorough study by Wright et al. (2014) identifies some recurring themes in effective evaluation frameworks that align with the mission of CTAPT and the vision at the University of Waterloo:

1. A multi-faceted or complementary approach

Utilizing multiple sources of evidence (from students, peers, and the instructor) and multiple methods, such as surveys, peer observations, and teaching dossiers, increases reliability and fairness (see Berk 2014: 88; Arreola 2007; Chism 1999, 2007; Hubball and Clarke 2011; Seldin 1999, 2007).

2. A shared understanding of teaching effectiveness

This definition should be “contextual, evolving, and periodically reviewed” (Devlin and Samarawickrema 2010), consider “faculty values” and include evaluation criteria (Arreola 1995: 1; Berk 2006; Wright et al. 2014: 14).
The literature acknowledges the fact that the notion of ‘good’ or ‘effective’ teaching is ambiguous in many teaching evaluation programs.
CTAPT developed a definition of teaching effectiveness based on research conducted and consultation with the UWaterloo community. Results from the consultation survey and finalized definition are available here: CTAPT Teaching Effectiveness Survey Results – Campus Report.

3. Robust feedback cycles and support

An effective teaching evaluation process has “...robust feedback cycles that are integrated into evaluation and instructional improvement programs”, in which faculty development programs support feedback and improvement cycles (Wright et al. 2014: 5, 17).

4. Multi-level leadership and engagement

Multi-level leadership in fostering a culture that values and rewards teaching, as well as consultation and communication, are key to an effective, integrated, and multifaceted approach (Wright et al. 2014: 17, 18).

Current and recommended practices for the assessment of teaching

Although Berk (2005) reviews 12 potential sources of evidence to assess teaching effectiveness, research confirmed that most higher education institutions mainly use one or more of the following three: student evaluations of teaching (SET), self-evaluation such as teaching dossiers (TD), and peer review of teaching (PRT).
The teaching dossier is the most commonly used complementary method in Canada for summative evaluation, while peer review is more commonly used for formative purposes. A mixed-methods study that included responses from participants at 16 universities in Ontario revealed the following (Wright et al. 2014: 40, 42; Gravestock 2001):
o Summative: 82% use SETs, 50% use teaching dossiers, 29% use other self-evaluation instruments, and 20% use peer observation.
o Formative: 67% use peer observation, 43% use self-evaluation instruments, and 31% use teaching dossiers.
Effective and well-supported use of PRT or TD leads to positive outcomes, including the following benefits: it enhances a scholarly approach to teaching, reflective practice, and professional development; leads to innovations or changes in teaching practices; facilitates opportunities for dialogue and collegiality; and strengthens the validity and reliability of teaching evaluation through triangulation of evidence.¹
The main concerns with PRT and/or TD identified in the literature include time commitments, quality of feedback, and the lack of clear standards, criteria, and tools, which relate to additional concerns about validity, subjectivity, and bias.²
PRT is a valuable complementary method for providing evidence on dimensions of teaching effectiveness students are unable to assess.
A TD is an ideal complementary method to use, as components or sections of a TD align with the four dimensions of teaching effectiveness developed by CTAPT. It also provides a method for compiling and contextualizing multiple sources of evidence (i.e. from self, peers, students, and the literature) and a framework for reflecting on teaching practices holistically.
Brookfield (1995) argues that critical reflection using the four lenses (self, peers, students, theoretical literature) is what distinguishes excellent teachers. Hubball and Clarke (2011: 1) also argue for the value of using a scholarly approach to teaching in research-intensive universities, which involves consulting the literature and peers to determine and implement best practices and disciplinary approaches, then reflecting on and assessing those practices (differentiated from SoTL).

Best practices

CTAPT has developed Peer Review of Teaching (PRT) and Teaching Dossiers (TD) Fact Sheets, which summarize the key components, benefits and potential concerns, and best practices for maximizing benefits and addressing concerns.
The Fact Sheets are based on findings from the literature review, including evidence from studies reporting on the development, implementation, and evaluation of pilot programs (particularly of PRT), which offer insight on how they addressed concerns and overcame implementation challenges.
They also draw on several U15 evidence-based pilot study reports and/or guidelines (e.g. Kenny et al. 2018; Richard 2018; University of Toronto 2017a, 2017b).
CTAPT has mapped the three methods of assessment (PRT, TD, SET) to the Dimensions of Teaching Effectiveness to show where each method should be used (see Mapping Methods to Dimensions of Teaching Effectiveness Fact Sheet). The tables demonstrate how utilizing PRT and TD along with student surveys enables instructors to highlight evidence of teaching effectiveness more comprehensively and contextualize results, leading to a more valid and reliable representation of teaching.

Next steps: Phase 2 of consultations

In October 2019, CTAPT will be inviting faculty members to participate in a consultation through either an in-person session or an online survey. The purpose of this consultation is to gather input from faculty members on what they would need to implement and support PRT and TD, in terms of both using these methods and evaluating evidence from them.
Check your email for a call to participate, and please review the following Fact Sheets before participating. Thank you!

Appendix

From the Course Evaluation Project website:

The questions included on the pilot test were developed by a subcommittee of CEPT1, after a study of the research in teaching and learning and multiple rounds of consultation with the campus community. The questions included seek to measure elements of teaching and learning recognized in the research literature and at the University to be priority areas for university instruction. The questions are also designed to avoid presuming a particular style of content delivery. Phase 2 of the project modified the instrument based on input from focus groups with students in every faculty.

The pilot questions are listed below, but we do caution that the intention is not to begin another round of consultation about the wording or selection of questions. The questions may be modified, depending on the results of the pilot testing.

The first nine questions will be measured on a 5-point Likert scale (ranging from strongly disagree to strongly agree; there will also be an additional response-category, labelled: “have no basis for rating”).

The instructor identified the intended learning outcomes for this course.
The intended learning outcomes were assessed through my graded work.
The course activities prepared me for the graded work.
Graded work was returned in a reasonable amount of time.
The instructor helped me to understand the course concepts.
The instructor created a supportive environment that helped me learn.
The instructor stimulated my interest in this course.
Overall, I learned a great deal from this instructor.
Overall, the quality of my learning experience in this course was excellent.
The course workload demands were…. (scale ranging from very low to very high)

Additional questions, for analysis purposes of the pilot-test data, include the following:

What is your gender identity? (note that this can also include gender expression as it relates to your gender identity).
On average, I attend class…
In terms of an expected grade in this course, I expect to get…
For me, this course is (required or elective).

***For online courses, the following question: “On average, I attend class” will be replaced with: “On average, I engage in the prescribed weekly online work for this course…”

¹ See Barnard 2001; Bell and Cooper 2013; Chism 2007: 6; Gormally et al. 2014: 188; Iqbal 2014: 113-5; Mager et al. 2014; Schonwetter et al. 2002: 91; Seldin 2010: 43; Smith 2014; Teoh et al. 2016: 1; Thomas et al. 2014: 150.

² See Barnard 2001; Bell and Cooper 2013; Chism 2007: 6; Gormally et al. 2014: 188; Iqbal 2014: 113-5; Mager et al. 2014; Schonwetter et al. 2002: 91; Seldin 2010: 43; Smith 2014; Teoh et al. 2016: 1; Thomas et al. 2014: 150.