Automated Assessment and Feedback of Computer Programming Assignments

Photo of project team members
Grant recipients:

William Bishop
George Freeman

Department of Electrical and Computer Engineering


(Project timeline: May 2013-January 2016)

Questions Investigated

The goal of this research was to develop an automated assessment tool to provide quick and reliable feedback on programming assignments and to investigate the effectiveness of the tool in a first-year course taught using the C# programming language.  Unlike existing assessment tools that assess only the functional behavior of a computer program, the assessment tool was designed to analyze and provide qualitative feedback on programming style (e.g., suitable choice of variable names, proper indentation and consistent spacing, appropriate use of comments, etc.).  Using this tool, students could receive timely feedback not only on the functionality of their programs but also on the quality of their coding style.  Without this tool, students often had to wait a week or two for a teaching assistant to grade their work and provide comments on their programming style.  The tool allows for the possibility of multiple automated feedback cycles before a final submission deadline so that students might learn from their mistakes without fear of an associated low grade.

Findings/Insights

The tool faces three technical challenges: (1) transferring program information from the student to the tool and transferring feedback from the tool to the student; (2) compiling and running each student’s solution against a series of functional tests; and (3) analyzing each student’s solution against style guidelines.  We determined that the first activity can be accomplished easily through a variety of mechanisms (e.g., e-mail, uploading and downloading via web services, etc.) but is difficult to automate within the context of the learning management system used at the University of Waterloo.  We had significant experience with the second activity through many years of automated functional testing of a particularly complex programming assignment in the course.  We chose to focus our efforts on the third activity: automated programming style analysis.
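As an illustration of the second activity only, a minimal sketch of running one functional test is shown below; the executable path, test input, and expected output are hypothetical placeholders rather than the actual harness used in the course.

    using System;
    using System.Diagnostics;

    class FunctionalTestRunner
    {
        // Run a compiled student program with one test input and compare its
        // output to the expected output.  Paths and test data are hypothetical.
        static bool RunTest(string exePath, string input, string expectedOutput)
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = exePath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                UseShellExecute = false
            };

            using (var process = Process.Start(startInfo))
            {
                process.StandardInput.Write(input);
                process.StandardInput.Close();
                string actual = process.StandardOutput.ReadToEnd().Trim();
                process.WaitForExit();
                return actual == expectedOutput.Trim();
            }
        }

        static void Main()
        {
            bool passed = RunTest(@"Student01\Assignment1.exe", "2 3\n", "5");
            Console.WriteLine(passed ? "Test passed" : "Test failed");
        }
    }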

During the spring 2013 term, we employed a co-op student who helped design and build a proof-of-concept system for style analysis by extending an existing tool called StyleCop.  We were unable to achieve the desired interpretation of identifier names, in part because StyleCop performs only a surface decomposition of the C# program.  This approach was not sufficient to determine whether an identifier was used as a property name (which should name a characteristic, i.e., a noun) or a method name (which should name an action, i.e., a verb).  Style analysis around the use of white space suffered similar limitations.  Analysis of comments (other than their presence or absence) is a much more difficult problem and was not attempted here.  Good comments and semantically good identifier names rely on some knowledge of the problem being solved beyond what is apparent from the submitted code alone.  We partly addressed this issue for identifiers by manually creating dictionaries of good name choices as each particular assignment was processed.
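A rough sketch of that dictionary idea is shown below; the identifier names are invented for illustration, and the extraction of identifiers from a submission is assumed to happen elsewhere.

    using System;
    using System.Collections.Generic;

    class IdentifierDictionaryCheck
    {
        static void Main()
        {
            // Hand-built, assignment-specific dictionary of good name choices.
            var goodNames = new HashSet<string>
            {
                "heartRate", "patientCount", "averageRate", "ReadHeartRate"
            };

            // Identifiers extracted from a student submission (extraction not shown).
            string[] submitted = { "heartRate", "x1", "temp", "patientCount" };

            foreach (string name in submitted)
            {
                if (!goodNames.Contains(name))
                    Console.WriteLine($"Consider a more descriptive name than '{name}'.");
            }
        }
    }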

It is worth noting that automated testing (for functional assessment, style assessment, or both) of an assignment requires significant customization by someone highly trained in the workings of the automated system.  The time and effort required to customize the tool for a particular assignment is a barrier to the tool’s use.  We discovered that such a tool only makes sense in the context of a course with a very large enrollment.  While teaching assistants might help with this customization, the task would more likely fall to the course instructor or a full-time staff member assisting with the course.  Arguably, this issue discourages the regular updating of assignments, which could lead to other problems.

During the fall 2013 term, our system was tested on students’ assignment submissions for the course ECE 150 (Fundamentals of Programming), where it showed reasonable alignment with the feedback provided by experienced teaching assistants.  Student assignments were assessed by both the tool and the teaching assistants, and the resulting feedback and grades were compared to determine the effectiveness of the assessment tool.  The automated assessments were found to be much more consistent and timely than the human feedback.  These findings were documented in a work report [1] by one of the teaching assistants for the course, who managed the testing and comparison.

As this project was coming to a close, a Microsoft project called Roslyn culminated in both an open-source release of its C# version 6 compiler (itself written in C#) and a set of libraries for accessing that compiler as a service from within other programs.  This effectively removed the limitations we observed when working with StyleCop.  For example, we could now easily automate a check that each method name resembled an action verb.
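A minimal sketch of that kind of check is shown below, assuming the Microsoft.CodeAnalysis.CSharp (Roslyn) libraries are referenced; the verb list and command-line handling are our own illustrative assumptions, not the actual checker.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;

    class MethodNameCheck
    {
        // Small illustrative verb list; a real checker would use a fuller dictionary.
        static readonly HashSet<string> KnownVerbs = new HashSet<string>
        {
            "Get", "Set", "Read", "Write", "Print", "Compute", "Find", "Load", "Save"
        };

        static void Main(string[] args)
        {
            string source = File.ReadAllText(args[0]);
            var tree = CSharpSyntaxTree.ParseText(source);

            foreach (var method in tree.GetRoot().DescendantNodes()
                                       .OfType<MethodDeclarationSyntax>())
            {
                string name = method.Identifier.Text;

                // Take the leading PascalCase word of the method name.
                string firstWord = new string(
                    name.TakeWhile((c, i) => i == 0 || !char.IsUpper(c)).ToArray());

                if (!KnownVerbs.Contains(firstWord))
                    Console.WriteLine($"Method '{name}' may not start with an action verb.");
            }
        }
    }

A parallel check for property names (expecting noun-like names) could follow the same pattern using PropertyDeclarationSyntax.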

In this study, the automated feedback was not returned to students, so no survey was conducted regarding student perceptions of the assessment tool or its application.  Such a survey was planned for the fall 2014 offering of ECE 150 but was abandoned due to a necessary teaching reorganization in the department, which resulted in the principal investigator moving away from teaching programming altogether and the second investigator needing to design the first offering of a similar course in a different program.  Over the next two years, this led to the biggest insight of the project: although our tool was successful, we were likely attempting to automate something not very well aligned with student learning.

In the fall of 2014, the second investigator taught both ECE 150 and BME 121 (Digital Computation).  The latter was the new course, with roughly the same core content as ECE 150 but with a biomedical focus and a class size of 42 rather than 420 students.  The first bit of serendipity was that he was assigned an outstanding teaching assistant for BME 121 with a passion for teaching and learning.

The second investigator had also been accumulating, for several years, ideas from the scholarship on teaching and learning, particularly around student self-efficacy, threshold concepts, participatory learning, MOOCs, and outcomes alignment.  Although the time pressure of both investigators designing new courses didn’t allow us to extend trials of our new style checker, the small class size of BME 121 allowed the second investigator to observe student learning very closely, to experiment with structural changes at low risk, and to start applying better teaching and learning strategies.  The biggest change in 2014 was to conduct the midterm and final exams online.  The second bit of serendipity was that this change caused the submission, marking, and feedback of every assignment and test question in the course to follow the same procedure.  This prompted us to think more about how students see assignments and marked feedback, and about the processes by which the instructor handles files for marking.

The third bit of serendipity happened in August 2015.  With the release of Windows 10 and an upgrade to Office 365, the second investigator found himself with 1 TB of cloud storage on his desktop (Microsoft OneDrive).  While plodding through the process of setting up the learning management system, he received an e-mail offering free cloud services (Microsoft Azure), including virtual machines and web servers.  After a few days of experimentation, he was able to set things up so that students in BME 121 accessed all course materials via OneDrive and submitted files for marking via a website and web service hosted in Azure, through which those files also made their way to OneDrive.  This suddenly removed all limitations around file types and around automation of any file-handling activity, massively improving the first basic activity, the one we had initially ignored.
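A very rough sketch of the submission path is shown below, assuming a self-hosted HTTP listener and a drop folder synchronized by the OneDrive client; the folder, port, and query parameters are hypothetical, and this is not the actual Azure-hosted service.

    using System;
    using System.IO;
    using System.Net;

    class SubmissionService
    {
        static void Main()
        {
            // Hypothetical drop folder kept in sync by the OneDrive client.
            string dropFolder = Path.Combine(
                Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
                "OneDrive", "BME121", "Submissions");
            Directory.CreateDirectory(dropFolder);

            // On Windows, listening on this prefix may require a URL reservation.
            var listener = new HttpListener();
            listener.Prefixes.Add("http://localhost:8080/submit/");
            listener.Start();
            Console.WriteLine("Waiting for submissions...");

            while (true)
            {
                HttpListenerContext context = listener.GetContext();

                // Student ID and file name are assumed to arrive as query parameters.
                string student = context.Request.QueryString["student"] ?? "unknown";
                string fileName = Path.GetFileName(
                    context.Request.QueryString["file"] ?? "submission.cs");

                // Save the uploaded file into the synchronized folder.
                string path = Path.Combine(dropFolder, student + "-" + fileName);
                using (var output = File.Create(path))
                {
                    context.Request.InputStream.CopyTo(output);
                }

                byte[] reply = System.Text.Encoding.UTF8.GetBytes(
                    "Received " + fileName + " from " + student + ".");
                context.Response.OutputStream.Write(reply, 0, reply.Length);
                context.Response.Close();
            }
        }
    }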

Returning to the question of assignment marking and feedback, we made the following observations.  The grading scheme is already structured to make it impossible to pass the course by cheating on assignments, so it doesn’t matter whether assignment grades are generous or stingy.  Students are hugely motivated by assignment grades, so having them is important.  Most students need no feedback at all, since it is fairly obvious when the solution to a programming assignment (at this level) works properly.  Students most in need of feedback need it while they are doing the assignment, not after it is completed.  Good style is a secondary concern if the student is struggling with syntax and semantics.  Otherwise, students seem receptive to style encouragement or drift naturally to a good-enough style of their own.

This led to many changes in BME 121 for the fall 2015 term, with the changes focused on new assignments.  First, we made it an explicit course objective that students understood they were being given full trust and responsibility for their own learning.  Small incentives were added for helping classmates learn.  The course was moved partly in a flipped-classroom direction by removing most presentation material from tutorials and labs, leaving all of that time for directed experiential learning, including work on assignments.  Students were offered any amount of assistance they required from the teaching team to complete any assignment, up to and including walking through a full solution or extending deadlines (both very rarely needed).  The assignments were shifted from backward-looking (applying previously learned concepts) to forward-looking (requiring material not yet taught).  With the ease of tracking, individual students having trouble were identified early and invited for extra help.  Learning outcomes, particularly as measured by confidence in answering final exam questions, seemed significantly improved in comparison with the previous year.

In summary, the project wound a peculiar path to its findings.  We had mild success on the hard technical problem of automating style feedback.  Serendipitous events led to significant improvements in the part we first thought trivially unimportant, and to a realization that automated feedback was probably less useful than careful design of assignments around learning objectives and student motivations.

Dissemination and Impact

At the individual level, a work term report [1] was published by Saad Ilyas, a co-op student, on the development of the assessment tool for style analysis.  Additional dissemination of the findings related to BME 121 is still a possibility. 

The largest impact from this research was observed in BME 121, starting in the fall 2015 term.  For the second investigator, this was the first time in about 25 offerings of similar courses over about 30 years that every student submitted a viable solution to every assignment and test question in the course.  Although basic metrics such as average or median grade did not change much, students seemed more self-confident during the final exam, and we observed a very large increase in the number of students producing essentially perfect answers to all questions (about 1/3 of the class compared with the more typical 1/25 we would expect based on all previous offerings).

References

[1]    Saad Ilyas, “Analysis of Style Assessment in ECE 150”, ECE Work Term Report, Waterloo, Ontario, Canada, January 13, 2014.
