Artificial intelligence in Math assessment

With the growing presence of generative artificial intelligence (genAI), educators are wondering how it will influence their assessments and course design going forward.

There are many webpages, memos and resources from the university on genAI. The aim of this one is to present the information most relevant to mathematics instructors, keeping in mind the most common delivery modes and styles of assessment in the faculty.

As a note: some sections of this page refer to ChatGPT specifically instead of generative artificial intelligence more generally. Much of the research currently being done on genAI is focused on studying ChatGPT, but general principles continue to be true across models.

The information on this page was last updated in Winter of 2024.

“Should my students be allowed to use generative AI?”

There is no universally applicable answer to this question. The choice to incorporate or discourage use of genAI on assessments depends on many factors, including but not limited to the type of assessment, the goals of the course, and whether use of genAI may be an important skill for students in their future workplace.

After considering these details, you may wish to adapt assessments to better align with how you wish students to use (or not use) genAI.

| Assessment type | Ways to incorporate use | Ways to discourage use |
| --- | --- | --- |
| Tests/exams | Questions where students have to critique a genAI output | Generally not a concern if in-person/proctored |
| Online quizzes | Questions where students have to identify errors in a genAI output | • Remind students that they will later have to do these types of questions without AI assistance, so practice is important • Lower stakes quizzes and flexible grading (drop lowest x) may encourage students to focus on learning/self-assessment rather than the grade |
| Assignments | • Use AI to improve or change the tone of writing, e.g. for data analysis • Use AI to do a first draft of documenting code or turning it from one language into another | • Include reflective questions, e.g. how students have progressed towards goals • Reference course-specific content (lectures, discussion boards, prior assignments) |
| Projects | • Encourage AI use for outlines or organization • Allow students to use AI provided they include the prompts they used in the appendix of the report | • Include an oral presentation/poster session where students can answer questions about their work • Use a scaffolded approach to show process and allow for feedback throughout • Include a question/section on reviews of other students’ projects |
| In-class activities | Show examples of genAI outputs (emphasize capabilities/limitations) | • Generally not a concern for instant activities (e.g. clicker questions) • For activities involving group work where students can consult resources, encourage students to focus on the learning aspect rather than just getting an answer |
| Self-study | Encourage students to ask AI to explain course concepts or create sample questions to practice | Warn students that the explanations may be convincing but fundamentally incorrect, leading to confusion |
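
For instructors who want to try the "drop lowest x" grading mentioned for online quizzes above, the calculation can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `quiz_average`, the parameter `drop_lowest`, and the sample scores are hypothetical, and it assumes all quizzes are equally weighted and marked out of the same total.

```python
def quiz_average(scores, drop_lowest=1):
    """Return the mean quiz score after dropping the lowest `drop_lowest` scores.

    Assumes every quiz has equal weight and the same maximum mark.
    """
    if drop_lowest < 0:
        raise ValueError("drop_lowest must be non-negative")
    if drop_lowest >= len(scores):
        raise ValueError("at least one score must be kept")
    # Sort ascending and discard the lowest `drop_lowest` entries.
    kept = sorted(scores)[drop_lowest:]
    return sum(kept) / len(kept)


# Hypothetical quiz scores, each out of 10.
scores = [4, 9, 7, 10, 6]
print(quiz_average(scores, drop_lowest=0))  # 7.2 (no quiz dropped)
print(quiz_average(scores, drop_lowest=1))  # 8.0 (the score of 4 is dropped)
```

In this made-up example, one weak quiz no longer pulls the average down, which is the intended effect of lowering the stakes of any single quiz.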

In general, assessments that are more specific to course content or activities are harder to complete with AI. For example, if students are asked to reflect on something that happened in class, or comment on fellow students' presentations, they must go through the process of reflecting themselves (even if they eventually feed their thoughts into an AI to be rewritten).

As well, it is important to consider the purpose of an assessment. Another question you might ask yourself is “If students did use AI to complete some or all of this assessment, would they still be honouring the purpose of the assessment?” We see this often in mathematics with other technologies, where, for example, software such as Wolfram Alpha will be allowed or encouraged on assessments that focus on interpretation of results, but not on computation-focused assessments.

Common questions

Is the information provided by ChatGPT accurate?

ChatGPT presents its information with confidence, no matter its accuracy. It has been known to invent/‘hallucinate’ facts and sources, so asking it to justify its claims is often a futile exercise (Alkaissi & McFarlane, 2023).

Because of the way it generates answers, it may contradict itself, even when given an identical prompt. It is important to emphasize this to students. In particular, if you allow or encourage them to use a chatbot as a tutor/“study buddy”, they may be led to believe that this is an accurate and sufficient method of studying. It can be especially difficult for a new learner to identify when ChatGPT is leading them astray, as it can encourage a “flow state” of learning and conversational exchange that results in less critical examination of information (Stojanov, 2023).

Can generative AI accurately complete math work?

The current abilities of ChatGPT are varied. Since it is a large language model, it tends to be better at written tasks. Its ability to complete assessments will depend largely on the type and delivery of content. Among math subjects, it has been observed to be better at programming as compared to mathematical exercises (Kortemeyer, 2023; Nikolic et al., 2023).

The abilities of ChatGPT continue to evolve, so it is not recommended to attempt to make “AI proof” assessments designed around its current limitations. It is important to keep a focus on the learning outcomes of assessments as our understanding of new technologies continues to change.

Can I detect when students are using AI?

While it is important to understand when and in what ways students may be using AI, such use can be hard to detect. AI detection tools exist (typically combined with established plagiarism-checking software), but their accuracy varies and has been found to be worse than the companies that develop them promise (Perkins et al., 2024). Some AI detectors have also been shown to report significantly more false positives when analyzing the work of non-native English speakers, with an average false positive rate over 60% across the 7 detectors trialed (Liang et al., 2023).

The University recommends against using these tools. As well, their results may not be considered sufficient to support an alleged academic integrity violation.

If you suspect students may have used artificial intelligence in an assessment, you are welcome to discuss it with them and allow them to further explain their work.

How can I minimize the ethical concerns and environmental impact of AI in my courses?

There is a lot that continues to be unknown about new technologies as we study their capabilities and the effects they will have on education and the world more broadly.

This question could fill a webpage of its own, but as a starting point: a relatively well-rounded introduction to this topic of concern is this YouTube video, a talk given by Dr. Serge Stinckwich, Head of Research at the United Nations University Institute in Macau, China. He addresses potential environmental risks (starting here) as well as technological and societal concerns.

What are the potential benefits of allowing the use of AI?

Depending on the learning goals and structure of the course, there may be reasons why you choose to allow students to use genAI. As outlined in the left column of the table above, you can make use of genAI in ways that support and further learning. You may also choose to allow the use of genAI to avoid having to police or detect its usage, though this too comes with its own challenges to be aware of.

As students enter the workforce, they may be expected to be proficient in using generative AI and other new tools. You may wish to incorporate genAI into your courses to increase students' practical AI knowledge.

What are employers looking for in terms of AI skills?

When discussing the potential benefits of using genAI, we must acknowledge the benefits that AI literacy can have for students outside of their time in the classroom. Some employers have begun to use genAI as part of their work process. This is still very new, and so the degree to which it may be used depends largely on individual employers.

As with any usage, we must remember that genAI is a tool, and human judgements remain key in the workplace. For more discussion, consider reading Hire Waterloo's Five ways AI will change the future of work or Embracing new technology.

You may wish to consider using genAI in the classroom in ways similar to how employers use it, to prepare students for their co-op placements or future employment. When considering this, important questions include: What might students be expected to be able to do? How would this fit into your current course design? How does this support existing learning goals?

How can instructors incorporate AI into their own processes?

Much of the discussion surrounding AI usage in education centres on students’ use of AI or academic integrity violations. We must also acknowledge that instructors may wish to incorporate these new technologies into their own practices.

If you are considering this, we encourage you to be transparent with your students about your usage. Depending on how and when you decide to use it, students may feel that they are being held to a different standard than instructors. How you navigate this will depend greatly on the extent to which you expect to use and regulate AI.

Some practical suggestions for use include:

  • Ask for discussion prompts or clicker questions to use for in-class activities
  • Create personalized case studies for student groups, tailored to student interests
  • Generate a starting summary of a topic as a springboard for research and discussion – identify what it has missed

Some ideas above are from the Associate Vice-President, Academic on Artificial Intelligence at UW. More can be found in the table above, though it is primarily focused on student use.

It is important to familiarize yourself with the concerns and regulations around citation, transparency, privacy, and security. Instructors in particular should be aware that inputting content (course materials, student work, etc.) into these tools may raise intellectual property concerns. Tools may treat inputs differently, but they are generally not upfront about how your inputs may be stored or used. Again using ChatGPT as the most common example: according to the terms of use of OpenAI (the company responsible for ChatGPT), they “may use Content [defined as both input to and output from their Services] to provide, maintain, develop, and improve our Services” (OpenAI, n.d.).

Policy

In line with the recommendations from the Office of Academic Integrity, we encourage instructors to be explicit about their AI policies—how and when can it be used, how should it be cited, etc. In all cases, the emphasis should be on responsible, transparent, and ethical usage.

Out of the few math course syllabi in Fall 2023 that specify policies concerning genAI usage, 61% do not allow its use and 18% allow or encourage students to use AI. The remaining 21% make up the middle ground; these policies tend to be very individual depending on the course. Examples include AI being allowed or encouraged only for self-study (as a "24/7 tutor"), or being limited to at most 10% of a submission.

University regulations and recommendations

The university encourages instructors to be explicit about their AI use policies. To support this, it provides templates for course outlines, which can also be easily added to course outlines created using outline.uwaterloo.ca.

The university does not recommend using "surveillance or detection technology", and evidence from AI detection tools is not considered credible enough to support an alleged academic integrity violation. See "Can I detect when students are using AI?" for more discussion.

Further resources

University resources

Useful papers for further reading

| Title | Author (Date) | Summary |
| --- | --- | --- |
| ChatGPT versus engineering education assessment: a multidisciplinary and multi-institutional benchmarking and analysis of this generative artificial intelligence tool to investigate assessment integrity | Nikolic et al. (2023) | Course- and assessment-specific breakdowns of the abilities of GPT-3: strengths, weaknesses, and ‘opportunities’ for new assessment design (As a starting point, see pp. 569-572 on introductory math and programming courses) |
| Learning with ChatGPT 3.5 as a more knowledgeable other: an autoethnographic study | Stojanov (2023) | Interacting with ChatGPT as a tutor. Found it to be supportive and easy to use, but incorrect and contradictory in subtle ways that often went unnoticed. Led to overestimating knowledge and understanding |
| Could an artificial-intelligence agent pass an introductory physics course? | Kortemeyer (2023) | Studying ChatGPT abilities regarding an introductory physics course, including comparisons between the behaviour of ChatGPT and beginning learners |

References

University-specific information can be found under University regulations and recommendations and University resources, where all relevant webpages are linked.


Alkaissi, H., & McFarlane, S. I. (2023). Artificial hallucinations in ChatGPT: Implications in scientific writing. Cureus, 15(2), e35179. https://doi.org/10.7759/cureus.35179

Dahlkemper, M. N., Lahme, S. Z., & Klein, P. (2023). How do physics students evaluate artificial intelligence responses on comprehension questions? A study on the perceived scientific accuracy and linguistic quality of ChatGPT. Physical Review Physics Education Research, 19, 010142. https://doi.org/10.1103/PhysRevPhysEducRes.19.010142

Kortemeyer, G. (2023). Could an artificial-intelligence agent pass an introductory physics course? Physical Review Physics Education Research, 19, 010132. https://doi.org/10.1103/PhysRevPhysEducRes.19.010132

Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. Patterns, 4(7), 100779. https://doi.org/10.1016/j.patter.2023.100779

Nikolic, S., Daniel, S., Haque, R., Belkina, M., Hassan, G. M., Grundy, S., Lyden, S., Neal, P., & Sandison, C. (2023). ChatGPT versus engineering education assessment: A multidisciplinary and multi-institutional benchmarking and analysis of this generative artificial intelligence tool to investigate assessment integrity. European Journal of Engineering Education, 48(4), 559-614. https://doi.org/10.1080/03043797.2023.2213169

Perkins, M., Roe, J., Postma, D., McGaughran, J., & Hickerson, D. (2024). Detection of GPT-4 generated text in higher education: Combining academic judgement and software to identify generative AI tool misuse. Journal of Academic Ethics, 22, 89-113. https://doi.org/10.1007/s10805-023-09492-6

Stojanov, A. (2023). Learning with ChatGPT 3.5 as a more knowledgeable other: an autoethnographic study. International Journal of Educational Technology in Higher Education, 20, 35. https://doi.org/10.1186/s41239-023-00404-7

OpenAI. (n.d.). Terms of use. Retrieved March 25, 2024, from https://openai.com/policies/terms-of-use