PhD Seminar • Artificial Intelligence • A Critical Look At Tokenwise Reward-Guided Text Generation

Monday, July 8, 2024 — 10:00 AM to 11:00 AM EDT

Please note: This PhD seminar will take place online.

Ahmad Rashid, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Pascal Poupart

Large language models (LLMs) can be significantly improved by aligning them with human preferences, an approach known as reinforcement learning from human feedback (RLHF). However, the cost of fine-tuning an LLM is prohibitive for many users. Because they bypass LLM fine-tuning, tokenwise reward-guided text generation (RGTG) methods have recently been proposed. They use a reward model trained on full sequences to score partial sequences during tokenwise decoding, in a bid to steer generation towards sequences with high rewards. However, these methods have so far been only heuristically motivated and poorly analyzed.
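As a rough illustration of what tokenwise reward guidance can look like, the sketch below reweights a language model's next-token distribution by a reward score for each candidate continuation. This is not the method of any specific RGTG paper: the function name, the `beta` weight, the `top_k` cutoff, and the additive rule (log-probability plus `beta` times reward) are all assumptions made for the example, and the reward scores are passed in as plain numbers rather than computed by a real reward model.

```python
import math


def rgtg_next_token_distribution(lm_logprobs, partial_rewards, beta=1.0, top_k=None):
    """Reweight next-token probabilities with reward scores (illustrative only).

    lm_logprobs:     dict mapping token -> log p_LM(token | prefix)
    partial_rewards: dict mapping token -> reward score for prefix + token
    beta:            hypothetical weight on the reward term
    top_k:           if set, only the top_k tokens under the LM are rescored
    """
    # Rank candidates by the language model alone, optionally keeping the top k.
    candidates = sorted(lm_logprobs, key=lm_logprobs.get, reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Assumed combination rule: log-probability plus a weighted reward.
    scores = {t: lm_logprobs[t] + beta * partial_rewards[t] for t in candidates}

    # Softmax over the combined scores, shifted by the max for stability.
    max_score = max(scores.values())
    normaliser = sum(math.exp(s - max_score) for s in scores.values())
    return {t: math.exp(s - max_score) / normaliser for t, s in scores.items()}


if __name__ == "__main__":
    lm = {"good": math.log(0.5), "bad": math.log(0.5)}
    rewards = {"good": math.log(3.0), "bad": 0.0}
    print(rgtg_next_token_distribution(lm, rewards))
```

In this toy case the language model is indifferent between the two tokens, so the reward term alone shifts probability towards the higher-scored continuation. Setting `beta=0` recovers the language model's own distribution.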

In this work, we show that reward models trained on full sequences are not compatible with scoring partial sequences. To alleviate this issue, we propose to explicitly train a Bradley-Terry reward model on partial sequences and to sample autoregressively from the implied tokenwise policy at decoding time. We study the properties of this reward model and the implied policy. In particular, we show that this policy is proportional to the ratio of two distinct RLHF policies. We show that our simple approach outperforms previous RGTG methods and achieves performance similar to strong offline baselines, but without large-scale LLM fine-tuning.
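For readers unfamiliar with the Bradley-Terry model, the sketch below shows its standard pairwise loss, the negative log of sigmoid(r_preferred - r_rejected), together with one possible way to derive partial-sequence training pairs from a full preference pair. The loss formula is the standard one; the prefix-truncation scheme in `partial_sequence_pairs` and both function names are assumptions for illustration, since the abstract does not say how the partial sequences are constructed.

```python
import math


def bradley_terry_nll(reward_preferred, reward_rejected):
    """Negative log-likelihood that the preferred item wins under Bradley-Terry.

    P(preferred beats rejected) = sigmoid(reward_preferred - reward_rejected).
    """
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(diff)) = log(1 + exp(-diff)), written to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def partial_sequence_pairs(preferred_tokens, rejected_tokens):
    """Assumed scheme: pair up equal-length prefixes of the two responses."""
    shortest = min(len(preferred_tokens), len(rejected_tokens))
    return [
        (preferred_tokens[:i], rejected_tokens[:i])
        for i in range(1, shortest + 1)
    ]


if __name__ == "__main__":
    # Equal rewards give a loss of log(2); a larger margin gives a smaller loss.
    print(bradley_terry_nll(0.0, 0.0))
    print(bradley_terry_nll(2.0, 0.0))
    print(partial_sequence_pairs(["a", "b", "c"], ["x", "y"]))
```

A reward model trained this way would be asked to rank prefixes rather than only complete responses, which is the setting the abstract describes; the actual training recipe is left to the talk.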


To attend this PhD seminar on Zoom, please go to https://uwaterloo.zoom.us/j/4834508760.

Location 
Online PhD seminar
200 University Ave West
Waterloo, ON N2L 3G1
Canada