SURF Mentoring
Potential projects/topics: Large Language Models (LLMs) such as GPT-4 have demonstrated impressive capabilities in generating human-like text and assisting with a wide range of tasks. However, aligning them with human intentions, values, and ethical standards remains a significant challenge. As LLMs are increasingly integrated into critical applications, ensuring that these models understand and act on human intent, while avoiding harmful, biased, or unintended outputs, is paramount. This project explores interpretation techniques as a means of aligning LLMs with human goals, improving transparency, and enhancing human control over model outputs. Specifically, students will develop interpretation methods that help human users understand how LLMs make decisions, offering insight into the reasoning behind model predictions and outputs (see the illustrative sketch after this listing). Students will also develop methods to modify LLM training and decision-making processes so that model behavior consistently reflects human values, goals, and ethical standards.
Potential skills gained: Interpretable Machine Learning, Natural Language Processing
Required qualifications or skills: Python, Machine Learning
Direct mentor: Faculty/P.I.
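
Illustrative example: as a minimal sketch of what an interpretation method can look like, the Python snippet below computes gradient-based saliency scores over input tokens for a small causal language model. This is only one plausible starting point, not the project's prescribed approach; the use of the Hugging Face Transformers library and the "gpt2" model are assumptions made for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed stand-in model; any causal LM from the Transformers library works.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The capital of France is"
inputs = tokenizer(text, return_tensors="pt")

# Embed the tokens manually (detached into a leaf tensor) so gradients
# can be taken with respect to the input embeddings.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])

# Backpropagate the score of the model's top predicted next token.
next_token_logits = outputs.logits[0, -1]
top_token = next_token_logits.argmax()
next_token_logits[top_token].backward()

# Saliency: gradient magnitude per input token, a rough indicator of how
# strongly each token influenced the prediction.
saliency = embeddings.grad[0].norm(dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, score in zip(tokens, saliency):
    print(f"{token:>12s}  {score.item():.4f}")
print("Predicted next token:", tokenizer.decode([top_token.item()]))

Gradient saliency is among the simplest attribution techniques; students would likely move on to more robust methods (for example, integrated gradients, attention analysis, or probing) over the course of the project.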