Hanjie Chen

WEBSITE(S) | GitHub

SURF Mentoring

Potential projects/topics: Large Language Models (LLMs) such as GPT-4 have demonstrated impressive capabilities in generating human-like text and assisting with a wide range of tasks. However, their ability to align with human intentions, values, and ethical standards remains a significant challenge. As LLMs are increasingly integrated into critical applications, it is paramount to ensure that these models understand and act in ways that reflect human intent while avoiding harmful, biased, or unintended outputs. This project explores the use of interpretation techniques as a means of aligning LLMs with human goals, improving transparency, and enhancing human control over their outputs. Specifically, students will develop interpretation methods that allow human users to better understand how LLMs make decisions, offering insights into the reasoning behind model predictions and outputs (see the sketch below for one simple example of such a method). Additionally, students will develop methods to modify LLM training and decision-making processes so that the models' behavior consistently reflects human values, goals, and ethical standards.
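As one illustrative starting point, the sketch below shows a simple gradient-based input saliency method for a causal language model, which scores how much each input token influenced the model's next-token prediction. This is a minimal example only, assuming the Hugging Face transformers library and the small GPT-2 model; it is not a prescribed method for the project, and students would be expected to go well beyond it with more sophisticated interpretation techniques.

    import torch
    from transformers import GPT2Tokenizer, GPT2LMHeadModel

    # Load a small causal LM purely for illustration (assumption: GPT-2).
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    text = "The movie was surprisingly"
    inputs = tokenizer(text, return_tensors="pt")

    # Embed the tokens ourselves (as a detached leaf tensor) so we can take
    # gradients with respect to the input embeddings.
    embeddings = model.transformer.wte(inputs["input_ids"]).detach().clone()
    embeddings.requires_grad_(True)

    outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])
    next_token_logits = outputs.logits[0, -1]   # logits for the next-token prediction
    predicted_id = next_token_logits.argmax()

    # Backpropagate the predicted token's logit to get a gradient per input
    # embedding; the L2 norm of each token's gradient is a crude saliency score.
    next_token_logits[predicted_id].backward()
    saliency = embeddings.grad[0].norm(dim=-1)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    for token, score in zip(tokens, saliency.tolist()):
        print(f"{token:>12s}  {score:.4f}")
    print("Predicted next token:", tokenizer.decode(predicted_id))

Higher saliency scores indicate tokens whose embeddings most strongly affect the predicted logit; comparing such attributions against human intuitions is one concrete way to begin probing how an LLM arrives at its outputs.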

Potential skills gained: Interpretable Machine Learning, Natural Language Processing

Required qualifications or skills: Python, Machine Learning

Direct mentor: Faculty/P.I.

Research Areas

Hanjie is broadly interested in Natural Language Processing, Interpretable Machine Learning, and Trustworthy AI. Her research focuses on understanding the properties, mechanisms, and capabilities of neural language models, enabling their alignment, interaction, and collaboration with humans, and enhancing their impact on real-world applications such as medicine, healthcare, sports, and more. By developing explainable AI techniques, she aims to make intelligent systems controllable by system developers, accessible to general users, applicable to various domains, and beneficial to society.