Jeremiah Milbauer

My research is primarily in natural language processing and human-AI interaction, with a particular focus on human-centered evaluation of language technologies, and on building trustworthy knowledge tools. Some ongoing interests are:

Social impacts of AI-mediated information access
Algorithms and interfaces for social and scholarly sensemaking
Novel methods in computational social science & science of science
Bridging knowledge & culture gaps
Democratizing access to AI

Beyond Text: Domain Expert Needs in Document Research

[ACL Findings 2025] led by Sireesh Gururaja, with Nupoor Gandhi and Emma Strubell

Through conversations with practitioners in law, policy, and science, we explore the gap between domain expert needs and current NLP research goals.

Humanity's Last Exam

with over 600 other contributors

A large effort to write challenging questions that fool SOTA LLMs (circa 2024). My contributions focused on law, perception, and model induction via cellular automata. Happy to discuss this project, especially how to think rigorously (and when to be skeptical) about some of the claims made based on it.

Stereotype or Personalization? User Identity Biases Chatbot Recommendations

[ACL Findings 2025] with Anjali Kantharuban, Maarten Sap, Emma Strubell, and Graham Neubig

We show that large language models produce racially biased responses based on implicitly revealed information about a user's identity, and that when questioned in further conversational turns, model responses obfuscate the effect.

Inside the Echo Chamber

In progress, presented at TADA 2023

Comparing language models trained on distinct communities can reveal echo chambers and key points of linguistic divergence.

NewsSense: Reference-free Fact Verification via Cross-document Comparison

[EMNLP demo 2023] with Ziqi Ding, Zhijin Wu, and Tongshuang Wu

A novel interface to explore consensus and conflict in document clusters, powered by an automatic fact verification system for scalable pairwise NLI.

LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

[CHI case study 2025] led by Tongshuang Wu, with the rest of CMU's Human-centered NLP class

Chain of thought isn't the only framework – we explored LLMs as components of crowdsourcing pipelines.

LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction

[ACL 2023] with Annie Louis, Javad Hosseini, Alex Fabrikant, Donald Metzler, and Tal Schuster

We introduce an attention masking strategy to partially parallelize computation for multi-segment reasoning tasks, showing that simple structural changes in the network can dramatically increase computational efficiency without harming performance.

Paths to Power: How Reddit Users Rise, Fall, and Influence their Communities

[EMNLP 2021] with Adarsh Mathew and James Evans

We performed a computational study of the social trajectories of "power users" on Reddit, tracing their movement across communities and characterizing the content they write. We observe a correlation between a user's rise to power and increasingly divisive content.

Aligning Multidimensional Worldviews and Discovering Ideological Differences

[EMNLP 2021] with Adarsh Mathew and James Evans

We align unsupervised word embeddings to automatically discover polarized word meanings across communities, find unexpected conceptual homomorphisms, and enable future studies of ideological and worldview differences in text.

Networked Influence on Reddit

[INDE 2021] led by Adarsh Mathew, with James Evans

Using methods from link analysis, we identify the most influential and powerful users on Reddit.

Representing Repositories in the Open Source Ecosystem

[IC2S2 2021] with Yutao Chen, Deblina Mukherjee, and James Evans

Using data from over 28M GitHub repositories, we train and evaluate repository representation vectors based on social, semantic, and structural features of the codebases.

Green Homesteads

For The Economist, shortlisted finalist for the Open Future Essay Contest

A plan to subsidize the next wave of green energy investment while simultaneously associating green energy with the spirit of American innovation and entrepreneurship. [Press] [Podcast]

Student Privacy Initiative

at the Berkman Center at Harvard University

I assisted with research on privacy and EdTech, managed the Center's "This Week in Student Privacy" newsletter (complete with weekly cat pictures), and advised the MIT Media Lab's Lifelong Kindergarten team on communicating the remix philosophy :)

In memoriam

Here lie some of the unfinished (but interesting) or unpublished projects of yore.

Sentence-level Reranking & Filtering for Fusion in Decoder

with Moshe Berchansky, Peter Izsak, and Emma Strubell

We developed a technique for sentence-level reranking and filtering of evidence passages, based on a signal of which sentences provide sufficient information to the FiD reader model. Ultimately, RECOMP arrived at the same algorithm, applied to LLM RAG.