Recent Projects

From Nuisance to News Sense

In preparation, with Ziqi Ding and Zhijin Wu

We build a novel interface to help readers make sense of news article clusters by automatically identifying, highlighting, and linking supporting evidence and contradictory statements across multiple documents. Interactive demo.

Currently Anonymized Question Answering Paper

In submission at EMNLP 2023

We seek insight into how modern QA systems work and aim to produce efficiency gains along the way.

LAIT 🥛: Layer-adjustable Late Interaction in Transformers

ACL 2023, with Tal Schuster, Annie Louis, Alex Fabrikant, Donald Metzler, and Javad Hosseini

We study the extent to which cross-segment attention is needed for multi-segment reasoning tasks. We find that partially parallel segment processing (i.e., late interaction) can enable segment caching and reduce latency without harming performance on many reasoning tasks.
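The caching idea can be illustrated with a toy sketch. It is not the LAIT architecture: `attention_layer` is a made-up, weight-free stand-in for a transformer layer, and `encode`, `independent_layers`, and the tiny vectors are all hypothetical. It only shows that when the first few layers see one segment at a time, their output can be computed once per segment and reused.

```python
import math

def attention_layer(tokens):
    """Toy self-attention (no learned weights): each vector becomes a
    softmax-weighted mix of every vector it is allowed to see."""
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) for k in tokens]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) / total
                    for i in range(len(q))])
    return out

def encode(segments, independent_layers, total_layers, cache):
    """Run the first layers on each segment separately (cacheable), then
    run the remaining layers on the concatenation (cross-segment attention)."""
    partial = []
    for seg in segments:
        key = (tuple(map(tuple, seg)), independent_layers)
        if key not in cache:
            hidden = seg
            for _ in range(independent_layers):
                hidden = attention_layer(hidden)
            cache[key] = hidden
        partial.append(cache[key])
    joint = [tok for seg in partial for tok in seg]
    for _ in range(total_layers - independent_layers):
        joint = attention_layer(joint)
    return joint

# Made-up token vectors: one shared premise, two different hypotheses.
premise = [[1.0, 0.0], [0.5, 0.5]]
hypothesis_1 = [[0.0, 1.0]]
hypothesis_2 = [[1.0, 1.0]]

cache = {}
out_1 = encode([premise, hypothesis_1], independent_layers=2, total_layers=4, cache=cache)
out_2 = encode([premise, hypothesis_2], independent_layers=2, total_layers=4, cache=cache)
```

After both calls the cache holds three entries rather than four, because the premise's independent layers were computed only once. Setting `independent_layers=0` recovers full cross-segment attention at every layer.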

Evidence Graphs and Adaptive Retrievers

In progress, with Emma Strubell.

We aim to distil the knowledge of state-of-the-art QA systems, as well as state-of-the-art document representations, into passage-level "evidence graphs," and then to use an adaptive retriever to traverse the graph efficiently. This should make various forms of question answering more interpretable, faster, and more accurate.

Transparently Synthesized Speech

With Jessica Huynh, Jiatong Shi, and Xuankai Chang

We developed a new method for voice conversion that continuously shifts between speaker identities, creating an eerie effect. The method improves speech anonymization and naturalness, and I hope to extend it to emotion and personality design in human-computer interaction.

Structure-aware Curriculum Learning

In progress, with Emma Strubell

We investigate the use of different curriculum learning strategies for fine-tuning expert-domain document representation systems. We achieve state-of-the-art results for scientific document representation, and introduce a new benchmark for legal document representation.

Measuring Consensus with Linguistic Self-similarity

In progress

We demonstrate that language models can be a powerful tool to measure echo chambers and identify cross-community ideological similarity.

Paths to Power

With Adarsh Mathew and James Evans

We performed a computational study of the behavior of "power users" on Reddit. We found that users move through the social network in strategic ways, and that as they rise to power they increasingly use their power to drive division. We suggest new strategies for taming radicalization and addressing misinformation networks online.

Aligning Multidimensional Worldviews and Discovering Ideological Differences

EMNLP 2021, with Adarsh Mathew and James Evans

We use techniques from multilingual embedding alignment to automatically discover polarized word meanings across communities, enabling a large-scale, multidimensional, and unsupervised study of worldview differences online.
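As a rough illustration of the alignment idea, the sketch below fits a least-squares rotation between two embedding spaces using a few anchor words, then ranks words by how far apart they remain after alignment. It is a 2-D, rotation-only special case of orthogonal Procrustes alignment, chosen so that it runs with the standard library alone. The vectors, the word list, and the choice of anchors are all made up, and it should not be read as the paper's actual pipeline.

```python
import math

def fit_rotation(source, target):
    """Least-squares 2-D rotation angle that maps source vectors onto target vectors."""
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)

def rotate(vec, angle):
    """Rotate a 2-D vector counter-clockwise by angle (radians)."""
    x, y = vec
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

# Made-up 2-D "embeddings" for community A.
community_a = {"tax": (1.0, 0.0), "vote": (0.0, 1.0),
               "law": (1.0, 1.0), "freedom": (0.5, -1.0)}

# Community B's space is A's space rotated by 40 degrees, except that
# "freedom" is deliberately placed elsewhere to simulate a different usage.
true_angle = math.radians(40)
community_b = {w: rotate(v, true_angle) for w, v in community_a.items()}
community_b["freedom"] = rotate((-1.0, 0.5), true_angle)

# Fit the rotation on anchor words assumed to mean the same in both communities.
anchors = ["tax", "vote", "law"]
angle = fit_rotation([community_a[w] for w in anchors],
                     [community_b[w] for w in anchors])

# After alignment, a large residual distance suggests the word is used differently.
shift = {w: math.dist(rotate(community_a[w], angle), community_b[w])
         for w in community_a}
most_shifted = max(shift, key=shift.get)
```

With real embeddings the spaces have hundreds of dimensions, and the orthogonal map is usually obtained from a singular value decomposition rather than a single angle.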

Networked Influence on Reddit

INDE 2021, with Adarsh Mathew and James Evans

Using methods from link analysis, we identify the most influential and powerful users on Reddit. Then, we explore how they move across communities over time.
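The text above does not say which link-analysis method was used. PageRank is one common choice, and a minimal power-iteration sketch of it looks like the following. The `pagerank` function, the reply graph, and the user names are all hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list [(src, dst), ...]."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / count + damping * dangling / count
                    for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical reply graph: an edge (a, b) means user a replied to user b.
replies = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
           ("bob", "ann"), ("dan", "cat")]
scores = pagerank(replies)
top_user = max(scores, key=scores.get)
```

Here `top_user` is the account that attracts the most reply weight. Tracking how such scores change across communities and time windows would be a separate step that this sketch does not cover.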

Representing Repositories in the Open Source Ecosystem

IC2S2 2021, with Yutao Chen, Deblina Mukherjee, and James Evans

Using data collected from over 28,000,000 repositories on GitHub, we built a training dataset, designed an evaluation suite (based on practical downstream tasks), and trained a set of baseline models for representing GitHub repositories. Also see work on Java vs. Python, in submission at CSCW 2022.

Creative Coding

Some art, some games, some interfaces

I love to write beautiful code that makes beautiful things! Sometimes when I get hooked on a game (like NYT's "LetterBoxed") I write a program to let me play more than once a day. Sometimes I write projects to help me visualize data. Sometimes I turn classic algorithms into art :)

Green Homesteads

Finalist (top 0.25%), The Economist's Open Future Essay Competition

I developed an economic plan to combat climate change. My "green homesteading" program would simultaneously create market incentives to promote energy farms, associate greentech with American entrepreneurship and innovation, and create a platform to address historic land inequity. You can read it here.

Simmer (formerly "Foodie")

Y Combinator 2019

Some college friends and I built an app that helped people find the best dish on every menu. I worked on the machine learning pipeline, which involved named entity recognition, sentiment analysis, and working with data labeling contractors. I also built a trend-tracking tool for restaurants. The app is defunct, but I've documented some stuff here!

Older Work

Dungeons and Dragons

At the UChicago Fab Lab

In college, I played Dungeons & Dragons with my friends. More than just playing, I loved to design worlds, characters, and even the tabletop objects for the game! At UChicago's Fab(ulous) Lab, Elise and I made artistic player tokens and laser-cut wooden architectural maps.

Programmable Turing Machine

Just for fun.

For an added layer of irony, I programmed it entirely using Elm.

Student Privacy Initiative

At the Berkman-Klein Center

As a research assistant intern at the Berkman Center, I assisted with research on privacy and EdTech. I also managed the Center's "This Week in Student Privacy" newsletter (complete with weekly cat pictures), and advised the MIT Media Lab's Lifelong Kindergarten team on communicating the remix philosophy :)

Programming in Scratch!

A walk down memory lane...

Although I started programming in Microworlds at a very young age, it was not until Scratch came about that I fell in love with the ability to build my own worlds, games, and systems with code. I attended the very first Scratch conference, and later had the privilege to work with the Scratch team.