I am a founding member at World Labs. I curate petabyte-scale datasets and train large diffusion models that power Marble, our first product, which lets users generate, edit, and export 3D worlds. My data work has also shaped the pre-training strategy of RTFM, our research on Real-Time Frame Models.
I finished my PhD in Computer Science at the University of Michigan in May 2024, advised by Justin Johnson. My PhD work centered on visual representation learning, vision-language models, and image segmentation. I am glad I got to work on these topics in academia before they became mainstream in the current wave of Gen AI products that followed ChatGPT's viral release in 2022. My thesis, titled Language Supervision for Computer Vision, is publicly available.
These days, my favorite pockets of time at work and in personal projects are spent with data. I enjoy iteratively growing web-scale datasets and increasing their quality density by hand-designing data transforms that filter or select samples to better train generative models. I like finding simple ways to source data, be it from the internet or by recording videos myself. I like vibe coding web interfaces to hand-annotate samples or to simply eyeball thousands of images. During and before my PhD, I worked on three dataset projects: nocaps, RedCaps, and COCO-ReM. For the last one, I manually inspected and refined nearly 40,000 segmentation masks to ensure high quality. Those few days were very exhausting, yet very satisfying.
In my free time, I love rickrolling all my friends.

I managed to preserve my ‘firsts’ from back in 2015 on GitHub, and I try to keep them functional for as long as I can. These are my humble beginnings.