CS Tea: Laurel Orr '13
Entity-Centric AI: Integrating Structured Data into ML Pipelines
Industrial machine learning pipelines are experiencing a paradigm shift from customized architectures and hand-curated features to self-supervised model ecosystems, where models are trained without manual labels and adapted to hundreds of downstream tasks. An increasingly important component of these pipelines is the integration of structured entity knowledge, often through entity embeddings, into downstream tasks. In this talk, I will first motivate the use of structured entity data from both an ML performance and a systems perspective. I will then introduce my main research project in this space, Bootleg, an entity extraction system designed for the long tail of entity data, and describe how this project, which started as a vague drawing on a whiteboard, ended up as the anchor point for my entire research trajectory. The goal of this talk is to provide a few interesting technical nuggets about how to build and maintain entity-centric pipelines, as well as a "brass tacks" account of my experiences in academic research and how I landed on this research direction.
Biography: I am currently a postdoc at Stanford working with Christopher Ré as part of the Hazy Research lab. In August 2019, I graduated with a PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington in Seattle, where I was part of the Database Group and advised by Dan Suciu and Magdalena Balazinska. For my undergraduate degree, I went to Carleton College in Northfield, MN, where the city's motto is "Cows, Colleges, and Contentment," and graduated in 2013 as a Computer Science and Mathematics double major.