Powering Real-Time AI at Pinterest: Feature Management and Serving at Scale with Galaxy and Scorpion
Learn about Pinterest’s Galaxy Platform: architecture, real-time feature serving, and lessons at scale.
How do you serve fresh, consistent, and highly reliable machine learning features alongside large-scale model inference—across billions of entities and hundreds of millions of users—in real time? At Pinterest, this challenge is met by the joint power of Galaxy, our online feature store, and Scorpion, our feature fetching and model inference platform.
In this talk, we'll share the architecture behind both Galaxy and Scorpion, discuss our journey evolving them to meet ever-growing demands, and highlight key lessons learned from supporting sub-second serving of features and models at scale.
We'll detail our strategies for data freshness and how we serve low-latency features for both traditional ML and LLM-driven systems—all while overcoming operational and engineering challenges unique to Pinterest's scale. We'll also share how Scorpion's online inference architecture leverages CacheLib for efficient, large-scale feature fetching from Galaxy, supporting model inference across billions of requests with tens of milliseconds of latency.
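The abstract mentions using a local cache (CacheLib) in front of the feature store to keep fetches fast. As a rough illustration of that general look-aside pattern only, here is a minimal Python sketch: check a local cache first, fall back to the remote store on a miss, and bound staleness with a TTL. All names here (`FeatureCache`, `fetch_fn`) are hypothetical and are not Pinterest's or CacheLib's actual API.

```python
import time

class FeatureCache:
    """Look-aside feature cache sketch (illustrative, not CacheLib's API):
    serve from a local map when possible, fetch from the remote feature
    store on a miss, and expire entries by TTL to bound feature staleness."""

    def __init__(self, fetch_fn, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch_fn      # fallback call to the remote feature store
        self._ttl = ttl_seconds     # freshness bound for cached features
        self._clock = clock
        self._store = {}            # key -> (expires_at, features)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired: go to the feature store and repopulate locally.
        self.misses += 1
        features = self._fetch(key)
        self._store[key] = (now + self._ttl, features)
        return features

# Hypothetical usage: a plain dict stands in for the remote feature store.
remote = {"pin:1": {"embedding_norm": 0.87}}
cache = FeatureCache(fetch_fn=remote.__getitem__, ttl_seconds=60)
first = cache.get("pin:1")   # miss: fetched from the "store" and cached
second = cache.get("pin:1")  # hit: served from the local cache
```

In a real serving path the local cache absorbs repeated lookups for hot entities, so only a fraction of requests pay the network round trip to the feature store; the TTL is the knob that trades hit rate against feature freshness.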