How Did Uber Build Its Generative AI Platform? [Case Study and Technical Analysis]
Uber's journey from Machine Learning to Deep Learning to Generative AI, and what tech leaders should learn from it.
You already know the power of Generative AI: you can now talk to a computer without learning programming.
Generative AI can create new content—like text, images, or even code—based on what it’s learned from huge amounts of data.
Uber, which connects people all over the world through rides and deliveries, is using this technology to make its business more efficient.
But how did they build a system that could work at Uber’s scale?
Because you already know that implementing such a system isn’t easy.
So, what did Uber do, and what can you and I learn from it?
Let’s dive in!
Introduction
Uber operates in over 900 cities across the globe, providing not just ride-hailing services, but also food delivery, freight logistics, and more.
It serves 10 million real-time predictions per second. That’s huge.
With such a massive, complex operation, they needed an AI system that could keep up with the demands of millions of users, drivers, and employees.
Relying on external AI providers like OpenAI’s GPT wasn’t a perfect fit for Uber.
Here’s why:
Cost
Scalability
Customization
From 2016 to 2023, Uber’s approach to building scalable ML systems changed and improved.
From 2016 to 2023 - Uber
Michelangelo 1.0 [2016 - Uber’s Machine Learning Transformation]
Uber's ML journey began in 2015 with scientists using Jupyter Notebooks for model development.
The issue: engineers had to build custom pipelines to deploy models out of Jupyter Notebooks, which led to inconsistent workflows.
There was no system for reliable and reproducible training or prediction workflows at scale.
To solve this pain, Uber launched Michelangelo in early 2016 to standardize ML workflows.
Michelangelo provided an end-to-end system for building and deploying ML models at scale.
The platform focused on scalable model training and simplified deployment to production serving containers.
This eliminated the need for custom serving containers, streamlining the deployment process.
After this implementation:
more efficient model development and deployment
reduced duplicate effort and enhanced collaboration across teams at scale
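To make the idea concrete, here is a minimal Python sketch of the kind of standardized train, register, and serve path described above. This is not Uber's code: Michelangelo runs distributed training jobs and managed serving infrastructure, and every name here (TrainedModel, ModelRegistry, train, serve) is purely illustrative.

```python
# A minimal sketch (not Uber's code) of a standardized train -> register -> serve
# path: one shared workflow instead of a custom pipeline per team.
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

@dataclass
class TrainedModel:
    name: str
    version: int
    artifact: Any  # the fitted model object or its parameters

class ModelRegistry:
    """One central store, so every team deploys through the same path."""
    def __init__(self) -> None:
        self._models: dict[tuple[str, int], TrainedModel] = {}

    def register(self, model: TrainedModel) -> None:
        self._models[(model.name, model.version)] = model

    def latest(self, name: str) -> TrainedModel:
        newest = max(v for (n, v) in self._models if n == name)
        return self._models[(name, newest)]

def train(name: str, rows: list[dict], version: int) -> TrainedModel:
    # Stand-in for scalable training; a trivial "predict the mean label" model.
    mean_label = sum(r["label"] for r in rows) / len(rows)
    return TrainedModel(name=name, version=version, artifact={"mean_label": mean_label})

def serve(registry: ModelRegistry, name: str, features: dict) -> float:
    """Shared serving path instead of a custom serving container per team."""
    return registry.latest(name).artifact["mean_label"]

registry = ModelRegistry()
registry.register(train("eta_minutes", [{"label": 4.2}, {"label": 5.8}], version=1))
print(serve(registry, "eta_minutes", {"trip_km": 3.1}))  # 5.0
```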
Problems with Michelangelo 1.0
Undefined quality metrics and project importance
Challenges in collaborative model development
Limited support for Deep Learning (DL) models
Fragmented tools and developer experience
Michelangelo 2.0 [2019 - Uber’s Deep Learning Transformation]
The focus shifted from simply enabling ML everywhere to concentrating on high-impact ML projects.
This change aimed to improve the performance and quality of these projects, leading to greater business value for Uber.
To tackle the challenges of Michelangelo 1.0, Uber launched Michelangelo 2.0, which restructured the ML platforms into a unified system.
Michelangelo 2.0 Architecture
Michelangelo 2.0's architecture supports a flexible, plug-and-play platform with both in-house and third-party components for ML engineers and applied scientists.
It focuses on high-impact ML use cases.
The system leverages Kubernetes and uses two data planes (see the sketch after this list):
an offline data plane for big data processing
an online data plane for real-time inference
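Here is a hedged sketch of that offline/online split, assuming the common pattern of batch-computing features and loading them into a low-latency store for online inference. Both class names and the in-memory dictionary standing in for a key-value store are assumptions made for illustration, not Uber's actual components.

```python
# Hedged sketch of the offline/online split: batch jobs precompute features,
# the online plane answers low-latency lookups at request time.
from __future__ import annotations

class OfflineDataPlane:
    """Batch side: heavy aggregation over historical data (think Spark-style jobs)."""
    def compute_features(self, trips: list[dict]) -> dict[str, dict]:
        features: dict[str, dict] = {}
        for trip in trips:
            f = features.setdefault(trip["driver_id"], {"trip_count": 0, "total_km": 0.0})
            f["trip_count"] += 1
            f["total_km"] += trip["km"]
        return features

class OnlineDataPlane:
    """Serving side: millisecond lookups of the precomputed features."""
    def __init__(self) -> None:
        self._store: dict[str, dict] = {}  # production would use a real KV store

    def load(self, features: dict[str, dict]) -> None:
        self._store.update(features)

    def get_features(self, driver_id: str) -> dict:
        return self._store.get(driver_id, {})

offline = OfflineDataPlane()
online = OnlineDataPlane()
online.load(offline.compute_features([
    {"driver_id": "d1", "km": 3.2},
    {"driver_id": "d1", "km": 7.5},
]))
print(online.get_features("d1"))  # {'trip_count': 2, 'total_km': 10.7}
```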
Michelangelo Studio (MA Studio)
It brings all of Uber's ML tasks (data prep, model training, deployment, and more) into one easy-to-use platform.
It tracks how models and data features are performing.
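As a rough illustration of the "tracks how models and features are performing" part, here is a toy drift check that compares the recent mean of a feature against its training-time baseline. The FeatureMonitor name, window size, and 25% threshold are assumptions for the sketch, not MA Studio's API.

```python
# Toy feature-health check: flag drift when the live mean strays too far
# from the training-time baseline. Illustrative only.
from collections import deque

class FeatureMonitor:
    def __init__(self, baseline_mean: float, window: int = 1000, threshold: float = 0.25):
        self.baseline_mean = baseline_mean
        self.recent = deque(maxlen=window)   # rolling window of live values
        self.threshold = threshold           # allowed relative deviation

    def observe(self, value: float) -> None:
        self.recent.append(value)

    def drifted(self) -> bool:
        if not self.recent:
            return False
        live_mean = sum(self.recent) / len(self.recent)
        return abs(live_mean - self.baseline_mean) / max(abs(self.baseline_mean), 1e-9) > self.threshold

monitor = FeatureMonitor(baseline_mean=5.0)
for value in [5.1, 4.9, 8.7, 9.2]:
    monitor.observe(value)
print(monitor.drifted())  # True: the live mean (~7.0) is more than 25% off the baseline
```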
Generative AI Gateway [2023 - Uber’s GenAI Transformation]
Uber is using generative AI to help automate operations and improve user experiences.
They developed the Gen AI Gateway to give teams access to both external and in-house LLMs while ensuring security, cost management, and privacy.
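A minimal sketch of the gateway idea, assuming a single entry point that redacts obvious PII, routes between an in-house and an external model, and tracks a rough per-team cost. Everything here (the function names, the email-only redaction, the made-up pricing) is illustrative and not Uber's Gen AI Gateway implementation.

```python
# Minimal gateway sketch: privacy hook, routing hook, and per-team cost accounting.
import re

PII_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")  # crude email redaction only

def call_external_llm(prompt: str) -> str:
    return f"[external model reply to: {prompt[:40]}...]"  # stand-in for a vendor API call

def call_inhouse_llm(prompt: str) -> str:
    return f"[in-house model reply to: {prompt[:40]}...]"  # stand-in for an internal endpoint

class GenAIGateway:
    def __init__(self) -> None:
        self.cost_by_team: dict[str, float] = {}

    def complete(self, team: str, prompt: str, sensitive: bool = False) -> str:
        prompt = PII_PATTERN.sub("[REDACTED]", prompt)                   # privacy hook
        backend = call_inhouse_llm if sensitive else call_external_llm   # routing hook
        reply = backend(prompt)
        self.cost_by_team[team] = self.cost_by_team.get(team, 0.0) \
            + 0.00001 * len(prompt.split())                              # fake cost accounting
        return reply

gateway = GenAIGateway()
print(gateway.complete("support", "Summarize the ticket from rider jane@example.com", sensitive=True))
print(gateway.cost_by_team)
```

In a real gateway the routing decision would also weigh data residency, model capability, and budget, and redaction would go far beyond email addresses; the sketch only shows where those hooks sit.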
Categories of GenAI use cases at Uber
Michelangelo now supports LLM fine-tuning, deployment, and performance monitoring, with tools like the following (a toy evaluation sketch follows the list):
model catalog
LLM evaluation framework
prompt engineering toolkit
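Here is the toy evaluation sketch referenced above: a prompt template is rendered per test case, a stubbed model answers, and a simple keyword-overlap score stands in for a real metric. The template, the stub model, and the scoring rule are all assumptions made for illustration, not Uber's evaluation framework.

```python
# Toy LLM evaluation loop: prompt template -> stub model -> keyword-overlap score.
PROMPT_TEMPLATE = "You are a support assistant for a ride-hailing app.\nQuestion: {question}\nAnswer:"

def render_prompt(question: str) -> str:
    return PROMPT_TEMPLATE.format(question=question)

def stub_llm(prompt: str) -> str:
    # Placeholder for a real model call behind the gateway.
    return "You can change your payment method in the Wallet section of the app."

def keyword_score(answer: str, expected_keywords: list) -> float:
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer.lower())
    return hits / len(expected_keywords)

eval_set = [
    {"question": "How do I change my payment method?", "keywords": ["payment", "wallet"]},
    {"question": "How do I contact my driver?", "keywords": ["call", "message"]},
]

scores = [keyword_score(stub_llm(render_prompt(c["question"])), c["keywords"]) for c in eval_set]
print(f"mean score: {sum(scores) / len(scores):.2f}")  # 0.50 with this stub model
```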
Uber has enhanced its LLM training infrastructure by integrating tools like Hugging Face, DeepSpeed, and elastic GPU management to support larger models and efficient training.
These platform upgrades enable teams to build and optimize LLM-powered applications, and Uber plans to share more about their advancements in LLM production soon.
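For the training-infrastructure piece, here is a hedged sketch of what fine-tuning with Hugging Face Transformers plus a DeepSpeed config can look like. This is not Uber's pipeline: the base model (gpt2), the two-row toy dataset, and the ds_config.json path are placeholders you would replace, and the deepspeed argument assumes you have a ZeRO config file on disk.

```python
# Hedged fine-tuning sketch with Hugging Face Transformers and a DeepSpeed config.
# Drop the deepspeed argument to run without DeepSpeed installed.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "gpt2"  # placeholder; swap in the base model you actually fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny in-memory corpus so the script is self-contained; real data lives elsewhere.
raw = Dataset.from_dict({"text": ["Rider asked about a refund.", "Driver reported a wrong pickup pin."]})

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)
    out["labels"] = out["input_ids"].copy()  # causal LM objective: predict the same tokens
    return out

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="./llm-finetune-demo",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    deepspeed="ds_config.json",  # assumption: a ZeRO config you provide on disk
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```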
What mistakes did Uber make?
Ignored the importance of user feedback early on.
Overcomplicated the initial platform, leading to confusion among teams.
Failed to prioritize the integration of existing tools and services from the start.
Underestimated the challenges of managing large data sets and ensuring data quality.
Didn't build enough flexibility into their AI systems for future adjustments and scaling.
Focused too much on advanced features without addressing basic usability concerns first.
What should a leader learn from Uber's mistakes?
encourage regular feedback from users during development.
simplify the platform to make it user-friendly and intuitive for all teams.
integrate existing tools and services seamlessly to enhance functionality.
focus on basic usability before adding advanced features to the platform.
establish robust data management practices to ensure data quality and reliability.
design systems with the flexibility to adapt and scale as needs change over time.
Conclusion
While Uber's venture into generative AI platforms showcases significant innovation, it also underscores a troubling reality: many organizations rush to adopt these technologies without a robust strategy.
The excitement leads to a lack of rigorous testing and compliance, risking the deployment of flawed models.
For leaders, it's crucial to ground expectations in reality by prioritizing transparent model evaluation and continuous monitoring.
Don’t risk repeating the same mistakes that have plagued early adopters in the space.
Until Next Edition.
Happy Case Study!
I am starting the AI/ML GenAI Live Course on 5th November.
Not sure if the live AI course is a fit for you? Book a quick chat or call.
The Learning Paths
Live Interactive Course (starting November 5th) → Interactive sessions, mentorship support, and assignment work; best for those who prefer study groups and discussion-based learning with close community members (ideal for leaders, product managers, VPs, and data professionals).
Self-Paced Course (open) → Learn through recorded sessions from the previous batch, at your own pace. Material and content are the same as the live sessions, and you can be part of the self-paced community. All new updates to the curriculum, live guest sessions, and code will be accessible to you.
Do It Yourself (open, join Discord) → This is FREE for you. Learn through all the resources available in this roadmap and ask your doubts in the Discord community. Need any resources? Ping the Discord community channels and they will be shared with you.