How Does ChatGPT Work? [Detailed Analysis & Insights]

Large Language Models, GPT, LLM Parameters, Prompt, Attention Mechanism

How to become the expert authority in AI, Machine Learning & GenerativeAI?

I am starting the AI/ML Live Course in October, pre-book your seat.

Table of Contents

You have heard a lot of noise about large language models (LLMs), one of which is GPT.

GPT stands for Generative Pre-Trained Transformer.

One of the first well-known transformer-based language models was Google's BERT (Bidirectional Encoder Representations from Transformers), but it wasn't built to generate human-like responses.

In 2020, GPT-3 was launched with better responses than any other LLM at the time.

You and I will dive deep and build our intuition around ChatGPT.

To understand how it works, let’s first understand its technical building blocks.

Many professionals don’t know the correct answers to common interview questions about ChatGPT; that’s what I am covering in this edition.

Let’s go!

What exactly is GPT?

Let’s break everything down -

  • Generative

  • Pre-Trained

  • Transformer

Generative

The word “Generative” comes from statistics.

When I was doing my master’s, there was a subject called statistical modeling.

Within statistical modeling, you will find a branch called generative modeling.

Generative modeling means you generate/predict numbers based on previous numbers and probability.

Whether you generate images or text, the machine is ultimately generating numbers.
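As a toy sketch of that idea (all numbers and probabilities here are made up for illustration), "generating" is just sampling from a probability distribution:

```python
import random

# Toy generative model: a probability distribution over possible
# next numbers. The outcomes and probabilities are invented.
outcomes = [1, 2, 3]
probs = [0.2, 0.5, 0.3]  # must sum to 1

# "Generating" means sampling from the distribution.
random.seed(0)
sample = random.choices(outcomes, weights=probs, k=5)
print(sample)
```

Whether the model generates an image or a sentence, under the hood it is sampling numbers like this, just at a vastly larger scale.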

Pre-Trained

As humans, how do we learn?

You generally don’t jump to complex topics before the fundamentals.

You first learn the foundational moves in chess, how the pieces move, practice them, and then compete with other players.

Pre-trained models work similarly.

Pre-trained models

You train these models on foundational tasks and then fine-tune them for complex tasks.

These models create their own memory, called parameters, which are optimized based on what they learned from the data.

Instead of training a model again, you can use the already trained model and fine-tune it to your specific use case.
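A minimal sketch of this reuse idea, using an invented toy "model" whose parameters are just bigram counts (nothing like a real transformer, but the pre-train/fine-tune flow is the same):

```python
from collections import Counter

# Toy "model": its parameters are bigram counts. Both corpora
# below are made up for illustration.
def train(corpus, counts=None):
    """Update bigram counts from a corpus (pre-training or fine-tuning)."""
    counts = counts or Counter()
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        counts[(a, b)] += 1
    return counts

# Pre-train on a large, general corpus.
params = train("the sky is blue the grass is green the sky is clear")

# Fine-tune: start from the pre-trained parameters, not from scratch.
params = train("the sky is cloudy today", counts=params)

# The model's "memory" now reflects both stages of training.
print(params[("the", "sky")])  # → 3
```

The point is the flow: the fine-tuning step starts from the already-learned parameters instead of training again from zero.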

To train these models you need a huge amount of data.

GPT-3 is trained on a huge corpus of text with 5 datasets - Common Crawl, WebText2, Books1, Books2, and Wikipedia.

These datasets contain around half a trillion words, which is sufficient to train the model for understanding the relationship between words, the grammar, the formation of sentences, which word will come next, etc.

Transformer

Transformer is a neural network architecture.

Earlier, to communicate with a machine you needed to use binary.

Later came programming languages: you could learn Python and give instructions to the machine.

But now you can give instructions directly in English.

The gap between humans and machines has narrowed; you don’t need to learn programming to interact with a machine.

The transformer was introduced in the 2017 research paper “Attention Is All You Need” by Google researchers.

Transformer Architecture

How does an LLM generate a response, and how is it different from a normal Google search?

Google search is a semantic search: it searches its databases based on the context, intent, and keywords of the user’s query and returns relevant results.

You read text in a scannable manner: you don’t remember it word for word, you only remember the important information.

Similarly, language models only keep important information in the form of parameters.

GPT-3 is trained on 500 billion words from the text corpus of 5 datasets mentioned earlier.

After training, it had learned 175 billion parameters (think of parameters as the memory of the model).

When you query a large language model, it generates a response from its parameters (its memory), not from the data it was trained on.

Just like humans do: if I ask you “What is a computer?“, you will not repeat the textbook definition word for word; you will respond based on the way you understood the meaning of “computer”.

What are LLM parameters?

Each parameter affects how the model understands natural language.

As an LLM is nothing but a neural network, it has weights and biases.

That’s what the parameters are: weights show how strongly words and phrases are connected.

Biases are constant values that work as a starting point for the model's understanding of the data.

The model also stores vector representations of words as numerical embeddings.
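A toy illustration of these three kinds of parameters, with invented numbers (real models store billions of such values):

```python
# Embedding: each word maps to a vector of numbers.
# All values here are made up for illustration.
embeddings = {
    "sky":  [0.9, 0.1],
    "blue": [0.8, 0.2],
}

# Weights connect inputs to outputs; the bias is a constant starting value.
weights = [0.5, -0.3]
bias = 0.1

def neuron(word):
    """One artificial neuron: weighted sum of the embedding, plus bias."""
    vec = embeddings[word]
    return sum(w * x for w, x in zip(weights, vec)) + bias

print(round(neuron("sky"), 3))  # → 0.52
```

An LLM is, at bottom, an enormous number of such weighted sums stacked and composed together.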

Does GPT get better over time?

The answer is NO.

You have to train the model again on newer data.

This is a challenge for language models: you will not get answers about current events because the model was trained on data only up to March 2022.

Every time you query ChatGPT, it stores your questions and their responses in a database, but the GPT model is not continuously learning and getting better from those user interactions.

It gives you responses based on the 175 billion parameters it learned from roughly 500 billion training tokens.

GPT was trained on data and compressed it into huge, high-dimensional matrices of numbers we call parameters.

Analogy -

When we as humans learn something, we gather all the information we can (data), break it down into pieces (tokens), then build our understanding and remember only the important things about it (parameters).

Note - To make LLMs connect with external data sources for better response to current information you can use RAG strategies. (Will cover RAG in further editions)

How Does ChatGPT Work?

Working flow of LLM predicting the next word | Image - NVIDIA

There are roughly 171,476 words in current use in the English language, and you can assign each a probability of being the next word in the sentence “The sky is ……“.

The word with the highest probability will win the spot, in this case, “blue“.
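A minimal sketch of that prediction step, with invented probabilities for a tiny vocabulary (a real model scores every word it knows):

```python
# Toy next-word prediction for "The sky is ...".
# The probabilities are made up for illustration.
next_word_probs = {
    "blue": 0.62,
    "clear": 0.18,
    "falling": 0.01,
    "grey": 0.11,
}

# Greedy decoding: pick the word with the highest probability.
prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)  # → blue
```

This always-pick-the-top-word strategy is called greedy decoding; as discussed later, ChatGPT usually samples instead, which is why its answers vary.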

How did we humans come up with the word “blue“?

We have been reading English for years; we don’t remember sentences word for word, but our understanding of phrases, the relationships between words, and our knowledge tell us that the next word will be “blue”.

What is a Prompt? Why does it matter?

The input you give to LLMs is a prompt.

To get a better response you should give a better prompt.

You should adapt to the way the LLM works.

The more clearly you define the prompt, the better the response you get.

Because an LLM predicts the next word based on the previous words, the more context you provide in the prompt, the easier it is for the model to find patterns between words and generate a better response.

Attention Mechanism

LLMs are built on the attention mechanism. For example -

“Himanshu is an AI consultant. He is going to solve your problems.”

In this sentence, “Himanshu“ and “He” are related; as a human you understand that “He” refers to “Himanshu”. That’s attention.

Transformers can keep this attention information over long stretches of text; that is why, within a chat, if you ask about something you mentioned earlier, ChatGPT knows what you are talking about.
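A toy sketch of attention using invented 2-dimensional vectors (real models learn high-dimensional vectors and add refinements such as separate query/key/value projections):

```python
import math

# Toy dot-product attention: how strongly "He" attends to earlier words.
# The vectors below are made-up stand-ins for learned embeddings.
vectors = {
    "Himanshu": [1.0, 0.2],
    "consultant": [0.3, 0.9],
    "He": [0.9, 0.1],
}

query = vectors["He"]
scores = {w: sum(q * k for q, k in zip(query, v))
          for w, v in vectors.items() if w != "He"}

# Softmax turns raw scores into attention weights that sum to 1.
exp = {w: math.exp(s) for w, s in scores.items()}
total = sum(exp.values())
attention = {w: e / total for w, e in exp.items()}

# "He" attends most strongly to "Himanshu".
print(max(attention, key=attention.get))  # → Himanshu
```

The word with the highest attention weight contributes most to how the model interprets “He”, which is how the pronoun gets linked back to “Himanshu”.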

ChatGPT works forward, not backward

Whenever you prompt ChatGPT, it generates the next word.

It is just a next-word predictor, running in real time.

Let’s say I gave the input: “What is the capital of India?“

ChatGPT will respond with: “The capital of India …..“

Every time it predicts the next word, that word becomes part of the input sequence for predicting the following one.
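The forward, word-by-word loop can be sketched like this, with a hypothetical fake_predict standing in for the real model:

```python
# Sketch of the autoregressive loop: each predicted word is appended
# to the input before predicting the next one. fake_predict is a
# made-up stand-in for a real model's next-word prediction.
def fake_predict(words):
    canned = {"India?": "The", "The": "capital", "capital": "of",
              "of": "India", "India": "is", "is": "New", "New": "Delhi."}
    return canned.get(words[-1], "<end>")

tokens = "What is the capital of India?".split()
for _ in range(7):
    tokens.append(fake_predict(tokens))

print(" ".join(tokens[6:]))  # → The capital of India is New Delhi.
```

Notice the loop only ever moves forward: the growing sequence is fed back in, one predicted word at a time.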

How does ChatGPT generate human-like responses?

GPT-3 was a base model, trained on a huge dataset.

The problem with a base model is that it just learns patterns in the text and generates responses by continuing those patterns.

For example -

If you ask 2 questions like this:

What is the capital of India?
What is the capital of China?

It will detect the pattern and generate the response like this:

What is the capital of India?
What is the capital of China?
What is the capital of Sri Lanka?

As a user, you don’t want a response like that: if you ask 2 questions, you should get 2 answers, not another question.

That was the problem with the base model.

To solve this problem, OpenAI used instruction tuning.

They fine-tuned the base model on instruction-response examples, so that if you ask 2 questions, you get 2 answers.

They hired people to write and rank the best responses manually.
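As an illustration only (the actual data and format are OpenAI-internal), you can think of instruction-tuning examples as prompt/preferred-response pairs:

```python
# Hypothetical shape of instruction-tuning data: a prompt paired with
# the kind of response a human labeler prefers. Invented example.
instruction_data = [
    {
        "prompt": "What is the capital of India?\nWhat is the capital of China?",
        "preferred_response": "The capital of India is New Delhi.\n"
                              "The capital of China is Beijing.",
        # A raw base model might instead continue the pattern with
        # "What is the capital of Sri Lanka?"
    },
]

for example in instruction_data:
    print(example["prompt"].count("?"), "questions ->",
          example["preferred_response"].count("."), "answers")
```

Fine-tuning on many such pairs teaches the model that two questions deserve two answers, not a third question.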

So,

  • ChatGPT does not know anything.

  • It does not have self-awareness.

  • It does not have consciousness.

Why do I get different responses every time I ask the same question to ChatGPT?

As it generates the next word, the underlying information about the topic stays the same, but the sentence formation and patterns differ.

The LLM generates a probability distribution over the next word; sampling from that distribution can give a different word each time.

ChatGPT generates responses using probabilistic methods, through a technique called sampling.

Sampling is non-deterministic: there is more than one possible outcome, drawn from a probability distribution of possible next words.

Randomness is introduced through temperature, a setting that controls the level of randomness in choosing the next word.
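A minimal sketch of temperature sampling with invented scores: lower temperature sharpens the distribution (more deterministic), higher temperature flattens it (more random).

```python
import math
import random

# Invented raw scores for the next word after "The sky is ...".
scores = {"blue": 2.0, "clear": 1.0, "grey": 0.5}

def sample(scores, temperature, seed=None):
    """Softmax with temperature, then draw one word from the result."""
    rng = random.Random(seed)
    exp = {w: math.exp(s / temperature) for w, s in scores.items()}
    total = sum(exp.values())
    words, probs = zip(*((w, e / total) for w, e in exp.items()))
    return rng.choices(words, weights=probs, k=1)[0]

# At a very low temperature the top word nearly always wins;
# at a high temperature other words show up more often.
print(sample(scores, temperature=0.1, seed=0))
print(sample(scores, temperature=2.0, seed=0))
```

This is why the same question can produce different wordings: the distribution is the same, but the draw from it is not.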

  • LLMs will always start fresh

  • LLMs do not improve with use

  • To get a better response, you must adapt to how the LLM works

Conclusion

Large Language Models will not change your life; they are a technology that you will learn and move on from.

LLMs are not magic; there is a technical side to them.

They are also not the answer to every problem in your organization.

In some business scenarios, traditional machine learning will work best.

You need to understand when to use LLMs.

The AI world is moving fast; there is no first-mover advantage; it is the fast-mover advantage.

Learn fast, build fast, win fast, and move fast.

Until next time.

Happy AI

Pre-book AI, Machine Learning & GenerativeAI
Live Course

  • Pre-booking guarantees you a spot in this highly sought-after cohort.

  • When registration starts, you will be the first to get an early bird discount.

  • Registration will begin on 15th October 2024 [ midnight IST ]

  • This is what we are going to cover - Roadmap [October Live Cohort]

Getting Started Resources

About me

Himanshu Ramchandani

  • I’m Himanshu Ramchandani, from India.

  • I am an AI Consultant with close to a decade of experience.

  • I worked on over 100 Data & AI projects in [Energy, Healthcare & Law Enforcement].

  • I am running a solopreneur content business.

  • I am the Founder of a Data & AI Solutions company [Team of 7].

  • I focus on action-oriented learning in ML, DL, MLOps, Generative AI & System Design with implementation drills.

  • In the last decade, I have never stopped sharing my knowledge and have helped over 10000 leaders, professionals, and students.

Want to work together? Here’s How I can help you

  • Dextar: Your data to AI solutions - Request a Brainstorm

  • AI/ML Live Cohort for Leaders: Add hybrid skills to your portfolio and build a strong AI expert profile - Know more

  • Notes that will put you in the top 1% of Data & AI Experts [New notes dumping soon].

  • Feel Stuck in your AI career? - FREE Career Consultation

  • Get your product or services in front of 50k+ tech professionals. [Sponsorship] - [email protected]

Testimonial from AI Consultation

Socials

Be part of 50,000+ like-minded AI professionals across the platform

The Drag & Drop

If you don’t find the email, it is probably in Promotions. Drag and drop the email into your Primary inbox.

A big THANK YOU for being part of my learning journey.

Keep Inspiring!

So often you find that the students you are trying to inspire are the ones that end up inspiring you.

Realization 101

PS: Reply to this email with what AI content you want me to share with you.

I will forever be grateful
