How Does LLM Work?
GPT, Prompts, Tokens, Parameters, the Attention Mechanism, ChatGPT vs Google Search, LLM Training, Working, and Security
You have heard a lot of noise about large language models (LLMs), the most famous of which powers ChatGPT.
One of the first major transformer-based language models was Google's BERT (Bidirectional Encoder Representations from Transformers), but its responses were not human-like.
In 2020, GPT-3 was launched, and its responses were better than those of any other LLM at the time.
You and I will dive deep and build our intuition around ChatGPT.
To understand how it works, let’s learn about its other technical parts.
Many professionals don’t know the correct answers to common interview questions about ChatGPT, so I am covering them in this edition.
Let’s go!
What Exactly is GPT? (Generative, Pre-Trained, Transformer)
Let’s break everything down -
Generative
Pre-Trained
Transformer
Generative
The word “Generative” comes from statistics.
When I was doing my master’s, there was a subject called statistical modeling.
In statistical modeling, you will find a branch of generative modeling.
Generative modeling means you generate/predict numbers based on previous numbers and probability.
Whether you generate images or text, the machine is ultimately generating numbers.
Pre-Trained
As humans, how do we learn?
We generally don’t jump to complex topics before the fundamentals.
In chess, you first learn the foundational moves and how the pieces move, practice them, and then compete with other players.
Pre-trained models work similarly.
Pre-trained models
You train these models on foundational tasks and then fine-tune them for more complex ones.
These models build their own memory, called parameters, which are optimized based on what they learn from the data.
Instead of training a model from scratch, you can take the already trained model and fine-tune it for your specific use case.
To train these models you need a huge amount of data.
GPT-3 is trained on a huge corpus of text with 5 datasets - Common Crawl, WebText2, Books1, Books2, and Wikipedia.
These datasets contain around half a trillion tokens, which is sufficient to train the model to understand the relationships between words, grammar, sentence formation, which word comes next, and so on.
Transformer
Transformer is a neural network architecture.
Originally, to communicate with a machine you had to use binary.
Later came programming languages: you could learn Python and give instructions to the machine.
But now you can give instructions directly in English.
The gap between humans and machines has shrunk; you no longer need to learn programming to interact with a machine.
The transformer was introduced by Google researchers in the 2017 paper “Attention Is All You Need”.
Transformer Architecture
What is a Prompt and Why Does it Matter?
“Who is the prime minister of India?“ - is the prompt I passed to ChatGPT.
The input you give to LLMs is a prompt.
To get a better response, you should give a better prompt.
You should adapt to the way the LLM works.
The more clearly you define the prompt, the better the response you get.
Because an LLM predicts the next word based on the previous words, the more words you provide in the prompt, the easier it is for the model to find patterns between words and generate a better response.
What Are Tokens, and Why Are They Important?
Tokens can be individual or partial words, as seen in the above image.
Large Language Models use tokens to measure 3 things →
the size of the data they were trained on
the input they can take
the output they can produce
OpenAI tokenizer
The tokens are converted into numeric embeddings, since models of every kind process numbers only.
Types of Tokens
There are typically three types of tokens used in LLMs:
Word Tokens
These are individual words. For example, "apple," "runs," and "cat" are word tokens.
Word-based tokenization is simpler but may struggle with rare or compound words.
Subword Tokens
These represent parts of words, typically used when a word is too rare or complex for the model.
For example, "unhappiness" might be split into subword tokens like "un" and "happiness".
This is often used in models like GPT; see the image above.
This scheme, called Byte Pair Encoding (BPE), is what ChatGPT’s tokenizer uses (there is a short sketch of it after this section).
Character Tokens
These are individual characters (letters, numbers, punctuation marks).
This type of tokenization is very fine-grained and is usually used for languages with complex scripts or specific tasks like spelling correction.
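If you want to see BPE tokenization in practice, here is a minimal sketch using OpenAI’s tiktoken library (assuming it is installed, e.g. via pip install tiktoken); "cl100k_base" is one of its standard encodings:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("unhappiness is not a single token")
print(ids)                              # token ids (the numbers the model actually sees)
print([enc.decode([i]) for i in ids])   # the word or subword each id stands for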
Attention Mechanism
LLMs work on the attention mechanism. For example:
“Himanshu is an AI consultant. He is going to solve your problems.“
In this sentence, “Himanshu“ and “He” are related; as a human you understand that “He” refers to “Himanshu”. That is attention.
Transformers can keep this attention information across long text. That is why, if you ask ChatGPT about something you mentioned earlier in the chat, it knows what you are talking about.
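To build intuition for what the attention mechanism computes, here is a minimal sketch of scaled dot-product attention (the core operation inside the transformer) using toy random vectors; real models use learned projections and many attention heads, so this is only an illustration:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # how strongly each token relates to every other token
    weights = softmax(scores)       # attention weights; each row sums to 1
    return weights @ V, weights

tokens = np.random.randn(3, 4)     # three toy token vectors, e.g. "Himanshu", "He", "problems"
output, weights = attention(tokens, tokens, tokens)   # self-attention: Q = K = V
print(weights.round(2))            # row i shows how much token i "pays attention" to each token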
How Does ChatGPT Work?
Working flow of LLM predicting the next word | Image - NVIDIA
There are 171,476 words in the English language; you can assign each of them a probability of being the next word in the sentence “The sky is ……“.
The word with the highest probability wins the spot, in this case, “blue“.
LLMs do not improve over time; they always start fresh.
How did we humans come up with the word “blue“?
We have been reading English for years; we don’t remember sentences word for word, but our understanding of phrases, the relationships between words, and our knowledge tell us that the next word will be “blue”.
ChatGPT works forward, not backward
Whenever you prompt ChatGPT, it generates the next word.
It is just a next-word predictor, running in real time.
Let’s say I gave the input: “What is the capital of India?“
ChatGPT will respond with: “The capital of India …..“
Every time it predicts the next word, that word becomes part of the input sequence.
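As a rough illustration of this forward, word-by-word loop, here is a toy sketch; predict_next_word is a made-up stand-in for the real model and simply looks up canned continuations:

def predict_next_word(words):
    # stand-in for the real model: a tiny lookup table of canned continuations
    canned = {"India?": "The", "The": "capital", "capital": "of",
              "of": "India", "India": "is", "is": "New", "New": "Delhi."}
    return canned.get(words[-1], "")

prompt = "What is the capital of India?".split()
while True:
    next_word = predict_next_word(prompt)
    if not next_word:
        break
    prompt.append(next_word)   # the predicted word becomes part of the input sequence

print(" ".join(prompt))        # What is the capital of India? The capital of India is New Delhi.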
What Are LLM Parameters?
Each parameter affects how the model understands natural language.
Since an LLM is ultimately a neural network, it has weights and biases.
That’s what the parameters are: the weights show how strongly words and phrases are connected.
The biases are constant values that act as a starting point for the model’s understanding of the data.
The parameters also include the vector representations of words as numerical embeddings.
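As a rough sketch of what gets counted when we say a model has “175 billion parameters”, here is a toy example (the sizes are made up) counting an embedding table plus one layer of weights and biases:

import numpy as np

vocab_size, embed_dim, hidden_dim = 1000, 64, 256

embeddings = np.random.randn(vocab_size, embed_dim)   # one vector per token in the vocabulary
weights = np.random.randn(embed_dim, hidden_dim)      # connection strengths between layers
biases = np.zeros(hidden_dim)                         # constant offsets, the starting point

total_params = embeddings.size + weights.size + biases.size
print(total_params)   # 64,000 + 16,384 + 256 = 80,640 parameters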
Why Do I Get Different Responses to the Same Question from ChatGPT?
As it generates the response word by word, the actual information about the topic stays the same, but the sentence formation and the pattern differ.
The LLM generates a probability distribution for the next word and then samples from it, so the word chosen can be different each time.
ChatGPT generates responses using probabilistic methods, a technique called sampling:
non-deterministic sampling (meaning more than one possible outcome)
from a probability distribution of possible next words,
with randomness introduced through temperature (which controls the level of randomness in choosing the next word).
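Here is a minimal sketch of temperature sampling, assuming made-up scores for four candidate next words; it only illustrates why the same prompt can produce different words:

import numpy as np

words = ["blue", "clear", "falling", "green"]
logits = np.array([3.0, 2.0, 0.5, 0.1])   # hypothetical model scores for the next word

def sample(logits, temperature=1.0):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs = probs / probs.sum()            # probability distribution over candidate words
    return np.random.choice(words, p=probs)

print([sample(logits, temperature=0.2) for _ in range(5)])   # low temperature: almost always "blue"
print([sample(logits, temperature=1.5) for _ in range(5)])   # high temperature: more variety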
LLMs will always start fresh
LLMs do not improve with use
To get a better response, you must adapt to the LLM
How Does LLM Generate Human-like Responses?
GPT-3 was a base model, trained on a huge dataset.
The problem with a base model is that it only learns the patterns in the text and generates responses by continuing those patterns.
For example -
If you ask 2 questions like this:
What is the capital of India?
What is the capital of China?
It will detect the pattern and generate the response like this:
What is the capital of India?
What is the capital of China?
What is the capital of Sri Lanka?
As a user you don’t want a response like that; if you ask 2 questions, you should get 2 answers, not another question.
That was the problem with the base model.
To solve this problem, OpenAI built an instruction dataset.
They fine-tuned the base model on these instructions so that, for example, if you ask 2 questions, you get 2 answers.
They also hired people to manually label the best responses.
So,
ChatGPT does not know anything.
It does not have self-awareness.
It does not have consciousness.
How is ChatGPT Different From Google Search?
Google vs ChatGPT
Google Search is a semantic search: it searches its index based on the context, intent, and keywords of the user’s query and returns relevant results.
You read the text in a scannable manner, you don’t remember the text word by word, you only remember important information.
Similarly, language models only keep important information in the form of parameters.
GPT-3 is trained on roughly 500 billion tokens from the text corpus of the 5 datasets mentioned earlier.
After training, it ended up with 175 billion parameters (consider parameters the memory of the model).
When you query a large language model, it generates a response from its parameters (its memory), not from the data it was trained on.
Just like humans do: if I ask you “What is a computer?“, you will not recite the textbook definition word for word; you will respond based on the way you understood the meaning of “computer”.
Does GPT Get Better Over Time?
The answer is NO.
You have to train the model again on newer data.
This is a challenge for language models: you will not get answers about current events, because the model was trained only on data up to March 2022.
Every time you query ChatGPT, it stores your questions and its responses in a database, but the GPT model is not continuously learning and improving from those user interactions.
It gives you a response based on the 175 billion parameters that were learned from the huge corpus of roughly 500 billion tokens.
GPT was trained on that data and produced this huge, complex n-dimensional matrix of numbers we call parameters.
Analogy -
When we as humans learn something, we gather all the information (data) we can, break it down into pieces (tokens), build our understanding, and remember only the important things about it (parameters).
Note - To make LLMs connect with external data sources for better responses about current information, you can use RAG strategies. (RAG will be covered in future editions.)
What LLMs Are and Why They Matter
Large Language Models (LLMs) are trained on a very large amount of text and have a very large number of parameters.
They are more capable of understanding complex and huge corpora of text.
“Large” applies to every dimension of these models:
Data
Parameters
Performance
Computational resources
Storage and Inference time
There were a lot of language models before transformer-based models like GPT and BERT, here are some:
Hidden Markov models (HMMs)
Recurrent neural networks (RNNs)
Long short-term memory (LSTM)
Gated recurrent unit (GRU)
These models worked well for specific tasks, but transformer-based models outperformed all of them.
If you learn about the transformer architecture, you will understand that it solves the problems faced by the models mentioned above.
LLMs matter a lot to us as they have improved performance, broad generalization, few-shot learning, understanding of complex contexts, multilingual capabilities, and human-like text generation.
How do Large Language Models work?
You need numbers to help the machine understand any instruction.
Each word in our language has to be converted into numbers so that the machine understands it.
Why?
Because machines only know binary (0 and 1).
You can only convert a number into binary, since the conversion works by repeatedly dividing by 2.
You cannot divide a character by 2, for obvious reasons.
Encodings like ASCII and UTF-8 give you a number for each character, for example ‘A‘ = 65.
Once you have 65, you can divide it by 2 repeatedly to get the binary.
That way you can make the machine understand what you are trying to say.
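A quick worked example of that character-to-number-to-binary path in Python:

print(ord("A"))                  # 65, the code point for 'A'
print(bin(ord("A")))             # '0b1000001', its binary representation
print(format(ord("A"), "08b"))   # '01000001', as an 8-bit binary string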
But,
the issue with this method is that the machine only maps each word to an arbitrary number; it will never understand the context and intent of the text.
We have a better way: converting words into numerical embeddings.
To let the machine understand the relationships between words and the grammar patterns of the text, you need a numerical representation of those words that carries meaning.
See this example:
Word Vectors and Cosine Similarity
Converting words to vectors (word embeddings) is a better way of turning a word into a numerical representation.
Stanford provides a pre-trained set of word vectors (GloVe) → https://nlp.stanford.edu/projects/glove/
You can download the dataset and run the code below to convert any word from the English language into its vector representation.
import numpy as np

def loadGlove(path):
    # read the GloVe file: each line is a word followed by the values of its vector
    model = {}
    with open(path, 'r', encoding='utf8') as file:
        for l in file:
            line = l.split()
            word = line[0]
            value = np.array([float(val) for val in line[1:]])
            model[word] = value
    return model

glove = loadGlove('glove.6B.50d.txt')
glove['python']  # vector embedding for the word "python"
Output →
array([ 0.5897 , -0.55043 , -1.0106 , 0.41226 , 0.57348 , 0.23464 ,
-0.35773 , -1.78 , 0.10745 , 0.74913 , 0.45013 , 1.0351 ,
0.48348 , 0.47954 , 0.51908 , -0.15053 , 0.32474 , 1.0789 ,
-0.90894 , 0.42943 , -0.56388 , 0.69961 , 0.13501 , 0.16557 ,
-0.063592, 0.35435 , 0.42819 , 0.1536 , -0.47018 , -1.0935 ,
1.361 , -0.80821 , -0.674 , 1.2606 , 0.29554 , 1.0835 ,
0.2444 , -1.1877 , -0.60203 , -0.068315, 0.66256 , 0.45336 ,
-1.0178 , 0.68267 , -0.20788 , -0.73393 , 1.2597 , 0.15425 ,
-0.93256 , -0.15025 ])
The array you see is the numerical representation of the word “python”.
Here comes a simple question: How does the computer know that words are similar?
The answer is cosine similarity.
Cosine Similarity
It gives you a score between -1 and 1 for how similar two word vectors are (it is not a probability).
You can implement it using the scikit-learn library.
from sklearn.metrics.pairwise import cosine_similarity

# reshape to 2D row vectors because cosine_similarity expects matrices
cosine_similarity(glove['cat'].reshape(1,-1), glove['dog'].reshape(1,-1))
Output →
array([[0.92180053]])
"cat" and "dog" are pretty close to each other.
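For the curious, here is what cosine_similarity computes under the hood, reusing the glove dictionary loaded above: the dot product of the two vectors divided by the product of their lengths.

import numpy as np

def cosine(a, b):
    # cosine of the angle between the two vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(glove['cat'], glove['dog']))   # ~0.9218, the same value scikit-learn returned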
Words in 2D vector space
When you plot the numeric representations of these words in a 2D vector space, you can easily see that similar words sit close to each other.
2d vector space for similarity of words
Using these vector embeddings, a transformer can pre-process the text as numerical representations through its encoder and extract the context of words, the relationships between words, parts of speech, and so on.
After understanding the input, the model uses this knowledge to generate unique and context-aware responses.
How are LLMs trained?
Large Language Models are trained on massive amounts of text data using transformer-based neural networks, which are made up of many layers and connections. Here's a simple breakdown.
The network has "nodes" connected across layers. Each connection has a weight (importance), and each node has a bias (adjustment).
Together with embeddings (how words are represented as vectors), these form the parameters of the model. LLMs have billions of these parameters.
The model looks at text, one part at a time, and predicts the next word or token in the sequence.
It adjusts its parameters (weights and biases) to improve predictions during each training iteration, using feedback to learn better patterns.
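To make that loop concrete, here is a toy sketch (nothing like a real LLM in scale or architecture): a tiny next-word model whose single weight matrix is adjusted by gradient descent after each prediction.

import numpy as np

text = "the sky is blue the sky is clear".split()
vocab = sorted(set(text))
stoi = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

W = np.zeros((V, V))   # the model's "parameters": one row of next-word scores per word

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for epoch in range(200):
    for cur, nxt in zip(text[:-1], text[1:]):
        i, j = stoi[cur], stoi[nxt]
        probs = softmax(W[i])    # predicted distribution over the next word
        grad = probs.copy()
        grad[j] -= 1.0           # gradient of the cross-entropy loss for the true next word
        W[i] -= lr * grad        # adjust the parameters using this feedback

print(vocab[int(np.argmax(softmax(W[stoi["is"]])))])   # "blue" or "clear" after training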
Once trained, LLMs can handle different tasks by adapting in the following ways:
Zero-shot Learning
The model performs tasks it wasn’t specifically trained for, based only on the instructions (prompts) given to it. Accuracy may vary.
Few-shot Learning
Adding a few examples improves its understanding and performance for specific tasks.
Fine-tuning
The model is further trained with more data tailored to a specific task, making it highly accurate for that application.
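Here is a small, made-up sketch of the difference between a zero-shot and a few-shot prompt for the same task; the example reviews are invented, and the resulting strings could be sent to any chat or completion API.

# zero-shot: only an instruction, no examples
zero_shot = (
    "Classify the sentiment of this review as Positive or Negative.\n"
    "Review: The battery dies in an hour.\nSentiment:"
)

# few-shot: the same instruction plus a couple of worked examples
few_shot = (
    "Classify the sentiment of each review as Positive or Negative.\n"
    "Review: Absolutely loved the camera quality.\nSentiment: Positive\n"
    "Review: The screen cracked within a week.\nSentiment: Negative\n"
    "Review: The battery dies in an hour.\nSentiment:"
)

print(zero_shot)
print(few_shot)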
Challenges of LLMs
amount of data
computational resources
risk of bias
model robustness
interpretability and debugging
environmental impact
Applications of LLMs Beyond ChatGPT
Look at the landscape of AI companies solving problems in different categories.
Source - Sequoia
There are a lot of other tools that are used for specific applications.
Source - Gartner
How Secure and Private Are LLMs?
The security and privacy of Large Language Models depend on how they are built, used, and managed.
Data Privacy & Security Risks
LLMs are trained on publicly available data like websites and books.
If sensitive data is included in the training data, it might show up in responses.
User inputs during interactions may be logged or analyzed, creating privacy risks if sensitive information is shared.
LLMs can sometimes generate sensitive or proprietary information by mistake.
They can be tricked using malicious inputs (like prompt injections) to behave unexpectedly.
Improving Security and Privacy
Training datasets are filtered to avoid personal or sensitive data.
Running LLMs locally or in private clouds ensures data stays within an organization.
Techniques like adding noise to training data (differential privacy) make it harder to trace back information.
Strict logging policies and anonymizing user inputs enhance privacy.
Compliance with Regulations & User Responsibility
LLM providers must follow privacy laws like GDPR or HIPAA.
Don’t share confidential or sensitive information with public LLMs.
Use platforms that clearly explain their data handling policies.
Final Thought
Large Language Models will not change your life; they are a technology that you will learn and move on.
They are assistants to us.
LLMs are not magic; there is a technical side to them.
They are also not the answer to every problem in your organization.
In some business scenarios, machine learning will work best.
You need to understand when to use LLMs.
LLMs are secure when implemented with the right safeguards, but users and organizations must follow data handling practices.
Transparency, robust security measures, and privacy-conscious deployment are essential for safe and ethical use.
The AI world is moving fast; there is no first-mover advantage; it is the fast-mover advantage.
Learn fast, build fast, win fast, and move fast.
Happy AI