LLMs Explained: What Are Large Language Models & How Do They Work

Large language models (LLMs) are AI systems trained on enormous amounts of text to predict the next word in a sequence. By doing this billions of times over, they learn grammar, facts and patterns of reasoning — which lets tools like ChatGPT, Gemini and Claude answer questions, write code and hold a conversation. This guide explains what an LLM is, how it actually works under the hood (tokens, parameters, context windows and training), why these models sometimes make things up, and how open and closed models differ — in plain English, for Indian readers.

In this guide
What is a large language model?
How LLMs actually work
Tokens, parameters & context windows
How an LLM is trained (3 stages)
Why LLMs hallucinate
Open vs closed models
RAG, fine-tuning & AI agents
Examples & the India picture
Limitations & how to use LLMs well
FAQ

What is a large language model?

A large language model is a type of artificial intelligence trained to understand and generate human language. The name describes exactly what it is. It is large because it has been trained on a vast body of text (books, websites, code, articles and forums) and because it contains a huge number of internal settings called parameters. It is a language model because, at its core, its only job is to model language: given some text, predict what text is likely to come next.

That single capability turns out to be remarkably powerful. If a system can reliably predict the next word in any sentence, it must have absorbed grammar, factual knowledge, writing style and even rough patterns of reasoning. When you ask ChatGPT to draft an email, summarise a contract or explain GST to a small-business owner, it is using this next-word prediction skill, repeated again and again, to build a complete answer.

LLMs sit inside the broader field of generative AI and are a branch of natural language processing (NLP). They are built using deep learning, which uses artificial neural networks loosely inspired by the human brain. The specific design that made modern LLMs possible is the Transformer, introduced by Google researchers in a 2017 paper titled “Attention Is All You Need.” Almost every well-known LLM today, including GPT, Gemini, Claude and Llama, is built on the Transformer architecture.

Key takeaway: An LLM is not a database of answers and it is not “googling” your question. It is a statistical pattern-completer that has read a huge amount of text and learned to continue any text in a plausible, human-like way.

Where the “AI language model” fits in the AI family

It helps to see how the terms nest inside one another. Artificial intelligence is the broadest idea. Inside it sits machine learning; inside that, deep learning; and LLMs are one application of deep learning focused on language.

[Figure: Where large language models sit within the wider field of artificial intelligence.]

How LLMs actually work

Under the hood, an LLM does one thing over and over: it reads the text so far and predicts the most likely next chunk of text. Everything else — answering questions, writing essays, debugging code — is built on top of that loop. Let us walk through it step by step.

Step 1: Your words become numbers (tokenisation)

Computers do not understand letters; they understand numbers. So the first thing an LLM does is break your input into tokens — small pieces of text that are roughly word-fragments. A token can be a whole short word (“the”), part of a longer word (“un” + “believ” + “able”), a space, or a punctuation mark. Each token is then mapped to a list of numbers called an embedding, which captures its meaning in a way the model can do maths on. Words with similar meanings end up with similar embeddings.
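As a rough illustration of this step, the sketch below uses a tiny hand-made vocabulary and made-up three-number embeddings. Both are assumptions for illustration only: real models learn vocabularies of many thousands of tokens and much longer embeddings. Greedy longest-match splitting is also a simplification swapped in here; production tokenisers typically use learned schemes such as byte-pair encoding.

```python
import math

# Hypothetical toy vocabulary: token text -> token id.
VOCAB = {"un": 0, "believ": 1, "able": 2, "the": 3, " ": 4, "cat": 5, "dog": 6}

# Hypothetical embeddings, hand-written for illustration only.
EMBEDDINGS = {
    5: [0.9, 0.1, 0.3],    # "cat"
    6: [0.8, 0.2, 0.35],   # "dog"
    3: [0.0, 1.0, 0.0],    # "the"
}

def tokenize(text):
    """Split text into token ids by greedy longest match against VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(tokenize("unbelievable"))  # [0, 1, 2]
print(tokenize("the cat"))       # [3, 4, 5]
print(cosine(EMBEDDINGS[5], EMBEDDINGS[6]) > cosine(EMBEDDINGS[5], EMBEDDINGS[3]))
```

With these made-up numbers, “cat” sits closer to “dog” than to “the”, which is the property the model relies on when it does maths on meaning.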

Step 2: The Transformer pays “attention”

The tokens flow through many stacked layers of a neural network — the Transformer. Its key trick is a mechanism called attention, which lets the model weigh how much every word should influence every other word. In the sentence “the trophy did not fit in the suitcase because it was too big,” attention helps the model figure out that “it” refers to the trophy, not the suitcase. This ability to track relationships across long stretches of text is what makes modern LLMs feel coherent.
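The weighing described above can be sketched as scaled dot-product attention for a single token. The two-number vectors below are invented so that “it” lines up with “trophy”; in a real Transformer these vectors are learned, are far longer, and the calculation runs for every token in every layer.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # How well does the query match each key?
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output

# Hypothetical vectors for three tokens; the numbers are made up.
tokens = ["trophy", "suitcase", "it"]
keys = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_for_it = [3.0, 0.5]  # chosen so that "it" matches "trophy" best

weights, output = attend(query_for_it, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token:10s} {weight:.2f}")
```

The largest weight lands on “trophy”, so the representation of “it” is built mostly from the trophy's information.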

Step 3: Predict the next token, then repeat

At the end, the model outputs a probability for every possible next token — effectively a ranked list of “what word is most likely to come next.” It picks one (with a bit of controlled randomness so answers are not robotic), adds it to the text, and feeds the whole thing back in to predict the next token. It repeats this loop, one token at a time, until the answer is complete. That is why you often see ChatGPT or Gemini “type” their reply word by word — you are literally watching the prediction loop run.
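A minimal sketch of this loop follows. The score table is hypothetical and looks only at the previous token, whereas a real LLM computes its scores from the whole text so far. The temperature setting is one common way to control the randomness: lower values make the most likely token win almost every time.

```python
import math
import random

# Hypothetical scores a model might give each possible next token,
# keyed by the current token. A real model computes these with a Transformer.
NEXT_TOKEN_SCORES = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 2.0, "dog": 1.5},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.5, "<end>": 0.5},
    "dog": {"sat": 2.0, "<end>": 1.0},
    "sat": {"<end>": 3.0},
}

def softmax(scores, temperature=1.0):
    """Convert scores to probabilities; low temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def generate(temperature=1.0, max_tokens=10, seed=0):
    """Pick one token at a time until <end> or the token limit."""
    rng = random.Random(seed)
    current, output = "<start>", []
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_SCORES[current]
        probs = softmax(list(candidates.values()), temperature)
        current = rng.choices(list(candidates), weights=probs, k=1)[0]
        if current == "<end>":
            break
        output.append(current)
    return output

print(generate(temperature=0.01))  # near-greedy: ['the', 'cat', 'sat']
print(generate(temperature=1.0, seed=3))
```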

[Figure: The core LLM loop: text in, tokens, attention, next-token prediction, repeat.]

Tokens, parameters & context windows: the three numbers that matter

If you read anything about LLMs, three pieces of jargon come up constantly. Understanding them is enough to follow most AI news and to choose the right tool for a task.

Tokens

As described above, a token is a chunk of text the model processes. A rough rule of thumb is that 1 token is about 4 characters of English, so roughly 100 tokens ≈ 75 words. Tokens matter for two practical reasons: paid APIs usually charge per token, and every model has a limit on how many tokens it can handle at once. Indian languages written in their own scripts (Hindi, Tamil, Bengali) often use more tokens per word than English, which can make them costlier to process and means they fill the context window faster.
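The rule of thumb translates into a quick back-of-envelope estimate. The function names and the price below are placeholders invented for this sketch, not any provider's real API or rate, and the 4-characters figure applies to English only. For real billing, use the provider's own tokeniser.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rule-of-thumb estimate: about 4 characters of English per token."""
    return max(1, round(len(text) / chars_per_token))

def estimate_cost(num_tokens, price_per_million_tokens):
    """Cost of processing num_tokens at a given price per million tokens."""
    return num_tokens * price_per_million_tokens / 1_000_000

prompt = "Explain GST to a small-business owner in two short paragraphs."
tokens = estimate_tokens(prompt)

# The price is a made-up placeholder, in whatever currency you are billed in.
print(tokens, estimate_cost(tokens, price_per_million_tokens=100.0))
```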

Parameters

A parameter is an internal setting — essentially a number — that the model adjusts during training. These are the “knobs” that store what the model has learned. Early models had millions of parameters; modern frontier LLMs have tens or hundreds of billions. More parameters generally mean more capacity to learn, but also higher training and running costs. This is why the industry now also builds small language models (SLMs) — compact models that run cheaply on a laptop or phone for narrower tasks.
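To see why parameter count drives running cost, the arithmetic below estimates the memory needed just to hold a model's weights. It assumes each parameter is stored as a 32-bit (4-byte) or 16-bit (2-byte) number and ignores the extra memory used while the model is running. The model sizes are illustrative, not the specifications of any named model.

```python
def weights_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Illustrative sizes only.
for name, params in [("small model", 3e9), ("mid-size model", 7e9), ("large model", 70e9)]:
    at_32_bit = weights_memory_gb(params, 4)
    at_16_bit = weights_memory_gb(params, 2)
    print(f"{name}: {at_32_bit:.0f} GB at 32-bit, {at_16_bit:.0f} GB at 16-bit")
```

A 7-billion-parameter model at 16 bits needs about 14 GB for its weights, which is within reach of a well-equipped laptop, while a 70-billion-parameter model needs roughly ten times that.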

Context window

The context window is the maximum amount of text (measured in tokens) a model can “see” at one time — your prompt plus its own answer plus any documents you paste in. If a conversation runs longer than the context window, the model starts to “forget” the earliest parts. Modern models have steadily expanded this, from a few thousand tokens to hundreds of thousands or more, which is what allows you to drop an entire PDF or codebase into a single prompt.
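One simple way an application might cope with the limit is to drop the oldest messages first, which is the “forgetting” described above. This is only one possible policy (some tools summarise older turns instead), and the token counts use the rough 4-characters rule rather than a real tokeniser.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rule-of-thumb token estimate for English text."""
    return max(1, round(len(text) / chars_per_token))

def fit_to_context(messages, context_limit, reserved_for_answer):
    """Keep the newest messages that fit the budget; drop the oldest first."""
    budget = context_limit - reserved_for_answer
    kept = []
    for message in reversed(messages):       # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > budget:
            break                             # everything older is dropped too
        kept.append(message)
        budget -= cost
    return list(reversed(kept))               # restore chronological order

history = ["a" * 40, "b" * 40, "c" * 40]      # about 10 tokens each
print(len(fit_to_context(history, context_limit=30, reserved_for_answer=10)))  # 2
```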

Quick analogy: Think of parameters as everything the model has learned and remembers permanently, and the context window as its short-term working memory for the conversation in front of it. Tokens are the unit used to measure the context window; parameters are simply counted.

Concept	What it is	Why it matters to you
Token	A small chunk of text (~4 characters / ~¾ of a word)	Determines cost on paid APIs and counts toward the context limit
Parameter	An internal number the model tunes while training	More parameters = more capacity, but more cost; SLMs trade size for speed
Context window	Max tokens the model can read at once (its working memory)	Decides how much you can paste in before it starts forgetting
Pre-training	Learning language from a huge text corpus	Builds the model’s general knowledge and writing ability
Fine-tuning	Extra training on a narrower dataset	Makes a model specialise (e.g. legal, medical, customer support)

How an LLM is trained: the three stages

Building a useful, well-behaved LLM happens in three broad phases. Each stage shapes the model in a different way, and skipping any of them produces a worse assistant.

Stage 1: Pre-training

This is where the heavy lifting happens. The model is shown an enormous corpus of text and trained, again and again, to predict the next token. It is not told any “right answers” by humans; it simply learns from the patterns in the text itself (this is called self-supervised learning). Pre-training is extraordinarily compute-intensive: it can take weeks or months on thousands of specialised AI chips (such as GPUs), which is why only well-funded labs train frontier models from scratch. The output is a “base model” that is knowledgeable but raw: good at completing text, but not yet good at following instructions.
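The self-supervised idea can be shown on a toy scale: the text supplies its own labels, because each token is the “answer” for the token before it. In the sketch below, simple counting stands in for training. That is a deliberate simplification: real pre-training adjusts billions of parameters by gradient descent, but it minimises the same kind of loss, the negative log of the probability given to the true next token.

```python
import math
from collections import Counter, defaultdict

def training_pairs(tokens):
    """Pair each token with the one that follows it. No human labels needed."""
    return list(zip(tokens[:-1], tokens[1:]))

def train_bigram(tokens):
    """Count-based stand-in for training: estimate P(next token | current token)."""
    counts = defaultdict(Counter)
    for current, nxt in training_pairs(tokens):
        counts[current][nxt] += 1
    return {
        current: {tok: n / sum(c.values()) for tok, n in c.items()}
        for current, c in counts.items()
    }

def average_loss(model, tokens):
    """Average cross-entropy over the text: lower means better predictions."""
    pairs = training_pairs(tokens)
    return sum(-math.log(model[cur][nxt]) for cur, nxt in pairs) / len(pairs)

corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(model["the"])                 # "cat" gets 2/3, "mat" gets 1/3
print(average_loss(model, corpus))
```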

Stage 2: Fine-tuning (instruction tuning)

The base model is then trained further on curated examples of instructions and high-quality responses, so it learns to follow directions rather than just continue text. This is what turns a text-completer into something that answers your questions, writes in the format you asked for, and stays on topic. Organisations can also fine-tune a model on their own domain — for example, an Indian bank fine-tuning a model on its policies for customer support.
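The curated examples are, at heart, pairs of an instruction and a good response, joined into text the model then trains on. The sketch below shows one way such pairs might be laid out. The examples and the “### Instruction / ### Response” template are invented for illustration; each model family uses its own format.

```python
# Hypothetical instruction/response pairs; real datasets hold many thousands.
EXAMPLES = [
    {
        "instruction": "Summarise in one line: The branch is closed on Sunday.",
        "response": "The branch does not open on Sundays.",
    },
    {
        "instruction": "Reply politely to: Where is my refund?",
        "response": "Thank you for your patience. Your refund is being processed.",
    },
]

def to_training_text(example):
    """Join one pair into a single training string using a made-up template."""
    return (
        f"### Instruction:\n{example['instruction']}\n"
        f"### Response:\n{example['response']}"
    )

for example in EXAMPLES:
    print(to_training_text(example))
    print()
```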

Stage 3: RLHF (alignment)

Reinforcement Learning from Human Feedback (RLHF) is the polish. Human reviewers rank different model answers from best to worst, and that feedback is used to train the model to prefer responses people find helpful, honest and harmless. RLHF is a big reason modern assistants feel polite and refuse clearly harmful requests. It is also imperfect — it reflects the judgements of the people doing the rating, which is one source of bias.
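One common way to turn rankings into a training signal is to split each ranking into “better, worse” pairs and train a separate scoring model (a reward model) so that the better answer scores higher. The sketch below shows that pairwise loss. It is an assumption about a typical setup, not a description of how any particular lab does it, and the function names are hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_answers):
    """Turn a best-to-worst ranking into (better, worse) pairs."""
    return list(combinations(ranked_answers, 2))

def preference_loss(score_better, score_worse):
    """Pairwise loss: small when the preferred answer already scores higher."""
    margin = score_better - score_worse
    probability_correct = 1.0 / (1.0 + math.exp(-margin))  # sigmoid
    return -math.log(probability_correct)

print(ranking_to_pairs(["A", "B", "C"]))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(preference_loss(2.0, 0.0))          # scorer agrees with the humans: low loss
print(preference_loss(0.0, 2.0))          # scorer disagrees: high loss
```

The trained scorer then stands in for the human reviewers, so the LLM can be nudged towards high-scoring answers at a scale no team of raters could manage by hand.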

[Figure: The three stages that turn raw text data into a usable AI assistant.]

Why do LLMs hallucinate (make things up)?

One of the most important things to understand about LLMs is that they can state false information with complete confidence. This is called a hallucination, and it is not a bug that will be patched away easily — it is a direct consequence of how the technology works.

Remember, an LLM is a next-token predictor. It is optimised to produce text that is plausible, not text that is verified true. It has no built-in fact-checker and no internal sense of “I do not know.” When you ask about something it has not seen enough of, it does not stop — it generates the most statistically likely-sounding answer, which can be a fabricated statistic, a fake citation, a made-up court case, or an invented product feature.

Common triggers for hallucination include:

Knowledge gaps: questions about niche, very recent, or local topics the model saw little of in training.
Specific facts: exact dates, numbers, names, prices and quotes are the easiest things to get subtly wrong.
Leading prompts: if you ask it to describe something that does not exist, it will often oblige rather than correct you.
Outdated training data: a model only knows the world up to its training cut-off, so it can confidently give stale information.

Practical rule: Treat an LLM like a fast, well-read but occasionally over-confident intern. Brilliant for drafting, brainstorming and explaining — but always verify any specific fact, figure, legal point or medical claim before you rely on it.

How the industry reduces hallucinations

Hallucinations cannot be eliminated entirely, but they can be reduced. The main technique is RAG (Retrieval-Augmented Generation), where the model is connected to a trusted source of documents and instructed to answer using only that material — covered in detail below. Other approaches include letting the model use tools (like a calculator or a live web search), asking it to show its reasoning, and fine-tuning it to say “I am not sure” when appropriate.

Open vs closed models: two ways LLMs are released

LLMs come in two broad flavours, and the difference matters a lot for businesses, developers and policymakers in India deciding what to build on.

A closed (proprietary) model is owned by a company that keeps the model’s internal weights private. You access it only through an API or app — you cannot download it or see exactly how it was built. Examples include OpenAI’s GPT family, Google’s Gemini and Anthropic’s Claude. You get cutting-edge quality and zero infrastructure headache, but you depend on the provider and send your data to their servers.

An open-weight (often called “open source”) model has its trained weights released publicly, so anyone can download, run, study and modify it — often free of charge. Meta’s Llama family, Mistral’s models and others popularised this approach, and there is a fast-growing ecosystem of open models. You get control, privacy (you can run it on your own servers) and no per-call fee, but you take on the cost and skill of hosting it yourself. A quick note on terminology: many “open” models release the weights but not the full training data, so purists call them “open-weight” rather than truly open source.

Factor	Closed / proprietary models	Open-weight models
Access	API or app only; weights kept private	Download, run and modify the weights yourself
Examples	GPT, Gemini, Claude	Llama, Mistral, and many community models
Cost model	Pay per token / subscription	Usually free to download (licence terms vary); you pay for the hardware to run it
Data privacy	Your prompts go to the provider	Can run fully on your own / on-prem servers
Customisation	Limited to provided settings	Full fine-tuning and modification possible
Best for	Top quality with minimum setup	Control, privacy, cost at scale, research

Beyond chat: RAG, fine-tuning and AI agents

A raw LLM is just the engine. The most useful real-world AI products wrap that engine in extra techniques so it stays accurate, specialised and able to take action.

RAG (Retrieval-Augmented Generation)

RAG is one of the most important techniques for building trustworthy business AI. Instead of relying only on what the model memorised in training, you connect it to an external knowledge source such as your company’s documents, a product manual or a policy database. When a user asks a question, the system first retrieves the most relevant passages, then hands them to the LLM with an instruction like “answer using only this material.” This keeps answers current, grounded in your own data, and far less prone to hallucination. It is how many internal company chatbots and customer-support assistants are built.
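The retrieve-then-prompt flow can be sketched in a few lines. The documents are invented, and ranking by shared words is a stand-in chosen to keep the example self-contained; real systems usually search by embedding similarity. The final prompt would be sent to whichever LLM you use, a step left out here.

```python
import re

# Hypothetical knowledge base for an imaginary bank.
DOCUMENTS = [
    "Refunds are processed within 7 working days of approval.",
    "Savings accounts require a minimum balance of 1000 rupees.",
    "Branch offices are open Monday to Saturday, 10 am to 4 pm.",
]

def words(text):
    """Lower-case set of the words and numbers in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, passages):
    """Wrap the retrieved passages and the question in a grounding instruction."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only this material. If the answer is not there, say so.\n"
        f"Material:\n{context}\n"
        f"Question: {question}"
    )

question = "What is the minimum balance for savings accounts?"
print(build_prompt(question, retrieve(question, DOCUMENTS)))
```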

Fine-tuning vs RAG — which to use?

People often confuse the two. Fine-tuning changes the model itself by training it further, and is best for teaching a consistent style, tone or format. RAG leaves the model unchanged but feeds it fresh facts at question time, and is best when your information changes often or must be exact. Many production systems use both.

Need	Use fine-tuning	Use RAG
Teach a specific tone or format	Yes — ideal	Less suited
Answer from up-to-date documents	Poor fit (data goes stale)	Yes — ideal
Keep facts exact and auditable	Risk of memorised errors	Yes, answers can cite the retrieved sources
Lower hallucination on niche data	Helps a little	Helps a lot

AI agents

The latest leap is the AI agent — an LLM given a goal plus a set of tools it can use on its own, such as searching the web, running code, querying a database or calling other software. Rather than just replying once, an agent plans a series of steps, takes actions, observes the results, and adjusts until the task is done. An agent could, for example, research a topic, draft a report, and email it — chaining several actions together. Agents are powerful but still maturing, and they need careful guardrails because mistakes compound across steps.
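The plan, act, observe cycle can be sketched as a short loop. Here a scripted function stands in for the LLM and a calculator is the only tool. All names are hypothetical, and the step limit is one simple example of a guardrail.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def calculator(expression):
    """Tool: evaluate 'number operator number', e.g. '1200 * 0.18'."""
    left, op, right = expression.split()
    return OPS[op](float(left), float(right))

TOOLS = {"calculator": calculator}

def scripted_model(goal, observations):
    """Stand-in for the LLM: chooses the next step from what it has seen so far."""
    if not observations:
        return {"action": "calculator", "input": "1200 * 0.18"}
    return {"action": "finish", "input": f"18% of 1200 is {observations[-1]:.2f}"}

def run_agent(goal, model, tools, max_steps=5):
    """Ask the model for a step, run the tool, feed the result back, repeat."""
    observations = []
    for _ in range(max_steps):              # guardrail: never loop forever
        step = model(goal, observations)
        if step["action"] == "finish":
            return step["input"]
        result = tools[step["action"]](step["input"])
        observations.append(result)
    return "stopped: step limit reached"

print(run_agent("Work out 18% of 1200", scripted_model, TOOLS))
```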

LLM examples and the India picture

You are almost certainly using LLMs already, often without realising it — in chat assistants, email auto-complete, search summaries, coding tools and customer-support bots.

Well-known large language models

Model family	Built by	Type	Known for
GPT (ChatGPT)	OpenAI	Closed	Popularising the AI chat assistant
Gemini	Google	Closed	Deep integration with Google products
Claude	Anthropic	Closed	Long documents and a focus on safety
Llama	Meta	Open-weight	Powering the open-model ecosystem
Mistral	Mistral AI	Open-weight	Efficient, smaller high-performance models

(This is an illustrative list of well-known families, not a ranking or an exhaustive directory.)

LLMs and Indian languages

A major theme in India is making LLMs work well across the country’s many languages. Most early models were trained overwhelmingly on English text, so they handled Hindi, Tamil, Telugu, Bengali, Marathi and others less fluently — and, as noted, often used more tokens per word for Indian scripts. India’s IndiaAI Mission, backed by the central government, explicitly aims to support home-grown foundation models and Indian-language datasets, and several Indian startups and research groups are building multilingual models tuned for local languages and contexts. For Indian businesses, this matters: an LLM that genuinely understands regional languages can serve a far larger customer base.

For Indian readers: When evaluating an AI tool for your business, test it in the languages your customers actually use, check whether your data leaves the country (closed API) or stays in-house (self-hosted open model), and prefer RAG over raw chat whenever accuracy on your own documents matters.

Limitations and how to use LLMs well

LLMs are transformative, but they are tools with clear limits. Using them effectively means knowing where they shine and where they stumble.

Key limitations

They can be confidently wrong. Hallucinations are inherent, so always verify facts.
They have a knowledge cut-off. Without web access or RAG, they do not know recent events.
They can reflect bias. Learning from human text and feedback, they can absorb and repeat social biases.
They imitate reasoning rather than truly reason. This usually works but can fail on novel logic or maths without tools.
Privacy and IP matter. Be careful pasting confidential or personal data into public AI tools.

How to get good results

Be specific. Give context, the role you want it to play and the format you want back — this is called prompting.
Give it the source material. Pasting the relevant document (a simple form of RAG) beats relying on memory.
Verify the important bits. Treat figures, citations and legal or medical claims as drafts to check.
Match the model to the task. Use a small, cheap model for simple jobs and a frontier model for hard ones.
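The first two tips can be turned into a small reusable template. This is only a sketch: the function name and the layout of the prompt are one choice among many, and no particular AI tool requires this format.

```python
def make_prompt(role, context, task, output_format):
    """Assemble a prompt from role, context, task and the format wanted back."""
    return "\n".join([
        f"You are {role}.",
        f"Context: {context}",
        f"Task: {task}",
        f"Format: {output_format}",
    ])

print(make_prompt(
    role="a tax advisor for small businesses in India",
    context="The reader runs a shop and has never filed GST before.",
    task="Explain what GST registration involves.",
    output_format="Five short bullet points in plain English.",
))
```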

Frequently asked questions about large language models

What are some examples of large language models?

Well-known examples include OpenAI’s GPT family (which powers ChatGPT), Google’s Gemini, and Anthropic’s Claude, which are closed/proprietary models accessed via app or API. On the open-weight side, Meta’s Llama and Mistral’s models are widely used. India is also developing home-grown and Indian-language models under the IndiaAI Mission.

What is the difference between an LLM and AI?

Artificial intelligence (AI) is the broad field of making machines perform tasks that need human-like intelligence. A large language model is one specific kind of AI, built using deep learning, that specialises in understanding and generating text. In other words, every LLM is AI, but not all AI is an LLM — AI also includes things like image recognition and recommendation systems.

How is a large language model trained?

In three broad stages. First, pre-training: the model learns language by predicting the next token across a massive text corpus. Second, fine-tuning (instruction tuning): it learns to follow instructions using curated examples. Third, RLHF (Reinforcement Learning from Human Feedback): human reviewers rank answers so the model learns to be more helpful, honest and harmless.

Why do large language models hallucinate?

Because they are designed to predict plausible-sounding text, not to verify truth. An LLM has no built-in fact-checker, so when it lacks solid information it generates the most likely-sounding answer — which can be a made-up fact, statistic or citation. Techniques like RAG (connecting the model to trusted documents) reduce, but do not fully eliminate, hallucinations.

What is the difference between open source and closed large language models?

Closed (proprietary) models such as GPT, Gemini and Claude keep their internal weights private and are used through an API or app. Open-weight models such as Llama and Mistral release their weights publicly so anyone can download, run and modify them, usually for free — though you must supply the hardware to host them. Closed models offer top quality with minimal setup; open models offer control, privacy and lower cost at scale.

What is a small language model (SLM)?

A small language model is a compact LLM with far fewer parameters, designed to run cheaply and fast — sometimes on a laptop or phone — for narrower tasks. SLMs trade some general capability for lower cost, faster responses and better privacy, which makes them attractive for on-device features and specialised business use cases.

What is a context window in an LLM?

The context window is the maximum amount of text, measured in tokens, that a model can consider at once — your prompt, any pasted documents, and the model’s own answer all count toward it. If a conversation or document exceeds the window, the model starts to lose track of the earliest content. Larger context windows let you feed in whole reports or codebases at once.