Probabilistic Information Retrieval

Foundations and Language Models

Probabilistic Information Retrieval

Motivation: Why Probability in IR?
Contrast with Boolean & Vector Space Models
Key Assumptions in Probabilistic IR
Probability Basics
Example: Probability of Relevance for a Document

Probability Ranking Principle (PRP)

Definition of PRP
Intuition: Ranking by Probability of Relevance
Case 1: 1/0 Loss Scenario
Case 2: Incorporating Retrieval Costs
Worked Example: PRP Applied to a Query
Limitations of PRP

Binary Independence Model (BIM)

Assumptions of BIM (Term Independence)
Deriving Ranking Function – Step 1
Deriving Ranking Function – Step 2
Formula for Ranking Score (Log-Odds Weight)
Probability Estimates in Theory
Probability Estimates in Practice
Example Calculation
Relevance Feedback in Probabilistic Models
Rocchio vs Probabilistic Feedback
Strengths and Weaknesses of BIM

Extensions & Modern Models

Appraisal of Probabilistic Models
Tree-Structured Dependencies
Okapi BM25: Motivation and Formula
BM25 Example with Query/Document Scoring
Bayesian Network (BN) Approaches to IR
Summary of Probabilistic IR

Language Models for IR

Definition of Language Models in IR
Finite Automata and Language Models
Types of Language Models
Multinomial Distributions over Words
Example: Unigram Language Model

Query Likelihood Model

Query Likelihood Principle
Using Query Likelihood in IR
Estimating Query Generation Probability
Maximum Likelihood Estimation (MLE)
Smoothing Techniques
Ponte & Croft’s Experiments – Setup
Ponte & Croft’s Experiments – Results
Lessons Learned from Query Likelihood

Comparisons & Extensions

Language Modelling vs Probabilistic IR
Language Modelling vs VSM
Strengths of LM Approaches
Weaknesses of LM Approaches
Extended LM Approaches: Mixture Models
Topic Models (e.g., LDA) in IR
Neural Language Models in Modern IR (Word Embeddings, Transformers)
Example: BERT in Document Ranking

Wrap-Up

Summary: Probabilistic vs Language Models
Implications for Web Search

Probabilistic Information Retrieval

Motivation: Why Probability in IR?

Contrast with Boolean & Vector Space Models

Key Assumptions in Probabilistic IR

Probability Basics

Example: Probability of Relevance for a Document

Probability Ranking Principle (PRP)

Definition of PRP

Core Principle: If a retrieval system’s goal is to maximize the overall utility (or minimize the expected loss) of the retrieved set, then documents should be ranked in order of decreasing probability of relevance to the query.
Formal Statement: For a given query and any two documents d1 and d2, rank d1 above d2 if P(R=1 | d1) > P(R=1 | d2), where R=1 means the document is relevant to the query.
History:
The Probability Ranking Principle (PRP) was introduced by Stephen E. Robertson in 1977 as a foundational concept in information retrieval (IR), asserting that documents should be ranked by their probability of relevance to a query.
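
As a minimal sketch of the ranking rule in the formal statement, the Python snippet below sorts documents by decreasing estimated P(R=1|d). The function name `prp_rank`, the document ids and the probability values are all hypothetical; in a real system the estimates would come from a retrieval model (or from a score assumed to order documents the same way). Breaking ties by document id is an arbitrary choice, since the PRP itself says nothing about ties.

```python
from typing import Dict, List, Tuple


def prp_rank(prob_relevance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank documents by decreasing estimated P(R=1 | d).

    Ties are broken by document id so the output is deterministic
    (an arbitrary choice; the PRP does not specify tie handling).
    """
    for doc, p in prob_relevance.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {doc!r} out of range: {p}")
    # Sort key: negated probability (so higher comes first), then doc id.
    return sorted(prob_relevance.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical estimates for one query (not real data).
estimates = {"docA": 0.8, "docB": 0.5, "docC": 0.65, "docD": 0.1}
for rank, (doc, p) in enumerate(prp_rank(estimates), start=1):
    print(rank, doc, p)
```

With these made-up estimates the printed order is docA, docC, docB, docD. The rule itself is a one-line sort; everything interesting happens in producing the numbers it sorts.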

Intuition: Ranking by Probability of Relevance

Simple Logic
- If you must choose between Document A (80% chance of being relevant) and Document B (50% chance of being relevant), you should show A first.
Optimal Performance
- Assuming each document’s relevance does not depend on the other documents shown, and that the probabilities are estimated accurately from the available information, ranking by decreasing P(R=1|d) is provably optimal: for every cutoff k, its top k documents have the highest expected number of relevant documents (and so the lowest expected 1/0 loss) of any k-document set.
- ✅ Why It’s Optimal
  - It minimizes the expected number of irrelevant documents ranked above relevant ones, so users see the documents most likely to be relevant first.
  - It maximizes expected precision at every rank cutoff k.
  - It matches how users typically judge search quality: they want the most useful results first.
Focus
- The difficulty lies not in the ranking rule itself but in estimating P(R=1|d) accurately.
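
The claim that the PRP ordering maximizes expected precision at every cutoff can be checked numerically on a toy case. The sketch below assumes made-up relevance probabilities for four documents and ignores redundancy between documents. It computes expected precision@k as the sum of the top-k probabilities divided by k (linearity of expectation) and compares the PRP order against a brute-force search over every possible ordering. The helper name `expected_precision_at_k` is hypothetical, and brute force is only feasible for tiny examples like this one.

```python
from itertools import permutations
from typing import Sequence


def expected_precision_at_k(probs: Sequence[float], k: int) -> float:
    """Expected precision@k for a ranking whose i-th document is relevant
    with probability probs[i]: E[#relevant in top k] / k, by linearity of
    expectation."""
    return sum(probs[:k]) / k


# Hypothetical P(R=1|d) values for four documents, in arbitrary order.
probs = [0.5, 0.8, 0.1, 0.65]
prp_order = sorted(probs, reverse=True)  # the PRP ranking

for k in range(1, len(probs) + 1):
    prp_value = expected_precision_at_k(prp_order, k)
    # Best value over all 4! = 24 orderings (brute force, tiny example only).
    best_value = max(expected_precision_at_k(order, k) for order in permutations(probs))
    # Tolerance covers floating-point rounding when sums are added in a different order.
    assert abs(prp_value - best_value) < 1e-12
    print(f"k={k}: PRP E[P@k] = {prp_value:.4f}, best over all orderings = {best_value:.4f}")
```

For every k the two columns agree: taking the k largest probabilities is exactly what maximizes their sum, which is the intuition behind the optimality argument.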

Case 1: 1/0 Loss Scenario

Case 2: Incorporating Retrieval Costs

Worked Example: PRP Applied to a Query

Limitations of PRP

Binary Independence Model (BIM)

Assumptions of BIM (Term Independence)

Deriving Ranking Function – Step 1

Deriving Ranking Function – Step 2

Formula for Ranking Score (Log-Odds Weight)

Probability Estimates in Theory

Probability Estimates in Practice

Example Calculation

Relevance Feedback in Probabilistic Models

Rocchio vs Probabilistic Feedback

Strengths and Weaknesses of BIM

Extensions & Modern Models

Appraisal of Probabilistic Models

Tree-Structured Dependencies

Okapi BM25: Motivation and Formula

BM25 Example with Query/Document Scoring

Bayesian Network (BN) Approaches to IR

Summary of Probabilistic IR

Language Models for IR

Definition of Language Models in IR

Finite Automata and Language Models

Types of Language Models

Multinomial Distributions over Words

Example: Unigram Language Model

Query Likelihood Model

Query Likelihood Principle

Using Query Likelihood in IR

Estimating Query Generation Probability

Maximum Likelihood Estimation (MLE)

Smoothing Techniques

Ponte & Croft’s Experiments – Setup

Ponte & Croft’s Experiments – Results

Lessons Learned from Query Likelihood

Comparisons & Extensions

Language Modelling vs Probabilistic IR

Language Modelling vs VSM

Strengths of LM Approaches

Weaknesses of LM Approaches

Extended LM Approaches: Mixture Models

Topic Models (e.g., LDA) in IR

Neural Language Models in Modern IR (Word Embeddings, Transformers)

Example: BERT in Document Ranking

Wrap-Up

Summary: Probabilistic vs Language Models

Implications for Web Search

Dec 8, 2025, by jeesoo
