Probabilistic Information Retrieval



 Foundations and Language Models

Foundations

Probabilistic Information Retrieval

  • Motivation: Why Probability in IR?
  • Contrast with Boolean & Vector Space Models
  • Key Assumptions in Probabilistic IR
  • Probability Basics
  • Example: Probability of Relevance for a document

Probability Ranking Principle (PRP)

  • Definition of PRP
  • Intuition: Ranking by Probability of Relevance
  • Case 1: 1/0 Loss Scenario
  • Case 2: Incorporating Retrieval Costs
  • Worked Example: PRP Applied to a Query
  • Limitations of PRP

Binary Independence Model (BIM)

  • Assumptions of BIM (Term Independence)
  • Deriving Ranking Function – Step 1
  • Deriving Ranking Function – Step 2
  • Formula for Ranking Score (Log-Odds Weight)
  • Probability Estimates in Theory
  • Probability Estimates in Practice
  • Example Calculation
  • Relevance Feedback in Probabilistic Models
  • Rocchio vs Probabilistic Feedback
  • Strengths and Weaknesses of BIM


Extensions & Modern Models

  • Appraisal of Probabilistic Models
  • Tree-Structured Dependencies
  • Okapi BM25: Motivation and Formula
  • BM25 Example with Query/Document Scoring
  • Bayesian Network (BN) Approaches to IR
  • Summary of Probabilistic IR

Language Models for IR

  • Definition of Language Models in IR
  • Finite Automata and Language Models
  • Types of Language Models
  • Multinomial Distributions over Words
  • Example: Unigram Language Model

Query Likelihood Model

  • Query Likelihood Principle
  • Using Query Likelihood in IR
  • Estimating Query Generation Probability
  • Maximum Likelihood Estimation (MLE)
  • Smoothing Techniques
  • Ponte & Croft’s Experiments – Setup
  • Ponte & Croft’s Experiments – Results
  • Lessons Learned from Query Likelihood

Comparisons & Extensions

  • Language Modelling vs Probabilistic IR
  • Language Modelling vs VSM
  • Strengths of LM Approaches
  • Weaknesses of LM Approaches
  • Extended LM Approaches: Mixture Models
  • Topic Models (e.g., LDA) in IR
  • Neural Language Models in Modern IR (Word Embeddings, Transformers)
  • Example: BERT in Document Ranking

Wrap-Up

  • Summary: Probabilistic vs Language Models
  • Implications for Web Search

Probabilistic Information Retrieval

Motivation: Why Probability in IR?

Contrast with Boolean & Vector Space Models

Key Assumptions in Probabilistic IR

Probability Basics

Example: Probability of Relevance for a document

Probability Ranking Principle (PRP)

Definition of PRP

  • Core Principle: If a retrieval system’s goal is to maximize overall utility (equivalently, to minimize expected loss) for the user, documents should be ranked in order of decreasing probability of relevance.
  • Formal Statement: For any two documents d1 and d2, rank d1 above d2 if and only if P(R=1 | d1) > P(R=1 | d2), where R=1 denotes relevance to the current query.
  • History: The Probability Ranking Principle (PRP) was introduced by Stephen E. Robertson in 1977 as a foundational concept in information retrieval (IR).
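As a minimal sketch of the ranking rule in the formal statement, the snippet below sorts documents by an estimated probability of relevance. The document IDs and probability values are made up for illustration, and `rank_by_prp` is a hypothetical helper name, not part of any library.

```python
from typing import Dict, List, Tuple


def rank_by_prp(p_relevance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order documents by decreasing estimated P(R=1 | d).

    Ties are broken by document ID so the output is deterministic.
    """
    return sorted(p_relevance.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical estimates for one query; a real system has to estimate these.
estimates = {"d1": 0.50, "d2": 0.80, "d3": 0.15, "d4": 0.65}

ranking = rank_by_prp(estimates)
for rank, (doc_id, p) in enumerate(ranking, start=1):
    print(f"{rank}. {doc_id}  P(R=1|d) = {p:.2f}")
```

The sort itself is trivial; everything that matters is hidden in how the values in `estimates` are obtained.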

Intuition: Ranking by Probability of Relevance

  • Simple Logic
    • If you have to choose between Document A (80% chance of being helpful) and Document B (50% chance of being helpful), you should always choose A first.
  • Optimal Performance
    • Given accurate probability estimates, a ranking based on the Probability Ranking Principle is provably optimal: it minimizes the expected loss under 1/0 loss.
    • Why It’s Optimal
      • It minimizes the chance that an irrelevant document is shown ahead of a relevant one, so users see the documents most likely to be relevant first.
      • It maximizes expected precision at every rank position.
      • It matches how users judge search quality: they want the most useful results first.
  • Focus
    • The complexity lies not in the ranking rule but in accurately estimating the probability P(R=1|d).
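The claim about precision at every rank position can be checked by brute force on a small example. The sketch below assumes the probability estimates are exact and uses made-up values; `expected_precision_at_k` is a hypothetical helper. It compares the PRP ordering against every other ordering of the same four documents.

```python
from itertools import permutations
from typing import Sequence


def expected_precision_at_k(probs: Sequence[float], k: int) -> float:
    """Expected precision at cutoff k for a ranking given as a list of P(R=1 | d).

    By linearity of expectation, the expected number of relevant documents
    in the top k is the sum of their relevance probabilities.
    """
    return sum(probs[:k]) / k


# Hypothetical relevance probabilities, listed in arbitrary order.
probs = [0.80, 0.50, 0.65, 0.15]
prp_order = sorted(probs, reverse=True)

# Compare the PRP ordering with every permutation at every cutoff.
# The small tolerance guards against floating-point summation order.
prp_is_best = all(
    expected_precision_at_k(prp_order, k)
    >= expected_precision_at_k(other, k) - 1e-12
    for other in permutations(probs)
    for k in range(1, len(probs) + 1)
)
print(prp_is_best)
```

The check succeeds because the top k entries of the sorted list always have the largest possible sum among any k documents, which is exactly what ranking by decreasing probability guarantees.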

Case 1: 1/0 Loss Scenario

Case 2: Incorporating Retrieval Costs

Worked Example: PRP Applied to a Query

Limitations of PRP

Binary Independence Model (BIM)

Assumptions of BIM (Term Independence)

Deriving Ranking Function – Step 1

Deriving Ranking Function – Step 2

Formula for Ranking Score (Log-Odds Weight)

Probability Estimates in Theory

Probability Estimates in Practice

Example Calculation

Relevance Feedback in Probabilistic Models

Rocchio vs Probabilistic Feedback

Strengths and Weaknesses of BIM

Extensions & Modern Models

Appraisal of Probabilistic Models

Tree-Structured Dependencies

Okapi BM25: Motivation and Formula

BM25 Example with Query/Document Scoring

Bayesian Network (BN) Approaches to IR

Summary of Probabilistic IR

Language Models for IR

Definition of Language Models in IR

Finite Automata and Language Models

Types of Language Models

Multinomial Distributions over Words

Example: Unigram Language Model

Query Likelihood Model

Query Likelihood Principle

Using Query Likelihood in IR

Estimating Query Generation Probability

Maximum Likelihood Estimation (MLE)

Smoothing Techniques

Ponte & Croft’s Experiments – Setup

Ponte & Croft’s Experiments – Results

Lessons Learned from Query Likelihood

Comparisons & Extensions

Language Modelling vs Probabilistic IR

Language Modelling vs VSM

Strengths of LM Approaches

Weaknesses of LM Approaches

Extended LM Approaches: Mixture Models

Topic Models (e.g., LDA) in IR

Neural Language Models in Modern IR (Word Embeddings, Transformers)

Example: BERT in Document Ranking

Wrap-Up

Summary: Probabilistic vs Language Models

Implications for Web Search
