Joint probability distributions are fundamental to understanding how multiple random variables behave together.
Joint Distribution
A joint probability distribution describes the probability of events involving multiple random variables simultaneously. Think of it as extending our understanding from single-variable probability to multi-variable scenarios where we can capture complex relationships and dependencies.
Joint distributions are high-dimensional PDFs (continuous variables) or PMFs (discrete variables).
Notation Convention
To avoid confusion, I’ll use capital letters ($X, Y$) for random variables and lowercase letters ($x, y$) for their specific values. For example:
- $P(X)$ represents the probability distribution of $X$
- $P(X = x)$ or $P(x)$ represents the probability that the event $X = x$ occurs
Mathematical Formulation
For two discrete random variables $X$ and $Y$, the joint probability mass function (PMF) is:
$$p_{X,Y}(x, y) = P(X = x, Y = y)$$
For continuous variables, we have the joint probability density function (PDF) $f_{X,Y}(x, y)$, which gives probabilities by integration:
$$P\big((X, Y) \in A\big) = \iint_A f_{X,Y}(x, y)\, dx\, dy$$
As we add more variables, the dimensionality grows naturally:
- 1D: $P(X)$ or $P(x)$
- 2D: $P(X, Y)$ or $P(x, y)$
- 3D: $P(X, Y, Z)$ or $P(x, y, z)$
- nD: $P(X_1, X_2, \ldots, X_n)$ or $P(x_1, \ldots, x_n)$ where $n$ is the number of random variables
Essential Properties
Every joint distribution must satisfy these fundamental properties of probability distributions:
- Non-negativity: $P(X = x, Y = y) \ge 0$ for all $x, y$
- Normalization: $\sum_x \sum_y P(X = x, Y = y) = 1$ (discrete) or $\int\!\int f_{X,Y}(x, y)\, dx\, dy = 1$ (continuous)
These properties ensure that joint distributions are valid probability measures.
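These properties are easy to check numerically. As a rough sketch (assuming a NumPy-style representation; all numbers below are made up for illustration), a discrete joint PMF can be stored as an array with one axis per variable:

```python
import numpy as np

# Hypothetical 3-D joint PMF P(X, Y, Z) for three binary variables:
# axis 0 -> X, axis 1 -> Y, axis 2 -> Z (all numbers are made up)
joint_xyz = np.array([
    [[0.10, 0.05], [0.15, 0.10]],
    [[0.05, 0.20], [0.10, 0.25]],
])

print(joint_xyz.shape)                      # (2, 2, 2): one axis per variable
print((joint_xyz >= 0).all())               # True: non-negativity
print(np.isclose(joint_xyz.sum(), 1.0))     # True: normalization
```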
Marginal Distributions
From a joint distribution, we can derive marginal distributions for individual variables by “summing out” or “integrating out” the other variables:
Discrete case: $P(X = x) = \sum_y P(X = x, Y = y)$
Continuous case: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$
The marginal distribution tells us about individual variables when we ignore the others.
Example
Let’s say we have a bakery that tracks the joint distribution of bread ($X$) and coffee ($Y$) sales. The marginal probability $P(X)$ answers “how often do people buy bread, regardless of whether they also buy coffee?”, focusing on bread sales alone.
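As a small illustration, here is a Python sketch of the bakery example (the joint probabilities are invented purely for illustration), computing the marginals by summing out the other variable:

```python
import numpy as np

# Hypothetical joint PMF P(Bread, Coffee): rows = bread (0 = no, 1 = yes),
# columns = coffee (0 = no, 1 = yes). Numbers are made up for illustration.
joint = np.array([
    [0.20, 0.15],   # P(bread=0, coffee=0), P(bread=0, coffee=1)
    [0.30, 0.35],   # P(bread=1, coffee=0), P(bread=1, coffee=1)
])

p_bread = joint.sum(axis=1)    # sum out coffee -> [0.35, 0.65]
p_coffee = joint.sum(axis=0)   # sum out bread  -> [0.50, 0.50]

print("P(bread=1) =", p_bread[1])  # how often people buy bread, regardless of coffee
```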
Conditional Distributions
Conditional probability answers: “Given that $X = x$ has occurred, what’s the probability that $Y = y$ also occurs?” It’s like updating our beliefs based on new information. Formally, for $P(X = x) > 0$:
$$P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}$$
You might wonder: “what is the difference between this and the joint distribution P(X,Y)?”. You can think of conditional probability as focusing on the “world” in which $X = x$ has occurred, and asking: “Within that restricted world, what’s the likelihood that $Y = y$ also occurs?”
Essential Properties
Like the joint probability distribution, a conditional distribution also satisfies the same fundamental properties:
- Non-negativity: $P(Y = y \mid X = x) \ge 0$ for all $x$ and $y$
- Normalization: $\sum_y P(Y = y \mid X = x) = 1$ (discrete) or $\int f_{Y \mid X}(y \mid x)\, dy = 1$ (continuous) within the world where $X = x$ has happened
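Continuing the hypothetical bakery numbers from the sketch above, a conditional distribution can be computed by restricting the joint table to the conditioning event and renormalizing:

```python
import numpy as np

# Same hypothetical bread/coffee joint table as above (illustrative numbers)
joint = np.array([
    [0.20, 0.15],
    [0.30, 0.35],
])

# P(Coffee | Bread = 1): take the bread = 1 row and divide by P(Bread = 1),
# i.e. restrict to the "world" where bread was bought and renormalize.
p_bread_1 = joint[1].sum()                    # marginal P(bread = 1) = 0.65
p_coffee_given_bread = joint[1] / p_bread_1   # [0.4615..., 0.5384...]

print(p_coffee_given_bread.sum())             # 1.0 -- sums to one within that world
```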
The Chain Rule
The fundamental relationship connecting joint and conditional distributions is the chain rule:
$$P(X, Y) = P(X)\, P(Y \mid X) = P(Y)\, P(X \mid Y)$$
Basically, the chain rule decomposes a joint probability into a sequence of conditional probabilities. Each factor represents the probability of one variable given all the previous variables in the sequence.
For $n$ variables $X_1, X_2, \ldots, X_n$:
$$P(X_1, X_2, \ldots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \ldots, X_{n-1})$$
Multi-Variable Chain Rule
Scenario: Consider three variables:
- $W$: Weather (sunny, rainy)
- $T$: Traffic (light, heavy)
- $A$: Meeting attendance (attend, skip)
Given probabilities:
- $P(W = \text{sunny}) = 0.7$
- $P(T = \text{light} \mid W = \text{sunny}) = 0.8$
- $P(A = \text{attend} \mid W = \text{sunny}, T = \text{light}) = 0.9$
Question: What’s the probability of attending a meeting on a sunny day with light traffic?
Solution using chain rule:
$$\begin{aligned} P(\text{sunny}, \text{light}, \text{attend}) &= P(\text{sunny}) \cdot P(\text{light} \mid \text{sunny}) \cdot P(\text{attend} \mid \text{sunny}, \text{light}) \\ &= 0.7 \times 0.8 \times 0.9 = 0.504 \end{aligned}$$
Interpretation: There’s a 50.4% chance of attending a meeting on a sunny day with light traffic.
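As a quick sketch, here is the same chain-rule product in Python, using the probabilities listed above:

```python
# Chain rule: P(sunny, light, attend)
#   = P(sunny) * P(light | sunny) * P(attend | sunny, light)
p_sunny = 0.7
p_light_given_sunny = 0.8
p_attend_given_sunny_light = 0.9

p_joint = p_sunny * p_light_given_sunny * p_attend_given_sunny_light
print(round(p_joint, 3))  # 0.504 -> 50.4%
```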
Independence
Two random variables $X$ and $Y$ are independent if and only if:
$$P(X, Y) = P(X)\, P(Y)$$
Independence means that knowing the value of one variable doesn’t change our beliefs about the other. The joint probability factors into the product of individual probabilities.
Coin Toss Independence
Scenario: Consider two fair coin tosses:
- $X$: First coin (H = 1, T = 0)
- $Y$: Second coin (H = 1, T = 0)
Joint probability table:
| $x$ | $y$ | $P(X = x, Y = y)$ |
| --- | --- | --- |
| 0 | 0 | 0.25 |
| 0 | 1 | 0.25 |
| 1 | 0 | 0.25 |
| 1 | 1 | 0.25 |

Test for independence:
Step 1: Calculate marginal probabilities
- $P(X = 0) = 0.25 + 0.25 = 0.5$, $P(X = 1) = 0.5$
- $P(Y = 0) = 0.25 + 0.25 = 0.5$, $P(Y = 1) = 0.5$
Step 2: Check independence condition
- $P(X=0, Y=0) = 0.25 = P(X=0)\,P(Y=0) = 0.5 \times 0.5$ ✓
- $P(X=0, Y=1) = 0.25 = P(X=0)\,P(Y=1) = 0.5 \times 0.5$ ✓
- $P(X=1, Y=0) = 0.25 = P(X=1)\,P(Y=0) = 0.5 \times 0.5$ ✓
- $P(X=1, Y=1) = 0.25 = P(X=1)\,P(Y=1) = 0.5 \times 0.5$ ✓
Conclusion: All conditions hold, so $X$ and $Y$ are independent.
Dependent Coin Tosses
Scenario: Consider a modified experiment where the second coin is biased based on the first:
- If first coin is H, second coin has 0.8 probability of H
- If first coin is T, second coin has 0.3 probability of H
Joint probability table:
| $x$ | $y$ | $P(X = x, Y = y)$ |
| --- | --- | --- |
| 0 | 0 | 0.35 |
| 0 | 1 | 0.15 |
| 1 | 0 | 0.10 |
| 1 | 1 | 0.40 |

Test for independence:
Step 1: Calculate marginal probabilities
- $P(X = 0) = 0.35 + 0.15 = 0.5$, $P(X = 1) = 0.10 + 0.40 = 0.5$
- $P(Y = 0) = 0.35 + 0.10 = 0.45$, $P(Y = 1) = 0.15 + 0.40 = 0.55$
Step 2: Check independence condition
- $P(X=0, Y=0) = 0.35 \ne P(X=0)\,P(Y=0) = 0.5 \times 0.45 = 0.225$ ✗
Conclusion: Since $P(X=0, Y=0) \ne P(X=0)\,P(Y=0)$, $X$ and $Y$ are not independent.
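Here is a small Python sketch that runs this independence check on both coin-toss tables from above:

```python
import numpy as np

def is_independent(joint, tol=1e-9):
    """Check whether a 2-D joint PMF factors into the product of its marginals."""
    p_x = joint.sum(axis=1)   # marginal of the first variable
    p_y = joint.sum(axis=0)   # marginal of the second variable
    return np.allclose(joint, np.outer(p_x, p_y), atol=tol)

fair = np.array([[0.25, 0.25],
                 [0.25, 0.25]])   # two fair, independent coins
biased = np.array([[0.35, 0.15],
                   [0.10, 0.40]]) # second coin depends on the first

print(is_independent(fair))    # True
print(is_independent(biased))  # False
```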
Bayes’ Theorem
Rearranging the chain rule gives us Bayes’ theorem:
$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
Bayes’ rule is a fundamental principle for updating beliefs based on new evidence. It tells us how to revise our initial beliefs when we observe new data. This is extremely important when we want to experiment and observe new data in an unknown world: it provides a principled framework for learning from experience and adapting our understanding as we gather more information. I will try to cover this aspect in future blog posts on Maximum Likelihood Estimation and Maximum A Posteriori estimation.
Components of Bayes’ Theorem
- $P(Y \mid X)$: Posterior probability
- $P(X \mid Y)$: Likelihood (how likely is $X$ given $Y$?)
- $P(Y)$: Prior probability (our initial belief about $Y$)
- $P(X)$: Evidence (probability of observing $X$)
Medical Test Interpretation
Scenario: A disease affects 1% of the population. A test is 95% accurate (95% of sick people test positive, 95% of healthy people test negative).
Question: If someone tests positive, what’s the probability they actually have the disease?
Solution using Bayes’ rule:
- $P(D) = 0.01$ (prior: 1% of population)
- $P(+ \mid D) = 0.95$, $P(+ \mid \neg D) = 0.05$ (likelihood: test accuracy)
Bayes’ rule:
$$P(D \mid +) = \frac{P(+ \mid D)\, P(D)}{P(+)} = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} = \frac{0.0095}{0.0590} \approx 0.161$$
Surprising result: Even with a positive test, there’s only a 16.1% chance of having the disease! This is because the disease is rare (1%) and false positives are common when testing a large healthy population.
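The same Bayes’ rule calculation as a short Python sketch, using the numbers from the scenario:

```python
# Bayes' rule for the medical test example
p_disease = 0.01             # prior P(D)
p_pos_given_disease = 0.95   # sensitivity P(+ | D)
p_pos_given_healthy = 0.05   # false positive rate P(+ | not D)

# Evidence P(+): total probability of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(D | +)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161 -> about 16.1%
```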
The Monty Hall Problem
The Monty Hall problem is one of the most famous probability puzzles that often challenges our intuition. Named after the host of the game show “Let’s Make a Deal,” this problem demonstrates how Bayesian reasoning can help us understand counterintuitive probability results.
Problem Setup: You are a contestant on a game show with three doors. Behind one door is a valuable prize (like a car), and behind the other two doors are less desirable prizes (like goats). The game proceeds as follows:
- You choose one of the three doors (but don’t open it yet)
- The host, who knows what’s behind all doors, opens one of the remaining doors that contains a goat
- The host then offers you a choice: stick with your original door or switch to the other unopened door
Question: Should you stick with your original choice or switch doors?
I highly recommend that you try to work out a decision for this question on your own. I’ll be sharing my explanation in a future blog post about MLE and MAP.
Key Takeaways
Fundamental Concepts
- Joint distributions: high-dimensional PDFs (continuous variables) or PMFs (discrete variables).
- Marginal distributions: can be derived by “summing out” other variables from joint distributions
- Conditional distributions: describe how one variable behaves given knowledge of another
- Independence means variables don’t influence each other: $P(X, Y) = P(X)\, P(Y)$
- Chain rule: $P(X, Y) = P(X)\, P(Y \mid X)$
- Bayes’ theorem: $P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$