Joint probability distributions are fundamental to understanding how multiple random variables behave together.

Joint Distribution

A joint probability distribution describes the probability of events involving multiple random variables simultaneously. Think of it as extending our understanding from single-variable probability to multi-variable scenarios where we can capture complex relationships and dependencies.

Joint distributions are high-dimensional PDFs (continuous variables) or PMFs (discrete variables).

Mathematical Formulation

For two discrete random variables $X$ and $Y$, the joint probability mass function (PMF) is:

$p_{X,Y}(x, y) = P(X = x, Y = y)$

For continuous variables, we have the joint probability density function (PDF) $f_{X,Y}(x, y)$, defined so that:

$P(a \le X \le b, \, c \le Y \le d) = \int_a^b \int_c^d f_{X,Y}(x, y) \, dy \, dx$

As we add more variables, the dimensionality grows naturally:

  • 1D: $p_X(x)$ or $f_X(x)$
  • 2D: $p_{X,Y}(x, y)$ or $f_{X,Y}(x, y)$
  • 3D: $p_{X,Y,Z}(x, y, z)$ or $f_{X,Y,Z}(x, y, z)$
  • nD: $p_{X_1, \dots, X_n}(x_1, \dots, x_n)$ or $f_{X_1, \dots, X_n}(x_1, \dots, x_n)$, where $n$ is the number of random variables (see the sketch below)
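
To make this concrete, here is a minimal sketch, assuming NumPy and a made-up 2×3 table of probabilities: a discrete joint PMF can be stored as an array with one axis per random variable, so each additional variable adds one array dimension.

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y) for X in {0, 1} and Y in {0, 1, 2},
# stored as a 2-D array: joint_xy[i, j] = P(X = i, Y = j). Numbers are made up.
joint_xy = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.20, 0.10],
])

# Adding a third variable Z simply adds another axis:
# joint_xyz[i, j, k] = P(X = i, Y = j, Z = k). Here: a uniform joint over 2*3*4 outcomes.
joint_xyz = np.full((2, 3, 4), 1.0 / 24)

print(joint_xy.shape)   # (2, 3)    -> two variables, 2-D array
print(joint_xyz.shape)  # (2, 3, 4) -> three variables, 3-D array
```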

Essential Properties

Every joint distribution must satisfy these fundamental properties of probability distributions:

  1. Non-negativity: $p_{X,Y}(x, y) \ge 0$ (or $f_{X,Y}(x, y) \ge 0$) for all $x, y$
  2. Normalization: $\sum_x \sum_y p_{X,Y}(x, y) = 1$ (discrete) or $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx \, dy = 1$ (continuous)

These properties ensure that joint distributions are valid probability measures.
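
Here is a minimal sketch of checking both properties on a discrete joint PMF, assuming NumPy and the same made-up 2×3 table as above:

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y): rows index x, columns index y.
joint_xy = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.20, 0.10],
])

# 1. Non-negativity: every entry must be >= 0.
assert np.all(joint_xy >= 0)

# 2. Normalization: all entries must sum to 1.
assert np.isclose(joint_xy.sum(), 1.0)

print("joint_xy is a valid joint PMF")
```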

Marginal Distributions

From a joint distribution, we can derive marginal distributions for individual variables by “summing out” or “integrating out” the other variables:

Discrete case: $p_X(x) = \sum_y p_{X,Y}(x, y)$

Continuous case: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy$

The marginal distribution tells us about individual variables when we ignore the others.
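
In the discrete case, marginalization is just summing the joint array over the axes you want to remove. A minimal sketch, assuming NumPy and the same hypothetical 2×3 joint PMF:

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y): rows index x, columns index y.
joint_xy = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.20, 0.10],
])

# Marginal of X: "sum out" Y (sum across columns, axis=1).
p_x = joint_xy.sum(axis=1)  # [0.4, 0.6]

# Marginal of Y: "sum out" X (sum across rows, axis=0).
p_y = joint_xy.sum(axis=0)  # [0.4, 0.4, 0.2]

# Each marginal is itself a valid PMF.
assert np.isclose(p_x.sum(), 1.0) and np.isclose(p_y.sum(), 1.0)
print(p_x, p_y)
```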

Conditional Distributions

Conditional probability answers: “Given that $Y = y$ has occurred, what’s the probability that $X = x$ also occurs?” It’s like updating our beliefs based on new information. Formally, for $P(Y = y) > 0$:

$P(X = x \mid Y = y) = \dfrac{P(X = x, Y = y)}{P(Y = y)}$

You might wonder: “What is the difference between this and the joint distribution $P(X, Y)$?” You can think of conditional probability as focusing on the “world” in which $Y = y$ has occurred, and asking: “Within that restricted world, what’s the likelihood that $X = x$ also occurs?”

Essential Properties

Like joint probability distributions, conditional distributions satisfy the same fundamental properties:

  1. Non-negativity: $P(X = x \mid Y = y) \ge 0$ for all $x$ and $y$
  2. Normalization: $\sum_x P(X = x \mid Y = y) = 1$ (discrete) or $\int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y) \, dx = 1$ (continuous), i.e., the probabilities still sum to one within the “world” where $Y = y$ has happened (see the sketch below)
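
A minimal sketch of computing $P(X = x \mid Y = y)$ from a discrete joint PMF, assuming NumPy and the same made-up 2×3 table as before:

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y): rows index x, columns index y.
joint_xy = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.20, 0.10],
])

p_y = joint_xy.sum(axis=0)        # marginal P(Y = y)

# Conditional P(X = x | Y = y): divide each column by its column sum P(Y = y).
cond_x_given_y = joint_xy / p_y   # broadcasting divides column j by P(Y = j)

# Within each "world" Y = y, the conditional probabilities sum to 1.
assert np.allclose(cond_x_given_y.sum(axis=0), 1.0)
print(cond_x_given_y)
```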

The Chain Rule

The fundamental relationship connecting joint and conditional distributions is the chain rule:

$P(X, Y) = P(X \mid Y) \, P(Y) = P(Y \mid X) \, P(X)$

Basically, the chain rule decomposes a joint probability into a sequence of conditional probabilities. Each factor represents the probability of one variable given all the previous variables in the sequence.

For variables $X_1, X_2, \dots, X_n$:

$P(X_1, X_2, \dots, X_n) = P(X_1) \, P(X_2 \mid X_1) \, P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \dots, X_{n-1})$
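
A minimal sketch of the two-variable case, assuming NumPy and the same hypothetical joint PMF, checking that multiplying the chain-rule factors $P(X) \, P(Y \mid X)$ recovers the joint:

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y): rows index x, columns index y.
joint_xy = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.20, 0.10],
])

p_x = joint_xy.sum(axis=1)                  # P(X = x)
cond_y_given_x = joint_xy / p_x[:, None]    # P(Y = y | X = x); each row sums to 1

# Chain rule: P(X, Y) = P(X) * P(Y | X).
reconstructed = p_x[:, None] * cond_y_given_x
assert np.allclose(reconstructed, joint_xy)
print("chain rule reconstructs the joint PMF")
```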

Independence

Two random variables $X$ and $Y$ are independent if and only if:

$P(X = x, Y = y) = P(X = x) \, P(Y = y) \quad \text{for all } x, y$

Independence means that knowing the value of one variable doesn’t change our beliefs about the other. The joint probability factors into the product of individual probabilities.
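
A quick numeric sketch of this factorization check, assuming NumPy and the same made-up table: compare the joint PMF to the outer product of its marginals.

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y).
joint_xy = np.array([
    [0.10, 0.20, 0.10],
    [0.30, 0.20, 0.10],
])

p_x = joint_xy.sum(axis=1)
p_y = joint_xy.sum(axis=0)

# X and Y are independent iff the joint equals the product of the marginals.
print(np.allclose(joint_xy, np.outer(p_x, p_y)))  # False for this table

# A joint built directly as an outer product of marginals is independent by construction.
indep_joint = np.outer(p_x, p_y)
print(np.allclose(indep_joint, np.outer(indep_joint.sum(axis=1), indep_joint.sum(axis=0))))  # True
```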

Bayes’ Theorem

Rearranging the chain rule gives us Bayes’ theorem:

$P(Y \mid X) = \dfrac{P(X \mid Y) \, P(Y)}{P(X)}$

Bayes’ rule is a fundamental principle for updating beliefs based on new evidence. It tells us how to revise our initial beliefs when we observe new data. This is extremely important when we want to experiment and observe new data in an unknown world: it provides a principled framework for learning from experience and adapting our understanding as we gather more information. I will try to cover this aspect in future blog posts on Maximum Likelihood Estimation and Maximum A Posteriori estimation.

Components of Bayes’ Theorem

  • $P(Y \mid X)$: Posterior probability (our updated belief about $Y$ after observing $X$)
  • $P(X \mid Y)$: Likelihood (how likely is $X$ given $Y$?)
  • $P(Y)$: Prior probability (our initial belief about $Y$)
  • $P(X)$: Evidence (probability of observing $X$); see the worked example below
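
A small worked sketch with made-up numbers, where $Y$ is a hypothetical condition and $X$ is a positive test result; the prior, sensitivity, and false-positive rate are invented for illustration only:

```python
# Hypothetical numbers: Y = "has condition", X = "test is positive".
prior = 0.01            # P(Y): initial belief that the condition is present
likelihood = 0.95       # P(X | Y): probability of a positive test given the condition
false_positive = 0.05   # P(X | not Y): probability of a positive test without the condition

# Evidence P(X), via the law of total probability.
evidence = likelihood * prior + false_positive * (1 - prior)

# Posterior P(Y | X) = P(X | Y) * P(Y) / P(X).
posterior = likelihood * prior / evidence
print(f"P(Y | X) = {posterior:.3f}")  # roughly 0.161
```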

Key Takeaways

Fundamental Concepts

  • Joint distributions: high-dimensional PDFs (continuous variables) or PMFs (discrete variables).
  • Marginal distributions: can be derived by “summing out” other variables from joint distributions
  • Conditional distributions: describe how one variable behaves given knowledge of another
  • Independence means variables don’t influence each other: $P(X, Y) = P(X) \, P(Y)$
  • Chain rule: $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$
  • Bayes’ theorem: $P(Y \mid X) = \dfrac{P(X \mid Y) \, P(Y)}{P(X)}$