General Form of the Exponential Family of Distributions, with Sigma Known
In this post we introduce Fisher's factorization theorem and the concept of sufficient statistics. We learn how to apply these concepts to construct a general expression for various common distributions known as the exponential family.
In applied statistics and machine learning we rarely have the fortune of dealing with a known distribution with known parameters. We've already discussed maximum likelihood estimation as a method to estimate the unknown parameters of a known or assumed distribution.
We can raise the level of abstraction even more by considering families of distributions. The exponential family, which includes distributions such as the Gaussian and the Bernoulli, is particularly useful. To describe it, we first need to introduce the factorization theorem and the concept of sufficient statistics.
Sufficient Statistic
In the preceding posts, we've discussed probability and statistics. But until now, we haven't formally defined what a statistic is.
A statistic is a function of a random variable, or more concretely, of the sample values observed from it.
For example, the mean \tilde\mu of a normally distributed data sample x can be calculated from the data points using the following function.
\tilde\mu = \frac{1}{n} (x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i
We say that the sample mean is a statistic of the random variable X.
A statistic is sufficient for a parameter if, once the statistic is known, no further information about that parameter can be inferred from the sample.
For example, for a normally distributed random variable with known variance, the mean of the data sample x is a sufficient statistic for the true mean. You do not need to keep every individual data point: the sample mean alone carries all the information the sample contains about the true mean.
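The following Python sketch (standard library only) illustrates this for a normal model whose standard deviation `sigma` is assumed known. The helper names and sample values are made up for illustration. Two samples with very different spreads, but the same size and the same mean, yield the same log-likelihood ratio between two candidate means, and that ratio can be computed from the sample mean and sample size alone.

```python
import math


def gaussian_log_likelihood(sample, mu, sigma):
    """Log-likelihood of i.i.d. normal data with mean mu and known standard deviation sigma."""
    n = len(sample)
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in sample) / (2 * sigma ** 2))


def log_likelihood_ratio(sample, mu_a, mu_b, sigma):
    """log(L(mu_a) / L(mu_b)) computed from the full sample."""
    return (gaussian_log_likelihood(sample, mu_a, sigma)
            - gaussian_log_likelihood(sample, mu_b, sigma))


def log_likelihood_ratio_from_mean(mean, n, mu_a, mu_b, sigma):
    """The same ratio computed from the sample mean and sample size alone."""
    return n * (mu_a - mu_b) * (2 * mean - mu_a - mu_b) / (2 * sigma ** 2)


sigma = 1.5                       # assumed known
sample_1 = [1.0, 2.0, 3.0, 4.0]   # mean 2.5, small spread
sample_2 = [-2.0, 2.5, 2.5, 7.0]  # mean 2.5, large spread

r1 = log_likelihood_ratio(sample_1, 1.0, 3.0, sigma)
r2 = log_likelihood_ratio(sample_2, 1.0, 3.0, sigma)
r_mean = log_likelihood_ratio_from_mean(2.5, 4, 1.0, 3.0, sigma)
print(r1, r2, r_mean)  # all three agree up to floating-point rounding
```

The spread of the data only enters through a term that cancels in the ratio, which is exactly why it carries no information about the mean here.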
Factorization Theorem
The concept of sufficient statistics is formalized in Fisher's factorization theorem.
Assume you have a random variable X whose distribution is parameterized by an unknown parameter θ.
Notation: the capital letter X denotes the random variable, while x denotes the set of concrete values that the variable actually takes.
Since we don't know θ, we would like to find a compact summary of the data that captures everything the sample can tell us about θ.
To achieve this goal, we compute a statistic t(x) from the sample x. If t(x) is a sufficient statistic, it encapsulates all the knowledge about θ that the sample contains. Accordingly, it should be possible to factor the distribution over X into a component h(x) that does not depend on θ and a component k(t(x), θ) that depends on θ but touches the data only through our sample statistic t(x).
f(x|\theta) = h(x)\,k(t(x), \theta)
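Here is a minimal sketch of such a factorization for one concrete case, assuming i.i.d. normal observations with a known standard deviation σ and an unknown mean μ playing the role of θ. It takes t(x) to be the sum of the observations (equivalent to the sample mean for a fixed sample size n) and names the two factors h and k to follow the notation above; because n is fixed and known, k also receives it as an argument. Note that k depends on μ but sees the data only through t(x). The snippet checks numerically that h times k reproduces the joint density.

```python
import math


def joint_density(sample, mu, sigma):
    """Joint density of an i.i.d. normal sample: the product of the individual densities."""
    result = 1.0
    for x in sample:
        result *= math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
    return result


def t(sample):
    """Sufficient statistic for mu when sigma is known: the sum of the observations."""
    return sum(sample)


def h(sample, sigma):
    """Depends on the data (and the known sigma) but not on mu."""
    n = len(sample)
    return (2 * math.pi * sigma ** 2) ** (-n / 2) * math.exp(-sum(x * x for x in sample) / (2 * sigma ** 2))


def k(stat, n, mu, sigma):
    """Depends on mu, and on the data only through stat = t(sample)."""
    return math.exp(mu * stat / sigma ** 2 - n * mu ** 2 / (2 * sigma ** 2))


sample = [0.3, -1.2, 2.1, 0.8]
sigma = 1.0  # assumed known
for mu in (-1.0, 0.0, 0.5, 2.0):
    direct = joint_density(sample, mu, sigma)
    factored = h(sample, sigma) * k(t(sample), len(sample), mu, sigma)
    print(mu, direct, factored)  # the two columns agree up to rounding
```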
Exponential Family
Using the sufficient statistic, we can construct a general form to describe distributions of the exponential family.
f(x|\theta) = h(x)\exp(\theta \cdot t(x) - A(\theta))
You calculate the dot product between the vector of unknown parameters θ and the vector of sufficient statistics t(x). Then you subtract the normalization term A(θ), exponentiate the whole expression, and multiply it by the function h(x), which does not depend on θ.
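The recipe above translates almost line by line into code. The following Python sketch (the function name `exp_family_density` is purely illustrative) takes θ, t, h and A as inputs and evaluates the general form. As a usage example it plugs in the Bernoulli pieces that follow from the derivation later in this post: t(x) = [x], h(x) = 1 and A(θ) = log(1 + e^θ).

```python
import math
from typing import Callable, Sequence


def exp_family_density(
    x: float,
    theta: Sequence[float],
    t: Callable[[float], Sequence[float]],
    h: Callable[[float], float],
    A: Callable[[Sequence[float]], float],
) -> float:
    """Evaluate h(x) * exp(theta . t(x) - A(theta))."""
    dot = sum(th * s for th, s in zip(theta, t(x)))  # theta . t(x)
    return h(x) * math.exp(dot - A(theta))


# Usage with the Bernoulli distribution: t(x) = [x], h(x) = 1, A(theta) = log(1 + e^theta).
def t_bern(x: float) -> Sequence[float]:
    return [x]


def h_bern(x: float) -> float:
    return 1.0


def A_bern(theta: Sequence[float]) -> float:
    return math.log1p(math.exp(theta[0]))


p = 0.3
theta = [math.log(p / (1 - p))]
print(exp_family_density(1, theta, t_bern, h_bern, A_bern))  # approximately 0.3
print(exp_family_density(0, theta, t_bern, h_bern, A_bern))  # approximately 0.7
```

Passing t, h and A as plain callables keeps the sketch independent of any particular distribution; only those three functions change from one family member to the next.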
Note: Mathematically, the normalization term A(θ) ensures that the expression integrates to one, as every probability density function must, or sums to one in the case of a probability mass function.
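Put differently, A(θ) = log ∫ h(x) exp(θ · t(x)) dx, with a sum in place of the integral for discrete distributions. The Python sketch below computes this directly with a simple midpoint-rule integral, assuming the Gaussian ingredients t(x) = (x, x²) and h(x) = 1/√(2π) discussed in the next section, and compares the result with the closed form μ²/(2σ²) + log(σ) for that case. The integration range and step count are arbitrary choices that happen to be adequate for this example, not general-purpose settings.

```python
import math


def log_normalizer(theta, h, t, lo=-40.0, hi=40.0, steps=100_000):
    """A(theta) = log of the integral of h(x) * exp(theta . t(x)) dx, via the midpoint rule.

    Only accurate when the integrand is negligible outside [lo, hi].
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        dot = sum(th * s for th, s in zip(theta, t(x)))
        total += h(x) * math.exp(dot) * dx
    return math.log(total)


def t(x):
    """Gaussian sufficient statistics (x, x^2)."""
    return (x, x * x)


def h(x):
    """Gaussian base measure, independent of theta."""
    return 1.0 / math.sqrt(2.0 * math.pi)


mu, sigma = 1.0, 2.0
theta = (mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2))
A_numeric = log_normalizer(theta, h, t)
A_closed = mu ** 2 / (2 * sigma ** 2) + math.log(sigma)
print(A_numeric, A_closed)  # should agree to many decimal places
```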
This is very abstract, so let's look at a concrete distribution and show why it belongs to the exponential family.
The Gaussian as Part of the Exponential Family
The Gaussian has the following probability density function, parameterized by its mean μ and variance σ².
f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp(-\frac{(x-\mu)^2}{2\sigma^2})
Let's multiply out the term inside the exponential:
f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp(-\frac{x^2}{2\sigma^2} + \frac{2x\mu}{2\sigma^2} - \frac{\mu^2}{2\sigma^2})
Next, we bring σ out of the denominator and into the exponential term by writing 1/σ = exp(−log(σ)). I've also algebraically reorganized the terms inside the exponential a bit.
f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}} \exp(\frac{x\mu}{\sigma^2} - \frac{x^2}{2\sigma^2} - \frac{\mu^2}{2\sigma^2} - \log(\sigma))
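A quick numerical sanity check of this derivation, as a Python sketch using only the standard library: it evaluates the density in its original form, with the square multiplied out, and in the rearranged form with −log(σ) moved inside the exponential. The function names and test values are arbitrary choices for illustration; all three forms should agree at every point.

```python
import math


def pdf_standard(x, mu, sigma2):
    """Gaussian density in its usual form."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def pdf_expanded(x, mu, sigma2):
    """Same density with the square inside the exponential multiplied out."""
    return math.exp(-x ** 2 / (2 * sigma2) + 2 * x * mu / (2 * sigma2)
                    - mu ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def pdf_rearranged(x, mu, sigma2):
    """Same density with 1/sigma moved into the exponential as -log(sigma)."""
    sigma = math.sqrt(sigma2)
    return math.exp(x * mu / sigma2 - x ** 2 / (2 * sigma2)
                    - mu ** 2 / (2 * sigma2) - math.log(sigma)) / math.sqrt(2 * math.pi)


for x in (-2.0, 0.0, 0.7, 3.5):
    values = [f(x, 1.2, 0.8) for f in (pdf_standard, pdf_expanded, pdf_rearranged)]
    print(x, values)  # the three entries agree up to rounding
```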
Now our normal density resembles the general exponential-family form, and we can simply pick out the constituent parts.
The parameter-independent part:
h(x) = \frac{1}{\sqrt{2\pi}}
Our vector of sufficient statistics:
t(x) = [x, x^2]^T
Our parameter vector θ:
\theta = [\frac{\mu}{\sigma^2}, \frac{-1}{2\sigma^2}]^T
The normalization term A(θ):
A(\theta) = \frac{\mu^2}{2\sigma^2} + \log(\sigma)
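The normalization term above is still written in terms of μ and σ. Substituting σ² = −1/(2θ₂) and μ = −θ₁/(2θ₂) expresses it through the natural parameters alone, A(θ) = −θ₁²/(4θ₂) − ½ log(−2θ₂). The Python sketch below (standard library only, illustrative function names) converts between the two parameterizations, checks that both expressions for A(θ) agree, and confirms that h(x) exp(θ · t(x) − A(θ)) reproduces the Gaussian density at one point.

```python
import math


def to_natural(mu, sigma2):
    """(mu, sigma^2) -> theta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))


def from_natural(theta):
    """theta -> (mu, sigma^2); requires theta[1] < 0."""
    sigma2 = -1.0 / (2.0 * theta[1])
    return (theta[0] * sigma2, sigma2)


def log_partition(theta):
    """A(theta) expressed purely through the natural parameters."""
    return -theta[0] ** 2 / (4.0 * theta[1]) - 0.5 * math.log(-2.0 * theta[1])


def gaussian_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


mu, sigma2 = 1.2, 0.8
theta = to_natural(mu, sigma2)

# A(theta) from the natural parameters vs. mu^2 / (2 sigma^2) + log(sigma)
print(log_partition(theta), mu ** 2 / (2 * sigma2) + 0.5 * math.log(sigma2))
print(from_natural(theta))  # recovers (1.2, 0.8) up to rounding

# h(x) * exp(theta . t(x) - A(theta)) with t(x) = (x, x^2) and h(x) = 1 / sqrt(2 pi)
x = 0.3
exp_family_form = math.exp(theta[0] * x + theta[1] * x * x - log_partition(theta)) / math.sqrt(2 * math.pi)
print(exp_family_form, gaussian_pdf(x, mu, sigma2))
```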
The Bernoulli as Part of the Exponential Family
We've previously introduced the Bernoulli distribution, which can be written as the conditional probability of the realized outcome x ∈ {0, 1} given the probability p of a positive outcome:
p(x|p) = p^x(1-p)^{1-x}
Let's try to bring this into our characteristic exponential-family form. We start by moving everything into an exponential. To keep the expression equivalent to the original, we also apply the natural logarithm, whose effect the exponential undoes.
p(x|p) = \exp(\log(p^x(1-p)^{1-x}))
Remember that the logarithm turns powers into multiplications and products into sums. We can thus gradually move parts of our expression out of the log.
p(x|p) = \exp(x\log(p) + (1-x)\log(1-p))
p(x|p) = \exp(x\log(p) + \log(1-p) - x\log(1-p))
p(x|p) = \exp(x\log(\frac{p}{1-p}) + \log(1-p))
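Each rewriting step only applies logarithm rules, so all four expressions must give the same probability. A short Python check (standard library only; the step functions are named for illustration) evaluates them for a few values of p strictly between 0 and 1, where all the logarithms are defined.

```python
import math


def bernoulli_original(x, p):
    """p^x (1 - p)^(1 - x)"""
    return p ** x * (1 - p) ** (1 - x)


def bernoulli_step1(x, p):
    """exp(x log p + (1 - x) log(1 - p))"""
    return math.exp(x * math.log(p) + (1 - x) * math.log(1 - p))


def bernoulli_step2(x, p):
    """exp(x log p + log(1 - p) - x log(1 - p))"""
    return math.exp(x * math.log(p) + math.log(1 - p) - x * math.log(1 - p))


def bernoulli_step3(x, p):
    """exp(x log(p / (1 - p)) + log(1 - p))"""
    return math.exp(x * math.log(p / (1 - p)) + math.log(1 - p))


steps = (bernoulli_original, bernoulli_step1, bernoulli_step2, bernoulli_step3)
for p in (0.1, 0.5, 0.85):
    for x in (0, 1):
        print(p, x, [f(x, p) for f in steps])  # all four agree up to rounding
```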
We've arrived at our characteristic exponential-family form, and we can pluck out the natural parameter θ.
\theta = \log(\frac{p}{1-p})
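Matching the last line of the derivation against h(x) exp(θ · t(x) − A(θ)) also yields the remaining pieces: t(x) = x, h(x) = 1 and A(θ) = −log(1 − p), which in terms of θ equals log(1 + e^θ) because 1 − p = 1/(1 + e^θ). The Python sketch below (illustrative function names, standard library only) checks these pieces numerically: the two forms of A agree, the probabilities over x ∈ {0, 1} sum to one, and x = 1 recovers p.

```python
import math


def natural_parameter(p):
    """theta = log(p / (1 - p)), the log-odds."""
    return math.log(p / (1 - p))


def log_partition(theta):
    """A(theta) = log(1 + e^theta), which equals -log(1 - p)."""
    return math.log1p(math.exp(theta))


def bernoulli_exp_family(x, theta):
    """h(x) exp(theta * t(x) - A(theta)) with h(x) = 1 and t(x) = x, for x in {0, 1}."""
    return math.exp(theta * x - log_partition(theta))


p = 0.3
theta = natural_parameter(p)
print(log_partition(theta), -math.log(1 - p))                            # same value
print(bernoulli_exp_family(0, theta) + bernoulli_exp_family(1, theta))   # sums to 1
print(bernoulli_exp_family(1, theta))                                    # recovers p
```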
For machine learning applications, it is particularly interesting that the relationship between θ and p is invertible, because solving for p yields the sigmoid function.
p = \frac{1}{1+\exp(-\theta)}
This is a commonly used activation function in neural networks.
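A Python sketch of this inverse relationship (standard library only; `logit` and `sigmoid` are illustrative names, not a specific library's API). The sigmoid branches on the sign of θ so that `math.exp` is only ever called on a non-positive argument, which avoids an overflow for large negative θ; this is a common implementation choice rather than part of the math.

```python
import math


def logit(p):
    """Bernoulli natural parameter: theta = log(p / (1 - p))."""
    return math.log(p / (1 - p))


def sigmoid(theta):
    """Inverse of logit: p = 1 / (1 + exp(-theta))."""
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    z = math.exp(theta)  # same value, rewritten to avoid exp overflow for large negative theta
    return z / (1.0 + z)


for p in (0.01, 0.3, 0.5, 0.99):
    theta = logit(p)
    print(p, theta, sigmoid(theta))  # sigmoid(logit(p)) recovers p up to rounding
```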
Wrap Up
We've introduced the exponential family and its constituent parts by learning about Fisher's factorization theorem and the concept of a sufficient statistic.
Understanding maximum likelihood, the general form of the exponential family, and its most important distributions is a strong building block in our mathematical foundation for machine learning and data science.
This post is part of a series on statistics for machine learning and data science. To read other posts in this series, go to the index.
Source: https://programmathically.com/factorization-theorem-and-the-exponential-family/