Normal distribution

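Summary of the normal distribution
Probability density function: plot of the probability density function for the normal distribution (the green line is the standard normal distribution)
Cumulative distribution function: plot of the cumulative distribution function for the normal distribution (colors match the probability density function plot)
Parameters: μ, location (real); σ² > 0, squared scale (real)
Support: x \in (-\infty, +\infty)
Probability density function: \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
Cumulative distribution function: \frac{1}{2} \left(1 + \operatorname{erf}\,\frac{x-\mu}{\sigma\sqrt{2}}\right)
Mean: μ
Median: μ
Mode: μ
Variance: σ²
Skewness: 0
Excess kurtosis: 0
Entropy: \ln\left(\sigma\sqrt{2\pi e}\right)
Moment-generating function: M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)
Characteristic function: \phi_X(t) = \exp\left(i \mu t - \frac{\sigma^2 t^2}{2}\right)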


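The normal distribution, also called the Gaussian distribution, is a probability distribution of great importance in many fields, including mathematics, physics and engineering, and it has a major influence on many areas of statistics. If a random variable X follows a Gaussian distribution with mean μ and variance σ², we write:

X ~ N(μ, σ²),

and its probability density function is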

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).

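The mean μ of the normal distribution determines its location, and the standard deviation σ determines its spread. Because the curve is bell-shaped, it is often called the bell curve. The standard normal distribution is the normal distribution with μ = 0 and σ = 1 (the green curve in the figure).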

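Overview

The normal distribution is a convenient model of quantitative phenomena in the natural and behavioral sciences. A variety of psychological test scores and physical phenomena, such as photon counts, have been found to follow a normal distribution approximately. Although the underlying causes of these phenomena are often unknown, it can be shown theoretically that if many small effects are added together into a single variable, that variable is approximately normally distributed (a simple proof can be found in R. N. Bracewell's The Fourier Transform and Its Applications). The normal distribution also arises in many areas of statistics: for example, the sampling distribution of the mean is approximately normal even if the population being sampled is not normally distributed. In addition, the normal distribution has the largest information entropy among all distributions with a given mean and variance, which makes it the natural choice when only the mean and variance are known. The normal distribution is the most widely used family of distributions in statistics, and many statistical tests are based on it. In probability theory, the normal distribution is the limiting distribution of several continuous and discrete families of distributions.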

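History

The normal distribution was first introduced by Abraham de Moivre in an article on the binomial distribution published in 1734 (reprinted in the second edition of The Doctrine of Chances, 1738), in the context of approximating certain binomial distributions for large n. Laplace extended de Moivre's result in his book Analytical Theory of Probabilities (1812). The result is now usually called the de Moivre-Laplace theorem.

Laplace used the normal distribution in the analysis of experimental errors. Legendre introduced the important method of least squares in 1805. Gauss claimed to have used the method since 1794, and he justified it rigorously by assuming that the errors are normally distributed.

The name "bell curve" goes back to Jouffret, who in 1872 first used the term "bell surface" for a bivariate normal distribution. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875. The terminology is unfortunate, because it reflects and encourages the fallacy that many probability distributions are normal. (See the discussion of examples below.)

That the distribution is called "normal" or "Gaussian" is itself an instance of Stigler's law of eponymy, which states that "no scientific discovery is named after its original discoverer".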

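Definition of the normal distribution

There are several ways to specify a random variable. The most intuitive is the probability density function, which shows how likely each value of the random variable is. The cumulative distribution function is conceptually cleaner, but its plot is less informative to the untrained eye (see the example below). Other equivalent specifications include the cumulants, the characteristic function, the moment-generating function and the cumulant-generating function. Some of these are very useful for theoretical work but are not intuitive. See the discussion of probability distributions.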

Probability density function

Probability density function for 4 different parameter sets (green line is the standard normal)

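The probability density function of the normal distribution with mean μ and variance σ² (equivalently, standard deviation σ) is an example of a Gaussian function: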

f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp \left( -\frac{(x- \mu)^2}{2\sigma^2} \right).

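(See also exponential function and π.)

If a random variable X has this distribution, we write X ~ N(μ, σ²). If μ = 0 and σ = 1, the distribution is called the standard normal distribution, and the probability density function reduces to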

f(x) = \frac{1}{\sqrt{2\pi}} \, \exp\left(-\frac{x^2}{2} \right).

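The figure shows the probability density function of the normal distribution for several parameter values.

Some notable properties of the normal distribution: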

  • The density function is symmetric about its mean value.
  • The mean is also its mode and median.
  • 68.268949% of the area under the curve is within one standard deviation of the mean.
  • 95.449974% of the area is within two standard deviations.
  • 99.730020% of the area is within three standard deviations.
  • 99.993666% of the area is within four standard deviations.
  • The inflection points of the curve occur at one standard deviation away from the mean.
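The coverage figures in the list above follow from the error function: for X ~ N(μ, σ²), the probability of lying within k standard deviations of the mean is erf(k/√2). The following is a minimal Python sketch that recomputes them with the standard library only. The helper name `coverage_within` is illustrative and does not come from the text.

```python
import math


def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # P(|X - mu| <= k * sigma) = Phi(k) - Phi(-k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"within {k} standard deviation(s): {100.0 * coverage_within(k):.6f}%")
```

The result does not depend on μ or σ, because standardizing X removes both parameters.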

Cumulative distribution function

Cumulative distribution function of the above pdf

The cumulative distribution function (cdf) is defined as the probability that a variable X has a value less than or equal to x, and it is expressed in terms of the density function as

F(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^x  \exp   \left( -\frac{(u - \mu)^2}{2\sigma^2} \ \right)\,  du.

The standard normal cdf, conventionally denoted Φ, is just the general cdf evaluated with μ = 0 and σ = 1,

\Phi(x) =F(x;0,1)= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x \exp\left(-\frac{u^2}{2}\right) \, du.

The standard normal cdf can be expressed in terms of a special function called the error function, as

\Phi(z) = \frac{1}{2} \left[ 1 + \operatorname{erf} \left( \frac{z}{\sqrt{2}} \right) \right] .

The inverse cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:

\Phi^{-1}(p) = \sqrt2 \; \operatorname{erf}^{-1} \left(2p - 1 \right) .

This quantile function is sometimes called the probit function. There is no elementary primitive for the probit function. This is not to say merely that none is known, but rather that the non-existence of such a function has been proved.

Values of Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, or asymptotic series.
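As a rough illustration of these formulas, the sketch below evaluates Φ in two ways: through the error function, and through the Taylor series obtained by integrating the series of exp(−u²/2) term by term. It also evaluates the probit function through `statistics.NormalDist.inv_cdf` from the Python standard library. The function names `phi_erf`, `phi_taylor` and `probit` are chosen here for illustration, and the default of 60 series terms is an assumption that is adequate only for moderate |x|.

```python
import math
from statistics import NormalDist


def phi_erf(x: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi_taylor(x: float, terms: int = 60) -> float:
    """Standard normal cdf via its Taylor series around 0.

    Phi(x) = 1/2 + (1/sqrt(2*pi)) * sum_n (-1)^n x^(2n+1) / (n! * 2^n * (2n+1))
    """
    total = 0.0
    term = x  # holds (-1)^n * x^(2n+1) / (n! * 2^n), starting at n = 0
    for n in range(terms):
        total += term / (2 * n + 1)
        term *= -x * x / (2.0 * (n + 1))
    return 0.5 + total / math.sqrt(2.0 * math.pi)


def probit(p: float) -> float:
    """Quantile function of the standard normal, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.0, 1.96):
        print(x, phi_erf(x), phi_taylor(x), probit(phi_erf(x)))
```

The Taylor series alternates in sign, so for large |x| it loses accuracy to cancellation. That is one reason asymptotic series are used in the tails instead.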

Generating functions

Moment-generating function

The moment generating function is defined as the expected value of exp(tX). For a normal distribution, it can be shown that the moment generating function is

M_X(t)\, = \mathrm{E} \left[  \exp(tX) \right]
  = \int_{-\infty}^{\infty}  \frac   {1}   {\sigma \sqrt{2\pi} }   \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)   \exp (tx) \, dx
  = \exp \left(  \mu t + \frac{\sigma^2 t^2}{2} \right)

as can be seen by completing the square in the exponent.
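The completing-the-square step can be written out explicitly. The following is a sketch of the algebra, using the same symbols as the integral above. The shifted mean μ + σ²t is introduced here only for the derivation.

-\frac{(x-\mu)^2}{2\sigma^2} + tx = -\frac{\left(x - (\mu + \sigma^2 t)\right)^2}{2\sigma^2} + \mu t + \frac{\sigma^2 t^2}{2}.

The first term on the right is the exponent of a normal density with mean μ + σ²t and variance σ², which integrates to 1 once the factor 1/(σ√(2π)) is included, so

M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right) \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(x - (\mu + \sigma^2 t)\right)^2}{2\sigma^2}\right) dx = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).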

Characteristic function

The characteristic function is defined as the expected value of exp(itX), where i is the imaginary unit. For a normal distribution, the characteristic function is

\phi_X(t;\mu,\sigma)\! = \mathrm{E} \left[  \exp(i t X) \right]
  = \int_{-\infty}^{\infty}  \frac{1}{\sigma \sqrt{2\pi}}  \exp  \left(- \frac{(x - \mu)^2}{2\sigma^2}  \right)  \exp(i t x) \, dx
  = \exp \left(  i \mu t - \frac{\sigma^2 t^2}{2} \right) .

The characteristic function is obtained by replacing t with it (that is, i times t) in the moment-generating function.

Properties

Some properties of the normal distribution:


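  1. If X ~ N(μ, σ²) and a and b are real numbers, then aX + b ~ N(aμ + b, (aσ)²) (see expected value and variance).
  2. If X \sim N(\mu_X, \sigma^2_X) and Y \sim N(\mu_Y, \sigma^2_Y) are independent normal random variables, then:
    • Their sum is normally distributed: U = X + Y \sim N(\mu_X + \mu_Y, \sigma^2_X + \sigma^2_Y).
    • Their difference is normally distributed: V = X - Y \sim N(\mu_X - \mu_Y, \sigma^2_X + \sigma^2_Y).
    • If the variances \sigma^2_X and \sigma^2_Y are equal, then U and V are independent of each other.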
  3. If X \sim N(0, \sigma^2_X) and Y \sim N(0, \sigma^2_Y) are independent normal random variables, then:
  4. If X_1, \cdots, X_n are independent standard normal variables, then X_1^2 + \cdots + X_n^2 has a chi-square distribution with n degrees of freedom.
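Properties 1, 2 and 4 can be checked with a short Python sketch. It relies on `statistics.NormalDist`, whose arithmetic operators implement the affine rule and the sum and difference rules for independent variables. The parameter values, the seed and the helper name `chi_square_sample_mean` are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist, fmean

X = NormalDist(mu=1.0, sigma=2.0)
Y = NormalDist(mu=-3.0, sigma=1.5)

# Property 1: aX + b ~ N(a*mu + b, (a*sigma)^2), here with a = 3 and b = 1.
W = 3.0 * X + 1.0

# Property 2: sum and difference of independent normal variables.
# NormalDist assumes X and Y are independent when they are added or subtracted.
U = X + Y  # mean 1 + (-3), variance 2^2 + 1.5^2
V = X - Y  # mean 1 - (-3), variance 2^2 + 1.5^2


def chi_square_sample_mean(n: int, trials: int, seed: int = 0) -> float:
    """Average of X_1^2 + ... + X_n^2 over many simulated draws (Property 4).

    A chi-square variable with n degrees of freedom has mean n, so the
    returned value should be close to n for a large number of trials.
    """
    rng = random.Random(seed)
    return fmean(
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)
    )


if __name__ == "__main__":
    print(W, U, V)
    print(chi_square_sample_mean(n=3, trials=20000))
```

The simulation only checks the mean of the chi-square distribution, which is a weak check. A fuller test would compare the whole empirical distribution.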

Standardizing normal random variables

As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.

If X ~ N(μ, σ²), then

Z = \frac{X - \mu}{\sigma} \!

is a standard normal random variable: Z ~ N(0,1). An important consequence is that the cdf of a general normal distribution is therefore

\Pr(X \le x) = \Phi \left(  \frac{x-\mu}{\sigma} \right) = \frac{1}{2} \left(  1 + \operatorname{erf}  \left(   \frac{x-\mu}{\sigma\sqrt{2}}  \right) \right) .

Conversely, if Z ~ N(0,1), then

X = σZ + μ

is a normal random variable with mean μ and variance σ².

The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution.
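The table lookup described above amounts to standardizing and then evaluating Φ. Below is a minimal Python sketch, with the error function standing in for the printed table. The function names and the example values (mean 100, standard deviation 15) are illustrative assumptions.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Phi(z), the cdf of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Pr(X <= x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # Z = (X - mu) / sigma is standard normal
    return standard_normal_cdf(z)


def from_standard(z: float, mu: float, sigma: float) -> float:
    """Map a standard normal value z back to the N(mu, sigma^2) scale."""
    return sigma * z + mu


if __name__ == "__main__":
    # x = 130 is two standard deviations above the mean, so this equals Phi(2).
    print(normal_cdf(130.0, mu=100.0, sigma=15.0))
    print(from_standard(2.0, mu=100.0, sigma=15.0))
```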

Moments

The first few moments of the normal distribution are:

Order   Raw moment             Central moment   Cumulant
0       1                      1                0
1       μ                      0                μ
2       μ² + σ²                σ²               σ²
3       μ³ + 3μσ²              0                0
4       μ⁴ + 6μ²σ² + 3σ⁴       3σ⁴              0

All cumulants of the normal distribution beyond the second are zero.
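The raw-moment column can be cross-checked numerically. The sketch below assumes the standard recurrence E[Xⁿ] = μ·E[Xⁿ⁻¹] + (n − 1)·σ²·E[Xⁿ⁻²], which is not stated in the text above. The function name `raw_moments` and the test values μ = 2, σ = 3 are illustrative.

```python
def raw_moments(mu: float, sigma: float, order: int) -> list:
    """Return [E[X^0], ..., E[X^order]] for X ~ N(mu, sigma^2).

    Uses the recurrence E[X^n] = mu * E[X^(n-1)] + (n - 1) * sigma^2 * E[X^(n-2)].
    """
    moments = [1.0]
    if order >= 1:
        moments.append(mu)
    for n in range(2, order + 1):
        moments.append(mu * moments[n - 1] + (n - 1) * sigma ** 2 * moments[n - 2])
    return moments


if __name__ == "__main__":
    mu, sigma = 2.0, 3.0
    m = raw_moments(mu, sigma, 4)
    # Closed forms for the raw moments of orders 2, 3 and 4.
    print(m[2], mu ** 2 + sigma ** 2)
    print(m[3], mu ** 3 + 3 * mu * sigma ** 2)
    print(m[4], mu ** 4 + 6 * mu ** 2 * sigma ** 2 + 3 * sigma ** 4)
```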

Generating normal random variables

For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods and the most basic is to invert the standard normal cdf. More efficient methods are also known, one such method being the Box-Muller transform.

The Box-Muller transform takes two uniformly distributed values as input and maps them to two normally distributed values. This requires generating values from a uniform distribution, for which many methods are known. See also random number generators.

The Box-Muller transform is a consequence of the fact that the chi-square distribution with two degrees of freedom (see property 4 above) is an easily-generated exponential random variable.
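A minimal Python sketch of the basic (trigonometric) form of the Box-Muller transform follows. The uniform inputs come from the standard library generator `random.Random`. The function name `box_muller` and the seed are illustrative choices.

```python
import math
import random


def box_muller(rng: random.Random) -> tuple:
    """Map two independent uniform values to two independent N(0, 1) values."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    # -2 * ln(u1) is exponential with mean 2, that is, chi-square with
    # two degrees of freedom; its square root is the radius.
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2  # uniform angle on the circle
    return radius * math.cos(angle), radius * math.sin(angle)


if __name__ == "__main__":
    rng = random.Random(12345)
    samples = []
    for _ in range(50000):
        samples.extend(box_muller(rng))
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(mean, variance)  # should be close to 0 and 1
```

To obtain N(μ, σ²) values, scale and shift each output as σz + μ.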

Central limit theorem

Plot of the pdf of a normal distribution with μ = 12 and σ = 3, approximating the pmf of a binomial distribution with n = 48 and p = 1/4

The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the central limit theorem.

The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.

  • A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if np and n(1 − p) are both at least 5; in this case, a continuity correction should be applied).

The approximating normal distribution has mean μ = np and variance σ² = np(1 − p).

  • A Poisson distribution with parameter λ is approximately normal for large values of λ.

The approximating normal distribution has mean μ = λ and variance σ² = λ.

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
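The binomial case can be made concrete with the parameters from the figure caption (n = 48, p = 1/4, so μ = 12 and σ = 3). The sketch below compares the exact binomial cdf with the normal approximation using a continuity correction of 0.5. The function names are illustrative.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to Pr(X <= k) with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    z = (k + 0.5 - mu) / sigma  # evaluate at k + 0.5 rather than k
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    n, p = 48, 0.25
    for k in (6, 9, 12, 15, 18):
        exact = binomial_cdf(k, n, p)
        approx = normal_approx_cdf(k, n, p)
        print(k, round(exact, 4), round(approx, 4), round(abs(exact - approx), 4))
```

Comparing the relative error at k = 6 or k = 18 with the error near the mean gives a feel for the loss of accuracy in the tails.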

Infinite divisibility

The normal distributions are infinitely divisible probability distributions.

Stability

The normal distributions are strictly stable probability distributions.

Standard deviation

Dark blue is less than one standard deviation from the mean.  For the normal distribution, this accounts for 68% of the set while two standard deviations from the mean (blue and brown) account for 95% and three standard deviations (blue, brown and green) account for 99.7%.

In practice, one often assumes that data come from an approximately normally distributed population. If that assumption is justified, then about 68% of the values lie within one standard deviation of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations. This is known as the "68-95-99.7 rule" or the "Empirical Rule".

Normality tests

Related distributions

Estimation of parameters

Maximum likelihood estimation of parameters

Surprising generalization

The derivation of the estimator of the covariance matrix of a multivariate normal distribution is subtle. It requires the spectral theorem and an understanding of why it can be better to view a scalar as the trace of a 1×1 matrix than as a mere scalar. See estimation of covariance matrices.

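Unbiased estimation of parameters

Common examples

Photon counting

Measurement errors

Physical characteristics of biological specimens

Financial variables

Lifetime

Test scores and intelligence

See also

References

External links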
