
Bayes's theorem



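Bayes's theorem is a result in probability theory that relates the conditional and marginal probability distributions of random variables. In some interpretations of probability, Bayes's theorem (Bayesian updating) tells us how to revise existing beliefs in light of new evidence.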

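In general, the probability of an event A conditional on an event B differs from the probability of B conditional on A. There is, however, a definite relationship between the two, and Bayes's theorem is the statement of that relationship.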

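As a formal theorem, Bayes's theorem is valid in all interpretations of probability. However, frequentists and Bayesians disagree about how probabilities should be assigned in applications: frequentists assign probabilities to random events according to their frequencies of occurrence, or to subsets of a population as proportions of the whole; Bayesians assign probabilities to propositions that are uncertain. One consequence is that Bayesians have more frequent occasion to use Bayes's theorem.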


Statement of Bayes's theorem

Bayes's theorem relates the conditional and marginal probabilities of two random events A and B:

\Pr(A|B) = \frac{\Pr(B | A)\, \Pr(A)}{\Pr(B)}  \propto L(A | B)\, \Pr(A) \!

where L(A|B) = Pr(B|A) is the likelihood of A given that B has occurred.

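Each term in Bayes's theorem has a conventional name:

  • Pr(A) is the prior probability (or marginal probability) of A. It is "prior" in the sense that it takes no account of any information about B.
  • Pr(A|B) is the conditional probability of A given B, also called the posterior probability because it depends on the specified value of B.
  • Pr(B|A) is the conditional probability of B given A, also called the likelihood.
  • Pr(B) is the prior or marginal probability of B, and acts as a normalizing constant.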

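With this terminology, Bayes's theorem can be stated as:

posterior = (likelihood × prior) / normalizing constant

In other words, the posterior probability is proportional to the product of the prior probability and the likelihood.

In addition, the ratio Pr(B|A)/Pr(B) is sometimes called the standardised likelihood, so the theorem can also be stated as:

posterior = standardised likelihood × prior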

Derivation from conditional probabilities

By the definition of conditional probability, the probability of event A given that event B has occurred is

\Pr(A|B)=\frac{\Pr(A \cap B)}{\Pr(B)}.

Likewise, the probability of event B given that event A has occurred is

\Pr(B|A) = \frac{\Pr(A \cap B)}{\Pr(A)}. \!

Rearranging and combining these two equations, we find

\Pr(A|B)\, \Pr(B) = \Pr(A \cap B) = \Pr(B|A)\, \Pr(A). \!

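This lemma is sometimes called the product rule for probabilities. Dividing both sides by Pr(B), provided that Pr(B) is nonzero, we obtain Bayes's theorem: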

\Pr(A|B) = \frac{\Pr(B|A)\,\Pr(A)}{\Pr(B)}. \!
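The derivation can be checked numerically on a small joint distribution. The following is a minimal sketch; the joint probabilities are made-up illustrative numbers, not values from this article.

```python
from fractions import Fraction

# Hypothetical joint probabilities Pr(A=a, B=b); the numbers are made up.
joint = {
    (True, True): Fraction(3, 20),
    (True, False): Fraction(1, 4),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(1, 2),
}

pr_a = sum(p for (a, _), p in joint.items() if a)  # marginal Pr(A)
pr_b = sum(p for (_, b), p in joint.items() if b)  # marginal Pr(B)
pr_a_and_b = joint[(True, True)]                   # Pr(A and B)

pr_a_given_b = pr_a_and_b / pr_b  # definition of Pr(A|B)
pr_b_given_a = pr_a_and_b / pr_a  # definition of Pr(B|A)

# Product rule: both factorizations recover the same joint probability.
assert pr_a_given_b * pr_b == pr_a_and_b == pr_b_given_a * pr_a
# Bayes's theorem follows by dividing through by Pr(B).
assert pr_a_given_b == pr_b_given_a * pr_a / pr_b

print(pr_a_given_b, pr_b_given_a)
```

Exact rational arithmetic (`Fraction`) is used so that the equalities hold exactly rather than up to floating-point rounding.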

Alternative forms of Bayes's theorem

Bayes's theorem is often restated by first expanding the denominator:

\Pr(B) = \Pr(A, B) + \Pr(A^C, B) = \Pr(B|A) \Pr(A) + \Pr(B|A^C) \Pr(A^C)\,

where A^C is the complement of A (that is, "not A"). Substituting this into the theorem gives:

\Pr(A|B) = \frac{\Pr(B | A)\, \Pr(A)}{\Pr(B|A)\Pr(A) + \Pr(B|A^C)\Pr(A^C)}. \!

More generally, if {A_i} is a partition of the event space, then for any A_i in the partition Bayes's theorem can be written as:

\Pr(A_i|B) = \frac{\Pr(B | A_i)\, \Pr(A_i)}{\sum_j \Pr(B|A_j)\,\Pr(A_j)}. \!
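The partition form translates directly into a short routine. This is a sketch only: the function name `posterior` and the three-way example with its numbers are illustrative assumptions, not part of the article.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return Pr(A_i | B) for every A_i in a partition.

    priors[i]      = Pr(A_i), assumed to sum to 1 over the partition
    likelihoods[i] = Pr(B | A_i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # Pr(B|A_i) Pr(A_i)
    evidence = sum(joint)                                 # Pr(B), the normalizing constant
    if evidence == 0:
        raise ValueError("Pr(B) is zero, so the posterior is undefined")
    return [j / evidence for j in joint]

# Hypothetical three-way partition with made-up numbers.
example = posterior(
    priors=[Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)],
    likelihoods=[Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)],
)
print(example)
```

With two hypotheses A and A^C, the same function reproduces the two-term denominator given above.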

See also: law of total probability.

Bayes's theorem in terms of odds and likelihood ratio


Bayes's theorem can also be expressed in terms of the likelihood ratio Λ and the odds O:

O(A|B)=O(A) \cdot \Lambda (A|B)

where

O(A|B)=\frac{\Pr(A|B)}{\Pr(A^C|B)} \!

is the odds of A given that B has occurred, and

O(A)=\frac{\Pr(A)}{\Pr(A^C)} \!

is the odds of A by itself. The likelihood ratio is defined as:

\Lambda (A|B) = \frac{L(A|B)}{L(A^C|B)} = \frac{\Pr(B|A)}{\Pr(B|A^C)} \!
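As an illustration, the odds form can be checked against the probability form. The helper names below (`odds`, `prob_from_odds`, `posterior_odds`) and the input numbers are hypothetical and chosen only for this sketch.

```python
from fractions import Fraction

def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)

def prob_from_odds(o):
    """Convert odds back into a probability."""
    return o / (1 + o)

def posterior_odds(prior_prob, pr_b_given_a, pr_b_given_not_a):
    """O(A|B) = O(A) * Lambda(A|B), with Lambda = Pr(B|A) / Pr(B|A^C)."""
    likelihood_ratio = pr_b_given_a / pr_b_given_not_a
    return odds(prior_prob) * likelihood_ratio

# Made-up inputs: Pr(A) = 1/4, Pr(B|A) = 4/5, Pr(B|A^C) = 1/5.
prior = Fraction(1, 4)
o_post = posterior_odds(prior, Fraction(4, 5), Fraction(1, 5))
p_post = prob_from_odds(o_post)

# Same posterior computed from the two-hypothesis probability form.
direct = (Fraction(4, 5) * prior) / (
    Fraction(4, 5) * prior + Fraction(1, 5) * (1 - prior)
)
print(o_post, p_post, direct)
```

The odds form is convenient because updating reduces to a single multiplication, and the normalizing constant Pr(B) never has to be computed.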

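Bayes's theorem for probability densities

Bayes's theorem also applies to continuous probability distributions. Because probability density functions are not, strictly speaking, probabilities, deriving the theorem for densities is conceptually harder (see [1] for a detailed derivation). The relationship is established by a limiting argument: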

f(x|y) = \frac{f(x,y)}{f(y)} = \frac{f(y|x)\,f(x)}{f(y)} \!

There is an analogous statement of the law of total probability, which gives:

f(x|y) = \frac{f(y|x)\,f(x)}{\int_{-\infty}^{\infty} f(y|x)\,f(x)\,dx}. \!

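As in the discrete case, each term has a name. f(x, y) is the joint distribution of X and Y; f(x|y) is the posterior distribution of X given Y = y; f(y|x) = L(x|y) is, as a function of x, the likelihood function of X given Y = y; f(x) and f(y) are the marginal distributions of X and Y respectively, and f(x) is also the prior distribution of X. For convenience, f denotes a different function in each of these terms; which one is meant can be told from its arguments.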
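When the integral in the denominator has no convenient closed form, it can be approximated on a grid. The sketch below assumes, purely for illustration, a standard normal prior on x and a unit-variance normal likelihood for y given x; with an observation y = 2 the exact posterior is known to be normal with mean 1 and variance 1/2, which gives something to compare against. The grid bounds and step size are arbitrary choices.

```python
import math

def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

y_observed = 2.0
step = 0.001
grid = [-10 + i * step for i in range(20001)]  # x values from -10 to 10

# Numerator f(y|x) f(x) evaluated at every grid point.
unnormalized = [normal_pdf(y_observed, x, 1.0) * normal_pdf(x, 0.0, 1.0) for x in grid]

# Denominator f(y): Riemann-sum approximation of the integral over x.
evidence = sum(unnormalized) * step

# Posterior density f(x|y) on the grid, and its mean.
posterior = [u / evidence for u in unnormalized]
posterior_mean = sum(x * p for x, p in zip(grid, posterior)) * step

print(evidence, posterior_mean)
```

A plain Riemann sum is adequate here because the integrand is smooth and negligible outside the grid; for heavier-tailed or higher-dimensional problems a more careful integration scheme would be needed.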

Extensions of Bayes's theorem

Bayes's theorem also holds when more than two variables are involved. For example:

\Pr(A|B,C) = \frac{\Pr(A) \, \Pr(B|A) \, \Pr(C|A,B)}{\Pr(B) \, \Pr(C|B)} \!

This can be derived by repeatedly applying the two-variable form of Bayes's theorem together with the definition of conditional probability:

\Pr(A|B,C) = \frac{\Pr(A,B,C)}{\Pr(B,C)} = \frac{\Pr(A,B,C)}{\Pr(B) \, \Pr(C|B)}
= \frac{\Pr(C|A,B) \, \Pr(A,B)}{\Pr(B) \, \Pr(C|B)} = \frac{\Pr(A) \, \Pr(B|A) \, \Pr(C|A,B)}{\Pr(B) \, \Pr(C|B)} .
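The three-variable identity can be verified on any joint distribution. The sketch below uses a hypothetical joint distribution over three binary variables with arbitrary weights; the helpers `pr` and `cond` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over binary (A, B, C); the weights are arbitrary.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = list(product([0, 1], repeat=3))
joint = {o: Fraction(w, sum(weights)) for o, w in zip(outcomes, weights)}

NAMES = ("a", "b", "c")

def pr(**fixed):
    """Marginal probability that the named variables take the given values."""
    return sum(
        p for o, p in joint.items()
        if all(o[NAMES.index(k)] == v for k, v in fixed.items())
    )

def cond(target, given):
    """Conditional probability Pr(target | given); both are dicts of name -> value."""
    return pr(**target, **given) / pr(**given)

# Left-hand side: Pr(A | B, C) straight from the definition.
lhs = cond({"a": 1}, {"b": 1, "c": 1})

# Right-hand side: Pr(A) Pr(B|A) Pr(C|A,B) / (Pr(B) Pr(C|B)).
rhs = (
    pr(a=1) * cond({"b": 1}, {"a": 1}) * cond({"c": 1}, {"a": 1, "b": 1})
    / (pr(b=1) * cond({"c": 1}, {"b": 1}))
)

assert lhs == rhs
print(lhs)
```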

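A general strategy is to work with a factorization of the joint probability and to integrate out the variables that are not of interest (that is, to compute the marginal distribution of the variables that are). Depending on the form of the factorization, some of the integrals can be shown to equal 1, so they drop out of the expression; exploiting this property can reduce the amount of computation substantially. A Bayesian network is one example of this approach: it specifies a factorization of the joint distribution of several variables in which the conditional probability of each variable, given the others, takes a simple form.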


Examples

Example #1: False positives in a medical test

Suppose that a test for a particular disease has a very high success rate:

  • if a tested patient has the disease, the test accurately reports this, a 'positive', 99% of the time (or, with probability 0.99), and
  • if a tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of the time (i.e. with probability 0.95).

Suppose also, however, that only 0.1% of the population have that disease (i.e. a randomly chosen person has it with probability 0.001). We now have all the information required to use Bayes's theorem to calculate the probability that a positive test result is a false positive. This problem is discussed at greater length in Bayesian inference.

Let D be the event that the patient has the disease, and T be the event that the test returns a positive result. By the law of total probability, the probability that a given person tests positive is

P(T) = P(T|D)\,P(D) + P(T|D^C)\,P(D^C) \!

This depends on the two populations: those with the disease who correctly test positive (0.99 × 0.001) and those without the disease who incorrectly test positive (0.05 × 0.999). Then, using the second alternative form of Bayes's theorem (above), the probability that a positive result is a true positive is

P(D|T) = \frac{P(T|D)\,P(D)}{P(T|D)\,P(D) + P(T|D^C)\,P(D^C)} \!
P(D|T) = \frac{0.99\times 0.001}{0.99 \times 0.001 + 0.05\times 0.999}  = \frac{11}{566} \approx 0.019, \!

and hence the probability that a positive result is a false positive is about (1 − 0.019) = 0.981.

Despite the apparent high accuracy of the test, the incidence of the disease is so low (one in a thousand) that the vast majority of patients who test positive (about 98 in a hundred) do not have the disease. This is quite common in screening tests, where a very low false negative rate is often considered more important than a very low false positive rate.
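The arithmetic of this example can be reproduced exactly with rational numbers. In the sketch below the variable names `sensitivity`, `specificity` and `prevalence` are labels introduced here for the three quantities given in the example; they do not appear in the text above.

```python
from fractions import Fraction

sensitivity = Fraction(99, 100)  # P(T|D): test is positive when the disease is present
specificity = Fraction(95, 100)  # P(not T | not D): test is negative when it is absent
prevalence = Fraction(1, 1000)   # P(D)

p_pos_given_d = sensitivity
p_pos_given_not_d = 1 - specificity  # false positive rate, 0.05

# Law of total probability: P(T) over the diseased and healthy populations.
p_pos = p_pos_given_d * prevalence + p_pos_given_not_d * (1 - prevalence)

# Bayes's theorem: P(D|T), and its complement, the share of false positives.
p_d_given_pos = p_pos_given_d * prevalence / p_pos
p_false_positive = 1 - p_d_given_pos

print(p_d_given_pos, float(p_d_given_pos), float(p_false_positive))
```

Changing `prevalence` shows how strongly the result depends on the base rate: with a prevalence of 1 in 10 instead of 1 in 1000, most positive results would be true positives.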

Example #2: Conditional probabilities

Suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes's theorem. But first, we can clarify the situation by rephrasing the question as "what is the probability that Fred picked bowl #1, given that he has a plain cookie?" Thus, to relate to our previous explanation, the event A is that Fred picked bowl #1, and the event B is that Fred picked a plain cookie. To compute Pr(A|B), we first need to know:

  • Pr(A), or the probability that Fred picked bowl #1 regardless of any other information. Since Fred is treating both bowls equally, it is 0.5.
  • Pr(B), or the probability of getting a plain cookie regardless of any information on the bowls. It is computed as the sum, over both bowls, of the probability of getting a plain cookie from a bowl multiplied by the probability of selecting that bowl. We know from the problem statement that the probability of getting a plain cookie from bowl #1 is 0.75, and the probability of getting one from bowl #2 is 0.5, and since Fred is treating both bowls equally the probability of selecting either one is 0.5. Thus, the probability of getting a plain cookie overall is 0.75×0.5 + 0.5×0.5 = 0.625.
  • Pr(B|A), or the probability of getting a plain cookie given that Fred has selected bowl #1. From the problem statement, we know this is 0.75, since 30 out of 40 cookies in bowl #1 are plain.

Given all this information, we can compute the probability that Fred selected bowl #1 given that he got a plain cookie, as follows:

\Pr(A|B) = \frac{\Pr(B | A) \Pr(A)}{\Pr(B)} = \frac{0.75 \times 0.5}{0.625} = 0.6

As we expected, it is more than half.
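The result can also be checked empirically by simulating Fred's choice many times and counting how often a plain cookie came from bowl #1. This is a rough Monte Carlo sketch; the function name `simulate`, the number of trials and the seed are arbitrary choices made here.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate Pr(bowl #1 | plain cookie) by repeated random draws."""
    rng = random.Random(seed)
    bowls = {
        1: ["chocolate"] * 10 + ["plain"] * 30,
        2: ["chocolate"] * 20 + ["plain"] * 20,
    }
    plain_total = 0
    plain_from_bowl_1 = 0
    for _ in range(trials):
        bowl = rng.choice([1, 2])         # pick a bowl at random
        cookie = rng.choice(bowls[bowl])  # pick a cookie at random from it
        if cookie == "plain":
            plain_total += 1
            if bowl == 1:
                plain_from_bowl_1 += 1
    # Condition on the observed event by keeping only the plain-cookie trials.
    return plain_from_bowl_1 / plain_total

estimate = simulate()
print(estimate)  # should be close to the exact value 0.6
```

Conditioning appears in the simulation as filtering: trials that produced a chocolate chip cookie are simply discarded before the ratio is taken.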

Tables of occurrences and relative frequencies

It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each combination of the variables. The tables below illustrate the use of this method for the cookies.

Number of cookies in each bowl, by type of cookie

                  Bowl #1   Bowl #2   Totals
Chocolate chip         10        20       30
Plain                  30        20       50
Total                  40        40       80

Relative frequency of cookies in each bowl, by type of cookie

                  Bowl #1   Bowl #2   Totals
Chocolate chip      0.125     0.250    0.375
Plain               0.375     0.250    0.625
Total               0.500     0.500    1.000

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, 80.
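The relative-frequency table, and the conditional probability read off from it, can be produced from the counts in a few lines. This is a minimal sketch; the dictionary layout is just one convenient representation.

```python
from fractions import Fraction

# Counts of cookies by type and bowl, as in the occurrence table.
counts = {
    "Chocolate chip": {"Bowl #1": 10, "Bowl #2": 20},
    "Plain": {"Bowl #1": 30, "Bowl #2": 20},
}

total = sum(sum(row.values()) for row in counts.values())  # 80 cookies

# Relative frequencies: each count divided by the grand total.
rel = {
    kind: {bowl: Fraction(n, total) for bowl, n in row.items()}
    for kind, row in counts.items()
}

# Row total for plain cookies is Pr(B); the Bowl #1 cell is Pr(A and B).
pr_plain = sum(rel["Plain"].values())
pr_bowl1_given_plain = rel["Plain"]["Bowl #1"] / pr_plain

print(float(rel["Plain"]["Bowl #1"]), float(pr_plain), float(pr_bowl1_given_plain))
```

Reading the conditional probability from the table amounts to dividing a cell by its row total, which is why the table makes the calculation easy to see.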

Example #3: Bayesian inference

Applications of Bayes's theorem often assume the philosophy underlying Bayesian probability that uncertainty and degrees of belief can be measured as probabilities. One such example follows. For additional worked out examples, including simpler examples, please see the article on the examples of Bayesian inference.

We describe the marginal probability distribution of a variable A as the prior probability distribution or simply the prior. The conditional distribution of A given the "data" B is the posterior probability distribution or just the posterior.

Suppose we wish to know about the proportion r of voters in a large population who will vote "yes" in a referendum. Let n be the number of voters in a random sample (chosen with replacement, so that we have statistical independence) and let m be the number of voters in that random sample who will vote "yes". Suppose that we observe n = 10 voters and m = 7 say they will vote yes. From Bayes's theorem we can calculate the probability distribution function for r using

f(r | n=10, m=7) =    \frac {f(m=7 | r, n=10) \, f(r)} {\int_0^1 f(m=7|r, n=10) \, f(r) \, dr}. \!

From this we see that from the prior probability density function f(r) and the likelihood function L(r) = f(m = 7|r, n = 10), we can compute the posterior probability density function f(r|n = 10, m = 7).

The prior probability density function f(r) summarizes what we know about the distribution of r in the absence of any observation. We provisionally assume in this case that the prior distribution of r is uniform over the interval [0, 1]. That is, f(r) = 1. If some additional background information is found, we should modify the prior accordingly. However, before we have any observations, all values of r are treated as equally likely.

Under the assumption of random sampling, choosing voters is just like choosing balls from an urn. The likelihood function L(r) = P(m = 7|r, n = 10) for such a problem is just the probability of 7 successes in 10 trials for a binomial distribution:

\Pr( m=7 | r, n=10) = {10 \choose 7} \, r^7 \, (1-r)^3.

As with the prior, the likelihood is open to revision -- more complex assumptions will yield more complex likelihood functions. Maintaining the current assumptions, we compute the normalizing factor,

\int_0^1 \Pr( m=7|r, n=10) \, f(r) \, dr = \int_0^1 {10 \choose 7} \, r^7 \, (1-r)^3 \, 1 \, dr = {10 \choose 7} \, \frac{1}{1320} \!

and the posterior distribution for r is then

f(r | n=10, m=7) =   \frac{{10 \choose 7} \, r^7 \, (1-r)^3 \, 1} {{10 \choose 7} \, \frac{1}{1320}} = 1320 \, r^7 \, (1-r)^3

for r between 0 and 1, inclusive.

One may be interested in the probability that more than half the voters will vote "yes". The prior probability that more than half the voters will vote "yes" is 1/2, by the symmetry of the uniform distribution. In comparison, the posterior probability that more than half the voters will vote "yes", i.e., the conditional probability given the outcome of the opinion poll – that seven of the 10 voters questioned will vote "yes" – is

1320\int_{1/2}^1 r^7(1-r)^3\,dr \approx 0.887, \!

which is about an "89% chance".
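Because the integrand is a polynomial, both the normalizing factor and the posterior probability can be computed exactly. The sketch below expands (1 − r)^3 with the binomial theorem and integrates term by term; the helper name `integrate_kernel` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def integrate_kernel(m, n, lo, hi):
    """Exact integral of r**m * (1 - r)**(n - m) over [lo, hi].

    Expands (1 - r)**(n - m) with the binomial theorem and integrates
    each resulting power of r term by term.
    """
    k = n - m
    total = Fraction(0)
    for j in range(k + 1):
        coeff = comb(k, j) * (-1) ** j  # coefficient of r**(m + j)
        power = m + j + 1               # exponent after integration
        total += Fraction(coeff, power) * (Fraction(hi) ** power - Fraction(lo) ** power)
    return total

n, m = 10, 7

# Normalizing integral of r^7 (1 - r)^3 over [0, 1]; the binomial
# coefficient cancels between numerator and denominator.
norm = integrate_kernel(m, n, 0, 1)

# Posterior probability that more than half the voters will vote "yes".
p_majority = integrate_kernel(m, n, Fraction(1, 2), 1) / norm

print(norm, p_majority, float(p_majority))
```

The exact value 227/256 agrees with the rounded figure of 0.887 quoted above.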

Historical remarks

Bayes's theorem is named after the Reverend Thomas Bayes (1702–1761), who studied how to compute a distribution for the parameter of a binomial distribution (to use modern terminology). His friend, Richard Price, edited and presented the work in 1763, after Bayes's death, as An Essay towards solving a Problem in the Doctrine of Chances. Pierre-Simon Laplace replicated and extended these results in an essay of 1774, apparently unaware of Bayes's work.

One of Bayes's results (Proposition 5) gives a simple description of conditional probability, and shows that it can be expressed independently of the order in which things occur:

If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has also happened, the probability I am right [i.e., the conditional probability of the first event being true given that the second has also happened] is P/b.

Note that the expression says nothing about the order in which the events occurred; it measures correlation, not causation. His preliminary results, in particular Propositions 3, 4, and 5, imply the result now called Bayes's Theorem (as described above), but it does not appear that Bayes himself emphasized or focused on that result.

Bayes's main result (Proposition 9 in the essay) is the following: assuming a uniform distribution for the prior distribution of the binomial parameter p, the probability that p is between two values a and b is

\frac {\int_a^b {n+m \choose m} p^m (1-p)^n\,dp}  {\int_0^1 {n+m \choose m} p^m (1-p)^n\,dp} \!

where m is the number of observed successes and n the number of observed failures.

What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter p. So, one can compute probability for an experimental outcome, but also for the parameter which governs it, and the same algebra is used to make inferences of either kind.

Bayes states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball.
