期望

上周的时候,我很久以前提供过技术支持的公司内部的一个客户发过来一组数据,希望能够帮助他们解释其中的异常。这组数据的问题是,模拟产生的样本中有很多重复结果——而这是他们之前没有预料到的。在分析了一下数据产生的过程之后,这个问题被模型到如下的问题。

Problem: What is the expected number n_k of distinct values uniformly and independently selected with replacement from a set of size n, when k values are selected?

这个问题的解决方法如下。
Let N_k be the number of distinct values after k selections. Then we
have

N_0 = 0. (1)

We also have the conditional expected value

E( N_{k+1} | N_k ) = N_k + 1 – N_k / n. (2)

Taking expected values on both sides of (1) and (2), we have

n_0 = 0, (3)

and

n_{k+1} = n_k + 1 – n_k / n = 1 + a * n_k, (4)

where a = 1 – 1 /n.

From (3) and (4), we have

n_{k+1} = 1 + a + a^2 + … + a^k
= (1 – a^{k+1}) / (1 – a).

So
n_k = n * (1 – (1 – 1 / n)^k). (5)

将这个客户所用的n和k值带入公式(5),理论结果和模拟实验结果几乎完全吻合。

Advertisements

2 Comments on “期望”

  1. math711 says:

    这就是我当年遇到过的一道面试题啊….smart Ning!

  2. syncom says:

    看来面试题果然是来源于生活!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s