AdvancedStats-Lab 1: ML estimation with PDF

Theoretical analysis
Question 1: maximum likelihood estimator?
For n i.i.d. observations $x_i$ of the height of the river, with density $f_H(x) = \frac{x}{a} e^{-\frac{x^2}{2a}}$ for $x \ge 0$, the likelihood can be written as follows:
$$L(a; x_1,\dots,x_n) = \prod_{i=1}^n f_H(x_i) = \frac{1}{a^n} \left(\prod_{i=1}^n x_i\right) e^{-\frac{1}{2a}\sum_{i=1}^n x_i^2}$$
The log-likelihood can be derived from the likelihood as follows:
$$\ell(a; x_1,\dots,x_n) = \log L(a; x_1,\dots,x_n) = \sum_{i=1}^n \ln(x_i) - n\ln(a) - \frac{1}{2a}\sum_{i=1}^n x_i^2$$
$$\frac{\partial \ell}{\partial a}(a; x_1,\dots,x_n) = -\frac{n}{a} + \frac{1}{2a^2}\sum_{i=1}^n x_i^2$$
Then one can derive the maximum likelihood estimator by setting the partial derivative to 0:
$$\frac{\partial \ell}{\partial a}(\hat{a}_n; x_1,\dots,x_n) = 0 \iff \hat{a}_n = \frac{1}{2n}\sum_{i=1}^n x_i^2$$
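As a quick sanity check (an addition to the lab, not part of the original solution), the closed form can be compared with a numerical maximization of the log-likelihood; the true value a = 2.4 and the sample are hypothetical, and the sampler uses the inverse-CDF method derived in Exercise 1(b) below.
a = 2.4                                  # hypothetical true parameter
x = sqrt(-2*a*log(runif(1000)))          # Rayleigh sample (inverse CDF, see Exercise 1(b))
loglik = function(a, x) sum(log(x)) - length(x)*log(a) - sum(x^2)/(2*a)
a_hat = sum(x^2)/(2*length(x))           # closed-form MLE
a_num = optimize(loglik, interval = c(0.01, 10), x = x, maximum = TRUE)$maximum
c(a_hat, a_num)                          # the two values should agree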
Question 2: method of moments estimator?
$$E[X] = \int_0^{+\infty} x f_H(x)\,dx = \int_0^{+\infty} \frac{x^2}{a} e^{-\frac{x^2}{2a}}\,dx = \sqrt{\frac{\pi a}{2}}$$
One can estimate the expectation using the arithmetic mean. Hence, the method of moments estimator $\bar{a}_n$ satisfies:
$$\frac{1}{n}\sum_{i=1}^n x_i = \sqrt{\frac{\pi \bar{a}_n}{2}} \iff \bar{a}_n = \frac{2}{\pi n^2}\left(\sum_{i=1}^n x_i\right)^2$$
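For illustration (added here, with a hypothetical sample), the two estimators can be compared on simulated data; note that $\frac{2}{\pi n^2}\left(\sum_i x_i\right)^2 = \frac{2}{\pi}\bar{x}^2$.
a = 2.4                                # hypothetical true parameter
x = sqrt(-2*a*log(runif(1000)))        # Rayleigh sample (inverse CDF, see Exercise 1(b))
a_mle = sum(x^2)/(2*length(x))         # maximum likelihood estimator
a_mom = 2*mean(x)^2/pi                 # method of moments estimator
c(a_mle, a_mom)                        # both should be close to a = 2.4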
Question 3: properties of $\hat{a}_n$?
a) Unbiased?
$$E[\hat{a}_n] = E\left[\frac{1}{2n}\sum_{i=1}^n X_i^2\right] = \frac{1}{2}E[X^2]$$
$$E[X^2] = \int_0^{+\infty} \frac{x^3}{a} e^{-\frac{x^2}{2a}}\,dx = 2\int_0^{+\infty} x e^{-\frac{x^2}{2a}}\,dx = 2a$$
where the second equality follows from an integration by parts. Hence
$$E[\hat{a}_n] = a$$
$\hat{a}_n$ is unbiased.
b) Optimal? Let us derive the variance of the estimator $\hat{a}_n$.
$$\mathrm{Var}[\hat{a}_n] = \frac{1}{4n^2}\sum_{i=1}^n \mathrm{Var}[X_i^2] = \frac{1}{4n}\left(E[X^4] - E[X^2]^2\right)$$
$$E[X^4] = \int_0^{+\infty} \frac{x^5}{a} e^{-\frac{x^2}{2a}}\,dx = 4a\int_0^{+\infty} \frac{x^3}{a} e^{-\frac{x^2}{2a}}\,dx = 4a\,E[X^2]$$
again by integration by parts. Thus,
$$\mathrm{Var}[\hat{a}_n] = \frac{1}{4n}\left(4a\,E[X^2] - E[X^2]^2\right)$$
We know from question a) that $E[X^2] = 2E[\hat{a}_n] = 2a$. Hence
$$\mathrm{Var}[\hat{a}_n] = \frac{a^2}{n}$$
Let us now compute the Fisher information $I_n(a)$:
$$\frac{\partial^2 \ell}{\partial a^2}(a; x_1,\dots,x_n) = \frac{n}{a^2} - \frac{1}{a^3}\sum_{i=1}^n x_i^2$$
$$I_n(a) = -E\left[\frac{\partial^2 \ell}{\partial a^2}(a; X_1,\dots,X_n)\right] = -\frac{n}{a^2} + \frac{n\,E[X^2]}{a^3} = \frac{n}{a^2}$$
Therefore
$$\mathrm{Var}[\hat{a}_n] = \frac{1}{I_n(a)}$$
The variance of $\hat{a}_n$ attains the Cramér–Rao lower bound and the estimator is unbiased, so it minimizes the mean squared error among unbiased estimators: $\hat{a}_n$ is optimal.
c) Efficient? $\hat{a}_n$ is unbiased and its variance equals the Cramér–Rao lower bound; therefore $\hat{a}_n$ is efficient.
d) Asymptotically Gaussian? The maximum likelihood estimator is asymptotically Gaussian, with asymptotic variance the inverse of the per-observation Fisher information $I(a) = \frac{I_n(a)}{n} = \frac{1}{a^2}$. Hence
$$\sqrt{n}(\hat{a}_n - a) \xrightarrow[n\to\infty]{d} \mathcal{N}(0, a^2)$$
Application on real data
Question 1: p as a function of a?
Let $p$ be the probability that a disaster happens during one year, i.e. that the height of the river exceeds 6:
$$p = 1 - F_H(6) = \int_6^{+\infty} \frac{x}{a} e^{-\frac{x^2}{2a}}\,dx = e^{-\frac{18}{a}}$$
Question 2: probability of at most one disaster?
During one thousand years, at most one disaster means that either no disaster happened or exactly one did. Let us derive $p_1$, the probability that at most one disaster happens during one thousand years:
$$p_1 = (1-p)^{1000} + 1000\,p\,(1-p)^{999}$$
Question 3: estimation of the probability?
X = c(2.5, 1.8, 2.9, 0.9, 2.1, 1.7, 2.2, 2.8)
n = length(X)
a = sum(X^2)/(2*n)                     # maximum likelihood estimate of a
p = exp(-18/a)                         # yearly disaster probability
p1 = (1-p)^1000 + 1000*p*(1-p)^999     # at most one disaster in 1000 years
Regarding the set of 8 observations, one can estimate $\hat{a} \approx 2.42$. The probability $p_1$ can then be estimated: $p_1 \approx 0.883$.
Exercise 1: Rayleigh distribution
(a)
The parameter $a$ of the Rayleigh distribution was estimated with the maximum likelihood estimator $\hat{a}_n$. It was found that $\hat{a} \approx 2.42$.
(b)
One can generate more samples following a Rayleigh distribution by using the Rayleigh random generator implemented in R (for instance rrayleigh from the VGAM package). One has to be careful that the scale $\sigma$ used in R corresponds to $\sqrt{a}$.
library(VGAM)   # provides rrayleigh
n = 100000
X = rrayleigh(n, scale = sqrt(a))
hist(X, nclass = 50, xlim = c(0,10), main = "Rayleigh implemented in R")
[Figure: histogram of X, "Rayleigh implemented in R"]
If one only has access to the uniform distribution and would like to output a Rayleigh distribution, one can use the inverse distribution function.
$$F(x) = \int_0^x \frac{t}{a} e^{-\frac{t^2}{2a}}\,dt = 1 - e^{-\frac{x^2}{2a}}$$
$$F(x) = u \iff x = \sqrt{-2a\ln(1-u)}$$
If U follows a uniform distribution on [0,1], one can generate samples following the Rayleigh distribution using the uniform distribution.

$$U \sim \mathcal{U}_{[0,1]} \implies \sqrt{-2a\ln(U)} \sim \mathrm{Rayleigh}(a)$$
where $\ln(1-U)$ was replaced by $\ln(U)$, since $1-U$ and $U$ have the same distribution.
n = 100000
U = runif(n)
X = sqrt(-2*a*log(U))
hist(X, nclass = 50, xlim = c(0,10), main = "Simulated Rayleigh")
[Figure: histogram of X, "Simulated Rayleigh"]
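As an additional check (not part of the original lab), a two-sample Kolmogorov–Smirnov test can compare the inverse-transform sample with one produced by rrayleigh; under the null hypothesis both come from the same distribution, so a large p-value is expected.
X_builtin = rrayleigh(10000, scale = sqrt(a))   # built-in generator
X_inverse = sqrt(-2*a*log(runif(10000)))        # inverse-CDF generator
ks.test(X_builtin, X_inverse)                   # two-sample KS test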
(c)
Empirically, one can verify that the MLE is unbiased by averaging $N$ samples of the MLE $\hat{a}_{n,1},\dots,\hat{a}_{n,N}$, for any value of $n$. To limit the computational cost, let us take $n = 10$ and average over $N = 100000$ samples of $n$ observations.
$$E[\hat{a}_n - a] \approx \frac{1}{N}\sum_{k=1}^N (\hat{a}_{n,k} - a) = \frac{1}{N}\sum_{k=1}^N \left(\frac{1}{2n}\sum_{i=1}^n x_{i,k}^2 - a\right)$$
N = 100000
n = 10
E = 0
for (k in 1:N) {
  X = rrayleigh(n, scale = sqrt(a))
  E = E + 1/(2*n)*sum(X^2) - a
}
E = E/N
E
## [1] -0.002336315
The average bias E is negligible compared with $a \approx 2.42$. Hence, empirically, the estimator is unbiased.
(d)
Empirically, one can verify the efficiency of the MLE by computing its variance and comparing it to the inverse of the Fisher information. One needs an unbiased estimator of the variance; since the mean is known to equal $a$, one can use:
$$\mathrm{Var}[\hat{a}_n] \approx \frac{1}{N}\sum_{k=1}^N (\hat{a}_{n,k} - a)^2 = \frac{1}{N}\sum_{k=1}^N \left(\frac{1}{2n}\sum_{i=1}^n x_{i,k}^2 - a\right)^2$$
N = 100000
n = 10
I = n/a^2   # Fisher information I_n(a)
V = 0
for (k in 1:N) {
  X = rrayleigh(n, scale = sqrt(a))
  dV = 1/(2*n)*sum(X^2) - a
  V = V + dV^2
}
V = V/N
V
## [1] 0.5887242
$$\frac{1}{I_n(a)} = \frac{a^2}{n} \approx 0.5847$$
Hence, one can say that $\mathrm{Var}[\hat{a}_n] \approx \frac{1}{I_n(a)}$: empirically, the estimator is efficient.
(e)
$$\sqrt{n}(\hat{a}_n - a) \xrightarrow[n\to\infty]{d} \mathcal{N}(0, a^2)$$
The asymptotic normality means that for $n$ large, $\sqrt{n}(\hat{a}_n - a)$ approximately follows $\mathcal{N}(0, a^2)$. Thus one can plot several samples of the random variable $Z_n = \sqrt{n}(\hat{a}_n - a)$ and check whether the distribution looks Gaussian.
n = 10000   # size of the observations for each a_n
N = 10000   # number of samples of Z_n
Z_n = rep(0, N)
for (k in 1:N) {
  X = rrayleigh(n, scale = sqrt(a))
  a_n = 1/(2*n)*sum(X^2)
  Z_n[k] = sqrt(n)*(a_n - a)
}
hist(Z_n, breaks = 40, main = "Asymptotic normality")
[Figure: histogram of Z_n, "Asymptotic normality"]
The MLE is asymptotically normal: one can check that the empirical standard deviation of $Z_n$ is close to $a \approx 2.418$.
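A quantile-quantile plot gives a sharper visual check than the histogram; this is a small addition to the lab, using the base-R functions qqnorm and qqline.
qqnorm(Z_n)               # empirical vs. Gaussian quantiles: a straight line if Gaussian
qqline(Z_n, col = "red")  # reference line through the quartiles
sd(Z_n)                   # should be close to a = 2.418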

ML estimation with PMF
Statistical modelling and theoretical analysis
Question 1: Belonging to exponential family
We study the random variable $X$ that follows a geometric distribution with parameter $q \in \,]0,1[$. We have:
$$\forall k \in \mathbb{N}^*,\quad P(X = k) = q(1-q)^{k-1}$$
The likelihood function can be written as:
$$L(x,q) = \mathbb{1}_{\mathbb{N}^*}(x)\, q(1-q)^{x-1} = \mathbb{1}_{\mathbb{N}^*}(x)\, \frac{q}{1-q}\, e^{x\ln(1-q)}$$
We can notice that the model is dominated and the domain where $L(x,q) > 0$ is $\mathbb{N}^*$, which does not depend on $q$. Thus the distribution domain is homogeneous. We then define:
$$h: x \mapsto \mathbb{1}_{\mathbb{N}^*}(x), \qquad \varphi: q \mapsto \frac{q}{1-q}, \qquad Q: q \mapsto \ln(1-q), \qquad S: x \mapsto x$$
We can then write the likelihood as:
$$L(x,q) = h(x)\,\varphi(q)\exp(Q(q)S(x))$$
We can conclude that the geometric distribution belongs to the exponential family and that $S$ is a sufficient statistic. Since $S$ is not constant, the family $(S)$ is linearly independent, and we deduce that the model is identifiable.
Question 2: Computation of the Fisher Information Matrix
We saw in question 1 that the model is dominated and the distribution domain is homogeneous. One can also easily show that $L(x,q)$ is twice differentiable with respect to $q$ and integrable, since for fixed $x$ it is a polynomial function of $q$. Thus the model is regular. We denote by $l(x,q)$ the log-likelihood of the model:
$$\forall x \in \mathbb{N}^*,\quad l(x,q) = \ln(q(1-q)^{x-1}) = \ln(q) + (x-1)\ln(1-q)$$
We can now deduce the score function:
$$s_q(x) = \frac{\partial l}{\partial q}(x,q) = \frac{1}{q} - \frac{x-1}{1-q}$$
We can note that the score function is an affine transform of $X$, thus it is square-integrable because $X$ is, so the Fisher information is well-defined. We showed previously that the model is regular, thus we have:
$$I(q) = E_q\!\left[s_q(X)^2\right] = -E_q\!\left[\frac{\partial^2 l}{\partial q^2}(X,q)\right] = -E_q\!\left[-\frac{1}{q^2} - \frac{X-1}{(1-q)^2}\right]$$
$$= \frac{1}{q^2} + \frac{E_q[X]-1}{(1-q)^2} \quad\text{with}\quad E_q[X] = \frac{1}{q}$$
$$= \frac{(1-q)^2}{q^2(1-q)^2} + \frac{q-q^2}{q^2(1-q)^2} = \frac{1}{q^2(1-q)}$$
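A quick numerical check (an addition, not part of the original solution): the Fisher information can be approximated by the empirical mean of the squared score. Note that R's rgeom counts the number of failures before the first success, so one must add 1 to match the support $\mathbb{N}^*$ used here; the value q = 0.3 is hypothetical.
q = 0.3
X = rgeom(100000, prob = q) + 1   # shift: rgeom starts at 0
score = 1/q - (X - 1)/(1 - q)     # score function s_q(X)
c(mean(score^2), 1/(q^2*(1 - q))) # empirical vs. theoretical Fisher information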
Question 3: Maximum likelihood estimator
Let $X_1,\dots,X_n$ be an $n$-sample following the same distribution as $X$. The likelihood of the model is:
$$L(x_1,\dots,x_n,q) = \prod_{i=1}^n q(1-q)^{x_i-1} = \left(\frac{q}{1-q}\right)^{n} (1-q)^{\sum_{i=1}^n x_i}$$
$$l(x_1,\dots,x_n,q) = n\ln\!\left(\frac{q}{1-q}\right) + \ln(1-q)\sum_{i=1}^n x_i$$
We look for $\hat{q}_n$ that maximizes the likelihood of the $n$-sample. Thus it satisfies two conditions:
$$\frac{\partial l}{\partial q}(x_1,\dots,x_n,\hat{q}_n) = 0 \quad (1) \qquad\qquad \frac{\partial^2 l}{\partial q^2}(x_1,\dots,x_n,\hat{q}_n) < 0 \quad (2)$$
From equation (1) we deduce:
$$\frac{n}{\hat{q}_n} = \sum_{i=1}^n x_i \iff \frac{1}{\hat{q}_n} = \frac{1}{n}\sum_{i=1}^n x_i$$
The maximum likelihood estimator $\hat{q}_n$ is therefore
$$\hat{q}_n = \frac{1}{\bar{X}_n} \quad\text{where}\quad \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$$
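As a brief illustration (added here, with a hypothetical $q$), the closed form $\hat{q}_n = 1/\bar{X}_n$ can be checked against a numerical maximization of the log-likelihood.
q = 0.3
X = rgeom(1000, prob = q) + 1   # geometric sample on the support N*
loglik = function(q, x) length(x)*log(q/(1 - q)) + log(1 - q)*sum(x)
q_hat = 1/mean(X)               # closed-form MLE
q_num = optimize(loglik, interval = c(0.001, 0.999), x = X, maximum = TRUE)$maximum
c(q_hat, q_num)                 # the two values should agree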
Question 4: Asymptotic behavior of the estimator
Now we study the asymptotic behavior of the estimator. According to the central limit theorem, we have:
$$\sqrt{n}\left(\bar{X}_n - \frac{1}{q}\right) \xrightarrow{d} \mathcal{N}(0, V(X)) \quad\text{where}\quad V(X) = \frac{1-q}{q^2}$$
Then we use the delta method. We define $g: x \mapsto \frac{1}{x}$, which is differentiable at $\frac{1}{q}$, with $g'(\frac{1}{q}) = -q^2$. We have:
$$\sqrt{n}\left(g(\bar{X}_n) - g\!\left(\frac{1}{q}\right)\right) \xrightarrow{d} \mathcal{N}\!\left(0,\ g'\!\left(\frac{1}{q}\right)^{2} \frac{1-q}{q^2}\right)$$
$$\iff \sqrt{n}(\hat{q}_n - q) \xrightarrow{d} \mathcal{N}\!\left(0,\ (1-q)q^2\right) = \mathcal{N}\!\left(0,\ I^{-1}(q)\right)$$
This estimator is asymptotically normal and its asymptotic variance equals the Cramér–Rao bound, thus the estimator is asymptotically efficient. This was expected, since it is a maximum likelihood estimator.
Question 5: Asymptotic confidence interval
Finally, we build an asymptotic confidence interval for $q$. We set:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$$
Then we know that $\sqrt{n}\,\frac{\bar{X}_n - 1/q}{S_n}$ approximately follows $t(n-1)$, the Student distribution with $n-1$ degrees of freedom. Since this law is symmetric, we can write:
$$P\!\left(-t_{n-1}^{\alpha/2} \le \sqrt{n}\,\frac{\bar{X}_n - 1/q}{S_n} \le t_{n-1}^{\alpha/2}\right) = 1-\alpha$$
where $t_k^{\alpha}$ is the unique real number that satisfies $P(t(k) < t_k^{\alpha}) = 1-\alpha$. Finally, we have:
$$\bar{X}_n - t_{n-1}^{\alpha/2}\frac{S_n}{\sqrt{n}} \le \frac{1}{q} \le \bar{X}_n + t_{n-1}^{\alpha/2}\frac{S_n}{\sqrt{n}} \iff \frac{1}{\bar{X}_n + t_{n-1}^{\alpha/2}\frac{S_n}{\sqrt{n}}} \le q \le \frac{1}{\bar{X}_n - t_{n-1}^{\alpha/2}\frac{S_n}{\sqrt{n}}}$$
Application on real data
Question 1: Estimation of the fraud probability
X = c(44, 9, 11, 59, 81, 19, 89, 10, 24, 07, 21, 90, 38, 01, 15, 22, 29, 19, 37, 26, 219, 2, 57, 11, 34)
n = length(X)
p = n/sum(X)   # maximum likelihood estimate: 1/mean(X)
The probability of fraud $p_{fraud}$ can be estimated with the estimator studied above: $p_{fraud} \approx 0.026$. Taking $1-\alpha = 0.95$, we can deduce a confidence interval:
t_alpha = 2.021   # Student quantile for n = 40, closest tabulated value
Xn = mean(X)
Sn = sqrt(mean((X - Xn)**2)*n/(n - 1))   # unbiased standard deviation estimate
a = 1/(Xn + Sn*t_alpha/sqrt(n))
b = 1/(Xn - Sn*t_alpha/sqrt(n))
With 95% confidence, we know that $p_{fraud}$ is between 0.019 and 0.043. If $n_0 = 20000$ tickets were validated, we can estimate the number $n_{fraud}$ of fraudsters: out of 1000 users there are on average 26 fraudsters for 974 honest users. Thus we have
$$n_{fraud} = n_0\,\frac{p_{fraud}}{1-p_{fraud}} \approx 534$$
fraudsters.
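For completeness, the corresponding computation in R (added here, reusing p from the estimation above):
n0 = 20000
n_fraud = n0*p/(1 - p)   # expected number of fraudsters per 20000 validated tickets
n_fraud                  # approximately 534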
Exercise 2: Geometric distribution
(a)
The parameter pfraud of the geometric distribution was estimated with the maximum likelihood estimator pˆn.
It was found that pfraud ≈ 0.026 .
(b)
One only has access to the uniform distribution and would like to output a geometric distribution. We start by randomly drawing a number $q$ between 0 and 1, according to the uniform distribution. Then we partition the interval $[0,1)$ as follows:
+∞ "k−1 k #
[0,1] = [ Xpfraud(1 − pfraud)i−1,Xpfraud(1 − pfraud)i−1
k=1 i=1 i=1
pfraud)k−1,1 − (1 − pfraud)k
k=1
Thus, if we draw q, we look for k that satisfies:
$$1-(1-p_{fraud})^{k-1} \le q \le 1-(1-p_{fraud})^{k}$$
$$\iff -(1-p_{fraud})^{k-1} \le q-1 \le -(1-p_{fraud})^{k}$$
$$\iff (1-p_{fraud})^{k} \le 1-q \le (1-p_{fraud})^{k-1}$$
$$\iff k\ln(1-p_{fraud}) \le \ln(1-q) \le (k-1)\ln(1-p_{fraud})$$
$$\iff k-1 \le \frac{\ln(1-q)}{\ln(1-p_{fraud})} \le k$$
Since the probability of drawing a boundary value is 0, we can choose $k = \left\lceil \frac{\ln(1-q)}{\ln(1-p_{fraud})} \right\rceil$.
If U follows a uniform distribution on [0,1], one can generate samples following the geometric distribution using the uniform distribution.
$$U \sim \mathcal{U}_{[0,1]} \implies \left\lceil \frac{\ln(1-U)}{\ln(1-p_{fraud})} \right\rceil \sim \mathcal{G}(p_{fraud})$$
n = 100000
U = runif(n)
X = ceiling(log(1 - U)/log(1 - p))
hist(X, nclass = 800, xlim = c(0, 200), main = "Simulated Geometric")
[Figure: histogram of X, "Simulated Geometric"]
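As an extra check (not in the original lab), the simulated sample can be compared with R's built-in generator; rgeom counts failures before the first success, hence the +1 shift.
X_builtin = rgeom(100000, prob = p) + 1   # shift to the support N*
c(mean(X), mean(X_builtin), 1/p)          # both empirical means should be close to 1/p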
(c)
We have shown that:
$$\sqrt{n}(\hat{q}_n - q) \xrightarrow{d} \mathcal{N}\!\left(0,\ q^2(1-q)\right)$$
The asymptotic normality means that for $n$ large, $\sqrt{n}(\hat{q}_n - q)$ approximately follows $\mathcal{N}(0, q^2(1-q))$. Thus one can plot several samples of the random variable $Z_n = \sqrt{n}(\hat{q}_n - q)$ and check whether the distribution looks Gaussian.
n = 10000   # size of the observations for each q_n
N = 10000   # number of samples of Z_n
Z_n = rep(0, N)
for (k in 1:N) {
  U = runif(n)
  X = ceiling(log(1 - U)/log(1 - p))
  p_n = 1/mean(X)
  Z_n[k] = sqrt(n)*(p_n - p)
}
hist(Z_n, breaks = 100, main = "Asymptotic normality")
[Figure: histogram of Z_n, "Asymptotic normality"]
var_emp = mean(Z_n**2)    # empirical variance of Z_n
var_theo = p^2*(1 - p)    # theoretical asymptotic variance
The MLE is asymptotically normal. An estimation of the variance gives $6.72 \times 10^{-4}$, which is close to the theoretical value $p_{fraud}^2(1-p_{fraud}) = 6.67 \times 10^{-4}$.
(d)
We have shown that a 95% confidence interval is:
$$\frac{1}{\bar{X}_n + t_{n-1}^{\alpha/2}\frac{S_n}{\sqrt{n}}} \le q \le \frac{1}{\bar{X}_n - t_{n-1}^{\alpha/2}\frac{S_n}{\sqrt{n}}}$$
with $t_{n-1}^{\alpha/2} = 2.021$. Keeping the bounds computed from the 39 given values, we simulate N = 50000 datasets of size 39, estimate $q$ on each, and compute the fraction of the estimates that fall inside the confidence interval.
n = 39      # size of the observations for each q_n
N = 50000   # number of samples
count = 0
q_n = rep(0, N)
for (k in 1:N) {
  U = runif(n)
  X = ceiling(log(1 - U)/log(1 - p))
  p_n = 1/mean(X)
  q_n[k] = p_n
  if (a < p_n & p_n < b) {
    count = count + 1
  }
}
hist(q_n, breaks = 100, main = "Confidence interval")
[Figure: histogram of q_n, "Confidence interval"]
We obtain that 98.6% of the simulations give an estimate of $q$ inside the confidence interval.
