CSE544 Assignment 6: Bayesian Inference and Regression

I/We understand and agree to the following: 

(a)   Academic dishonesty will result in an ‘F’ grade and referral to the Academic Judiciary. 

(b)  Late submission, beyond the ‘due’ date/time, will result in a score of 0 on this assignment. 

(write down the name of all collaborating students on the line below) 

  

1. Posterior for Normal                                                                                            (Total 10 points)
Let $X_1, X_2, \ldots, X_n$ be distributed as Normal($\theta$, $\sigma^2$), where $\sigma$ is assumed to be known. You are also given that the prior for $\theta$ is Normal($a$, $b^2$).

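(a)    Show that the posterior of $\theta$ is Normal($x$, $y^2$), such that:

$$x = \frac{b^2 \bar{X} + se^2\, a}{b^2 + se^2} \quad \text{and} \quad y^2 = \frac{b^2\, se^2}{b^2 + se^2}, \quad \text{where } \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \text{ and } se^2 = \frac{\sigma^2}{n}.$$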

(Hint: the derivation is less messy if you ignore the constants, but please justify why you can ignore them.)

(b)    Compute the (1‐α) posterior interval for θ. 
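As a numerical companion to parts (a) and (b), here is a minimal Python sketch. It assumes the standard conjugate result $x = (b^2 \bar{X} + se^2 a)/(b^2 + se^2)$ and $y^2 = b^2 se^2/(b^2 + se^2)$ with $se^2 = \sigma^2/n$, and it reads "posterior interval" as the equal-tailed interval of the Normal posterior. The function names `normal_posterior` and `posterior_interval` are hypothetical and not part of the assignment.

```python
from statistics import NormalDist, fmean


def normal_posterior(samples, sigma, prior_mean, prior_var):
    """Posterior (mean, variance) of theta for Normal data with known sigma."""
    n = len(samples)
    x_bar = fmean(samples)
    se2 = sigma ** 2 / n  # squared standard error of the sample mean
    post_mean = (prior_var * x_bar + se2 * prior_mean) / (prior_var + se2)
    post_var = (prior_var * se2) / (prior_var + se2)
    return post_mean, post_var


def posterior_interval(post_mean, post_var, alpha):
    """Equal-tailed (1 - alpha) interval of a Normal(post_mean, post_var)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard Normal quantile
    half_width = z * post_var ** 0.5
    return post_mean - half_width, post_mean + half_width
```

For instance, `normal_posterior([1.0, 2.0, 3.0], sigma=1.0, prior_mean=0.0, prior_var=1.0)` pulls the sample mean of 2 toward the prior mean of 0, because the prior carries weight comparable to three observations.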

2. Bayesian Inference in action
You will need the q2_sigma3.dat and q2_sigma100.dat files for this question; these files are on the class website. Each file contains 5 rows of 100 samples each. Refer back to Q1 (a); you can use its result even if you have not solved that question. Submit all Python code for this question with suitable filenames.

(a)    Assume that $\sigma = 3$ (meaning $\sigma^2 = 9$). Let the prior be the standard Normal (mean 0, variance 1). Read in the 1st row of q2_sigma3.dat and compute the new posterior. Now, assuming this posterior is your new prior, read in the 2nd row of q2_sigma3.dat and compute the new posterior. Repeat until the 5th row. Please provide your steps here and draw a table with your estimates of the mean and variance of the posterior for all 5 steps (the table should have 5 rows and 2 columns). Also plot each of the 5 posterior distributions on a single graph and attach this graph. What do you observe?

(b)    Now assume that σ = 100 and repeat part (a) above but with q2_sigma100.dat. Assume the same prior of a standard Normal. Provide the table and final graph. What do you observe?   

(c)     Based on the comparison of answers of (a) and (b), what can you conclude?   
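The sequential procedure in parts (a) and (b) can be sketched as a loop in which each posterior becomes the next prior. This is only an illustration: it assumes the conjugate Normal update from Q1 (a), and it uses synthetic rows drawn with a fixed seed instead of the course data files. The names `update` and `sequential_posteriors` are hypothetical.

```python
import random
from statistics import fmean


def update(prior_mean, prior_var, row, sigma):
    """One conjugate Normal update with known sigma (form assumed from Q1 (a))."""
    se2 = sigma ** 2 / len(row)
    x_bar = fmean(row)
    denom = prior_var + se2
    return (prior_var * x_bar + se2 * prior_mean) / denom, prior_var * se2 / denom


def sequential_posteriors(rows, sigma, prior_mean=0.0, prior_var=1.0):
    """Feed rows one at a time; each posterior is the prior for the next row."""
    history = []
    mean, var = prior_mean, prior_var
    for row in rows:
        mean, var = update(mean, var, row, sigma)
        history.append((mean, var))
    return history


# Synthetic stand-in for the data files (NOT the course data): 5 rows of 100 draws.
rng = random.Random(0)
rows = [[rng.gauss(2.0, 3.0) for _ in range(100)] for _ in range(5)]
for step, (m, v) in enumerate(sequential_posteriors(rows, sigma=3.0), start=1):
    print(f"step {step}: mean={m:.4f}, variance={v:.6f}")
```

Under this update the posterior variance depends only on the prior variance, $\sigma$, and the number of samples seen so far, not on the sample values, so it shrinks at every step. Plotting the five densities is left out of the sketch.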

 

3. Regression Analysis
Assume Simple Linear Regression on $n$ sample points $(Y_1, X_1), (Y_2, X_2), \ldots, (Y_n, X_n)$; that is, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $E[\varepsilon_i] = 0$.

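(a)    Using the estimates of $\beta$ derived in class, show that: $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, where $\bar{X} = \sum_{i=1}^{n} X_i / n$ and $\bar{Y} = \sum_{i=1}^{n} Y_i / n$.  (2 points)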

(b)    Show that the above estimators, given the $X_i$'s, are unbiased. (Hint: treat the $X_i$'s as constants.)
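The closed-form estimators can be checked numerically. The sketch below assumes the usual least-squares forms, with the slope equal to the centered cross-product sum divided by the centered sum of squares of the $X_i$, and the intercept equal to $\bar{Y} - \hat{\beta}_1 \bar{X}$. It also runs a small simulation that holds the X values fixed and averages the estimates over repeated zero-mean noise draws. This illustrates unbiasedness but does not prove it. All function names are hypothetical.

```python
import random
from statistics import fmean


def fit_simple_linear_regression(xs, ys):
    """Return (beta0_hat, beta1_hat) from the least-squares closed form."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def sse(xs, ys, beta0, beta1):
    """Sum of squared errors of the fitted line."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


def average_estimates(xs, beta0, beta1, noise_sd, trials, seed=0):
    """Average the fitted coefficients over repeated noise draws (X held fixed)."""
    rng = random.Random(seed)
    fits = []
    for _ in range(trials):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, noise_sd) for x in xs]
        fits.append(fit_simple_linear_regression(xs, ys))
    return fmean([f[0] for f in fits]), fmean([f[1] for f in fits])
```

With enough trials, the averages returned by `average_estimates` should sit close to the true `beta0` and `beta1` used to generate the data.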
 

4. More on Regression and Time series analysis
US media says that the number of Americans ages 65 and older is projected to nearly double from 52 million in 2018 to 95 million by 2060. However, this is not sufficient to show that the US population is aging. To be precise, we also need to know how the total population will grow. In this question, you are going to use q4.csv (on course website) which contains the US population and the population of 65+ year olds from 1980 to 2019. Report all answers and figures in your submission; you do not need to submit any code.

(a) For both the total US population and the 65+ year old population, fit a simple linear regression (population vs. year, including the $\beta_0$ term), plot the original data together with the regression fit, and calculate the SSE for each. Are both suitable for linear regression? (Which is, or which is not?) (4 points)

(b) Using the data from 1980 to 2018, predict the 65+ year old population in 2060. Show the linear regression equation, the prediction, and the SSE. Then do the same using only the data from 2008 to 2018. Which prediction should you trust more? Is the media right about the 65+ population doubling?

(c)  If we want to predict the ratio of the 65+ population to the total population, there are two ways to do this. One is to compute the ratio for each year from 2008 to 2018 and then predict the ratio for 2019 directly. The other is to first predict the total population and the 65+ year old population for 2019 separately (you can reuse the result from (b)), using data from 2008 to 2018, and then compute the ratio for 2019. Predict the ratio in both ways. Which way is more accurate for 2019? Why? (Hint: You made different assumptions in these two methods. Compare the linearity of the 65+ population and the ratio.)
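The two approaches in part (c) can be sketched as follows. The numbers below are made up for illustration and are not taken from q4.csv. The helper names (`fit_line`, `predict`, `ratio_predictions`) are hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys against xs."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def predict(xs, ys, x_new):
    """Fit a line and evaluate it at x_new."""
    intercept, slope = fit_line(xs, ys)
    return intercept + slope * x_new


def ratio_predictions(years, total, seniors, target_year):
    """Return (direct, indirect) predictions of the 65+ share."""
    ratios = [s / t for s, t in zip(seniors, total)]
    direct = predict(years, ratios, target_year)  # regress the ratio itself
    indirect = (predict(years, seniors, target_year)
                / predict(years, total, target_year))  # ratio of two fits
    return direct, indirect


# Made-up populations in millions (NOT q4.csv), both exactly linear in the year.
years = [2008, 2009, 2010, 2011, 2012]
total = [300.0, 303.0, 306.0, 309.0, 312.0]
seniors = [39.0, 40.0, 41.0, 42.0, 43.0]
direct, indirect = ratio_predictions(years, total, seniors, 2013)
print(f"direct: {direct:.5f}, indirect: {indirect:.5f}")
```

The direct method assumes the ratio is linear in the year, while the indirect method assumes each population is linear. In this toy data both populations are exactly linear, so the ratio of the two is not, and the two predictions differ slightly.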

5.  Multiple Linear Regression (MLR)                                                                      (Total 10 points)
The admission chance when students apply for the Masters program depends on many factors. In q5.csv (uploaded on the course webpage), there are several parameters included: 1) GRE score (out of 340), 2) TOEFL Score (out of 120), 3) University Rating (out of 5), 4) SOP (Statement of Purpose Strength, out of 5), 5) LOR (Letter of Recommendation Strength, out of 5), 6) GPA (Undergraduate GPA, out of 10), 7) Research (0 or 1). The corresponding chance of admit ranges from 0 to 1. The dataset has 500 rows (samples) and 8 columns (chance of admit and 7 features). Submit your code as q5.py and show your answers in the pdf.

(a)    Using MLR, find the linear relationship between chance of admit and the 7 features listed above (the inputs are the 7 factors and the output is the chance of admit). Do not include the intercept term $\beta_0$. Use the first 400 rows to train, and the last 100 rows to test. Report your linear equation, and the SSE of your test set.

(b)    Now we use fewer features: only TOEFL, SOP and LOR. Do not include the intercept term $\beta_0$. Do the same thing as (a). Report your linear equation, and the SSE of your test set.

(c)     Now we only use GRE and GPA. Do the same thing as (a). Do not include the intercept term $\beta_0$. Report your linear equation, and the SSE of your test set.       (3 points)

(d)    What are your observations based on the SSEs obtained for (a), (b), and (c)?     (1 point)
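A no-intercept multiple linear regression can be fitted by solving the normal equations $(X^\top X)\beta = X^\top y$. The sketch below does this in plain Python with a small Gaussian-elimination solver, so it has no dependencies. It is one possible approach and not the required q5.py, and the function names are hypothetical. It assumes $X^\top X$ is invertible.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_no_intercept(features, targets):
    """Least-squares coefficients for y ~ X beta with no intercept term."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(features, targets))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def sse(features, targets, beta):
    """Sum of squared errors of the linear predictions."""
    return sum((y - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, y in zip(features, targets))
```

For the split described in (a), one would fit on `features[:400]` and `targets[:400]`, then report `sse(features[400:], targets[400:], beta)`. Parts (b) and (c) only change which columns go into each feature row.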


6. Bayesian hypothesis testing                                                                               
You are tired of studying probs and stats and have finally decided to give up your current life and turn to your one true passion – farming. Lucky for you, there is a lot of farmland on Long Island, and you have your heart set on a particular farm that is available for purchase. However, you do not know whether the soil in the farm is good or not. Say the soil in the farm is a discrete random variable $H$ that can only take values in the set $\{0, 1\}$, where 0 represents good soil and 1 represents bad soil. We transform this into a hypothesis test as follows: $\mathbf{H_0}: H = 0$ and $\mathbf{H_1}: H = 1$. Let the prior probability $P(\mathbf{H_0}) = P(H = 0) = p$ and $P(\mathbf{H_1}) = P(H = 1) = 1 - p$. The water content in the soil depends upon the type of soil. If we assume water content to be a RV $W$, then $f_W(w \mid H = 0) = N(w; -\mu, \sigma^2)$ and $f_W(w \mid H = 1) = N(w; \mu, \sigma^2)$. To test which of the two hypotheses is correct, you take $n$ samples of the soil from different patches of the farm and measure the water content metric of each sample; the resulting data sample set is $\mathbf{w} = \{w_1, w_2, w_3, \ldots, w_n\}$. Assume that the samples are conditionally independent given the hypothesis/soil type.

(a) If we denote the hypothesis chosen as a RV $C$ where $C \in \{0, 1\}$, then according to MAP (Maximum a posteriori), we have

$$C = \begin{cases} 0 & \text{if } P(H = 0 \mid \mathbf{w}) \ge P(H = 1 \mid \mathbf{w}) \\ 1 & \text{otherwise} \end{cases}$$

This implies that the hypothesis H=0 is chosen (referring to C=0) when P(H=0|w) ≥ P(H=1|w). Derive a condition for choosing the hypothesis that the soil in the farm is of type 0, in terms of $p$, $\mu$ and $\sigma$. (4 points)

(b) Write a Python function MAP_descision() in a script named Q6_b.py, where your function takes as input (i) the list of observations $\mathbf{w}$, and (ii) the prior probability of $\mathbf{H_0}$, and returns the chosen hypothesis (value of C) according to the MAP criterion. Report the result for the 10 different instances of observations from the q6.csv dataset, for each prior probability p = [0.1, 0.3, 0.5, 0.8], with $(\mu, \sigma^2) = (0.5, 1.0)$. Each column is one set of observations.               (10 points)

Example output format:

      For P(H_0) = 0.1, the hypotheses selected are :: 0 1 0 1 0 0 1 0 0 1

      For P(H_0) = 0.3, the hypotheses selected are :: 1 1 0 1 1 0 0 0 0 1

      For P(H_0) = 0.5, the hypotheses selected are :: 1 1 0 1 1 0 0 0 0 1

      For P(H_0) = 0.8, the hypotheses selected are :: 1 1 0 1 1 0 0 0 0 1
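A minimal sketch of the MAP comparison in log space follows. It assumes the class-conditional densities are $N(w; -\mu, \sigma^2)$ under $H = 0$ and $N(w; \mu, \sigma^2)$ under $H = 1$, that the samples are conditionally independent, and that the prior satisfies $0 < p < 1$. The name `map_decision` and its default arguments are illustrative and do not match the required `MAP_descision()` signature.

```python
import math


def map_decision(observations, p_h0, mu=0.5, sigma2=1.0):
    """Return 0 if the MAP rule picks H=0, else 1.

    Assumes means -mu (H=0) and +mu (H=1) with common variance sigma2,
    and a prior p_h0 strictly between 0 and 1.
    """
    def log_likelihood(mean):
        # Gaussian log-likelihood up to a constant shared by both hypotheses.
        return -sum((w - mean) ** 2 for w in observations) / (2.0 * sigma2)

    log_post_h0 = math.log(p_h0) + log_likelihood(-mu)
    log_post_h1 = math.log(1.0 - p_h0) + log_likelihood(mu)
    return 0 if log_post_h0 >= log_post_h1 else 1
```

Comparing log posteriors avoids the underflow that a product of many small densities can cause. The normalizing constant $P(\mathbf{w})$ and the Gaussian prefactor are dropped because they are identical on both sides of the inequality.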

(c)  Denoting the hypothesis selected as a RV $C$ where $C \in \{0, 1\}$, the average error probability via the MAP criterion is given by $\mathbf{AEP} = P(C = 0 \mid H = 1)\, P(H = 1) + P(C = 1 \mid H = 0)\, P(H = 0)$. Given the observations $\mathbf{w} = \{w_1, w_2, w_3, \ldots, w_n\}$, derive $\mathbf{AEP}$ in terms of $\mu$, $\sigma$, $\Phi$ and $p$. (4 points)
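One way to sanity-check a derived expression is to evaluate it numerically. The sketch below rests on several assumptions that are not stated in the text: the means are $-\mu$ under $H = 0$ and $+\mu$ under $H = 1$ with $\mu > 0$, the prior satisfies $0 < p < 1$, and the MAP rule reduces to choosing $H = 0$ when $\sum_i w_i \le \tau$ with $\tau = \frac{\sigma^2}{2\mu} \ln\frac{p}{1 - p}$. Under those assumptions $\sum_i w_i$ is Normal with mean $\mp n\mu$ and variance $n\sigma^2$, so each error term becomes a value of $\Phi$. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def average_error_probability(n, p, mu=0.5, sigma=1.0):
    """AEP of the MAP rule, assuming means -mu (H=0) and +mu (H=1), mu > 0."""
    phi = NormalDist().cdf  # standard Normal CDF
    # Assumed decision rule: choose H=0 when sum(w) <= tau.
    tau = sigma ** 2 / (2.0 * mu) * math.log(p / (1.0 - p))
    scale = sigma * math.sqrt(n)  # standard deviation of sum(w)
    p_c0_given_h1 = phi((tau - n * mu) / scale)
    p_c1_given_h0 = 1.0 - phi((tau + n * mu) / scale)
    return p_c0_given_h1 * (1.0 - p) + p_c1_given_h0 * p
```

With $p = 0.5$ the threshold is zero and the expression collapses to $\Phi(-\sqrt{n}\,\mu/\sigma)$, which decreases as $n$ grows.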

 

 

 

 

 

 

 

 
