# 2 Fundamentals of Statistical Analysis question, R required, 20 hours deadline

## Questions:

Question 1 (40 points): Regression and MLE

We are interested in estimating the median home value in New England. For this, we employ a regression through the origin $\left({\beta }_{1}=0\right)$, as presented below:

${Y}_{i}=\beta {X}_{i}+{\epsilon }_{i}$

where ${Y}_{i}$ is the median home value in New England town $i$, and ${X}_{i}$ is a binary variable that equals 1 if the house is in town $i$ and 0 otherwise.

Let ${Y}_{1},{Y}_{2},\dots ,{Y}_{n}$ be independent, where

$\begin{array}{cc}{Y}_{i}& \sim N\left(\beta {X}_{i},{\sigma }^{2}\right),\\ {\epsilon }_{i}& \sim N\left(0,{\sigma }^{2}\right).\end{array}$
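
As a starting point for the parts below, the likelihood implied by the normal model can be written out. This is only a sketch that follows from the stated independence and normality assumptions; the symbols $L$ and $\ell$ are notation introduced here, not part of the original question.

$L\left(\beta ,{\sigma }^{2}\right)=\prod _{i=1}^{n}\frac{1}{\sqrt{2\pi {\sigma }^{2}}}\mathrm{exp}\left(-\frac{{\left({Y}_{i}-\beta {X}_{i}\right)}^{2}}{2{\sigma }^{2}}\right)$

$\ell \left(\beta ,{\sigma }^{2}\right)=\mathrm{log}L\left(\beta ,{\sigma }^{2}\right)=-\frac{n}{2}\mathrm{log}\left(2\pi {\sigma }^{2}\right)-\frac{1}{2{\sigma }^{2}}\sum _{i=1}^{n}{\left({Y}_{i}-\beta {X}_{i}\right)}^{2}$

Setting the partial derivatives of $\ell$ with respect to $\beta$ and ${\sigma }^{2}$ to zero is the usual route to the two estimators.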

1. (15 points) Find the MLE of $\beta$, ${\hat{\beta }}_{MLE}$.

2. (15 points) Find the MLE of ${\sigma }^{2}$, ${\hat{\sigma }}_{MLE}^{2}$.

3. (10 points) Show that the sum of squared errors, SSE, can be written as:

$SSE=\sum _{i=1}^{n}{y}_{i}^{2}-\hat{\beta }\sum _{i=1}^{n}{x}_{i}{y}_{i}$
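
The identity in part 3 can be checked numerically before proving it. The sketch below uses simulated data; `n`, `beta_true`, and `sigma_true` are made-up illustration values, not part of the assignment. It relies on `lm()` with the intercept removed, whose least-squares estimate coincides with the MLE of $\beta$ under normal errors.

```r
# Hypothetical simulated data; n, beta_true, and sigma_true are illustration-only values.
set.seed(1)
n <- 50
beta_true <- 3
sigma_true <- 2
x <- rbinom(n, size = 1, prob = 0.5)                  # binary regressor, as in the model
y <- beta_true * x + rnorm(n, mean = 0, sd = sigma_true)

# Least-squares fit with no intercept (regression through the origin).
fit <- lm(y ~ 0 + x)
beta_hat <- unname(coef(fit))

# SSE computed directly from the residuals ...
sse_direct <- sum(resid(fit)^2)
# ... and from the identity in part 3.
sse_identity <- sum(y^2) - beta_hat * sum(x * y)

# Compare the two up to floating-point tolerance.
all.equal(sse_direct, sse_identity)
```

A numerical match is not a proof, but it is a quick way to catch an algebra slip in the derivation.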

Question 2 (40 points): Confidence Interval

Let ${Y}_{i}$ still be the median home value in New England town $i$. Let the generated $Y$ below be the entire population data on the median value of New England homes, where $\mu =329,108$ and $\sigma =50,000$ (both in dollars).

set.seed(12)
Y=rnorm(1000, mean=329108, sd=50000)

For steps 1 and 2, let's pretend we do not know $\mu$.

1. (5 points) Take 100 samples of size 30 (without replacement) from the population of $Y$'s.

2. (10 points) Calculate a 95% confidence interval for $\mu$ for each of the 100 samples.

3. (10 points) How many of these confidence intervals include the true mean $\mu =329,108$?

4. (15 points) Repeat steps 2 and 3 for 90% confidence intervals.
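
One way the four steps could be organized in R is sketched below. It assumes t-based intervals (since $\sigma$ is estimated from each sample), that "without replacement" applies within each sample rather than across samples, and no finite-population correction. The helper names `t_interval` and `count_covering` are hypothetical.

```r
# Population exactly as generated in the question.
set.seed(12)
Y <- rnorm(1000, mean = 329108, sd = 50000)
mu <- 329108          # stated population mean; mean(Y) is an alternative benchmark

n_samples <- 100
n <- 30

# Each column is one sample of size 30, drawn without replacement from Y.
samples <- replicate(n_samples, sample(Y, size = n, replace = FALSE))

# Hypothetical helper: t-based confidence interval for the mean of x.
t_interval <- function(x, level) {
  half_width <- qt(1 - (1 - level) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

# Hypothetical helper: number of intervals (one per column) that contain mu.
count_covering <- function(level) {
  ci <- apply(samples, 2, t_interval, level = level)   # 2 x 100 matrix
  sum(ci["lower", ] <= mu & mu <= ci["upper", ])
}

count_covering(0.95)   # step 3
count_covering(0.90)   # step 4
```

Because both confidence levels are applied to the same 100 samples, every 90% interval sits inside the matching 95% interval, so the 90% count can never exceed the 95% count.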

Question 3 (20 points): Regression Estimation

1. (7 points) Using the synthetic data provided below on median home values ($Y$) and towns in New England ($X$), estimate the regression from Question 1, i.e.,

${Y}_{i}=\beta {X}_{i}+{\epsilon }_{i}$

Are the coefficients statistically significant? Do not forget to use factor(X) as opposed to X in your regression!

housing=read.table("https://unh.box.com/shared/static/twmyqbvx0toxhvdv0n23c55e5cc3ipe4.csv", header = TRUE, sep=",", dec=".")
head(housing)
##          Y X
## 1 426419.3 7
## 2 416306.1 8
## 3 344116.1 9
## 4 453613.3 7
## 5 303323.9 5
## 6 314420.3 6
2. (13 points) Check the residuals of the model. Are the assumptions satisfied? Why or why not?
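
A possible workflow for both parts is sketched below. So that it runs offline, it uses `housing_demo`, a hypothetical stand-in with the same two columns (`Y`, `X`) and made-up values; with the real file, the `housing` data frame read above would take its place. The choice of diagnostic tests is an assumption, not something the assignment prescribes.

```r
# Hypothetical stand-in for the course file: same columns (Y, X), made-up values.
set.seed(3)
housing_demo <- data.frame(X = rep(5:9, each = 20))
housing_demo$Y <- 250000 + 25000 * (housing_demo$X - 5) +
  rnorm(nrow(housing_demo), mean = 0, sd = 30000)

# No-intercept model; factor(X) expands X into one dummy (and one coefficient) per town.
fit <- lm(Y ~ 0 + factor(X), data = housing_demo)
summary(fit)$coefficients      # estimates, standard errors, t statistics, p-values

res <- resid(fit)

# Graphical checks: residuals vs fitted (constant variance), normal Q-Q (normality).
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)

# Formal checks, offered as one possible choice rather than the required method.
shapiro.test(res)                                  # normality of the residuals
bartlett.test(res, factor(housing_demo$X))         # equal variance across towns
```

In this no-intercept specification each coefficient is the mean of $Y$ in one town, so the reported t-tests compare each town mean with zero.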