# 2 Fundamentals of Statistical Analysis question, R required, 20 hours deadline

## Questions:

Question 1 (40 points): Regression and MLE

We are interested in estimating the median home value in New England. For this, we employ a regression through the origin ($\beta_1 = 0$), as presented below:

$Y_i = \beta X_i + \epsilon_i$

where $Y_i$ is the median home value in New England town $i$, and $X_i$ is a binary variable that equals 1 if the house is in town $i$ and 0 otherwise.

Let $Y_1, Y_2, \dots, Y_n$ be independent, where

$\begin{array}{rl} Y_i & \sim N(\beta X_i, \sigma^2), \\ \epsilon_i & \sim N(0, \sigma^2). \end{array}$

1. (15 points) Find the MLE of $\beta$, $\hat{\beta}_{MLE}$.
2. (15 points) Find the MLE of $\sigma^2$, $\hat{\sigma}^2_{MLE}$.
3. (10 points) Show that the sum of squared errors, SSE, can be written as:

$SSE = \sum_{i=1}^{n} y_i^2 - \hat{\beta} \sum_{i=1}^{n} x_i y_i$
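One possible route through parts 1 to 3 is outlined below. It is a sketch under the stated normality and independence assumptions, with lowercase $x_i$, $y_i$ denoting observed values, and not an official solution. The log-likelihood is

$\ell(\beta, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta x_i)^2$

Setting the partial derivative with respect to $\beta$ to zero gives

$\frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i (y_i - \beta x_i) = 0 \quad \Rightarrow \quad \hat{\beta}_{MLE} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$

and setting the partial derivative with respect to $\sigma^2$ to zero gives

$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \beta x_i)^2 = 0 \quad \Rightarrow \quad \hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta} x_i)^2$

For the SSE identity, expand the square and use $\hat{\beta} \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$:

$SSE = \sum_{i=1}^{n} (y_i - \hat{\beta} x_i)^2 = \sum_{i=1}^{n} y_i^2 - 2\hat{\beta} \sum_{i=1}^{n} x_i y_i + \hat{\beta}^2 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i^2 - \hat{\beta} \sum_{i=1}^{n} x_i y_i$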

Question 2 (40 points): Confidence Interval

Let $Y_i$ still be the median home value in New England town $i$. Let the generated $Y$ below be the entire population data on the median value of New England homes, where $\mu = 329,108$ and $\sigma = 50,000$ (both in dollars).

    set.seed(12)
    Y=rnorm(1000, mean=329108, sd=50000)

For steps 1 and 2, let’s pretend we do not know $\mu$.

1. (5 points) Take 100 samples of size 30 (without replacement) from the population of $Y$’s.
2. (10 points) Calculate a 95% confidence interval for $\mu$ for each of the 100 samples.
3. (10 points) How many of these confidence intervals include the true mean $\mu = 329,108$?
4. (15 points) Repeat steps 2 and 3 for 90% confidence intervals.
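The four steps above could be sketched in R roughly as follows. This is one possible approach, not the required one: it assumes a t-based interval using each sample's own standard deviation (a z-interval with the known $\sigma$ would be another reasonable reading), it reuses the same 100 samples for both confidence levels, and the names `samples`, `ci_for`, `covered95`, and `covered90` are made up for illustration.

```r
set.seed(12)
Y <- rnorm(1000, mean = 329108, sd = 50000)  # population, as given in the assignment
mu <- 329108                                 # true mean as stated in the assignment

# Step 1: 100 samples of size 30, each drawn without replacement.
# replicate() returns a 30 x 100 matrix, one sample per column.
samples <- replicate(100, sample(Y, size = 30, replace = FALSE))

# Hypothetical helper: t-based confidence interval for the mean of one sample.
ci_for <- function(x, level) {
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

# Steps 2 and 3: 95% intervals and the number that contain mu.
ci95 <- apply(samples, 2, ci_for, level = 0.95)  # 2 x 100 matrix
covered95 <- sum(ci95["lower", ] <= mu & mu <= ci95["upper", ])

# Step 4: the same for 90% intervals, on the same samples.
ci90 <- apply(samples, 2, ci_for, level = 0.90)
covered90 <- sum(ci90["lower", ] <= mu & mu <= ci90["upper", ])

covered95
covered90
```

Because each 90% interval sits inside the 95% interval built from the same sample, `covered90` can never exceed `covered95` in this setup; over many repetitions the counts should sit near 95 and 90 out of 100.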

Question 3 (20 points): Regression Estimation

1. (7 points) Using the synthetic data provided below on median home values ($Y$) and towns in New England ($X$), estimate the regression from Question 1, i.e.,

$Y_i = \beta X_i + \epsilon_i$

Are the coefficients statistically significant? Do not forget to use `factor(X)` as opposed to `X` in your regression!

    housing=read.table("https://unh.box.com/shared/static/twmyqbvx0toxhvdv0n23c55e5cc3ipe4.csv", header = TRUE, sep=",", dec=".")
    head(housing)
    ##          Y X
    ## 1 426419.3 7
    ## 2 416306.1 8
    ## 3 344116.1 9
    ## 4 453613.3 7
    ## 5 303323.9 5
    ## 6 314420.3 6

2. (13 points) Check the residuals of the model. Are the assumptions satisfied? Why or why not?
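
A minimal sketch of how the estimation and residual checks might look in R. To keep it runnable without a download, it uses a simulated stand-in data frame `demo` with the same two columns as `housing`; the town codes 5 to 9, the simulated values, and the names `demo`, `fit`, and `res` are assumptions for illustration only. Replacing `demo` with `housing` would apply the same steps to the assignment data.

```r
set.seed(1)

# Hypothetical stand-in for `housing` (simulated, NOT the assignment data).
demo <- data.frame(X = sample(5:9, 200, replace = TRUE))
demo$Y <- 300000 + 20000 * (demo$X - 5) + rnorm(200, sd = 30000)

# Regression through the origin with town dummies:
# `0 +` drops the intercept, factor(X) creates one indicator per town.
fit <- lm(Y ~ 0 + factor(X), data = demo)
summary(fit)  # each coefficient is a town mean, tested against zero

# Residual diagnostics.
res <- residuals(fit)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)             # look for non-constant spread across towns
qqnorm(res); qqline(res)           # visual check of normality
shapiro.test(res)                  # formal normality test
bartlett.test(res, factor(demo$X)) # equal variance across towns
```

With no intercept and only town indicators, each estimated coefficient equals the sample mean of $Y$ in that town, so the default t-tests ask whether a town's mean home value differs from zero, which is worth keeping in mind when interpreting significance.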