Data Structure: The Zillow data are structured as a panel, with
observations on multiple entities over multiple time periods. An
observation in this dataset is a region of the country on the last day
of a given month, from August 2018 to April 2023 (e.g., Akron, Ohio on
August 31, 2018).
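To make the panel structure concrete, the sketch below enumerates the month-end dates in the sample window and assigns each to a season. The season mapping (December through February as winter, and so on) is an assumption, since the report does not define the seasons. It does match the per-season counts in the summary table if the panel is balanced across 94 regions (for example, 15 fall months times 94 regions gives 1410); the figure of 94 is inferred from those counts and is not stated in the report.

```python
import calendar
from collections import Counter
from datetime import date

# Assumed meteorological seasons; the report does not define them.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

def month_ends(start_year, start_month, end_year, end_month):
    """Return the last calendar day of every month in the inclusive range."""
    dates = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        last_day = calendar.monthrange(year, month)[1]
        dates.append(date(year, month, last_day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dates

# Time dimension of the panel: August 2018 through April 2023.
periods = month_ends(2018, 8, 2023, 4)
months_per_season = Counter(SEASON_BY_MONTH[d.month] for d in periods)

# One panel observation is identified by (region, month-end date).
example_observation = ("Akron, Ohio", periods[0])
print(len(periods), dict(months_per_season))
```

Under these assumptions the window contains 57 month-end dates, split unevenly across seasons because the sample starts in August and ends in April.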
Research Question: How do the seasons affect housing prices in the U.S.?
Outcome Variable: The median sales price of houses in the U.S. during the various seasons.
Independent Variables: Because the selling price of a house can be affected by factors other than season, we include the mean days pending, the median price cut, the number of new listings in the month, the median list price, and the Zillow Home Value Index as additional independent variables. The mean days pending is the average number of days a listing spent pending before the house was sold. A longer pending period suggests more negotiation, and possibly a deal falling through, either of which can affect the final sale price. The median price cut is the median percentage reduction taken on the list price of houses in a particular month; we include it because a price cut directly lowers the price at which a house can sell. The number of new listings in the month is the number of houses put on the market that month, which increases competition among sellers and can therefore influence selling price. We also include the median list price, since the list price anchors the eventual selling price. The Zillow Home Value Index provides a measure of the “typical home value and market changes across a given region and housing type.” Home value also affects how much a house sells for, so we include it as well.
Hypothesis: We hypothesize that the season
with the lowest house prices is the fall. This may be because the
weather is getting cold and the houses will have less curb appeal due to
the changing seasons.
| | Fall (N=1410) | Spring (N=1316) | Summer (N=1222) | Winter (N=1410) | Overall (N=5358) |
|---|---|---|---|---|---|
| Median sales price ($) | |||||
| Min | 126000 | 124000 | 125000 | 121000 | 121000 |
| Max | 1360000 | 1470000 | 1450000 | 1370000 | 1470000 |
| Median | 265000 | 273000 | 266000 | 271000 | 268000 |
| N | 1410 | 1316 | 1222 | 1410 | 5358 |
| Mean days pending | |||||
| Min | 11.0 | 8.00 | 9.00 | 14.0 | 8.00 |
| Max | 132 | 131 | 129 | 130 | 132 |
| Median | 35.0 | 35.0 | 29.0 | 45.0 | 36.0 |
| N | 1387 | 1303 | 1206 | 1390 | 5286 |
| Median % cut | |||||
| Min | 0.0149 | 0.0150 | 0.0146 | 0.0140 | 0.0140 |
| Max | 0.0533 | 0.0604 | 0.0564 | 0.0593 | 0.0604 |
| Median | 0.0288 | 0.0296 | 0.0298 | 0.0286 | 0.0292 |
| N | 1410 | 1284 | 1201 | 1403 | 5298 |
| New listings in month | |||||
| Min | 378 | 334 | 332 | 223 | 223 |
| Max | 485000 | 501000 | 518000 | 325000 | 518000 |
| Median | 1630 | 1570 | 1870 | 1140 | 1580 |
| N | 1410 | 1316 | 1222 | 1410 | 5358 |
| Median list price ($) | |||||
| Min | 132000 | 150000 | 158000 | 95000 | 95000 |
| Max | 1360000 | 1440000 | 1430000 | 1350000 | 1440000 |
| Median | 300000 | 302000 | 308000 | 295000 | 300000 |
| N | 1410 | 1314 | 1219 | 1410 | 5353 |
| Zillow home value index | |||||
| Min | 108000 | 109000 | 108000 | 109000 | 108000 |
| Max | 1520000 | 1590000 | 1590000 | 1510000 | 1590000 |
| Median | 272000 | 285000 | 276000 | 279000 | 278000 |
| N | 1408 | 1316 | 1222 | 1410 | 5356 |
Summary Statistics Analysis: The table shows that most of the variables
are fairly consistent across seasons. The main exception is winter,
which sellers appear to avoid: the median number of new listings drops
to 1,140, compared with 1,570 to 1,870 in the other seasons. Sellers’
avoidance of winter is also consistent with the median percent sold
below list price being higher in winter than in the other seasons, as is
the median percent sold above list price. Additionally, the mean days
pending is substantially longer in winter, with a median of 45 days
compared with 29 to 35 days in the other seasons. Across the other
seasons, no single season is consistently “better” for selling a house
on every variable. Sellers appear to favor fall and summer, as the
median number of new listings is highest in these two seasons.
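The N rows in the summary table differ by variable (for example, 5286 of 5358 observations have a value for mean days pending), which suggests that missing values were dropped variable by variable. A minimal sketch of that kind of per-season summary follows. The rows are made-up illustrative values, not Zillow data, and the function name `summarize` is hypothetical.

```python
from statistics import median

# Illustrative rows only (not Zillow data): (season, mean days pending or None).
rows = [
    ("Winter", 45.0), ("Winter", 50.0), ("Winter", None),
    ("Summer", 29.0), ("Summer", 31.0), ("Summer", 27.0),
]

def summarize(rows):
    """Min, max, median and non-missing N per season, skipping None values."""
    by_season = {}
    for season, value in rows:
        if value is not None:
            by_season.setdefault(season, []).append(value)
    return {
        season: {
            "min": min(values),
            "max": max(values),
            "median": median(values),
            "n": len(values),
        }
        for season, values in by_season.items()
    }

summary = summarize(rows)
for season, stats in summary.items():
    print(season, stats)
```

Because missing values are skipped per variable, the reported N for a variable can be smaller than the number of observations in the season, as in the table above.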
Correlation Matrix Analysis: Based on our table, several pairs of
variables have a strong positive relationship. Median sales price and
Zillow home value index, median sales price and median list price, and
median list price and Zillow home value index are all highly correlated
with each other. These correlations are expected, since all three
measurements represent the value of a home. It is interesting that the
mean days pending and median percentage cut are negatively correlated,
since this weakly suggests that houses with larger price cuts spend less
time pending. It is also interesting that mean days pending is slightly
negatively correlated with median sales price and Zillow home value
index, suggesting that a higher price may be associated with fewer days
pending.
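As a reminder of how to read the sign of a correlation, the sketch below computes a Pearson correlation by hand on made-up numbers (not Zillow data). A negative coefficient means that larger values of one variable tend to go with smaller values of the other. The report does not say which correlation measure was used, so Pearson is an assumption here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up values for illustration: days pending falls as price rises.
prices = [200_000, 250_000, 300_000, 350_000, 400_000]
days_pending = [44, 41, 40, 36, 35]

r = pearson(prices, days_pending)
print(round(r, 2))  # negative: higher prices go with fewer days pending
```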
Methodology: In order to identify
outliers in our data, we started by creating boxplots of each of our
identified variables by season. We decided to split our data seasonally
as this is our causal variable of interest and it is necessary to
determine if any of our groupings are particularly skewed and would
therefore bias our results.
Determination: As displayed in our graphs, each of the variables we identified has numerous outliers. The table listing the number of outliers for each season and variable shows that the outlier count is fairly consistent across the seasons for each variable, with the exception of more outliers for the number of new listings in winter and fewer outliers for the price cut taken in summer. All of the outliers that we identified exceeded the upper bound (as opposed to falling below the lower bound).
Outlier Totals
| Median Sales Price | Price Cut | List Price | Days Pending | Number of New Listings | ZHVI |
|---|---|---|---|---|---|
| 79 | 24 | 96 | 20 | 150 | 84 |
| 72 | 22 | 100 | 21 | 134 | 75 |
| 79 | 9 | 99 | 24 | 120 | 74 |
| 83 | 15 | 100 | 30 | 134 | 86 |
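The upper and lower bounds referred to above are not defined in the report. A common boxplot convention, assumed here, flags values more than 1.5 times the interquartile range beyond the first or third quartile. The sketch below applies that rule to made-up, right-skewed listing counts (not Zillow data). Quartile definitions vary slightly between tools, so counts from other software may differ a little.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Split out values below Q1 - k*IQR or above Q3 + k*IQR (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    below = [v for v in values if v < lower]
    above = [v for v in values if v > upper]
    return below, above

# Made-up right-skewed data: one very large market among smaller ones.
new_listings = [900, 1100, 1200, 1500, 1600, 1900, 2400, 52000]

below, above = iqr_outliers(new_listings)
print(len(below), len(above))
```

With right-skewed variables such as listing counts or prices, the lower bound is often negative or far below the smallest value, which is one reason all flagged points can end up above the upper bound.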
Analysis: There are a substantial number of outliers in our data, but
because the number of outliers is generally evenly distributed across
the four seasons for each of our explanatory variables, leaving the
outliers in our data should not result in substantial bias. Furthermore,
our graphs show that the ranges of outliers across the seasons for each
variable are comparable, providing further evidence that the removal of
outliers is not necessary.
Line Chart Analysis: The chart above highlights the movement of selling price across each of the twelve months of the year. From this visualization, more specific information about the average median selling price of houses can be gleaned. It is interesting to note that the average median sales price plummets from April to May and from July to August. Despite these large declines, spring and summer still lead the way in terms of average median sales price by season as March, April, June, and July report the highest average median selling price data.
Bar Chart Analysis: The bar chart above displays the average median selling price of houses across all years of our data in each season. The visualization provides a quick snapshot of the differences in selling prices of homes by season, highlighting that on average, the selling price of houses in the spring is the highest, followed by summer, winter, and then fall. This ordering is consistent with our hypothesis that fall has the lowest prices. One possible explanation is that buyers perceive houses as having the most curb appeal in spring and are therefore willing to pay more for houses sold during this time period.
Map Analysis: The map above reflects the fact that the selling price of
houses depends not just on the season but on the location of the house
as well. Darker shaded regions are those where houses sold for higher
prices on average, while lighter shaded regions are those where houses
sold for lower prices on average. States shown in white are those for
which we did not have data. This visualization helps capture the effect
of outside factors on selling price, showing that there are likely many
influences on selling price other than season. The regional variation it
shows is also a useful starting point for future research projects.
T-test: Fall and Winter
| T-statistic | Degrees of Freedom | P-value | 95% Confidence Interval |
|---|---|---|---|
| -0.383 | 2817.9 | 0.702 | -15119.43 to 10181.67 |
T-test Analysis: Fall to Winter: The above statistics display the
results from a t-test comparing the means of the median selling prices
for homes in fall and winter. We chose to compare these two groups
because they have the two lowest average median sales prices, as seen in
our bar chart. The p-value is very high, 0.702, so we fail to reject the
null hypothesis: we do not have sufficient evidence that the difference
between the means of the median selling prices of homes in fall and
winter is different from zero. The 95% confidence interval also includes
zero. This suggests that the seasonal difference in selling price, at
least between fall and winter, is not statistically significant.
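The non-integer degrees of freedom reported for this test suggest an unequal-variance (Welch) t-test, although the report does not name the variant, so that is an assumption. The sketch below shows how the Welch t-statistic and the Welch-Satterthwaite degrees of freedom are computed, using small made-up samples rather than the Zillow data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each sample.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Made-up samples for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 3))
```

Because the degrees of freedom depend on both sample variances, they are generally not whole numbers, which matches the form of the values reported in the tables.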
T-test: Spring and Summer
| T-statistic | Degrees of Freedom | P-value | 95% Confidence Interval |
|---|---|---|---|
| 0.554 | 2522.7 | 0.58 | -9847.142 to 17601.488 |
T-test Analysis: Spring to Summer: The above statistics display the
results from a t-test comparing the means of the median selling prices
for homes in spring and summer. We chose to compare these two groups
because they have the two highest average median sales prices, as seen
in our bar chart. As with the fall to winter comparison, the p-value is
high, 0.5797, so we fail to reject the null hypothesis: we do not have
sufficient evidence that the difference between the means of the median
selling prices of homes in spring and summer is different from zero.
This p-value is lower than in the fall to winter comparison, but a
p-value does not measure the size of a difference, and the difference
between spring and summer is likewise not statistically significant.
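The reported statistics can be cross-checked against each other. A symmetric 95% confidence interval is centered on the estimated difference in means, and its half-width is roughly 1.96 standard errors when the degrees of freedom are large. The sketch below recovers the estimate, standard error, t-statistic, and an approximate p-value from the spring and summer interval. Using the normal distribution in place of the t distribution is an approximation, and the function name `t_from_ci` is hypothetical.

```python
from statistics import NormalDist

def t_from_ci(lower, upper, confidence=0.95):
    """Recover the point estimate, standard error and t from a symmetric CI.

    Uses the normal critical value (about 1.96), which is close to the
    t critical value when the degrees of freedom are large.
    """
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    estimate = (lower + upper) / 2
    std_err = (upper - lower) / 2 / crit
    return estimate, std_err, estimate / std_err

# Spring versus summer interval as reported in the table above.
estimate, std_err, t = t_from_ci(-9847.142, 17601.488)

# Two-sided p-value under the same normal approximation.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(round(estimate, 2), round(t, 3), round(p_approx, 2))
```

The interval implies an estimated spring minus summer difference of roughly $3,900, and the recovered t-statistic and p-value agree with the reported 0.554 and 0.58.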
Summary of Findings: Our analysis indicates that the average selling
price of homes is highest in spring, although our t-tests did not find
statistically significant differences between the seasons we compared.
Sellers may still want to note this pattern, as listing in spring may
help capture a higher selling price. Specifically, from our line chart,
March, April, June, and July appear to be the best months to sell a home
in terms of receiving the highest price. We also determined that the
number of new listings is lowest in the winter, suggesting less
competition, although the mean days pending is highest in the winter.
Less competition could lead some sellers to consider listing their homes
in winter, but they would also need to factor in the longer pending
period and the generally lower average selling price. In future research
we would like to consider why selling prices differ seasonally. Is our
hypothesis regarding better curb appeal in spring the main driver of
higher selling prices? What other factors might drive seasonal price
differences? Why does competition appear to have such a small effect on
selling price? By number of new listings, winter has the least
competition, yet selling prices remain low, and spring has the second
least competition, yet selling prices are the highest. Additionally, we
would like to focus on each month: why were there such large drop-offs
from month to month (April to May and July to August)?