1. Introduction



Data Structure: The Zillow data are structured as a panel, with observations on multiple entities over multiple time periods. Each observation is a region of the country on the last day of a given month between August 2018 and April 2023 (e.g., Akron, Ohio on August 31, 2018).
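The season sizes in the summary statistics are consistent with a balanced panel of 94 regions observed over the 57 months from August 2018 to April 2023 (5,358 / 57 = 94), with months grouped into meteorological seasons (December to February as winter, March to May as spring, June to August as summer, September to November as fall). Neither the region count nor the month-to-season mapping is stated explicitly, so both are assumptions in this minimal Python sketch of the season assignment:

```python
from collections import Counter

# Assumed meteorological seasons; the mapping is inferred, not stated in the report.
SEASON_OF_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def season_counts(n_regions, start=(2018, 8), end=(2023, 4)):
    """Count region-month observations per season in a balanced panel."""
    counts = Counter()
    for _, month in month_range(start, end):
        counts[SEASON_OF_MONTH[month]] += n_regions
    return dict(counts)


# 94 regions is an inference (overall N divided by 57 months), not a reported figure.
print(season_counts(94))
```

With 94 regions this gives 1,410 fall, 1,316 spring, 1,222 summer, and 1,410 winter observations, matching the N values reported for median sales price.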

Research Question: How do the seasons affect housing prices in the U.S.?

Outcome Variable: The median sales price of houses in each U.S. region and month, compared across the four seasons.

Independent Variables: Because the selling price of a house depends on many factors besides season, we include the mean days pending, the median price cut, the number of new listings in the month, the median list price, and the Zillow Home Value Index (ZHVI) as additional independent variables. The mean days pending is the average number of days a house spends pending before the sale closes. A longer pending period suggests more negotiation and a greater chance of the deal falling through, either of which can affect the final sale price. The median price cut is the median percentage cut taken on the list price of houses in a given month; because a cut lowers the asking price, it directly affects the selling price. The number of new listings is the number of houses put on the market that month; more listings increase competition among sellers and may therefore influence selling prices. We also include the median list price, since the asking price strongly shapes the final selling price. Finally, the ZHVI measures the "typical home value and market changes across a given region and housing type." Because underlying home value also affects how much a house sells for, it is another important variable to consider.

Hypothesis: We hypothesize that house prices are lowest in the fall, possibly because the weather is turning cold and the changing seasons reduce a house's curb appeal.

2. Summary/Descriptive Statistics



Variable / Statistic          Fall       Spring     Summer     Winter     Overall
                              (N=1410)   (N=1316)   (N=1222)   (N=1410)   (N=5358)
Median sales price ($)
  Min                         126000     124000     125000     121000     121000
  Max                         1360000    1470000    1450000    1370000    1470000
  Median                      265000     273000     266000     271000     268000
  N                           1410       1316       1222       1410       5358
Mean days pending
  Min                         11.0       8.00       9.00       14.0       8.00
  Max                         132        131        129        130        132
  Median                      35.0       35.0       29.0       45.0       36.0
  N                           1387       1303       1206       1390       5286
Median price cut (share of list price)
  Min                         0.0149     0.0150     0.0146     0.0140     0.0140
  Max                         0.0533     0.0604     0.0564     0.0593     0.0604
  Median                      0.0288     0.0296     0.0298     0.0286     0.0292
  N                           1410       1284       1201       1403       5298
New listings in month
  Min                         378        334        332        223        223
  Max                         485000     501000     518000     325000     518000
  Median                      1630       1570       1870       1140       1580
  N                           1410       1316       1222       1410       5358
Median list price ($)
  Min                         132000     150000     158000     95000      95000
  Max                         1360000    1440000    1430000    1350000    1440000
  Median                      300000     302000     308000     295000     300000
  N                           1410       1314       1219       1410       5353
Zillow home value index ($)
  Min                         108000     109000     108000     109000     108000
  Max                         1520000    1590000    1590000    1510000    1590000
  Median                      272000     285000     276000     279000     278000
  N                           1408       1316       1222       1410       5356
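The N rows differ from variable to variable (for example, 5,286 for mean days pending versus 5,358 for median sales price), which suggests that missing values were dropped separately for each variable. A minimal sketch of computing the per-season statistics under that assumption, using hypothetical toy rows rather than the actual data:

```python
from statistics import median


def summarize(rows, variable):
    """Min, max, median and non-missing N of `variable` for each season.

    `rows` is a list of dicts with a "season" key. Missing values are None
    and are skipped, so N can differ from one variable to another.
    """
    by_season = {}
    for row in rows:
        value = row.get(variable)
        if value is not None:
            by_season.setdefault(row["season"], []).append(value)
    return {
        season: {"min": min(v), "max": max(v), "median": median(v), "n": len(v)}
        for season, v in by_season.items()
    }


# Hypothetical toy rows, not values from the Zillow data.
rows = [
    {"season": "Winter", "days_pending": 40},
    {"season": "Winter", "days_pending": None},  # missing value, excluded from N
    {"season": "Winter", "days_pending": 50},
    {"season": "Summer", "days_pending": 29},
]
print(summarize(rows, "days_pending"))
```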



Summary Statistics Analysis: The table shows that most variables are fairly consistent across seasons. The main exception is winter, which sellers appear to avoid: the median number of new listings falls to 1,140, compared with 1,570 to 1,870 in the other seasons. The summary statistics also show that both the median percent of homes sold below list price and the median percent sold above list price are higher in winter than in any other season. In addition, the median of the mean days pending is substantially longer in winter (45 days) than in fall and spring (35 days each) and summer (29 days), a gap of roughly 10 days or more. Across the other three seasons, no single season is consistently "better" for selling a house on every variable. Sellers appear to favor summer and fall, which have the highest median numbers of new listings (1,870 and 1,630).

3. Correlation Matrices




Correlation Matrix Analysis: Several pairs of variables have strong positive relationships. Median sales price and the Zillow Home Value Index, median sales price and median list price, and median list price and the Zillow Home Value Index are all highly correlated. These correlations are expected, since all three measures reflect the value of homes in a region: a higher sales or list price generally goes with a higher Zillow home value. It is interesting that mean days pending and median percentage cut are negatively correlated, which weakly suggests that houses with larger price cuts spend less time pending. It is also interesting that mean days pending is slightly negatively correlated with median sales price and the Zillow Home Value Index, suggesting that higher prices may be associated with fewer days pending.
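A correlation matrix like ours is a table of pairwise Pearson correlations. Because some variables have missing values, this sketch assumes pairwise-complete observations (each pair of columns uses only the rows where both values are present); how missing values were actually handled is not stated, and the data below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of x and y, using only rows where both are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation for every ordered pair of named columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


# Hypothetical toy data, not values from the Zillow panel.
columns = {
    "sales_price": [250_000, 270_000, None, 320_000, 340_000],
    "list_price": [260_000, 285_000, 300_000, 335_000, 350_000],
    "days_pending": [40, 38, 33, 25, 22],
}
matrix = correlation_matrix(columns)
print(round(matrix[("sales_price", "days_pending")], 3))
```

A negative entry such as sales price versus days pending here means that higher values of one variable tend to go with lower values of the other.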

4. Outliers



Methodology: To identify outliers, we first created boxplots of each variable by season. We split the data by season because season is our explanatory variable of interest, and we needed to check whether any seasonal group is skewed enough to bias our results.
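Boxplots conventionally flag as outliers any points beyond the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. Assuming our boxplots used this standard rule, the per-season outlier counts can be sketched as follows. Here `method="inclusive"` reproduces R's default (type 7) quantiles; other quartile conventions can shift the fences slightly.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return (low, high): values below Q1 - k*IQR and above Q3 + k*IQR.

    method="inclusive" matches R's default quantile type 7.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    low = [v for v in values if v < lower]
    high = [v for v in values if v > upper]
    return low, high


def outlier_counts(by_season, k=1.5):
    """Number of outliers per season, with fences computed within each season."""
    return {season: sum(map(len, tukey_outliers(v, k))) for season, v in by_season.items()}


# Hypothetical values for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_outliers(data))
```

Computing the fences within each season, rather than once for the pooled data, matches the by-season boxplots.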







Determination: As the graphs show, each of the variables we identified has numerous outliers. The outlier counts below are fairly consistent across seasons for each variable, with two exceptions: winter has more outliers in the number of new listings, and summer has fewer outliers in the price cut. All of the outliers we identified lie above the upper bound; none fall below the lower bound.


Outlier Totals

Median Sales Price   Price Cut   List Price   Days Pending   Number of New Listings   ZHVI
79                   24          96           20             150                      84
72                   22          100          21             134                      75
79                   9           99           24             120                      74
83                   15          100          30             134                      86


Analysis: There is a substantial number of outliers in our data, but because they are distributed fairly evenly across the four seasons for each explanatory variable, leaving them in should not substantially bias our seasonal comparisons. Furthermore, the graphs show that the ranges of the outliers are comparable across seasons for each variable, which is further evidence that removing the outliers is unnecessary.

5. Data Visualization




Line Chart Analysis: The chart above tracks the average median selling price for each of the twelve calendar months, which gives a more detailed view than the seasonal averages. Notably, the average median sales price drops sharply from April to May and from July to August. Despite these declines, spring and summer still have the highest average median sales prices by season, because March, April, June, and July report the highest monthly values.
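A sketch of the monthly averaging and of locating the sharpest month-to-month declines, assuming the input is a list of (calendar month, regional median sales price) pairs pooled across regions and years; the prices below are hypothetical, in thousands of dollars.

```python
from statistics import mean


def monthly_average(observations):
    """Average the regional median sales prices for each calendar month (1-12).

    `observations` holds (month, median_sales_price) pairs pooled across
    all regions and years.
    """
    by_month = {}
    for month, price in observations:
        by_month.setdefault(month, []).append(price)
    return {month: mean(prices) for month, prices in sorted(by_month.items())}


def largest_drops(averages, k=2):
    """The k month-to-next-month changes with the biggest declines.

    Each month is compared with the next month present in `averages`;
    the December-to-January change is not included.
    """
    months = sorted(averages)
    changes = [(averages[b] - averages[a], a, b) for a, b in zip(months, months[1:])]
    return sorted(changes)[:k]


# Hypothetical prices in thousands of dollars, not the Zillow data.
observations = [(3, 300), (4, 310), (4, 290), (5, 270), (6, 305), (7, 308), (8, 285)]
print(largest_drops(monthly_average(observations)))
```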




Bar Chart Analysis: The bar chart above shows the average median selling price of houses in each season across all years of our data. It gives a quick snapshot of the seasonal differences: on average, houses sell for the most in spring, followed by summer, winter, and fall. This ordering is consistent with our hypothesis that prices are lowest in the fall. One possible explanation for the spring peak is that houses have the most curb appeal in spring, so buyers may be willing to pay more at that time.
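The seasonal bars can be sketched the same way, by pooling observations by season and ranking the averages. The meteorological month-to-season mapping is an assumption, and the prices are hypothetical.

```python
from statistics import mean

# Assumed meteorological seasons (Dec-Feb winter, Mar-May spring, and so on).
SEASON_OF_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def seasonal_ranking(observations):
    """Rank seasons from highest to lowest average median sales price.

    `observations` holds (month, median_sales_price) pairs pooled across
    regions and years.
    """
    by_season = {}
    for month, price in observations:
        by_season.setdefault(SEASON_OF_MONTH[month], []).append(price)
    averages = {season: mean(prices) for season, prices in by_season.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical prices in thousands of dollars, not the Zillow data.
observations = [(3, 300), (4, 320), (7, 300), (1, 290), (10, 280), (11, 280)]
print(seasonal_ranking(observations))
```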

Map Analysis: The map above shows that the selling price of houses depends not only on the season but also on location. Darker shaded areas are those where houses sold for higher prices on average, and lighter shaded areas are those where houses sold for lower prices. States shown in white are those for which we have no data. The map illustrates that factors other than season influence selling price, and the regional patterns it reveals could guide future research.
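A state-level map needs one value per state. This sketch assumes, hypothetically, that region names follow a "City, ST" pattern (for example, "Akron, OH") and averages the median sales price by state; states with no regions simply get no value, which corresponds to the white states on the map.

```python
from statistics import mean


def state_averages(observations):
    """Average median sales price by state for a choropleth map.

    Assumes (hypothetically) region names of the form "City, ST";
    regions without a ", ST" suffix are skipped.
    """
    by_state = {}
    for region, price in observations:
        _, sep, state = region.rpartition(", ")
        if not sep:
            continue  # no state suffix, e.g. a national aggregate
        by_state.setdefault(state, []).append(price)
    return {state: mean(prices) for state, prices in by_state.items()}


# Hypothetical prices in thousands of dollars.
observations = [("Akron, OH", 200), ("Columbus, OH", 300), ("Austin, TX", 400), ("United States", 999)]
print(state_averages(observations))
```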

6. T-tests



T-test: Fall and Winter

T-statistic   Degrees of Freedom   P-value   95% Confidence Interval
-0.383        2817.9               0.702     -15119.43 to 10181.67


T-test Analysis: Fall to Winter: The statistics above are from a t-test comparing the mean of the median selling prices of homes in fall with that in winter. We chose these two groups because they have the two lowest average median sales prices in our bar chart. The p-value of 0.702 is very high, so we fail to reject the null hypothesis that the two means are equal; consistent with this, the 95% confidence interval for the difference includes zero. Seasonal differences in selling price, at least between fall and winter, therefore do not appear to be statistically significant.
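The fractional degrees of freedom (2817.9 and 2522.7) indicate a Welch (unequal-variance) two-sample t-test rather than the pooled version, whose degrees of freedom would be the integer n1 + n2 - 2. A minimal standard-library sketch follows; its p-value uses a normal approximation, which is very close to the t distribution at degrees of freedom in the thousands. An exact version is available as `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(x, y):
    """Welch's two-sample t-test.

    Returns the t-statistic, the Welch-Satterthwaite degrees of freedom,
    and a two-sided p-value from the normal approximation (accurate for
    large degrees of freedom).
    """
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    p_approx = 2 * NormalDist().cdf(-abs(t))
    return t, df, p_approx


# Hypothetical toy samples for illustration.
print(welch_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```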

T-test: Spring and Summer

T-statistic   Degrees of Freedom   P-value   95% Confidence Interval
0.554         2522.7               0.58      -9847.142 to 17601.488


T-test Analysis: Spring to Summer: The statistics above are from a t-test comparing the mean of the median selling prices of homes in spring with that in summer. We chose these two groups because they have the two highest average median sales prices in our bar chart. As with the fall-to-winter comparison, the p-value is high (0.5797), so we fail to reject the null hypothesis: we cannot conclude that the mean median selling prices of homes in spring and summer differ, and the 95% confidence interval again includes zero. Although this p-value is lower than in the fall-to-winter comparison, a lower p-value does not by itself indicate a larger seasonal difference, and the spring-summer difference is still not statistically significant.
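Because a Welch confidence interval is symmetric around the estimated difference, the reported intervals can be used to cross-check the other statistics: the midpoint is the estimated difference in means, and the half-width divided by the critical value is the standard error. This sketch uses the normal critical value as an approximation, which is reasonable at these degrees of freedom.

```python
from statistics import NormalDist


def check_from_ci(lower, upper, level=0.95):
    """Recover the estimated difference, t-statistic and two-sided p-value
    implied by a symmetric confidence interval.

    Uses the normal critical value, a close approximation to the t critical
    value when the degrees of freedom are in the thousands.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = (lower + upper) / 2          # point estimate of the difference
    se = (upper - lower) / (2 * z)      # implied standard error
    t = diff / se
    p = 2 * NormalDist().cdf(-abs(t))
    return diff, t, p


intervals = {
    "Fall vs Winter": (-15119.43, 10181.67),
    "Spring vs Summer": (-9847.142, 17601.488),
}
for label, (lo, hi) in intervals.items():
    diff, t, p = check_from_ci(lo, hi)
    print(f"{label}: diff={diff:.0f}, t={t:.3f}, p={p:.3f}")
```

Taking each difference as the first-named season minus the second, this gives an estimated fall-minus-winter difference of about -$2,469 (t of about -0.383, p of about 0.702) and a spring-minus-summer difference of about $3,877 (t of about 0.554, p of about 0.580), in line with the reported p-values.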

7. Conclusion



Summary of Findings: Our analysis indicates that the average selling price of homes is highest in spring, although neither of the pairwise comparisons we tested (fall vs. winter and spring vs. summer) was statistically significant. Sellers may still want to note this pattern and consider listing their homes in spring to capture the highest selling price. Specifically, our line chart shows that March, April, June, and July have the highest average selling prices. We also found that winter has the highest mean days pending and the lowest number of new listings, the latter suggesting less competition. This could lead some sellers to consider listing in winter, but they would also need to weigh the longer pending times and the generally lower average selling price.

Future Research: We would like to investigate why selling prices differ by season. Is better curb appeal in spring, as we hypothesized, the main driver of higher selling prices, and what other factors could drive seasonal price differences? We would also like to understand why competition appears to have so little effect on selling price: by median number of new listings, winter has the least competition, yet its selling prices remain low, while spring has the second-least competition and the highest selling prices. Finally, we would like to examine individual months to explain the sharp drops in selling price from April to May and from July to August.