
Regression Diagnostics by Period using REPS

Introduction

The calculate_regression_diagnostics() function in REPS provides regression diagnostics by period. It is designed for panel or repeated cross-section data (e.g. property transactions over time) to evaluate the quality of period-specific log-linear regressions.

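For each period, it fits a log-linear regression and returns one row of diagnostics: a residual normality test p-value (norm_pvalue), the adjusted R² (r_adjust), a Breusch-Pagan heteroscedasticity test p-value (bp_pvalue), and an autocorrelation test p-value together with the Durbin-Watson statistic (autoc_pvalue, autoc_dw).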

These diagnostics help assess model quality over time, identifying periods with issues like non-normality, low fit, heteroscedasticity, or autocorrelation.
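To make the per-period procedure concrete, here is a minimal base-R sketch of the same idea. It is not the REPS implementation: the helper per_period_fit(), the simulated data, and the choice of the Shapiro-Wilk test for normality are assumptions for illustration, and the Durbin-Watson statistic is computed on the residuals in row order.

```r
# Hypothetical helper, not part of REPS: fits one log-linear regression per
# period and collects a few diagnostics using base R only.
per_period_fit <- function(data, period_variable, formula) {
  periods <- sort(unique(data[[period_variable]]))
  rows <- lapply(periods, function(p) {
    sub <- data[data[[period_variable]] == p, , drop = FALSE]
    fit <- lm(formula, data = sub)
    res <- residuals(fit)
    data.frame(
      period      = p,
      norm_pvalue = shapiro.test(res)$p.value,      # assumed normality test
      r_adjust    = summary(fit)$adj.r.squared,     # adjusted R-squared
      dw          = sum(diff(res)^2) / sum(res^2),  # Durbin-Watson, row order
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Simulated example data (hypothetical, not data_constraxion)
set.seed(1)
n <- 200
sim <- data.frame(
  period = rep(c("2008Q1", "2008Q2"), each = n / 2),
  floor_area = runif(n, 60, 160),
  dist_trainstation = runif(n, 0, 10),
  stringsAsFactors = FALSE
)
sim$price <- exp(10 + 0.9 * log(sim$floor_area) -
                   0.02 * sim$dist_trainstation + rnorm(n, sd = 0.1))

diag_sketch <- per_period_fit(
  sim, "period",
  log(price) ~ log(floor_area) + dist_trainstation
)
print(diag_sketch)
```

Because each period gets its own regression, a problem confined to one quarter (for example a handful of outliers) shows up only in that quarter's row.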

Required Data

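Your dataset should include a period variable, the dependent variable (here price), and the numerical and categorical property characteristics used as explanatory variables (here floor_area, dist_trainstation, neighbourhood_code and dummy_large_city).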

library(REPS)
# Example dataset used throughout the REPS vignettes
head(data_constraxion)
#>   period   price floor_area dist_trainstation neighbourhood_code
#> 1 2008Q1 1142226  127.41917       2.887992985                  E
#> 2 2008Q1  667664   88.70604       2.903955192                  D
#> 3 2008Q1  636207  107.26257       8.250659447                  B
#> 4 2008Q1  777841  112.65725       0.005760792                  E
#> 5 2008Q1  795527  108.08537       1.842145127                  E
#> 6 2008Q1  539206   97.87751       6.375981360                  D
#>   dummy_large_city
#> 1                0
#> 2                1
#> 3                1
#> 4                0
#> 5                0
#> 6                1

# Log-transform floor_area, as in the price index vignette (see that vignette for why)
dataset <- data_constraxion
dataset$floor_area <- log(dataset$floor_area)
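One common motivation for logging floor area when the model uses log price is that the floor-area coefficient then reads as an elasticity. The REPS price index vignette gives the package's own rationale; the sketch below only illustrates this interpretation on simulated data and assumes, as the term "log-linear regression" suggests, that price enters the model in logs.

```r
# Simulated data (hypothetical): price depends on floor area with a true
# elasticity of 0.8 plus small multiplicative noise.
set.seed(42)
floor_area <- runif(500, 50, 200)
price <- exp(11 + 0.8 * log(floor_area) + rnorm(500, sd = 0.05))

# In a log-log regression the slope estimates the elasticity directly
fit <- lm(log(price) ~ log(floor_area))
elasticity_hat <- unname(coef(fit)["log(floor_area)"])
print(elasticity_hat)
```

An estimate near 0.8 means that a 1% larger floor area is associated with roughly 0.8% higher price, other things equal.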

Using calculate_regression_diagnostics()

Example:

diagnostics <- calculate_regression_diagnostics(
  dataset = dataset,
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("dummy_large_city", "neighbourhood_code")
)

head(diagnostics)
#>   period norm_pvalue  r_adjust  bp_pvalue autoc_pvalue autoc_dw
#> 1 2008Q1   0.9586930 0.8633499 0.74178260 0.5842200307 2.038772
#> 2 2008Q2   0.8191076 0.8607036 0.81813032 0.9540503936 2.274047
#> 3 2008Q3   0.4560750 0.8825515 0.15220690 0.3246547621 1.924436
#> 4 2008Q4   0.9064669 0.9098143 0.97583499 0.7436197200 2.108734
#> 5 2009Q1   0.4036003 0.8624850 0.04268543 0.4948207614 2.003177
#> 6 2009Q2   0.4644423 0.9002921 0.32760619 0.0007476682 1.487031
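The head(diagnostics) output above can also be screened programmatically. The sketch below copies the printed values into a data frame and flags any period where a test rejects at an assumed 5% significance level. The threshold, and the reading that a small norm_pvalue, bp_pvalue or autoc_pvalue signals non-normality, heteroscedasticity or autocorrelation respectively, are assumptions rather than REPS defaults.

```r
# Values copied from the head(diagnostics) output above
diag_head <- data.frame(
  period       = c("2008Q1", "2008Q2", "2008Q3", "2008Q4", "2009Q1", "2009Q2"),
  norm_pvalue  = c(0.9586930, 0.8191076, 0.4560750, 0.9064669, 0.4036003, 0.4644423),
  bp_pvalue    = c(0.74178260, 0.81813032, 0.15220690, 0.97583499, 0.04268543, 0.32760619),
  autoc_pvalue = c(0.5842200307, 0.9540503936, 0.3246547621, 0.7436197200,
                   0.4948207614, 0.0007476682),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # assumed significance level
flags <- data.frame(
  period          = diag_head$period,
  non_normal      = diag_head$norm_pvalue  < alpha,
  heteroscedastic = diag_head$bp_pvalue    < alpha,
  autocorrelated  = diag_head$autoc_pvalue < alpha,
  stringsAsFactors = FALSE
)

# Keep only periods where at least one test rejects
any_flag <- flags$non_normal | flags$heteroscedastic | flags$autocorrelated
flagged <- flags[any_flag, ]
print(flagged)
```

On these six periods, 2009Q1 is flagged for heteroscedasticity (bp_pvalue of about 0.043) and 2009Q2 for autocorrelation (autoc_pvalue of about 0.0007, in line with its lower autoc_dw of about 1.49). In a live session the same filter can be applied to diagnostics itself. With many periods, some rejections at the 5% level are expected by chance alone.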

Visualizing Diagnostics

To visualize the diagnostics, pass the result to plot_regression_diagnostics():

plot_regression_diagnostics(diagnostics)

This generates a 3x2 grid of plots: two panels for each of the three groups of assumptions described under Interpreting the Output.


Interpreting the Output

The hedonic price index relies on a log-linear regression model, which assumes that the model fits the data adequately and that its residuals are normally distributed, independent and homoscedastic. The diagnostics plot shows how well these assumptions hold in each period.

Each row of subplots addresses one group of model assumptions:

Row 1: Normality and Linearity

Row 2: Independence
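Assuming autoc_dw is the standard Durbin-Watson statistic (the column name suggests so, but this section does not state it), it is computed from the residuals $e_t$ of a period's regression, taken in data order, and relates to the lag-1 residual autocorrelation $\hat{\rho}_1$ approximately as follows:

```latex
d = \frac{\sum_{t=2}^{n} \left(e_t - e_{t-1}\right)^2}{\sum_{t=1}^{n} e_t^2}
  \approx 2\left(1 - \hat{\rho}_1\right)
```

Values near 2 indicate little first-order autocorrelation. For example, the 2009Q2 value $d = 1.487031$ gives $\hat{\rho}_1 \approx 1 - 1.487031/2 \approx 0.26$, consistent with that period's small autoc_pvalue of 0.0007476682.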

Row 3: Homoscedasticity
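For intuition about what a small bp_pvalue means, here is a base-R sketch of the studentized (Koenker) Breusch-Pagan test, the default variant in lmtest::bptest(). Whether REPS uses this variant is not stated in this section; the helper bp_koenker() and the simulated data are hypothetical.

```r
# Hypothetical helper: studentized (Koenker) Breusch-Pagan test.
# Assumes the fitted model includes an intercept.
bp_koenker <- function(fit) {
  X  <- model.matrix(fit)                   # regressors, first column = intercept
  u2 <- residuals(fit)^2                    # squared OLS residuals
  aux <- lm(u2 ~ X[, -1, drop = FALSE])     # auxiliary regression on regressors
  stat <- length(u2) * summary(aux)$r.squared  # n * R^2
  df <- ncol(X) - 1                         # number of regressors excl. intercept
  c(statistic = stat, p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Simulated data: constant variance vs. variance growing with x
set.seed(7)
x <- runif(500, 1, 10)
y_hom <- 2 + 0.5 * x + rnorm(500, sd = 1)
y_het <- 2 + 0.5 * x + rnorm(500, sd = 0.3 * x)

hom <- bp_koenker(lm(y_hom ~ x))
het <- bp_koenker(lm(y_het ~ x))
print(rbind(hom, het))
```

The test regresses the squared residuals on the model's regressors; if they explain much of that variation, the residual variance is not constant and the p-value is small, as for the het fit here.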

Summary

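The calculate_regression_diagnostics() and plot_regression_diagnostics() functions in REPS enable a period-by-period check of model fit and of residual normality, independence and homoscedasticity, together with a visual overview of these checks across all periods.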

They support robust, high-quality hedonic price index modeling by systematically checking regression assumptions.
