The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This vignette aims to illustrate how the SynthETIC
package can be used to generate a general insurance claims history with
realistic distributional assumptions consistent with the experience of a
specific (but anonymous) Auto Liability portfolio. The simulator is
composed of 8 modelling steps (or “modules”), each of which will build
on (a selection of) the output from previous steps:
In particular, with this demo we will output
| Description | R Object | 
|---|---|
| N, claim frequency | n_vector= # claims for each accident
period | 
| U, claim occurrence time | occurrence_times[[i]]= claim occurrence
time for all claims that occurred in period i | 
| S, claim size | claim_sizes[[i]]= claim size for all
claims that occurred in period i | 
| V, notification delay | notidel[[i]]= notification delay for all
claims that occurred in period i | 
| W, settlement delay | setldel[[i]]= settlement delay for all
claims that occurred in period i | 
| M, number of partial payments | no_payments[[i]]= number of partial
payments for all claims that occurred in period i | 
| size of partial payments | payment_sizes[[i]][[j]]= $ partial
payments for claim j of occurrence period i | 
| inter-partial delays | payment_delays[[i]][[j]]= inter partial
delays for claim j of occurrence period i | 
| payment times (continuous time) | payment_times[[i]][[j]]= payment times
(in continuous time) for claim j of occurrence period
i | 
| payment times (period) | payment_periods[[i]][[j]]= payment times
(in calendar periods) for claim j of occurrence period
i | 
| actual payments (inflated) | payment_inflated[[i]][[j]]= $ partial
payments (inflated) for claim j of occurrence period
i | 
For a full description of SythETIC’s structure and test
parameters, readers should refer to:
Avanzi, B, Taylor, G, Wang, M, Wong, B (2021).
SynthETIC: An individual insurance claim simulator with
feature control. Insurance: Mathematics and Economics 100,
296–308. https://doi.org/10.1016/j.insmatheco.2021.06.004
The work can also be accessed via arXiv:2008.05693.
To cite this package in publications, please use:
citation("SynthETIC")library(SynthETIC)
set.seed(20200131)We introduce the reference value ref_claim partly as a
measure of the monetary unit and/or overall claims experience. The
default distributional assumptions were set up with a specific (but
anonymous) Auto Liability portfolio in mind. ref_claim then
allows users to easily simulate a synthetic portfolio with similar claim
pattern but in a different currency, for example. We also remark that
users can alternatively choose to interpret ref_claim as a
monetary unit. For example, one can set
ref_claim <- 1000 and think of all amounts in terms of
$1,000. However, in this case the default functions (as listed below)
will not work and users will need to supply their own set of functions
and set the values as multiples of ref_claim rather than
fractions as in the default setting.
We also require the user to input a time_unit (which
should be given as a fraction of year), so that the
default input parameters apply to contexts where the time units are no
longer in quarters. In the default setting we have a
time_unit of 1/4.
The default input parameters will update automatically with the
choice of the two global variables ref_claim and
time_unit, which ensures that the simulator produce
sensible results in contexts other than the default setting. We remark
that both ref_claim and time_unit only affect
the default simulation functions, and users can also choose to set up
their own modelling assumptions for any of the modules to match their
experiences even better. In the latter case, it is the responsibility of
the user to ensure that their input parameters are compatible with their
time units and claims experience. For example, if the time units are
quarters, then claim occurrence rates must be quarterly.
set_parameters(ref_claim = 200000, time_unit = 1/4)
ref_claim <- return_parameters()[1]
time_unit <- return_parameters()[2]The reference value, ref_claim will be used throughout
the simulation process (as listed in the table below).
| Module | Details | 
|---|---|
| 2. Claim Size | At ref_claim = 200000, by default we
simulate claim sizes from S^0.2 ~ Normal (9.5, sd = 3), left
truncated at 30.When the reference value changes, we output the claim sizes scaled by a factor of ref_claim / 200000. | 
| 3. Claim Notification | By default we set the mean notification delay (in
quarters) to be \[min(3, max(1, 2 -
\frac{1}{3} \log(\frac{claim\_size}{0.5~ref\_claim}))\] (which
will be automatically converted to the relevant time_unit)
i.e. the mean notification delay decreases logarithmically with claim
size. It has maximum value 3 and equals 2 for a claim of size exactly at0.5 * ref_claim. | 
| 4. Claim Closure | The default value for the mean settlement delay
involves a term that defines the benchmark for a claim to be considered
“small”: 0.1 * ref_claim. The default mean settlement delay
increases logarithmically with claim size and equals 6 exactly at this
benchmark. Furthermore there was a legislative change, captured in the
default mean function, that impacted the settlement delays of those
“small” claims. | 
| 5. Claim Payment Count | For the default sampling distribution, we need two
claim size benchmarks as we sample from different distributions for
claims of different sizes. In general a small number of partial payments
is required to settle small claims, and additional payments will be
required to settle more extreme claims. It is assumed that claims below 0.0375 * ref_claimcan be settled in 1 or 2 payments,
claims between0.075 * ref_claimin 2 or 3 payments, and
claims beyond0.075 * ref_claimin no less than 4
payments. | 
| 6. Claim Payment Size | We use the same proportion of ref_claimas
in the Claim Closure module, namely0.1 * ref_claim. This benchmark value is used when
simulating the proportion of the last two payments in the defaultsimulate_amt_pmtfunction.The mean proportion of claim paid in the last two payments increases logarithmically with claim size, and equals 75% exactly at this benchmark. | 
| 8. Claim Inflation | Two benchmarks values are required in this section, one
each for the default SI occurrence and SI payment functions. 1) A legislative change, captured by SI occurrence, reduced claim size by up to 40% for the smallest claims and impacted claims up to 0.25 * ref_claimin size.2) The default SI payment is set to be 30% p.a. for the smallest claims and zero for claims exceeding ref_claimin size, and varies linearly for claims between 0
andref_claim. | 
The time_unit chosen will impact the time-related
modules, specifically
Unless otherwise specified, claim_frequency() assumes
the claim frequency follows a Poisson distribution with mean equal to
the product of exposure E associated with period \(i\) and expected claim frequency
freq per unit exposure for that period. The exposure and
expected frequency are allowed to vary across periods, but not within a
period.
Given the claim frequency, claim_occurrence() samples
the occurrence times of each claim from a uniform distribution.
Together, the two functions assume by default that the
arrival of claims follows a Poisson process, with potentially varying
rates across different periods (see Example
1.2).
Alternative sampling processes are discussed in Example 1.3 and 1.4.
years = number of years considered
I = number of claims development periods considered
(which equals the number of years divided by the
time_unit)E[i] = exposure associated with each period
ilambda[i] = expected claim frequency per unit exposure
for period iyears <- 10
I <- years / time_unit
E <- c(rep(12000, I)) # effective annual exposure rates
lambda <- c(rep(0.03, I))# Number of claims occurring for each period i
# shorter equivalent code:
# n_vector <- claim_frequency()
n_vector <- claim_frequency(I = I, E = E, freq = lambda)
n_vector
#>  [1]  90  79 102  78  86  88 116  84  93 104  80  87  86 104  81  84 101  96  96
#> [20]  86 102 103  82  83  80  80  82  87 103  79  79 100  94  99  88 101  91  95
#> [39]  91  84
# Occurrence time of each claim r, for each period i
occurrence_times <- claim_occurrence(frequency_vector = n_vector)
occurrence_times[[1]]
#>  [1] 0.6238351404 0.1206679437 0.2220435985 0.4538308736 0.5910992266
#>  [6] 0.9524491858 0.3660710892 0.1923275446 0.5391526092 0.7398599708
#> [11] 0.9761979643 0.6794459166 0.6491731463 0.0145699105 0.0117662018
#> [16] 0.0002802343 0.1229670814 0.2181776366 0.9188914341 0.3641183279
#> [21] 0.3599445471 0.3228054109 0.7384824581 0.0756409415 0.2406489884
#> [26] 0.0309497463 0.1994408462 0.0391640882 0.1830444403 0.5194172878
#> [31] 0.8934622605 0.2604308173 0.8512500757 0.1738214253 0.4129021554
#> [36] 0.0683904318 0.0944415457 0.5636684340 0.4130775523 0.6496588932
#> [41] 0.2293977202 0.2929870863 0.1346096094 0.3428012058 0.5930486526
#> [46] 0.7660660581 0.7112241383 0.9488298327 0.0046397008 0.7370544358
#> [51] 0.1497760331 0.0386742705 0.1717934967 0.8123882010 0.3574451937
#> [56] 0.7511094357 0.2453237963 0.8360645119 0.7225212962 0.5654766215
#> [61] 0.0858555159 0.2943205256 0.4229451967 0.3454886819 0.6273976711
#> [66] 0.4686531660 0.6168212816 0.2097416152 0.0703774171 0.5280987371
#> [71] 0.2788692161 0.3355113363 0.3388684399 0.2468694879 0.1210995505
#> [76] 0.4063767171 0.1075867382 0.7758433735 0.5431794343 0.9817624143
#> [81] 0.4714252711 0.3129043274 0.8519159236 0.2192278604 0.2754109078
#> [86] 0.9434416124 0.7397910126 0.2484398137 0.5336137633 0.7483879288Note that variables named with _tmp are for illustration
purposes only and not used in the later simulation modules of this
demo.
## input parameters
years_tmp <- 10
I_tmp <- years_tmp / time_unit
# set linearly increasing exposure, ...
E_tmp <- c(rep(12000, I)) + seq(from = 0, by = 100, length = I)
# and constant frequency per unit of exposure
lambda_tmp <- c(rep(0.03, I))
## output
# Number of claims occurring for each period i
n_vector_tmp <- claim_frequency(I = I_tmp, E = E_tmp, freq = lambda_tmp)
n_vector_tmp
#>  [1] 107  97  86 103  87  81  82  83 107  80  81  86  79  98  91  87  93 111  93
#> [20] 104 105 113  89 100 115 104 114 122  90 116 132 100 111 108 135 116 116 109
#> [39] 120 120
# Occurrence time of each claim r, for each period i
occurrence_times_tmp <- claim_occurrence(frequency_vector = n_vector_tmp)
occurrence_times_tmp[[1]]
#>   [1] 0.952972013 0.878173776 0.684479050 0.915558977 0.496780866 0.784500798
#>   [7] 0.446433841 0.102953206 0.612290796 0.680195534 0.556182698 0.045605700
#>  [13] 0.371326311 0.220061345 0.195921842 0.083790625 0.075338539 0.342769926
#>  [19] 0.336097335 0.379061881 0.634857761 0.711008352 0.910231843 0.609100422
#>  [25] 0.645031730 0.859860029 0.786352659 0.286475987 0.189036040 0.595847647
#>  [31] 0.354306386 0.940303840 0.018530716 0.151189459 0.745556375 0.155205039
#>  [37] 0.070178678 0.426025548 0.447296439 0.755066258 0.643531907 0.832750566
#>  [43] 0.613205539 0.397535617 0.870752500 0.220184653 0.226098091 0.466065862
#>  [49] 0.881361386 0.647172325 0.549784031 0.927304841 0.595728125 0.921661125
#>  [55] 0.560342632 0.759705019 0.820286798 0.330417019 0.333587312 0.540555824
#>  [61] 0.054696505 0.558244388 0.807569014 0.628752004 0.042540230 0.176635575
#>  [67] 0.283089697 0.660460350 0.892414873 0.058447282 0.937083544 0.099011265
#>  [73] 0.880388323 0.620242061 0.648976628 0.412398872 0.033443779 0.967655757
#>  [79] 0.605652047 0.309612707 0.583694900 0.387392525 0.403679390 0.763759864
#>  [85] 0.768409867 0.493427805 0.884637634 0.022691348 0.016921406 0.546337125
#>  [91] 0.282798626 0.291636830 0.210914176 0.140094880 0.106370681 0.703040822
#>  [97] 0.011059992 0.910601367 0.117060644 0.783328586 0.491064691 0.005622066
#> [103] 0.828679769 0.214179660 0.241332419 0.079605893 0.341526252Users can choose to specify their own claim frequency distribution
via simfun, which takes both random generation functions
(type = "r", the default) and cumulative distribution
functions (type = "p"). For example, we can use the
negative binomial distribution in base R, or the
zero-truncated Poisson distribution from the actuar
package.
# simulate claim frequencies from negative binomial
# 1. using type-"r" specification (default)
claim_frequency(I = I, simfun = rnbinom, size = 100, mu = 100)
#>  [1]  94 103  73 123 131  73 113 101  95  91 120  84 106 112  88  72  94  88 105
#> [20]  95 115  75  90  85  93  92 123 107 109  92  93 105 116 103 100  84  93 102
#> [39]  81  93
# 2. using type-"p" specification, equivalent to above
claim_frequency(I = I, simfun = pnbinom, type = "p", size = 100, mu = 100)
#>  [1] 121  77  89  91 118 110 121  87  98  96  91 108  85  83  67 109 101  93 110
#> [20]  86 100  94 106  90 102 106  98 104 130 117  95  81  86  97 115 104  95  89
#> [39]  97  64
# simulate claim frequencies from zero-truncated Poisson
claim_frequency(I = I, simfun = actuar::rztpois, lambda = 90)
#>  [1]  80  90  83  74  78  97 105 102  81  85 109  96  93 101  87  88  86  93  76
#> [20]  95  75  74 105  80 103  93 101  78 108  78  90  91 103  98  81 106  80 100
#> [39]  89 100
claim_frequency(I = I, simfun = actuar::pztpois, type = "p", lambda = 90)
#>  [1]  89  89  82  89  83  92  96 106  96 105 104  97  85  99 104  86  88 100  76
#> [20] 114  79  90 100  98  89  99  87  83  68  88  88  73  99 111  75  75  86  89
#> [39]  94  94Similar to Example 1.2, we can modify the frequency parameters to vary across periods:
claim_frequency(I = I, simfun = actuar::rztpois, lambda = time_unit * E_tmp * lambda_tmp)
#>  [1]  98  83  92 114  88 105  97  98  99  82 106  94  92  92  87  97  97 101  96
#> [20] 103 105  91 108 125 120 134 108 128  94  95 114 122 119 116 117 105 115 103
#> [39] 120 121If one wishes to code their own sampling function (either a direct random generating function, or a proper CDF), this can be achieved by:
# sampling from non-homogeneous Poisson process
rnhpp.count <- function(no_periods) {
  rate <- 3000
  intensity <- function(x) {
    # e.g. cyclical Poisson process
    0.03 * (sin(x * pi / 2) / 4 + 1)
  }
  lambda_max <- 0.03 * (1/4 + 1)
  target_num_events <- no_periods * rate * lambda_max
  
  # simulate a homogeneous Poisson process
  N <- stats::rpois(1, target_num_events)              # total number of events
  event_times <- sort(stats::runif(N, 0, no_periods))  # random times of occurrence
  
  # use a thinning step to turn this into a non-homogeneous process
  accept_probs <- intensity(event_times) / lambda_max
  is_accepted <- (stats::runif(N) < accept_probs)
  claim_times <- event_times[is_accepted]
  
  as.numeric(table(cut(claim_times, breaks = 0:no_periods)))
}
n_vector_tmp <- claim_frequency(I = I, simfun = rnhpp.count)
plot(x = 1:I, y = n_vector_tmp, type = "l",
     main = "Claim frequency simulated from a cyclical Poisson process",
     xlab = "Occurrence period", ylab = "# Claims")We note that the claim_occurrence() function for
simulating the claim times conditional on claim frequencies assumes a
uniform distribution, and that this cannot be modified without changing
the module. Indeed, the modular structure of SynthETIC
ensures that one can easily unplug any one module and replace it with a
version modified to his/her own purpose.
For example, if one wishes to replace this uniform distribution
assumption and/or the whole Claim Occurrence
module, they can simply supply their own vector of claim times and
easily convert to the list format consistent with the
SynthETIC framework for smooth integration with the later
modules.
# Equivalent to a Poisson process
event_times_tmp <- sort(stats::runif(n = 4000, 0, I))
accept_probs_tmp <- (sin(event_times_tmp * pi / 2) + 1) / 2
is_accepted_tmp <- (stats::runif(length(event_times_tmp)) < accept_probs_tmp)
claim_times_tmp <- event_times_tmp[is_accepted_tmp]
# Number of claims occurring for each period i
# by counting the number of event times in each interval (i, i + 1)
n_vector_tmp <- as.numeric(table(cut(claim_times_tmp, breaks = 0:I)))
n_vector_tmp
#>  [1] 69 83 20 17 80 93 16 32 91 69 15 20 83 91 22 18 80 80 18 21 84 76 16 23 88
#> [26] 71 17 16 86 82 21 26 90 78 22 24 91 83 20 16
# Occurrence time of each claim r, for each period i
occurrence_times_tmp <- to_SynthETIC(x = claim_times_tmp, 
                                     frequency_vector = n_vector_tmp)
occurrence_times_tmp[[1]]
#>  [1] 0.02530799 0.04686387 0.05357550 0.09490267 0.11894343 0.12501800
#>  [7] 0.13716378 0.15126045 0.15634258 0.15940280 0.16802373 0.18235956
#> [13] 0.18668332 0.21388937 0.22286944 0.23443578 0.25782885 0.26155421
#> [19] 0.26320521 0.28145757 0.29412562 0.30614907 0.31135755 0.31419038
#> [25] 0.31721340 0.36016260 0.36224677 0.38213086 0.40569117 0.41305029
#> [31] 0.42631640 0.46257297 0.47055278 0.50429167 0.52571459 0.55899206
#> [37] 0.56786386 0.58312503 0.60509380 0.63549361 0.65102738 0.65478115
#> [43] 0.69002035 0.71057001 0.76572523 0.76638297 0.77963726 0.78088350
#> [49] 0.78823166 0.79273382 0.79278816 0.87073245 0.87538892 0.88691234
#> [55] 0.89333584 0.90006417 0.90130811 0.90719697 0.91259621 0.91676780
#> [61] 0.92419677 0.92717615 0.93680889 0.94427726 0.95027973 0.95496928
#> [67] 0.96544794 0.99243500 0.99412480By default claim_size() assumes a left truncated power
normal distribution: \(S^{0.2} \sim
\mathcal{N}(\mu = 9.5, \sigma = 3)\), left truncated at 30.
There is no need to specify a sampling distribution if the user
is happy with the default power normal. This example is mainly
to demonstrate how the default function works.
We can specify the CDF to generate claim sizes from. The default distribution function can be coded as follows:
# use a power normal S^0.2 ~ N(9.5, 3), left truncated at 30
# this is the default distribution driving the claim_size() function
S_df <- function(s) {
  # truncate and rescale
  if (s < 30) {
    return(0)
  } else {
    p_trun <- pnorm(s^0.2, 9.5, 3) - pnorm(30^0.2, 9.5, 3)
    p_rescaled <- p_trun/(1 - pnorm(30^0.2, 9.5, 3))
    return(p_rescaled)
  }
}# shorter equivalent: claim_sizes <- claim_size(frequency_vector = n_vector)
claim_sizes <- claim_size(frequency_vector = n_vector, 
                          simfun = S_df, type = "p", range = c(0, 1e24))
claim_sizes[[1]]
#>  [1]   36899.8605  141849.5622   16603.1996   15223.8127  817938.5635
#>  [6]     351.6745   10015.8195  175456.1709  261284.1640   10573.3070
#> [11]    1816.3114  148336.4740   63029.1666   42487.6202 1173690.6195
#> [16]   10115.9730  348303.5182   12709.8373    9907.2378  459456.9972
#> [21]  198300.2470  140546.6355  440010.0635    9049.5595  183783.2434
#> [26]  281371.7621   13484.3109  172535.2796  115910.4392  164630.4482
#> [31]  306454.7755   89385.6339   87880.5636   20344.9518    8222.1841
#> [36]   83930.4455  285393.3539   81616.3755   97873.8282   49740.3601
#> [41]    2975.6479  106689.5794  106692.4096   13111.5667  129786.1892
#> [46]  804276.2609   41682.3680   26625.5493    8153.2405    1299.1344
#> [51]    9281.8516   22930.1019   13522.4890 1097254.4832     860.0697
#> [56]   90844.6450   33528.3215  106920.4338   72328.5316  202366.0786
#> [61]   63360.0478   30876.0966    9173.3091   16974.8611  266508.4047
#> [66]   34008.6725   22024.3869   59790.1005  309377.4778  199897.8116
#> [71]   44762.1634   88697.2476 1330889.9965  341884.3029    5675.1792
#> [76]  978097.6886  423139.0905  196612.4491   20500.5420     574.5562
#> [81]  436317.4379   28970.5921   81768.3397    5542.6770   38275.7790
#> [86]  168907.0167   81956.4967   22462.0561   15488.0224  161646.0382Users can also choose any other individual claim size distribution,
e.g. Weibull from base R or inverse Gaussian from
actuar:
## weibull
# estimate the weibull parameters to achieve the mean and cv matching that of
# the built-in test claim dataset
claim_size_mean <- mean(test_claim_dataset$claim_size)
claim_size_cv <- cv(test_claim_dataset$claim_size)
weibull_shape <- get_Weibull_parameters(target_mean = claim_size_mean, 
                                        target_cv = claim_size_cv)[1]
weibull_scale <- get_Weibull_parameters(target_mean = claim_size_mean, 
                                        target_cv = claim_size_cv)[2]
# simulate claim sizes with the estimated parameters
claim_sizes_weibull <- claim_size(frequency_vector = n_vector,
                                  simfun = rweibull, 
                                  shape = weibull_shape, scale = weibull_scale)
# plot empirical CDF
plot(ecdf(unlist(test_claim_dataset$claim_size)), xlim = c(0, 2000000),
     main = "Empirical distribution of simulated claim sizes",
     xlab = "Individual claim size")
plot(ecdf(unlist(claim_sizes_weibull)), add = TRUE, col = 2)
## inverse Gaussian
# modify actuar::rinvgauss (left truncate it @30 and right censor it @5,000,000)
rinvgauss_censored <- function(n) {
  s <- actuar::rinvgauss(n, mean = 180000, dispersion = 0.5e-5)
  while (any(s < 30 | s > 5000000)) {
    for (j in which(s < 30 | s > 5000000)) {
      # for rejected values, resample
      s[j] <- actuar::rinvgauss(1, mean = 180000, dispersion = 0.5e-5)
    }
  }
  s
}
# simulate from the modified inverse Gaussian distribution
claim_sizes_invgauss <- claim_size(frequency_vector = n_vector, simfun = rinvgauss_censored)
# plot empirical CDF
plot(ecdf(unlist(claim_sizes_invgauss)), add = TRUE, col = 3)
legend.text <- c("Power normal", "Weibull", "Inverse Gaussian")
legend("bottomright", legend.text, col = 1:3, lty = 1, bty = "n")The applications discussed above assume that the claim sizes are
sampled from a single distribution for all policyholders (e.g. the
default power normal, custom sampling distribution specified by
simfun).
Suppose we instead want to simulate from a model which uses covariates to predict claim sizes. For example, consider a (theoretical) gamma GLM with log link:
\[ \begin{align*} E(S_i) =\mu_i &=\exp(\boldsymbol{x}_i^\top \boldsymbol\beta)\\ &= \exp(\beta_0 + \beta_1 \times age_i + \beta_2 \times age_i^2)\\ &= \exp(27 - 0.768 \times age_i + 0.008 \times age_i^2) \end{align*} \]
# define the random generation function to simulate from the gamma GLM
sim_GLM <- function(n) {
  # simulate covariates
  age <- sample(20:70, size = n, replace = T)
  mu <- exp(27 - 0.768 * age + 0.008 * age^2)
  rgamma(n, shape = 10, scale = mu / 10)
}
claim_sizes_GLM <- claim_size(frequency_vector = n_vector, simfun = sim_GLM)
plot(ecdf(unlist(claim_sizes_GLM)), xlim = c(0, 2000000),
     main = "Empirical distribution of claim sizes simulated from GLM",
     xlab = "Individual claim size")Suppose we have an existing dataset of claim costs at hand that we
wish to simulate from, e.g. ausautoBI8999 (an automobile
bodily injury claim dataset in Australia) from CASDatasets. We can take a
bootstrap resample of the dataset and then convert to
SynthETIC format with ease:
# install.packages("CASdatasets", repos = "http://cas.uqam.ca/pub/", type = "source")
library(CASdatasets)
data("ausautoBI8999")
boot <- sample(ausautoBI8999$AggClaim, size = sum(n_vector), replace = TRUE)
claim_sizes_bootstrap <- to_SynthETIC(boot, frequency_vector = n_vector)Another way to code this would be to write a random generation
function to perform bootstrapping, and then use claim_size
as usual:
sim_boot <- function(n) {
  sample(ausautoBI8999$AggClaim, size = n, replace = TRUE)
}
claim_sizes_bootstrap <- claim_size(frequency_vector = n_vector, simfun = sim_boot)Alternatively, one can easily fit a parametric distribution to an
existing dataset with the help of the fitdistrplus package
and then simulate from the fitted parametric distribution (Example 2.2).
SynthETIC assumes the (removable) dependence of
notification delay on claim size and occurrence period of the claim, and
thus requires the user to specify a paramfun
(parameter function) with arguments
claim_size and occurrence_period (and possibly
more, see Example 3.2). The dependencies
can be removed if the arguments are not referenced
inside the function; e.g. the default notification delay function (shown
below) is independent of the individual claim’s
occurrence_period.
Other than this pre-specified dependence structure, users are free to
choose any distribution, whether it be a pre-defined
distribution in R, or more advanced ones from packages, or
a proper user-defined function, to better match their own claim
experience.
Indeed, although not recommended, users are able to add further dependencies in their simulation. This is illustrated in Example 4.2 of the settlement delay module.
By default, SynthETIC samples notification delays from a
Weibull distribution:
## input
# specify the Weibull parameters as a function of claim_size and occurrence_period
notidel_param <- function(claim_size, occurrence_period) {
  # NOTE: users may add to, but not remove these two arguments (claim_size, 
  # occurrence_period) as they are part of SynthETIC's internal structure
  
  # specify the target mean and target coefficient of variation
  target_mean <- min(3, max(1, 2-(log(claim_size/(0.50 * ref_claim)))/3))/4 / time_unit
  target_cv <- 0.70
  # convert to Weibull parameters
  shape <- get_Weibull_parameters(target_mean, target_cv)[1]
  scale <- get_Weibull_parameters(target_mean, target_cv)[2]
  
  c(shape = shape, scale = scale)
}
## output
notidel <- claim_notification(n_vector, claim_sizes, 
                              rfun = rweibull, paramfun = notidel_param)SynthETIC does not restrict the choice of the sampling
distribution. For example, we can use a transformed gamma
distribution:
## input
# specify the transformed gamma parameters as a function of claim_size and occurrence_period
trgamma_param <- function(claim_size, occurrence_period, rate) {
  c(shape1 = max(1, claim_size / ref_claim),
    shape2 = 1 - occurrence_period / 200,
    rate = rate)
}
## output
# simulate notification delays from the transformed gamma
notidel_trgamma <- claim_notification(n_vector, claim_sizes, 
                                      rfun = actuar::rtrgamma, 
                                      paramfun = trgamma_param, rate = 2)
# graphically compare the result with the default Weibull distribution
plot(ecdf(unlist(notidel)), xlim = c(0, 15),
     main = "Empirical distribution of simulated notification delays",
     xlab = "Notification delay (in quarters)")
plot(ecdf(unlist(notidel_trgamma)), add = TRUE, col = 2)
legend.text <- c("Weibull (default)", "Transformed gamma")
legend("bottomright", legend.text, col = 1:2, lty = 1, bty = "n")Clearly the transformed gamma with the parameters specified above accelerates the reporting of the simulated claims.
One may wish to simulate from a more exotic sampling distribution that cannot be easily written as a nice pre-defined distribution function and its parameters. For example, consider a mixed distribution:
rmixed_notidel <- function(n, claim_size) {
  # consider a mixture distribution
  # equal probability of sampling from x (Weibull) or y (transformed gamma)
  x_selected <- sample(c(T, F), size = n, replace = TRUE)
  x <- rweibull(n, shape = 2, scale = 1)
  y <- actuar::rtrgamma(n, shape1 = min(1, claim_size / ref_claim), shape2 = 0.8, rate = 2)
  result <- length(n)
  result[x_selected] <- x[x_selected]; result[!x_selected] <- y[!x_selected]
  
  return(result)
}In this case, we can consider claim_size as the
“parameter” for the sampling distribution (just in the same way as
shape and scale for gamma distribution). Then
we can either define a parameter function like below:
rmixed_params <- function(claim_size, occurrence_period) {
  # claim_size is the only "parameter" required for rmixed_notidel
  c(claim_size = claim_size)
}or simply run
notidel_mixed <- claim_notification(n_vector, claim_sizes, rfun = rmixed_notidel)which would give the same result as
notidel_mixed <- claim_notification(n_vector, claim_sizes, 
                                    rfun = rmixed_notidel, paramfun = rmixed_params)Claim settlement delay represents the delay from claim notification
to closure. Like notification delay,
SynthETIC assumes the (removable) dependence of settlement delay on
claim size and occurrence period of the claim, and thus requires the
user to specify a paramfun (parameter
function) with arguments claim_size and
occurrence_period (and possibly more, see Example 3.2).
Other than this pre-specified dependence structure, users are free to
choose any distribution by specifying their own
rfun and/or paramfun (see
?claim_closure).
Indeed, although not recommended, users are able to add further dependencies in their simulation. This is illustrated in Example 4.2.
Below we show the default implementation with a Weibull distribution.
## input
# specify the Weibull parameters as a function of claim_size and occurrence_period
setldel_param <- function(claim_size, occurrence_period) {
  # NOTE: users may add to, but not remove these two arguments (claim_size, 
  # occurrence_period) as they are part of SynthETIC's internal structure
  
  # specify the target Weibull mean
  if (claim_size < (0.10 * ref_claim) & occurrence_period >= 21) {
    a <- min(0.85, 0.65 + 0.02 * (occurrence_period - 21))
  } else {
    a <- max(0.85, 1 - 0.0075 * occurrence_period)
  }
  mean_quarter <- a * min(25, max(1, 6 + 4*log(claim_size/(0.10 * ref_claim))))
  target_mean <- mean_quarter / 4 / time_unit
  
  # specify the target Weibull coefficient of variation
  target_cv <- 0.60
  c(shape = get_Weibull_parameters(target_mean, target_cv)[1, ],
    scale = get_Weibull_parameters(target_mean, target_cv)[2, ])
}
## output
# simulate the settlement delays from the Weibull with parameters above
setldel <- claim_closure(n_vector, claim_sizes, rfun = rweibull, paramfun = setldel_param)
setldel[[1]]
#>  [1]  8.0573782 15.8053863 14.2603566  4.5533956 16.4750118  0.6757682
#>  [7]  0.9682578 24.0401675 22.1846980  0.8610996  0.6683468 16.6714091
#> [13]  8.6054676  5.3500387 27.4864242  3.9482892  8.9834318  2.1112819
#> [19]  3.2612704 28.5662863 17.3838814 21.1837957  9.1088178  7.5033991
#> [25] 20.4644707 40.5724033  2.8940488  5.4080938  9.1196423 15.8132703
#> [31] 16.2229426  7.0756243  0.7837022  6.4323955  5.4962792  5.4157934
#> [37]  3.4942413  5.4052155  6.5808379 14.1095568  1.4248888 26.6289747
#> [43]  8.4400973  1.6465559 17.7384784 32.3709478  5.8339610  1.8125425
#> [49]  7.4113260  0.5666871  1.3118229  8.5053221  1.6374438 21.8701518
#> [55]  0.3746391  3.8139748  3.8327095  7.7515528 15.9745272  2.4149164
#> [61]  5.3988990 12.6057319  4.4768139  3.5457149 27.1495422  2.8924025
#> [67]  7.0365031  6.7507450  8.8945451 26.5963006 26.3620067  2.9837048
#> [73]  1.2960545 10.3304475  0.6139096  6.9856492 13.6588810 11.7053405
#> [79]  4.0233359  0.5923331 18.3592450  0.8768501  4.3853213  2.1379966
#> [85] 11.0601241  7.6971241  4.3468542 10.4686613  4.3434433 27.9304924There is no need to specify a sampling distribution if one is happy with the default Weibull specification. This example is just to demonstrate some of the behind-the-scenes work of the default implementation, and at the same time, to show how one may specify and input a random sampling distribution of their choosing.
Suppose we would like to add the dependence of settlement delay on
notification delay, which is not natively included in
SynthETIC default setting. For example, let’s consider the
following parameter function:
## input
# an extended parameter function for the simulation of settlement delays
setldel_param_extd <- function(claim_size, occurrence_period, notidel) {
  
  # specify the target Weibull mean
  if (claim_size < (0.10 * ref_claim) & occurrence_period >= 21) {
    a <- min(0.85, 0.65 + 0.02 * (occurrence_period - 21))
  } else {
    a <- max(0.85, 1 - 0.0075 * occurrence_period)
  }
  mean_quarter <- a * min(25, max(1, 6 + 4*log(claim_size/(0.10 * ref_claim))))
  # suppose the setldel mean is linearly related to the notidel of the claim
  target_mean <- (mean_quarter + notidel) / 4 / time_unit
  
  # specify the target Weibull coefficient of variation
  target_cv <- 0.60
  c(shape = get_Weibull_parameters(target_mean, target_cv)[1, ],
    scale = get_Weibull_parameters(target_mean, target_cv)[2, ])
}As this parameter function setldel_param_extd is
dependent on notidel, it should not be surprising that we
need to input the simulated notification delays when calling
claim_closure. We need to make sure that the argument names
are matched exactly (notidel in this example) and that the
input is specified as a vector of simulated quantities (not a list).
## output
# simulate the settlement delays from the Weibull with parameters above
notidel_vect <- unlist(notidel) # convert to a vector
setldel_extd <- claim_closure(n_vector, claim_sizes, rfun = rweibull, 
                              paramfun = setldel_param_extd,
                              notidel = notidel_vect)
setldel_extd[[1]]
#>  [1] 12.1568308  5.7147653  2.6794347  5.3800022 17.9283800  1.3657691
#>  [7] 15.2106418 14.2292539 16.0858636  5.7297223  8.7951146 16.5970588
#> [13] 12.2784410 11.9040272 12.8664728  5.1080408 13.1036800  6.5397494
#> [19]  3.7179928 12.9722770  7.1703573 21.6103265 15.2480896  4.2696178
#> [25] 12.3937742  6.0241202  6.8885033 17.4607486 20.7077826 11.5404139
#> [31] 17.1947624  7.0867713  9.1985114  4.7736206  1.7251691 31.8464475
#> [37] 23.8246794  3.5279912  1.1289398 15.6119759  4.4294932  8.0177843
#> [43] 28.1040458  4.2926962  7.8847506 14.6828112  7.2319794 12.2086684
#> [49]  4.2205753  4.1867548  4.1112308  8.7305403  9.6678687 14.1384301
#> [55]  0.4709342  8.0945496  9.3847636  8.4394349  3.5098351 21.4191003
#> [61] 14.2441120  8.1720867  8.1452173 19.5395709 21.4692182 11.7798869
#> [67] 14.8865520 10.3571895  9.1078003 13.5433455 18.5632148  0.2452477
#> [73] 20.6059863 18.3335179  4.7796994 43.5627513 40.2400408  5.0635200
#> [79] 14.4437240  8.6861120 26.4583571 11.6013506  1.9648260  6.2968274
#> [85] 15.5847478 26.7951624 10.6777622  3.5172161  1.1503382 18.2049233claim_payment_no() generates the number of partial
payments associated with a particular claim, from a user-defined random
generation function which may depend on claim_size.
Below we spell out the default function in SynthETIC
that simulates the number of partial payments (from a mixture
distribution):
## input
# the default random generating function
rmixed_payment_no <- function(n, claim_size, claim_size_benchmark_1, claim_size_benchmark_2) {
  # construct the range indicators
  test_1 <- (claim_size_benchmark_1 < claim_size & claim_size <= claim_size_benchmark_2)
  test_2 <- (claim_size > claim_size_benchmark_2)
  # if claim_size <= claim_size_benchmark_1
  no_pmt <- sample(c(1, 2), size = n, replace = T, prob = c(1/2, 1/2))
  # if claim_size is between the two benchmark values
  no_pmt[test_1] <- sample(c(2, 3), size = sum(test_1), replace = T, prob = c(1/3, 2/3))
  # if claim_size > claim_size_benchmark_2
  no_pmt_mean <- pmin(8, 4 + log(claim_size/claim_size_benchmark_2))
  prob <- 1 / (no_pmt_mean - 3)
  no_pmt[test_2] <- stats::rgeom(n = sum(test_2), prob = prob[test_2]) + 4
  no_pmt
}Since the random function directly takes claim_size as
an input, no additional parameterisation is required (unlike in Example 3.1, where we first need a
paramfun that turns the claim_size into
Weibull parameters). We can simply run claim_payment_no()
without inputting a paramfun.
## output
no_payments <- claim_payment_no(n_vector, claim_sizes, rfun = rmixed_payment_no,
                                claim_size_benchmark_1 = 0.0375 * ref_claim,
                                claim_size_benchmark_2 = 0.075 * ref_claim)
no_payments[[1]]
#>  [1]  4  4  4  4  6  1  2  5 18  2  1  9  9  5 17  2  9  2  3  4  6  7  4  2  6
#> [26]  8  2  4  7  5  8  4  4  5  3  4  4  8  4  6  2  5  5  3  5 13  7  4  3  1
#> [51]  3  4  3  9  1  4  4  5 11  6  4  5  2  4  7  5  4  8  4  8  6  5  6 12  1
#> [76]  4  5  5  4  1 10  4  5  1  4  7  4  4  4  8Note that the claim_size_benchmark_1 and
claim_size_benchmark_2 are passed on to
rmixed_payment_no and will not be required if we choose an
alternative sampling distribution.
This mixture sampling distribution has been included as the default. There is no need to reproduce the above code if the user is happy with this default distribution. A simple equivalent to the above code is just
no_payments <- claim_payment_no(n_vector, claim_sizes)This example is here only to demonstrate how the default function operates. If one would like to keep the structure of this function but modify the benchmark values, they may do so via
no_payments_tmp <- claim_payment_no(n_vector, claim_sizes,
                                    claim_size_benchmark_2 = 0.1 * ref_claim)Suppose we want to use a zero truncated Poisson distribution instead,
with the rate parameter as a function of claim_size:
## input
paymentNo_param <- function(claim_size) {
  no_pmt_mean <- pmax(4, pmin(8, 4 + log(claim_size / 15000)))
  c(lambda = no_pmt_mean - 3)
}
## output
no_payments_pois <- claim_payment_no(
  n_vector, claim_sizes, rfun = actuar::rztpois, paramfun = paymentNo_param)
table(unlist(no_payments_pois))
#> 
#>   1   2   3   4   5   6   7   8   9  10  11  12  13  14 
#> 927 864 646 496 301 198 101  56  19   8   2   3   2   1We can use the following code to create a claims dataset containing all individual claims features that we have simulated so far:
claim_dataset <- generate_claim_dataset(
  frequency_vector = n_vector,
  occurrence_list = occurrence_times,
  claim_size_list = claim_sizes,
  notification_list = notidel,
  settlement_list = setldel,
  no_payments_list = no_payments
)
str(claim_dataset)
#> 'data.frame':    3624 obs. of  7 variables:
#>  $ claim_no         : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.121 0.222 0.454 0.591 ...
#>  $ claim_size       : num  36900 141850 16603 15224 817939 ...
#>  $ notidel          : num  1.49 1.08 1.48 2.19 1.38 ...
#>  $ setldel          : num  8.06 15.81 14.26 4.55 16.48 ...
#>  $ no_payment       : num  4 4 4 4 6 1 2 5 18 2 ...test_claim_dataset, included as part of the package, is
an example dataset of individual claims features resulting from a
specific run with the default assumptions.
str(test_claim_dataset)
#> 'data.frame':    3624 obs. of  7 variables:
#>  $ claim_no         : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.121 0.222 0.454 0.591 ...
#>  $ claim_size       : num  785871 22562 215771 117654 31627 ...
#>  $ notidel          : num  0.0652 1.1772 2.5262 0.9262 1.6507 ...
#>  $ setldel          : num  18.23 2.33 34 11.98 11.81 ...
#>  $ no_payment       : num  6 4 11 6 4 12 1 9 2 5 ...The default function samples the sizes of partial payments conditional on the number of partial payments, and the size of the claim:
## input
rmixed_payment_size <- function(n, claim_size) {
  # n = number of simulations, here n should be the number of partial payments
  if (n >= 4) {
    # 1) Simulate the "complement" of the proportion of total claim size 
    #    represented by the last two payments
    p_mean <- 1 - min(0.95, 0.75 + 0.04*log(claim_size/(0.10 * ref_claim)))
    p_CV <- 0.20
    p_parameters <- get_Beta_parameters(target_mean = p_mean, target_cv = p_CV)
    last_two_pmts_complement <- stats::rbeta(
      1, shape1 = p_parameters[1], shape2 = p_parameters[2])
    last_two_pmts <- 1 - last_two_pmts_complement
    
    # 2) Simulate the proportion of last_two_pmts paid in the second last payment
    q_mean <- 0.9
    q_CV <- 0.03
    q_parameters <- get_Beta_parameters(target_mean = q_mean, target_cv = q_CV)
    q <- stats::rbeta(1, shape1 = q_parameters[1], shape2 = q_parameters[2])
    # 3) Calculate the respective proportions of claim amount paid in the 
    #    last 2 payments
    p_second_last <- q * last_two_pmts
    p_last <- (1-q) * last_two_pmts
    # 4) Simulate the "unnormalised" proportions of claim amount paid 
    #    in the first (m - 2) payments
    p_unnorm_mean <- last_two_pmts_complement/(n - 2)
    p_unnorm_CV <- 0.10
    p_unnorm_parameters <- get_Beta_parameters(
      target_mean = p_unnorm_mean, target_cv = p_unnorm_CV)
    amt <- stats::rbeta(
      n - 2, shape1 = p_unnorm_parameters[1], shape2 = p_unnorm_parameters[2])
    # 5) Normalise the proportions simulated in step 4
    amt <- last_two_pmts_complement * (amt/sum(amt))
    # 6) Attach the last 2 proportions, p_second_last and p_last
    amt <- append(amt, c(p_second_last, p_last))
    # 7) Multiply by claim_size to obtain the actual payment amounts
    amt <- claim_size * amt
    
  } else if (n == 2 | n == 3) {
    p_unnorm_mean <- 1/n
    p_unnorm_CV <- 0.10
    p_unnorm_parameters <- get_Beta_parameters(
      target_mean = p_unnorm_mean, target_cv = p_unnorm_CV)
    amt <- stats::rbeta(
      n, shape1 = p_unnorm_parameters[1], shape2 = p_unnorm_parameters[2])
    # Normalise the proportions and multiply by claim_size to obtain the actual payment amounts
    amt <- claim_size * amt/sum(amt)
  } else {
    # when there is a single payment
    amt <- claim_size
  }
  return(amt)
}
## output
payment_sizes <- claim_payment_size(n_vector, claim_sizes, no_payments,
                                    rfun = rmixed_payment_size)
payment_sizes[[1]][[1]]
#> [1]  3318.366  2995.088 27335.431  3250.976As this is the default random generation function that
SynthETIC adopts, a shorter equivalent command would be to
call claim_payment_no without specifying a
rfun.
payment_sizes <- claim_payment_size(n_vector, claim_sizes, no_payments)Let’s consider a simplistic example where we assume the partial payment sizes are (stochastically) equal. This will result in the following simulation function:
## input
unif_payment_size <- function(n, claim_size) {
  prop <- runif(n)
  prop.normalised <- prop / sum(prop)
  
  return(claim_size * prop.normalised)
}
## output
# note that we don't need to specify a paramfun as rfun is directly a function
# of claim_size
payment_sizes_unif <- claim_payment_size(n_vector, claim_sizes, no_payments,
                                         rfun = unif_payment_size)
payment_sizes_unif[[1]][[1]]
#> [1]  8200.060  1533.751 20451.872  6714.178The simulation of the inter-partial delays is almost identical to that of partial payment sizes, except that it also depends on the claim settlement delay - the inter-partial delays should add up to the settlement delay.
Other than this, the SynthETIC function implementation
of claim_payment_delay() is almost the same as
claim_payment_size(), but of course, with a different
default simulation function:
## input
r_pmtdel <- function(n, claim_size, setldel, setldel_mean) {
  result <- c(rep(NA, n))
  # First simulate the unnormalised values of d, sampled from a Weibull distribution
  if (n >= 4) {
    # 1) Simulate the last payment delay
    unnorm_d_mean <- (1 / 4) / time_unit
    unnorm_d_cv <- 0.20
    parameters <- get_Weibull_parameters(target_mean = unnorm_d_mean, target_cv = unnorm_d_cv)
    result[n] <- stats::rweibull(1, shape = parameters[1], scale = parameters[2])
    # 2) Simulate all the other payment delays
    for (i in 1:(n - 1)) {
      unnorm_d_mean <- setldel_mean / n
      unnorm_d_cv <- 0.35
      parameters <- get_Weibull_parameters(target_mean = unnorm_d_mean, target_cv = unnorm_d_cv)
      result[i] <- stats::rweibull(1, shape = parameters[1], scale = parameters[2])
    }
  } else {
    for (i in 1:n) {
      unnorm_d_mean <- setldel_mean / n
      unnorm_d_cv <- 0.35
      parameters <- get_Weibull_parameters(target_mean = unnorm_d_mean, target_cv = unnorm_d_cv)
      result[i] <- stats::rweibull(1, shape = parameters[1], scale = parameters[2])
    }
  }
  # Normalise d such that sum(inter-partial delays) = settlement delay
  # To make sure that the pmtdels add up exactly to setldel, we treat the last one separately
  result[1:n-1] <- (setldel/sum(result)) * result[1:n-1]
  result[n] <- setldel - sum(result[1:n-1])
  return(result)
}
param_pmtdel <- function(claim_size, setldel, occurrence_period) {
  # mean settlement delay
  if (claim_size < (0.10 * ref_claim) & occurrence_period >= 21) {
    a <- min(0.85, 0.65 + 0.02 * (occurrence_period - 21))
  } else {
    a <- max(0.85, 1 - 0.0075 * occurrence_period)
  }
  mean_quarter <- a * min(25, max(1, 6 + 4*log(claim_size/(0.10 * ref_claim))))
  target_mean <- mean_quarter / 4 / time_unit
  c(claim_size = claim_size,
    setldel = setldel,
    setldel_mean = target_mean)
}
## output
payment_delays <- claim_payment_delay(
  n_vector, claim_sizes, no_payments, setldel,
  rfun = r_pmtdel, paramfun = param_pmtdel,
  occurrence_period = rep(1:I, times = n_vector))
# payment times on a continuous time scale
payment_times <- claim_payment_time(n_vector, occurrence_times, notidel, payment_delays)
# payment times in periods
payment_periods <- claim_payment_time(n_vector, occurrence_times, notidel, payment_delays,
                                      discrete = TRUE)
cbind(payment_delays[[1]][[1]], payment_times[[1]][[1]], payment_periods[[1]][[1]])
#>          [,1]      [,2] [,3]
#> [1,] 2.343763  4.456503    5
#> [2,] 3.153731  7.610234    8
#> [3,] 1.299679  8.909913    9
#> [4,] 1.260205 10.170118   11base_inflation_past =
vector of historic quarterly inflation rates for the
past \(I\) periods,
base_inflation_future = vector of expected
quarterly base inflation rates for the next \(I\) periods (users may also choose to
simulate the future inflation rates); the lengths of the vector might
differ from \(I\) when a
time_unit different from calendar quarter is usedclaim_payment_inflation)SI_occurrence = function of
occurrence_time and claim_size that outputs
the superimposed inflation index with respect to the occurrence time of
the claimSI_payment = function of
payment_time and claim_size that outputs the
superimposed inflation index with respect to payment time of the
claim# Base inflation: a vector of quarterly rates
# In this demo we set base inflation to be at 2% p.a. constant for both past and future
# Users can choose to randominise the future rates if they wish
demo_rate <- (1 + 0.02)^(1/4) - 1
base_inflation_past <- rep(demo_rate, times = 40)
base_inflation_future <- rep(demo_rate, times = 40)
base_inflation_vector <- c(base_inflation_past, base_inflation_future)
# Superimposed inflation:
# 1) With respect to occurrence "time" (continuous scale)
SI_occurrence <- function(occurrence_time, claim_size) {
  if (occurrence_time <= 20 / 4 / time_unit) {1}
  else {1 - 0.4*max(0, 1 - claim_size/(0.25 * ref_claim))}
}
# 2) With respect to payment "time" (continuous scale)
# -> compounding by user-defined time unit
SI_payment <- function(payment_time, claim_size) {
  period_rate <- (1 + 0.30)^(time_unit) - 1
  beta <- period_rate * max(0, 1 - claim_size/ref_claim)
  (1 + beta)^payment_time
}# shorter equivalent code:
# payment_inflated <- claim_payment_inflation(
#   n_vector, payment_sizes, payment_times, occurrence_times, claim_sizes, 
#   base_inflation_vector)
payment_inflated <- claim_payment_inflation(
  n_vector,
  payment_sizes,
  payment_times,
  occurrence_times,
  claim_sizes,
  base_inflation_vector,
  SI_occurrence,
  SI_payment
)
cbind(payment_sizes[[1]][[1]], payment_inflated[[1]][[1]])
#>           [,1]      [,2]
#> [1,]  3318.366  4311.707
#> [2,]  2995.088  4683.972
#> [3,] 27335.431 46142.064
#> [4,]  3250.976  5909.406Use the following code to create a transactions dataset containing full information of all the partial payments made.
# construct a "claims" object to store all the simulated quantities
all_claims <- claims(
  frequency_vector = n_vector,
  occurrence_list = occurrence_times,
  claim_size_list = claim_sizes,
  notification_list = notidel,
  settlement_list = setldel,
  no_payments_list = no_payments,
  payment_size_list = payment_sizes,
  payment_delay_list = payment_delays,
  payment_time_list = payment_times,
  payment_inflated_list = payment_inflated
)
transaction_dataset <- generate_transaction_dataset(
  all_claims,
  adjust = FALSE # to keep the original (potentially out-of-bound) simulated payment times
)
str(transaction_dataset)
#> 'data.frame':    19240 obs. of  12 variables:
#>  $ claim_no         : int  1 1 1 1 2 2 2 2 3 3 ...
#>  $ pmt_no           : num  1 2 3 4 1 2 3 4 1 2 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.624 0.624 0.624 0.121 ...
#>  $ claim_size       : num  36900 36900 36900 36900 141850 ...
#>  $ notidel          : num  1.49 1.49 1.49 1.49 1.08 ...
#>  $ setldel          : num  8.06 8.06 8.06 8.06 15.81 ...
#>  $ payment_time     : num  4.46 7.61 8.91 10.17 7.5 ...
#>  $ payment_period   : num  5 8 9 11 8 14 16 18 8 12 ...
#>  $ payment_size     : num  3318 2995 27335 3251 12109 ...
#>  $ payment_inflated : num  4312 4684 46142 5909 14547 ...
#>  $ payment_delay    : num  2.34 3.15 1.3 1.26 6.3 ...test_transaction_dataset, included as part of the
package, is an example dataset showing full information of the claims
features at a transaction/payment level, generated by a specific
SynthETIC run with the default assumptions.
str(test_transaction_dataset)
#> 'data.frame':    18983 obs. of  12 variables:
#>  $ claim_no         : int  1 1 1 1 1 1 2 2 2 2 ...
#>  $ pmt_no           : num  1 2 3 4 5 6 1 2 3 4 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.624 0.624 0.624 0.624 ...
#>  $ claim_size       : num  785871 785871 785871 785871 785871 ...
#>  $ notidel          : num  0.0652 0.0652 0.0652 0.0652 0.0652 ...
#>  $ setldel          : num  18.2 18.2 18.2 18.2 18.2 ...
#>  $ payment_time     : num  4.2 7.1 11.2 14.4 18.5 ...
#>  $ payment_period   : num  5 8 12 15 19 19 3 3 4 4 ...
#>  $ payment_size     : num  25105 26177 26333 26341 592457 ...
#>  $ payment_inflated : num  25632 27113 27829 28294 649128 ...
#>  $ payment_delay    : num  3.51 2.9 4.06 3.29 4.01 ...SynthETIC includes an output function which summarises
the claim payments by occurrence and development periods. The usage of
the function takes the form
claim_output(
  frequency_vector = ,
  payment_time_list = ,
  payment_size_list = ,
  aggregate_level = 1,
  incremental = TRUE,
  future = TRUE,
  adjust = TRUE
)Note that by default, we aggregate all out-of-bound transactions into
the maximum development period. But if we set
adjust = FALSE, then the function would produce a separate
“tail” column to represent all payments beyond the maximum development
period (see function documentation ?claim_output).
Examples:
# 1. Constant dollar value INCREMENTAL triangle
output <- claim_output(n_vector, payment_times, payment_sizes,
                       incremental = TRUE)
# 2. Constant dollar value CUMULATIVE triangle
output_cum <- claim_output(n_vector, payment_times, payment_sizes,
                           incremental = FALSE)
# 3. Actual (i.e. inflated) INCREMENTAL triangle
output_actual <- claim_output(n_vector, payment_times, payment_inflated,
                              incremental = TRUE)
# 4. Actual (i.e. inflated) CUMULATIVE triangle
output_actual_cum <- claim_output(n_vector, payment_times, payment_inflated,
                                  incremental = FALSE)
# Aggregate at a yearly level
claim_output(n_vector, payment_times, payment_sizes, aggregate_level = 4)
#>            DP1      DP2      DP3      DP4      DP5      DP6     DP7     DP8
#> AP1  5441334.8  7961965  8715133  7757804  9368717 10077602 4536992 2958126
#> AP2  1223856.8  7661077 11470102 12295572 10019224  6405349 3870569 3058368
#> AP3  1337059.0  7729829 10415393  9184967  7911014  9851800 3730634 2746250
#> AP4  1408416.9  7366584 10022456  6492512  7686760  6511769 7019836 3380195
#> AP5  1043099.4 11165081 12245396 11447788  9258850  9778165 2918579 1086486
#> AP6  1606167.1 12189927 11810369 11403546  7517649  5248115 6715757 3177059
#> AP7  1026119.1  8268436 11545276 14881340  6348613  8700601 2047984 1282147
#> AP8  1480645.9  9127262 10456350  9814104  5770773  5112508 4721472 3394193
#> AP9   809770.8  9050236 11270231 11843245  8720327  8765637 4359670 3399178
#> AP10 1326387.9  6884813 13336114 13163692 11176762  4296762 5584352 4501879
#>            DP9      DP10
#> AP1  2506372.9 4229404.4
#> AP2  1502956.5 6137834.8
#> AP3  3607883.7 4862099.2
#> AP4  1480462.8 2316440.9
#> AP5  2263033.7  977759.8
#> AP6  1832934.7 4770802.6
#> AP7   181066.7  626389.1
#> AP8  2830044.7 2375556.8
#> AP9  1001578.7 1971019.0
#> AP10  811270.6  332693.7Note that by setting future = FALSE we can obtain the
upper left part of the triangle (i.e. only the past claim payments). The
past data can then be used to perform chain-ladder reserving
analysis:
# output the past cumulative triangle
cumtri <- claim_output(n_vector, payment_times, payment_sizes,
                       aggregate_level = 4, incremental = FALSE, future = FALSE)
# calculate the age to age factors
selected <- vector()
J <- nrow(cumtri)
for (i in 1:(J - 1)) {
  # use volume weighted age to age factors
  selected[i] <- sum(cumtri[, (i + 1)], na.rm = TRUE) / sum(cumtri[1:(J - i), i], na.rm = TRUE)
}
# complete the triangle
CL_prediction <- cumtri
for (i in 2:J) {
  for (j in (J - i + 2):J) {
    CL_prediction[i, j] <- CL_prediction[i, j - 1] * selected[j - 1]
  }
}
CL_prediction
#>            DP1      DP2      DP3      DP4      DP5      DP6      DP7      DP8
#> AP1  5441334.8 13403300 22118433 29876237 39244954 49322556 53859548 56817674
#> AP2  1223856.8  8884934 20355036 32650607 42669831 49075181 52945750 56004118
#> AP3  1337059.0  9066887 19482280 28667248 36578262 46430063 50160696 52906947
#> AP4  1408416.9  8775001 18797457 25289969 32976730 39488499 46508335 49104697
#> AP5  1043099.4 12208180 24453576 35901365 45160214 54938379 60648732 64034493
#> AP6  1606167.1 13796094 25606463 37010009 44527658 54180191 59811737 63150772
#> AP7  1026119.1  9294555 20839831 35721171 45483850 55343663 61096141 64506879
#> AP8  1480645.9 10607908 21064257 31268170 39813834 48444522 53479896 56465452
#> AP9   809770.8  9860006 19793773 29382239 37412474 45522604 50254272 53059754
#> AP10 1326387.9  8272148 16606178 24650515 31387558 38191631 42161309 44514996
#>           DP9     DP10
#> AP1  59324047 63553451
#> AP2  57507075 61606941
#> AP3  54787092 58693042
#> AP4  50849723 54474965
#> AP5  66310076 71037537
#> AP6  65394951 70057169
#> AP7  66799250 71561585
#> AP8  58472055 62640718
#> AP9  54945330 58862561
#> AP10 46096918 49383318We observe that the chain-ladder analysis performs very poorly on this simulated claim dataset. This is perhaps unsurprising in view of the data features and the extent to which they breach chain ladder assumptions. Data sets such as this are useful for testing models that endeavour to represent data outside the scope of the chain-ladder.
Note that by default, similar to the case of
claim_output and claim_payment_inflation, we
will truncate the claims development such that payments that were
projected to fall out of the maximum development period are forced to be
paid at the exact end of the maximum development period allowed. This
convention will cause some concentration of transactions at the end of
development period \(I\) (shown as a
surge in claims in the \(I\)th
period).
Users can set adjust = FALSE to see the “true” picture
of claims development without such artificial adjustment. If the plots
look significantly different, this indicates to the user that the user’s
selection of lag parameters (notification and/or settlement delays) is
not well matched to the maximum number of development periods allowed,
and consideration might be given to changing one or the other.
plot(test_claims_object)# compare with the "full complete picture"
plot(test_claims_object, adjust = FALSE)# plot by occurrence and development years
plot(test_claims_object, by_year = TRUE)Once all the input parameters have been set up, we can repeat the
simulation process as many times as desired through a for loop. The code
below saves the transaction dataset generated by each simulation run as
a component of results_all.
times <- 100
results_all <- vector("list")
for (i in 1:times) {
  # Module 1: Claim occurrence
  n_vector <- claim_frequency(I, E, lambda)
  occurrence_times <- claim_occurrence(n_vector)
  # Module 2: Claim size
  claim_sizes <- claim_size(n_vector, S_df, type = "p", range = c(0, 1e24))
  # Module 3: Claim notification
  notidel <- claim_notification(n_vector, claim_sizes, paramfun = notidel_param)
  # Module 4: Claim settlement
  setldel <- claim_closure(n_vector, claim_sizes, paramfun = setldel_param)
  # Module 5: Claim payment count
  no_payments <- claim_payment_no(n_vector, claim_sizes, rfun = rmixed_payment_no,
                                  claim_size_benchmark_1 = 0.0375 * ref_claim,
                                  claim_size_benchmark_2 = 0.075 * ref_claim)
  # Module 6: Claim payment size
  payment_sizes <- claim_payment_size(n_vector, claim_sizes, no_payments, 
                                      rfun = rmixed_payment_size)
  # Module 7: Claim payment time
  payment_delays <- claim_payment_delay(n_vector, claim_sizes, no_payments, setldel,
                                        rfun = r_pmtdel, paramfun = param_pmtdel,
                                        occurrence_period = rep(1:I, times = n_vector))
  payment_times <- claim_payment_time(n_vector, occurrence_times, notidel, payment_delays)
  # Module 8: Claim inflation
  payment_inflated <- claim_payment_inflation(
    n_vector, payment_sizes, payment_times, occurrence_times,
    claim_sizes, base_inflation_vector, SI_occurrence, SI_payment)
  
  results_all[[i]] <- generate_transaction_dataset(
    claims(
      frequency_vector = n_vector,
      occurrence_list = occurrence_times,
      claim_size_list = claim_sizes,
      notification_list = notidel,
      settlement_list = setldel,
      no_payments_list = no_payments,
      payment_size_list = payment_sizes,
      payment_delay_list = payment_delays,
      payment_time_list = payment_times,
      payment_inflated_list = payment_inflated),
    # adjust = FALSE to retain the original simulated times
    adjust = FALSE)
}What if we are interested in seeing the average claims development
over a large number of simulation runs? The plot.claims
function in this package at present only works for a single
claims object so we need to come up with a way to combine
the claims objects generated by each run. A much simpler
alternative would be to just increase the exposure rates and plot the
resulting claims object. This has the same effect as
averaging over a large number of simulation runs.
This long-run average of claims development offers insights into the effects of the distributional assumptions that users have made throughout the way, and hence the reasonableness of such choices.
The code below runs only for 10 simulations and we can already see
the trend emerging, which matches with the result of our single
simulation run above. Increasing times to run simulation
will show a smoother trend, which we refrain from producing here because
running simulation on this amount of data takes some time (100
simulations take around 10 minutes on a quad-core machine). We remark
that the major simulation lags are caused by the
claim_payment_delay and (less severely)
claim_payment_size functions.
start.time <- proc.time()
times <- 10
# increase exposure to E*times to get the same results as the aggregation of
# multiple simulation runs
n_vector <- claim_frequency(I, E = E * times, lambda)
occurrence_times <- claim_occurrence(n_vector)
claim_sizes <- claim_size(n_vector)
notidel <- claim_notification(n_vector, claim_sizes, paramfun = notidel_param)
setldel <- claim_closure(n_vector, claim_sizes, paramfun = setldel_param)
no_payments <- claim_payment_no(n_vector, claim_sizes, rfun = rmixed_payment_no,
                                claim_size_benchmark_1 = 0.0375 * ref_claim,
                                claim_size_benchmark_2 = 0.075 * ref_claim)
payment_sizes <- claim_payment_size(n_vector, claim_sizes, no_payments, rmixed_payment_size)
payment_delays <- claim_payment_delay(n_vector, claim_sizes, no_payments, setldel,
                                      rfun = r_pmtdel, paramfun = param_pmtdel,
                                      occurrence_period = rep(1:I, times = n_vector))
payment_times <- claim_payment_time(n_vector, occurrence_times, notidel, payment_delays)
payment_inflated <- claim_payment_inflation(
  n_vector, payment_sizes, payment_times, occurrence_times,
  claim_sizes, base_inflation_vector, SI_occurrence, SI_payment)
all_claims <- claims(
  frequency_vector = n_vector,
  occurrence_list = occurrence_times,
  claim_size_list = claim_sizes,
  notification_list = notidel,
  settlement_list = setldel,
  no_payments_list = no_payments,
  payment_size_list = payment_sizes,
  payment_delay_list = payment_delays,
  payment_time_list = payment_times,
  payment_inflated_list = payment_inflated
)
plot(all_claims, adjust = FALSE) +
  ggplot2::labs(subtitle = paste("With", times, "simulations"))proc.time() - start.time
#>    user  system elapsed 
#>  24.353   0.346  25.194Users can also choose to plot by occurrence year, or remove the
inflation by altering the arguments by_year and
inflated in
plot(claims, by_year = , inflated = , adjust = )These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.