# Chapter 7: Graduation and statistical models

## Introduction

As we have seen in previous chapters, actuaries need mortality rates to perform actuarial calculations. Crude mortality rates can be obtained from observed data and some analysis, but because of sampling error these rates are unlikely to form a smooth function of age. It is generally accepted that smooth estimated mortality rates should be used in actuarial calculations, so the process of graduation is used to smooth the crude mortality rates.

The first section of the chapter returns to the basic properties of a failure time distribution, including the survivor and hazard functions used throughout this module. The connection between these functions and the two actuarial measures, the rate of mortality and the force of mortality, is reviewed. Smooth functions for these measures can be produced by fitting generalized linear models with Binomial or Poisson errors, respectively. The desirable features of a graduation are discussed and illustrated by an example. Finally, the graduation achieved for this example is assessed.

## 7.1 Review of mortality rates

Survival data methods have been developed to analyze the problems encountered when studying the occurrence of events in time. Several aspects of time may be important: (i) age, (ii) calendar time, (iii) time since first exposure to some influence or treatment. Problems are specified in terms of one time scale, values of which will be denoted by $t$. A random variable denoting the time of occurrence of an event of interest (death) will be denoted by $T$.

### The distribution of the failure time $T$

The survivor function of the failure time distribution is defined by

$$S(t) = P(T \ge t), \tag{5}$$

which lies in the range $[0, 1]$ and is the probability that the failure occurs at or after time $t$. As a function of $t$, the survivor function is non-increasing.

The probability density function (p.d.f.)
of the failure time distribution is defined by

$$f(t) = \lim_{h \to 0} \frac{P(t \le T < t + h)}{h} = -\frac{dS(t)}{dt},$$

which is minus the rate of change of the survivor function.

The hazard function of the failure time distribution (or force of mortality) is defined by

$$\lambda(t) = \lim_{h \to 0} \frac{P(t \le T < t + h \mid T \ge t)}{h} = \frac{f(t)}{S(t)}.$$

The cumulative (integrated) hazard function of the failure time distribution is defined by

$$\Lambda(t) = \int_0^t \lambda(u)\,du = -\log S(t).$$

Let $Y = S(T)$, with $S(t)$ defined as in Equation (5). Then, for a continuous failure time, $Y \sim U(0, 1)$, i.e. $Y$ has a uniform distribution on $(0, 1)$, with probability density function

$$f_Y(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

Let $X = -\log(Y)$. Then $X \sim Ne(1)$, i.e. $X$ has a negative exponential distribution with expectation equal to 1. Since $\Lambda(T) = -\log S(T) = X$, it follows that $\Lambda(T) \sim Ne(1)$, with probability density function

$$f_X(x) = \begin{cases} \exp(-x), & x \ge 0, \\ 0, & \text{otherwise}. \end{cases}$$

Let $W = \log(X)$. Then $W \sim SEV$, i.e. $W$ has a standard extreme value distribution, and hence $\log \Lambda(T) \sim SEV$, with probability density function

$$f_W(w) = \exp(w - \exp(w)), \qquad -\infty < w < \infty.$$

### Models

Covariates are denoted by $z$. These are assumed to be fixed for an individual. A failure time model specifies the failure time distribution conditional upon the covariates using $f(t \mid z; \beta)$ or $S(t \mid z; \beta)$, with parameters $\beta$.

Often it is necessary to relate failure to variables which vary with time. These are called time-dependent covariates. The covariate history at time $t$ is denoted by $z(t)$.
This approach is consistent with that taken in the Cox model of Chapter 3.

A hazard model relates the instantaneous hazard or risk of failure at time $t$ to the covariate history at time $t$, by using $\lambda(t \mid z(t); \beta)$ or $\Lambda(t \mid z(t); \beta)$.

Note: Failure time models may always be represented as hazard models, but hazard models can only make predictions for failure time distributions when (a) the covariates are fixed; or (b) the time variation of $z(t)$ is totally deterministic. When there are stochastic covariates, the failure time distribution cannot be predicted without predicting the covariate process.

### Actuarial measures of mortality

Let $T$ be the age at which a person dies. One actuarial measure of mortality is the probability of death ${}_hq_x$ in the interval $(x, x + h]$ for a person who has attained age $x$ (in years). This is given by

$${}_hq_x = P(x < T \le x + h \mid x < T) = \frac{S(x) - S(x + h)}{S(x)},$$

where $S(t)$ is the survivor function of the distribution of $T$, the age at death. The interval $h$ is often taken as 1, and then the notation $q_x = {}_1q_x$ is used. Then $q_x$ is known as the rate of mortality. Also, $p_x = 1 - q_x$ is the probability that a person aged $x$ is alive after a year.

A second actuarial measure is the force of mortality $\mu_{x+\frac{1}{2}}$ at age $x + \frac{1}{2}$ (in years). This is given by

$$\mu_{x+\frac{1}{2}} = \lambda\left(x + \tfrac{1}{2}\right) = \frac{f\left(x + \tfrac{1}{2}\right)}{S\left(x + \tfrac{1}{2}\right)},$$

where $\lambda(t)$ is the hazard function, $f(t)$ is the probability density function and $S(t)$ is the survivor function of the distribution of $T$, the age at death. For small $h$, ${}_hq_x \approx h\mu_x$.

If $h = 1$ is considered small enough, then the approximation $q_x \approx \mu_{x+\frac{1}{2}}$ can be used. The approximation is improved by using the value of the force of mortality at the centre of the interval, $x + \frac{1}{2}$.

## 7.2 The need for graduation

Crude mortality rates obtained from a mortality investigation are not the rates that are used for actuarial calculations. Actuaries are interested in the values of the probability of death $q_x$ at age $x$ (in years) and the force of mortality $\mu_{x+\frac{1}{2}}$ at age $x + \frac{1}{2}$.
It is generally assumed that these should both be smooth functions of $x$. So it is necessary that the crude rates be smoothed to remove random sampling errors. The process of performing this smoothing is known as graduation. It produces a set of graduated estimates that are a smooth function of age.

### Model assumptions

Assume that mortality data consisting of $E_x$, the number at risk of death for age $x$, and $d_x$, the number of deaths in $(x, x + 1]$, have been collected for ages (in years) $x = x_1, x_2, \ldots, x_m$. Two widely used models are:

1. Poisson model: when this model is used, the force of mortality $\mu_{x+\frac{1}{2}}$ at age $x + \frac{1}{2}$ is the parameter to be estimated. Then $D_x$, the number of deaths for age $x$, is distributed so that $D_x \sim Po(E_x \mu_{x+\frac{1}{2}})$, which is a Poisson distribution with expectation $E_x \mu_{x+\frac{1}{2}}$, where $\mu_{x+\frac{1}{2}}$ is the force of mortality for the age interval $(x, x + 1]$ and $E_x$ is the number at risk of death for age $x$.
2. Binomial model: when this model is used, the rate of mortality (the probability of death) $q_x$ at age $x$ is the parameter to be estimated. Then $D_x$, the number of deaths for age $x$, is distributed so that $D_x \sim B(E_x, q_x)$, which is a Binomial distribution with expectation $E_x q_x$ and variance $E_x q_x (1 - q_x)$, where $q_x$ is the probability of death for the age interval $(x, x + 1]$ and $E_x$ is the number at risk of death for age $x$.

### Standard tables

Published life tables based on large amounts of data are called standard tables. The main examples are:

1. National life tables from the census data and death registration data. These are published every ten years for England and Wales.
2. Tables based on data from life insurance companies. The Continuous Mortality Investigation Bureau obtains data from life insurance companies and publishes tables for different types of business. Tables based on data from 1991-4 are known as the '92 series' tables.

Figure 1: Plot of the estimated probability of death ($q$) against age ($x$).

Example 7.1.
An example of a mortality table is now presented, based on data from the 92 series mortality tables. The data consist of $E_x$, the number at risk of death for age $x$, and $d_x$, the number of deaths in $(x, x + 1]$, for ages from $x = 30$ to $x = 49$ years. A plot of $\hat{q}_x = d_x / E_x$, the estimate of $q_x$, the probability of death for age $(x, x + 1]$, is given in Figure 1.

Graduation aims to produce a set of graduated estimates that are a smooth function of age. If a probability is the quantity to be estimated, the smoothing function must take values in the range $[0, 1]$, as probability is defined on this range. Many functions take values on the whole real line $(-\infty, \infty)$, and to enable these functions to be used in the smoothing process it is more appropriate to consider the log-odds of the probability, $\log(q/(1 - q))$, which maps probabilities in $(0, 1)$ onto $(-\infty, \infty)$. So to find a suitable model the log-odds transformation $\log(\hat{q}/(1 - \hat{q}))$ is applied to the crude rates $\hat{q}$.

A plot of the log-odds is given in Figure 2, which suggests a positive linear relationship with age, as the correlation coefficient is 0.94. A linear model could be fitted using least squares, but a more appropriate model is a logistic regression model with a Binomial error structure. This is the Binomial model given above. Both methods were used, and the resulting graduated estimates of the probability of death are plotted in Figure 3. The Binomial model appears to give the better fit, especially at the higher ages, and as it is the more appropriate model it is the one that will be used.

Figure 2: Plot of log odds of death, $\log(q/(1 - q))$, against age ($x$).

The Binomial model used in Figure 3 has a linear predictor for the log-odds of $q_x$ given by

$$\log\left(\frac{q_x}{1 - q_x}\right) = \beta_0 + \beta_1 x.$$

The models in Figure 3 were fitted using the statistical software package R, which you may be familiar with from MA7403 Statistics.
The matrix `M2` has 2 columns: the first column contains $d_x$ and the second column contains $E_x - d_x$. The column vector `x` contains the ages from $x = 30$ to $x = 49$ years. The output for the Binomial model using the `glm` function is given below.

Figure 3: Plot of graduated estimate of probability of death ($q$) against age ($x$).

    > grada.lg <- glm(M2 ~ x, family = "binomial")
    > summary(grada.lg)

    Call: glm(formula = M2 ~ x, family = "binomial")

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -2.1614  -1.0985  -0.2736   0.9305   2.8601

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept) -11.168106   0.209486  -53.31   <2e-16 ***
    x             0.108128   0.004873   22.19   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 584.089  on 19  degrees of freedom
    Residual deviance:  38.545  on 18  degrees of freedom
    AIC: 162.99

    Number of Fisher Scoring iterations: 4

Hence, using the estimates of $\beta_0$ and $\beta_1$ above, the graduated estimates $\mathring{q}_x$ are given by

$$\log\left(\frac{\mathring{q}_x}{1 - \mathring{q}_x}\right) = -11.168 + 0.108x,$$

or

$$\mathring{q}_x = \frac{1}{1 + \exp(11.168 - 0.108x)}.$$

This model implies that the log-odds increase by about 1 (more precisely, $10 \times 0.108 = 1.08$) over a ten-year interval. This is equivalent to the odds being multiplied by $\exp(1.08) \approx 2.94$ over a ten-year interval. As the probabilities are small, this is approximately true for the probabilities as well, i.e. the probability of death increases by a factor of about 3 over a ten-year interval.

Also, this model is an improvement on the null model with $\beta_1 = 0$, which makes $\mathring{q}_x$ a constant for all $x$ and is equivalent to fitting a horizontal line to the data in Figure 2. This is confirmed by the change in deviance between these two models, which from the above R extract is equal to $584.09 - 38.55 = 545.54$ on 1 degree of freedom (df). This is a significant improvement in the fit (as the 95th percentile of $\chi^2_1$ is 3.84).

The curves in Figure 3 are smooth, but they do not fit the data well at the lower ages. This can be confirmed from the residuals, as there are four outside the range $(-2, 2)$. So, as well as smoothness, adherence to the data, measured through goodness of fit, is required.

## 7.3 Desirable features of graduation

There are three desirable features of a graduation:

1. smoothness;
2. goodness of fit to the data; and
3. appropriateness for purpose.

It is desirable to have a smooth function as a result of the graduation process and to use a parametric model with the fewest parameters. This is known as the principle of parsimony. However, a simple parsimonious model which does not adequately describe the data is not likely to be of great use. So parsimony has to be balanced against the need for an adequate fit to the data.
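Returning to the straight-line model of Example 7.1, its interpretation and the deviance comparison with the null model can be checked numerically. The sketch below is in Python rather than R and uses only the rounded coefficients, the deviances and the $\chi^2_1$ percentile quoted in the text; the function and variable names are hypothetical, and the ages 35 and 45 are picked arbitrarily as a ten-year interval.

```python
import math

# Rounded coefficients of the straight-line model, as quoted in the text.
BETA0 = -11.168
BETA1 = 0.108

# 95th percentile of chi-squared on 1 df, as quoted in the text.
CHI2_1_95 = 3.84


def graduated_q(age: float) -> float:
    """Graduated probability of death from the fitted log-odds line."""
    return 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * age)))


# Change in log-odds and in odds over a ten-year interval.
log_odds_change = 10 * BETA1
odds_ratio = math.exp(log_odds_change)

# Ratio of probabilities over one particular ten-year interval
# (assumption: ages 35 and 45 are chosen purely for illustration).
probability_ratio = graduated_q(45) / graduated_q(35)

# Change in deviance between the null model and the straight-line model.
deviance_drop = 584.089 - 38.545
significant = deviance_drop > CHI2_1_95

print(f"odds multiplier over ten years:        {odds_ratio:.3f}")
print(f"probability multiplier, age 35 to 45:  {probability_ratio:.3f}")
print(f"drop in deviance on 1 df:              {deviance_drop:.3f}")
```

The probability multiplier is slightly smaller than the odds multiplier, which is the sense in which the statement about odds is only approximately true for the probabilities themselves.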
The purpose for which the graduated estimates are to be used should also always be kept in mind.

The usual approach in statistical model building is to start with the simplest model which could be appropriate, add further terms (functions) which significantly improve the fit, and stop when no further significant improvement is obtained. For example, if the log-odds of the probability is being modelled by a polynomial, then higher order terms are added until there is no significant improvement.

Example 7.2. As was seen in Example 7.1, a logistic regression model with Binomial errors and a linear predictor with only a first degree term in age $x$ was not a good fit to the observed data. Hence higher degree polynomials were fitted, starting with a quadratic. This model reduced the deviance to 23.83 on 17 df, a change of 14.72 on 1 df, which gives a significant improvement in fit. A further cubic term was added to the model to see if this would improve the fit. The output from R for this cubic model is given below. It is seen that this model reduced the deviance to 16.15 on 16 df, a change from the quadratic model of 7.68 on 1 df, which again gives a significant improvement in fit. Further quartic and quintic terms were also added, but neither made any significant improvement to the fit, as the changes in deviance were 0.03 and 0.10 respectively, each on 1 df. So the conclusion was that a cubic model should be considered an acceptable model for the data.
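The stepwise comparison in Example 7.2 can be reproduced arithmetically from the figures quoted in the text. The sketch below is in Python rather than R, and the function name `choose_degree` is hypothetical. It assumes a simplified rule that stops at the first term whose drop in deviance does not exceed the 95th percentile of $\chi^2_1$ (3.84), whereas the text also tried the quintic term after the quartic one.

```python
# 95th percentile of chi-squared on 1 df, as quoted in the text.
CHI2_1_95 = 3.84

# Residual deviances quoted for the linear, quadratic and cubic models
# (the cubic value is the one printed in the R output).
residual_deviance = {1: 38.545, 2: 23.83, 3: 16.148}

# Drop in deviance when moving from degree k - 1 to degree k.
drops = {
    degree: residual_deviance[degree - 1] - residual_deviance[degree]
    for degree in (2, 3)
}
# Changes in deviance quoted for the quartic and quintic terms.
drops[4] = 0.03
drops[5] = 0.10


def choose_degree(start_degree: int, drops: dict, critical: float) -> int:
    """Add one polynomial term at a time while each drop is significant."""
    degree = start_degree
    while degree + 1 in drops and drops[degree + 1] > critical:
        degree += 1
    return degree


chosen = choose_degree(1, drops, CHI2_1_95)

for degree in sorted(drops):
    verdict = "significant" if drops[degree] > CHI2_1_95 else "not significant"
    print(f"degree {degree}: drop in deviance {drops[degree]:.2f} ({verdict})")
print(f"chosen degree: {chosen}")
```

Under these assumptions the rule stops at the cubic, in agreement with the conclusion of the example.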
The resulting plot for the graduated estimate of the probability of death is given in Figure 4, which shows a much better fit to the data at the lower ages between 30 and 35.

The Binomial model used in Figure 4 has a linear predictor for the log-odds of $q_x$ given by

$$\log\left(\frac{q_x}{1 - q_x}\right) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3.$$

As $x^2$ and $x^3$ are large numbers, it can be seen below that their coefficients are small but, as is indicated by the standard errors and z values, these coefficients are significantly different from zero. (In the R code, the vectors `x2` and `x3` hold $x^2$ and $x^3$.)

    > grada3.lg <- glm(M2 ~ x + x2 + x3, family = "binomial")
    > summary(grada3.lg)

    Call: glm(formula = M2 ~ x + x2 + x3, family = "binomial")

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.8539  -0.6358  -0.2040   0.4837   2.0221

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept) 25.5464096 11.2332131   2.274  0.02295 *
    x           -2.5522631  0.8572427  -2.977  0.00291 **
    x2           0.0633020  0.0215444   2.938  0.00330 **
    x3          -0.0004956  0.0001785  -2.776  0.00550 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 584.089  on 19  degrees of freedom
    Residual deviance:  16.148  on 16  degrees of freedom
    AIC: 144.59

    Number of Fisher Scoring iterations: 4

Hence, using the estimates of $\beta_0$ to $\beta_3$, the graduated estimates $\mathring{q}_x$ are given by

$$\log\left(\frac{\mathring{q}_x}{1 - \mathring{q}_x}\right) = 25.5464 - 2.5523x + 0.06330x^2 - 0.0004956x^3,$$

or

$$\mathring{q}_x = \frac{1}{1 + \exp(-25.5464 + 2.5523x - 0.06330x^2 + 0.0004956x^3)}.$$

The cubic coefficient is quoted to four significant figures because $x^3$ exceeds 100,000 at the highest ages, so rounding it to 0.0005 would change the log-odds appreciably.

Example 7.3. In Examples 7.1 and 7.2 a logistic regression model, with Binomial errors and a linear predictor given by a polynomial in age $x$, was used to fit a model to the observed data.
Above we introduced the Poisson model, but what would have been the result of using this model? The Binomial distribution with parameters $n$ and $q$ is well approximated by the Poisson distribution if $n$ is large and $q$ is small. As this was the case for the data used in Examples 7.1 and 7.2, the result of using the Poisson model is almost exactly the same as that of using the Binomial model. This is generally the case in most actuarial applications. Hence the results for the Poisson model are not reproduced.

(Note: The results for the Poisson model can be obtained from R by using the `glm` function with `glm(M2[, 1] ~ x + x2 + x3, family = poisson, offset = log(M2[, 1] + M2[, 2]))`.)

Figure 4: Plot of the graduated estimate of the probability of death ($q$) against age ($x$) for the cubic equation Binomial model.

## 7.4 Assessing a graduation

In Section 7.3 three desirable features of a graduation were proposed. The first was smoothness. This can be achieved by using smooth functions in the model for the linear predictor, as was done in Section 7.3. Smoothness on its own is not sufficient. The graduation in Figure 3, using the model with a straight line as the linear predictor, gives a smooth graduation, but it is not a very good fit to the data. A graduation like this would be described as overgraduated. The opposite of overgraduation is undergraduation. It is possible to construct a curve which adheres perfectly to the data. This could have been done in Example 7.2 by fitting a polynomial model of the 19th degree, which would have ensured that the graduated curve passed through all 20 of the $\hat{q}_x$ data points. However, this curve would have been far from smooth.
The 'art' of graduation is to find a satisfactory compromise.

If the model is unknown and it is necessary to check for smoothness, then this can be done by calculating the differences of the graduated estimates $\mathring{q}_x$ up to the third difference, as third differences measure the change in curvature. The criterion for smoothness usually used is that the third differences of $\mathring{q}_x$ should be small in magnitude compared with the graduated estimates and should progress regularly.

The second desirable feature of a graduation was goodness of fit (adherence) to the data. There are a number of ways of assessing this, but the most important is through the inspection of the residuals. For the generalized linear models used in this chapter to produce graduated estimates, the residuals usually calculated are the Pearson and deviance residuals. These residuals are identical in the case of the General Linear Model, with Normal errors, covered in module MA7403 Statistics. For a Generalized Linear Model these residuals can differ, but for the data used in Examples 7.1 to 7.3 there is very little difference. Hence we will use the Pearson residuals. For the Binomial model (and hence, for Examples 7.1 to 7.3, also approximately for the Poisson model) the Pearson residual is given by

$$r_x = \frac{d_x - E_x \mathring{q}_x}{\sqrt{E_x \mathring{q}_x (1 - \mathring{q}_x)}} \approx \frac{d_x - E_x \mathring{q}_x}{\sqrt{E_x \mathring{q}_x}},$$

where $d_x$ is the number of observed deaths for age $x$, $\mathring{q}_x$ is the graduated estimate of the probability of death for age $(x, x + 1]$ and $E_x$ is the number at risk of death for age $x$. The approximation holds because $1 - \mathring{q}_x$ is close to 1 when the probabilities are small.

These residuals can be used as a diagnostic with a variety of plots, including Normal and half-Normal plots, and a number of tests, such as the standardized deviations test, cumulative deviations test, serial correlations test, signs test, changes of sign test and grouping of signs test.

Figure 5: Plot of residuals against age for the cubic equation Binomial model.

The third desirable feature of a graduation was appropriateness for purpose. The suitability of a graduation for practical use depends on the nature of the work.
For instance, in life insurance work, losses result from premature deaths (so that benefits are paid sooner than expected), so mortality should not be underestimated. However, in pensions or annuity work, losses result from delayed deaths (so that benefits are paid for longer than expected), so mortality should not be overestimated.

Example 7.4. In Figure 5 the Pearson residuals are plotted against age for the cubic equation Binomial errors model given in the output in Example 7.2. This plot appears to show a random scatter of points, with no pattern. Only one of the 20 observations, that for age 36, is outside the interval $(-2, 2)$. A Normal probability plot is given in Figure 6. A MINITAB plot has been used, as this was used in MA7403 Statistics. (This plot is preferable to the plot provided by R.) The plot in Figure 6 is consistent with the Normal assumption for the residuals, which is confirmed by the Ryan-Joiner statistic, which has a p-value > 0.1. Hence it is reasonable to conclude that there is good adherence to the data by this cubic model and, as the graduation is smooth, that a satisfactory graduation has been performed. This assumes, of course, that it is fit for purpose!

Figure 6: Normal probability plot of residuals for the cubic equation Binomial model.

## Summary

This chapter has considered the problem of graduation by applying the Generalized Linear Model.

This has been illustrated with the example of a mortality table from the 92 series mortality data.

Both the rate of mortality and the force of mortality could be predicted from a Generalized Linear Model, using the Binomial and Poisson error structures, respectively.
The linear predictor could then be chosen as a smooth function, enabling smooth graduated mortality functions to be produced.

Because the Poisson distribution can be used as an approximation to the Binomial distribution, only the Binomial model was considered, as the Poisson model gives almost identical results.

The fit of the Generalized Linear Models was compared by use of the likelihood ratio (deviance) to obtain the 'best' model, and this model was checked for adherence to the data through the inspection of residuals. This enabled the desirable features of smoothness and goodness of fit for a graduation to be assessed.

## Questions

1. If a random variable $T$ has a probability distribution with cumulative distribution function $F(t) = P(T \le t)$, what is the relation between $F(t)$ and the survivor function $S(t)$? What is the interpretation of the hazard function $\lambda(t)$? Why is $(S(t) - S(t + h))/S(t) \approx \lambda(t)h$ for small $h$?
2. What are the three components of a Generalized Linear Model used to analyze mortality data by modelling the rate of mortality using logistic linear regression?
3. Why would we plot the log odds of the observed mortality rates when attempting to graduate crude mortality rates by age?
4. What test statistic should be used to decide whether an additional term in the linear predictor improves the goodness of fit of a logistic regression model? What distribution is used when discriminating between models in this way?
5. What are the desirable features of a graduation that should be assessed? What is the principal way of assessing whether the graduation adheres to the data? How is this done?

Answers are available on page 124.
