QALYs and Economic Evaluation


When the quality of life is affected by a medical intervention, it must be included in an economic evaluation. The most common approach is Cost Utility Analysis (CUA), which measures the cost of achieving an additional quality adjusted life year (QALY). All else being equal, interventions with a lower cost per QALY are preferred (although other factors, such as fairness, may also be important). QALYs are calculated by multiplying life years by an index of utility measured on a 0-1 scale. Multi-attribute utility (MAU) instruments are designed to measure utility and so to facilitate the calculation of QALYs. The dimensions of an MAU instrument may also be used to profile the effect of an intervention (ie to describe how the dimensions of QoL vary because of the intervention).


MAU instruments measure 'utility', an index of the strength of a person's preference for a health state. Utility is usually measured on a scale on which zero (0.00) represents death and unity (1.00) represents good health. When the utility of a health state is multiplied by the number of years spent in that state, the result is the number of QALYs.
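The multiplication described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a person's path through different health states is given as a list of (utility, years) pairs; the function name `qalys` and the example figures are hypothetical, and no discounting is applied.

```python
from typing import Iterable, Tuple


def qalys(profile: Iterable[Tuple[float, float]]) -> float:
    """Return QALYs for a sequence of (utility, years) health states.

    Utility is on the scale where 0.0 = death and 1.0 = good health.
    Each state's utility is multiplied by the years spent in it and the
    products are summed. No discounting is applied in this sketch.
    """
    total = 0.0
    for utility, years in profile:
        if years < 0:
            raise ValueError("years must be non-negative")
        total += utility * years
    return total


# Hypothetical person: 2 years at utility 0.8, then 3 years at utility 0.5.
print(qalys([(0.8, 2.0), (0.5, 3.0)]))
```

Here 2 × 0.8 + 3 × 0.5 gives 3.1 QALYs from 5 life years.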

Note that, in this context, 'quality' means 'utility', which equates with the strength of a person's preference.

Also note that while every MAU instrument purports to measure the same quantity, ie ‘utility’, the numbers produced by different instruments actually vary. This makes the choice of instrument important. (See Choice of instrument.)


See above. MAU instruments are also useful in clinical trials where the focus of the study is well defined, such as a program for improving vision. While there is a plethora of disease-specific instruments, the use of a broad-based (multi-attribute) instrument is often desirable as it has the potential to identify unexpected effects of a therapy. In particular, a narrowly focused instrument may fail to detect psychosocial changes which some MAU instruments were designed to measure.

An advantage of an MAU instrument is that it weights each response by the relative importance to the public (the preference weight or utility) of the attribute concerned, which allows a meaningful summation of scores.


No. Economic evaluation, like all summative program evaluation, measures population outcomes: that is, the extent to which a program works. As with other outcome indicators, utility scores are the average utilities obtained from a group of patients, trial participants and/or controls. Typically there is a very large dispersion of individual results around the average score and, consequently, the average QALY value would be a highly unreliable predictor of the utility of any one individual. It follows that QALYs are appropriately used only for assessing the overall impact of a program, not the benefit obtained by any given individual from that program. The purpose of cost utility analysis is to rank programs overall, not to advise clinicians or program managers about individuals.


Yes. This could occur if participants in a study completed the AQoL before and after an intervention; the difference between the two scores would provide a measure of the effect size of the program. Non-utility values are commonly used in the psychometric and medical literature. They are easily calculated from an instrument by assigning numbers (1…5) to the response categories when these are consistently in ascending or descending order of importance. The response numbers for the different questions are simply added and the total rescaled so that the instrument score varies between 0 and 1 or between 0 and 100.
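The unweighted (non-utility) scoring just described can be sketched as follows. This is a minimal illustration, assuming every question has five response categories numbered 1 to 5 in the same direction (higher = better) and that 'rescaled' means a linear min-max rescaling; the function name and the responses are hypothetical. Unlike an MAU instrument, no preference weights are applied.

```python
from typing import Sequence


def rescaled_score(responses: Sequence[int], n_categories: int = 5) -> float:
    """Unweighted psychometric score rescaled to the 0-1 range.

    Each response is a category number from 1 to n_categories, with all
    questions ordered in the same direction (higher = better here).
    """
    if not responses:
        raise ValueError("at least one response is required")
    if any(not 1 <= r <= n_categories for r in responses):
        raise ValueError("response outside the valid category range")
    raw = sum(responses)                      # simple unweighted sum
    lowest = len(responses)                   # every answer in category 1
    highest = len(responses) * n_categories   # every answer in the top category
    return (raw - lowest) / (highest - lowest)


# Hypothetical before/after responses from one study participant.
before = rescaled_score([2, 3, 2])
after = rescaled_score([4, 4, 3])
print(f"before={before:.3f} after={after:.3f} change={after - before:.3f}")
# prints: before=0.333 after=0.667 change=0.333
```

Multiplying by 100 gives the 0-100 form of the same score.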


QALYs are intended to measure benefits, not costs. Typically, cost utility analysis ranks programs by comparing the dollar cost per QALY obtained in the different programs of interest. In principle, a dollar value could be attached to each QALY and dollar costs compared with dollar benefits (thereby converting cost utility analysis into cost benefit analysis). The AQoL does not seek to do this. Cost effectiveness and cost utility analysis were introduced specifically to avoid the need to place a dollar value upon life per se. Some economists do convert life years or lives into dollar benefits using either the 'human capital' approach or 'willingness to pay' (generally for risk reduction). These techniques are problematic.


There is a significant number of problems associated with the use of QALYs. Some of these are conceptual and arise from the fact that the quality and length of life are not the only outcomes from a program; for example, issues of process and the distribution of outcomes are also important. Further, there is a set of issues concerning the accuracy of utility measurement. For these reasons, it is important that the role, strengths and weaknesses of QALYs be understood when evaluation results are interpreted. Despite these difficulties, the majority of health service researchers appear to accept that the measurement of QALYs represents an important development, since QALYs explicitly recognize the importance of health-related quality of life and enable its measurement during program evaluation. The assumptions employed in measuring QALYs are (or should be) transparent and may be subject to sensitivity analysis. However, the most compelling reason for using QALYs is that there is no alternative approach to combining the length and quality of life. Both are potential outcomes from health programs (ie programs may increase both the length of life and the health-related quality of life). When a decision is finally made about whether or not to fund a program, an implicit or explicit importance weight will have been applied to the length and the quality of life. CUA makes these weights explicit.


MAU and other QALY instruments have nothing to do with costs. They help to measure the benefits of health programs by quantifying the quality (utility) of different health states in such a way that the quality and length of life may be combined as quality adjusted life years (QALYs).



Area 7.6 million square km; sheep 68.1 million; people 22.8 million = 98.5 million total


Some view the overall score of an MAU instrument in this way, arguing that sight, pain and mental health cannot be added.

The criticism is invalid. Sight, pain and mental health are not added; rather, it is the preference for them (their value) which is added. Similarly, GDP does not add the number of transport services, holidays taken and commodities sold; rather, it adds the value of these.
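To make the distinction concrete, the sketch below adds the value (here, a disutility) attached to each dimension's level rather than the levels themselves. It assumes a simple additive model; the dimensions, levels and weights are entirely hypothetical and are not those of the AQoL or any other published instrument.

```python
# Hypothetical disutility weights: index 0 = no problem, higher index = worse.
# Illustrative numbers only, not taken from any published instrument.
DISUTILITY = {
    "sight": [0.00, 0.05, 0.15],
    "pain": [0.00, 0.10, 0.30],
    "mental_health": [0.00, 0.08, 0.25],
}


def additive_utility(levels: dict) -> float:
    """Utility = 1 minus the sum of the disutilities (values) of each level.

    The levels of sight, pain and mental health are never added to one
    another; only their preference-based values are.
    """
    return 1.0 - sum(DISUTILITY[dim][level] for dim, level in levels.items())


state = {"sight": 1, "pain": 2, "mental_health": 0}
print(round(additive_utility(state), 2))  # 0.65
```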

No; ideally, economic evaluation incorporating utility measurement should be conducted alongside clinical trials. Economics is concerned with evaluating the outcomes obtained from randomised controlled trials and other forms of experiments, quasi-experiments and pre-experimental research. It complements, and does not replace, other evaluative techniques. The result of an economic evaluation can be no better than the rigor of the clinical/epidemiological evaluation upon which it is based.


The theory underpinning the calculation of the Value of a Statistical Life (VSL) is logically invalid and the results are empirically inconsistent with an individual's ability to pay.

Utility may be measured in numerous ways. Each of these allows people to express the strength of their preference for a health state relative to death and good health. Common techniques are: (a) Rating Scale; (b) Standard Gamble; (c) Time Trade-Off; and (d) Person Trade-Off.

A new technique, the Relative Social Willingness to Pay (RS-WTP), is under development at the Monash CHE (see Richardson et al. (2008) Research Paper 22).

The AQoL instruments employ the Time Trade-Off (TTO), in which a person indicates the proportion of a given number of remaining years of life (usually defined by the interviewer as 10 years) that they would be prepared to give up in order to avoid living in the health state being measured. For example, if a person with a life expectancy of ten years on a dialysis machine was prepared to give up two of these years (that is, 20%) to be in good health, their utility score would be 0.8 (ie 1.0 - 0.2, where 1.0 represents good health). An adjustment may be made to this calculation to allow for a person's rate of time preference.
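The TTO arithmetic in the dialysis example can be written as a small function. This is a sketch under stated assumptions: the 10-year horizon is the default described above, and the optional time-preference adjustment uses constant annual discounting of each life year, which is one common convention and not necessarily the adjustment used for the AQoL. The function names are hypothetical.

```python
def discounted_years(years: float, rate: float) -> float:
    """Present value of `years` of life, each valued at 1, discounted annually at `rate`."""
    if rate == 0.0:
        return years
    return (1.0 - (1.0 + rate) ** -years) / rate


def tto_utility(years_given_up: float, horizon: float = 10.0, rate: float = 0.0) -> float:
    """Better-than-death TTO utility.

    The respondent is indifferent between `horizon` years in the health state
    and (horizon - years_given_up) years in good health, so
    U * PV(horizon) = PV(horizon - years_given_up).
    With rate = 0 this reduces to 1 - years_given_up / horizon.
    """
    if not 0.0 <= years_given_up <= horizon:
        raise ValueError("years_given_up must lie between 0 and the horizon")
    return discounted_years(horizon - years_given_up, rate) / discounted_years(horizon, rate)


print(round(tto_utility(2), 3))             # dialysis example, no discounting: 0.8
print(round(tto_utility(2, rate=0.05), 3))  # with a hypothetical 5% annual rate
```

With discounting, the years given up lie furthest in the future and so carry less weight, which raises the implied utility somewhat above 0.8.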


Firstly, the values may be the basis for cost utility analysis. Here, the utilities obtained from the AQoL are multiplied by the number of years spent in the health state to give the number of quality adjusted life years (QALYs). The additional QALYs arising from a health program are compared with the program costs to give the cost per QALY for the program, which may then be compared with the cost per QALY of other programs. Unless there are other relevant factors, such as social equity, we would normally prefer the programs where the cost per QALY is lowest, as we may thereby obtain the largest number of QALYs (the best outcome) from a given budget. Importantly, cost utility analysis of this type cannot answer the question 'Should we undertake Program X?', as it only enables programs to be ranked by their cost per QALY. Of course, a judgement may be made that the cost of a QALY is so clearly too high, or so low, that a decision about the program is self-evident.
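The ranking step can be sketched as follows. The program names, costs and QALY gains are hypothetical, and each program's cost and QALY gain are assumed to be measured relative to the same comparator. As the text stresses, the ranking alone does not say whether any program should be funded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Program:
    name: str
    cost: float          # additional cost of the program (dollars)
    qalys_gained: float  # additional QALYs attributable to the program

    @property
    def cost_per_qaly(self) -> float:
        if self.qalys_gained <= 0:
            raise ValueError(f"{self.name}: QALY gain must be positive")
        return self.cost / self.qalys_gained


def rank_by_cost_per_qaly(programs: List[Program]) -> List[Program]:
    """Order programs from lowest to highest cost per QALY."""
    return sorted(programs, key=lambda p: p.cost_per_qaly)


programs = [
    Program("A", cost=500_000, qalys_gained=20),  # 25,000 per QALY
    Program("B", cost=300_000, qalys_gained=30),  # 10,000 per QALY
    Program("C", cost=900_000, qalys_gained=18),  # 50,000 per QALY
]
for p in rank_by_cost_per_qaly(programs):
    print(p.name, round(p.cost_per_qaly))
```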

Secondly, the utility values may be compared before and after a program. Where this is done for several programs, the changes in utility can be compared. If one program is superior with respect to all relevant criteria (cost, length of life, quality of life and any other relevant factor), then it dominates the alternatives and should be preferred.

Thirdly, the AQoL can be used in program evaluation to produce a profile of health-related quality of life (HRQoL) as defined by the five different dimensions of health contained within it. Where measurement is made repeatedly, changes in health profiles can be tracked over time, either for each dimension separately or using the overall AQoL utility values.
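The dominance rule described for the second use of the values can be expressed as a simple check. This sketch assumes each program is summarised by three hypothetical criteria: cost, where lower is better, and life years and mean utility, where higher is better; other relevant factors would be added as further entries. Program a dominates b if it is at least as good on every criterion and strictly better on at least one.

```python
from typing import Dict

# +1: higher is better; -1: lower is better. The criteria are illustrative.
CRITERIA = {"cost": -1, "life_years": +1, "utility": +1}


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if program a is no worse than b on every criterion and better on at least one."""
    no_worse = all(sign * a[c] >= sign * b[c] for c, sign in CRITERIA.items())
    better = any(sign * a[c] > sign * b[c] for c, sign in CRITERIA.items())
    return no_worse and better


x = {"cost": 100_000, "life_years": 5.0, "utility": 0.70}
y = {"cost": 120_000, "life_years": 5.0, "utility": 0.60}
z = {"cost": 80_000, "life_years": 5.0, "utility": 0.50}

print(dominates(x, y))                   # True: cheaper and better QoL
print(dominates(x, z), dominates(z, x))  # False False: a genuine trade-off
```

When neither program dominates, as with x and z, the choice requires a judgement about the cost per QALY rather than a dominance argument.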


There are two ways of developing health state scenarios. With the composite approach, health state scenarios, vignettes or complex health state descriptions are constructed and numerical values placed upon them using direct scaling. Under the multi-attribute utility (MAU) approach, health states are decomposed into a generic descriptive system, and a set of scale values is developed corresponding with each possible health state in the descriptive system. As with the scenario-based approach, the values are obtained using one of the standard scaling techniques, viz. rating scales, time trade-off, standard gamble, person trade-off or, most recently at CHE, the relative social willingness to pay. Both approaches have strengths and weaknesses. The composite approach may include more context-specific information and may describe a changing health scenario, including the risk and prognosis facing the patient. However, the validity of such scenarios is seldom (if ever) tested in the way in which the AQoL is being tested. In addition, because the time and cost involved mean that very few health states can be validated in this manner, vignettes are limited in the range of health states covered and are insensitive. The use of vignettes in a series of studies increases the likelihood of comparability in the measurement of HRQoL. Generic instruments, on the other hand, are relatively cheap to administer and can be used across many health conditions; for these reasons, they can be completed by the same group of patients periodically during a longitudinal study, creating a time profile of HRQoL rather than relying upon a single point-in-time estimate.

HRQoL Indices

As noted, there is a large set of possible QALY-like indices. Each of these is defined (inter alia) by the choice of:

(i) the scaling instrument (Time Trade-Off, Standard Gamble, Person Trade-Off, Rating Scale, etc);

(ii) the time frame evaluated (single year; duration of health state);

(iii) the group which judges the health state (general public, patients, potential patients);

(iv) the perspective adopted, social or individual (imagine you are the patient versus imagine you are on a health committee judging social importance); and

(v) the inclusion or exclusion of additional value weights (for example, for age, initial severity or social group).

QALYs and Healthy Year Equivalents (HYEs) have generally been calculated by asking a cross-section of the population to imagine they are patients and to adopt an individual perspective. QALYs (narrowly defined) refer to single-year average utilities measured using the Standard Gamble or Time Trade-Off techniques. HYEs are calculated from the full duration of the health state using the Standard Gamble. Disability Adjusted Life Years (DALYs) have used expert opinion, asking a panel of experts to adopt a societal perspective, with scale values based upon the Person Trade-Off. The choice between these options is not a technical issue: it is a matter of social values. To date, these values have not been well researched, and the metric adopted in different contexts has resulted from the preference or judgement of the proponent rather than from either an empirical survey of population values or an ethical analysis of the full range of options. (Proponents of the HYE argue that it is the measure suggested by (orthodox) economic theory. The claim is controversial, not least because economic theory cannot determine or prescribe social values.)


These are too numerous to summarise here. Despite the answer to the previous question, the team leader in particular has been a critic of both QALYs and economic evaluation. Critiques available on the Monash Centre for Health Economics website (usually drafts of later publications) include the following research papers: 34 (2009); 18 (2007); 8 (2005); 7 (2005); 140 (2003); 134 (2002); 129 (2002); 120 (2001); 112 (2000); 111 (2000); 105 (2000); 108 (1999); 77 (1997); 50 (1995); 45 (1995); 23 (1992); 5 (1990); 1 (1990).


The two approaches are similar in principle but different in practice. Both first describe a health state and then place a numerical value upon it. The holistic approach treats each health state as unique. Typically, people who have experienced the health state are interviewed and the elements of particular importance for their quality of life are summarised in a vignette or written scenario. In a vignette for a CUA of breast cancer treatment, for example, anything relevant to QoL, or which helps describe it, may be included. The vignette is subsequently presented to other individuals for assessment using one of the utility scaling instruments (TTO, etc) and a utility score is placed upon the entire health state. In contrast, and as described earlier, the MAU methodology employs a generic multi-attribute descriptive system, ie a questionnaire. Utility scores are assigned to a health state (ie a combination of attribute levels) using a formula constructed from the utility scores obtained, during the construction of the MAU instrument, from (generally) a cross-section of the population.

DALYs, HYEs and QALYs (narrowly defined) are three of a much larger group of possible metrics which combine life years and health-related quality of life. In principle, any of the metrics in the set could be candidates for use in health services research. Partly for historical reasons, most have not been considered, and only the DALY, HYE and QALY have received individual names. This does not, however, imply that they are fundamentally different in kind or purpose from many of the unnamed alternative metrics.

QALYs: Quality Adjusted Life Years are calculated as life years times an index of utility (strength of preference), where the index varies from 1.0 (full health) to 0.0 (death). The index is measured as the average utility over a twelve-month period in a particular health state and has usually been measured using the Standard Gamble or Time Trade-Off technique.

HYEs: Healthy Year Equivalents are calculated using only the standard gamble technique which its proponents claim to be the theoretically correct scaling instrument (a view which is disputed). An index of utility is calculated for the entire (multi-year) period in a health state using the standard gamble. This is subsequently converted into healthy year equivalents using a second stage standard gamble in which the probability is fixed (and equal to the value found in Stage 1) and the number of years of full health are varied.

DALYs: Disability Adjusted Life Years are calculated from years with disability or poor health, as with QALYs, by multiplying the unadjusted life years by an index of health-related quality of life, where this index refers to a single year in a health state. In this case, the index is calculated from a scale which is calibrated at selected points using the Person Trade-Off technique. As this technique adopts an impersonal or societal perspective, some argue that it does not truly measure (individual) utility. In the WHO (Murray-Lopez) version, life years are also multiplied by an importance weight for people's age; the Australian DALY studies have not used these weights. DALYs have normally been calculated as a loss of utility, which is numerically identical to 1.0 minus the utility of a health state. (This has no substantive significance.) In burden of disease (BoD) studies, DALYs calculated as described above are added to the years of life lost because of a disease to give the total DALY loss.
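The DALY arithmetic described above can be sketched as follows. It is a minimal illustration that omits age weighting (as the Australian studies do) and, as an additional simplifying assumption not stated in the text, omits discounting; the function names and figures are hypothetical.

```python
def years_lived_with_disability(years: float, utility: float) -> float:
    """Loss of utility (1 - utility) multiplied by the years spent in the health state."""
    return years * (1.0 - utility)


def total_daly_loss(years: float, utility: float, years_of_life_lost: float) -> float:
    """Disability component plus years of life lost to premature death."""
    return years_lived_with_disability(years, utility) + years_of_life_lost


# Hypothetical disease: 10 years lived at utility 0.7, then death 5 years early.
print(round(total_daly_loss(10, 0.7, 5), 2))  # 8.0
```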


In addition to the utility instruments, there is a very large number of disease-specific instruments and a smaller number of generic non-utility instruments (the SF36, the Nottingham Health Profile and the Sickness Impact Profile being examples of the latter). Most of these purport to measure health status. We recommend that researchers incorporate all three levels of measurement. Each level provides different information about the effectiveness of an intervention, and the different levels complement each other. Thus evaluation studies should include a disease-specific instrument, one of the generic health-status instruments and a utility instrument. The defining difference between the disease-specific and generic health-status instruments on the one hand, and the utility instruments on the other, is that the latter apply utility weights to the different dimensions of health; these utilities (or disutilities) are then combined using either a summative or a multiplicative model to obtain a single index of HRQoL (or, more accurately, an index of the strength of a person's preference for this health state compared with full health and death). Without the utility weights, the descriptive system of the AQoL could be used (and has been used, Lewis et al 1997) as a generic multi-attribute (psychometric) instrument, where an overall score is obtained by summing the unweighted patient responses. For use as a generic utility instrument, the descriptive system must have certain important characteristics, viz: response categories for each item must be hierarchical (as in a Guttman scale); broad health dimensions must be orthogonal (there must be no double counting of health attributes); and there should be preference independence between dimensions (the preference score for one item or dimension must not depend upon the level of health defined by another dimension in the instrument; for example, preference dependence would occur if the disutility of pain increased when a person's social relationships were poor).
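The difference between a summative and a multiplicative combination of dimension disutilities can be illustrated with a deliberately simplified sketch. It assumes three dimensions with hypothetical disutilities. The multiplicative form used here, in which each dimension removes a proportion of the utility that remains, is a generic illustrative form and not the AQoL's actual scoring algorithm.

```python
from math import prod  # requires Python 3.8+
from typing import Sequence


def summative_utility(disutilities: Sequence[float]) -> float:
    """1 minus the simple sum of dimension disutilities."""
    return 1.0 - sum(disutilities)


def multiplicative_utility(disutilities: Sequence[float]) -> float:
    """Product of (1 - d_i): each dimension removes a share of the utility that remains."""
    return prod(1.0 - d for d in disutilities)


# Hypothetical disutilities for three severely affected dimensions.
d = [0.4, 0.5, 0.3]
print(round(summative_utility(d), 2))       # -0.2: implausibly below death
print(round(multiplicative_utility(d), 2))  # 0.21
```

With mild problems the two forms give similar answers; they diverge as problems accumulate across dimensions, which is one plausible reason a multiplicative model may be preferred.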


The AQoL and other QoL instruments


In principle, each MAU instrument purports to measure the ‘utility’ of a health state; that is, each purports to measure the strength of a person’s preference for that health state. Consequently, the numbers produced by different instruments should be the same. In practice they differ very significantly. Drawing upon results from 7720 respondents, the ‘Multi Instrument Comparison’ (MIC) project has demonstrated that different instruments are sensitive to different dimensions or facets of a health state. The EQ-5D primarily measures physical function and pain. The AQoL-8D largely measures psychosocial facets, to which the EQ-5D is relatively insensitive. The MIC research papers provide pairwise comparisons of all MAU instruments and quantify their responsiveness to different dimensions of QoL (see Richardson, Iezzi, Khan, Maxwell 2012 A cross-national comparison of 12 quality of life instruments, MIC Paper 2: Australia, Research Paper 78, CHE Monash University. Results for the UK, USA, Canada and Norway are in subsequent reports).


The short answer is ‘no’, not if the measurement of QoL (50 percent of the QALY equation) is of importance.

The longest AQoL instrument, the AQoL-8D, takes an average of 5.5 minutes to complete in its online version (some people will, of course, take longer). A common comment is that clinicians are reluctant to include MAU questions in their already large battery of questionnaires. There is, however, some responsibility upon consultant economists to maintain the quality of the advice and service they provide. If an instrument is insensitive to a health intervention, as a number of MAU instruments are to psychosocial interventions, then the ‘price’ of compromising with respect to the instrument may be an invalid evaluation, a high cost per QALY and the failure of the intervention to be funded. There is, in fact, very little evidence of patient resistance to relatively short questionnaires, as demonstrated by the Multi Instrument Comparison (MIC) project, in which 7720 respondents completed 226 questions and an online Self TTO.


In the UK, the National Institute for Health and Clinical Excellence (NICE) has mandated the use of a single instrument, the EQ-5D. The argument has been that the use of a single instrument achieves comparability of measurement. The logic of this argument is unambiguously wrong. Analogously, we would not achieve comparability in the measurement of medical need through the use of a single, insensitive indicator such as blood pressure. On the contrary, the use of a single insensitive instrument ensures discrimination. The EQ-5D primarily measures pain and physical function, and its use for psychological interventions discriminates against these interventions.


Yes. This is one of the strengths of a simple generic instrument. It may be applied weekly, monthly or at any appropriate time interval.


The minimum clinical difference is the clinical or quantitative change in a measure that would typically cause a clinician to change his or her treatment. For a researcher seeking to change practice, a sample size is calculated to enable this difference to be detected with a given statistical power (usually 80 percent) at a conventional level of statistical significance (usually 5 percent).
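The conventional calculation can be sketched with the normal approximation for comparing mean scores between two equal-sized groups with a two-sided test. The standard deviation of 0.2 used below is a hypothetical value; in practice it would come from pilot or published data. The differences 0.03 and 0.075 are the figures discussed in this answer.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(min_difference: float, sd: float,
                power: float = 0.80, alpha: float = 0.05) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / difference)^2,
    rounded up to the next whole participant.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / min_difference) ** 2)


print(n_per_group(0.03, 0.2))   # 698
print(n_per_group(0.075, 0.2))  # 112
```

Because the required sample grows with the inverse square of the difference, detecting a 0.03 change needs roughly six times as many participants as detecting a 0.075 change.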

No exact analogy exists in cost utility analysis as clinicians use clinical, not QoL, indices. For policy makers concerned with cost per QALY the relevant data relates to the best estimate where confidence (for each component of the cost per QALY) increases with the sample size.

Nevertheless there may be a context where a researcher wishes to ensure that a change will improve QoL sufficiently that it will be detected by patients. Drummond (1991) suggests a figure of 0.03 for this purpose. Subsequent research has reported that patients detect a change in their health status when the SF-6D changes by 0.04 or the EQ-5D by 0.075 (Walters and Brazier 2005).

However, it is important to reiterate that these higher figures should not be confused with the minimum changes which are meaningful for QALY calculations. A change in utility of 0.075 has the same impact on QALYs as a 7.5 percent change in the length (quantity) of life.



Drummond M. (1991). ‘Introducing economic and quality of life measures into clinical studies’, Annals of Medicine, Special Edition 33:5, p344-349.

Walters S, Brazier J. (2005). ‘Comparison of the minimally important difference for two health states: EQ-5D and SF-6D’, Quality of Life Research, 14:1523-32.


No. The AQoL assists in the measurement of benefits which are then compared with costs in order to make a decision.


QALYs are designed to measure the average utility of a group of patients or program participants.


See Richardson, McKie, Bariola, (2011) Review and Critique of related multi attribute utility instruments, Research Paper 64, CHE, Monash University (forthcoming in AJ Culyer (ed) Online Encyclopedia of Health Economics, Elsevier Science, San Diego). This paper describes the construction, similarities and dissimilarities between the major instruments.

Also see Richardson, Iezzi, Khan, Maxwell, (2012) A cross-national comparison of 12 quality of life instruments, MIC Paper 2: Australia, Research Paper 78, CHE Monash University. This paper presents results from a comparison of the major instruments using data from 7720 respondents in five countries. A pairwise comparison of instruments is undertaken which quantifies the advantage of each instrument with respect to different dimensions of the quality of life.

In principle the AQoL is similar to other MAU instruments; they all purport to measure the strength of preference for different health states on a 0-1 scale. In practice each of the existing MAU instruments differs in important respects.

Some conceptualise health in terms of disease characteristics: impairment and disability (HUI-I, II and III; 15D; DALY). Others place a heavier emphasis upon handicap, ie an illness-induced lack of capacity to carry out normal social activities (the AQoL, WHOQoL, SF36 and EuroQoL).

Even when broadly conceptualised in the same way, the descriptive systems of different generic instruments vary considerably with respect to the detail with which they describe different health dimensions. Instruments also differ with respect to the scaling (utility scoring) system adopted.

Some are based upon the use of rating scales (15D and QWB); others have used the time-trade-off (the AQoL, EuroQoL and HUI instruments); one has used the person trade-off (the DALY); and one has used magnitude estimation (Rosser-Kind).


Rationale for the AQoL


See Do different instruments give the same answer?

In addition to the AQoL, there are eight other generic utility instruments, either in existence or under development. These are the:

  • Health Utility Index (HUI) Mark I, II, III (developed in Canada);
  • Rosser-Kind index (UK);
  • Quality of Wellbeing (QWB) instrument (USA);
  • 15D (Finland);
  • EuroQoL or EQ-5D (European);
  • SF36 utility adaptation by Brazier, referred to as the SF6D (American/British);
  • World Health Organization/World Bank DALY; and the
  • World Health Organization's WHOQoL, which may eventually receive utility weights.

Several of these are now being calibrated using Australian preference scores. Some of the instruments are seriously compromised by the simplicity of their descriptive systems. The available evidence shows that the differences between instruments are primarily attributable to the questions asked, ie the descriptive system, and that variation in preferences between countries is relatively unimportant (despite the undemonstrated belief that Australians, Americans, the English, etc have major differences in their preferences for pain, happiness, physical dexterity, etc). Researchers must be cautious in their choice of instrument and ensure that the questions asked are sensitive to the health states of importance to them. In sum, the construction of the AQoL was motivated by deficiencies in existing instruments. (See Why AQoL?)


A large number of disease-specific and a smaller number of generic QoL instruments exist (see, for example, Bowling 2001 for a review of over 200 scales in the areas of cancer, mental health, respiratory and neurological conditions, rheumatic, cardiovascular and other diseases). These instruments do not weight the different dimensions of HRQoL by utility, or the strength of people's preferences: the different dimensions are simply added up to obtain an overall score. There is therefore a need for instruments which weight the different dimensions so that they can legitimately be combined. Utility instruments achieve this weighting through the elicitation of preferences for different health states, thus overcoming this criticism.

There are a number of situations in which it is necessary to know whether or not the overall quality of life has improved or deteriorated as the result of a health intervention; this is required for both summative program evaluation and for economic evaluation. With limited budgets we are commonly forced to select between programs. Consequently we must, implicitly or explicitly, compare the total benefits derived from competing programs. This requires an overall assessment of the program benefits; i.e. the derivation of a single index of HRQoL. Although, in principle, detailed utility studies could be carried out using the composite or vignette approach, in practice research budgets are limited and MAU instruments offer a low cost method for obtaining this information. Perhaps most importantly, MAU instruments have now gained world-wide acceptance and are being widely used. It is likely their use will continue to expand. It is therefore important to have instruments which minimise bias and maximise the likelihood of obtaining valid utility scores.


  • The AQoL is the only utility instrument which employed correct psychometric techniques of instrument development in the construction of its descriptive system.
  • The HUI instruments and the AQoL are the only utility instruments using a flexible multiplicative model for combining HRQoL dimensions. Following a subsequent second-stage adaptation of the QoL scores, the AQoL-6D, 7D and 8D have unique scoring algorithms.
  • The AQoL is the only instrument which independently models all the sub-dimensions of health and then combines these sub-models.
  • The AQoL project has undertaken a more exhaustive analysis of the exchange rate between HRQoL and life years than any other reported in the literature. At the time of writing (2009), this enquiry is ongoing.
  • Similarly, the AQoL-8D is undergoing a large-scale validation study comparing its predicted values with the values obtained from other generic instruments and from direct TTO self-assessment. Again, no similar study has been reported in the literature for any other instrument. A preliminary report on this study has been issued and can be found on the CHE website (Hawthorne et al 2000).


Validation and psychometric properties


Validation is an ongoing process of testing an instrument in different ways and in different contexts to determine whether or not the instrument measures what it purports to measure.

Unfortunately, the term validation is commonly misunderstood; many people think it describes whether an instrument is or is not valid. Instruments may be valid in some circumstances but not in others. There are three kinds of evidence suggesting validity.

Content validity: This is defined as how well an instrument's items may be considered a representative sample of the universe which the researcher is trying to measure. In HRQoL measurement this might be the extent to which the items in an instrument cover the full domain of HRQoL; that is, whether or not the instrument includes items which enquire about each of the dimensions of health included in the underlying concept of HRQoL. Where content validity is judged by looking at the instrument, this is described as face validity; although popular, apparent face validity does not confer content validity.

Construct validity: This indicates whether or not instrument items truly represent the underlying construct of interest, ie whether scale scores can be used to infer certain concepts. This implies that the researcher has to define an underlying construct for whatever is being measured; if no adequate construct is defined, the content of the instrument defines the construct being measured. If the construct is correctly represented by the instrument, it is possible to draw a succession of inferences from the instrument concerning instrument scores in different contexts. Construct validity therefore involves an ongoing process of testing the instrument in different contexts.

Criterion validity: This describes the relationship between an instrument's scores and either other independent measures (the criteria) or other specific measures (predictors), where the criteria or predictors are the gold standard for the measurement (e.g. in the case of breast cancer, the gold standard is histopathological confirmation of cancer). In the absence of a gold standard, confidence in criterion validity increases if the instrument is highly correlated with each of the accepted existing instruments.


Both the descriptive system and the utility values associated with health states require validation. More specifically, the descriptive system should be shown to have content, construct and criterion validity; utility scores should correctly reflect the strength of people's preferences for health states; and the model employed to combine the utility scores from different dimensions should also be valid.


Yes. Using the time trade-off (TTO) technique, a person might judge the health state being evaluated to represent very poor health and readily trade off years of life: rather than live for their full 10 remaining years in the poor health state, they would accept 1 year in full health and trade off 9 years.
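In the standard (better-than-death) TTO, the utility of a health state is the ratio of the years in full health to the years in the state at the point where the person is indifferent between them. With the numbers above, 1 year of full health traded against 10 years in the state gives U = 1/10 = 0.10. A minimal Python sketch; the function name is illustrative only.

```python
def tto_utility(years_full_health: float, years_in_state: float) -> float:
    """Better-than-death time trade-off (TTO).

    At the point of indifference, `years_in_state` years in the health state
    are valued the same as `years_full_health` years in full health, so
    U * years_in_state = 1.0 * years_full_health.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must lie between 0 and years_in_state")
    return years_full_health / years_in_state


# The example in the text: 1 year in full health instead of 10 years in the state.
print(tto_utility(1, 10))  # 0.1
```

A state the person would not trade any time to avoid scores 1.0, and one they would trade all 10 years to avoid scores 0.0, i.e. the same as death.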

At such a low number of years it is appropriate to ask whether the person would prefer death to living any time in the health state. When the answer is affirmative, a worse-than-death TTO question is asked: if there were still 10 years of life remaining, what is the maximum time the person would accept living in the poor health state if they knew a cure would then restore them to full health for the remainder of the 10 years?

Placing a numerical value on these states is difficult. If a person refused even one day in the health state followed by 10 years of full health, the implied numerical value of the health state would approach minus infinity. This problem is discussed at length in Richardson and Hawthorne (2001), where various options are considered and their numerical implications demonstrated. The final algorithm used to calculate utilities transforms negative scores so that the lower boundary is U = -0.25; that is, the maximum disutility is 1.25.
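The problem can be put into numbers. In the worse-than-death question the respondent is indifferent between immediate death (utility 0) and x years in the state followed by t - x years in full health, so U * x + (t - x) = 0 and U = -(t - x)/x, which heads towards minus infinity as x shrinks. The sketch below computes that raw value and then applies a bounded rescaling. The rescaling, U/(1 - U) multiplied by 0.25, is a hypothetical stand-in chosen only because it shares the -0.25 floor; the transformation actually used for the AQoL is set out in Richardson and Hawthorne (2001) and is not reproduced here.

```python
def wtd_tto_utility(years_in_state: float, total_years: float = 10.0) -> float:
    """Worse-than-death TTO: indifference between immediate death (utility 0)
    and `years_in_state` years in the state followed by full health for the
    rest of `total_years`, i.e. U * x + 1.0 * (t - x) = 0."""
    x, t = years_in_state, total_years
    if not 0 < x <= t:
        raise ValueError("years_in_state must lie in (0, total_years]")
    return -(t - x) / x


def bounded_negative_utility(u: float, floor: float = -0.25) -> float:
    """HYPOTHETICAL monotone rescaling of negative utilities onto (floor, 0].

    u / (1 - u) maps (-inf, 0] onto (-1, 0]; multiplying by -floor moves the
    lower limit to `floor`. This is NOT the AQoL algorithm, only an illustration.
    """
    if u >= 0:
        return u
    return -floor * u / (1 - u)


one_day = 1 / 365
raw = wtd_tto_utility(one_day)             # about -3649: effectively unbounded
print(raw, bounded_negative_utility(raw))  # the rescaled value lies just above -0.25
```

Any transformation of this kind keeps the ranking of worse-than-death states while stopping a few extreme responses from dominating the average.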

Richardson, J. & Hawthorne, G. (2001). Negative utility scores and evaluating AQoL all worst health state. Working Paper 113. Melbourne, Monash University.

Interview methodology is presented in detail in Iezzi and Richardson (2009), Measuring Quality of Life at the Centre for Health Economics, Melbourne, Monash University.


This involves two separate issues.

  • the validity of the TTO utility measurement technique; and
  • the validity of AQoL scores after dimension utilities have been combined using a multiplicative model.

Validating scaling techniques is problematic, as it is not possible to observe actual trade-offs between the quality and length of life that correspond to the trade-offs measured by the various scaling instruments. Consequently, validity has been judged primarily on face validity. Some have argued that the standard gamble should be regarded as the gold standard for utility measurement because its use assumes the axioms of von Neumann and Morgenstern; this appears to make standard gamble results consistent with mainstream economic theory. However, as these axioms have been shown to be empirically incorrect, we have not adopted this procedure (Pope CHE Working Paper). Rather, for the reasons outlined by Richardson (1994) and Dolan et al (1996), we have accepted the time trade-off as having the greatest prima facie validity.


No. The AQoL is the only instrument whose descriptive system was constructed using correct psychometric principles for instrument construction. Most instruments in the literature claim validation on the basis of a correlation between their results and those from another instrument which has itself been validated (often in the same way!). This type of result is necessary but far from sufficient for confidence in an instrument. A valid instrument must also produce valid utility scores, and the criterion for achieving this is exceedingly stringent: a given percentage increase in the numerical score of the utility index must indicate an increase in the quality of life which is valued equally to an identical percentage increase in the length of life. No instrument has been shown to have this property.

For a discussion of this so-called strong interval property, see Richardson (1990), Cost utility analysis: what should be measured, Working Paper 5. Importantly, an instrument may be valid in one context (disease area, intervention) but not in another.
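Because QALYs are calculated as utility multiplied by years, the arithmetic always scores a given percentage gain in utility and the same percentage gain in length of life as identical. The short sketch below, using a hypothetical state (utility 0.6 for 10 years), makes that explicit. The code cannot show that people actually value the two gains equally; that is precisely what the strong interval property requires of the utility scores.

```python
import math


def qalys(utility: float, years: float) -> float:
    """QALYs: utility index (0 = death, 1 = full health) multiplied by years."""
    return utility * years


def gain_from_quality(utility: float, years: float, pct: float) -> float:
    """QALY gain when utility rises by `pct` (e.g. 0.10 = 10%) over the same years."""
    return qalys(utility * (1 + pct), years) - qalys(utility, years)


def gain_from_length(utility: float, years: float, pct: float) -> float:
    """QALY gain when the years lived in the same state rise by `pct`."""
    return qalys(utility, years * (1 + pct)) - qalys(utility, years)


# Hypothetical state: utility 0.6 for 10 years; 10% more quality vs 10% more life.
q = gain_from_quality(0.6, 10.0, 0.10)
t = gain_from_length(0.6, 10.0, 0.10)
print(math.isclose(q, t))  # True: the arithmetic always scores them equally
```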


This table provides intraclass correlation coefficients for the instrument and its dimensions when the AQoL-8D was re-administered after two weeks and after one month.


Table 3 Test-retest reliability: intraclass correlation coefficients (ICC)



Test-retest reliability coefficients are sourced from Richardson, J. & Iezzi, A. (2011). Psychometric validity and the AQoL-8D multi attribute utility instrument. Research Paper 71, Table 3, p. 13.
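Test-retest coefficients of this kind are intraclass correlations computed from a respondents × occasions table of scores. The form of ICC used in the source paper is not stated here, so the sketch below assumes the common two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss's notation. The five pairs of scores are invented for illustration and are not AQoL results.

```python
def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores[i][j]` is respondent i's score on occasion j (e.g. test, retest).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between respondents
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical utilities for five respondents at baseline and two weeks later.
retest = [[0.81, 0.79], [0.55, 0.60], [0.92, 0.90], [0.40, 0.45], [0.70, 0.68]]
print(round(icc_2_1(retest), 3))
```

The absolute-agreement form penalises a systematic shift between occasions (for example, everyone scoring 0.1 higher at retest), which a plain Pearson correlation would ignore.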


The AQoL-4D describes 1.07 billion health states, a very small subset of the number of health states defined by the AQoL-8D (many of which are somewhat improbable, e.g. being blind, deaf, bedridden, full of energy and in control of your life). These health states cannot all be measured individually and, like all other MAU instruments (except the Rosser-Kind index), the AQoL models utility scores from a limited number of observations. To date, most instruments have adopted an additive model in which the disutility associated with each response to each item is measured independently and the overall disutility is estimated as a weighted average of these disutilities, where the weights are also obtained empirically during the scaling survey. This additive model is probably invalid, and the multiplicative model employed by the HUI and AQoL instruments is superior (Richardson & Hawthorne 1998). However, there is no certainty that even this more flexible model does not introduce significant estimation bias.
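The difference between the two model families can be seen in a small sketch. The weights are hypothetical, and the model is a generic multiplicative (Keeney-Raiffa style) form from multi-attribute utility theory; the AQoL's actual weights, scaling constants and multi-stage structure are not reproduced. The scaling constant k is found by bisection so that the worst level on every dimension gives an overall disutility of exactly 1; when the weights sum to 1 the model reduces to a weighted sum.

```python
import math


def scaling_constant(weights: list[float], tol: float = 1e-12) -> float:
    """Solve 1 + k = prod(1 + k * w_i) for the non-zero root k, so that the
    worst level on every dimension yields an overall disutility of exactly 1."""
    if len(weights) < 2 or not all(0 < w < 1 for w in weights):
        raise ValueError("need at least two weights, each strictly between 0 and 1")
    total = sum(weights)
    if math.isclose(total, 1.0):
        return 0.0  # weights already sum to 1: the model collapses to the additive one

    def f(k: float) -> float:
        return math.prod(1 + k * w for w in weights) - (1 + k)

    if total > 1:
        lo, hi = -1 + 1e-9, -1e-9  # f(lo) > 0 > f(hi): root lies in (-1, 0)
    else:
        lo, hi = 1e-9, 1.0         # f(lo) < 0; widen hi until f(hi) >= 0
        while f(hi) < 0:
            hi *= 2
    while hi - lo > tol:           # plain bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if (f(mid) < 0) == (f(lo) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def multiplicative_disutility(du: list[float], weights: list[float]) -> float:
    """Multiplicative combination of dimension disutilities (0 = none, 1 = worst)."""
    k = scaling_constant(weights)
    if k == 0.0:
        return sum(w * d for w, d in zip(weights, du))
    return (math.prod(1 + k * w * d for w, d in zip(weights, du)) - 1) / k


def additive_disutility(du: list[float], weights: list[float]) -> float:
    """Weighted average of dimension disutilities."""
    return sum(w * d for w, d in zip(weights, du)) / sum(weights)


weights = [0.5, 0.5, 0.5]  # hypothetical importance weights
for profile in ([1, 0, 0], [1, 1, 0], [1, 1, 1]):
    print(profile, round(multiplicative_disutility(profile, weights), 3),
          round(additive_disutility(profile, weights), 3))
```

With these weights, one dimension at its worst level gives an overall disutility of 0.5 under the multiplicative model but only 1/3 under the weighted average, and each further problem adds progressively less: the interaction between dimensions that an additive model cannot represent.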


QoL and health promotion


Yes. For example, a road safety awareness program may prevent injury and death. Since the disutility associated with most injuries can be measured by the AQoL-4D, any reduction in these injuries can be quantified. Similarly, it is likely that the AQoL-4D can measure the disutility associated with illnesses that would occur in the absence of an immunisation program. However, the AQoL-4D is relatively insensitive to changes in health close to full health; the AQoL-6D was constructed to overcome this limitation.
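Once the disutility of an injury is measured, the QALYs gained by prevention follow from simple arithmetic: cases averted × disutility × duration for non-fatal injuries, plus deaths averted × the survivors' utility × life years saved. The sketch below uses invented program numbers and ignores discounting; the function and parameter names are hypothetical.

```python
def prevention_qalys(injuries_averted: float, injury_disutility: float,
                     injury_duration_years: float, deaths_averted: float,
                     survivor_utility: float, life_years_saved: float) -> float:
    """Undiscounted QALYs gained by a prevention program."""
    # Non-fatal injuries: utility lost per case, for as long as the injury lasts.
    morbidity = injuries_averted * injury_disutility * injury_duration_years
    # Deaths averted: years of life saved, weighted by the survivors' utility.
    mortality = deaths_averted * survivor_utility * life_years_saved
    return morbidity + mortality


# Hypothetical road-safety program: 200 injuries averted (disutility 0.2 lasting
# 2 years) and 5 deaths averted (survivors at utility 0.8 for 30 years).
print(prevention_qalys(200, 0.2, 2, 5, 0.8, 30))  # 200.0
```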


Yes. This is generally true for any two QoL instruments, although in principle their utility scores should be similar. For this reason the project team is constructing transformations between instruments.
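One simple way to build such a transformation is to regress one instrument's scores on another's in a sample of respondents who completed both. The sketch below fits an ordinary least-squares line; the project's actual transformations may use different methods, so the choice of OLS is an assumption here, and the paired scores are invented.

```python
def fit_linear_map(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x, where x and y are
    utility scores from two instruments completed by the same respondents."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical paired scores (instrument A, instrument B) for six respondents.
a_scores = [0.35, 0.48, 0.60, 0.71, 0.82, 0.93]
b_scores = [0.52, 0.61, 0.70, 0.76, 0.85, 0.90]
slope, intercept = fit_linear_map(a_scores, b_scores)
predicted_b = [intercept + slope * a for a in a_scores]  # A scores on B's scale
```

When the two instruments are imperfectly correlated, OLS predictions are pulled towards the mean of the target instrument, which is one reason mapping studies sometimes prefer other fitting methods.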


No. Each utility instrument has different items, different dimensions of HRQoL and different scaling properties. Given these differences, there is no reason to expect different instruments to provide comparable utilities. Empirical evidence on this point is presented in Hawthorne et al (2000) and Richardson (2012), which show that very different estimates are obtained from the different utility instruments and that this is a function of the instruments' coverage (see Transformations between instruments).


This depends upon the outcome of the program and the sub-population which receives the benefits of the program. Researcher discretion is needed to assess whether or not the questions in an instrument are likely to detect the changes they anticipate will occur as a result of their program.


The AQoL-6D is able to measure some, but not all, relevant outcomes. Many more may be measured by the AQoL-8D. See the answer to the previous question.


Ethical issues


There are different dimensions of health, and combining them is said to be like combining apples and oranges. However, the index number produced by utility instruments is an index of the strength of preference for different health states. Combining dimensions is therefore analogous to combining the preference for apples with the preference for oranges, and this is a valid procedure.


Yes. The equations are nothing more than a sophisticated form of averaging individual utilities. The simplest form of averaging would be to add up the number of 'yes' answers in a simple yes/no questionnaire and then use a 'mathematical' equation which divides the total number of yeses by the number of people surveyed. With a more sophisticated approach, answers would be multiplied by an 'importance weight' and the 'mathematical equation' would be equivalent to the formula for a weighted average. This is the approach adopted by simple additive MAU models. A third approach combines importance-weighted scores multiplicatively, i.e. scores are multiplied together after adjusting for their importance and the result is then scaled to a 0-1 range. This is what the AQoL and HUI instruments do. The AQoL uses a two-stage procedure in which the multiplicative model is used to combine items within each of the four dimensions and an overarching equation is then used to obtain the final score. It remains true, however, that the multiplicative model imposes a particular structure upon utility values. To increase flexibility, the AQoL-6D, 7D and 8D add a third stage which adjusts scores to better fit independently measured holistic health states. Despite the apparent complexity, these methods all seek to average people's own preferences.
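The first two forms of averaging described above take only a few lines each. The answers and importance weights below are hypothetical; the multiplicative form replaces the weighted sum with a product of importance-adjusted terms, rescaled to the 0-1 range.

```python
def proportion_yes(answers: list[bool]) -> float:
    """Simplest averaging: number of 'yes' answers divided by the number surveyed."""
    return sum(answers) / len(answers)


def weighted_average(scores: list[float], importance: list[float]) -> float:
    """Each answer multiplied by an importance weight, then divided by the total
    weight: the form used by simple additive MAU models."""
    return sum(s * w for s, w in zip(scores, importance)) / sum(importance)


print(proportion_yes([True, False, True, True]))           # 0.75
print(weighted_average([1.0, 0.0, 0.5], [2.0, 1.0, 1.0]))  # 0.625
```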


No. Where individual choice is possible it is clearly better to allow individuals to select their own programs and to state their own preferences. Cost utility analysis is primarily useful for evaluating programs based on the common or average experience. For example, a new technology must either be installed or not installed in a hospital; a new procedure must be either included or not included in the Medicare schedule or in the benefits provided by a private health fund. In these cases it is necessary to make a collective choice, and there is no mechanism for making such choices, other than consensus, which can reflect every person's preference. The ethical strength of the QALY procedure is that the final decision will be based upon public preferences and the strength of these preferences, and not upon a process which disenfranchises those who are affected by the decision.


See previous answer.


No. This argument was effectively used to discredit the prioritisation process adopted in the State of Oregon in the USA. The argument is, however, almost certainly wrong. Benefits in the CUA procedure are derived not from the utility of a person's long-term health state before the procedure (as the argument implies) but from the improvement that results from an intervention. This improvement will generally be the same for the long-term disabled and the rest of the population. Further, when the full range of possible benefits is considered, the potential for health improvement among disabled people is greater, and their potential QALY gain is larger, since it may be possible to cure some disabilities.

In theory, there is one situation in which discrimination may occur, known as the 'double jeopardy' argument. This maintains that saving the life of a long-term disabled citizen, for example a quadriplegic, will result in a smaller QALY gain than saving the life of a person who is not disabled, because the final QALY score for the disabled person will be lower. This theoretical argument has never been investigated, nor has it ever been supported by those constructing QALY instruments. There was an intellectual challenge to reconcile the double jeopardy scenario with the prima facie (but not necessary) need for mathematical consistency in the manipulation of QALY scores. This has been met in the literature; see in particular Nord, Cost Value Analysis. The academic exercise never implied practical ramifications as long as the community rejects discrimination against the disabled. CUA must reflect, and not impose, community values.
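The arithmetic behind these two arguments can be shown side by side. A treatment's QALY gain depends on the change in utility, not on the starting level, whereas a life-saving gain is the survivor's utility multiplied by the years saved, which is the source of the 'double jeopardy' concern. All numbers below are hypothetical.

```python
def treatment_gain(utility_before: float, utility_after: float, years: float) -> float:
    """QALY gain from an intervention that improves utility for `years` years."""
    return (utility_after - utility_before) * years


def life_saving_gain(survivor_utility: float, years_saved: float) -> float:
    """QALY gain from saving a life: the survivor's utility times years of life saved."""
    return survivor_utility * years_saved


# Hypothetical: the same 0.2 improvement for 10 years from different starting points...
print(treatment_gain(0.7, 0.9, 10), treatment_gain(0.4, 0.6, 10))  # both about 2.0
# ...but life saving with 20 years remaining is scored by the survivor's utility level.
print(life_saving_gain(0.9, 20), life_saving_gain(0.4, 20))        # about 18 vs 8
```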


Yes. QALYs measure only two dimensions of outcome; viz. quality and length of life. However, these two dimensions are generally considered to be very important. This does not deny the importance of other process, context and equity issues. At present these factors must remain as 'intangibles' and their importance included subjectively in an overall assessment of a health program.


There has been vigorous debate over this issue in the literature. The AQoL-4D incorporates the utility values of a representative cross-section of the Australian population; in this respect the AQoL has adopted the common practice of using community values. The values used for the AQoL-7D and AQoL-8D also include those of patients with visual and psychiatric illnesses.


The issues raised are not specific to the AQoL. They concern the use of all generic MAU instruments and, more generally, the use of any HRQoL generic or disease specific instrument and the use of any utility scoring.