In silico validation of the Autoinflammatory Disease Damage Index

Introduction: Autoinflammatory diseases can cause irreversible tissue damage due to systemic inflammation. Recently, the Autoinflammatory Disease Damage Index (ADDI) was developed. The ADDI is the first instrument to quantify damage in familial Mediterranean fever, cryopyrin-associated periodic syndromes, mevalonate kinase deficiency and tumour necrosis factor receptor-associated periodic syndrome. The aim of this study was to validate this tool for its intended use in a clinical/research setting.
Methods: The ADDI was scored on paper clinical cases by at least three physicians per case, independently of each other. Face and content validity were assessed by requesting comments on the ADDI. Reliability was tested by calculating the intraclass correlation coefficient (ICC) using an ‘observer-nested-within-subject’ design. Construct validity was determined by correlating the ADDI score to the Physician Global Assessment (PGA) of damage and disease activity. Redundancy of individual items was determined with Cronbach’s alpha.
Results: The ADDI was validated on a total of 110 paper clinical cases by 37 experts in autoinflammatory diseases. This yielded an ICC of 0.84 (95% CI 0.78 to 0.89). The ADDI score correlated strongly with PGA-damage (r=0.92, 95% CI 0.88 to 0.95) and was not strongly influenced by disease activity (r=0.395, 95% CI 0.21 to 0.55). After comments from disease experts, some item definitions were refined. The interitem correlation in all different categories was lower than 0.7, indicating that there was no redundancy between individual damage items.
Conclusion: The ADDI is a reliable and valid instrument to quantify damage in individual patients and can be used to compare disease outcomes in clinical studies.


INTRODUCTION
Autoinflammatory diseases (AIDs) are characterised by seemingly unprovoked, recurrent episodes of inflammation caused by activation of the innate immune system. The four most common monogenic AIDs are cryopyrin-associated periodic syndromes (CAPS), tumour necrosis factor receptor-associated periodic syndrome (TRAPS), mevalonate kinase deficiency (MKD) and familial Mediterranean fever (FMF). 1,2 Chronic inflammation in AIDs may cause irreversible damage in multiple organ systems, such as visual loss, deafness, joint restriction and amyloidosis. 3 Even though targeted therapy for these AIDs has become available, 4-6 permanent damage may still accumulate before diagnosis or start of therapy. Furthermore, most studies on new biological therapies for AIDs were initiated recently and have limited follow-up; hence, the capacity of these drugs to prevent or halt the development of damage is not yet known. 3,7,8 The Autoinflammatory Disease Damage Index (ADDI) has been developed to enable assessment of the long-term burden of AIDs in a standardised manner, as a comprehensive tool measuring damage in patients with AIDs. 5 Although developed for the four main monogenic AIDs, the ADDI may potentially also be useful in other diseases with autoinflammatory features. 9,10 To properly validate a damage index such as the ADDI, several aspects are important: reliability, content validity, face validity, criterion validity and construct validity. 11 A reliable index means that for a given patient, different observers will give the same score; this can be assessed by calculating the interobserver variability (intraclass correlation coefficient, ICC). Content validity tests whether the content of the index truthfully reflects the subject the index applies to. Face validity is the subjective impression whether a test measures the intended phenomenon. Criterion validity tests whether an index is as good as the gold standard.
Construct validity consists of convergent and discriminant validity: convergent validity determines whether an index correlates to a similar index (eg, whether the ADDI correlates to other indices of damage or impairments in daily living), whereas discriminant validity determines whether the index is different from a dissimilar index (eg, the ADDI should not correlate with indices of disease activity).
Continuously during development and validation of the ADDI, content validity, face validity and adherence to the OMERACT principles (truth, discrimination and feasibility) were assessed. [12][13][14] As a reference standard for disease damage in AIDs is lacking, criterion validity cannot be determined. Physician Global Assessment (PGA) of damage can be considered the best alternative for a gold standard, but it is not a validated measure. Therefore, we decided to use the PGA-damage to assess construct validity rather than criterion validity. Hence, in this study we aimed to investigate the reliability and construct validity, using paper clinical cases of patients with FMF, CAPS, TRAPS and MKD, designed to ensure that all the damage items were adequately covered.

METHODS
Development of the validation plan
Together with an experienced methodologist (HvS), a validation plan was developed. Paper clinical cases were based on real patient data, but modified to protect patient privacy and to ensure that all damage items would be sufficiently represented and different degrees of damage could be tested. Using a pilot with a limited number of cases and expert participants, a preliminary ICC was determined and the final number of cases required for the validation was calculated. All expert physicians who participated in the development of the ADDI (top 40 enrollers in the Eurofever Registry and nine experts from the Americas) were invited to participate in the validation process. One expert involved in the development of paper cases (JF) did not take part in the scoring.

Development of the cases
The cases for validation of the ADDI were derived from anonymised clinical data of patients with confirmed FMF, CAPS, TRAPS and MKD included in the European-based online Eurofever Registry. 15,16 All physicians involved in the Eurofever project (Executive Agency for Health and Consumers, Project No 2007332) were asked to complete follow-up data on patients they had entered in the registry. The registry collects detailed information on all potential organ involvement as well as general features of AIDs. To cover a wide case mix, expert physicians from the Americas were asked to submit their anonymous patient data using a preformed template. The patient information retrieved from the Eurofever Registry and American cases served as a resource for paper clinical case scenarios. Cases were modified to ensure that each ADDI item was represented at least four times. Care was taken to provide a similar number of cases for each disease and to include cases with different grades of disease activity and damage. All paper cases were checked for comprehensiveness and realistic character by one expert (JF).

Case distribution
The case summaries were distributed via a web-based survey, in which experts completed the ADDI, estimated the degree of disease damage and disease activity using a 10-point PGA-damage and PGA-activity, respectively, and could provide comments. The distribution of cases followed the 'observer-nested-within-subject' design, meaning that a large group of experts all scored a subset of the cases. 11 Each group of four experts scored 10 cases; a minimum of three doctors per group was needed to calculate the ICC. Additional experts were asked to complete the survey when necessary. An equal proportion of adult and paediatric physicians was ensured in each participant group. Furthermore, each group contained four doctors from different countries and centres.
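As an illustration of this design, the sketch below splits a list of cases into groups of 10 and assigns four experts to each group, then keeps only the groups in which at least three experts responded. The sequential assignment and the names `build_groups` and `usable_groups` are assumptions made for the example; the balancing of adult and paediatric physicians and of countries described above is not modelled.

```python
def build_groups(case_ids, expert_ids, cases_per_group=10, experts_per_group=4):
    """Partition cases and experts into rating groups (sequential assignment, an assumption)."""
    n_groups, remainder = divmod(len(case_ids), cases_per_group)
    if remainder != 0:
        raise ValueError("number of cases must be a multiple of cases_per_group")
    if n_groups * experts_per_group > len(expert_ids):
        raise ValueError("not enough experts for the requested groups")
    groups = []
    for g in range(n_groups):
        groups.append({
            "cases": case_ids[g * cases_per_group:(g + 1) * cases_per_group],
            "experts": expert_ids[g * experts_per_group:(g + 1) * experts_per_group],
        })
    return groups


def usable_groups(groups, responders, min_raters=3):
    """Keep groups in which at least min_raters of the assigned experts responded."""
    return [
        g for g in groups
        if sum(1 for e in g["experts"] if e in responders) >= min_raters
    ]


# Hypothetical identifiers: 110 cases and 44 invited experts give 11 groups.
groups = build_groups(list(range(110)), list(range(44)))
responders = set(range(44)) - {0, 1}  # two non-responders in the first group
analysable = usable_groups(groups, responders)
```

With these hypothetical inputs, one group falls below the three-rater minimum and its 10 cases drop out of the analysis.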

Definition of damage
Damage is defined as persistent or irreversible change in structure or function, which is present for at least 6 months. Damage items should not be scored if they are attributed to ongoing disease activity. Damage may be the result of prior disease activity, complications of therapy or comorbid conditions that developed after the onset of AID signs and symptoms. If damage has been present for longer than 6 months, but later resolves, it should still be scored in order to capture the damage that was present in the individual for that time period. This definition can be found within the ADDI and in earlier versions of the damage index. 12

Statistical analysis
Statistical analyses were performed in IBM SPSS Statistics V.21. The total score of the ADDI is the sum of points given for all categories. The ICC was determined to assess the reliability of the damage index as a whole, as well as for the eight categories and all individual items. The ICC determined absolute agreement, that is, whether two different physicians give the exact same score for a given patient, and considered single measures, indicating reliability of a single observer. 11 The ICC was also assessed for the PGA-damage and the PGA-activity, in order to determine whether these measurements would be sufficiently reliable to test construct validity. An ICC of 0.8 or higher was considered indicative of excellent reliability. 11,17 Cronbach's alpha was used to determine possible redundancy of different items (eg, whether two items would score the same damage). An interitem correlation of more than 0.7 was considered to indicate redundancy. 18 A Spearman rank test was used to assess discriminant and convergent validity, correlating the ADDI to PGA-activity and PGA-damage, respectively. A Spearman rank correlation of r=0.1-0.3 was considered weak, r=0.3-0.5 moderate and r>0.5 strong. 19
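As an illustration of the reliability analysis, the sketch below computes a single-measures ICC under a one-way random-effects model, a common formulation when each case is rated by a different set of observers. The text states only that absolute agreement and single measures were evaluated in SPSS, so the choice of model, the function name `icc_oneway` and the input layout (one list of scores per case) are assumptions.

```python
from statistics import mean


def icc_oneway(scores_per_case):
    """One-way random-effects ICC, single measures.

    scores_per_case: list of lists, one inner list of rater scores per case.
    Unequal numbers of raters per case are handled with an adjusted rater count k0.
    """
    n_cases = len(scores_per_case)
    sizes = [len(s) for s in scores_per_case]
    n_scores = sum(sizes)
    grand_mean = sum(sum(s) for s in scores_per_case) / n_scores

    # Between-case and within-case sums of squares from a one-way ANOVA.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in scores_per_case)
    ss_within = sum((x - mean(s)) ** 2 for s in scores_per_case for x in s)

    ms_between = ss_between / (n_cases - 1)
    ms_within = ss_within / (n_scores - n_cases)

    # Adjusted number of raters per case (equals k when every case has k raters).
    k0 = (n_scores - sum(k * k for k in sizes) / n_scores) / (n_cases - 1)

    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical scores: three cases, each rated by three observers.
example_icc = icc_oneway([[2, 2, 3], [7, 8, 7], [12, 11, 12]])
```

When all observers agree exactly on every case, the within-case mean square is zero and the coefficient equals 1.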

Discussion on the items and definitions
A small team (NMtH, ALJvD, JF) discussed all items with an ICC below 0.7. This discussion encompassed possible explanations for a low score (eg, unclear definition of an item or the lack of a growth chart hampering easy scoring of growth failure). Further, based on experts' comments and suggestions during the scoring, possibilities to improve the item and/or definition were discussed. The initial and refined items were proposed to all experts via a web-based survey and subsequently discussed in an open face-to-face meeting at the Paediatric Rheumatology Congress in Athens (PReS 2017). Consensus was considered achieved if more than 70% of experts agreed.

RESULTS
Pilot
A pilot study with 15 paper cases was completed by four experts. This yielded a preliminary ICC of 0.85 (95% CI 0.70 to 0.94), which implied that a minimum of 90 cases would be needed for the validation of the ADDI. We therefore decided to assign 110 cases to the experts.

Collection of cases
A total of 120 patients whose follow-up had been documented in the Eurofever Registry were identified, and an additional 20 cases were submitted by non-European experts. By selecting and combining case information, a total of 110 cases were compiled from these 140 cases. The final paper clinical cases included 29 patients with CAPS, 27 with TRAPS, 29 with FMF and 25 with MKD.

Validation
In total, 37 of 44 participants responded. In 10 of the 11 groups at least three participants responded, which led to 100 cases that could be used for the analyses. Because of insufficient response in the remaining group, its 10 cases could not be used. Each item received a non-zero score (indicating presence of that item) at least 18 times.

Intraclass correlation coefficient
The ICC of the ADDI was 0.84 (95% CI 0.78 to 0.89). This indicates good inter-rater reliability. The ICCs per disease, for different organ systems and the individual damage items are shown in table 1. The highest ICC was found for the item 'hearing loss' (0.86, 95% CI 0.81 to 0.90), exceeding the overall ICC; the lowest ICC was found for the item 'puberty delay' (0.29, 95% CI 0.16 to 0.43).

Construct validity
The ICCs of PGA-damage (0.75, 95% CI 0.67 to 0.81) and PGA-activity (0.62, 95% CI 0.52 to 0.71) were considered sufficiently reliable to determine construct validity. A strong relation was found between the score of the ADDI and PGA-damage (Spearman's r=0.92, 95% CI 0.88 to 0.95, p<0.001, see figure 1). This correlation coefficient indicates that an increase in the ADDI score is strongly associated with an increase in the total estimated damage. The relation between disease activity (PGA-activity) and the ADDI score was much weaker (Spearman's r=0.40, 95% CI 0.21 to 0.55, p<0.001, see figure 2), indicating that the ADDI is not primarily driven by disease activity.
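The correlation analysis and its interpretation can be sketched as follows, using a Spearman rank correlation with average ranks for ties and the strength thresholds given in the statistical analysis section. The function names, the handling of values that fall exactly on a threshold and the label 'negligible' for values below 0.1 are assumptions made for the example.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


def strength(r):
    """Label a coefficient with the thresholds used in this study."""
    a = abs(r)
    if a > 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "negligible"  # label below 0.1 is an assumption
```

Under these thresholds, the reported coefficients of 0.92 (PGA-damage) and 0.40 (PGA-activity) are labelled strong and moderate, respectively.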

Interitem correlation
In order to assess whether items had too much overlap, interitem correlation was determined using Cronbach's alpha. Of specific interest was the interitem correlation between cognitive impairment (mainly relating to adult patients or adolescents) and developmental delay (mainly relating to paediatric patients), as the experts worried that these might have too much overlap. The interitem correlation between cognitive impairment and developmental delay was 0.66, indicating that there was minimal redundancy. All interitem correlation matrices can be found in online supplementary table 1a-e.
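A redundancy check of this kind can be sketched by computing pairwise correlations between item scores across cases and flagging pairs above the 0.7 threshold. The sketch uses Pearson correlations as one plausible choice; the text does not specify which coefficient the SPSS interitem matrix contained, and the item names and scores shown are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either item has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def redundant_pairs(item_scores, threshold=0.7):
    """Return (item_a, item_b, r) for item pairs correlating above the threshold.

    item_scores: dict mapping an item name to a list of scores, one per case.
    """
    flagged = []
    for a, b in combinations(sorted(item_scores), 2):
        r = pearson(item_scores[a], item_scores[b])
        if r > threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical item scores over four cases.
scores = {
    "item_a": [0, 1, 0, 1],
    "item_b": [0, 1, 0, 1],  # identical to item_a, so fully redundant
    "item_c": [1, 1, 0, 0],
}
flagged = redundant_pairs(scores)
```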

Comments from the experts
The ADDI was considered a simple and easily applicable tool. The most important feedback during the survey included comments and uncertainties about scoring, for example, due to limited information in the case description (eg, the lack of growth charts to completely assess growth failure), unclear definitions in the ADDI (eg, whether psychiatric comorbidities are part of the item central nervous system (CNS) involvement), or doubts about the severity of organ involvement (eg, severity of visual loss). A full overview of these comments can be found in online supplementary table 2.
Other important comments comprised item scoring (suggesting a higher/lower weighting), or suggestions to refine item definitions. These suggestions were presented to all participants using an online survey. The results of this survey were subsequently discussed in a face-to-face meeting. Following this meeting, the maximum total score of the category 'reproductive' was limited to 2, in order to reduce sex differences in scoring of this category. Furthermore, slight changes were made in the definitions for growth failure, CNS involvement, joint restriction, puberty delay and serosal scarring (online supplementary table 2). The revised ADDI can be found in table 2.
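The effect of a category cap on the total score can be sketched as below. Only the cap of 2 for the 'reproductive' category and the rule that the total is the sum over categories come from the text; the other category names, the point values and the function name `addi_total` are hypothetical.

```python
# Only the 'reproductive' cap is taken from the text; other caps are not modelled.
CATEGORY_CAPS = {"reproductive": 2}


def addi_total(category_points, caps=CATEGORY_CAPS):
    """Sum the points per category, applying a maximum where a cap is defined."""
    total = 0
    for category, points in category_points.items():
        if points < 0:
            raise ValueError(f"negative score for category {category!r}")
        cap = caps.get(category)
        total += points if cap is None else min(points, cap)
    return total


# Hypothetical patient: 3 raw points in 'reproductive' count as 2 after capping.
example_total = addi_total({"reproductive": 3, "renal": 2, "musculoskeletal": 1})
```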
All items were considered truthful, discriminative and feasible; however, doubts were raised about the reliability and feasibility of the scoring of musculoskeletal pain as there is no objective test to assess this. Despite that, it was considered that this particular item was sufficiently valid and very important to patients; therefore it was kept as part of the ADDI.

DISCUSSION
This validation study demonstrates that the ADDI is a reliable tool to measure damage in the four main monogenic AIDs. Most items were considered clearly defined and easy to score. Further, the ADDI correlated well with the estimated damage and was not strongly influenced by disease activity, indicating good convergent and discriminant validity, respectively. No significant overlap was found between items, therefore all items were included in the final version of the ADDI. Some items were slightly refined, based on comments provided by the clinical experts. This is the first validation of a disease damage index for AIDs. An ICC of 0.84 is comparable to other damage indices for rheumatic diseases, such as the Juvenile Arthritis Damage Index (ICC 0.85-0.97), 20 Localized Scleroderma Skin Damage Index (ICC 0.99), 21 Cutaneous Lupus Erythematosus Disease Area and Severity Index (ICC 0.86), 22 Vasculitis Damage Index (ICC 0.94) 23 and Combined Damage Assessment (ICC 0.78). 23 A key strength of this validation study is the participation of adult and paediatric experts worldwide who all provided patient cases and scored the ADDI. This makes it plausible that the ADDI can be used in clinical settings involving paediatric as well as adult patients with FMF, CAPS, TRAPS or MKD. However, the fact that the AID experts scoring the cases were also involved in the development of the ADDI and the collection of patient information might have resulted in a relatively high ICC. Physicians with less knowledge of the tool or AIDs in general might encounter more difficulties interpreting the damage items and scoring the ADDI.
Another strength of this study is the development of cases, which were based on actual patient data while modifications were made to ensure a sufficient representation of all damage items. The total of 110 cases is a large number for validation, given the rarity of these diseases. However, the lack of validation in a real clinical setting is also a drawback of this study. The modification of cases could have resulted in less realistic scenarios. Additionally, scoring paper cases may be easier than applying ADDI in the clinical setting as all the information is summarised and presented in a uniform way. On the other hand, due to the nature of cases (paper clinical instead of real patients) participants may have interpreted data more ambiguously than they would in real life. Scoring anonymous cases, without knowing the patients or being able to ask additional questions, is probably harder than in daily practice. Indeed, comments of the participants reflected some of these difficulties they experienced when assessing the paper cases.
Some important issues could not yet be addressed due to the design of this validation study. The responsiveness to change, that is, whether accrued damage over time is also reflected in an increasing score of the ADDI in an individual patient, could not be determined. A long-term observational study would be needed to measure responsiveness to change and subsequently assess the minimal clinically important difference of the ADDI. Further, convergent validity of the ADDI should preferentially include correlations with scores on quality of life and functional ability, especially because the damage items in the ADDI had been selected for their impact on patients' lives. As the information about quality of life and functional ability was lacking in the Eurofever Registry, this part of the construct validity was impossible to assess. Moreover, ideally the discriminant validity of the ADDI should be assessed by its correlation to a validated activity index, such as the Auto-Inflammatory Disease Activity Index (AIDAI). 24,25 As we could not derive the AIDAI values from the patient data, we used PGA-activity as a surrogate marker. However, the ICC of PGA-activity was low with a broad CI, meaning that this estimate for activity as provided by the experts was not a very reliable measure. This may be explained by the characteristics of these AIDs, for example, episodes of febrile attacks with symptom-free periods in between. Altogether, a long-term prospective study assessing the ADDI, AIDAI and scores of quality of life and functional ability in patients over time is needed to address the above-mentioned issues.
Besides the strong correlation between the ADDI and PGA-damage, the ADDI also moderately correlated to the PGA of disease activity. For a perfect discriminant validity, there would be no correlation between the ADDI and an activity score. However, in this case some degree of correlation is acceptable and probably unavoidable, since patients with more disease activity over the years generally accrue more damage. Furthermore, some items such as hearing loss may (initially) reflect both activity and damage. This overlap is partly prevented by the criterion that an item should be present for at least 6 months to be scored as a damage item. Therefore, disease activity has limited influence on the ADDI score.
Although the overall ICC was >0.8, the ICC of some individual items was less than 0.6. This could be explained by limited information provided in some of the paper cases, less experience of adult rheumatologists with paediatric measures (eg, scoring of pubertal delay) or the more subjective nature of some items (eg, musculoskeletal pain). Indeed, objective items such as hearing loss, renal insufficiency and osteoporosis all had an individual ICC of >0.8. As the overall ICC was good and the nature of the cases may be an important reason for a lower ICC, items scoring less than 0.6 were deemed acceptable, although sometimes with small alterations in the definition. A study testing the ADDI in real-life patients and also by individuals not involved in its development would be needed to overcome the above-mentioned issues.
During the face-to-face meeting, it was suggested to omit musculoskeletal pain from the ADDI, as it seems to be more subjective than the other items. Musculoskeletal pain, and other less objectively scored items such as fatigue and headache, might better be captured by patient-reported outcome measurements (PROM) in addition to the ADDI. A combination with (items from) the juvenile autoinflammatory disease multidimensional assessment report (JAIMAR) is worth considering. 26 However, the JAIMAR is only validated on patients with FMF. Because musculoskeletal pain was emphasised by the patient representatives during the development phase of the ADDI as an important long-term disease burden in their daily activities, it was decided to keep this item in the ADDI, at least until a composite damage assessment including internationally validated PROMs is available.
As we found a relatively high ICC for the PGA-damage among the experts, one could argue that a detailed damage index is not necessary when the PGA is also reliable. However, we would still recommend the use of a damage index. First, the physicians scoring the ADDI were considered experts in the area of AIDs; their estimation of damage might therefore be more accurate than that of physicians with less experience. Second, even though the estimates of PGA-damage might be reliable, an estimate of damage on a numerical scale does not give transparent information on why a certain amount of damage was estimated for a patient. The ADDI provides insight into the reasons why a certain level of damage is scored for a patient. Third, the ADDI provides a useful aide memoire and systematic means of collecting and quantifying damage, which is crucial to enable future comparisons between different studies.
Since damage prevention is one of the main purposes in the anti-inflammatory treatment of AIDs, its reliable assessment is an important measure in clinical practice as well as in therapeutic trials. As more information becomes available for the long-term outcomes of AIDs, the ADDI will have to reflect these in a data-driven manner. So far, it can be considered a reliable tool to assess disease damage for the four most commonly encountered monogenic AIDs.
Figure 2. Correlation of the mean ADDI score and the mean score of activity (PGA-activity) per case, assessed by at least three observers. Each dot represents a patient case. The line indicates the correlation. ADDI, Autoinflammatory Disease Damage Index; PGA, Physician Global Assessment.
The total ADDI score is the sum of the eight categories (maximum 27 points).
Ann Rheum Dis. Author manuscript; available in PMC 2021 September 02.