| INTRODUCTION |
|
|
|---|
Worldwide, there is currently no established consensus on the approach to lymphedema staging, and many different systems are still used in both filarial and oncologic lymphedema.10,11 Current research in filarial lymphedema variously uses a three-stage,4,9,12,13 a four-stage,8,14 or the Dreyer seven-stage7 system, which makes cross-referencing difficult. There is a need for an accepted standard staging system if research and public health outcomes are to be comparable. An important step toward acceptance of a staging system is to show its reliability. This paper explores the inter-rater reliability of the Dreyer staging system and a related questionnaire.
The Dreyer staging system was developed in the 1990s over a period of years (see Table 1). It was designed to have implications for treatment and for prognosis.15,16 In contrast to the three- and four-stage systems (which merely classify lymphedema as mild, moderate, severe, or elephantiasis), the Dreyer system emphasizes the use of specific, characteristic clinical manifestations rather than measures of leg volume or a subjective assessment of severity. These criteria classify lymphedema according to key clinical features such as reversibility of swelling, presence of skin folds, and presence of dermatologic changes. It was intended to avoid ambiguous concepts such as "pitting" and "fibrosis," on which even physicians disagree. There is much anecdotal evidence and personal communication (for example, from teams in Brazil, the Dominican Republic, and much of Asia) suggesting that it is a useful and reliable way to stage lymphedema; however, it has not yet been adopted as a standard worldwide. Some health care specialists still recommend use of a three-stage system. This may reflect some concern that Dreyer staging is too complicated and difficult for nonspecialized health workers to use. The latest World Health Organization training book on lymphedema management now presents both systems, having previously advised use of the three-stage system.17 Knowledge of the inter-observer variability between health workers is vital to confirm the reliability of this staging system.
|
| MATERIALS AND METHODS |
|
|
|---|
|
The health workers had been trained over the preceding months. This included attendance at lymphatic filariasis workshops organized by the Ministry of Health and the Pan American Health Organization, and clinical training within the lymphedema clinic. The two health workers were both familiar with the Dreyer staging system and had been using it to assess patients in the clinic; however, there had been no previous assessment of inter-observer agreement. The clinical nature of entry lesions was covered in a skin-care training workshop. This was further reinforced through supervised experience with patients in the clinic.
Inter-rater reliability was measured using random marginal agreement coefficients (RMACs).18 RMACs measure the agreement between the raters while adjusting for chance agreement and can be written in terms of the costs of disagreement as
$$\mathrm{RMAC} = 1 - \frac{\text{true cost of disagreement}}{\text{chance cost of disagreement}}$$
For nominal responses (i.e., those without inherent ordering) the true cost of disagreement is estimated by the proportion of ratings that disagree, and the chance cost is estimated by the proportion of times two randomly sampled ratings disagree, where the sampling is with replacement from the pool of all ratings by either rater. The RMACs are equal to one if there is perfect agreement, zero if the agreement is similar to agreement by chance, and less than zero if the agreement is lower than agreement by chance. For binary responses, the sample RMAC is equivalent to the intraclass kappa,19 which is the proper kappa to use because it avoids the anomalous behavior of Cohen's kappa that can occur when each rater determines a different percentage of subjects as having the measured attribute (i.e., when the marginal distributions differ). Further, the sample RMAC is equivalent to Cohen's kappa when the marginal distributions are equal and requires no more assumptions than Cohen's kappa.18
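The nominal calculation described above can be sketched in a few lines of Python. This is only an illustration of the verbal definition (proportion of disagreeing ratings divided by the chance of disagreement when sampling with replacement from the pooled ratings); the function name `nominal_rmac` and its list-based inputs are assumptions made for this sketch and are not taken from the study's own software.

```python
from collections import Counter


def nominal_rmac(rater_a, rater_b):
    """Sample RMAC for nominal ratings of the same subjects by two raters.

    Hypothetical helper for illustration; inputs are two equal-length lists.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)

    # True cost: proportion of subjects on which the two ratings disagree.
    true_cost = sum(a != b for a, b in zip(rater_a, rater_b)) / n

    # Chance cost: probability that two ratings drawn with replacement
    # from the pool of all 2n ratings fall in different categories.
    pooled = Counter(rater_a) + Counter(rater_b)
    chance_cost = 1.0 - sum((count / (2 * n)) ** 2 for count in pooled.values())
    if chance_cost == 0:
        raise ValueError("all ratings are identical; chance cost is zero")

    return 1.0 - true_cost / chance_cost
```

With two raters who agree on every subject the function returns one, and with ratings that agree no more often than pooled chance it returns zero, matching the interpretation given in the text.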
For ordinal responses, agreement is measured with the sample weighted kappa RMACs using a squared difference cost function, which is nearly identical to the intraclass correlation coefficient when the marginal distributions are equal18 and avoids the anomalous behavior of Cohen's weighted kappa when the marginal distributions are not equal. These sample weighted kappa RMACs account for the ordinal nature of the responses. For example, if two observers rate the same leg as Dreyer stages 2 and 3, this is counted as better agreement than if two observers rate the leg as Dreyer stages 2 and 6. The confidence intervals (CIs) were calculated using asymptotic variance estimates given in Fay.18
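As a rough illustration of the squared difference cost, the sketch below computes a weighted agreement coefficient for ordinal ratings. It assumes the same "one minus true cost over chance cost" form, with the chance cost taken over all pairs drawn with replacement from the pooled ratings; the function name `weighted_rmac` and the example stage values are hypothetical, and the confidence interval calculation is not reproduced here.

```python
def weighted_rmac(rater_a, rater_b):
    """Sample weighted RMAC with a squared difference cost (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)

    # True cost: mean squared difference between the two ratings of a subject.
    true_cost = sum((a - b) ** 2 for a, b in zip(rater_a, rater_b)) / n

    # Chance cost: mean squared difference between two ratings sampled
    # with replacement from the pool of all ratings by either rater.
    pooled = list(rater_a) + list(rater_b)
    chance_cost = sum((x - y) ** 2 for x in pooled for y in pooled) / len(pooled) ** 2
    if chance_cost == 0:
        raise ValueError("all ratings are identical; chance cost is zero")

    return 1.0 - true_cost / chance_cost


# Made-up stage ratings: one leg rated 2 vs 3, or 2 vs 6, by the two observers.
close = weighted_rmac([2, 4, 6, 1], [3, 4, 6, 1])
far = weighted_rmac([2, 4, 6, 1], [6, 4, 6, 1])
```

In this made-up example `close` is larger than `far`, which mirrors the point that a stage 2 versus 3 disagreement is penalized less than a stage 2 versus 6 disagreement.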
Because the sample RMACs are equivalent to Cohen's kappa statistics when marginal distributions of the raters are equal, they can be interpreted in a similar way as kappa statistics. Following Landis and Koch's interpretation of kappa values,20 sample RMACs of greater than 0.81 represent almost perfect agreement, values between 0.61 and 0.8 signify substantial agreement, 0.41–0.6 moderate agreement, 0.21–0.4 fair agreement, and 0–0.2 poor to slight agreement.
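The interpretation bands above can be written as a small lookup, shown here as a sketch. The function name `interpret_agreement` is hypothetical, and the handling of values that fall between the quoted bands (for example, between 0.8 and 0.81) or below zero is an assumption of this sketch rather than something the text specifies.

```python
def interpret_agreement(value):
    """Map a kappa or RMAC value to the descriptive bands quoted in the text.

    Assumption: each band is treated as open at its lower edge (e.g. > 0.8),
    and anything at or below 0.2, including negative values, is labeled
    "poor to slight".
    """
    if value > 0.8:
        return "almost perfect"
    if value > 0.6:
        return "substantial"
    if value > 0.4:
        return "moderate"
    if value > 0.2:
        return "fair"
    return "poor to slight"
```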
| RESULTS |
|
|
|---|
|
Both observers found interdigital lesions present in 9 of the 17 (53%) patients seen (kappa of 1; 100% agreement). Agreement on the number of lesions identified was high but not perfect, with a weighted kappa of 0.84 (95% CI 0.67–1.00).
The nature of the lesions had lower agreement levels, with kappa scores ranging between 0.33 and 0.92. Only "fair" agreement was seen as to the presence of peeling or cracking, and moderate agreement was seen for presence of maceration and odor. However, the color of the lesions had substantial agreement and the presence of reported itch or pain had higher levels of observer agreement, with kappa scores representing substantial and near-perfect agreement (Table 3).
|
| DISCUSSION |
|
|
|---|
Recent work emphasizes the particular importance of interdigital lesions (IDLs) in filarial lymphedema. IDLs are a frequent entry lesion (seen in more than 50% of those with lymphedema in this and recent studies)8,22 and are commonly seen in those presenting with acute dermatolymphangioadenitis (ADLA). Work in Guyana has shown IDLs to be the strongest risk factor for increased frequency of ADLA in the previous year.22 IDLs may also be important in lymphedema development.23 The fact that these lesions are commonly asymptomatic means that diagnosis by health workers is vital for initial identification.8,22 In this study, a high level of agreement between health workers was seen in identifying the presence of interdigital entry lesions; however, there was more inter-observer variability when describing these lesions, with only moderate agreement as to their clinical nature.
Many of the descriptive terms in standard use to describe skin lesions of these types had only moderate levels of inter-observer agreement in this study. This seems to reinforce concerns that such terms remain ambiguous for health workers even after specific training. Previous dermatologic studies on the nature of interdigital lesions have shown that many clinical criteria used by health workers have a low predictive value for presence of infection, particularly for subjective measurements such as odor but also for apparently more objective terms such as skin peeling.24 Likewise, in other skin conditions, for example atopic dermatitis, it has been shown that there can be low agreement among "experts" in clinical skin examination.25
Lymphedema workers in other regions also report difficulty with classifying such lesions. The degree of difficulty may further depend on whether the lesions are examined for cracks with the naked eye or with additional aids, for example a magnifying glass (Dreyer G, personal communication). This emphasizes that caution should be used regarding management decisions based on the presence of these clinical entities. In contrast to the clinical diagnoses of the lesions, however, the symptomatic nature of lesions appeared to be more reliable, with high observer agreement for reported presence of itch and pain. Itch is one factor that has been shown in research to be more closely related to a positive fungal diagnosis.24,26
In this study, the identification of lesions was more reliable than the classification of their nature. Additionally, recent work has found that interdigital entry lesions act as a risk factor for ADLA frequency regardless of their clinical nature or infectious etiology.22 Furthermore, studies on treatment of foot lesions have shown that clinical cure can be achieved with a variety of topical treatments, and that application of unmedicated creams can also offer clinical and mycologic cure.27 Therefore, the subsequent management to resolve entry lesions may be similar for all lesions in this environment.28 For these reasons, it seems adequate and more achievable to concentrate efforts on training health workers to identify, rather than to classify, such risk lesions.
There was high inter-observer agreement in grading lymphedema. This addresses concerns that the Dreyer staging system is too complicated and demonstrates that it can be a reliable system. This would seem to confirm that the effort to avoid ambiguous clinical terms is successful in this grading system. The replicability between health workers in this small study gives some support to the use of this staging system. Note that the clinical questionnaire may have had an effect on the Dreyer staging agreement because both health workers performed the Dreyer staging concurrently with the questionnaire. Further work is needed to determine whether the questionnaire affected the inter-rater reliability measures. Additionally, we note that although kappa statistics (RMAC as well as Cohen's kappa) adjust for chance agreement, there is still some dependence of the kappa values on the distribution of the ratings in the study population.29 The implication for the Dreyer stages is that, for the same proportion of agreement, high kappa values should be easier to show in populations distributed more uniformly across the 8 stages than in the current study and, conversely, harder to show in populations distributed more unevenly across the 8 stages.
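The dependence of kappa-type statistics on the distribution of ratings can be illustrated with a small made-up example. The sketch below uses binary ratings rather than the Dreyer stages to keep the arithmetic short, and the data, the function name `nominal_rmac`, and the two hypothetical populations are all assumptions for illustration only. Both populations have the same proportion of agreement (18 of 20 subjects), yet the coefficient is lower when the ratings are unevenly distributed.

```python
from collections import Counter


def nominal_rmac(rater_a, rater_b):
    """Sample RMAC for nominal ratings (illustrative sketch)."""
    n = len(rater_a)
    true_cost = sum(a != b for a, b in zip(rater_a, rater_b)) / n
    pooled = Counter(rater_a) + Counter(rater_b)
    chance_cost = 1.0 - sum((count / (2 * n)) ** 2 for count in pooled.values())
    return 1.0 - true_cost / chance_cost


def split(pairs):
    """Turn a list of (rating_a, rating_b) pairs into two rating lists."""
    return [a for a, _ in pairs], [b for _, b in pairs]


# Hypothetical population with ratings spread evenly over the two categories.
balanced = [(1, 1)] * 9 + [(0, 0)] * 9 + [(1, 0), (0, 1)]
# Hypothetical population in which almost every subject is rated 0.
skewed = [(0, 0)] * 17 + [(1, 1)] + [(1, 0), (0, 1)]

balanced_rmac = nominal_rmac(*split(balanced))  # 1 - 0.1 / 0.50
skewed_rmac = nominal_rmac(*split(skewed))      # 1 - 0.1 / 0.18
```

Here the observed disagreement is 0.1 in both cases, but the chance cost falls from 0.50 to 0.18 in the skewed population, so the coefficient drops even though the raters agree equally often.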
This study did not address whether the Dreyer disease stages are linked to treatment and prognostic implications.15,16 Some information that helps to validate the Dreyer staging score as a clinically meaningful score is given in a recent publication by some of the current authors.22 That study used data on 73 patients with filarial lymphedema (including some of the 17 patients studied here, but using only the ratings from one of the health workers) and showed significant positive correlations between the Dreyer stage of the worse leg and i) number of episodes of ADLA, ii) number of interdigital lesions, and iii) number of abnormal nails.22 Further work is needed to directly compare the reliability of the Dreyer staging system with the World Health Organization three-stage grading system. Additionally, all these systems need to be validated more fully.
The high agreement between health workers when using a questionnaire seems to indicate a reliable detection of abnormalities. Questionnaires and algorithms are being developed and successfully used for identification and management of a wide range of skin conditions. These are being shown to be valuable in particular in countries where there is limited specialist training in dermatology and limited access to laboratory diagnostic techniques.30,31 This study provides support for the use of a clinical questionnaire to aid health workers when managing lymphedema patients. The current handbook for management of lymphedema patients includes such questionnaires and currently advises management depending on Dreyer stage.15,16 These could be validated and further developed into a management algorithm, offering the potential of significant health benefits in endemic communities.
Received September 23, 2005. Accepted for publication November 8, 2005.
Acknowledgments: The authors thank David Addiss, Gerusa Dreyer, and Tom Nutman for comments on early versions of the manuscript. We are grateful to Brenda Rae Marshall (NIAID intramural editor).
Financial support: This work was supported by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health.
* Address correspondence to Michael P. Fay, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892-7609. E-mail: mfay{at}niaid.nih.gov
Authors' addresses: Tess McPherson, 25 Norham Road, Oxford, OX2 6SF, UK, Telephone: 01865 558743, E-mail: tessmcp{at}hotmail.com. Michael P. Fay, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, Telephone: 301-451-5124, E-mail: mfay{at}niaid.nih.gov. Shanti Singh, Ministry of Health, Georgetown, Guyana. Rebecca Penzer, Independent Nurse Consultant, Opal Skin Solutions, Oxford, UK, Telephone: 44(0)1865 771507, E-mail: rpenzer{at}opalskin.co.uk. Rod Hay, Queen's University, Belfast, Northern Ireland, UK.
Reprint requests: Michael P. Fay, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892-7609, E-mail: mfay{at}niaid.nih.gov.
| REFERENCES |
|
|
|---|
dema in a filariasis-endemic area. Br J Dermatol: in press.