| ABSTRACT |
|
|
|---|
| INTRODUCTION |
|
|
|---|
For clinical assessment, the WHO simplified trachoma grading system (Table 1) is widely used in both research and control programs, even though it was developed only to aid assessment of trachoma by non-specialist personnel.4 The system is believed to have good reproducibility.4,5 However, field assessments are almost impossible to mask. In research, this makes it difficult to exclude the possibility of bias.
|
of 0.71 for TF, 0.74 for TI, and 0.73 for TS.6 No other formal analyses of the reliability of photographs for grading trachoma have been published.
Many subsequent studies have used photographs to validate field data,9–16 to explain positive laboratory results for individuals graded clinically as not having active trachoma,17 or as the single means of assessing clinical status.18–21 In view of this reliance on photographs and their potential application in our own studies, we have undertaken further assessment of the process.
| SUBJECTS AND METHODS |
|
|
|---|
The study took place in Kahe Mpya sub-village in the Rombo District of Tanzania.22 Before commencing fieldwork, the field grader (PAM, an ophthalmic nurse with extensive trachoma field experience) was evaluated against another experienced, validated23 grader (DCWM). Masked to the other's assessment, each independently examined the right eyes of the same fifty 5–7-year-old children. According to the reference grader, the prevalences of TF, TI, and TS were 10/50 (20%), 3/50 (6%), and 4/50 (8%), respectively. Agreement was 100%, 96%, and 96%, giving kappas of 1.00 (perfect agreement), 0.73, and 0.73.
In July 2000, we invited all residents of Kahe Mpya to participate in a longitudinal study.22,24 Clinical grades and photographs used here were obtained at baseline22 before any interventions against trachoma.
The everted right tarsal conjunctiva of each participant was evaluated against the simplified WHO system criteria4 using ×2.5 binocular loupes. Grading was undertaken in sunlight whenever possible; when the conjunctiva was inadequately illuminated, a torch was used.
To increase the likelihood of obtaining at least one satisfactory picture of each eye, we took two photographs of the tarsal conjunctiva of each individual. We used an EOS-300 single lens reflex camera (Canon, Tokyo, Japan), an EF 100 mm f/2.8 macro lens (Canon), a ×2 teleconverter, a Macro-Lite ML-3 ring flash (Canon), and 100 ASA color print film (FujiFilm, Tokyo, Japan). The camera was hand-held at a focal length of approximately 30 cm and manually focused on the central tarsal plate. Aperture was set at f/19. One photographer took all photographs; before this study, he had taken approximately 2,000 conjunctival photographs in trachoma-endemic villages.
Prints (15 cm × 10 cm) were prepared by a professional London photograph laboratory (giving a final magnification of ×5, cf. ×2.5 for field grading), then assessed independently by two ophthalmologists, both of whom were experienced trachoma graders. Photographs were also graded independently by the field grader. Photograph grading was undertaken without magnifying loupes, using the simplified WHO system, and masked to field assessments and assessments of the other photograph graders.
Each set of two photographs (taken of one conjunctiva) was considered a pair, allowing graders to obtain as much visual information as had been recorded for that eye. Because most research studies use photographs to assess for active trachoma, only the signs TF and TI are considered here. For each subject, graders could record grades for these two signs or, if the photographs did not provide sufficient information, record that the two photographs were ungradable. No time limit was imposed.
Data were double-entered into Microsoft (Redmond, WA) Access (2002, SP3) and analyzed using Stata 7 (Stata Corporation, College Station, TX). For primary analyses, when a photograph grader wrote "ungradable" but the comparison grader (in the field or examining photographs) made a diagnosis, this counted as a disagreement. Analyses were repeated with exclusion of ungradable photographs.6
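The two analyses described above can be illustrated with a minimal Python sketch. It is not the Stata code used in the study, and the report does not say how ungradable records were coded; here it is assumed that "ungradable" is treated as a category of its own, so that any pairing of an ungradable record with a diagnosis is a disagreement. The code `"U"`, the function names, and the ten grades are all hypothetical and are not study data.

```python
from collections import Counter

UNGRADABLE = "U"  # hypothetical code for an ungradable photograph pair


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical grades."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed proportion of agreement.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each grader's marginal frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def drop_ungradable(a, b):
    """Keep only the pairs in which neither grader recorded 'ungradable'."""
    pairs = [(x, y) for x, y in zip(a, b) if UNGRADABLE not in (x, y)]
    return [x for x, _ in pairs], [y for _, y in pairs]


# Made-up grades for ten eyes (1 = TF present, 0 = TF absent); not study data.
field = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
photo = [1, "U", 0, 0, 1, 1, 0, "U", 0, 0]

# Primary analysis: ungradable versus a diagnosis counts as a disagreement.
primary = cohen_kappa(field, photo)
# Repeat analysis: ungradable photographs excluded.
secondary = cohen_kappa(*drop_ungradable(field, photo))
print(f"primary kappa = {primary:.2f}, after exclusion = {secondary:.2f}")
```

With these invented grades, excluding the ungradable pairs raises kappa, which is the direction of change the study design anticipates; the size of the change in real data depends on how many photographs are ungradable.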
| RESULTS |
|
|
|---|
Grader A found that 106 (11%) of 948 sets of photographs were inadequate for grading. Graders B and C (the field grader grading the photographs) found 1 set (0.1%) and 35 sets (4%) inadequate, respectively. Based on graded sets only, graders A, B, and C recorded TF prevalences of 11%, 38%, and 9%, respectively, and TI prevalences of 23%, 15% and 14%, respectively.
Inter-observer agreements are shown in Table 2 for the comparison of photographic and field grading. Kappa statistics for agreement between the three photographic graders were 0.32 (95% confidence interval [CI] = 0.29–0.35) for TF and 0.52 (95% CI = 0.49–0.55) for TI or, after exclusion of ungradable photographs, 0.37 (95% CI = 0.33–0.41) for TF and 0.66 (95% CI = 0.62–0.70) for TI.
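The report does not state which multi-rater form of kappa was computed for the three photograph graders. One common generalization to more than two raters is Fleiss' kappa, sketched below in Python as an assumption about the kind of statistic involved rather than a reproduction of the study's analysis. The function name and the grades for six eyes are hypothetical.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of grades per subject,
    with the same number of raters for every subject."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of grades")
    categories = sorted({grade for row in ratings for grade in row})
    # counts[i][j] = number of raters who gave subject i the j-th category.
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Overall proportion of all grades that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    # Per-subject agreement: proportion of rater pairs that agree.
    p_subj = [
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all grades are identical")
    return (p_bar - p_exp) / (1 - p_exp)


# Made-up TF grades (1 = present, 0 = absent) from three graders for six eyes.
example = [[1, 1, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(f"Fleiss' kappa = {fleiss_kappa(example):.2f}")
```

Confidence intervals such as those quoted above need a variance estimate or resampling, which this sketch leaves out.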
|
| DISCUSSION |
|
|
|---|
Examining individuals in a village and reading photographs in an office are very different activities. In the field, there is pressure to maintain high throughput, and many individuals (particularly children) are unable to cooperate fully with the examination process. Conversely, the conjunctivae may be examined from multiple angles and are always in focus; illumination can be adjusted if required. The photograph graders in this study, by contrast, saw only two views of each conjunctiva, and image quality relied on the subject's cooperativeness, the photographer's skill and patience, and the nature of the photographic medium. It is difficult to take clear, close photographs of a small, irregularly curved, reflective, and often camera-shy surface. Furthermore, particularly when thickened and inflamed, the conjunctiva is a three-dimensional structure, and can only be imperfectly represented by a two-dimensional photograph. Although the camera was manually refocused from a slightly different vantage for the two pictures of each eye (in an effort to provide two slightly different views), the amount of information available to the photographic examiners was considerably less than that available to the field examiner.
In the only other published evaluation of imaging for trachoma,6 West and Taylor achieved better agreement than we did. Their study had some limitations. First, the same expert examiner examined subjects clinically and graded the slides. Although masked to the clinical grade, the examiner would have known the approximate prevalence of each sign. In our study, two other highly experienced graders (in addition to the field examiner) evaluated the photographs. Second, in the study of West and Taylor, there was agreement in clinical trachoma status between right and left eyes in 91% of the 136 subjects. It is not clear from their report if right and left eye slides of each patient were examined sequentially; if they were, a potential bias was introduced. In our study, only right eyes were included. Third, West and Taylor excluded the 8.5% of photographs believed to inadequately represent the conjunctiva. We believe that photographs considered ungradable are more likely to be of conjunctivae for which the diagnosis is borderline. If a photograph provides an in-focus, free-of-flash-reflex view of three-quarters of the central tarsal conjunctiva, and six follicles are visible in that area, the grader will assign a diagnosis of TF. If no follicles are seen, the grader may be comfortable assigning a diagnosis of no TF. If three follicles are seen, the examiner's task is difficult. Similarly, in the field, when three follicles are noted in the first three-quarters of the central tarsal conjunctiva examined, the grader needs to evaluate the rest of the conjunctiva very carefully. Such eyes are the ones for which verification of field diagnoses is most important: excluding them from an evaluation of photographs may be unhelpful. We calculated primary kappa scores counting ungradable photographs as disagreements. If these photographs are excluded, the mean of three kappas for the photograph versus field comparisons increases slightly (from 0.44 to 0.57 for TF and 0.51 to 0.60 for TI), but agreement remains only moderate.8
On some counts, our study can be criticized in relation to the previous work. We used 100 ASA film rather than high resolution 25 ASA film.6 We used photographic prints, while they used slides, which have better color reproduction. Our photographic graders saw photographed tissues at twice the magnification used in the field, while West and Taylor's examiner had the same magnification in each setting. In addition, although our field grader (photograph grader C) was standardized against a gold standard grader, we did not have the opportunity to validate our two other photograph graders against the gold standard, each other, or the field grader. Our work's limitations, however, mirror those of most published studies that use photographs. Of 13 studies using photographs to validate or replace field trachoma grading cited in this paper's introduction, only two13,16 state that slide film was used, and only one16 specifies the film speed. None provides sufficient information to determine the image magnification ratio between field and photographic examination. Most reports give no information as to who graded the photographs9,10,15,17–19 or identify them only as "a trained grader,"12 "a trained reader,"20,21 "an independent investigator,"13 or "clinicians."16 On the basis of our results, we conclude that either the signs TF and TI are less reproducible than previously believed,4,5 or that photographs are problematic for their diagnosis.
Could better pictures be obtained? Digital imaging, now recommended for fundus photography in diabetic retinopathy screening,25 could potentially reduce the proportion of subjects for whom no useful pictures are delivered to the remote examiner because it allows quality control through immediate image review.26 Until recently, however, digital images were of lower resolution than those generated by conventional photography, and in any case the difficulties presented by the irregular curvature of the conjunctiva, the limited number of views that can be taken of each eye, and the two-dimensional representation of three-dimensional epithelium would persist. A recent trial of latrine provision for trachoma control used a combination of slide and digital photography.16 Sixty percent of 2,489 slide photographs were gradable and 72% of 986 digital images were gradable. The proportion of images that were out of focus, too bright, too dark, or which otherwise provided inadequate views was approximately the same for the two media; the only difference was that 353 slide photographs were rendered useless by problems (such as untimely camera opening) that affected whole rolls of film (Emerson PM and others, unpublished data). Digital photography does at least minimize the risk of the latter type of error. Unless reliability testing demonstrates better agreement than seen here, however, for trachoma studies, we believe that photographs should not be used for diagnosing TF or TI.
Received May 18, 2005. Accepted for publication September 28, 2005.
Acknowledgments: We thank the village and sub-village chairmen, elders, and villagers of Kahe for their enthusiastic participation; our field team for help with data collection; Dr. Paul Emerson for helpful conversations about the use of photographs; and Professor John Shao and the members of the research steering committee.
Financial support: This study was supported by grants from the Wellcome Trust/Burroughs Wellcome Fund (059134) and the Edna McConnell Clark Foundation (99100).
* Address correspondence to Anthony W. Solomon, Clinical Research Unit, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom. E-mail: anthony.solomon{at}lshtm.ac.uk
Authors' addresses: Anthony W. Solomon, Clinical Research Unit, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom, Telephone: 44-20-7958-8336, Fax: 44-20-7958-8317, E-mail: anthony.solomon{at}lshtm.ac.uk. Richard J. C. Bowman, Clinical Research Unit, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom, Telephone: 44-20-7958-8359, Fax: 44-20-7958-8317, E-mail: richardbowman{at}intafrica.com. David C. W. Mabey, Clinical Research Unit, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom, Telephone: 44-20-7927-2297, Fax: 44-20-7637-4314, E-mail: david.mabey{at}lshtm.ac.uk. David Yorston, Moorfields Eye Hospital, 162 City Road, London EC1V 2PD, United Kingdom, Telephone: 44-20-7253-3411, Fax: 44-20-7253-4696, E-mail: dhyorston{at}enterprise.net. Patrick A. Massae, Rombo Trachoma Research Project, Huruma Hospital, PO Box 202, Mkuu, Rombo, Tanzania, Telephone: 255-27-275-7230, E-mail: patrick.massae{at}iwayafrica.com. Salesia Safari, Huruma Hospital, PO Box 948 Moshi, Tanzania, Telephone: 255-27-275-7136, Fax: 255-27-275-7341, E-mail: huruma.hospital{at}iwayafrica.com. Brian Savage, 91 Drewry Lane, Derby DE22 3QS, United Kingdom, Telephone: 44-1332-242-635, E-mail: Able4{at}btinternet.com. Neal D. E. Alexander, Infectious Diseases Epidemiology Unit, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom, Telephone: 44-20-7927-2483, Fax: 44-20-7636-8739, E-mail: neal.alexander{at}lshtm.ac.uk. Allen Foster, International Centre for Eye Health, Clinical Research Unit, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom, Telephone: 44-20-7958-8359, Fax: 44-20-7958-8317, E-mail: allenfoster{at}compuserve.com.
The kappa statistic is an index of intra-observer or inter-observer reliability for categorical data. It is the difference between the observed and chance values of the proportion of agreement between two sets of observations of the same variable, expressed as a proportion of this difference's maximum value.7 Kappa therefore has possible values between −1 and +1, with −1 indicating complete disagreement, +1 complete agreement, and 0 the level of agreement expected by chance. Divisions for describing the relative strength of agreement associated with this measurement have been (arbitrarily) defined as poor = <0.00; slight = 0.00–0.20; fair = 0.21–0.40; moderate = 0.41–0.60; substantial = 0.61–0.80; and almost perfect = 0.81–1.00.8
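The descriptive divisions above can be applied mechanically. The following Python sketch maps a kappa value to its descriptor; the function name is hypothetical, and assigning values that fall between the published two-decimal bands (for example 0.205) to the lower band is an assumption of this sketch, not part of the cited scheme.

```python
# Upper bound of each band and its descriptor, as listed in the text.
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def agreement_strength(kappa):
    """Return the descriptive label for a kappa value between -1 and +1."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and +1")
    if kappa < 0.0:
        return "poor"
    for upper, label in BANDS:
        # Assumption: values between bands are assigned to the lower band.
        if kappa <= upper:
            return label
    raise AssertionError("unreachable: kappa <= 1.0 always matches a band")


# Values quoted in the paper: pre-study TI and TS kappa, and the
# three-grader photograph kappas for TF and TI.
for value in (0.73, 0.32, 0.52):
    print(value, agreement_strength(value))
```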
| REFERENCES |
|
|
|---|