| INTRODUCTION |
Conducting a household survey in the developing world, both in large national surveys and in smaller surveys for research or monitoring and evaluation activities, can require great effort and considerable compromise. Detailed, high-quality national/sub-national household surveys such as Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS) generate valuable national-level data on a wide range of demographic and disease indicators. However, these surveys are time-consuming and costly and therefore can only be conducted once every few years. Moreover, they generally take place during the dry season, which may have implications for the study of diseases with transmission rates associated with the rainy season. There is a definite need for more affordable, high-quality, rapid assessment surveys that offer, at short notice, valid data representative of the population.
Implementing a statistically valid survey in developing countries or disaster areas may be particularly difficult. Lack of infrastructure and unavailable or inadequate sampling frames commonly pose a challenge to the standard multistage survey design, which requires a sampling frame or complete listing of households to obtain a statistically valid sample. It is usually time-consuming to compile this list, and as a result, many researchers opt for some variant of the Expanded Program on Immunization (EPI) cluster survey method,1,2 a non-probability sampling procedure, which greatly limits the validity and generalizability of the survey results.3,4 As a result, there is a substantial body of literature devoted to either finding efficient statistically sound alternatives to this process3–8 or evaluating its utility.9–16 However, a fast and efficient method of collecting a complete sampling frame for probability sampling is still needed.
The use of personal digital assistants (PDAs) and global positioning systems (GPS) in research has increased in recent years. In the United States, PDAs are being used for data collection in a variety of settings such as hospitals or physicians' offices for clinical trials, patient diaries, or field assessment of patients.17–22 PDAs have also been used to collect survey data.23–27 Geographic information systems (GIS) have been used independently to create a grid over maps for generating sampling frames28 and to locate homes of patients for treatment follow-ups.29 Although there has been an increase in the independent use of PDAs and GIS for surveys, the potential power of using PDAs and GPS in combination for surveys has yet to be realized.
We have developed an innovative method to use PDAs with GPS units to overcome some of the logistical and statistical barriers associated with household surveys. We describe the use of a PDA-based software application developed at the Centers for Disease Control and Prevention (CDC), Atlanta, GA, to generate probability-based samples in household surveys in a feasible time frame and to simultaneously use the equipment for interviewing and data entry. With this method, field teams can quickly generate a complete list of households at the enumeration area (EA) level by mapping the location of each household, select a computer-generated simple random sample, and return to the selected households using the GPS navigation tool to conduct a PDA-based interview. To demonstrate the applicability of this method in prevalence or coverage surveys in different settings, we present examples of recent national household surveys in Togo and Niger where this technique was applied.
| MATERIALS |
Software design and description. The software system consisted of two separate modules: 1) GPS Sample, which is a GPS mapping and navigation program, and 2) the electronic questionnaire. The first module, GPS Sample, was developed at CDC using Visual Basic in Visual Studio .NET 2005 (Microsoft Corporation). This module was designed for use with a PDA equipped with a GPS device working on the National Marine Electronics Association (NMEA)–0183 protocol, one of the most common protocols used by GPS receivers to decode GPS satellite signals into a usable format including time, date, and positioning information.30–34 The second module (the electronic applications used to administer the questionnaires and collect the data) was also developed at the CDC using Visual CE® 9.0 Professional (Syware Inc., Cambridge, MA). Visual Basic or other packages are also viable options for designing questionnaires.
GPS Sample program features.
Mapping.
Using GPS Sample, we collected information about each location in three data entry fields: a character ID field (usually to identify the EA or village), a numeric ID field (usually to identify the household), and a text field for additional comments (Figure 1). The comment field provides definitive descriptive information to help differentiate between locations that are close together (< 10 meters apart). When a record is saved to the database, it contains the information in these three fields, the GPS NMEA sentence, and a computer-generated random number between 0 and 1, which is used for probability sampling.
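As a rough illustration of what such a record might look like, the sketch below parses a standard NMEA 0183 GGA sentence into decimal degrees and attaches a random number at save time. It is not the GPS Sample source code: the names `MappedPoint`, `nmea_to_decimal`, and `save_point` are hypothetical, the sentence is the widely used textbook GGA example rather than survey data, and no checksum validation is attempted.

```python
import random
from dataclasses import dataclass


def nmea_to_decimal(value: str, hemisphere: str) -> float:
    """Convert an NMEA ddmm.mmmm (or dddmm.mmmm) field to signed decimal degrees."""
    dot = value.index(".")
    degrees = int(value[: dot - 2])      # everything before the two minute digits
    minutes = float(value[dot - 2:])     # mm.mmmm
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


@dataclass
class MappedPoint:                       # hypothetical record layout
    ea_id: str                           # character ID field (EA or village)
    household_id: int                    # numeric ID field (household)
    comment: str                         # free-text description of the location
    nmea: str                            # raw NMEA sentence as received
    lat: float
    lon: float
    rand: float                          # random number assigned when the point is saved


def save_point(ea_id, household_id, comment, gga_sentence, rng=random):
    """Build one record from the three entry fields and a GGA sentence."""
    fields = gga_sentence.split(",")
    lat = nmea_to_decimal(fields[2], fields[3])
    lon = nmea_to_decimal(fields[4], fields[5])
    return MappedPoint(ea_id, household_id, comment, gga_sentence,
                       lat, lon, rng.random())


point = save_point("EA01", 1, "blue door, mango tree in yard",
                   "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47")
```

Assigning the random number inside `save_point` mirrors the design described above: the number travels with the record, so it does not matter which device later performs the selection.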
Data transfer. A strength of the GPS Sample program is the ability to use multiple PDAs to simultaneously map different areas and later combine databases for sampling. After mapping, team members transferred data between PDAs using the infrared beam feature and then combined the GPS points into one database. Because random numbers are assigned at the time the points are saved, all PDAs have identical copies of the combined database from which a probability sample is chosen.
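The merging step can be sketched as follows. This is a minimal illustration, assuming that each mapped point is uniquely identified by its EA ID and household ID (for example, because each surveyor uses a separate block of household numbers); the article does not state how uniqueness is enforced, and `combine_databases` is a hypothetical name.

```python
def combine_databases(*pda_databases):
    """Merge the point lists beamed between PDAs into one EA-wide list.

    Records keep the random number assigned when they were saved, so the
    combined list is the same whichever PDA performs the merge.
    """
    combined = {}
    for database in pda_databases:
        for record in database:
            key = (record["ea_id"], record["household_id"])  # assumed unique key
            combined.setdefault(key, record)  # repeated copies collapse to one
    return [combined[key] for key in sorted(combined)]


pda_a = [{"ea_id": "EA01", "household_id": 1, "rand": 0.42},
         {"ea_id": "EA01", "household_id": 2, "rand": 0.07}]
pda_b = [{"ea_id": "EA01", "household_id": 101, "rand": 0.91}]

frame = combine_databases(pda_a, pda_b)
```

Because the output is sorted by key and the random numbers are never regenerated, the order in which databases are received does not affect the resulting sampling frame.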
Sample selection. Another strength of the GPS Sample program is that it lets the user choose the number of locations to sample and the number of desired alternates. The sample selection is based on the random number assigned at the point of mapping. For example, if 12 households are desired, the mapped points with the 12 smallest random numbers are selected. This selection information (whether the location is included in the sample, an alternate, or not in the sample) is also saved as a variable in the GPS database. The program currently allows the option of simple random sampling or segment sampling; however, stratified or systematic samples may be incorporated in future releases.
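The selection rule described above (take the points with the smallest random numbers) can be sketched in a few lines. The function name and the status labels are hypothetical, and treating the next-smallest random numbers as alternates is an assumption; the article does not say how alternates are ranked.

```python
def select_sample(points, n_sample, n_alternates=0):
    """Label each mapped point as 'sample', 'alternate', or 'not selected'.

    Points are ranked by the random number stored at mapping time; the
    n_sample smallest form the sample, and (by assumption) the next
    n_alternates smallest serve as alternates.
    """
    ranked = sorted(points, key=lambda p: p["rand"])
    labelled = []
    for rank, point in enumerate(ranked):
        if rank < n_sample:
            status = "sample"
        elif rank < n_sample + n_alternates:
            status = "alternate"
        else:
            status = "not selected"
        labelled.append({**point, "status": status})
    return labelled


mapped = [{"household_id": 1, "rand": 0.90},
          {"household_id": 2, "rand": 0.10},
          {"household_id": 3, "rand": 0.50},
          {"household_id": 4, "rand": 0.30}]

result = select_sample(mapped, n_sample=2, n_alternates=1)
```

Since every household's random number is an independent uniform draw, taking the n smallest is equivalent to drawing a simple random sample of size n without replacement from the mapped list.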
Navigation.
The GPS Sample software has a navigation utility that is used to guide the user back to the selected locations for the interview after the EA has been mapped and a sample has been selected. The navigation view lists the numeric ID of the locations in the sample, comment information, distance from the interviewer's present location, and compass direction (Figure 2). Locations listed can be sorted by ID or distance or deleted from the list, which enables multiple workers to share the navigation and interviewing process. Depending on the GPS receiver, workers can generally navigate to within 10–15 meters of a mapped household using the GPS coordinates and then use the comment field to definitively identify where to perform the interview.
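The distance and compass direction shown in a navigation view of this kind can be computed from two GPS fixes with the standard haversine and initial-bearing formulas. The sketch below is an illustration under that assumption; the article does not say which formulas GPS Sample uses, and the function names and the spherical Earth radius are choices made here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (m) and initial bearing (degrees from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine distance
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing, normalized to [0, 360)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing


def compass_point(bearing):
    """Reduce a bearing in degrees to one of eight compass points."""
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[round(bearing / 45) % 8]


# Interviewer at (0, 0); selected household 0.001 degrees of longitude to the east.
dist_m, bearing_deg = distance_and_bearing(0.0, 0.0, 0.0, 0.001)
```

Sorting the sampled households by `dist_m` from the current fix would reproduce the "sort by distance" behaviour of the navigation list described above.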
GPS Sample has a supplemental program that transfers the GPS data from the PDA to a Microsoft® Office Access 2003 database on a personal computer. The database contains variables for information entered during mapping, sampling selection, and GPS information obtained from the NMEA 0183 sentence that can be analyzed or combined with other geographic information available for the area. For example, a considerable amount of geographic data is available via the Internet at no cost, allowing survey information to be combined with such information as distance to primary or secondary roads, vegetation cover, rainfall, mean temperatures, and elevation, to name a few.36–38
PDA-based questionnaire. The questionnaire was developed on a personal computer and then downloaded to each PDA device. For each section of the questionnaire, a different Visual CE® form was designed to collect and store the data. Questions were in the form of drop-down menus, text entry fields, radio buttons, or check boxes, and incorporated appropriate response-based skips and data quality checks. Relational databases were created from these forms and linked using unique identification (ID) keys. Unique IDs for each interview were generated systematically by each PDA at the start of every interview. Also, date and time stamps were added to determine the length of the interview, and a value was assigned at the end of each interview to designate whether the interview was completed or terminated early.
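The questionnaire logic described above (systematic interview IDs, time stamps, response-based skips, entry-time checks, and a completion flag) might be sketched as follows. This is purely illustrative and is not the Visual CE® implementation: the question names (`owns_net`, `net_count`, `nets_used_last_night`), the ID format, and all function names are hypothetical.

```python
from datetime import datetime


def start_interview(pda_id: str, sequence: int, now: datetime) -> dict:
    """Open a record with a PDA-specific unique ID and a start time stamp."""
    return {
        "interview_id": f"{pda_id}-{sequence:04d}",  # hypothetical ID format
        "started": now,
        "ended": None,
        "status": "terminated",  # remains so unless the interview is completed
    }


def questions_to_ask(answers: dict) -> list:
    """Response-based skip: net questions are asked only if a net is owned."""
    questions = ["owns_net"]
    if answers.get("owns_net") == "yes":
        questions += ["net_count", "nets_used_last_night"]
    return questions


def conflicts(answers: dict) -> list:
    """Entry-time quality check: nets used cannot exceed nets owned."""
    problems = []
    used = answers.get("nets_used_last_night")
    owned = answers.get("net_count")
    if used is not None and owned is not None and used > owned:
        problems.append("nets_used_last_night exceeds net_count")
    return problems


def finish_interview(record: dict, answers: dict, now: datetime) -> dict:
    """Close the record; mark it completed only if all due questions are answered cleanly."""
    done = all(q in answers for q in questions_to_ask(answers)) and not conflicts(answers)
    return {**record,
            "ended": now,
            "minutes": (now - record["started"]).total_seconds() / 60,
            "status": "completed" if done else "terminated"}


record = start_interview("PDA07", 3, datetime(2006, 1, 23, 9, 0))
closed = finish_interview(record, {"owns_net": "no"}, datetime(2006, 1, 23, 9, 14))
```

Prefixing the sequence number with a device identifier is one simple way to keep interview IDs unique when several PDAs are later synchronized into a single database.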
Data safety. When dealing with both sensitive survey data and positional information for each household, the team must take extra care to ensure there is no loss of anonymity or compromised confidentiality if a PDA is lost or stolen. Although no sensitive information was collected for the surveys described here, access to the data and nonessential programs was restricted using a password-protected program. Additional safeguards such as using the SQL Server Mobile encryption option or using the built-in capability of Windows Mobile® to password-protect the entire PDA device could be used when collecting sensitive health data.
| METHODS |
Survey method. We used a three-stage cluster design for these coverage surveys and all regions in the country were included. At the first stage, probability proportional to size (PPS) sampling was used to select districts within each region. For Togo, districts were stratified based on whether the Red Cross was present in the district prior to sampling. At the second stage, PPS sampling was used to choose EAs within each of the selected districts. Lastly, the GPS Sample program was used to select a simple random sample of households within each EA.
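For readers unfamiliar with PPS selection, the sketch below shows one common way to implement it: systematic sampling along the cumulative size list with a random start. This is an assumption made for illustration; the article does not specify which PPS procedure was used, and the unit names and sizes are invented.

```python
import random


def pps_systematic(units, n, rng=random):
    """Select n units with probability proportional to size.

    units: list of (name, size) pairs, e.g. districts with their populations.
    A unit larger than the sampling interval can be selected more than once.
    """
    total = sum(size for _, size in units)
    interval = total / n
    start = rng.random() * interval               # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0
    index = 0
    for name, size in units:
        cumulative += size
        # Every target falling inside this unit's size range selects the unit.
        while index < n and targets[index] < cumulative:
            selected.append(name)
            index += 1
    return selected


# Hypothetical districts and population sizes, not survey data.
districts = [("A", 10), ("B", 30), ("C", 60)]
picks = pps_systematic(districts, 2, random.Random(0))
```

With these sizes the sampling interval is 50, so district C (size 60) is certain to be selected at least once, which illustrates why very large units are often handled as certainty selections in practice.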
Training. For the described surveys, a five-day training session was held to introduce staff to the PDA, GPS technology, and survey content. Our training covered all key aspects of the technology from a conceptual overview to interviewing techniques, practice sessions, and field testing. Team supervisors received additional training in PDA maintenance, battery charging, troubleshooting, and data backup.
Team composition. Typically, the number and composition of teams vary depending on the number of EAs, regional geography/topography, distance between areas, availability of resources (money, vehicles, and surveyors), and length of the study. In general, our teams consisted of 3–4 surveyors and one supervisor (to allow the use of one field vehicle per team). Surveyors were literate, conversant in local languages and customs, experienced in community interviews, and capable of mapping and merging databases. Most of our team members had an advanced degree and supervisors usually had a more technical/senior background. Additionally, local community health workers were approached in each EA on the day of the actual survey to accompany the survey staff, introduce them, and help identify and confirm which households were within the EA boundaries.
A field day. A typical field day was as follows. 1) In the morning, the teams had a brief introduction to the village heads and identified the local guides (community health workers). 2) The team members determined which section of the EA each member would map. 3) Each surveyor, with the help of a local guide, mapped every household in his or her designated area using PDAs equipped with GPS. This task was usually completed before noon. 4) After mapping, the survey members transferred the GPS data to each other's PDAs using the beam function. 5) On each PDA, the databases were combined to obtain a list of all households in the EA. 6) The GPS Sample program was used to randomly select households from the combined dataset. 7) Survey members determined which households each one was responsible for interviewing. 8) Each individual navigated back to the selected households using the GPS Sample program, conducted the interview (after obtaining consent), and entered responses directly into the PDA database. All households selected were approached to participate in the survey. If members of a household were absent, an alternate household was selected; however, if members refused to be surveyed, the household was not replaced.
Data management and analyses. Data collected on the PDA were synchronized into a Microsoft® Office Access 2003 database. The data transfer/synchronization system incorporated into Visual CE® was used to structure the rules for the synchronization process (i.e., which tables were synchronized to which database, along with actions for updating or overwriting existing records). The data synchronization of more than 20,000 GPS records and 1,800 household interviews was completed in roughly two hours. Preliminary data analyses were prepared before the end of the survey using SAS version 9.1 (SAS Institute, Cary, NC) to allow for rapid processing and presentation of a preliminary report.
ArcView 3.3 (Environmental Systems Research Institute, Redlands, CA) spatial analysis software was used during the survey to monitor GPS data quality. Spatial data from completed EAs were examined for obvious aberrancies such as a large cluster of households at a single focal point (suggesting a surveyor was standing in one spot to enter data for more than one household), a scarcity of households where households are expected (suggesting the area was possibly missed), and outliers (indicating an error in the precision of the GPS device). In this way, possible errors or falsification of data were detected, addressed, and rectified while surveyors were in the field.
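The checks described above were made by inspecting maps in ArcView. As a hedged illustration of how two of them (stacked points and outliers) could be screened automatically, the sketch below flags households mapped within a couple of meters of one another and households unusually far from the center of the EA. The thresholds, function names, and coordinates are all hypothetical and are not part of the published method.

```python
import math
from statistics import median


def approx_distance_m(p, q):
    """Equirectangular approximation; adequate over the few km of one EA."""
    lat1, lon1 = p
    lat2, lon2 = q
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000 * math.hypot(x, y)


def flag_stacked(points, min_gap_m=2.0):
    """IDs of households mapped within min_gap_m of another household."""
    flagged = set()
    ids = list(points)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if approx_distance_m(points[a], points[b]) < min_gap_m:
                flagged.update((a, b))
    return flagged


def flag_outliers(points, factor=5.0):
    """IDs farther from the median center than factor times the median distance."""
    center = (median(lat for lat, _ in points.values()),
              median(lon for _, lon in points.values()))
    dists = {k: approx_distance_m(center, v) for k, v in points.items()}
    typical = median(dists.values())
    return {k for k, d in dists.items() if d > factor * typical}


# Hypothetical EA: household ID -> (latitude, longitude) in decimal degrees.
ea_points = {1: (6.1300, 1.2200), 2: (6.1300, 1.2200), 3: (6.1305, 1.2204),
             4: (6.1310, 1.2198), 5: (6.1302, 1.2207), 6: (6.5000, 1.6000)}

stacked = flag_stacked(ea_points)
outliers = flag_outliers(ea_points)
```

Flags of this kind only prompt a closer look; households in multi-family compounds can legitimately share nearly identical coordinates, which is why the comment field matters.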
| RESULTS |
For each of the three regions where the follow-up anemia survey took place (Maritime, Plateaux, and Savanes), a random sample of 27 households was selected from each of 30 EAs. Six teams consisting of four members (three surveyors and one supervisor) each completed the mapping, invitations, and 27 interviews in an EA within one day. For the three regions where the anemia survey did not take place (Centrale, Kara, and Lomé), two districts were selected in each region. In each district, 12 EAs were surveyed and 16 households were randomly selected within each EA. Each team was responsible for one district, which required mapping and surveying two EAs per day (by dividing into two-person teams). Survey teams worked six days a week for two weeks with one day in between to travel between districts.
It took 19 days to map a total of 21,588 households and conduct 3,523 interviews in 162 EAs. The median number of households mapped per individual per EA was 40 with an interquartile range (IQR) of 27–53 and a mean mapping time of 1 hour and 48 minutes (IQR = 54 minutes to 2.5 hours). The median distance traveled by each individual in an EA on one day was 1.3 km (IQR = 0.8–2.5 km). The household interviews took an average of 7.5 minutes (IQR = 5–11 minutes) with each staff member conducting approximately 8–9 interviews. Preliminary results of the coverage survey were presented to the Ministry of Health within two days of completion of data collection.
Case 2 Niger coverage survey.
This national ITN coverage survey took place from January 23 to February 17, 2006, one month after the Niger National Integrated Child Health Campaign.41 It was conducted in both urban and rural communities in seven regions in Niger (Agadez, Diffa, Dosso, Maradi, Tahoua, Tillaberi, and Zinder). A total of 112 EAs were chosen (Figure 3); within each, 16 households were randomly selected for interviews (Figure 4).
Teams mapped 28,552 households and completed 1,801 interviews. Each workday, teams averaged 3 hours 20 minutes (IQR = 1 hour 28 minutes to 4 hours 17 minutes) mapping, and each individual mapped roughly 58 (IQR = 33–75) households per EA. The median distance traveled by each individual in an EA on one day was 3.3 km (IQR = 1.0–10.8 km). Interview times averaged 14 minutes (IQR = 9–18 minutes) with each staff member conducting approximately 4 interviews. Data were downloaded immediately upon receipt, and analyses were performed using SAS version 9.1. A formal debriefing was held, and a preliminary report of results was distributed three days after the last day of data collection.
| DISCUSSION |
Surveys using GPS-based data collection offer clear methodologic advantages over traditional EPI-based surveys. Typically, an EPI cluster survey procedure selects a central location in the EA, randomly picks a direction by spinning a bottle, and chooses subsequent households along the directional line selected. This may result in inaccurate sample estimates because the total number of households in the EA is unknown, and the households interviewed may not be representative of the entire EA. Incorporating GPS into the sampling method enables one to select a random sample of households for interviewing from a complete and up-to-date listing of the households in the EA, and as a result, the probability of selection is known. The GPS data collected also provide geospatial information for reports and analyses.
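To make the last point concrete: when n households are drawn by simple random sampling from N mapped households, each household's within-EA selection probability is n/N and its final-stage design weight is N/n. The sketch below computes both; the count of 200 mapped households is a hypothetical figure, and the function covers only the final stage, not the earlier PPS stages.

```python
def within_ea_selection(n_sampled: int, n_mapped: int):
    """Final-stage selection probability and design weight for one EA."""
    if not 0 < n_sampled <= n_mapped:
        raise ValueError("need 0 < n_sampled <= n_mapped")
    probability = n_sampled / n_mapped
    weight = n_mapped / n_sampled   # inverse of the selection probability
    return probability, weight


# 16 households sampled from a hypothetical EA with 200 mapped households.
probability, weight = within_ea_selection(16, 200)
```

With an EPI-style walk, N is never counted, so neither quantity can be computed; that is the practical meaning of the selection probability being "known" here.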
The advantages of PDA-based data collection over paper forms have been well documented.42 Paper-based data entry can be time-consuming and error-prone. Using PDAs, we were able to incorporate sophisticated quality checks upon data entry to minimize errors and immediately prompt the interviewer for clarification should data conflicts occur. For interviewers, the burden of carrying a PDA with a GPS unit is not much more than that of carrying paper forms, and can prove advantageous in inclement weather. Furthermore, data from the PDAs can be aggregated into a single database, usually within hours, making it possible to assess data quality while in the field. For Togo and Niger, we were able to generate a preliminary report of survey findings from the aggregated data (including maps of the districts sampled showing all of the households and their sample inclusion status) within a few days of completion of data collection. This can be particularly useful in situations where data are needed quickly to guide public health action, such as in routine monitoring and evaluation and rapid needs assessments.
Incorporating GPS with PDAs for health surveys is not new. In a tuberculosis study, Dwolatzky and others used aerial photographs to identify the geographic coordinates of patients' households and then used PDAs with GPS to navigate back to those households.29 In an ecologic study by Keating and others, GPS units were used to map houses and create a grid overlay system to generate a sampling frame.28 Several commercial and non-commercial mobile products are available that integrate GPS with survey data collection software on PDAs. These products include Visual CE®, Field Adapted Survey Toolkit (GeoAge, Jacksonville, FL), GPS Pathfinder® Office/TerraSync™ system (Trimble, Sunnyvale, CA), and EpiHandy (Center for International Health at University of Bergen, Bergen, Norway). However, the strength and innovation of the method/software described here is the ability for multiple people to rapidly map households in an area, combine data to generate a sampling frame, and select a random sample of households or clusters for probability-based sampling while in the field.
There are several logistical considerations when designing a study using PDAs with GPS for field assessments. First, the timeline and team composition may need to be adjusted compared with an EPI cluster survey. Some time must be allotted to complete mapping prior to interviewing. In our surveys, most teams were able to map the EA in the morning and interview in the afternoon of the same day. However, for larger areas (geographically or by population), it may take a longer time or require larger teams. This was a major concern for the ITN coverage survey in Niger, which involved covering vast geographic areas with limited resources and time constraints. The task was accomplished by combining team efforts for some large EAs and allotting more time for teams working in remote EAs. Second, the accuracy of the GPS receivers should be considered. With the PDA-based GPS units, the navigation utility is accurate to within a 10–15-meter radius. This distance is suitable for a household survey in which the user can provide a descriptive comment to help differentiate close GPS measurements; however, it may not be a suitable distance for studies that require greater accuracy. Third, precautions must be taken in case a PDA malfunctions. Safety measures can be incorporated in the GPS Sample and questionnaire applications to ensure careful and regular data back-up (on the PDA and SD card) in case of a malfunction. A few reserve PDAs are recommended. Lastly, preparing a PDA-based survey requires early planning and preparation. Specifically, the questionnaire must be finalized well in advance of the survey to provide enough time for development and testing on the PDAs.
A further design matter to take into account for PDA surveys with GPS units is the training of survey team members and communication in the field. In some settings, survey team members may not have any exposure to or experience with PDAs or GPS technology. It is important to schedule time during training for familiarizing the team members with basic PDA and GPS functionality and hardware. We have found most team members are enthusiastic about learning the new technology and find the actual questionnaire is simpler to learn because the PDAs automatically incorporate complicated skip patterns that may be difficult to follow on paper. Those team members with computer experience rapidly adapt to the new technology and are able to spend more time focusing on interviewing skills. In addition to training, the ability to communicate with team members in the field is extremely important. Communication is necessary for possible PDA troubleshooting and monitoring team progress. Most PDA/GPS problems are simple user errors that can be resolved with a short discussion with the supervisor.
Lastly, an important consideration in study design is cost. In general, the least expensive survey method is a convenience sample, such as an EPI cluster survey. Cost increases when a statistically valid method is desired, such as with the MICS and DHS. The method described here is more expensive than a convenience sample, but much less expensive (in terms of resources and time) than the DHS or MICS. The primary expense for a survey using PDA with GPS is equipment: approximately $600 for one PDA with a long-life battery, SD data card, and GPS unit in a weatherproof protective case. Although the initial investment for a PDA-based survey is potentially greater than for a paper-based survey, it is counterbalanced by the costs of printing, distribution, data entry, and data cleaning that a paper-based survey incurs. Larger surveys will realize even greater savings in time and cost for data entry when PDAs are used. In addition, there are cost savings when the PDAs are used for multiple surveys. The GPS Sample software that was developed for this method is now publicly available at no cost from the authors.
Although initially developed for use in prevalence or coverage surveys in developing countries, this method could be applied in a wide variety of settings and study designs. For example, in disaster-affected areas, one may wish to conduct a quick census of locations with certain attributes, but only collect detailed information for a sample of the locations. Also, the system can be applied to lot quality assurance sampling (LQAS) surveys.43 LQAS requires a simple random sample at the primary sampling unit level. This assumption is often violated in practice because of the extra workload required. If PDA/GPS technology is used, these projects could be conducted on a more statistically sound basis.
A scale-up in national public health programs and interventions has created an increased need for accurate monitoring and evaluation with rapid feedback on the status of disease control progress. The presented survey method that uses PDAs equipped with GPS receivers provides a valuable alternative to the EPI cluster survey method and can help to provide valid, population-level intervention coverage to inform control policy and programs.
GPS Sample. The GPS Sample software that was developed for this method is now publicly available upon request at no cost from the CDC (not for commercial use). For the most recent software program and information, please visit http://www2.ncid.cdc.gov/GPS2 or contact GPSsample{at}cdc.gov.
Received February 6, 2007. Accepted for publication April 30, 2007.
Acknowledgments: We thank the many people who participated in our surveys and the interviewers from the Togolese Ministry of Health, Togolese Red Cross, Niger Ministry of Health, and Red Cross Society of Niger for their time, energy, and willingness to learn something entirely new. Special thanks to Dr. Marcy Erskine (Canadian Red Cross) for her logistical support. We also thank others at CDC, particularly Dr. Natasha Hochberg, Dr. Alexandre Macedo de Oliveira, and Dr. Ramesh Krishnamurthy for their time and energy learning and implementing this method in the field, and the International Federation of Red Cross and Red Crescent Societies for their assistance.
Financial support: The surveys for which this method was developed were supported by the Canadian International Development Agency, through the Canadian Red Cross.
Disclaimer: The opinions or assertions contained in this manuscript are the private ones of the authors and are not to be construed as official or reflecting the views of the U.S. Public Health Service or Department of Health and Human Services. Use of trade names is for identification only and does not imply endorsement by the U.S. Public Health Service or Department of Health and Human Services.
* Address correspondence to Jodi L. Vanden Eng, Division of Parasitic Diseases, National Center for Infectious Diseases, Centers for Disease Control and Prevention, Mailstop F-22, 4770 Buford Highway, Atlanta, GA 30341. E-mail: jev8{at}cdc.gov
Authors' addresses: Jodi L. Vanden Eng, Adam Wolkon, Anatoly S. Frolov, M. James Eliades, William A. Hawley, and Allen W. Hightower, Division of Parasitic Diseases, National Center for Infectious Diseases, Centers for Disease Control and Prevention, Mailstop F-22, 4770 Buford Highway, Atlanta, GA 30341. Dianne J. Terlouw, Child and Reproductive Health Group, Liverpool School of Tropical Medicine, Pembroke Place, L3 5QA, Liverpool, United Kingdom. Kodjo Morgah, Vincent Takpa, Aboudou Dare, Yao K. Sodahlon, and Yao Doumanou, Ministère de la Santé, BP 386, Lomé, Togo.