Representativeness of surveys and its analysis
Details
Serval ID
serval:BIB_993FB588DDB7
Type
Article: article from journal or magazine.
Collection
Publications
Institution
Title
Representativeness of surveys and its analysis
Journal
FORS Guides
Publication state
Published
Issued date
14/12/2021
Peer-reviewed
Yes
Number
15
Language
English
Abstract
The analysis of the representativeness of a data set is a standard quality assurance procedure in survey research. This FORS Guide challenges current practices in the analysis of representativeness and suggests a framework for analysing the risk of representation bias, taking into account the different uses of data.
Recommendations for researchers:
- Avoid the term “representative”. If it needs to be used, explain clearly what is meant, revealing the context for which the statement is made. Only use it when it refers to probability sampling and do not make a general claim.
- Be creative. Instead of trusting one indicator, use several indicators linked to the analysis that is or will be made.
- Be specific. If you have to report on a data set in general terms, cover multiple uses of the data, never make general claims, and base recommendations on the findings of the analysis.
- Be prudent. Reflect on possible biases in the results of substantive analyses.
- Be scientific. Make plausible assumptions, be consistent, be simple and comprehensible, do not over-generalise, and remain within the scope of the analysis.
- Stay focused. Keep an eye on what the goal is; what is the correlation of the test variables with the variables and statistics of interest? This correlation sets the limits of influence of the test variables on the statistics of interest.
- Be inclusive. Use as much information as is available. Whenever possible use advanced statistical models to account for uncertainty due to (unit and item) nonresponse, such as full information maximum likelihood or multiple imputation.
- Big data are not representative of a general population. The goal of analysing big data is usually not to draw conclusions about the general population but to analyse all available data on a subject matter. Such data are not a sample, and certainly not a probabilistic one, so inference cannot be made. Big data are very useful, but not for claims about the general population. For example, an analysis of gender-neutral pronouns in Twitter data is very interesting but does not reflect the use of gender-neutral pronouns in other contexts, nor does the same analysis using the complete set of the most prestigious newspaper articles in the same time period. However, such data can be used to start formulating theories about the general population, which are then studied using other data; or the results from Twitter and the newspaper can be compared and interpreted in full. This is highly interesting, but the concept of representativeness does not make sense in such contexts.
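The "Stay focused" recommendation can be illustrated with a small numerical sketch. It is not taken from the guide: post-stratification is used here only as one convenient way to adjust for a test variable, and the age groups, population shares and outcome values are all hypothetical. The sketch compares how far an estimated mean moves after adjustment when the outcome is strongly related to the test variable and when it is unrelated to it.

```python
from collections import defaultdict


def unweighted_mean(records):
    """Plain sample mean of the outcome; records are (group, outcome) pairs."""
    return sum(y for _, y in records) / len(records)


def poststratified_mean(records, population_shares):
    """Outcome mean after reweighting groups to assumed population shares."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, y in records:
        totals[group] += y
        counts[group] += 1
    # Weight each group's sample mean by that group's share in the population.
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )


def shift_after_adjustment(records, population_shares):
    """How much the estimate changes once the test variable is adjusted for."""
    return poststratified_mean(records, population_shares) - unweighted_mean(records)


# Hypothetical setting: young respondents make up 20 of 100 sample members,
# while their assumed population share is 50%.
POPULATION_SHARES = {"young": 0.5, "old": 0.5}

# Outcome strongly related to age group (group means 0.8 vs 0.2).
related = (
    [("young", 1.0)] * 16 + [("young", 0.0)] * 4
    + [("old", 1.0)] * 16 + [("old", 0.0)] * 64
)

# Outcome unrelated to age group (group means 0.5 vs 0.5).
unrelated = (
    [("young", 1.0)] * 10 + [("young", 0.0)] * 10
    + [("old", 1.0)] * 40 + [("old", 0.0)] * 40
)

shift_related = shift_after_adjustment(related, POPULATION_SHARES)
shift_unrelated = shift_after_adjustment(unrelated, POPULATION_SHARES)

print(f"shift when outcome is related to age:   {shift_related:+.2f}")
print(f"shift when outcome is unrelated to age: {shift_unrelated:+.2f}")
```

Both hypothetical samples are equally unbalanced on age, yet only the estimate for the age-related outcome moves after adjustment. The same imbalance on a test variable can therefore matter for one analysis and be irrelevant for another, which is why a single general statement about a data set is of limited use.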
Open Access
Yes
Create date
11/01/2022 23:26
Last modification date
21/11/2022 8:25