
      Enhancing estimation methods for integrating probability and nonprobability survey samples with machine‐learning techniques. An application to a Survey on the impact of the COVID‐19 pandemic in Spain

      research-article


          Abstract

          Web surveys have replaced face‐to‐face and computer‐assisted telephone interviewing (CATI) as the main mode of data collection in most countries. This trend was reinforced by COVID‐19 pandemic‐related restrictions. However, this mode still faces significant limitations in obtaining probability‐based samples of the general population. For this reason, most web surveys rely on nonprobability survey designs. Whereas probability‐based designs continue to be the gold standard in survey sampling, nonprobability web surveys may still prove useful in some situations. For instance, when small subpopulations are the group under study and probability sampling is unlikely to meet sample size requirements, complementing a small probability sample with a larger nonprobability one may improve the efficiency of the estimates. Nonprobability samples may also be designed as a means of compensating for known biases in probability‐based web survey samples by purposely targeting respondent profiles that tend to be underrepresented in these surveys. This is the case in the Survey on the impact of the COVID‐19 pandemic in Spain (ESPACOV) that motivates this paper. In this paper, we propose a methodology for combining probability and nonprobability web‐based survey samples with the help of machine‐learning techniques. We then assess the efficiency of the resulting estimates by comparing them with other strategies that have been used before. Our simulation study and the application of the proposed estimation method to the second wave of the ESPACOV Survey allow us to conclude that the proposed method is the best option for reducing the biases observed in our data.
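To make the idea of correcting a nonprobability sample with the help of a probability sample more concrete, the following is a minimal sketch of propensity score adjustment, one of the keywords listed for this article. It is not the authors' estimator. The simulated population, the selection rates, the single binary covariate `x`, and the function names `naive_mean` and `psa_mean` are all assumptions made for illustration, and a saturated (cell-based) propensity model stands in for the machine-learning classifier that the abstract mentions.

```python
import random
from collections import defaultdict

rng = random.Random(2022)  # fixed seed so the sketch is reproducible

# Hypothetical finite population: x is a binary respondent profile, y a binary outcome.
N = 20_000
population = []
for _ in range(N):
    x = 1 if rng.random() < 0.3 else 0
    y = 1 if rng.random() < (0.8 if x == 1 else 0.3) else 0
    population.append((x, y))
true_mean = sum(y for _, y in population) / N

# Probability sample: simple random sample; every unit has design weight N / n_p.
n_p = 1_000
prob_sample = rng.sample(population, n_p)
design_weight = N / n_p

# Nonprobability sample: units with x == 1 self-select far more often (assumed rates).
nonprob_sample = [(x, y) for x, y in population
                  if rng.random() < (0.30 if x == 1 else 0.05)]


def naive_mean(sample):
    """Unweighted mean of y, ignoring how the sample was selected."""
    return sum(y for _, y in sample) / len(sample)


def psa_mean(nonprob, reference, ref_weight):
    """Propensity-adjusted (normalized) mean of y over the nonprobability sample.

    The propensity pi(x) of belonging to the nonprobability sample is estimated
    per covariate cell from the stacked data, with the reference sample weighted
    up to population size. Each nonprobability unit then gets weight (1 - pi) / pi.
    """
    n_cell = defaultdict(float)  # nonprobability count per cell
    w_cell = defaultdict(float)  # weighted reference total per cell
    for x, _ in nonprob:
        n_cell[x] += 1.0
    for x, _ in reference:
        w_cell[x] += ref_weight

    num = den = 0.0
    for x, y in nonprob:
        pi = n_cell[x] / (n_cell[x] + w_cell[x])
        w = (1.0 - pi) / pi
        num += w * y
        den += w
    return num / den


naive = naive_mean(nonprob_sample)
adjusted = psa_mean(nonprob_sample, prob_sample, design_weight)
print(f"true mean      : {true_mean:.3f}")
print(f"naive estimate : {naive:.3f}")
print(f"PSA estimate   : {adjusted:.3f}")
```

In this toy setting the selection mechanism depends only on `x`, so reweighting by the estimated propensities removes most of the selection bias. With many covariates a flexible classifier would replace the cell counts, and the outcome data of the probability sample could also be pooled into the final estimate, which this sketch does not attempt.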


                Author and article information

                Contributors
                mrueda@ugr.es
                luiscastro193@ugr.es
                Journal
                Biom J
                10.1002/(ISSN)1521-4036
                BIMJ
                Biometrical Journal. Biometrische Zeitschrift
                John Wiley and Sons Inc. (Hoboken)
                0323-3847
                1521-4036
                22 September 2022
                DOI: 10.1002/bimj.202200035
                Affiliations
                [1] Department of Statistics and Operational Research, University of Granada, Granada, Spain
                [2] Institute for Advanced Social Studies/Spanish Research Council (IESA‐CSIC), Córdoba, Spain
                [3] Department of Quantitative Methods for Economics and Business, University of Granada, Granada, Spain
                Author notes
                [*] Correspondence

                María del Mar Rueda, Department of Statistics and Operational Research, University of Granada. Avda. Fuentenueva s/n, 18071, Granada, Spain.

                Email: mrueda@ugr.es

                Author information
                https://orcid.org/0000-0002-2903-8745
                https://orcid.org/0000-0001-5285-1470
                https://orcid.org/0000-0003-2654-0032
                https://orcid.org/0000-0002-0934-4219
                https://orcid.org/0000-0002-9655-933X
                Article
                BIMJ2398
                10.1002/bimj.202200035
                9538074
                36136044
                366e492c-e052-428a-a939-47a6dd854647
                © 2022 The Authors. Biometrical Journal published by Wiley‐VCH GmbH.

                This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc-nd/4.0/ License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.

                History
                : 05 July 2022
                : 04 February 2022
                : 21 July 2022
                Page count
                Figures: 1, Tables: 9, Pages: 19, Words: 10279
                Funding
                Funded by: FEDER/Junta de Andalucía
                Award ID: A‐SEJ‐154‐UGR20
                Award ID: FQM170‐UGR20
                Funded by: Ministerio de Educación y Ciencia
                Award ID: PID2019‐106861RB‐I00
                Award ID: CEX2020‐001105‐M/AEI/10.13039/50110001103
                Funded by: Funding for open access charge: Universidad de Granada / CBUA
                Categories
                Research Article

                Quantitative & Systems biology
                covid‐19, machine‐learning techniques, nonprobability surveys, propensity score adjustment, survey sampling
