Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

Tsui-Shan Lu, Matthew P. Longnecker, Haibo Zhou

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS in the multivariate case remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric: all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born into the Collaborative Perinatal Study.
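
The sampling idea described in the abstract can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration only: the function name `draw_ods_sample`, the use of the within-cluster sum as the selection criterion, the cutoffs, and the toy population. It is not the authors' design or estimator; it only shows a simple random sample of clusters being supplemented with clusters drawn from the tails of an outcome summary, so that exposure would be measured only for the selected clusters.

```python
import random


def draw_ods_sample(clusters, n_srs, n_low, n_high, low_cut, high_cut, seed=0):
    """Select cluster indices: a simple random sample plus tail supplements.

    clusters: list of lists of continuous responses (one inner list per cluster).
    Selecting on the cluster sum is an illustrative assumption, not the
    paper's exact selection rule.
    """
    rng = random.Random(seed)
    ids = list(range(len(clusters)))

    # Simple-random-sample component: chosen without looking at outcomes.
    srs = rng.sample(ids, n_srs)
    chosen = set(srs)
    remaining = [i for i in ids if i not in chosen]

    # Outcome-dependent supplements: clusters whose summed response is extreme.
    low = [i for i in remaining if sum(clusters[i]) < low_cut]
    high = [i for i in remaining if sum(clusters[i]) > high_cut]
    low_supp = rng.sample(low, min(n_low, len(low)))
    high_supp = rng.sample(high, min(n_high, len(high)))
    return srs, low_supp, high_supp


# Toy population (hypothetical): 1000 clusters, two correlated responses each.
pop_rng = random.Random(1)
population = []
for _ in range(1000):
    shared = pop_rng.gauss(0, 1)  # shared cluster effect induces correlation
    population.append([shared + pop_rng.gauss(0, 1), shared + pop_rng.gauss(0, 1)])

srs, low_supp, high_supp = draw_ods_sample(
    population, n_srs=100, n_low=25, n_high=25, low_cut=-2.0, high_cut=2.0
)
print(len(srs), len(low_supp), len(high_supp))
```

In a sketch like this the supplemental clusters over-represent extreme outcomes, which is why an analysis of such a sample needs a method that accounts for the selection rather than treating all selected clusters as a simple random sample.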

Original language: English
Pages (from-to): 985-997
Number of pages: 13
Journal: Statistics in Medicine
Volume: 36
Issue number: 6
DOIs: 10.1002/sim.7195
Publication status: Published - 2017 Mar 15

Keywords

  • continuous multivariate responses
  • correlated responses
  • empirical likelihood
  • outcome-dependent sampling
  • semiparametric

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design. / Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo.

In: Statistics in Medicine, Vol. 36, No. 6, 15.03.2017, p. 985-997.

@article{9fdfdfee90da483a9741f298e4d51583,
title = "Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design",
abstract = "Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS in the multivariate case remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric: all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born into the Collaborative Perinatal Study.",
keywords = "continuous multivariate responses, correlated responses, empirical likelihood, outcome-dependent sampling, semiparametric",
author = "Lu, Tsui-Shan and Longnecker, {Matthew P.} and Zhou, Haibo",
year = "2017",
month = "3",
day = "15",
doi = "10.1002/sim.7195",
language = "English",
volume = "36",
pages = "985--997",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "6",

}

TY - JOUR

T1 - Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

AU - Lu, Tsui-Shan

AU - Longnecker, Matthew P.

AU - Zhou, Haibo

PY - 2017/3/15

Y1 - 2017/3/15

AB - Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS in the multivariate case remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric: all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born into the Collaborative Perinatal Study.

KW - continuous multivariate responses

KW - correlated responses

KW - empirical likelihood

KW - outcome-dependent sampling

KW - semiparametric

UR - http://www.scopus.com/inward/record.url?scp=85006993231&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85006993231&partnerID=8YFLogxK

U2 - 10.1002/sim.7195

DO - 10.1002/sim.7195

M3 - Article

VL - 36

SP - 985

EP - 997

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 6

ER -