One-level multiple Logistic regression analysis of the multi-value nominal data collected from the complex sampling survey design

DOI: 10.11886/scjsws20191119005

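Authors:
Liu Yuanyuan (刘媛媛), Department of Health Statistics, School of Public Health, Tianjin Medical University
Li Changping (李长平), Department of Health Statistics, School of Public Health, Tianjin Medical University, E-mail: 1067181059@qq.com
Hu Liangping (胡良平), Specialty Committee of Clinical Scientific Research Statistics, World Federation of Chinese Medicine Societies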

This article introduces how to construct a one-level multiple logistic regression model for multi-value nominal data collected under a complex sampling survey design, and examines how the results differ across analysis strategies. Using the LOGISTIC and SURVEYLOGISTIC procedures in SAS, generalized logistic regression models were fitted under four analysis strategies, defined by whether the sampling design was taken into account and whether the sampling weights were taken into account, and the results were compared. The strategies differed not only in the parameter estimates, the standard errors of the regression coefficients, and the OR estimates with their confidence intervals, but also in which explanatory variables entered the model. Therefore, when a generalized logistic regression model is constructed for multi-value nominal data from a complex sampling survey design, both the sampling design and the sampling weights should be taken into account; otherwise, even with a sufficiently large sample, the analysis will lead to erroneous inferential conclusions.
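
The four analysis strategies named in the abstract can be sketched in SAS roughly as follows. This is only an illustrative outline, not the authors' actual program: the data set name (survey), the nominal response (outcome), the explanatory variables (age, sex, region), and the design variables (stratum, psu, wt) are all hypothetical placeholders, and the choice of reference categories is an assumption.

```sas
/* All data set and variable names below are hypothetical placeholders. */

/* Strategy 1: ignore both the sampling design and the sampling weights */
proc logistic data=survey;
  class sex region / param=ref;
  model outcome(ref=first) = age sex region / link=glogit;  /* generalized logit */
run;

/* Strategy 2: use the sampling weights but ignore the sampling design */
proc logistic data=survey;
  class sex region / param=ref;
  model outcome(ref=first) = age sex region / link=glogit;
  weight wt;
run;

/* Strategy 3: account for the sampling design but not the weights */
proc surveylogistic data=survey;
  strata stratum;   /* stratification variable */
  cluster psu;      /* primary sampling unit */
  class sex region / param=ref;
  model outcome(ref=first) = age sex region / link=glogit;
run;

/* Strategy 4: account for both the sampling design and the weights */
proc surveylogistic data=survey;
  strata stratum;
  cluster psu;
  weight wt;
  class sex region / param=ref;
  model outcome(ref=first) = age sex region / link=glogit;
run;
```

In this sketch the weights mainly change the point estimates, while the STRATA and CLUSTER statements change how the standard errors are computed, which is one way the four strategies can arrive at different confidence intervals and different sets of significant explanatory variables.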