Time: May 26, 10:00-11:00 a.m.

Venue: Room 306, Jingzheng Building (精正楼, Mathematics Building)

Speaker: Professor Yongdao Zhou (周永道), Nankai University


Abstract: Subsampling, or subdata selection, is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods, which depend heavily on the model assumption. In this paper, we consider a model-free subsampling strategy for generating subdata from the original full data. To measure how well a subdata set represents the original data, we propose a criterion, the generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a low-GEFD, data-driven subsampling method based on existing uniform designs. Through simulation examples and a real case study, we show that the proposed subsampling method enjoys the model-free property and is superior to random sampling. In practice, this model-free property is more appealing than model-based subsampling, which may perform poorly when the model is misspecified, as demonstrated in our simulation studies.


Biography:

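   Yongdao Zhou is a professor and doctoral supervisor at the School of Statistics and Data Science, Nankai University, and has been selected for talent programs at the national, Tianjin municipal, and Nankai University levels. His research interests are experimental design and data mining. He has led four projects funded by the National Natural Science Foundation of China, one key project of the Tianjin Natural Science Foundation, and a number of other government- and industry-funded projects. He has published more than 40 papers in leading statistics journals such as JASA and Biometrika, in Science China (中国科学), and in other major Chinese and international journals. He has co-authored three monographs in Chinese and English and two statistics textbooks with Springer, Science Press, and Higher Education Press. He received the first prize of the Outstanding Achievement Award in Statistical Science Research from the National Bureau of Statistics. He has visited the University of California, Los Angeles, Simon Fraser University, the University of Manchester, and the University of Hong Kong, among other institutions. He currently serves as Secretary-General of the Uniform Design Association of the Chinese Mathematical Society, is a permanent member of the International Chinese Statistical Association, and is a reviewer for Mathematical Reviews (USA).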



Host: Yu Tang (唐煜)