Computational statistics

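Computational statistics, or statistical computing, is the link between statistics and computer science and refers to statistical methods that are made possible by computational methods. It is the area of computational science specific to the mathematical science of statistics. The area is developing rapidly, which has led to calls for broader computational concepts to be taught as part of general statistical education.^[1]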

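As in traditional statistics, the goal is to transform raw data into knowledge,^[2] but the focus lies on computer-intensive statistical methods, such as cases with very large sample sizes and non-homogeneous data sets.^[2]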

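The terms "computational statistics" and "statistical computing" are often used interchangeably. Carlo Lauro, a former president of the International Association for Statistical Computing, proposed distinguishing them: "statistical computing" can be defined as the application of computer science to statistics, while "computational statistics" is the design of algorithms for implementing statistical methods on computers, including algorithms that were unthinkable before the computer age (such as the bootstrap and Monte Carlo methods), and for coping with analytically intractable problems.^[3]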

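The term "computational statistics" may also refer to computationally intensive statistical methods, including resampling, Markov chain Monte Carlo, local regression, kernel density estimation, artificial neural networks, and generalized additive models.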

History

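Although computational statistics is widely used today, it has a relatively short history of acceptance in the statistics community. For the most part, the founders of the field of statistics relied on mathematics and asymptotic approximations when developing statistical methodology.^[4]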

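In statistics, the term "computer" (in its literal sense of a machine for computing) first appeared in an 1891 article by Robert P. Porter in the Journal of the American Statistical Association, which discussed the use of Herman Hollerith's machine in the 11th United States census.^[citation needed] Hollerith's machine, also called the tabulating machine, was an electromechanical machine designed to assist in summarizing information stored on punched cards. Its inventor, Herman Hollerith (February 29, 1860 – November 17, 1929), was an American businessman, inventor, and statistician. He filed for a patent on the machine in 1884 (granted in 1889), and it was used in the 1890 United States census. The 1880 census covered about 50 million people and took more than seven years to tabulate, whereas the 1890 census covered more than 62 million people and took less than a year. This marked the beginning of the era of mechanized computational statistics and semiautomatic data processing systems.

In 1908, William Sealy Gosset performed his now well-known Monte Carlo simulation, which led to the discovery of Student's t-distribution.^[5] With the help of computational methods, he also plotted the empirical distributions alongside the corresponding theoretical distributions. Computers have since revolutionized simulation, making the replication of Gosset's sampling experiment little more than an exercise.^[6]^[7]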

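Later, scientists put forward computational methods for generating pseudo-random deviates, developed techniques for converting uniform deviates into other distributional forms using the inverse cumulative distribution function or acceptance-rejection methods, and developed state-space methodology for Markov chain Monte Carlo.^[8] One of the first efforts to generate random digits in a fully automated way was undertaken by the RAND Corporation in 1947. The resulting tables were published in 1955 as the book A Million Random Digits with 100,000 Normal Deviates.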
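As an illustration of converting uniform deviates into another distributional form, the following minimal sketch applies the inverse cumulative distribution function to draw exponential deviates. The function name `sample_exponential`, the rate of 2.0, and the seed are assumptions chosen for the example and do not come from the cited sources.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one Exponential(rate) deviate by inverting its CDF."""
    u = rng.random()  # uniform deviate in [0, 1)
    # CDF: F(x) = 1 - exp(-rate * x), hence F^{-1}(u) = -ln(1 - u) / rate.
    return -math.log(1.0 - u) / rate


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # should be close to 1 / rate = 0.5
print(f"sample mean: {sample_mean:.4f}")
```

The same pattern works for any distribution whose inverse CDF has a closed form. When it does not, an acceptance-rejection scheme is a common alternative.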

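By the mid-1950s, several articles and patents had proposed devices for random number generation.^[9] Their development was motivated by the need to use random digits in simulations and in other fundamental components of statistical analysis. The best known of these devices is ERNIE, which produces the random numbers that determine the winners of the Premium Bond, a lottery bond issued in the United Kingdom. In 1958, John Tukey developed the jackknife, building on earlier work by Quenouille, as a method for reducing the bias of parameter estimates in samples under nonstandard conditions.^[10] Practical implementation of the method requires computers. By this point, computers had made many tedious statistical studies feasible.^[11]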
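The bias-reduction idea behind the jackknife can be sketched in a few lines. The example below is a minimal illustration, not a reconstruction of Quenouille's or Tukey's original presentation: it recomputes an estimator on every leave-one-out subsample and uses the results to correct the full-sample estimate. The helper name `jackknife_bias_corrected` and the data values are hypothetical.

```python
import statistics


def jackknife_bias_corrected(data, estimator):
    """Return the jackknife bias-corrected value of estimator(data)."""
    n = len(data)
    full_estimate = estimator(data)
    # Recompute the estimator n times, each time leaving one observation out.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    # Jackknife estimate of the bias of the full-sample estimator.
    bias = (n - 1) * (loo_mean - full_estimate)
    return full_estimate - bias


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7]  # made-up observations
plug_in = statistics.pvariance(data)  # divides by n, biased downward
corrected = jackknife_bias_corrected(data, statistics.pvariance)
print(plug_in, corrected, statistics.variance(data))
```

For the plug-in variance, the correction reproduces the usual unbiased sample variance (the one that divides by n - 1), which makes this a convenient check on the implementation.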

Methods

Maximum likelihood estimation

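Maximum likelihood estimation is used to estimate the parameters of an assumed probability distribution from observed data. It works by maximizing a likelihood function so that, under the assumed statistical model, the observed data are most probable.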
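A small numerical example may make the idea concrete. The sketch below assumes an exponential model with unknown rate, simulates data from it, and maximizes the log-likelihood with a golden-section search. The model choice, the search interval, and the function names are all assumptions made for illustration. For this model the maximizer also has a closed form, n divided by the sum of the observations, which serves as a check.

```python
import math
import random


def log_likelihood(rate, data):
    """Log-likelihood of i.i.d. Exponential(rate) observations."""
    return len(data) * math.log(rate) - rate * sum(data)


def golden_section_maximize(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


rng = random.Random(1)
data = [rng.expovariate(1.5) for _ in range(5000)]  # simulated, true rate 1.5

mle_numeric = golden_section_maximize(lambda r: log_likelihood(r, data), 1e-3, 10.0)
mle_closed_form = len(data) / sum(data)
print(mle_numeric, mle_closed_form)
```

In realistic models no closed form exists, and the numerical maximization step is the part that computational statistics is concerned with.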

Monte Carlo method

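Monte Carlo methods are statistical methods that rely on repeated random sampling to obtain numerical results. The underlying idea is to use randomness to solve problems that might be deterministic in principle. They are often used for physical and mathematical problems and tend to be effective when other approaches are difficult to apply. Monte Carlo methods are mainly used for three classes of problems: optimization, numerical integration, and generating draws from a probability distribution.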
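The numerical integration use case can be illustrated with a classic toy problem: estimating π, a deterministic quantity, from random points. The sketch below samples points uniformly in the unit square and counts the fraction that fall inside the quarter circle of radius 1, whose area is π/4. The function name `estimate_pi`, the sample size, and the seed are assumptions for the example.

```python
import math
import random


def estimate_pi(n_samples, rng):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi / 4, so scale the hit fraction by 4.
    return 4.0 * inside / n_samples


pi_estimate = estimate_pi(200_000, random.Random(42))
print(f"estimate: {pi_estimate:.4f}, error: {abs(pi_estimate - math.pi):.4f}")
```

The error of such an estimate shrinks roughly in proportion to one over the square root of the number of samples, so each additional digit of accuracy costs about a hundred times more draws.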

Markov chain Monte Carlo

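The Markov chain Monte Carlo method creates samples from a continuous random variable whose probability density is proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance. The more steps are included, the more closely the distribution of the samples matches the actual desired distribution.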
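One widely used member of this family is the random-walk Metropolis algorithm. The following minimal sketch uses it to sample from a density known only up to a constant, here exp(-x²/2), which is an unnormalized standard normal, and then estimates the mean and variance from the chain. The target, step size, chain length, and burn-in length are assumptions chosen for illustration, not recommended settings.

```python
import math
import random
import statistics


def metropolis(log_density, n_steps, step_size, x0, rng):
    """Random-walk Metropolis sampler for an unnormalized log-density."""
    x, log_p = x0, log_density(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)  # symmetric proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); the unknown
        # normalizing constant cancels in this ratio.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def log_target(x):
    """Log of exp(-x^2 / 2), a standard normal without its constant."""
    return -0.5 * x * x


chain = metropolis(log_target, 50_000, 2.4, x0=5.0, rng=random.Random(7))
kept = chain[1000:]  # discard early steps taken before the chain settles
chain_mean = statistics.fmean(kept)
chain_var = statistics.pvariance(kept)
print(f"mean: {chain_mean:.3f}, variance: {chain_var:.3f}")
```

Because successive states are correlated, the estimates are less precise than the same number of independent draws would give. Running the chain longer tightens them, in line with the statement that more steps bring the sample distribution closer to the target.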

Applications

Associations

International Association for Statistical Computing

See also

References

[1] Nolan, D. & Temple Lang, D. (2010). "Computing in the Statistics Curricula", The American Statistician 64 (2), pp. 97-107.
[2] Wegman, Edward J. "Computational Statistics: A New Agenda for Statistical Theory and Practice." Journal of the Washington Academy of Sciences, vol. 78, no. 4, 1988, pp. 310–322. JSTOR
[3] Lauro, Carlo, Computational statistics or statistical computing, is that the question?, Computational Statistics & Data Analysis, 1996, 23 (1): 191–193, doi:10.1016/0167-9473(96)88920-1
[4] Watnik, Mitchell. Early Computational Statistics. Journal of Computational and Graphical Statistics. 2011, 20 (4): 811–817. Retrieved 2024-02-06. ISSN 1061-8600. S2CID 120111510. doi:10.1198/jcgs.2011.204b. Archived from the original on 2023-12-21.
[5] "Student" [William Sealy Gosset]. The probable error of a mean (PDF). Biometrika. 1908, 6 (1): 1–25. Retrieved 2024-02-06. JSTOR 2331554. doi:10.1093/biomet/6.1.1. hdl:10338.dmlcz/143545. Archived (PDF) from the original on 2008-03-08.
[6] Trahan, Travis John. Recent Advances in Monte Carlo Methods at Los Alamos National Laboratory. 2019-10-03. OSTI 1569710. doi:10.2172/1569710.
[7] Metropolis, Nicholas; Ulam, S. The Monte Carlo Method. Journal of the American Statistical Association. 1949, 44 (247): 335–341. ISSN 0162-1459. PMID 18139350. doi:10.1080/01621459.1949.10483310.
[8] Robert, Christian; Casella, George. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data. Statistical Science. 2011-02-01, 26 (1). ISSN 0883-4237. S2CID 2806098. arXiv:0808.2902. doi:10.1214/10-sts351.
[9] L'Ecuyer, Pierre. History of uniform random number generation (PDF). 2017 Winter Simulation Conference (WSC). 2017: 202–230. Retrieved 2024-02-06. ISBN 978-1-5386-3428-8. S2CID 4567651. doi:10.1109/WSC.2017.8247790. Archived (PDF) from the original on 2022-08-04.
[10] Quenouille, M. H. Notes on Bias in Estimation. Biometrika. 1956, 43 (3–4): 353–360. ISSN 0006-3444. doi:10.1093/biomet/43.3-4.353.
[11] Teichroew, Daniel. A History of Distribution Sampling Prior to the Era of the Computer and its Relevance to Simulation. Journal of the American Statistical Association. 1965, 60 (309): 27–49. ISSN 0162-1459. doi:10.1080/01621459.1965.10480773.

External links

Associations

International Association for Statistical Computing
Statistical Computing section of the American Statistical Association

Journals

Computational Statistics & Data Analysis
Journal of Computational & Graphical Statistics
Statistics and Computing
