

Title: Significance tests to identify regulated proteins based on a large number of small samples (English)
Author: Klawonn, Frank
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 48
Issue: 3
Year: 2012
Pages: 478-493
Summary lang: English
Category: math
Summary: Modern biology aims to better understand the mechanisms within cells. To this end, cell products such as metabolites, peptides, proteins or mRNA are measured and compared under different conditions, for instance healthy vs. infected cells. Such experiments usually yield regulation or expression values (the abundance or absence of a cell product in one condition compared to another) for a large number of cell products, but with only a few replicates. To distinguish random fluctuations and noise from true regulation, suitable significance tests are needed. Here we propose a simple model based on the assumption that the regulation factors follow normal distributions with different expected values but the same standard deviation. Before suitable significance tests can be derived from this model, a reliable estimate of the standard deviation from many small samples is needed. We therefore also discuss the properties of the sample MAD (median absolute deviation from the median) and the sample standard deviation for small sample sizes. (English)
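The model in the summary (each protein i has replicates drawn from N(mu_i, sigma^2), with its own mean but a shared sigma) suggests pooling the spread of all the small samples into one estimate of sigma and then testing each protein against "no regulation". The sketch below is illustrative only and is not taken from the paper: it assumes the values are log-ratios, so that mu_i = 0 means "not regulated", and it uses the textbook pooled variance estimator and a normal (z) approximation. The function names `pooled_sd`, `mad_sd` and `regulation_pvalues` are hypothetical. The constant 1.4826 is the usual factor that makes the MAD a consistent estimator of sigma under normality. The paper's own analysis of how the MAD and the sample standard deviation behave for small samples may lead to different estimators.

```python
import math
import statistics


def pooled_sd(samples):
    """Pooled estimate of a common sigma from many small samples.

    Assumes (hypothetically) that sample i ~ N(mu_i, sigma^2), i.e. each
    sample has its own mean but all share the same standard deviation.
    """
    ss = 0.0  # sum of squared deviations from each sample's own mean
    df = 0    # total degrees of freedom, sum of (n_i - 1)
    for xs in samples:
        if len(xs) < 2:
            continue  # a single value carries no information about spread
        m = statistics.fmean(xs)
        ss += sum((x - m) ** 2 for x in xs)
        df += len(xs) - 1
    if df == 0:
        raise ValueError("need at least one sample with two or more values")
    return math.sqrt(ss / df)


def mad_sd(xs):
    """Robust per-sample estimate of sigma: 1.4826 * MAD (normal consistency)."""
    med = statistics.median(xs)
    return 1.4826 * statistics.median(abs(x - med) for x in xs)


def regulation_pvalues(samples, sigma):
    """Two-sided p-values for H0: mu_i = 0 (no regulation).

    sigma is treated as known; this is a reasonable approximation when it
    was pooled over many samples, because the pooled degrees of freedom
    are then large and the t distribution is close to the normal.
    """
    pvals = []
    for xs in samples:
        z = statistics.fmean(xs) / (sigma / math.sqrt(len(xs)))
        # 2 * (1 - Phi(|z|)) written via the complementary error function
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))
    return pvals


if __name__ == "__main__":
    # Three hypothetical proteins with three replicate log-ratios each
    samples = [[0.1, -0.2, 0.05], [1.2, 1.0, 1.4], [-0.1, 0.0, 0.2]]
    sigma = pooled_sd(samples)
    print(round(sigma, 4))
    print(regulation_pvalues(samples, sigma))
```

With only three replicates per protein, any per-protein estimate of spread (standard deviation or MAD) is very noisy. Pooling the squared deviations over all proteins, as `pooled_sd` does here, is one simple way to exploit the "many small samples" setting. In this toy data only the second protein, whose replicates all sit near 1.2, gets a very small p-value. For example, `mad_sd([1, 2, 3, 4, 100])` gives 1.4826, which shows that the MAD is barely affected by the outlier.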
Keyword: MAD
Keyword: standard deviation
Keyword: small samples
Keyword: significance test
MSC: 62A10
MSC: 93E12
idMR: MR2975802
Date available: 2012-08-31T15:56:53Z
Last updated: 2013-09-24
Stable URL:
Reference: [1] Anders, S., Huber, W.: Differential expression analysis for sequence count data. Genome Biology 11 (2010), R106. 10.1186/gb-2010-11-10-r106
Reference: [2] Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B (Methodological) 57 (1995), 289–300. Zbl 0809.62014, MR 1325392
Reference: [3] Berrar, D. P., Dubitzky, W., Granzow, M., eds.: A Practical Approach to Microarray Data Analysis. Springer, Dordrecht 2009.
Reference: [4] Breitwieser, F. P., Müller, A., Dayon, L., Köcher, T., Hainard, A., Pichler, P., Schmidt-Erfurth, U., Superti-Furga, G., Sanchez, J.-C., Mechtler, K., Bennett, K. L., Colinge, J.: General statistical modeling of data from protein relative expression isobaric tags. J. Proteome Res. 10 (2011), 2758–2766. 10.1021/pr1012784
Reference: [5] Croux, C., Rousseeuw, P. J.: Alternatives to the median absolute deviation. In: Computational Statistics, Vol. 1 (Y. Dodge and J. Whittaker, eds.), Physica, Heidelberg 1992, pp. 411–428.
Reference: [6] Gentleman, R., Carey, V., Huber, W., Irizarry, R., Dudoit, S.: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, New York 2005. Zbl 1142.62100, MR 2201836
Reference: [7] Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6 (1979), 65–70. Zbl 0402.62058, MR 0538597
Reference: [8] Hundertmark, C., Fischer, R., Reinl, T., May, S., Klawonn, F., Jänsch, L.: MS-specific noise model reveals the potential of iTRAQ in quantitative proteomics. Bioinformatics 25 (2009), 1004–1011. 10.1093/bioinformatics/btn551
Reference: [9] Klawonn, F., Hundertmark, C., Jänsch, L.: A maximum likelihood approach to noise estimation for intensity measurements in biology. In: Proc. Sixth IEEE International Conference on Data Mining: Workshops (S. Tsumoto, C. W. Clifton, N. Zhong, X. Wu, J. Liu, B. W. Wah, and Y.-M. Cheung, eds.), IEEE, Los Alamitos 2006, pp. 180–184.
Reference: [10] Klawonn, F., Wüstefeld, T., Zender, L.: Statistical modelling for data from experiments with short hairpin RNAs. In: Advances in Intelligent Data Analysis IX, Springer, Berlin 2010, pp. 79–90.
Reference: [11] R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna 2009.
Reference: [12] Robinson, M. D., Oshlack, A.: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11 (2010), R25. 10.1186/gb-2010-11-3-r25
Reference: [13] Rousseeuw, P. J., Croux, C.: Alternatives to the median absolute deviation. J. Amer. Statist. Assoc. 88 (1993), 1273–1283. MR 1245360, 10.1080/01621459.1993.10476408
Reference: [14] Shaffer, J. P.: Multiple hypothesis testing. Ann. Rev. Psych. 46 (1995), 561–584. 10.1146/
Reference: [15] Smyth, G. K.: LIMMA: Linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor (R. Gentleman, V. Carey, W. Huber, R. Irizarry, and S. Dudoit, eds.), Springer, New York 2005, pp. 397–420. MR 2201836


File: Kybernetika_48-2012-3_9.pdf (360.7 Kb, application/pdf)