

Keywords:
MAD; standard deviation; small samples; significance test
Summary:
Modern biology seeks a better understanding of the mechanisms within cells. For this purpose, cell products such as metabolites, peptides, proteins or mRNA are measured and compared under different conditions, for instance healthy cells vs. infected cells. Such experiments usually yield regulation or expression values (the abundance or absence of a cell product in one condition compared to another) for a large number of cell products, but with only a few replicates each. To distinguish true regulation from random fluctuations and noise, suitable significance tests are needed. We propose a simple model based on the assumption that the regulation factors follow normal distributions with different expected values but the same standard deviation. Before suitable significance tests can be derived from this model, a reliable estimate of the standard deviation in the setting of many small samples is needed. We therefore also discuss the properties of the sample MAD (Median Absolute Deviation from the median) and the sample standard deviation for small sample sizes.
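The setting described in the summary (many cell products, few replicates each, one common standard deviation) can be illustrated with a small simulation. The following is a minimal sketch and not the estimator derived in the article: the function names, the simulated means, the choice of three replicates, and the use of a classical pooled standard deviation are all assumptions made for illustration. The factor 1.4826 is the usual large-sample consistency constant for the MAD under normality.

```python
import random
import statistics

# Consistency factor that makes the MAD estimate sigma for large normal samples.
MAD_NORMAL_CONSISTENCY = 1.4826


def sample_mad(values):
    """Scaled median absolute deviation from the median."""
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    return MAD_NORMAL_CONSISTENCY * statistics.median(deviations)


def pooled_sd(samples):
    """Classical pooled standard deviation, assuming all samples share one sigma."""
    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for sample in samples:
        mean = statistics.fmean(sample)
        sum_of_squares += sum((v - mean) ** 2 for v in sample)
        degrees_of_freedom += len(sample) - 1
    return (sum_of_squares / degrees_of_freedom) ** 0.5


def simulate(num_products=2000, replicates=3, sigma=1.0, seed=0):
    """Hypothetical data: each product has its own mean, all share sigma."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_products):
        mu = rng.uniform(-2.0, 2.0)  # assumed spread of true regulation values
        samples.append([rng.gauss(mu, sigma) for _ in range(replicates)])
    return samples


samples = simulate()
mean_sd = statistics.fmean(statistics.stdev(s) for s in samples)
mean_mad = statistics.fmean(sample_mad(s) for s in samples)
pooled = pooled_sd(samples)

print("average per-sample standard deviation:", round(mean_sd, 3))
print("average per-sample scaled MAD:        ", round(mean_mad, 3))
print("pooled standard deviation:            ", round(pooled, 3))
```

With only three replicates per product, both per-sample estimators underestimate the true standard deviation on average, the scaled MAD more strongly than the sample standard deviation, while an estimate that pools information across all products lands close to the true value. This is the kind of small-sample behaviour the summary refers to.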