Detection of Possible Reading Frame Shifts in Genes Using Triplet Frequencies Homogeneity

Authors

  • Valentina Rudenko, Bioengineering Centre of RAS, Moscow; NRNU MEPHi, Moscow
  • Yulia Suvorova, Bioengineering Centre of RAS, Moscow
  • Eugene Korotkov, Bioengineering Centre of RAS, Moscow; NRNU MEPHi, Moscow

DOI:

https://doi.org/10.17713/ajs.v40i1&2.205

Abstract

A new approach for detecting reading frame shifts in coding DNA sequences has been developed. To detect a shift, the hypothesis that triplet frequencies are homogeneous along the sequence is tested. Statistical significance is estimated by the Monte Carlo method. For sequences with a length greater than 300, the method revealed 25% more cases of frame shifts than the approach used earlier.
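
The abstract does not state the exact test statistic or resampling scheme, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Pearson chi-square homogeneity statistic comparing triplet counts on either side of a single candidate split point, and a Monte Carlo null obtained by shuffling the order of triplets. All function names and parameters are hypothetical.

```python
import random
from collections import Counter


def triplets_in_frame(seq, frame=0):
    """Return the non-overlapping triplets of seq read from offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def homogeneity_stat(left, right):
    """Pearson chi-square statistic for a 2 x K table of triplet counts."""
    n_left, n_right = sum(left.values()), sum(right.values())
    total = n_left + n_right
    stat = 0.0
    for triplet in set(left) | set(right):
        column = left[triplet] + right[triplet]
        for observed, n in ((left[triplet], n_left), (right[triplet], n_right)):
            expected = n * column / total
            stat += (observed - expected) ** 2 / expected
    return stat


def split_stat(triplets, k):
    """Statistic for splitting the triplet list before index k (0 < k < len)."""
    if not 0 < k < len(triplets):
        raise ValueError("split point must leave both parts non-empty")
    return homogeneity_stat(Counter(triplets[:k]), Counter(triplets[k:]))


def monte_carlo_p(triplets, k, n_sim=1000, seed=0):
    """Estimate a p-value by shuffling triplet order (assumed null model)."""
    rng = random.Random(seed)
    observed = split_stat(triplets, k)
    shuffled = list(triplets)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if split_stat(shuffled, k) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Toy input: one inserted base shifts the reading frame of the second half.
shifted = "ATGGCC" * 30 + "A" + "ATGGCC" * 30
p_value = monte_carlo_p(triplets_in_frame(shifted), k=60, n_sim=200)
```

In this toy input the triplets read after the insertion differ completely from those before it, so the estimated p-value is small. A practical detector would also have to scan over candidate split points and correct for that search, which this sketch leaves out.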

Published

2016-02-24

How to Cite

Rudenko, V., Suvorova, Y., & Korotkov, E. (2016). Detection of Possible Reading Frame Shifts in Genes Using Triplet Frequencies Homogeneity. Austrian Journal of Statistics, 40(1&2), 137–146. https://doi.org/10.17713/ajs.v40i1&2.205

Section

Articles