Primer to analysis of genomic data using R

The focus of this introductory course is on using R for analysis of modern genomic data. Molecular data has grown to unwieldy dimensions over the last decade. How to handle, analyse and make sense of these large datasets is quite a challenge. This is an applied hands-on course.


main topics

  • Understanding the basics of R needed to analyse genomic data
  • R packages available for genomic analysis
  • How to work with and manipulate large data files
  • The importance of pre-processing and quality control with genomic data
  • Principles of genome wide association studies
  • Principles of genomic prediction
  • Population genetics analyses
  • Principles of gene expression analysis
  • Functional annotation
  • Working with public databases to extract biological information
  • Automation of analyses
  • Improving performance of R

why R?

In recent years R has become the de facto programming language of choice for statisticians, and it is widely used to teach statistics courses at universities. It is also arguably the most widely used environment for analysis of high-throughput genomic data, in particular for gene expression analyses. R's main strength lies in the thousands of packages freely available from repositories such as CRAN or Bioconductor, which build on the core platform. Chances are that there is already an off-the-shelf package available for a particular task. And, since R is a scripting language, it is easy to assemble various packages, add some personalized routines and chain-link it all into a full analysis pipeline, all the way from raw data to final report. This dramatically reduces development and deployment times for complex analyses. During the course we will put quite a lot of emphasis on analytical pipelines instead of step-by-step analyses - quite critical to maintain our sanity in the genomic era.
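To make the pipeline idea concrete, here is a minimal sketch in base R. It is not taken from the course material: the data are simulated, and every function name (simulate_data, qc_filter, assoc_test, report, run_pipeline) is hypothetical. In a real analysis each stage would more likely wrap a CRAN or Bioconductor package and read actual genotype files.

```r
# Illustrative pipeline sketch; all names and thresholds are assumptions.
set.seed(1)

# Stage 0: stand-in for reading raw data. Simulates n samples x m SNPs
# coded 0/1/2, adds some missing calls and a random phenotype.
simulate_data <- function(n = 100, m = 20) {
  maf <- runif(m, 0.05, 0.5)
  geno <- sapply(maf, function(p) rbinom(n, 2, p))
  colnames(geno) <- paste0("snp", seq_len(m))
  geno[sample(length(geno), 50)] <- NA
  list(geno = geno, pheno = rnorm(n))
}

# Stage 1: quality control. Drops SNPs with a low call rate or a low
# minor allele frequency.
qc_filter <- function(d, min_call = 0.95, min_maf = 0.05) {
  call_rate <- colMeans(!is.na(d$geno))
  freq <- colMeans(d$geno, na.rm = TRUE) / 2
  maf <- pmin(freq, 1 - freq)
  keep <- call_rate >= min_call & maf >= min_maf
  d$geno <- d$geno[, keep, drop = FALSE]
  d
}

# Stage 2: single-SNP linear regression of phenotype on genotype.
assoc_test <- function(d) {
  pvals <- apply(d$geno, 2, function(g) {
    fit <- lm(d$pheno ~ g)
    summary(fit)$coefficients["g", "Pr(>|t|)"]
  })
  data.frame(snp = names(pvals), p = unname(pvals))
}

# Stage 3: report. Returns the SNPs with the smallest p-values.
report <- function(res, top = 5) {
  res <- res[order(res$p), ]
  head(res, top)
}

# The whole analysis is one call, so it can be rerun unchanged
# whenever the data or the thresholds change.
run_pipeline <- function() {
  report(assoc_test(qc_filter(simulate_data())))
}

top_hits <- run_pipeline()
print(top_hits)
```

Because each stage takes the previous stage's output as its input, any one of them can be swapped for a package-based version without touching the rest of the chain.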

To get started, just download the practicals from the DOWNLOAD link above. The pre-course instructions will help set up the computer with the various bits of software needed. The course is based on the book Primer to Analysis of Genomic Data Using R, published by Springer in the Use R! series.

some other tutorial style publications for R

Gondro, C., L.R. Porto-Neto and S.H. Lee (2013). "R for Genome Wide Association Studies". Genome-Wide Association Studies and Genomic Prediction. C. Gondro, J.H.J. van der Werf and B. Hayes. Methods in Molecular Biology, Springer: 1-18.

Gondro, C., S.H. Lee, H.K. Lee and L.R. Porto-Neto (2013). "Quality Control for Genome Wide Association Studies". Genome-Wide Association Studies and Genomic Prediction. C. Gondro, J.H.J. van der Werf and B. Hayes. Methods in Molecular Biology, Springer: 129-148.

Porto-Neto, L.R., S.H. Lee, H.K. Lee and C. Gondro (2013). "Signatures of Selection and Population Diversity Measures". Genome-Wide Association Studies and Genomic Prediction. C. Gondro, J.H.J. van der Werf and B. Hayes. Methods in Molecular Biology, Springer: 423-436.

Gondro, C. and P. Kwan (2012). "Parallel evolutionary computation in R". Multidisciplinary Computational Intelligence Techniques: Applications in Business, Engineering and Medicine. S. Ali, N. Abbadeni and M. Batouche. IGI Global: 351-377.

acknowledgements

Thanks to all the students who took the course and provided valuable input into its structure and contents.

assistance

If you have any comments, questions or suggestions, or find any bugs, please let me know. To organize a short course at your institution, just contact me at the email address below.

Email: cgondro2@une.edu.au