11 Continued learning
11.1 Free online books:
11.1.1 Beginner
- R for Data Science: Excellent open and online resource for using R for data analysis and data science.
- Fundamentals of Data Visualization: Excellent online resource for using ggplot2 and R graphics. The book mostly focuses on concepts and theory of how to visualize, rather than the practicalities (i.e. no coding involved).
- ModernDive: Statistical Inference via Data Science: Great book on using statistics and data science methods in R.
- Happy Git and GitHub for the useR (highly recommended): Specifically useful is the chapter on Daily Workflows using Git.
- Data Visualization: A practical introduction: A book that goes into practical as well as conceptual detail on how and why to make certain graphs, given your data.
- Course material for a statistics class: Excellent course material for teaching statistics and R.
- Data Skills for Reproducible Research: A book-format resource for learning about reproducible research practices, mainly aimed at psychology students.
- Data wrangling, exploration, and analysis with R: A foundational course, originally designed by Jenny Bryan (of Posit and tidyverse) for students in the life sciences at UBC, but now used by students in many fields.
11.1.2 Intermediate and above
- Efficient R Programming: Excellent book on being efficient when writing R code.
- Advanced R: Detailed book on advanced features of R.
- R Programming for Data Science: Great overview of using R for data science, with more of a focus on the programming side of things.
- rstats.wtf: What They Forgot to Teach You About R, a book on more advanced topics in R.
- Data Science: A book on data science using R, with a focus on the tidyverse.
- R Packages: A book on how to create R packages, starting from the basics. A very useful reference when you want to create your own package.
11.2 Quick references:
- RStudio cheatsheets: Multiple, high-quality cheatsheets you can print off to use as a handy reference.
- Tidyverse style guide: To learn about how to write well-styled code in R.
- Tidyverse design philosophy of writing code
11.3 Articles:
- Good enough practices in scientific computing: An article listing and describing some practices to use when writing code.
- Best practices in scientific computing.
- Case study of reproducible methods in bioinformatics: [@Kim2018a].
- Our path to better science in less time using open data science tools.
11.4 General sites:
- Organizing R Source Code.
- Hands-on tutorial for learning Git, in a web-based terminal.
- Simpler, first-steps guide to using Git.
- RStudio tutorial on using R Markdown.
- Markdown syntax guide.
- Pandoc Markdown Manual (R Markdown uses Pandoc).
- Adding citations in R Markdown.
- Case studies and lessons on doing reproducible research.
11.5 Interactive sites or resources for hands-on learning:
11.6 Videos:
- Video on using Git in RStudio.
11.7 Getting help:
- StackOverflow for tidyr.
- StackOverflow for dplyr.
- StackOverflow for ggplot2.
- Tip: Combine auto-completion with `::` to find new functions and documentation on the functions (e.g. try typing `base::` and then hitting Tab to show a list of all functions found in base R).
- Oh Shit Git!: A resource for dealing with Git issues.
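The auto-completion tip above can also be approximated from the console, which may help when Tab completion is not available. This is only a minimal sketch: the variable names (`base_functions`, `paste_like`) and the choice of `paste` as the example function are illustrative assumptions, not part of the original tip.

```r
# List every object that base R makes available, roughly the same
# set that appears after typing `base::` and pressing Tab.
base_functions <- ls("package:base")

# Narrow the list by a prefix, similar to typing a few letters
# before pressing Tab. "paste" is just an example prefix.
paste_like <- grep("^paste", base_functions, value = TRUE)
print(paste_like)

# Open the documentation for one of the functions found.
# Guarded so that it only runs in an interactive session.
if (interactive()) {
  help("paste", package = "base")
}
```

The same approach should work for any attached package by swapping `"package:base"` for, say, `"package:stats"`.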
11.8 Teaching:
- Openscapes Champions Lesson Series: Learning materials for being a teacher.
- Framework for Open and Reproducible Research Training: A great set of resources for learning about how and why to teach open and reproducible research.
- Post: Why beginners should teach
11.9 Examples:
These are some real-world examples of how Git and GitHub are used in health research, some of which also use R and incorporate reproducibility.
- Some research projects at Steno Aarhus using the UK Biobank data:
  - ukbAid: An R package and documentation for streamlining the use of the UK Biobank data on the DNAnexus platform.
  - legliv: The Association between Substitution of Red Meat with Legumes and Risk of Primary Liver Cancer in UK Biobank: A Cohort Study
  - leha: Legumes as a substitute for red and processed meat, poultry, or fish, and the risk of non-alcoholic fatty liver disease in a large cohort
- Research projects at Steno Diabetes Center Aarhus:
  - LIVING Project: A national evaluation of the patient education concept Lev Livet
  - DP-Next: Sustainable Type 2 Diabetes Prevention for the 21st Century
  - Seedcase Project: A framework for open and scalable data: Software and training to bring data engineering to research
  - ON-LiMiT: Remission of type 2 diabetes with diet and exercise
These are, naturally, biased toward projects at Steno Aarhus, since that is where the lead instructor works. But if you have any examples to add here, please let us know!