class: center, middle, inverse, title-slide # Getting started with R ### Mikhail Dozmorov ### Virginia Commonwealth University ### 08-19-2021 --- # Why programming - Programming will make your academic journey better - Instead of remembering what button you clicked, you write scripts with exact commands that makes steps in your analysis clear and allow others to understand/reproduce it (or, spot mistakes) - Programming forces you to have a deeper understanding of what you are doing, and facilitates your learning and comprehension of the methods you use --- ## Why R? - R is a programming language designed for data analysis and statistics - Free, open-source and cross-platform - An extremely powerful language for statistical modeling, machine learning, data manipulation, and visualization - Efficient data analysis on data of all shapes and sizes (big data including) - Optimized operations on vectors, matrices lists - Very sophisticated graphs and data displays - Not just scripts, but fully reproducible reports, papers, presentations, web applications .small[ https://www.r-project.org/ ] --- ## Why R? - Thousands of packages that add extra functionality. Covering virtually all scientific disciplines and analytical frameworks. - Image analysis, geospatial, epidemiology, genetics, bioinformatics, and a lot more - R can connect to spreadsheets, databases, and many other data formats, on your computer or on the web. - Large and welcoming user community .small[ https://www.r-project.org/ [18,000 CRAN packages, tweet by Dirk Eddelbuettel, 2021-08-11](https://twitter.com/eddelbuettel/status/1425425651092410369?s=20) ] --- ## RStudio - RStudio is an interface IDE, integrated development environment) to work with R, with many features and functionalities for efficient work - Free and open source - Integrates file navigation, visualization, documentation, version control and project management - You write the same R code in RStudio as you would elsewhere, and it executes the same way. RStudio helps by keeping things nicely organized .small[ https://www.rstudio.com/products/rstudio/download/ ] --- ## Why RStudio - Project-centric work - scripts and data are organized in one folder (project), easily accessible - Single workspace with four (rearrangeable, zoomable) panels - Work on multiple projects simultaneously in several instances of RStudio - Work on multiple (types of) scripts - See all variables in R environment, easily visualize them - Easy access to help, plots, packages - Simple integration with Git version control system - **After you install R and RStudio, you only need to run RStudio** --- ## RStudio interface .center[<img src="https://uclouvain-cbio.github.io/WSBIM1207/figs/rstudio-screenshot.png" height=600 >] --- ## RStudio interface RStudio is divided into 4 panes, by default: - **Source** for scripts and documents (top-left) - **Environment/History** (top-right), - **Files/Plots/Packages/Help/Viewer** (bottom-right) - **R Console** (bottom-left) Additional goodies: - Autocompletion - Highlightning - Keyboard shortcuts - Many more --- ## RStudio help .center[<img src="img/rstudio_help.png" height=550 >] --- ## Installing and loading packages ```r install.packages("cowsay") ``` ```r library(cowsay) say('time') ``` ``` ## ## -------------- ## 2021-08-21 13:44:13 ## -------------- ## \ ## \ ## \ ## |\___/| ## ==) ^Y^ (== ## \ ^ / ## )=*=( ## / \ ## | | ## /| | | |\ ## \| | |_|/\ ## jgs //_// ___/ ## \_) ## ``` --- ## Getting started with R ```r install.packages("swirl") ``` ```r library(swirl) ``` ``` ## ## | Hi! Type swirl() when you are ready to begin. ``` --- ## RStudio is more than IDE RStudio PBC develops many now-gold-standard R packages - `tidyverse` – R packages for data science, including ggplot2, dplyr, tidyr, and purrr - `shiny` – An interactive web technology - `rmarkdown` – Insert R code into markdown documents - `knitr` – Dynamic reports combining R, TeX, Markdown & HTML - `tensorflow` - open-source software library for Machine Intelligence - `reticulate` - provides a comprehensive set of - `devtools` – Package development tool --- ## Getting help - Get an overview of all functions in a package: `help(package = "dplyr")` - Use `?function_name` to get help on a function from a _loaded_ package. E.g., `?boxplot` (same as `help(boxplot)`) - Use `example(boxplot)` to see how the function can be used - Use `??function_name` to search for the function across all installed packages, even not loaded. E.g., `??ggplotly` - Search engine is your best friend --- ## References - Introduction to bioinformatics, https://uclouvain-cbio.github.io/WSBIM1207/sec-rrstudio.html - Orientation to programming, R, and RStudio, https://gge-ucd.github.io/R-DAVIS/lesson_intro_r_rstudio.html --- ## Conclusion .center[<img src="img/twitter_advice.png" height=400 >] https://twitter.com/TanentzapfLab/status/1427720047431065601?s=20