
Code organization best practices

Mikhail Dozmorov

Virginia Commonwealth University

09-27-2021


Automating everything

  • A little automation is better than no automation
  • It's more work to do things properly, but it can save you a lot of aggravation down the road

What to automate?

  • what you're trying to do
  • what you're thinking about
  • what you're seeing
  • what you're concluding and why

Good vs. bad programming

  • A bad programmer explains him/herself with comments
  • A good programmer explains him/herself with code
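A minimal R sketch of the difference, using hypothetical temperature values. The first version needs a comment before anyone can tell what 1.8 and 32 are doing; the second carries the same information in a function name and variable names, so no comment is needed.

```r
# Relies on a comment to explain a cryptic expression
x <- c(12, 15, 9)
y <- x * 1.8 + 32  # convert Celsius to Fahrenheit

# Explains itself through names instead (hypothetical names and data)
celsius_to_fahrenheit <- function(temp_celsius) {
  temp_celsius * 9 / 5 + 32
}
daily_max_temp_c <- c(12, 15, 9)
daily_max_temp_f <- celsius_to_fahrenheit(daily_max_temp_c)
```

Comments are still worth writing for why a step exists; the names should already say what it does.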

Good code = Clean code

Follow coding conventions

Clean code is

  • Understandable at first glance
  • Neat and elegant
  • Unambiguous
  • Not necessarily computationally efficient
  • Self-explanatory
  • Maintainable

Bad code

  • Full of “magic”: variables and values no one can understand
  • Cluttered, or too loose
  • Redundant
  • Poorly commented
  • Does not follow conventions
  • Hardly maintainable

Code represents you: don’t write bad code
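A small sketch of removing “magic” values in R. The threshold and rate below are hypothetical; the point is that naming them once, as UPPER-case constants, makes the condition readable and gives a single place to change them.

```r
# "Magic" version: what do 100 and 0.9 mean?
total <- 250
if (total > 100) total <- total * 0.9

# Named constants (hypothetical values) make the intent explicit
DISCOUNT_THRESHOLD <- 100  # order total above which a discount applies
DISCOUNT_RATE <- 0.10      # 10% discount

order_total <- 250
if (order_total > DISCOUNT_THRESHOLD) {
  order_total <- order_total * (1 - DISCOUNT_RATE)
}
```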


Good vs. Bad code

Is this good code?

j <- 4
for (i in data) {
  if (i == "premium") {
    dis <- 20
  } else if (j >= 3) {
    dis <- 15
  } else {
    dis <- 0
  }
}

Good vs. Bad code

Is this good code?

Number_Of_Items_In_Basket <- 4
for (Customer_Status in Customer_Records) {
  if (Customer_Status == "premium") {
    Discount <- 20
  } else if (Number_Of_Items_In_Basket >= 3) {
    Discount <- 15
  } else {
    Discount <- 0
  }
}
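One possible cleaner version, as a sketch under stated assumptions: it assumes the intent is one discount per customer (the loops above overwrite a single variable, so only the last customer's discount survives) and that each customer has their own basket size. Names follow the lower-case/UPPER-case conventions from these slides; the function name compute_discount, the constants, and the data are hypothetical.

```r
# Hypothetical discount rules (percent), named once at the top
PREMIUM_DISCOUNT <- 20
BULK_DISCOUNT <- 15
BULK_MIN_ITEMS <- 3

# Verb-first function name; one job: the discount for one customer
compute_discount <- function(customer_status, n_items_in_basket) {
  if (customer_status == "premium") {
    PREMIUM_DISCOUNT
  } else if (n_items_in_basket >= BULK_MIN_ITEMS) {
    BULK_DISCOUNT
  } else {
    0
  }
}

# Hypothetical customer records: status and basket size per customer
customer_records <- data.frame(
  status = c("premium", "regular", "regular"),
  n_items = c(1, 4, 2)
)

# Store one discount per customer instead of overwriting a single variable
customer_records$discount <- mapply(compute_discount,
                                    customer_records$status,
                                    customer_records$n_items,
                                    USE.NAMES = FALSE)
```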

Good variable names

Variable names are nouns

  • informative
  • unambiguous
  • descriptive
  • variables are in lower case, constants are in UPPER case

Good variable names

  • Tab completion: almost all modern text editors provide tab completion, so typing the first part of a variable name and pressing the Tab key inserts the complete name. Meaningful, longer variable names are therefore no harder to type than terse abbreviations

  • Choose and follow conventions

    • underscore_convention
    • camelCaseConvention
    • dot.convention
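Once a convention is chosen, existing names can be converted mechanically. A sketch, assuming snake_case (the underscore convention) is the target; to_snake_case is a hypothetical helper, not a function from a package.

```r
# Hypothetical helper: convert camelCase or dot.case names to snake_case
to_snake_case <- function(names) {
  # Insert "_" between a lower-case letter/digit and the next upper-case letter
  names <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", names, perl = TRUE)
  # Dots become underscores
  names <- gsub(".", "_", names, fixed = TRUE)
  tolower(names)
}

to_snake_case(c("camelCaseConvention", "dot.convention", "underscore_convention"))
```

Applied to a data frame, names(df) <- to_snake_case(names(df)) renames every column in one step.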

https://www.chaseadams.io/posts/most-common-programming-case-types/


Good vs. bad variable names


Good vs. bad variable names


Good function names

  • Function names are verbs
    • “verb first” rule, e.g., print_full_name
    • informative, unambiguous, descriptive, etc., as for variables
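A minimal sketch of the “verb first” rule in R, using the print_full_name example from the slide: the function name is a verb phrase saying what it does, and its arguments are nouns. The body is an illustrative assumption.

```r
# Verb-first function name; noun argument names
print_full_name <- function(first_name, last_name) {
  full_name <- paste(first_name, last_name)
  print(full_name)
  invisible(full_name)  # also return the value, without printing it twice
}

print_full_name("Ada", "Lovelace")
```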


Good naming practices

  • Short but meaningful
  • Don’t use spaces in variable names or file names; use an underscore "_" or a dot "." instead, e.g., "tcga_first_batch"
  • Avoid leading and trailing spaces within cells, e.g., " pass" or "pass "
  • Avoid special characters, except for underscores and hyphens. Other symbols ($, @, %, #, &, *, (, ), !, /, etc.) often have special meaning in programming languages, and so they can be harder to handle
Good name         | Good alternative  | Avoid
----------------- | ----------------- | -----------------
Max_temp_C        | MaxTemp           | Maximum Temp (°C)
Precipitation_mm  | Precipitation     | precmm
Mean_year_growth  | MeanYearGrowth    | Mean growth/year
sex               | sex               | M/F
weight            | weight            | w.
cell_type         | CellType          | Cell type
Observation_01    | first_observation | 1st Obs.
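The rules above can be applied automatically to messy names or cell values. A sketch with a hypothetical helper, clean_name, that trims leading and trailing spaces and replaces spaces and special characters with underscores, keeping letters, digits, dots, underscores, and hyphens.

```r
# Hypothetical helper implementing the naming rules above
clean_name <- function(x) {
  x <- trimws(x)                                       # drop leading/trailing spaces, e.g. " pass"
  x <- gsub("[^A-Za-z0-9._-]+", "_", x, perl = TRUE)   # runs of spaces/special characters -> "_"
  gsub("^_+|_+$", "", x, perl = TRUE)                  # remove underscores left at either end
}

clean_name(c(" pass", "Mean growth/year", "Cell type", "Maximum Temp (C)"))
```

This helper does not guarantee a valid R variable name (a name such as "1st_Obs." still starts with a digit); base R's make.names() handles that case.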

Refactoring

Refactoring: making code better

  • Make code understandable by other developers. Ask yourself: if I gave this code to my grandma, would she understand it?

  • Increase readability by reducing clutter: make code looser where it is too dense and tighter where it is too sprawling

  • Globally search and replace bad variable and function names
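A common refactoring is replacing copy-pasted code with one well-named function. A sketch with hypothetical score data: both versions compute the same min-max scaling, but the refactored one states the operation once. The function name rescale_to_unit is illustrative, and it assumes the input is not constant (otherwise it divides by zero).

```r
# Before: the same scaling expression is copy-pasted for each column
scores <- data.frame(math = c(50, 70, 90), reading = c(60, 80, 100))
scores$math_scaled <- (scores$math - min(scores$math)) /
  (max(scores$math) - min(scores$math))
scores$reading_scaled <- (scores$reading - min(scores$reading)) /
  (max(scores$reading) - min(scores$reading))

# After: one function, applied to every column that needs it
rescale_to_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))  # assumes x is not constant
}
scores_refactored <- data.frame(math = c(50, 70, 90), reading = c(60, 80, 100))
scores_refactored[c("math_scaled", "reading_scaled")] <-
  lapply(scores_refactored[c("math", "reading")], rescale_to_unit)
```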


Code formatting

  • formatR - Provides the function tidy_source() to format R source code. It adds spaces and indentation automatically and preserves comments under certain conditions, making R code more human-readable and tidy. The package also includes a Shiny app as a user interface (see tidy_app()).

  • styler - Non-Invasive Pretty Printing of R Code. Pretty-prints R code without changing the user's formatting intent. The style_file() and style_dir() functions automatically format individual R files or whole directories.
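A quick way to see what styler does is to format a string of code instead of a file. This sketch assumes the styler package is installed (install.packages("styler")) and skips the call otherwise.

```r
# Cramped code, written without spaces
messy_code <- "result<-mean(c(1,2,3),na.rm=TRUE)"

# style_text() formats a character string; style_file()/style_dir() do the same for files
if (requireNamespace("styler", quietly = TRUE)) {
  tidy_code <- styler::style_text(messy_code)
  print(tidy_code)
}
```

With styler's default tidyverse style, the result should read result <- mean(c(1, 2, 3), na.rm = TRUE).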


Computational reproducibility in plain language

  • Write code that uses relative paths.
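    • Don't use hard-coded absolute paths (e.g., /Users/stephen/Data/seq-data.csv or C:\Stephen\Documents\Data\Project1\data.txt)
    • Instead, define a single variable, e.g., data_dir, holding the full path to the project's folder, and use file.path() to build every other path relative to it
# The only absolute path; change it once if the project moves
data_dir <- "/Users/stephen/Project"
# file.path() inserts the "/" separator, so data_dir needs no trailing slash
fileNameIn1 <- file.path(data_dir, "seq-data.csv")
# read.csv() expects comma-separated values (read.table() defaults to whitespace)
seq_data <- read.csv(fileNameIn1)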
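A runnable version of the same pattern, as a sketch: a temporary directory stands in for the real project folder (an assumption so the code runs anywhere), and the toy data are hypothetical. Only project_dir is absolute; moving the project means editing that one line.

```r
# A temporary directory stands in for the real project folder
project_dir <- file.path(tempdir(), "Project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Every other path is built relative to project_dir
seq_data_file <- file.path(project_dir, "data", "seq-data.csv")

# Hypothetical toy data, written and read back to show the round trip
write.csv(data.frame(gene = c("TP53", "BRCA1"), count = c(10, 20)),
          seq_data_file, row.names = FALSE)
seq_data <- read.csv(seq_data_file)
```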

Computational reproducibility in plain language

  • Document everything and use code as documentation.

    • Document why you do something, not the mechanics of how
    • Document your methods and workflows
    • Document the origin of all data in your project directory
    • Document when and how you downloaded the data
    • Record data version info
    • Record software version info, e.g., with sessionInfo() (base R) or session_info() (sessioninfo or devtools package)
  • Always set your seed. If you're doing anything that involves random or Monte Carlo approaches, always use set.seed().
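A minimal sketch of both habits in base R: re-setting the same seed reproduces the same random draws, and saving sessionInfo() output records the R and package versions used. The seed value and output file name are arbitrary choices for illustration.

```r
# Same seed, same random draws
set.seed(2021)
first_run <- sample(1:100, 5)
set.seed(2021)
second_run <- sample(1:100, 5)
identical(first_run, second_run)  # TRUE

# Save software versions next to the results (hypothetical file name)
writeLines(capture.output(sessionInfo()), file.path(tempdir(), "session_info.txt"))
```

Recording versions matters for seeds too: R 3.6.0 changed the default algorithm behind sample(), so the same seed can give different draws in older R versions.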


