class: center, middle, inverse, title-slide # Classes and Methods in R ##
https://github.com/rdpeng/CourseraLectures/blob/master/classes-methods.Rnw
### Modified after Roger Peng’s lecture ###
https://youtu.be/93N0HdoZW9g
### 10-11-2021 --- ## Classes and Methods - A system for doing object oriented programming - R is both interactive *and* has a system for object orientation. - Other languages which support OOP (C++, Java, Lisp, Python, Perl) generally speaking are not interactive languages <!-- - In R much of the code for supporting classes/methods is written by John Chambers himself (the creator of the original S language) and documented in the book _Programming with Data: A Guide to the S Language_ - A natural extension of Chambers' idea of allowing someone to cross the user `\(\longrightarrow\)` programmer spectrum --> - Object oriented programming is a bit different in R than it is in most languages - even if you are familiar with the idea, you may want to pay attention to the details --- ## Two styles of classes and methods S3 classes/methods - Essentially a `list` with a `class` attribute on it - Informal, you can assign any class to any list, which is nonsense - Sometimes called *old-style* classes/methods S4 classes/methods - More formal and rigorous - Included with S-PLUS 6 and R 1.4.0 (December 2001) - Also called *new-style* classes/methods --- ## Two worlds living side by side - For now (and the forseeable future), S3 classes/methods and S4 classes/methods are separate systems (but they can be mixed to some degree). - Each system can be used fairly independently of the other. - Developers of new projects (you!) are encouraged to use the S4 style classes/methods. <!-- - But many developers still use S3 classes/methods because they are "quick and dirty" (and easier). - In this lecture we will focus primarily on S4 classes/methods - The code for implementing S4 classes/methods in R is in the **methods** package, which is usually loaded by default (but you can load it with `library(methods)` if for some reason it is not loaded) --> --- ## Object Oriented Programming in R - A *class* is a description of an thing. A class can be defined using `setClass()` in the **methods** package. - An *object* is an instance of a class. Objects can be created using `new()`. - A *method* is a function that only operates on a certain class of objects. - A generic function is an R function which dispatches methods. A generic function typically encapsulates a "generic" concept (e.g. `plot`, `mean`, `predict`, ...) <!-- The generic function does not actually do any computation. --> - A *method* is the implementation of a generic function for an object of a particular class. --- ## Things to look up - The help files for the `methods` package are extensive - do read them as they are the primary documentation - You may want to start with `?Classes` and `?Methods` - Check out `?setClass`, `?setMethod`, and `?setGeneric` - Use `class?` prefix for help on specific classes, e.g., `class?GRanges` <!-- Some of it gets technical, but try your best for now - it will make sense in the future as you keep using it. - Most of the documentation in the **methods** package is oriented towards developers/programmers as these are the primary people using classes/methods --> --- ## Classes All objects in R have a class which can be determined by the class function ```r class(1) ``` ``` ## [1] "numeric" ``` ```r class(TRUE) ``` ``` ## [1] "logical" ``` ```r class(rnorm(100)) ``` ``` ## [1] "numeric" ``` ```r class(NA) ``` ``` ## [1] "logical" ``` ```r class("foo") ``` ``` ## [1] "character" ``` --- ## Classes (cont'd) Data classes go beyond the atomic classes ```r x <- rnorm(100) y <- x + rnorm(100) fit <- lm(y ~ x) ## linear regression model class(fit) ``` ``` ## [1] "lm" ``` ```r names(fit) ``` ``` ## [1] "coefficients" "residuals" "effects" "rank" ## [5] "fitted.values" "assign" "qr" "df.residual" ## [9] "xlevels" "call" "terms" "model" ``` ```r isS4(fit) ``` ``` ## [1] FALSE ``` --- ## Classes (cont'd) ```r getClass("lm") ``` ``` ## Virtual Class "lm" [package "methods"] ## ## Slots: ## ## Name: .S3Class ## Class: character ## ## Extends: "oldClass" ## ## Known Subclasses: ## Class "mlm", directly ## Class "aov", directly ## Class "glm", directly ## Class "maov", by class "mlm", distance 2 ## Class "glm.null", by class "glm", distance 2 ``` --- ## Generics/Methods in R - S4 and S3 style generic functions look different but conceptually, they are the same (they play the same role). - When you program you can write new methods for an existing generic OR create your own generics and associated methods. - Of course, if a data type does not exist in R that matches your needs, you can always define a new class along with generics/methods that go with it --- ##An S3 generic function (in the `base` package) The `mean` function is generic ```r mean ``` ``` ## function (x, ...) ## UseMethod("mean") ## <bytecode: 0x7f90107b2a08> ## <environment: namespace:base> ``` So is the `print` function ```r print ``` ``` ## function (x, ...) ## UseMethod("print") ## <bytecode: 0x7f8ff61573b0> ## <environment: namespace:base> ``` --- ##S3 methods ```r methods("mean") ``` ``` ## [1] mean.Date mean.default mean.difftime mean.POSIXct mean.POSIXlt ## [6] mean.quosure* ## see '?methods' for accessing help and source code ``` --- ##An S4 generic function (from the `methods` package) The S4 equivalent of `print` is `show` ```r show ``` ``` ## standardGeneric for "show" defined from package "methods" ## ## function (object) ## standardGeneric("show") ## <bytecode: 0x7f8fe6a9a138> ## <environment: 0x7f9006d7cee8> ## Methods may be defined for arguments: object ## Use showMethods(show) for currently available ones. ## (This generic function excludes non-simple inheritance; see ?setIs) ``` The `show` function is usually not called directly (much like `print`) because objects are auto-printed --- ##S4 methods Think of S4 methods as simple functions that perform certain task depending on class ``` mimicMethod <- function(x) { if (is(x, "matrix")) method1(x) if (is(x, "data.frame")) method2(x) if (is(x, "IRanges")) method3(x) } ``` <!-- There are many different methods for the `show` generic function showMethods("show") --> --- ## Generic/method mechanism The first argument of a generic function is an object of a particular class (there may be other arguments) - The generic function checks the class of the object. - A search is done to see if there is an appropriate method for that class. - If there exists a method for that class, then that method is called on the object and we're done. - If a method for that class does not exist, a search is done to see if there is a default method for the generic. If a default exists, then the default method is called. - If a default method doesn't exist, then an error is thrown. --- ## Examining Code for Methods Examining the code for an S3 or S4 method requires a call to a special function - You cannot just print the code for a method like other functions because the code for the method is usually hidden. - If you want to see the code for an S3 method, you can use the function `getS3method`. - The call is `getS3method(<generic>, <class>)` - For S4 methods you can use the function `getMethod` - The call is `getMethod(<generic>, <signature>)` (more details later) --- ## Examining Code for Methods ``` > showMethods("mean") Function: mean (package base) x="ANY" x="AtomicList" x="CompressedIntegerList" x="CompressedLogicalList" x="CompressedNumericList" x="CompressedRleList" x="Rle" x="Views" ``` --- ##S3 Class/Method: Example 1 What's happening here? ```r set.seed(2) x <- rnorm(100) mean(x) ``` ``` ## [1] -0.03069816 ``` - The class of x is `numeric` - But there is no mean method for `numeric` objects! - So we call the default function for `mean`. --- ##S3 Class/Method: Example 1 ```r getS3method("mean", "default") ``` ``` ## function (x, trim = 0, na.rm = FALSE, ...) ## { ## if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) { ## warning("argument is not numeric or logical: returning NA") ## return(NA_real_) ## } ## if (na.rm) ## x <- x[!is.na(x)] ## if (!is.numeric(trim) || length(trim) != 1L) ## stop("'trim' must be numeric of length one") ## n <- length(x) ## if (trim > 0 && n) { ## if (is.complex(x)) ## stop("trimmed means are not defined for complex data") ## if (anyNA(x)) ## return(NA_real_) ## if (trim >= 0.5) ## return(stats::median(x, na.rm = FALSE)) ## lo <- floor(n * trim) + 1 ## hi <- n + 1 - lo ## x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi] ## } ## .Internal(mean(x)) ## } ## <bytecode: 0x7f8fe606f2a8> ## <environment: namespace:base> ``` --- ##S3 Class/Method: Example 2 What happens here? ```r set.seed(3) df <- data.frame(x = rnorm(100), y = 1:100) sapply(df, mean) ``` ``` ## x y ## 0.01103557 50.50000000 ``` - The class of df is `data.frame`; in a data frame each column can be an object of a different class - We `sapply` over the columns and call the `mean` function - In each column, `mean` checks the class of the object and dispatches the appropriate method. - Here we have a `numeric` column and an `integer` column; in both cases `mean` calls the default method --- ## Calling Methods - Some methods are visible to the user (i.e. `mean.default`) - You should **never** call methods directly - Use the generic function and let the method be dispatched automatically --- ##S3 Class/Method: Example 3 The `plot` function is generic and its behavior depends on the object. ```r set.seed(10) x <- rnorm(100) plot(x) ``` ![](S4_classes_methods_files/figure-html/plotdefault-1.png)<!-- --> --- ##S3 Class/Method: Example 3 For time series objects, `plot` connects the dots ```r set.seed(10) x <- rnorm(100) y <- x + rnorm(100) fit <- lm(y ~ x) ## linear regression model par(mfrow = c(1, 4)) plot(fit) ``` ![](S4_classes_methods_files/figure-html/plotts-1.png)<!-- --> --- ## Write your own methods! If you write new methods for new classes, you'll probably end up writing methods for the following generics: - print/show - summary - plot There are two ways that you can extend the R system via classes/methods - Write a method for a new class but for an existing generic function (i.e. like `print`) - Write new generic functions and new methods for those generics --- ## S4 Classes Why would you want to create a new class? - To represent new types of data (e.g. gene expression, space-time, hierarchical, sparse matrices) - New concepts/ideas that haven't been thought of yet (e.g. a fitted point process model, mixed-effects model, a sparse matrix) - To abstract/hide implementation details from the user <!-- I say things are `new` meaning that R does not know about them (not that they are new to the statistical community).--> --- ## S4 Class/Method: Creating a New Class A new class can be defined using the `setClass` function - At a minimum you need to specify the name of the class - You can also specify data elements that are called *slots* - You can then define methods for the class with the `setMethod` function - Information about a class definition can be obtained with the `showClass` function --- ##S4 Class/Method: Polygon Class Creating new classes/methods is usually not something done at the console; you likely want to save the code in a separate file ```r setClass("polygon", representation(x = "numeric", y = "numeric")) ``` The slots for this class are `x` and `y`. The slots for an S4 object can be accessed with the `@` operator. You should never have to access slots directly. You should get/set data out of the class slot using the "accessor" functions (getters and setters). --- ##S4 Class/Method: Polygon Class - A `plot` method can be created with the `setMethod` function. - Specify a generic function(`plot`), and a class *signature*. <!-- A signature is a character vector indicating the classes of objects that are accepted by the method. In this case, the `plot` method will take one type of object--a `polygon` object. --> ```r setMethod("plot", "polygon", function(x, y, ...) { plot(x@x, x@y, type = "n", ...) xp <- c(x@x, x@x[1]) yp <- c(x@y, x@y[1]) lines(xp, yp) }) ``` Note that the slots of the polygon (the x- and y-coordinates) are accessed with the `@` operator. If things go well, you will not get any messages or errors and nothing useful will be returned by either `setClass` or `setMethod`. --- ##S4 Class/Method: Polygon Class After calling `setMethod` the new `plot` method will be added to the list of methods for `plot`. ```r showMethods("plot") ``` ``` ## Function: plot (package base) ## x="ANY" ## x="polygon" ``` Note that the signature for class `polygon` is listed. The method for `ANY` is the default method and it is what is called when now other signature matches --- ##S4 Class/Method: Polygon class ```r p <- new("polygon", x = c(1, 2, 3, 4), y = c(1, 2, 3, 1)) plot(p) ``` ![](S4_classes_methods_files/figure-html/showplotpolygon-1.png)<!-- --> --- ## Class inheritance - Class inheritance is used when you define a new class which “is almost like this other class but with a little twist”. - For example `ExpressionSet` inherits from `eSet`, and when you look at the class definition you cannot easily see a difference. The difference is that `ExpressionSet` is meant to contain expression data and has the `exprs()` accessor. --- ## References - https://github.com/MalteThodberg/S4-Bioconductor - a collection of references by Malte Thodberg - https://github.com/nullranges/nullranges - an R package for generation of null ranges via bootstrapping or covariate matching - [Reference classes](http://adv-r.had.co.nz/R5.html)