Abstract. This vignette explains how new classes of objects can be created, using oce objects as a base class. The advantage of this is that the newly-formed objects will automatically have important properties of oce objects, in terms of operators such as [[
and [[<-
, functions such as subset()
and summary()
, schemes for handling units, etc. The treatment centres on the creation of a hypothetical class called wave
, which will hold time-series of elevation data.
The setClass
function is the key to defining a new class. Entering ?setClass
in an R console reveals the details of this function, including some notes on how it has evolved since R version 3.0.
We need only a simple form here, with
being enough to create a new class called wave
that inherits the base features of the oce
class.
To create a new object that inherits from the wave
class, use e.g.
We can see that w
inherits from the oce
class with
and a check that is common to see in code is
The contents of the object are revealed with
str(w)
#> Formal class 'wave' [package ".GlobalEnv"] with 3 slots
#> ..@ metadata :List of 2
#> .. ..$ units: list()
#> .. ..$ flags: list()
#> ..@ data : list()
#> ..@ processingLog:List of 2
#> .. ..$ time : POSIXct[1:1], format: "2019-06-16 14:44:31"
#> .. ..$ value: chr "Create oce object"
Notice that w
three oce
“slots”, named metadata
, data
and processingLog
. These are inherited from oce
. The first is meant to hold information about the data, such as a file-name, an instrument number, a location of sampling, etc. The second is meant to hold actual data or measurements. And the third, not normally accessed by the user directly, holds information about the object’s evolution (note that the object considers itself an oce
object). The names of these three slots are usually enough to keep them straight in the analyst’s head, although it can sometimes be difficult deciding whether something belongs in metadata
or data
.
The oce
system also provides its objects with certain operators and functions. For example, [[
can be used to retrieve the slots, items within the slots, and (in some cases) values that may be calculated from the contents of the slots (e.g. ctd
and similar objects related to hydrography can return calculated potential temperature or Conservative Temperature, even though neither is typically stored in such datasets).
That’s not the only thing. All oce
objects give special powers to the [[
operator, e.g. we can retrieve the metadata
slot with
and the same for the data slot
The [[<-
operator can be used to fill the slots with information, e.g. we could insert a station location of "STN01"
with
and verify that this worked with
str(w)
#> Formal class 'wave' [package ".GlobalEnv"] with 3 slots
#> ..@ metadata :List of 3
#> .. ..$ units : list()
#> .. ..$ flags : list()
#> .. ..$ station: chr "STN01"
#> ..@ data : list()
#> ..@ processingLog:List of 2
#> .. ..$ time : POSIXct[1:1], format: "2019-06-16 14:44:31"
#> .. ..$ value: chr "Create oce object"
However, there is a better way to insert metadata, with the oceSetMetadata
function, e.g.
sets the serial number to 1234, and
str(w[["metadata"]])
#> List of 4
#> $ units : list()
#> $ flags : list()
#> $ station : chr "STN01"
#> $ serialNumber: num 1234
verifies that this worked.
Now, let’s insert some data. Imagine a half-minute dataset with 10Hz sampling, for a signal with a \(1\)m elevation wave with period \(10\)s, plus some noise of order \(1\)cm.
t <- as.POSIXct("2019-01-01 00:00:00", tz="UTC") + seq(0, 30, length.out=100)
tau <- 10
e <- sin(as.numeric(2 * pi * as.numeric(t) / tau)) + rnorm(t, sd=0.01)
(Notice that we are not using the R ts
form to make this time-series. This is because oceanographic data are commonly acquired on an irregular time interval, so it makes sense to store observation time explicitly, instead of using the start/step/stop approach of the ts
scheme.)
These data may be inserted into our object with
At this point the reader is likely to use str
to see if this worked, but since the object is starting to fill up, it might make sense to use the summary
function, which is inherited from oce
.
summary(w)
#> * Time ranges from 2019-01-01 to 2019-01-01 00:00:30 with 100 samples and mean increment 0.3030303 s
#> * Data
#>
#> Min. Mean Max. Dim. OriginalName
#> elevation -1.0054 -1.7484e-06 1.015 100 -
#>
#> * Processing Log
#> - 2019-06-16 11:44:31 UTC: `Create oce object`
#> - 2019-06-16 11:45:38 UTC: `oceSetMetadata(object = w, name = "serialNumber", value = 1234)`
#> - 2019-06-16 11:45:38 UTC: `oceSetData(object = w, name = "time", value = t)`
#> - 2019-06-16 11:45:38 UTC: `oceSetData(object = w, name = "elevation", value = e)`
This produces a useful summary not just of the data, but also of how the object was constructed. But we can do better. In some crazy world, someone might consider measuring elevation in feet, not metres, and so we ought to specify the unit. The way to do this is with the unit
argument of oceSetData
. This is a somewhat tricky argument, as a study of the result of ?oceSetData
will reveal. For now, we just show a common way, without explanation, writing
This over-rides the existing definition. Now, let’s look at the summary:
summary(w)
#> * Time ranges from 2019-01-01 to 2019-01-01 00:00:30 with 100 samples and mean increment 0.3030303 s
#> * Data
#>
#> Min. Mean Max. Dim. OriginalName
#> elevation [m] -1.0054 -1.7484e-06 1.015 100 -
#>
#> * Processing Log
#> - 2019-06-16 11:44:31 UTC: `Create oce object`
#> - 2019-06-16 11:45:38 UTC: `oceSetMetadata(object = w, name = "serialNumber", value = 1234)`
#> - 2019-06-16 11:45:38 UTC: `oceSetData(object = w, name = "time", value = t)`
#> - 2019-06-16 11:45:38 UTC: `oceSetData(object = w, name = "elevation", value = e)`
#> - 2019-06-16 11:45:38 UTC: `oceSetData(object = w, name = "elevation", value = e, unit = list(unit = expression(m), scale = ""))`
Notice that we now have a unit on the elevation, but we have an indication that the value of that quantity was defined twice. This processing-log feature is one of the big advantages of using oceSetData
over direct insertion into an object.
The most common function to add is a plot function. Since plot
is a built-in function, we are subclassing it. The details of doing this are provided by ?setMethod
. Again, studying the documentation for that function would be worthwhile, but the gist is provided by a simple example, e.g.
setMethod(f="plot",
signature=signature("wave"),
definition=function(x, which=1, ...) {
if (which == 1) {
plot(x[["time"]], x[["elevation"]], ...)
} else if (which == 2) {
hist(x[["elevation"]], ...)
} else {
stop("which must be 1 or 2")
}
})
Here, the signature
argument tells R that plot()
called with a wave
object as its first argument ought to use the indicated function. That function takes just two arguments: the object to be plotted, and which
, an indication of the desired plot type.
For example, since which
defaults to 1, we can get a popular plot with
Note the simplicity of this action. The user has no reason to state what kind of object this is, because R detects the type and dispatches to the specialized wave
-plotting function. This may seem like a small thing in the present context, but imagine an analyst writing code to analyse a wide variety of data types: it is very convenient to have a simple function call that works for each.
Since the ...
argument is passed into both the plotting methods. Thus, for example, a cleaner time-series plot might be created with
The following lets the user specify time
and elevation
when the object is created. It also permits a specification of units, with a default being to us metres.
setMethod(f="initialize",
signature="wave",
definition=function(.Object, time, elevation, units) {
if (missing(units)) {
.Object@metadata$units <- list()
if (missing(units))
.Object@metadata$units$elevation <- list(unit=expression(m), scale="")
}
.Object@data$time <- if (missing(time)) NULL else time
.Object@data$elevation <- if (missing(elevation)) NULL else elevation
.Object@processingLog$time <- presentTime()
.Object@processingLog$value <- "create 'wave' object"
return(.Object)
}
)
A test proves that this works as hoped for.
ww <- new("wave", time=t, elevation=e)
summary(ww)
#> * Time ranges from 2019-01-01 to 2019-01-01 00:00:30 with 100 samples and mean increment 0.3030303 s
#> * Data
#>
#> Min. Mean Max. Dim. OriginalName
#> elevation [m] -1.0054 -1.7484e-06 1.015 100 -
#>
#> * Processing Log
#> - 2019-06-16 14:45:38 UTC: `create 'wave' object`
Notice that the units now appear, without complication to the user. Oh, and this object now knows that it is a wave
object, not an oce
object our little class is getting smarter by the minute!
[[
operatorAs in the previous section, the key of specializing how [[
works is to use setMethod()
, but this time the function is named "[["
. Suppose we want to permit e.g. w[["peak"]]
as a way to find the maximum value of wave height. This becomes a call to the "[["
function, with first argument as w
, and second argument as "peak"
. We can handle this with:
setMethod(f="[[",
signature(x="wave", i="ANY", j="ANY"),
definition=function(x, i, j, ...) {
if (i == "peak") {
return(max(x[["elevation"]], na.rm=TRUE))
} else {
callNextMethod()
}
}
)
(The details of the signature
definition are explained in the documentation provided by ?setMethod
, and readers ought to study that material before changing the signature
definition.)
The important thing to focus on is the if
block in the function definition. Called as e.g. w[["peak"]]
causes i
to equal "peak"
, and so the return value will be the maximum elevation. However, in all other instances, the return values is provided by callNextMethod()
, and what that does is to dig one level deeper for a way to handle [[
. At that deeper level, it finds the oce
definition, the details of which can be found with ?"[[,oce-method"
.
The test
w[["peak"]]
#> [1] 1.014971
str(w[["elevation"]])
#> num [1:100] 0.00451 0.20649 0.37584 0.53922 0.70422 ...
verifies that our new code works for getting the peak value, and that it falls back to the oce
code for other calls.