1 Introduction

The geometr package provides tools that generate and process easily accessible and tidy geometric shapes (of class geom). Moreover, geometr aims to improve the interoperability of geometric classes. One could argue that spatial classes are merely a special case of geometric classes, where the points’ coordinates refer to real locations on the surface of the earth, specified in further detail by the coordinate reference system (crs). For ordinary geometric shapes (such as squares or circles), the coordinate (reference) system is the Cartesian coordinate system. geometr generalises this by treating all geometric and spatial classes in the same way, and thus both are termed geometric objects/classes here.

Geometric classes typically contain a collection of points that outline the geometric shapes or features. A feature in geometr is defined as a set of points that form no more than one single unit of a given feature type (point, line or polygon) and, in contrast to the simple features standard, there are no multi-* features. Sets of geometric objects that belong together beyond their geometric connectedness are assigned a common group, which can have its own group attributes (more on this in the chapter Attributes of a geom). Features are characterised by a location, some coordinate (reference) system, and various other properties or metadata. Most geometric classes are conceptually quite similar, yet a common, interoperable standard for accessing and modifying features, their points or the metadata is lacking.

This vignette first outlines in detail how geometr improves interoperability, then describes the data structure of a geom (the geometric class that comes with geometr), explains how different feature types are cast into one another, shows how to visualise geometric objects and finally gives a short introduction to the tools that come with this first version of geometr.

2 Interoperability

Interoperable software can easily exchange information with other software, which can be achieved by providing the output of functionally similar operations in a common arrangement or format. This principle is not only true for software written in different languages, but can also apply to several packages within the R ecosystem. R is an open source environment, which means that no single package or class will ever be the sole source of a particular data structure, and this is also the case for spatial and other geometric data.

Interoperable data is data that has a common arrangement and that uses terms from the same ontology, resulting ideally in semantic interoperability. As an example, we can think of the extent of a geometric object. An extent reports the minimum and maximum value of all dimensions an object resides in. There are, however, several ways in which even this simple information can be reported, for example as a vector or as a table, and with or without names. Moreover, distinct workflows provide data in such a way that the same information is not at the same location or under the same name in all structures, e.g., the minimum value of the x dimension is not always the first piece of information and is not always called ‘xmin’.

The following code chunk exemplifies this by showing various functions, which are all considered standard in R to date, that derive an extent from specific spatial objects:

st_bbox() provides the information as a named vector and presents first the minimum and then the maximum values of both dimensions, bbox() provides a table with minimum and maximum values in columns, and extent() provides the information in an S4 object that presents first the x and then the y values. Neither the data structures nor the names or positions of the information are comparable.
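To make the mismatch concrete without depending on any spatial package, the following minimal sketch mimics the three layouts in base R. The object names, the class ExtentLike and all values are made up for illustration; only the layouts follow the description above.

```r
library(methods)

# Mimic of the st_bbox() layout: a named vector, minima first, then maxima.
bbox_vector <- c(xmin = 0, ymin = 10, xmax = 5, ymax = 20)

# Mimic of the bbox() layout: a table with dimensions in rows, min/max in columns.
bbox_matrix <- matrix(c(0, 10, 5, 20), nrow = 2,
                      dimnames = list(c("x", "y"), c("min", "max")))

# Mimic of the extent() layout: an S4 object with x values first, then y values.
# "ExtentLike" is a hypothetical stand-in class, not the class used by raster.
setClass("ExtentLike", representation(xmin = "numeric", xmax = "numeric",
                                      ymin = "numeric", ymax = "numeric"))
bbox_s4 <- new("ExtentLike", xmin = 0, xmax = 5, ymin = 10, ymax = 20)

# The same value, the minimum of x, needs three different access paths.
xmin_from_vector <- bbox_vector[["xmin"]]
xmin_from_matrix <- bbox_matrix["x", "min"]
xmin_from_s4     <- bbox_s4@xmin
```

Any function that accepts all three inputs has to know each of these access paths in advance.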

For a human user the structure of this information might not matter, because we recognise, in most cases intuitively, which information is to be found where in a data structure. In the above case it is easy to recognise how the combination of column and row names (of bbox()) refers to the already combined names (of st_bbox() or extent()). However, this human capacity to recognise information relative to its context needs to be programmed into software for it to have that ability. Think, for example, of a new custom function that is designed to extract and process information from an arbitrary spatial input, i.e., without knowing in advance what spatial class the user will provide. This would require extensive code logic to handle all possible input formats, complicated further by classes that may become available only in the future.

geometr improves interoperability in R for geometric and thus spatial classes by following the Bioconductor standard for S4 classes. Here, getters and setters are used as accessor functions, and as a pathway to extract or modify information of a given data structure. geometr thus provides getters that return information in an identical arrangement from a wide range of classes, and likewise setters that modify different classes in the same way, even though those classes typically need differently formatted input, arguments and functions. The following code chunk shows how different input classes yield the same output object.

The output of the getters provided by geometr is tidy, i.e., it provides variables in columns and observations in rows, and it is interoperable, i.e., it provides the same information in the same location of the output object, with the same names. This ensures, amongst other advantages, that a custom function that processes geometric information requires merely one very simple line of code to extract that information from a potentially wide range of distinct classes.
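The accessor pattern can be sketched with plain S4 generics. This is not geometr's implementation: the generic get_extent() and the classes BoxVector and BoxMatrix are hypothetical, and the tidy output layout (columns x and y, one row for the minimum and one for the maximum) is an assumption chosen for illustration.

```r
library(methods)

# Two hypothetical classes that store the same extent in different layouts.
setClass("BoxVector", representation(box = "numeric"))  # named xmin, ymin, xmax, ymax
setClass("BoxMatrix", representation(box = "matrix"))   # rows x/y, columns min/max

# One generic getter; each class gets a method that hides its own layout.
setGeneric("get_extent", function(x) standardGeneric("get_extent"))

setMethod("get_extent", "BoxVector", function(x) {
  b <- x@box
  data.frame(x = c(b[["xmin"]], b[["xmax"]]),
             y = c(b[["ymin"]], b[["ymax"]]))
})

setMethod("get_extent", "BoxMatrix", function(x) {
  b <- x@box
  data.frame(x = c(b["x", "min"], b["x", "max"]),
             y = c(b["y", "min"], b["y", "max"]))
})

v <- new("BoxVector", box = c(xmin = 0, ymin = 10, xmax = 5, ymax = 20))
m <- new("BoxMatrix", box = matrix(c(0, 10, 5, 20), nrow = 2,
                                   dimnames = list(c("x", "y"), c("min", "max"))))

# Both calls return the same tidy table, regardless of the input class.
extent_v <- get_extent(v)
extent_m <- get_extent(m)
```

A downstream function then only needs the single call get_extent(input), and support for a new class is added by writing one more method rather than by touching the downstream code.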

3 Description of the class geom

geometr comes with the S4 class geom. geom is a geometric (spatial) class that has been primarily developed for its interoperability and easy access.

Here, too, this means that all objects of this class are structurally the same, that no slots are removed or added when modifying an object and that all properties are labelled with the same terms in each object of that class. This interoperability holds for objects representing point, line or polygon features, for objects that contain a single or several features and for objects that are either merely geometric or indeed spatial/geographic because they contain a coordinate reference system. Moreover, a geom contains only direct information, i.e., information that can’t be derived from its other information. A prominent example is the extent, which is stored in many other spatial classes (in R) but not in a geom, because it can very simply be derived from the coordinate values of the points that make up the geometry.

3.1 Create a geom

A geom can be created simply by transforming an object of another class (that is, any class for which a method has been defined), or by using one of the geometry shape functions that are labelled gs_* in geometr.

From these examples we learn something more about objects of class geom. nc_geom is made up of 108 polygon features (with 2529 points), has a coordinate reference system (crs) and a set of (feature) attributes. The attributes’ values are not shown by the print method of a geom, which gives a more compact overview of the important information. Moreover, there is a “tiny map” that shows where the points of the respective geom are concentrated, which gives a rough but quick overview of the shape of the object. If a section of the map contains less than 1/16th of all points, a ◌ is shown; for more than 1/16th but less than 1/8th this is ○, for more than 1/8th but less than 1/4th ◎, and for sections with more than 1/4th of the points ◉.
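The symbol rule of the tiny map can be written down as a small lookup. This is a sketch of the rule as described, not the code of the print method; the function name density_symbol() is hypothetical, and since the text does not say what happens at exactly 1/16, 1/8 or 1/4, the sketch assumes that ties fall into the higher class.

```r
# Map the fraction of points in a map section to one of the four symbols.
density_symbol <- function(fraction) {
  # Unicode escapes for the symbols: dotted circle, white circle, bullseye, fisheye.
  symbols <- c("\u25cc", "\u25cb", "\u25ce", "\u25c9")
  # findInterval() counts how many thresholds are <= fraction (0 to 3),
  # so adding 1 gives the index of the matching symbol.
  symbols[findInterval(fraction, c(1/16, 1/8, 1/4)) + 1]
}

# Four sections holding 1 %, 10 %, 20 % and 50 % of all points.
section_symbols <- density_symbol(c(0.01, 0.10, 0.20, 0.50))
```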

aPoly is made up of only one feature with 5 points and a Cartesian coordinate system. As a matter of fact, any geom that has no crs assigned is assumed to be a mere geometric object whose values are valid for a Cartesian coordinate system.

3.2 How are polygons handled?

You might wonder why 5 points are shown for aPoly, while only 4 have been defined. This is due to how polygons are stored in a geom. A polygon is by definition a two-dimensional surface, in contrast to a line that has only one dimension, its length, and a point, which is dimensionless. A polygon and a line can be made up of the same points and a polygon is indeed nothing more than a sequence of lines (a path) that outlines the shape of the polygon. To distinguish a line from a polygon with the same points, it is defined that a polygon must have duplicate start and end points, which constitutes a closed path.
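Closing a path amounts to appending a copy of the first point when it differs from the last one. The helper close_path() below is a hypothetical base R sketch of that idea, not the vertex check that geometr itself carries out.

```r
# Append the first point to the end of a coordinate table unless the path
# is already closed (first and last point identical).
close_path <- function(coords) {
  first <- coords[1, ]
  last  <- coords[nrow(coords), ]
  if (!all(unlist(first) == unlist(last))) {
    coords <- rbind(coords, first)
    rownames(coords) <- NULL  # reset row names after binding
  }
  coords
}

# Four corner points define the square; the closed path has five.
square <- data.frame(x = c(0, 10, 10, 0), y = c(0, 0, 10, 10))
closed <- close_path(square)
```

Applying close_path() to an already closed path leaves it unchanged, so the check can be run repeatedly without piling up duplicates.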

Polygons may also have holes (and islands therein), for example a park with a pond that has a little island in the middle. Such cases are of course also possible with a geom and the only thing to consider is that the outer (closed) ring must be given as the first ring. All rings that are supposed to be nested within this ring must themselves be closed paths, but their order does not matter. Moreover, when building a polygon with a hole in geometr, the rotation direction described by the sequence of the points does not matter. Whether part of a polygon is “inside”, and thus whether a closed path describes a hole or not, is determined by the code logic of the functions processing polygons.

3.3 Attributes of a geom

You may also be wondering why nc_geom has 108 features, while nc_sf actually has 100 features. nc_sf consists of 100 “MULTIPOLYGONS”, of which in fact only a small subset is composed of several polygons. This is due to the definition of simple features, where one “simple” feature can be of a multi* type that comprises several closed paths in the same object and which would thus be called MULTIPOLYGON. Yet, an object can also contain only a single closed path and still have the feature type MULTIPOLYGON. In geometr these inconsistencies are avoided because they require a lot of extra code logic that is, in my opinion, not worth the supposed flexibility. Hence, a geom breaks down multi*-features into their distinct closed paths, into “simpler features”, so to speak.

This, however, requires that the togetherness of multi*-features be captured in another way. A set of features can be regarded as belonging together when they share attributes, such as a group of islands that form a nation. To capture attributes of sets or groups of features, a geom has an additional attribute table for those group attributes. Moreover, the list of points is also treated as a separate, third attribute table, for attributes that only the points, but not the features (groups of points), have.
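The split into three tables can be illustrated in base R. The sketch below is an assumption-laden illustration rather than the conversion that geometr performs: the input list multi, its values and the helper variables are made up, and only the ID names fid and gid are taken from the package.

```r
# Two hypothetical multi-features: an archipelago of two islands and a
# single island. Every ring is a closed triangle.
multi <- list(
  list(name = "archipelago",
       rings = list(data.frame(x = c(0, 1, 1, 0), y = c(0, 0, 1, 0)),
                    data.frame(x = c(3, 4, 4, 3), y = c(3, 3, 4, 3)))),
  list(name = "island",
       rings = list(data.frame(x = c(6, 7, 7, 6), y = c(6, 6, 7, 6))))
)

point_list <- list()
feature_list <- list()
fid <- 0
for (gid in seq_along(multi)) {
  for (ring in multi[[gid]]$rings) {
    fid <- fid + 1                                      # every ring becomes its own feature
    point_list[[fid]]   <- cbind(ring, fid = fid)       # points remember their feature
    feature_list[[fid]] <- data.frame(fid = fid, gid = gid)  # features remember their group
  }
}

point_table   <- do.call(rbind, point_list)
feature_table <- do.call(rbind, feature_list)
group_table   <- data.frame(gid = seq_along(multi),
                            name = vapply(multi, function(m) m$name, character(1)))
```

The two input features end up as three features in two groups, which mirrors how 100 multi-features can become 108 features.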

All three attribute tables can be accessed with the function getTable().

We see that the tables contain ID variables, namely fid and gid, which were not part of the original object. Those allow identifying to which feature coordinates belong and which features form groups. We also see that the feature table contains those 108 features and that, for instance, features four, five and six are “the same”, or at least have the same attributes. This makes no sense, obviously, so gc_geom() has the argument group = FALSE/TRUE to set whether duplicate feature attributes are actually group attributes.

It becomes clear, as mentioned above, that this approach to attribute tables makes the class geom quite flexible. One could for example assign some measurements of a point pattern (such as Arne Pommerening’s Clocaenog 6 sample data) to the attribute table of a point geom of that pattern.

3.4 Data provenance

An object of class geom contains, just like a raster, the slot @history. This slot documents the provenance of that object, i.e., where it came from and how it has been modified. The function gc_geom(), all gs_*() and all gt_*() functions append information to the list stored in the @history slot. In case you want to make use of this slot, it can be set via setHistory().

4 Casting

It is quite straightforward to cast from one feature type to another with geometr. As already mentioned above, all feature types (point, line and polygon) have the same arrangement and differ only in the slot @type. This means that the only element that needs to be changed to cast any geom to another geom is the feature type. All gs_*() functions contain the argument anchor =, which is where the location information for creating the output is provided. If what is provided here is already a geom, most of the location information and other properties can be (and in fact are) simply copied into the new object.

The following code chunk shows casting in geometr. When casting to polygons, vertex checks are carried out that make sure the output is a valid polygon. This results, as also described above, in the first/last point becoming duplicated to form a closed path. Consequently, those duplicates have to be handled, for example by removing duplicates.
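What handling the duplicate means in practice can be shown on a bare coordinate table. This base R sketch stands in for the coordinates of a polygon geom and is not a call to geometr; the object names are hypothetical.

```r
# Coordinates of a closed square: the first point is repeated at the end.
polygon_points <- data.frame(x = c(0, 10, 10, 0, 0),
                             y = c(0, 0, 10, 10, 0))

# When these coordinates are reused as points, the closing vertex is a
# duplicate. duplicated() works row-wise on a data.frame and flags it.
as_points <- polygon_points[!duplicated(polygon_points), ]
rownames(as_points) <- NULL
```

Note that this removes every repeated coordinate pair, which is only appropriate when the closing vertex is the sole intended duplicate.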

5 Visualising

We have already seen some quite powerful use cases of the visualise() function that comes with geometr. The philosophy of plotting geometric objects in geometr deviates a bit from other approaches.

The idea is that both vector and raster objects can be visualised at the same time, and that additional options can be specified for their appearance, and for the appearance of the other plot elements. In case several objects are provided to visualise() at the same time, they are facetted, i.e., plotted in separate panels next to one another. Moreover, it is possible to set the plot title by providing a name to the object that shall be plotted, for example via 'plot name' = plotObject. In case a title is not provided in that way, the title is either extracted from the object name (for example, the raster in the following example is part of the RasterStack gtRasters with the name continuous) or a default value is used.

In case a geom is provided with a relative scale, the values are scaled between 0 and 1 and it is plotted by default relative to the plot extent. This makes it possible, for example, to always observe the same region in a set of images in which some signal changes, and eventually to extract the values. To this end, a geom is provided with a reference window from which the relative values are derived, and its coordinates are scaled to those values.

In various functions of geometr the reference window can play a role, most prominently to control which part of an object is plotted. The window of any object that has been plotted here so far, other than relPoly, has been set implicitly, usually to the extent of that object. relPoly has been assigned a reference window that is larger than its extent, and this led to an effect that could be described as zooming out. Likewise, one could zoom in by assigning a reference window smaller than the extent of an object. The reference region for a plot can be set via the window = argument in visualise(), which requires absolute values. To zoom in on relPoly, we thus have to assign the plot extent as window to relPoly and rescale the relative values to absolute values on that basis.
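The arithmetic behind relative and absolute values is a linear rescaling against the window. The helpers to_relative() and to_absolute() below are hypothetical base R functions that illustrate it under the assumption that a window is given as a table with the minimum and maximum of x and y; they are not the rescaling tools of geometr.

```r
# Scale absolute coordinates to the range 0..1 of a reference window.
to_relative <- function(coords, window) {
  data.frame(x = (coords$x - window$x[1]) / diff(window$x),
             y = (coords$y - window$y[1]) / diff(window$y))
}

# Scale relative coordinates back to absolute values of a reference window.
to_absolute <- function(coords, window) {
  data.frame(x = coords$x * diff(window$x) + window$x[1],
             y = coords$y * diff(window$y) + window$y[1])
}

# A window from (0, 0) to (100, 50) and a rectangle well inside it.
window   <- data.frame(x = c(0, 100), y = c(0, 50))
abs_poly <- data.frame(x = c(25, 75, 75, 25, 25),
                       y = c(10, 10, 40, 40, 10))

rel_poly  <- to_relative(abs_poly, window)  # all values between 0 and 1
abs_again <- to_absolute(rel_poly, window)  # round trip restores the input
```

Because the rectangle is smaller than the window, its relative values do not reach 0 or 1, which is the zooming-out effect described above. Rescaling the same relative values against a different window places the shape at the same relative position in that window.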

visualise() also allows plotting images, simply by providing an object that has the three layers red, green and blue and setting image = TRUE.

To adjust the appearance of a plot one can either provide a theme for all plot elements or quick options for the plotted objects only.

5.0.1 The gtTheme

The theme controls how plots that are created with visualise() appear. By default, this theme is gtTheme and it has the following properties.

A new theme can be created by modifying any of the elements (shown in yellow in the console), via the function setTheme(). vector and raster are provided with a scale, where it is noted that one of the properties is scaled to one of the attributes of the object. By default, the line colour (linecol) is scaled to the feature ID (fid) for vector objects and rasters are scaled to their unique values, or the ID of an optional attribute table. For vector objects also other properties can be scaled, namely the point symbol (pointsymbol) and size (pointsize), the colour of points (also linecol), the line type (linetype) and width (linewidth) of lines and polygons and the fill colour of polygons (fillcol).

By default gtTheme contains only a single value for most properties (except colours), so when scaling a property, a sensible range needs to be specified. For example, to make a plot where each point represents the diameter of a tree, it would make sense to scale the points between values that resemble the tree diameter.
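What specifying a range means can be shown with a plain linear rescaling. The function rescale_to_range(), the diameters and the target range are made up for this sketch, and it does not reproduce how visualise() maps attributes to plot properties internally.

```r
# Map attribute values linearly onto a target range, e.g. of point sizes.
rescale_to_range <- function(values, to) {
  from <- range(values)
  (values - from[1]) / diff(from) * diff(to) + to[1]
}

# Hypothetical tree diameters (in cm) mapped to point sizes between 1 and 3.
diameter  <- c(12, 30, 48)
pointsize <- rescale_to_range(diameter, to = c(1, 3))
```

With only a single default value for the property there would be no range to map onto, which is why the range has to be provided when a property is scaled.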

Alternatively, one could use quick options to set scaling, by providing the property to scale and the attribute to scale to as property = attribute, also for several properties at the same time.

Colour palettes can be set by providing “waypoints” between which colours should be interpolated (see the documentation of colorRamp). By default, these are a darkish blue (#00204D) for low values and yellow (#FFEA46) for high values. When including additional values, or when building totally new palettes, the colour can be modified accordingly.
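The interpolation between waypoints can be tried out directly with colorRampPalette() from grDevices, which builds on colorRamp. The two default colours are taken from the text; the grey middle waypoint and the object names are assumptions added for illustration.

```r
library(grDevices)

# The two default waypoints: darkish blue for low, yellow for high values.
waypoints    <- c("#00204D", "#FFEA46")
make_palette <- colorRampPalette(waypoints)

# Five colours interpolated between the waypoints.
five_colours <- make_palette(5)

# An additional, hypothetical grey waypoint in the middle changes the
# path that the interpolation takes between the same two end colours.
three_way <- colorRampPalette(c("#00204D", "#7C7B78", "#FFEA46"))(7)
```

The first and last colours of the result are the waypoints themselves, and all colours in between are interpolated.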