Using {densityarea}

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Using {densityarea}

To get started with using {densityarea}, we’ll need to load some packages, and some data to work with. {densityarea} is meant to play nicely with tidyverse-style data processing, in addition to loading the package itself, we’ll also load {dplyr}. We have the option of working with the density polygons in the form of simple features from {sf}, so we’ll load that as well. Finally, we’ll load {ggplot2} and {ggdensity} for the sake of data visualization.

Initial look at the data

Let’s plot the original, raw data from s01, with the Highest Density Regions overlaid (thanks to the {ggdensity} package).

ggplot(data = s01,
       aes(x = F2,
           y = F1)
       )+
  geom_point(alpha = 0.1)+
  stat_hdr(probs = c(0.8, 0.5),
           aes(fill = after_stat(probs)),
           color = "black",
           alpha = 0.8)+
  scale_y_reverse()+
  scale_x_reverse()+
  scale_fill_brewer(type = "seq")+
  coord_fixed()

The function ggdensity::get_hdr() is perfect for quickly adding interpretable densities to your plots. To work with these densities as polygons, we can use densityarea::density_polygons().

Getting density areas

Per the name of the package, we can get the area within each of these density polygons with density_area().

As a first data processing step, let’s log transform and flip our F1 and F2 values.

s01 |> 
  mutate(
    lF1 = -log(F1),
    lF2 = -log(F2)
  ) -> 
  s01

To get the area within the 80% density polygon for the entire data set, we’ll pass s01 through a dplyr::reframe() function.

s01 |> 
  group_by(name) |> 
  reframe(
    density_area(lF2, lF1, probs = 0.8)
  ) 
#> # A tibble: 1 × 4
#>   name  level_id  prob  area
#>   <chr>    <int> <dbl> <dbl>
#> 1 s01          1   0.8 0.406

Or, if we wanted the areas associated with subsets of the data (say, for each vowel) we’d just change our dplyr::group_by() call.

s01 |> 
  group_by(name, vowel) |> 
  reframe(
    density_area(lF2, lF1, probs = 0.8)
  ) ->
  vowel_areas

Let’s rearrange the order of rows to see the largest areas first.

vowel_areas |> 
  arrange(desc(area))
#> # A tibble: 15 × 5
#>    name  vowel level_id  prob   area
#>    <chr> <chr>    <int> <dbl>  <dbl>
#>  1 s01   AO           1   0.8 0.488 
#>  2 s01   AA           1   0.8 0.278 
#>  3 s01   AH           1   0.8 0.274 
#>  4 s01   AE           1   0.8 0.258 
#>  5 s01   UW           1   0.8 0.229 
#>  6 s01   EH           1   0.8 0.226 
#>  7 s01   IY           1   0.8 0.206 
#>  8 s01   UH           1   0.8 0.203 
#>  9 s01   OW           1   0.8 0.197 
#> 10 s01   IH           1   0.8 0.186 
#> 11 s01   AY           1   0.8 0.171 
#> 12 s01   AW           1   0.8 0.170 
#> 13 s01   ER           1   0.8 0.145 
#> 14 s01   EY           1   0.8 0.101 
#> 15 s01   OY           1   0.8 0.0904

Density Polygons

Polygon Data Frames

A single probability level

In the simplest approach, we can use density_polygons() to return a data frame for just one probability level, 60%.

s01 |> 
  group_by(name) |> 
  reframe(
    density_polygons(lF2, lF1, probs = 0.6)
  )->
  sixty_poly_df

head(sixty_poly_df)
#> # A tibble: 6 × 7
#>   name  level_id    id  prob   lF2   lF1 order
#>   <chr>    <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 s01          1     1   0.6 -7.44 -6.78     1
#> 2 s01          1     1   0.6 -7.42 -6.76     2
#> 3 s01          1     1   0.6 -7.41 -6.75     3
#> 4 s01          1     1   0.6 -7.40 -6.72     4
#> 5 s01          1     1   0.6 -7.40 -6.69     5
#> 6 s01          1     1   0.6 -7.40 -6.69     6

Now, it’s possible for the HDR polygon to actually come in multiple pieces, but in this case, there’s just one polygon, so we can plot it.

ggplot(sixty_poly_df,
       aes(lF2, lF1))+
  geom_polygon(
    aes(color = prob,
        group = prob),
    fill = NA,
    linewidth = 1
  )+
  coord_fixed()

Multiple probability levels

To get polygons associated with multiple probability levels, we simply pass a vector of values to probs.

s01 |> 
  group_by(name) |> 
  reframe(
    density_polygons(lF2, 
                     lF1, 
                     probs = c(0.6, 0.8))
  )->
  multi_poly_df

head(multi_poly_df)
#> # A tibble: 6 × 7
#>   name  level_id    id  prob   lF2   lF1 order
#>   <chr>    <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 s01          2     1   0.8 -7.78 -6.01     1
#> 2 s01          2     1   0.8 -7.80 -6.01     2
#> 3 s01          2     1   0.8 -7.82 -6.02     3
#> 4 s01          2     1   0.8 -7.85 -6.02     4
#> 5 s01          2     1   0.8 -7.87 -6.02     5
#> 6 s01          2     1   0.8 -7.90 -6.02     6

ggplot(multi_poly_df,
       aes(lF2, lF1))+
  geom_polygon(
    aes(color = prob,
        group = prob),
    fill = NA,
    linewidth = 1
  )+
  coord_fixed()

Polygon Simple Features

We can also get density_polygons() to return the polygons as simple features, as defined in the {sf} package, by passing it the argument as_sf = TRUE.

s01 |> 
  group_by(name) |> 
  reframe(
    density_polygons(lF2,
                     lF1,
                     probs = c(0.8, 0.6),
                     as_sf = TRUE)
  ) |> 
  st_sf()->
  multi_poly_sf

The final function there, sf::st_sf(), wasn’t strictly necessary, but makes life a little easier for plotting. Here’s what the result looks like:

multi_poly_sf
#> Simple feature collection with 2 features and 3 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -7.93703 ymin: -6.913825 xmax: -7.208434 ymax: -6.009484
#> CRS:           NA
#> # A tibble: 2 × 4
#>   name  level_id  prob                                                  geometry
#>   <chr>    <int> <dbl>                                                 <POLYGON>
#> 1 s01          1   0.6 ((-7.636315 -6.19764, -7.645598 -6.201069, -7.65986 -6.2…
#> 2 s01          2   0.8 ((-7.777586 -6.009484, -7.801131 -6.010429, -7.824676 -6…

And here’s a plot.

ggplot(multi_poly_sf)+
  geom_sf(aes(color = prob),
          fill = NA)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.