The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Peak Finder

library(PnT)
library(plotly)
mylar <- readRDS('../inst/extdata/mylar_data.rds')

This data represents a spectrum of light released by analyzing the elements in a sample. The first peak represents hydrogen in the sample, and the second peak represents other elements.

plot_ly(mylar, x = ~Ch, y = ~CPS,
        type = 'scatter', mode = 'lines')

To find these peaks, we can just use the following function:

find_peaks(mylar, 'Ch', 'CPS')
#> # A tibble: 2 × 2
#>       X     Y
#>   <int> <dbl>
#> 1   217  1352
#> 2   323  3000

Next, let’s see how to find peaks in more noisy data

X <- seq(0, 10, 0.01)
Y <- sin(X) + runif(length(X), min=-0.1, max=0.1)
data <- data.frame(X=X, Y=Y)
plot_ly(data, x = ~X, y = ~Y,
        type = 'scatter', mode = 'lines')

To find the peaks, we can just use the same function

find_peaks(data, 'X', 'Y')
#> # A tibble: 2 × 2
#>       X     Y
#>   <dbl> <dbl>
#> 1  1.55  1.10
#> 2  7.73  1.09

Now, if we have even more noisy data, the peak finder might need better filter parameters.

Y <- Y + runif(length(X), min=-0.1, max=0.1)
data <- data.frame(X=X, Y=Y)
find_peaks(data, 'X', 'Y')
#> # A tibble: 38 × 2
#>        X     Y
#>    <dbl> <dbl>
#>  1  0.31 0.435
#>  2  0.71 0.830
#>  3  0.82 0.880
#>  4  1.26 1.07 
#>  5  1.33 1.11 
#>  6  1.42 1.13 
#>  7  1.65 1.17 
#>  8  1.9  1.08 
#>  9  1.99 1.02 
#> 10  2.22 0.902
#> # ℹ 28 more rows

This data has too much noise for the filter. So, we can increase the amount of noise blocked by the filter using the parameter minYerror. This parameter has default value 0.1.

find_peaks(data, 'X', 'Y', minYerror=0.2)
#> # A tibble: 2 × 2
#>       X     Y
#>   <dbl> <dbl>
#> 1  1.65  1.17
#> 2  7.65  1.14

Another possibility is that we only want the thin, sharp peaks.

Y <- (7/(5+(X-5)^2)) + (0.05/(0.02+(X-8)^2)) + (0.03/(0.03+(X-1)^2))
data <- data.frame(X=X, Y=Y)
plot_ly(data, x = ~X, y = ~Y,
        type = 'scatter', mode = 'lines')

For example, say we only want the sharp peaks (the first and last), not the wide peak in the middle.

find_peaks(data, 'X', 'Y')
#> # A tibble: 3 × 2
#>       X     Y
#>   <dbl> <dbl>
#> 1     1  1.33
#> 2     5  1.41
#> 3     8  3.00

The usual function call does not only get those peaks. Instead, we use the parameter minSlope, which controls a filter testing the slope around peaks relative to the dimensions of the data. This argument is 0 by default. Increasing minSlope allows us to filter out smooth peaks in favor of sharp ones.

find_peaks(data, 'X', 'Y', minSlope = 1)
#> # A tibble: 2 × 2
#>       X     Y
#>   <dbl> <dbl>
#> 1     1  1.33
#> 2     8  3.00

If, instead, the outer peaks are just edge effects, and we only want the middle peak, we can use edgeFilter. In this case, setting edgeFilter to 0.25 filters out the first and last quarter (0.25) of the data.

find_peaks(data, 'X', 'Y', edgeFilter = 0.25)
#> # A tibble: 1 × 2
#>       X     Y
#>   <dbl> <dbl>
#> 1     5  1.41

Examples

logy <- function(figure){
  layout(figure, yaxis=list(type="log"))
}

Here, we see some real world examples where this package can be used.

Air Quality Data Analysis

Here, we have a data set to analyze: air quality data from a school.

fig <- plot_ly(school_data, x=~Energy, y=~Counts, type="scatter", mode="lines"); fig

To find the prominent peaks, we can use the basic function call:

find_peaks(school_data, 'Energy', 'Counts')
#> # A tibble: 6 × 2
#>         X     Y
#>     <dbl> <dbl>
#> 1 -0.0022 10642
#> 2  0.278  12839
#> 3  0.528   7317
#> 4  1.03    1748
#> 5  1.49    2128
#> 6  1.75    5361

However, if we plot the data using a log plot, we see that many of the other peaks are not noise.

logy(fig)

The main example of this is the peak at x = 6.4 keV. In the previous plot, this peak was barely visible, but in this one, it is obviously not just noise. By hovering over the peak with your mouse, you can see that this peak has a height of about 60 from the rest of the graph. Using the argument asFraction in conjunction with minYerror, you can make sure this is the smallest peak found.

find_peaks(school_data, 'Energy', 'Counts', asFraction = FALSE, minYerror = 60)
#> # A tibble: 13 × 2
#>          X     Y
#>      <dbl> <dbl>
#>  1 -0.0022 10642
#>  2  0.278  12839
#>  3  0.528   7317
#>  4  0.718    595
#>  5  1.03    1748
#>  6  1.26     785
#>  7  1.49    2128
#>  8  1.75    5361
#>  9  2.31     494
#> 10  2.63     208
#> 11  3.31     362
#> 12  3.69     242
#> 13  6.40      81

In this example, setting asFraction = FALSE allows you to input the height of 60 you measured for the peak directly into the parameter minYerror without having to divide it by the total height of the data.

Jadeite Control Data

fig <- plot_ly(jadeite_control, x=~Ch, y=~CPS, type="scatter", mode="lines"); fig

Again, we need to use a log plot to see the smaller peaks clearly.

logy(fig)

We can see that the small peak at x = 1000 has a height of about 400. So, we can try what we did before to find the peaks of the graph.

find_peaks(jadeite_control, 'Ch', 'CPS', asFraction = FALSE, minYerror = 400)
#> # A tibble: 19 × 2
#>        X       Y
#>    <dbl>   <dbl>
#>  1   129   32702
#>  2   155   45298
#>  3   184  382405
#>  4   214 1048170
#>  5   285    7100
#>  6   323    8754
#>  7   366    9347
#>  8   403   27282
#>  9   425   24060
#> 10   452   58156
#> 11   491   12029
#> 12   551    9306
#> 13   608    4596
#> 14   662   10285
#> 15   722    3632
#> 16   781   26212
#> 17   862    4895
#> 18   915    1601
#> 19   990    1462

However, what if the peaks at 285, 323, and 366 are actually noise? Say you want to filter out these peaks, and other small peaks in the area where most of the peaks are larger. In this case, we have to break up the peak finding into two separate ranges. First, we find the peaks in the range where the peaks are smaller:

smallerPeaks <- find_peaks(jadeite_control, 'Ch', 'CPS', asFraction = FALSE, minYerror = 400, ROI = c(500, 1050))
smallerPeaks
#> # A tibble: 8 × 2
#>       X     Y
#>   <dbl> <dbl>
#> 1   551  9306
#> 2   608  4596
#> 3   662 10285
#> 4   722  3632
#> 5   781 26212
#> 6   862  4895
#> 7   915  1601
#> 8   990  1462

Now, we can find the peaks in the other range:

largerPeaks <- find_peaks(jadeite_control, 'Ch', 'CPS', asFraction = FALSE, minYerror = 10000, ROI = c(0, 500))
largerPeaks
#> # A tibble: 5 × 2
#>       X       Y
#>   <dbl>   <dbl>
#> 1   129   32702
#> 2   155   45298
#> 3   184  382405
#> 4   214 1048170
#> 5   452   58156

Here, the ROI argument determines the region searched by the peak finder. If we want to combine these peaks into one data frame, we can just use bind_rows.

dplyr::bind_rows(largerPeaks, smallerPeaks)
#> # A tibble: 13 × 2
#>        X       Y
#>    <dbl>   <dbl>
#>  1   129   32702
#>  2   155   45298
#>  3   184  382405
#>  4   214 1048170
#>  5   452   58156
#>  6   551    9306
#>  7   608    4596
#>  8   662   10285
#>  9   722    3632
#> 10   781   26212
#> 11   862    4895
#> 12   915    1601
#> 13   990    1462

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.