The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Use PRQL on R

PRQL (Pipelined Relational Query Language, pronounced “Prequel”) is a modern language for transforming data, can be compiled to SQL.

This package provides a simple function to convert a PRQL query string to a SQL string.

For example, this is a PRQL query.

from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}

And, this is the SQL query that is compiled from the PRQL query.

SELECT
  cyl,
  mpg,
  ROUND(mpg, 0) AS mpg_int
FROM
  mtcars
WHERE
  cyl > 6

To compile a PRQL string, just pass the query string to the prql_compile() function, like this.

library(prqlr)

"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
" |>
  prql_compile() |>
  cat()
#> SELECT
#>   cyl,
#>   mpg,
#>   ROUND(mpg, 0) AS mpg_int
#> FROM
#>   mtcars
#> WHERE
#>   cyl > 6
#> 
#> -- Generated by PRQL compiler version:0.11.2 (https://prql-lang.org)

This output SQL query string can be used with already existing great packages that manipulate data with SQL.

Work with DB

Using it with the {DBI} package, we can execute PRQL queries against the database.

library(DBI)

# Create an ephemeral in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create a table inclueds `mtcars` data
dbWriteTable(con, "mtcars", mtcars)

# Execute a PRQL query
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
  prql_compile("sql.sqlite") |>
  dbGetQuery(con, statement = _)
#>   cyl  mpg mpg_int
#> 1   8 18.7      19
#> 2   8 14.3      14
#> 3   8 16.4      16

We can also use the sqldf::sqldf() function to automatically register Data Frames to the database.

"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
  prql_compile("sql.sqlite") |>
  sqldf::sqldf()
#>   cyl  mpg mpg_int
#> 1   8 18.7      19
#> 2   8 14.3      14
#> 3   8 16.4      16

Since SQLite is used here via {RSQLite}, the target option of prql_compile() is set to "sql.sqlite".

Available target names can be found with the prql_get_targets() function.

Work with R Data Frames

Using {prqlr} with the {tidyquery} package, we can execute PRQL queries against R Data Frames via {dplyr}.

{dplyr} is a very popular R package for manipulating Data Frames, and the PRQL syntax is very similar to the {dplyr} syntax.

Let’s run a query that aggregates a Data Frame flights, contained in the {nycflights13} package.

library(tidyquery)
library(nycflights13)

"
from flights
filter (distance | in 200..300)
filter air_time != null
group {origin, dest} (
  aggregate {
    num_flts = count this,
    avg_delay = (average arr_delay | math.round 0)
  }
)
sort {-origin, avg_delay}
take 2
" |>
  prql_compile() |>
  query()
#> # A tibble: 2 × 4
#>   origin dest  num_flts avg_delay
#>   <chr>  <chr>    <int>     <dbl>
#> 1 LGA    BUF        122        -2
#> 2 LGA    PWM        273         2

This query can be written with {dplyr}’s functions as follows.

library(dplyr, warn.conflicts = FALSE)
library(nycflights13)

flights |>
  filter(
    distance |> between(200, 300),
    !is.na(air_time)
  ) |>
  group_by(origin, dest) |>
  summarise(
    num_flts = n(),
    avg_delay = mean(arr_delay, na.rm = TRUE) |> round(0),
    .groups = "drop"
  ) |>
  arrange(desc(origin), avg_delay) |>
  head(2)
#> # A tibble: 2 × 4
#>   origin dest  num_flts avg_delay
#>   <chr>  <chr>    <int>     <dbl>
#> 1 LGA    BUF        122        -2
#> 2 LGA    PWM        273         2

Note that {dplyr} queries can be generated by the tidyquery::show_dplyr() function!

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.