Run SQL queries on Snowflake, Redshift, Postgres, SQLite, or a local DuckDB database from R using a single function: queryDB().
snowquery now also supports streaming remote query results directly into a local DuckDB file for fast, repeatable, low‑cost analysis.
# The easiest way to get snowquery
install.packages("snowquery")
# Or you can get the development version from GitHub
# install.packages("devtools")
devtools::install_github("mermelstein/snowquery")
Redshift is currently only available on the development version of this package. See installation instructions above.
When connecting to a Redshift DWH you might need to specify an SSL connection. You can do this with an sslmode: require entry in the credentials YAML or by passing sslmode='require' to the queryDB() function directly.
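For example, you could pass the SSL setting (along with the other Redshift credentials) straight to queryDB(). A minimal sketch, assuming queryDB() accepts the same keys used in the credentials YAML below (host, port, sslmode, etc.); the table name is hypothetical:

library(snowquery)
# Hypothetical one-off Redshift query with SSL required, bypassing the YAML file
results <- queryDB(
  "SELECT COUNT(*) FROM analytics.events",
  db_type  = 'redshift',
  host     = 'redshift-cluster.host.aws.com',
  port     = 5439,
  database = 'analytics',
  username = 'rs_user',
  password = 'rs_pw',
  sslmode  = 'require'
)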
Because Snowflake’s driver requires a ton of fiddling in order to make it work for R. It sucks. A lot. That’s why the snowquery package takes the Snowflake Python connector and leverages it in the background to run queries from R.
For more information on using snowquery, please see the package website.
Base R dependencies are installed automatically via the DESCRIPTION. Additional runtime needs:

* duckdb (installed automatically when you install the package, but you may also install manually with install.packages("duckdb")).
* reticulate. The package will automatically install / upgrade snowflake-connector-python[pandas] the first time you run a Snowflake query. No manual pip install is required in normal use.
* RPostgres (already imported). If you need SSL, set sslmode in the YAML or pass it directly.

If the package cannot find a working Python, you’ll receive an actionable error explaining what to install; you can also check the interpreter yourself, as sketched below.
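A quick diagnostic sketch (not required in normal use) for confirming which Python installation reticulate will pick up:

library(reticulate)
# Show the Python interpreter and environment reticulate has discovered
reticulate::py_config()
# Or check availability without initializing Python
reticulate::py_available(initialize = FALSE)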
For all db connections you will need a YAML file called snowquery_creds.yaml, located at ~/snowquery_creds.yaml, with one or more named connections:
---
my_snowflake_dwh:
  db_type: snowflake
  account: your_account
  warehouse: your_wh
  database: your_db
  user: your_user # note: key is 'user' not 'username' for snowflake
  password: your_pw
  role: your_role
my_redshift_dwh:
  db_type: redshift
  host: redshift-cluster.host.aws.com
  port: 5439
  database: analytics
  username: rs_user
  password: rs_pw
  sslmode: require
my_postgres_db:
  db_type: postgres
  host: localhost
  port: 5432
  database: pg_db
  username: pg_user
  password: pg_pw
my_sqlite_db:
  db_type: sqlite
  database: /path/to/local.sqlite
my_duckdb_local:
  db_type: duckdb # connects to the default analytics.duckdb file in working dir
This follows a named connection format, where you can have multiple named connections in the same file. For example you might have a my_snowflake_dwh connection and a my_snowflake_admin connection, each with their own credentials.
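For instance, an extra admin connection could reuse the same account with a different role. A hypothetical sketch, mirroring the my_snowflake_dwh keys above:

my_snowflake_admin:
  db_type: snowflake
  account: your_account
  warehouse: your_wh
  database: your_db
  user: your_admin_user # same 'user' key as the snowflake entry above
  password: your_admin_pw
  role: your_admin_role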
This package looks for the credential file at this location: ~/snowquery_creds.yaml. If it is in any other location it will not work. If the package cannot locate the file you will receive an error like: cannot open file '/expected/path/to/file/snowquery_creds.yaml': No such file or directory.
You can manually pass credentials to the queryDB() function, but it is recommended to use the YAML file.
You are now ready to query away!
Load this library in your R environment with library(snowquery).
queryDB("SELECT * FROM MY_AWESOME_TABLE", conn_name='my_snowflake_dwh')
or
# Query Snowflake and get results as a data frame
results <- queryDB("SELECT * FROM my_large_table LIMIT 1000", conn_name = "my_snowflake_dwh")
or
# You can also pass in credentials manually
results <- queryDB("SELECT * FROM my_table",
                   db_type='snowflake',
                   username='my_username',
                   password='my_password',
                   account='my_account',
                   database='my_database',
                   warehouse='my_warehouse',
                   role='my_role',
                   timeout=30)
print(results)
Provide cache_table_name to stream results into the local DuckDB file analytics.duckdb:
# Cache a Snowflake query (streaming batches when possible)
queryDB(
"SELECT * FROM MY_SCHEMA.BIG_FACT_TABLE WHERE load_date >= '2025-09-01'",
conn_name = "my_snowflake_dwh",
cache_table_name = "big_fact_local",
overwrite = TRUE
)
# message: Successfully cached 1234567 rows to DuckDB table 'big_fact_local' (analytics.duckdb).
# Cache a Postgres query (uses dplyr/dbplyr lazy streaming under the hood)
queryDB(
"SELECT id, event_ts, metric FROM events WHERE event_ts >= now() - interval '7 days'",
conn_name = "my_postgres_db",
cache_table_name = "recent_events",
overwrite = TRUE
)
Key behaviors:

* The Snowflake path streams using fetch_pandas_batches() when available; otherwise it falls back to a single fetch.
* The Postgres / Redshift path uses a lazy dplyr table registered into DuckDB for efficient transfer.
* overwrite = TRUE creates/replaces the DuckDB table; set overwrite = FALSE to append (for Snowflake, appending happens batch by batch; for Postgres/Redshift you can modify the logic as needed). See the append sketch below.
* You cannot cache from a DuckDB source (db_type == 'duckdb').
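For the append path, a minimal sketch, assuming big_fact_local already exists with a matching schema:

# Hypothetical append: add one more day of rows to the existing cached table
queryDB(
  "SELECT * FROM MY_SCHEMA.BIG_FACT_TABLE WHERE load_date = '2025-10-01'",
  conn_name        = "my_snowflake_dwh",
  cache_table_name = "big_fact_local",
  overwrite        = FALSE # append instead of replace
)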
Add a DuckDB connection to the YAML (see the my_duckdb_local example above) and query cached tables:
local_summary <- queryDB(
  "SELECT COUNT(*) AS n_rows FROM big_fact_local",
  conn_name = "my_duckdb_local"
)
local_summary
You can also use DBI or dplyr directly:
library(DBI)
duck_con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "analytics.duckdb")
DBI::dbGetQuery(duck_con, "SELECT * FROM recent_events LIMIT 5")
DBI::dbDisconnect(duck_con, shutdown = TRUE)
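A dplyr sketch against the same file, assuming dbplyr is installed and recent_events was cached earlier:

library(dplyr)
duck_con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "analytics.duckdb")
# Lazily reference the cached table; dbplyr translates the pipeline to DuckDB SQL
recent <- dplyr::tbl(duck_con, "recent_events")
recent %>%
  dplyr::count(id, name = "n_events") %>%
  dplyr::collect()
DBI::dbDisconnect(duck_con, shutdown = TRUE)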
There is one function you need: queryDB(). It takes a SQL query as a string parameter and runs it on the db. For example:
library(snowquery)
<- "SELECT * FROM MY_AWESOME_TABLE"
query <- queryDB(query, conn_name='my_snowflake_dwh')
result_dataframe print(result_dataframe)
or
library(snowquery)
queryDB("SELECT * FROM MY_AWESOME_TABLE", conn_name='my_snowflake_dwh')
or
library(snowquery)
# You can also pass in credentials manually
result <- queryDB("SELECT * FROM my_table",
                  db_type='snowflake',
                  username='my_username',
                  password='my_password',
                  account='my_account',
                  database='my_database',
                  warehouse='my_warehouse',
                  role='my_role',
                  timeout=30)
print(result)
Workflow:

1. Pull once from a remote DWH.
2. Iterate locally against DuckDB for joins, aggregations, prototyping.
3. Refresh by re-running with overwrite = TRUE, or append with overwrite = FALSE (Snowflake / append path: ensure schema consistency).
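Putting that together, a minimal end-to-end sketch (table and column names are hypothetical):

library(snowquery)

# 1. Pull once from the remote DWH into the local DuckDB cache
queryDB(
  "SELECT order_id, customer_id, amount FROM MY_SCHEMA.ORDERS",
  conn_name        = "my_snowflake_dwh",
  cache_table_name = "orders_local",
  overwrite        = TRUE
)

# 2. Iterate locally against DuckDB (fast, no warehouse cost)
top_customers <- queryDB(
  "SELECT customer_id, SUM(amount) AS total
     FROM orders_local
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10",
  conn_name = "my_duckdb_local"
)

# 3. Refresh later by re-running step 1 with overwrite = TRUE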
Why DuckDB?

* Fast analytical execution (vectorized / columnar).
* Lightweight (no server to run).
* Plays nicely with data frames and dplyr (dbplyr translations work out of the box for the Postgres/Redshift streaming path).
Limitations / Notes:

* Snowflake streaming depends on connector feature availability; it falls back to a full fetch if the batch iterator is missing.
* Appending from heterogeneous schemas is not validated automatically.
* No explicit indexing (internal zone maps are generally sufficient).
* Caching from a DuckDB source is intentionally blocked (it is already local).
Planned (future): progress display, verbosity flag, helper to enumerate cached tables.
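Until a dedicated helper exists, you can list cached tables with DBI directly; a workaround sketch, not part of the snowquery API:

# Enumerate the tables currently stored in the local DuckDB cache file
duck_con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "analytics.duckdb")
DBI::dbListTables(duck_con)
DBI::dbDisconnect(duck_con, shutdown = TRUE)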