The gutenbergr package helps you download and process public domain works from Project Gutenberg. This vignette introduces the package’s metadata datasets and core downloading functionality.
gutenberg_metadata

The gutenberg_metadata dataset contains information about each work in the Project Gutenberg collection:
#> # A tibble: 81,633 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 1 "The De… Jeffe… 1638 en Politics/American …
#> 2 2 "The Un… Unite… 1 en Politics/American …
#> 3 3 "John F… Kenne… 1666 en Category: Essays, …
#> 4 4 "Lincol… Linco… 3 en US Civil War/Categ…
#> 5 5 "The Un… Unite… 1 en United States/Poli…
#> 6 6 "Give M… Henry… 4 en American Revolutio…
#> 7 7 "The Ma… <NA> NA en Category: History …
#> 8 8 "Abraha… Linco… 3 en US Civil War/Categ…
#> 9 9 "Abraha… Linco… 3 en US Civil War/Categ…
#> 10 10 "The Ki… <NA> NA en Banned Books List …
#> # ℹ 81,623 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
You can filter this to find specific works:
#> # A tibble: 3 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 105 Persuasi… Auste… 68 en "Category: Novels/…
#> 2 22963 Persuasi… Auste… 68 en ""
#> 3 36777 Persuasi… Auste… 68 fr "FR Littérature/Ca…
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
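A filter of the kind that produces output like the table above can be sketched as follows. This is a minimal example, assuming the dplyr package is attached alongside gutenbergr; the variable name persuasion is only illustrative.

```r
library(gutenbergr)
library(dplyr)

# Keep only the rows whose title is exactly "Persuasion".
# Rows with a missing title are dropped by filter().
persuasion <- gutenberg_metadata |>
  filter(title == "Persuasion")

persuasion
```

Because this is an exact match, editions with longer titles are not returned, and translations that share the title (such as a French edition) are.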
The metadata currently in the package was last updated on 13 March 2026.
gutenberg_works()

In most analyses, you’ll want to filter for English works, avoid duplicates, and include only books with downloadable text. The gutenberg_works() function does this automatically:
#> # A tibble: 63,090 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 1 "The De… Jeffe… 1638 en Politics/American …
#> 2 2 "The Un… Unite… 1 en Politics/American …
#> 3 3 "John F… Kenne… 1666 en Category: Essays, …
#> 4 4 "Lincol… Linco… 3 en US Civil War/Categ…
#> 5 5 "The Un… Unite… 1 en United States/Poli…
#> 6 6 "Give M… Henry… 4 en American Revolutio…
#> 7 7 "The Ma… <NA> NA en Category: History …
#> 8 8 "Abraha… Linco… 3 en US Civil War/Categ…
#> 9 9 "Abraha… Linco… 3 en US Civil War/Categ…
#> 10 10 "The Ki… <NA> NA en Banned Books List …
#> # ℹ 63,080 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
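To make the defaults concrete, the sketch below calls gutenberg_works() with no arguments and compares it with a hand-written filter. The manual version is only a rough approximation for illustration, not the package's actual implementation: it keeps English works that have text but does not try to reproduce the duplicate handling.

```r
library(gutenbergr)
library(dplyr)

# Default call: English works with downloadable text, duplicates removed.
works <- gutenberg_works()

# Rough manual approximation (an assumption for illustration only):
# English works that have text, with no duplicate removal.
manual <- gutenberg_metadata |>
  filter(language == "en", has_text)

# Compare the row counts of the two results.
c(works = nrow(works), manual = nrow(manual))
```

Unless you need to control each filtering step yourself, gutenberg_works() is the simpler starting point.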
You can also filter directly within the function:
#> # A tibble: 14 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 105 "Persua… Auste… 68 en "Category: Novels/…
#> 2 121 "Northa… Auste… 68 en "Gothic Fiction/Ca…
#> 3 141 "Mansfi… Auste… 68 en "Category: Novels/…
#> 4 158 "Emma" Auste… 68 en "Category: Novels/…
#> 5 161 "Sense … Auste… 68 en "Category: Romance…
#> 6 946 "Lady S… Auste… 68 en "Category: Novels/…
#> 7 1212 "Love a… Auste… 68 en "Category: Romance…
#> 8 1342 "Pride … Auste… 68 en "Best Books Ever L…
#> 9 31100 "The Co… Auste… 68 en "Category: Romance…
#> 10 37431 "Pride … Auste… 68 en "Category: Plays/F…
#> 11 42078 "The Le… Auste… 68 en "Category: Biograp…
#> 12 63569 "The Wa… Auste… 68 en "Category: Novels/…
#> 13 74233 "Fragme… Auste… 68 en "Category: Novels/…
#> 14 77117 "The Wa… Auste… 68 en ""
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
#> # A tibble: 24 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 105 Persuas… Auste… 68 en Category: Novels/C…
#> 2 121 Northan… Auste… 68 en Gothic Fiction/Cat…
#> 3 141 Mansfie… Auste… 68 en Category: Novels/C…
#> 4 158 Emma Auste… 68 en Category: Novels/C…
#> 5 161 Sense a… Auste… 68 en Category: Romance/…
#> 6 946 Lady Su… Auste… 68 en Category: Novels/C…
#> 7 1212 Love an… Auste… 68 en Category: Romance/…
#> 8 1342 Pride a… Auste… 68 en Best Books Ever Li…
#> 9 17797 Memoir … Auste… 7603 en Category: Biograph…
#> 10 22536 Jane Au… Auste… 25392 en Category: Biograph…
#> # ℹ 14 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
#> # A tibble: 93 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 46 "A Chri… Dicke… 37 en Children's Literat…
#> 2 98 "A Tale… Dicke… 37 en Historical Fiction…
#> 3 564 "The My… Dicke… 37 en Mystery Fiction/Ca…
#> 4 580 "The Pi… Dicke… 37 en Best Books Ever Li…
#> 5 588 "Master… Dicke… 37 en Category: Novels/C…
#> 6 644 "The Ha… Dicke… 37 en Christmas/Category…
#> 7 650 "Pictur… Dicke… 37 en Category: Travel W…
#> 8 653 "The Ch… Dicke… 37 en Category: Novels/C…
#> 9 675 "Americ… Dicke… 37 en Category: Travel W…
#> 10 676 "The Ba… Dicke… 37 en Christmas/Category…
#> # ℹ 83 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
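Filter expressions passed to gutenberg_works() are evaluated against the metadata columns. The calls below are a sketch of the kind of filters that yield tables like the ones above: an exact author match and a looser pattern match. The stringr package and the variable names are assumptions made for the example.

```r
library(gutenbergr)
library(stringr)

# Exact match on the author field ("Surname, Forename" format).
austen_exact <- gutenberg_works(author == "Austen, Jane")

# Pattern match: any author whose name contains "Austen".
# This also picks up other people who share the surname.
austen_any <- gutenberg_works(str_detect(author, "Austen"))

nrow(austen_exact)
nrow(austen_any)
```

The pattern match is broader, so check the gutenberg_author_id column when you need works by one specific person.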
gutenberg_subjects

The gutenberg_subjects dataset pairs works with Library of Congress classifications and subject headings:
#> # A tibble: 262,666 × 3
#> gutenberg_id subject_type subject
#> <int> <fct> <chr>
#> 1 1 lcsh United States -- History -- Revolution, 1775-1783 …
#> 2 1 lcsh United States. Declaration of Independence
#> 3 1 lcc E201
#> 4 1 lcc JK
#> 5 2 lcsh Civil rights -- United States -- Sources
#> 6 2 lcsh United States. Constitution. 1st-10th Amendments
#> 7 2 lcc JK
#> 8 2 lcc KF
#> 9 3 lcsh United States -- Foreign relations -- 1961-1963
#> 10 3 lcsh Presidents -- United States -- Inaugural addresses
#> # ℹ 262,656 more rows
This is useful for finding works by genre or topic:
#> # A tibble: 986 × 3
#> gutenberg_id subject_type subject
#> <int> <fct> <chr>
#> 1 170 lcsh Detective and mystery stories
#> 2 173 lcsh Detective and mystery stories
#> 3 244 lcsh Detective and mystery stories
#> 4 305 lcsh Detective and mystery stories
#> 5 330 lcsh Detective and mystery stories
#> 6 481 lcsh Detective and mystery stories
#> 7 547 lcsh Detective and mystery stories
#> 8 863 lcsh Detective and mystery stories
#> 9 905 lcsh Detective and mystery stories
#> 10 1155 lcsh Detective and mystery stories
#> # ℹ 976 more rows
#> # A tibble: 59 × 3
#> gutenberg_id subject_type subject
#> <int> <fct> <chr>
#> 1 108 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 2 221 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 3 244 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 4 834 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 5 1661 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 6 2097 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 7 2343 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 8 2344 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 9 2345 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 10 2346 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> # ℹ 49 more rows
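Subject lookups like the two above can be written as ordinary dplyr filters. The following is a minimal sketch, assuming dplyr and stringr are attached; the exact expressions behind the printed tables are not shown in this text, so treat the pattern in the second filter as an illustration.

```r
library(gutenbergr)
library(dplyr)
library(stringr)

# Exact match on a Library of Congress subject heading.
detective <- gutenberg_subjects |>
  filter(subject == "Detective and mystery stories")

# Partial match: headings that mention Sherlock Holmes.
# fixed() treats the pattern as a literal string, not a regular expression.
holmes <- gutenberg_subjects |>
  filter(str_detect(subject, fixed("Holmes, Sherlock")))

detective
holmes
```

A work can carry several headings, so the same gutenberg_id may appear in both results.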
You can join this with gutenberg_works() to download
books by subject:
library(dplyr)
library(gutenbergr)

# Get IDs of detective stories that are also in gutenberg_works()
detective_ids <- gutenberg_subjects |>
  filter(subject == "Detective and mystery stories") |>
  inner_join(gutenberg_works(), by = "gutenberg_id") |>
  pull(gutenberg_id)

# Download a sample of the first five works
detective_stories <- gutenberg_download(
  detective_ids[1:5],
  meta_fields = c("title", "author")
)

Download a book using its Gutenberg ID with gutenberg_download():
#> # A tibble: 8,357 × 4
#> gutenberg_id text title author
#> <int> <chr> <chr> <chr>
#> 1 105 "Persuasion" Persuasion Austen, Jane
#> 2 105 "" Persuasion Austen, Jane
#> 3 105 "" Persuasion Austen, Jane
#> 4 105 "by Jane Austen" Persuasion Austen, Jane
#> 5 105 "" Persuasion Austen, Jane
#> 6 105 "(1818)" Persuasion Austen, Jane
#> 7 105 "" Persuasion Austen, Jane
#> 8 105 "" Persuasion Austen, Jane
#> 9 105 "" Persuasion Austen, Jane
#> 10 105 "" Persuasion Austen, Jane
#> # ℹ 8,347 more rows
The result is a tibble with:

  gutenberg_id - the book’s ID
  text - one row per line of text

Download multiple books by providing a vector of Gutenberg IDs:
#> # A tibble: 9,579 × 4
#> gutenberg_id text title author
#> <int> <chr> <chr> <chr>
#> 1 109 "Renascence and Other Poems" Renascence, and Other Poems Millay…
#> 2 109 "" Renascence, and Other Poems Millay…
#> 3 109 "" Renascence, and Other Poems Millay…
#> 4 109 "by" Renascence, and Other Poems Millay…
#> 5 109 "" Renascence, and Other Poems Millay…
#> 6 109 "Edna St. Vincent Millay" Renascence, and Other Poems Millay…
#> 7 109 "" Renascence, and Other Poems Millay…
#> 8 109 "" Renascence, and Other Poems Millay…
#> 9 109 "" Renascence, and Other Poems Millay…
#> 10 109 "" Renascence, and Other Poems Millay…
#> # ℹ 9,569 more rows
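The download calls behind tables like these can be sketched as below. The helper download_with_titles() is hypothetical and exists only so that nothing is fetched until you call it; the IDs 105 and 109 are the ones that appear in the output above.

```r
library(gutenbergr)

# Hypothetical helper: wraps gutenberg_download() so that the title and
# author columns are attached to every line of text.
download_with_titles <- function(ids) {
  gutenberg_download(ids, meta_fields = c("title", "author"))
}

# Usage (requires network access, so it is left commented out here):
# persuasion <- download_with_titles(105)
# books <- download_with_titles(c(109, 105))
```

Passing a vector of IDs returns one combined tibble, with the gutenberg_id column identifying which book each line belongs to.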
Use the meta_fields argument to include additional
information:
#> # A tibble: 2 × 2
#> title n
#> <chr> <int>
#> 1 Persuasion 8357
#> 2 Renascence, and Other Poems 1222
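A per-title line count like the one above can be computed with dplyr::count(). The sketch below uses a small made-up tibble in the same shape as a gutenberg_download() result, so it runs without network access; the rows are toy data, not real book text.

```r
library(dplyr)

# Toy stand-in for a gutenberg_download() result with a title column.
# The text values are placeholders, not lines from the actual books.
books <- tibble(
  gutenberg_id = c(105L, 105L, 105L, 109L, 109L),
  text = c("line a", "", "line b", "line c", ""),
  title = c(
    rep("Persuasion", 3),
    rep("Renascence, and Other Poems", 2)
  )
)

# One row per title, with n giving the number of text lines.
line_counts <- books |>
  count(title)

line_counts
```

With real data, replace the toy tibble with the result of gutenberg_download(c(109, 105), meta_fields = "title").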
Now that you have book texts as tibbles, you can: